From ad37616e9048123ab664bf2064604888e25c539f Mon Sep 17 00:00:00 2001
From: Adriana Reus <adriana.reus@intel.com>
Date: Fri, 12 Jun 2015 19:01:07 +0300
Subject: iio: Documentation: Add additional *scale_available attributes

Added some more *scale_available attributes to the list that
are used in various drivers but were missiong from Documentation.

Signed-off-by: Adriana Reus <adriana.reus@intel.com>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
---
 Documentation/ABI/testing/sysfs-bus-iio | 5 +++++
 1 file changed, 5 insertions(+)

(limited to 'Documentation')
diff --git a/Documentation/ABI/testing/sysfs-bus-iio b/Documentation/ABI/testing/sysfs-bus-iio
index bbed111c31b4..666a341b01b8 100644
--- a/Documentation/ABI/testing/sysfs-bus-iio
+++ b/Documentation/ABI/testing/sysfs-bus-iio
@@ -413,6 +413,11 @@ Description:
 		to compute the calories burnt by the user.
 
 What:		/sys/bus/iio/devices/iio:deviceX/in_accel_scale_available
+What:		/sys/.../iio:deviceX/in_anglvel_scale_available
+What:		/sys/.../iio:deviceX/in_magn_scale_available
+What:		/sys/.../iio:deviceX/in_illuminance_scale_available
+What:		/sys/.../iio:deviceX/in_intensity_scale_available
+What:		/sys/.../iio:deviceX/in_proximity_scale_available
 What:		/sys/.../iio:deviceX/in_voltageX_scale_available
 What:		/sys/.../iio:deviceX/in_voltage-voltage_scale_available
 What:		/sys/.../iio:deviceX/out_voltageX_scale_available
-- 
cgit v1.2.3


From c046707ff99912878ba63db5f2f2e2916ca961ce Mon Sep 17 00:00:00 2001
From: Hans Verkuil <hans.verkuil@cisco.com>
Date: Mon, 15 Jun 2015 08:33:41 -0300
Subject: [media] DocBook/media: fix bad spacing in VIDIOC_EXPBUF

The VIDIOC_EXPBUF documentation had spurious spaces that made it irritating
to read. Fix this.

Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
---
 Documentation/DocBook/media/v4l/vidioc-expbuf.xml | 38 +++++++++++------------
 1 file changed, 19 insertions(+), 19 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/DocBook/media/v4l/vidioc-expbuf.xml b/Documentation/DocBook/media/v4l/vidioc-expbuf.xml
index a78c9207422f..0ae0b6a915d0 100644
--- a/Documentation/DocBook/media/v4l/vidioc-expbuf.xml
+++ b/Documentation/DocBook/media/v4l/vidioc-expbuf.xml
@@ -62,28 +62,28 @@ buffer as a DMABUF file at any time after buffers have been allocated with the
 &VIDIOC-REQBUFS; ioctl.</para>
 
 <para> To export a buffer, applications fill &v4l2-exportbuffer;.  The
-<structfield> type </structfield> field is set to the same buffer type as was
-previously used with  &v4l2-requestbuffers;<structfield> type </structfield>.
-Applications must also set the <structfield> index </structfield> field. Valid
+<structfield>type</structfield> field is set to the same buffer type as was
+previously used with &v4l2-requestbuffers; <structfield>type</structfield>.
+Applications must also set the <structfield>index</structfield> field. Valid
 index numbers range from zero to the number of buffers allocated with
-&VIDIOC-REQBUFS; (&v4l2-requestbuffers;<structfield> count </structfield>)
-minus one.  For the multi-planar API, applications set the <structfield> plane
-</structfield> field to the index of the plane to be exported. Valid planes
+&VIDIOC-REQBUFS; (&v4l2-requestbuffers; <structfield>count</structfield>)
+minus one.  For the multi-planar API, applications set the <structfield>plane</structfield>
+field to the index of the plane to be exported. Valid planes
 range from zero to the maximal number of valid planes for the currently active
-format. For the single-planar API, applications must set <structfield> plane
-</structfield> to zero.  Additional flags may be posted in the <structfield>
-flags </structfield> field.  Refer to a manual for open() for details.
+format. For the single-planar API, applications must set <structfield>plane</structfield>
+to zero.  Additional flags may be posted in the <structfield>flags</structfield>
+field.  Refer to a manual for open() for details.
 Currently only O_CLOEXEC, O_RDONLY, O_WRONLY, and O_RDWR are supported.  All
 other fields must be set to zero.
 In the case of multi-planar API, every plane is exported separately using
-multiple <constant> VIDIOC_EXPBUF </constant> calls. </para>
+multiple <constant>VIDIOC_EXPBUF</constant> calls.</para>
 
-<para> After calling <constant>VIDIOC_EXPBUF</constant> the <structfield> fd
-</structfield> field will be set by a driver.  This is a DMABUF file
+<para>After calling <constant>VIDIOC_EXPBUF</constant> the <structfield>fd</structfield>
+field will be set by a driver.  This is a DMABUF file
 descriptor. The application may pass it to other DMABUF-aware devices. Refer to
 <link linkend="dmabuf">DMABUF importing</link> for details about importing
 DMABUF files into V4L2 nodes. It is recommended to close a DMABUF file when it
-is no longer used to allow the associated memory to be reclaimed. </para>
+is no longer used to allow the associated memory to be reclaimed.</para>
   </refsect1>
 
   <refsect1>
@@ -170,9 +170,9 @@ multi-planar API. Otherwise this value must be set to zero. </entry>
 	  <row>
 	    <entry>__u32</entry>
 	    <entry><structfield>flags</structfield></entry>
-	    <entry>Flags for the newly created file, currently only <constant>
-O_CLOEXEC </constant>, <constant>O_RDONLY</constant>, <constant>O_WRONLY
-</constant>, and <constant>O_RDWR</constant> are supported, refer to the manual
+	    <entry>Flags for the newly created file, currently only
+<constant>O_CLOEXEC</constant>, <constant>O_RDONLY</constant>, <constant>O_WRONLY</constant>,
+and <constant>O_RDWR</constant> are supported, refer to the manual
 of open() for more details.</entry>
 	  </row>
 	  <row>
@@ -200,9 +200,9 @@ set the array to zero.</entry>
 	<term><errorcode>EINVAL</errorcode></term>
 	<listitem>
 	  <para>A queue is not in MMAP mode or DMABUF exporting is not
-supported or <structfield> flags </structfield> or <structfield> type
-</structfield> or <structfield> index </structfield> or <structfield> plane
-</structfield> fields are invalid.</para>
+supported or <structfield>flags</structfield> or <structfield>type</structfield>
+or <structfield>index</structfield> or <structfield>plane</structfield> fields
+are invalid.</para>
 	</listitem>
       </varlistentry>
     </variablelist>
-- 
cgit v1.2.3


From 6a219f15a86812a226d197aa93b2806e9cecda7c Mon Sep 17 00:00:00 2001
From: Ian Molton <ian.molton@codethink.co.uk>
Date: Wed, 3 Jun 2015 10:59:52 -0300
Subject: [media] media: adv7604: document support for ADV7612 dual HDMI input
 decoder

This documentation accompanies the patch adding support for the ADV7612
dual HDMI decoder / repeater chip.

Signed-off-by: Ian Molton <ian.molton@codethink.co.uk>
Reviewed-by: William Towle <william.towle@codethink.co.uk>
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
---
 .../devicetree/bindings/media/i2c/adv7604.txt          | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/media/i2c/adv7604.txt b/Documentation/devicetree/bindings/media/i2c/adv7604.txt
index c27cede3bd68..7eafdbc055f9 100644
--- a/Documentation/devicetree/bindings/media/i2c/adv7604.txt
+++ b/Documentation/devicetree/bindings/media/i2c/adv7604.txt
@@ -1,15 +1,17 @@
-* Analog Devices ADV7604/11 video decoder with HDMI receiver
+* Analog Devices ADV7604/11/12 video decoder with HDMI receiver
 
-The ADV7604 and ADV7611 are multiformat video decoders with an integrated HDMI
-receiver. The ADV7604 has four multiplexed HDMI inputs and one analog input,
-and the ADV7611 has one HDMI input and no analog input.
+The ADV7604 and ADV7611/12 are multiformat video decoders with an integrated
+HDMI receiver. The ADV7604 has four multiplexed HDMI inputs and one analog
+input, and the ADV7611 has one HDMI input and no analog input. The 7612 is
+similar to the 7611 but has 2 HDMI inputs.
 
-These device tree bindings support the ADV7611 only at the moment.
+These device tree bindings support the ADV7611/12 only at the moment.
 
 Required Properties:
 
   - compatible: Must contain one of the following
     - "adi,adv7611" for the ADV7611
+    - "adi,adv7612" for the ADV7612
 
   - reg: I2C slave address
 
@@ -22,10 +24,10 @@ port, in accordance with the video interface bindings defined in
 Documentation/devicetree/bindings/media/video-interfaces.txt. The port nodes
 are numbered as follows.
 
-  Port			ADV7611
+  Port			ADV7611    ADV7612
 ------------------------------------------------------------
-  HDMI			0
-  Digital output	1
+  HDMI			0             0, 1
+  Digital output	1                2
 
 The digital output port node must contain at least one endpoint.
 
-- 
cgit v1.2.3


From bf9c82278c348eb6b72496de6a3e5269aadb6b34 Mon Sep 17 00:00:00 2001
From: Ian Molton <ian.molton@codethink.co.uk>
Date: Wed, 3 Jun 2015 10:59:53 -0300
Subject: [media] media: adv7604: ability to read default input port from DT

Adds support to the adv7604 driver for specifying the default input
port in the Device tree. If no value is provided, the driver will be
unable to select an input without help from userspace.

Tested-by: William Towle <william.towle@codethink.co.uk>
Signed-off-by: Ian Molton <ian.molton@codethink.co.uk>
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
---
 Documentation/devicetree/bindings/media/i2c/adv7604.txt | 3 +++
 drivers/media/i2c/adv7604.c                             | 8 +++++++-
 2 files changed, 10 insertions(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/media/i2c/adv7604.txt b/Documentation/devicetree/bindings/media/i2c/adv7604.txt
index 7eafdbc055f9..8337f75c75da 100644
--- a/Documentation/devicetree/bindings/media/i2c/adv7604.txt
+++ b/Documentation/devicetree/bindings/media/i2c/adv7604.txt
@@ -47,6 +47,7 @@ Optional Endpoint Properties:
   If none of hsync-active, vsync-active and pclk-sample is specified the
   endpoint will use embedded BT.656 synchronization.
 
+  - default-input: Select which input is selected after reset.
 
 Example:
 
@@ -60,6 +61,8 @@ Example:
 		#address-cells = <1>;
 		#size-cells = <0>;
 
+		default-input = <0>;
+
 		port@0 {
 			reg = <0>;
 		};
diff --git a/drivers/media/i2c/adv7604.c b/drivers/media/i2c/adv7604.c
index c8fefeab0513..21b549a8dc74 100644
--- a/drivers/media/i2c/adv7604.c
+++ b/drivers/media/i2c/adv7604.c
@@ -2772,6 +2772,7 @@ static int adv76xx_parse_dt(struct adv76xx_state *state)
 	struct device_node *endpoint;
 	struct device_node *np;
 	unsigned int flags;
+	u32 v;
 
 	np = state->i2c_clients[ADV76XX_PAGE_IO]->dev.of_node;
 
@@ -2781,6 +2782,12 @@ static int adv76xx_parse_dt(struct adv76xx_state *state)
 		return -EINVAL;
 
 	v4l2_of_parse_endpoint(endpoint, &bus_cfg);
+
+	if (!of_property_read_u32(endpoint, "default-input", &v))
+		state->pdata.default_input = v;
+	else
+		state->pdata.default_input = -1;
+
 	of_node_put(endpoint);
 
 	flags = bus_cfg.bus.parallel.flags;
@@ -2819,7 +2826,6 @@ static int adv76xx_parse_dt(struct adv76xx_state *state)
 	/* Hardcode the remaining platform data fields. */
 	state->pdata.disable_pwrdnb = 0;
 	state->pdata.disable_cable_det_rst = 0;
-	state->pdata.default_input = -1;
 	state->pdata.blank_data = 1;
 	state->pdata.alt_data_sat = 1;
 	state->pdata.op_format_mode_sel = ADV7604_OP_FORMAT_MODE0;
-- 
cgit v1.2.3


From 0760bacef728f6c980278188a5aac7f0fbe8bed8 Mon Sep 17 00:00:00 2001
From: Karsten Merker <merker@debian.org>
Date: Tue, 23 Jun 2015 19:02:29 +0200
Subject: devicetree: Add msi to the vendor-prefix list

Document the the "msi" (Micro-Star International Co. Ltd.) vendor prefix
which is used in sun6i-a31s-primo81.dts.

Signed-off-by: Karsten Merker <merker@debian.org>
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
---
 Documentation/devicetree/bindings/vendor-prefixes.txt | 1 +
 1 file changed, 1 insertion(+)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/vendor-prefixes.txt b/Documentation/devicetree/bindings/vendor-prefixes.txt
index d444757c4d9e..c80e1173bbee 100644
--- a/Documentation/devicetree/bindings/vendor-prefixes.txt
+++ b/Documentation/devicetree/bindings/vendor-prefixes.txt
@@ -135,6 +135,7 @@ mitsubishi	Mitsubishi Electric Corporation
 mosaixtech	Mosaix Technologies, Inc.
 moxa	Moxa
 mpl	MPL AG
+msi	Micro-Star International Co. Ltd.
 mti	Imagination Technologies Ltd. (formerly MIPS Technologies Inc.)
 mundoreader	Mundo Reader S.L.
 murata	Murata Manufacturing Co., Ltd.
-- 
cgit v1.2.3


From 859e42800bcfc4db9cefaa2c24d6e3a203fe961d Mon Sep 17 00:00:00 2001
From: Sascha Hauer <s.hauer@pengutronix.de>
Date: Wed, 24 Jun 2015 08:17:03 +0200
Subject: dt-bindings: soc: Add documentation for the MediaTek SCPSYS unit

This adds documentation for the MediaTek SCPSYS unit found in MT8173 SoCs.

Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Reviewed-by: Daniel Kurtz <djkurtz@chromium.org>
Signed-off-by: Matthias Brugger <matthias.bgg@gmail.com>
---
 .../devicetree/bindings/soc/mediatek/scpsys.txt    | 41 ++++++++++++++++++++++
 1 file changed, 41 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/soc/mediatek/scpsys.txt

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/soc/mediatek/scpsys.txt b/Documentation/devicetree/bindings/soc/mediatek/scpsys.txt
new file mode 100644
index 000000000000..c0511142b39c
--- /dev/null
+++ b/Documentation/devicetree/bindings/soc/mediatek/scpsys.txt
@@ -0,0 +1,41 @@
+MediaTek SCPSYS
+===============
+
+The System Control Processor System (SCPSYS) has several power management
+related tasks in the system. The tasks include thermal measurement, dynamic
+voltage frequency scaling (DVFS), interrupt filter and lowlevel sleep control.
+The System Power Manager (SPM) inside the SCPSYS is for the MTCMOS power
+domain control.
+
+The driver implements the Generic PM domain bindings described in
+power/power_domain.txt. It provides the power domains defined in
+include/dt-bindings/power/mt8173-power.h.
+
+Required properties:
+- compatible: Must be "mediatek,mt8173-scpsys"
+- #power-domain-cells: Must be 1
+- reg: Address range of the SCPSYS unit
+- infracfg: must contain a phandle to the infracfg controller
+- clock, clock-names: clocks according to the common clock binding.
+                      The clocks needed "mm" and "mfg". These are the
+		      clocks which hardware needs to be enabled before
+		      enabling certain power domains.
+
+Example:
+
+	scpsys: scpsys@10006000 {
+		#power-domain-cells = <1>;
+		compatible = "mediatek,mt8173-scpsys";
+		reg = <0 0x10006000 0 0x1000>;
+		infracfg = <&infracfg>;
+		clocks = <&clk26m>,
+			 <&topckgen CLK_TOP_MM_SEL>;
+		clock-names = "mfg", "mm";
+	};
+
+Example consumer:
+
+	afe: mt8173-afe-pcm@11220000 {
+		compatible = "mediatek,mt8173-afe-pcm";
+		power-domains = <&scpsys MT8173_POWER_DOMAIN_AUDIO>;
+	};
-- 
cgit v1.2.3


From 1fdc5ae80868637926e76a32e8c62cdd093b48f4 Mon Sep 17 00:00:00 2001
From: Marek Belisko <marek@goldelico.com>
Date: Sat, 20 Jun 2015 22:47:02 +0200
Subject: Documentation: vendor-prefixes: Add option prefix

Add option to vendor-prefixes file which will be used for Option NV company.

Signed-off-by: Marek Belisko <marek@goldelico.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
---
 Documentation/devicetree/bindings/vendor-prefixes.txt | 1 +
 1 file changed, 1 insertion(+)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/vendor-prefixes.txt b/Documentation/devicetree/bindings/vendor-prefixes.txt
index d444757c4d9e..d247994a43f2 100644
--- a/Documentation/devicetree/bindings/vendor-prefixes.txt
+++ b/Documentation/devicetree/bindings/vendor-prefixes.txt
@@ -150,6 +150,7 @@ nvidia	NVIDIA
 nxp	NXP Semiconductors
 onnn	ON Semiconductor Corp.
 opencores	OpenCores.org
+option	Option NV
 ortustech	Ortus Technology Co., Ltd.
 ovti	OmniVision Technologies
 panasonic	Panasonic Corporation
-- 
cgit v1.2.3


From e76ea35ae723013392a0f61fcc16026506c0aa7f Mon Sep 17 00:00:00 2001
From: Heiko Stuebner <heiko@sntech.de>
Date: Sun, 5 Jul 2015 11:00:17 +0200
Subject: dt-bindings: add documentation of rk3668 clock controller

Add the devicetree binding for the cru on the rk3368 which quite similar
structured as previous clock controllers.

Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Pawel Moll <pawel.moll@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Ian Campbell <ijc+devicetree@hellion.org.uk>
Cc: Kumar Gala <galak@codeaurora.org>
Cc: devicetree@vger.kernel.org
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
---
 .../bindings/clock/rockchip,rk3368-cru.txt         | 61 ++++++++++++++++++++++
 1 file changed, 61 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/clock/rockchip,rk3368-cru.txt

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/clock/rockchip,rk3368-cru.txt b/Documentation/devicetree/bindings/clock/rockchip,rk3368-cru.txt
new file mode 100644
index 000000000000..7c8bbcfed8d2
--- /dev/null
+++ b/Documentation/devicetree/bindings/clock/rockchip,rk3368-cru.txt
@@ -0,0 +1,61 @@
+* Rockchip RK3368 Clock and Reset Unit
+
+The RK3368 clock controller generates and supplies clock to various
+controllers within the SoC and also implements a reset controller for SoC
+peripherals.
+
+Required Properties:
+
+- compatible: should be "rockchip,rk3368-cru"
+- reg: physical base address of the controller and length of memory mapped
+  region.
+- #clock-cells: should be 1.
+- #reset-cells: should be 1.
+
+Optional Properties:
+
+- rockchip,grf: phandle to the syscon managing the "general register files"
+  If missing, pll rates are not changeable, due to the missing pll lock status.
+
+Each clock is assigned an identifier and client nodes can use this identifier
+to specify the clock which they consume. All available clocks are defined as
+preprocessor macros in the dt-bindings/clock/rk3368-cru.h headers and can be
+used in device tree sources. Similar macros exist for the reset sources in
+these files.
+
+External clocks:
+
+There are several clocks that are generated outside the SoC. It is expected
+that they are defined using standard clock bindings with following
+clock-output-names:
+ - "xin24m" - crystal input - required,
+ - "xin32k" - rtc clock - optional,
+ - "ext_i2s" - external I2S clock - optional,
+ - "ext_gmac" - external GMAC clock - optional
+ - "ext_hsadc" - external HSADC clock - optional,
+ - "ext_isp" - external ISP clock - optional,
+ - "ext_jtag" - external JTAG clock - optional
+ - "ext_vip" - external VIP clock - optional,
+ - "usbotg_out" - output clock of the pll in the otg phy
+
+Example: Clock controller node:
+
+	cru: clock-controller@ff760000 {
+		compatible = "rockchip,rk3368-cru";
+		reg = <0x0 0xff760000 0x0 0x1000>;
+		rockchip,grf = <&grf>;
+		#clock-cells = <1>;
+		#reset-cells = <1>;
+	};
+
+Example: UART controller node that consumes the clock generated by the clock
+  controller:
+
+	uart0: serial@10124000 {
+		compatible = "snps,dw-apb-uart";
+		reg = <0x10124000 0x400>;
+		interrupts = <GIC_SPI 34 IRQ_TYPE_LEVEL_HIGH>;
+		reg-shift = <2>;
+		reg-io-width = <1>;
+		clocks = <&cru SCLK_UART0>;
+	};
-- 
cgit v1.2.3


From 80eeb1f0f757c790b020d9f425bb0e824973d49c Mon Sep 17 00:00:00 2001
From: Sergej Sawazki <ce3a@gmx.de>
Date: Sun, 28 Jun 2015 16:24:55 +0200
Subject: clk: add gpio controlled clock multiplexer

Add a common clock driver for basic gpio controlled clock multiplexers.
This driver can be used for devices like 5V41068A or 831721I from IDT
or for discrete multiplexer circuits. The 'select' pin selects one of
two parent clocks.

Cc: Jyri Sarha <jsarha@ti.com>
Signed-off-by: Sergej Sawazki <ce3a@gmx.de>
[sboyd@codeaurora.org: Fix error paths to free memory and do it
in the correct order]
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
---
 .../devicetree/bindings/clock/gpio-mux-clock.txt   |  19 ++
 drivers/clk/clk-gpio-gate.c                        | 242 +++++++++++++++------
 include/linux/clk-provider.h                       |  17 ++
 3 files changed, 214 insertions(+), 64 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/clock/gpio-mux-clock.txt

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/clock/gpio-mux-clock.txt b/Documentation/devicetree/bindings/clock/gpio-mux-clock.txt
new file mode 100644
index 000000000000..2be1e038ca62
--- /dev/null
+++ b/Documentation/devicetree/bindings/clock/gpio-mux-clock.txt
@@ -0,0 +1,19 @@
+Binding for simple gpio clock multiplexer.
+
+This binding uses the common clock binding[1].
+
+[1] Documentation/devicetree/bindings/clock/clock-bindings.txt
+
+Required properties:
+- compatible : shall be "gpio-mux-clock".
+- clocks: list of two references to parent clocks.
+- #clock-cells : from common clock binding; shall be set to 0.
+- select-gpios : GPIO reference for selecting the parent clock.
+
+Example:
+	clock {
+		compatible = "gpio-mux-clock";
+		clocks = <&parentclk1>, <&parentclk2>;
+		#clock-cells = <0>;
+		select-gpios = <&gpio 1 GPIO_ACTIVE_HIGH>;
+	};
diff --git a/drivers/clk/clk-gpio-gate.c b/drivers/clk/clk-gpio-gate.c
index ef942daa955a..c0d202c24a97 100644
--- a/drivers/clk/clk-gpio-gate.c
+++ b/drivers/clk/clk-gpio-gate.c
@@ -1,12 +1,15 @@
 /*
  * Copyright (C) 2013 - 2014 Texas Instruments Incorporated - http://www.ti.com
- * Author: Jyri Sarha <jsarha@ti.com>
+ *
+ * Authors:
+ *    Jyri Sarha <jsarha@ti.com>
+ *    Sergej Sawazki <ce3a@gmx.de>
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License version 2 as
  * published by the Free Software Foundation.
  *
- * Gpio gated clock implementation
+ * Gpio controlled clock implementation
  */
 
 #include <linux/clk-provider.h>
@@ -61,24 +64,55 @@ const struct clk_ops clk_gpio_gate_ops = {
 EXPORT_SYMBOL_GPL(clk_gpio_gate_ops);
 
 /**
- * clk_register_gpio - register a gpip clock with the clock framework
- * @dev: device that is registering this clock
- * @name: name of this clock
- * @parent_name: name of this clock's parent
- * @gpio: gpio number to gate this clock
- * @active_low: true if gpio should be set to 0 to enable clock
- * @flags: clock flags
+ * DOC: basic clock multiplexer which can be controlled with a gpio output
+ * Traits of this clock:
+ * prepare - clk_prepare only ensures that parents are prepared
+ * rate - rate is only affected by parent switching.  No clk_set_rate support
+ * parent - parent is adjustable through clk_set_parent
  */
-struct clk *clk_register_gpio_gate(struct device *dev, const char *name,
-		const char *parent_name, unsigned gpio, bool active_low,
-		unsigned long flags)
+
+static u8 clk_gpio_mux_get_parent(struct clk_hw *hw)
 {
-	struct clk_gpio *clk_gpio = NULL;
-	struct clk *clk = ERR_PTR(-EINVAL);
-	struct clk_init_data init = { NULL };
+	struct clk_gpio *clk = to_clk_gpio(hw);
+
+	return gpiod_get_value(clk->gpiod);
+}
+
+static int clk_gpio_mux_set_parent(struct clk_hw *hw, u8 index)
+{
+	struct clk_gpio *clk = to_clk_gpio(hw);
+
+	gpiod_set_value(clk->gpiod, index);
+
+	return 0;
+}
+
+const struct clk_ops clk_gpio_mux_ops = {
+	.get_parent = clk_gpio_mux_get_parent,
+	.set_parent = clk_gpio_mux_set_parent,
+	.determine_rate = __clk_mux_determine_rate,
+};
+EXPORT_SYMBOL_GPL(clk_gpio_mux_ops);
+
+static struct clk *clk_register_gpio(struct device *dev, const char *name,
+		const char **parent_names, u8 num_parents, unsigned gpio,
+		bool active_low, unsigned long flags,
+		const struct clk_ops *clk_gpio_ops)
+{
+	struct clk_gpio *clk_gpio;
+	struct clk *clk;
+	struct clk_init_data init = {};
 	unsigned long gpio_flags;
 	int err;
 
+	if (dev)
+		clk_gpio = devm_kzalloc(dev, sizeof(*clk_gpio),	GFP_KERNEL);
+	else
+		clk_gpio = kzalloc(sizeof(*clk_gpio), GFP_KERNEL);
+
+	if (!clk_gpio)
+		return ERR_PTR(-ENOMEM);
+
 	if (active_low)
 		gpio_flags = GPIOF_ACTIVE_LOW | GPIOF_OUT_INIT_HIGH;
 	else
@@ -88,70 +122,108 @@ struct clk *clk_register_gpio_gate(struct device *dev, const char *name,
 		err = devm_gpio_request_one(dev, gpio, gpio_flags, name);
 	else
 		err = gpio_request_one(gpio, gpio_flags, name);
-
 	if (err) {
 		if (err != -EPROBE_DEFER)
 			pr_err("%s: %s: Error requesting clock control gpio %u\n",
 					__func__, name, gpio);
-		return ERR_PTR(err);
-	}
-
-	if (dev)
-		clk_gpio = devm_kzalloc(dev, sizeof(struct clk_gpio),
-					GFP_KERNEL);
-	else
-		clk_gpio = kzalloc(sizeof(struct clk_gpio), GFP_KERNEL);
+		if (!dev)
+			kfree(clk_gpio);
 
-	if (!clk_gpio) {
-		clk = ERR_PTR(-ENOMEM);
-		goto clk_register_gpio_gate_err;
+		return ERR_PTR(err);
 	}
 
 	init.name = name;
-	init.ops = &clk_gpio_gate_ops;
+	init.ops = clk_gpio_ops;
 	init.flags = flags | CLK_IS_BASIC;
-	init.parent_names = (parent_name ? &parent_name : NULL);
-	init.num_parents = (parent_name ? 1 : 0);
+	init.parent_names = parent_names;
+	init.num_parents = num_parents;
 
 	clk_gpio->gpiod = gpio_to_desc(gpio);
 	clk_gpio->hw.init = &init;
 
-	clk = clk_register(dev, &clk_gpio->hw);
+	if (dev)
+		clk = devm_clk_register(dev, &clk_gpio->hw);
+	else
+		clk = clk_register(NULL, &clk_gpio->hw);
 
 	if (!IS_ERR(clk))
 		return clk;
 
-	if (!dev)
+	if (!dev) {
+		gpiod_put(clk_gpio->gpiod);
 		kfree(clk_gpio);
-
-clk_register_gpio_gate_err:
-	if (!dev)
-		gpio_free(gpio);
+	}
 
 	return clk;
 }
+
+/**
+ * clk_register_gpio_gate - register a gpio clock gate with the clock framework
+ * @dev: device that is registering this clock
+ * @name: name of this clock
+ * @parent_name: name of this clock's parent
+ * @gpio: gpio number to gate this clock
+ * @active_low: true if gpio should be set to 0 to enable clock
+ * @flags: clock flags
+ */
+struct clk *clk_register_gpio_gate(struct device *dev, const char *name,
+		const char *parent_name, unsigned gpio, bool active_low,
+		unsigned long flags)
+{
+	return clk_register_gpio(dev, name,
+			(parent_name ? &parent_name : NULL),
+			(parent_name ? 1 : 0), gpio, active_low, flags,
+			&clk_gpio_gate_ops);
+}
 EXPORT_SYMBOL_GPL(clk_register_gpio_gate);
 
+/**
+ * clk_register_gpio_mux - register a gpio clock mux with the clock framework
+ * @dev: device that is registering this clock
+ * @name: name of this clock
+ * @parent_names: names of this clock's parents
+ * @num_parents: number of parents listed in @parent_names
+ * @gpio: gpio number to gate this clock
+ * @active_low: true if gpio should be set to 0 to enable clock
+ * @flags: clock flags
+ */
+struct clk *clk_register_gpio_mux(struct device *dev, const char *name,
+		const char **parent_names, u8 num_parents, unsigned gpio,
+		bool active_low, unsigned long flags)
+{
+	if (num_parents != 2) {
+		pr_err("mux-clock %s must have 2 parents\n", name);
+		return ERR_PTR(-EINVAL);
+	}
+
+	return clk_register_gpio(dev, name, parent_names, num_parents,
+			gpio, active_low, flags, &clk_gpio_mux_ops);
+}
+EXPORT_SYMBOL_GPL(clk_register_gpio_mux);
+
 #ifdef CONFIG_OF
 /**
- * The clk_register_gpio_gate has to be delayed, because the EPROBE_DEFER
+ * clk_register_get() has to be delayed, because -EPROBE_DEFER
  * can not be handled properly at of_clk_init() call time.
  */
 
-struct clk_gpio_gate_delayed_register_data {
+struct clk_gpio_delayed_register_data {
+	const char *gpio_name;
 	struct device_node *node;
 	struct mutex lock;
 	struct clk *clk;
+	struct clk *(*clk_register_get)(const char *name,
+			const char **parent_names, u8 num_parents,
+			unsigned gpio, bool active_low);
 };
 
-static struct clk *of_clk_gpio_gate_delayed_register_get(
-		struct of_phandle_args *clkspec,
-		void *_data)
+static struct clk *of_clk_gpio_delayed_register_get(
+		struct of_phandle_args *clkspec, void *_data)
 {
-	struct clk_gpio_gate_delayed_register_data *data = _data;
+	struct clk_gpio_delayed_register_data *data = _data;
 	struct clk *clk;
-	const char *clk_name = data->node->name;
-	const char *parent_name;
+	const char **parent_names;
+	int i, num_parents;
 	int gpio;
 	enum of_gpio_flags of_flags;
 
@@ -162,47 +234,89 @@ static struct clk *of_clk_gpio_gate_delayed_register_get(
 		return data->clk;
 	}
 
-	gpio = of_get_named_gpio_flags(data->node, "enable-gpios", 0,
-						&of_flags);
+	gpio = of_get_named_gpio_flags(data->node, data->gpio_name, 0,
+			&of_flags);
 	if (gpio < 0) {
 		mutex_unlock(&data->lock);
-		if (gpio != -EPROBE_DEFER)
-			pr_err("%s: %s: Can't get 'enable-gpios' DT property\n",
-			       __func__, clk_name);
+		if (gpio == -EPROBE_DEFER)
+			pr_debug("%s: %s: GPIOs not yet available, retry later\n",
+					data->node->name, __func__);
+		else
+			pr_err("%s: %s: Can't get '%s' DT property\n",
+					data->node->name, __func__,
+					data->gpio_name);
 		return ERR_PTR(gpio);
 	}
 
-	parent_name = of_clk_get_parent_name(data->node, 0);
+	num_parents = of_clk_get_parent_count(data->node);
 
-	clk = clk_register_gpio_gate(NULL, clk_name, parent_name, gpio,
-					of_flags & OF_GPIO_ACTIVE_LOW, 0);
-	if (IS_ERR(clk)) {
-		mutex_unlock(&data->lock);
-		return clk;
-	}
+	parent_names = kcalloc(num_parents, sizeof(char *), GFP_KERNEL);
+	if (!parent_names)
+		return ERR_PTR(-ENOMEM);
+
+	for (i = 0; i < num_parents; i++)
+		parent_names[i] = of_clk_get_parent_name(data->node, i);
+
+	clk = data->clk_register_get(data->node->name, parent_names,
+			num_parents, gpio, of_flags & OF_GPIO_ACTIVE_LOW);
+	if (IS_ERR(clk))
+		goto out;
 
 	data->clk = clk;
+out:
 	mutex_unlock(&data->lock);
+	kfree(parent_names);
 
 	return clk;
 }
 
-/**
- * of_gpio_gate_clk_setup() - Setup function for gpio controlled clock
- */
-static void __init of_gpio_gate_clk_setup(struct device_node *node)
+static struct clk *of_clk_gpio_gate_delayed_register_get(const char *name,
+		const char **parent_names, u8 num_parents,
+		unsigned gpio, bool active_low)
+{
+	return clk_register_gpio_gate(NULL, name, parent_names[0],
+			gpio, active_low, 0);
+}
+
+static struct clk *of_clk_gpio_mux_delayed_register_get(const char *name,
+		const char **parent_names, u8 num_parents, unsigned gpio,
+		bool active_low)
+{
+	return clk_register_gpio_mux(NULL, name, parent_names, num_parents,
+			gpio, active_low, 0);
+}
+
+static void __init of_gpio_clk_setup(struct device_node *node,
+		const char *gpio_name,
+		struct clk *(*clk_register_get)(const char *name,
+				const char **parent_names, u8 num_parents,
+				unsigned gpio, bool active_low))
 {
-	struct clk_gpio_gate_delayed_register_data *data;
+	struct clk_gpio_delayed_register_data *data;
 
-	data = kzalloc(sizeof(struct clk_gpio_gate_delayed_register_data),
-		       GFP_KERNEL);
+	data = kzalloc(sizeof(*data), GFP_KERNEL);
 	if (!data)
 		return;
 
 	data->node = node;
+	data->gpio_name = gpio_name;
+	data->clk_register_get = clk_register_get;
 	mutex_init(&data->lock);
 
-	of_clk_add_provider(node, of_clk_gpio_gate_delayed_register_get, data);
+	of_clk_add_provider(node, of_clk_gpio_delayed_register_get, data);
+}
+
+static void __init of_gpio_gate_clk_setup(struct device_node *node)
+{
+	of_gpio_clk_setup(node, "enable-gpios",
+		of_clk_gpio_gate_delayed_register_get);
 }
 CLK_OF_DECLARE(gpio_gate_clk, "gpio-gate-clock", of_gpio_gate_clk_setup);
+
+void __init of_gpio_mux_clk_setup(struct device_node *node)
+{
+	of_gpio_clk_setup(node, "select-gpios",
+		of_clk_gpio_mux_delayed_register_get);
+}
+CLK_OF_DECLARE(gpio_mux_clk, "gpio-mux-clock", of_gpio_mux_clk_setup);
 #endif
diff --git a/include/linux/clk-provider.h b/include/linux/clk-provider.h
index 78842f46f152..823d7f70878e 100644
--- a/include/linux/clk-provider.h
+++ b/include/linux/clk-provider.h
@@ -549,6 +549,23 @@ struct clk *clk_register_gpio_gate(struct device *dev, const char *name,
 
 void of_gpio_clk_gate_setup(struct device_node *node);
 
+/**
+ * struct clk_gpio_mux - gpio controlled clock multiplexer
+ *
+ * @hw:		see struct clk_gpio
+ * @gpiod:	gpio descriptor to select the parent of this clock multiplexer
+ *
+ * Clock with a gpio control for selecting the parent clock.
+ * Implements .get_parent, .set_parent and .determine_rate
+ */
+
+extern const struct clk_ops clk_gpio_mux_ops;
+struct clk *clk_register_gpio_mux(struct device *dev, const char *name,
+		const char **parent_names, u8 num_parents, unsigned gpio,
+		bool active_low, unsigned long flags);
+
+void of_gpio_mux_clk_setup(struct device_node *node);
+
 /**
  * clk_register - allocate a new clock, register it and return an opaque cookie
  * @dev: device that is registering this clock
-- 
cgit v1.2.3


From 93e3a9e9993ef018522b7005d97d8124b7e73385 Mon Sep 17 00:00:00 2001
From: Sifan Naeem <sifan.naeem@imgtec.com>
Date: Thu, 18 Jun 2015 13:27:12 +0100
Subject: spi: img-spfi: check for max speed supported by the spfi block

Maximum speed supported by spfi is limited to 1/4 of the spfi clock.

But in some SoCs the maximum speed supported by the spfi block can
be limited to less than 1/4 of the spfi clock. In such cases we have
to define the limit in the device tree so that the driver can pick
it up.

Signed-off-by: Sifan Naeem <sifan.naeem@imgtec.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
---
 Documentation/devicetree/bindings/spi/spi-img-spfi.txt |  1 +
 drivers/spi/spi-img-spfi.c                             | 14 ++++++++++++++
 2 files changed, 15 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/spi/spi-img-spfi.txt b/Documentation/devicetree/bindings/spi/spi-img-spfi.txt
index e02fbf18c82c..494db6012d02 100644
--- a/Documentation/devicetree/bindings/spi/spi-img-spfi.txt
+++ b/Documentation/devicetree/bindings/spi/spi-img-spfi.txt
@@ -21,6 +21,7 @@ Required properties:
 Optional properties:
 - img,supports-quad-mode: Should be set if the interface supports quad mode
   SPI transfers.
+- spfi-max-frequency: Maximum speed supported by the spfi block.
 
 Example:
 
diff --git a/drivers/spi/spi-img-spfi.c b/drivers/spi/spi-img-spfi.c
index 788e2b176a4f..f31d792a4662 100644
--- a/drivers/spi/spi-img-spfi.c
+++ b/drivers/spi/spi-img-spfi.c
@@ -546,6 +546,7 @@ static int img_spfi_probe(struct platform_device *pdev)
 	struct img_spfi *spfi;
 	struct resource *res;
 	int ret;
+	u32 max_speed_hz;
 
 	master = spi_alloc_master(&pdev->dev, sizeof(*spfi));
 	if (!master)
@@ -610,6 +611,19 @@ static int img_spfi_probe(struct platform_device *pdev)
 	master->max_speed_hz = clk_get_rate(spfi->spfi_clk) / 4;
 	master->min_speed_hz = clk_get_rate(spfi->spfi_clk) / 512;
 
+	/*
+	 * Maximum speed supported by spfi is limited to the lower value
+	 * between 1/4 of the SPFI clock or to "spfi-max-frequency"
+	 * defined in the device tree.
+	 * If no value is defined in the device tree assume the maximum
+	 * speed supported to be 1/4 of the SPFI clock.
+	 */
+	if (!of_property_read_u32(spfi->dev->of_node, "spfi-max-frequency",
+				  &max_speed_hz)) {
+		if (master->max_speed_hz > max_speed_hz)
+			master->max_speed_hz = max_speed_hz;
+	}
+
 	master->setup = img_spfi_setup;
 	master->cleanup = img_spfi_cleanup;
 	master->transfer_one = img_spfi_transfer_one;
-- 
cgit v1.2.3


From 3692db3a904800445f8af249e1ac0f2b6496f31d Mon Sep 17 00:00:00 2001
From: Laxman Dewangan <ldewangan@nvidia.com>
Date: Wed, 1 Jul 2015 18:31:43 +0530
Subject: regulator: max8973: add support to configure ETR from DT

The MAX8973/MAX77621 feature an Enhanced Transient Response(ETR)
circuit that is enabled through software. The enhanced transient
response reduces the voltage droop during large load steps by
temporarily allowing all three phases to fire in unison, slewing
total inductor current faster than would normally be possible if
all three phases continued to operate 120deg out of phase. The
enhanced transient response detector features two selectable
sensitivity settings, which select the output voltage slew rate
during load transients that triggers the ETR circuit. The sensitivity
of the ETR detector is set by the CKADV[1:0] bits in the CONTROL2
register.

Add support to configure the ETR through platform data from DT.
Update the DT binding document accordingly.

Signed-off-by: Laxman Dewangan <ldewangan@nvidia.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
---
 .../bindings/regulator/max8973-regulator.txt          |  6 ++++++
 drivers/regulator/max8973-regulator.c                 | 19 +++++++++++++++++++
 2 files changed, 25 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/regulator/max8973-regulator.txt b/Documentation/devicetree/bindings/regulator/max8973-regulator.txt
index 55efb24e5683..f80ea2fe27e6 100644
--- a/Documentation/devicetree/bindings/regulator/max8973-regulator.txt
+++ b/Documentation/devicetree/bindings/regulator/max8973-regulator.txt
@@ -25,6 +25,12 @@ Optional properties:
 -maxim,enable-frequency-shift: boolean, enable 9% frequency shift.
 -maxim,enable-bias-control: boolean, enable bias control. By enabling this
 		startup delay can be reduce to 20us from 220us.
+-maxim,enable-etr: boolean, enable Enhanced Transient Response.
+-maxim,enable-high-etr-sensitivity: boolean, Enhanced transient response
+		circuit is enabled and set for high sensitivity. If this
+		property is available then etr will be enable default.
+
+Enhanced transient response (ETR) will affect the configuration of CKADV.
 
 Example:
 
diff --git a/drivers/regulator/max8973-regulator.c b/drivers/regulator/max8973-regulator.c
index e94ddcf97722..a8fd7f4dd8d8 100644
--- a/drivers/regulator/max8973-regulator.c
+++ b/drivers/regulator/max8973-regulator.c
@@ -421,6 +421,8 @@ static struct max8973_regulator_platform_data *max8973_parse_dt(
 	struct device_node *np = dev->of_node;
 	int ret;
 	u32 pval;
+	bool etr_enable;
+	bool etr_sensitivity_high;
 
 	pdata = devm_kzalloc(dev, sizeof(*pdata), GFP_KERNEL);
 	if (!pdata)
@@ -452,6 +454,23 @@ static struct max8973_regulator_platform_data *max8973_parse_dt(
 	if (of_property_read_bool(np, "maxim,enable-bias-control"))
 		pdata->control_flags  |= MAX8973_CONTROL_BIAS_ENABLE;
 
+	etr_enable = of_property_read_bool(np, "maxim,enable-etr");
+	etr_sensitivity_high = of_property_read_bool(np,
+				"maxim,enable-high-etr-sensitivity");
+	if (etr_sensitivity_high)
+		etr_enable = true;
+
+	if (etr_enable) {
+		if (etr_sensitivity_high)
+			pdata->control_flags |=
+				MAX8973_CONTROL_CLKADV_TRIP_75mV_PER_US;
+		else
+			pdata->control_flags |=
+				MAX8973_CONTROL_CLKADV_TRIP_150mV_PER_US;
+	} else {
+		pdata->control_flags |= MAX8973_CONTROL_CLKADV_TRIP_DISABLED;
+	}
+
 	return pdata;
 }
 
-- 
cgit v1.2.3


From 02258b8bcbb98b28064cc829f7062455da398633 Mon Sep 17 00:00:00 2001
From: Lee Jones <lee.jones@linaro.org>
Date: Tue, 7 Jul 2015 16:06:50 +0100
Subject: regulator: pwm-regulator: Re-write bindings

* Add support for continuous-voltage mode
* Put more meat on the bones with regards to voltage-table mode
* Sort out formatting for ease of consumption

Cc: devicetree@vger.kernel.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
---
 .../bindings/regulator/pwm-regulator.txt           | 68 ++++++++++++++++++----
 1 file changed, 56 insertions(+), 12 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/regulator/pwm-regulator.txt b/Documentation/devicetree/bindings/regulator/pwm-regulator.txt
index ce91f61feb12..892b36655b3d 100644
--- a/Documentation/devicetree/bindings/regulator/pwm-regulator.txt
+++ b/Documentation/devicetree/bindings/regulator/pwm-regulator.txt
@@ -1,27 +1,71 @@
-pwm regulator bindings
+Bindings for the Generic PWM Regulator
+======================================
+
+Currently supports 2 modes of operation:
+
+voltage-table:		When in this mode, a voltage table (See below) of
+			predefined voltage <=> duty-cycle values must be
+			provided via DT. Limitations are that the regulator can
+			only operate at the voltages supplied in the table.
+			Intermediary duty-cycle values which would normally
+			allow finer grained voltage selection are ignored and
+			rendered useless.  Although more control is given to
+			the user if the assumptions made in continuous-voltage
+			mode do not reign true.
+
+continuous-voltage:	This mode uses the regulator's maximum and minimum
+			supplied voltages specified in the
+			regulator-{min,max}-microvolt properties to calculate
+			appropriate duty-cycle values.  This allows for a much
+			more fine grained solution when compared with
+			voltage-table mode above.  This solution does make an
+			assumption that a %50 duty-cycle value will cause the
+			regulator voltage to run at half way between the
+			supplied max_uV and min_uV values.
 
 Required properties:
-- compatible: Should be "pwm-regulator"
-- pwms: OF device-tree PWM specification (see PWM binding pwm.txt)
-- voltage-table: voltage and duty table, include 2 members in each set of
-  brackets, first one is voltage(unit: uv), the next is duty(unit: percent)
+--------------------
+- compatible:		Should be "pwm-regulator"
+
+- pwms:			PWM specification (See: ../pwm/pwm.txt)
+
+One of these must be provided:
+- voltage-table: 	Voltage and Duty-Cycle table consisting of 2 cells
+			    First cell is voltage in microvolts (uV)
+			    Second cell is duty-cycle in percent (%)
+
+- max-duty-cycle:	Maximum Duty-Cycle value -- this will normally be
+  			255 (0xff) for an 8 bit PWM device
 
-Any property defined as part of the core regulator binding defined in
-regulator.txt can also be used.
+If both are provided, the current default is voltage-table mode.
 
-Example:
+Any property defined as part of the core regulator binding can also be used.
+(See: ../regulator/regulator.txt)
+
+Continuous Voltage Example:
 	pwm_regulator {
 		compatible = "pwm-regulator;
 		pwms = <&pwm1 0 8448 0>;
+		regulator-min-microvolt = <1016000>;
+		regulator-max-microvolt = <1114000>;
+		regulator-name = "vdd_logic";
+
+		max-duty-cycle = <255>; /* 8bit PWM */
+	};
 
+Voltage Table Example:
+	pwm_regulator {
+		compatible = "pwm-regulator;
+		pwms = <&pwm1 0 8448 0>;
+		regulator-min-microvolt = <1016000>;
+		regulator-max-microvolt = <1114000>;
+		regulator-name = "vdd_logic";
+
+			      /* Voltage Duty-Cycle */
 		voltage-table = <1114000 0>,
 				<1095000 10>,
 				<1076000 20>,
 				<1056000 30>,
 				<1036000 40>,
 				<1016000 50>;
-
-		regulator-min-microvolt = <1016000>;
-		regulator-max-microvolt = <1114000>;
-		regulator-name = "vdd_logic";
 	};
-- 
cgit v1.2.3


From e54e4b1b622e534381b422b8a80a21453dd6f644 Mon Sep 17 00:00:00 2001
From: Roger Shimizu <rogershimizu@gmail.com>
Date: Tue, 23 Jun 2015 00:00:16 +0900
Subject: ARM: dts: add buffalo linkstation ls-wxl/wsxl

Add dts file to support Buffalo Linkstation LS-WXL and LS-WSXL,
which are 2-bay NAS with 3.5" and 2.5" HDD respectively.

Signed-off-by: Roger Shimizu <rogershimizu@gmail.com>
Reviewed-by: Nobuhiro Iwamatsu <iwamatsu@nigauri.org>
Acked-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
---
 .../devicetree/bindings/arm/marvell,kirkwood.txt   |   1 +
 arch/arm/boot/dts/Makefile                         |   1 +
 arch/arm/boot/dts/kirkwood-lswxl.dts               | 301 +++++++++++++++++++++
 3 files changed, 303 insertions(+)
 create mode 100644 arch/arm/boot/dts/kirkwood-lswxl.dts

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/arm/marvell,kirkwood.txt b/Documentation/devicetree/bindings/arm/marvell,kirkwood.txt
index 4f40ff3fee4b..b79212f60ac4 100644
--- a/Documentation/devicetree/bindings/arm/marvell,kirkwood.txt
+++ b/Documentation/devicetree/bindings/arm/marvell,kirkwood.txt
@@ -20,6 +20,7 @@ And in addition, the compatible shall be extended with the specific
 board. Currently known boards are:
 
 "buffalo,lschlv2"
+"buffalo,lswxl"
 "buffalo,lsxhl"
 "buffalo,lsxl"
 "dlink,dns-320"
diff --git a/arch/arm/boot/dts/Makefile b/arch/arm/boot/dts/Makefile
index 246473a244f6..9afd1c955c43 100644
--- a/arch/arm/boot/dts/Makefile
+++ b/arch/arm/boot/dts/Makefile
@@ -176,6 +176,7 @@ dtb-$(CONFIG_MACH_KIRKWOOD) += \
 	kirkwood-km_kirkwood.dtb \
 	kirkwood-laplug.dtb \
 	kirkwood-lschlv2.dtb \
+	kirkwood-lswxl.dtb \
 	kirkwood-lsxhl.dtb \
 	kirkwood-mplcec4.dtb \
 	kirkwood-mv88f6281gtw-ge.dtb \
diff --git a/arch/arm/boot/dts/kirkwood-lswxl.dts b/arch/arm/boot/dts/kirkwood-lswxl.dts
new file mode 100644
index 000000000000..f5db16a08597
--- /dev/null
+++ b/arch/arm/boot/dts/kirkwood-lswxl.dts
@@ -0,0 +1,301 @@
+/*
+ * Device Tree file for Buffalo Linkstation LS-WXL/WSXL
+ *
+ * Copyright (C) 2015, rogershimizu@gmail.com
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+/dts-v1/;
+
+#include "kirkwood.dtsi"
+#include "kirkwood-6281.dtsi"
+
+/ {
+	model = "Buffalo Linkstation LS-WXL/WSXL";
+	compatible = "buffalo,lswxl", "marvell,kirkwood-88f6281", "marvell,kirkwood";
+
+	memory { /* 128 MB */
+		device_type = "memory";
+		reg = <0x00000000 0x8000000>;
+	};
+
+	chosen {
+		bootargs = "console=ttyS0,115200n8 earlyprintk";
+		stdout-path = &uart0;
+	};
+
+	mbus {
+		pcie-controller {
+			status = "okay";
+			pcie@1,0 {
+				status = "okay";
+			};
+		};
+	};
+
+	ocp@f1000000 {
+		pinctrl: pin-controller@10000 {
+			pmx_power_hdd0: pmx-power-hdd0 {
+				marvell,pins = "mpp28";
+				marvell,function = "gpio";
+			};
+			pmx_power_hdd1: pmx-power-hdd1 {
+				marvell,pins = "mpp29";
+				marvell,function = "gpio";
+			};
+			pmx_usb_vbus: pmx-usb-vbus {
+				marvell,pins = "mpp37";
+				marvell,function = "gpio";
+			};
+			pmx_fan_high: pmx-fan-high {
+				marvell,pins = "mpp47";
+				marvell,function = "gpio";
+			};
+			pmx_fan_low: pmx-fan-low {
+				marvell,pins = "mpp48";
+				marvell,function = "gpio";
+			};
+			pmx_led_hdderr0: pmx-led-hdderr0 {
+				marvell,pins = "mpp8";
+				marvell,function = "gpio";
+			};
+			pmx_led_hdderr1: pmx-led-hdderr1 {
+				marvell,pins = "mpp46";
+				marvell,function = "gpio";
+			};
+			pmx_led_alarm: pmx-led-alarm {
+				marvell,pins = "mpp49";
+				marvell,function = "gpio";
+			};
+			pmx_led_function_red: pmx-led-function-red {
+				marvell,pins = "mpp34";
+				marvell,function = "gpio";
+			};
+			pmx_led_function_blue: pmx-led-function-blue {
+				marvell,pins = "mpp36";
+				marvell,function = "gpio";
+			};
+			pmx_led_info: pmx-led-info {
+				marvell,pins = "mpp38";
+				marvell,function = "gpio";
+			};
+			pmx_led_power: pmx-led-power {
+				marvell,pins = "mpp39";
+				marvell,function = "gpio";
+			};
+			pmx_fan_lock: pmx-fan-lock {
+				marvell,pins = "mpp40";
+				marvell,function = "gpio";
+			};
+			pmx_button_function: pmx-button-function {
+				marvell,pins = "mpp41";
+				marvell,function = "gpio";
+			};
+			pmx_power_switch: pmx-power-switch {
+				marvell,pins = "mpp42";
+				marvell,function = "gpio";
+			};
+			pmx_power_auto_switch: pmx-power-auto-switch {
+				marvell,pins = "mpp43";
+				marvell,function = "gpio";
+			};
+		};
+
+		serial@12000 {
+			status = "okay";
+		};
+
+		sata@80000 {
+			status = "okay";
+			nr-ports = <2>;
+		};
+
+		spi@10600 {
+			status = "okay";
+
+			m25p40@0 {
+				#address-cells = <1>;
+				#size-cells = <1>;
+				compatible = "st,m25p40", "jedec,spi-nor";
+				reg = <0>;
+				spi-max-frequency = <25000000>;
+				mode = <0>;
+
+				partition@0 {
+					reg = <0x0 0x60000>;
+					label = "uboot";
+					read-only;
+				};
+
+				partition@60000 {
+					reg = <0x60000 0x10000>;
+					label = "dtb";
+					read-only;
+				};
+
+				partition@70000 {
+					reg = <0x70000 0x10000>;
+					label = "uboot_env";
+				};
+			};
+		};
+	};
+
+	gpio_keys {
+		compatible = "gpio-keys";
+		#address-cells = <1>;
+		#size-cells = <0>;
+		pinctrl-0 = <&pmx_button_function &pmx_power_switch
+			     &pmx_power_auto_switch>;
+		pinctrl-names = "default";
+
+		button@1 {
+			label = "Function Button";
+			linux,code = <KEY_OPTION>;
+			gpios = <&gpio1 41 GPIO_ACTIVE_LOW>;
+		};
+
+		button@2 {
+			label = "Power-on Switch";
+			linux,code = <KEY_RESERVED>;
+			linux,input-type = <5>;
+			gpios = <&gpio1 42 GPIO_ACTIVE_LOW>;
+		};
+
+		button@3 {
+			label = "Power-auto Switch";
+			linux,code = <KEY_ESC>;
+			linux,input-type = <5>;
+			gpios = <&gpio1 43 GPIO_ACTIVE_LOW>;
+		};
+	};
+
+	gpio_leds {
+		compatible = "gpio-leds";
+		pinctrl-0 = <&pmx_led_function_red &pmx_led_alarm
+			     &pmx_led_info &pmx_led_power
+			     &pmx_led_function_blue
+			     &pmx_led_hdderr0
+			     &pmx_led_hdderr1>;
+		pinctrl-names = "default";
+
+		led@1 {
+			label = "lswxl:blue:func";
+			gpios = <&gpio1 36 GPIO_ACTIVE_LOW>;
+		};
+
+		led@2 {
+			label = "lswxl:red:alarm";
+			gpios = <&gpio1 49 GPIO_ACTIVE_LOW>;
+		};
+
+		led@3 {
+			label = "lswxl:amber:info";
+			gpios = <&gpio1 6 GPIO_ACTIVE_LOW>;
+		};
+
+		led@4 {
+			label = "lswxl:blue:power";
+			gpios = <&gpio1 8 GPIO_ACTIVE_LOW>;
+		};
+
+		led@5 {
+			label = "lswxl:red:func";
+			gpios = <&gpio1 5 GPIO_ACTIVE_LOW>;
+			default-state = "keep";
+		};
+
+		led@6 {
+			label = "lswxl:red:hdderr0";
+			gpios = <&gpio1 2 GPIO_ACTIVE_LOW>;
+		};
+
+		led@7 {
+			label = "lswxl:red:hdderr1";
+			gpios = <&gpio1 3 GPIO_ACTIVE_LOW>;
+		};
+	};
+
+	gpio_fan {
+		compatible = "gpio-fan";
+		pinctrl-0 = <&pmx_fan_low &pmx_fan_high &pmx_fan_lock>;
+		pinctrl-names = "default";
+
+		gpios = <&gpio0 47 GPIO_ACTIVE_LOW
+			 &gpio0 48 GPIO_ACTIVE_LOW>;
+
+		gpio-fan,speed-map = <0 3
+				1500 2
+				3250 1
+				5000 0>;
+
+		alarm-gpios = <&gpio1 49 GPIO_ACTIVE_HIGH>;
+	};
+
+	restart_poweroff {
+		compatible = "restart-poweroff";
+	};
+
+	regulators {
+		compatible = "simple-bus";
+		#address-cells = <1>;
+		#size-cells = <0>;
+		pinctrl-0 = <&pmx_power_hdd0 &pmx_power_hdd1 &pmx_usb_vbus>;
+		pinctrl-names = "default";
+
+		usb_power: regulator@1 {
+			compatible = "regulator-fixed";
+			reg = <1>;
+			regulator-name = "USB Power";
+			regulator-min-microvolt = <5000000>;
+			regulator-max-microvolt = <5000000>;
+			enable-active-high;
+			regulator-always-on;
+			regulator-boot-on;
+			gpio = <&gpio0 37 GPIO_ACTIVE_HIGH>;
+		};
+		hdd_power0: regulator@2 {
+			compatible = "regulator-fixed";
+			reg = <2>;
+			regulator-name = "HDD0 Power";
+			regulator-min-microvolt = <5000000>;
+			regulator-max-microvolt = <5000000>;
+			enable-active-high;
+			regulator-always-on;
+			regulator-boot-on;
+			gpio = <&gpio0 28 GPIO_ACTIVE_HIGH>;
+		};
+		hdd_power1: regulator@3 {
+			compatible = "regulator-fixed";
+			reg = <3>;
+			regulator-name = "HDD1 Power";
+			regulator-min-microvolt = <5000000>;
+			regulator-max-microvolt = <5000000>;
+			enable-active-high;
+			regulator-always-on;
+			regulator-boot-on;
+			gpio = <&gpio0 29 GPIO_ACTIVE_HIGH>;
+		};
+	};
+};
+
+&mdio {
+	status = "okay";
+
+	ethphy1: ethernet-phy@8 {
+		device_type = "ethernet-phy";
+		reg = <8>;
+	};
+};
+
+&eth1 {
+	status = "okay";
+
+	ethernet1-port@0 {
+		phy-handle = <&ethphy1>;
+	};
+};
-- 
cgit v1.2.3


From c43379e150aa335b447f14e55cefd8f1c405cb5f Mon Sep 17 00:00:00 2001
From: Roger Shimizu <rogershimizu@gmail.com>
Date: Tue, 23 Jun 2015 00:02:30 +0900
Subject: ARM: dts: add buffalo linkstation ls-wvl/vl

Add dts file to support Buffalo Linkstation LS-WVL and LS-VL,
which are 3.5" HDD NAS in 2-bay and 1-bay respectively.

[gregory.clement@free-electrons.com: fix typo in pmx-led-function-red]
Signed-off-by: Roger Shimizu <rogershimizu@gmail.com>
Reviewed-by: Nobuhiro Iwamatsu <iwamatsu@nigauri.org>
Acked-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
---
 .../devicetree/bindings/arm/marvell,kirkwood.txt   |   1 +
 arch/arm/boot/dts/Makefile                         |   1 +
 arch/arm/boot/dts/kirkwood-lswvl.dts               | 301 +++++++++++++++++++++
 3 files changed, 303 insertions(+)
 create mode 100644 arch/arm/boot/dts/kirkwood-lswvl.dts

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/arm/marvell,kirkwood.txt b/Documentation/devicetree/bindings/arm/marvell,kirkwood.txt
index b79212f60ac4..5171ad8f48ff 100644
--- a/Documentation/devicetree/bindings/arm/marvell,kirkwood.txt
+++ b/Documentation/devicetree/bindings/arm/marvell,kirkwood.txt
@@ -20,6 +20,7 @@ And in addition, the compatible shall be extended with the specific
 board. Currently known boards are:
 
 "buffalo,lschlv2"
+"buffalo,lswvl"
 "buffalo,lswxl"
 "buffalo,lsxhl"
 "buffalo,lsxl"
diff --git a/arch/arm/boot/dts/Makefile b/arch/arm/boot/dts/Makefile
index 9afd1c955c43..0f354f590a25 100644
--- a/arch/arm/boot/dts/Makefile
+++ b/arch/arm/boot/dts/Makefile
@@ -176,6 +176,7 @@ dtb-$(CONFIG_MACH_KIRKWOOD) += \
 	kirkwood-km_kirkwood.dtb \
 	kirkwood-laplug.dtb \
 	kirkwood-lschlv2.dtb \
+	kirkwood-lswvl.dtb \
 	kirkwood-lswxl.dtb \
 	kirkwood-lsxhl.dtb \
 	kirkwood-mplcec4.dtb \
diff --git a/arch/arm/boot/dts/kirkwood-lswvl.dts b/arch/arm/boot/dts/kirkwood-lswvl.dts
new file mode 100644
index 000000000000..09eed3cea0af
--- /dev/null
+++ b/arch/arm/boot/dts/kirkwood-lswvl.dts
@@ -0,0 +1,301 @@
+/*
+ * Device Tree file for Buffalo Linkstation LS-WVL/VL
+ *
+ * Copyright (C) 2015, rogershimizu@gmail.com
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+/dts-v1/;
+
+#include "kirkwood.dtsi"
+#include "kirkwood-6282.dtsi"
+
+/ {
+	model = "Buffalo Linkstation LS-WVL/VL";
+	compatible = "buffalo,lswvl", "buffalo,lsvl", "marvell,kirkwood-88f6282", "marvell,kirkwood";
+
+	memory { /* 256 MB */
+		device_type = "memory";
+		reg = <0x00000000 0x10000000>;
+	};
+
+	chosen {
+		bootargs = "console=ttyS0,115200n8 earlyprintk";
+		stdout-path = &uart0;
+	};
+
+	mbus {
+		pcie-controller {
+			status = "okay";
+			pcie@1,0 {
+				status = "okay";
+			};
+		};
+	};
+
+	ocp@f1000000 {
+		pinctrl: pin-controller@10000 {
+			pmx_power_hdd0: pmx-power-hdd0 {
+				marvell,pins = "mpp8";
+				marvell,function = "gpio";
+			};
+			pmx_power_hdd1: pmx-power-hdd1 {
+				marvell,pins = "mpp9";
+				marvell,function = "gpio";
+			};
+			pmx_usb_vbus: pmx-usb-vbus {
+				marvell,pins = "mpp12";
+				marvell,function = "gpio";
+			};
+			pmx_fan_high: pmx-fan-high {
+				marvell,pins = "mpp16";
+				marvell,function = "gpio";
+			};
+			pmx_fan_low: pmx-fan-low {
+				marvell,pins = "mpp17";
+				marvell,function = "gpio";
+			};
+			pmx_led_hdderr0: pmx-led-hdderr0 {
+				marvell,pins = "mpp34";
+				marvell,function = "gpio";
+			};
+			pmx_led_hdderr1: pmx-led-hdderr1 {
+				marvell,pins = "mpp35";
+				marvell,function = "gpio";
+			};
+			pmx_led_alarm: pmx-led-alarm {
+				marvell,pins = "mpp36";
+				marvell,function = "gpio";
+			};
+			pmx_led_function_red: pmx-led-function-red {
+				marvell,pins = "mpp37";
+				marvell,function = "gpio";
+			};
+			pmx_led_info: pmx-led-info {
+				marvell,pins = "mpp38";
+				marvell,function = "gpio";
+			};
+			pmx_led_function_blue: pmx-led-function-blue {
+				marvell,pins = "mpp39";
+				marvell,function = "gpio";
+			};
+			pmx_led_power: pmx-led-power {
+				marvell,pins = "mpp40";
+				marvell,function = "gpio";
+			};
+			pmx_fan_lock: pmx-fan-lock {
+				marvell,pins = "mpp43";
+				marvell,function = "gpio";
+			};
+			pmx_button_function: pmx-button-function {
+				marvell,pins = "mpp45";
+				marvell,function = "gpio";
+			};
+			pmx_power_switch: pmx-power-switch {
+				marvell,pins = "mpp46";
+				marvell,function = "gpio";
+			};
+			pmx_power_auto_switch: pmx-power-auto-switch {
+				marvell,pins = "mpp47";
+				marvell,function = "gpio";
+			};
+		};
+
+		serial@12000 {
+			status = "okay";
+		};
+
+		sata@80000 {
+			status = "okay";
+			nr-ports = <2>;
+		};
+
+		spi@10600 {
+			status = "okay";
+
+			m25p40@0 {
+				#address-cells = <1>;
+				#size-cells = <1>;
+				compatible = "st,m25p40", "jedec,spi-nor";
+				reg = <0>;
+				spi-max-frequency = <25000000>;
+				mode = <0>;
+
+				partition@0 {
+					reg = <0x0 0x60000>;
+					label = "uboot";
+					read-only;
+				};
+
+				partition@60000 {
+					reg = <0x60000 0x10000>;
+					label = "dtb";
+					read-only;
+				};
+
+				partition@70000 {
+					reg = <0x70000 0x10000>;
+					label = "uboot_env";
+				};
+			};
+		};
+	};
+
+	gpio_keys {
+		compatible = "gpio-keys";
+		#address-cells = <1>;
+		#size-cells = <0>;
+		pinctrl-0 = <&pmx_button_function &pmx_power_switch
+			     &pmx_power_auto_switch>;
+		pinctrl-names = "default";
+
+		button@1 {
+			label = "Function Button";
+			linux,code = <KEY_OPTION>;
+			gpios = <&gpio0 45 GPIO_ACTIVE_LOW>;
+		};
+
+		button@2 {
+			label = "Power-on Switch";
+			linux,code = <KEY_RESERVED>;
+			linux,input-type = <5>;
+			gpios = <&gpio0 46 GPIO_ACTIVE_LOW>;
+		};
+
+		button@3 {
+			label = "Power-auto Switch";
+			linux,code = <KEY_ESC>;
+			linux,input-type = <5>;
+			gpios = <&gpio0 47 GPIO_ACTIVE_LOW>;
+		};
+	};
+
+	gpio_leds {
+		compatible = "gpio-leds";
+		pinctrl-0 = <&pmx_led_function_red &pmx_led_alarm
+			     &pmx_led_info &pmx_led_power
+			     &pmx_led_function_blue
+			     &pmx_led_hdderr0
+			     &pmx_led_hdderr1>;
+		pinctrl-names = "default";
+
+		led@1 {
+			label = "lswvl:red:alarm";
+			gpios = <&gpio0 36 GPIO_ACTIVE_LOW>;
+		};
+
+		led@2 {
+			label = "lswvl:red:func";
+			gpios = <&gpio0 37 GPIO_ACTIVE_LOW>;
+		};
+
+		led@3 {
+			label = "lswvl:amber:info";
+			gpios = <&gpio0 38 GPIO_ACTIVE_LOW>;
+		};
+
+		led@4 {
+			label = "lswvl:blue:func";
+			gpios = <&gpio0 39 GPIO_ACTIVE_LOW>;
+		};
+
+		led@5 {
+			label = "lswvl:blue:power";
+			gpios = <&gpio0 40 GPIO_ACTIVE_LOW>;
+			default-state = "keep";
+		};
+
+		led@6 {
+			label = "lswvl:red:hdderr0";
+			gpios = <&gpio0 34 GPIO_ACTIVE_LOW>;
+		};
+
+		led@7 {
+			label = "lswvl:red:hdderr1";
+			gpios = <&gpio0 35 GPIO_ACTIVE_LOW>;
+		};
+	};
+
+	gpio_fan {
+		compatible = "gpio-fan";
+		pinctrl-0 = <&pmx_fan_low &pmx_fan_high &pmx_fan_lock>;
+		pinctrl-names = "default";
+
+		gpios = <&gpio0 17 GPIO_ACTIVE_LOW
+			 &gpio0 16 GPIO_ACTIVE_LOW>;
+
+		gpio-fan,speed-map = <0 3
+				1500 2
+				3250 1
+				5000 0>;
+
+		alarm-gpios = <&gpio0 43 GPIO_ACTIVE_HIGH>;
+	};
+
+	restart_poweroff {
+		compatible = "restart-poweroff";
+	};
+
+	regulators {
+		compatible = "simple-bus";
+		#address-cells = <1>;
+		#size-cells = <0>;
+		pinctrl-0 = <&pmx_power_hdd0 &pmx_power_hdd1 &pmx_usb_vbus>;
+		pinctrl-names = "default";
+
+		usb_power: regulator@1 {
+			compatible = "regulator-fixed";
+			reg = <1>;
+			regulator-name = "USB Power";
+			regulator-min-microvolt = <5000000>;
+			regulator-max-microvolt = <5000000>;
+			enable-active-high;
+			regulator-always-on;
+			regulator-boot-on;
+			gpio = <&gpio0 12 GPIO_ACTIVE_HIGH>;
+		};
+		hdd_power0: regulator@2 {
+			compatible = "regulator-fixed";
+			reg = <2>;
+			regulator-name = "HDD0 Power";
+			regulator-min-microvolt = <5000000>;
+			regulator-max-microvolt = <5000000>;
+			enable-active-high;
+			regulator-always-on;
+			regulator-boot-on;
+			gpio = <&gpio0 8 GPIO_ACTIVE_HIGH>;
+		};
+		hdd_power1: regulator@3 {
+			compatible = "regulator-fixed";
+			reg = <3>;
+			regulator-name = "HDD1 Power";
+			regulator-min-microvolt = <5000000>;
+			regulator-max-microvolt = <5000000>;
+			enable-active-high;
+			regulator-always-on;
+			regulator-boot-on;
+			gpio = <&gpio0 9 GPIO_ACTIVE_HIGH>;
+		};
+	};
+};
+
+&mdio {
+	status = "okay";
+
+	ethphy0: ethernet-phy@0 {
+		device_type = "ethernet-phy";
+		reg = <0>;
+	};
+};
+
+&eth0 {
+	status = "okay";
+
+	ethernet0-port@0 {
+		phy-handle = <&ethphy0>;
+	};
+};
-- 
cgit v1.2.3


From a505bfb10a0fc6f4342a2521b899d61ff2461e72 Mon Sep 17 00:00:00 2001
From: Lee Jones <lee.jones@linaro.org>
Date: Thu, 9 Jul 2015 16:35:26 +0100
Subject: regulator: pwm-regulator: Ensure headings aren't confused with
 properties

We're using capitalisation and removing the '-' to mitigate confusion.

Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
---
 Documentation/devicetree/bindings/regulator/pwm-regulator.txt | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/regulator/pwm-regulator.txt b/Documentation/devicetree/bindings/regulator/pwm-regulator.txt
index 892b36655b3d..23b47720b2e4 100644
--- a/Documentation/devicetree/bindings/regulator/pwm-regulator.txt
+++ b/Documentation/devicetree/bindings/regulator/pwm-regulator.txt
@@ -3,7 +3,7 @@ Bindings for the Generic PWM Regulator
 
 Currently supports 2 modes of operation:
 
-voltage-table:		When in this mode, a voltage table (See below) of
+Voltage Table:		When in this mode, a voltage table (See below) of
 			predefined voltage <=> duty-cycle values must be
 			provided via DT. Limitations are that the regulator can
 			only operate at the voltages supplied in the table.
@@ -13,7 +13,7 @@ voltage-table:		When in this mode, a voltage table (See below) of
 			the user if the assumptions made in continuous-voltage
 			mode do not reign true.
 
-continuous-voltage:	This mode uses the regulator's maximum and minimum
+Continuous Voltage:	This mode uses the regulator's maximum and minimum
 			supplied voltages specified in the
 			regulator-{min,max}-microvolt properties to calculate
 			appropriate duty-cycle values.  This allows for a much
-- 
cgit v1.2.3


From f747a1fe7848453957dbdf362a42d7a6735c6ff0 Mon Sep 17 00:00:00 2001
From: Lee Jones <lee.jones@linaro.org>
Date: Thu, 9 Jul 2015 16:35:27 +0100
Subject: regulator: pwm-regulator: Remove obsoleted property

In "[3d7ef30] regulator: pwm-regulator: Simplify voltage to duty-cycle
call" we stopped using max_duty_cycle, so we can retire it from device
data and DT.

There is no need to deprecate this property, as it hasn't hit Mainline
yet.

Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
---
 Documentation/devicetree/bindings/regulator/pwm-regulator.txt | 11 ++++-------
 drivers/regulator/pwm-regulator.c                             |  9 ---------
 2 files changed, 4 insertions(+), 16 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/regulator/pwm-regulator.txt b/Documentation/devicetree/bindings/regulator/pwm-regulator.txt
index 23b47720b2e4..ed936f0f34f2 100644
--- a/Documentation/devicetree/bindings/regulator/pwm-regulator.txt
+++ b/Documentation/devicetree/bindings/regulator/pwm-regulator.txt
@@ -29,15 +29,14 @@ Required properties:
 
 - pwms:			PWM specification (See: ../pwm/pwm.txt)
 
-One of these must be provided:
+Only required for Voltage Table Mode:
 - voltage-table: 	Voltage and Duty-Cycle table consisting of 2 cells
 			    First cell is voltage in microvolts (uV)
 			    Second cell is duty-cycle in percent (%)
 
-- max-duty-cycle:	Maximum Duty-Cycle value -- this will normally be
-  			255 (0xff) for an 8 bit PWM device
-
-If both are provided, the current default is voltage-table mode.
+NB: To be clear, if voltage-table is provided, then the device will be used
+in Voltage Table Mode.  If no voltage-table is provided, then the device will
+be used in Continuous Voltage Mode.
 
 Any property defined as part of the core regulator binding can also be used.
 (See: ../regulator/regulator.txt)
@@ -49,8 +48,6 @@ Continuous Voltage Example:
 		regulator-min-microvolt = <1016000>;
 		regulator-max-microvolt = <1114000>;
 		regulator-name = "vdd_logic";
-
-		max-duty-cycle = <255>; /* 8bit PWM */
 	};
 
 Voltage Table Example:
diff --git a/drivers/regulator/pwm-regulator.c b/drivers/regulator/pwm-regulator.c
index cb482089050b..d92e66772ec0 100644
--- a/drivers/regulator/pwm-regulator.c
+++ b/drivers/regulator/pwm-regulator.c
@@ -30,7 +30,6 @@ struct pwm_regulator_data {
 	int state;
 
 	/* Continuous voltage */
-	u32 max_duty_cycle;
 	int volt_uV;
 };
 
@@ -201,14 +200,6 @@ static int pwm_regulator_init_continuous(struct platform_device *pdev,
 					 struct pwm_regulator_data *drvdata)
 {
 	struct device_node *np = pdev->dev.of_node;
-	int ret;
-
-	ret = of_property_read_u32(np, "max-duty-cycle",
-				   &drvdata->max_duty_cycle);
-	if (ret) {
-		dev_err(&pdev->dev, "Failed to read \"pwm-max-value\"\n");
-		return ret;
-	}
 
 	pwm_regulator_desc.ops = &pwm_regulator_voltage_continuous_ops;
 	pwm_regulator_desc.continuous_voltage_range = true;
-- 
cgit v1.2.3


From d6eff078e0bf9f24dcc2c7fcdbdf26b57148b197 Mon Sep 17 00:00:00 2001
From: Sébastien Hinderer <Sebastien.Hinderer@ens-lyon.org>
Date: Wed, 8 Jul 2015 21:12:32 +0200
Subject: SubmittingPatches: fix typo
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

From 7e0befc8e48a49e2ddf86bbd861027b14ea5a53d Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?S=C3=A9bastien=20Hinderer?=
 <Sebastien.Hinderer@ens-lyon.org>
Date: Wed, 8 Jul 2015 21:10:18 +0200
Subject: [PATCH] SubmittingPatches: fix typo
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset="utf-8"

Signed-off-by: Sébastien Hinderer <Sebastien.Hinderer@ens-lyon.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/SubmittingPatches | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches
index 27e7e5edeca8..e9395dfc4190 100644
--- a/Documentation/SubmittingPatches
+++ b/Documentation/SubmittingPatches
@@ -267,7 +267,7 @@ You should always copy the appropriate subsystem maintainer(s) on any patch
 to code that they maintain; look through the MAINTAINERS file and the
 source code revision history to see who those maintainers are.  The
 script scripts/get_maintainer.pl can be very useful at this step.  If you
-cannot find a maintainer for the subsystem your are working on, Andrew
+cannot find a maintainer for the subsystem you are working on, Andrew
 Morton (akpm@linux-foundation.org) serves as a maintainer of last resort.
 
 You should also normally choose at least one mailing list to receive a copy
-- 
cgit v1.2.3


From bd760c498f009e2a6795952469af6f8e6edac05a Mon Sep 17 00:00:00 2001
From: Daniel Grimshaw <grimshaw@linux.vnet.ibm.com>
Date: Wed, 8 Jul 2015 10:44:51 -0700
Subject: Documentation: filesystems: btrfs: Fixed typos and whitespace

I am a high school student trying to become familiar with
Linux kernel development. The btrfs documentation in
Documentation/filesystems had a few typos and errors in
whitespace. This patch corrects both of these.

Signed-off-by: Daniel Grimshaw <grimshaw@linux.vnet.ibm.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/btrfs.txt | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/filesystems/btrfs.txt b/Documentation/filesystems/btrfs.txt
index d11cc2f8077b..c772b47e7ef0 100644
--- a/Documentation/filesystems/btrfs.txt
+++ b/Documentation/filesystems/btrfs.txt
@@ -61,7 +61,7 @@ Options with (*) are default options and will not show in the mount options.
 
 	check_int enables the integrity checker module, which examines all
 	block write requests to ensure on-disk consistency, at a large
-	memory and CPU cost.  
+	memory and CPU cost.
 
 	check_int_data includes extent data in the integrity checks, and
 	implies the check_int option.
@@ -113,7 +113,7 @@ Options with (*) are default options and will not show in the mount options.
 	Disable/enable debugging option to be more verbose in some ENOSPC conditions.
 
   fatal_errors=<action>
-	Action to take when encountering a fatal error: 
+	Action to take when encountering a fatal error:
 	  "bug" - BUG() on a fatal error.  This is the default.
 	  "panic" - panic() on a fatal error.
 
@@ -132,10 +132,10 @@ Options with (*) are default options and will not show in the mount options.
 
   max_inline=<bytes>
 	Specify the maximum amount of space, in bytes, that can be inlined in
-	a metadata B-tree leaf.  The value is specified in bytes, optionally 
+	a metadata B-tree leaf.  The value is specified in bytes, optionally
 	with a K, M, or G suffix, case insensitive.  In practice, this value
 	is limited by the root sector size, with some space unavailable due
-	to leaf headers.  For a 4k sectorsize, max inline data is ~3900 bytes.
+	to leaf headers.  For a 4k sector size, max inline data is ~3900 bytes.
 
   metadata_ratio=<value>
 	Specify that 1 metadata chunk should be allocated after every <value>
@@ -170,7 +170,7 @@ Options with (*) are default options and will not show in the mount options.
 
   recovery
 	Enable autorecovery attempts if a bad tree root is found at mount time.
-	Currently this scans a list of several previous tree roots and tries to 
+	Currently this scans a list of several previous tree roots and tries to
 	use the first readable.
 
   rescan_uuid_tree
@@ -194,7 +194,7 @@ Options with (*) are default options and will not show in the mount options.
   ssd_spread
 	Options to control ssd allocation schemes.  By default, BTRFS will
 	enable or disable ssd allocation heuristics depending on whether a
-	rotational or nonrotational disk is in use.  The ssd and nossd options
+	rotational or non-rotational disk is in use.  The ssd and nossd options
 	can override this autodetection.
 
 	The ssd_spread mount option attempts to allocate into big chunks
@@ -216,13 +216,13 @@ Options with (*) are default options and will not show in the mount options.
 	This allows mounting of subvolumes which are not in the root of the mounted
 	filesystem.
 	You can use "btrfs subvolume show " to see the object ID for a subvolume.
-	
+
   thread_pool=<number>
 	The number of worker threads to allocate.  The default number is equal
 	to the number of CPUs + 2, or 8, whichever is smaller.
 
   user_subvol_rm_allowed
-	Allow subvolumes to be deleted by a non-root user. Use with caution. 
+	Allow subvolumes to be deleted by a non-root user. Use with caution.
 
 MAILING LIST
 ============
-- 
cgit v1.2.3


From 35a256fee52c7c207796302681fa95189c85b408 Mon Sep 17 00:00:00 2001
From: Tom Herbert <tom@herbertland.com>
Date: Wed, 8 Jul 2015 16:58:22 -0700
Subject: ipv6: Nonlocal bind

Add support to allow non-local binds similar to how this was done for IPv4.
Non-local binds are very useful in emulating the Internet in a box, etc.

This add the ip_nonlocal_bind sysctl under ipv6.

Testing:

Set up nonlocal binding and receive routing on a host, e.g.:

ip -6 rule add from ::/0 iif eth0 lookup 200
ip -6 route add local 2001:0:0:1::/64 dev lo proto kernel scope host table 200
sysctl -w net.ipv6.ip_nonlocal_bind=1

Set up routing to 2001:0:0:1::/64 on peer to go to first host

ping6 -I 2001:0:0:1::1 peer-address -- to verify

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 Documentation/networking/ip-sysctl.txt | 5 +++++
 include/net/netns/ipv6.h               | 1 +
 net/ipv4/ping.c                        | 3 ++-
 net/ipv6/af_inet6.c                    | 3 ++-
 net/ipv6/raw.c                         | 3 ++-
 net/ipv6/sysctl_net_ipv6.c             | 8 ++++++++
 6 files changed, 20 insertions(+), 3 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 5fae7704daab..f63aeefd2c24 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -1435,6 +1435,11 @@ mtu - INTEGER
 	Default Maximum Transfer Unit
 	Default: 1280 (IPv6 required minimum)
 
+ip_nonlocal_bind - BOOLEAN
+	If set, allows processes to bind() to non-local IPv6 addresses,
+	which can be quite useful - but may break some applications.
+	Default: 0
+
 router_probe_interval - INTEGER
 	Minimum interval (in seconds) between Router Probing described
 	in RFC4191.
diff --git a/include/net/netns/ipv6.h b/include/net/netns/ipv6.h
index 8d93544a2d2b..c0368db6df54 100644
--- a/include/net/netns/ipv6.h
+++ b/include/net/netns/ipv6.h
@@ -31,6 +31,7 @@ struct netns_sysctl_ipv6 {
 	int auto_flowlabels;
 	int icmpv6_time;
 	int anycast_src_echo_reply;
+	int ip_nonlocal_bind;
 	int fwmark_reflect;
 	int idgen_retries;
 	int idgen_delay;
diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c
index 05ff44b758df..e89094ab5ddb 100644
--- a/net/ipv4/ping.c
+++ b/net/ipv4/ping.c
@@ -363,7 +363,8 @@ static int ping_check_bind_addr(struct sock *sk, struct inet_sock *isk,
 						    scoped);
 		rcu_read_unlock();
 
-		if (!(isk->freebind || isk->transparent || has_addr ||
+		if (!(net->ipv6.sysctl.ip_nonlocal_bind ||
+		      isk->freebind || isk->transparent || has_addr ||
 		      addr_type == IPV6_ADDR_ANY))
 			return -EADDRNOTAVAIL;
 
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 39e670a91596..7bc92ea4ae8f 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -342,7 +342,8 @@ int inet6_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 			 */
 			v4addr = LOOPBACK4_IPV6;
 			if (!(addr_type & IPV6_ADDR_MULTICAST))	{
-				if (!(inet->freebind || inet->transparent) &&
+				if (!net->ipv6.sysctl.ip_nonlocal_bind &&
+				    !(inet->freebind || inet->transparent) &&
 				    !ipv6_chk_addr(net, &addr->sin6_addr,
 						   dev, 0)) {
 					err = -EADDRNOTAVAIL;
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index ca4700cb26c4..fdbada1569a3 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -295,7 +295,8 @@ static int rawv6_bind(struct sock *sk, struct sockaddr *uaddr, int addr_len)
 		 * unspecified and mapped address have a v4 equivalent.
 		 */
 		v4addr = LOOPBACK4_IPV6;
-		if (!(addr_type & IPV6_ADDR_MULTICAST))	{
+		if (!(addr_type & IPV6_ADDR_MULTICAST) &&
+		    !sock_net(sk)->ipv6.sysctl.ip_nonlocal_bind) {
 			err = -EADDRNOTAVAIL;
 			if (!ipv6_chk_addr(sock_net(sk), &addr->sin6_addr,
 					   dev, 0)) {
diff --git a/net/ipv6/sysctl_net_ipv6.c b/net/ipv6/sysctl_net_ipv6.c
index 4e705add4f18..db48aebd9c47 100644
--- a/net/ipv6/sysctl_net_ipv6.c
+++ b/net/ipv6/sysctl_net_ipv6.c
@@ -75,6 +75,13 @@ static struct ctl_table ipv6_table_template[] = {
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec
 	},
+	{
+		.procname	= "ip_nonlocal_bind",
+		.data		= &init_net.ipv6.sysctl.ip_nonlocal_bind,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec
+	},
 	{ }
 };
 
@@ -117,6 +124,7 @@ static int __net_init ipv6_sysctl_net_init(struct net *net)
 	ipv6_table[5].data = &net->ipv6.sysctl.idgen_retries;
 	ipv6_table[6].data = &net->ipv6.sysctl.idgen_delay;
 	ipv6_table[7].data = &net->ipv6.sysctl.flowlabel_state_ranges;
+	ipv6_table[8].data = &net->ipv6.sysctl.ip_nonlocal_bind;
 
 	ipv6_route_table = ipv6_route_sysctl_init(net);
 	if (!ipv6_route_table)
-- 
cgit v1.2.3


From eeedcea69e927857d32aaf089725eddd2c79dd0a Mon Sep 17 00:00:00 2001
From: Geert Uytterhoeven <geert@linux-m68k.org>
Date: Fri, 26 Jun 2015 08:09:29 +0100
Subject: ARM: 8395/1: l2c: Add support for the "arm,shared-override" property

"CoreLink Level 2 Cache Controller L2C-310", p. 2-15, section 2.3.2
Shareable attribute" states:

    "The default behavior of the cache controller with respect to the
     shareable attribute is to transform Normal Memory Non-cacheable
     transactions into:
        - cacheable no allocate for reads
        - write through no write allocate for writes."

Depending on the system architecture, this may cause memory corruption
in the presence of bus mastering devices (e.g. OHCI). To avoid such
corruption, the default behavior can be disabled by setting the Shared
Override bit in the Auxiliary Control register.

Currently the Shared Override bit can be set only using C code:
  - by calling l2x0_init() directly, which is deprecated,
  - by setting/clearing the bit in the machine_desc.l2c_aux_val/mask
    fields, but using values differing from 0/~0 is also deprecated.

Hence add support for an "arm,shared-override" device tree property for
the l2c device node. By specifying this property, affected systems can
indicate that non-cacheable transactions must not be transformed.
Then, it's up to the OS to decide. The current behavior is to set the
"shared attribute override enable" bit, as there may exist kernel linear
mappings and cacheable aliases for the DMA buffers, even if CMA is
enabled.

See also commit 1a8e41cd672f894b ("ARM: 6395/1: VExpress: Set bit 22 in
the PL310 (cache controller) AuxCtlr register"):

    "Clearing bit 22 in the PL310 Auxiliary Control register (shared
     attribute override enable) has the side effect of transforming
     Normal Shared Non-cacheable reads into Cacheable no-allocate reads.

     Coherent DMA buffers in Linux always have a Cacheable alias via the
     kernel linear mapping and the processor can speculatively load
     cache lines into the PL310 controller. With bit 22 cleared,
     Non-cacheable reads would unexpectedly hit such cache lines leading
     to buffer corruption."

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
---
 Documentation/devicetree/bindings/arm/l2cc.txt | 6 ++++++
 arch/arm/mm/cache-l2x0.c                       | 5 +++++
 2 files changed, 11 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/arm/l2cc.txt b/Documentation/devicetree/bindings/arm/l2cc.txt
index 2251dccb141e..06c88a4d28ac 100644
--- a/Documentation/devicetree/bindings/arm/l2cc.txt
+++ b/Documentation/devicetree/bindings/arm/l2cc.txt
@@ -67,6 +67,12 @@ Optional properties:
   disable if zero.
 - arm,prefetch-offset : Override prefetch offset value. Valid values are
   0-7, 15, 23, and 31.
+- arm,shared-override : The default behavior of the pl310 cache controller with
+  respect to the shareable attribute is to transform "normal memory
+  non-cacheable transactions" into "cacheable no allocate" (for reads) or
+  "write through no write allocate" (for writes).
+  On systems where this may cause DMA buffer corruption, this property must be
+  specified to indicate that such transforms are precluded.
 - prefetch-data : Data prefetch. Value: <0> (forcibly disable), <1>
   (forcibly enable), property absent (retain settings set by firmware)
 - prefetch-instr : Instruction prefetch. Value: <0> (forcibly disable),
diff --git a/arch/arm/mm/cache-l2x0.c b/arch/arm/mm/cache-l2x0.c
index 71b3d3309024..493692d838c6 100644
--- a/arch/arm/mm/cache-l2x0.c
+++ b/arch/arm/mm/cache-l2x0.c
@@ -1171,6 +1171,11 @@ static void __init l2c310_of_parse(const struct device_node *np,
 		}
 	}
 
+	if (of_property_read_bool(np, "arm,shared-override")) {
+		*aux_val |= L2C_AUX_CTRL_SHARED_OVERRIDE;
+		*aux_mask &= ~L2C_AUX_CTRL_SHARED_OVERRIDE;
+	}
+
 	prefetch = l2x0_saved_regs.prefetch_ctrl;
 
 	ret = of_property_read_u32(np, "arm,double-linefill", &val);
-- 
cgit v1.2.3


From 32c1735c4091ef0de8e39f68a9d83a07450b959b Mon Sep 17 00:00:00 2001
From: Ben Hutchings <ben@decadent.org.uk>
Date: Wed, 8 Jul 2015 20:06:44 +0100
Subject: DocBook: Don't store mtime (or name) in compressed man pages
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The mtime on a man page is the build time.  As gzip stores the mtime
and original name in the compressed file by default, this makes
compressed man pages unreproducible.  Neither of these are important
metadata in this case, so turn this off.

Reported-by: Jérémy Bobbio <lunar@debian.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/DocBook/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/DocBook/Makefile b/Documentation/DocBook/Makefile
index b6a6a2e0dd3b..11a41456b943 100644
--- a/Documentation/DocBook/Makefile
+++ b/Documentation/DocBook/Makefile
@@ -56,7 +56,7 @@ htmldocs: $(HTML)
 
 MAN := $(patsubst %.xml, %.9, $(BOOKS))
 mandocs: $(MAN)
-	find $(obj)/man -name '*.9' | xargs gzip -f
+	find $(obj)/man -name '*.9' | xargs gzip -nf
 
 installmandocs: mandocs
 	mkdir -p /usr/local/man/man9/
-- 
cgit v1.2.3


From b48ed85145711b28f9024dcdaf6c0238d7b5042c Mon Sep 17 00:00:00 2001
From: Ben Hutchings <ben@decadent.org.uk>
Date: Wed, 8 Jul 2015 20:06:51 +0100
Subject: DocBook: Generate consistent IDs
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

By default, DocBook XSL uses a non-deterministic function to generate
IDs for HTML elements where it can't take a name from the input
document.  However, it has the option to generate 'consistent'
(deterministic) IDs instead.  Enable this to make the HTML pages
reproducible.

Reported-by: Jérémy Bobbio <lunar@debian.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/DocBook/stylesheet.xsl | 1 +
 1 file changed, 1 insertion(+)

(limited to 'Documentation')

diff --git a/Documentation/DocBook/stylesheet.xsl b/Documentation/DocBook/stylesheet.xsl
index 85b25275196f..3bf4ecf3d760 100644
--- a/Documentation/DocBook/stylesheet.xsl
+++ b/Documentation/DocBook/stylesheet.xsl
@@ -5,6 +5,7 @@
 <param name="funcsynopsis.tabular.threshold">80</param>
 <param name="callout.graphics">0</param>
 <!-- <param name="paper.type">A4</param> -->
+<param name="generate.consistent.ids">1</param>
 <param name="generate.section.toc.level">2</param>
 <param name="use.id.as.filename">1</param>
 </stylesheet>
-- 
cgit v1.2.3


From b44158b17099ed5c7c8f4bfb7029942adbfbc318 Mon Sep 17 00:00:00 2001
From: Ben Hutchings <ben@decadent.org.uk>
Date: Wed, 8 Jul 2015 20:07:05 +0100
Subject: DocBook: Avoid building man pages repeatedly and inconsistently
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Some kernel-doc sections are included in multiple DocBook files.  This
means the mandocs target will generate the same manual page multiple
times with different metadata (author name/address and manual title,
taken from the including DocBook file).  If it's invoked in a parallel
build, the output is nondeterminstic.

For each section that is duplicated, mark the less specific manual's
inclusion as 'extra' and exclude it during conversion to manual pages.
Use xmlif for this, as that is bundled with xmlto which we already
use.

I would have preferred to use more conventional markup for this, but
each of the following approaches failed:

1. Wrap the extra inclusions with a new element and add a template to
   the stylesheet to include/exclude them.  Unfortunately DocBook XSL
   doesn't seem to support foreign elements at an intermediate level
   in the document tree.
2. Use DocBook profiling.  This works but requires passing an absolute
   path to the profile stylesheet to xmlto, so it's not portable.
3. Use SGML marked sections.  docbook2x can handle these but xmlto
   chokes on them.

Reported-by: Jérémy Bobbio <lunar@debian.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/DocBook/Makefile            | 10 +++++++++-
 Documentation/DocBook/device-drivers.tmpl |  6 ++++++
 Documentation/DocBook/gadget.tmpl         |  3 +++
 Documentation/DocBook/kernel-api.tmpl     |  6 ++++++
 4 files changed, 24 insertions(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/DocBook/Makefile b/Documentation/DocBook/Makefile
index 11a41456b943..83fcb6c2a00f 100644
--- a/Documentation/DocBook/Makefile
+++ b/Documentation/DocBook/Makefile
@@ -56,6 +56,13 @@ htmldocs: $(HTML)
 
 MAN := $(patsubst %.xml, %.9, $(BOOKS))
 mandocs: $(MAN)
+	@dups=$$(sed -n 's/.*<refname>\([^<]*\)<\/refname>.*/\1/p'	\
+		 $(obj)/*.xml.noextra | sort | uniq -d);		\
+	if [ -n "$$dups" ]; then					\
+		echo >&2 "The following manual pages are generated more than once:"; \
+		printf >&2 '%s\n' "$$dups";				\
+		exit 1;							\
+	fi
 	find $(obj)/man -name '*.9' | xargs gzip -nf
 
 installmandocs: mandocs
@@ -150,7 +157,7 @@ quiet_cmd_db2html = HTML    $@
             cp $(PNG-$(basename $(notdir $@))) $(patsubst %.html,%,$@); fi
 
 quiet_cmd_db2man = MAN     $@
-      cmd_db2man = if grep -q refentry $<; then xmlto man $(XMLTOFLAGS) -o $(obj)/man $< ; fi
+      cmd_db2man = if grep -q refentry $<; then xmlif excludeextra=1 <$< >$<.noextra && xmlto man $(XMLTOFLAGS) -o $(obj)/man $<.noextra ; fi
 %.9 : %.xml
 	@(which xmlto > /dev/null 2>&1) || \
 	 (echo "*** You need to install xmlto ***"; \
@@ -217,6 +224,7 @@ clean-files := $(DOCBOOKS) \
 	$(patsubst %.xml, %.ps,   $(DOCBOOKS)) \
 	$(patsubst %.xml, %.pdf,  $(DOCBOOKS)) \
 	$(patsubst %.xml, %.html, $(DOCBOOKS)) \
+	$(patsubst %, %.noextra,  $(DOCBOOKS)) \
 	$(patsubst %.xml, %.9,    $(DOCBOOKS)) \
 	$(index)
 
diff --git a/Documentation/DocBook/device-drivers.tmpl b/Documentation/DocBook/device-drivers.tmpl
index faf09d4a0ea8..87853ea1f70b 100644
--- a/Documentation/DocBook/device-drivers.tmpl
+++ b/Documentation/DocBook/device-drivers.tmpl
@@ -194,8 +194,13 @@ X!Edrivers/pnp/system.c
 
   <chapter id="snddev">
      <title>Sound Devices</title>
+<?xmlif if excludeextra='1'?>
+<?xmlif else?>
 !Iinclude/sound/core.h
+<?xmlif fi?>
 !Esound/sound_core.c
+<?xmlif if excludeextra='1'?>
+<?xmlif else?>
 !Iinclude/sound/pcm.h
 !Esound/core/pcm.c
 !Esound/core/device.c
@@ -211,6 +216,7 @@ X!Edrivers/pnp/system.c
 !Esound/core/hwdep.c
 !Esound/core/pcm_native.c
 !Esound/core/memalloc.c
+<?xmlif fi?>
 <!-- FIXME: Removed for now since no structured comments in source
 X!Isound/sound_firmware.c
 -->
diff --git a/Documentation/DocBook/gadget.tmpl b/Documentation/DocBook/gadget.tmpl
index 641629221176..e1b87bd63f82 100644
--- a/Documentation/DocBook/gadget.tmpl
+++ b/Documentation/DocBook/gadget.tmpl
@@ -488,7 +488,10 @@ These are the same types and constants used by host
 side drivers (and usbcore).
 </para>
 
+<?xmlif if excludeextra='1'?>
+<?xmlif else?>
 !Iinclude/linux/usb/ch9.h
+<?xmlif fi?>
 </sect1>
 
 <sect1 id="core"><title>Core Objects and Methods</title>
diff --git a/Documentation/DocBook/kernel-api.tmpl b/Documentation/DocBook/kernel-api.tmpl
index ecfd0ea40661..722249a0dbfe 100644
--- a/Documentation/DocBook/kernel-api.tmpl
+++ b/Documentation/DocBook/kernel-api.tmpl
@@ -58,8 +58,11 @@
 
      <sect1><title>String Conversions</title>
 !Elib/vsprintf.c
+<?xmlif if excludeextra='1'?>
+<?xmlif else?>
 !Finclude/linux/kernel.h kstrtol
 !Finclude/linux/kernel.h kstrtoul
+<?xmlif fi?>
 !Elib/kstrtox.c
      </sect1>
      <sect1><title>String Manipulation</title>
@@ -178,7 +181,10 @@ X!Ekernel/module.c
   <chapter id="hardware">
      <title>Hardware Interfaces</title>
      <sect1><title>Interrupt Handling</title>
+<?xmlif if excludeextra='1'?>
+<?xmlif else?>
 !Ekernel/irq/manage.c
+<?xmlif fi?>
      </sect1>
 
      <sect1><title>DMA Channels</title>
-- 
cgit v1.2.3


From 05e85d4e0180dbbce823e19d81388e60ac924be1 Mon Sep 17 00:00:00 2001
From: Arnaud Pouliquen <arnaud.pouliquen@st.com>
Date: Mon, 22 Jun 2015 16:31:05 +0200
Subject: ASoC: sti: add binding for ASoC driver

Add ASoC driver bindings documentation.
Describe the required properties for each of the hardware IPs drivers.

Signed-off-by: Arnaud Pouliquen <arnaud.pouliquen@st.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
---
 .../devicetree/bindings/sound/st,sti-asoc-card.txt | 155 +++++++++++++++++++++
 1 file changed, 155 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/sound/st,sti-asoc-card.txt

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/sound/st,sti-asoc-card.txt b/Documentation/devicetree/bindings/sound/st,sti-asoc-card.txt
new file mode 100644
index 000000000000..028fa1c82f50
--- /dev/null
+++ b/Documentation/devicetree/bindings/sound/st,sti-asoc-card.txt
@@ -0,0 +1,155 @@
+STMicroelectronics sti ASoC cards
+
+The sti ASoC Sound Card can be used, for all sti SoCs using internal sti-sas
+codec or external codecs.
+
+sti sound drivers allows to expose sti SoC audio interface through the
+generic ASoC simple card. For details about sound card declaration please refer to
+Documentation/devicetree/bindings/sound/simple-card.txt.
+
+1) sti-uniperiph-dai: audio dai device.
+---------------------------------------
+
+Required properties:
+  - compatible: "st,sti-uni-player" or "st,sti-uni-reader"
+
+  - st,syscfg: phandle to boot-device system configuration registers
+
+  - clock-names: name of the clocks listed in clocks property in the same order
+
+  - reg: CPU DAI IP Base address and size entries, listed  in same
+	 order than the CPU_DAI properties.
+
+  - reg-names: names of the mapped memory regions listed in regs property in
+	       the same order.
+
+  - interrupts: CPU_DAI interrupt line, listed in the same order than the
+		CPU_DAI properties.
+
+  - dma: CPU_DAI DMA controller phandle and DMA request line, listed in the same
+	 order than the CPU_DAI properties.
+
+  - dma-names: identifier string for each DMA request line in the dmas property.
+	"tx" for "st,sti-uni-player" compatibility
+	"rx" for "st,sti-uni-reader" compatibility
+
+  - version: IP version integrated in SOC.
+
+  - dai-name: DAI name that describes the IP.
+
+Required properties ("st,sti-uni-player" compatibility only):
+  - clocks: CPU_DAI IP clock source, listed in the same order than the
+	    CPU_DAI properties.
+
+  - uniperiph-id: internal SOC IP instance ID.
+
+  - IP mode: IP working mode depending on associated codec.
+	"HDMI" connected to HDMI codec IP and IEC HDMI formats.
+	"SPDIF"connected to SPDIF codec and support SPDIF formats.
+	"PCM"  PCM standard mode for I2S or TDM bus.
+
+Optional properties:
+  - pinctrl-0: defined for CPU_DAI@1 and CPU_DAI@4 to describe I2S PIOs for
+	       external codecs connection.
+
+  - pinctrl-names: should contain only one value - "default".
+
+Example:
+
+	sti_uni_player2: sti-uni-player@2 {
+		compatible = "st,sti-uni-player";
+		status = "okay";
+		#sound-dai-cells = <0>;
+		st,syscfg = <&syscfg_core>;
+		clocks = <&clk_s_d0_flexgen CLK_PCM_2>;
+		reg = <0x8D82000 0x158>;
+		interrupts = <GIC_SPI 86 IRQ_TYPE_NONE>;
+		dmas = <&fdma0 4 0 1>;
+		dai-name = "Uni Player #1 (DAC)";
+		dma-names = "tx";
+		uniperiph-id = <2>;
+		version = <5>;
+		mode = "PCM";
+	};
+
+	sti_uni_player3: sti-uni-player@3 {
+		compatible = "st,sti-uni-player";
+		status = "okay";
+		#sound-dai-cells = <0>;
+		st,syscfg = <&syscfg_core>;
+		clocks = <&clk_s_d0_flexgen CLK_SPDIFF>;
+		reg = <0x8D85000 0x158>;
+		interrupts = <GIC_SPI 89 IRQ_TYPE_NONE>;
+		dmas = <&fdma0 7 0 1>;
+		dma-names = "tx";
+		dai-name = "Uni Player #1 (PIO)";
+		uniperiph-id = <3>;
+		version = <5>;
+		mode = "SPDIF";
+	};
+
+	sti_uni_reader1: sti-uni-reader@1 {
+		compatible = "st,sti-uni-reader";
+		status = "disabled";
+		#sound-dai-cells = <0>;
+		st,syscfg = <&syscfg_core>;
+		reg = <0x8D84000 0x158>;
+		interrupts = <GIC_SPI 88 IRQ_TYPE_NONE>;
+		dmas = <&fdma0 6 0 1>;
+		dma-names = "rx";
+		dai-name = "Uni Reader #1 (HDMI RX)";
+		version = <3>;
+	};
+
+2) sti-sas-codec: internal audio codec IPs driver
+-------------------------------------------------
+
+Required properties:
+  - compatible: "st,sti<chip>-sas-codec" .
+	Should be chip "st,stih416-sas-codec" or "st,stih407-sas-codec"
+
+  - st,syscfg: phandle to boot-device system configuration registers.
+
+  - pinctrl-0: SPDIF PIO description.
+
+  - pinctrl-names: should contain only one value - "default".
+
+Example:
+	sti_sas_codec: sti-sas-codec {
+		compatible = "st,stih407-sas-codec";
+		#sound-dai-cells = <1>;
+		st,reg_audio = <&syscfg_core>;
+		pinctrl-names = "default";
+		pinctrl-0 = <&pinctrl_spdif_out >;
+	};
+
+Example of audio card declaration:
+	sound {
+		compatible = "simple-audio-card";
+		simple-audio-card,name = "sti audio card";
+		status = "okay";
+
+		simple-audio-card,dai-link@0 {
+			/* DAC */
+			format = "i2s";
+			dai-tdm-slot-width = <32>;
+			cpu {
+				sound-dai = <&sti_uni_player2>;
+			};
+
+			codec {
+				sound-dai = <&sti_sasg_codec 1>;
+			};
+		};
+		simple-audio-card,dai-link@1 {
+			/* SPDIF */
+			format = "left_j";
+			cpu {
+				sound-dai = <&sti_uni_player3>;
+			};
+
+			codec {
+				sound-dai = <&sti_sasg_codec 0>;
+			};
+		};
+	};
-- 
cgit v1.2.3


From 4b10df7f2c8b3c6de12375b85456c6fd275af6d0 Mon Sep 17 00:00:00 2001
From: "Olivier C. Larocque" <olivier.c.larocque@gmail.com>
Date: Thu, 2 Jul 2015 21:33:23 -0400
Subject: Documentation: CodingStyle: remove broken links in the References
 section

Remove 2 broken links for programming reference books in Appendix I. After
a lookup on an Internet archives web site, it seems that these links have
been broken for around 3 months. We can then assume that they will not be
back up and safely remove them from the documentation.

Signed-off-by: Olivier C. Larocque <olivier.c.larocque@gmail.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/CodingStyle | 2 --
 1 file changed, 2 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/CodingStyle b/Documentation/CodingStyle
index b713c35f8543..c06f817b3091 100644
--- a/Documentation/CodingStyle
+++ b/Documentation/CodingStyle
@@ -929,13 +929,11 @@ The C Programming Language, Second Edition
 by Brian W. Kernighan and Dennis M. Ritchie.
 Prentice Hall, Inc., 1988.
 ISBN 0-13-110362-8 (paperback), 0-13-110370-9 (hardback).
-URL: http://cm.bell-labs.com/cm/cs/cbook/
 
 The Practice of Programming
 by Brian W. Kernighan and Rob Pike.
 Addison-Wesley, Inc., 1999.
 ISBN 0-201-61586-X.
-URL: http://cm.bell-labs.com/cm/cs/tpop/
 
 GNU manuals - where in compliance with K&R and this text - for cpp, gcc,
 gcc internals and indent, all available from http://www.gnu.org/manual/
-- 
cgit v1.2.3


From 60498bb58e9a614b9706f250667033f30632a25c Mon Sep 17 00:00:00 2001
From: Luis de Bethencourt <luis@debethencourt.com>
Date: Fri, 3 Jul 2015 16:22:11 +0100
Subject: SubmittingPatches: update CodingStyle reference

Link to the internal up to date Coding Style document inside the Kernel
sources instead of an external one.

Signed-off-by: Luis de Bethencourt <luis@debethencourt.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/SubmittingPatches | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches
index e9395dfc4190..67cf50914bbd 100644
--- a/Documentation/SubmittingPatches
+++ b/Documentation/SubmittingPatches
@@ -796,7 +796,7 @@ NO!!!! No more huge patch bombs to linux-kernel@vger.kernel.org people!
   <https://lkml.org/lkml/2005/7/11/336>
 
 Kernel Documentation/CodingStyle:
-  <http://users.sosdg.org/~qiyong/lxr/source/Documentation/CodingStyle>
+  <Documentation/CodingStyle>
 
 Linus Torvalds's mail on the canonical patch format:
   <http://lkml.org/lkml/2005/4/7/183>
-- 
cgit v1.2.3


From 9ba6e988c7208e3cb1f71862cc578397ae938159 Mon Sep 17 00:00:00 2001
From: Krzysztof Kozlowski <k.kozlowski.k@gmail.com>
Date: Sun, 5 Jul 2015 11:56:51 +0900
Subject: Documentation: ARM: EXYNOS: Extend boot loader interface
 documentation

Extend the kernel-bootloader interface documentation with usage of
register INFORM1 (0x0804) and different CPU resume address on Exynos542x
family (with Multi-Cluster Power Management enabled).

Additionally add glossary and reformat section titles.

Signed-off-by: Krzysztof Kozlowski <k.kozlowski.k@gmail.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/arm/Samsung/Bootloader-interface.txt | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/arm/Samsung/Bootloader-interface.txt b/Documentation/arm/Samsung/Bootloader-interface.txt
index b96ead9a6919..df8d4fb85939 100644
--- a/Documentation/arm/Samsung/Bootloader-interface.txt
+++ b/Documentation/arm/Samsung/Bootloader-interface.txt
@@ -15,6 +15,7 @@ executing kernel.
 
 
 1. Non-Secure mode
+
 Address:      sysram_ns_base_addr
 Offset        Value                                        Purpose
 =============================================================================
@@ -28,6 +29,7 @@ Offset        Value                                        Purpose
 
 
 2. Secure mode
+
 Address:      sysram_base_addr
 Offset        Value                                        Purpose
 =============================================================================
@@ -40,14 +42,25 @@ Offset        Value                                        Purpose
 Address:      pmu_base_addr
 Offset        Value                                        Purpose
 =============================================================================
-0x0800        exynos_cpu_resume                            AFTR
+0x0800        exynos_cpu_resume                            AFTR, suspend
+0x0800        mcpm_entry_point (Exynos542x with MCPM)      AFTR, suspend
+0x0804        0xfcba0d10 (Magic cookie)                    AFTR
+0x0804        0x00000bad (Magic cookie)                    System suspend
 0x0814        exynos4_secondary_startup (Exynos4210 r1.1)  Secondary CPU boot
 0x0818        0xfcba0d10 (Magic cookie, Exynos4210 r1.1)   AFTR
 0x081C        exynos_cpu_resume (Exynos4210 r1.1)          AFTR
 
 
 3. Other (regardless of secure/non-secure mode)
+
 Address:      pmu_base_addr
 Offset        Value                           Purpose
 =============================================================================
 0x0908        Non-zero (only Exynos3250)      Secondary CPU boot up indicator
+
+
+4. Glossary
+
+AFTR - ARM Off Top Running, a low power mode, Cortex cores and many other
+modules are power gated, except the TOP modules
+MCPM - Multi-Cluster Power Management
-- 
cgit v1.2.3


From dc12f20ba06f89bc60d4dfadaf2b03404858e205 Mon Sep 17 00:00:00 2001
From: Masanari Iida <standby24x7@gmail.com>
Date: Mon, 6 Jul 2015 23:41:57 +0900
Subject: Doc: powerpc: Fix typos in Documentation/powerpc

This patch fix some spelling typo found in Documentation/powerpc.

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/powerpc/cxl.txt         | 2 +-
 Documentation/powerpc/dscr.txt        | 6 +++---
 Documentation/powerpc/qe_firmware.txt | 2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/powerpc/cxl.txt b/Documentation/powerpc/cxl.txt
index 2a230d01cd8c..205c1b81625c 100644
--- a/Documentation/powerpc/cxl.txt
+++ b/Documentation/powerpc/cxl.txt
@@ -133,7 +133,7 @@ User API
     The following file operations are supported on both slave and
     master devices.
 
-    A userspace library libcxl is avaliable here:
+    A userspace library libcxl is available here:
 	https://github.com/ibm-capi/libcxl
     This provides a C interface to this kernel API.
 
diff --git a/Documentation/powerpc/dscr.txt b/Documentation/powerpc/dscr.txt
index 1ff4400c57b3..ece300c64f76 100644
--- a/Documentation/powerpc/dscr.txt
+++ b/Documentation/powerpc/dscr.txt
@@ -4,7 +4,7 @@
 DSCR register in powerpc allows user to have some control of prefetch of data
 stream in the processor. Please refer to the ISA documents or related manual
 for more detailed information regarding how to use this DSCR to attain this
-control of the pefetches . This document here provides an overview of kernel
+control of the prefetches . This document here provides an overview of kernel
 support for DSCR, related kernel objects, it's functionalities and exported
 user interface.
 
@@ -44,7 +44,7 @@ user interface.
 	value into every CPU's DSCR register right away and updates the current
 	thread's DSCR value as well.
 
-	Changing the CPU specif DSCR default value in the sysfs does exactly
+	Changing the CPU specific DSCR default value in the sysfs does exactly
 	the same thing as above but unlike the global one above, it just changes
 	stuff for that particular CPU instead for all the CPUs on the system.
 
@@ -62,7 +62,7 @@ user interface.
 
 	Accessing DSCR through user level SPR (0x03) from user space will first
 	create a facility unavailable exception. Inside this exception handler
-	all mfspr isntruction based read attempts will get emulated and returned
+	all mfspr instruction based read attempts will get emulated and returned
 	where as the first mtspr instruction based write attempts will enable
 	the DSCR facility for the next time around (both for read and write) by
 	setting DSCR facility in the FSCR register.
diff --git a/Documentation/powerpc/qe_firmware.txt b/Documentation/powerpc/qe_firmware.txt
index 2031ddb33d09..e7ac24aec4ff 100644
--- a/Documentation/powerpc/qe_firmware.txt
+++ b/Documentation/powerpc/qe_firmware.txt
@@ -117,7 +117,7 @@ specific been defined.  This table describes the structure.
 Extended Modes
 
 This is a double word bit array (64 bits) that defines special functionality
-which has an impact on the softwarew drivers.  Each bit has its own impact
+which has an impact on the software drivers.  Each bit has its own impact
 and has special instructions for the s/w associated with it.  This structure is
 described in this table:
 
-- 
cgit v1.2.3


From 03bc7523cb4241a7f4ad67d7ae38027d1884a3ee Mon Sep 17 00:00:00 2001
From: Stefan Tatschner <stefan@sevenbyte.org>
Date: Wed, 8 Jul 2015 10:41:07 +0200
Subject: can-doc: Fix wrong chapter reference

In f35f6c8f7 (can: update MAINTAINERS and Documentation) chapter 3.3
was removed. This patch fixes some old references to chapter 3.4 which
no longer exists.

Signed-off-by: Stefan Tatschner <stefan@sevenbyte.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/networking/can.txt | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/networking/can.txt b/Documentation/networking/can.txt
index b48d4a149411..fd1a1aad49a9 100644
--- a/Documentation/networking/can.txt
+++ b/Documentation/networking/can.txt
@@ -510,7 +510,7 @@ solution for a couple of reasons:
 
   4.1.2 RAW socket option CAN_RAW_ERR_FILTER
 
-  As described in chapter 3.4 the CAN interface driver can generate so
+  As described in chapter 3.3 the CAN interface driver can generate so
   called Error Message Frames that can optionally be passed to the user
   application in the same way as other CAN frames. The possible
   errors are divided into different error classes that may be filtered
@@ -1152,7 +1152,7 @@ solution for a couple of reasons:
     $ ip link set canX type can restart
 
   Note that a restart will also create a CAN error message frame (see
-  also chapter 3.4).
+  also chapter 3.3).
 
   6.6 CAN FD (flexible data rate) driver support
 
-- 
cgit v1.2.3


From 40929167e6e8450a06501d292c5e5c81455d1c47 Mon Sep 17 00:00:00 2001
From: Roger Quadros <rogerq@ti.com>
Date: Mon, 6 Jul 2015 13:27:43 -0700
Subject: Input: pixcir_i2c_ts - add RESET gpio

The controller has a RESET pin which is usually controlled over
a GPIO line. If such a GPIO is provided, perform a RESET
during probe.

Signed-off-by: Roger Quadros <rogerq@ti.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
---
 .../bindings/input/touchscreen/pixcir_i2c_ts.txt   |  3 +++
 drivers/input/touchscreen/pixcir_i2c_ts.c          | 22 ++++++++++++++++++++++
 2 files changed, 25 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/input/touchscreen/pixcir_i2c_ts.txt b/Documentation/devicetree/bindings/input/touchscreen/pixcir_i2c_ts.txt
index 6e551090f465..8eb240a287c8 100644
--- a/Documentation/devicetree/bindings/input/touchscreen/pixcir_i2c_ts.txt
+++ b/Documentation/devicetree/bindings/input/touchscreen/pixcir_i2c_ts.txt
@@ -8,6 +8,9 @@ Required properties:
 - touchscreen-size-x: horizontal resolution of touchscreen (in pixels)
 - touchscreen-size-y: vertical resolution of touchscreen (in pixels)
 
+Optional properties:
+- reset-gpio: GPIO connected to the RESET line of the chip
+
 Example:
 
 	i2c@00000000 {
diff --git a/drivers/input/touchscreen/pixcir_i2c_ts.c b/drivers/input/touchscreen/pixcir_i2c_ts.c
index 5330c0446540..a5a83139aad0 100644
--- a/drivers/input/touchscreen/pixcir_i2c_ts.c
+++ b/drivers/input/touchscreen/pixcir_i2c_ts.c
@@ -36,6 +36,7 @@ struct pixcir_i2c_ts_data {
 	struct i2c_client *client;
 	struct input_dev *input;
 	struct gpio_desc *gpio_attb;
+	struct gpio_desc *gpio_reset;
 	const struct pixcir_ts_platform_data *pdata;
 	bool running;
 	int max_fingers;	/* Max fingers supported in this instance */
@@ -189,6 +190,17 @@ static irqreturn_t pixcir_ts_isr(int irq, void *dev_id)
 	return IRQ_HANDLED;
 }
 
+static void pixcir_reset(struct pixcir_i2c_ts_data *tsdata)
+{
+	if (!IS_ERR_OR_NULL(tsdata->gpio_reset)) {
+		gpiod_set_value_cansleep(tsdata->gpio_reset, 1);
+		ndelay(100);	/* datasheet section 1.2.3 says 80ns min. */
+		gpiod_set_value_cansleep(tsdata->gpio_reset, 0);
+		/* wait for controller ready. 100ms guess. */
+		msleep(100);
+	}
+}
+
 static int pixcir_set_power_mode(struct pixcir_i2c_ts_data *ts,
 				 enum pixcir_power_mode mode)
 {
@@ -529,6 +541,14 @@ static int pixcir_i2c_ts_probe(struct i2c_client *client,
 		return error;
 	}
 
+	tsdata->gpio_reset = devm_gpiod_get_optional(dev, "reset",
+						     GPIOD_OUT_LOW);
+	if (IS_ERR(tsdata->gpio_reset)) {
+		error = PTR_ERR(tsdata->gpio_reset);
+		dev_err(dev, "Failed to request RESET gpio: %d\n", error);
+		return error;
+	}
+
 	error = devm_request_threaded_irq(dev, client->irq, NULL, pixcir_ts_isr,
 					  IRQF_TRIGGER_FALLING | IRQF_ONESHOT,
 					  client->name, tsdata);
@@ -537,6 +557,8 @@ static int pixcir_i2c_ts_probe(struct i2c_client *client,
 		return error;
 	}
 
+	pixcir_reset(tsdata);
+
 	/* Always be in IDLE mode to save power, device supports auto wake */
 	error = pixcir_set_power_mode(tsdata, PIXCIR_POWER_IDLE);
 	if (error) {
-- 
cgit v1.2.3


From 4245746037e379dc9f1388e422d52001cd431921 Mon Sep 17 00:00:00 2001
From: Geert Uytterhoeven <geert+renesas@glider.be>
Date: Wed, 24 Jun 2015 14:14:21 +0200
Subject: regulator: da9210: Add optional interrupt support

Add optional interrupt support to the da9210 regulator driver, to handle
over-current, under- and over-voltage, and over-temperature events.

Only the interrupt sources for which we handle events are unmasked, to
avoid interrupts we cannot handle.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Mark Brown <broonie@kernel.org>
---
 .../devicetree/bindings/regulator/da9210.txt       |  4 ++
 drivers/regulator/da9210-regulator.c               | 75 ++++++++++++++++++++++
 2 files changed, 79 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/regulator/da9210.txt b/Documentation/devicetree/bindings/regulator/da9210.txt
index 3297c53cb915..7aa9b1fa6b21 100644
--- a/Documentation/devicetree/bindings/regulator/da9210.txt
+++ b/Documentation/devicetree/bindings/regulator/da9210.txt
@@ -5,6 +5,10 @@ Required properties:
 - compatible:	must be "dlg,da9210"
 - reg:		the i2c slave address of the regulator. It should be 0x68.
 
+Optional properties:
+
+- interrupts:	a reference to the DA9210 interrupt, if available.
+
 Any standard regulator properties can be used to configure the single da9210
 DCDC.
 
diff --git a/drivers/regulator/da9210-regulator.c b/drivers/regulator/da9210-regulator.c
index f0489cb9018b..8e39f7457bc3 100644
--- a/drivers/regulator/da9210-regulator.c
+++ b/drivers/regulator/da9210-regulator.c
@@ -22,6 +22,8 @@
 #include <linux/i2c.h>
 #include <linux/module.h>
 #include <linux/init.h>
+#include <linux/interrupt.h>
+#include <linux/irq.h>
 #include <linux/slab.h>
 #include <linux/regulator/driver.h>
 #include <linux/regulator/machine.h>
@@ -120,6 +122,55 @@ static int da9210_get_current_limit(struct regulator_dev *rdev)
 	return da9210_buck_limits[sel];
 }
 
+static irqreturn_t da9210_irq_handler(int irq, void *data)
+{
+	struct da9210 *chip = data;
+	unsigned int val, handled = 0;
+	int error, ret = IRQ_NONE;
+
+	error = regmap_read(chip->regmap, DA9210_REG_EVENT_B, &val);
+	if (error < 0)
+		goto error_i2c;
+
+	if (val & DA9210_E_OVCURR) {
+		regulator_notifier_call_chain(chip->rdev,
+					      REGULATOR_EVENT_OVER_CURRENT,
+					      NULL);
+		handled |= DA9210_E_OVCURR;
+	}
+	if (val & DA9210_E_NPWRGOOD) {
+		regulator_notifier_call_chain(chip->rdev,
+					      REGULATOR_EVENT_UNDER_VOLTAGE,
+					      NULL);
+		handled |= DA9210_E_NPWRGOOD;
+	}
+	if (val & (DA9210_E_TEMP_WARN | DA9210_E_TEMP_CRIT)) {
+		regulator_notifier_call_chain(chip->rdev,
+					      REGULATOR_EVENT_OVER_TEMP, NULL);
+		handled |= val & (DA9210_E_TEMP_WARN | DA9210_E_TEMP_CRIT);
+	}
+	if (val & DA9210_E_VMAX) {
+		regulator_notifier_call_chain(chip->rdev,
+					      REGULATOR_EVENT_REGULATION_OUT,
+					      NULL);
+		handled |= DA9210_E_VMAX;
+	}
+	if (handled) {
+		/* Clear handled events */
+		error = regmap_write(chip->regmap, DA9210_REG_EVENT_B, handled);
+		if (error < 0)
+			goto error_i2c;
+
+		ret = IRQ_HANDLED;
+	}
+
+	return ret;
+
+error_i2c:
+	dev_err(regmap_get_device(chip->regmap), "I2C error : %d\n", error);
+	return ret;
+}
+
 /*
  * I2C driver interface functions
  */
@@ -168,6 +219,30 @@ static int da9210_i2c_probe(struct i2c_client *i2c,
 	}
 
 	chip->rdev = rdev;
+	if (i2c->irq) {
+		error = devm_request_threaded_irq(&i2c->dev, i2c->irq, NULL,
+						  da9210_irq_handler,
+						  IRQF_TRIGGER_LOW |
+						  IRQF_ONESHOT | IRQF_SHARED,
+						  "da9210", chip);
+		if (error) {
+			dev_err(&i2c->dev, "Failed to request IRQ%u: %d\n",
+				i2c->irq, error);
+			return error;
+		}
+
+		error = regmap_update_bits(chip->regmap, DA9210_REG_MASK_B,
+					 DA9210_M_OVCURR | DA9210_M_NPWRGOOD |
+					 DA9210_M_TEMP_WARN |
+					 DA9210_M_TEMP_CRIT | DA9210_M_VMAX, 0);
+		if (error < 0) {
+			dev_err(&i2c->dev, "Failed to update mask reg: %d\n",
+				error);
+			return error;
+		}
+	} else {
+		dev_warn(&i2c->dev, "No IRQ configured\n");
+	}
 
 	i2c_set_clientdata(i2c, chip);
 
-- 
cgit v1.2.3


From 7bd393543287b921f964a350166bf2866527a1b5 Mon Sep 17 00:00:00 2001
From: James Ban <james.ban.opensource@diasemi.com>
Date: Tue, 30 Jun 2015 13:39:39 +0900
Subject: regulator: da9211: support da9215

This is a patch for supporting da9215 buck converter.

Signed-off-by: James Ban <james.ban.opensource@diasemi.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
---
 .../devicetree/bindings/regulator/da9211.txt       | 32 +++++++++++++++--
 drivers/regulator/Kconfig                          |  6 ++--
 drivers/regulator/da9211-regulator.c               | 40 ++++++++++++++++------
 drivers/regulator/da9211-regulator.h               | 18 +++++-----
 include/linux/regulator/da9211.h                   | 19 +++++-----
 5 files changed, 81 insertions(+), 34 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/regulator/da9211.txt b/Documentation/devicetree/bindings/regulator/da9211.txt
index eb618907c7de..c620493e8dbe 100644
--- a/Documentation/devicetree/bindings/regulator/da9211.txt
+++ b/Documentation/devicetree/bindings/regulator/da9211.txt
@@ -1,7 +1,7 @@
-* Dialog Semiconductor DA9211/DA9213 Voltage Regulator
+* Dialog Semiconductor DA9211/DA9213/DA9215 Voltage Regulator
 
 Required properties:
-- compatible: "dlg,da9211" or "dlg,da9213".
+- compatible: "dlg,da9211" or "dlg,da9213" or "dlg,da9215"
 - reg: I2C slave address, usually 0x68.
 - interrupts: the interrupt outputs of the controller
 - regulators: A node that houses a sub-node for each regulator within the
@@ -66,3 +66,31 @@ Example 2) DA9213
 			};
 		};
 	};
+
+
+Example 3) DA9215
+	pmic: da9215@68 {
+		compatible = "dlg,da9215";
+		reg = <0x68>;
+		interrupts = <3 27>;
+
+		regulators {
+			BUCKA {
+				regulator-name = "VBUCKA";
+				regulator-min-microvolt = < 300000>;
+				regulator-max-microvolt = <1570000>;
+				regulator-min-microamp 	= <4000000>;
+				regulator-max-microamp 	= <7000000>;
+				enable-gpios = <&gpio 27 0>;
+			};
+			BUCKB {
+				regulator-name = "VBUCKB";
+				regulator-min-microvolt = < 300000>;
+				regulator-max-microvolt = <1570000>;
+				regulator-min-microamp 	= <4000000>;
+				regulator-max-microamp 	= <7000000>;
+				enable-gpios = <&gpio 17 0>;
+			};
+		};
+	};
+
diff --git a/drivers/regulator/Kconfig b/drivers/regulator/Kconfig
index bef3bde6971b..23496da101de 100644
--- a/drivers/regulator/Kconfig
+++ b/drivers/regulator/Kconfig
@@ -209,13 +209,13 @@ config REGULATOR_DA9210
 	  interface.
 
 config REGULATOR_DA9211
-	tristate "Dialog Semiconductor DA9211/DA9212/DA9213/DA9214 regulator"
+	tristate "Dialog Semiconductor DA9211/DA9212/DA9213/DA9214/DA9215 regulator"
 	depends on I2C
 	select REGMAP_I2C
 	help
 	  Say y here to support for the Dialog Semiconductor DA9211/DA9212
-	  /DA9213/DA9214.
-	  The DA9211/DA9212/DA9213/DA9214 is a multi-phase synchronous
+	  /DA9213/DA9214/DA9215.
+	  The DA9211/DA9212/DA9213/DA9214/DA9215 is a multi-phase synchronous
 	  step down converter 12A or 16A DC-DC Buck controlled through an I2C
 	  interface.
 
diff --git a/drivers/regulator/da9211-regulator.c b/drivers/regulator/da9211-regulator.c
index df79e4b1946e..0858100d2d03 100644
--- a/drivers/regulator/da9211-regulator.c
+++ b/drivers/regulator/da9211-regulator.c
@@ -1,6 +1,6 @@
 /*
- * da9211-regulator.c - Regulator device driver for DA9211/DA9213
- * Copyright (C) 2014  Dialog Semiconductor Ltd.
+ * da9211-regulator.c - Regulator device driver for DA9211/DA9213/DA9215
+ * Copyright (C) 2015  Dialog Semiconductor Ltd.
  *
  * This library is free software; you can redistribute it and/or
  * modify it under the terms of the GNU Library General Public
@@ -32,6 +32,7 @@
 /* DEVICE IDs */
 #define DA9211_DEVICE_ID	0x22
 #define DA9213_DEVICE_ID	0x23
+#define DA9215_DEVICE_ID	0x24
 
 #define DA9211_BUCK_MODE_SLEEP	1
 #define DA9211_BUCK_MODE_SYNC	2
@@ -90,6 +91,13 @@ static const int da9213_current_limits[] = {
 	3000000, 3200000, 3400000, 3600000, 3800000, 4000000, 4200000, 4400000,
 	4600000, 4800000, 5000000, 5200000, 5400000, 5600000, 5800000, 6000000
 };
+/* Current limits for DA9215 buck (uA) indices
+ * corresponds with register values
+ */
+static const int da9215_current_limits[] = {
+	4000000, 4200000, 4400000, 4600000, 4800000, 5000000, 5200000, 5400000,
+	5600000, 5800000, 6000000, 6200000, 6400000, 6600000, 6800000, 7000000
+};
 
 static unsigned int da9211_buck_get_mode(struct regulator_dev *rdev)
 {
@@ -157,6 +165,10 @@ static int da9211_set_current_limit(struct regulator_dev *rdev, int min,
 		current_limits = da9213_current_limits;
 		max_size = ARRAY_SIZE(da9213_current_limits)-1;
 		break;
+	case DA9215:
+		current_limits = da9215_current_limits;
+		max_size = ARRAY_SIZE(da9215_current_limits)-1;
+		break;
 	default:
 		return -EINVAL;
 	}
@@ -189,6 +201,9 @@ static int da9211_get_current_limit(struct regulator_dev *rdev)
 	case DA9213:
 		current_limits = da9213_current_limits;
 		break;
+	case DA9215:
+		current_limits = da9215_current_limits;
+		break;
 	default:
 		return -EINVAL;
 	}
@@ -350,13 +365,11 @@ static int da9211_regulator_init(struct da9211 *chip)
 	/* If configuration for 1/2 bucks is different between platform data
 	 * and the register, driver should exit.
 	 */
-	if ((chip->pdata->num_buck == 2 && data == 0x40)
-		|| (chip->pdata->num_buck == 1 && data == 0x00)) {
-		if (data == 0)
-			chip->num_regulator = 1;
-		else
-			chip->num_regulator = 2;
-	} else {
+	if (chip->pdata->num_buck == 1 && data == 0x00)
+		chip->num_regulator = 1;
+	else if (chip->pdata->num_buck == 2 && data != 0x00)
+		chip->num_regulator = 2;
+	else {
 		dev_err(chip->dev, "Configuration is mismatched\n");
 		return -EINVAL;
 	}
@@ -438,6 +451,9 @@ static int da9211_i2c_probe(struct i2c_client *i2c,
 	case DA9213_DEVICE_ID:
 		chip->chip_id = DA9213;
 		break;
+	case DA9215_DEVICE_ID:
+		chip->chip_id = DA9215;
+		break;
 	default:
 		dev_err(chip->dev, "Unsupported device id = 0x%x.\n", data);
 		return -ENODEV;
@@ -478,6 +494,7 @@ static int da9211_i2c_probe(struct i2c_client *i2c,
 static const struct i2c_device_id da9211_i2c_id[] = {
 	{"da9211", DA9211},
 	{"da9213", DA9213},
+	{"da9215", DA9215},
 	{},
 };
 MODULE_DEVICE_TABLE(i2c, da9211_i2c_id);
@@ -486,6 +503,7 @@ MODULE_DEVICE_TABLE(i2c, da9211_i2c_id);
 static const struct of_device_id da9211_dt_ids[] = {
 	{ .compatible = "dlg,da9211", .data = &da9211_i2c_id[0] },
 	{ .compatible = "dlg,da9213", .data = &da9211_i2c_id[1] },
+	{ .compatible = "dlg,da9215", .data = &da9211_i2c_id[2] },
 	{},
 };
 MODULE_DEVICE_TABLE(of, da9211_dt_ids);
@@ -504,5 +522,5 @@ static struct i2c_driver da9211_regulator_driver = {
 module_i2c_driver(da9211_regulator_driver);
 
 MODULE_AUTHOR("James Ban <James.Ban.opensource@diasemi.com>");
-MODULE_DESCRIPTION("Regulator device driver for Dialog DA9211/DA9213");
-MODULE_LICENSE("GPL v2");
+MODULE_DESCRIPTION("Regulator device driver for Dialog DA9211/DA9213/DA9215");
+MODULE_LICENSE("GPL");
diff --git a/drivers/regulator/da9211-regulator.h b/drivers/regulator/da9211-regulator.h
index 93fa9df2721c..d6ad96fc64d3 100644
--- a/drivers/regulator/da9211-regulator.h
+++ b/drivers/regulator/da9211-regulator.h
@@ -1,16 +1,16 @@
 /*
- * da9211-regulator.h - Regulator definitions for DA9211/DA9213
- * Copyright (C) 2014  Dialog Semiconductor Ltd.
+ * da9211-regulator.h - Regulator definitions for DA9211/DA9213/DA9215
+ * Copyright (C) 2015  Dialog Semiconductor Ltd.
  *
- * This library is free software; you can redistribute it and/or
- * modify it under the terms of the GNU Library General Public
- * License as published by the Free Software Foundation; either
- * version 2 of the License, or (at your option) any later version.
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
  *
- * This library is distributed in the hope that it will be useful,
+ * This program is distributed in the hope that it will be useful,
  * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
- * Library General Public License for more details.
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
  */
 
 #ifndef __DA9211_REGISTERS_H__
diff --git a/include/linux/regulator/da9211.h b/include/linux/regulator/da9211.h
index 5dd65acc2a69..a43a5ca1167b 100644
--- a/include/linux/regulator/da9211.h
+++ b/include/linux/regulator/da9211.h
@@ -1,16 +1,16 @@
 /*
- * da9211.h - Regulator device driver for DA9211/DA9213
- * Copyright (C) 2014  Dialog Semiconductor Ltd.
+ * da9211.h - Regulator device driver for DA9211/DA9213/DA9215
+ * Copyright (C) 2015  Dialog Semiconductor Ltd.
  *
- * This library is free software; you can redistribute it and/or
- * modify it under the terms of the GNU Library General Public
- * License as published by the Free Software Foundation; either
- * version 2 of the License, or (at your option) any later version.
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
  *
- * This library is distributed in the hope that it will be useful,
+ * This program is distributed in the hope that it will be useful,
  * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
- * Library General Public License for more details.
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
  */
 
 #ifndef __LINUX_REGULATOR_DA9211_H
@@ -23,6 +23,7 @@
 enum da9211_chip_id {
 	DA9211,
 	DA9213,
+	DA9215,
 };
 
 struct da9211_pdata {
-- 
cgit v1.2.3


From 5119222f2e3de2e51a3b3270036d53c55ce68236 Mon Sep 17 00:00:00 2001
From: Anatol Pomozov <anatol.pomozov@gmail.com>
Date: Sun, 12 Jul 2015 08:14:19 -0700
Subject: ASoC: max98357a: Make 'sdmode-gpios' dts property optional

The option is not needed if chip is always on or managed by some other part
of system like platform card driver.

Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
---
 Documentation/devicetree/bindings/sound/max98357a.txt | 6 +++++-
 sound/soc/codecs/max98357a.c                          | 5 ++++-
 2 files changed, 9 insertions(+), 2 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/sound/max98357a.txt b/Documentation/devicetree/bindings/sound/max98357a.txt
index a7a149a236e5..28645a2ff885 100644
--- a/Documentation/devicetree/bindings/sound/max98357a.txt
+++ b/Documentation/devicetree/bindings/sound/max98357a.txt
@@ -4,7 +4,11 @@ This node models the Maxim MAX98357A DAC.
 
 Required properties:
 - compatible   : "maxim,max98357a"
-- sdmode-gpios : GPIO specifier for the GPIO -> DAC SDMODE pin
+
+Optional properties:
+- sdmode-gpios : GPIO specifier for the chip's SD_MODE pin.
+        If this option is not specified then driver does not manage
+        the pin state (e.g. chip is always on).
 
 Example:
 
diff --git a/sound/soc/codecs/max98357a.c b/sound/soc/codecs/max98357a.c
index 3a2fda08a893..fa1b79302bb3 100644
--- a/sound/soc/codecs/max98357a.c
+++ b/sound/soc/codecs/max98357a.c
@@ -31,6 +31,9 @@ static int max98357a_daiops_trigger(struct snd_pcm_substream *substream,
 {
 	struct gpio_desc *sdmode = snd_soc_dai_get_drvdata(dai);
 
+	if (!sdmode)
+		return 0;
+
 	switch (cmd) {
 	case SNDRV_PCM_TRIGGER_START:
 	case SNDRV_PCM_TRIGGER_RESUME:
@@ -60,7 +63,7 @@ static int max98357a_codec_probe(struct snd_soc_codec *codec)
 {
 	struct gpio_desc *sdmode;
 
-	sdmode = devm_gpiod_get(codec->dev, "sdmode", GPIOD_OUT_LOW);
+	sdmode = devm_gpiod_get_optional(codec->dev, "sdmode", GPIOD_OUT_LOW);
 	if (IS_ERR(sdmode)) {
 		dev_err(codec->dev, "%s() unable to get sdmode GPIO: %ld\n",
 				__func__, PTR_ERR(sdmode));
-- 
cgit v1.2.3


From a8069fafe8a3ce1567e2394b8551902ede300445 Mon Sep 17 00:00:00 2001
From: Ezequiel Garcia <ezequiel@vanguardiasur.com.ar>
Date: Sun, 21 Jun 2015 20:23:48 -0300
Subject: of: add vendor prefix for CIAA project

CIAA stands for Argentine Open Industrial Computer. The CIAA project
is an initiative sponsored by universities and private companies.
Its main goal is to promote local embedded systems development.

Among other devices, the project developed a NXP LPC4337-based
reference design.

Signed-off-by: Ezequiel Garcia <ezequiel@vanguardiasur.com.ar>
Signed-off-by: Joachim Eastwood <manabian@gmail.com>
---
 Documentation/devicetree/bindings/vendor-prefixes.txt | 1 +
 1 file changed, 1 insertion(+)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/vendor-prefixes.txt b/Documentation/devicetree/bindings/vendor-prefixes.txt
index d444757c4d9e..c2f0af7b276d 100644
--- a/Documentation/devicetree/bindings/vendor-prefixes.txt
+++ b/Documentation/devicetree/bindings/vendor-prefixes.txt
@@ -46,6 +46,7 @@ chipone		ChipOne
 chipspark	ChipSPARK
 chrp	Common Hardware Reference Platform
 chunghwa	Chunghwa Picture Tubes Ltd.
+ciaa	Computadora Industrial Abierta Argentina
 cirrus	Cirrus Logic, Inc.
 cloudengines	Cloud Engines, Inc.
 cnm	Chips&Media, Inc.
-- 
cgit v1.2.3


From cfa229b7b4232b23b77e059e19adf432bedd59a9 Mon Sep 17 00:00:00 2001
From: Heiko Stuebner <heiko@sntech.de>
Date: Tue, 26 May 2015 21:49:05 +0200
Subject: dt-bindings: add vendor prefix for Netxeon Technology

Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Acked-by: Rob Herring <robh@kernel.org>
Reviewed-by: Romain Perier <romain.perier@gmail.com>
---
 Documentation/devicetree/bindings/vendor-prefixes.txt | 1 +
 1 file changed, 1 insertion(+)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/vendor-prefixes.txt b/Documentation/devicetree/bindings/vendor-prefixes.txt
index d444757c4d9e..576f4ac7fb1a 100644
--- a/Documentation/devicetree/bindings/vendor-prefixes.txt
+++ b/Documentation/devicetree/bindings/vendor-prefixes.txt
@@ -143,6 +143,7 @@ national	National Semiconductor
 neonode		Neonode Inc.
 netgear	NETGEAR
 netlogic	Broadcom Corporation (formerly NetLogic Microsystems)
+netxeon		Shenzhen Netxeon Technology CO., LTD
 newhaven	Newhaven Display International
 nintendo	Nintendo
 nokia	Nokia
-- 
cgit v1.2.3


From cde669fa71cfb6fdbd579c409b08a5cfa1863584 Mon Sep 17 00:00:00 2001
From: Heiko Stuebner <heiko@sntech.de>
Date: Tue, 26 May 2015 21:50:09 +0200
Subject: ARM: dts: rockchip: add Netxeon R89 board

This board is used in some TV-boxes like for example the Beelink R89 or
Tronsmart R28.

The board itself follows the reference design for the most part. But
there are no schematics available it seems, so some things should be
taken with a grain of salt.

Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Reviewed-by: Romain Perier <romain.perier@gmail.com>
---
 Documentation/devicetree/bindings/arm/rockchip.txt |   4 +
 arch/arm/boot/dts/Makefile                         |   3 +-
 arch/arm/boot/dts/rk3288-r89.dts                   | 412 +++++++++++++++++++++
 3 files changed, 418 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm/boot/dts/rk3288-r89.dts

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/arm/rockchip.txt b/Documentation/devicetree/bindings/arm/rockchip.txt
index 60d4a1e0a9b5..b30bed1a10ad 100644
--- a/Documentation/devicetree/bindings/arm/rockchip.txt
+++ b/Documentation/devicetree/bindings/arm/rockchip.txt
@@ -26,3 +26,7 @@ Rockchip platforms device tree bindings
 - ChipSPARK PopMetal-RK3288 board:
     Required root node properties:
       - compatible = "chipspark,popmetal-rk3288", "rockchip,rk3288";
+
+- Netxeon R89 board:
+    Required root node properties:
+      - compatible = "netxeon,r89", "rockchip,rk3288";
diff --git a/arch/arm/boot/dts/Makefile b/arch/arm/boot/dts/Makefile
index 246473a244f6..f239007cffc9 100644
--- a/arch/arm/boot/dts/Makefile
+++ b/arch/arm/boot/dts/Makefile
@@ -488,7 +488,8 @@ dtb-$(CONFIG_ARCH_ROCKCHIP) += \
 	rk3288-evb-act8846.dtb \
 	rk3288-evb-rk808.dtb \
 	rk3288-firefly-beta.dtb \
-	rk3288-firefly.dtb
+	rk3288-firefly.dtb \
+	rk3288-r89.dtb
 dtb-$(CONFIG_ARCH_S3C24XX) += \
 	s3c2416-smdk2416.dtb
 dtb-$(CONFIG_ARCH_S3C64XX) += \
diff --git a/arch/arm/boot/dts/rk3288-r89.dts b/arch/arm/boot/dts/rk3288-r89.dts
new file mode 100644
index 000000000000..b1d9ca64003e
--- /dev/null
+++ b/arch/arm/boot/dts/rk3288-r89.dts
@@ -0,0 +1,412 @@
+/*
+ * Copyright (c) 2015 Heiko Stuebner <heiko@sntech.de>
+ *
+ * This file is dual-licensed: you can use it either under the terms
+ * of the GPL or the X11 license, at your option. Note that this dual
+ * licensing only applies to this file, and not this project as a
+ * whole.
+ *
+ *  a) This file is free software; you can redistribute it and/or
+ *     modify it under the terms of the GNU General Public License as
+ *     published by the Free Software Foundation; either version 2 of the
+ *     License, or (at your option) any later version.
+ *
+ *     This file is distributed in the hope that it will be useful,
+ *     but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *     MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *     GNU General Public License for more details.
+ *
+ * Or, alternatively,
+ *
+ *  b) Permission is hereby granted, free of charge, to any person
+ *     obtaining a copy of this software and associated documentation
+ *     files (the "Software"), to deal in the Software without
+ *     restriction, including without limitation the rights to use,
+ *     copy, modify, merge, publish, distribute, sublicense, and/or
+ *     sell copies of the Software, and to permit persons to whom the
+ *     Software is furnished to do so, subject to the following
+ *     conditions:
+ *
+ *     The above copyright notice and this permission notice shall be
+ *     included in all copies or substantial portions of the Software.
+ *
+ *     THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ *     EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
+ *     OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ *     NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
+ *     HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
+ *     WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ *     FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ *     OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+/dts-v1/;
+#include <dt-bindings/pwm/pwm.h>
+#include "rk3288.dtsi"
+
+/ {
+	compatible = "netxeon,r89", "rockchip,rk3288";
+
+	memory {
+		reg = <0x0 0x80000000>;
+	};
+
+	ext_gmac: external-gmac-clock {
+		compatible = "fixed-clock";
+		clock-frequency = <125000000>;
+		clock-output-names = "ext_gmac";
+		#clock-cells = <0>;
+	};
+
+	gpio-keys {
+		compatible = "gpio-keys";
+		#address-cells = <1>;
+		#size-cells = <0>;
+		autorepeat;
+
+		pinctrl-names = "default";
+		pinctrl-0 = <&pwrbtn>;
+
+		button@0 {
+			gpios = <&gpio0 5 GPIO_ACTIVE_LOW>;
+			linux,code = <116>;
+			label = "GPIO Key Power";
+			linux,input-type = <1>;
+			gpio-key,wakeup = <1>;
+			debounce-interval = <100>;
+		};
+	};
+
+	vcc_host: vcc-host-regulator {
+		compatible = "regulator-fixed";
+		enable-active-high;
+		gpio = <&gpio0 14 GPIO_ACTIVE_HIGH>;
+		pinctrl-names = "default";
+		pinctrl-0 = <&host_vbus_drv>;
+		regulator-name = "vcc_host";
+		regulator-always-on;
+		regulator-boot-on;
+	};
+
+	vcc_otg: vcc-otg-regulator {
+		compatible = "regulator-fixed";
+		enable-active-high;
+		gpio = <&gpio0 12 GPIO_ACTIVE_HIGH>;
+		pinctrl-names = "default";
+		pinctrl-0 = <&otg_vbus_drv>;
+		regulator-name = "vcc_otg";
+		regulator-always-on;
+		regulator-boot-on;
+	};
+
+	vcc_sdmmc: sdmmc-regulator {
+		compatible = "regulator-fixed";
+		regulator-name = "sdmmc-supply";
+		regulator-min-microvolt = <3300000>;
+		regulator-max-microvolt = <3300000>;
+		gpio = <&gpio7 11 GPIO_ACTIVE_LOW>;
+		startup-delay-us = <100000>;
+		vin-supply = <&vcc_io>;
+	};
+
+	vcc_sys: sys-regulator {
+		compatible = "regulator-fixed";
+		regulator-name = "sys-supply";
+		regulator-min-microvolt = <5000000>;
+		regulator-max-microvolt = <5000000>;
+		regulator-always-on;
+		regulator-boot-on;
+	};
+};
+
+&cpu0 {
+	cpu0-supply = <&vdd_cpu>;
+};
+
+&gmac {
+	phy-supply = <&vcc_lan>;
+	phy-mode = "rgmii";
+	clock_in_out = "input";
+	snps,reset-gpio = <&gpio4 7 0>;
+	snps,reset-active-low;
+	snps,reset-delays-us = <0 10000 1000000>;
+	assigned-clocks = <&cru SCLK_MAC>;
+	assigned-clock-parents = <&ext_gmac>;
+	pinctrl-names = "default";
+	pinctrl-0 = <&rgmii_pins>;
+	tx_delay = <0x30>;
+	rx_delay = <0x10>;
+	status = "ok";
+};
+
+&hdmi {
+	status = "okay";
+};
+
+&i2c0 {
+	status = "okay";
+
+	vdd_cpu: pmic@40 {
+		compatible = "silergy,syr827";
+		reg = <0x40>;
+		fcs,suspend-voltage-selector = <1>;
+		regulator-name = "VDD_CPU";
+		regulator-enable-ramp-delay = <300>;
+		regulator-min-microvolt = <850000>;
+		regulator-max-microvolt = <1350000>;
+		regulator-ramp-delay = <8000>;
+		regulator-always-on;
+		regulator-boot-on;
+		vin-supply = <&vcc_sys>;
+	};
+
+	vdd_gpu: pmic@41 {
+		compatible = "silergy,syr828";
+		reg = <0x41>;
+		fcs,suspend-voltage-selector = <1>;
+		regulator-name = "VDD_GPU";
+		regulator-enable-ramp-delay = <300>;
+		regulator-min-microvolt = <850000>;
+		regulator-max-microvolt = <1350000>;
+		regulator-ramp-delay = <8000>;
+		regulator-always-on;
+		regulator-boot-on;
+		vin-supply = <&vcc_sys>;
+	};
+
+	rtc@51 {
+		compatible = "haoyu,hym8563";
+		reg = <0x51>;
+		#clock-cells = <0>;
+		clock-output-names = "xin32k";
+		interrupt-parent = <&gpio0>;
+		interrupts = <4 IRQ_TYPE_EDGE_FALLING>;
+		pinctrl-names = "default";
+		pinctrl-0 = <&pmic_int>;
+	};
+
+	act8846: pmic@5a {
+		compatible = "active-semi,act8846";
+		reg = <0x5a>;
+		pinctrl-names = "default";
+		pinctrl-0 = <&pmic_vsel>, <&pwr_hold>;
+		system-power-controller;
+
+		regulators {
+			vcc_ddr: REG1 {
+				regulator-name = "VCC_DDR";
+				regulator-min-microvolt = <1200000>;
+				regulator-max-microvolt = <1200000>;
+				regulator-always-on;
+			};
+
+			vcc_io: REG2 {
+				regulator-name = "VCC_IO";
+				regulator-min-microvolt = <3300000>;
+				regulator-max-microvolt = <3300000>;
+				regulator-always-on;
+			};
+
+			vdd_log: REG3 {
+				regulator-name = "VDD_LOG";
+				regulator-min-microvolt = <1000000>;
+				regulator-max-microvolt = <1000000>;
+				regulator-always-on;
+			};
+
+			vcc_20: REG4 {
+				regulator-name = "VCC_20";
+				regulator-min-microvolt = <2000000>;
+				regulator-max-microvolt = <2000000>;
+				regulator-always-on;
+			};
+
+			vccio_sd: REG5 {
+				regulator-name = "VCCIO_SD";
+				regulator-min-microvolt = <3300000>;
+				regulator-max-microvolt = <3300000>;
+				regulator-always-on;
+			};
+
+			vdd10_lcd: REG6 {
+				regulator-name = "VDD10_LCD";
+				regulator-min-microvolt = <1000000>;
+				regulator-max-microvolt = <1000000>;
+				regulator-always-on;
+			};
+
+			vcc_wl: REG7 {
+				regulator-name = "VCC_WL";
+				regulator-min-microvolt = <3300000>;
+				regulator-max-microvolt = <3300000>;
+				regulator-always-on;
+			};
+
+			vcca_33: REG8 {
+				regulator-name = "VCCA_33";
+				regulator-min-microvolt = <3300000>;
+				regulator-max-microvolt = <3300000>;
+				regulator-always-on;
+			};
+
+			vcc_lan: REG9 {
+				regulator-name = "VCC_LAN";
+				regulator-min-microvolt = <3300000>;
+				regulator-max-microvolt = <3300000>;
+				regulator-always-on;
+			};
+
+			vdd_10: REG10 {
+				regulator-name = "VDD_10";
+				regulator-min-microvolt = <1000000>;
+				regulator-max-microvolt = <1000000>;
+				regulator-always-on;
+			};
+
+			vcc_18: REG11 {
+				regulator-name = "VCC_18";
+				regulator-min-microvolt = <1800000>;
+				regulator-max-microvolt = <1800000>;
+				regulator-always-on;
+			};
+
+			vcc18_lcd: REG12 {
+				regulator-name = "VCC18_LCD";
+				regulator-min-microvolt = <1800000>;
+				regulator-max-microvolt = <1800000>;
+				regulator-always-on;
+			};
+		};
+	};
+};
+
+&i2c5 {
+	status = "okay";
+};
+
+&pinctrl {
+	pcfg_output_high: pcfg-output-high {
+		output-high;
+	};
+
+	pcfg_output_low: pcfg-output-low {
+		output-low;
+	};
+
+	act8846 {
+		pmic_vsel: pmic-vsel {
+			rockchip,pins = <7 1 RK_FUNC_GPIO &pcfg_output_low>;
+		};
+
+		pwr_hold: pwr-hold {
+			rockchip,pins = <0 6 RK_FUNC_GPIO &pcfg_output_high>;
+		};
+	};
+
+	buttons {
+		pwrbtn: pwrbtn {
+			rockchip,pins = <0 5 RK_FUNC_GPIO &pcfg_pull_up>;
+		};
+	};
+
+	pmic {
+		pmic_int: pmic-int {
+			rockchip,pins = <RK_GPIO0 4 RK_FUNC_GPIO &pcfg_pull_up>;
+		};
+	};
+
+	usb {
+		host_vbus_drv: host-vbus-drv {
+			rockchip,pins = <0 14 RK_FUNC_GPIO &pcfg_pull_none>;
+		};
+
+		otg_vbus_drv: otg-vbus-drv {
+			rockchip,pins = <0 12 RK_FUNC_GPIO &pcfg_pull_none>;
+		};
+	};
+};
+
+&pwm0 {
+	status = "okay";
+};
+
+&saradc {
+	vref-supply = <&vcc_18>;
+	status = "okay";
+};
+
+&sdmmc {
+	bus-width = <4>;
+	cap-mmc-highspeed;
+	cap-sd-highspeed;
+	card-detect-delay = <200>;
+	disable-wp;
+	num-slots = <1>;
+	pinctrl-names = "default";
+	pinctrl-0 = <&sdmmc_clk &sdmmc_cmd &sdmmc_cd &sdmmc_bus4>;
+	vmmc-supply = <&vcc_sdmmc>;
+	vqmmc-supply = <&vccio_sd>;
+	status = "okay";
+};
+
+&tsadc {
+	rockchip,hw-tshut-mode = <0>;
+	rockchip,hw-tshut-polarity = <0>;
+	status = "okay";
+};
+
+&uart0 {
+	status = "okay";
+};
+
+&uart1 {
+	status = "okay";
+};
+
+&uart2 {
+	status = "okay";
+};
+
+&uart3 {
+	status = "okay";
+};
+
+&uart4 {
+	status = "okay";
+};
+
+&usb_host0_ehci {
+	status = "okay";
+};
+
+&usb_host1 {
+	status = "okay";
+};
+
+&usb_otg {
+	status = "okay";
+};
+
+&usbphy {
+	status = "okay";
+};
+
+&vopb {
+	status = "okay";
+};
+
+&vopb_mmu {
+	status = "okay";
+};
+
+&vopl {
+	status = "okay";
+};
+
+&vopl_mmu {
+	status = "okay";
+};
+
+&wdt {
+	status = "okay";
+};
-- 
cgit v1.2.3


From bdc89213710250d75fcbb86c2e623d1bbe8e3bb2 Mon Sep 17 00:00:00 2001
From: Sébastien Hinderer <Sebastien.Hinderer@ens-lyon.org>
Date: Sun, 12 Jul 2015 17:44:15 +0200
Subject: SubmittingPatches: fix wrong wording
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Signed-off-by: Sébastien Hinderer <Sebastien.Hinderer@ens-lyon.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/SubmittingPatches | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches
index 67cf50914bbd..918c9727d3ec 100644
--- a/Documentation/SubmittingPatches
+++ b/Documentation/SubmittingPatches
@@ -340,7 +340,7 @@ on the changes you are submitting.  It is important for a kernel
 developer to be able to "quote" your changes, using standard e-mail
 tools, so that they may comment on specific portions of your code.
 
-For this reason, all patches should be submitting e-mail "inline".
+For this reason, all patches should be submitted by e-mail "inline".
 WARNING:  Be wary of your editor's word-wrap corrupting your patch,
 if you choose to cut-n-paste your patch.
 
-- 
cgit v1.2.3


From 6e7ac7b4ab4d7b4486b4680ebbffbff3b1b0d775 Mon Sep 17 00:00:00 2001
From: Sébastien Hinderer <Sebastien.Hinderer@ens-lyon.org>
Date: Sun, 12 Jul 2015 17:49:55 +0200
Subject: SubmittingPatches: fix wrong wording
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

"facilitate easier reviewing" says the same thing twice.

Signed-off-by: Sébastien Hinderer <Sebastien.Hinderer@ens-lyon.org>
[jc: made it "easier review"]
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/SubmittingPatches | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches
index 918c9727d3ec..56ef2e6f79c2 100644
--- a/Documentation/SubmittingPatches
+++ b/Documentation/SubmittingPatches
@@ -94,7 +94,7 @@ generated it with diff(1), to ensure accuracy.
 
 If your changes produce a lot of deltas, you need to split them into
 individual patches which modify things in logical stages; see section
-#3.  This will facilitate easier reviewing by other kernel developers,
+#3.  This will facilitate review by other kernel developers,
 very important if you want your patch accepted.
 
 If you're using git, "git rebase -i" can help you with this process.  If
-- 
cgit v1.2.3


From 5d250eeb9d9d2237d49cd284ad35a6707694b88e Mon Sep 17 00:00:00 2001
From: Masanari Iida <standby24x7@gmail.com>
Date: Mon, 13 Jul 2015 12:29:11 +0900
Subject: Doc: pps: Fix file name in pps.txt

This patch fix a file name of example code.

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Acked-by: Rodolfo Giometti <giometti@enneenne.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/pps/pps.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/pps/pps.txt b/Documentation/pps/pps.txt
index c508cceeee7d..7cb7264ad598 100644
--- a/Documentation/pps/pps.txt
+++ b/Documentation/pps/pps.txt
@@ -125,7 +125,7 @@ The same function may also run the defined echo function
 (pps_ktimer_echo(), passing to it the "ptr" pointer) if the user
 asked for that... etc..
 
-Please see the file drivers/pps/clients/ktimer.c for example code.
+Please see the file drivers/pps/clients/pps-ktimer.c for example code.
 
 
 SYSFS support
-- 
cgit v1.2.3


From acc068d967a63aef163167a66a02966f42052b80 Mon Sep 17 00:00:00 2001
From: Peter Huewe <peterhuewe@gmx.de>
Date: Mon, 13 Jul 2015 22:02:48 +0200
Subject: Documenation: Update location of docproc.c

docproc.c was moved in 2011 from scripts/basic to scripts
-> updated the references accordingly

Signed-off-by: Peter Huewe <peterhuewe@gmx.de>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/kernel-doc-nano-HOWTO.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/kernel-doc-nano-HOWTO.txt b/Documentation/kernel-doc-nano-HOWTO.txt
index acbc1a3d0d91..78f69cdc9b3f 100644
--- a/Documentation/kernel-doc-nano-HOWTO.txt
+++ b/Documentation/kernel-doc-nano-HOWTO.txt
@@ -128,7 +128,7 @@ are:
   special place-holders for where the extracted documentation should
   go.
 
-- scripts/basic/docproc.c
+- scripts/docproc.c
 
   This is a program for converting SGML template files into SGML
   files. When a file is referenced it is searched for symbols
-- 
cgit v1.2.3


From 917d8e2d10f40e28aa9e0d824b2e5b8197d79fc2 Mon Sep 17 00:00:00 2001
From: Aleksa Sarai <cyphar@cyphar.com>
Date: Sat, 13 Jun 2015 03:21:58 +1000
Subject: cgroup: add documentation for the PIDs controller

Add documentation derived from kernel/cgroup_pids.c to the relevant
Documentation/ directory, along with a few examples of how to use the
PIDs controller as well an explanation of its peculiarities.

Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
---
 Documentation/cgroups/00-INDEX |  2 +
 Documentation/cgroups/pids.txt | 85 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 87 insertions(+)
 create mode 100644 Documentation/cgroups/pids.txt

(limited to 'Documentation')

diff --git a/Documentation/cgroups/00-INDEX b/Documentation/cgroups/00-INDEX
index 96ce071a3633..3f5a40f57d4a 100644
--- a/Documentation/cgroups/00-INDEX
+++ b/Documentation/cgroups/00-INDEX
@@ -22,6 +22,8 @@ net_cls.txt
 	- Network classifier cgroups details and usages.
 net_prio.txt
 	- Network priority cgroups details and usages.
+pids.txt
+	- Process number cgroups details and usages.
 resource_counter.txt
 	- Resource Counter API.
 unified-hierarchy.txt
diff --git a/Documentation/cgroups/pids.txt b/Documentation/cgroups/pids.txt
new file mode 100644
index 000000000000..1a078b5d281a
--- /dev/null
+++ b/Documentation/cgroups/pids.txt
@@ -0,0 +1,85 @@
+						   Process Number Controller
+						   =========================
+
+Abstract
+--------
+
+The process number controller is used to allow a cgroup hierarchy to stop any
+new tasks from being fork()'d or clone()'d after a certain limit is reached.
+
+Since it is trivial to hit the task limit without hitting any kmemcg limits in
+place, PIDs are a fundamental resource. As such, PID exhaustion must be
+preventable in the scope of a cgroup hierarchy by allowing resource limiting of
+the number of tasks in a cgroup.
+
+Usage
+-----
+
+In order to use the `pids` controller, set the maximum number of tasks in
+pids.max (this is not available in the root cgroup for obvious reasons). The
+number of processes currently in the cgroup is given by pids.current.
+
+Organisational operations are not blocked by cgroup policies, so it is possible
+to have pids.current > pids.max. This can be done by either setting the limit to
+be smaller than pids.current, or attaching enough processes to the cgroup such
+that pids.current > pids.max. However, it is not possible to violate a cgroup
+policy through fork() or clone(). fork() and clone() will return -EAGAIN if the
+creation of a new process would cause a cgroup policy to be violated.
+
+To set a cgroup to have no limit, set pids.max to "max". This is the default for
+all new cgroups (N.B. that PID limits are hierarchical, so the most stringent
+limit in the hierarchy is followed).
+
+pids.current tracks all child cgroup hierarchies, so parent/pids.current is a
+superset of parent/child/pids.current.
+
+Example
+-------
+
+First, we mount the pids controller:
+# mkdir -p /sys/fs/cgroup/pids
+# mount -t cgroup -o pids none /sys/fs/cgroup/pids
+
+Then we create a hierarchy, set limits and attach processes to it:
+# mkdir -p /sys/fs/cgroup/pids/parent/child
+# echo 2 > /sys/fs/cgroup/pids/parent/pids.max
+# echo $$ > /sys/fs/cgroup/pids/parent/cgroup.procs
+# cat /sys/fs/cgroup/pids/parent/pids.current
+2
+#
+
+It should be noted that attempts to overcome the set limit (2 in this case) will
+fail:
+
+# cat /sys/fs/cgroup/pids/parent/pids.current
+2
+# ( /bin/echo "Here's some processes for you." | cat )
+sh: fork: Resource temporary unavailable
+#
+
+Even if we migrate to a child cgroup (which doesn't have a set limit), we will
+not be able to overcome the most stringent limit in the hierarchy (in this case,
+parent's):
+
+# echo $$ > /sys/fs/cgroup/pids/parent/child/cgroup.procs
+# cat /sys/fs/cgroup/pids/parent/pids.current
+2
+# cat /sys/fs/cgroup/pids/parent/child/pids.current
+2
+# cat /sys/fs/cgroup/pids/parent/child/pids.max
+max
+# ( /bin/echo "Here's some processes for you." | cat )
+sh: fork: Resource temporary unavailable
+#
+
+We can set a limit that is smaller than pids.current, which will stop any new
+processes from being forked at all (note that the shell itself counts towards
+pids.current):
+
+# echo 1 > /sys/fs/cgroup/pids/parent/pids.max
+# /bin/echo "We can't even spawn a single process now."
+sh: fork: Resource temporary unavailable
+# echo 0 > /sys/fs/cgroup/pids/parent/pids.max
+# /bin/echo "We can't even spawn a single process now."
+sh: fork: Resource temporary unavailable
+#
-- 
cgit v1.2.3


From 4fcb7dfd827d7d12ffc612fbc67769e002a482e4 Mon Sep 17 00:00:00 2001
From: Frank Li <Frank.Li@freescale.com>
Date: Wed, 27 May 2015 00:25:58 +0800
Subject: Document: dt: fsl: snvs: change support syscon

snvs actually is multi fucntion driver.
Change to use syscon to access register.
Change snvs parent interrupt to option because single function
may have seperated irq number.

Signed-off-by: Frank Li <Frank.Li@freescale.com>
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
---
 .../devicetree/bindings/crypto/fsl-sec4.txt        | 42 +++++++++++++++-------
 1 file changed, 29 insertions(+), 13 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/crypto/fsl-sec4.txt b/Documentation/devicetree/bindings/crypto/fsl-sec4.txt
index e4022776ac6e..b21e7c2eef21 100644
--- a/Documentation/devicetree/bindings/crypto/fsl-sec4.txt
+++ b/Documentation/devicetree/bindings/crypto/fsl-sec4.txt
@@ -288,12 +288,13 @@ Secure Non-Volatile Storage (SNVS) Node
     Node defines address range and the associated
     interrupt for the SNVS function.  This function
     monitors security state information & reports
-    security violations.
+    security violations. This also included rtc,
+    system power off and ON/OFF key.
 
   - compatible
       Usage: required
       Value type: <string>
-      Definition: Must include "fsl,sec-v4.0-mon".
+      Definition: Must include "fsl,sec-v4.0-mon" and "syscon".
 
   - reg
       Usage: required
@@ -324,7 +325,7 @@ Secure Non-Volatile Storage (SNVS) Node
            the child address, parent address, & length.
 
    - interrupts
-      Usage: required
+      Usage: optional
       Value type: <prop_encoded-array>
       Definition:  Specifies the interrupts generated by this
            device.  The value of the interrupts property
@@ -341,7 +342,7 @@ Secure Non-Volatile Storage (SNVS) Node
 
 EXAMPLE
 	sec_mon@314000 {
-		compatible = "fsl,sec-v4.0-mon";
+		compatible = "fsl,sec-v4.0-mon", "syscon";
 		reg = <0x314000 0x1000>;
 		ranges = <0 0x314000 0x1000>;
 		interrupt-parent = <&mpic>;
@@ -358,16 +359,31 @@ Secure Non-Volatile Storage (SNVS) Low Power (LP) RTC Node
       Value type: <string>
       Definition: Must include "fsl,sec-v4.0-mon-rtc-lp".
 
-  - reg
+  - interrupts
       Usage: required
-      Value type: <prop-encoded-array>
-      Definition: A standard property.  Specifies the physical
-          address and length of the SNVS LP configuration registers.
+      Value type: <prop_encoded-array>
+      Definition: Specifies the interrupts generated by this
+	   device.  The value of the interrupts property
+	   consists of one interrupt specifier. The format
+	   of the specifier is defined by the binding document
+	   describing the node's interrupt parent.
+
+ - regmap
+	Usage: required
+	Value type: <phandle>
+	Definition: this is phandle to the register map node.
+
+ - offset
+	Usage: option
+	value type: <u32>
+	Definition: LP register offset. default it is 0x34.
 
 EXAMPLE
-	sec_mon_rtc_lp@314000 {
+	sec_mon_rtc_lp@1 {
 		compatible = "fsl,sec-v4.0-mon-rtc-lp";
-		reg = <0x34 0x58>;
+		interrupts = <93 2>;
+		regmap = <&snvs>;
+		offset = <0x34>;
 	};
 
 =====================================================================
@@ -443,12 +459,12 @@ FULL EXAMPLE
 		compatible = "fsl,sec-v4.0-mon";
 		reg = <0x314000 0x1000>;
 		ranges = <0 0x314000 0x1000>;
-		interrupt-parent = <&mpic>;
-		interrupts = <93 2>;
 
 		sec_mon_rtc_lp@34 {
 			compatible = "fsl,sec-v4.0-mon-rtc-lp";
-			reg = <0x34 0x58>;
+			regmap = <&sec_mon>;
+			offset = <0x34>;
+			interrupts = <93 2>;
 		};
 	};
 
-- 
cgit v1.2.3


From cc28791d5ece61eaa78e7b9bbc3ac1d14f664683 Mon Sep 17 00:00:00 2001
From: Frank Li <Frank.Li@freescale.com>
Date: Wed, 27 May 2015 00:26:01 +0800
Subject: Document: devicetree: input: imx: i.mx snvs power device tree
 bindings

The snvs-pwrkey is designed to enable POWER key function which controlled
by SNVS ONOFF. the driver can report the status of POWER key and wakeup
system if pressed after system suspend.

Signed-off-by: Frank Li <Frank.Li@freescale.com>
Signed-off-by: Robin Gong <b38343@freescale.com>
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
---
 .../devicetree/bindings/crypto/fsl-sec4.txt        | 49 ++++++++++++++++++++++
 .../devicetree/bindings/input/snvs-pwrkey.txt      |  1 +
 2 files changed, 50 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/input/snvs-pwrkey.txt

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/crypto/fsl-sec4.txt b/Documentation/devicetree/bindings/crypto/fsl-sec4.txt
index b21e7c2eef21..71a39c5bd486 100644
--- a/Documentation/devicetree/bindings/crypto/fsl-sec4.txt
+++ b/Documentation/devicetree/bindings/crypto/fsl-sec4.txt
@@ -386,6 +386,47 @@ EXAMPLE
 		offset = <0x34>;
 	};
 
+=====================================================================
+System ON/OFF key driver
+
+  The snvs-pwrkey is designed to enable POWER key function which controlled
+  by SNVS ONOFF, the driver can report the status of POWER key and wakeup
+  system if pressed after system suspend.
+
+  - compatible:
+      Usage: required
+      Value type: <string>
+      Definition: Mush include "fsl,sec-v4.0-pwrkey".
+
+  - interrupts:
+      Usage: required
+      Value type: <prop_encoded-array>
+      Definition: The SNVS ON/OFF interrupt number to the CPU(s).
+
+  - linux,keycode:
+      Usage: option
+      Value type: <int>
+      Definition: Keycode to emit, KEY_POWER by default.
+
+  - wakeup:
+      Usage: option
+      Value type: <boo>
+      Definition: Button can wake-up the system.
+
+ - regmap:
+      Usage: required:
+      Value type: <phandle>
+      Definition: this is phandle to the register map node.
+
+EXAMPLE:
+	snvs-pwrkey@0x020cc000 {
+		compatible = "fsl,sec-v4.0-pwrkey";
+		regmap = <&snvs>;
+		interrupts = <0 4 0x4>
+	        linux,keycode = <116>; /* KEY_POWER */
+		wakeup;
+	};
+
 =====================================================================
 FULL EXAMPLE
 
@@ -466,6 +507,14 @@ FULL EXAMPLE
 			offset = <0x34>;
 			interrupts = <93 2>;
 		};
+
+		snvs-pwrkey@0x020cc000 {
+			compatible = "fsl,sec-v4.0-pwrkey";
+			regmap = <&sec_mon>;
+			interrupts = <0 4 0x4>;
+			linux,keycode = <116>; /* KEY_POWER */
+			wakeup;
+		};
 	};
 
 =====================================================================
diff --git a/Documentation/devicetree/bindings/input/snvs-pwrkey.txt b/Documentation/devicetree/bindings/input/snvs-pwrkey.txt
new file mode 100644
index 000000000000..70c14250323b
--- /dev/null
+++ b/Documentation/devicetree/bindings/input/snvs-pwrkey.txt
@@ -0,0 +1 @@
+See Documentation/devicetree/bindings/crypto/fsl-sec4.txt
-- 
cgit v1.2.3


From 21b05de4c8fac08fff08cf84ef1d4fe5786f9608 Mon Sep 17 00:00:00 2001
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Date: Thu, 14 May 2015 17:29:51 -0700
Subject: documentation: Bring rcutorture parameters up to date

This commit changes the documentation of the rcutorture parameters to
better match reality.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 Documentation/kernel-parameters.txt | 35 ++++++++++++++++++++++++-----------
 1 file changed, 24 insertions(+), 11 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 1d6f0459cd7b..01b5b68a237a 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -3135,22 +3135,35 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			in a given burst of a callback-flood test.
 
 	rcutorture.fqs_duration= [KNL]
-			Set duration of force_quiescent_state bursts.
+			Set duration of force_quiescent_state bursts
+			in microseconds.
 
 	rcutorture.fqs_holdoff= [KNL]
-			Set holdoff time within force_quiescent_state bursts.
+			Set holdoff time within force_quiescent_state bursts
+			in microseconds.
 
 	rcutorture.fqs_stutter= [KNL]
-			Set wait time between force_quiescent_state bursts.
+			Set wait time between force_quiescent_state bursts
+			in seconds.
+
+	rcutorture.gp_cond= [KNL]
+			Use conditional/asynchronous update-side
+			primitives, if available.
 
 	rcutorture.gp_exp= [KNL]
-			Use expedited update-side primitives.
+			Use expedited update-side primitives, if available.
 
 	rcutorture.gp_normal= [KNL]
-			Use normal (non-expedited) update-side primitives.
-			If both gp_exp and gp_normal are set, do both.
-			If neither gp_exp nor gp_normal are set, still
-			do both.
+			Use normal (non-expedited) asynchronous
+			update-side primitives, if available.
+
+	rcutorture.gp_sync= [KNL]
+			Use normal (non-expedited) synchronous
+			update-side primitives, if available.  If all
+			of rcutorture.gp_cond=, rcutorture.gp_exp=,
+			rcutorture.gp_normal=, and rcutorture.gp_sync=
+			are zero, rcutorture acts as if is interpreted
+			they are all non-zero.
 
 	rcutorture.n_barrier_cbs= [KNL]
 			Set callbacks/threads for rcu_barrier() testing.
@@ -3177,9 +3190,6 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			Set time (s) between CPU-hotplug operations, or
 			zero to disable CPU-hotplug testing.
 
-	rcutorture.torture_runnable= [BOOT]
-			Start rcutorture running at boot time.
-
 	rcutorture.shuffle_interval= [KNL]
 			Set task-shuffle interval (s).  Shuffling tasks
 			allows some CPUs to go into dyntick-idle mode
@@ -3220,6 +3230,9 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			Test RCU's dyntick-idle handling.  See also the
 			rcutorture.shuffle_interval parameter.
 
+	rcutorture.torture_runnable= [BOOT]
+			Start rcutorture running at boot time.
+
 	rcutorture.torture_type= [KNL]
 			Specify the RCU implementation to test.
 
-- 
cgit v1.2.3


From 297f739938908a4262603314576e32ee7375296c Mon Sep 17 00:00:00 2001
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Date: Thu, 14 May 2015 17:31:07 -0700
Subject: documentation: Fix spelling of "operators"

Reported-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 Documentation/RCU/rcu_dereference.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/RCU/rcu_dereference.txt b/Documentation/RCU/rcu_dereference.txt
index 1e6c0da994f5..c0bf2441a2ba 100644
--- a/Documentation/RCU/rcu_dereference.txt
+++ b/Documentation/RCU/rcu_dereference.txt
@@ -28,7 +28,7 @@ o	You must use one of the rcu_dereference() family of primitives
 o	Avoid cancellation when using the "+" and "-" infix arithmetic
 	operators.  For example, for a given variable "x", avoid
 	"(x-x)".  There are similar arithmetic pitfalls from other
-	arithmetic operatiors, such as "(x*0)", "(x/(x+1))" or "(x%1)".
+	arithmetic operators, such as "(x*0)", "(x/(x+1))" or "(x%1)".
 	The compiler is within its rights to substitute zero for all of
 	these expressions, so that subsequent accesses no longer depend
 	on the rcu_dereference(), again possibly resulting in bugs due
-- 
cgit v1.2.3


From 57aecae950c55ef50934640794160cd118e73256 Mon Sep 17 00:00:00 2001
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Date: Mon, 18 May 2015 18:27:42 -0700
Subject: documentation: Fix variable-name typo in memory-barriers.txt

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 Documentation/memory-barriers.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 13feb697271f..3d06f98b2ff2 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -746,7 +746,7 @@ You must also be careful not to rely too much on boolean short-circuit
 evaluation.  Consider this example:
 
 	q = READ_ONCE_CTRL(a);
-	if (a || 1 > 0)
+	if (q || 1 > 0)
 		ACCESS_ONCE(b) = 1;
 
 Because the first condition cannot fault and the second condition is
-- 
cgit v1.2.3


From 9af194cefc3c40e75a59df4cbb06e1c1064bee7f Mon Sep 17 00:00:00 2001
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Date: Thu, 18 Jun 2015 14:33:24 -0700
Subject: documentation: Replace ACCESS_ONCE() by READ_ONCE() and WRITE_ONCE()

Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 Documentation/memory-barriers.txt | 346 +++++++++++++++++++-------------------
 1 file changed, 177 insertions(+), 169 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 3d06f98b2ff2..470c07c868e4 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -194,22 +194,22 @@ There are some minimal guarantees that may be expected of a CPU:
  (*) On any given CPU, dependent memory accesses will be issued in order, with
      respect to itself.  This means that for:
 
-	ACCESS_ONCE(Q) = P; smp_read_barrier_depends(); D = ACCESS_ONCE(*Q);
+	WRITE_ONCE(Q, P); smp_read_barrier_depends(); D = READ_ONCE(*Q);
 
      the CPU will issue the following memory operations:
 
 	Q = LOAD P, D = LOAD *Q
 
      and always in that order.  On most systems, smp_read_barrier_depends()
-     does nothing, but it is required for DEC Alpha.  The ACCESS_ONCE()
-     is required to prevent compiler mischief.  Please note that you
-     should normally use something like rcu_dereference() instead of
-     open-coding smp_read_barrier_depends().
+     does nothing, but it is required for DEC Alpha.  The READ_ONCE()
+     and WRITE_ONCE() are required to prevent compiler mischief.  Please
+     note that you should normally use something like rcu_dereference()
+     instead of open-coding smp_read_barrier_depends().
 
  (*) Overlapping loads and stores within a particular CPU will appear to be
      ordered within that CPU.  This means that for:
 
-	a = ACCESS_ONCE(*X); ACCESS_ONCE(*X) = b;
+	a = READ_ONCE(*X); WRITE_ONCE(*X, b);
 
      the CPU will only issue the following sequence of memory operations:
 
@@ -217,7 +217,7 @@ There are some minimal guarantees that may be expected of a CPU:
 
      And for:
 
-	ACCESS_ONCE(*X) = c; d = ACCESS_ONCE(*X);
+	WRITE_ONCE(*X, c); d = READ_ONCE(*X);
 
      the CPU will only issue:
 
@@ -228,11 +228,11 @@ There are some minimal guarantees that may be expected of a CPU:
 
 And there are a number of things that _must_ or _must_not_ be assumed:
 
- (*) It _must_not_ be assumed that the compiler will do what you want with
-     memory references that are not protected by ACCESS_ONCE().  Without
-     ACCESS_ONCE(), the compiler is within its rights to do all sorts
-     of "creative" transformations, which are covered in the Compiler
-     Barrier section.
+ (*) It _must_not_ be assumed that the compiler will do what you want
+     with memory references that are not protected by READ_ONCE() and
+     WRITE_ONCE().  Without them, the compiler is within its rights to
+     do all sorts of "creative" transformations, which are covered in
+     the Compiler Barrier section.
 
  (*) It _must_not_ be assumed that independent loads and stores will be issued
      in the order given.  This means that for:
@@ -520,8 +520,8 @@ following sequence of events:
 	{ A == 1, B == 2, C = 3, P == &A, Q == &C }
 	B = 4;
 	<write barrier>
-	ACCESS_ONCE(P) = &B
-			      Q = ACCESS_ONCE(P);
+	WRITE_ONCE(P, &B)
+			      Q = READ_ONCE(P);
 			      D = *Q;
 
 There's a clear data dependency here, and it would seem that by the end of the
@@ -547,8 +547,8 @@ between the address load and the data load:
 	{ A == 1, B == 2, C = 3, P == &A, Q == &C }
 	B = 4;
 	<write barrier>
-	ACCESS_ONCE(P) = &B
-			      Q = ACCESS_ONCE(P);
+	WRITE_ONCE(P, &B);
+			      Q = READ_ONCE(P);
 			      <data dependency barrier>
 			      D = *Q;
 
@@ -574,8 +574,8 @@ access:
 	{ M[0] == 1, M[1] == 2, M[3] = 3, P == 0, Q == 3 }
 	M[1] = 4;
 	<write barrier>
-	ACCESS_ONCE(P) = 1
-			      Q = ACCESS_ONCE(P);
+	WRITE_ONCE(P, 1);
+			      Q = READ_ONCE(P);
 			      <data dependency barrier>
 			      D = M[Q];
 
@@ -596,10 +596,10 @@ A load-load control dependency requires a full read memory barrier, not
 simply a data dependency barrier to make it work correctly.  Consider the
 following bit of code:
 
-	q = ACCESS_ONCE(a);
+	q = READ_ONCE(a);
 	if (q) {
 		<data dependency barrier>  /* BUG: No data dependency!!! */
-		p = ACCESS_ONCE(b);
+		p = READ_ONCE(b);
 	}
 
 This will not have the desired effect because there is no actual data
@@ -608,10 +608,10 @@ by attempting to predict the outcome in advance, so that other CPUs see
 the load from b as having happened before the load from a.  In such a
 case what's actually required is:
 
-	q = ACCESS_ONCE(a);
+	q = READ_ONCE(a);
 	if (q) {
 		<read barrier>
-		p = ACCESS_ONCE(b);
+		p = READ_ONCE(b);
 	}
 
 However, stores are not speculated.  This means that ordering -is- provided
@@ -619,7 +619,7 @@ for load-store control dependencies, as in the following example:
 
 	q = READ_ONCE_CTRL(a);
 	if (q) {
-		ACCESS_ONCE(b) = p;
+		WRITE_ONCE(b, p);
 	}
 
 Control dependencies pair normally with other types of barriers.  That
@@ -647,11 +647,11 @@ branches of the "if" statement as follows:
 	q = READ_ONCE_CTRL(a);
 	if (q) {
 		barrier();
-		ACCESS_ONCE(b) = p;
+		WRITE_ONCE(b, p);
 		do_something();
 	} else {
 		barrier();
-		ACCESS_ONCE(b) = p;
+		WRITE_ONCE(b, p);
 		do_something_else();
 	}
 
@@ -660,12 +660,12 @@ optimization levels:
 
 	q = READ_ONCE_CTRL(a);
 	barrier();
-	ACCESS_ONCE(b) = p;  /* BUG: No ordering vs. load from a!!! */
+	WRITE_ONCE(b, p);  /* BUG: No ordering vs. load from a!!! */
 	if (q) {
-		/* ACCESS_ONCE(b) = p; -- moved up, BUG!!! */
+		/* WRITE_ONCE(b, p); -- moved up, BUG!!! */
 		do_something();
 	} else {
-		/* ACCESS_ONCE(b) = p; -- moved up, BUG!!! */
+		/* WRITE_ONCE(b, p); -- moved up, BUG!!! */
 		do_something_else();
 	}
 
@@ -676,7 +676,7 @@ assembly code even after all compiler optimizations have been applied.
 Therefore, if you need ordering in this example, you need explicit
 memory barriers, for example, smp_store_release():
 
-	q = ACCESS_ONCE(a);
+	q = READ_ONCE(a);
 	if (q) {
 		smp_store_release(&b, p);
 		do_something();
@@ -690,10 +690,10 @@ ordering is guaranteed only when the stores differ, for example:
 
 	q = READ_ONCE_CTRL(a);
 	if (q) {
-		ACCESS_ONCE(b) = p;
+		WRITE_ONCE(b, p);
 		do_something();
 	} else {
-		ACCESS_ONCE(b) = r;
+		WRITE_ONCE(b, r);
 		do_something_else();
 	}
 
@@ -706,10 +706,10 @@ the needed conditional.  For example:
 
 	q = READ_ONCE_CTRL(a);
 	if (q % MAX) {
-		ACCESS_ONCE(b) = p;
+		WRITE_ONCE(b, p);
 		do_something();
 	} else {
-		ACCESS_ONCE(b) = r;
+		WRITE_ONCE(b, r);
 		do_something_else();
 	}
 
@@ -718,7 +718,7 @@ equal to zero, in which case the compiler is within its rights to
 transform the above code into the following:
 
 	q = READ_ONCE_CTRL(a);
-	ACCESS_ONCE(b) = p;
+	WRITE_ONCE(b, p);
 	do_something_else();
 
 Given this transformation, the CPU is not required to respect the ordering
@@ -731,10 +731,10 @@ one, perhaps as follows:
 	q = READ_ONCE_CTRL(a);
 	BUILD_BUG_ON(MAX <= 1); /* Order load from a with store to b. */
 	if (q % MAX) {
-		ACCESS_ONCE(b) = p;
+		WRITE_ONCE(b, p);
 		do_something();
 	} else {
-		ACCESS_ONCE(b) = r;
+		WRITE_ONCE(b, r);
 		do_something_else();
 	}
 
@@ -747,17 +747,17 @@ evaluation.  Consider this example:
 
 	q = READ_ONCE_CTRL(a);
 	if (q || 1 > 0)
-		ACCESS_ONCE(b) = 1;
+		WRITE_ONCE(b, 1);
 
 Because the first condition cannot fault and the second condition is
 always true, the compiler can transform this example as following,
 defeating control dependency:
 
 	q = READ_ONCE_CTRL(a);
-	ACCESS_ONCE(b) = 1;
+	WRITE_ONCE(b, 1);
 
 This example underscores the need to ensure that the compiler cannot
-out-guess your code.  More generally, although ACCESS_ONCE() does force
+out-guess your code.  More generally, although READ_ONCE() does force
 the compiler to actually emit code for a given load, it does not force
 the compiler to use the results.
 
@@ -769,7 +769,7 @@ x and y both being zero:
 	=======================   =======================
 	r1 = READ_ONCE_CTRL(x);   r2 = READ_ONCE_CTRL(y);
 	if (r1 > 0)               if (r2 > 0)
-	  ACCESS_ONCE(y) = 1;       ACCESS_ONCE(x) = 1;
+	  WRITE_ONCE(y, 1);         WRITE_ONCE(x, 1);
 
 	assert(!(r1 == 1 && r2 == 1));
 
@@ -779,7 +779,7 @@ then adding the following CPU would guarantee a related assertion:
 
 	CPU 2
 	=====================
-	ACCESS_ONCE(x) = 2;
+	WRITE_ONCE(x, 2);
 
 	assert(!(r1 == 2 && r2 == 1 && x == 2)); /* FAILS!!! */
 
@@ -798,8 +798,7 @@ In summary:
 
   (*) Control dependencies must be headed by READ_ONCE_CTRL().
       Or, as a much less preferable alternative, interpose
-      be headed by READ_ONCE() or an ACCESS_ONCE() read and must
-      have smp_read_barrier_depends() between this read and the
+      smp_read_barrier_depends() between a READ_ONCE() and the
       control-dependent write.
 
   (*) Control dependencies can order prior loads against later stores.
@@ -815,15 +814,16 @@ In summary:
 
   (*) Control dependencies require at least one run-time conditional
       between the prior load and the subsequent store, and this
-      conditional must involve the prior load.  If the compiler
-      is able to optimize the conditional away, it will have also
-      optimized away the ordering.  Careful use of ACCESS_ONCE() can
-      help to preserve the needed conditional.
+      conditional must involve the prior load.  If the compiler is able
+      to optimize the conditional away, it will have also optimized
+      away the ordering.  Careful use of READ_ONCE_CTRL() READ_ONCE(),
+      and WRITE_ONCE() can help to preserve the needed conditional.
 
   (*) Control dependencies require that the compiler avoid reordering the
-      dependency into nonexistence.  Careful use of ACCESS_ONCE() or
-      barrier() can help to preserve your control dependency.  Please
-      see the Compiler Barrier section for more information.
+      dependency into nonexistence.  Careful use of READ_ONCE_CTRL()
+      or smp_read_barrier_depends() can help to preserve your control
+      dependency.  Please see the Compiler Barrier section for more
+      information.
 
   (*) Control dependencies pair normally with other types of barriers.
 
@@ -848,11 +848,11 @@ barrier, an acquire barrier, a release barrier, or a general barrier:
 
 	CPU 1		      CPU 2
 	===============	      ===============
-	ACCESS_ONCE(a) = 1;
+	WRITE_ONCE(a, 1);
 	<write barrier>
-	ACCESS_ONCE(b) = 2;   x = ACCESS_ONCE(b);
+	WRITE_ONCE(b, 2);     x = READ_ONCE(b);
 			      <read barrier>
-			      y = ACCESS_ONCE(a);
+			      y = READ_ONCE(a);
 
 Or:
 
@@ -860,7 +860,7 @@ Or:
 	===============	      ===============================
 	a = 1;
 	<write barrier>
-	ACCESS_ONCE(b) = &a;  x = ACCESS_ONCE(b);
+	WRITE_ONCE(b, &a);    x = READ_ONCE(b);
 			      <data dependency barrier>
 			      y = *x;
 
@@ -868,11 +868,11 @@ Or even:
 
 	CPU 1		      CPU 2
 	===============	      ===============================
-	r1 = ACCESS_ONCE(y);
+	r1 = READ_ONCE(y);
 	<general barrier>
-	ACCESS_ONCE(y) = 1;   if (r2 = ACCESS_ONCE(x)) {
+	WRITE_ONCE(y, 1);     if (r2 = READ_ONCE(x)) {
 			         <implicit control dependency>
-			         ACCESS_ONCE(y) = 1;
+			         WRITE_ONCE(y, 1);
 			      }
 
 	assert(r1 == 0 || r2 == 0);
@@ -886,11 +886,11 @@ versa:
 
 	CPU 1                               CPU 2
 	===================                 ===================
-	ACCESS_ONCE(a) = 1;  }----   --->{  v = ACCESS_ONCE(c);
-	ACCESS_ONCE(b) = 2;  }    \ /    {  w = ACCESS_ONCE(d);
+	WRITE_ONCE(a, 1);    }----   --->{  v = READ_ONCE(c);
+	WRITE_ONCE(b, 2);    }    \ /    {  w = READ_ONCE(d);
 	<write barrier>            \        <read barrier>
-	ACCESS_ONCE(c) = 3;  }    / \    {  x = ACCESS_ONCE(a);
-	ACCESS_ONCE(d) = 4;  }----   --->{  y = ACCESS_ONCE(b);
+	WRITE_ONCE(c, 3);    }    / \    {  x = READ_ONCE(a);
+	WRITE_ONCE(d, 4);    }----   --->{  y = READ_ONCE(b);
 
 
 EXAMPLES OF MEMORY BARRIER SEQUENCES
@@ -1340,10 +1340,10 @@ compiler from moving the memory accesses either side of it to the other side:
 
 	barrier();
 
-This is a general barrier -- there are no read-read or write-write variants
-of barrier().  However, ACCESS_ONCE() can be thought of as a weak form
-for barrier() that affects only the specific accesses flagged by the
-ACCESS_ONCE().
+This is a general barrier -- there are no read-read or write-write
+variants of barrier().  However, READ_ONCE() and WRITE_ONCE() can be
+thought of as weak forms of barrier() that affect only the specific
+accesses flagged by the READ_ONCE() or WRITE_ONCE().
 
 The barrier() function has the following effects:
 
@@ -1355,9 +1355,10 @@ The barrier() function has the following effects:
  (*) Within a loop, forces the compiler to load the variables used
      in that loop's conditional on each pass through that loop.
 
-The ACCESS_ONCE() function can prevent any number of optimizations that,
-while perfectly safe in single-threaded code, can be fatal in concurrent
-code.  Here are some examples of these sorts of optimizations:
+The READ_ONCE() and WRITE_ONCE() functions can prevent any number of
+optimizations that, while perfectly safe in single-threaded code, can
+be fatal in concurrent code.  Here are some examples of these sorts
+of optimizations:
 
  (*) The compiler is within its rights to reorder loads and stores
      to the same variable, and in some cases, the CPU is within its
@@ -1370,11 +1371,11 @@ code.  Here are some examples of these sorts of optimizations:
      Might result in an older value of x stored in a[1] than in a[0].
      Prevent both the compiler and the CPU from doing this as follows:
 
-	a[0] = ACCESS_ONCE(x);
-	a[1] = ACCESS_ONCE(x);
+	a[0] = READ_ONCE(x);
+	a[1] = READ_ONCE(x);
 
-     In short, ACCESS_ONCE() provides cache coherence for accesses from
-     multiple CPUs to a single variable.
+     In short, READ_ONCE() and WRITE_ONCE() provide cache coherence for
+     accesses from multiple CPUs to a single variable.
 
  (*) The compiler is within its rights to merge successive loads from
      the same variable.  Such merging can cause the compiler to "optimize"
@@ -1391,9 +1392,9 @@ code.  Here are some examples of these sorts of optimizations:
 		for (;;)
 			do_something_with(tmp);
 
-     Use ACCESS_ONCE() to prevent the compiler from doing this to you:
+     Use READ_ONCE() to prevent the compiler from doing this to you:
 
-	while (tmp = ACCESS_ONCE(a))
+	while (tmp = READ_ONCE(a))
 		do_something_with(tmp);
 
  (*) The compiler is within its rights to reload a variable, for example,
@@ -1415,9 +1416,9 @@ code.  Here are some examples of these sorts of optimizations:
      a was modified by some other CPU between the "while" statement and
      the call to do_something_with().
 
-     Again, use ACCESS_ONCE() to prevent the compiler from doing this:
+     Again, use READ_ONCE() to prevent the compiler from doing this:
 
-	while (tmp = ACCESS_ONCE(a))
+	while (tmp = READ_ONCE(a))
 		do_something_with(tmp);
 
      Note that if the compiler runs short of registers, it might save
@@ -1437,21 +1438,21 @@ code.  Here are some examples of these sorts of optimizations:
 
 	do { } while (0);
 
-     This transformation is a win for single-threaded code because it gets
-     rid of a load and a branch.  The problem is that the compiler will
-     carry out its proof assuming that the current CPU is the only one
-     updating variable 'a'.  If variable 'a' is shared, then the compiler's
-     proof will be erroneous.  Use ACCESS_ONCE() to tell the compiler
-     that it doesn't know as much as it thinks it does:
+     This transformation is a win for single-threaded code because it
+     gets rid of a load and a branch.  The problem is that the compiler
+     will carry out its proof assuming that the current CPU is the only
+     one updating variable 'a'.  If variable 'a' is shared, then the
+     compiler's proof will be erroneous.  Use READ_ONCE() to tell the
+     compiler that it doesn't know as much as it thinks it does:
 
-	while (tmp = ACCESS_ONCE(a))
+	while (tmp = READ_ONCE(a))
 		do_something_with(tmp);
 
      But please note that the compiler is also closely watching what you
-     do with the value after the ACCESS_ONCE().  For example, suppose you
+     do with the value after the READ_ONCE().  For example, suppose you
      do the following and MAX is a preprocessor macro with the value 1:
 
-	while ((tmp = ACCESS_ONCE(a)) % MAX)
+	while ((tmp = READ_ONCE(a)) % MAX)
 		do_something_with(tmp);
 
      Then the compiler knows that the result of the "%" operator applied
@@ -1475,12 +1476,12 @@ code.  Here are some examples of these sorts of optimizations:
      surprise if some other CPU might have stored to variable 'a' in the
      meantime.
 
-     Use ACCESS_ONCE() to prevent the compiler from making this sort of
+     Use WRITE_ONCE() to prevent the compiler from making this sort of
      wrong guess:
 
-	ACCESS_ONCE(a) = 0;
+	WRITE_ONCE(a, 0);
 	/* Code that does not store to variable a. */
-	ACCESS_ONCE(a) = 0;
+	WRITE_ONCE(a, 0);
 
  (*) The compiler is within its rights to reorder memory accesses unless
      you tell it not to.  For example, consider the following interaction
@@ -1509,40 +1510,43 @@ code.  Here are some examples of these sorts of optimizations:
 	}
 
      If the interrupt occurs between these two statement, then
-     interrupt_handler() might be passed a garbled msg.  Use ACCESS_ONCE()
+     interrupt_handler() might be passed a garbled msg.  Use WRITE_ONCE()
      to prevent this as follows:
 
 	void process_level(void)
 	{
-		ACCESS_ONCE(msg) = get_message();
-		ACCESS_ONCE(flag) = true;
+		WRITE_ONCE(msg, get_message());
+		WRITE_ONCE(flag, true);
 	}
 
 	void interrupt_handler(void)
 	{
-		if (ACCESS_ONCE(flag))
-			process_message(ACCESS_ONCE(msg));
+		if (READ_ONCE(flag))
+			process_message(READ_ONCE(msg));
 	}
 
-     Note that the ACCESS_ONCE() wrappers in interrupt_handler()
-     are needed if this interrupt handler can itself be interrupted
-     by something that also accesses 'flag' and 'msg', for example,
-     a nested interrupt or an NMI.  Otherwise, ACCESS_ONCE() is not
-     needed in interrupt_handler() other than for documentation purposes.
-     (Note also that nested interrupts do not typically occur in modern
-     Linux kernels, in fact, if an interrupt handler returns with
-     interrupts enabled, you will get a WARN_ONCE() splat.)
-
-     You should assume that the compiler can move ACCESS_ONCE() past
-     code not containing ACCESS_ONCE(), barrier(), or similar primitives.
-
-     This effect could also be achieved using barrier(), but ACCESS_ONCE()
-     is more selective:  With ACCESS_ONCE(), the compiler need only forget
-     the contents of the indicated memory locations, while with barrier()
-     the compiler must discard the value of all memory locations that
-     it has currented cached in any machine registers.  Of course,
-     the compiler must also respect the order in which the ACCESS_ONCE()s
-     occur, though the CPU of course need not do so.
+     Note that the READ_ONCE() and WRITE_ONCE() wrappers in
+     interrupt_handler() are needed if this interrupt handler can itself
+     be interrupted by something that also accesses 'flag' and 'msg',
+     for example, a nested interrupt or an NMI.  Otherwise, READ_ONCE()
+     and WRITE_ONCE() are not needed in interrupt_handler() other than
+     for documentation purposes.  (Note also that nested interrupts
+     do not typically occur in modern Linux kernels, in fact, if an
+     interrupt handler returns with interrupts enabled, you will get a
+     WARN_ONCE() splat.)
+
+     You should assume that the compiler can move READ_ONCE() and
+     WRITE_ONCE() past code not containing READ_ONCE(), WRITE_ONCE(),
+     barrier(), or similar primitives.
+
+     This effect could also be achieved using barrier(), but READ_ONCE()
+     and WRITE_ONCE() are more selective:  With READ_ONCE() and
+     WRITE_ONCE(), the compiler need only forget the contents of the
+     indicated memory locations, while with barrier() the compiler must
+     discard the value of all memory locations that it has currented
+     cached in any machine registers.  Of course, the compiler must also
+     respect the order in which the READ_ONCE()s and WRITE_ONCE()s occur,
+     though the CPU of course need not do so.
 
  (*) The compiler is within its rights to invent stores to a variable,
      as in the following example:
@@ -1562,16 +1566,16 @@ code.  Here are some examples of these sorts of optimizations:
      a branch.  Unfortunately, in concurrent code, this optimization
      could cause some other CPU to see a spurious value of 42 -- even
      if variable 'a' was never zero -- when loading variable 'b'.
-     Use ACCESS_ONCE() to prevent this as follows:
+     Use WRITE_ONCE() to prevent this as follows:
 
 	if (a)
-		ACCESS_ONCE(b) = a;
+		WRITE_ONCE(b, a);
 	else
-		ACCESS_ONCE(b) = 42;
+		WRITE_ONCE(b, 42);
 
      The compiler can also invent loads.  These are usually less
      damaging, but they can result in cache-line bouncing and thus in
-     poor performance and scalability.  Use ACCESS_ONCE() to prevent
+     poor performance and scalability.  Use READ_ONCE() to prevent
      invented loads.
 
  (*) For aligned memory locations whose size allows them to be accessed
@@ -1590,9 +1594,9 @@ code.  Here are some examples of these sorts of optimizations:
      This optimization can therefore be a win in single-threaded code.
      In fact, a recent bug (since fixed) caused GCC to incorrectly use
      this optimization in a volatile store.  In the absence of such bugs,
-     use of ACCESS_ONCE() prevents store tearing in the following example:
+     use of WRITE_ONCE() prevents store tearing in the following example:
 
-	ACCESS_ONCE(p) = 0x00010002;
+	WRITE_ONCE(p, 0x00010002);
 
      Use of packed structures can also result in load and store tearing,
      as in this example:
@@ -1609,22 +1613,23 @@ code.  Here are some examples of these sorts of optimizations:
 	foo2.b = foo1.b;
 	foo2.c = foo1.c;
 
-     Because there are no ACCESS_ONCE() wrappers and no volatile markings,
-     the compiler would be well within its rights to implement these three
-     assignment statements as a pair of 32-bit loads followed by a pair
-     of 32-bit stores.  This would result in load tearing on 'foo1.b'
-     and store tearing on 'foo2.b'.  ACCESS_ONCE() again prevents tearing
-     in this example:
+     Because there are no READ_ONCE() or WRITE_ONCE() wrappers and no
+     volatile markings, the compiler would be well within its rights to
+     implement these three assignment statements as a pair of 32-bit
+     loads followed by a pair of 32-bit stores.  This would result in
+     load tearing on 'foo1.b' and store tearing on 'foo2.b'.  READ_ONCE()
+     and WRITE_ONCE() again prevent tearing in this example:
 
 	foo2.a = foo1.a;
-	ACCESS_ONCE(foo2.b) = ACCESS_ONCE(foo1.b);
+	WRITE_ONCE(foo2.b, READ_ONCE(foo1.b));
 	foo2.c = foo1.c;
 
-All that aside, it is never necessary to use ACCESS_ONCE() on a variable
-that has been marked volatile.  For example, because 'jiffies' is marked
-volatile, it is never necessary to say ACCESS_ONCE(jiffies).  The reason
-for this is that ACCESS_ONCE() is implemented as a volatile cast, which
-has no effect when its argument is already marked volatile.
+All that aside, it is never necessary to use READ_ONCE() and
+WRITE_ONCE() on a variable that has been marked volatile.  For example,
+because 'jiffies' is marked volatile, it is never necessary to
+say READ_ONCE(jiffies).  The reason for this is that READ_ONCE() and
+WRITE_ONCE() are implemented as volatile casts, which has no effect when
+its argument is already marked volatile.
 
 Please note that these compiler barriers have no direct effect on the CPU,
 which may then reorder things however it wishes.
@@ -1646,14 +1651,15 @@ The Linux kernel has eight basic CPU memory barriers:
 All memory barriers except the data dependency barriers imply a compiler
 barrier. Data dependencies do not impose any additional compiler ordering.
 
-Aside: In the case of data dependencies, the compiler would be expected to
-issue the loads in the correct order (eg. `a[b]` would have to load the value
-of b before loading a[b]), however there is no guarantee in the C specification
-that the compiler may not speculate the value of b (eg. is equal to 1) and load
-a before b (eg. tmp = a[1]; if (b != 1) tmp = a[b]; ). There is also the
-problem of a compiler reloading b after having loaded a[b], thus having a newer
-copy of b than a[b]. A consensus has not yet been reached about these problems,
-however the ACCESS_ONCE macro is a good place to start looking.
+Aside: In the case of data dependencies, the compiler would be expected
+to issue the loads in the correct order (eg. `a[b]` would have to load
+the value of b before loading a[b]), however there is no guarantee in
+the C specification that the compiler may not speculate the value of b
+(eg. is equal to 1) and load a before b (eg. tmp = a[1]; if (b != 1)
+tmp = a[b]; ). There is also the problem of a compiler reloading b after
+having loaded a[b], thus having a newer copy of b than a[b]. A consensus
+has not yet been reached about these problems, however the READ_ONCE()
+macro is a good place to start looking.
 
 SMP memory barriers are reduced to compiler barriers on uniprocessor compiled
 systems because it is assumed that a CPU will appear to be self-consistent,
@@ -2126,12 +2132,12 @@ three CPUs; then should the following sequence of events occur:
 
 	CPU 1				CPU 2
 	===============================	===============================
-	ACCESS_ONCE(*A) = a;		ACCESS_ONCE(*E) = e;
+	WRITE_ONCE(*A, a);		WRITE_ONCE(*E, e);
 	ACQUIRE M			ACQUIRE Q
-	ACCESS_ONCE(*B) = b;		ACCESS_ONCE(*F) = f;
-	ACCESS_ONCE(*C) = c;		ACCESS_ONCE(*G) = g;
+	WRITE_ONCE(*B, b);		WRITE_ONCE(*F, f);
+	WRITE_ONCE(*C, c);		WRITE_ONCE(*G, g);
 	RELEASE M			RELEASE Q
-	ACCESS_ONCE(*D) = d;		ACCESS_ONCE(*H) = h;
+	WRITE_ONCE(*D, d);		WRITE_ONCE(*H, h);
 
 Then there is no guarantee as to what order CPU 3 will see the accesses to *A
 through *H occur in, other than the constraints imposed by the separate locks
@@ -2151,18 +2157,18 @@ However, if the following occurs:
 
 	CPU 1				CPU 2
 	===============================	===============================
-	ACCESS_ONCE(*A) = a;
+	WRITE_ONCE(*A, a);
 	ACQUIRE M		     [1]
-	ACCESS_ONCE(*B) = b;
-	ACCESS_ONCE(*C) = c;
+	WRITE_ONCE(*B, b);
+	WRITE_ONCE(*C, c);
 	RELEASE M	     [1]
-	ACCESS_ONCE(*D) = d;		ACCESS_ONCE(*E) = e;
+	WRITE_ONCE(*D, d);		WRITE_ONCE(*E, e);
 					ACQUIRE M		     [2]
 					smp_mb__after_unlock_lock();
-					ACCESS_ONCE(*F) = f;
-					ACCESS_ONCE(*G) = g;
+					WRITE_ONCE(*F, f);
+					WRITE_ONCE(*G, g);
 					RELEASE M	     [2]
-					ACCESS_ONCE(*H) = h;
+					WRITE_ONCE(*H, h);
 
 CPU 3 might see:
 
@@ -2881,11 +2887,11 @@ A programmer might take it for granted that the CPU will perform memory
 operations in exactly the order specified, so that if the CPU is, for example,
 given the following piece of code to execute:
 
-	a = ACCESS_ONCE(*A);
-	ACCESS_ONCE(*B) = b;
-	c = ACCESS_ONCE(*C);
-	d = ACCESS_ONCE(*D);
-	ACCESS_ONCE(*E) = e;
+	a = READ_ONCE(*A);
+	WRITE_ONCE(*B, b);
+	c = READ_ONCE(*C);
+	d = READ_ONCE(*D);
+	WRITE_ONCE(*E, e);
 
 they would then expect that the CPU will complete the memory operation for each
 instruction before moving on to the next one, leading to a definite sequence of
@@ -2932,12 +2938,12 @@ However, it is guaranteed that a CPU will be self-consistent: it will see its
 _own_ accesses appear to be correctly ordered, without the need for a memory
 barrier.  For instance with the following code:
 
-	U = ACCESS_ONCE(*A);
-	ACCESS_ONCE(*A) = V;
-	ACCESS_ONCE(*A) = W;
-	X = ACCESS_ONCE(*A);
-	ACCESS_ONCE(*A) = Y;
-	Z = ACCESS_ONCE(*A);
+	U = READ_ONCE(*A);
+	WRITE_ONCE(*A, V);
+	WRITE_ONCE(*A, W);
+	X = READ_ONCE(*A);
+	WRITE_ONCE(*A, Y);
+	Z = READ_ONCE(*A);
 
 and assuming no intervention by an external influence, it can be assumed that
 the final result will appear to be:
@@ -2953,13 +2959,14 @@ accesses:
 	U=LOAD *A, STORE *A=V, STORE *A=W, X=LOAD *A, STORE *A=Y, Z=LOAD *A
 
 in that order, but, without intervention, the sequence may have almost any
-combination of elements combined or discarded, provided the program's view of
-the world remains consistent.  Note that ACCESS_ONCE() is -not- optional
-in the above example, as there are architectures where a given CPU might
-reorder successive loads to the same location.  On such architectures,
-ACCESS_ONCE() does whatever is necessary to prevent this, for example, on
-Itanium the volatile casts used by ACCESS_ONCE() cause GCC to emit the
-special ld.acq and st.rel instructions that prevent such reordering.
+combination of elements combined or discarded, provided the program's view
+of the world remains consistent.  Note that READ_ONCE() and WRITE_ONCE()
+are -not- optional in the above example, as there are architectures
+where a given CPU might reorder successive loads to the same location.
+On such architectures, READ_ONCE() and WRITE_ONCE() do whatever is
+necessary to prevent this, for example, on Itanium the volatile casts
+used by READ_ONCE() and WRITE_ONCE() cause GCC to emit the special ld.acq
+and st.rel instructions (respectively) that prevent such reordering.
 
 The compiler may also combine, discard or defer elements of the sequence before
 the CPU even sees them.
@@ -2973,13 +2980,14 @@ may be reduced to:
 
 	*A = W;
 
-since, without either a write barrier or an ACCESS_ONCE(), it can be
+since, without either a write barrier or an WRITE_ONCE(), it can be
 assumed that the effect of the storage of V to *A is lost.  Similarly:
 
 	*A = Y;
 	Z = *A;
 
-may, without a memory barrier or an ACCESS_ONCE(), be reduced to:
+may, without a memory barrier or an READ_ONCE() and WRITE_ONCE(), be
+reduced to:
 
 	*A = Y;
 	Z = Y;
-- 
cgit v1.2.3


From 96d7744e0a5631a1b5fef2a97658150b165f02b6 Mon Sep 17 00:00:00 2001
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Date: Mon, 13 Jul 2015 15:55:52 -0700
Subject: doc: Call out smp_mb__after_unlock_lock() transitivity

Although "full barrier" should be interpreted as providing transitivity,
it is worth eliminating any possible confusion.  This commit therefore
adds "(including transitivity)" to eliminate any possible confusion.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 Documentation/memory-barriers.txt | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 470c07c868e4..318523872db5 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -1858,11 +1858,12 @@ Similarly, the reverse case of a RELEASE followed by an ACQUIRE does not
 imply a full memory barrier.  If it is necessary for a RELEASE-ACQUIRE
 pair to produce a full barrier, the ACQUIRE can be followed by an
 smp_mb__after_unlock_lock() invocation.  This will produce a full barrier
-if either (a) the RELEASE and the ACQUIRE are executed by the same
-CPU or task, or (b) the RELEASE and ACQUIRE act on the same variable.
-The smp_mb__after_unlock_lock() primitive is free on many architectures.
-Without smp_mb__after_unlock_lock(), the CPU's execution of the critical
-sections corresponding to the RELEASE and the ACQUIRE can cross, so that:
+(including transitivity) if either (a) the RELEASE and the ACQUIRE are
+executed by the same CPU or task, or (b) the RELEASE and ACQUIRE act on
+the same variable.  The smp_mb__after_unlock_lock() primitive is free
+on many architectures.  Without smp_mb__after_unlock_lock(), the CPU's
+execution of the critical sections corresponding to the RELEASE and the
+ACQUIRE can cross, so that:
 
 	*A = a;
 	RELEASE M
-- 
cgit v1.2.3


From 7c50181b69ce65b0a7db6936aca26f86b70e4436 Mon Sep 17 00:00:00 2001
From: Gregory Fong <gregory.0xf0@gmail.com>
Date: Wed, 17 Jun 2015 18:00:41 -0700
Subject: dt-bindings: brcmstb-gpio: document properties for wakeup

Some brcmstb GPIO controllers can be used to wake from suspend, so use
the de facto standard property 'wakeup-source' to mark the nodes of
controllers with that capability.

Also document interrupts-extended, which will be used for wakeup
handling because the interrupt parent for the wake IRQ is different
from the regular IRQ.

While we're at it, a few more fixes: We don't actually use the
"interrupt-names" property, so remove it from the listed optional
properties and from the examples.  And since we're modifying the
examples, also follow Brian's suggestions to:
- change #gpio-cells, #interrupt-cells, and brcm,gpio-bank-widths from
  hex to dec
- use phandles

Reviewed-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Gregory Fong <gregory.0xf0@gmail.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
---
 .../devicetree/bindings/gpio/brcm,brcmstb-gpio.txt | 35 +++++++++++++++++-----
 1 file changed, 28 insertions(+), 7 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/gpio/brcm,brcmstb-gpio.txt b/Documentation/devicetree/bindings/gpio/brcm,brcmstb-gpio.txt
index 435f1bcca341..b405b4410bfb 100644
--- a/Documentation/devicetree/bindings/gpio/brcm,brcmstb-gpio.txt
+++ b/Documentation/devicetree/bindings/gpio/brcm,brcmstb-gpio.txt
@@ -33,6 +33,13 @@ Optional properties:
 - interrupt-parent:
     phandle of the parent interrupt controller
 
+- interrupts-extended:
+    Alternate form of specifying interrupts and parents that allows for
+    multiple parents.  This takes precedence over 'interrupts' and
+    'interrupt-parent'.  Wakeup-capable GPIO controllers often route their
+    wakeup interrupt lines through a different interrupt controller than the
+    primary interrupt line, making this property necessary.
+
 - #interrupt-cells:
     Should be <2>.  The first cell is the GPIO number, the second should specify
     flags.  The following subset of flags is supported:
@@ -47,19 +54,33 @@ Optional properties:
 - interrupt-controller:
     Marks the device node as an interrupt controller
 
-- interrupt-names:
-    The name of the IRQ resource used by this controller
+- wakeup-source:
+    GPIOs for this controller can be used as a wakeup source
 
 Example:
 	upg_gio: gpio@f040a700 {
-		#gpio-cells = <0x2>;
-		#interrupt-cells = <0x2>;
+		#gpio-cells = <2>;
+		#interrupt-cells = <2>;
 		compatible = "brcm,bcm7445-gpio", "brcm,brcmstb-gpio";
 		gpio-controller;
 		interrupt-controller;
 		reg = <0xf040a700 0x80>;
-		interrupt-parent = <0xf>;
+		interrupt-parent = <&irq0_intc>;
+		interrupts = <0x6>;
+		brcm,gpio-bank-widths = <32 32 32 24>;
+	};
+
+	upg_gio_aon: gpio@f04172c0 {
+		#gpio-cells = <2>;
+		#interrupt-cells = <2>;
+		compatible = "brcm,bcm7445-gpio", "brcm,brcmstb-gpio";
+		gpio-controller;
+		interrupt-controller;
+		reg = <0xf04172c0 0x40>;
+		interrupt-parent = <&irq0_aon_intc>;
 		interrupts = <0x6>;
-		interrupt-names = "upg_gio";
-		brcm,gpio-bank-widths = <0x20 0x20 0x20 0x18>;
+		interrupts-extended = <&irq0_aon_intc 0x6>,
+			<&aon_pm_l2_intc 0x5>;
+		wakeup-source;
+		brcm,gpio-bank-widths = <18 4>;
 	};
-- 
cgit v1.2.3


From 0c59d26770333cf605d9119a78dd6c1ebebc6a61 Mon Sep 17 00:00:00 2001
From: Tuomas Tynkkynen <ttynkkynen@nvidia.com>
Date: Wed, 13 May 2015 17:58:35 +0300
Subject: clk: tegra: Add binding for the Tegra124 DFLL clocksource

The DFLL is the main clocksource for the fast CPU cluster on Tegra124
and also provides automatic CPU rail voltage scaling as well. The DFLL
is a separate IP block from the usual Tegra124 clock-and-reset
controller, so it gets its own node in the device tree.

Signed-off-by: Tuomas Tynkkynen <ttynkkynen@nvidia.com>
Signed-off-by: Mikko Perttunen <mikko.perttunen@kapsi.fi>
Acked-by: Michael Turquette <mturquette@linaro.org>
Signed-off-by: Thierry Reding <treding@nvidia.com>
---
 .../bindings/clock/nvidia,tegra124-dfll.txt        | 79 ++++++++++++++++++++++
 1 file changed, 79 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/clock/nvidia,tegra124-dfll.txt

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/clock/nvidia,tegra124-dfll.txt b/Documentation/devicetree/bindings/clock/nvidia,tegra124-dfll.txt
new file mode 100644
index 000000000000..ee7e5fd4a50b
--- /dev/null
+++ b/Documentation/devicetree/bindings/clock/nvidia,tegra124-dfll.txt
@@ -0,0 +1,79 @@
+NVIDIA Tegra124 DFLL FCPU clocksource
+
+This binding uses the common clock binding:
+Documentation/devicetree/bindings/clock/clock-bindings.txt
+
+The DFLL IP block on Tegra is a root clocksource designed for clocking
+the fast CPU cluster. It consists of a free-running voltage controlled
+oscillator connected to the CPU voltage rail (VDD_CPU), and a closed loop
+control module that will automatically adjust the VDD_CPU voltage by
+communicating with an off-chip PMIC either via an I2C bus or via PWM signals.
+Currently only the I2C mode is supported by these bindings.
+
+Required properties:
+- compatible : should be "nvidia,tegra124-dfll"
+- reg : Defines the following set of registers, in the order listed:
+        - registers for the DFLL control logic.
+        - registers for the I2C output logic.
+        - registers for the integrated I2C master controller.
+        - look-up table RAM for voltage register values.
+- interrupts: Should contain the DFLL block interrupt.
+- clocks: Must contain an entry for each entry in clock-names.
+  See clock-bindings.txt for details.
+- clock-names: Must include the following entries:
+  - soc: Clock source for the DFLL control logic.
+  - ref: The closed loop reference clock
+  - i2c: Clock source for the integrated I2C master.
+- resets: Must contain an entry for each entry in reset-names.
+  See ../reset/reset.txt for details.
+- reset-names: Must include the following entries:
+  - dvco: Reset control for the DFLL DVCO.
+- #clock-cells: Must be 0.
+- clock-output-names: Name of the clock output.
+- vdd-cpu-supply: Regulator for the CPU voltage rail that the DFLL
+  hardware will start controlling. The regulator will be queried for
+  the I2C register, control values and supported voltages.
+
+Required properties for the control loop parameters:
+- nvidia,sample-rate: Sample rate of the DFLL control loop.
+- nvidia,droop-ctrl: See the register CL_DVFS_DROOP_CTRL in the TRM.
+- nvidia,force-mode: See the field DFLL_PARAMS_FORCE_MODE in the TRM.
+- nvidia,cf: Numeric value, see the field DFLL_PARAMS_CF_PARAM in the TRM.
+- nvidia,ci: Numeric value, see the field DFLL_PARAMS_CI_PARAM in the TRM.
+- nvidia,cg: Numeric value, see the field DFLL_PARAMS_CG_PARAM in the TRM.
+
+Optional properties for the control loop parameters:
+- nvidia,cg-scale: Boolean value, see the field DFLL_PARAMS_CG_SCALE in the TRM.
+
+Required properties for I2C mode:
+- nvidia,i2c-fs-rate: I2C transfer rate, if using full speed mode.
+
+Example:
+
+clock@0,70110000 {
+        compatible = "nvidia,tegra124-dfll";
+        reg = <0 0x70110000 0 0x100>, /* DFLL control */
+              <0 0x70110000 0 0x100>, /* I2C output control */
+              <0 0x70110100 0 0x100>, /* Integrated I2C controller */
+              <0 0x70110200 0 0x100>; /* Look-up table RAM */
+        interrupts = <GIC_SPI 62 IRQ_TYPE_LEVEL_HIGH>;
+        clocks = <&tegra_car TEGRA124_CLK_DFLL_SOC>,
+                 <&tegra_car TEGRA124_CLK_DFLL_REF>,
+                 <&tegra_car TEGRA124_CLK_I2C5>;
+        clock-names = "soc", "ref", "i2c";
+        resets = <&tegra_car TEGRA124_RST_DFLL_DVCO>;
+        reset-names = "dvco";
+        #clock-cells = <0>;
+        clock-output-names = "dfllCPU_out";
+        vdd-cpu-supply = <&vdd_cpu>;
+        status = "okay";
+
+        nvidia,sample-rate = <12500>;
+        nvidia,droop-ctrl = <0x00000f00>;
+        nvidia,force-mode = <1>;
+        nvidia,cf = <10>;
+        nvidia,ci = <0>;
+        nvidia,cg = <2>;
+
+        nvidia,i2c-fs-rate = <400000>;
+};
-- 
cgit v1.2.3


From 135796c9ccd3d6c53fbc3f6c4d73d8ba6820b739 Mon Sep 17 00:00:00 2001
From: Tuomas Tynkkynen <ttynkkynen@nvidia.com>
Date: Wed, 13 May 2015 17:58:46 +0300
Subject: cpufreq: tegra124: Add device tree bindings

The cpufreq driver for Tegra124 will be a different one than the old
Tegra20 cpufreq driver (tegra-cpufreq), which does not use the device
tree.

Signed-off-by: Tuomas Tynkkynen <ttynkkynen@nvidia.com>
Signed-off-by: Mikko Perttunen <mikko.perttunen@kapsi.fi>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Thierry Reding <treding@nvidia.com>
---
 .../bindings/cpufreq/tegra124-cpufreq.txt          | 44 ++++++++++++++++++++++
 1 file changed, 44 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/cpufreq/tegra124-cpufreq.txt

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/cpufreq/tegra124-cpufreq.txt b/Documentation/devicetree/bindings/cpufreq/tegra124-cpufreq.txt
new file mode 100644
index 000000000000..b1669fbfb740
--- /dev/null
+++ b/Documentation/devicetree/bindings/cpufreq/tegra124-cpufreq.txt
@@ -0,0 +1,44 @@
+Tegra124 CPU frequency scaling driver bindings
+----------------------------------------------
+
+Both required and optional properties listed below must be defined
+under node /cpus/cpu@0.
+
+Required properties:
+- clocks: Must contain an entry for each entry in clock-names.
+  See ../clocks/clock-bindings.txt for details.
+- clock-names: Must include the following entries:
+  - cpu_g: Clock mux for the fast CPU cluster.
+  - cpu_lp: Clock mux for the low-power CPU cluster.
+  - pll_x: Fast PLL clocksource.
+  - pll_p: Auxiliary PLL used during fast PLL rate changes.
+  - dfll: Fast DFLL clocksource that also automatically scales CPU voltage.
+- vdd-cpu-supply: Regulator for CPU voltage
+
+Optional properties:
+- clock-latency: Specify the possible maximum transition latency for clock,
+  in unit of nanoseconds.
+
+Example:
+--------
+cpus {
+	#address-cells = <1>;
+	#size-cells = <0>;
+
+	cpu@0 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a15";
+		reg = <0>;
+
+		clocks = <&tegra_car TEGRA124_CLK_CCLK_G>,
+			 <&tegra_car TEGRA124_CLK_CCLK_LP>,
+			 <&tegra_car TEGRA124_CLK_PLL_X>,
+			 <&tegra_car TEGRA124_CLK_PLL_P>,
+			 <&dfll>;
+		clock-names = "cpu_g", "cpu_lp", "pll_x", "pll_p", "dfll";
+		clock-latency = <300000>;
+		vdd-cpu-supply: <&vdd_cpu>;
+	};
+
+	<...>
+};
-- 
cgit v1.2.3


From 0e948042c4203b97e44370993ef042c945308282 Mon Sep 17 00:00:00 2001
From: Bjorn Andersson <bjorn.andersson@sonymobile.com>
Date: Wed, 17 Jun 2015 23:47:28 -0700
Subject: pinctrl: qcom: spmi-mpp: Implement support for sink mode

The MPP supports three modes; digital, analog and sink mode. This patch
implements support for the latter.

Signed-off-by: Bjorn Andersson <bjorn.andersson@sonymobile.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
---
 .../devicetree/bindings/pinctrl/qcom,pmic-mpp.txt  |   5 +
 drivers/pinctrl/qcom/pinctrl-spmi-mpp.c            | 115 +++++++++++++--------
 2 files changed, 76 insertions(+), 44 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/pinctrl/qcom,pmic-mpp.txt b/Documentation/devicetree/bindings/pinctrl/qcom,pmic-mpp.txt
index ed19991aad35..d29fb96a57d3 100644
--- a/Documentation/devicetree/bindings/pinctrl/qcom,pmic-mpp.txt
+++ b/Documentation/devicetree/bindings/pinctrl/qcom,pmic-mpp.txt
@@ -134,6 +134,11 @@ to specify in a pin configuration subnode:
 		    and/or output-high, output-low MPP could operate as
 		    Bidirectional Logic, Analog Input, Analog Output.
 
+- qcom,sink-mode:
+	Usage: optional
+	Value type: <u32> or <none>
+	Definition: Selects sink mode of operation
+
 - qcom,amux-route:
 	Usage: optional
 	Value type: <u32>
diff --git a/drivers/pinctrl/qcom/pinctrl-spmi-mpp.c b/drivers/pinctrl/qcom/pinctrl-spmi-mpp.c
index 745c37dea7d0..9dde023640ba 100644
--- a/drivers/pinctrl/qcom/pinctrl-spmi-mpp.c
+++ b/drivers/pinctrl/qcom/pinctrl-spmi-mpp.c
@@ -62,6 +62,7 @@
 #define PMIC_MPP_REG_DIG_IN_CTL			0x43
 #define PMIC_MPP_REG_EN_CTL			0x46
 #define PMIC_MPP_REG_AIN_CTL			0x4a
+#define PMIC_MPP_REG_SINK_CTL			0x4c
 
 /* PMIC_MPP_REG_MODE_CTL */
 #define PMIC_MPP_REG_MODE_VALUE_MASK		0x1
@@ -98,6 +99,7 @@
 /* Qualcomm specific pin configurations */
 #define PMIC_MPP_CONF_AMUX_ROUTE		(PIN_CONFIG_END + 1)
 #define PMIC_MPP_CONF_ANALOG_MODE		(PIN_CONFIG_END + 2)
+#define PMIC_MPP_CONF_SINK_MODE			(PIN_CONFIG_END + 3)
 
 /**
  * struct pmic_mpp_pad - keep current MPP settings
@@ -109,11 +111,13 @@
  * @input_enabled: Set to true if MPP input buffer logic is enabled.
  * @analog_mode: Set to true when MPP should operate in Analog Input, Analog
  *	Output or Bidirectional Analog mode.
+ * @sink_mode: Boolean indicating if ink mode is slected
  * @num_sources: Number of power-sources supported by this MPP.
  * @power_source: Current power-source used.
  * @amux_input: Set the source for analog input.
  * @pullup: Pullup resistor value. Valid in Bidirectional mode only.
  * @function: See pmic_mpp_functions[].
+ * @drive_strength: Amount of current in sink mode
  */
 struct pmic_mpp_pad {
 	u16		base;
@@ -123,11 +127,13 @@ struct pmic_mpp_pad {
 	bool		output_enabled;
 	bool		input_enabled;
 	bool		analog_mode;
+	bool		sink_mode;
 	unsigned int	num_sources;
 	unsigned int	power_source;
 	unsigned int	amux_input;
 	unsigned int	pullup;
 	unsigned int	function;
+	unsigned int	drive_strength;
 };
 
 struct pmic_mpp_state {
@@ -140,12 +146,14 @@ struct pmic_mpp_state {
 static const struct pinconf_generic_params pmic_mpp_bindings[] = {
 	{"qcom,amux-route",	PMIC_MPP_CONF_AMUX_ROUTE,	0},
 	{"qcom,analog-mode",	PMIC_MPP_CONF_ANALOG_MODE,	0},
+	{"qcom,sink-mode",	PMIC_MPP_CONF_SINK_MODE,	0},
 };
 
 #ifdef CONFIG_DEBUG_FS
 static const struct pin_config_item pmic_conf_items[] = {
 	PCONFDUMP(PMIC_MPP_CONF_AMUX_ROUTE, "analog mux", NULL, true),
 	PCONFDUMP(PMIC_MPP_CONF_ANALOG_MODE, "analog output", NULL, false),
+	PCONFDUMP(PMIC_MPP_CONF_SINK_MODE, "sink mode", NULL, false),
 };
 #endif
 
@@ -243,33 +251,28 @@ static int pmic_mpp_get_function_groups(struct pinctrl_dev *pctldev,
 	return 0;
 }
 
-static int pmic_mpp_set_mux(struct pinctrl_dev *pctldev, unsigned function,
-				unsigned pin)
+static int pmic_mpp_write_mode_ctl(struct pmic_mpp_state *state,
+				   struct pmic_mpp_pad *pad)
 {
-	struct pmic_mpp_state *state = pinctrl_dev_get_drvdata(pctldev);
-	struct pmic_mpp_pad *pad;
 	unsigned int val;
-	int ret;
-
-	pad = pctldev->desc->pins[pin].drv_data;
-
-	pad->function = function;
 
-	if (!pad->analog_mode) {
-		val = PMIC_MPP_MODE_DIGITAL_INPUT;
+	if (pad->analog_mode) {
+		val = PMIC_MPP_MODE_ANALOG_INPUT;
 		if (pad->output_enabled) {
 			if (pad->input_enabled)
-				val = PMIC_MPP_MODE_DIGITAL_BIDIR;
+				val = PMIC_MPP_MODE_ANALOG_BIDIR;
 			else
-				val = PMIC_MPP_MODE_DIGITAL_OUTPUT;
+				val = PMIC_MPP_MODE_ANALOG_OUTPUT;
 		}
+	} else if (pad->sink_mode) {
+		val = PMIC_MPP_MODE_CURRENT_SINK;
 	} else {
-		val = PMIC_MPP_MODE_ANALOG_INPUT;
+		val = PMIC_MPP_MODE_DIGITAL_INPUT;
 		if (pad->output_enabled) {
 			if (pad->input_enabled)
-				val = PMIC_MPP_MODE_ANALOG_BIDIR;
+				val = PMIC_MPP_MODE_DIGITAL_BIDIR;
 			else
-				val = PMIC_MPP_MODE_ANALOG_OUTPUT;
+				val = PMIC_MPP_MODE_DIGITAL_OUTPUT;
 		}
 	}
 
@@ -277,9 +280,22 @@ static int pmic_mpp_set_mux(struct pinctrl_dev *pctldev, unsigned function,
 	val |= pad->function << PMIC_MPP_REG_MODE_FUNCTION_SHIFT;
 	val |= pad->out_value & PMIC_MPP_REG_MODE_VALUE_MASK;
 
-	ret = pmic_mpp_write(state, pad, PMIC_MPP_REG_MODE_CTL, val);
-	if (ret < 0)
-		return ret;
+	return pmic_mpp_write(state, pad, PMIC_MPP_REG_MODE_CTL, val);
+}
+
+static int pmic_mpp_set_mux(struct pinctrl_dev *pctldev, unsigned function,
+				unsigned pin)
+{
+	struct pmic_mpp_state *state = pinctrl_dev_get_drvdata(pctldev);
+	struct pmic_mpp_pad *pad;
+	unsigned int val;
+	int ret;
+
+	pad = pctldev->desc->pins[pin].drv_data;
+
+	pad->function = function;
+
+	ret = pmic_mpp_write_mode_ctl(state, pad);
 
 	val = pad->is_enabled << PMIC_MPP_REG_MASTER_EN_SHIFT;
 
@@ -339,9 +355,15 @@ static int pmic_mpp_config_get(struct pinctrl_dev *pctldev,
 	case PMIC_MPP_CONF_AMUX_ROUTE:
 		arg = pad->amux_input;
 		break;
+	case PIN_CONFIG_DRIVE_STRENGTH:
+		arg = pad->drive_strength;
+		break;
 	case PMIC_MPP_CONF_ANALOG_MODE:
 		arg = pad->analog_mode;
 		break;
+	case PMIC_MPP_CONF_SINK_MODE:
+		arg = pad->sink_mode;
+		break;
 	default:
 		return -EINVAL;
 	}
@@ -403,13 +425,19 @@ static int pmic_mpp_config_set(struct pinctrl_dev *pctldev, unsigned int pin,
 			pad->output_enabled = true;
 			pad->out_value = arg;
 			break;
+		case PIN_CONFIG_DRIVE_STRENGTH:
+			arg = pad->drive_strength;
+			break;
 		case PMIC_MPP_CONF_AMUX_ROUTE:
 			if (arg >= PMIC_MPP_AMUX_ROUTE_ABUS4)
 				return -EINVAL;
 			pad->amux_input = arg;
 			break;
 		case PMIC_MPP_CONF_ANALOG_MODE:
-			pad->analog_mode = true;
+			pad->analog_mode = !!arg;
+			break;
+		case PMIC_MPP_CONF_SINK_MODE:
+			pad->sink_mode = !!arg;
 			break;
 		default:
 			return -EINVAL;
@@ -434,29 +462,7 @@ static int pmic_mpp_config_set(struct pinctrl_dev *pctldev, unsigned int pin,
 	if (ret < 0)
 		return ret;
 
-	if (!pad->analog_mode) {
-		val = 0;	/* just digital input */
-		if (pad->output_enabled) {
-			if (pad->input_enabled)
-				val = 2; /* digital input and output */
-			else
-				val = 1; /* just digital output */
-		}
-	} else {
-		val = 4;	/* just analog input */
-		if (pad->output_enabled) {
-			if (pad->input_enabled)
-				val = 3; /* analog input and output */
-			else
-				val = 5; /* just analog output */
-		}
-	}
-
-	val = val << PMIC_MPP_REG_MODE_DIR_SHIFT;
-	val |= pad->function << PMIC_MPP_REG_MODE_FUNCTION_SHIFT;
-	val |= pad->out_value & PMIC_MPP_REG_MODE_VALUE_MASK;
-
-	ret = pmic_mpp_write(state, pad, PMIC_MPP_REG_MODE_CTL, val);
+	ret = pmic_mpp_write_mode_ctl(state, pad);
 	if (ret < 0)
 		return ret;
 
@@ -476,6 +482,9 @@ static void pmic_mpp_config_dbg_show(struct pinctrl_dev *pctldev,
 		"0.6kOhm", "10kOhm", "30kOhm", "Disabled"
 	};
 
+	static const char *const modes[] = {
+		"digital", "analog", "sink"
+	};
 
 	pad = pctldev->desc->pins[pin].drv_data;
 
@@ -495,7 +504,7 @@ static void pmic_mpp_config_dbg_show(struct pinctrl_dev *pctldev,
 		}
 
 		seq_printf(s, " %-4s", pad->output_enabled ? "out" : "in");
-		seq_printf(s, " %-4s", pad->analog_mode ? "ana" : "dig");
+		seq_printf(s, " %-7s", modes[pad->analog_mode ? 1 : (pad->sink_mode ? 2 : 0)]);
 		seq_printf(s, " %-7s", pmic_mpp_functions[pad->function]);
 		seq_printf(s, " vin-%d", pad->power_source);
 		seq_printf(s, " %-8s", biases[pad->pullup]);
@@ -666,31 +675,43 @@ static int pmic_mpp_populate(struct pmic_mpp_state *state,
 		pad->input_enabled = true;
 		pad->output_enabled = false;
 		pad->analog_mode = false;
+		pad->sink_mode = false;
 		break;
 	case PMIC_MPP_MODE_DIGITAL_OUTPUT:
 		pad->input_enabled = false;
 		pad->output_enabled = true;
 		pad->analog_mode = false;
+		pad->sink_mode = false;
 		break;
 	case PMIC_MPP_MODE_DIGITAL_BIDIR:
 		pad->input_enabled = true;
 		pad->output_enabled = true;
 		pad->analog_mode = false;
+		pad->sink_mode = false;
 		break;
 	case PMIC_MPP_MODE_ANALOG_BIDIR:
 		pad->input_enabled = true;
 		pad->output_enabled = true;
 		pad->analog_mode = true;
+		pad->sink_mode = false;
 		break;
 	case PMIC_MPP_MODE_ANALOG_INPUT:
 		pad->input_enabled = true;
 		pad->output_enabled = false;
 		pad->analog_mode = true;
+		pad->sink_mode = false;
 		break;
 	case PMIC_MPP_MODE_ANALOG_OUTPUT:
 		pad->input_enabled = false;
 		pad->output_enabled = true;
 		pad->analog_mode = true;
+		pad->sink_mode = false;
+		break;
+	case PMIC_MPP_MODE_CURRENT_SINK:
+		pad->input_enabled = false;
+		pad->output_enabled = true;
+		pad->analog_mode = false;
+		pad->sink_mode = true;
 		break;
 	default:
 		dev_err(state->dev, "unknown MPP direction\n");
@@ -721,6 +742,12 @@ static int pmic_mpp_populate(struct pmic_mpp_state *state,
 	pad->amux_input = val >> PMIC_MPP_REG_AIN_ROUTE_SHIFT;
 	pad->amux_input &= PMIC_MPP_REG_AIN_ROUTE_MASK;
 
+	val = pmic_mpp_read(state, pad, PMIC_MPP_REG_SINK_CTL);
+	if (val < 0)
+		return val;
+
+	pad->drive_strength = val;
+
 	val = pmic_mpp_read(state, pad, PMIC_MPP_REG_EN_CTL);
 	if (val < 0)
 		return val;
-- 
cgit v1.2.3


From 42a8679cf27439c5c9f1f9aa85854f0dec74687c Mon Sep 17 00:00:00 2001
From: Marcel Ziswiler <marcel.ziswiler@toradex.com>
Date: Fri, 10 Apr 2015 23:35:58 +0200
Subject: ALSA: hda/tegra - Fix hda2codec_2x clock and reset names

Fix hda2codec_2x clock and reset names in Tegra30 HDA controller device
tree node documentation. While at it also fix coma vs. semicolon issue.

Signed-off-by: Marcel Ziswiler <marcel.ziswiler@toradex.com>
Reviewed-by: Paul Walmsley <paul@pwsan.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
---
 Documentation/devicetree/bindings/sound/nvidia,tegra30-hda.txt | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/sound/nvidia,tegra30-hda.txt b/Documentation/devicetree/bindings/sound/nvidia,tegra30-hda.txt
index 13e2ef496724..5032efa9ab64 100644
--- a/Documentation/devicetree/bindings/sound/nvidia,tegra30-hda.txt
+++ b/Documentation/devicetree/bindings/sound/nvidia,tegra30-hda.txt
@@ -8,10 +8,10 @@ Required properties:
 - interrupts : The interrupt from the HDA controller.
 - clocks : Must contain an entry for each required entry in clock-names.
   See ../clocks/clock-bindings.txt for details.
-- clock-names : Must include the following entries: hda, hdacodec_2x, hda2hdmi
+- clock-names : Must include the following entries: hda, hda2codec_2x, hda2hdmi
 - resets : Must contain an entry for each entry in reset-names.
   See ../reset/reset.txt for details.
-- reset-names : Must include the following entries: hda, hdacodec_2x, hda2hdmi
+- reset-names : Must include the following entries: hda, hda2codec_2x, hda2hdmi
 
 Example:
 
@@ -24,7 +24,7 @@ hda@0,70030000 {
 		 <&tegra_car TEGRA124_CLK_HDA2CODEC_2X>;
 	clock-names = "hda", "hda2hdmi", "hda2codec_2x";
 	resets = <&tegra_car 125>, /* hda */
-		 <&tegra_car 128>; /* hda2hdmi */
-		 <&tegra_car 111>, /* hda2codec_2x */
+		 <&tegra_car 128>, /* hda2hdmi */
+		 <&tegra_car 111>; /* hda2codec_2x */
 	reset-names = "hda", "hda2hdmi", "hda2codec_2x";
 };
-- 
cgit v1.2.3


From 5cf4af3bebb718b72a23b7e80fddfe8b4e8b0620 Mon Sep 17 00:00:00 2001
From: Thierry Reding <treding@nvidia.com>
Date: Fri, 23 Jan 2015 09:34:44 +0100
Subject: ALSA: hda/tegra: Order clock and reset names consistently

The example in the device tree binding documentation and the driver code
list hda2codec_2x as the last entry in the clock-names and reset-names
properties. Update the description of the properties to use the same
order.

Signed-off-by: Thierry Reding <treding@nvidia.com>
---
 Documentation/devicetree/bindings/sound/nvidia,tegra30-hda.txt | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/sound/nvidia,tegra30-hda.txt b/Documentation/devicetree/bindings/sound/nvidia,tegra30-hda.txt
index 5032efa9ab64..275c6ea356f6 100644
--- a/Documentation/devicetree/bindings/sound/nvidia,tegra30-hda.txt
+++ b/Documentation/devicetree/bindings/sound/nvidia,tegra30-hda.txt
@@ -8,10 +8,10 @@ Required properties:
 - interrupts : The interrupt from the HDA controller.
 - clocks : Must contain an entry for each required entry in clock-names.
   See ../clocks/clock-bindings.txt for details.
-- clock-names : Must include the following entries: hda, hda2codec_2x, hda2hdmi
+- clock-names : Must include the following entries: hda, hda2hdmi, hda2codec_2x
 - resets : Must contain an entry for each entry in reset-names.
   See ../reset/reset.txt for details.
-- reset-names : Must include the following entries: hda, hda2codec_2x, hda2hdmi
+- reset-names : Must include the following entries: hda, hda2hdmi, hda2codec_2x
 
 Example:
 
-- 
cgit v1.2.3


From 24f743a0f06675da4e7c6a07b88e90d425edd30a Mon Sep 17 00:00:00 2001
From: Jun Nie <jun.nie@linaro.org>
Date: Mon, 29 Jun 2015 10:35:56 +0800
Subject: gpio: Document ZTE zx296702 GPIO DT binding

Add document of ZTE zx296702 GPIO binding

Signed-off-by: Jun Nie <jun.nie@linaro.org>
Acked-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
---
 .../devicetree/bindings/gpio/zx296702-gpio.txt     | 24 ++++++++++++++++++++++
 1 file changed, 24 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/gpio/zx296702-gpio.txt

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/gpio/zx296702-gpio.txt b/Documentation/devicetree/bindings/gpio/zx296702-gpio.txt
new file mode 100644
index 000000000000..0dab156fcf41
--- /dev/null
+++ b/Documentation/devicetree/bindings/gpio/zx296702-gpio.txt
@@ -0,0 +1,24 @@
+ZTE ZX296702 GPIO controller
+
+Required properties:
+- compatible : "zte,zx296702-gpio"
+- #gpio-cells : Should be two. The first cell is the pin number and the
+  second cell is used to specify optional parameters:
+  - bit 0 specifies polarity (0 for normal, 1 for inverted)
+- gpio-controller : Marks the device node as a GPIO controller.
+- interrupts : Interrupt mapping for GPIO IRQ.
+- gpio-ranges : Interaction with the PINCTRL subsystem.
+
+gpio1: gpio@b008040 {
+	compatible = "zte,zx296702-gpio";
+	reg = <0xb008040 0x40>;
+	gpio-controller;
+	#gpio-cells = <2>;
+	gpio-ranges = < &pmx0 0 54 2 &pmx0 2 59 14>;
+	interrupts = <GIC_SPI 26 IRQ_TYPE_LEVEL_HIGH>;
+	interrupt-parent = <&intc>;
+	interrupt-controller;
+	#interrupt-cells = <2>;
+	clock-names = "gpio_pclk";
+	clocks = <&lsp0clk ZX296702_GPIO_CLK>;
+};
-- 
cgit v1.2.3


From 16ccaf5bb5a52372bfebd3dfbb79dd810ad49c09 Mon Sep 17 00:00:00 2001
From: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Date: Tue, 30 Jun 2015 11:29:57 +0300
Subject: pinctrl: sh-pfc: Accept standard function, pins and groups properties

The "function", "pins" and "groups" pinmux and pinctrl properties have
been standardized. Support them in addition to the custom "renesas,*"
properties. New-style and old-style properties can't be mixed in DT.

Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
---
 .../bindings/pinctrl/renesas,pfc-pinctrl.txt       | 20 +++++------
 drivers/pinctrl/sh-pfc/pinctrl.c                   | 42 +++++++++++++++++-----
 2 files changed, 44 insertions(+), 18 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/pinctrl/renesas,pfc-pinctrl.txt b/Documentation/devicetree/bindings/pinctrl/renesas,pfc-pinctrl.txt
index 51cee44fc140..e089142cfb14 100644
--- a/Documentation/devicetree/bindings/pinctrl/renesas,pfc-pinctrl.txt
+++ b/Documentation/devicetree/bindings/pinctrl/renesas,pfc-pinctrl.txt
@@ -58,12 +58,12 @@ are parsed through phandles and processed purely based on their content.
 
 Pin Configuration Node Properties:
 
-- renesas,pins : An array of strings, each string containing the name of a pin.
-- renesas,groups : An array of strings, each string containing the name of a pin
+- pins : An array of strings, each string containing the name of a pin.
+- groups : An array of strings, each string containing the name of a pin
   group.
 
-- renesas,function: A string containing the name of the function to mux to the
-  pin group(s) specified by the renesas,groups property
+- function: A string containing the name of the function to mux to the pin
+  group(s) specified by the groups property.
 
   Valid values for pin, group and function names can be found in the group and
   function arrays of the PFC data file corresponding to the SoC
@@ -141,19 +141,19 @@ Example 3: KZM-A9-GT (SH-Mobile AG5) default pin state hog and pin control maps
 
 		mmcif_pins: mmcif {
 			mux {
-				renesas,groups = "mmc0_data8_0", "mmc0_ctrl_0";
-				renesas,function = "mmc0";
+				groups = "mmc0_data8_0", "mmc0_ctrl_0";
+				function = "mmc0";
 			};
 			cfg {
-				renesas,groups = "mmc0_data8_0";
-				renesas,pins = "PORT279";
+				groups = "mmc0_data8_0";
+				pins = "PORT279";
 				bias-pull-up;
 			};
 		};
 
 		scifa4_pins: scifa4 {
-			renesas,groups = "scifa4_data", "scifa4_ctrl";
-			renesas,function = "scifa4";
+			groups = "scifa4_data", "scifa4_ctrl";
+			function = "scifa4";
 		};
 	};
 
diff --git a/drivers/pinctrl/sh-pfc/pinctrl.c b/drivers/pinctrl/sh-pfc/pinctrl.c
index ff678966008b..6fe7459f0ccb 100644
--- a/drivers/pinctrl/sh-pfc/pinctrl.c
+++ b/drivers/pinctrl/sh-pfc/pinctrl.c
@@ -40,6 +40,10 @@ struct sh_pfc_pinctrl {
 
 	struct pinctrl_pin_desc *pins;
 	struct sh_pfc_pin_config *configs;
+
+	const char *func_prop_name;
+	const char *groups_prop_name;
+	const char *pins_prop_name;
 };
 
 static int sh_pfc_get_groups_count(struct pinctrl_dev *pctldev)
@@ -96,10 +100,13 @@ static int sh_pfc_map_add_config(struct pinctrl_map *map,
 	return 0;
 }
 
-static int sh_pfc_dt_subnode_to_map(struct device *dev, struct device_node *np,
+static int sh_pfc_dt_subnode_to_map(struct pinctrl_dev *pctldev,
+				    struct device_node *np,
 				    struct pinctrl_map **map,
 				    unsigned int *num_maps, unsigned int *index)
 {
+	struct sh_pfc_pinctrl *pmx = pinctrl_dev_get_drvdata(pctldev);
+	struct device *dev = pmx->pfc->dev;
 	struct pinctrl_map *maps = *map;
 	unsigned int nmaps = *num_maps;
 	unsigned int idx = *index;
@@ -113,10 +120,27 @@ static int sh_pfc_dt_subnode_to_map(struct device *dev, struct device_node *np,
 	const char *pin;
 	int ret;
 
+	/* Support both the old Renesas-specific properties and the new standard
+	 * properties. Mixing old and new properties isn't allowed, neither
+	 * inside a subnode nor across subnodes.
+	 */
+	if (!pmx->func_prop_name) {
+		if (of_find_property(np, "groups", NULL) ||
+		    of_find_property(np, "pins", NULL)) {
+			pmx->func_prop_name = "function";
+			pmx->groups_prop_name = "groups";
+			pmx->pins_prop_name = "pins";
+		} else {
+			pmx->func_prop_name = "renesas,function";
+			pmx->groups_prop_name = "renesas,groups";
+			pmx->pins_prop_name = "renesas,pins";
+		}
+	}
+
 	/* Parse the function and configuration properties. At least a function
 	 * or one configuration must be specified.
 	 */
-	ret = of_property_read_string(np, "renesas,function", &function);
+	ret = of_property_read_string(np, pmx->func_prop_name, &function);
 	if (ret < 0 && ret != -EINVAL) {
 		dev_err(dev, "Invalid function in DT\n");
 		return ret;
@@ -129,11 +153,12 @@ static int sh_pfc_dt_subnode_to_map(struct device *dev, struct device_node *np,
 	if (!function && num_configs == 0) {
 		dev_err(dev,
 			"DT node must contain at least a function or config\n");
+		ret = -ENODEV;
 		goto done;
 	}
 
 	/* Count the number of pins and groups and reallocate mappings. */
-	ret = of_property_count_strings(np, "renesas,pins");
+	ret = of_property_count_strings(np, pmx->pins_prop_name);
 	if (ret == -EINVAL) {
 		num_pins = 0;
 	} else if (ret < 0) {
@@ -143,7 +168,7 @@ static int sh_pfc_dt_subnode_to_map(struct device *dev, struct device_node *np,
 		num_pins = ret;
 	}
 
-	ret = of_property_count_strings(np, "renesas,groups");
+	ret = of_property_count_strings(np, pmx->groups_prop_name);
 	if (ret == -EINVAL) {
 		num_groups = 0;
 	} else if (ret < 0) {
@@ -174,7 +199,7 @@ static int sh_pfc_dt_subnode_to_map(struct device *dev, struct device_node *np,
 	*num_maps = nmaps;
 
 	/* Iterate over pins and groups and create the mappings. */
-	of_property_for_each_string(np, "renesas,groups", prop, group) {
+	of_property_for_each_string(np, pmx->groups_prop_name, prop, group) {
 		if (function) {
 			maps[idx].type = PIN_MAP_TYPE_MUX_GROUP;
 			maps[idx].data.mux.group = group;
@@ -198,7 +223,7 @@ static int sh_pfc_dt_subnode_to_map(struct device *dev, struct device_node *np,
 		goto done;
 	}
 
-	of_property_for_each_string(np, "renesas,pins", prop, pin) {
+	of_property_for_each_string(np, pmx->pins_prop_name, prop, pin) {
 		ret = sh_pfc_map_add_config(&maps[idx], pin,
 					    PIN_MAP_TYPE_CONFIGS_PIN,
 					    configs, num_configs);
@@ -246,7 +271,7 @@ static int sh_pfc_dt_node_to_map(struct pinctrl_dev *pctldev,
 	index = 0;
 
 	for_each_child_of_node(np, child) {
-		ret = sh_pfc_dt_subnode_to_map(dev, child, map, num_maps,
+		ret = sh_pfc_dt_subnode_to_map(pctldev, child, map, num_maps,
 					       &index);
 		if (ret < 0)
 			goto done;
@@ -254,7 +279,8 @@ static int sh_pfc_dt_node_to_map(struct pinctrl_dev *pctldev,
 
 	/* If no mapping has been found in child nodes try the config node. */
 	if (*num_maps == 0) {
-		ret = sh_pfc_dt_subnode_to_map(dev, np, map, num_maps, &index);
+		ret = sh_pfc_dt_subnode_to_map(pctldev, np, map, num_maps,
+					       &index);
 		if (ret < 0)
 			goto done;
 	}
-- 
cgit v1.2.3


From f3ee390e4ef23a557b3519f6fb314a9e80d337c1 Mon Sep 17 00:00:00 2001
From: Alexandru M Stan <amstan@chromium.org>
Date: Tue, 7 Jul 2015 20:04:32 +0200
Subject: ARM: dts: rockchip: add veyron-jerry board

The Hisense Chromebook C11, also named jerry.

Signed-off-by: Alexandru M Stan <amstan@chromium.org>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Reviewed-by: Doug Anderson <dianders@chromium.org>
---
 Documentation/devicetree/bindings/arm/rockchip.txt |   7 +
 arch/arm/boot/dts/Makefile                         |   3 +-
 arch/arm/boot/dts/rk3288-veyron-jerry.dts          | 197 +++++++++++++++++++++
 3 files changed, 206 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm/boot/dts/rk3288-veyron-jerry.dts

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/arm/rockchip.txt b/Documentation/devicetree/bindings/arm/rockchip.txt
index b30bed1a10ad..ab89f5089217 100644
--- a/Documentation/devicetree/bindings/arm/rockchip.txt
+++ b/Documentation/devicetree/bindings/arm/rockchip.txt
@@ -30,3 +30,10 @@ Rockchip platforms device tree bindings
 - Netxeon R89 board:
     Required root node properties:
       - compatible = "netxeon,r89", "rockchip,rk3288";
+
+- Google Jerry (Hisense Chromebook C11 and more):
+    Required root node properties:
+      - compatible = "google,veyron-jerry-rev7", "google,veyron-jerry-rev6",
+		     "google,veyron-jerry-rev5", "google,veyron-jerry-rev4",
+		     "google,veyron-jerry-rev3", "google,veyron-jerry",
+		     "google,veyron", "rockchip,rk3288";
diff --git a/arch/arm/boot/dts/Makefile b/arch/arm/boot/dts/Makefile
index f239007cffc9..b07ee6be757e 100644
--- a/arch/arm/boot/dts/Makefile
+++ b/arch/arm/boot/dts/Makefile
@@ -489,7 +489,8 @@ dtb-$(CONFIG_ARCH_ROCKCHIP) += \
 	rk3288-evb-rk808.dtb \
 	rk3288-firefly-beta.dtb \
 	rk3288-firefly.dtb \
-	rk3288-r89.dtb
+	rk3288-r89.dtb \
+	rk3288-veyron-jerry.dtb
 dtb-$(CONFIG_ARCH_S3C24XX) += \
 	s3c2416-smdk2416.dtb
 dtb-$(CONFIG_ARCH_S3C64XX) += \
diff --git a/arch/arm/boot/dts/rk3288-veyron-jerry.dts b/arch/arm/boot/dts/rk3288-veyron-jerry.dts
new file mode 100644
index 000000000000..42d724559915
--- /dev/null
+++ b/arch/arm/boot/dts/rk3288-veyron-jerry.dts
@@ -0,0 +1,197 @@
+/*
+ * Google Veyron Jerry Rev 3+ board device tree source
+ *
+ * Copyright 2015 Google, Inc
+ *
+ * This file is dual-licensed: you can use it either under the terms
+ * of the GPL or the X11 license, at your option. Note that this dual
+ * licensing only applies to this file, and not this project as a
+ * whole.
+ *
+ *  a) This file is free software; you can redistribute it and/or
+ *     modify it under the terms of the GNU General Public License as
+ *     published by the Free Software Foundation; either version 2 of the
+ *     License, or (at your option) any later version.
+ *
+ *     This file is distributed in the hope that it will be useful,
+ *     but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *     MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *     GNU General Public License for more details.
+ *
+ *  Or, alternatively,
+ *
+ *  b) Permission is hereby granted, free of charge, to any person
+ *     obtaining a copy of this software and associated documentation
+ *     files (the "Software"), to deal in the Software without
+ *     restriction, including without limitation the rights to use,
+ *     copy, modify, merge, publish, distribute, sublicense, and/or
+ *     sell copies of the Software, and to permit persons to whom the
+ *     Software is furnished to do so, subject to the following
+ *     conditions:
+ *
+ *     The above copyright notice and this permission notice shall be
+ *     included in all copies or substantial portions of the Software.
+ *
+ *     THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ *     EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
+ *     OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ *     NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
+ *     HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
+ *     WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ *     FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ *     OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+/dts-v1/;
+#include "rk3288-veyron-chromebook.dtsi"
+#include "cros-ec-sbs.dtsi"
+
+/ {
+	model = "Google Jerry";
+	compatible = "google,veyron-jerry-rev7", "google,veyron-jerry-rev6",
+		     "google,veyron-jerry-rev5", "google,veyron-jerry-rev4",
+		     "google,veyron-jerry-rev3", "google,veyron-jerry",
+		     "google,veyron", "rockchip,rk3288";
+
+	panel_regulator: panel-regulator {
+		compatible = "regulator-fixed";
+		enable-active-high;
+		gpio = <&gpio7 14 GPIO_ACTIVE_HIGH>;
+		pinctrl-names = "default";
+		pinctrl-0 = <&lcd_enable_h>;
+		regulator-name = "panel_regulator";
+		vin-supply = <&vcc33_sys>;
+	};
+
+	vcc18_lcd: vcc18-lcd {
+		compatible = "regulator-fixed";
+		enable-active-high;
+		gpio = <&gpio2 13 GPIO_ACTIVE_HIGH>;
+		pinctrl-names = "default";
+		pinctrl-0 = <&avdd_1v8_disp_en>;
+		regulator-name = "vcc18_lcd";
+		regulator-always-on;
+		regulator-boot-on;
+		vin-supply = <&vcc18_wl>;
+	};
+
+	backlight_regulator: backlight-regulator {
+		compatible = "regulator-fixed";
+		enable-active-high;
+		gpio = <&gpio2 12 GPIO_ACTIVE_HIGH>;
+		pinctrl-names = "default";
+		pinctrl-0 = <&bl_pwr_en>;
+		regulator-name = "backlight_regulator";
+		vin-supply = <&vcc33_sys>;
+		startup-delay-us = <15000>;
+	};
+};
+
+&rk808 {
+	pinctrl-names = "default";
+	pinctrl-0 = <&pmic_int_l>;
+
+	regulators {
+		mic_vcc: LDO_REG2 {
+			regulator-name = "mic_vcc";
+			regulator-always-on;
+			regulator-boot-on;
+			regulator-min-microvolt = <1800000>;
+			regulator-max-microvolt = <1800000>;
+			regulator-state-mem {
+				regulator-on-in-suspend;
+			};
+		};
+	};
+};
+
+&sdmmc {
+	disable-wp;
+	pinctrl-names = "default";
+	pinctrl-0 = <&sdmmc_clk &sdmmc_cmd &sdmmc_cd_disabled &sdmmc_cd_gpio
+			&sdmmc_bus4>;
+};
+
+&vcc_5v {
+	enable-active-high;
+	gpio = <&gpio7 21 GPIO_ACTIVE_HIGH>;
+	pinctrl-names = "default";
+	pinctrl-0 = <&drv_5v>;
+};
+
+&vcc50_hdmi {
+	enable-active-high;
+	gpio = <&gpio5 19 GPIO_ACTIVE_HIGH>;
+	pinctrl-names = "default";
+	pinctrl-0 = <&vcc50_hdmi_en>;
+};
+
+&pinctrl {
+	backlight {
+		bl_pwr_en: bl_pwr_en {
+			rockchip,pins = <2 12 RK_FUNC_GPIO &pcfg_pull_none>;
+		};
+	};
+
+	buck-5v {
+		drv_5v: drv-5v {
+			rockchip,pins = <7 21 RK_FUNC_GPIO &pcfg_pull_none>;
+		};
+	};
+
+	hdmi {
+		vcc50_hdmi_en: vcc50-hdmi-en {
+			rockchip,pins = <5 19 RK_FUNC_GPIO &pcfg_pull_none>;
+		};
+	};
+
+	lcd {
+		lcd_enable_h: lcd-en {
+			rockchip,pins = <7 14 RK_FUNC_GPIO &pcfg_pull_none>;
+		};
+
+		avdd_1v8_disp_en: avdd-1v8-disp-en {
+			rockchip,pins = <2 13 RK_FUNC_GPIO &pcfg_pull_none>;
+		};
+	};
+
+	pmic {
+		dvs_1: dvs-1 {
+			rockchip,pins = <7 12 RK_FUNC_GPIO &pcfg_pull_down>;
+		};
+
+		dvs_2: dvs-2 {
+			rockchip,pins = <7 15 RK_FUNC_GPIO &pcfg_pull_down>;
+		};
+	};
+};
+
+&i2c4 {
+	status = "okay";
+
+	/*
+	 * Trackpad pin control is shared between Elan and Synaptics devices
+	 * so we have to pull it up to the bus level.
+	 */
+	pinctrl-names = "default";
+	pinctrl-0 = <&i2c4_xfer &trackpad_int>;
+
+	trackpad@15 {
+		/*
+		 * Remove the inherited pinctrl settings to avoid clashing
+		 * with bus-wide ones.
+		 */
+		/delete-property/pinctrl-names;
+		/delete-property/pinctrl-0;
+	};
+
+	trackpad@2c {
+		compatible = "hid-over-i2c";
+		interrupt-parent = <&gpio7>;
+		interrupts = <3 IRQ_TYPE_EDGE_FALLING>;
+		reg = <0x2c>;
+		hid-descr-addr = <0x0020>;
+		vcc-supply = <&vcc33_io>;
+		wakeup-source;
+	};
+};
-- 
cgit v1.2.3


From 05ffc630627003c7bf275eec05b0eae81e0016b1 Mon Sep 17 00:00:00 2001
From: Heiko Stuebner <heiko@sntech.de>
Date: Tue, 7 Jul 2015 20:09:34 +0200
Subject: ARM: dts: rockchip: add veyron-pinky board

While pinky was one of the earlier development models, is on the list
of endangered species today and nearly extinct, I want to keep mine
around for the foreseeable future after spending all the time making a
nice hole into the base below the dut-connector.

Signed-off-by: Heiko Stuebner <heiko@sntech.de>
---
 Documentation/devicetree/bindings/arm/rockchip.txt |   5 +
 arch/arm/boot/dts/Makefile                         |   3 +-
 arch/arm/boot/dts/rk3288-veyron-pinky.dts          | 128 +++++++++++++++++++++
 3 files changed, 135 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm/boot/dts/rk3288-veyron-pinky.dts

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/arm/rockchip.txt b/Documentation/devicetree/bindings/arm/rockchip.txt
index ab89f5089217..c112ffac382d 100644
--- a/Documentation/devicetree/bindings/arm/rockchip.txt
+++ b/Documentation/devicetree/bindings/arm/rockchip.txt
@@ -37,3 +37,8 @@ Rockchip platforms device tree bindings
 		     "google,veyron-jerry-rev5", "google,veyron-jerry-rev4",
 		     "google,veyron-jerry-rev3", "google,veyron-jerry",
 		     "google,veyron", "rockchip,rk3288";
+
+- Google Pinky (dev-board):
+    Required root node properties:
+      - compatible = "google,veyron-pinky-rev2", "google,veyron-pinky",
+		     "google,veyron", "rockchip,rk3288";
diff --git a/arch/arm/boot/dts/Makefile b/arch/arm/boot/dts/Makefile
index b07ee6be757e..ce7e1a821a23 100644
--- a/arch/arm/boot/dts/Makefile
+++ b/arch/arm/boot/dts/Makefile
@@ -490,7 +490,8 @@ dtb-$(CONFIG_ARCH_ROCKCHIP) += \
 	rk3288-firefly-beta.dtb \
 	rk3288-firefly.dtb \
 	rk3288-r89.dtb \
-	rk3288-veyron-jerry.dtb
+	rk3288-veyron-jerry.dtb \
+	rk3288-veyron-pinky.dtb
 dtb-$(CONFIG_ARCH_S3C24XX) += \
 	s3c2416-smdk2416.dtb
 dtb-$(CONFIG_ARCH_S3C64XX) += \
diff --git a/arch/arm/boot/dts/rk3288-veyron-pinky.dts b/arch/arm/boot/dts/rk3288-veyron-pinky.dts
new file mode 100644
index 000000000000..25eb4b0c1330
--- /dev/null
+++ b/arch/arm/boot/dts/rk3288-veyron-pinky.dts
@@ -0,0 +1,128 @@
+/*
+ * Google Veyron Pinky Rev 2 board device tree source
+ *
+ * Copyright 2015 Google, Inc
+ *
+ * This file is dual-licensed: you can use it either under the terms
+ * of the GPL or the X11 license, at your option. Note that this dual
+ * licensing only applies to this file, and not this project as a
+ * whole.
+ *
+ *  a) This file is free software; you can redistribute it and/or
+ *     modify it under the terms of the GNU General Public License as
+ *     published by the Free Software Foundation; either version 2 of the
+ *     License, or (at your option) any later version.
+ *
+ *     This file is distributed in the hope that it will be useful,
+ *     but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *     MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *     GNU General Public License for more details.
+ *
+ *  Or, alternatively,
+ *
+ *  b) Permission is hereby granted, free of charge, to any person
+ *     obtaining a copy of this software and associated documentation
+ *     files (the "Software"), to deal in the Software without
+ *     restriction, including without limitation the rights to use,
+ *     copy, modify, merge, publish, distribute, sublicense, and/or
+ *     sell copies of the Software, and to permit persons to whom the
+ *     Software is furnished to do so, subject to the following
+ *     conditions:
+ *
+ *     The above copyright notice and this permission notice shall be
+ *     included in all copies or substantial portions of the Software.
+ *
+ *     THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ *     EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
+ *     OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ *     NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
+ *     HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
+ *     WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ *     FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ *     OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+/dts-v1/;
+#include "rk3288-veyron-chromebook.dtsi"
+#include "cros-ec-sbs.dtsi"
+
+/ {
+	model = "Google Pinky";
+	compatible = "google,veyron-pinky-rev2", "google,veyron-pinky",
+		     "google,veyron", "rockchip,rk3288";
+
+	/delete-node/emmc-pwrseq;
+};
+
+&emmc {
+	/*
+	 * Use a pullup instead of a drive since the output is 3.3V and
+	 * really should be 1.8V (oops).  The external pulldown will help
+	 * bring the voltage down if we only drive with a pullup here.
+	 * Therefore disable the powerseq (and actual reset) for pinky.
+	 */
+	/delete-property/mmc-pwrseq;
+	pinctrl-0 = <&emmc_clk &emmc_cmd &emmc_bus8 &emmc_reset>;
+};
+
+&gpio_keys {
+	pinctrl-0 = <&pwr_key_h &ap_lid_int_l>;
+
+	power {
+		gpios = <&gpio0 5 GPIO_ACTIVE_HIGH>;
+	};
+};
+
+/* Touchpad connector */
+&i2c3 {
+	status = "okay";
+
+	clock-frequency = <400000>;
+	i2c-scl-falling-time-ns = <50>;
+	i2c-scl-rising-time-ns = <300>;
+};
+
+&pinctrl {
+	buttons {
+		pwr_key_h: pwr-key-h {
+			rockchip,pins = <0 5 RK_FUNC_GPIO &pcfg_pull_none>;
+		};
+	};
+
+	emmc {
+		emmc_reset: emmc-reset {
+			rockchip,pins = <7 12 RK_FUNC_GPIO &pcfg_pull_up>;
+		};
+	};
+
+	sdmmc {
+		sdmmc_wp_gpio: sdmmc-wp-gpio {
+			rockchip,pins = <7 10 RK_FUNC_GPIO &pcfg_pull_up>;
+		};
+	};
+};
+
+&rk808 {
+	regulators {
+		vcc18_lcd: SWITCH_REG2 {
+			regulator-always-on;
+			regulator-boot-on;
+			regulator-name = "vcc18_lcd";
+			regulator-state-mem {
+				regulator-on-in-suspend;
+			};
+		};
+	};
+};
+
+&sdmmc {
+	pinctrl-names = "default";
+	pinctrl-0 = <&sdmmc_clk &sdmmc_cmd &sdmmc_cd_disabled &sdmmc_cd_gpio
+		     &sdmmc_wp_gpio &sdmmc_bus4>;
+	wp-gpios = <&gpio7 10 GPIO_ACTIVE_HIGH>;
+};
+
+&tsadc {
+	/* Some connection is flaky making the tsadc hang the system */
+	status = "disabled";
+};
-- 
cgit v1.2.3


From 7f37a3d80b6d687717685cb7156ef7c9acbd2964 Mon Sep 17 00:00:00 2001
From: Jun Nie <jun.nie@linaro.org>
Date: Tue, 5 May 2015 22:06:07 +0800
Subject: Documentation: dma: Add documentation for ZTE DMA

This patch adds documentation for the ZTE ZX296702 SoC DMA device
DTS binding.

Signed-off-by: Jun Nie <jun.nie@linaro.org>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
---
 Documentation/devicetree/bindings/dma/zxdma.txt | 38 +++++++++++++++++++++++++
 1 file changed, 38 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/dma/zxdma.txt

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/dma/zxdma.txt b/Documentation/devicetree/bindings/dma/zxdma.txt
new file mode 100644
index 000000000000..3207ceb04d0b
--- /dev/null
+++ b/Documentation/devicetree/bindings/dma/zxdma.txt
@@ -0,0 +1,38 @@
+* ZTE ZX296702 DMA controller
+
+Required properties:
+- compatible: Should be "zte,zx296702-dma"
+- reg: Should contain DMA registers location and length.
+- interrupts: Should contain one interrupt shared by all channel
+- #dma-cells: see dma.txt, should be 1, para number
+- dma-channels: physical channels supported
+- dma-requests: virtual channels supported, each virtual channel
+		have specific request line
+- clocks: clock required
+
+Example:
+
+Controller:
+	dma: dma-controller@0x09c00000{
+		compatible = "zte,zx296702-dma";
+		reg = <0x09c00000 0x1000>;
+		clocks = <&topclk ZX296702_DMA_ACLK>;
+		interrupts = <GIC_SPI 66 IRQ_TYPE_LEVEL_HIGH>;
+		#dma-cells = <1>;
+		dma-channels = <24>;
+		dma-requests = <24>;
+	};
+
+Client:
+Use specific request line passing from dmax
+For example, spdif0 tx channel request line is 4
+	spdif0: spdif0@0b004000 {
+		#sound-dai-cells = <0>;
+		compatible = "zte,zx296702-spdif";
+		reg = <0x0b004000 0x1000>;
+		clocks = <&lsp0clk ZX296702_SPDIF0_DIV>;
+		clock-names = "tx";
+		interrupts = <GIC_SPI 21 IRQ_TYPE_LEVEL_HIGH>;
+		dmas = <&dma 4>;
+		dma-names = "tx";
+	}
-- 
cgit v1.2.3


From 4a01a64c8ee8d205bbe25adb40bb3922d0ebce70 Mon Sep 17 00:00:00 2001
From: Heiko Stuebner <heiko@sntech.de>
Date: Thu, 25 Jun 2015 19:22:27 +0200
Subject: dt-bindings: document rk3368 R88 board from Rockchip

The initial board containing the rk3368 ARM64 soc.

Signed-off-by: Heiko Stuebner <heiko@sntech.de>
---
 Documentation/devicetree/bindings/arm/rockchip.txt | 4 ++++
 1 file changed, 4 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/arm/rockchip.txt b/Documentation/devicetree/bindings/arm/rockchip.txt
index c112ffac382d..c7769400ef85 100644
--- a/Documentation/devicetree/bindings/arm/rockchip.txt
+++ b/Documentation/devicetree/bindings/arm/rockchip.txt
@@ -42,3 +42,7 @@ Rockchip platforms device tree bindings
     Required root node properties:
       - compatible = "google,veyron-pinky-rev2", "google,veyron-pinky",
 		     "google,veyron", "rockchip,rk3288";
+
+- Rockchip R88 board:
+    Required root node properties:
+      - compatible = "rockchip,r88", "rockchip,rk3368";
-- 
cgit v1.2.3


From e1443d2849b146be4ed8d4ef89ae7e215aafaa5b Mon Sep 17 00:00:00 2001
From: Stephen Chandler Paul <cpaul@redhat.com>
Date: Wed, 15 Jul 2015 10:20:17 -0700
Subject: Input: i8042 - add unmask_kbd_data option

A big problem with the current i8042 debugging option is that it outputs
data going to and from the keyboard by default. As a result, many dmesg
logs uploaded by users will unintentionally contain sensitive information
such as their password, as such it's probably a good idea not to output
data coming from the keyboard unless specifically enabled by the user.

Signed-off-by: Stephen Chandler Paul <cpaul@redhat.com>
Reviewed-by: Andreas Mohr <andim2@users.sf.net>
Reviewed-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
---
 Documentation/kernel-parameters.txt |  4 ++++
 drivers/input/serio/i8042.c         | 43 +++++++++++++++++++++++++++++++++----
 drivers/input/serio/i8042.h         | 13 +++++++++++
 drivers/input/serio/serio.c         |  5 ++---
 include/linux/serio.h               |  2 ++
 5 files changed, 60 insertions(+), 7 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index bfcb1a62a7b4..fd0f7cd8e496 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1274,6 +1274,10 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			     <bus_id>,<clkrate>
 
 	i8042.debug	[HW] Toggle i8042 debug mode
+	i8042.unmask_kbd_data
+			[HW] Enable printing of interrupt data from the KBD port
+			     (disabled by default, and as a pre-condition
+			     requires that i8042.debug=1 be enabled)
 	i8042.direct	[HW] Put keyboard port into non-translated mode
 	i8042.dumbkbd	[HW] Pretend that controller can only read data from
 			     keyboard and cannot control its state
diff --git a/drivers/input/serio/i8042.c b/drivers/input/serio/i8042.c
index cb5ece77fd7d..c9c98f0ab284 100644
--- a/drivers/input/serio/i8042.c
+++ b/drivers/input/serio/i8042.c
@@ -88,6 +88,10 @@ MODULE_PARM_DESC(nopnp, "Do not use PNP to detect controller settings");
 static bool i8042_debug;
 module_param_named(debug, i8042_debug, bool, 0600);
 MODULE_PARM_DESC(debug, "Turn i8042 debugging mode on and off");
+
+static bool i8042_unmask_kbd_data;
+module_param_named(unmask_kbd_data, i8042_unmask_kbd_data, bool, 0600);
+MODULE_PARM_DESC(unmask_kbd_data, "Unconditional enable (may reveal sensitive data) of normally sanitize-filtered kbd data traffic debug log [pre-condition: i8042.debug=1 enabled]");
 #endif
 
 static bool i8042_bypass_aux_irq_test;
@@ -116,6 +120,7 @@ struct i8042_port {
 	struct serio *serio;
 	int irq;
 	bool exists;
+	bool driver_bound;
 	signed char mux;
 };
 
@@ -133,6 +138,7 @@ static bool i8042_kbd_irq_registered;
 static bool i8042_aux_irq_registered;
 static unsigned char i8042_suppress_kbd_ack;
 static struct platform_device *i8042_platform_device;
+static struct notifier_block i8042_kbd_bind_notifier_block;
 
 static irqreturn_t i8042_interrupt(int irq, void *dev_id);
 static bool (*i8042_platform_filter)(unsigned char data, unsigned char str,
@@ -528,10 +534,10 @@ static irqreturn_t i8042_interrupt(int irq, void *dev_id)
 	port = &i8042_ports[port_no];
 	serio = port->exists ? port->serio : NULL;
 
-	dbg("%02x <- i8042 (interrupt, %d, %d%s%s)\n",
-	    data, port_no, irq,
-	    dfl & SERIO_PARITY ? ", bad parity" : "",
-	    dfl & SERIO_TIMEOUT ? ", timeout" : "");
+	filter_dbg(port->driver_bound, data, "<- i8042 (interrupt, %d, %d%s%s)\n",
+		   port_no, irq,
+		   dfl & SERIO_PARITY ? ", bad parity" : "",
+		   dfl & SERIO_TIMEOUT ? ", timeout" : "");
 
 	filtered = i8042_filter(data, str, serio);
 
@@ -1438,6 +1444,29 @@ static int __init i8042_setup_kbd(void)
 	return error;
 }
 
+static int i8042_kbd_bind_notifier(struct notifier_block *nb,
+				   unsigned long action, void *data)
+{
+	struct device *dev = data;
+	struct serio *serio = to_serio_port(dev);
+	struct i8042_port *port = serio->port_data;
+
+	if (serio != i8042_ports[I8042_KBD_PORT_NO].serio)
+		return 0;
+
+	switch (action) {
+	case BUS_NOTIFY_BOUND_DRIVER:
+		port->driver_bound = true;
+		break;
+
+	case BUS_NOTIFY_UNBIND_DRIVER:
+		port->driver_bound = false;
+		break;
+	}
+
+	return 0;
+}
+
 static int __init i8042_probe(struct platform_device *dev)
 {
 	int error;
@@ -1507,6 +1536,10 @@ static struct platform_driver i8042_driver = {
 	.shutdown	= i8042_shutdown,
 };
 
+static struct notifier_block i8042_kbd_bind_notifier_block = {
+	.notifier_call = i8042_kbd_bind_notifier,
+};
+
 static int __init i8042_init(void)
 {
 	struct platform_device *pdev;
@@ -1528,6 +1561,7 @@ static int __init i8042_init(void)
 		goto err_platform_exit;
 	}
 
+	bus_register_notifier(&serio_bus, &i8042_kbd_bind_notifier_block);
 	panic_blink = i8042_panic_blink;
 
 	return 0;
@@ -1543,6 +1577,7 @@ static void __exit i8042_exit(void)
 	platform_driver_unregister(&i8042_driver);
 	i8042_platform_exit();
 
+	bus_unregister_notifier(&serio_bus, &i8042_kbd_bind_notifier_block);
 	panic_blink = NULL;
 }
 
diff --git a/drivers/input/serio/i8042.h b/drivers/input/serio/i8042.h
index fc080beffedc..1db0a40c9bab 100644
--- a/drivers/input/serio/i8042.h
+++ b/drivers/input/serio/i8042.h
@@ -73,6 +73,17 @@ static unsigned long i8042_start_time;
 			printk(KERN_DEBUG KBUILD_MODNAME ": [%d] " format,	\
 			       (int) (jiffies - i8042_start_time), ##arg);	\
 	} while (0)
+
+#define filter_dbg(filter, data, format, args...)		\
+	do {							\
+		if (!i8042_debug)				\
+			break;					\
+								\
+		if (!filter || i8042_unmask_kbd_data)		\
+			dbg("%02x " format, data, ##args);	\
+		else						\
+			dbg("** " format, ##args);		\
+	} while (0)
 #else
 #define dbg_init() do { } while (0)
 #define dbg(format, arg...)							\
@@ -80,6 +91,8 @@ static unsigned long i8042_start_time;
 		if (0)								\
 			printk(KERN_DEBUG pr_fmt(format), ##arg);		\
 	} while (0)
+
+#define filter_dbg(filter, data, format, args...) do { } while (0)
 #endif
 
 #endif /* _I8042_H */
diff --git a/drivers/input/serio/serio.c b/drivers/input/serio/serio.c
index a05a5179da32..8f828975ab10 100644
--- a/drivers/input/serio/serio.c
+++ b/drivers/input/serio/serio.c
@@ -49,8 +49,6 @@ static DEFINE_MUTEX(serio_mutex);
 
 static LIST_HEAD(serio_list);
 
-static struct bus_type serio_bus;
-
 static void serio_add_port(struct serio *serio);
 static int serio_reconnect_port(struct serio *serio);
 static void serio_disconnect_port(struct serio *serio);
@@ -1017,7 +1015,7 @@ irqreturn_t serio_interrupt(struct serio *serio,
 }
 EXPORT_SYMBOL(serio_interrupt);
 
-static struct bus_type serio_bus = {
+struct bus_type serio_bus = {
 	.name		= "serio",
 	.drv_groups	= serio_driver_groups,
 	.match		= serio_bus_match,
@@ -1029,6 +1027,7 @@ static struct bus_type serio_bus = {
 	.pm		= &serio_pm_ops,
 #endif
 };
+EXPORT_SYMBOL(serio_bus);
 
 static int __init serio_init(void)
 {
diff --git a/include/linux/serio.h b/include/linux/serio.h
index 9f779c7a2da4..df4ab5de1586 100644
--- a/include/linux/serio.h
+++ b/include/linux/serio.h
@@ -18,6 +18,8 @@
 #include <linux/mod_devicetable.h>
 #include <uapi/linux/serio.h>
 
+extern struct bus_type serio_bus;
+
 struct serio {
 	void *port_data;
 
-- 
cgit v1.2.3


From e40da86a37f64c73b810bc7a63d77c44dc61accb Mon Sep 17 00:00:00 2001
From: Tim Howe <tim.howe@cirrus.com>
Date: Thu, 16 Jul 2015 14:51:40 -0500
Subject: ASoC: cs4349: Add support for Cirrus Logic CS4349

Signed-off-by: Tim Howe <tim.howe@cirrus.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
---
 Documentation/devicetree/bindings/sound/cs4349.txt |  19 +
 sound/soc/codecs/Kconfig                           |   6 +
 sound/soc/codecs/Makefile                          |   2 +
 sound/soc/codecs/cs4349.c                          | 401 +++++++++++++++++++++
 sound/soc/codecs/cs4349.h                          | 146 ++++++++
 5 files changed, 574 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/sound/cs4349.txt
 create mode 100644 sound/soc/codecs/cs4349.c
 create mode 100644 sound/soc/codecs/cs4349.h

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/sound/cs4349.txt b/Documentation/devicetree/bindings/sound/cs4349.txt
new file mode 100644
index 000000000000..54c117b59dba
--- /dev/null
+++ b/Documentation/devicetree/bindings/sound/cs4349.txt
@@ -0,0 +1,19 @@
+CS4349 audio CODEC
+
+Required properties:
+
+  - compatible : "cirrus,cs4349"
+
+  - reg : the I2C address of the device for I2C
+
+Optional properties:
+
+  - reset-gpios : a GPIO spec for the reset pin.
+
+Example:
+
+codec: cs4349@48 {
+        compatible = "cirrus,cs4349";
+        reg = <0x48>;
+        reset-gpios = <&gpio 54 0>;
+};
diff --git a/sound/soc/codecs/Kconfig b/sound/soc/codecs/Kconfig
index efaafce8ba38..6a759d13ef22 100644
--- a/sound/soc/codecs/Kconfig
+++ b/sound/soc/codecs/Kconfig
@@ -53,6 +53,7 @@ config SND_SOC_ALL_CODECS
 	select SND_SOC_CS4271_I2C if I2C
 	select SND_SOC_CS4271_SPI if SPI_MASTER
 	select SND_SOC_CS42XX8_I2C if I2C
+	select SND_SOC_CS4349 if I2C
 	select SND_SOC_CX20442 if TTY
 	select SND_SOC_DA7210 if SND_SOC_I2C_AND_SPI
 	select SND_SOC_DA7213 if I2C
@@ -403,6 +404,11 @@ config SND_SOC_CS42XX8_I2C
 	select SND_SOC_CS42XX8
 	select REGMAP_I2C
 
+# Cirrus Logic CS4349 HiFi DAC
+config SND_SOC_CS4349
+	tristate "Cirrus Logic CS4349 CODEC"
+	depends on I2C
+
 config SND_SOC_CX20442
 	tristate
 	depends on TTY
diff --git a/sound/soc/codecs/Makefile b/sound/soc/codecs/Makefile
index cf160d972cb3..f19e8d29f89d 100644
--- a/sound/soc/codecs/Makefile
+++ b/sound/soc/codecs/Makefile
@@ -45,6 +45,7 @@ snd-soc-cs4271-i2c-objs := cs4271-i2c.o
 snd-soc-cs4271-spi-objs := cs4271-spi.o
 snd-soc-cs42xx8-objs := cs42xx8.o
 snd-soc-cs42xx8-i2c-objs := cs42xx8-i2c.o
+snd-soc-cs4349-objs := cs4349.o
 snd-soc-cx20442-objs := cx20442.o
 snd-soc-da7210-objs := da7210.o
 snd-soc-da7213-objs := da7213.o
@@ -232,6 +233,7 @@ obj-$(CONFIG_SND_SOC_CS4271_I2C)	+= snd-soc-cs4271-i2c.o
 obj-$(CONFIG_SND_SOC_CS4271_SPI)	+= snd-soc-cs4271-spi.o
 obj-$(CONFIG_SND_SOC_CS42XX8)	+= snd-soc-cs42xx8.o
 obj-$(CONFIG_SND_SOC_CS42XX8_I2C) += snd-soc-cs42xx8-i2c.o
+obj-$(CONFIG_SND_SOC_CS4349)	+= snd-soc-cs4349.o
 obj-$(CONFIG_SND_SOC_CX20442)	+= snd-soc-cx20442.o
 obj-$(CONFIG_SND_SOC_DA7210)	+= snd-soc-da7210.o
 obj-$(CONFIG_SND_SOC_DA7213)	+= snd-soc-da7213.o
diff --git a/sound/soc/codecs/cs4349.c b/sound/soc/codecs/cs4349.c
new file mode 100644
index 000000000000..a8df8a749607
--- /dev/null
+++ b/sound/soc/codecs/cs4349.c
@@ -0,0 +1,401 @@
+/*
+ * cs4349.c  --  CS4349 ALSA Soc Audio driver
+ *
+ * Copyright 2015 Cirrus Logic, Inc.
+ *
+ * Authors: Tim Howe <Tim.Howe@cirrus.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/module.h>
+#include <linux/moduleparam.h>
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/delay.h>
+#include <linux/gpio.h>
+#include <linux/platform_device.h>
+#include <linux/pm.h>
+#include <linux/i2c.h>
+#include <linux/of_device.h>
+#include <linux/regmap.h>
+#include <linux/slab.h>
+#include <sound/core.h>
+#include <sound/pcm.h>
+#include <sound/pcm_params.h>
+#include <sound/soc.h>
+#include <sound/soc-dapm.h>
+#include <sound/initval.h>
+#include <sound/tlv.h>
+#include "cs4349.h"
+
+
+static const struct reg_default cs4349_reg_defaults[] = {
+	{ 2, 0x00 },	/* r02	- Mode Control */
+	{ 3, 0x09 },	/* r03	- Volume, Mixing and Inversion Control */
+	{ 4, 0x81 },	/* r04	- Mute Control */
+	{ 5, 0x00 },	/* r05	- Channel A Volume Control */
+	{ 6, 0x00 },	/* r06	- Channel B Volume Control */
+	{ 7, 0xB1 },	/* r07	- Ramp and Filter Control */
+	{ 8, 0x1C },	/* r08	- Misc. Control */
+};
+
+/* Private data for the CS4349 */
+struct  cs4349_private {
+	struct regmap			*regmap;
+	struct cs4349_platform_data	pdata;
+	struct gpio_desc		*reset_gpio;
+	unsigned int			mode;
+	int				rate;
+};
+
+static bool cs4349_readable_register(struct device *dev, unsigned int reg)
+{
+	switch (reg) {
+	case CS4349_CHIPID:
+	case CS4349_MODE:
+	case CS4349_VMI:
+	case CS4349_MUTE:
+	case CS4349_VOLA:
+	case CS4349_VOLB:
+	case CS4349_RMPFLT:
+	case CS4349_MISC:
+		return true;
+	default:
+		return false;
+	}
+}
+
+static int cs4349_set_dai_fmt(struct snd_soc_dai *codec_dai,
+			      unsigned int format)
+{
+	struct snd_soc_codec *codec = codec_dai->codec;
+	struct cs4349_private *cs4349 = snd_soc_codec_get_drvdata(codec);
+	unsigned int fmt;
+
+	fmt = format & SND_SOC_DAIFMT_FORMAT_MASK;
+
+	switch (fmt) {
+	case SND_SOC_DAIFMT_I2S:
+	case SND_SOC_DAIFMT_LEFT_J:
+	case SND_SOC_DAIFMT_RIGHT_J:
+		cs4349->mode = format & SND_SOC_DAIFMT_FORMAT_MASK;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int cs4349_pcm_hw_params(struct snd_pcm_substream *substream,
+			    struct snd_pcm_hw_params *params,
+			    struct snd_soc_dai *dai)
+{
+	struct snd_soc_pcm_runtime *rtd = substream->private_data;
+	struct snd_soc_codec *codec = rtd->codec;
+	struct cs4349_private *cs4349 = snd_soc_codec_get_drvdata(codec);
+	int mode, fmt, ret;
+
+	mode = snd_soc_read(codec, CS4349_MODE);
+	cs4349->rate = params_rate(params);
+
+	switch (cs4349->mode) {
+	case SND_SOC_DAIFMT_I2S:
+		mode |= MODE_FORMAT(DIF_I2S);
+		break;
+	case SND_SOC_DAIFMT_LEFT_J:
+		mode |= MODE_FORMAT(DIF_LEFT_JST);
+		break;
+	case SND_SOC_DAIFMT_RIGHT_J:
+		switch (params_width(params)) {
+		case 16:
+			fmt = DIF_RGHT_JST16;
+			break;
+		case 24:
+			fmt = DIF_RGHT_JST24;
+			break;
+		default:
+			return -EINVAL;
+		}
+		mode |= MODE_FORMAT(fmt);
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	ret = snd_soc_write(codec, CS4349_MODE, mode);
+	if (ret < 0)
+		return ret;
+
+	return 0;
+}
+
+static int cs4349_digital_mute(struct snd_soc_dai *dai, int mute)
+{
+	struct snd_soc_codec *codec = dai->codec;
+	int reg;
+
+	reg = 0;
+	if (mute)
+		reg = MUTE_AB_MASK;
+
+	return snd_soc_update_bits(codec, CS4349_MUTE, MUTE_AB_MASK, reg);
+}
+
+static DECLARE_TLV_DB_SCALE(dig_tlv, -12750, 50, 0);
+
+static const char * const chan_mix_texts[] = {
+	"Mute", "MuteA", "MuteA SwapB", "MuteA MonoB", "SwapA MuteB",
+	"BothR", "Swap", "SwapA MonoB", "MuteB", "Normal", "BothL",
+	"MonoB", "MonoA MuteB", "MonoA", "MonoA SwapB", "Mono",
+	/*Normal == Channel A = Left, Channel B = Right*/
+};
+
+static const char * const fm_texts[] = {
+	"Auto", "Single", "Double", "Quad",
+};
+
+static const char * const deemph_texts[] = {
+	"None", "44.1k", "48k", "32k",
+};
+
+static const char * const softr_zeroc_texts[] = {
+	"Immediate", "Zero Cross", "Soft Ramp", "SR on ZC",
+};
+
+static int deemph_values[] = {
+	0, 4, 8, 12,
+};
+
+static int softr_zeroc_values[] = {
+	0, 64, 128, 192,
+};
+
+static const struct soc_enum chan_mix_enum =
+	SOC_ENUM_SINGLE(CS4349_VMI, 0,
+			ARRAY_SIZE(chan_mix_texts),
+			chan_mix_texts);
+
+static const struct soc_enum fm_mode_enum =
+	SOC_ENUM_SINGLE(CS4349_MODE, 0,
+			ARRAY_SIZE(fm_texts),
+			fm_texts);
+
+static SOC_VALUE_ENUM_SINGLE_DECL(deemph_enum, CS4349_MODE, 0, DEM_MASK,
+				deemph_texts, deemph_values);
+
+static SOC_VALUE_ENUM_SINGLE_DECL(softr_zeroc_enum, CS4349_RMPFLT, 0,
+				SR_ZC_MASK, softr_zeroc_texts,
+				softr_zeroc_values);
+
+static const struct snd_kcontrol_new cs4349_snd_controls[] = {
+	SOC_DOUBLE_R_TLV("Master Playback Volume",
+			 CS4349_VOLA, CS4349_VOLB, 0, 0xFF, 1, dig_tlv),
+	SOC_ENUM("Functional Mode", fm_mode_enum),
+	SOC_ENUM("De-Emphasis Control", deemph_enum),
+	SOC_ENUM("Soft Ramp Zero Cross Control", softr_zeroc_enum),
+	SOC_ENUM("Channel Mixer", chan_mix_enum),
+	SOC_SINGLE("VolA = VolB Switch", CS4349_VMI, 7, 1, 0),
+	SOC_SINGLE("InvertA Switch", CS4349_VMI, 6, 1, 0),
+	SOC_SINGLE("InvertB Switch", CS4349_VMI, 5, 1, 0),
+	SOC_SINGLE("Auto-Mute Switch", CS4349_MUTE, 7, 1, 0),
+	SOC_SINGLE("MUTEC A = B Switch", CS4349_MUTE, 5, 1, 0),
+	SOC_SINGLE("Soft Ramp Up Switch", CS4349_RMPFLT, 5, 1, 0),
+	SOC_SINGLE("Soft Ramp Down Switch", CS4349_RMPFLT, 4, 1, 0),
+	SOC_SINGLE("Slow Roll Off Filter Switch", CS4349_RMPFLT, 2, 1, 0),
+	SOC_SINGLE("Freeze Switch", CS4349_MISC, 5, 1, 0),
+	SOC_SINGLE("Popguard Switch", CS4349_MISC, 4, 1, 0),
+};
+
+static const struct snd_soc_dapm_widget cs4349_dapm_widgets[] = {
+	SND_SOC_DAPM_DAC("HiFi DAC", NULL, SND_SOC_NOPM, 0, 0),
+
+	SND_SOC_DAPM_OUTPUT("OutputA"),
+	SND_SOC_DAPM_OUTPUT("OutputB"),
+};
+
+static const struct snd_soc_dapm_route cs4349_routes[] = {
+	{"DAC Playback", NULL, "OutputA"},
+	{"DAC Playback", NULL, "OutputB"},
+
+	{"OutputA", NULL, "HiFi DAC"},
+	{"OutputB", NULL, "HiFi DAC"},
+};
+
+#define CS4349_PCM_FORMATS (SNDRV_PCM_FMTBIT_S8  | \
+			SNDRV_PCM_FMTBIT_S16_LE  | SNDRV_PCM_FMTBIT_S16_BE  | \
+			SNDRV_PCM_FMTBIT_S18_3LE | SNDRV_PCM_FMTBIT_S18_3BE | \
+			SNDRV_PCM_FMTBIT_S20_3LE | SNDRV_PCM_FMTBIT_S20_3BE | \
+			SNDRV_PCM_FMTBIT_S24_3LE | SNDRV_PCM_FMTBIT_S24_3BE | \
+			SNDRV_PCM_FMTBIT_S24_LE  | SNDRV_PCM_FMTBIT_S24_BE  | \
+			SNDRV_PCM_FMTBIT_S32_LE)
+
+#define CS4349_PCM_RATES SNDRV_PCM_RATE_8000_192000
+
+static const struct snd_soc_dai_ops cs4349_dai_ops = {
+	.hw_params	= cs4349_pcm_hw_params,
+	.set_fmt	= cs4349_set_dai_fmt,
+	.digital_mute	= cs4349_digital_mute,
+};
+
+static struct snd_soc_dai_driver cs4349_dai = {
+	.name = "cs4349_hifi",
+	.playback = {
+		.stream_name	= "DAC Playback",
+		.channels_min	= 1,
+		.channels_max	= 2,
+		.rates		= CS4349_PCM_RATES,
+		.formats	= CS4349_PCM_FORMATS,
+	},
+	.ops = &cs4349_dai_ops,
+	.symmetric_rates = 1,
+};
+
+static struct snd_soc_codec_driver soc_codec_dev_cs4349 = {
+	.controls		= cs4349_snd_controls,
+	.num_controls		= ARRAY_SIZE(cs4349_snd_controls),
+
+	.dapm_widgets		= cs4349_dapm_widgets,
+	.num_dapm_widgets	= ARRAY_SIZE(cs4349_dapm_widgets),
+	.dapm_routes		= cs4349_routes,
+	.num_dapm_routes	= ARRAY_SIZE(cs4349_routes),
+};
+
+static struct regmap_config cs4349_regmap = {
+	.reg_bits		= 8,
+	.val_bits		= 8,
+
+	.max_register		= CS4349_NUMREGS,
+	.reg_defaults		= cs4349_reg_defaults,
+	.num_reg_defaults	= ARRAY_SIZE(cs4349_reg_defaults),
+	.readable_reg		= cs4349_readable_register,
+	.cache_type		= REGCACHE_RBTREE,
+};
+
+static int cs4349_i2c_probe(struct i2c_client *client,
+				      const struct i2c_device_id *id)
+{
+	struct cs4349_private *cs4349;
+	struct cs4349_platform_data *pdata = dev_get_platdata(&client->dev);
+	int ret = 0;
+
+	cs4349 = devm_kzalloc(&client->dev, sizeof(*cs4349), GFP_KERNEL);
+	if (!cs4349)
+		return -ENOMEM;
+
+	cs4349->regmap = devm_regmap_init_i2c(client, &cs4349_regmap);
+	if (IS_ERR(cs4349->regmap)) {
+		ret = PTR_ERR(cs4349->regmap);
+		dev_err(&client->dev, "regmap_init() failed: %d\n", ret);
+		return ret;
+	}
+
+	if (pdata)
+		cs4349->pdata = *pdata;
+
+	/* Reset the Device */
+	cs4349->reset_gpio = devm_gpiod_get_optional(&client->dev,
+		"reset", GPIOD_OUT_LOW);
+	if (IS_ERR(cs4349->reset_gpio))
+		return PTR_ERR(cs4349->reset_gpio);
+
+	if (cs4349->reset_gpio)
+		gpiod_set_value_cansleep(cs4349->reset_gpio, 1);
+
+	i2c_set_clientdata(client, cs4349);
+
+	return snd_soc_register_codec(&client->dev, &soc_codec_dev_cs4349,
+		&cs4349_dai, 1);
+}
+
+static int cs4349_i2c_remove(struct i2c_client *client)
+{
+	struct cs4349_private *cs4349 = i2c_get_clientdata(client);
+
+	snd_soc_unregister_codec(&client->dev);
+
+	/* Hold down reset */
+	if (cs4349->reset_gpio)
+		gpiod_set_value_cansleep(cs4349->reset_gpio, 0);
+
+	return 0;
+}
+
+#ifdef CONFIG_PM
+static int cs4349_runtime_suspend(struct device *dev)
+{
+	struct cs4349_private *cs4349 = dev_get_drvdata(dev);
+	struct snd_soc_pcm_runtime *rtd = dev_get_drvdata(dev);
+	int ret;
+
+	ret = snd_soc_update_bits(rtd->codec, CS4349_MISC, PWR_DWN, 1);
+	if (ret < 0)
+		return ret;
+
+	regcache_cache_only(cs4349->regmap, true);
+
+	/* Hold down reset */
+	if (cs4349->reset_gpio)
+		gpiod_set_value_cansleep(cs4349->reset_gpio, 0);
+
+	return 0;
+}
+
+static int cs4349_runtime_resume(struct device *dev)
+{
+	struct cs4349_private *cs4349 = dev_get_drvdata(dev);
+	struct snd_soc_pcm_runtime *rtd = dev_get_drvdata(dev);
+	int ret;
+
+	ret = snd_soc_update_bits(rtd->codec, CS4349_MISC, PWR_DWN, 0);
+	if (ret < 0)
+		return ret;
+
+	if (cs4349->reset_gpio)
+		gpiod_set_value_cansleep(cs4349->reset_gpio, 1);
+
+	regcache_cache_only(cs4349->regmap, false);
+	regcache_sync(cs4349->regmap);
+
+	return 0;
+}
+#endif
+
+static const struct dev_pm_ops cs4349_runtime_pm = {
+	SET_RUNTIME_PM_OPS(cs4349_runtime_suspend, cs4349_runtime_resume,
+			   NULL)
+};
+
+static const struct of_device_id cs4349_of_match[] = {
+	{ .compatible = "cirrus,cs4349", },
+	{},
+};
+
+MODULE_DEVICE_TABLE(of, cs4349_of_match);
+
+static const struct i2c_device_id cs4349_i2c_id[] = {
+	{"cs4349", 0},
+	{}
+};
+
+MODULE_DEVICE_TABLE(i2c, cs4349_i2c_id);
+
+static struct i2c_driver cs4349_i2c_driver = {
+	.driver = {
+		.name		= "cs4349",
+		.owner		= THIS_MODULE,
+		.of_match_table	= cs4349_of_match,
+	},
+	.id_table	= cs4349_i2c_id,
+	.probe		= cs4349_i2c_probe,
+	.remove		= cs4349_i2c_remove,
+};
+
+module_i2c_driver(cs4349_i2c_driver);
+
+MODULE_AUTHOR("Tim Howe <tim.howe@cirrus.com>");
+MODULE_DESCRIPTION("Cirrus Logic CS4349 ALSA SoC Codec Driver");
+MODULE_LICENSE("GPL");
diff --git a/sound/soc/codecs/cs4349.h b/sound/soc/codecs/cs4349.h
new file mode 100644
index 000000000000..3884a8907d23
--- /dev/null
+++ b/sound/soc/codecs/cs4349.h
@@ -0,0 +1,146 @@
+/*
+ * ALSA SoC CS4349 codec driver
+ *
+ * Copyright 2015 Cirrus Logic, Inc.
+ *
+ * Author: Tim Howe <Tim.Howe@cirrus.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ */
+
+#ifndef __CS4349_H__
+#define __CS4349_H__
+
+struct cs4349_platform_data {
+
+	/* GPIO for Reset */
+	unsigned int gpio_nreset;
+
+};
+
+/* CS4349 registers addresses */
+#define CS4349_CHIPID		0x01	/* Device and Rev ID, Read Only */
+#define CS4349_MODE		0x02	/* Mode Control */
+#define CS4349_VMI		0x03	/* Volume, Mixing, Inversion Control */
+#define CS4349_MUTE		0x04	/* Mute Control */
+#define CS4349_VOLA		0x05	/* DAC Channel A Volume Control */
+#define CS4349_VOLB		0x06	/* DAC Channel B Volume Control */
+#define CS4349_RMPFLT		0x07	/* Ramp and Filter Control */
+#define CS4349_MISC		0x08	/* Power Down,Freeze Control,Pop Stop*/
+
+#define CS4349_FIRSTREG		0x01
+#define CS4349_LASTREG		0x08
+#define CS4349_NUMREGS		(CS4349_LASTREG - CS4349_FIRSTREG + 1)
+#define CS4349_I2C_INCR		0x80
+
+
+/* Device and Revision ID */
+#define CS4349_REVA		0xF0	/* Rev A */
+#define CS4349_REVB		0xF1	/* Rev B */
+#define CS4349_REVC2		0xFF	/* Rev C2 */
+
+
+/* PDN_DONE Poll Maximum
+ * If soft ramp is set it will take much longer to power down
+ * the system.
+ */
+#define PDN_POLL_MAX		900
+
+
+/* Bitfield Definitions */
+
+/* CS4349_MODE */
+/* (Digital Interface Format, De-Emphasis Control, Functional Mode */
+#define DIF2			(1 << 6)
+#define DIF1			(1 << 5)
+#define DIF0			(1 << 4)
+#define DEM1			(1 << 3)
+#define DEM0			(1 << 2)
+#define FM1			(1 << 1)
+#define DIF_LEFT_JST		0x00
+#define DIF_I2S			0x01
+#define DIF_RGHT_JST16		0x02
+#define DIF_RGHT_JST24		0x03
+#define DIF_TDM0		0x04
+#define DIF_TDM1		0x05
+#define DIF_TDM2		0x06
+#define DIF_TDM3		0x07
+#define DIF_MASK		0x70
+#define MODE_FORMAT(x)		(((x)&7)<<4)
+#define DEM_MASK		0x0C
+#define NO_DEM			0x00
+#define DEM_441			0x04
+#define DEM_48K			0x08
+#define DEM_32K			0x0C
+#define FM_AUTO			0x00
+#define FM_SNGL			0x01
+#define FM_DBL			0x02
+#define FM_QUAD			0x03
+#define FM_SNGL_MIN		30000
+#define FM_SNGL_MAX		54000
+#define FM_DBL_MAX		108000
+#define FM_QUAD_MAX		216000
+#define FM_MASK			0x03
+
+/* CS4349_VMI (VMI = Volume, Mixing and Inversion Controls) */
+#define VOLBISA			(1 << 7)
+#define VOLAISB			(1 << 7)
+/* INVERT_A only available for Left Jstfd, Right Jstfd16 and Right Jstfd24 */
+#define INVERT_A		(1 << 6)
+/* INVERT_B only available for Left Jstfd, Right Jstfd16 and Right Jstfd24 */
+#define INVERT_B		(1 << 5)
+#define ATAPI3			(1 << 3)
+#define ATAPI2			(1 << 2)
+#define ATAPI1			(1 << 1)
+#define ATAPI0			(1 << 0)
+#define MUTEAB			0x00
+#define MUTEA_RIGHTB		0x01
+#define MUTEA_LEFTB		0x02
+#define MUTEA_SUMLRDIV2B	0x03
+#define RIGHTA_MUTEB		0x04
+#define RIGHTA_RIGHTB		0x05
+#define RIGHTA_LEFTB		0x06
+#define RIGHTA_SUMLRDIV2B	0x07
+#define LEFTA_MUTEB		0x08
+#define LEFTA_RIGHTB		0x09	/* Default */
+#define LEFTA_LEFTB		0x0A
+#define LEFTA_SUMLRDIV2B	0x0B
+#define SUMLRDIV2A_MUTEB	0x0C
+#define SUMLRDIV2A_RIGHTB	0x0D
+#define SUMLRDIV2A_LEFTB	0x0E
+#define SUMLRDIV2_AB		0x0F
+#define CHMIX_MASK		0x0F
+
+/* CS4349_MUTE */
+#define AUTOMUTE		(1 << 7)
+#define MUTEC_AB		(1 << 5)
+#define MUTE_A			(1 << 4)
+#define MUTE_B			(1 << 3)
+#define MUTE_AB_MASK		0x18
+
+/* CS4349_RMPFLT (Ramp and Filter Control) */
+#define SCZ1			(1 << 7)
+#define SCZ0			(1 << 6)
+#define RMP_UP			(1 << 5)
+#define RMP_DN			(1 << 4)
+#define FILT_SEL		(1 << 2)
+#define IMMDT_CHNG		0x31
+#define ZEROCRSS		0x71
+#define SOFT_RMP		0xB1
+#define SFTRMP_ZEROCRSS		0xF1
+#define SR_ZC_MASK		0xC0
+
+/* CS4349_MISC */
+#define PWR_DWN			(1 << 7)
+#define FREEZE			(1 << 5)
+#define POPG_EN			(1 << 4)
+
+#endif	/* __CS4349_H__ */
-- 
cgit v1.2.3


From b7419dd73606118b8797d49b53a9fbe2e2dfa863 Mon Sep 17 00:00:00 2001
From: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Date: Wed, 15 Jul 2015 07:08:05 +0000
Subject: ASoC: rsrc-card: use snd_soc_of_parse_audio_route/prefix for routing

using common audio routing path method makes sense.
Let's use snd_soc_of_parse_audio_route/prefix.

Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Tested-by: Keita Kobayashi <keita.kobayashi.ym@renesas.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
---
 .../bindings/sound/renesas,rsrc-card.txt           |  7 +++++++
 sound/soc/sh/rcar/rsrc-card.c                      | 22 ++++++++++++++++++----
 2 files changed, 25 insertions(+), 4 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/sound/renesas,rsrc-card.txt b/Documentation/devicetree/bindings/sound/renesas,rsrc-card.txt
index c64155027288..962748a8d919 100644
--- a/Documentation/devicetree/bindings/sound/renesas,rsrc-card.txt
+++ b/Documentation/devicetree/bindings/sound/renesas,rsrc-card.txt
@@ -6,6 +6,7 @@ Required properties:
 
 - compatible				: "renesas,rsrc-card,<board>"
 					  Examples with soctypes are:
+					    - "renesas,rsrc-card"
 					    - "renesas,rsrc-card,lager"
 					    - "renesas,rsrc-card,koelsch"
 Optional properties:
@@ -29,6 +30,12 @@ Optional subnode properties:
 - frame-inversion			: bool property. Add this if the
 					  dai-link uses frame clock inversion.
 - convert-rate				: platform specified sampling rate convert
+- audio-prefix				: see audio-routing
+- audio-routing				: A list of the connections between audio components.
+					  Each entry is a pair of strings, the first being the connection's sink,
+					  the second being the connection's source. Valid names for sources.
+					  use audio-prefix if some components is using same sink/sources naming.
+					  it can be used if compatible was "renesas,rsrc-card";
 
 Required CPU/CODEC subnodes properties:
 
diff --git a/sound/soc/sh/rcar/rsrc-card.c b/sound/soc/sh/rcar/rsrc-card.c
index 84e935711e29..d61db9c385ea 100644
--- a/sound/soc/sh/rcar/rsrc-card.c
+++ b/sound/soc/sh/rcar/rsrc-card.c
@@ -41,6 +41,7 @@ static const struct rsrc_card_of_data routes_of_ssi0_ak4642 = {
 static const struct of_device_id rsrc_card_of_match[] = {
 	{ .compatible = "renesas,rsrc-card,lager",	.data = &routes_of_ssi0_ak4642 },
 	{ .compatible = "renesas,rsrc-card,koelsch",	.data = &routes_of_ssi0_ak4642 },
+	{ .compatible = "renesas,rsrc-card", },
 	{},
 };
 MODULE_DEVICE_TABLE(of, rsrc_card_of_match);
@@ -242,8 +243,15 @@ static int rsrc_card_parse_links(struct device_node *np,
 		snd_soc_of_get_dai_name(np, &dai_link->codec_dai_name);
 
 		/* additional name prefix */
-		priv->codec_conf.of_node	= dai_link->codec_of_node;
-		priv->codec_conf.name_prefix	= of_data->prefix;
+		if (of_data) {
+			priv->codec_conf.of_node = dai_link->codec_of_node;
+			priv->codec_conf.name_prefix = of_data->prefix;
+		} else {
+			snd_soc_of_parse_audio_prefix(&priv->snd_card,
+						      &priv->codec_conf,
+						      dai_link->codec_of_node,
+						      "audio-prefix");
+		}
 
 		/* set dai_name */
 		snprintf(dai_props->dai_name, DAI_NAME_NUM, "be.%s",
@@ -361,8 +369,14 @@ static int rsrc_card_parse_of(struct device_node *node,
 	priv->snd_card.num_links		= num;
 	priv->snd_card.codec_conf		= &priv->codec_conf;
 	priv->snd_card.num_configs		= 1;
-	priv->snd_card.of_dapm_routes		= of_data->routes;
-	priv->snd_card.num_of_dapm_routes	= of_data->num_routes;
+
+	if (of_data) {
+		priv->snd_card.of_dapm_routes		= of_data->routes;
+		priv->snd_card.num_of_dapm_routes	= of_data->num_routes;
+	} else {
+		snd_soc_of_parse_audio_routing(&priv->snd_card,
+					       "audio-routing");
+	}
 
 	/* Parse the card name from DT */
 	snd_soc_of_parse_card_name(&priv->snd_card, "card-name");
-- 
cgit v1.2.3


From 099f3e4adddc8fe9899fb879053887a95e9aed7d Mon Sep 17 00:00:00 2001
From: Bjorn Andersson <bjorn.andersson@sonymobile.com>
Date: Tue, 14 Jul 2015 23:40:33 -0700
Subject: pinctrl: qcom: spmi-mpp: Add support for setting analog output level

When the MPP is configured for analog output the output level is selected by
the AOUT_CTL register, this patch makes it possible to control this.

Signed-off-by: Bjorn Andersson <bjorn.andersson@sonymobile.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
---
 .../devicetree/bindings/pinctrl/qcom,pmic-mpp.txt  |  7 +++++++
 drivers/pinctrl/qcom/pinctrl-spmi-mpp.c            | 23 ++++++++++++++++++++++
 2 files changed, 30 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/pinctrl/qcom,pmic-mpp.txt b/Documentation/devicetree/bindings/pinctrl/qcom,pmic-mpp.txt
index d29fb96a57d3..0e4d4e62e220 100644
--- a/Documentation/devicetree/bindings/pinctrl/qcom,pmic-mpp.txt
+++ b/Documentation/devicetree/bindings/pinctrl/qcom,pmic-mpp.txt
@@ -127,6 +127,13 @@ to specify in a pin configuration subnode:
 	Definition: Selects the power source for the specified pins. Valid power
 		    sources are defined in <dt-bindings/pinctrl/qcom,pmic-mpp.h>
 
+- qcom,analog-level:
+	Usage: optional
+	Value type: <u32>
+	Definition: Selects the source for analog output. Valued values are
+		    defined in <dt-binding/pinctrl/qcom,pmic-mpp.h>
+		    PMIC_MPP_AOUT_LVL_*
+
 - qcom,analog-mode:
 	Usage: optional
 	Value type: <none>
diff --git a/drivers/pinctrl/qcom/pinctrl-spmi-mpp.c b/drivers/pinctrl/qcom/pinctrl-spmi-mpp.c
index 9dde023640ba..e52a72348a67 100644
--- a/drivers/pinctrl/qcom/pinctrl-spmi-mpp.c
+++ b/drivers/pinctrl/qcom/pinctrl-spmi-mpp.c
@@ -61,6 +61,7 @@
 #define PMIC_MPP_REG_DIG_PULL_CTL		0x42
 #define PMIC_MPP_REG_DIG_IN_CTL			0x43
 #define PMIC_MPP_REG_EN_CTL			0x46
+#define PMIC_MPP_REG_AOUT_CTL			0x48
 #define PMIC_MPP_REG_AIN_CTL			0x4a
 #define PMIC_MPP_REG_SINK_CTL			0x4c
 
@@ -100,6 +101,7 @@
 #define PMIC_MPP_CONF_AMUX_ROUTE		(PIN_CONFIG_END + 1)
 #define PMIC_MPP_CONF_ANALOG_MODE		(PIN_CONFIG_END + 2)
 #define PMIC_MPP_CONF_SINK_MODE			(PIN_CONFIG_END + 3)
+#define PMIC_MPP_CONF_ANALOG_LEVEL		(PIN_CONFIG_END + 4)
 
 /**
  * struct pmic_mpp_pad - keep current MPP settings
@@ -115,6 +117,7 @@
  * @num_sources: Number of power-sources supported by this MPP.
  * @power_source: Current power-source used.
  * @amux_input: Set the source for analog input.
+ * @aout_level: Analog output level
  * @pullup: Pullup resistor value. Valid in Bidirectional mode only.
  * @function: See pmic_mpp_functions[].
  * @drive_strength: Amount of current in sink mode
@@ -131,6 +134,7 @@ struct pmic_mpp_pad {
 	unsigned int	num_sources;
 	unsigned int	power_source;
 	unsigned int	amux_input;
+	unsigned int	aout_level;
 	unsigned int	pullup;
 	unsigned int	function;
 	unsigned int	drive_strength;
@@ -145,6 +149,7 @@ struct pmic_mpp_state {
 
 static const struct pinconf_generic_params pmic_mpp_bindings[] = {
 	{"qcom,amux-route",	PMIC_MPP_CONF_AMUX_ROUTE,	0},
+	{"qcom,analog-level",	PMIC_MPP_CONF_ANALOG_LEVEL,	0},
 	{"qcom,analog-mode",	PMIC_MPP_CONF_ANALOG_MODE,	0},
 	{"qcom,sink-mode",	PMIC_MPP_CONF_SINK_MODE,	0},
 };
@@ -152,6 +157,7 @@ static const struct pinconf_generic_params pmic_mpp_bindings[] = {
 #ifdef CONFIG_DEBUG_FS
 static const struct pin_config_item pmic_conf_items[] = {
 	PCONFDUMP(PMIC_MPP_CONF_AMUX_ROUTE, "analog mux", NULL, true),
+	PCONFDUMP(PMIC_MPP_CONF_ANALOG_LEVEL, "analog level", NULL, true),
 	PCONFDUMP(PMIC_MPP_CONF_ANALOG_MODE, "analog output", NULL, false),
 	PCONFDUMP(PMIC_MPP_CONF_SINK_MODE, "sink mode", NULL, false),
 };
@@ -358,6 +364,9 @@ static int pmic_mpp_config_get(struct pinctrl_dev *pctldev,
 	case PIN_CONFIG_DRIVE_STRENGTH:
 		arg = pad->drive_strength;
 		break;
+	case PMIC_MPP_CONF_ANALOG_LEVEL:
+		arg = pad->aout_level;
+		break;
 	case PMIC_MPP_CONF_ANALOG_MODE:
 		arg = pad->analog_mode;
 		break;
@@ -433,6 +442,9 @@ static int pmic_mpp_config_set(struct pinctrl_dev *pctldev, unsigned int pin,
 				return -EINVAL;
 			pad->amux_input = arg;
 			break;
+		case PMIC_MPP_CONF_ANALOG_LEVEL:
+			pad->aout_level = arg;
+			break;
 		case PMIC_MPP_CONF_ANALOG_MODE:
 			pad->analog_mode = !!arg;
 			break;
@@ -462,6 +474,10 @@ static int pmic_mpp_config_set(struct pinctrl_dev *pctldev, unsigned int pin,
 	if (ret < 0)
 		return ret;
 
+	ret = pmic_mpp_write(state, pad, PMIC_MPP_REG_AOUT_CTL, pad->aout_level);
+	if (ret < 0)
+		return ret;
+
 	ret = pmic_mpp_write_mode_ctl(state, pad);
 	if (ret < 0)
 		return ret;
@@ -507,6 +523,7 @@ static void pmic_mpp_config_dbg_show(struct pinctrl_dev *pctldev,
 		seq_printf(s, " %-7s", modes[pad->analog_mode ? 1 : (pad->sink_mode ? 2 : 0)]);
 		seq_printf(s, " %-7s", pmic_mpp_functions[pad->function]);
 		seq_printf(s, " vin-%d", pad->power_source);
+		seq_printf(s, " %d", pad->aout_level);
 		seq_printf(s, " %-8s", biases[pad->pullup]);
 		seq_printf(s, " %-4s", pad->out_value ? "high" : "low");
 	}
@@ -748,6 +765,12 @@ static int pmic_mpp_populate(struct pmic_mpp_state *state,
 
 	pad->drive_strength = val;
 
+	val = pmic_mpp_read(state, pad, PMIC_MPP_REG_AOUT_CTL);
+	if (val < 0)
+		return val;
+
+	pad->aout_level = val;
+
 	val = pmic_mpp_read(state, pad, PMIC_MPP_REG_EN_CTL);
 	if (val < 0)
 		return val;
-- 
cgit v1.2.3


From eb5c144cbbc0ca9bb9a77c7c83fddc87469318de Mon Sep 17 00:00:00 2001
From: Bjorn Andersson <bjorn.andersson@sonymobile.com>
Date: Wed, 17 Jun 2015 23:47:30 -0700
Subject: pinctrl: qcom: spmi-mpp: Transpose pinmux function

The "function" of the MPP driver was inherited from the GPIO driver, but the
differences between the two hardware blocks makes both the driver and the
device tree binding to be awkward.

Instead of overloading the "normal" function with various modes this patch
transposes the pinmux function to represent the three operating modes of the
MPP (digital, analog and current sink). The properties of pin pairing and DTEST
routing is moved to separate properties.

Signed-off-by: Bjorn Andersson <bjorn.andersson@sonymobile.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
---
 .../devicetree/bindings/pinctrl/qcom,pmic-mpp.txt  |  29 ++--
 drivers/pinctrl/qcom/pinctrl-spmi-mpp.c            | 157 ++++++++++++---------
 2 files changed, 99 insertions(+), 87 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/pinctrl/qcom,pmic-mpp.txt b/Documentation/devicetree/bindings/pinctrl/qcom,pmic-mpp.txt
index 0e4d4e62e220..b096d8351b8f 100644
--- a/Documentation/devicetree/bindings/pinctrl/qcom,pmic-mpp.txt
+++ b/Documentation/devicetree/bindings/pinctrl/qcom,pmic-mpp.txt
@@ -77,12 +77,9 @@ to specify in a pin configuration subnode:
 	Value type: <string>
 	Definition: Specify the alternative function to be configured for the
 		    specified pins.  Valid values are:
-		    "normal",
-		    "paired",
-		    "dtest1",
-		    "dtest2",
-		    "dtest3",
-		    "dtest4"
+		    "digital",
+		    "analog",
+		    "sink"
 
 - bias-disable:
 	Usage: optional
@@ -134,17 +131,11 @@ to specify in a pin configuration subnode:
 		    defined in <dt-binding/pinctrl/qcom,pmic-mpp.h>
 		    PMIC_MPP_AOUT_LVL_*
 
-- qcom,analog-mode:
+- qcom,dtest:
 	Usage: optional
-	Value type: <none>
-	Definition: Selects Analog mode of operation: combined with input-enable
-		    and/or output-high, output-low MPP could operate as
-		    Bidirectional Logic, Analog Input, Analog Output.
-
-- qcom,sink-mode:
-	Usage: optional
-	Value type: <u32> or <none>
-	Definition: Selects sink mode of operation
+	Value type: <u32>
+	Definition: Selects which dtest rail to be routed in the various functions.
+		    Valid values are 1-4
 
 - qcom,amux-route:
 	Usage: optional
@@ -152,6 +143,10 @@ to specify in a pin configuration subnode:
 	Definition: Selects the source for analog input. Valid values are
 		    defined in <dt-bindings/pinctrl/qcom,pmic-mpp.h>
 		    PMIC_MPP_AMUX_ROUTE_CH5, PMIC_MPP_AMUX_ROUTE_CH6...
+- qcom,paired:
+	Usage: optional
+	Value type: <none>
+	Definition: Indicates that the pin should be operating in paired mode.
 
 Example:
 
@@ -168,7 +163,7 @@ Example:
 		pm8841_default: default {
 			gpio {
 				pins = "mpp1", "mpp2", "mpp3", "mpp4";
-				function = "normal";
+				function = "digital";
 				input-enable;
 				power-source = <PM8841_MPP_S3>;
 			};
diff --git a/drivers/pinctrl/qcom/pinctrl-spmi-mpp.c b/drivers/pinctrl/qcom/pinctrl-spmi-mpp.c
index e52a72348a67..e3be3ce2cada 100644
--- a/drivers/pinctrl/qcom/pinctrl-spmi-mpp.c
+++ b/drivers/pinctrl/qcom/pinctrl-spmi-mpp.c
@@ -95,13 +95,17 @@
 #define PMIC_MPP_MODE_ANALOG_OUTPUT		5
 #define PMIC_MPP_MODE_CURRENT_SINK		6
 
+#define PMIC_MPP_SELECTOR_NORMAL		0
+#define PMIC_MPP_SELECTOR_PAIRED		1
+#define PMIC_MPP_SELECTOR_DTEST_FIRST		4
+
 #define PMIC_MPP_PHYSICAL_OFFSET		1
 
 /* Qualcomm specific pin configurations */
 #define PMIC_MPP_CONF_AMUX_ROUTE		(PIN_CONFIG_END + 1)
-#define PMIC_MPP_CONF_ANALOG_MODE		(PIN_CONFIG_END + 2)
-#define PMIC_MPP_CONF_SINK_MODE			(PIN_CONFIG_END + 3)
-#define PMIC_MPP_CONF_ANALOG_LEVEL		(PIN_CONFIG_END + 4)
+#define PMIC_MPP_CONF_ANALOG_LEVEL		(PIN_CONFIG_END + 2)
+#define PMIC_MPP_CONF_DTEST_SELECTOR		(PIN_CONFIG_END + 3)
+#define PMIC_MPP_CONF_PAIRED			(PIN_CONFIG_END + 4)
 
 /**
  * struct pmic_mpp_pad - keep current MPP settings
@@ -111,9 +115,7 @@
  * @out_value: Cached pin output value.
  * @output_enabled: Set to true if MPP output logic is enabled.
  * @input_enabled: Set to true if MPP input buffer logic is enabled.
- * @analog_mode: Set to true when MPP should operate in Analog Input, Analog
- *	Output or Bidirectional Analog mode.
- * @sink_mode: Boolean indicating if ink mode is slected
+ * @paired: Pin operates in paired mode
  * @num_sources: Number of power-sources supported by this MPP.
  * @power_source: Current power-source used.
  * @amux_input: Set the source for analog input.
@@ -121,6 +123,7 @@
  * @pullup: Pullup resistor value. Valid in Bidirectional mode only.
  * @function: See pmic_mpp_functions[].
  * @drive_strength: Amount of current in sink mode
+ * @dtest: DTEST route selector
  */
 struct pmic_mpp_pad {
 	u16		base;
@@ -129,8 +132,7 @@ struct pmic_mpp_pad {
 	bool		out_value;
 	bool		output_enabled;
 	bool		input_enabled;
-	bool		analog_mode;
-	bool		sink_mode;
+	bool		paired;
 	unsigned int	num_sources;
 	unsigned int	power_source;
 	unsigned int	amux_input;
@@ -138,6 +140,7 @@ struct pmic_mpp_pad {
 	unsigned int	pullup;
 	unsigned int	function;
 	unsigned int	drive_strength;
+	unsigned int	dtest;
 };
 
 struct pmic_mpp_state {
@@ -150,16 +153,16 @@ struct pmic_mpp_state {
 static const struct pinconf_generic_params pmic_mpp_bindings[] = {
 	{"qcom,amux-route",	PMIC_MPP_CONF_AMUX_ROUTE,	0},
 	{"qcom,analog-level",	PMIC_MPP_CONF_ANALOG_LEVEL,	0},
-	{"qcom,analog-mode",	PMIC_MPP_CONF_ANALOG_MODE,	0},
-	{"qcom,sink-mode",	PMIC_MPP_CONF_SINK_MODE,	0},
+	{"qcom,dtest",		PMIC_MPP_CONF_DTEST_SELECTOR,	0},
+	{"qcom,paired",		PMIC_MPP_CONF_PAIRED,		0},
 };
 
 #ifdef CONFIG_DEBUG_FS
 static const struct pin_config_item pmic_conf_items[] = {
 	PCONFDUMP(PMIC_MPP_CONF_AMUX_ROUTE, "analog mux", NULL, true),
 	PCONFDUMP(PMIC_MPP_CONF_ANALOG_LEVEL, "analog level", NULL, true),
-	PCONFDUMP(PMIC_MPP_CONF_ANALOG_MODE, "analog output", NULL, false),
-	PCONFDUMP(PMIC_MPP_CONF_SINK_MODE, "sink mode", NULL, false),
+	PCONFDUMP(PMIC_MPP_CONF_DTEST_SELECTOR, "dtest", NULL, true),
+	PCONFDUMP(PMIC_MPP_CONF_PAIRED, "paired", NULL, false),
 };
 #endif
 
@@ -167,11 +170,12 @@ static const char *const pmic_mpp_groups[] = {
 	"mpp1", "mpp2", "mpp3", "mpp4", "mpp5", "mpp6", "mpp7", "mpp8",
 };
 
+#define PMIC_MPP_DIGITAL	0
+#define PMIC_MPP_ANALOG		1
+#define PMIC_MPP_SINK		2
+
 static const char *const pmic_mpp_functions[] = {
-	PMIC_MPP_FUNC_NORMAL, PMIC_MPP_FUNC_PAIRED,
-	"reserved1", "reserved2",
-	PMIC_MPP_FUNC_DTEST1, PMIC_MPP_FUNC_DTEST2,
-	PMIC_MPP_FUNC_DTEST3, PMIC_MPP_FUNC_DTEST4,
+	"digital", "analog", "sink"
 };
 
 static inline struct pmic_mpp_state *to_mpp_state(struct gpio_chip *chip)
@@ -260,31 +264,46 @@ static int pmic_mpp_get_function_groups(struct pinctrl_dev *pctldev,
 static int pmic_mpp_write_mode_ctl(struct pmic_mpp_state *state,
 				   struct pmic_mpp_pad *pad)
 {
+	unsigned int mode;
+	unsigned int sel;
 	unsigned int val;
-
-	if (pad->analog_mode) {
-		val = PMIC_MPP_MODE_ANALOG_INPUT;
-		if (pad->output_enabled) {
-			if (pad->input_enabled)
-				val = PMIC_MPP_MODE_ANALOG_BIDIR;
-			else
-				val = PMIC_MPP_MODE_ANALOG_OUTPUT;
-		}
-	} else if (pad->sink_mode) {
-		val = PMIC_MPP_MODE_CURRENT_SINK;
-	} else {
-		val = PMIC_MPP_MODE_DIGITAL_INPUT;
-		if (pad->output_enabled) {
-			if (pad->input_enabled)
-				val = PMIC_MPP_MODE_DIGITAL_BIDIR;
-			else
-				val = PMIC_MPP_MODE_DIGITAL_OUTPUT;
-		}
+	unsigned int en;
+
+	switch (pad->function) {
+	case PMIC_MPP_ANALOG:
+		if (pad->input_enabled && pad->output_enabled)
+			mode = PMIC_MPP_MODE_ANALOG_BIDIR;
+		else if (pad->input_enabled)
+			mode = PMIC_MPP_MODE_ANALOG_INPUT;
+		else
+			mode = PMIC_MPP_MODE_ANALOG_OUTPUT;
+		break;
+	case PMIC_MPP_DIGITAL:
+		if (pad->input_enabled && pad->output_enabled)
+			mode = PMIC_MPP_MODE_DIGITAL_BIDIR;
+		else if (pad->input_enabled)
+			mode = PMIC_MPP_MODE_DIGITAL_INPUT;
+		else
+			mode = PMIC_MPP_MODE_DIGITAL_OUTPUT;
+		break;
+	case PMIC_MPP_SINK:
+	default:
+		mode = PMIC_MPP_MODE_CURRENT_SINK;
+		break;
 	}
 
-	val = val << PMIC_MPP_REG_MODE_DIR_SHIFT;
-	val |= pad->function << PMIC_MPP_REG_MODE_FUNCTION_SHIFT;
-	val |= pad->out_value & PMIC_MPP_REG_MODE_VALUE_MASK;
+	if (pad->dtest)
+		sel = PMIC_MPP_SELECTOR_DTEST_FIRST + pad->dtest - 1;
+	else if (pad->paired)
+		sel = PMIC_MPP_SELECTOR_PAIRED;
+	else
+		sel = PMIC_MPP_SELECTOR_NORMAL;
+
+	en = !!pad->out_value;
+
+	val = mode << PMIC_MPP_REG_MODE_DIR_SHIFT |
+	      sel << PMIC_MPP_REG_MODE_FUNCTION_SHIFT |
+	      en;
 
 	return pmic_mpp_write(state, pad, PMIC_MPP_REG_MODE_CTL, val);
 }
@@ -358,21 +377,21 @@ static int pmic_mpp_config_get(struct pinctrl_dev *pctldev,
 	case PIN_CONFIG_OUTPUT:
 		arg = pad->out_value;
 		break;
+	case PMIC_MPP_CONF_DTEST_SELECTOR:
+		arg = pad->dtest;
+		break;
 	case PMIC_MPP_CONF_AMUX_ROUTE:
 		arg = pad->amux_input;
 		break;
+	case PMIC_MPP_CONF_PAIRED:
+		arg = pad->paired;
+		break;
 	case PIN_CONFIG_DRIVE_STRENGTH:
 		arg = pad->drive_strength;
 		break;
 	case PMIC_MPP_CONF_ANALOG_LEVEL:
 		arg = pad->aout_level;
 		break;
-	case PMIC_MPP_CONF_ANALOG_MODE:
-		arg = pad->analog_mode;
-		break;
-	case PMIC_MPP_CONF_SINK_MODE:
-		arg = pad->sink_mode;
-		break;
 	default:
 		return -EINVAL;
 	}
@@ -434,6 +453,9 @@ static int pmic_mpp_config_set(struct pinctrl_dev *pctldev, unsigned int pin,
 			pad->output_enabled = true;
 			pad->out_value = arg;
 			break;
+		case PMIC_MPP_CONF_DTEST_SELECTOR:
+			pad->dtest = arg;
+			break;
 		case PIN_CONFIG_DRIVE_STRENGTH:
 			arg = pad->drive_strength;
 			break;
@@ -445,11 +467,8 @@ static int pmic_mpp_config_set(struct pinctrl_dev *pctldev, unsigned int pin,
 		case PMIC_MPP_CONF_ANALOG_LEVEL:
 			pad->aout_level = arg;
 			break;
-		case PMIC_MPP_CONF_ANALOG_MODE:
-			pad->analog_mode = !!arg;
-			break;
-		case PMIC_MPP_CONF_SINK_MODE:
-			pad->sink_mode = !!arg;
+		case PMIC_MPP_CONF_PAIRED:
+			pad->paired = !!arg;
 			break;
 		default:
 			return -EINVAL;
@@ -498,10 +517,6 @@ static void pmic_mpp_config_dbg_show(struct pinctrl_dev *pctldev,
 		"0.6kOhm", "10kOhm", "30kOhm", "Disabled"
 	};
 
-	static const char *const modes[] = {
-		"digital", "analog", "sink"
-	};
-
 	pad = pctldev->desc->pins[pin].drv_data;
 
 	seq_printf(s, " mpp%-2d:", pin + PMIC_MPP_PHYSICAL_OFFSET);
@@ -520,12 +535,15 @@ static void pmic_mpp_config_dbg_show(struct pinctrl_dev *pctldev,
 		}
 
 		seq_printf(s, " %-4s", pad->output_enabled ? "out" : "in");
-		seq_printf(s, " %-7s", modes[pad->analog_mode ? 1 : (pad->sink_mode ? 2 : 0)]);
 		seq_printf(s, " %-7s", pmic_mpp_functions[pad->function]);
 		seq_printf(s, " vin-%d", pad->power_source);
 		seq_printf(s, " %d", pad->aout_level);
 		seq_printf(s, " %-8s", biases[pad->pullup]);
 		seq_printf(s, " %-4s", pad->out_value ? "high" : "low");
+		if (pad->dtest)
+			seq_printf(s, " dtest%d", pad->dtest);
+		if (pad->paired)
+			seq_puts(s, " paired");
 	}
 }
 
@@ -646,6 +664,7 @@ static int pmic_mpp_populate(struct pmic_mpp_state *state,
 			     struct pmic_mpp_pad *pad)
 {
 	int type, subtype, val, dir;
+	unsigned int sel;
 
 	type = pmic_mpp_read(state, pad, PMIC_MPP_REG_TYPE);
 	if (type < 0)
@@ -691,52 +710,50 @@ static int pmic_mpp_populate(struct pmic_mpp_state *state,
 	case PMIC_MPP_MODE_DIGITAL_INPUT:
 		pad->input_enabled = true;
 		pad->output_enabled = false;
-		pad->analog_mode = false;
-		pad->sink_mode = false;
+		pad->function = PMIC_MPP_DIGITAL;
 		break;
 	case PMIC_MPP_MODE_DIGITAL_OUTPUT:
 		pad->input_enabled = false;
 		pad->output_enabled = true;
-		pad->analog_mode = false;
-		pad->sink_mode = false;
+		pad->function = PMIC_MPP_DIGITAL;
 		break;
 	case PMIC_MPP_MODE_DIGITAL_BIDIR:
 		pad->input_enabled = true;
 		pad->output_enabled = true;
-		pad->analog_mode = false;
-		pad->sink_mode = false;
+		pad->function = PMIC_MPP_DIGITAL;
 		break;
 	case PMIC_MPP_MODE_ANALOG_BIDIR:
 		pad->input_enabled = true;
 		pad->output_enabled = true;
-		pad->analog_mode = true;
-		pad->sink_mode = false;
+		pad->function = PMIC_MPP_ANALOG;
 		break;
 	case PMIC_MPP_MODE_ANALOG_INPUT:
 		pad->input_enabled = true;
 		pad->output_enabled = false;
-		pad->analog_mode = true;
-		pad->sink_mode = false;
+		pad->function = PMIC_MPP_ANALOG;
 		break;
 	case PMIC_MPP_MODE_ANALOG_OUTPUT:
 		pad->input_enabled = false;
 		pad->output_enabled = true;
-		pad->analog_mode = true;
-		pad->sink_mode = false;
+		pad->function = PMIC_MPP_ANALOG;
 		break;
 	case PMIC_MPP_MODE_CURRENT_SINK:
 		pad->input_enabled = false;
 		pad->output_enabled = true;
-		pad->analog_mode = false;
-		pad->sink_mode = true;
+		pad->function = PMIC_MPP_SINK;
 		break;
 	default:
 		dev_err(state->dev, "unknown MPP direction\n");
 		return -ENODEV;
 	}
 
-	pad->function = val >> PMIC_MPP_REG_MODE_FUNCTION_SHIFT;
-	pad->function &= PMIC_MPP_REG_MODE_FUNCTION_MASK;
+	sel = val >> PMIC_MPP_REG_MODE_FUNCTION_SHIFT;
+	sel &= PMIC_MPP_REG_MODE_FUNCTION_MASK;
+
+	if (sel >= PMIC_MPP_SELECTOR_DTEST_FIRST)
+		pad->dtest = sel + 1;
+	else if (sel == PMIC_MPP_SELECTOR_PAIRED)
+		pad->paired = true;
 
 	val = pmic_mpp_read(state, pad, PMIC_MPP_REG_DIG_VIN_CTL);
 	if (val < 0)
-- 
cgit v1.2.3


From 86e46aa80d7456663afeac51971d4234dbc59e5d Mon Sep 17 00:00:00 2001
From: Hans Verkuil <hverkuil@xs4all.nl>
Date: Thu, 2 Jul 2015 10:32:39 -0300
Subject: [media] DocBook: fix media-ioc-device-info.xml type

The documentation had two media_version entries. The second one was a typo and
it should be driver_version instead. Correct this.

Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
---
 Documentation/DocBook/media/v4l/media-ioc-device-info.xml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/DocBook/media/v4l/media-ioc-device-info.xml b/Documentation/DocBook/media/v4l/media-ioc-device-info.xml
index 2ce521419e67..b0a21ac300b8 100644
--- a/Documentation/DocBook/media/v4l/media-ioc-device-info.xml
+++ b/Documentation/DocBook/media/v4l/media-ioc-device-info.xml
@@ -102,7 +102,7 @@
 	  </row>
 	  <row>
 	    <entry>__u32</entry>
-	    <entry><structfield>media_version</structfield></entry>
+	    <entry><structfield>driver_version</structfield></entry>
 	    <entry>Media device driver version, formatted with the
 	    <constant>KERNEL_VERSION()</constant> macro. Together with the
 	    <structfield>driver</structfield> field this identifies a particular
-- 
cgit v1.2.3


From ee5da769b3f238f818045e7630ed9ee3788690bc Mon Sep 17 00:00:00 2001
From: Hans Verkuil <hverkuil@xs4all.nl>
Date: Wed, 8 Jul 2015 05:47:08 -0300
Subject: [media] DocBook media: fix typo in V4L2_CTRL_FLAG_EXECUTE_ON_WRITE

Fix small typo (missing 'it') in the documentation for
V4L2_CTRL_FLAG_EXECUTE_ON_WRITE.

Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Acked-by: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
---
 Documentation/DocBook/media/v4l/vidioc-queryctrl.xml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/DocBook/media/v4l/vidioc-queryctrl.xml b/Documentation/DocBook/media/v4l/vidioc-queryctrl.xml
index dc83ad70f8dc..6ec39c698baf 100644
--- a/Documentation/DocBook/media/v4l/vidioc-queryctrl.xml
+++ b/Documentation/DocBook/media/v4l/vidioc-queryctrl.xml
@@ -616,7 +616,7 @@ pointer to memory containing the payload of the control.</entry>
 	    <entry><constant>V4L2_CTRL_FLAG_EXECUTE_ON_WRITE</constant></entry>
 	    <entry>0x0200</entry>
 	    <entry>The value provided to the control will be propagated to the driver
-even if remains constant. This is required when the control represents an action
+even if it remains constant. This is required when the control represents an action
 on the hardware. For example: clearing an error flag or triggering the flash. All the
 controls of the type <constant>V4L2_CTRL_TYPE_BUTTON</constant> have this flag set.</entry>
 	  </row>
-- 
cgit v1.2.3


From 0034af036554c39eefd14d835a8ec3496ac46712 Mon Sep 17 00:00:00 2001
From: Jens Axboe <axboe@fb.com>
Date: Thu, 16 Jul 2015 09:14:26 -0600
Subject: block: make /sys/block/<dev>/queue/discard_max_bytes writeable

Lots of devices support huge discard sizes these days. Depending
on how the device handles them internally, huge discards can
introduce massive latencies (hundreds of msec) on the device side.

We have a sysfs file, discard_max_bytes, that advertises the max
hardware supported discard size. Make this writeable, and split
the settings into a soft and hard limit. This can be set from
'discard_granularity' and up to the hardware limit.

Add a new sysfs file, 'discard_max_hw_bytes', that shows the hw
set limit.

Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
---
 Documentation/block/queue-sysfs.txt | 10 +++++++++-
 block/blk-settings.c                |  4 ++++
 block/blk-sysfs.c                   | 40 ++++++++++++++++++++++++++++++++++++-
 include/linux/blkdev.h              |  1 +
 4 files changed, 53 insertions(+), 2 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/block/queue-sysfs.txt b/Documentation/block/queue-sysfs.txt
index 3a29f8914df9..e5d914845be6 100644
--- a/Documentation/block/queue-sysfs.txt
+++ b/Documentation/block/queue-sysfs.txt
@@ -20,7 +20,7 @@ This shows the size of internal allocation of the device in bytes, if
 reported by the device. A value of '0' means device does not support
 the discard functionality.
 
-discard_max_bytes (RO)
+discard_max_hw_bytes (RO)
 ----------------------
 Devices that support discard functionality may have internal limits on
 the number of bytes that can be trimmed or unmapped in a single operation.
@@ -29,6 +29,14 @@ number of bytes that can be discarded in a single operation. Discard
 requests issued to the device must not exceed this limit. A discard_max_bytes
 value of 0 means that the device does not support discard functionality.
 
+discard_max_bytes (RW)
+----------------------
+While discard_max_hw_bytes is the hardware limit for the device, this
+setting is the software limit. Some devices exhibit large latencies when
+large discards are issued, setting this value lower will make Linux issue
+smaller discards and potentially help reduce latencies induced by large
+discard operations.
+
 discard_zeroes_data (RO)
 ------------------------
 When read, this file will show if the discarded block are zeroed by the
diff --git a/block/blk-settings.c b/block/blk-settings.c
index 12600bfffca9..b38d8d723276 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -116,6 +116,7 @@ void blk_set_default_limits(struct queue_limits *lim)
 	lim->chunk_sectors = 0;
 	lim->max_write_same_sectors = 0;
 	lim->max_discard_sectors = 0;
+	lim->max_hw_discard_sectors = 0;
 	lim->discard_granularity = 0;
 	lim->discard_alignment = 0;
 	lim->discard_misaligned = 0;
@@ -303,6 +304,7 @@ EXPORT_SYMBOL(blk_queue_chunk_sectors);
 void blk_queue_max_discard_sectors(struct request_queue *q,
 		unsigned int max_discard_sectors)
 {
+	q->limits.max_hw_discard_sectors = max_discard_sectors;
 	q->limits.max_discard_sectors = max_discard_sectors;
 }
 EXPORT_SYMBOL(blk_queue_max_discard_sectors);
@@ -641,6 +643,8 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
 
 		t->max_discard_sectors = min_not_zero(t->max_discard_sectors,
 						      b->max_discard_sectors);
+		t->max_hw_discard_sectors = min_not_zero(t->max_hw_discard_sectors,
+							 b->max_hw_discard_sectors);
 		t->discard_granularity = max(t->discard_granularity,
 					     b->discard_granularity);
 		t->discard_alignment = lcm_not_zero(t->discard_alignment, alignment) %
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 6264b382d4d1..b1f34e463c0f 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -145,12 +145,43 @@ static ssize_t queue_discard_granularity_show(struct request_queue *q, char *pag
 	return queue_var_show(q->limits.discard_granularity, page);
 }
 
+static ssize_t queue_discard_max_hw_show(struct request_queue *q, char *page)
+{
+	unsigned long long val;
+
+	val = q->limits.max_hw_discard_sectors << 9;
+	return sprintf(page, "%llu\n", val);
+}
+
 static ssize_t queue_discard_max_show(struct request_queue *q, char *page)
 {
 	return sprintf(page, "%llu\n",
 		       (unsigned long long)q->limits.max_discard_sectors << 9);
 }
 
+static ssize_t queue_discard_max_store(struct request_queue *q,
+				       const char *page, size_t count)
+{
+	unsigned long max_discard;
+	ssize_t ret = queue_var_store(&max_discard, page, count);
+
+	if (ret < 0)
+		return ret;
+
+	if (max_discard & (q->limits.discard_granularity - 1))
+		return -EINVAL;
+
+	max_discard >>= 9;
+	if (max_discard > UINT_MAX)
+		return -EINVAL;
+
+	if (max_discard > q->limits.max_hw_discard_sectors)
+		max_discard = q->limits.max_hw_discard_sectors;
+
+	q->limits.max_discard_sectors = max_discard;
+	return ret;
+}
+
 static ssize_t queue_discard_zeroes_data_show(struct request_queue *q, char *page)
 {
 	return queue_var_show(queue_discard_zeroes_data(q), page);
@@ -360,9 +391,15 @@ static struct queue_sysfs_entry queue_discard_granularity_entry = {
 	.show = queue_discard_granularity_show,
 };
 
+static struct queue_sysfs_entry queue_discard_max_hw_entry = {
+	.attr = {.name = "discard_max_hw_bytes", .mode = S_IRUGO },
+	.show = queue_discard_max_hw_show,
+};
+
 static struct queue_sysfs_entry queue_discard_max_entry = {
-	.attr = {.name = "discard_max_bytes", .mode = S_IRUGO },
+	.attr = {.name = "discard_max_bytes", .mode = S_IRUGO | S_IWUSR },
 	.show = queue_discard_max_show,
+	.store = queue_discard_max_store,
 };
 
 static struct queue_sysfs_entry queue_discard_zeroes_data_entry = {
@@ -421,6 +458,7 @@ static struct attribute *default_attrs[] = {
 	&queue_io_opt_entry.attr,
 	&queue_discard_granularity_entry.attr,
 	&queue_discard_max_entry.attr,
+	&queue_discard_max_hw_entry.attr,
 	&queue_discard_zeroes_data_entry.attr,
 	&queue_write_same_max_entry.attr,
 	&queue_nonrot_entry.attr,
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index d4068c17d0df..243f29e779ec 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -268,6 +268,7 @@ struct queue_limits {
 	unsigned int		io_min;
 	unsigned int		io_opt;
 	unsigned int		max_discard_sectors;
+	unsigned int		max_hw_discard_sectors;
 	unsigned int		max_write_same_sectors;
 	unsigned int		discard_granularity;
 	unsigned int		discard_alignment;
-- 
cgit v1.2.3


From 18eb5e9fb2936cb5d4b6899a102e97d84547f2f5 Mon Sep 17 00:00:00 2001
From: Joachim Eastwood <manabian@gmail.com>
Date: Mon, 13 Jul 2015 23:20:12 +0200
Subject: doc: dt: add documentation for pl172 memory bindings

Add documentation for configuration and timing setup of
static memory devices on the ARM PL172 controller.

Signed-off-by: Joachim Eastwood <manabian@gmail.com>
Signed-off-by: Olof Johansson <olof@lixom.net>
---
 .../bindings/memory-controllers/arm,pl172.txt      | 125 +++++++++++++++++++++
 1 file changed, 125 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/memory-controllers/arm,pl172.txt

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/memory-controllers/arm,pl172.txt b/Documentation/devicetree/bindings/memory-controllers/arm,pl172.txt
new file mode 100644
index 000000000000..e6df32f9986d
--- /dev/null
+++ b/Documentation/devicetree/bindings/memory-controllers/arm,pl172.txt
@@ -0,0 +1,125 @@
+* Device tree bindings for ARM PL172 MultiPort Memory Controller
+
+Required properties:
+
+- compatible:		"arm,pl172", "arm,primecell"
+
+- reg:			Must contains offset/length value for controller.
+
+- #address-cells:	Must be 2. The partition number has to be encoded in the
+			first address cell and it may accept values 0..N-1
+			(N - total number of partitions). The second cell is the
+			offset into the partition.
+
+- #size-cells:		Must be set to 1.
+
+- ranges:		Must contain one or more chip select memory regions.
+
+- clocks:		Must contain references to controller clocks.
+
+- clock-names:		Must contain "mpmcclk" and "apb_pclk".
+
+- clock-ranges:		Empty property indicating that child nodes can inherit
+			named clocks. Required only if clock tree data present
+			in device tree.
+			See clock-bindings.txt
+
+Child chip-select (cs) nodes contain the memory devices nodes connected to
+such as NOR (e.g. cfi-flash) and NAND.
+
+Required child cs node properties:
+
+- #address-cells:	Must be 2.
+
+- #size-cells:		Must be 1.
+
+- ranges:		Empty property indicating that child nodes can inherit
+			memory layout.
+
+- clock-ranges:		Empty property indicating that child nodes can inherit
+			named clocks. Required only if clock tree data present
+			in device tree.
+
+- mpmc,cs:		Chip select number. Indicates to the pl0172 driver
+			which chipselect is used for accessing the memory.
+
+- mpmc,memory-width:	Width of the chip select memory. Must be equal to
+			either 8, 16 or 32.
+
+Optional child cs node config properties:
+
+- mpmc,async-page-mode:	Enable asynchronous page mode.
+
+- mpmc,cs-active-high:	Set chip select polarity to active high.
+
+- mpmc,byte-lane-low:	Set byte lane state to low.
+
+- mpmc,extended-wait:	Enable extended wait.
+
+- mpmc,buffer-enable:	Enable write buffer.
+
+- mpmc,write-protect:	Enable write protect.
+
+Optional child cs node timing properties:
+
+- mpmc,write-enable-delay:	Delay from chip select assertion to write
+				enable (WE signal) in nano seconds.
+
+- mpmc,output-enable-delay:	Delay from chip select assertion to output
+				enable (OE signal) in nano seconds.
+
+- mpmc,write-access-delay:	Delay from chip select assertion to write
+				access in nano seconds.
+
+- mpmc,read-access-delay:	Delay from chip select assertion to read
+				access in nano seconds.
+
+- mpmc,page-mode-read-delay:	Delay for asynchronous page mode sequential
+				accesses in nano seconds.
+
+- mpmc,turn-round-delay:	Delay between access to memory banks in nano
+				seconds.
+
+If any of the above timing parameters are absent, current parameter value will
+be taken from the corresponding HW reg.
+
+Example for pl172 with nor flash on chip select 0 shown below.
+
+emc: memory-controller@40005000 {
+	compatible = "arm,pl172", "arm,primecell";
+	reg = <0x40005000 0x1000>;
+	clocks = <&ccu1 CLK_CPU_EMCDIV>, <&ccu1 CLK_CPU_EMC>;
+	clock-names = "mpmcclk", "apb_pclk";
+	#address-cells = <2>;
+	#size-cells = <1>;
+	ranges = <0 0 0x1c000000 0x1000000
+		  1 0 0x1d000000 0x1000000
+		  2 0 0x1e000000 0x1000000
+		  3 0 0x1f000000 0x1000000>;
+
+	cs0 {
+		#address-cells = <2>;
+		#size-cells = <1>;
+		ranges;
+
+		mpmc,cs = <0>;
+		mpmc,memory-width = <16>;
+		mpmc,byte-lane-low;
+		mpmc,write-enable-delay = <0>;
+		mpmc,output-enable-delay = <0>;
+		mpmc,read-enable-delay = <70>;
+		mpmc,page-mode-read-delay = <70>;
+
+		flash@0,0 {
+			compatible = "sst,sst39vf320", "cfi-flash";
+			reg = <0 0 0x400000>;
+			bank-width = <2>;
+			#address-cells = <1>;
+			#size-cells = <1>;
+			partition@0 {
+				label = "data";
+				reg = <0 0x400000>;
+			};
+		};
+	};
+};
-- 
cgit v1.2.3


From 9269e3c3cfac277a49b485e27ac6850f9a11a259 Mon Sep 17 00:00:00 2001
From: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Date: Wed, 15 Jul 2015 07:17:17 +0000
Subject: ASoC: rsnd: add CTU (Channel Transfer Unit) prototype support

This patch adds CTU (Channel Transfer Unit) support for rsnd driver.
But, it does nothing to data at this point, but is required for MIX
support.

CTU design is a little different from other IPs (CTU0 is including
CTU00 - CTU03, and CTU1 is including CTU10 - CTU13, these have different
register mapping) We need to care about it on this driver.

Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Tested-by: Keita Kobayashi <keita.kobayashi.ym@renesas.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
---
 .../devicetree/bindings/sound/renesas,rsnd.txt     |  14 ++
 include/sound/rcar_snd.h                           |   7 +
 sound/soc/sh/rcar/Makefile                         |   2 +-
 sound/soc/sh/rcar/core.c                           |  12 +-
 sound/soc/sh/rcar/ctu.c                            | 171 +++++++++++++++++++++
 sound/soc/sh/rcar/dma.c                            |  13 +-
 sound/soc/sh/rcar/gen.c                            |   2 +
 sound/soc/sh/rcar/rsnd.h                           |  21 +++
 8 files changed, 236 insertions(+), 6 deletions(-)
 create mode 100644 sound/soc/sh/rcar/ctu.c

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/sound/renesas,rsnd.txt b/Documentation/devicetree/bindings/sound/renesas,rsnd.txt
index b6b3a786855f..278607de05de 100644
--- a/Documentation/devicetree/bindings/sound/renesas,rsnd.txt
+++ b/Documentation/devicetree/bindings/sound/renesas,rsnd.txt
@@ -18,6 +18,9 @@ Required properties:
 - rcar_sound,src		: Should contain SRC feature.
 				  The number of SRC subnode should be same as HW.
 				  see below for detail.
+- rcar_sound,ctu		: Should contain CTU feature.
+				  The number of CTU subnode should be same as HW.
+				  see below for detail.
 - rcar_sound,dvc		: Should contain DVC feature.
 				  The number of DVC subnode should be same as HW.
 				  see below for detail.
@@ -90,6 +93,17 @@ rcar_sound: sound@ec500000 {
 		};
 	};
 
+	rcar_sound,ctu {
+		ctu00: ctu@0 { };
+		ctu01: ctu@1 { };
+		ctu02: ctu@2 { };
+		ctu03: ctu@3 { };
+		ctu10: ctu@4 { };
+		ctu11: ctu@5 { };
+		ctu12: ctu@6 { };
+		ctu13: ctu@7 { };
+	};
+
 	rcar_sound,src {
 		src0: src@0 {
 			interrupts = <0 352 IRQ_TYPE_LEVEL_HIGH>;
diff --git a/include/sound/rcar_snd.h b/include/sound/rcar_snd.h
index 4cecd0c175f6..8f9303093ab9 100644
--- a/include/sound/rcar_snd.h
+++ b/include/sound/rcar_snd.h
@@ -61,6 +61,10 @@ struct rsnd_src_platform_info {
 /*
  * flags
  */
+struct rsnd_ctu_platform_info {
+	u32 flags;
+};
+
 struct rsnd_dvc_platform_info {
 	u32 flags;
 };
@@ -68,6 +72,7 @@ struct rsnd_dvc_platform_info {
 struct rsnd_dai_path_info {
 	struct rsnd_ssi_platform_info *ssi;
 	struct rsnd_src_platform_info *src;
+	struct rsnd_ctu_platform_info *ctu;
 	struct rsnd_dvc_platform_info *dvc;
 };
 
@@ -93,6 +98,8 @@ struct rcar_snd_info {
 	int ssi_info_nr;
 	struct rsnd_src_platform_info *src_info;
 	int src_info_nr;
+	struct rsnd_ctu_platform_info *ctu_info;
+	int ctu_info_nr;
 	struct rsnd_dvc_platform_info *dvc_info;
 	int dvc_info_nr;
 	struct rsnd_dai_platform_info *dai_info;
diff --git a/sound/soc/sh/rcar/Makefile b/sound/soc/sh/rcar/Makefile
index 3a274fd3593c..7c4730a81c4a 100644
--- a/sound/soc/sh/rcar/Makefile
+++ b/sound/soc/sh/rcar/Makefile
@@ -1,4 +1,4 @@
-snd-soc-rcar-objs	:= core.o gen.o dma.o adg.o ssi.o src.o dvc.o
+snd-soc-rcar-objs	:= core.o gen.o dma.o adg.o ssi.o src.o ctu.o dvc.o
 obj-$(CONFIG_SND_SOC_RCAR)	+= snd-soc-rcar.o
 
 snd-soc-rsrc-card-objs	:= rsrc-card.o
diff --git a/sound/soc/sh/rcar/core.c b/sound/soc/sh/rcar/core.c
index e20d8ea0aafe..63ae7bbfd1dc 100644
--- a/sound/soc/sh/rcar/core.c
+++ b/sound/soc/sh/rcar/core.c
@@ -651,6 +651,11 @@ static int rsnd_path_init(struct rsnd_priv *priv,
 	if (ret < 0)
 		return ret;
 
+	/* CTU */
+	ret = rsnd_path_add(priv, io, ctu);
+	if (ret < 0)
+		return ret;
+
 	/* DVC */
 	ret = rsnd_path_add(priv, io, dvc);
 	if (ret < 0)
@@ -666,13 +671,14 @@ static void rsnd_of_parse_dai(struct platform_device *pdev,
 	struct device_node *dai_node,	*dai_np;
 	struct device_node *ssi_node,	*ssi_np;
 	struct device_node *src_node,	*src_np;
+	struct device_node *ctu_node,	*ctu_np;
 	struct device_node *dvc_node,	*dvc_np;
 	struct device_node *playback, *capture;
 	struct rsnd_dai_platform_info *dai_info;
 	struct rcar_snd_info *info = rsnd_priv_to_info(priv);
 	struct device *dev = &pdev->dev;
 	int nr, i;
-	int dai_i, ssi_i, src_i, dvc_i;
+	int dai_i, ssi_i, src_i, ctu_i, dvc_i;
 
 	if (!of_data)
 		return;
@@ -698,6 +704,7 @@ static void rsnd_of_parse_dai(struct platform_device *pdev,
 
 	ssi_node = of_get_child_by_name(dev->of_node, "rcar_sound,ssi");
 	src_node = of_get_child_by_name(dev->of_node, "rcar_sound,src");
+	ctu_node = of_get_child_by_name(dev->of_node, "rcar_sound,ctu");
 	dvc_node = of_get_child_by_name(dev->of_node, "rcar_sound,dvc");
 
 #define mod_parse(name)							\
@@ -734,6 +741,7 @@ if (name##_node) {							\
 
 			mod_parse(ssi);
 			mod_parse(src);
+			mod_parse(ctu);
 			mod_parse(dvc);
 
 			of_node_put(playback);
@@ -1146,6 +1154,7 @@ static int rsnd_probe(struct platform_device *pdev)
 		rsnd_dma_probe,
 		rsnd_ssi_probe,
 		rsnd_src_probe,
+		rsnd_ctu_probe,
 		rsnd_dvc_probe,
 		rsnd_adg_probe,
 		rsnd_dai_probe,
@@ -1241,6 +1250,7 @@ static int rsnd_remove(struct platform_device *pdev)
 			      struct rsnd_priv *priv) = {
 		rsnd_ssi_remove,
 		rsnd_src_remove,
+		rsnd_ctu_remove,
 		rsnd_dvc_remove,
 	};
 	int ret = 0, i;
diff --git a/sound/soc/sh/rcar/ctu.c b/sound/soc/sh/rcar/ctu.c
new file mode 100644
index 000000000000..05edd2009b81
--- /dev/null
+++ b/sound/soc/sh/rcar/ctu.c
@@ -0,0 +1,171 @@
+/*
+ * ctu.c
+ *
+ * Copyright (c) 2015 Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#include "rsnd.h"
+
+#define CTU_NAME_SIZE	16
+#define CTU_NAME "ctu"
+
+struct rsnd_ctu {
+	struct rsnd_ctu_platform_info *info; /* rcar_snd.h */
+	struct rsnd_mod mod;
+};
+
+#define rsnd_ctu_nr(priv) ((priv)->ctu_nr)
+#define for_each_rsnd_ctu(pos, priv, i)					\
+	for ((i) = 0;							\
+	     ((i) < rsnd_ctu_nr(priv)) &&				\
+		     ((pos) = (struct rsnd_ctu *)(priv)->ctu + i);	\
+	     i++)
+
+#define rsnd_ctu_initialize_lock(mod)	__rsnd_ctu_initialize_lock(mod, 1)
+#define rsnd_ctu_initialize_unlock(mod)	__rsnd_ctu_initialize_lock(mod, 0)
+static void __rsnd_ctu_initialize_lock(struct rsnd_mod *mod, u32 enable)
+{
+	rsnd_mod_write(mod, CTU_CTUIR, enable);
+}
+
+static int rsnd_ctu_init(struct rsnd_mod *mod,
+			 struct rsnd_dai_stream *io,
+			 struct rsnd_priv *priv)
+{
+	rsnd_mod_hw_start(mod);
+
+	rsnd_ctu_initialize_lock(mod);
+
+	rsnd_mod_write(mod, CTU_ADINR, rsnd_get_adinr_chan(mod, io));
+
+	rsnd_ctu_initialize_unlock(mod);
+
+	return 0;
+}
+
+static int rsnd_ctu_quit(struct rsnd_mod *mod,
+			 struct rsnd_dai_stream *io,
+			 struct rsnd_priv *priv)
+{
+	rsnd_mod_hw_stop(mod);
+
+	return 0;
+}
+
+static struct rsnd_mod_ops rsnd_ctu_ops = {
+	.name		= CTU_NAME,
+	.init		= rsnd_ctu_init,
+	.quit		= rsnd_ctu_quit,
+};
+
+struct rsnd_mod *rsnd_ctu_mod_get(struct rsnd_priv *priv, int id)
+{
+	if (WARN_ON(id < 0 || id >= rsnd_ctu_nr(priv)))
+		id = 0;
+
+	return &((struct rsnd_ctu *)(priv->ctu) + id)->mod;
+}
+
+void rsnd_of_parse_ctu(struct platform_device *pdev,
+		       const struct rsnd_of_data *of_data,
+		       struct rsnd_priv *priv)
+{
+	struct device_node *node;
+	struct rsnd_ctu_platform_info *ctu_info;
+	struct rcar_snd_info *info = rsnd_priv_to_info(priv);
+	struct device *dev = &pdev->dev;
+	int nr;
+
+	if (!of_data)
+		return;
+
+	node = of_get_child_by_name(dev->of_node, "rcar_sound,ctu");
+	if (!node)
+		return;
+
+	nr = of_get_child_count(node);
+	if (!nr)
+		goto rsnd_of_parse_ctu_end;
+
+	ctu_info = devm_kzalloc(dev,
+				sizeof(struct rsnd_ctu_platform_info) * nr,
+				GFP_KERNEL);
+	if (!ctu_info) {
+		dev_err(dev, "ctu info allocation error\n");
+		goto rsnd_of_parse_ctu_end;
+	}
+
+	info->ctu_info		= ctu_info;
+	info->ctu_info_nr	= nr;
+
+rsnd_of_parse_ctu_end:
+	of_node_put(node);
+
+}
+
+int rsnd_ctu_probe(struct platform_device *pdev,
+		   const struct rsnd_of_data *of_data,
+		   struct rsnd_priv *priv)
+{
+	struct rcar_snd_info *info = rsnd_priv_to_info(priv);
+	struct device *dev = rsnd_priv_to_dev(priv);
+	struct rsnd_ctu *ctu;
+	struct clk *clk;
+	char name[CTU_NAME_SIZE];
+	int i, nr, ret;
+
+	/* This driver doesn't support Gen1 at this point */
+	if (rsnd_is_gen1(priv)) {
+		dev_warn(dev, "CTU is not supported on Gen1\n");
+		return -EINVAL;
+	}
+
+	rsnd_of_parse_ctu(pdev, of_data, priv);
+
+	nr = info->ctu_info_nr;
+	if (!nr)
+		return 0;
+
+	ctu = devm_kzalloc(dev, sizeof(*ctu) * nr, GFP_KERNEL);
+	if (!ctu)
+		return -ENOMEM;
+
+	priv->ctu_nr	= nr;
+	priv->ctu	= ctu;
+
+	for_each_rsnd_ctu(ctu, priv, i) {
+		/*
+		 * CTU00, CTU01, CTU02, CTU03 => CTU0
+		 * CTU10, CTU11, CTU12, CTU13 => CTU1
+		 */
+		snprintf(name, CTU_NAME_SIZE, "%s.%d",
+			 CTU_NAME, i / 4);
+
+		clk = devm_clk_get(dev, name);
+		if (IS_ERR(clk))
+			return PTR_ERR(clk);
+
+		ctu->info = &info->ctu_info[i];
+
+		ret = rsnd_mod_init(priv, &ctu->mod, &rsnd_ctu_ops,
+				    clk, RSND_MOD_CTU, i);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
+void rsnd_ctu_remove(struct platform_device *pdev,
+		     struct rsnd_priv *priv)
+{
+	struct rsnd_ctu *ctu;
+	int i;
+
+	for_each_rsnd_ctu(ctu, priv, i) {
+		rsnd_mod_quit(&ctu->mod);
+	}
+}
diff --git a/sound/soc/sh/rcar/dma.c b/sound/soc/sh/rcar/dma.c
index 23282f48f71f..229b68d2cf70 100644
--- a/sound/soc/sh/rcar/dma.c
+++ b/sound/soc/sh/rcar/dma.c
@@ -426,7 +426,8 @@ rsnd_gen2_dma_addr(struct rsnd_dai_stream *io,
 	phys_addr_t src_reg = rsnd_gen_get_phy_addr(priv, RSND_GEN2_SCU);
 	int is_ssi = !!(rsnd_io_to_mod_ssi(io) == mod);
 	int use_src = !!rsnd_io_to_mod_src(io);
-	int use_dvc = !!rsnd_io_to_mod_dvc(io);
+	int use_cmd = !!rsnd_io_to_mod_dvc(io) ||
+		      !!rsnd_io_to_mod_ctu(io);
 	int id = rsnd_mod_id(mod);
 	struct dma_addr {
 		dma_addr_t out_addr;
@@ -464,7 +465,7 @@ rsnd_gen2_dma_addr(struct rsnd_dai_stream *io,
 	};
 
 	/* it shouldn't happen */
-	if (use_dvc && !use_src)
+	if (use_cmd && !use_src)
 		dev_err(dev, "DVC is selected without SRC\n");
 
 	/* use SSIU or SSI ? */
@@ -472,8 +473,8 @@ rsnd_gen2_dma_addr(struct rsnd_dai_stream *io,
 		is_ssi++;
 
 	return (is_from) ?
-		dma_addrs[is_ssi][is_play][use_src + use_dvc].out_addr :
-		dma_addrs[is_ssi][is_play][use_src + use_dvc].in_addr;
+		dma_addrs[is_ssi][is_play][use_src + use_cmd].out_addr :
+		dma_addrs[is_ssi][is_play][use_src + use_cmd].in_addr;
 }
 
 static dma_addr_t rsnd_dma_addr(struct rsnd_dai_stream *io,
@@ -504,6 +505,7 @@ static void rsnd_dma_of_path(struct rsnd_dma *dma,
 	struct rsnd_mod *this = rsnd_dma_to_mod(dma);
 	struct rsnd_mod *ssi = rsnd_io_to_mod_ssi(io);
 	struct rsnd_mod *src = rsnd_io_to_mod_src(io);
+	struct rsnd_mod *ctu = rsnd_io_to_mod_ctu(io);
 	struct rsnd_mod *dvc = rsnd_io_to_mod_dvc(io);
 	struct rsnd_mod *mod[MOD_MAX];
 	struct rsnd_mod *mod_start, *mod_end;
@@ -543,6 +545,9 @@ static void rsnd_dma_of_path(struct rsnd_dma *dma,
 		if (src) {
 			mod[i] = src;
 			src = NULL;
+		} else if (ctu) {
+			mod[i] = ctu;
+			ctu = NULL;
 		} else if (dvc) {
 			mod[i] = dvc;
 			dvc = NULL;
diff --git a/sound/soc/sh/rcar/gen.c b/sound/soc/sh/rcar/gen.c
index a2d5df4d5d17..41b75cd4e09b 100644
--- a/sound/soc/sh/rcar/gen.c
+++ b/sound/soc/sh/rcar/gen.c
@@ -240,6 +240,8 @@ static int rsnd_gen2_probe(struct platform_device *pdev,
 		RSND_GEN_M_REG(SRC_SRCCR,	0x224,	0x40),
 		RSND_GEN_M_REG(SRC_BSDSR,	0x22c,	0x40),
 		RSND_GEN_M_REG(SRC_BSISR,	0x238,	0x40),
+		RSND_GEN_M_REG(CTU_CTUIR,	0x504,	0x100),
+		RSND_GEN_M_REG(CTU_ADINR,	0x508,	0x100),
 		RSND_GEN_M_REG(DVC_SWRSR,	0xe00,	0x100),
 		RSND_GEN_M_REG(DVC_DVUIR,	0xe04,	0x100),
 		RSND_GEN_M_REG(DVC_ADINR,	0xe08,	0x100),
diff --git a/sound/soc/sh/rcar/rsnd.h b/sound/soc/sh/rcar/rsnd.h
index 7fee2079ba5a..f2128a7cf259 100644
--- a/sound/soc/sh/rcar/rsnd.h
+++ b/sound/soc/sh/rcar/rsnd.h
@@ -47,6 +47,8 @@ enum rsnd_reg {
 	RSND_REG_SCU_SYS_STATUS0,
 	RSND_REG_SCU_SYS_INT_EN0,
 	RSND_REG_CMD_ROUTE_SLCT,
+	RSND_REG_CTU_CTUIR,
+	RSND_REG_CTU_ADINR,
 	RSND_REG_DVC_SWRSR,
 	RSND_REG_DVC_DVUIR,
 	RSND_REG_DVC_ADINR,
@@ -220,6 +222,7 @@ struct dma_chan *rsnd_dma_request_channel(struct device_node *of_node,
  */
 enum rsnd_mod_type {
 	RSND_MOD_DVC = 0,
+	RSND_MOD_CTU,
 	RSND_MOD_SRC,
 	RSND_MOD_SSI,
 	RSND_MOD_MAX,
@@ -351,6 +354,7 @@ struct rsnd_dai_stream {
 #define rsnd_io_to_mod(io, i)	((i) < RSND_MOD_MAX ? (io)->mod[(i)] : NULL)
 #define rsnd_io_to_mod_ssi(io)	rsnd_io_to_mod((io), RSND_MOD_SSI)
 #define rsnd_io_to_mod_src(io)	rsnd_io_to_mod((io), RSND_MOD_SRC)
+#define rsnd_io_to_mod_ctu(io)	rsnd_io_to_mod((io), RSND_MOD_CTU)
 #define rsnd_io_to_mod_dvc(io)	rsnd_io_to_mod((io), RSND_MOD_DVC)
 #define rsnd_io_to_rdai(io)	((io)->rdai)
 #define rsnd_io_to_priv(io)	(rsnd_rdai_to_priv(rsnd_io_to_rdai(io)))
@@ -462,6 +466,12 @@ struct rsnd_priv {
 	void *src;
 	int src_nr;
 
+	/*
+	 * below value will be filled on rsnd_ctu_probe()
+	 */
+	void *ctu;
+	int ctu_nr;
+
 	/*
 	 * below value will be filled on rsnd_dvc_probe()
 	 */
@@ -567,6 +577,17 @@ int rsnd_src_ssiu_stop(struct rsnd_mod *ssi_mod,
 int rsnd_src_ssi_irq_enable(struct rsnd_mod *ssi_mod);
 int rsnd_src_ssi_irq_disable(struct rsnd_mod *ssi_mod);
 
+/*
+ *	R-Car CTU
+ */
+int rsnd_ctu_probe(struct platform_device *pdev,
+		   const struct rsnd_of_data *of_data,
+		   struct rsnd_priv *priv);
+
+void rsnd_ctu_remove(struct platform_device *pdev,
+		     struct rsnd_priv *priv);
+struct rsnd_mod *rsnd_ctu_mod_get(struct rsnd_priv *priv, int id);
+
 /*
  *	R-Car DVC
  */
-- 
cgit v1.2.3


From 70fb10529f61c31c26397a02091177bedd23217d Mon Sep 17 00:00:00 2001
From: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Date: Wed, 15 Jul 2015 07:17:36 +0000
Subject: ASoC: rsnd: add MIX (Mixer) support

This patch adds MIX (Mixer) initial support for rsnd driver.
It is assuming that this MIX is used via DPCM.

This is sample code for playback.

	CPU0  : [MEM] -> [SRC1] -> [CTU02] -+
					    |
					    +-> [MIX0] -> [DVC0] -> [SSI0]
	                                    |
	CPU1  : [MEM] -> [SRC2] -> [CTU03] -+

	sound {
		compatible = "renesas,rsrc-card";

		...

		cpu@0 {
			sound-dai = <&rcar_sound 0>;
		};

		cpu@1 {
			sound-dai = <&rcar_sound 1>;
		};

		codec {
			...
		};
	};

	rcar_sound {

		...

		rcar_sound,dai {
			dai0 {
				playback = <&src1 &ctu02 &mix0 &dvc0 &ssi0>;
			};
			dai1 {
				playback = <&src2 &ctu03 &mix0 &dvc0 &ssi0>;
			};
		};
	};

Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Tested-by: Keita Kobayashi <keita.kobayashi.ym@renesas.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
---
 .../devicetree/bindings/sound/renesas,rsnd.txt     |   8 +
 include/sound/rcar_snd.h                           |   7 +
 sound/soc/sh/rcar/Makefile                         |   2 +-
 sound/soc/sh/rcar/core.c                           |  85 +++++++--
 sound/soc/sh/rcar/dma.c                            |   5 +
 sound/soc/sh/rcar/gen.c                            |  10 ++
 sound/soc/sh/rcar/mix.c                            | 200 +++++++++++++++++++++
 sound/soc/sh/rcar/rsnd.h                           |  29 +++
 8 files changed, 333 insertions(+), 13 deletions(-)
 create mode 100644 sound/soc/sh/rcar/mix.c

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/sound/renesas,rsnd.txt b/Documentation/devicetree/bindings/sound/renesas,rsnd.txt
index 278607de05de..1173395b5e5c 100644
--- a/Documentation/devicetree/bindings/sound/renesas,rsnd.txt
+++ b/Documentation/devicetree/bindings/sound/renesas,rsnd.txt
@@ -21,6 +21,9 @@ Required properties:
 - rcar_sound,ctu		: Should contain CTU feature.
 				  The number of CTU subnode should be same as HW.
 				  see below for detail.
+- rcar_sound,mix		: Should contain MIX feature.
+				  The number of MIX subnode should be same as HW.
+				  see below for detail.
 - rcar_sound,dvc		: Should contain DVC feature.
 				  The number of DVC subnode should be same as HW.
 				  see below for detail.
@@ -93,6 +96,11 @@ rcar_sound: sound@ec500000 {
 		};
 	};
 
+	rcar_sound,mix {
+		mix0: mix@0 { };
+		mix1: mix@1 { };
+	};
+
 	rcar_sound,ctu {
 		ctu00: ctu@0 { };
 		ctu01: ctu@1 { };
diff --git a/include/sound/rcar_snd.h b/include/sound/rcar_snd.h
index 8f9303093ab9..bb7b2ebfee7b 100644
--- a/include/sound/rcar_snd.h
+++ b/include/sound/rcar_snd.h
@@ -65,6 +65,10 @@ struct rsnd_ctu_platform_info {
 	u32 flags;
 };
 
+struct rsnd_mix_platform_info {
+	u32 flags;
+};
+
 struct rsnd_dvc_platform_info {
 	u32 flags;
 };
@@ -73,6 +77,7 @@ struct rsnd_dai_path_info {
 	struct rsnd_ssi_platform_info *ssi;
 	struct rsnd_src_platform_info *src;
 	struct rsnd_ctu_platform_info *ctu;
+	struct rsnd_mix_platform_info *mix;
 	struct rsnd_dvc_platform_info *dvc;
 };
 
@@ -100,6 +105,8 @@ struct rcar_snd_info {
 	int src_info_nr;
 	struct rsnd_ctu_platform_info *ctu_info;
 	int ctu_info_nr;
+	struct rsnd_mix_platform_info *mix_info;
+	int mix_info_nr;
 	struct rsnd_dvc_platform_info *dvc_info;
 	int dvc_info_nr;
 	struct rsnd_dai_platform_info *dai_info;
diff --git a/sound/soc/sh/rcar/Makefile b/sound/soc/sh/rcar/Makefile
index 7c4730a81c4a..8b258501aa35 100644
--- a/sound/soc/sh/rcar/Makefile
+++ b/sound/soc/sh/rcar/Makefile
@@ -1,4 +1,4 @@
-snd-soc-rcar-objs	:= core.o gen.o dma.o adg.o ssi.o src.o ctu.o dvc.o
+snd-soc-rcar-objs	:= core.o gen.o dma.o adg.o ssi.o src.o ctu.o mix.o dvc.o
 obj-$(CONFIG_SND_SOC_RCAR)	+= snd-soc-rcar.o
 
 snd-soc-rsrc-card-objs	:= rsrc-card.o
diff --git a/sound/soc/sh/rcar/core.c b/sound/soc/sh/rcar/core.c
index 63ae7bbfd1dc..927a7b02123b 100644
--- a/sound/soc/sh/rcar/core.c
+++ b/sound/soc/sh/rcar/core.c
@@ -605,23 +605,74 @@ static const struct snd_soc_dai_ops rsnd_soc_dai_ops = {
 void rsnd_path_parse(struct rsnd_priv *priv,
 		     struct rsnd_dai_stream *io)
 {
-	struct rsnd_mod *src = rsnd_io_to_mod_src(io);
 	struct rsnd_mod *dvc = rsnd_io_to_mod_dvc(io);
-	int src_id = rsnd_mod_id(src);
-	u32 path[] = {
-		[0] = 0x30000,
-		[1] = 0x30001,
-		[2] = 0x40000,
-		[3] = 0x10000,
-		[4] = 0x20000,
-		[5] = 0x40100
-	};
+	struct rsnd_mod *mix = rsnd_io_to_mod_mix(io);
+	struct rsnd_mod *src = rsnd_io_to_mod_src(io);
+	struct rsnd_mod *cmd;
+	struct device *dev = rsnd_priv_to_dev(priv);
+	u32 data;
 
 	/* Gen1 is not supported */
 	if (rsnd_is_gen1(priv))
 		return;
 
-	rsnd_mod_write(dvc, CMD_ROUTE_SLCT, path[src_id]);
+	if (!mix && !dvc)
+		return;
+
+	if (mix) {
+		struct rsnd_dai *rdai;
+		int i;
+		u32 path[] = {
+			[0] = 0,
+			[1] = 1 << 0,
+			[2] = 0,
+			[3] = 0,
+			[4] = 0,
+			[5] = 1 << 8
+		};
+
+		/*
+		 * it is assuming that integrater is well understanding about
+		 * data path. Here doesn't check impossible connection,
+		 * like src2 + src5
+		 */
+		data = 0;
+		for_each_rsnd_dai(rdai, priv, i) {
+			io = &rdai->playback;
+			if (mix == rsnd_io_to_mod_mix(io))
+				data |= path[rsnd_mod_id(src)];
+
+			io = &rdai->capture;
+			if (mix == rsnd_io_to_mod_mix(io))
+				data |= path[rsnd_mod_id(src)];
+		}
+
+		/*
+		 * We can't use ctu = rsnd_io_ctu() here.
+		 * Since, ID of dvc/mix are 0 or 1 (= same as CMD number)
+		 * but ctu IDs are 0 - 7 (= CTU00 - CTU13)
+		 */
+		cmd = mix;
+	} else {
+		u32 path[] = {
+			[0] = 0x30000,
+			[1] = 0x30001,
+			[2] = 0x40000,
+			[3] = 0x10000,
+			[4] = 0x20000,
+			[5] = 0x40100
+		};
+
+		data = path[rsnd_mod_id(src)];
+
+		cmd = dvc;
+	}
+
+	dev_dbg(dev, "ctu/mix path = 0x%08x", data);
+
+	rsnd_mod_write(cmd, CMD_ROUTE_SLCT, data);
+
+	rsnd_mod_write(cmd, CMD_CTRL, 0x10);
 }
 
 static int rsnd_path_init(struct rsnd_priv *priv,
@@ -656,6 +707,11 @@ static int rsnd_path_init(struct rsnd_priv *priv,
 	if (ret < 0)
 		return ret;
 
+	/* MIX */
+	ret = rsnd_path_add(priv, io, mix);
+	if (ret < 0)
+		return ret;
+
 	/* DVC */
 	ret = rsnd_path_add(priv, io, dvc);
 	if (ret < 0)
@@ -672,13 +728,14 @@ static void rsnd_of_parse_dai(struct platform_device *pdev,
 	struct device_node *ssi_node,	*ssi_np;
 	struct device_node *src_node,	*src_np;
 	struct device_node *ctu_node,	*ctu_np;
+	struct device_node *mix_node,	*mix_np;
 	struct device_node *dvc_node,	*dvc_np;
 	struct device_node *playback, *capture;
 	struct rsnd_dai_platform_info *dai_info;
 	struct rcar_snd_info *info = rsnd_priv_to_info(priv);
 	struct device *dev = &pdev->dev;
 	int nr, i;
-	int dai_i, ssi_i, src_i, ctu_i, dvc_i;
+	int dai_i, ssi_i, src_i, ctu_i, mix_i, dvc_i;
 
 	if (!of_data)
 		return;
@@ -705,6 +762,7 @@ static void rsnd_of_parse_dai(struct platform_device *pdev,
 	ssi_node = of_get_child_by_name(dev->of_node, "rcar_sound,ssi");
 	src_node = of_get_child_by_name(dev->of_node, "rcar_sound,src");
 	ctu_node = of_get_child_by_name(dev->of_node, "rcar_sound,ctu");
+	mix_node = of_get_child_by_name(dev->of_node, "rcar_sound,mix");
 	dvc_node = of_get_child_by_name(dev->of_node, "rcar_sound,dvc");
 
 #define mod_parse(name)							\
@@ -742,6 +800,7 @@ if (name##_node) {							\
 			mod_parse(ssi);
 			mod_parse(src);
 			mod_parse(ctu);
+			mod_parse(mix);
 			mod_parse(dvc);
 
 			of_node_put(playback);
@@ -1155,6 +1214,7 @@ static int rsnd_probe(struct platform_device *pdev)
 		rsnd_ssi_probe,
 		rsnd_src_probe,
 		rsnd_ctu_probe,
+		rsnd_mix_probe,
 		rsnd_dvc_probe,
 		rsnd_adg_probe,
 		rsnd_dai_probe,
@@ -1251,6 +1311,7 @@ static int rsnd_remove(struct platform_device *pdev)
 		rsnd_ssi_remove,
 		rsnd_src_remove,
 		rsnd_ctu_remove,
+		rsnd_mix_remove,
 		rsnd_dvc_remove,
 	};
 	int ret = 0, i;
diff --git a/sound/soc/sh/rcar/dma.c b/sound/soc/sh/rcar/dma.c
index 229b68d2cf70..305b12929853 100644
--- a/sound/soc/sh/rcar/dma.c
+++ b/sound/soc/sh/rcar/dma.c
@@ -427,6 +427,7 @@ rsnd_gen2_dma_addr(struct rsnd_dai_stream *io,
 	int is_ssi = !!(rsnd_io_to_mod_ssi(io) == mod);
 	int use_src = !!rsnd_io_to_mod_src(io);
 	int use_cmd = !!rsnd_io_to_mod_dvc(io) ||
+		      !!rsnd_io_to_mod_mix(io) ||
 		      !!rsnd_io_to_mod_ctu(io);
 	int id = rsnd_mod_id(mod);
 	struct dma_addr {
@@ -506,6 +507,7 @@ static void rsnd_dma_of_path(struct rsnd_dma *dma,
 	struct rsnd_mod *ssi = rsnd_io_to_mod_ssi(io);
 	struct rsnd_mod *src = rsnd_io_to_mod_src(io);
 	struct rsnd_mod *ctu = rsnd_io_to_mod_ctu(io);
+	struct rsnd_mod *mix = rsnd_io_to_mod_mix(io);
 	struct rsnd_mod *dvc = rsnd_io_to_mod_dvc(io);
 	struct rsnd_mod *mod[MOD_MAX];
 	struct rsnd_mod *mod_start, *mod_end;
@@ -548,6 +550,9 @@ static void rsnd_dma_of_path(struct rsnd_dma *dma,
 		} else if (ctu) {
 			mod[i] = ctu;
 			ctu = NULL;
+		} else if (mix) {
+			mod[i] = mix;
+			mix = NULL;
 		} else if (dvc) {
 			mod[i] = dvc;
 			dvc = NULL;
diff --git a/sound/soc/sh/rcar/gen.c b/sound/soc/sh/rcar/gen.c
index 41b75cd4e09b..f04d17bc6e3d 100644
--- a/sound/soc/sh/rcar/gen.c
+++ b/sound/soc/sh/rcar/gen.c
@@ -242,6 +242,16 @@ static int rsnd_gen2_probe(struct platform_device *pdev,
 		RSND_GEN_M_REG(SRC_BSISR,	0x238,	0x40),
 		RSND_GEN_M_REG(CTU_CTUIR,	0x504,	0x100),
 		RSND_GEN_M_REG(CTU_ADINR,	0x508,	0x100),
+		RSND_GEN_M_REG(MIX_SWRSR,	0xd00,	0x40),
+		RSND_GEN_M_REG(MIX_MIXIR,	0xd04,	0x40),
+		RSND_GEN_M_REG(MIX_ADINR,	0xd08,	0x40),
+		RSND_GEN_M_REG(MIX_MIXMR,	0xd10,	0x40),
+		RSND_GEN_M_REG(MIX_MVPDR,	0xd14,	0x40),
+		RSND_GEN_M_REG(MIX_MDBAR,	0xd18,	0x40),
+		RSND_GEN_M_REG(MIX_MDBBR,	0xd1c,	0x40),
+		RSND_GEN_M_REG(MIX_MDBCR,	0xd20,	0x40),
+		RSND_GEN_M_REG(MIX_MDBDR,	0xd24,	0x40),
+		RSND_GEN_M_REG(MIX_MDBER,	0xd28,	0x40),
 		RSND_GEN_M_REG(DVC_SWRSR,	0xe00,	0x100),
 		RSND_GEN_M_REG(DVC_DVUIR,	0xe04,	0x100),
 		RSND_GEN_M_REG(DVC_ADINR,	0xe08,	0x100),
diff --git a/sound/soc/sh/rcar/mix.c b/sound/soc/sh/rcar/mix.c
new file mode 100644
index 000000000000..0d5c102db6f5
--- /dev/null
+++ b/sound/soc/sh/rcar/mix.c
@@ -0,0 +1,200 @@
+/*
+ * mix.c
+ *
+ * Copyright (c) 2015 Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#include "rsnd.h"
+
+#define MIX_NAME_SIZE	16
+#define MIX_NAME "mix"
+
+struct rsnd_mix {
+	struct rsnd_mix_platform_info *info; /* rcar_snd.h */
+	struct rsnd_mod mod;
+};
+
+#define rsnd_mix_nr(priv) ((priv)->mix_nr)
+#define for_each_rsnd_mix(pos, priv, i)					\
+	for ((i) = 0;							\
+	     ((i) < rsnd_mix_nr(priv)) &&				\
+		     ((pos) = (struct rsnd_mix *)(priv)->mix + i);	\
+	     i++)
+
+
+static void rsnd_mix_soft_reset(struct rsnd_mod *mod)
+{
+	rsnd_mod_write(mod, MIX_SWRSR, 0);
+	rsnd_mod_write(mod, MIX_SWRSR, 1);
+}
+
+#define rsnd_mix_initialize_lock(mod)	__rsnd_mix_initialize_lock(mod, 1)
+#define rsnd_mix_initialize_unlock(mod)	__rsnd_mix_initialize_lock(mod, 0)
+static void __rsnd_mix_initialize_lock(struct rsnd_mod *mod, u32 enable)
+{
+	rsnd_mod_write(mod, MIX_MIXIR, enable);
+}
+
+static void rsnd_mix_volume_update(struct rsnd_dai_stream *io,
+				  struct rsnd_mod *mod)
+{
+
+	/* Disable MIX dB setting */
+	rsnd_mod_write(mod, MIX_MDBER, 0);
+
+	rsnd_mod_write(mod, MIX_MDBAR, 0);
+	rsnd_mod_write(mod, MIX_MDBBR, 0);
+	rsnd_mod_write(mod, MIX_MDBCR, 0);
+	rsnd_mod_write(mod, MIX_MDBDR, 0);
+
+	/* Enable MIX dB setting */
+	rsnd_mod_write(mod, MIX_MDBER, 1);
+}
+
+static int rsnd_mix_init(struct rsnd_mod *mod,
+			 struct rsnd_dai_stream *io,
+			 struct rsnd_priv *priv)
+{
+	rsnd_mod_hw_start(mod);
+
+	rsnd_mix_soft_reset(mod);
+
+	rsnd_mix_initialize_lock(mod);
+
+	rsnd_mod_write(mod, MIX_ADINR, rsnd_get_adinr_chan(mod, io));
+
+	rsnd_path_parse(priv, io);
+
+	/* volume step */
+	rsnd_mod_write(mod, MIX_MIXMR, 0);
+	rsnd_mod_write(mod, MIX_MVPDR, 0);
+
+	rsnd_mix_volume_update(io, mod);
+
+	rsnd_mix_initialize_unlock(mod);
+
+	return 0;
+}
+
+static int rsnd_mix_quit(struct rsnd_mod *mod,
+			 struct rsnd_dai_stream *io,
+			 struct rsnd_priv *priv)
+{
+	rsnd_mod_hw_stop(mod);
+
+	return 0;
+}
+
+static struct rsnd_mod_ops rsnd_mix_ops = {
+	.name		= MIX_NAME,
+	.init		= rsnd_mix_init,
+	.quit		= rsnd_mix_quit,
+};
+
+struct rsnd_mod *rsnd_mix_mod_get(struct rsnd_priv *priv, int id)
+{
+	if (WARN_ON(id < 0 || id >= rsnd_mix_nr(priv)))
+		id = 0;
+
+	return &((struct rsnd_mix *)(priv->mix) + id)->mod;
+}
+
+static void rsnd_of_parse_mix(struct platform_device *pdev,
+			      const struct rsnd_of_data *of_data,
+			      struct rsnd_priv *priv)
+{
+	struct device_node *node;
+	struct rsnd_mix_platform_info *mix_info;
+	struct rcar_snd_info *info = rsnd_priv_to_info(priv);
+	struct device *dev = &pdev->dev;
+	int nr;
+
+	if (!of_data)
+		return;
+
+	node = of_get_child_by_name(dev->of_node, "rcar_sound,mix");
+	if (!node)
+		return;
+
+	nr = of_get_child_count(node);
+	if (!nr)
+		goto rsnd_of_parse_mix_end;
+
+	mix_info = devm_kzalloc(dev,
+				sizeof(struct rsnd_mix_platform_info) * nr,
+				GFP_KERNEL);
+	if (!mix_info) {
+		dev_err(dev, "mix info allocation error\n");
+		goto rsnd_of_parse_mix_end;
+	}
+
+	info->mix_info		= mix_info;
+	info->mix_info_nr	= nr;
+
+rsnd_of_parse_mix_end:
+	of_node_put(node);
+
+}
+
+int rsnd_mix_probe(struct platform_device *pdev,
+		   const struct rsnd_of_data *of_data,
+		   struct rsnd_priv *priv)
+{
+	struct rcar_snd_info *info = rsnd_priv_to_info(priv);
+	struct device *dev = rsnd_priv_to_dev(priv);
+	struct rsnd_mix *mix;
+	struct clk *clk;
+	char name[MIX_NAME_SIZE];
+	int i, nr, ret;
+
+	/* This driver doesn't support Gen1 at this point */
+	if (rsnd_is_gen1(priv)) {
+		dev_warn(dev, "MIX is not supported on Gen1\n");
+		return -EINVAL;
+	}
+
+	rsnd_of_parse_mix(pdev, of_data, priv);
+
+	nr = info->mix_info_nr;
+	if (!nr)
+		return 0;
+
+	mix	= devm_kzalloc(dev, sizeof(*mix) * nr, GFP_KERNEL);
+	if (!mix)
+		return -ENOMEM;
+
+	priv->mix_nr	= nr;
+	priv->mix	= mix;
+
+	for_each_rsnd_mix(mix, priv, i) {
+		snprintf(name, MIX_NAME_SIZE, "%s.%d",
+			 MIX_NAME, i);
+
+		clk = devm_clk_get(dev, name);
+		if (IS_ERR(clk))
+			return PTR_ERR(clk);
+
+		mix->info = &info->mix_info[i];
+
+		ret = rsnd_mod_init(priv, &mix->mod, &rsnd_mix_ops,
+				    clk, RSND_MOD_MIX, i);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
+void rsnd_mix_remove(struct platform_device *pdev,
+		     struct rsnd_priv *priv)
+{
+	struct rsnd_mix *mix;
+	int i;
+
+	for_each_rsnd_mix(mix, priv, i) {
+		rsnd_mod_quit(&mix->mod);
+	}
+}
diff --git a/sound/soc/sh/rcar/rsnd.h b/sound/soc/sh/rcar/rsnd.h
index f2128a7cf259..7a0e52b4640a 100644
--- a/sound/soc/sh/rcar/rsnd.h
+++ b/sound/soc/sh/rcar/rsnd.h
@@ -49,6 +49,16 @@ enum rsnd_reg {
 	RSND_REG_CMD_ROUTE_SLCT,
 	RSND_REG_CTU_CTUIR,
 	RSND_REG_CTU_ADINR,
+	RSND_REG_MIX_SWRSR,
+	RSND_REG_MIX_MIXIR,
+	RSND_REG_MIX_ADINR,
+	RSND_REG_MIX_MIXMR,
+	RSND_REG_MIX_MVPDR,
+	RSND_REG_MIX_MDBAR,
+	RSND_REG_MIX_MDBBR,
+	RSND_REG_MIX_MDBCR,
+	RSND_REG_MIX_MDBDR,
+	RSND_REG_MIX_MDBER,
 	RSND_REG_DVC_SWRSR,
 	RSND_REG_DVC_DVUIR,
 	RSND_REG_DVC_ADINR,
@@ -222,6 +232,7 @@ struct dma_chan *rsnd_dma_request_channel(struct device_node *of_node,
  */
 enum rsnd_mod_type {
 	RSND_MOD_DVC = 0,
+	RSND_MOD_MIX,
 	RSND_MOD_CTU,
 	RSND_MOD_SRC,
 	RSND_MOD_SSI,
@@ -355,6 +366,7 @@ struct rsnd_dai_stream {
 #define rsnd_io_to_mod_ssi(io)	rsnd_io_to_mod((io), RSND_MOD_SSI)
 #define rsnd_io_to_mod_src(io)	rsnd_io_to_mod((io), RSND_MOD_SRC)
 #define rsnd_io_to_mod_ctu(io)	rsnd_io_to_mod((io), RSND_MOD_CTU)
+#define rsnd_io_to_mod_mix(io)	rsnd_io_to_mod((io), RSND_MOD_MIX)
 #define rsnd_io_to_mod_dvc(io)	rsnd_io_to_mod((io), RSND_MOD_DVC)
 #define rsnd_io_to_rdai(io)	((io)->rdai)
 #define rsnd_io_to_priv(io)	(rsnd_rdai_to_priv(rsnd_io_to_rdai(io)))
@@ -472,6 +484,12 @@ struct rsnd_priv {
 	void *ctu;
 	int ctu_nr;
 
+	/*
+	 * below value will be filled on rsnd_mix_probe()
+	 */
+	void *mix;
+	int mix_nr;
+
 	/*
 	 * below value will be filled on rsnd_dvc_probe()
 	 */
@@ -588,6 +606,17 @@ void rsnd_ctu_remove(struct platform_device *pdev,
 		     struct rsnd_priv *priv);
 struct rsnd_mod *rsnd_ctu_mod_get(struct rsnd_priv *priv, int id);
 
+/*
+ *	R-Car MIX
+ */
+int rsnd_mix_probe(struct platform_device *pdev,
+		   const struct rsnd_of_data *of_data,
+		   struct rsnd_priv *priv);
+
+void rsnd_mix_remove(struct platform_device *pdev,
+		     struct rsnd_priv *priv);
+struct rsnd_mod *rsnd_mix_mod_get(struct rsnd_priv *priv, int id);
+
 /*
  *	R-Car DVC
  */
-- 
cgit v1.2.3


From 54e7ad47c94d26614acb4fcc577a79b3347c2d86 Mon Sep 17 00:00:00 2001
From: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Date: Thu, 16 Jul 2015 21:35:28 +0200
Subject: spi: mpc512x-psc: adapt mpc5121-psc document to reality
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The drivers support MPC5125 additionally to MPC5121, and there is an spi
mode that is also supported. Additionally some minor corrections are
done.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Mark Brown <broonie@kernel.org>
---
 .../bindings/powerpc/fsl/mpc5121-psc.txt           | 24 ++++++++++++++++------
 1 file changed, 18 insertions(+), 6 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/powerpc/fsl/mpc5121-psc.txt b/Documentation/devicetree/bindings/powerpc/fsl/mpc5121-psc.txt
index 8832e8798912..647817527c88 100644
--- a/Documentation/devicetree/bindings/powerpc/fsl/mpc5121-psc.txt
+++ b/Documentation/devicetree/bindings/powerpc/fsl/mpc5121-psc.txt
@@ -6,14 +6,14 @@ PSC in UART mode
 For PSC in UART mode the needed PSC serial devices
 are specified by fsl,mpc5121-psc-uart nodes in the
 fsl,mpc5121-immr SoC node. Additionally the PSC FIFO
-Controller node fsl,mpc5121-psc-fifo is requered there:
+Controller node fsl,mpc5121-psc-fifo is required there:
 
-fsl,mpc5121-psc-uart nodes
+fsl,mpc512x-psc-uart nodes
 --------------------------
 
 Required properties :
- - compatible : Should contain "fsl,mpc5121-psc-uart" and "fsl,mpc5121-psc"
- - cell-index : Index of the PSC in hardware
+ - compatible : Should contain "fsl,<soc>-psc-uart" and "fsl,<soc>-psc"
+   Supported <soc>s: mpc5121, mpc5125
  - reg : Offset and length of the register set for the PSC device
  - interrupts : <a b> where a is the interrupt number of the
    PSC FIFO Controller and b is a field that represents an
@@ -25,12 +25,21 @@ Recommended properties :
  - fsl,rx-fifo-size : the size of the RX fifo slice (a multiple of 4)
  - fsl,tx-fifo-size : the size of the TX fifo slice (a multiple of 4)
 
+PSC in SPI mode
+---------------
 
-fsl,mpc5121-psc-fifo node
+Similar to the UART mode a PSC can be operated in SPI mode. The compatible used
+for that is fsl,mpc5121-psc-spi. It requires a fsl,mpc5121-psc-fifo as well.
+The required and recommended properties are identical to the
+fsl,mpc5121-psc-uart nodes, just use spi instead of uart in the compatible
+string.
+
+fsl,mpc512x-psc-fifo node
 -------------------------
 
 Required properties :
- - compatible : Should be "fsl,mpc5121-psc-fifo"
+ - compatible : Should be "fsl,<soc>-psc-fifo"
+   Supported <soc>s: mpc5121, mpc5125
  - reg : Offset and length of the register set for the PSC
          FIFO Controller
  - interrupts : <a b> where a is the interrupt number of the
@@ -39,6 +48,9 @@ Required properties :
  - interrupt-parent : the phandle for the interrupt controller that
    services interrupts for this device.
 
+Recommended properties :
+ - clocks : specifies the clock needed to operate the fifo controller
+ - clock-names : name(s) for the clock(s) listed in clocks
 
 Example for a board using PSC0 and PSC1 devices in serial mode:
 
-- 
cgit v1.2.3


From b4c45fe974bc5fa6240a729ea1f77db8b56d132a Mon Sep 17 00:00:00 2001
From: Bjorn Andersson <bjorn.andersson@sonymobile.com>
Date: Tue, 14 Jul 2015 23:40:35 -0700
Subject: pinctrl: qcom: ssbi: Family A gpio & mpp drivers

This introduces pinctrl drivers for gpio and mpp blocks found in family A
PMICs.

Tested-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@sonymobile.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
---
 .../devicetree/bindings/pinctrl/qcom,pmic-mpp.txt  |   5 +
 drivers/pinctrl/qcom/Kconfig                       |  12 +
 drivers/pinctrl/qcom/Makefile                      |   2 +
 drivers/pinctrl/qcom/pinctrl-ssbi-gpio.c           | 791 ++++++++++++++++++
 drivers/pinctrl/qcom/pinctrl-ssbi-mpp.c            | 882 +++++++++++++++++++++
 include/dt-bindings/pinctrl/qcom,pmic-mpp.h        |  51 ++
 6 files changed, 1743 insertions(+)
 create mode 100644 drivers/pinctrl/qcom/pinctrl-ssbi-gpio.c
 create mode 100644 drivers/pinctrl/qcom/pinctrl-ssbi-mpp.c

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/pinctrl/qcom,pmic-mpp.txt b/Documentation/devicetree/bindings/pinctrl/qcom,pmic-mpp.txt
index b096d8351b8f..d7803a2a94e9 100644
--- a/Documentation/devicetree/bindings/pinctrl/qcom,pmic-mpp.txt
+++ b/Documentation/devicetree/bindings/pinctrl/qcom,pmic-mpp.txt
@@ -7,8 +7,13 @@ of PMIC's from Qualcomm.
 	Usage: required
 	Value type: <string>
 	Definition: Should contain one of:
+		    "qcom,pm8018-mpp",
+		    "qcom,pm8038-mpp",
+		    "qcom,pm8821-mpp",
 		    "qcom,pm8841-mpp",
 		    "qcom,pm8916-mpp",
+		    "qcom,pm8917-mpp",
+		    "qcom,pm8921-mpp",
 		    "qcom,pm8941-mpp",
 		    "qcom,pma8084-mpp",
 
diff --git a/drivers/pinctrl/qcom/Kconfig b/drivers/pinctrl/qcom/Kconfig
index 58f5632b27f4..8eef820b216e 100644
--- a/drivers/pinctrl/qcom/Kconfig
+++ b/drivers/pinctrl/qcom/Kconfig
@@ -76,4 +76,16 @@ config PINCTRL_QCOM_SPMI_PMIC
          which are using SPMI for communication with SoC. Example PMIC's
          devices are pm8841, pm8941 and pma8084.
 
+config PINCTRL_QCOM_SSBI_PMIC
+       tristate "Qualcomm SSBI PMIC pin controller driver"
+       depends on GPIOLIB && OF
+       select PINMUX
+       select PINCONF
+       select GENERIC_PINCONF
+       help
+         This is the pinctrl, pinmux, pinconf and gpiolib driver for the
+         Qualcomm GPIO and MPP blocks found in the Qualcomm PMIC's chips,
+         which are using SSBI for communication with SoC. Example PMIC's
+         devices are pm8058 and pm8921.
+
 endif
diff --git a/drivers/pinctrl/qcom/Makefile b/drivers/pinctrl/qcom/Makefile
index 3666c703ce88..e321f7ab325b 100644
--- a/drivers/pinctrl/qcom/Makefile
+++ b/drivers/pinctrl/qcom/Makefile
@@ -9,3 +9,5 @@ obj-$(CONFIG_PINCTRL_MSM8X74)	+= pinctrl-msm8x74.o
 obj-$(CONFIG_PINCTRL_MSM8916)	+= pinctrl-msm8916.o
 obj-$(CONFIG_PINCTRL_QCOM_SPMI_PMIC) += pinctrl-spmi-gpio.o
 obj-$(CONFIG_PINCTRL_QCOM_SPMI_PMIC) += pinctrl-spmi-mpp.o
+obj-$(CONFIG_PINCTRL_QCOM_SSBI_PMIC) += pinctrl-ssbi-gpio.o
+obj-$(CONFIG_PINCTRL_QCOM_SSBI_PMIC) += pinctrl-ssbi-mpp.o
diff --git a/drivers/pinctrl/qcom/pinctrl-ssbi-gpio.c b/drivers/pinctrl/qcom/pinctrl-ssbi-gpio.c
new file mode 100644
index 000000000000..c978b311031b
--- /dev/null
+++ b/drivers/pinctrl/qcom/pinctrl-ssbi-gpio.c
@@ -0,0 +1,791 @@
+/*
+ * Copyright (c) 2015, Sony Mobile Communications AB.
+ * Copyright (c) 2013, The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 and
+ * only version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include <linux/module.h>
+#include <linux/platform_device.h>
+#include <linux/pinctrl/pinctrl.h>
+#include <linux/pinctrl/pinmux.h>
+#include <linux/pinctrl/pinconf.h>
+#include <linux/pinctrl/pinconf-generic.h>
+#include <linux/slab.h>
+#include <linux/regmap.h>
+#include <linux/gpio.h>
+#include <linux/interrupt.h>
+#include <linux/of_device.h>
+
+#include <dt-bindings/pinctrl/qcom,pmic-gpio.h>
+
+#include "../core.h"
+#include "../pinctrl-utils.h"
+
+/* mode */
+#define PM8XXX_GPIO_MODE_ENABLED	BIT(0)
+#define PM8XXX_GPIO_MODE_INPUT		0
+#define PM8XXX_GPIO_MODE_OUTPUT		2
+
+/* output buffer */
+#define PM8XXX_GPIO_PUSH_PULL		0
+#define PM8XXX_GPIO_OPEN_DRAIN		1
+
+/* bias */
+#define PM8XXX_GPIO_BIAS_PU_30		0
+#define PM8XXX_GPIO_BIAS_PU_1P5		1
+#define PM8XXX_GPIO_BIAS_PU_31P5	2
+#define PM8XXX_GPIO_BIAS_PU_1P5_30	3
+#define PM8XXX_GPIO_BIAS_PD		4
+#define PM8XXX_GPIO_BIAS_NP		5
+
+/* GPIO registers */
+#define SSBI_REG_ADDR_GPIO_BASE		0x150
+#define SSBI_REG_ADDR_GPIO(n)		(SSBI_REG_ADDR_GPIO_BASE + n)
+
+#define PM8XXX_BANK_WRITE		BIT(7)
+
+#define PM8XXX_MAX_GPIOS               44
+
+/* custom pinconf parameters */
+#define PM8XXX_QCOM_DRIVE_STRENGH      (PIN_CONFIG_END + 1)
+#define PM8XXX_QCOM_PULL_UP_STRENGTH   (PIN_CONFIG_END + 2)
+
+/**
+ * struct pm8xxx_pin_data - dynamic configuration for a pin
+ * @reg:               address of the control register
+ * @irq:               IRQ from the PMIC interrupt controller
+ * @power_source:      logical selected voltage source, mapping in static data
+ *                     is used translate to register values
+ * @mode:              operating mode for the pin (input/output)
+ * @open_drain:        output buffer configured as open-drain (vs push-pull)
+ * @output_value:      configured output value
+ * @bias:              register view of configured bias
+ * @pull_up_strength:  placeholder for selected pull up strength
+ *                     only used to configure bias when pull up is selected
+ * @output_strength:   selector of output-strength
+ * @disable:           pin disabled / configured as tristate
+ * @function:          pinmux selector
+ * @inverted:          pin logic is inverted
+ */
+struct pm8xxx_pin_data {
+	unsigned reg;
+	int irq;
+	u8 power_source;
+	u8 mode;
+	bool open_drain;
+	bool output_value;
+	u8 bias;
+	u8 pull_up_strength;
+	u8 output_strength;
+	bool disable;
+	u8 function;
+	bool inverted;
+};
+
+struct pm8xxx_gpio {
+	struct device *dev;
+	struct regmap *regmap;
+	struct pinctrl_dev *pctrl;
+	struct gpio_chip chip;
+
+	struct pinctrl_desc desc;
+	unsigned npins;
+};
+
+static const struct pinconf_generic_params pm8xxx_gpio_bindings[] = {
+	{"qcom,drive-strength",		PM8XXX_QCOM_DRIVE_STRENGH,	0},
+	{"qcom,pull-up-strength",	PM8XXX_QCOM_PULL_UP_STRENGTH,	0},
+};
+
+#ifdef CONFIG_DEBUG_FS
+static const struct pin_config_item pm8xxx_conf_items[ARRAY_SIZE(pm8xxx_gpio_bindings)] = {
+	PCONFDUMP(PM8XXX_QCOM_DRIVE_STRENGH, "drive-strength", NULL, true),
+	PCONFDUMP(PM8XXX_QCOM_PULL_UP_STRENGTH,  "pull up strength", NULL, true),
+};
+#endif
+
+static const char * const pm8xxx_groups[PM8XXX_MAX_GPIOS] = {
+	"gpio1", "gpio2", "gpio3", "gpio4", "gpio5", "gpio6", "gpio7", "gpio8",
+	"gpio9", "gpio10", "gpio11", "gpio12", "gpio13", "gpio14", "gpio15",
+	"gpio16", "gpio17", "gpio18", "gpio19", "gpio20", "gpio21", "gpio22",
+	"gpio23", "gpio24", "gpio25", "gpio26", "gpio27", "gpio28", "gpio29",
+	"gpio30", "gpio31", "gpio32", "gpio33", "gpio34", "gpio35", "gpio36",
+	"gpio37", "gpio38", "gpio39", "gpio40", "gpio41", "gpio42", "gpio43",
+	"gpio44",
+};
+
+static const char * const pm8xxx_gpio_functions[] = {
+	PMIC_GPIO_FUNC_NORMAL, PMIC_GPIO_FUNC_PAIRED,
+	PMIC_GPIO_FUNC_FUNC1, PMIC_GPIO_FUNC_FUNC2,
+	PMIC_GPIO_FUNC_DTEST1, PMIC_GPIO_FUNC_DTEST2,
+	PMIC_GPIO_FUNC_DTEST3, PMIC_GPIO_FUNC_DTEST4,
+};
+
+static int pm8xxx_read_bank(struct pm8xxx_gpio *pctrl,
+			    struct pm8xxx_pin_data *pin, int bank)
+{
+	unsigned int val = bank << 4;
+	int ret;
+
+	ret = regmap_write(pctrl->regmap, pin->reg, val);
+	if (ret) {
+		dev_err(pctrl->dev, "failed to select bank %d\n", bank);
+		return ret;
+	}
+
+	ret = regmap_read(pctrl->regmap, pin->reg, &val);
+	if (ret) {
+		dev_err(pctrl->dev, "failed to read register %d\n", bank);
+		return ret;
+	}
+
+	return val;
+}
+
+static int pm8xxx_write_bank(struct pm8xxx_gpio *pctrl,
+			     struct pm8xxx_pin_data *pin,
+			     int bank,
+			     u8 val)
+{
+	int ret;
+
+	val |= PM8XXX_BANK_WRITE;
+	val |= bank << 4;
+
+	ret = regmap_write(pctrl->regmap, pin->reg, val);
+	if (ret)
+		dev_err(pctrl->dev, "failed to write register\n");
+
+	return ret;
+}
+
+static int pm8xxx_get_groups_count(struct pinctrl_dev *pctldev)
+{
+	struct pm8xxx_gpio *pctrl = pinctrl_dev_get_drvdata(pctldev);
+
+	return pctrl->npins;
+}
+
+static const char *pm8xxx_get_group_name(struct pinctrl_dev *pctldev,
+					 unsigned group)
+{
+	return pm8xxx_groups[group];
+}
+
+
+static int pm8xxx_get_group_pins(struct pinctrl_dev *pctldev,
+				 unsigned group,
+				 const unsigned **pins,
+				 unsigned *num_pins)
+{
+	struct pm8xxx_gpio *pctrl = pinctrl_dev_get_drvdata(pctldev);
+
+	*pins = &pctrl->desc.pins[group].number;
+	*num_pins = 1;
+
+	return 0;
+}
+
+static const struct pinctrl_ops pm8xxx_pinctrl_ops = {
+	.get_groups_count	= pm8xxx_get_groups_count,
+	.get_group_name		= pm8xxx_get_group_name,
+	.get_group_pins         = pm8xxx_get_group_pins,
+	.dt_node_to_map		= pinconf_generic_dt_node_to_map_group,
+	.dt_free_map		= pinctrl_utils_dt_free_map,
+};
+
+static int pm8xxx_get_functions_count(struct pinctrl_dev *pctldev)
+{
+	return ARRAY_SIZE(pm8xxx_gpio_functions);
+}
+
+static const char *pm8xxx_get_function_name(struct pinctrl_dev *pctldev,
+					    unsigned function)
+{
+	return pm8xxx_gpio_functions[function];
+}
+
+static int pm8xxx_get_function_groups(struct pinctrl_dev *pctldev,
+				      unsigned function,
+				      const char * const **groups,
+				      unsigned * const num_groups)
+{
+	struct pm8xxx_gpio *pctrl = pinctrl_dev_get_drvdata(pctldev);
+
+	*groups = pm8xxx_groups;
+	*num_groups = pctrl->npins;
+	return 0;
+}
+
+static int pm8xxx_pinmux_set_mux(struct pinctrl_dev *pctldev,
+				 unsigned function,
+				 unsigned group)
+{
+	struct pm8xxx_gpio *pctrl = pinctrl_dev_get_drvdata(pctldev);
+	struct pm8xxx_pin_data *pin = pctrl->desc.pins[group].drv_data;
+	u8 val;
+
+	pin->function = function;
+	val = pin->function << 1;
+
+	pm8xxx_write_bank(pctrl, pin, 4, val);
+
+	return 0;
+}
+
+static const struct pinmux_ops pm8xxx_pinmux_ops = {
+	.get_functions_count	= pm8xxx_get_functions_count,
+	.get_function_name	= pm8xxx_get_function_name,
+	.get_function_groups	= pm8xxx_get_function_groups,
+	.set_mux		= pm8xxx_pinmux_set_mux,
+};
+
+static int pm8xxx_pin_config_get(struct pinctrl_dev *pctldev,
+				 unsigned int offset,
+				 unsigned long *config)
+{
+	struct pm8xxx_gpio *pctrl = pinctrl_dev_get_drvdata(pctldev);
+	struct pm8xxx_pin_data *pin = pctrl->desc.pins[offset].drv_data;
+	unsigned param = pinconf_to_config_param(*config);
+	unsigned arg;
+
+	switch (param) {
+	case PIN_CONFIG_BIAS_DISABLE:
+		arg = pin->bias == PM8XXX_GPIO_BIAS_NP;
+		break;
+	case PIN_CONFIG_BIAS_PULL_DOWN:
+		arg = pin->bias == PM8XXX_GPIO_BIAS_PD;
+		break;
+	case PIN_CONFIG_BIAS_PULL_UP:
+		arg = pin->bias <= PM8XXX_GPIO_BIAS_PU_1P5_30;
+		break;
+	case PM8XXX_QCOM_PULL_UP_STRENGTH:
+		arg = pin->pull_up_strength;
+		break;
+	case PIN_CONFIG_BIAS_HIGH_IMPEDANCE:
+		arg = pin->disable;
+		break;
+	case PIN_CONFIG_INPUT_ENABLE:
+		arg = pin->mode == PM8XXX_GPIO_MODE_INPUT;
+		break;
+	case PIN_CONFIG_OUTPUT:
+		if (pin->mode & PM8XXX_GPIO_MODE_OUTPUT)
+			arg = pin->output_value;
+		else
+			arg = 0;
+		break;
+	case PIN_CONFIG_POWER_SOURCE:
+		arg = pin->power_source;
+		break;
+	case PM8XXX_QCOM_DRIVE_STRENGH:
+		arg = pin->output_strength;
+		break;
+	case PIN_CONFIG_DRIVE_PUSH_PULL:
+		arg = !pin->open_drain;
+		break;
+	case PIN_CONFIG_DRIVE_OPEN_DRAIN:
+		arg = pin->open_drain;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	*config = pinconf_to_config_packed(param, arg);
+
+	return 0;
+}
+
+static int pm8xxx_pin_config_set(struct pinctrl_dev *pctldev,
+				 unsigned int offset,
+				 unsigned long *configs,
+				 unsigned num_configs)
+{
+	struct pm8xxx_gpio *pctrl = pinctrl_dev_get_drvdata(pctldev);
+	struct pm8xxx_pin_data *pin = pctrl->desc.pins[offset].drv_data;
+	unsigned param;
+	unsigned arg;
+	unsigned i;
+	u8 banks = 0;
+	u8 val;
+
+	for (i = 0; i < num_configs; i++) {
+		param = pinconf_to_config_param(configs[i]);
+		arg = pinconf_to_config_argument(configs[i]);
+
+		switch (param) {
+		case PIN_CONFIG_BIAS_DISABLE:
+			pin->bias = PM8XXX_GPIO_BIAS_NP;
+			banks |= BIT(2);
+			pin->disable = 0;
+			banks |= BIT(3);
+			break;
+		case PIN_CONFIG_BIAS_PULL_DOWN:
+			pin->bias = PM8XXX_GPIO_BIAS_PD;
+			banks |= BIT(2);
+			pin->disable = 0;
+			banks |= BIT(3);
+			break;
+		case PM8XXX_QCOM_PULL_UP_STRENGTH:
+			if (arg > PM8XXX_GPIO_BIAS_PU_1P5_30) {
+				dev_err(pctrl->dev, "invalid pull-up strength\n");
+				return -EINVAL;
+			}
+			pin->pull_up_strength = arg;
+			/* FALLTHROUGH */
+		case PIN_CONFIG_BIAS_PULL_UP:
+			pin->bias = pin->pull_up_strength;
+			banks |= BIT(2);
+			pin->disable = 0;
+			banks |= BIT(3);
+			break;
+		case PIN_CONFIG_BIAS_HIGH_IMPEDANCE:
+			pin->disable = 1;
+			banks |= BIT(3);
+			break;
+		case PIN_CONFIG_INPUT_ENABLE:
+			pin->mode = PM8XXX_GPIO_MODE_INPUT;
+			banks |= BIT(0) | BIT(1);
+			break;
+		case PIN_CONFIG_OUTPUT:
+			pin->mode = PM8XXX_GPIO_MODE_OUTPUT;
+			pin->output_value = !!arg;
+			banks |= BIT(0) | BIT(1);
+			break;
+		case PIN_CONFIG_POWER_SOURCE:
+			pin->power_source = arg;
+			banks |= BIT(0);
+			break;
+		case PM8XXX_QCOM_DRIVE_STRENGH:
+			if (arg > PMIC_GPIO_STRENGTH_LOW) {
+				dev_err(pctrl->dev, "invalid drive strength\n");
+				return -EINVAL;
+			}
+			pin->output_strength = arg;
+			banks |= BIT(3);
+			break;
+		case PIN_CONFIG_DRIVE_PUSH_PULL:
+			pin->open_drain = 0;
+			banks |= BIT(1);
+			break;
+		case PIN_CONFIG_DRIVE_OPEN_DRAIN:
+			pin->open_drain = 1;
+			banks |= BIT(1);
+			break;
+		default:
+			dev_err(pctrl->dev,
+				"unsupported config parameter: %x\n",
+				param);
+			return -EINVAL;
+		}
+	}
+
+	if (banks & BIT(0)) {
+		val = pin->power_source << 1;
+		val |= PM8XXX_GPIO_MODE_ENABLED;
+		pm8xxx_write_bank(pctrl, pin, 0, val);
+	}
+
+	if (banks & BIT(1)) {
+		val = pin->mode << 2;
+		val |= pin->open_drain << 1;
+		val |= pin->output_value;
+		pm8xxx_write_bank(pctrl, pin, 1, val);
+	}
+
+	if (banks & BIT(2)) {
+		val = pin->bias << 1;
+		pm8xxx_write_bank(pctrl, pin, 2, val);
+	}
+
+	if (banks & BIT(3)) {
+		val = pin->output_strength << 2;
+		val |= pin->disable;
+		pm8xxx_write_bank(pctrl, pin, 3, val);
+	}
+
+	if (banks & BIT(4)) {
+		val = pin->function << 1;
+		pm8xxx_write_bank(pctrl, pin, 4, val);
+	}
+
+	if (banks & BIT(5)) {
+		val = 0;
+		if (!pin->inverted)
+			val |= BIT(3);
+		pm8xxx_write_bank(pctrl, pin, 5, val);
+	}
+
+	return 0;
+}
+
+static const struct pinconf_ops pm8xxx_pinconf_ops = {
+	.is_generic = true,
+	.pin_config_group_get = pm8xxx_pin_config_get,
+	.pin_config_group_set = pm8xxx_pin_config_set,
+};
+
+static struct pinctrl_desc pm8xxx_pinctrl_desc = {
+	.name = "pm8xxx_gpio",
+	.pctlops = &pm8xxx_pinctrl_ops,
+	.pmxops = &pm8xxx_pinmux_ops,
+	.confops = &pm8xxx_pinconf_ops,
+	.owner = THIS_MODULE,
+};
+
+static int pm8xxx_gpio_direction_input(struct gpio_chip *chip,
+				       unsigned offset)
+{
+	struct pm8xxx_gpio *pctrl = container_of(chip, struct pm8xxx_gpio, chip);
+	struct pm8xxx_pin_data *pin = pctrl->desc.pins[offset].drv_data;
+	u8 val;
+
+	pin->mode = PM8XXX_GPIO_MODE_INPUT;
+	val = pin->mode << 2;
+
+	pm8xxx_write_bank(pctrl, pin, 1, val);
+
+	return 0;
+}
+
+static int pm8xxx_gpio_direction_output(struct gpio_chip *chip,
+					unsigned offset,
+					int value)
+{
+	struct pm8xxx_gpio *pctrl = container_of(chip, struct pm8xxx_gpio, chip);
+	struct pm8xxx_pin_data *pin = pctrl->desc.pins[offset].drv_data;
+	u8 val;
+
+	pin->mode = PM8XXX_GPIO_MODE_OUTPUT;
+	pin->output_value = !!value;
+
+	val = pin->mode << 2;
+	val |= pin->open_drain << 1;
+	val |= pin->output_value;
+
+	pm8xxx_write_bank(pctrl, pin, 1, val);
+
+	return 0;
+}
+
+static int pm8xxx_gpio_get(struct gpio_chip *chip, unsigned offset)
+{
+	struct pm8xxx_gpio *pctrl = container_of(chip, struct pm8xxx_gpio, chip);
+	struct pm8xxx_pin_data *pin = pctrl->desc.pins[offset].drv_data;
+	bool state;
+	int ret;
+
+	if (pin->mode == PM8XXX_GPIO_MODE_OUTPUT) {
+		ret = pin->output_value;
+	} else {
+		ret = irq_get_irqchip_state(pin->irq, IRQCHIP_STATE_LINE_LEVEL, &state);
+		if (!ret)
+			ret = !!state;
+	}
+
+	return ret;
+}
+
+static void pm8xxx_gpio_set(struct gpio_chip *chip, unsigned offset, int value)
+{
+	struct pm8xxx_gpio *pctrl = container_of(chip, struct pm8xxx_gpio, chip);
+	struct pm8xxx_pin_data *pin = pctrl->desc.pins[offset].drv_data;
+	u8 val;
+
+	pin->output_value = !!value;
+
+	val = pin->mode << 2;
+	val |= pin->open_drain << 1;
+	val |= pin->output_value;
+
+	pm8xxx_write_bank(pctrl, pin, 1, val);
+}
+
+static int pm8xxx_gpio_of_xlate(struct gpio_chip *chip,
+				const struct of_phandle_args *gpio_desc,
+				u32 *flags)
+{
+	if (chip->of_gpio_n_cells < 2)
+		return -EINVAL;
+
+	if (flags)
+		*flags = gpio_desc->args[1];
+
+	return gpio_desc->args[0] - 1;
+}
+
+
+static int pm8xxx_gpio_to_irq(struct gpio_chip *chip, unsigned offset)
+{
+	struct pm8xxx_gpio *pctrl = container_of(chip, struct pm8xxx_gpio, chip);
+	struct pm8xxx_pin_data *pin = pctrl->desc.pins[offset].drv_data;
+
+	return pin->irq;
+}
+
+#ifdef CONFIG_DEBUG_FS
+#include <linux/seq_file.h>
+
+static void pm8xxx_gpio_dbg_show_one(struct seq_file *s,
+				  struct pinctrl_dev *pctldev,
+				  struct gpio_chip *chip,
+				  unsigned offset,
+				  unsigned gpio)
+{
+	struct pm8xxx_gpio *pctrl = container_of(chip, struct pm8xxx_gpio, chip);
+	struct pm8xxx_pin_data *pin = pctrl->desc.pins[offset].drv_data;
+
+	static const char * const modes[] = {
+		"in", "both", "out", "off"
+	};
+	static const char * const biases[] = {
+		"pull-up 30uA", "pull-up 1.5uA", "pull-up 31.5uA",
+		"pull-up 1.5uA + 30uA boost", "pull-down 10uA", "no pull"
+	};
+	static const char * const buffer_types[] = {
+		"push-pull", "open-drain"
+	};
+	static const char * const strengths[] = {
+		"no", "high", "medium", "low"
+	};
+
+	seq_printf(s, " gpio%-2d:", offset + 1);
+	if (pin->disable) {
+		seq_puts(s, " ---");
+	} else {
+		seq_printf(s, " %-4s", modes[pin->mode]);
+		seq_printf(s, " %-7s", pm8xxx_gpio_functions[pin->function]);
+		seq_printf(s, " VIN%d", pin->power_source);
+		seq_printf(s, " %-27s", biases[pin->bias]);
+		seq_printf(s, " %-10s", buffer_types[pin->open_drain]);
+		seq_printf(s, " %-4s", pin->output_value ? "high" : "low");
+		seq_printf(s, " %-7s", strengths[pin->output_strength]);
+		if (pin->inverted)
+			seq_puts(s, " inverted");
+	}
+}
+
+static void pm8xxx_gpio_dbg_show(struct seq_file *s, struct gpio_chip *chip)
+{
+	unsigned gpio = chip->base;
+	unsigned i;
+
+	for (i = 0; i < chip->ngpio; i++, gpio++) {
+		pm8xxx_gpio_dbg_show_one(s, NULL, chip, i, gpio);
+		seq_puts(s, "\n");
+	}
+}
+
+#else
+#define msm_gpio_dbg_show NULL
+#endif
+
+static struct gpio_chip pm8xxx_gpio_template = {
+	.direction_input = pm8xxx_gpio_direction_input,
+	.direction_output = pm8xxx_gpio_direction_output,
+	.get = pm8xxx_gpio_get,
+	.set = pm8xxx_gpio_set,
+	.of_xlate = pm8xxx_gpio_of_xlate,
+	.to_irq = pm8xxx_gpio_to_irq,
+	.dbg_show = pm8xxx_gpio_dbg_show,
+	.owner = THIS_MODULE,
+};
+
+static int pm8xxx_pin_populate(struct pm8xxx_gpio *pctrl,
+			       struct pm8xxx_pin_data *pin)
+{
+	int val;
+
+	val = pm8xxx_read_bank(pctrl, pin, 0);
+	if (val < 0)
+		return val;
+
+	pin->power_source = (val >> 1) & 0x7;
+
+	val = pm8xxx_read_bank(pctrl, pin, 1);
+	if (val < 0)
+		return val;
+
+	pin->mode = (val >> 2) & 0x3;
+	pin->open_drain = !!(val & BIT(1));
+	pin->output_value = val & BIT(0);
+
+	val = pm8xxx_read_bank(pctrl, pin, 2);
+	if (val < 0)
+		return val;
+
+	pin->bias = (val >> 1) & 0x7;
+	if (pin->bias <= PM8XXX_GPIO_BIAS_PU_1P5_30)
+		pin->pull_up_strength = pin->bias;
+	else
+		pin->pull_up_strength = PM8XXX_GPIO_BIAS_PU_30;
+
+	val = pm8xxx_read_bank(pctrl, pin, 3);
+	if (val < 0)
+		return val;
+
+	pin->output_strength = (val >> 2) & 0x3;
+	pin->disable = val & BIT(0);
+
+	val = pm8xxx_read_bank(pctrl, pin, 4);
+	if (val < 0)
+		return val;
+
+	pin->function = (val >> 1) & 0x7;
+
+	val = pm8xxx_read_bank(pctrl, pin, 5);
+	if (val < 0)
+		return val;
+
+	pin->inverted = !(val & BIT(3));
+
+	return 0;
+}
+
+static const struct of_device_id pm8xxx_gpio_of_match[] = {
+	{ .compatible = "qcom,pm8018-gpio", .data = (void *)6 },
+	{ .compatible = "qcom,pm8038-gpio", .data = (void *)12 },
+	{ .compatible = "qcom,pm8058-gpio", .data = (void *)40 },
+	{ .compatible = "qcom,pm8917-gpio", .data = (void *)38 },
+	{ .compatible = "qcom,pm8921-gpio", .data = (void *)44 },
+	{ },
+};
+MODULE_DEVICE_TABLE(of, pm8xxx_gpio_of_match);
+
+static int pm8xxx_gpio_probe(struct platform_device *pdev)
+{
+	struct pm8xxx_pin_data *pin_data;
+	struct pinctrl_pin_desc *pins;
+	struct pm8xxx_gpio *pctrl;
+	int ret;
+	int i;
+
+	pctrl = devm_kzalloc(&pdev->dev, sizeof(*pctrl), GFP_KERNEL);
+	if (!pctrl)
+		return -ENOMEM;
+
+	pctrl->dev = &pdev->dev;
+	pctrl->npins = (unsigned)of_device_get_match_data(&pdev->dev);
+
+	pctrl->regmap = dev_get_regmap(pdev->dev.parent, NULL);
+	if (!pctrl->regmap) {
+		dev_err(&pdev->dev, "parent regmap unavailable\n");
+		return -ENXIO;
+	}
+
+	pctrl->desc = pm8xxx_pinctrl_desc;
+	pctrl->desc.npins = pctrl->npins;
+
+	pins = devm_kcalloc(&pdev->dev,
+			    pctrl->desc.npins,
+			    sizeof(struct pinctrl_pin_desc),
+			    GFP_KERNEL);
+	if (!pins)
+		return -ENOMEM;
+
+	pin_data = devm_kcalloc(&pdev->dev,
+				pctrl->desc.npins,
+				sizeof(struct pm8xxx_pin_data),
+				GFP_KERNEL);
+	if (!pin_data)
+		return -ENOMEM;
+
+	for (i = 0; i < pctrl->desc.npins; i++) {
+		pin_data[i].reg = SSBI_REG_ADDR_GPIO(i);
+		pin_data[i].irq = platform_get_irq(pdev, i);
+		if (pin_data[i].irq < 0) {
+			dev_err(&pdev->dev,
+				"missing interrupts for pin %d\n", i);
+			return pin_data[i].irq;
+		}
+
+		ret = pm8xxx_pin_populate(pctrl, &pin_data[i]);
+		if (ret)
+			return ret;
+
+		pins[i].number = i;
+		pins[i].name = pm8xxx_groups[i];
+		pins[i].drv_data = &pin_data[i];
+	}
+	pctrl->desc.pins = pins;
+
+	pctrl->desc.num_custom_params = ARRAY_SIZE(pm8xxx_gpio_bindings);
+	pctrl->desc.custom_params = pm8xxx_gpio_bindings;
+#ifdef CONFIG_DEBUG_FS
+	pctrl->desc.custom_conf_items = pm8xxx_conf_items;
+#endif
+
+	pctrl->pctrl = pinctrl_register(&pctrl->desc, &pdev->dev, pctrl);
+	if (!pctrl->pctrl) {
+		dev_err(&pdev->dev, "couldn't register pm8xxx gpio driver\n");
+		return -ENODEV;
+	}
+
+	pctrl->chip = pm8xxx_gpio_template;
+	pctrl->chip.base = -1;
+	pctrl->chip.dev = &pdev->dev;
+	pctrl->chip.of_node = pdev->dev.of_node;
+	pctrl->chip.of_gpio_n_cells = 2;
+	pctrl->chip.label = dev_name(pctrl->dev);
+	pctrl->chip.ngpio = pctrl->npins;
+	ret = gpiochip_add(&pctrl->chip);
+	if (ret) {
+		dev_err(&pdev->dev, "failed register gpiochip\n");
+		goto unregister_pinctrl;
+	}
+
+	ret = gpiochip_add_pin_range(&pctrl->chip,
+				     dev_name(pctrl->dev),
+				     0, 0, pctrl->chip.ngpio);
+	if (ret) {
+		dev_err(pctrl->dev, "failed to add pin range\n");
+		goto unregister_gpiochip;
+	}
+
+	platform_set_drvdata(pdev, pctrl);
+
+	dev_dbg(&pdev->dev, "Qualcomm pm8xxx gpio driver probed\n");
+
+	return 0;
+
+unregister_gpiochip:
+	gpiochip_remove(&pctrl->chip);
+
+unregister_pinctrl:
+	pinctrl_unregister(pctrl->pctrl);
+
+	return ret;
+}
+
+static int pm8xxx_gpio_remove(struct platform_device *pdev)
+{
+	struct pm8xxx_gpio *pctrl = platform_get_drvdata(pdev);
+
+	gpiochip_remove(&pctrl->chip);
+
+	pinctrl_unregister(pctrl->pctrl);
+
+	return 0;
+}
+
+static struct platform_driver pm8xxx_gpio_driver = {
+	.driver = {
+		.name = "qcom-ssbi-gpio",
+		.of_match_table = pm8xxx_gpio_of_match,
+	},
+	.probe = pm8xxx_gpio_probe,
+	.remove = pm8xxx_gpio_remove,
+};
+
+module_platform_driver(pm8xxx_gpio_driver);
+
+MODULE_AUTHOR("Bjorn Andersson <bjorn.andersson@sonymobile.com>");
+MODULE_DESCRIPTION("Qualcomm PM8xxx GPIO driver");
+MODULE_LICENSE("GPL v2");
diff --git a/drivers/pinctrl/qcom/pinctrl-ssbi-mpp.c b/drivers/pinctrl/qcom/pinctrl-ssbi-mpp.c
new file mode 100644
index 000000000000..2d1b69f171be
--- /dev/null
+++ b/drivers/pinctrl/qcom/pinctrl-ssbi-mpp.c
@@ -0,0 +1,882 @@
+/*
+ * Copyright (c) 2015, Sony Mobile Communications AB.
+ * Copyright (c) 2013, The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 and
+ * only version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include <linux/module.h>
+#include <linux/platform_device.h>
+#include <linux/pinctrl/pinctrl.h>
+#include <linux/pinctrl/pinmux.h>
+#include <linux/pinctrl/pinconf.h>
+#include <linux/pinctrl/pinconf-generic.h>
+#include <linux/slab.h>
+#include <linux/regmap.h>
+#include <linux/gpio.h>
+#include <linux/interrupt.h>
+#include <linux/of_device.h>
+
+#include <dt-bindings/pinctrl/qcom,pmic-mpp.h>
+
+#include "../core.h"
+#include "../pinctrl-utils.h"
+
+/* MPP registers */
+#define SSBI_REG_ADDR_MPP_BASE		0x50
+#define SSBI_REG_ADDR_MPP(n)		(SSBI_REG_ADDR_MPP_BASE + n)
+
+/* MPP Type: type */
+#define PM8XXX_MPP_TYPE_D_INPUT         0
+#define PM8XXX_MPP_TYPE_D_OUTPUT        1
+#define PM8XXX_MPP_TYPE_D_BI_DIR        2
+#define PM8XXX_MPP_TYPE_A_INPUT         3
+#define PM8XXX_MPP_TYPE_A_OUTPUT        4
+#define PM8XXX_MPP_TYPE_SINK            5
+#define PM8XXX_MPP_TYPE_DTEST_SINK      6
+#define PM8XXX_MPP_TYPE_DTEST_OUTPUT    7
+
+/* Digital Input: control */
+#define PM8XXX_MPP_DIN_TO_INT           0
+#define PM8XXX_MPP_DIN_TO_DBUS1         1
+#define PM8XXX_MPP_DIN_TO_DBUS2         2
+#define PM8XXX_MPP_DIN_TO_DBUS3         3
+
+/* Digital Output: control */
+#define PM8XXX_MPP_DOUT_CTRL_LOW        0
+#define PM8XXX_MPP_DOUT_CTRL_HIGH       1
+#define PM8XXX_MPP_DOUT_CTRL_MPP        2
+#define PM8XXX_MPP_DOUT_CTRL_INV_MPP    3
+
+/* Bidirectional: control */
+#define PM8XXX_MPP_BI_PULLUP_1KOHM      0
+#define PM8XXX_MPP_BI_PULLUP_OPEN       1
+#define PM8XXX_MPP_BI_PULLUP_10KOHM     2
+#define PM8XXX_MPP_BI_PULLUP_30KOHM     3
+
+/* Analog Output: control */
+#define PM8XXX_MPP_AOUT_CTRL_DISABLE            0
+#define PM8XXX_MPP_AOUT_CTRL_ENABLE             1
+#define PM8XXX_MPP_AOUT_CTRL_MPP_HIGH_EN        2
+#define PM8XXX_MPP_AOUT_CTRL_MPP_LOW_EN         3
+
+/* Current Sink: control */
+#define PM8XXX_MPP_CS_CTRL_DISABLE      0
+#define PM8XXX_MPP_CS_CTRL_ENABLE       1
+#define PM8XXX_MPP_CS_CTRL_MPP_HIGH_EN  2
+#define PM8XXX_MPP_CS_CTRL_MPP_LOW_EN   3
+
+/* DTEST Current Sink: control */
+#define PM8XXX_MPP_DTEST_CS_CTRL_EN1    0
+#define PM8XXX_MPP_DTEST_CS_CTRL_EN2    1
+#define PM8XXX_MPP_DTEST_CS_CTRL_EN3    2
+#define PM8XXX_MPP_DTEST_CS_CTRL_EN4    3
+
+/* DTEST Digital Output: control */
+#define PM8XXX_MPP_DTEST_DBUS1          0
+#define PM8XXX_MPP_DTEST_DBUS2          1
+#define PM8XXX_MPP_DTEST_DBUS3          2
+#define PM8XXX_MPP_DTEST_DBUS4          3
+
+/* custom pinconf parameters */
+#define PM8XXX_CONFIG_AMUX		(PIN_CONFIG_END + 1)
+#define PM8XXX_CONFIG_DTEST_SELECTOR	(PIN_CONFIG_END + 2)
+#define PM8XXX_CONFIG_ALEVEL		(PIN_CONFIG_END + 3)
+#define PM8XXX_CONFIG_PAIRED		(PIN_CONFIG_END + 4)
+
+/**
+ * struct pm8xxx_pin_data - dynamic configuration for a pin
+ * @reg:		address of the control register
+ * @irq:		IRQ from the PMIC interrupt controller
+ * @mode:		operating mode for the pin (digital, analog or current sink)
+ * @input:		pin is input
+ * @output:		pin is output
+ * @high_z:		pin is floating
+ * @paired:		mpp operates in paired mode
+ * @output_value:	logical output value of the mpp
+ * @power_source:	selected power source
+ * @dtest:		DTEST route selector
+ * @amux:		input muxing in analog mode
+ * @aout_level:		selector of the output in analog mode
+ * @drive_strength:	drive strength of the current sink
+ * @pullup:		pull up value, when in digital bidirectional mode
+ */
+struct pm8xxx_pin_data {
+	unsigned reg;
+	int irq;
+
+	u8 mode;
+
+	bool input;
+	bool output;
+	bool high_z;
+	bool paired;
+	bool output_value;
+
+	u8 power_source;
+	u8 dtest;
+	u8 amux;
+	u8 aout_level;
+	u8 drive_strength;
+	unsigned pullup;
+};
+
+struct pm8xxx_mpp {
+	struct device *dev;
+	struct regmap *regmap;
+	struct pinctrl_dev *pctrl;
+	struct gpio_chip chip;
+
+	struct pinctrl_desc desc;
+	unsigned npins;
+};
+
+static const struct pinconf_generic_params pm8xxx_mpp_bindings[] = {
+	{"qcom,amux-route",	PM8XXX_CONFIG_AMUX,		0},
+	{"qcom,analog-level",	PM8XXX_CONFIG_ALEVEL,		0},
+	{"qcom,dtest",		PM8XXX_CONFIG_DTEST_SELECTOR,	0},
+	{"qcom,paired",		PM8XXX_CONFIG_PAIRED,		0},
+};
+
+#ifdef CONFIG_DEBUG_FS
+static const struct pin_config_item pm8xxx_conf_items[] = {
+	PCONFDUMP(PM8XXX_CONFIG_AMUX, "analog mux", NULL, true),
+	PCONFDUMP(PM8XXX_CONFIG_ALEVEL, "analog level", NULL, true),
+	PCONFDUMP(PM8XXX_CONFIG_DTEST_SELECTOR, "dtest", NULL, true),
+	PCONFDUMP(PM8XXX_CONFIG_PAIRED, "paired", NULL, false),
+};
+#endif
+
+#define PM8XXX_MAX_MPPS	12
+static const char * const pm8xxx_groups[PM8XXX_MAX_MPPS] = {
+	"mpp1", "mpp2", "mpp3", "mpp4", "mpp5", "mpp6", "mpp7", "mpp8",
+	"mpp9", "mpp10", "mpp11", "mpp12",
+};
+
+#define PM8XXX_MPP_DIGITAL	0
+#define PM8XXX_MPP_ANALOG	1
+#define PM8XXX_MPP_SINK		2
+
+static const char * const pm8xxx_mpp_functions[] = {
+	"digital", "analog", "sink",
+};
+
+static int pm8xxx_mpp_update(struct pm8xxx_mpp *pctrl,
+			     struct pm8xxx_pin_data *pin)
+{
+	unsigned level;
+	unsigned ctrl;
+	unsigned type;
+	int ret;
+	u8 val;
+
+	switch (pin->mode) {
+	case PM8XXX_MPP_DIGITAL:
+		if (pin->dtest) {
+			type = PM8XXX_MPP_TYPE_DTEST_OUTPUT;
+			ctrl = pin->dtest - 1;
+		} else if (pin->input && pin->output) {
+			type = PM8XXX_MPP_TYPE_D_BI_DIR;
+			if (pin->high_z)
+				ctrl = PM8XXX_MPP_BI_PULLUP_OPEN;
+			else if (pin->pullup == 600)
+				ctrl = PM8XXX_MPP_BI_PULLUP_1KOHM;
+			else if (pin->pullup == 10000)
+				ctrl = PM8XXX_MPP_BI_PULLUP_10KOHM;
+			else
+				ctrl = PM8XXX_MPP_BI_PULLUP_30KOHM;
+		} else if (pin->input) {
+			type = PM8XXX_MPP_TYPE_D_INPUT;
+			if (pin->dtest)
+				ctrl = pin->dtest;
+			else
+				ctrl = PM8XXX_MPP_DIN_TO_INT;
+		} else {
+			type = PM8XXX_MPP_TYPE_D_OUTPUT;
+			ctrl = !!pin->output_value;
+			if (pin->paired)
+				ctrl |= BIT(1);
+		}
+
+		level = pin->power_source;
+		break;
+	case PM8XXX_MPP_ANALOG:
+		if (pin->output) {
+			type = PM8XXX_MPP_TYPE_A_OUTPUT;
+			level = pin->aout_level;
+			ctrl = pin->output_value;
+			if (pin->paired)
+				ctrl |= BIT(1);
+		} else {
+			type = PM8XXX_MPP_TYPE_A_INPUT;
+			level = pin->amux;
+			ctrl = 0;
+		}
+		break;
+	case PM8XXX_MPP_SINK:
+		level = (pin->drive_strength / 5) - 1;
+		if (pin->dtest) {
+			type = PM8XXX_MPP_TYPE_DTEST_SINK;
+			ctrl = pin->dtest - 1;
+		} else {
+			type = PM8XXX_MPP_TYPE_SINK;
+			ctrl = pin->output_value;
+			if (pin->paired)
+				ctrl |= BIT(1);
+		}
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	val = type << 5 | level << 2 | ctrl;
+	ret = regmap_write(pctrl->regmap, pin->reg, val);
+	if (ret)
+		dev_err(pctrl->dev, "failed to write register\n");
+
+	return ret;
+}
+
+static int pm8xxx_get_groups_count(struct pinctrl_dev *pctldev)
+{
+	struct pm8xxx_mpp *pctrl = pinctrl_dev_get_drvdata(pctldev);
+
+	return pctrl->npins;
+}
+
+static const char *pm8xxx_get_group_name(struct pinctrl_dev *pctldev,
+					 unsigned group)
+{
+	return pm8xxx_groups[group];
+}
+
+
+static int pm8xxx_get_group_pins(struct pinctrl_dev *pctldev,
+				 unsigned group,
+				 const unsigned **pins,
+				 unsigned *num_pins)
+{
+	struct pm8xxx_mpp *pctrl = pinctrl_dev_get_drvdata(pctldev);
+
+	*pins = &pctrl->desc.pins[group].number;
+	*num_pins = 1;
+
+	return 0;
+}
+
+static const struct pinctrl_ops pm8xxx_pinctrl_ops = {
+	.get_groups_count	= pm8xxx_get_groups_count,
+	.get_group_name		= pm8xxx_get_group_name,
+	.get_group_pins         = pm8xxx_get_group_pins,
+	.dt_node_to_map		= pinconf_generic_dt_node_to_map_group,
+	.dt_free_map		= pinctrl_utils_dt_free_map,
+};
+
+static int pm8xxx_get_functions_count(struct pinctrl_dev *pctldev)
+{
+	return ARRAY_SIZE(pm8xxx_mpp_functions);
+}
+
+static const char *pm8xxx_get_function_name(struct pinctrl_dev *pctldev,
+					    unsigned function)
+{
+	return pm8xxx_mpp_functions[function];
+}
+
+static int pm8xxx_get_function_groups(struct pinctrl_dev *pctldev,
+				      unsigned function,
+				      const char * const **groups,
+				      unsigned * const num_groups)
+{
+	struct pm8xxx_mpp *pctrl = pinctrl_dev_get_drvdata(pctldev);
+
+	*groups = pm8xxx_groups;
+	*num_groups = pctrl->npins;
+	return 0;
+}
+
+static int pm8xxx_pinmux_set_mux(struct pinctrl_dev *pctldev,
+				 unsigned function,
+				 unsigned group)
+{
+	struct pm8xxx_mpp *pctrl = pinctrl_dev_get_drvdata(pctldev);
+	struct pm8xxx_pin_data *pin = pctrl->desc.pins[group].drv_data;
+
+	pin->mode = function;
+	pm8xxx_mpp_update(pctrl, pin);
+
+	return 0;
+}
+
+static const struct pinmux_ops pm8xxx_pinmux_ops = {
+	.get_functions_count	= pm8xxx_get_functions_count,
+	.get_function_name	= pm8xxx_get_function_name,
+	.get_function_groups	= pm8xxx_get_function_groups,
+	.set_mux		= pm8xxx_pinmux_set_mux,
+};
+
+static int pm8xxx_pin_config_get(struct pinctrl_dev *pctldev,
+				 unsigned int offset,
+				 unsigned long *config)
+{
+	struct pm8xxx_mpp *pctrl = pinctrl_dev_get_drvdata(pctldev);
+	struct pm8xxx_pin_data *pin = pctrl->desc.pins[offset].drv_data;
+	unsigned param = pinconf_to_config_param(*config);
+	unsigned arg;
+
+	switch (param) {
+	case PIN_CONFIG_BIAS_PULL_UP:
+		arg = pin->pullup;
+		break;
+	case PIN_CONFIG_BIAS_HIGH_IMPEDANCE:
+		arg = pin->high_z;
+		break;
+	case PIN_CONFIG_INPUT_ENABLE:
+		arg = pin->input;
+		break;
+	case PIN_CONFIG_OUTPUT:
+		arg = pin->output_value;
+		break;
+	case PIN_CONFIG_POWER_SOURCE:
+		arg = pin->power_source;
+		break;
+	case PIN_CONFIG_DRIVE_STRENGTH:
+		arg = pin->drive_strength;
+		break;
+	case PM8XXX_CONFIG_DTEST_SELECTOR:
+		arg = pin->dtest;
+		break;
+	case PM8XXX_CONFIG_AMUX:
+		arg = pin->amux;
+		break;
+	case PM8XXX_CONFIG_ALEVEL:
+		arg = pin->aout_level;
+		break;
+	case PM8XXX_CONFIG_PAIRED:
+		arg = pin->paired;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	*config = pinconf_to_config_packed(param, arg);
+
+	return 0;
+}
+
+static int pm8xxx_pin_config_set(struct pinctrl_dev *pctldev,
+				 unsigned int offset,
+				 unsigned long *configs,
+				 unsigned num_configs)
+{
+	struct pm8xxx_mpp *pctrl = pinctrl_dev_get_drvdata(pctldev);
+	struct pm8xxx_pin_data *pin = pctrl->desc.pins[offset].drv_data;
+	unsigned param;
+	unsigned arg;
+	unsigned i;
+
+	for (i = 0; i < num_configs; i++) {
+		param = pinconf_to_config_param(configs[i]);
+		arg = pinconf_to_config_argument(configs[i]);
+
+		switch (param) {
+		case PIN_CONFIG_BIAS_PULL_UP:
+			pin->pullup = arg;
+			break;
+		case PIN_CONFIG_BIAS_HIGH_IMPEDANCE:
+			pin->high_z = true;
+			break;
+		case PIN_CONFIG_INPUT_ENABLE:
+			pin->input = true;
+			break;
+		case PIN_CONFIG_OUTPUT:
+			pin->output = true;
+			pin->output_value = !!arg;
+			break;
+		case PIN_CONFIG_POWER_SOURCE:
+			pin->power_source = arg;
+			break;
+		case PIN_CONFIG_DRIVE_STRENGTH:
+			pin->drive_strength = arg;
+			break;
+		case PM8XXX_CONFIG_DTEST_SELECTOR:
+			pin->dtest = arg;
+			break;
+		case PM8XXX_CONFIG_AMUX:
+			pin->amux = arg;
+			break;
+		case PM8XXX_CONFIG_ALEVEL:
+			pin->aout_level = arg;
+			break;
+		case PM8XXX_CONFIG_PAIRED:
+			pin->paired = !!arg;
+			break;
+		default:
+			dev_err(pctrl->dev,
+				"unsupported config parameter: %x\n",
+				param);
+			return -EINVAL;
+		}
+	}
+
+	pm8xxx_mpp_update(pctrl, pin);
+
+	return 0;
+}
+
+static const struct pinconf_ops pm8xxx_pinconf_ops = {
+	.is_generic = true,
+	.pin_config_group_get = pm8xxx_pin_config_get,
+	.pin_config_group_set = pm8xxx_pin_config_set,
+};
+
+static struct pinctrl_desc pm8xxx_pinctrl_desc = {
+	.name = "pm8xxx_mpp",
+	.pctlops = &pm8xxx_pinctrl_ops,
+	.pmxops = &pm8xxx_pinmux_ops,
+	.confops = &pm8xxx_pinconf_ops,
+	.owner = THIS_MODULE,
+};
+
+static int pm8xxx_mpp_direction_input(struct gpio_chip *chip,
+				       unsigned offset)
+{
+	struct pm8xxx_mpp *pctrl = container_of(chip, struct pm8xxx_mpp, chip);
+	struct pm8xxx_pin_data *pin = pctrl->desc.pins[offset].drv_data;
+
+	switch (pin->mode) {
+	case PM8XXX_MPP_DIGITAL:
+		pin->input = true;
+		break;
+	case PM8XXX_MPP_ANALOG:
+		pin->input = true;
+		pin->output = true;
+		break;
+	case PM8XXX_MPP_SINK:
+		return -EINVAL;
+	}
+
+	pm8xxx_mpp_update(pctrl, pin);
+
+	return 0;
+}
+
+static int pm8xxx_mpp_direction_output(struct gpio_chip *chip,
+					unsigned offset,
+					int value)
+{
+	struct pm8xxx_mpp *pctrl = container_of(chip, struct pm8xxx_mpp, chip);
+	struct pm8xxx_pin_data *pin = pctrl->desc.pins[offset].drv_data;
+
+	switch (pin->mode) {
+	case PM8XXX_MPP_DIGITAL:
+		pin->output = true;
+		break;
+	case PM8XXX_MPP_ANALOG:
+		pin->input = false;
+		pin->output = true;
+		break;
+	case PM8XXX_MPP_SINK:
+		pin->input = false;
+		pin->output = true;
+		break;
+	}
+
+	pm8xxx_mpp_update(pctrl, pin);
+
+	return 0;
+}
+
+static int pm8xxx_mpp_get(struct gpio_chip *chip, unsigned offset)
+{
+	struct pm8xxx_mpp *pctrl = container_of(chip, struct pm8xxx_mpp, chip);
+	struct pm8xxx_pin_data *pin = pctrl->desc.pins[offset].drv_data;
+	bool state;
+	int ret;
+
+	if (!pin->input)
+		return pin->output_value;
+
+	ret = irq_get_irqchip_state(pin->irq, IRQCHIP_STATE_LINE_LEVEL, &state);
+	if (!ret)
+		ret = !!state;
+
+	return ret;
+}
+
+static void pm8xxx_mpp_set(struct gpio_chip *chip, unsigned offset, int value)
+{
+	struct pm8xxx_mpp *pctrl = container_of(chip, struct pm8xxx_mpp, chip);
+	struct pm8xxx_pin_data *pin = pctrl->desc.pins[offset].drv_data;
+
+	pin->output_value = !!value;
+
+	pm8xxx_mpp_update(pctrl, pin);
+}
+
+static int pm8xxx_mpp_of_xlate(struct gpio_chip *chip,
+				const struct of_phandle_args *gpio_desc,
+				u32 *flags)
+{
+	if (chip->of_gpio_n_cells < 2)
+		return -EINVAL;
+
+	if (flags)
+		*flags = gpio_desc->args[1];
+
+	return gpio_desc->args[0] - 1;
+}
+
+
+static int pm8xxx_mpp_to_irq(struct gpio_chip *chip, unsigned offset)
+{
+	struct pm8xxx_mpp *pctrl = container_of(chip, struct pm8xxx_mpp, chip);
+	struct pm8xxx_pin_data *pin = pctrl->desc.pins[offset].drv_data;
+
+	return pin->irq;
+}
+
+#ifdef CONFIG_DEBUG_FS
+#include <linux/seq_file.h>
+
+static void pm8xxx_mpp_dbg_show_one(struct seq_file *s,
+				  struct pinctrl_dev *pctldev,
+				  struct gpio_chip *chip,
+				  unsigned offset,
+				  unsigned gpio)
+{
+	struct pm8xxx_mpp *pctrl = container_of(chip, struct pm8xxx_mpp, chip);
+	struct pm8xxx_pin_data *pin = pctrl->desc.pins[offset].drv_data;
+
+	static const char * const aout_lvls[] = {
+		"1v25", "1v25_2", "0v625", "0v3125", "mpp", "abus1", "abus2",
+		"abus3"
+	};
+
+	static const char * const amuxs[] = {
+		"amux5", "amux6", "amux7", "amux8", "amux9", "abus1", "abus2",
+		"abus3",
+	};
+
+	seq_printf(s, " mpp%-2d:", offset + 1);
+
+	switch (pin->mode) {
+	case PM8XXX_MPP_DIGITAL:
+		seq_puts(s, " digital ");
+		if (pin->dtest) {
+			seq_printf(s, "dtest%d\n", pin->dtest);
+		} else if (pin->input && pin->output) {
+			if (pin->high_z)
+				seq_puts(s, "bi-dir high-z");
+			else
+				seq_printf(s, "bi-dir %dOhm", pin->pullup);
+		} else if (pin->input) {
+			if (pin->dtest)
+				seq_printf(s, "in dtest%d", pin->dtest);
+			else
+				seq_puts(s, "in gpio");
+		} else if (pin->output) {
+			seq_puts(s, "out ");
+
+			if (!pin->paired) {
+				seq_puts(s, pin->output_value ?
+					 "high" : "low");
+			} else {
+				seq_puts(s, pin->output_value ?
+					 "inverted" : "follow");
+			}
+		}
+		break;
+	case PM8XXX_MPP_ANALOG:
+		seq_puts(s, " analog ");
+		if (pin->output) {
+			seq_printf(s, "out %s ", aout_lvls[pin->aout_level]);
+			if (!pin->paired) {
+				seq_puts(s, pin->output_value ?
+					 "high" : "low");
+			} else {
+				seq_puts(s, pin->output_value ?
+					 "inverted" : "follow");
+			}
+		} else {
+			seq_printf(s, "input mux %s", amuxs[pin->amux]);
+		}
+		break;
+	case PM8XXX_MPP_SINK:
+		seq_printf(s, " sink %dmA ", pin->drive_strength);
+		if (pin->dtest) {
+			seq_printf(s, "dtest%d", pin->dtest);
+		} else {
+			if (!pin->paired) {
+				seq_puts(s, pin->output_value ?
+					 "high" : "low");
+			} else {
+				seq_puts(s, pin->output_value ?
+					 "inverted" : "follow");
+			}
+		}
+		break;
+	}
+
+}
+
+static void pm8xxx_mpp_dbg_show(struct seq_file *s, struct gpio_chip *chip)
+{
+	unsigned gpio = chip->base;
+	unsigned i;
+
+	for (i = 0; i < chip->ngpio; i++, gpio++) {
+		pm8xxx_mpp_dbg_show_one(s, NULL, chip, i, gpio);
+		seq_puts(s, "\n");
+	}
+}
+
+#else
+#define msm_mpp_dbg_show NULL
+#endif
+
+static struct gpio_chip pm8xxx_mpp_template = {
+	.direction_input = pm8xxx_mpp_direction_input,
+	.direction_output = pm8xxx_mpp_direction_output,
+	.get = pm8xxx_mpp_get,
+	.set = pm8xxx_mpp_set,
+	.of_xlate = pm8xxx_mpp_of_xlate,
+	.to_irq = pm8xxx_mpp_to_irq,
+	.dbg_show = pm8xxx_mpp_dbg_show,
+	.owner = THIS_MODULE,
+};
+
+static int pm8xxx_pin_populate(struct pm8xxx_mpp *pctrl,
+			       struct pm8xxx_pin_data *pin)
+{
+	unsigned int val;
+	unsigned level;
+	unsigned ctrl;
+	unsigned type;
+	int ret;
+
+	ret = regmap_read(pctrl->regmap, pin->reg, &val);
+	if (ret) {
+		dev_err(pctrl->dev, "failed to read register\n");
+		return ret;
+	}
+
+	type = (val >> 5) & 7;
+	level = (val >> 2) & 7;
+	ctrl = (val) & 3;
+
+	switch (type) {
+	case PM8XXX_MPP_TYPE_D_INPUT:
+		pin->mode = PM8XXX_MPP_DIGITAL;
+		pin->input = true;
+		pin->power_source = level;
+		pin->dtest = ctrl;
+		break;
+	case PM8XXX_MPP_TYPE_D_OUTPUT:
+		pin->mode = PM8XXX_MPP_DIGITAL;
+		pin->output = true;
+		pin->power_source = level;
+		pin->output_value = !!(ctrl & BIT(0));
+		pin->paired = !!(ctrl & BIT(1));
+		break;
+	case PM8XXX_MPP_TYPE_D_BI_DIR:
+		pin->mode = PM8XXX_MPP_DIGITAL;
+		pin->input = true;
+		pin->output = true;
+		pin->power_source = level;
+		switch (ctrl) {
+		case PM8XXX_MPP_BI_PULLUP_1KOHM:
+			pin->pullup = 600;
+			break;
+		case PM8XXX_MPP_BI_PULLUP_OPEN:
+			pin->high_z = true;
+			break;
+		case PM8XXX_MPP_BI_PULLUP_10KOHM:
+			pin->pullup = 10000;
+			break;
+		case PM8XXX_MPP_BI_PULLUP_30KOHM:
+			pin->pullup = 30000;
+			break;
+		}
+		break;
+	case PM8XXX_MPP_TYPE_A_INPUT:
+		pin->mode = PM8XXX_MPP_ANALOG;
+		pin->input = true;
+		pin->amux = level;
+		break;
+	case PM8XXX_MPP_TYPE_A_OUTPUT:
+		pin->mode = PM8XXX_MPP_ANALOG;
+		pin->output = true;
+		pin->aout_level = level;
+		pin->output_value = !!(ctrl & BIT(0));
+		pin->paired = !!(ctrl & BIT(1));
+		break;
+	case PM8XXX_MPP_TYPE_SINK:
+		pin->mode = PM8XXX_MPP_SINK;
+		pin->drive_strength = 5 * (level + 1);
+		pin->output_value = !!(ctrl & BIT(0));
+		pin->paired = !!(ctrl & BIT(1));
+		break;
+	case PM8XXX_MPP_TYPE_DTEST_SINK:
+		pin->mode = PM8XXX_MPP_SINK;
+		pin->dtest = ctrl + 1;
+		pin->drive_strength = 5 * (level + 1);
+		break;
+	case PM8XXX_MPP_TYPE_DTEST_OUTPUT:
+		pin->mode = PM8XXX_MPP_DIGITAL;
+		pin->power_source = level;
+		if (ctrl >= 1)
+			pin->dtest = ctrl;
+		break;
+	}
+
+	return 0;
+}
+
+static const struct of_device_id pm8xxx_mpp_of_match[] = {
+	{ .compatible = "qcom,pm8018-mpp", .data = (void *)6 },
+	{ .compatible = "qcom,pm8038-mpp", .data = (void *)6 },
+	{ .compatible = "qcom,pm8917-mpp", .data = (void *)10 },
+	{ .compatible = "qcom,pm8821-mpp", .data = (void *)4 },
+	{ .compatible = "qcom,pm8921-mpp", .data = (void *)12 },
+	{ },
+};
+MODULE_DEVICE_TABLE(of, pm8xxx_mpp_of_match);
+
+static int pm8xxx_mpp_probe(struct platform_device *pdev)
+{
+	struct pm8xxx_pin_data *pin_data;
+	struct pinctrl_pin_desc *pins;
+	struct pm8xxx_mpp *pctrl;
+	int ret;
+	int i;
+
+	pctrl = devm_kzalloc(&pdev->dev, sizeof(*pctrl), GFP_KERNEL);
+	if (!pctrl)
+		return -ENOMEM;
+
+	pctrl->dev = &pdev->dev;
+	pctrl->npins = (unsigned)of_device_get_match_data(&pdev->dev);
+
+	pctrl->regmap = dev_get_regmap(pdev->dev.parent, NULL);
+	if (!pctrl->regmap) {
+		dev_err(&pdev->dev, "parent regmap unavailable\n");
+		return -ENXIO;
+	}
+
+	pctrl->desc = pm8xxx_pinctrl_desc;
+	pctrl->desc.npins = pctrl->npins;
+
+	pins = devm_kcalloc(&pdev->dev,
+			    pctrl->desc.npins,
+			    sizeof(struct pinctrl_pin_desc),
+			    GFP_KERNEL);
+	if (!pins)
+		return -ENOMEM;
+
+	pin_data = devm_kcalloc(&pdev->dev,
+				pctrl->desc.npins,
+				sizeof(struct pm8xxx_pin_data),
+				GFP_KERNEL);
+	if (!pin_data)
+		return -ENOMEM;
+
+	for (i = 0; i < pctrl->desc.npins; i++) {
+		pin_data[i].reg = SSBI_REG_ADDR_MPP(i);
+		pin_data[i].irq = platform_get_irq(pdev, i);
+		if (pin_data[i].irq < 0) {
+			dev_err(&pdev->dev,
+				"missing interrupts for pin %d\n", i);
+			return pin_data[i].irq;
+		}
+
+		ret = pm8xxx_pin_populate(pctrl, &pin_data[i]);
+		if (ret)
+			return ret;
+
+		pins[i].number = i;
+		pins[i].name = pm8xxx_groups[i];
+		pins[i].drv_data = &pin_data[i];
+	}
+	pctrl->desc.pins = pins;
+
+	pctrl->desc.num_custom_params = ARRAY_SIZE(pm8xxx_mpp_bindings);
+	pctrl->desc.custom_params = pm8xxx_mpp_bindings;
+#ifdef CONFIG_DEBUG_FS
+	pctrl->desc.custom_conf_items = pm8xxx_conf_items;
+#endif
+
+	pctrl->pctrl = pinctrl_register(&pctrl->desc, &pdev->dev, pctrl);
+	if (!pctrl->pctrl) {
+		dev_err(&pdev->dev, "couldn't register pm8xxx mpp driver\n");
+		return -ENODEV;
+	}
+
+	pctrl->chip = pm8xxx_mpp_template;
+	pctrl->chip.base = -1;
+	pctrl->chip.dev = &pdev->dev;
+	pctrl->chip.of_node = pdev->dev.of_node;
+	pctrl->chip.of_gpio_n_cells = 2;
+	pctrl->chip.label = dev_name(pctrl->dev);
+	pctrl->chip.ngpio = pctrl->npins;
+	ret = gpiochip_add(&pctrl->chip);
+	if (ret) {
+		dev_err(&pdev->dev, "failed register gpiochip\n");
+		goto unregister_pinctrl;
+	}
+
+	ret = gpiochip_add_pin_range(&pctrl->chip,
+				     dev_name(pctrl->dev),
+				     0, 0, pctrl->chip.ngpio);
+	if (ret) {
+		dev_err(pctrl->dev, "failed to add pin range\n");
+		goto unregister_gpiochip;
+	}
+
+	platform_set_drvdata(pdev, pctrl);
+
+	dev_dbg(&pdev->dev, "Qualcomm pm8xxx mpp driver probed\n");
+
+	return 0;
+
+unregister_gpiochip:
+	gpiochip_remove(&pctrl->chip);
+
+unregister_pinctrl:
+	pinctrl_unregister(pctrl->pctrl);
+
+	return ret;
+}
+
+static int pm8xxx_mpp_remove(struct platform_device *pdev)
+{
+	struct pm8xxx_mpp *pctrl = platform_get_drvdata(pdev);
+
+	gpiochip_remove(&pctrl->chip);
+
+	pinctrl_unregister(pctrl->pctrl);
+
+	return 0;
+}
+
+static struct platform_driver pm8xxx_mpp_driver = {
+	.driver = {
+		.name = "qcom-ssbi-mpp",
+		.of_match_table = pm8xxx_mpp_of_match,
+	},
+	.probe = pm8xxx_mpp_probe,
+	.remove = pm8xxx_mpp_remove,
+};
+
+module_platform_driver(pm8xxx_mpp_driver);
+
+MODULE_AUTHOR("Bjorn Andersson <bjorn.andersson@sonymobile.com>");
+MODULE_DESCRIPTION("Qualcomm PM8xxx MPP driver");
+MODULE_LICENSE("GPL v2");
diff --git a/include/dt-bindings/pinctrl/qcom,pmic-mpp.h b/include/dt-bindings/pinctrl/qcom,pmic-mpp.h
index c10205491f8d..a15c1704d0ec 100644
--- a/include/dt-bindings/pinctrl/qcom,pmic-mpp.h
+++ b/include/dt-bindings/pinctrl/qcom,pmic-mpp.h
@@ -7,6 +7,47 @@
 #define _DT_BINDINGS_PINCTRL_QCOM_PMIC_MPP_H
 
 /* power-source */
+
+/* Digital Input/Output: level [PM8058] */
+#define PM8058_MPP_VPH			0
+#define PM8058_MPP_S3			1
+#define PM8058_MPP_L2			2
+#define PM8058_MPP_L3			3
+
+/* Digital Input/Output: level [PM8901] */
+#define PM8901_MPP_MSMIO		0
+#define PM8901_MPP_DIG			1
+#define PM8901_MPP_L5			2
+#define PM8901_MPP_S4			3
+#define PM8901_MPP_VPH			4
+
+/* Digital Input/Output: level [PM8921] */
+#define PM8921_MPP_S4			1
+#define PM8921_MPP_L15			3
+#define PM8921_MPP_L17			4
+#define PM8921_MPP_VPH			7
+
+/* Digital Input/Output: level [PM8821] */
+#define PM8821_MPP_1P8			0
+#define PM8821_MPP_VPH			7
+
+/* Digital Input/Output: level [PM8018] */
+#define PM8018_MPP_L4			0
+#define PM8018_MPP_L14			1
+#define PM8018_MPP_S3			2
+#define PM8018_MPP_L6			3
+#define PM8018_MPP_L2			4
+#define PM8018_MPP_L5			5
+#define PM8018_MPP_VPH			7
+
+/* Digital Input/Output: level [PM8038] */
+#define PM8038_MPP_L20			0
+#define PM8038_MPP_L11			1
+#define PM8038_MPP_L5			2
+#define PM8038_MPP_L15			3
+#define PM8038_MPP_L17			4
+#define PM8038_MPP_VPH			7
+
 #define PM8841_MPP_VPH			0
 #define PM8841_MPP_S3			2
 
@@ -37,6 +78,16 @@
 #define PMIC_MPP_AMUX_ROUTE_ABUS3	6
 #define PMIC_MPP_AMUX_ROUTE_ABUS4	7
 
+/* Analog Output: level */
+#define PMIC_MPP_AOUT_LVL_1V25		0
+#define PMIC_MPP_AOUT_LVL_1V25_2	1
+#define PMIC_MPP_AOUT_LVL_0V625		2
+#define PMIC_MPP_AOUT_LVL_0V3125	3
+#define PMIC_MPP_AOUT_LVL_MPP		4
+#define PMIC_MPP_AOUT_LVL_ABUS1		5
+#define PMIC_MPP_AOUT_LVL_ABUS2		6
+#define PMIC_MPP_AOUT_LVL_ABUS3		7
+
 /* To be used with "function" */
 #define PMIC_MPP_FUNC_NORMAL		"normal"
 #define PMIC_MPP_FUNC_PAIRED		"paired"
-- 
cgit v1.2.3


From 75c27f119b6475d95374bdad872c6938b5c26196 Mon Sep 17 00:00:00 2001
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Date: Thu, 11 Jun 2015 15:22:43 -0700
Subject: rcu: Remove CONFIG_RCU_CPU_STALL_INFO

The CONFIG_RCU_CPU_STALL_INFO has been default-y for a couple of
releases with no complaints, so it is time to eliminate this Kconfig
option entirely, so that the long-form RCU CPU stall warnings cannot
be disabled.  This commit does just that.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 Documentation/RCU/stallwarn.txt                    | 12 +-----
 kernel/rcu/tree.h                                  |  4 --
 kernel/rcu/tree_plugin.h                           | 45 ----------------------
 lib/Kconfig.debug                                  | 14 -------
 .../selftests/rcutorture/configs/rcu/TREE01        |  1 -
 .../selftests/rcutorture/configs/rcu/TREE02        |  1 -
 .../selftests/rcutorture/configs/rcu/TREE02-T      |  1 -
 .../selftests/rcutorture/configs/rcu/TREE03        |  1 -
 .../selftests/rcutorture/configs/rcu/TREE04        |  1 -
 .../selftests/rcutorture/configs/rcu/TREE05        |  1 -
 .../selftests/rcutorture/configs/rcu/TREE06        |  1 -
 .../selftests/rcutorture/configs/rcu/TREE07        |  1 -
 .../selftests/rcutorture/configs/rcu/TREE08        |  1 -
 .../selftests/rcutorture/configs/rcu/TREE08-T      |  1 -
 .../selftests/rcutorture/configs/rcu/TREE09        |  1 -
 .../selftests/rcutorture/doc/TREE_RCU-kconfig.txt  |  1 -
 16 files changed, 2 insertions(+), 85 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/RCU/stallwarn.txt b/Documentation/RCU/stallwarn.txt
index b57c0c1cdac6..046f32637b95 100644
--- a/Documentation/RCU/stallwarn.txt
+++ b/Documentation/RCU/stallwarn.txt
@@ -26,12 +26,6 @@ CONFIG_RCU_CPU_STALL_TIMEOUT
 	Stall-warning messages may be enabled and disabled completely via
 	/sys/module/rcupdate/parameters/rcu_cpu_stall_suppress.
 
-CONFIG_RCU_CPU_STALL_INFO
-
-	This kernel configuration parameter causes the stall warning to
-	print out additional per-CPU diagnostic information, including
-	information on scheduling-clock ticks and RCU's idle-CPU tracking.
-
 RCU_STALL_DELAY_DELTA
 
 	Although the lockdep facility is extremely useful, it does add
@@ -101,15 +95,13 @@ interact.  Please note that it is not possible to entirely eliminate this
 sort of false positive without resorting to things like stop_machine(),
 which is overkill for this sort of problem.
 
-If the CONFIG_RCU_CPU_STALL_INFO kernel configuration parameter is set,
-more information is printed with the stall-warning message, for example:
+Recent kernels will print a long form of the stall-warning message:
 
 	INFO: rcu_preempt detected stall on CPU
 	0: (63959 ticks this GP) idle=241/3fffffffffffffff/0 softirq=82/543
 	   (t=65000 jiffies)
 
-In kernels with CONFIG_RCU_FAST_NO_HZ, even more information is
-printed:
+In kernels with CONFIG_RCU_FAST_NO_HZ, more information is printed:
 
 	INFO: rcu_preempt detected stall on CPU
 	0: (64628 ticks this GP) idle=dd5/3fffffffffffffff/0 softirq=82/543 last_accelerate: a345/d342 nonlazy_posted: 25 .D
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index faee5242d6ff..7c0b09d754a1 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -288,12 +288,10 @@ struct rcu_data {
 	bool		gpwrap;		/* Possible gpnum/completed wrap. */
 	struct rcu_node *mynode;	/* This CPU's leaf of hierarchy */
 	unsigned long grpmask;		/* Mask to apply to leaf qsmask. */
-#ifdef CONFIG_RCU_CPU_STALL_INFO
 	unsigned long	ticks_this_gp;	/* The number of scheduling-clock */
 					/*  ticks this CPU has handled */
 					/*  during and after the last grace */
 					/* period it is aware of. */
-#endif /* #ifdef CONFIG_RCU_CPU_STALL_INFO */
 
 	/* 2) batch handling */
 	/*
@@ -388,9 +386,7 @@ struct rcu_data {
 #endif /* #ifdef CONFIG_RCU_NOCB_CPU */
 
 	/* 8) RCU CPU stall data. */
-#ifdef CONFIG_RCU_CPU_STALL_INFO
 	unsigned int softirq_snap;	/* Snapshot of softirq activity. */
-#endif /* #ifdef CONFIG_RCU_CPU_STALL_INFO */
 
 	int cpu;
 	struct rcu_state *rsp;
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 7234f03e0aa2..ef41c1b04ba6 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -82,8 +82,6 @@ static void __init rcu_bootup_announce_oddness(void)
 		pr_info("\tRCU lockdep checking is enabled.\n");
 	if (IS_ENABLED(CONFIG_RCU_TORTURE_TEST_RUNNABLE))
 		pr_info("\tRCU torture testing starts during boot.\n");
-	if (IS_ENABLED(CONFIG_RCU_CPU_STALL_INFO))
-		pr_info("\tAdditional per-CPU info printed with stalls.\n");
 	if (RCU_NUM_LVLS >= 4)
 		pr_info("\tFour(or more)-level hierarchy is enabled.\n");
 	if (RCU_FANOUT_LEAF != 16)
@@ -418,8 +416,6 @@ static void rcu_print_detail_task_stall(struct rcu_state *rsp)
 		rcu_print_detail_task_stall_rnp(rnp);
 }
 
-#ifdef CONFIG_RCU_CPU_STALL_INFO
-
 static void rcu_print_task_stall_begin(struct rcu_node *rnp)
 {
 	pr_err("\tTasks blocked on level-%d rcu_node (CPUs %d-%d):",
@@ -431,18 +427,6 @@ static void rcu_print_task_stall_end(void)
 	pr_cont("\n");
 }
 
-#else /* #ifdef CONFIG_RCU_CPU_STALL_INFO */
-
-static void rcu_print_task_stall_begin(struct rcu_node *rnp)
-{
-}
-
-static void rcu_print_task_stall_end(void)
-{
-}
-
-#endif /* #else #ifdef CONFIG_RCU_CPU_STALL_INFO */
-
 /*
  * Scan the current list of tasks blocked within RCU read-side critical
  * sections, printing out the tid of each.
@@ -1685,8 +1669,6 @@ early_initcall(rcu_register_oom_notifier);
 
 #endif /* #else #if !defined(CONFIG_RCU_FAST_NO_HZ) */
 
-#ifdef CONFIG_RCU_CPU_STALL_INFO
-
 #ifdef CONFIG_RCU_FAST_NO_HZ
 
 static void print_cpu_stall_fast_no_hz(char *cp, int cpu)
@@ -1775,33 +1757,6 @@ static void increment_cpu_stall_ticks(void)
 		raw_cpu_inc(rsp->rda->ticks_this_gp);
 }
 
-#else /* #ifdef CONFIG_RCU_CPU_STALL_INFO */
-
-static void print_cpu_stall_info_begin(void)
-{
-	pr_cont(" {");
-}
-
-static void print_cpu_stall_info(struct rcu_state *rsp, int cpu)
-{
-	pr_cont(" %d", cpu);
-}
-
-static void print_cpu_stall_info_end(void)
-{
-	pr_cont("} ");
-}
-
-static void zero_cpu_stall_ticks(struct rcu_data *rdp)
-{
-}
-
-static void increment_cpu_stall_ticks(void)
-{
-}
-
-#endif /* #else #ifdef CONFIG_RCU_CPU_STALL_INFO */
-
 #ifdef CONFIG_RCU_NOCB_CPU
 
 /*
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index e2894b23efb6..8a34205d6922 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1353,20 +1353,6 @@ config RCU_CPU_STALL_TIMEOUT
 	  RCU grace period persists, additional CPU stall warnings are
 	  printed at more widely spaced intervals.
 
-config RCU_CPU_STALL_INFO
-	bool "Print additional diagnostics on RCU CPU stall"
-	depends on (TREE_RCU || PREEMPT_RCU) && DEBUG_KERNEL
-	default y
-	help
-	  For each stalled CPU that is aware of the current RCU grace
-	  period, print out additional per-CPU diagnostic information
-	  regarding scheduling-clock ticks, idle state, and,
-	  for RCU_FAST_NO_HZ kernels, idle-entry state.
-
-	  Say N if you are unsure.
-
-	  Say Y if you want to enable such diagnostics.
-
 config RCU_TRACE
 	bool "Enable tracing for RCU"
 	depends on DEBUG_KERNEL
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE01 b/tools/testing/selftests/rcutorture/configs/rcu/TREE01
index 8e9137f66831..f572b873c620 100644
--- a/tools/testing/selftests/rcutorture/configs/rcu/TREE01
+++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE01
@@ -13,7 +13,6 @@ CONFIG_MAXSMP=y
 CONFIG_RCU_NOCB_CPU=y
 CONFIG_RCU_NOCB_CPU_ZERO=y
 CONFIG_DEBUG_LOCK_ALLOC=n
-CONFIG_RCU_CPU_STALL_INFO=n
 CONFIG_RCU_BOOST=n
 CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
 CONFIG_RCU_EXPERT=y
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE02 b/tools/testing/selftests/rcutorture/configs/rcu/TREE02
index aeea6a204d14..ef6a22c44dea 100644
--- a/tools/testing/selftests/rcutorture/configs/rcu/TREE02
+++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE02
@@ -17,7 +17,6 @@ CONFIG_RCU_FANOUT_LEAF=3
 CONFIG_RCU_NOCB_CPU=n
 CONFIG_DEBUG_LOCK_ALLOC=y
 CONFIG_PROVE_LOCKING=n
-CONFIG_RCU_CPU_STALL_INFO=n
 CONFIG_RCU_BOOST=n
 CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
 CONFIG_RCU_EXPERT=y
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE02-T b/tools/testing/selftests/rcutorture/configs/rcu/TREE02-T
index 2ac9e68ea3d1..917d2517b5b5 100644
--- a/tools/testing/selftests/rcutorture/configs/rcu/TREE02-T
+++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE02-T
@@ -17,6 +17,5 @@ CONFIG_RCU_FANOUT_LEAF=3
 CONFIG_RCU_NOCB_CPU=n
 CONFIG_DEBUG_LOCK_ALLOC=y
 CONFIG_PROVE_LOCKING=n
-CONFIG_RCU_CPU_STALL_INFO=n
 CONFIG_RCU_BOOST=n
 CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE03 b/tools/testing/selftests/rcutorture/configs/rcu/TREE03
index 72aa7d87ea99..7a17c503b382 100644
--- a/tools/testing/selftests/rcutorture/configs/rcu/TREE03
+++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE03
@@ -13,7 +13,6 @@ CONFIG_RCU_FANOUT=2
 CONFIG_RCU_FANOUT_LEAF=2
 CONFIG_RCU_NOCB_CPU=n
 CONFIG_DEBUG_LOCK_ALLOC=n
-CONFIG_RCU_CPU_STALL_INFO=n
 CONFIG_RCU_BOOST=y
 CONFIG_RCU_KTHREAD_PRIO=2
 CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE04 b/tools/testing/selftests/rcutorture/configs/rcu/TREE04
index 3f5112751cda..39a2c6d7d7ec 100644
--- a/tools/testing/selftests/rcutorture/configs/rcu/TREE04
+++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE04
@@ -17,6 +17,5 @@ CONFIG_RCU_FANOUT=4
 CONFIG_RCU_FANOUT_LEAF=4
 CONFIG_RCU_NOCB_CPU=n
 CONFIG_DEBUG_LOCK_ALLOC=n
-CONFIG_RCU_CPU_STALL_INFO=n
 CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
 CONFIG_RCU_EXPERT=y
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE05 b/tools/testing/selftests/rcutorture/configs/rcu/TREE05
index c04dfea6fd21..1257d3227b1e 100644
--- a/tools/testing/selftests/rcutorture/configs/rcu/TREE05
+++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE05
@@ -17,6 +17,5 @@ CONFIG_RCU_NOCB_CPU_NONE=y
 CONFIG_DEBUG_LOCK_ALLOC=y
 CONFIG_PROVE_LOCKING=y
 #CHECK#CONFIG_PROVE_RCU=y
-CONFIG_RCU_CPU_STALL_INFO=n
 CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
 CONFIG_RCU_EXPERT=y
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE06 b/tools/testing/selftests/rcutorture/configs/rcu/TREE06
index f51d2c73a68e..d3e456b74cbe 100644
--- a/tools/testing/selftests/rcutorture/configs/rcu/TREE06
+++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE06
@@ -18,6 +18,5 @@ CONFIG_RCU_NOCB_CPU=n
 CONFIG_DEBUG_LOCK_ALLOC=y
 CONFIG_PROVE_LOCKING=y
 #CHECK#CONFIG_PROVE_RCU=y
-CONFIG_RCU_CPU_STALL_INFO=n
 CONFIG_DEBUG_OBJECTS_RCU_HEAD=y
 CONFIG_RCU_EXPERT=y
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE07 b/tools/testing/selftests/rcutorture/configs/rcu/TREE07
index f422af4ff5a3..3956b4131f72 100644
--- a/tools/testing/selftests/rcutorture/configs/rcu/TREE07
+++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE07
@@ -17,6 +17,5 @@ CONFIG_RCU_FANOUT=2
 CONFIG_RCU_FANOUT_LEAF=2
 CONFIG_RCU_NOCB_CPU=n
 CONFIG_DEBUG_LOCK_ALLOC=n
-CONFIG_RCU_CPU_STALL_INFO=n
 CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
 CONFIG_RCU_EXPERT=y
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE08 b/tools/testing/selftests/rcutorture/configs/rcu/TREE08
index a24d2ca30646..bb9b0c1a23c2 100644
--- a/tools/testing/selftests/rcutorture/configs/rcu/TREE08
+++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE08
@@ -19,7 +19,6 @@ CONFIG_RCU_NOCB_CPU_ALL=y
 CONFIG_DEBUG_LOCK_ALLOC=n
 CONFIG_PROVE_LOCKING=y
 #CHECK#CONFIG_PROVE_RCU=y
-CONFIG_RCU_CPU_STALL_INFO=n
 CONFIG_RCU_BOOST=n
 CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
 CONFIG_RCU_EXPERT=y
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE08-T b/tools/testing/selftests/rcutorture/configs/rcu/TREE08-T
index b2b8cea69dc9..2ad13f0d29cc 100644
--- a/tools/testing/selftests/rcutorture/configs/rcu/TREE08-T
+++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE08-T
@@ -17,6 +17,5 @@ CONFIG_RCU_FANOUT_LEAF=2
 CONFIG_RCU_NOCB_CPU=y
 CONFIG_RCU_NOCB_CPU_ALL=y
 CONFIG_DEBUG_LOCK_ALLOC=n
-CONFIG_RCU_CPU_STALL_INFO=n
 CONFIG_RCU_BOOST=n
 CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE09 b/tools/testing/selftests/rcutorture/configs/rcu/TREE09
index aa4ed08d999d..6710e749d9de 100644
--- a/tools/testing/selftests/rcutorture/configs/rcu/TREE09
+++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE09
@@ -13,7 +13,6 @@ CONFIG_SUSPEND=n
 CONFIG_HIBERNATION=n
 CONFIG_RCU_NOCB_CPU=n
 CONFIG_DEBUG_LOCK_ALLOC=n
-CONFIG_RCU_CPU_STALL_INFO=n
 CONFIG_RCU_BOOST=n
 CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
 #CHECK#CONFIG_RCU_EXPERT=n
diff --git a/tools/testing/selftests/rcutorture/doc/TREE_RCU-kconfig.txt b/tools/testing/selftests/rcutorture/doc/TREE_RCU-kconfig.txt
index b24c0004fc49..657f3a035488 100644
--- a/tools/testing/selftests/rcutorture/doc/TREE_RCU-kconfig.txt
+++ b/tools/testing/selftests/rcutorture/doc/TREE_RCU-kconfig.txt
@@ -16,7 +16,6 @@ CONFIG_PROVE_LOCKING -- Do several, covering CONFIG_DEBUG_LOCK_ALLOC=y and not.
 CONFIG_PROVE_RCU -- Hardwired to CONFIG_PROVE_LOCKING.
 CONFIG_RCU_BOOST -- one of PREEMPT_RCU.
 CONFIG_RCU_KTHREAD_PRIO -- set to 2 for _BOOST testing.
-CONFIG_RCU_CPU_STALL_INFO -- Now default, avoid at least twice.
 CONFIG_RCU_FANOUT -- Cover hierarchy, but overlap with others.
 CONFIG_RCU_FANOUT_LEAF -- Do one non-default.
 CONFIG_RCU_FAST_NO_HZ -- Do one, but not with CONFIG_RCU_NOCB_CPU_ALL.
-- 
cgit v1.2.3


From 99a930b0d2b018f31474d69137f311ce3581a4c2 Mon Sep 17 00:00:00 2001
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Date: Tue, 30 Jun 2015 14:54:09 -0700
Subject: documentation: Describe new expedited stall warnings

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 Documentation/RCU/stallwarn.txt | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/RCU/stallwarn.txt b/Documentation/RCU/stallwarn.txt
index 046f32637b95..efb9454875ab 100644
--- a/Documentation/RCU/stallwarn.txt
+++ b/Documentation/RCU/stallwarn.txt
@@ -163,6 +163,23 @@ message will be about three times the interval between the beginning
 of the stall and the first message.
 
 
+Stall Warnings for Expedited Grace Periods
+
+If an expedited grace period detects a stall, it will place a message
+like the following in dmesg:
+
+	INFO: rcu_sched detected expedited stalls on CPUs: { 1 2 6 } 26009 jiffies s: 1043
+
+This indicates that CPUs 1, 2, and 6 have failed to respond to a
+reschedule IPI, that the expedited grace period has been going on for
+26,009 jiffies, and that the expedited grace-period sequence counter is
+1043.  The fact that this last value is odd indicates that an expedited
+grace period is in flight.
+
+It is entirely possible to see stall warnings from normal and from
+expedited grace periods at about the same time from the same run.
+
+
 What Causes RCU CPU Stall Warnings?
 
 So your kernel printed an RCU CPU stall warning.  The next question is
-- 
cgit v1.2.3


From cdacbe1f91264687af956e810278030f2ab5a3d0 Mon Sep 17 00:00:00 2001
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Date: Sat, 11 Jul 2015 16:24:45 -0700
Subject: rcu: Add fastpath bypassing funnel locking

In the common case, there will be only one expedited grace period in
the system at a given time, in which case it is not helpful to use
funnel locking.  This commit therefore adds a fastpath that bypasses
funnel locking when the root ->exp_funnel_mutex is not held.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 Documentation/RCU/trace.txt | 36 ++++++++++--------------------------
 kernel/rcu/tree.c           | 16 ++++++++++++++++
 kernel/rcu/tree.h           |  2 +-
 kernel/rcu/tree_trace.c     |  4 ++--
 4 files changed, 29 insertions(+), 29 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/RCU/trace.txt b/Documentation/RCU/trace.txt
index 08651da15448..97f17e9decda 100644
--- a/Documentation/RCU/trace.txt
+++ b/Documentation/RCU/trace.txt
@@ -237,42 +237,26 @@ o	"ktl" is the low-order 16 bits (in hexadecimal) of the count of
 
 The output of "cat rcu/rcu_preempt/rcuexp" looks as follows:
 
-s=21872 d=21872 w=0 tf=0 wd1=0 wd2=0 n=0 sc=21872 dt=21872 dl=0 dx=21872
+s=21872 wd0=0 wd1=0 wd2=0 wd3=5 n=0 enq=0 sc=21872
 
 These fields are as follows:
 
-o	"s" is the starting sequence number.
+o	"s" is the sequence number, with an odd number indicating that
+	an expedited grace period is in progress.
 
-o	"d" is the ending sequence number.  When the starting and ending
-	numbers differ, there is an expedited grace period in progress.
-
-o	"w" is the number of times that the sequence numbers have been
-	in danger of wrapping.
-
-o	"tf" is the number of times that contention has resulted in a
-	failure to begin an expedited grace period.
-
-o	"wd1" and "wd2" are the number of times that an attempt to
-	start an expedited grace period found that someone else had
-	completed an expedited grace period that satisfies the
+o	"wd0", "wd1", "wd2", and "wd3" are the number of times that an
+	attempt to start an expedited grace period found that someone
+	else had completed an expedited grace period that satisfies the
 	attempted request.  "Our work is done."
 
-o	"n" is number of times that contention was so great that
-	the request was demoted from an expedited grace period to
-	a normal grace period.
+o	"n" is number of times that a concurrent CPU-hotplug operation
+	forced a fallback to a normal grace period.
+
+o	"enq" is the number of quiescent states still outstanding.
 
 o	"sc" is the number of times that the attempt to start a
 	new expedited grace period succeeded.
 
-o	"dt" is the number of times that we attempted to update
-	the "d" counter.
-
-o	"dl" is the number of times that we failed to update the "d"
-	counter.
-
-o	"dx" is the number of times that we succeeded in updating
-	the "d" counter.
-
 
 The output of "cat rcu/rcu_preempt/rcugp" looks as follows:
 
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index f66f6e7730bc..3af0dee2d045 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3355,6 +3355,22 @@ static struct rcu_node *exp_funnel_lock(struct rcu_state *rsp, unsigned long s)
 	struct rcu_node *rnp0;
 	struct rcu_node *rnp1 = NULL;
 
+	/*
+	 * First try directly acquiring the root lock in order to reduce
+	 * latency in the common case where expedited grace periods are
+	 * rare.  We check mutex_is_locked() to avoid pathological levels of
+	 * memory contention on ->exp_funnel_mutex in the heavy-load case.
+	 */
+	rnp0 = rcu_get_root(rsp);
+	if (!mutex_is_locked(&rnp0->exp_funnel_mutex)) {
+		if (mutex_trylock(&rnp0->exp_funnel_mutex)) {
+			if (sync_exp_work_done(rsp, rnp0, NULL,
+					       &rsp->expedited_workdone0, s))
+				return NULL;
+			return rnp0;
+		}
+	}
+
 	/*
 	 * Each pass through the following loop works its way
 	 * up the rcu_node tree, returning if others have done the
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 543ba726396c..80d974df0ea0 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -493,7 +493,7 @@ struct rcu_state {
 	/* End of fields guarded by barrier_mutex. */
 
 	unsigned long expedited_sequence;	/* Take a ticket. */
-	atomic_long_t expedited_tryfail;	/* # acquisition failures. */
+	atomic_long_t expedited_workdone0;	/* # done by others #0. */
 	atomic_long_t expedited_workdone1;	/* # done by others #1. */
 	atomic_long_t expedited_workdone2;	/* # done by others #2. */
 	atomic_long_t expedited_workdone3;	/* # done by others #3. */
diff --git a/kernel/rcu/tree_trace.c b/kernel/rcu/tree_trace.c
index ec62369f1b02..6fc4c5ff3bb5 100644
--- a/kernel/rcu/tree_trace.c
+++ b/kernel/rcu/tree_trace.c
@@ -185,9 +185,9 @@ static int show_rcuexp(struct seq_file *m, void *v)
 {
 	struct rcu_state *rsp = (struct rcu_state *)m->private;
 
-	seq_printf(m, "t=%lu tf=%lu wd1=%lu wd2=%lu wd3=%lu n=%lu enq=%d sc=%lu\n",
+	seq_printf(m, "s=%lu wd0=%lu wd1=%lu wd2=%lu wd3=%lu n=%lu enq=%d sc=%lu\n",
 		   rsp->expedited_sequence,
-		   atomic_long_read(&rsp->expedited_tryfail),
+		   atomic_long_read(&rsp->expedited_workdone0),
 		   atomic_long_read(&rsp->expedited_workdone1),
 		   atomic_long_read(&rsp->expedited_workdone2),
 		   atomic_long_read(&rsp->expedited_workdone3),
-- 
cgit v1.2.3


From 69a462b9532c906441ccbec52b62e4a8b5f26271 Mon Sep 17 00:00:00 2001
From: Mars Cheng <mars.cheng@mediatek.com>
Date: Tue, 14 Jul 2015 14:07:08 +0800
Subject: Document: DT: Add bindings for mediatek MT6580 SoC Platform

This adds a DT binding documentation for the MT6580 SoC from Mediatek.

Signed-off-by: Mars Cheng <mars.cheng@mediatek.com>
Signed-off-by: Matthias Brugger <matthias.bgg@gmail.com>
---
 Documentation/devicetree/bindings/arm/mediatek.txt                 | 4 ++++
 Documentation/devicetree/bindings/arm/mediatek/mediatek,sysirq.txt | 1 +
 Documentation/devicetree/bindings/serial/mtk-uart.txt              | 3 ++-
 Documentation/devicetree/bindings/timer/mediatek,mtk-timer.txt     | 6 +++++-
 4 files changed, 12 insertions(+), 2 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/arm/mediatek.txt b/Documentation/devicetree/bindings/arm/mediatek.txt
index dd7550a29db6..2daa424a13a4 100644
--- a/Documentation/devicetree/bindings/arm/mediatek.txt
+++ b/Documentation/devicetree/bindings/arm/mediatek.txt
@@ -5,6 +5,7 @@ Boards with a MediaTek mt65xx/mt81xx SoC shall have the following property:
 Required root node property:
 
 compatible: Must contain one of
+   "mediatek,mt6580"
    "mediatek,mt6589"
    "mediatek,mt6592"
    "mediatek,mt8127"
@@ -14,6 +15,9 @@ compatible: Must contain one of
 
 Supported boards:
 
+- Evaluation board for MT6580:
+    Required root node properties:
+      - compatible = "mediatek,mt6580-evbp1", "mediatek,mt6580";
 - bq Aquaris5 smart phone:
     Required root node properties:
       - compatible = "mundoreader,bq-aquaris5", "mediatek,mt6589";
diff --git a/Documentation/devicetree/bindings/arm/mediatek/mediatek,sysirq.txt b/Documentation/devicetree/bindings/arm/mediatek/mediatek,sysirq.txt
index 4f5a5352ccd8..3c9c3a7f3d25 100644
--- a/Documentation/devicetree/bindings/arm/mediatek/mediatek,sysirq.txt
+++ b/Documentation/devicetree/bindings/arm/mediatek/mediatek,sysirq.txt
@@ -11,6 +11,7 @@ Required properties:
 	"mediatek,mt6592-sysirq"
 	"mediatek,mt6589-sysirq"
 	"mediatek,mt6582-sysirq"
+	"mediatek,mt6580-sysirq"
 	"mediatek,mt6577-sysirq"
 - interrupt-controller : Identifies the node as an interrupt controller
 - #interrupt-cells : Use the same format as specified by GIC in
diff --git a/Documentation/devicetree/bindings/serial/mtk-uart.txt b/Documentation/devicetree/bindings/serial/mtk-uart.txt
index 8d63f1da07aa..a875997f2062 100644
--- a/Documentation/devicetree/bindings/serial/mtk-uart.txt
+++ b/Documentation/devicetree/bindings/serial/mtk-uart.txt
@@ -7,8 +7,9 @@ Required properties:
   * "mediatek,mt8173-uart" for MT8173 compatible UARTS
   * "mediatek,mt6589-uart" for MT6589 compatible UARTS
   * "mediatek,mt6582-uart" for MT6582 compatible UARTS
+  * "mediatek,mt6580-uart" for MT6580 compatible UARTS
   * "mediatek,mt6577-uart" for all compatible UARTS (MT8173, MT6589, MT6582, 
-	MT6577)
+	MT6580, MT6577)
 
 - reg: The base address of the UART register bank.
 
diff --git a/Documentation/devicetree/bindings/timer/mediatek,mtk-timer.txt b/Documentation/devicetree/bindings/timer/mediatek,mtk-timer.txt
index 7c4408ff4b83..53a3029b7589 100644
--- a/Documentation/devicetree/bindings/timer/mediatek,mtk-timer.txt
+++ b/Documentation/devicetree/bindings/timer/mediatek,mtk-timer.txt
@@ -2,7 +2,11 @@ Mediatek MT6577, MT6572 and MT6589 Timers
 ---------------------------------------
 
 Required properties:
-- compatible: Should be "mediatek,mt6577-timer"
+- compatible should contain:
+	* "mediatek,mt6589-timer" for MT6589 compatible timers
+	* "mediatek,mt6580-timer" for MT6580 compatible timers
+	* "mediatek,mt6577-timer" for all compatible timers (MT6589, MT6580,
+		MT6577)
 - reg: Should contain location and length for timers register.
 - clocks: Clocks driving the timer hardware. This list should include two
 	clocks. The order is system clock and as second clock the RTC clock.
-- 
cgit v1.2.3


From def56bba7a5c623fd887d1f4b4c864521d615a76 Mon Sep 17 00:00:00 2001
From: Shawn Guo <shawnguo@kernel.org>
Date: Wed, 15 Jul 2015 10:36:37 +0800
Subject: input: snvs_pwrkey: use "wakeup-source" as deivce tree property name

Instead of inventing a new property name, let's use "wakeup-source" to
be consistent with other driver and subsystem bindings.

Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
---
 Documentation/devicetree/bindings/crypto/fsl-sec4.txt | 2 +-
 drivers/input/keyboard/snvs_pwrkey.c                  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/crypto/fsl-sec4.txt b/Documentation/devicetree/bindings/crypto/fsl-sec4.txt
index 71a39c5bd486..f16bbd6644b8 100644
--- a/Documentation/devicetree/bindings/crypto/fsl-sec4.txt
+++ b/Documentation/devicetree/bindings/crypto/fsl-sec4.txt
@@ -408,7 +408,7 @@ System ON/OFF key driver
       Value type: <int>
       Definition: Keycode to emit, KEY_POWER by default.
 
-  - wakeup:
+  - wakeup-source:
       Usage: option
       Value type: <boo>
       Definition: Button can wake-up the system.
diff --git a/drivers/input/keyboard/snvs_pwrkey.c b/drivers/input/keyboard/snvs_pwrkey.c
index 512a1fc2a864..78fd24ca3813 100644
--- a/drivers/input/keyboard/snvs_pwrkey.c
+++ b/drivers/input/keyboard/snvs_pwrkey.c
@@ -122,7 +122,7 @@ static int imx_snvs_pwrkey_probe(struct platform_device *pdev)
 		dev_warn(&pdev->dev, "KEY_POWER without setting in dts\n");
 	}
 
-	pdata->wakeup = of_property_read_bool(np, "wakeup");
+	pdata->wakeup = of_property_read_bool(np, "wakeup-source");
 
 	pdata->irq = platform_get_irq(pdev, 0);
 	if (pdata->irq < 0) {
-- 
cgit v1.2.3


From eff3cddc222c88943ff515ae9335687c9e2cbaf6 Mon Sep 17 00:00:00 2001
From: Jacob Keller <jacob.e.keller@intel.com>
Date: Wed, 22 Apr 2015 14:40:30 -0700
Subject: clarify implementation of ethtool's get_ts_info op

This patch adds some clarification about the intended way to implement
both SIOCSHWTSTAMP and ethtool's get_ts_info. The HWTSTAMP API has
several Rx filters which are very specific, as well as more general
filters. The specific filters really only exist to support some broken
hardware which can't fully implement the generic filters. This patch
adds clarification that it is okay to support the specific filters in
SIOCSHWTSTAMP by upscaling them to the generic filters. In addition,
update the header for ethtool_ts_info to specify that drivers ought to
only report the filters they support without upscaling in this manner.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Reviewed-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 Documentation/networking/timestamping.txt | 7 +++++++
 include/uapi/linux/ethtool.h              | 5 +++++
 2 files changed, 12 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/networking/timestamping.txt b/Documentation/networking/timestamping.txt
index 5f0922613f1a..a977339fbe0a 100644
--- a/Documentation/networking/timestamping.txt
+++ b/Documentation/networking/timestamping.txt
@@ -359,6 +359,13 @@ the requested fine-grained filtering for incoming packets is not
 supported, the driver may time stamp more than just the requested types
 of packets.
 
+Drivers are free to use a more permissive configuration than the requested
+configuration. It is expected that drivers should only implement directly the
+most generic mode that can be supported. For example if the hardware can
+support HWTSTAMP_FILTER_V2_EVENT, then it should generally always upscale
+HWTSTAMP_FILTER_V2_L2_SYNC_MESSAGE, and so forth, as HWTSTAMP_FILTER_V2_EVENT
+is more generic (and more useful to applications).
+
 A driver which supports hardware time stamping shall update the struct
 with the actual, possibly more permissive configuration. If the
 requested packets cannot be time stamped, then nothing should be
diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
index cd67aec187d9..cd1629170103 100644
--- a/include/uapi/linux/ethtool.h
+++ b/include/uapi/linux/ethtool.h
@@ -1093,6 +1093,11 @@ struct ethtool_sfeatures {
  * the 'hwtstamp_tx_types' and 'hwtstamp_rx_filters' enumeration values,
  * respectively.  For example, if the device supports HWTSTAMP_TX_ON,
  * then (1 << HWTSTAMP_TX_ON) in 'tx_types' will be set.
+ *
+ * Drivers should only report the filters they actually support without
+ * upscaling in the SIOCSHWTSTAMP ioctl. If the SIOCSHWSTAMP request for
+ * HWTSTAMP_FILTER_V1_SYNC is supported by HWTSTAMP_FILTER_V1_EVENT, then the
+ * driver should only report HWTSTAMP_FILTER_V1_EVENT in this op.
  */
 struct ethtool_ts_info {
 	__u32	cmd;
-- 
cgit v1.2.3


From 1dcafd3aeb0a70dbdd11b10269da2afc454e2395 Mon Sep 17 00:00:00 2001
From: Cristina Opriceana <cristina.opriceana@gmail.com>
Date: Sun, 19 Jul 2015 01:33:55 +0300
Subject: iio: Documentation: Remove bytes_per_datum attribute

Remove sysfs bytes_per_datum device attribute ABI documentation
since the attribute is not present anymore.

Signed-off-by: Cristina Opriceana <cristina.opriceana@gmail.com>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
---
 Documentation/ABI/testing/sysfs-bus-iio | 7 -------
 1 file changed, 7 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/ABI/testing/sysfs-bus-iio b/Documentation/ABI/testing/sysfs-bus-iio
index 666a341b01b8..400d23441a61 100644
--- a/Documentation/ABI/testing/sysfs-bus-iio
+++ b/Documentation/ABI/testing/sysfs-bus-iio
@@ -1040,13 +1040,6 @@ Contact:	linux-iio@vger.kernel.org
 Description:
 		Number of scans contained by the buffer.
 
-What:		/sys/bus/iio/devices/iio:deviceX/buffer/bytes_per_datum
-KernelVersion:	2.6.37
-Contact:	linux-iio@vger.kernel.org
-Description:
-		Bytes per scan.  Due to alignment fun, the scan may be larger
-		than implied directly by the scan_element parameters.
-
 What:		/sys/bus/iio/devices/iio:deviceX/buffer/enable
 KernelVersion:	2.6.35
 Contact:	linux-iio@vger.kernel.org
-- 
cgit v1.2.3


From 07a4257e3d758af96cf98b1e9baadeb627475b03 Mon Sep 17 00:00:00 2001
From: Stefan Wahren <stefan.wahren@i2se.com>
Date: Sat, 18 Jul 2015 12:30:41 +0000
Subject: iio: mxs-lradc: clarify supported devices

At the beginning the driver supported only i.MX28 SoC, but now the
whole MXS platform. So remove any confusing comments which apply
only to i.MX28.

Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
Reviewed-by: Marek Vasut <marex@denx.de>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
---
 .../devicetree/bindings/staging/iio/adc/mxs-lradc.txt  |  2 +-
 drivers/staging/iio/adc/mxs-lradc.c                    | 18 +++++++++---------
 2 files changed, 10 insertions(+), 10 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/staging/iio/adc/mxs-lradc.txt b/Documentation/devicetree/bindings/staging/iio/adc/mxs-lradc.txt
index 307537787574..555fb117d4fa 100644
--- a/Documentation/devicetree/bindings/staging/iio/adc/mxs-lradc.txt
+++ b/Documentation/devicetree/bindings/staging/iio/adc/mxs-lradc.txt
@@ -1,4 +1,4 @@
-* Freescale i.MX28 LRADC device driver
+* Freescale MXS LRADC device driver
 
 Required properties:
 - compatible: Should be "fsl,imx23-lradc" for i.MX23 SoC and "fsl,imx28-lradc"
diff --git a/drivers/staging/iio/adc/mxs-lradc.c b/drivers/staging/iio/adc/mxs-lradc.c
index d7c5223f1c3e..1ccb3676f2e5 100644
--- a/drivers/staging/iio/adc/mxs-lradc.c
+++ b/drivers/staging/iio/adc/mxs-lradc.c
@@ -1,5 +1,5 @@
 /*
- * Freescale i.MX28 LRADC driver
+ * Freescale MXS LRADC driver
  *
  * Copyright (c) 2012 DENX Software Engineering, GmbH.
  * Marek Vasut <marex@denx.de>
@@ -1392,7 +1392,7 @@ static const struct iio_chan_spec mxs_lradc_chan_spec[] = {
 	MXS_ADC_CHAN(4, IIO_VOLTAGE),
 	MXS_ADC_CHAN(5, IIO_VOLTAGE),
 	MXS_ADC_CHAN(6, IIO_VOLTAGE),
-	MXS_ADC_CHAN(7, IIO_VOLTAGE),	/* VBATT */
+	MXS_ADC_CHAN(7, IIO_VOLTAGE),
 	/* Combined Temperature sensors */
 	{
 		.type = IIO_TEMP,
@@ -1411,12 +1411,12 @@ static const struct iio_chan_spec mxs_lradc_chan_spec[] = {
 		.scan_index = -1,
 		.channel = 9,
 	},
-	MXS_ADC_CHAN(10, IIO_VOLTAGE),	/* VDDIO */
-	MXS_ADC_CHAN(11, IIO_VOLTAGE),	/* VTH */
-	MXS_ADC_CHAN(12, IIO_VOLTAGE),	/* VDDA */
-	MXS_ADC_CHAN(13, IIO_VOLTAGE),	/* VDDD */
-	MXS_ADC_CHAN(14, IIO_VOLTAGE),	/* VBG */
-	MXS_ADC_CHAN(15, IIO_VOLTAGE),	/* VDD5V */
+	MXS_ADC_CHAN(10, IIO_VOLTAGE),
+	MXS_ADC_CHAN(11, IIO_VOLTAGE),
+	MXS_ADC_CHAN(12, IIO_VOLTAGE),
+	MXS_ADC_CHAN(13, IIO_VOLTAGE),
+	MXS_ADC_CHAN(14, IIO_VOLTAGE),
+	MXS_ADC_CHAN(15, IIO_VOLTAGE),
 };
 
 static int mxs_lradc_hw_init(struct mxs_lradc *lradc)
@@ -1707,6 +1707,6 @@ static struct platform_driver mxs_lradc_driver = {
 module_platform_driver(mxs_lradc_driver);
 
 MODULE_AUTHOR("Marek Vasut <marex@denx.de>");
-MODULE_DESCRIPTION("Freescale i.MX28 LRADC driver");
+MODULE_DESCRIPTION("Freescale MXS LRADC driver");
 MODULE_LICENSE("GPL v2");
 MODULE_ALIAS("platform:" DRIVER_NAME);
-- 
cgit v1.2.3


From 2563f7fc83d655792ec9999e2dda7a37172ae6aa Mon Sep 17 00:00:00 2001
From: Jandy Gou <qingsong.gou@ck-telecom.com>
Date: Fri, 17 Jul 2015 16:34:36 +0800
Subject: iio: magnetometer: mmc35240: Add DT binding doc

Signed-off-by: Jandy Gou <qingsong.gou@ck-telecom.com>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
---
 .../devicetree/bindings/iio/magnetometer/mmc35240.txt       | 13 +++++++++++++
 1 file changed, 13 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/iio/magnetometer/mmc35240.txt

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/iio/magnetometer/mmc35240.txt b/Documentation/devicetree/bindings/iio/magnetometer/mmc35240.txt
new file mode 100644
index 000000000000..a01235c7fa15
--- /dev/null
+++ b/Documentation/devicetree/bindings/iio/magnetometer/mmc35240.txt
@@ -0,0 +1,13 @@
+* MEMSIC MMC35240 magnetometer sensor
+
+Required properties:
+
+  - compatible : should be "memsic,mmc35240"
+  - reg : the I2C address of the magnetometer
+
+Example:
+
+mmc35240@30 {
+        compatible = "memsic,mmc35240";
+        reg = <0x30>;
+};
-- 
cgit v1.2.3


From e757d5c4c367e747ae15186b753788f9c2752753 Mon Sep 17 00:00:00 2001
From: LABBE Corentin <clabbe.montjoie@gmail.com>
Date: Fri, 17 Jul 2015 16:39:40 +0200
Subject: ARM: sun4i: dt: Add DT bindings documentation for SUN4I Security
 System

This patch adds documentation for Device-Tree bindings for the
Security System cryptographic accelerator driver.

Signed-off-by: LABBE Corentin <clabbe.montjoie@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---
 Documentation/devicetree/bindings/crypto/sun4i-ss.txt | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/crypto/sun4i-ss.txt

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/crypto/sun4i-ss.txt b/Documentation/devicetree/bindings/crypto/sun4i-ss.txt
new file mode 100644
index 000000000000..1e02d177fea6
--- /dev/null
+++ b/Documentation/devicetree/bindings/crypto/sun4i-ss.txt
@@ -0,0 +1,19 @@
+* Allwinner Security System found on A20 SoC
+
+Required properties:
+- compatible : Should be "allwinner,sun4i-a10-crypto".
+- reg: Should contain the Security System register location and length.
+- interrupts: Should contain the IRQ line for the Security System.
+- clocks : List of clock specifiers, corresponding to ahb and ss.
+- clock-names : Name of the functional clock, should be
+	* "ahb" : AHB gating clock
+	* "mod" : SS controller clock
+
+Example:
+	crypto: crypto-engine@01c15000 {
+		compatible = "allwinner,sun4i-a10-crypto";
+		reg = <0x01c15000 0x1000>;
+		interrupts = <GIC_SPI 86 IRQ_TYPE_LEVEL_HIGH>;
+		clocks = <&ahb_gates 5>, <&ss_clk>;
+		clock-names = "ahb", "mod";
+	};
-- 
cgit v1.2.3


From ab51fbab39d864f3223e44a2600fd951df261f0b Mon Sep 17 00:00:00 2001
From: Davidlohr Bueso <dave@stgolabs.net>
Date: Mon, 29 Jun 2015 23:26:02 -0700
Subject: futex: Fault/error injection capabilities

Although futexes are well known for being a royal pita,
we really have very little debugging capabilities - except
for relying on tglx's eye half the time.

By simply making use of the existing fault-injection machinery,
we can improve this situation, allowing generating artificial
uaddress faults and deadlock scenarios. Of course, when this is
disabled in production systems, the overhead for failure checks
is practically zero -- so this is very cheap at the same time.
Future work would be nice to now enhance trinity to make use of
this.

There is a special tunable 'ignore-private', which can filter
out private futexes. Given the tsk->make_it_fail filter and
this option, pi futexes can be narrowed down pretty closely.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Darren Hart <darren@dvhart.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Link: http://lkml.kernel.org/r/1435645562-975-3-git-send-email-dave@stgolabs.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 Documentation/fault-injection/fault-injection.txt | 11 +++
 kernel/futex.c                                    | 89 ++++++++++++++++++++++-
 lib/Kconfig.debug                                 |  7 ++
 3 files changed, 105 insertions(+), 2 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/fault-injection/fault-injection.txt b/Documentation/fault-injection/fault-injection.txt
index 4cf1a2a6bd72..415484f3d59a 100644
--- a/Documentation/fault-injection/fault-injection.txt
+++ b/Documentation/fault-injection/fault-injection.txt
@@ -15,6 +15,10 @@ o fail_page_alloc
 
   injects page allocation failures. (alloc_pages(), get_free_pages(), ...)
 
+o fail_futex
+
+  injects futex deadlock and uaddr fault errors.
+
 o fail_make_request
 
   injects disk IO errors on devices permitted by setting
@@ -113,6 +117,12 @@ configuration of fault-injection capabilities.
 	specifies the minimum page allocation order to be injected
 	failures.
 
+- /sys/kernel/debug/fail_futex/ignore-private:
+
+	Format: { 'Y' | 'N' }
+	default is 'N', setting it to 'Y' will disable failure injections
+	when dealing with private (address space) futexes.
+
 o Boot option
 
 In order to inject faults while debugfs is not available (early boot time),
@@ -121,6 +131,7 @@ use the boot option:
 	failslab=
 	fail_page_alloc=
 	fail_make_request=
+	fail_futex=
 	mmc_core.fail_request=<interval>,<probability>,<space>,<times>
 
 How to add new fault injection capability
diff --git a/kernel/futex.c b/kernel/futex.c
index 153eb22b0fc0..6ea31bb703c9 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -64,6 +64,7 @@
 #include <linux/hugetlb.h>
 #include <linux/freezer.h>
 #include <linux/bootmem.h>
+#include <linux/fault-inject.h>
 
 #include <asm/futex.h>
 
@@ -258,6 +259,66 @@ static unsigned long __read_mostly futex_hashsize;
 
 static struct futex_hash_bucket *futex_queues;
 
+/*
+ * Fault injections for futexes.
+ */
+#ifdef CONFIG_FAIL_FUTEX
+
+static struct {
+	struct fault_attr attr;
+
+	u32 ignore_private;
+} fail_futex = {
+	.attr = FAULT_ATTR_INITIALIZER,
+	.ignore_private = 0,
+};
+
+static int __init setup_fail_futex(char *str)
+{
+	return setup_fault_attr(&fail_futex.attr, str);
+}
+__setup("fail_futex=", setup_fail_futex);
+
+bool should_fail_futex(bool fshared)
+{
+	if (fail_futex.ignore_private && !fshared)
+		return false;
+
+	return should_fail(&fail_futex.attr, 1);
+}
+
+#ifdef CONFIG_FAULT_INJECTION_DEBUG_FS
+
+static int __init fail_futex_debugfs(void)
+{
+	umode_t mode = S_IFREG | S_IRUSR | S_IWUSR;
+	struct dentry *dir;
+
+	dir = fault_create_debugfs_attr("fail_futex", NULL,
+					&fail_futex.attr);
+	if (IS_ERR(dir))
+		return PTR_ERR(dir);
+
+	if (!debugfs_create_bool("ignore-private", mode, dir,
+				 &fail_futex.ignore_private)) {
+		debugfs_remove_recursive(dir);
+		return -ENOMEM;
+	}
+
+	return 0;
+}
+
+late_initcall(fail_futex_debugfs);
+
+#endif /* CONFIG_FAULT_INJECTION_DEBUG_FS */
+
+#else
+static inline bool should_fail_futex(bool fshared)
+{
+	return false;
+}
+#endif /* CONFIG_FAIL_FUTEX */
+
 static inline void futex_get_mm(union futex_key *key)
 {
 	atomic_inc(&key->private.mm->mm_count);
@@ -413,6 +474,9 @@ get_futex_key(u32 __user *uaddr, int fshared, union futex_key *key, int rw)
 	if (unlikely(!access_ok(rw, uaddr, sizeof(u32))))
 		return -EFAULT;
 
+	if (unlikely(should_fail_futex(fshared)))
+		return -EFAULT;
+
 	/*
 	 * PROCESS_PRIVATE futexes are fast.
 	 * As the mm cannot disappear under us and the 'key' only needs
@@ -428,6 +492,10 @@ get_futex_key(u32 __user *uaddr, int fshared, union futex_key *key, int rw)
 	}
 
 again:
+	/* Ignore any VERIFY_READ mapping (futex common case) */
+	if (unlikely(should_fail_futex(fshared)))
+		return -EFAULT;
+
 	err = get_user_pages_fast(address, 1, 1, &page);
 	/*
 	 * If write access is not required (eg. FUTEX_WAIT), try
@@ -516,7 +584,7 @@ again:
 		 * A RO anonymous page will never change and thus doesn't make
 		 * sense for futex operations.
 		 */
-		if (ro) {
+		if (unlikely(should_fail_futex(fshared)) || ro) {
 			err = -EFAULT;
 			goto out;
 		}
@@ -974,6 +1042,9 @@ static int lock_pi_update_atomic(u32 __user *uaddr, u32 uval, u32 newval)
 {
 	u32 uninitialized_var(curval);
 
+	if (unlikely(should_fail_futex(true)))
+		return -EFAULT;
+
 	if (unlikely(cmpxchg_futex_value_locked(&curval, uaddr, uval, newval)))
 		return -EFAULT;
 
@@ -1015,12 +1086,18 @@ static int futex_lock_pi_atomic(u32 __user *uaddr, struct futex_hash_bucket *hb,
 	if (get_futex_value_locked(&uval, uaddr))
 		return -EFAULT;
 
+	if (unlikely(should_fail_futex(true)))
+		return -EFAULT;
+
 	/*
 	 * Detect deadlocks.
 	 */
 	if ((unlikely((uval & FUTEX_TID_MASK) == vpid)))
 		return -EDEADLK;
 
+	if ((unlikely(should_fail_futex(true))))
+		return -EDEADLK;
+
 	/*
 	 * Lookup existing state first. If it exists, try to attach to
 	 * its pi_state.
@@ -1155,6 +1232,9 @@ static int wake_futex_pi(u32 __user *uaddr, u32 uval, struct futex_q *this,
 	 */
 	newval = FUTEX_WAITERS | task_pid_vnr(new_owner);
 
+	if (unlikely(should_fail_futex(true)))
+		ret = -EFAULT;
+
 	if (cmpxchg_futex_value_locked(&curval, uaddr, uval, newval))
 		ret = -EFAULT;
 	else if (curval != uval)
@@ -1457,6 +1537,9 @@ static int futex_proxy_trylock_atomic(u32 __user *pifutex,
 	if (get_futex_value_locked(&curval, pifutex))
 		return -EFAULT;
 
+	if (unlikely(should_fail_futex(true)))
+		return -EFAULT;
+
 	/*
 	 * Find the top_waiter and determine if there are additional waiters.
 	 * If the caller intends to requeue more than 1 waiter to pifutex,
@@ -2537,7 +2620,7 @@ int handle_early_requeue_pi_wakeup(struct futex_hash_bucket *hb,
  * futex_wait_requeue_pi() - Wait on uaddr and take uaddr2
  * @uaddr:	the futex we initially wait on (non-pi)
  * @flags:	futex flags (FLAGS_SHARED, FLAGS_CLOCKRT, etc.), they must be
- * 		the same type, no requeueing from private to shared, etc.
+ *		the same type, no requeueing from private to shared, etc.
  * @val:	the expected value of uaddr
  * @abs_time:	absolute timeout
  * @bitset:	32 bit wakeup bitset set by userspace, defaults to all
@@ -3012,6 +3095,8 @@ SYSCALL_DEFINE6(futex, u32 __user *, uaddr, int, op, u32, val,
 	if (utime && (cmd == FUTEX_WAIT || cmd == FUTEX_LOCK_PI ||
 		      cmd == FUTEX_WAIT_BITSET ||
 		      cmd == FUTEX_WAIT_REQUEUE_PI)) {
+		if (unlikely(should_fail_futex(!(op & FUTEX_PRIVATE_FLAG))))
+			return -EFAULT;
 		if (copy_from_user(&ts, utime, sizeof(ts)) != 0)
 			return -EFAULT;
 		if (!timespec_valid(&ts))
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index e2894b23efb6..22554d6f720f 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1542,6 +1542,13 @@ config FAIL_MMC_REQUEST
 	  and to test how the mmc host driver handles retries from
 	  the block device.
 
+config FAIL_FUTEX
+	bool "Fault-injection capability for futexes"
+	select DEBUG_FS
+	depends on FAULT_INJECTION && FUTEX
+	help
+	  Provide fault-injection capability for futexes.
+
 config FAULT_INJECTION_DEBUG_FS
 	bool "Debugfs entries for fault-injection capabilities"
 	depends on FAULT_INJECTION && SYSFS && DEBUG_FS
-- 
cgit v1.2.3


From 1ae25d626cfe7e11adc2c3e71d0de1f882954ef3 Mon Sep 17 00:00:00 2001
From: Josh Wu <josh.wu@atmel.com>
Date: Mon, 20 Jul 2015 17:32:05 +0800
Subject: power: reset: at91: add sama5d3 reset function

This patch introduces a new compatible string: "atmel,sama5d3-rstc" and
new reset function for sama5d3 and later chips.

As in sama5d3 or later chips, we don't have to shutdown the DDR
controller before reset. Shutdown the DDR controller before reset is a
workaround to avoid DDR signal driving the bus, but since sama5d3 and
later chips there is no such a conflict.

So in this patch:
   1. the sama5d3 reset function only need to write the rstc register
and return.
   2. we can remove the code related with sama5d3 DDR controller as
we don't use it at all.

Signed-off-by: Josh Wu <josh.wu@atmel.com>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Acked-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Sebastian Reichel <sre@kernel.org>
---
 .../devicetree/bindings/arm/atmel-at91.txt         |  2 +-
 drivers/power/reset/at91-reset.c                   | 26 ++++++++++++++++------
 2 files changed, 20 insertions(+), 8 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/arm/atmel-at91.txt b/Documentation/devicetree/bindings/arm/atmel-at91.txt
index 424ac8cbfa08..dd998b9c0433 100644
--- a/Documentation/devicetree/bindings/arm/atmel-at91.txt
+++ b/Documentation/devicetree/bindings/arm/atmel-at91.txt
@@ -87,7 +87,7 @@ One interrupt per TC channel in a TC block:
 
 RSTC Reset Controller required properties:
 - compatible: Should be "atmel,<chip>-rstc".
-  <chip> can be "at91sam9260" or "at91sam9g45"
+  <chip> can be "at91sam9260" or "at91sam9g45" or "sama5d3"
 - reg: Should contain registers location and length
 
 Example:
diff --git a/drivers/power/reset/at91-reset.c b/drivers/power/reset/at91-reset.c
index 36dc52fb2ec8..c378d4ec826f 100644
--- a/drivers/power/reset/at91-reset.c
+++ b/drivers/power/reset/at91-reset.c
@@ -123,6 +123,15 @@ static int at91sam9g45_restart(struct notifier_block *this, unsigned long mode,
 	return NOTIFY_DONE;
 }
 
+static int sama5d3_restart(struct notifier_block *this, unsigned long mode,
+			   void *cmd)
+{
+	writel(cpu_to_le32(AT91_RSTC_KEY | AT91_RSTC_PERRST | AT91_RSTC_PROCRST),
+	       at91_rstc_base);
+
+	return NOTIFY_DONE;
+}
+
 static void __init at91_reset_status(struct platform_device *pdev)
 {
 	u32 reg = readl(at91_rstc_base + AT91_RSTC_SR);
@@ -155,13 +164,13 @@ static void __init at91_reset_status(struct platform_device *pdev)
 static const struct of_device_id at91_ramc_of_match[] = {
 	{ .compatible = "atmel,at91sam9260-sdramc", },
 	{ .compatible = "atmel,at91sam9g45-ddramc", },
-	{ .compatible = "atmel,sama5d3-ddramc", },
 	{ /* sentinel */ }
 };
 
 static const struct of_device_id at91_reset_of_match[] = {
 	{ .compatible = "atmel,at91sam9260-rstc", .data = at91sam9260_restart },
 	{ .compatible = "atmel,at91sam9g45-rstc", .data = at91sam9g45_restart },
+	{ .compatible = "atmel,sama5d3-rstc", .data = sama5d3_restart },
 	{ /* sentinel */ }
 };
 
@@ -181,13 +190,16 @@ static int at91_reset_of_probe(struct platform_device *pdev)
 		return -ENODEV;
 	}
 
-	for_each_matching_node(np, at91_ramc_of_match) {
-		at91_ramc_base[idx] = of_iomap(np, 0);
-		if (!at91_ramc_base[idx]) {
-			dev_err(&pdev->dev, "Could not map ram controller address\n");
-			return -ENODEV;
+	if (!of_device_is_compatible(pdev->dev.of_node, "atmel,sama5d3-rstc")) {
+		/* we need to shutdown the ddr controller, so get ramc base */
+		for_each_matching_node(np, at91_ramc_of_match) {
+			at91_ramc_base[idx] = of_iomap(np, 0);
+			if (!at91_ramc_base[idx]) {
+				dev_err(&pdev->dev, "Could not map ram controller address\n");
+				return -ENODEV;
+			}
+			idx++;
 		}
-		idx++;
 	}
 
 	match = of_match_node(at91_reset_of_match, pdev->dev.of_node);
-- 
cgit v1.2.3


From 49bdb0440541ad1144ad08d5613f9a28bcd2a8dc Mon Sep 17 00:00:00 2001
From: zhengxing <zhengxing@rock-chips.com>
Date: Sun, 19 Jul 2015 19:33:48 +0800
Subject: ASoC: rockchip: Add machine driver for max98090 codec

The driver is used for rockchip board using a max98090.

Reviewed-by: Dylan Reid <dgreid@chromium.org>
Signed-off-by: zhengxing <zhengxing@rock-chips.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
---
 .../bindings/sound/rockchip-max98090.txt           |  19 ++
 sound/soc/rockchip/Kconfig                         |  10 +
 sound/soc/rockchip/Makefile                        |   4 +
 sound/soc/rockchip/rockchip_max98090.c             | 236 +++++++++++++++++++++
 4 files changed, 269 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/sound/rockchip-max98090.txt
 create mode 100644 sound/soc/rockchip/rockchip_max98090.c

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/sound/rockchip-max98090.txt b/Documentation/devicetree/bindings/sound/rockchip-max98090.txt
new file mode 100644
index 000000000000..a805aa99ad75
--- /dev/null
+++ b/Documentation/devicetree/bindings/sound/rockchip-max98090.txt
@@ -0,0 +1,19 @@
+ROCKCHIP with MAX98090 CODEC
+
+Required properties:
+- compatible: "rockchip,rockchip-audio-max98090"
+- rockchip,model: The user-visible name of this sound complex
+- rockchip,i2s-controller: The phandle of the Rockchip I2S controller that's
+  connected to the CODEC
+- rockchip,audio-codec: The phandle of the MAX98090 audio codec
+- rockchip,headset-codec: The phandle of Ext chip for jack detection
+
+Example:
+
+sound {
+	compatible = "rockchip,rockchip-audio-max98090";
+	rockchip,model = "ROCKCHIP-I2S";
+	rockchip,i2s-controller = <&i2s>;
+	rockchip,audio-codec = <&max98090>;
+	rockchip,headset-codec = <&headsetcodec>;
+};
diff --git a/sound/soc/rockchip/Kconfig b/sound/soc/rockchip/Kconfig
index e18182699d83..d12356617fc7 100644
--- a/sound/soc/rockchip/Kconfig
+++ b/sound/soc/rockchip/Kconfig
@@ -14,3 +14,13 @@ config SND_SOC_ROCKCHIP_I2S
 	  Say Y or M if you want to add support for I2S driver for
 	  Rockchip I2S device. The device supports upto maximum of
 	  8 channels each for play and record.
+
+config SND_SOC_ROCKCHIP_MAX98090
+	tristate "ASoC support for Rockchip boards using a MAX98090 codec"
+	depends on SND_SOC_ROCKCHIP && I2C && GPIOLIB
+	select SND_SOC_ROCKCHIP_I2S
+	select SND_SOC_MAX98090
+	select SND_SOC_TS3A227E
+	help
+	  Say Y or M here if you want to add support for SoC audio on Rockchip
+	  boards using the MAX98090 codec, such as Veyron.
diff --git a/sound/soc/rockchip/Makefile b/sound/soc/rockchip/Makefile
index b9219092b47f..df3445b53806 100644
--- a/sound/soc/rockchip/Makefile
+++ b/sound/soc/rockchip/Makefile
@@ -2,3 +2,7 @@
 snd-soc-i2s-objs := rockchip_i2s.o
 
 obj-$(CONFIG_SND_SOC_ROCKCHIP_I2S) += snd-soc-i2s.o
+
+snd-soc-rockchip-max98090-objs := rockchip_max98090.o
+
+obj-$(CONFIG_SND_SOC_ROCKCHIP_MAX98090) += snd-soc-rockchip-max98090.o
diff --git a/sound/soc/rockchip/rockchip_max98090.c b/sound/soc/rockchip/rockchip_max98090.c
new file mode 100644
index 000000000000..acace20d4127
--- /dev/null
+++ b/sound/soc/rockchip/rockchip_max98090.c
@@ -0,0 +1,236 @@
+/*
+ * Rockchip machine ASoC driver for boards using a MAX90809 CODEC.
+ *
+ * Copyright (c) 2014, ROCKCHIP CORPORATION.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+#include <linux/module.h>
+#include <linux/platform_device.h>
+#include <linux/slab.h>
+#include <linux/gpio.h>
+#include <linux/of_gpio.h>
+#include <sound/core.h>
+#include <sound/jack.h>
+#include <sound/pcm.h>
+#include <sound/pcm_params.h>
+#include <sound/soc.h>
+
+#include "rockchip_i2s.h"
+#include "../codecs/ts3a227e.h"
+
+#define DRV_NAME "rockchip-snd-max98090"
+
+static struct snd_soc_jack headset_jack;
+static struct snd_soc_jack_pin headset_jack_pins[] = {
+	{
+		.pin = "Headset Jack",
+		.mask = SND_JACK_HEADPHONE | SND_JACK_MICROPHONE |
+			SND_JACK_BTN_0 | SND_JACK_BTN_1 |
+			SND_JACK_BTN_2 | SND_JACK_BTN_3,
+	},
+};
+
+static const struct snd_soc_dapm_widget rk_dapm_widgets[] = {
+	SND_SOC_DAPM_HP("Headphone", NULL),
+	SND_SOC_DAPM_MIC("Headset Mic", NULL),
+	SND_SOC_DAPM_MIC("Int Mic", NULL),
+	SND_SOC_DAPM_SPK("Speaker", NULL),
+};
+
+static const struct snd_soc_dapm_route rk_audio_map[] = {
+	{"IN34", NULL, "Headset Mic"},
+	{"IN34", NULL, "MICBIAS"},
+	{"MICBIAS", NULL, "Headset Mic"},
+	{"DMICL", NULL, "Int Mic"},
+	{"Headphone", NULL, "HPL"},
+	{"Headphone", NULL, "HPR"},
+	{"Speaker", NULL, "SPKL"},
+	{"Speaker", NULL, "SPKR"},
+};
+
+static const struct snd_kcontrol_new rk_mc_controls[] = {
+	SOC_DAPM_PIN_SWITCH("Headphone"),
+	SOC_DAPM_PIN_SWITCH("Headset Mic"),
+	SOC_DAPM_PIN_SWITCH("Int Mic"),
+	SOC_DAPM_PIN_SWITCH("Speaker"),
+};
+
+static int rk_aif1_hw_params(struct snd_pcm_substream *substream,
+			     struct snd_pcm_hw_params *params)
+{
+	int ret = 0;
+	struct snd_soc_pcm_runtime *rtd = substream->private_data;
+	struct snd_soc_dai *cpu_dai = rtd->cpu_dai;
+	struct snd_soc_dai *codec_dai = rtd->codec_dai;
+	int mclk;
+
+	switch (params_rate(params)) {
+	case 8000:
+	case 16000:
+	case 48000:
+	case 96000:
+		mclk = 12288000;
+		break;
+	case 44100:
+		mclk = 11289600;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	ret = snd_soc_dai_set_sysclk(cpu_dai, 0, mclk,
+				     SND_SOC_CLOCK_OUT);
+	if (ret < 0) {
+		dev_err(codec_dai->dev, "Can't set codec clock %d\n", ret);
+		return ret;
+	}
+
+	ret = snd_soc_dai_set_sysclk(codec_dai, 0, mclk,
+				     SND_SOC_CLOCK_IN);
+	if (ret < 0) {
+		dev_err(codec_dai->dev, "Can't set codec clock %d\n", ret);
+		return ret;
+	}
+
+	return ret;
+}
+
+static int rk_init(struct snd_soc_pcm_runtime *runtime)
+{
+	/* Enable Headset and 4 Buttons Jack detection */
+	return snd_soc_card_jack_new(runtime->card, "Headset Jack",
+			       SND_JACK_HEADSET |
+			       SND_JACK_BTN_0 | SND_JACK_BTN_1 |
+			       SND_JACK_BTN_2 | SND_JACK_BTN_3,
+			       &headset_jack,
+			       headset_jack_pins,
+			       ARRAY_SIZE(headset_jack_pins));
+}
+
+static int rk_98090_headset_init(struct snd_soc_component *component)
+{
+	return ts3a227e_enable_jack_detect(component, &headset_jack);
+}
+
+static struct snd_soc_ops rk_aif1_ops = {
+	.hw_params = rk_aif1_hw_params,
+};
+
+static struct snd_soc_aux_dev rk_98090_headset_dev = {
+	.name = "Headset Chip",
+	.init = rk_98090_headset_init,
+};
+
+static struct snd_soc_dai_link rk_dailink = {
+	.name = "max98090",
+	.stream_name = "Audio",
+	.codec_dai_name = "HiFi",
+	.init = rk_init,
+	.ops = &rk_aif1_ops,
+	/* set max98090 as slave */
+	.dai_fmt = SND_SOC_DAIFMT_I2S | SND_SOC_DAIFMT_NB_NF |
+		SND_SOC_DAIFMT_CBS_CFS,
+};
+
+static struct snd_soc_card snd_soc_card_rk = {
+	.name = "ROCKCHIP-I2S",
+	.dai_link = &rk_dailink,
+	.num_links = 1,
+	.aux_dev = &rk_98090_headset_dev,
+	.num_aux_devs = 1,
+	.dapm_widgets = rk_dapm_widgets,
+	.num_dapm_widgets = ARRAY_SIZE(rk_dapm_widgets),
+	.dapm_routes = rk_audio_map,
+	.num_dapm_routes = ARRAY_SIZE(rk_audio_map),
+	.controls = rk_mc_controls,
+	.num_controls = ARRAY_SIZE(rk_mc_controls),
+};
+
+static int snd_rk_mc_probe(struct platform_device *pdev)
+{
+	int ret = 0;
+	struct snd_soc_card *card = &snd_soc_card_rk;
+	struct device_node *np = pdev->dev.of_node;
+
+	/* register the soc card */
+	card->dev = &pdev->dev;
+
+	rk_dailink.codec_of_node = of_parse_phandle(np,
+			"rockchip,audio-codec", 0);
+	if (!rk_dailink.codec_of_node) {
+		dev_err(&pdev->dev,
+			"Property 'rockchip,audio-codec' missing or invalid\n");
+		return -EINVAL;
+	}
+
+	rk_dailink.cpu_of_node = of_parse_phandle(np,
+			"rockchip,i2s-controller", 0);
+	if (!rk_dailink.cpu_of_node) {
+		dev_err(&pdev->dev,
+			"Property 'rockchip,i2s-controller' missing or invalid\n");
+		return -EINVAL;
+	}
+
+	rk_dailink.platform_of_node = rk_dailink.cpu_of_node;
+
+	rk_98090_headset_dev.codec_of_node = of_parse_phandle(np,
+			"rockchip,headset-codec", 0);
+	if (!rk_98090_headset_dev.codec_of_node) {
+		dev_err(&pdev->dev,
+			"Property 'rockchip,headset-codec' missing/invalid\n");
+		return -EINVAL;
+	}
+
+	ret = snd_soc_of_parse_card_name(card, "rockchip,model");
+	if (ret) {
+		dev_err(&pdev->dev,
+			"Soc parse card name failed %d\n", ret);
+		return ret;
+	}
+
+	ret = devm_snd_soc_register_card(&pdev->dev, card);
+	if (ret) {
+		dev_err(&pdev->dev,
+			"Soc register card failed %d\n", ret);
+		return ret;
+	}
+
+	return ret;
+}
+
+static const struct of_device_id rockchip_max98090_of_match[] = {
+	{ .compatible = "rockchip,rockchip-audio-max98090", },
+	{},
+};
+
+MODULE_DEVICE_TABLE(of, rockchip_max98090_of_match);
+
+static struct platform_driver snd_rk_mc_driver = {
+	.probe = snd_rk_mc_probe,
+	.driver = {
+		.name = DRV_NAME,
+		.owner = THIS_MODULE,
+		.pm = &snd_soc_pm_ops,
+		.of_match_table = rockchip_max98090_of_match,
+	},
+};
+
+module_platform_driver(snd_rk_mc_driver);
+
+MODULE_AUTHOR("jianqun <jay.xu@rock-chips.com>");
+MODULE_DESCRIPTION("Rockchip max98090 machine ASoC driver");
+MODULE_LICENSE("GPL v2");
+MODULE_ALIAS("platform:" DRV_NAME);
-- 
cgit v1.2.3


From 86059653ea7ca7b30ed25d6bec5807ba59a4f2e6 Mon Sep 17 00:00:00 2001
From: zhengxing <zhengxing@rock-chips.com>
Date: Sun, 19 Jul 2015 19:33:49 +0800
Subject: ASoC: rockchip: Add machine driver for rt5645/rt5650 codec

The driver is used for rockchip board using a rt5645/rt5650.

Reviewed-by: Dylan Reid <dgreid@chromium.org>
Signed-off-by: zhengxing <zhengxing@rock-chips.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
---
 .../devicetree/bindings/sound/rockchip-rt5645.txt  |  17 ++
 sound/soc/rockchip/Kconfig                         |   9 +
 sound/soc/rockchip/Makefile                        |   2 +
 sound/soc/rockchip/rockchip_rt5645.c               | 225 +++++++++++++++++++++
 4 files changed, 253 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/sound/rockchip-rt5645.txt
 create mode 100644 sound/soc/rockchip/rockchip_rt5645.c

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/sound/rockchip-rt5645.txt b/Documentation/devicetree/bindings/sound/rockchip-rt5645.txt
new file mode 100644
index 000000000000..411a62b3ff41
--- /dev/null
+++ b/Documentation/devicetree/bindings/sound/rockchip-rt5645.txt
@@ -0,0 +1,17 @@
+ROCKCHIP with RT5645/RT5650 CODECS
+
+Required properties:
+- compatible: "rockchip,rockchip-audio-rt5645"
+- rockchip,model: The user-visible name of this sound complex
+- rockchip,i2s-controller: The phandle of the Rockchip I2S controller that's
+  connected to the CODEC
+- rockchip,audio-codec: The phandle of the RT5645/RT5650 audio codec
+
+Example:
+
+sound {
+	compatible = "rockchip,rockchip-audio-rt5645";
+	rockchip,model = "ROCKCHIP-I2S";
+	rockchip,i2s-controller = <&i2s>;
+	rockchip,audio-codec = <&rt5645>;
+};
diff --git a/sound/soc/rockchip/Kconfig b/sound/soc/rockchip/Kconfig
index d12356617fc7..58bae8e2cf5f 100644
--- a/sound/soc/rockchip/Kconfig
+++ b/sound/soc/rockchip/Kconfig
@@ -24,3 +24,12 @@ config SND_SOC_ROCKCHIP_MAX98090
 	help
 	  Say Y or M here if you want to add support for SoC audio on Rockchip
 	  boards using the MAX98090 codec, such as Veyron.
+
+config SND_SOC_ROCKCHIP_RT5645
+	tristate "ASoC support for Rockchip boards using a RT5645/RT5650 codec"
+	depends on SND_SOC_ROCKCHIP && I2C && GPIOLIB
+	select SND_SOC_ROCKCHIP_I2S
+	select SND_SOC_RT5645
+	help
+	  Say Y or M here if you want to add support for SoC audio on Rockchip
+	  boards using the RT5645/RT5650 codec, such as Veyron.
diff --git a/sound/soc/rockchip/Makefile b/sound/soc/rockchip/Makefile
index df3445b53806..1bc1dc3c729a 100644
--- a/sound/soc/rockchip/Makefile
+++ b/sound/soc/rockchip/Makefile
@@ -4,5 +4,7 @@ snd-soc-i2s-objs := rockchip_i2s.o
 obj-$(CONFIG_SND_SOC_ROCKCHIP_I2S) += snd-soc-i2s.o
 
 snd-soc-rockchip-max98090-objs := rockchip_max98090.o
+snd-soc-rockchip-rt5645-objs := rockchip_rt5645.o
 
 obj-$(CONFIG_SND_SOC_ROCKCHIP_MAX98090) += snd-soc-rockchip-max98090.o
+obj-$(CONFIG_SND_SOC_ROCKCHIP_RT5645) += snd-soc-rockchip-rt5645.o
diff --git a/sound/soc/rockchip/rockchip_rt5645.c b/sound/soc/rockchip/rockchip_rt5645.c
new file mode 100644
index 000000000000..3c6bb1ea06ec
--- /dev/null
+++ b/sound/soc/rockchip/rockchip_rt5645.c
@@ -0,0 +1,225 @@
+/*
+ * Rockchip machine ASoC driver for boards using a RT5645/RT5650 CODEC.
+ *
+ * Copyright (c) 2015, ROCKCHIP CORPORATION.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+#include <linux/module.h>
+#include <linux/platform_device.h>
+#include <linux/slab.h>
+#include <linux/gpio.h>
+#include <linux/of_gpio.h>
+#include <linux/delay.h>
+#include <sound/core.h>
+#include <sound/jack.h>
+#include <sound/pcm.h>
+#include <sound/pcm_params.h>
+#include <sound/soc.h>
+#include "rockchip_i2s.h"
+
+#define DRV_NAME "rockchip-snd-rt5645"
+
+static struct snd_soc_jack headset_jack;
+
+/* Jack detect via rt5645 driver. */
+extern int rt5645_set_jack_detect(struct snd_soc_codec *codec,
+	struct snd_soc_jack *hp_jack, struct snd_soc_jack *mic_jack,
+	struct snd_soc_jack *btn_jack);
+
+static const struct snd_soc_dapm_widget rk_dapm_widgets[] = {
+	SND_SOC_DAPM_HP("Headphones", NULL),
+	SND_SOC_DAPM_SPK("Speakers", NULL),
+	SND_SOC_DAPM_MIC("Headset Mic", NULL),
+	SND_SOC_DAPM_MIC("Int Mic", NULL),
+};
+
+static const struct snd_soc_dapm_route rk_audio_map[] = {
+	/* Input Lines */
+	{"DMIC L2", NULL, "Int Mic"},
+	{"DMIC R2", NULL, "Int Mic"},
+	{"RECMIXL", NULL, "Headset Mic"},
+	{"RECMIXR", NULL, "Headset Mic"},
+
+	/* Output Lines */
+	{"Headphones", NULL, "HPOR"},
+	{"Headphones", NULL, "HPOL"},
+	{"Speakers", NULL, "SPOL"},
+	{"Speakers", NULL, "SPOR"},
+};
+
+static const struct snd_kcontrol_new rk_mc_controls[] = {
+	SOC_DAPM_PIN_SWITCH("Headphones"),
+	SOC_DAPM_PIN_SWITCH("Speakers"),
+	SOC_DAPM_PIN_SWITCH("Headset Mic"),
+	SOC_DAPM_PIN_SWITCH("Int Mic"),
+};
+
+static int rk_aif1_hw_params(struct snd_pcm_substream *substream,
+			     struct snd_pcm_hw_params *params)
+{
+	int ret = 0;
+	struct snd_soc_pcm_runtime *rtd = substream->private_data;
+	struct snd_soc_dai *cpu_dai = rtd->cpu_dai;
+	struct snd_soc_dai *codec_dai = rtd->codec_dai;
+	int mclk;
+
+	switch (params_rate(params)) {
+	case 8000:
+	case 16000:
+	case 48000:
+	case 96000:
+		mclk = 12288000;
+		break;
+	case 44100:
+		mclk = 11289600;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	ret = snd_soc_dai_set_sysclk(cpu_dai, 0, mclk,
+				     SND_SOC_CLOCK_OUT);
+	if (ret < 0) {
+		dev_err(codec_dai->dev, "Can't set codec clock %d\n", ret);
+		return ret;
+	}
+
+	ret = snd_soc_dai_set_sysclk(codec_dai, 0, mclk,
+				     SND_SOC_CLOCK_IN);
+	if (ret < 0) {
+		dev_err(codec_dai->dev, "Can't set codec clock %d\n", ret);
+		return ret;
+	}
+
+	return ret;
+}
+
+static int rk_init(struct snd_soc_pcm_runtime *runtime)
+{
+	struct snd_soc_card *card = runtime->card;
+	int ret;
+
+	/* Enable Headset and 4 Buttons Jack detection */
+	ret = snd_soc_card_jack_new(card, "Headset Jack",
+				    SND_JACK_HEADPHONE | SND_JACK_MICROPHONE |
+				    SND_JACK_BTN_0 | SND_JACK_BTN_1 |
+				    SND_JACK_BTN_2 | SND_JACK_BTN_3,
+				    &headset_jack, NULL, 0);
+	if (!ret) {
+		dev_err(card->dev, "New Headset Jack failed! (%d)\n", ret);
+		return ret;
+	}
+
+	return rt5645_set_jack_detect(runtime->codec,
+				     &headset_jack,
+				     &headset_jack,
+				     &headset_jack);
+}
+
+static struct snd_soc_ops rk_aif1_ops = {
+	.hw_params = rk_aif1_hw_params,
+};
+
+static struct snd_soc_dai_link rk_dailink = {
+	.name = "rt5645",
+	.stream_name = "rt5645 PCM",
+	.codec_dai_name = "rt5645-aif1",
+	.init = rk_init,
+	.ops = &rk_aif1_ops,
+	/* set rt5645 as slave */
+	.dai_fmt = SND_SOC_DAIFMT_I2S | SND_SOC_DAIFMT_NB_NF |
+		SND_SOC_DAIFMT_CBS_CFS,
+};
+
+static struct snd_soc_card snd_soc_card_rk = {
+	.name = "I2S-RT5650",
+	.dai_link = &rk_dailink,
+	.num_links = 1,
+	.dapm_widgets = rk_dapm_widgets,
+	.num_dapm_widgets = ARRAY_SIZE(rk_dapm_widgets),
+	.dapm_routes = rk_audio_map,
+	.num_dapm_routes = ARRAY_SIZE(rk_audio_map),
+	.controls = rk_mc_controls,
+	.num_controls = ARRAY_SIZE(rk_mc_controls),
+};
+
+static int snd_rk_mc_probe(struct platform_device *pdev)
+{
+	int ret = 0;
+	struct snd_soc_card *card = &snd_soc_card_rk;
+	struct device_node *np = pdev->dev.of_node;
+
+	/* register the soc card */
+	card->dev = &pdev->dev;
+
+	rk_dailink.codec_of_node = of_parse_phandle(np,
+			"rockchip,audio-codec", 0);
+	if (!rk_dailink.codec_of_node) {
+		dev_err(&pdev->dev,
+			"Property 'rockchip,audio-codec' missing or invalid\n");
+		return -EINVAL;
+	}
+
+	rk_dailink.cpu_of_node = of_parse_phandle(np,
+			"rockchip,i2s-controller", 0);
+	if (!rk_dailink.cpu_of_node) {
+		dev_err(&pdev->dev,
+			"Property 'rockchip,i2s-controller' missing or invalid\n");
+		return -EINVAL;
+	}
+
+	rk_dailink.platform_of_node = rk_dailink.cpu_of_node;
+
+	ret = snd_soc_of_parse_card_name(card, "rockchip,model");
+	if (ret) {
+		dev_err(&pdev->dev,
+			"Soc parse card name failed %d\n", ret);
+		return ret;
+	}
+
+	ret = devm_snd_soc_register_card(&pdev->dev, card);
+	if (ret) {
+		dev_err(&pdev->dev,
+			"Soc register card failed %d\n", ret);
+		return ret;
+	}
+
+	return ret;
+}
+
+static const struct of_device_id rockchip_rt5645_of_match[] = {
+	{ .compatible = "rockchip,rockchip-audio-rt5645", },
+	{},
+};
+
+MODULE_DEVICE_TABLE(of, rockchip_rt5645_of_match);
+
+static struct platform_driver snd_rk_mc_driver = {
+	.probe = snd_rk_mc_probe,
+	.driver = {
+		.name = DRV_NAME,
+		.owner = THIS_MODULE,
+		.pm = &snd_soc_pm_ops,
+		.of_match_table = rockchip_rt5645_of_match,
+	},
+};
+
+module_platform_driver(snd_rk_mc_driver);
+
+MODULE_AUTHOR("Xing Zheng <zhengxing@rock-chips.com>");
+MODULE_DESCRIPTION("Rockchip rt5645 machine ASoC driver");
+MODULE_LICENSE("GPL v2");
+MODULE_ALIAS("platform:" DRV_NAME);
-- 
cgit v1.2.3


From 0ec5f8bd9920fc1a7d0b0d2d4876ea80665a70fd Mon Sep 17 00:00:00 2001
From: Chris Zhong <zyw@rock-chips.com>
Date: Mon, 20 Jul 2015 17:23:52 +0800
Subject: regulator: rk808: add the description about dvs gpio for rk808

add the description about dvs1, dvs2, and add the example.

Signed-off-by: Chris Zhong <zyw@rock-chips.com>
Reviewed-by: Doug Anderson <dianders@chromium.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
---
 Documentation/devicetree/bindings/mfd/rk808.txt | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/mfd/rk808.txt b/Documentation/devicetree/bindings/mfd/rk808.txt
index 9e6e2592e5c8..4ca6aab4273a 100644
--- a/Documentation/devicetree/bindings/mfd/rk808.txt
+++ b/Documentation/devicetree/bindings/mfd/rk808.txt
@@ -24,6 +24,10 @@ Optional properties:
 - vcc10-supply: The input supply for LDO_REG6
 - vcc11-supply: The input supply for LDO_REG8
 - vcc12-supply: The input supply for SWITCH_REG2
+- dvs-gpios:  buck1/2 can be controlled by gpio dvs, this is GPIO specifiers
+  for 2 host gpio's used for dvs. The format of the gpio specifier depends in
+  the gpio controller. If DVS GPIOs aren't present, voltage changes will happen
+  very quickly with no slow ramp time.
 
 Regulators: All the regulators of RK808 to be instantiated shall be
 listed in a child node named 'regulators'. Each regulator is represented
@@ -55,7 +59,9 @@ Example:
 		interrupt-parent = <&gpio0>;
 		interrupts = <4 IRQ_TYPE_LEVEL_LOW>;
 		pinctrl-names = "default";
-		pinctrl-0 = <&pmic_int>;
+		pinctrl-0 = <&pmic_int &dvs_1 &dvs_2>;
+		dvs-gpios = <&gpio7 11 GPIO_ACTIVE_HIGH>,
+			    <&gpio7 15 GPIO_ACTIVE_HIGH>;
 		reg = <0x1b>;
 		rockchip,system-power-controller;
 		wakeup-source;
-- 
cgit v1.2.3


From f686a36b4b79782a94f07769fb1c0187d24ea8a8 Mon Sep 17 00:00:00 2001
From: Andrea Galbusera <gizero@gmail.com>
Date: Tue, 14 Jul 2015 15:36:21 +0200
Subject: iio: adc: mcp320x: Add support for mcp3301

This adds support for Microchip's 13 bit 1 channel AD converter MCP3301

Signed-off-by: Andrea Galbusera <gizero@gmail.com>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
---
 Documentation/devicetree/bindings/iio/adc/mcp320x.txt |  1 +
 drivers/iio/adc/Kconfig                               |  4 ++--
 drivers/iio/adc/mcp320x.c                             | 16 +++++++++++++++-
 3 files changed, 18 insertions(+), 3 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/iio/adc/mcp320x.txt b/Documentation/devicetree/bindings/iio/adc/mcp320x.txt
index b85184391b78..2a1f3af30155 100644
--- a/Documentation/devicetree/bindings/iio/adc/mcp320x.txt
+++ b/Documentation/devicetree/bindings/iio/adc/mcp320x.txt
@@ -18,6 +18,7 @@ Required properties:
 				"mcp3202"
 				"mcp3204"
 				"mcp3208"
+				"mcp3301"
 
 
 Examples:
diff --git a/drivers/iio/adc/Kconfig b/drivers/iio/adc/Kconfig
index dace593b158b..2714f043fa6c 100644
--- a/drivers/iio/adc/Kconfig
+++ b/drivers/iio/adc/Kconfig
@@ -229,8 +229,8 @@ config MCP320X
 	depends on SPI
 	help
 	  Say yes here to build support for Microchip Technology's
-	  MCP3001, MCP3002, MCP3004, MCP3008, MCP3201, MCP3202, MCP3204 or
-	  MCP3208 analog to digital converter.
+	  MCP3001, MCP3002, MCP3004, MCP3008, MCP3201, MCP3202, MCP3204,
+	  MCP3208 or MCP3301 analog to digital converter.
 
 	  This driver can also be built as a module. If so, the module will be
 	  called mcp320x.
diff --git a/drivers/iio/adc/mcp320x.c b/drivers/iio/adc/mcp320x.c
index 8d9c9b9215dd..72969c8a0a51 100644
--- a/drivers/iio/adc/mcp320x.c
+++ b/drivers/iio/adc/mcp320x.c
@@ -25,6 +25,7 @@
  * http://ww1.microchip.com/downloads/en/DeviceDoc/21290D.pdf  mcp3201
  * http://ww1.microchip.com/downloads/en/DeviceDoc/21034D.pdf  mcp3202
  * http://ww1.microchip.com/downloads/en/DeviceDoc/21298c.pdf  mcp3204/08
+ * http://ww1.microchip.com/downloads/en/DeviceDoc/21700E.pdf  mcp3301
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License version 2 as
@@ -47,6 +48,7 @@ enum {
 	mcp3202,
 	mcp3204,
 	mcp3208,
+	mcp3301,
 };
 
 struct mcp320x_chip_info {
@@ -76,6 +78,7 @@ static int mcp320x_channel_to_tx_data(int device_index,
 	switch (device_index) {
 	case mcp3001:
 	case mcp3201:
+	case mcp3301:
 		return 0;
 	case mcp3002:
 	case mcp3202:
@@ -102,7 +105,7 @@ static int mcp320x_adc_conversion(struct mcp320x *adc, u8 channel,
 	adc->tx_buf = mcp320x_channel_to_tx_data(device_index,
 						channel, differential);
 
-	if (device_index != mcp3001 && device_index != mcp3201) {
+	if (device_index != mcp3001 && device_index != mcp3201 && device_index != mcp3301) {
 		ret = spi_sync(adc->spi, &adc->msg);
 		if (ret < 0)
 			return ret;
@@ -125,6 +128,8 @@ static int mcp320x_adc_conversion(struct mcp320x *adc, u8 channel,
 	case mcp3204:
 	case mcp3208:
 		return (adc->rx_buf[0] << 4 | adc->rx_buf[1] >> 4);
+	case mcp3301:
+		return sign_extend32((adc->rx_buf[0] & 0x1f) << 8 | adc->rx_buf[1], 12);
 	default:
 		return -EINVAL;
 	}
@@ -274,6 +279,11 @@ static const struct mcp320x_chip_info mcp320x_chip_infos[] = {
 		.num_channels = ARRAY_SIZE(mcp3208_channels),
 		.resolution = 12
 	},
+	[mcp3301] = {
+		.channels = mcp3201_channels,
+		.num_channels = ARRAY_SIZE(mcp3201_channels),
+		.resolution = 13
+	},
 };
 
 static int mcp320x_probe(struct spi_device *spi)
@@ -366,6 +376,9 @@ static const struct of_device_id mcp320x_dt_ids[] = {
 	}, {
 		.compatible = "mcp3208",
 		.data = &mcp320x_chip_infos[mcp3208],
+	}, {
+		.compatible = "mcp3301",
+		.data = &mcp320x_chip_infos[mcp3301],
 	}, {
 	}
 };
@@ -381,6 +394,7 @@ static const struct spi_device_id mcp320x_id[] = {
 	{ "mcp3202", mcp3202 },
 	{ "mcp3204", mcp3204 },
 	{ "mcp3208", mcp3208 },
+	{ "mcp3301", mcp3301 },
 	{ }
 };
 MODULE_DEVICE_TABLE(spi, mcp320x_id);
-- 
cgit v1.2.3


From 5e9972cd6f1a69ee8a8ee8664d3e4c4ea59c0901 Mon Sep 17 00:00:00 2001
From: Sanchayan Maity <maitysanchayan@gmail.com>
Date: Tue, 14 Jul 2015 19:23:22 +0530
Subject: iio: adc: vf610: Determine sampling frequencies by using minimum
 sample time

The driver currently does not take into account the minimum sample time
as per the Figure 6-8 Chapter 9.1.1 12-bit ADC electrical characteristics.
We set a static amount of cycles instead of considering the sample time
as a given value, which depends on hardware characteristics.

Determine sampling frequencies by first reading the device tree property
node and then calculating the required Long Sample Time Adder (LSTAdder)
value, based on the ADC clock frequency and sample time value obtained
from the device tree. This LSTAdder value is then used for calculating
the sampling frequencies possible.

In case the sample time property is not specified through the device
tree, a safe default value of 1000ns is assumed.

Signed-off-by: Sanchayan Maity <maitysanchayan@gmail.com>
Acked-by: Stefan Agner <stefan@agner.ch>
Acked-by: Fugang Duan <B38611@freescale.com>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
---
 .../devicetree/bindings/iio/adc/vf610-adc.txt      |  5 ++
 drivers/iio/adc/vf610_adc.c                        | 79 ++++++++++++++++++++--
 2 files changed, 80 insertions(+), 4 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/iio/adc/vf610-adc.txt b/Documentation/devicetree/bindings/iio/adc/vf610-adc.txt
index 3eb40e20c143..1aad0514e647 100644
--- a/Documentation/devicetree/bindings/iio/adc/vf610-adc.txt
+++ b/Documentation/devicetree/bindings/iio/adc/vf610-adc.txt
@@ -17,6 +17,11 @@ Recommended properties:
   - Frequency in normal mode (ADLPC=0, ADHSC=0)
   - Frequency in high-speed mode (ADLPC=0, ADHSC=1)
   - Frequency in low-power mode (ADLPC=1, ADHSC=0)
+- min-sample-time: Minimum sampling time in nanoseconds. This value has
+  to be chosen according to the conversion mode and the connected analog
+  source resistance (R_as) and capacitance (C_as). Refer the datasheet's
+  operating requirements. A safe default across a wide range of R_as and
+  C_as as well as conversion modes is 1000ns.
 
 Example:
 adc0: adc@4003b000 {
diff --git a/drivers/iio/adc/vf610_adc.c b/drivers/iio/adc/vf610_adc.c
index 480f335a0f9f..23b8fb94ec09 100644
--- a/drivers/iio/adc/vf610_adc.c
+++ b/drivers/iio/adc/vf610_adc.c
@@ -68,6 +68,9 @@
 #define VF610_ADC_CLK_DIV8		0x60
 #define VF610_ADC_CLK_MASK		0x60
 #define VF610_ADC_ADLSMP_LONG		0x10
+#define VF610_ADC_ADSTS_SHORT   0x100
+#define VF610_ADC_ADSTS_NORMAL  0x200
+#define VF610_ADC_ADSTS_LONG    0x300
 #define VF610_ADC_ADSTS_MASK		0x300
 #define VF610_ADC_ADLPC_EN		0x80
 #define VF610_ADC_ADHSC_EN		0x400
@@ -98,6 +101,8 @@
 #define VF610_ADC_CALF			0x2
 #define VF610_ADC_TIMEOUT		msecs_to_jiffies(100)
 
+#define DEFAULT_SAMPLE_TIME		1000
+
 enum clk_sel {
 	VF610_ADCIOC_BUSCLK_SET,
 	VF610_ADCIOC_ALTCLK_SET,
@@ -124,6 +129,17 @@ enum conversion_mode_sel {
 	VF610_ADC_CONV_LOW_POWER,
 };
 
+enum lst_adder_sel {
+	VF610_ADCK_CYCLES_3,
+	VF610_ADCK_CYCLES_5,
+	VF610_ADCK_CYCLES_7,
+	VF610_ADCK_CYCLES_9,
+	VF610_ADCK_CYCLES_13,
+	VF610_ADCK_CYCLES_17,
+	VF610_ADCK_CYCLES_21,
+	VF610_ADCK_CYCLES_25,
+};
+
 struct vf610_adc_feature {
 	enum clk_sel	clk_sel;
 	enum vol_ref	vol_ref;
@@ -132,6 +148,8 @@ struct vf610_adc_feature {
 	int	clk_div;
 	int     sample_rate;
 	int	res_mode;
+	u32 lst_adder_index;
+	u32 default_sample_time;
 
 	bool	calibration;
 	bool	ovwren;
@@ -155,11 +173,13 @@ struct vf610_adc {
 };
 
 static const u32 vf610_hw_avgs[] = { 1, 4, 8, 16, 32 };
+static const u32 vf610_lst_adder[] = { 3, 5, 7, 9, 13, 17, 21, 25 };
 
 static inline void vf610_adc_calculate_rates(struct vf610_adc *info)
 {
 	struct vf610_adc_feature *adc_feature = &info->adc_feature;
 	unsigned long adck_rate, ipg_rate = clk_get_rate(info->clk);
+	u32 adck_period, lst_addr_min;
 	int divisor, i;
 
 	adck_rate = info->max_adck_rate[adc_feature->conv_mode];
@@ -173,6 +193,19 @@ static inline void vf610_adc_calculate_rates(struct vf610_adc *info)
 		adc_feature->clk_div = 8;
 	}
 
+	/*
+	 * Determine the long sample time adder value to be used based
+	 * on the default minimum sample time provided.
+	 */
+	adck_period = NSEC_PER_SEC / adck_rate;
+	lst_addr_min = adc_feature->default_sample_time / adck_period;
+	for (i = 0; i < ARRAY_SIZE(vf610_lst_adder); i++) {
+		if (vf610_lst_adder[i] > lst_addr_min) {
+			adc_feature->lst_adder_index = i;
+			break;
+		}
+	}
+
 	/*
 	 * Calculate ADC sample frequencies
 	 * Sample time unit is ADCK cycles. ADCK clk source is ipg clock,
@@ -182,12 +215,13 @@ static inline void vf610_adc_calculate_rates(struct vf610_adc *info)
 	 * SFCAdder: fixed to 6 ADCK cycles
 	 * AverageNum: 1, 4, 8, 16, 32 samples for hardware average.
 	 * BCT (Base Conversion Time): fixed to 25 ADCK cycles for 12 bit mode
-	 * LSTAdder(Long Sample Time): fixed to 3 ADCK cycles
+	 * LSTAdder(Long Sample Time): 3, 5, 7, 9, 13, 17, 21, 25 ADCK cycles
 	 */
 	adck_rate = ipg_rate / info->adc_feature.clk_div;
 	for (i = 0; i < ARRAY_SIZE(vf610_hw_avgs); i++)
 		info->sample_freq_avail[i] =
-			adck_rate / (6 + vf610_hw_avgs[i] * (25 + 3));
+			adck_rate / (6 + vf610_hw_avgs[i] *
+			 (25 + vf610_lst_adder[adc_feature->lst_adder_index]));
 }
 
 static inline void vf610_adc_cfg_init(struct vf610_adc *info)
@@ -347,8 +381,40 @@ static void vf610_adc_sample_set(struct vf610_adc *info)
 		break;
 	}
 
-	/* Use the short sample mode */
-	cfg_data &= ~(VF610_ADC_ADLSMP_LONG | VF610_ADC_ADSTS_MASK);
+	/*
+	 * Set ADLSMP and ADSTS based on the Long Sample Time Adder value
+	 * determined.
+	 */
+	switch (adc_feature->lst_adder_index) {
+	case VF610_ADCK_CYCLES_3:
+		break;
+	case VF610_ADCK_CYCLES_5:
+		cfg_data |= VF610_ADC_ADSTS_SHORT;
+		break;
+	case VF610_ADCK_CYCLES_7:
+		cfg_data |= VF610_ADC_ADSTS_NORMAL;
+		break;
+	case VF610_ADCK_CYCLES_9:
+		cfg_data |= VF610_ADC_ADSTS_LONG;
+		break;
+	case VF610_ADCK_CYCLES_13:
+		cfg_data |= VF610_ADC_ADLSMP_LONG;
+		break;
+	case VF610_ADCK_CYCLES_17:
+		cfg_data |= VF610_ADC_ADLSMP_LONG;
+		cfg_data |= VF610_ADC_ADSTS_SHORT;
+		break;
+	case VF610_ADCK_CYCLES_21:
+		cfg_data |= VF610_ADC_ADLSMP_LONG;
+		cfg_data |= VF610_ADC_ADSTS_NORMAL;
+		break;
+	case VF610_ADCK_CYCLES_25:
+		cfg_data |= VF610_ADC_ADLSMP_LONG;
+		cfg_data |= VF610_ADC_ADSTS_NORMAL;
+		break;
+	default:
+		dev_err(info->dev, "error in sample time select\n");
+	}
 
 	/* update hardware average selection */
 	cfg_data &= ~VF610_ADC_AVGS_MASK;
@@ -713,6 +779,11 @@ static int vf610_adc_probe(struct platform_device *pdev)
 	of_property_read_u32_array(pdev->dev.of_node, "fsl,adck-max-frequency",
 			info->max_adck_rate, 3);
 
+	ret = of_property_read_u32(pdev->dev.of_node, "min-sample-time",
+			&info->adc_feature.default_sample_time);
+	if (ret)
+		info->adc_feature.default_sample_time = DEFAULT_SAMPLE_TIME;
+
 	platform_set_drvdata(pdev, indio_dev);
 
 	init_completion(&info->completion);
-- 
cgit v1.2.3


From 28e51af7c608f661a728db9d47b33c54096fdc59 Mon Sep 17 00:00:00 2001
From: "J. Bruce Fields" <bfields@redhat.com>
Date: Wed, 8 Jul 2015 16:27:56 -0400
Subject: Revert "Documentation: NFS/RDMA: Document separate Kconfig symbols"

This reverts commit 731d5cca82729c85ca3296902a64836619f4ba2d.

Commit ffe1f0df5862 ("rpcrdma: Merge svcrdma and xprtrdma modules into
one") forgot to update the corresponding documentation.

Reported-by: Valentin Rothberg <valentinrothberg@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
---
 Documentation/filesystems/nfs/nfs-rdma.txt | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/filesystems/nfs/nfs-rdma.txt b/Documentation/filesystems/nfs/nfs-rdma.txt
index 95c13aa575ff..906b6c233f62 100644
--- a/Documentation/filesystems/nfs/nfs-rdma.txt
+++ b/Documentation/filesystems/nfs/nfs-rdma.txt
@@ -138,9 +138,9 @@ Installation
   - Build, install, reboot
 
     The NFS/RDMA code will be enabled automatically if NFS and RDMA
-    are turned on. The NFS/RDMA client and server are configured via the
-    SUNRPC_XPRT_RDMA_CLIENT and SUNRPC_XPRT_RDMA_SERVER config options that both
-    depend on SUNRPC and INFINIBAND. The default value of both options will be:
+    are turned on. The NFS/RDMA client and server are configured via the hidden
+    SUNRPC_XPRT_RDMA config option that depends on SUNRPC and INFINIBAND. The
+    value of SUNRPC_XPRT_RDMA will be:
 
      - N if either SUNRPC or INFINIBAND are N, in this case the NFS/RDMA client
        and server will not be built
@@ -238,9 +238,8 @@ NFS/RDMA Setup
 
   - Start the NFS server
 
-    If the NFS/RDMA server was built as a module
-    (CONFIG_SUNRPC_XPRT_RDMA_SERVER=m in kernel config), load the RDMA
-    transport module:
+    If the NFS/RDMA server was built as a module (CONFIG_SUNRPC_XPRT_RDMA=m in
+    kernel config), load the RDMA transport module:
 
     $ modprobe svcrdma
 
@@ -259,9 +258,8 @@ NFS/RDMA Setup
 
   - On the client system
 
-    If the NFS/RDMA client was built as a module
-    (CONFIG_SUNRPC_XPRT_RDMA_CLIENT=m in kernel config), load the RDMA client
-    module:
+    If the NFS/RDMA client was built as a module (CONFIG_SUNRPC_XPRT_RDMA=m in
+    kernel config), load the RDMA client module:
 
     $ modprobe xprtrdma.ko
 
-- 
cgit v1.2.3


From a48037e7c6c25436912f78f48cdbb75a710b7aa9 Mon Sep 17 00:00:00 2001
From: Scott Feldman <sfeldma@gmail.com>
Date: Sat, 18 Jul 2015 18:24:52 -0700
Subject: switchdev: update documentation for offload_fwd_mark

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 Documentation/networking/switchdev.txt | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/networking/switchdev.txt b/Documentation/networking/switchdev.txt
index c5d7ade10ff2..9825f32a8634 100644
--- a/Documentation/networking/switchdev.txt
+++ b/Documentation/networking/switchdev.txt
@@ -279,8 +279,18 @@ and unknown unicast packets to all ports in domain, if allowed by port's
 current STP state.  The switch driver, knowing which ports are within which
 vlan L2 domain, can program the switch device for flooding.  The packet should
 also be sent to the port netdev for processing by the bridge driver.  The
-bridge should not reflood the packet to the same ports the device flooded.
-XXX: the mechanism to avoid duplicate flood packets is being discuseed.
+bridge should not reflood the packet to the same ports the device flooded,
+otherwise there will be duplicate packets on the wire.
+
+To avoid duplicate packets, the device/driver should mark a packet as already
+forwarded using skb->offload_fwd_mark.  The same mark is set on the device
+ports in the domain using dev->offload_fwd_mark.  If the skb->offload_fwd_mark
+is non-zero and matches the forwarding egress port's dev->skb_mark, the kernel
+will drop the skb right before transmit on the egress port, with the
+understanding that the device already forwarded the packet on same egress port.
+The driver can use switchdev_port_fwd_mark_set() to set a globally unique mark
+for port's dev->offload_fwd_mark, based on the port's parent ID (switch ID) and
+a group ifindex.
 
 It is possible for the switch device to not handle flooding and push the
 packets up to the bridge driver for flooding.  This is not ideal as the number
-- 
cgit v1.2.3


From b838e1d96c613019095ba008afbee800977b0582 Mon Sep 17 00:00:00 2001
From: Jungseok Lee <jungseoklee85@gmail.com>
Date: Sat, 11 Jul 2015 14:51:40 +0000
Subject: tracing: Introduce two additional marks for delay

A fine granulity support for delay would be very useful when profiling
VM logics, such as page allocation including page reclaim and memory
compaction with function graph.

Thus, this patch adds two additional marks with two changes.

 - An equal sign in mark selection function is removed to align code
   behavior with comments and documentation.

 - The function graph example related to delay in ftrace.txt is updated
   to cover all supported marks.

Link: http://lkml.kernel.org/r/1436626300-1679-3-git-send-email-jungseoklee85@gmail.com

Cc: Byungchul Park <byungchul.park@lge.com>
Signed-off-by: Jungseok Lee <jungseoklee85@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 Documentation/trace/ftrace.txt | 51 +++++++++++++++++++++++++++++++-----------
 kernel/trace/trace_output.c    |  4 +++-
 2 files changed, 41 insertions(+), 14 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/trace/ftrace.txt b/Documentation/trace/ftrace.txt
index 7ddb1e319f84..072d3c4d5753 100644
--- a/Documentation/trace/ftrace.txt
+++ b/Documentation/trace/ftrace.txt
@@ -686,6 +686,8 @@ The above is mostly meaningful for kernel developers.
 	 The marks are determined by the difference between this
 	 current trace and the next trace.
 	  '$' - greater than 1 second
+	  '@' - greater than 100 milisecond
+	  '*' - greater than 10 milisecond
 	  '#' - greater than 1000 microsecond
 	  '!' - greater than 100 microsecond
 	  '+' - greater than 10 microsecond
@@ -1939,26 +1941,49 @@ want, depending on your needs.
 
   ie:
 
-  0)               |    up_write() {
-  0)   0.646 us    |      _spin_lock_irqsave();
-  0)   0.684 us    |      _spin_unlock_irqrestore();
-  0)   3.123 us    |    }
-  0)   0.548 us    |    fput();
-  0) + 58.628 us   |  }
+  3) # 1837.709 us |          } /* __switch_to */
+  3)               |          finish_task_switch() {
+  3)   0.313 us    |            _raw_spin_unlock_irq();
+  3)   3.177 us    |          }
+  3) # 1889.063 us |        } /* __schedule */
+  3) ! 140.417 us  |      } /* __schedule */
+  3) # 2034.948 us |    } /* schedule */
+  3) * 33998.59 us |  } /* schedule_preempt_disabled */
 
   [...]
 
-  0)               |      putname() {
-  0)               |        kmem_cache_free() {
-  0)   0.518 us    |          __phys_addr();
-  0)   1.757 us    |        }
-  0)   2.861 us    |      }
-  0) ! 115.305 us  |    }
-  0) ! 116.402 us  |  }
+  1)   0.260 us    |              msecs_to_jiffies();
+  1)   0.313 us    |              __rcu_read_unlock();
+  1) + 61.770 us   |            }
+  1) + 64.479 us   |          }
+  1)   0.313 us    |          rcu_bh_qs();
+  1)   0.313 us    |          __local_bh_enable();
+  1) ! 217.240 us  |        }
+  1)   0.365 us    |        idle_cpu();
+  1)               |        rcu_irq_exit() {
+  1)   0.417 us    |          rcu_eqs_enter_common.isra.47();
+  1)   3.125 us    |        }
+  1) ! 227.812 us  |      }
+  1) ! 457.395 us  |    }
+  1) @ 119760.2 us |  }
+
+  [...]
+
+  2)               |    handle_IPI() {
+  1)   6.979 us    |                  }
+  2)   0.417 us    |      scheduler_ipi();
+  1)   9.791 us    |                }
+  1) + 12.917 us   |              }
+  2)   3.490 us    |    }
+  1) + 15.729 us   |            }
+  1) + 18.542 us   |          }
+  2) $ 3594274 us  |  }
 
   + means that the function exceeded 10 usecs.
   ! means that the function exceeded 100 usecs.
   # means that the function exceeded 1000 usecs.
+  * means that the function exceeded 10 msecs.
+  @ means that the function exceeded 100 msecs.
   $ means that the function exceeded 1 sec.
 
 
diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
index dfab253727dc..8e481a84aeea 100644
--- a/kernel/trace/trace_output.c
+++ b/kernel/trace/trace_output.c
@@ -496,6 +496,8 @@ static const struct trace_mark {
 	char			sym;
 } mark[] = {
 	MARK(1000000000ULL	, '$'), /* 1 sec */
+	MARK(100000000ULL	, '@'), /* 100 msec */
+	MARK(10000000ULL	, '*'), /* 10 msec */
 	MARK(1000000ULL		, '#'), /* 1000 usecs */
 	MARK(100000ULL		, '!'), /* 100 usecs */
 	MARK(10000ULL		, '+'), /* 10 usecs */
@@ -508,7 +510,7 @@ char trace_find_mark(unsigned long long d)
 	int size = ARRAY_SIZE(mark);
 
 	for (i = 0; i < size; i++) {
-		if (d >= mark[i].val)
+		if (d > mark[i].val)
 			break;
 	}
 
-- 
cgit v1.2.3


From f4c190eb8b4f80b12dc98ce7d54a3bea0e4e7e69 Mon Sep 17 00:00:00 2001
From: Joachim Eastwood <manabian@gmail.com>
Date: Fri, 17 Jul 2015 00:26:12 +0200
Subject: stmmac: drop custom_* fields from plat_stmmacenet_data

Both of these fields are unused and has been unused since they
were added 3 and 5 years ago. Drop them since they are clearly
not very useful.

Signed-off-by: Joachim Eastwood <manabian@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 Documentation/networking/stmmac.txt | 4 ----
 include/linux/stmmac.h              | 2 --
 2 files changed, 6 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/networking/stmmac.txt b/Documentation/networking/stmmac.txt
index e655e2453c98..5fddefa69baf 100644
--- a/Documentation/networking/stmmac.txt
+++ b/Documentation/networking/stmmac.txt
@@ -139,8 +139,6 @@ struct plat_stmmacenet_data {
 	void (*free)(struct platform_device *pdev, void *priv);
 	int (*init)(struct platform_device *pdev, void *priv);
 	void (*exit)(struct platform_device *pdev, void *priv);
-	void *custom_cfg;
-	void *custom_data;
 	void *bsp_priv;
 };
 
@@ -186,8 +184,6 @@ Where:
 	     which will be stored in bsp_priv, and then passed to init and
 	     exit callbacks. init/exit callbacks should not use or modify
 	     platform data.
- o custom_cfg/custom_data: this is a custom configuration that can be passed
-			   while initializing the resources.
  o bsp_priv: another private pointer.
 
 For MDIO bus The we have:
diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h
index c735f5c91eea..c86a20047cb1 100644
--- a/include/linux/stmmac.h
+++ b/include/linux/stmmac.h
@@ -123,8 +123,6 @@ struct plat_stmmacenet_data {
 	void (*free)(struct platform_device *pdev, void *priv);
 	int (*init)(struct platform_device *pdev, void *priv);
 	void (*exit)(struct platform_device *pdev, void *priv);
-	void *custom_cfg;
-	void *custom_data;
 	void *bsp_priv;
 };
 
-- 
cgit v1.2.3


From 949163015ce6fdb76a5e846a3582d3c40c23c001 Mon Sep 17 00:00:00 2001
From: Paolo Pisati <p.pisati@gmail.com>
Date: Mon, 20 Jul 2015 18:23:50 +0200
Subject: x86/boot: Obsolete the MCA sys_desc_table

The kernel does not support the MCA bus anymroe, so mark sys_desc_table
as obsolete: remove any reference from the code together with the remaining
of MCA logic.

bloat-o-meter output:

  add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-55 (-55)
  function                                     old     new   delta
  i386_start_kernel                            128     119      -9
  setup_arch                                  1421    1375     -46

Signed-off-by: Paolo Pisati <p.pisati@gmail.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1437409430-8491-1-git-send-email-p.pisati@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 Documentation/x86/zero-page.txt       |  3 ++-
 arch/x86/boot/Makefile                |  2 +-
 arch/x86/boot/boot.h                  |  3 ---
 arch/x86/boot/compressed/eboot.c      |  4 ----
 arch/x86/boot/main.c                  |  3 ---
 arch/x86/boot/mca.c                   | 38 -----------------------------------
 arch/x86/include/asm/processor.h      |  8 --------
 arch/x86/include/uapi/asm/bootparam.h |  2 +-
 arch/x86/kernel/kexec-bzimage64.c     |  3 ---
 arch/x86/kernel/setup.c               |  5 -----
 10 files changed, 4 insertions(+), 67 deletions(-)
 delete mode 100644 arch/x86/boot/mca.c

(limited to 'Documentation')

diff --git a/Documentation/x86/zero-page.txt b/Documentation/x86/zero-page.txt
index 82fbdbc1e0b0..95a4d34af3fd 100644
--- a/Documentation/x86/zero-page.txt
+++ b/Documentation/x86/zero-page.txt
@@ -17,7 +17,8 @@ Offset	Proto	Name		Meaning
 				(struct ist_info)
 080/010	ALL	hd0_info	hd0 disk parameter, OBSOLETE!!
 090/010	ALL	hd1_info	hd1 disk parameter, OBSOLETE!!
-0A0/010	ALL	sys_desc_table	System description table (struct sys_desc_table)
+0A0/010	ALL	sys_desc_table	System description table (struct sys_desc_table),
+				OBSOLETE!!
 0B0/010	ALL	olpc_ofw_header	OLPC's OpenFirmware CIF and friends
 0C0/004	ALL	ext_ramdisk_image ramdisk_image high 32bits
 0C4/004	ALL	ext_ramdisk_size  ramdisk_size high 32bits
diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
index 57bbf2fb21f6..0d553e54171b 100644
--- a/arch/x86/boot/Makefile
+++ b/arch/x86/boot/Makefile
@@ -23,7 +23,7 @@ targets		+= fdimage fdimage144 fdimage288 image.iso mtools.conf
 subdir-		:= compressed
 
 setup-y		+= a20.o bioscall.o cmdline.o copy.o cpu.o cpuflags.o cpucheck.o
-setup-y		+= early_serial_console.o edd.o header.o main.o mca.o memory.o
+setup-y		+= early_serial_console.o edd.o header.o main.o memory.o
 setup-y		+= pm.o pmjump.o printf.o regs.o string.o tty.o video.o
 setup-y		+= video-mode.o version.o
 setup-$(CONFIG_X86_APM_BOOT) += apm.o
diff --git a/arch/x86/boot/boot.h b/arch/x86/boot/boot.h
index bd49ec61255c..0033e96c3f09 100644
--- a/arch/x86/boot/boot.h
+++ b/arch/x86/boot/boot.h
@@ -307,9 +307,6 @@ void query_edd(void);
 /* header.S */
 void __attribute__((noreturn)) die(void);
 
-/* mca.c */
-int query_mca(void);
-
 /* memory.c */
 int detect_memory(void);
 
diff --git a/arch/x86/boot/compressed/eboot.c b/arch/x86/boot/compressed/eboot.c
index 2c82bd150d43..e140bccf9cf3 100644
--- a/arch/x86/boot/compressed/eboot.c
+++ b/arch/x86/boot/compressed/eboot.c
@@ -1041,7 +1041,6 @@ void setup_graphics(struct boot_params *boot_params)
 struct boot_params *make_boot_params(struct efi_config *c)
 {
 	struct boot_params *boot_params;
-	struct sys_desc_table *sdt;
 	struct apm_bios_info *bi;
 	struct setup_header *hdr;
 	struct efi_info *efi;
@@ -1089,7 +1088,6 @@ struct boot_params *make_boot_params(struct efi_config *c)
 	hdr = &boot_params->hdr;
 	efi = &boot_params->efi_info;
 	bi = &boot_params->apm_bios_info;
-	sdt = &boot_params->sys_desc_table;
 
 	/* Copy the second sector to boot_params */
 	memcpy(&hdr->jump, image->image_base + 512, 512);
@@ -1118,8 +1116,6 @@ struct boot_params *make_boot_params(struct efi_config *c)
 	/* Clear APM BIOS info */
 	memset(bi, 0, sizeof(*bi));
 
-	memset(sdt, 0, sizeof(*sdt));
-
 	status = efi_parse_options(cmdline_ptr);
 	if (status != EFI_SUCCESS)
 		goto fail2;
diff --git a/arch/x86/boot/main.c b/arch/x86/boot/main.c
index fd6c9f236996..9bcea386db65 100644
--- a/arch/x86/boot/main.c
+++ b/arch/x86/boot/main.c
@@ -161,9 +161,6 @@ void main(void)
 	/* Set keyboard repeat rate (why?) and query the lock flags */
 	keyboard_init();
 
-	/* Query MCA information */
-	query_mca();
-
 	/* Query Intel SpeedStep (IST) information */
 	query_ist();
 
diff --git a/arch/x86/boot/mca.c b/arch/x86/boot/mca.c
deleted file mode 100644
index a95a531148ef..000000000000
--- a/arch/x86/boot/mca.c
+++ /dev/null
@@ -1,38 +0,0 @@
-/* -*- linux-c -*- ------------------------------------------------------- *
- *
- *   Copyright (C) 1991, 1992 Linus Torvalds
- *   Copyright 2007 rPath, Inc. - All Rights Reserved
- *   Copyright 2009 Intel Corporation; author H. Peter Anvin
- *
- *   This file is part of the Linux kernel, and is made available under
- *   the terms of the GNU General Public License version 2.
- *
- * ----------------------------------------------------------------------- */
-
-/*
- * Get the MCA system description table
- */
-
-#include "boot.h"
-
-int query_mca(void)
-{
-	struct biosregs ireg, oreg;
-	u16 len;
-
-	initregs(&ireg);
-	ireg.ah = 0xc0;
-	intcall(0x15, &ireg, &oreg);
-
-	if (oreg.eflags & X86_EFLAGS_CF)
-		return -1;	/* No MCA present */
-
-	set_fs(oreg.es);
-	len = rdfs16(oreg.bx);
-
-	if (len > sizeof(boot_params.sys_desc_table))
-		len = sizeof(boot_params.sys_desc_table);
-
-	copy_from_fs(&boot_params.sys_desc_table, oreg.bx, len);
-	return 0;
-}
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 43e6519df0d5..3e15e1358f21 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -647,14 +647,6 @@ static inline void update_debugctlmsr(unsigned long debugctlmsr)
 
 extern void set_task_blockstep(struct task_struct *task, bool on);
 
-/*
- * from system description table in BIOS. Mostly for MCA use, but
- * others may find it useful:
- */
-extern unsigned int		machine_id;
-extern unsigned int		machine_submodel_id;
-extern unsigned int		BIOS_revision;
-
 /* Boot loader type from the setup header: */
 extern int			bootloader_type;
 extern int			bootloader_version;
diff --git a/arch/x86/include/uapi/asm/bootparam.h b/arch/x86/include/uapi/asm/bootparam.h
index ab456dc233b5..329254373479 100644
--- a/arch/x86/include/uapi/asm/bootparam.h
+++ b/arch/x86/include/uapi/asm/bootparam.h
@@ -120,7 +120,7 @@ struct boot_params {
 	__u8  _pad3[16];				/* 0x070 */
 	__u8  hd0_info[16];	/* obsolete! */		/* 0x080 */
 	__u8  hd1_info[16];	/* obsolete! */		/* 0x090 */
-	struct sys_desc_table sys_desc_table;		/* 0x0a0 */
+	struct sys_desc_table sys_desc_table; /* obsolete! */	/* 0x0a0 */
 	struct olpc_ofw_header olpc_ofw_header;		/* 0x0b0 */
 	__u32 ext_ramdisk_image;			/* 0x0c0 */
 	__u32 ext_ramdisk_size;				/* 0x0c4 */
diff --git a/arch/x86/kernel/kexec-bzimage64.c b/arch/x86/kernel/kexec-bzimage64.c
index ca83f7ac388b..961e51e9c6f6 100644
--- a/arch/x86/kernel/kexec-bzimage64.c
+++ b/arch/x86/kernel/kexec-bzimage64.c
@@ -223,9 +223,6 @@ setup_boot_parameters(struct kimage *image, struct boot_params *params,
 	memset(&params->hd0_info, 0, sizeof(params->hd0_info));
 	memset(&params->hd1_info, 0, sizeof(params->hd1_info));
 
-	/* Default sysdesc table */
-	params->sys_desc_table.length = 0;
-
 	if (image->type == KEXEC_TYPE_CRASH) {
 		ret = crash_setup_memmap_entries(image, params);
 		if (ret)
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 80f874bf999e..b143c2d04420 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -916,11 +916,6 @@ void __init setup_arch(char **cmdline_p)
 #ifdef CONFIG_X86_32
 	apm_info.bios = boot_params.apm_bios_info;
 	ist_info = boot_params.ist_info;
-	if (boot_params.sys_desc_table.length != 0) {
-		machine_id = boot_params.sys_desc_table.table[0];
-		machine_submodel_id = boot_params.sys_desc_table.table[1];
-		BIOS_revision = boot_params.sys_desc_table.table[2];
-	}
 #endif
 	saved_video_mode = boot_params.hdr.vid_mode;
 	bootloader_type = boot_params.hdr.type_of_loader;
-- 
cgit v1.2.3


From c21cde6fe1ba08b357c96071c71af6543f2863ec Mon Sep 17 00:00:00 2001
From: Linus Walleij <linus.walleij@linaro.org>
Date: Tue, 21 Jul 2015 11:36:57 +0200
Subject: gpio: document interaction with other subsystems

Now I am very fed up with people reinventing kernel wheels in
userspace "just because they can" (read, sysfs). Put in a angry
blurb in sysfs doc and put in a new file with pointers to other
subsystem drivers utilizing GPIOs.

Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
---
 Documentation/gpio/00-INDEX            |  3 ++
 Documentation/gpio/drivers-on-gpio.txt | 95 ++++++++++++++++++++++++++++++++++
 Documentation/gpio/sysfs.txt           |  9 ++--
 3 files changed, 102 insertions(+), 5 deletions(-)
 create mode 100644 Documentation/gpio/drivers-on-gpio.txt

(limited to 'Documentation')

diff --git a/Documentation/gpio/00-INDEX b/Documentation/gpio/00-INDEX
index 1de43ae46ae6..179beb234f98 100644
--- a/Documentation/gpio/00-INDEX
+++ b/Documentation/gpio/00-INDEX
@@ -6,6 +6,9 @@ consumer.txt
 	- How to obtain and use GPIOs in a driver
 driver.txt
 	- How to write a GPIO driver
+drivers-on-gpio.txt:
+	- Drivers in other subsystems that can use GPIO to provide more
+	  complex functionality.
 board.txt
 	- How to assign GPIOs to a consumer device and a function
 sysfs.txt
diff --git a/Documentation/gpio/drivers-on-gpio.txt b/Documentation/gpio/drivers-on-gpio.txt
new file mode 100644
index 000000000000..f6121328630f
--- /dev/null
+++ b/Documentation/gpio/drivers-on-gpio.txt
@@ -0,0 +1,95 @@
+Subsystem drivers using GPIO
+============================
+
+Note that standard kernel drivers exist for common GPIO tasks and will provide
+the right in-kernel and userspace APIs/ABIs for the job, and that these
+drivers can quite easily interconnect with other kernel subsystems using
+hardware descriptions such as device tree or ACPI:
+
+- leds-gpio: drivers/leds/leds-gpio.c will handle LEDs connected to  GPIO
+  lines, giving you the LED sysfs interface
+
+- ledtrig-gpio: drivers/leds/trigger/ledtrig-gpio.c will provide a LED trigger,
+  i.e. a LED will turn on/off in response to a GPIO line going high or low
+  (and that LED may in turn use the leds-gpio as per above).
+
+- gpio-keys: drivers/input/keyboard/gpio_keys.c is used when your GPIO line
+  can generate interrupts in response to a key press. Also supports debounce.
+
+- gpio-keys-polled: drivers/input/keyboard/gpio_keys_polled.c is used when your
+  GPIO line cannot generate interrupts, so it needs to be periodically polled
+  by a timer.
+
+- gpio_mouse: drivers/input/mouse/gpio_mouse.c is used to provide a mouse with
+  up to three buttons by simply using GPIOs and no mouse port. You can cut the
+  mouse cable and connect the wires to GPIO lines or solder a mouse connector
+  to the lines for a more permanent solution of this type.
+
+- gpio-beeper: drivers/input/misc/gpio-beeper.c is used to provide a beep from
+  an external speaker connected to a GPIO line.
+
+- gpio-tilt-polled: drivers/input/misc/gpio_tilt_polled.c provides tilt
+  detection switches using GPIO, which is useful for your homebrewn pinball
+  machine if for nothing else. It can detect different tilt angles of the
+  monitored object.
+
+- extcon-gpio: drivers/extcon/extcon-gpio.c is used when you need to read an
+  external connector status, such as a headset line for an audio driver or an
+  HDMI connector. It will provide a better userspace sysfs interface than GPIO.
+
+- restart-gpio: drivers/power/gpio-restart.c is used to restart/reboot the
+  system by pulling a GPIO line and will register a restart handler so
+  userspace can issue the right system call to restart the system.
+
+- poweroff-gpio: drivers/power/gpio-poweroff.c is used to power the system down
+  by pulling a GPIO line and will register a pm_power_off() callback so that
+  userspace can issue the right system call to power down the system.
+
+- gpio-gate-clock: drivers/clk/clk-gpio-gate.c is used to control a gated clock
+  (off/on) that uses a GPIO, and integrated with the clock subsystem.
+
+- i2c-gpio: drivers/i2c/busses/i2c-gpio.c is used to drive an I2C bus
+  (two wires, SDA and SCL lines) by hammering (bitbang) two GPIO lines. It will
+  appear as any other I2C bus to the system and makes it possible to connect
+  drivers for the I2C devices on the bus like any other I2C bus driver.
+
+- spi_gpio: drivers/spi/spi-gpio.c is used to drive an SPI bus (variable number
+  of wires, atleast SCK and optionally MISO, MOSI and chip select lines) using
+  GPIO hammering (bitbang). It will appear as any other SPI bus on the system
+  and makes it possible to connect drivers for SPI devices on the bus like
+  any other SPI bus driver. For example any MMC/SD card can then be connected
+  to this SPI by using the mmc_spi host from the MMC/SD card subsystem.
+
+- w1-gpio: drivers/w1/masters/w1-gpio.c is used to drive a one-wire bus using
+  a GPIO line, integrating with the W1 subsystem and handling devices on
+  the bus like any other W1 device.
+
+- gpio-fan: drivers/hwmon/gpio-fan.c is used to control a fan for cooling the
+  system, connected to a GPIO line (and optionally a GPIO alarm line),
+  presenting all the right in-kernel and sysfs interfaces to make your system
+  not overheat.
+
+- gpio-regulator: drivers/regulator/gpio-regulator.c is used to control a
+  regulator providing a certain voltage by pulling a GPIO line, integrating
+  with the regulator subsystem and giving you all the right interfaces.
+
+- gpio-wdt: drivers/watchdog/gpio_wdt.c is used to provide a watchdog timer
+  that will periodically "ping" a hardware connected to a GPIO line by toggling
+  it from 1-to-0-to-1. If that hardware does not recieve its "ping"
+  periodically, it will reset the system.
+
+- gpio-nand: drivers/mtd/nand/gpio.c is used to connect a NAND flash chip to
+  a set of simple GPIO lines: RDY, NCE, ALE, CLE, NWP. It interacts with the
+  NAND flash MTD subsystem and provides chip access and partition parsing like
+  any other NAND driving hardware.
+
+Apart from this there are special GPIO drivers in subsystems like MMC/SD to
+read card detect and write protect GPIO lines, and in the TTY serial subsystem
+to emulate MCTRL (modem control) signals CTS/RTS by using two GPIO lines. The
+MTD NOR flash has add-ons for extra GPIO lines too, though the address bus is
+usually connected directly to the flash.
+
+Use those instead of talking directly to the GPIOs using sysfs; they integrate
+with kernel frameworks better than your userspace code could. Needless to say,
+just using the apropriate kernel drivers will simplify and speed up your
+embedded hacking in particular by providing ready-made components.
diff --git a/Documentation/gpio/sysfs.txt b/Documentation/gpio/sysfs.txt
index 535b6a8a7a7c..0700b55637f5 100644
--- a/Documentation/gpio/sysfs.txt
+++ b/Documentation/gpio/sysfs.txt
@@ -20,11 +20,10 @@ userspace GPIO can be used to determine system configuration data that
 standard kernels won't know about. And for some tasks, simple userspace
 GPIO drivers could be all that the system really needs.
 
-Note that standard kernel drivers exist for common "LEDs and Buttons"
-GPIO tasks:  "leds-gpio" and "gpio_keys", respectively. Use those
-instead of talking directly to the GPIOs; they integrate with kernel
-frameworks better than your userspace code could.
-
+DO NOT ABUSE SYFS TO CONTROL HARDWARE THAT HAS PROPER KERNEL DRIVERS.
+PLEASE READ THE DOCUMENT NAMED "drivers-on-gpio.txt" IN THIS DOCUMENTATION
+DIRECTORY TO AVOID REINVENTING KERNEL WHEELS IN USERSPACE. I MEAN IT.
+REALLY.
 
 Paths in Sysfs
 --------------
-- 
cgit v1.2.3


From 36bd1683559e23d4032defadbfbd3ccd5601b8c8 Mon Sep 17 00:00:00 2001
From: Teresa Remmet <t.remmet@phytec.de>
Date: Thu, 16 Jul 2015 10:30:49 +0200
Subject: ARM: dts: Add phyBOARD-WEGA-AM335x rdk

phyBOARD-WEGA-AM335x represents a direct soldered
combination of a phyCORE-AM335x SoM and carrier board.

Different kind of SoM options can be connected to
the wega carrier board. So we created a separate
wega dtsi file. The final dts contains the actual
SoM on the carrier board.

WEGA carrier board features:
* ETH phy on carrier board: 1x MII
* 1x CAN
* 2x UART
* USB0 (device)
* USB1 (host)
* mSD slot

Signed-off-by: Teresa Remmet <t.remmet@phytec.de>
Signed-off-by: Tony Lindgren <tony@atomide.com>
---
 .../devicetree/bindings/arm/omap/omap.txt          |   3 +
 arch/arm/boot/dts/Makefile                         |   3 +-
 arch/arm/boot/dts/am335x-wega-rdk.dts              |  22 +++
 arch/arm/boot/dts/am335x-wega.dtsi                 | 151 +++++++++++++++++++++
 4 files changed, 178 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm/boot/dts/am335x-wega-rdk.dts
 create mode 100644 arch/arm/boot/dts/am335x-wega.dtsi

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/arm/omap/omap.txt b/Documentation/devicetree/bindings/arm/omap/omap.txt
index 4f6a82cef1d1..9f4e5136e568 100644
--- a/Documentation/devicetree/bindings/arm/omap/omap.txt
+++ b/Documentation/devicetree/bindings/arm/omap/omap.txt
@@ -135,6 +135,9 @@ Boards:
 - AM335X OrionLXm : Substation Automation Platform
   compatible = "novatech,am335x-lxm", "ti,am33xx"
 
+- AM335X phyBOARD-WEGA: Single Board Computer dev kit
+  compatible = "phytec,am335x-wega", "phytec,am335x-phycore-som", "ti,am33xx"
+
 - OMAP5 EVM : Evaluation Module
   compatible = "ti,omap5-evm", "ti,omap5"
 
diff --git a/arch/arm/boot/dts/Makefile b/arch/arm/boot/dts/Makefile
index 1c62928587e0..180846365081 100644
--- a/arch/arm/boot/dts/Makefile
+++ b/arch/arm/boot/dts/Makefile
@@ -442,7 +442,8 @@ dtb-$(CONFIG_SOC_AM33XX) += \
 	am335x-nano.dtb \
 	am335x-pepper.dtb \
 	am335x-lxm.dtb \
-	am335x-chiliboard.dtb
+	am335x-chiliboard.dtb \
+	am335x-wega-rdk.dtb
 dtb-$(CONFIG_ARCH_OMAP4) += \
 	omap4-duovero-parlor.dtb \
 	omap4-panda.dtb \
diff --git a/arch/arm/boot/dts/am335x-wega-rdk.dts b/arch/arm/boot/dts/am335x-wega-rdk.dts
new file mode 100644
index 000000000000..6431b7db8109
--- /dev/null
+++ b/arch/arm/boot/dts/am335x-wega-rdk.dts
@@ -0,0 +1,22 @@
+/*
+ * Copyright (C) 2015 Phytec Messtechnik GmbH
+ * Author: Teresa Remmet <t.remmet@phytec.de>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+/dts-v1/;
+
+#include "am335x-phycore-som.dtsi"
+#include "am335x-wega.dtsi"
+
+/* SoM */
+&i2c_eeprom {
+	status = "okay";
+};
+
+&i2c_rtc {
+	status = "okay";
+};
diff --git a/arch/arm/boot/dts/am335x-wega.dtsi b/arch/arm/boot/dts/am335x-wega.dtsi
new file mode 100644
index 000000000000..5e541bd1b45a
--- /dev/null
+++ b/arch/arm/boot/dts/am335x-wega.dtsi
@@ -0,0 +1,151 @@
+/*
+ * Copyright (C) 2015 Phytec Messtechnik GmbH
+ * Author: Teresa Remmet <t.remmet@phytec.de>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+/ {
+	model = "Phytec AM335x phyBOARD-WEGA";
+	compatible = "phytec,am335x-wega", "phytec,am335x-phycore-som", "ti,am33xx";
+
+};
+
+/* CAN Busses */
+&am33xx_pinmux {
+	dcan1_pins: pinmux_dcan1 {
+		pinctrl-single,pins = <
+			0x168 (PIN_OUTPUT_PULLUP | MUX_MODE2) /* uart0_ctsn.d_can1_tx */
+			0x16c (PIN_INPUT_PULLUP | MUX_MODE2) /* uart0_rtsn.d_can1_rx */
+		>;
+	};
+};
+
+&dcan1 {
+	pinctrl-names = "default";
+	pinctrl-0 = <&dcan1_pins>;
+	status = "okay";
+};
+
+/* Ethernet */
+&am33xx_pinmux {
+	ethernet1_pins: pinmux_ethernet1 {
+		pinctrl-single,pins = <
+			0x40 (PIN_OUTPUT | MUX_MODE1)		/* gpmc_a0.mii2_txen */
+			0x44 (PIN_INPUT_PULLDOWN | MUX_MODE1)	/* gpmc_a1.mii2_rxdv */
+			0x48 (PIN_OUTPUT | MUX_MODE1)		/* gpmc_a2.mii2_txd3 */
+			0x4c (PIN_OUTPUT | MUX_MODE1)		/* gpmc_a3.mii2_txd2 */
+			0x50 (PIN_OUTPUT | MUX_MODE1)		/* gpmc_a4.mii2_txd1 */
+			0x54 (PIN_OUTPUT | MUX_MODE1)		/* gpmc_a5.mii2_txd0 */
+			0x58 (PIN_INPUT_PULLDOWN | MUX_MODE1)	/* gpmc_a6.mii2_txclk */
+			0x5c (PIN_INPUT_PULLDOWN | MUX_MODE1)	/* gpmc_a7.mii2_rxclk */
+			0x60 (PIN_INPUT_PULLDOWN | MUX_MODE1)	/* gpmc_a8.mii2_rxd3 */
+			0x64 (PIN_INPUT_PULLDOWN | MUX_MODE1)	/* gpmc_a9.mii2_rxd2 */
+			0x68 (PIN_INPUT_PULLDOWN | MUX_MODE1)	/* gpmc_a10.mii2_rxd1 */
+			0x6c (PIN_INPUT_PULLDOWN | MUX_MODE1)	/* gpmc_a11.mii2_rxd0 */
+			0x74 (PIN_INPUT_PULLDOWN | MUX_MODE1)	/* gpmc_wpn.mii2_rxerr */
+			0x78 (PIN_INPUT_PULLDOWN | MUX_MODE1)	/* gpmc_ben1.mii2_col */
+		>;
+	};
+};
+
+&cpsw_emac1 {
+	phy_id = <&davinci_mdio>, <1>;
+	phy-mode = "mii";
+	dual_emac_res_vlan = <2>;
+};
+
+&mac {
+	slaves = <2>;
+	pinctrl-names = "default";
+	pinctrl-0 = <&ethernet0_pins &ethernet1_pins>;
+	dual_emac = <1>;
+};
+
+/* MMC */
+&am33xx_pinmux {
+	mmc1_pins: pinmux_mmc1 {
+		pinctrl-single,pins = <
+			0x0F0 (PIN_INPUT_PULLUP | MUX_MODE0)	/* mmc0_dat3.mmc0_dat3 */
+			0x0F4 (PIN_INPUT_PULLUP | MUX_MODE0)	/* mmc0_dat2.mmc0_dat2 */
+			0x0F8 (PIN_INPUT_PULLUP | MUX_MODE0)	/* mmc0_dat1.mmc0_dat1 */
+			0x0FC (PIN_INPUT_PULLUP | MUX_MODE0)	/* mmc0_dat0.mmc0_dat0 */
+			0x100 (PIN_INPUT_PULLUP | MUX_MODE0)	/* mmc0_clk.mmc0_clk */
+			0x104 (PIN_INPUT_PULLUP | MUX_MODE0)	/* mmc0_cmd.mmc0_cmd */
+			0x160 (PIN_INPUT_PULLUP | MUX_MODE7)	/* spi0_cs1.mmc0_sdcd */
+		>;
+	};
+};
+
+&mmc1 {
+	vmmc-supply = <&vmmc_reg>;
+	bus-width = <4>;
+	pinctrl-names = "default";
+	pinctrl-0 = <&mmc1_pins>;
+	cd-gpios = <&gpio0 6 GPIO_ACTIVE_HIGH>;
+	status = "okay";
+};
+
+/* UARTs */
+&am33xx_pinmux {
+	uart0_pins: pinmux_uart0 {
+		pinctrl-single,pins = <
+			0x170 (PIN_INPUT_PULLUP | MUX_MODE0)    /* uart0_rxd.uart0_rxd */
+			0x174 (PIN_OUTPUT_PULLDOWN | MUX_MODE0) /* uart0_txd.uart0_txd */
+		>;
+	};
+
+	uart1_pins: pinmux_uart1_pins {
+		pinctrl-single,pins = <
+			0x180 (PIN_INPUT_PULLUP | MUX_MODE0)	/* uart1_rxd.uart1_rxd */
+			0x184 (PIN_OUTPUT_PULLDOWN | MUX_MODE0)	/* uart1_txd.uart1_txd */
+			0x178 (PIN_INPUT | MUX_MODE0)		/* uart1_ctsn.uart1_ctsn */
+			0x17c (PIN_OUTPUT_PULLDOWN | MUX_MODE0)		/* uart1_rtsn.uart1_rtsn */
+		>;
+	};
+};
+
+&uart0 {
+	pinctrl-names = "default";
+	pinctrl-0 = <&uart0_pins>;
+	status = "okay";
+};
+
+&uart1 {
+	pinctrl-names = "default";
+	pinctrl-0 = <&uart1_pins>;
+	status = "okay";
+};
+
+/* USB */
+&cppi41dma {
+	status = "okay";
+};
+
+&usb_ctrl_mod {
+	status = "okay";
+};
+
+&usb {
+	status = "okay";
+};
+
+&usb0 {
+	dr_mode = "peripheral";
+	status = "okay";
+};
+
+&usb0_phy {
+	status = "okay";
+};
+
+&usb1 {
+	dr_mode = "host";
+	status = "okay";
+};
+
+&usb1_phy {
+	status = "okay";
+};
-- 
cgit v1.2.3


From 8ab30c1538b14424015e45063c41d509b24c1dea Mon Sep 17 00:00:00 2001
From: Alex Bennée <alex.bennee@linaro.org>
Date: Tue, 7 Jul 2015 17:29:53 +0100
Subject: KVM: add comments for kvm_debug_exit_arch struct
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Bring into line with the comments for the other structures and their
KVM_EXIT_* cases. Also update api.txt to reflect use in kvm_run
documentation.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 Documentation/virtual/kvm/api.txt | 4 +++-
 include/uapi/linux/kvm.h          | 3 +++
 2 files changed, 6 insertions(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index a7926a90156f..9f746eab333d 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -3111,11 +3111,13 @@ data_offset describes where the data is located (KVM_EXIT_IO_OUT) or
 where kvm expects application code to place the data for the next
 KVM_RUN invocation (KVM_EXIT_IO_IN).  Data format is a packed array.
 
+		/* KVM_EXIT_DEBUG */
 		struct {
 			struct kvm_debug_exit_arch arch;
 		} debug;
 
-Unused.
+If the exit_reason is KVM_EXIT_DEBUG, then a vcpu is processing a debug event
+for which architecture specific information is returned.
 
 		/* KVM_EXIT_MMIO */
 		struct {
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 716ad4ae4d4b..4ab3c6a8d563 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -237,6 +237,7 @@ struct kvm_run {
 			__u32 count;
 			__u64 data_offset; /* relative to kvm_run start */
 		} io;
+		/* KVM_EXIT_DEBUG */
 		struct {
 			struct kvm_debug_exit_arch arch;
 		} debug;
@@ -285,6 +286,7 @@ struct kvm_run {
 			__u32 data;
 			__u8  is_write;
 		} dcr;
+		/* KVM_EXIT_INTERNAL_ERROR */
 		struct {
 			__u32 suberror;
 			/* Available with KVM_CAP_INTERNAL_ERROR_DATA: */
@@ -295,6 +297,7 @@ struct kvm_run {
 		struct {
 			__u64 gprs[32];
 		} osi;
+		/* KVM_EXIT_PAPR_HCALL */
 		struct {
 			__u64 nr;
 			__u64 ret;
-- 
cgit v1.2.3


From 0e6f07f29cfb8d79dbbdb12560a73f7121ba324e Mon Sep 17 00:00:00 2001
From: Alex Bennée <alex.bennee@linaro.org>
Date: Tue, 7 Jul 2015 17:29:55 +0100
Subject: KVM: arm: guest debug, add stub KVM_SET_GUEST_DEBUG ioctl
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This commit adds a stub function to support the KVM_SET_GUEST_DEBUG
ioctl. Any unsupported flag will return -EINVAL. For now, only
KVM_GUESTDBG_ENABLE is supported, although it won't have any effects.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>.
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 Documentation/virtual/kvm/api.txt |  2 +-
 arch/arm/kvm/arm.c                |  7 -------
 arch/arm/kvm/guest.c              |  6 ++++++
 arch/arm64/kvm/guest.c            | 27 +++++++++++++++++++++++++++
 4 files changed, 34 insertions(+), 8 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 9f746eab333d..19adfd385882 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2671,7 +2671,7 @@ handled.
 4.87 KVM_SET_GUEST_DEBUG
 
 Capability: KVM_CAP_SET_GUEST_DEBUG
-Architectures: x86, s390, ppc
+Architectures: x86, s390, ppc, arm64
 Type: vcpu ioctl
 Parameters: struct kvm_guest_debug (in)
 Returns: 0 on success; -1 on error
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index bc738d2b8392..1b693cb2d5b2 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -301,13 +301,6 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 	kvm_arm_set_running_vcpu(NULL);
 }
 
-int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
-					struct kvm_guest_debug *dbg)
-{
-	return -EINVAL;
-}
-
-
 int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
 				    struct kvm_mp_state *mp_state)
 {
diff --git a/arch/arm/kvm/guest.c b/arch/arm/kvm/guest.c
index d503fbb787d3..96e935bbc38c 100644
--- a/arch/arm/kvm/guest.c
+++ b/arch/arm/kvm/guest.c
@@ -290,3 +290,9 @@ int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu,
 {
 	return -EINVAL;
 }
+
+int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
+					struct kvm_guest_debug *dbg)
+{
+	return -EINVAL;
+}
diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index 9535bd555d1d..0ba86775235d 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -331,3 +331,30 @@ int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu,
 {
 	return -EINVAL;
 }
+
+#define KVM_GUESTDBG_VALID_MASK (KVM_GUESTDBG_ENABLE)
+
+/**
+ * kvm_arch_vcpu_ioctl_set_guest_debug - set up guest debugging
+ * @kvm:	pointer to the KVM struct
+ * @kvm_guest_debug: the ioctl data buffer
+ *
+ * This sets up and enables the VM for guest debugging. Userspace
+ * passes in a control flag to enable different debug types and
+ * potentially other architecture specific information in the rest of
+ * the structure.
+ */
+int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
+					struct kvm_guest_debug *dbg)
+{
+	if (dbg->control & ~KVM_GUESTDBG_VALID_MASK)
+		return -EINVAL;
+
+	if (dbg->control & KVM_GUESTDBG_ENABLE) {
+		vcpu->guest_debug = dbg->control;
+	} else {
+		/* If not enabled clear all flags */
+		vcpu->guest_debug = 0;
+	}
+	return 0;
+}
-- 
cgit v1.2.3


From 4bd611ca60afa155bca25b40312ed61c4d46237f Mon Sep 17 00:00:00 2001
From: Alex Bennée <alex.bennee@linaro.org>
Date: Tue, 7 Jul 2015 17:29:57 +0100
Subject: KVM: arm64: guest debug, add SW break point support
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This adds support for SW breakpoints inserted by userspace.

We do this by trapping all guest software debug exceptions to the
hypervisor (MDCR_EL2.TDE). The exit handler sets an exit reason of
KVM_EXIT_DEBUG with the kvm_debug_exit_arch structure holding the
exception syndrome information.

It will be up to userspace to extract the PC (via GET_ONE_REG) and
determine if the debug event was for a breakpoint it inserted. If not
userspace will need to re-inject the correct exception restart the
hypervisor to deliver the debug exception to the guest.

Any other guest software debug exception (e.g. single step or HW
assisted breakpoints) will cause an error and the VM to be killed. This
is addressed by later patches which add support for the other debug
types.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 Documentation/virtual/kvm/api.txt |  2 +-
 arch/arm64/kvm/debug.c            |  3 +++
 arch/arm64/kvm/guest.c            |  2 +-
 arch/arm64/kvm/handle_exit.c      | 36 ++++++++++++++++++++++++++++++++++++
 4 files changed, 41 insertions(+), 2 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 19adfd385882..0f498da354f2 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2693,7 +2693,7 @@ when running. Common control bits are:
 The top 16 bits of the control field are architecture specific control
 flags which can include the following:
 
-  - KVM_GUESTDBG_USE_SW_BP:     using software breakpoints [x86]
+  - KVM_GUESTDBG_USE_SW_BP:     using software breakpoints [x86, arm64]
   - KVM_GUESTDBG_USE_HW_BP:     using hardware breakpoints [x86, s390]
   - KVM_GUESTDBG_INJECT_DB:     inject DB type exception [x86]
   - KVM_GUESTDBG_INJECT_BP:     inject BP type exception [x86]
diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
index faf0e1fdba9e..8d1bfa438310 100644
--- a/arch/arm64/kvm/debug.c
+++ b/arch/arm64/kvm/debug.c
@@ -73,6 +73,9 @@ void kvm_arm_setup_debug(struct kvm_vcpu *vcpu)
 	if (trap_debug)
 		vcpu->arch.mdcr_el2 |= MDCR_EL2_TDA;
 
+	/* Trap breakpoints? */
+	if (vcpu->guest_debug & KVM_GUESTDBG_USE_SW_BP)
+		vcpu->arch.mdcr_el2 |= MDCR_EL2_TDE;
 }
 
 void kvm_arm_clear_debug(struct kvm_vcpu *vcpu)
diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index 0ba86775235d..22d22c54fd8d 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -332,7 +332,7 @@ int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu,
 	return -EINVAL;
 }
 
-#define KVM_GUESTDBG_VALID_MASK (KVM_GUESTDBG_ENABLE)
+#define KVM_GUESTDBG_VALID_MASK (KVM_GUESTDBG_ENABLE | KVM_GUESTDBG_USE_SW_BP)
 
 /**
  * kvm_arch_vcpu_ioctl_set_guest_debug - set up guest debugging
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index 524fa25671fc..27f38a9f9ea2 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -82,6 +82,40 @@ static int kvm_handle_wfx(struct kvm_vcpu *vcpu, struct kvm_run *run)
 	return 1;
 }
 
+/**
+ * kvm_handle_guest_debug - handle a debug exception instruction
+ *
+ * @vcpu:	the vcpu pointer
+ * @run:	access to the kvm_run structure for results
+ *
+ * We route all debug exceptions through the same handler. If both the
+ * guest and host are using the same debug facilities it will be up to
+ * userspace to re-inject the correct exception for guest delivery.
+ *
+ * @return: 0 (while setting run->exit_reason), -1 for error
+ */
+static int kvm_handle_guest_debug(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	u32 hsr = kvm_vcpu_get_hsr(vcpu);
+	int ret = 0;
+
+	run->exit_reason = KVM_EXIT_DEBUG;
+	run->debug.arch.hsr = hsr;
+
+	switch (hsr >> ESR_ELx_EC_SHIFT) {
+	case ESR_ELx_EC_BKPT32:
+	case ESR_ELx_EC_BRK64:
+		break;
+	default:
+		kvm_err("%s: un-handled case hsr: %#08x\n",
+			__func__, (unsigned int) hsr);
+		ret = -1;
+		break;
+	}
+
+	return ret;
+}
+
 static exit_handle_fn arm_exit_handlers[] = {
 	[ESR_ELx_EC_WFx]	= kvm_handle_wfx,
 	[ESR_ELx_EC_CP15_32]	= kvm_handle_cp15_32,
@@ -96,6 +130,8 @@ static exit_handle_fn arm_exit_handlers[] = {
 	[ESR_ELx_EC_SYS64]	= kvm_handle_sys_reg,
 	[ESR_ELx_EC_IABT_LOW]	= kvm_handle_guest_abort,
 	[ESR_ELx_EC_DABT_LOW]	= kvm_handle_guest_abort,
+	[ESR_ELx_EC_BKPT32]	= kvm_handle_guest_debug,
+	[ESR_ELx_EC_BRK64]	= kvm_handle_guest_debug,
 };
 
 static exit_handle_fn kvm_get_exit_handler(struct kvm_vcpu *vcpu)
-- 
cgit v1.2.3


From 834bf88726f0f11ddc7ff9679fc9458654c01a12 Mon Sep 17 00:00:00 2001
From: Alex Bennée <alex.bennee@linaro.org>
Date: Tue, 7 Jul 2015 17:30:02 +0100
Subject: KVM: arm64: enable KVM_CAP_SET_GUEST_DEBUG
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Finally advertise the KVM capability for SET_GUEST_DEBUG. Once arm
support is added this check can be moved to the common
kvm_vm_ioctl_check_extension() code.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 Documentation/virtual/kvm/api.txt      |  7 +++++-
 arch/arm64/include/asm/hw_breakpoint.h | 14 ++++++++++++
 arch/arm64/include/asm/kvm_host.h      |  6 ++++-
 arch/arm64/kernel/hw_breakpoint.c      | 12 ----------
 arch/arm64/kvm/debug.c                 | 40 +++++++++++++++++++++++++++++-----
 arch/arm64/kvm/guest.c                 |  7 ++++++
 arch/arm64/kvm/handle_exit.c           |  6 +++++
 arch/arm64/kvm/reset.c                 | 16 ++++++++++++++
 8 files changed, 89 insertions(+), 19 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 0f498da354f2..35affb5d9456 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2694,7 +2694,7 @@ The top 16 bits of the control field are architecture specific control
 flags which can include the following:
 
   - KVM_GUESTDBG_USE_SW_BP:     using software breakpoints [x86, arm64]
-  - KVM_GUESTDBG_USE_HW_BP:     using hardware breakpoints [x86, s390]
+  - KVM_GUESTDBG_USE_HW_BP:     using hardware breakpoints [x86, s390, arm64]
   - KVM_GUESTDBG_INJECT_DB:     inject DB type exception [x86]
   - KVM_GUESTDBG_INJECT_BP:     inject BP type exception [x86]
   - KVM_GUESTDBG_EXIT_PENDING:  trigger an immediate guest exit [s390]
@@ -2709,6 +2709,11 @@ updated to the correct (supplied) values.
 The second part of the structure is architecture specific and
 typically contains a set of debug registers.
 
+For arm64 the number of debug registers is implementation defined and
+can be determined by querying the KVM_CAP_GUEST_DEBUG_HW_BPS and
+KVM_CAP_GUEST_DEBUG_HW_WPS capabilities which return a positive number
+indicating the number of supported registers.
+
 When debug events exit the main run loop with the reason
 KVM_EXIT_DEBUG with the kvm_debug_exit_arch part of the kvm_run
 structure containing architecture specific debug information.
diff --git a/arch/arm64/include/asm/hw_breakpoint.h b/arch/arm64/include/asm/hw_breakpoint.h
index 52b484b6aa1a..4c47cb2fbb52 100644
--- a/arch/arm64/include/asm/hw_breakpoint.h
+++ b/arch/arm64/include/asm/hw_breakpoint.h
@@ -16,6 +16,8 @@
 #ifndef __ASM_HW_BREAKPOINT_H
 #define __ASM_HW_BREAKPOINT_H
 
+#include <asm/cputype.h>
+
 #ifdef __KERNEL__
 
 struct arch_hw_breakpoint_ctrl {
@@ -132,5 +134,17 @@ static inline void ptrace_hw_copy_thread(struct task_struct *task)
 
 extern struct pmu perf_ops_bp;
 
+/* Determine number of BRP registers available. */
+static inline int get_num_brps(void)
+{
+	return ((read_cpuid(ID_AA64DFR0_EL1) >> 12) & 0xf) + 1;
+}
+
+/* Determine number of WRP registers available. */
+static inline int get_num_wrps(void)
+{
+	return ((read_cpuid(ID_AA64DFR0_EL1) >> 20) & 0xf) + 1;
+}
+
 #endif	/* __KERNEL__ */
 #endif	/* __ASM_BREAKPOINT_H */
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 9b99402b14df..409217f48456 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -116,13 +116,17 @@ struct kvm_vcpu_arch {
 	 * debugging the guest from the host and to maintain separate host and
 	 * guest state during world switches. vcpu_debug_state are the debug
 	 * registers of the vcpu as the guest sees them.  host_debug_state are
-	 * the host registers which are saved and restored during world switches.
+	 * the host registers which are saved and restored during
+	 * world switches. external_debug_state contains the debug
+	 * values we want to debug the guest. This is set via the
+	 * KVM_SET_GUEST_DEBUG ioctl.
 	 *
 	 * debug_ptr points to the set of debug registers that should be loaded
 	 * onto the hardware when running the guest.
 	 */
 	struct kvm_guest_debug_arch *debug_ptr;
 	struct kvm_guest_debug_arch vcpu_debug_state;
+	struct kvm_guest_debug_arch external_debug_state;
 
 	/* Pointer to host CPU context */
 	kvm_cpu_context_t *host_cpu_context;
diff --git a/arch/arm64/kernel/hw_breakpoint.c b/arch/arm64/kernel/hw_breakpoint.c
index 7a1a5da6c8c1..77bee00bd7ea 100644
--- a/arch/arm64/kernel/hw_breakpoint.c
+++ b/arch/arm64/kernel/hw_breakpoint.c
@@ -48,18 +48,6 @@ static DEFINE_PER_CPU(int, stepping_kernel_bp);
 static int core_num_brps;
 static int core_num_wrps;
 
-/* Determine number of BRP registers available. */
-static int get_num_brps(void)
-{
-	return ((read_cpuid(ID_AA64DFR0_EL1) >> 12) & 0xf) + 1;
-}
-
-/* Determine number of WRP registers available. */
-static int get_num_wrps(void)
-{
-	return ((read_cpuid(ID_AA64DFR0_EL1) >> 20) & 0xf) + 1;
-}
-
 int hw_breakpoint_slots(int type)
 {
 	/*
diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
index e0947b77faaa..4a99e54d7f3d 100644
--- a/arch/arm64/kvm/debug.c
+++ b/arch/arm64/kvm/debug.c
@@ -105,10 +105,6 @@ void kvm_arm_setup_debug(struct kvm_vcpu *vcpu)
 				MDCR_EL2_TDRA |
 				MDCR_EL2_TDOSA);
 
-	/* Trap on access to debug registers? */
-	if (trap_debug)
-		vcpu->arch.mdcr_el2 |= MDCR_EL2_TDA;
-
 	/* Is Guest debugging in effect? */
 	if (vcpu->guest_debug) {
 		/* Route all software debug exceptions to EL2 */
@@ -143,11 +139,45 @@ void kvm_arm_setup_debug(struct kvm_vcpu *vcpu)
 		} else {
 			vcpu_sys_reg(vcpu, MDSCR_EL1) &= ~DBG_MDSCR_SS;
 		}
+
+		/*
+		 * HW Breakpoints and watchpoints
+		 *
+		 * We simply switch the debug_ptr to point to our new
+		 * external_debug_state which has been populated by the
+		 * debug ioctl. The existing KVM_ARM64_DEBUG_DIRTY
+		 * mechanism ensures the registers are updated on the
+		 * world switch.
+		 */
+		if (vcpu->guest_debug & KVM_GUESTDBG_USE_HW) {
+			/* Enable breakpoints/watchpoints */
+			vcpu_sys_reg(vcpu, MDSCR_EL1) |= DBG_MDSCR_MDE;
+
+			vcpu->arch.debug_ptr = &vcpu->arch.external_debug_state;
+			vcpu->arch.debug_flags |= KVM_ARM64_DEBUG_DIRTY;
+			trap_debug = true;
+		}
 	}
+
+	BUG_ON(!vcpu->guest_debug &&
+		vcpu->arch.debug_ptr != &vcpu->arch.vcpu_debug_state);
+
+	/* Trap debug register access */
+	if (trap_debug)
+		vcpu->arch.mdcr_el2 |= MDCR_EL2_TDA;
 }
 
 void kvm_arm_clear_debug(struct kvm_vcpu *vcpu)
 {
-	if (vcpu->guest_debug)
+	if (vcpu->guest_debug) {
 		restore_guest_debug_regs(vcpu);
+
+		/*
+		 * If we were using HW debug we need to restore the
+		 * debug_ptr to the guest debug state.
+		 */
+		if (vcpu->guest_debug & KVM_GUESTDBG_USE_HW)
+			kvm_arm_reset_debug_ptr(vcpu);
+
+	}
 }
diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index 48de4f4aaa1a..6f1b249e0587 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -334,6 +334,7 @@ int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu,
 
 #define KVM_GUESTDBG_VALID_MASK (KVM_GUESTDBG_ENABLE |    \
 			    KVM_GUESTDBG_USE_SW_BP | \
+			    KVM_GUESTDBG_USE_HW | \
 			    KVM_GUESTDBG_SINGLESTEP)
 
 /**
@@ -354,6 +355,12 @@ int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
 
 	if (dbg->control & KVM_GUESTDBG_ENABLE) {
 		vcpu->guest_debug = dbg->control;
+
+		/* Hardware assisted Break and Watch points */
+		if (vcpu->guest_debug & KVM_GUESTDBG_USE_HW) {
+			vcpu->arch.external_debug_state = dbg->arch;
+		}
+
 	} else {
 		/* If not enabled clear all flags */
 		vcpu->guest_debug = 0;
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index e9de13ed477e..68a0759b1375 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -103,7 +103,11 @@ static int kvm_handle_guest_debug(struct kvm_vcpu *vcpu, struct kvm_run *run)
 	run->debug.arch.hsr = hsr;
 
 	switch (hsr >> ESR_ELx_EC_SHIFT) {
+	case ESR_ELx_EC_WATCHPT_LOW:
+		run->debug.arch.far = vcpu->arch.fault.far_el2;
+		/* fall through */
 	case ESR_ELx_EC_SOFTSTP_LOW:
+	case ESR_ELx_EC_BREAKPT_LOW:
 	case ESR_ELx_EC_BKPT32:
 	case ESR_ELx_EC_BRK64:
 		break;
@@ -132,6 +136,8 @@ static exit_handle_fn arm_exit_handlers[] = {
 	[ESR_ELx_EC_IABT_LOW]	= kvm_handle_guest_abort,
 	[ESR_ELx_EC_DABT_LOW]	= kvm_handle_guest_abort,
 	[ESR_ELx_EC_SOFTSTP_LOW]= kvm_handle_guest_debug,
+	[ESR_ELx_EC_WATCHPT_LOW]= kvm_handle_guest_debug,
+	[ESR_ELx_EC_BREAKPT_LOW]= kvm_handle_guest_debug,
 	[ESR_ELx_EC_BKPT32]	= kvm_handle_guest_debug,
 	[ESR_ELx_EC_BRK64]	= kvm_handle_guest_debug,
 };
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index 0b4326578985..b4af6185713f 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -22,6 +22,7 @@
 #include <linux/errno.h>
 #include <linux/kvm_host.h>
 #include <linux/kvm.h>
+#include <linux/hw_breakpoint.h>
 
 #include <kvm/arm_arch_timer.h>
 
@@ -56,6 +57,12 @@ static bool cpu_has_32bit_el1(void)
 	return !!(pfr0 & 0x20);
 }
 
+/**
+ * kvm_arch_dev_ioctl_check_extension
+ *
+ * We currently assume that the number of HW registers is uniform
+ * across all CPUs (see cpuinfo_sanity_check).
+ */
 int kvm_arch_dev_ioctl_check_extension(long ext)
 {
 	int r;
@@ -64,6 +71,15 @@ int kvm_arch_dev_ioctl_check_extension(long ext)
 	case KVM_CAP_ARM_EL1_32BIT:
 		r = cpu_has_32bit_el1();
 		break;
+	case KVM_CAP_GUEST_DEBUG_HW_BPS:
+		r = get_num_brps();
+		break;
+	case KVM_CAP_GUEST_DEBUG_HW_WPS:
+		r = get_num_wrps();
+		break;
+	case KVM_CAP_SET_GUEST_DEBUG:
+		r = 1;
+		break;
 	default:
 		r = 0;
 	}
-- 
cgit v1.2.3


From 019d8817b1b064c2bacfbcf40fc68184438ad05a Mon Sep 17 00:00:00 2001
From: Alan Stern <stern@rowland.harvard.edu>
Date: Wed, 15 Jul 2015 14:40:06 +0200
Subject: PM / sleep: Allow devices without runtime PM to do direct-complete

Don't unset the direct_complete flag on devices that have runtime PM
disabled, if they are runtime suspended.

This is needed because otherwise ancestor devices wouldn't be able to
do direct_complete without adding runtime PM support to all its
descendants.

Also removes pm_runtime_suspended_if_enabled() because it's now unused.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 Documentation/power/devices.txt    | 7 +++++++
 Documentation/power/runtime_pm.txt | 4 ----
 drivers/base/power/main.c          | 2 +-
 include/linux/pm_runtime.h         | 6 ------
 4 files changed, 8 insertions(+), 11 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/power/devices.txt b/Documentation/power/devices.txt
index d172bce0fd49..8ba6625fdd63 100644
--- a/Documentation/power/devices.txt
+++ b/Documentation/power/devices.txt
@@ -341,6 +341,13 @@ the phases are:
 	and is entirely responsible for bringing the device back to the
 	functional state as appropriate.
 
+	Note that this direct-complete procedure applies even if the device is
+	disabled for runtime PM; only the runtime-PM status matters.  It follows
+	that if a device has system-sleep callbacks but does not support runtime
+	PM, then its prepare callback must never return a positive value.  This
+	is because all devices are initially set to runtime-suspended with
+	runtime PM disabled.
+
     2.	The suspend methods should quiesce the device to stop it from performing
 	I/O.  They also may save the device registers and put it into the
 	appropriate low-power state, depending on the bus type the device is on,
diff --git a/Documentation/power/runtime_pm.txt b/Documentation/power/runtime_pm.txt
index e76dc0ad4d2b..0784bc3a2ab5 100644
--- a/Documentation/power/runtime_pm.txt
+++ b/Documentation/power/runtime_pm.txt
@@ -445,10 +445,6 @@ drivers/base/power/runtime.c and include/linux/pm_runtime.h:
   bool pm_runtime_status_suspended(struct device *dev);
     - return true if the device's runtime PM status is 'suspended'
 
-  bool pm_runtime_suspended_if_enabled(struct device *dev);
-    - return true if the device's runtime PM status is 'suspended' and its
-      'power.disable_depth' field is equal to 1
-
   void pm_runtime_allow(struct device *dev);
     - set the power.runtime_auto flag for the device and decrease its usage
       counter (used by the /sys/devices/.../power/control interface to
diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
index 30b7bbfdc558..1710c26ba097 100644
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -1377,7 +1377,7 @@ static int __device_suspend(struct device *dev, pm_message_t state, bool async)
 	if (dev->power.direct_complete) {
 		if (pm_runtime_status_suspended(dev)) {
 			pm_runtime_disable(dev);
-			if (pm_runtime_suspended_if_enabled(dev))
+			if (pm_runtime_status_suspended(dev))
 				goto Complete;
 
 			pm_runtime_enable(dev);
diff --git a/include/linux/pm_runtime.h b/include/linux/pm_runtime.h
index 30e84d48bfea..3bdbb4189780 100644
--- a/include/linux/pm_runtime.h
+++ b/include/linux/pm_runtime.h
@@ -98,11 +98,6 @@ static inline bool pm_runtime_status_suspended(struct device *dev)
 	return dev->power.runtime_status == RPM_SUSPENDED;
 }
 
-static inline bool pm_runtime_suspended_if_enabled(struct device *dev)
-{
-	return pm_runtime_status_suspended(dev) && dev->power.disable_depth == 1;
-}
-
 static inline bool pm_runtime_enabled(struct device *dev)
 {
 	return !dev->power.disable_depth;
@@ -164,7 +159,6 @@ static inline void device_set_run_wake(struct device *dev, bool enable) {}
 static inline bool pm_runtime_suspended(struct device *dev) { return false; }
 static inline bool pm_runtime_active(struct device *dev) { return true; }
 static inline bool pm_runtime_status_suspended(struct device *dev) { return false; }
-static inline bool pm_runtime_suspended_if_enabled(struct device *dev) { return false; }
 static inline bool pm_runtime_enabled(struct device *dev) { return false; }
 
 static inline void pm_runtime_no_callbacks(struct device *dev) {}
-- 
cgit v1.2.3


From 4cba5c2103657d43d0886e4cff8004d95a3d0def Mon Sep 17 00:00:00 2001
From: Stas Sergeev <stsp@list.ru>
Date: Mon, 20 Jul 2015 17:49:57 -0700
Subject: of_mdio: add new DT property 'managed' to specify the PHY management
 type

Currently the PHY management type is selected by the MAC driver arbitrary.
The decision is based on the presence of the "fixed-link" node and on a
will of the driver's authors.
This caused a regression recently, when mvneta driver suddenly started
to use the in-band status for auto-negotiation on fixed links.
It appears the auto-negotiation may not work when expected by the MAC driver.
Sebastien Rannou explains:
<< Yes, I confirm that my HW does not generate an in-band status. AFAIK, it's
a PHY that aggregates 4xSGMIIs to 1xQSGMII ; the MAC side of the PHY (with
inband status) is connected to the switch through QSGMII, and in this context
we are on the media side of the PHY. >>
https://lkml.org/lkml/2015/7/10/206

This patch introduces the new string property 'managed' that allows
the user to set the management type explicitly.
The supported values are:
"auto" - default. Uses either MDIO or nothing, depending on the presence
of the fixed-link node
"in-band-status" - use in-band status

Signed-off-by: Stas Sergeev <stsp@users.sourceforge.net>

CC: Rob Herring <robh+dt@kernel.org>
CC: Pawel Moll <pawel.moll@arm.com>
CC: Mark Rutland <mark.rutland@arm.com>
CC: Ian Campbell <ijc+devicetree@hellion.org.uk>
CC: Kumar Gala <galak@codeaurora.org>
CC: Florian Fainelli <f.fainelli@gmail.com>
CC: Grant Likely <grant.likely@linaro.org>
CC: devicetree@vger.kernel.org
CC: linux-kernel@vger.kernel.org
CC: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 Documentation/devicetree/bindings/net/ethernet.txt |  4 ++++
 drivers/of/of_mdio.c                               | 19 +++++++++++++++++--
 2 files changed, 21 insertions(+), 2 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/net/ethernet.txt b/Documentation/devicetree/bindings/net/ethernet.txt
index 41b3f3f864e8..5d88f37480b6 100644
--- a/Documentation/devicetree/bindings/net/ethernet.txt
+++ b/Documentation/devicetree/bindings/net/ethernet.txt
@@ -25,7 +25,11 @@ The following properties are common to the Ethernet controllers:
   flow control thresholds.
 - tx-fifo-depth: the size of the controller's transmit fifo in bytes. This
   is used for components that can have configurable fifo sizes.
+- managed: string, specifies the PHY management type. Supported values are:
+  "auto", "in-band-status". "auto" is the default, it usess MDIO for
+  management if fixed-link is not specified.
 
 Child nodes of the Ethernet controller are typically the individual PHY devices
 connected via the MDIO bus (sometimes the MDIO bus controller is separate).
 They are described in the phy.txt file in this same directory.
+For non-MDIO PHY management see fixed-link.txt.
diff --git a/drivers/of/of_mdio.c b/drivers/of/of_mdio.c
index fdc60db60829..7c8c23cc6896 100644
--- a/drivers/of/of_mdio.c
+++ b/drivers/of/of_mdio.c
@@ -266,7 +266,8 @@ EXPORT_SYMBOL(of_phy_attach);
 bool of_phy_is_fixed_link(struct device_node *np)
 {
 	struct device_node *dn;
-	int len;
+	int len, err;
+	const char *managed;
 
 	/* New binding */
 	dn = of_get_child_by_name(np, "fixed-link");
@@ -275,6 +276,10 @@ bool of_phy_is_fixed_link(struct device_node *np)
 		return true;
 	}
 
+	err = of_property_read_string(np, "managed", &managed);
+	if (err == 0 && strcmp(managed, "auto") != 0)
+		return true;
+
 	/* Old binding */
 	if (of_get_property(np, "fixed-link", &len) &&
 	    len == (5 * sizeof(__be32)))
@@ -289,8 +294,18 @@ int of_phy_register_fixed_link(struct device_node *np)
 	struct fixed_phy_status status = {};
 	struct device_node *fixed_link_node;
 	const __be32 *fixed_link_prop;
-	int len;
+	int len, err;
 	struct phy_device *phy;
+	const char *managed;
+
+	err = of_property_read_string(np, "managed", &managed);
+	if (err == 0) {
+		if (strcmp(managed, "in-band-status") == 0) {
+			/* status is zeroed, namely its .link member */
+			phy = fixed_phy_register(PHY_POLL, &status, np);
+			return IS_ERR(phy) ? PTR_ERR(phy) : 0;
+		}
+	}
 
 	/* New binding */
 	fixed_link_node = of_get_child_by_name(np, "fixed-link");
-- 
cgit v1.2.3


From 5eb26c60590983e11f567916a83d1f0a70986553 Mon Sep 17 00:00:00 2001
From: Gabriel Fernandez <gabriel.fernandez@linaro.org>
Date: Tue, 23 Jun 2015 16:09:00 +0200
Subject: ARM: STi: DT: Rename st_pll3200c32_407_c0_x into st_pll3200c32_cx_x

Use a generic name for this kind of PLL

Signed-off-by: Gabriel Fernandez <gabriel.fernandez@linaro.org>
Signed-off-by: Maxime Coquelin <maxime.coquelin@st.com>
---
 Documentation/devicetree/bindings/clock/st/st,clkgen-pll.txt | 4 ++--
 arch/arm/boot/dts/stih407-clock.dtsi                         | 4 ++--
 arch/arm/boot/dts/stih410-clock.dtsi                         | 4 ++--
 arch/arm/boot/dts/stih418-clock.dtsi                         | 4 ++--
 4 files changed, 8 insertions(+), 8 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/clock/st/st,clkgen-pll.txt b/Documentation/devicetree/bindings/clock/st/st,clkgen-pll.txt
index efb51cf0c845..d8b168ebd5f1 100644
--- a/Documentation/devicetree/bindings/clock/st/st,clkgen-pll.txt
+++ b/Documentation/devicetree/bindings/clock/st/st,clkgen-pll.txt
@@ -21,8 +21,8 @@ Required properties:
 	"st,stih416-plls-c32-ddr",	"st,clkgen-plls-c32"
 	"st,stih407-plls-c32-a0",	"st,clkgen-plls-c32"
 	"st,stih407-plls-c32-a9",	"st,clkgen-plls-c32"
-	"st,stih407-plls-c32-c0_0",	"st,clkgen-plls-c32"
-	"st,stih407-plls-c32-c0_1",	"st,clkgen-plls-c32"
+	"sst,plls-c32-cx_0",		"st,clkgen-plls-c32"
+	"sst,plls-c32-cx_1",		"st,clkgen-plls-c32"
 
 	"st,stih415-gpu-pll-c32",	"st,clkgengpu-pll-c32"
 	"st,stih416-gpu-pll-c32",	"st,clkgengpu-pll-c32"
diff --git a/arch/arm/boot/dts/stih407-clock.dtsi b/arch/arm/boot/dts/stih407-clock.dtsi
index e65744fc12ab..ad45f5e8fac7 100644
--- a/arch/arm/boot/dts/stih407-clock.dtsi
+++ b/arch/arm/boot/dts/stih407-clock.dtsi
@@ -134,7 +134,7 @@
 
 			clk_s_c0_pll0: clk-s-c0-pll0 {
 				#clock-cells = <1>;
-				compatible = "st,stih407-plls-c32-c0_0", "st,clkgen-plls-c32";
+				compatible = "st,plls-c32-cx_0", "st,clkgen-plls-c32";
 
 				clocks = <&clk_sysin>;
 
@@ -143,7 +143,7 @@
 
 			clk_s_c0_pll1: clk-s-c0-pll1 {
 				#clock-cells = <1>;
-				compatible = "st,stih407-plls-c32-c0_1", "st,clkgen-plls-c32";
+				compatible = "st,plls-c32-cx_1", "st,clkgen-plls-c32";
 
 				clocks = <&clk_sysin>;
 
diff --git a/arch/arm/boot/dts/stih410-clock.dtsi b/arch/arm/boot/dts/stih410-clock.dtsi
index 6b5803a30096..d1f2acafc9b6 100644
--- a/arch/arm/boot/dts/stih410-clock.dtsi
+++ b/arch/arm/boot/dts/stih410-clock.dtsi
@@ -137,7 +137,7 @@
 
 			clk_s_c0_pll0: clk-s-c0-pll0 {
 				#clock-cells = <1>;
-				compatible = "st,stih407-plls-c32-c0_0", "st,clkgen-plls-c32";
+				compatible = "st,plls-c32-cx_0", "st,clkgen-plls-c32";
 
 				clocks = <&clk_sysin>;
 
@@ -146,7 +146,7 @@
 
 			clk_s_c0_pll1: clk-s-c0-pll1 {
 				#clock-cells = <1>;
-				compatible = "st,stih407-plls-c32-c0_1", "st,clkgen-plls-c32";
+				compatible = "st,plls-c32-cx_1", "st,clkgen-plls-c32";
 
 				clocks = <&clk_sysin>;
 
diff --git a/arch/arm/boot/dts/stih418-clock.dtsi b/arch/arm/boot/dts/stih418-clock.dtsi
index 0ab23daa2829..148e1772465f 100644
--- a/arch/arm/boot/dts/stih418-clock.dtsi
+++ b/arch/arm/boot/dts/stih418-clock.dtsi
@@ -137,7 +137,7 @@
 
 			clk_s_c0_pll0: clk-s-c0-pll0 {
 				#clock-cells = <1>;
-				compatible = "st,stih407-plls-c32-c0_0", "st,clkgen-plls-c32";
+				compatible = "st,plls-c32-cx_0", "st,clkgen-plls-c32";
 
 				clocks = <&clk_sysin>;
 
@@ -146,7 +146,7 @@
 
 			clk_s_c0_pll1: clk-s-c0-pll1 {
 				#clock-cells = <1>;
-				compatible = "st,stih407-plls-c32-c0_1", "st,clkgen-plls-c32";
+				compatible = "st,plls-c32-cx_1", "st,clkgen-plls-c32";
 
 				clocks = <&clk_sysin>;
 
-- 
cgit v1.2.3


From 25614824685247e00b786032a504f10bfab347b1 Mon Sep 17 00:00:00 2001
From: Philipp Zabel <p.zabel@pengutronix.de>
Date: Fri, 17 Jul 2015 11:02:55 -0300
Subject: [media] tc358743: support probe from device tree

Add support for probing the TC358743 subdevice from device tree.
The reference clock must be supplied using the common clock bindings.
MIPI CSI-2 specific properties are parsed from the OF graph endpoint
node and support for a non-continuous MIPI CSI-2 clock is added.

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
---
 .../devicetree/bindings/media/i2c/tc358743.txt     |  48 +++++++
 drivers/media/i2c/tc358743.c                       | 155 ++++++++++++++++++++-
 2 files changed, 197 insertions(+), 6 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/media/i2c/tc358743.txt

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/media/i2c/tc358743.txt b/Documentation/devicetree/bindings/media/i2c/tc358743.txt
new file mode 100644
index 000000000000..5218921629ed
--- /dev/null
+++ b/Documentation/devicetree/bindings/media/i2c/tc358743.txt
@@ -0,0 +1,48 @@
+* Toshiba TC358743 HDMI-RX to MIPI CSI2-TX Bridge
+
+The Toshiba TC358743 HDMI-RX to MIPI CSI2-TX (H2C) is a bridge that converts
+a HDMI stream to MIPI CSI-2 TX. It is programmable through I2C.
+
+Required Properties:
+
+- compatible: value should be "toshiba,tc358743"
+- clocks, clock-names: should contain a phandle link to the reference clock
+		       source, the clock input is named "refclk".
+
+Optional Properties:
+
+- reset-gpios: gpio phandle GPIO connected to the reset pin
+- interrupts, interrupt-parent: GPIO connected to the interrupt pin
+- data-lanes: should be <1 2 3 4> for four-lane operation,
+	      or <1 2> for two-lane operation
+- clock-lanes: should be <0>
+- clock-noncontinuous: Presence of this boolean property decides whether the
+		       MIPI CSI-2 clock is continuous or non-continuous.
+- link-frequencies: List of allowed link frequencies in Hz. Each frequency is
+		    expressed as a 64-bit big-endian integer. The frequency
+		    is half of the bps per lane due to DDR transmission.
+
+For further information on the MIPI CSI-2 endpoint node properties, see
+Documentation/devicetree/bindings/media/video-interfaces.txt.
+
+Example:
+
+	tc358743@0f {
+		compatible = "toshiba,tc358743";
+		reg = <0x0f>;
+		clocks = <&hdmi_osc>;
+		clock-names = "refclk";
+		reset-gpios = <&gpio6 9 GPIO_ACTIVE_LOW>;
+		interrupt-parent = <&gpio2>;
+		interrupts = <5 IRQ_TYPE_LEVEL_HIGH>;
+
+		port {
+			tc358743_out: endpoint {
+				remote-endpoint = <&mipi_csi2_in>;
+				data-lanes = <1 2 3 4>;
+				clock-lanes = <0>;
+				clock-noncontinuous;
+				link-frequencies = /bits/ 64 <297000000>;
+			};
+		};
+	};
diff --git a/drivers/media/i2c/tc358743.c b/drivers/media/i2c/tc358743.c
index 0ccae3308b68..76d0aaa19493 100644
--- a/drivers/media/i2c/tc358743.c
+++ b/drivers/media/i2c/tc358743.c
@@ -29,7 +29,9 @@
 #include <linux/module.h>
 #include <linux/slab.h>
 #include <linux/i2c.h>
+#include <linux/clk.h>
 #include <linux/delay.h>
+#include <linux/gpio/consumer.h>
 #include <linux/videodev2.h>
 #include <linux/workqueue.h>
 #include <linux/v4l2-dv-timings.h>
@@ -37,6 +39,7 @@
 #include <media/v4l2-dv-timings.h>
 #include <media/v4l2-device.h>
 #include <media/v4l2-ctrls.h>
+#include <media/v4l2-of.h>
 #include <media/tc358743.h>
 
 #include "tc358743_regs.h"
@@ -69,6 +72,7 @@ static const struct v4l2_dv_timings_cap tc358743_timings_cap = {
 
 struct tc358743_state {
 	struct tc358743_platform_data pdata;
+	struct v4l2_of_bus_mipi_csi2 bus;
 	struct v4l2_subdev sd;
 	struct media_pad pad;
 	struct v4l2_ctrl_handler hdl;
@@ -90,6 +94,8 @@ struct tc358743_state {
 
 	struct v4l2_dv_timings timings;
 	u32 mbus_fmt_code;
+
+	struct gpio_desc *reset_gpio;
 };
 
 static void tc358743_enable_interrupts(struct v4l2_subdev *sd,
@@ -700,7 +706,8 @@ static void tc358743_set_csi(struct v4l2_subdev *sd)
 			((lanes > 2) ? MASK_D2M_HSTXVREGEN : 0x0) |
 			((lanes > 3) ? MASK_D3M_HSTXVREGEN : 0x0));
 
-	i2c_wr32(sd, TXOPTIONCNTRL, MASK_CONTCLKMODE);
+	i2c_wr32(sd, TXOPTIONCNTRL, (state->bus.flags &
+		 V4L2_MBUS_CSI2_CONTINUOUS_CLOCK) ? MASK_CONTCLKMODE : 0);
 	i2c_wr32(sd, STARTCNTRL, MASK_START);
 	i2c_wr32(sd, CSI_START, MASK_STRT);
 
@@ -1638,6 +1645,136 @@ static const struct v4l2_ctrl_config tc358743_ctrl_audio_present = {
 
 /* --------------- PROBE / REMOVE --------------- */
 
+#ifdef CONFIG_OF
+static void tc358743_gpio_reset(struct tc358743_state *state)
+{
+	gpiod_set_value(state->reset_gpio, 0);
+	usleep_range(5000, 10000);
+	gpiod_set_value(state->reset_gpio, 1);
+	usleep_range(1000, 2000);
+	gpiod_set_value(state->reset_gpio, 0);
+	msleep(20);
+}
+
+static int tc358743_probe_of(struct tc358743_state *state)
+{
+	struct device *dev = &state->i2c_client->dev;
+	struct v4l2_of_endpoint *endpoint;
+	struct device_node *ep;
+	struct clk *refclk;
+	u32 bps_pr_lane;
+	int ret = -EINVAL;
+
+	refclk = devm_clk_get(dev, "refclk");
+	if (IS_ERR(refclk)) {
+		if (PTR_ERR(refclk) != -EPROBE_DEFER)
+			dev_err(dev, "failed to get refclk: %ld\n",
+				PTR_ERR(refclk));
+		return PTR_ERR(refclk);
+	}
+
+	ep = of_graph_get_next_endpoint(dev->of_node, NULL);
+	if (!ep) {
+		dev_err(dev, "missing endpoint node\n");
+		return -EINVAL;
+	}
+
+	endpoint = v4l2_of_alloc_parse_endpoint(ep);
+	if (IS_ERR(endpoint)) {
+		dev_err(dev, "failed to parse endpoint\n");
+		return PTR_ERR(endpoint);
+	}
+
+	if (endpoint->bus_type != V4L2_MBUS_CSI2 ||
+	    endpoint->bus.mipi_csi2.num_data_lanes == 0 ||
+	    endpoint->nr_of_link_frequencies == 0) {
+		dev_err(dev, "missing CSI-2 properties in endpoint\n");
+		goto free_endpoint;
+	}
+
+	state->bus = endpoint->bus.mipi_csi2;
+
+	clk_prepare_enable(refclk);
+
+	state->pdata.refclk_hz = clk_get_rate(refclk);
+	state->pdata.ddc5v_delay = DDC5V_DELAY_100_MS;
+	state->pdata.enable_hdcp = false;
+	/* A FIFO level of 16 should be enough for 2-lane 720p60 at 594 MHz. */
+	state->pdata.fifo_level = 16;
+	/*
+	 * The PLL input clock is obtained by dividing refclk by pll_prd.
+	 * It must be between 6 MHz and 40 MHz, lower frequency is better.
+	 */
+	switch (state->pdata.refclk_hz) {
+	case 26000000:
+	case 27000000:
+	case 42000000:
+		state->pdata.pll_prd = state->pdata.refclk_hz / 6000000;
+		break;
+	default:
+		dev_err(dev, "unsupported refclk rate: %u Hz\n",
+			state->pdata.refclk_hz);
+		goto disable_clk;
+	}
+
+	/*
+	 * The CSI bps per lane must be between 62.5 Mbps and 1 Gbps.
+	 * The default is 594 Mbps for 4-lane 1080p60 or 2-lane 720p60.
+	 */
+	bps_pr_lane = 2 * endpoint->link_frequencies[0];
+	if (bps_pr_lane < 62500000U || bps_pr_lane > 1000000000U) {
+		dev_err(dev, "unsupported bps per lane: %u bps\n", bps_pr_lane);
+		goto disable_clk;
+	}
+
+	/* The CSI speed per lane is refclk / pll_prd * pll_fbd */
+	state->pdata.pll_fbd = bps_pr_lane /
+			       state->pdata.refclk_hz * state->pdata.pll_prd;
+
+	/*
+	 * FIXME: These timings are from REF_02 for 594 Mbps per lane (297 MHz
+	 * link frequency). In principle it should be possible to calculate
+	 * them based on link frequency and resolution.
+	 */
+	if (bps_pr_lane != 594000000U)
+		dev_warn(dev, "untested bps per lane: %u bps\n", bps_pr_lane);
+	state->pdata.lineinitcnt = 0xe80;
+	state->pdata.lptxtimecnt = 0x003;
+	/* tclk-preparecnt: 3, tclk-zerocnt: 20 */
+	state->pdata.tclk_headercnt = 0x1403;
+	state->pdata.tclk_trailcnt = 0x00;
+	/* ths-preparecnt: 3, ths-zerocnt: 1 */
+	state->pdata.ths_headercnt = 0x0103;
+	state->pdata.twakeup = 0x4882;
+	state->pdata.tclk_postcnt = 0x008;
+	state->pdata.ths_trailcnt = 0x2;
+	state->pdata.hstxvregcnt = 0;
+
+	state->reset_gpio = devm_gpiod_get(dev, "reset");
+	if (IS_ERR(state->reset_gpio)) {
+		dev_err(dev, "failed to get reset gpio\n");
+		ret = PTR_ERR(state->reset_gpio);
+		goto disable_clk;
+	}
+
+	tc358743_gpio_reset(state);
+
+	ret = 0;
+	goto free_endpoint;
+
+disable_clk:
+	clk_disable_unprepare(refclk);
+free_endpoint:
+	v4l2_of_free_endpoint(endpoint);
+	return ret;
+}
+#else
+static inline int tc358743_probe_of(struct tc358743_state *state)
+{
+	return -ENODEV;
+}
+#endif
+
 static int tc358743_probe(struct i2c_client *client,
 			  const struct i2c_device_id *id)
 {
@@ -1658,14 +1795,20 @@ static int tc358743_probe(struct i2c_client *client,
 	if (!state)
 		return -ENOMEM;
 
+	state->i2c_client = client;
+
 	/* platform data */
-	if (!pdata) {
-		v4l_err(client, "No platform data!\n");
-		return -ENODEV;
+	if (pdata) {
+		state->pdata = *pdata;
+		state->bus.flags = V4L2_MBUS_CSI2_CONTINUOUS_CLOCK;
+	} else {
+		err = tc358743_probe_of(state);
+		if (err == -ENODEV)
+			v4l_err(client, "No platform data!\n");
+		if (err)
+			return err;
 	}
-	state->pdata = *pdata;
 
-	state->i2c_client = client;
 	sd = &state->sd;
 	v4l2_i2c_subdev_init(sd, client, &tc358743_ops);
 	sd->flags |= V4L2_SUBDEV_FL_HAS_DEVNODE | V4L2_SUBDEV_FL_HAS_EVENTS;
-- 
cgit v1.2.3


From 3985e8a3611a93bb36789f65db862e5700aab65e Mon Sep 17 00:00:00 2001
From: Erik Kline <ek@google.com>
Date: Wed, 22 Jul 2015 16:38:25 +0900
Subject: ipv6: sysctl to restrict candidate source addresses

Per RFC 6724, section 4, "Candidate Source Addresses":

    It is RECOMMENDED that the candidate source addresses be the set
    of unicast addresses assigned to the interface that will be used
    to send to the destination (the "outgoing" interface).

Add a sysctl to enable this behaviour.

Signed-off-by: Erik Kline <ek@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 Documentation/networking/ip-sysctl.txt |  7 +++++++
 include/linux/ipv6.h                   |  1 +
 include/uapi/linux/ipv6.h              |  1 +
 net/ipv6/addrconf.c                    | 22 +++++++++++++++++++---
 4 files changed, 28 insertions(+), 3 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index f63aeefd2c24..1a5ab21bcca5 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -1460,6 +1460,13 @@ router_solicitations - INTEGER
 	routers are present.
 	Default: 3
 
+use_oif_addrs_only - BOOLEAN
+	When enabled, the candidate source addresses for destinations
+	routed via this interface are restricted to the set of addresses
+	configured on this interface (vis. RFC 6724, section 4).
+
+	Default: false
+
 use_tempaddr - INTEGER
 	Preference for Privacy Extensions (RFC3041).
 	  <= 0 : disable Privacy Extensions
diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index 1319a6bb6b82..06ed637225b8 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -57,6 +57,7 @@ struct ipv6_devconf {
 		bool initialized;
 		struct in6_addr secret;
 	} stable_secret;
+	__s32		use_oif_addrs_only;
 	void		*sysctl;
 };
 
diff --git a/include/uapi/linux/ipv6.h b/include/uapi/linux/ipv6.h
index 5efa54ae567c..641a146ead7d 100644
--- a/include/uapi/linux/ipv6.h
+++ b/include/uapi/linux/ipv6.h
@@ -171,6 +171,7 @@ enum {
 	DEVCONF_USE_OPTIMISTIC,
 	DEVCONF_ACCEPT_RA_MTU,
 	DEVCONF_STABLE_SECRET,
+	DEVCONF_USE_OIF_ADDRS_ONLY,
 	DEVCONF_MAX
 };
 
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 32153c248959..eb0c6a3a8a00 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -211,7 +211,8 @@ static struct ipv6_devconf ipv6_devconf __read_mostly = {
 	.accept_ra_mtu		= 1,
 	.stable_secret		= {
 		.initialized = false,
-	}
+	},
+	.use_oif_addrs_only	= 0,
 };
 
 static struct ipv6_devconf ipv6_devconf_dflt __read_mostly = {
@@ -253,6 +254,7 @@ static struct ipv6_devconf ipv6_devconf_dflt __read_mostly = {
 	.stable_secret		= {
 		.initialized = false,
 	},
+	.use_oif_addrs_only	= 0,
 };
 
 /* Check if a valid qdisc is available */
@@ -1472,11 +1474,16 @@ int ipv6_dev_get_saddr(struct net *net, const struct net_device *dst_dev,
 	 *    include addresses assigned to interfaces
 	 *    belonging to the same site as the outgoing
 	 *    interface.)
+	 *  - "It is RECOMMENDED that the candidate source addresses
+	 *    be the set of unicast addresses assigned to the
+	 *    interface that will be used to send to the destination
+	 *    (the 'outgoing' interface)." (RFC 6724)
 	 */
 	if (dst_dev) {
+		idev = __in6_dev_get(dst_dev);
 		if ((dst_type & IPV6_ADDR_MULTICAST) ||
-		    dst.scope <= IPV6_ADDR_SCOPE_LINKLOCAL) {
-			idev = __in6_dev_get(dst_dev);
+		    dst.scope <= IPV6_ADDR_SCOPE_LINKLOCAL ||
+		    (idev && idev->cnf.use_oif_addrs_only)) {
 			use_oif_addr = true;
 		}
 	}
@@ -4607,6 +4614,7 @@ static inline void ipv6_store_devconf(struct ipv6_devconf *cnf,
 	array[DEVCONF_ACCEPT_RA_FROM_LOCAL] = cnf->accept_ra_from_local;
 	array[DEVCONF_ACCEPT_RA_MTU] = cnf->accept_ra_mtu;
 	/* we omit DEVCONF_STABLE_SECRET for now */
+	array[DEVCONF_USE_OIF_ADDRS_ONLY] = cnf->use_oif_addrs_only;
 }
 
 static inline size_t inet6_ifla6_size(void)
@@ -5605,6 +5613,14 @@ static struct addrconf_sysctl_table
 			.mode		= 0600,
 			.proc_handler	= addrconf_sysctl_stable_secret,
 		},
+		{
+			.procname       = "use_oif_addrs_only",
+			.data           = &ipv6_devconf.use_oif_addrs_only,
+			.maxlen         = sizeof(int),
+			.mode           = 0644,
+			.proc_handler   = proc_dointvec,
+
+		},
 		{
 			/* sentinel */
 		}
-- 
cgit v1.2.3


From f78f5b90c4ffa559e400c3919a02236101f29f3f Mon Sep 17 00:00:00 2001
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Date: Thu, 18 Jun 2015 15:50:02 -0700
Subject: rcu: Rename rcu_lockdep_assert() to RCU_LOCKDEP_WARN()

This commit renames rcu_lockdep_assert() to RCU_LOCKDEP_WARN() for
consistency with the WARN() series of macros.  This also requires
inverting the sense of the conditional, which this commit also does.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
---
 Documentation/RCU/whatisRCU.txt  |  2 +-
 arch/x86/kernel/cpu/mcheck/mce.c |  6 ++--
 arch/x86/kernel/traps.c          |  2 +-
 drivers/base/power/opp.c         |  4 +--
 include/linux/fdtable.h          |  4 +--
 include/linux/rcupdate.h         | 63 ++++++++++++++++++++++++++--------------
 kernel/cgroup.c                  |  4 +--
 kernel/pid.c                     |  5 ++--
 kernel/rcu/srcu.c                | 10 +++----
 kernel/rcu/tiny.c                |  8 ++---
 kernel/rcu/tree.c                | 28 +++++++++---------
 kernel/rcu/tree_plugin.h         |  8 ++---
 kernel/rcu/update.c              |  4 +--
 kernel/sched/core.c              |  8 ++---
 kernel/workqueue.c               | 20 ++++++-------
 security/device_cgroup.c         |  6 ++--
 16 files changed, 101 insertions(+), 81 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/RCU/whatisRCU.txt b/Documentation/RCU/whatisRCU.txt
index 5746b0c77f3e..adc2184009c5 100644
--- a/Documentation/RCU/whatisRCU.txt
+++ b/Documentation/RCU/whatisRCU.txt
@@ -883,7 +883,7 @@ All:  lockdep-checked RCU-protected pointer access
 
 	rcu_access_pointer
 	rcu_dereference_raw
-	rcu_lockdep_assert
+	RCU_LOCKDEP_WARN
 	rcu_sleep_check
 	RCU_NONIDLE
 
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index df919ff103c3..3d6b5269fb2e 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -54,9 +54,9 @@ static DEFINE_MUTEX(mce_chrdev_read_mutex);
 
 #define rcu_dereference_check_mce(p) \
 ({ \
-	rcu_lockdep_assert(rcu_read_lock_sched_held() || \
-			   lockdep_is_held(&mce_chrdev_read_mutex), \
-			   "suspicious rcu_dereference_check_mce() usage"); \
+	RCU_LOCKDEP_WARN(!rcu_read_lock_sched_held() && \
+			 !lockdep_is_held(&mce_chrdev_read_mutex), \
+			 "suspicious rcu_dereference_check_mce() usage"); \
 	smp_load_acquire(&(p)); \
 })
 
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index f5791927aa64..c5a5231d1d11 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -136,7 +136,7 @@ enum ctx_state ist_enter(struct pt_regs *regs)
 	preempt_count_add(HARDIRQ_OFFSET);
 
 	/* This code is a bit fragile.  Test it. */
-	rcu_lockdep_assert(rcu_is_watching(), "ist_enter didn't work");
+	RCU_LOCKDEP_WARN(!rcu_is_watching(), "ist_enter didn't work");
 
 	return prev_state;
 }
diff --git a/drivers/base/power/opp.c b/drivers/base/power/opp.c
index 677fb2843553..3b188f20b43f 100644
--- a/drivers/base/power/opp.c
+++ b/drivers/base/power/opp.c
@@ -110,8 +110,8 @@ static DEFINE_MUTEX(dev_opp_list_lock);
 
 #define opp_rcu_lockdep_assert()					\
 do {									\
-	rcu_lockdep_assert(rcu_read_lock_held() ||			\
-				lockdep_is_held(&dev_opp_list_lock),	\
+	RCU_LOCKDEP_WARN(!rcu_read_lock_held() &&			\
+				!lockdep_is_held(&dev_opp_list_lock),	\
 			   "Missing rcu_read_lock() or "		\
 			   "dev_opp_list_lock protection");		\
 } while (0)
diff --git a/include/linux/fdtable.h b/include/linux/fdtable.h
index fbb88740634a..674e3e226465 100644
--- a/include/linux/fdtable.h
+++ b/include/linux/fdtable.h
@@ -86,8 +86,8 @@ static inline struct file *__fcheck_files(struct files_struct *files, unsigned i
 
 static inline struct file *fcheck_files(struct files_struct *files, unsigned int fd)
 {
-	rcu_lockdep_assert(rcu_read_lock_held() ||
-			   lockdep_is_held(&files->file_lock),
+	RCU_LOCKDEP_WARN(!rcu_read_lock_held() &&
+			   !lockdep_is_held(&files->file_lock),
 			   "suspicious rcu_dereference_check() usage");
 	return __fcheck_files(files, fd);
 }
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 33ec16b9c2ee..ff476515f716 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -536,6 +536,11 @@ static inline int rcu_read_lock_sched_held(void)
 
 #endif /* #else #ifdef CONFIG_DEBUG_LOCK_ALLOC */
 
+/* Deprecate rcu_lockdep_assert():  Use RCU_LOCKDEP_WARN() instead. */
+static inline void __attribute((deprecated)) deprecate_rcu_lockdep_assert(void)
+{
+}
+
 #ifdef CONFIG_PROVE_RCU
 
 /**
@@ -546,17 +551,32 @@ static inline int rcu_read_lock_sched_held(void)
 #define rcu_lockdep_assert(c, s)					\
 	do {								\
 		static bool __section(.data.unlikely) __warned;		\
+		deprecate_rcu_lockdep_assert();				\
 		if (debug_lockdep_rcu_enabled() && !__warned && !(c)) {	\
 			__warned = true;				\
 			lockdep_rcu_suspicious(__FILE__, __LINE__, s);	\
 		}							\
 	} while (0)
 
+/**
+ * RCU_LOCKDEP_WARN - emit lockdep splat if specified condition is met
+ * @c: condition to check
+ * @s: informative message
+ */
+#define RCU_LOCKDEP_WARN(c, s)						\
+	do {								\
+		static bool __section(.data.unlikely) __warned;		\
+		if (debug_lockdep_rcu_enabled() && !__warned && (c)) {	\
+			__warned = true;				\
+			lockdep_rcu_suspicious(__FILE__, __LINE__, s);	\
+		}							\
+	} while (0)
+
 #if defined(CONFIG_PROVE_RCU) && !defined(CONFIG_PREEMPT_RCU)
 static inline void rcu_preempt_sleep_check(void)
 {
-	rcu_lockdep_assert(!lock_is_held(&rcu_lock_map),
-			   "Illegal context switch in RCU read-side critical section");
+	RCU_LOCKDEP_WARN(lock_is_held(&rcu_lock_map),
+			 "Illegal context switch in RCU read-side critical section");
 }
 #else /* #ifdef CONFIG_PROVE_RCU */
 static inline void rcu_preempt_sleep_check(void)
@@ -567,15 +587,16 @@ static inline void rcu_preempt_sleep_check(void)
 #define rcu_sleep_check()						\
 	do {								\
 		rcu_preempt_sleep_check();				\
-		rcu_lockdep_assert(!lock_is_held(&rcu_bh_lock_map),	\
-				   "Illegal context switch in RCU-bh read-side critical section"); \
-		rcu_lockdep_assert(!lock_is_held(&rcu_sched_lock_map),	\
-				   "Illegal context switch in RCU-sched read-side critical section"); \
+		RCU_LOCKDEP_WARN(lock_is_held(&rcu_bh_lock_map),	\
+				 "Illegal context switch in RCU-bh read-side critical section"); \
+		RCU_LOCKDEP_WARN(lock_is_held(&rcu_sched_lock_map),	\
+				 "Illegal context switch in RCU-sched read-side critical section"); \
 	} while (0)
 
 #else /* #ifdef CONFIG_PROVE_RCU */
 
-#define rcu_lockdep_assert(c, s) do { } while (0)
+#define rcu_lockdep_assert(c, s) deprecate_rcu_lockdep_assert()
+#define RCU_LOCKDEP_WARN(c, s) do { } while (0)
 #define rcu_sleep_check() do { } while (0)
 
 #endif /* #else #ifdef CONFIG_PROVE_RCU */
@@ -606,13 +627,13 @@ static inline void rcu_preempt_sleep_check(void)
 ({ \
 	/* Dependency order vs. p above. */ \
 	typeof(*p) *________p1 = (typeof(*p) *__force)lockless_dereference(p); \
-	rcu_lockdep_assert(c, "suspicious rcu_dereference_check() usage"); \
+	RCU_LOCKDEP_WARN(!(c), "suspicious rcu_dereference_check() usage"); \
 	rcu_dereference_sparse(p, space); \
 	((typeof(*p) __force __kernel *)(________p1)); \
 })
 #define __rcu_dereference_protected(p, c, space) \
 ({ \
-	rcu_lockdep_assert(c, "suspicious rcu_dereference_protected() usage"); \
+	RCU_LOCKDEP_WARN(!(c), "suspicious rcu_dereference_protected() usage"); \
 	rcu_dereference_sparse(p, space); \
 	((typeof(*p) __force __kernel *)(p)); \
 })
@@ -836,8 +857,8 @@ static inline void rcu_read_lock(void)
 	__rcu_read_lock();
 	__acquire(RCU);
 	rcu_lock_acquire(&rcu_lock_map);
-	rcu_lockdep_assert(rcu_is_watching(),
-			   "rcu_read_lock() used illegally while idle");
+	RCU_LOCKDEP_WARN(!rcu_is_watching(),
+			 "rcu_read_lock() used illegally while idle");
 }
 
 /*
@@ -887,8 +908,8 @@ static inline void rcu_read_lock(void)
  */
 static inline void rcu_read_unlock(void)
 {
-	rcu_lockdep_assert(rcu_is_watching(),
-			   "rcu_read_unlock() used illegally while idle");
+	RCU_LOCKDEP_WARN(!rcu_is_watching(),
+			 "rcu_read_unlock() used illegally while idle");
 	__release(RCU);
 	__rcu_read_unlock();
 	rcu_lock_release(&rcu_lock_map); /* Keep acq info for rls diags. */
@@ -916,8 +937,8 @@ static inline void rcu_read_lock_bh(void)
 	local_bh_disable();
 	__acquire(RCU_BH);
 	rcu_lock_acquire(&rcu_bh_lock_map);
-	rcu_lockdep_assert(rcu_is_watching(),
-			   "rcu_read_lock_bh() used illegally while idle");
+	RCU_LOCKDEP_WARN(!rcu_is_watching(),
+			 "rcu_read_lock_bh() used illegally while idle");
 }
 
 /*
@@ -927,8 +948,8 @@ static inline void rcu_read_lock_bh(void)
  */
 static inline void rcu_read_unlock_bh(void)
 {
-	rcu_lockdep_assert(rcu_is_watching(),
-			   "rcu_read_unlock_bh() used illegally while idle");
+	RCU_LOCKDEP_WARN(!rcu_is_watching(),
+			 "rcu_read_unlock_bh() used illegally while idle");
 	rcu_lock_release(&rcu_bh_lock_map);
 	__release(RCU_BH);
 	local_bh_enable();
@@ -952,8 +973,8 @@ static inline void rcu_read_lock_sched(void)
 	preempt_disable();
 	__acquire(RCU_SCHED);
 	rcu_lock_acquire(&rcu_sched_lock_map);
-	rcu_lockdep_assert(rcu_is_watching(),
-			   "rcu_read_lock_sched() used illegally while idle");
+	RCU_LOCKDEP_WARN(!rcu_is_watching(),
+			 "rcu_read_lock_sched() used illegally while idle");
 }
 
 /* Used by lockdep and tracing: cannot be traced, cannot call lockdep. */
@@ -970,8 +991,8 @@ static inline notrace void rcu_read_lock_sched_notrace(void)
  */
 static inline void rcu_read_unlock_sched(void)
 {
-	rcu_lockdep_assert(rcu_is_watching(),
-			   "rcu_read_unlock_sched() used illegally while idle");
+	RCU_LOCKDEP_WARN(!rcu_is_watching(),
+			 "rcu_read_unlock_sched() used illegally while idle");
 	rcu_lock_release(&rcu_sched_lock_map);
 	__release(RCU_SCHED);
 	preempt_enable();
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index f89d9292eee6..b89f3168411b 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -107,8 +107,8 @@ static DEFINE_SPINLOCK(release_agent_path_lock);
 struct percpu_rw_semaphore cgroup_threadgroup_rwsem;
 
 #define cgroup_assert_mutex_or_rcu_locked()				\
-	rcu_lockdep_assert(rcu_read_lock_held() ||			\
-			   lockdep_is_held(&cgroup_mutex),		\
+	RCU_LOCKDEP_WARN(!rcu_read_lock_held() &&			\
+			   !lockdep_is_held(&cgroup_mutex),		\
 			   "cgroup_mutex or RCU read lock required");
 
 /*
diff --git a/kernel/pid.c b/kernel/pid.c
index 4fd07d5b7baf..ca368793808e 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -451,9 +451,8 @@ EXPORT_SYMBOL(pid_task);
  */
 struct task_struct *find_task_by_pid_ns(pid_t nr, struct pid_namespace *ns)
 {
-	rcu_lockdep_assert(rcu_read_lock_held(),
-			   "find_task_by_pid_ns() needs rcu_read_lock()"
-			   " protection");
+	RCU_LOCKDEP_WARN(!rcu_read_lock_held(),
+			 "find_task_by_pid_ns() needs rcu_read_lock() protection");
 	return pid_task(find_pid_ns(nr, ns), PIDTYPE_PID);
 }
 
diff --git a/kernel/rcu/srcu.c b/kernel/rcu/srcu.c
index de35087c92a5..d3fcb2ec8536 100644
--- a/kernel/rcu/srcu.c
+++ b/kernel/rcu/srcu.c
@@ -415,11 +415,11 @@ static void __synchronize_srcu(struct srcu_struct *sp, int trycount)
 	struct rcu_head *head = &rcu.head;
 	bool done = false;
 
-	rcu_lockdep_assert(!lock_is_held(&sp->dep_map) &&
-			   !lock_is_held(&rcu_bh_lock_map) &&
-			   !lock_is_held(&rcu_lock_map) &&
-			   !lock_is_held(&rcu_sched_lock_map),
-			   "Illegal synchronize_srcu() in same-type SRCU (or RCU) read-side critical section");
+	RCU_LOCKDEP_WARN(lock_is_held(&sp->dep_map) ||
+			 lock_is_held(&rcu_bh_lock_map) ||
+			 lock_is_held(&rcu_lock_map) ||
+			 lock_is_held(&rcu_sched_lock_map),
+			 "Illegal synchronize_srcu() in same-type SRCU (or in RCU) read-side critical section");
 
 	might_sleep();
 	init_completion(&rcu.completion);
diff --git a/kernel/rcu/tiny.c b/kernel/rcu/tiny.c
index c291bd65d2cb..d0471056d0af 100644
--- a/kernel/rcu/tiny.c
+++ b/kernel/rcu/tiny.c
@@ -191,10 +191,10 @@ static void rcu_process_callbacks(struct softirq_action *unused)
  */
 void synchronize_sched(void)
 {
-	rcu_lockdep_assert(!lock_is_held(&rcu_bh_lock_map) &&
-			   !lock_is_held(&rcu_lock_map) &&
-			   !lock_is_held(&rcu_sched_lock_map),
-			   "Illegal synchronize_sched() in RCU read-side critical section");
+	RCU_LOCKDEP_WARN(lock_is_held(&rcu_bh_lock_map) ||
+			 lock_is_held(&rcu_lock_map) ||
+			 lock_is_held(&rcu_sched_lock_map),
+			 "Illegal synchronize_sched() in RCU read-side critical section");
 	cond_resched();
 }
 EXPORT_SYMBOL_GPL(synchronize_sched);
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index cb64d7e13d24..0a73d26357a2 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -649,12 +649,12 @@ static void rcu_eqs_enter_common(long long oldval, bool user)
 	 * It is illegal to enter an extended quiescent state while
 	 * in an RCU read-side critical section.
 	 */
-	rcu_lockdep_assert(!lock_is_held(&rcu_lock_map),
-			   "Illegal idle entry in RCU read-side critical section.");
-	rcu_lockdep_assert(!lock_is_held(&rcu_bh_lock_map),
-			   "Illegal idle entry in RCU-bh read-side critical section.");
-	rcu_lockdep_assert(!lock_is_held(&rcu_sched_lock_map),
-			   "Illegal idle entry in RCU-sched read-side critical section.");
+	RCU_LOCKDEP_WARN(lock_is_held(&rcu_lock_map),
+			 "Illegal idle entry in RCU read-side critical section.");
+	RCU_LOCKDEP_WARN(lock_is_held(&rcu_bh_lock_map),
+			 "Illegal idle entry in RCU-bh read-side critical section.");
+	RCU_LOCKDEP_WARN(lock_is_held(&rcu_sched_lock_map),
+			 "Illegal idle entry in RCU-sched read-side critical section.");
 }
 
 /*
@@ -3161,10 +3161,10 @@ static inline int rcu_blocking_is_gp(void)
  */
 void synchronize_sched(void)
 {
-	rcu_lockdep_assert(!lock_is_held(&rcu_bh_lock_map) &&
-			   !lock_is_held(&rcu_lock_map) &&
-			   !lock_is_held(&rcu_sched_lock_map),
-			   "Illegal synchronize_sched() in RCU-sched read-side critical section");
+	RCU_LOCKDEP_WARN(lock_is_held(&rcu_bh_lock_map) ||
+			 lock_is_held(&rcu_lock_map) ||
+			 lock_is_held(&rcu_sched_lock_map),
+			 "Illegal synchronize_sched() in RCU-sched read-side critical section");
 	if (rcu_blocking_is_gp())
 		return;
 	if (rcu_gp_is_expedited())
@@ -3188,10 +3188,10 @@ EXPORT_SYMBOL_GPL(synchronize_sched);
  */
 void synchronize_rcu_bh(void)
 {
-	rcu_lockdep_assert(!lock_is_held(&rcu_bh_lock_map) &&
-			   !lock_is_held(&rcu_lock_map) &&
-			   !lock_is_held(&rcu_sched_lock_map),
-			   "Illegal synchronize_rcu_bh() in RCU-bh read-side critical section");
+	RCU_LOCKDEP_WARN(lock_is_held(&rcu_bh_lock_map) ||
+			 lock_is_held(&rcu_lock_map) ||
+			 lock_is_held(&rcu_sched_lock_map),
+			 "Illegal synchronize_rcu_bh() in RCU-bh read-side critical section");
 	if (rcu_blocking_is_gp())
 		return;
 	if (rcu_gp_is_expedited())
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index a983bc68a146..9e922f111d63 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -538,10 +538,10 @@ EXPORT_SYMBOL_GPL(call_rcu);
  */
 void synchronize_rcu(void)
 {
-	rcu_lockdep_assert(!lock_is_held(&rcu_bh_lock_map) &&
-			   !lock_is_held(&rcu_lock_map) &&
-			   !lock_is_held(&rcu_sched_lock_map),
-			   "Illegal synchronize_rcu() in RCU read-side critical section");
+	RCU_LOCKDEP_WARN(lock_is_held(&rcu_bh_lock_map) ||
+			 lock_is_held(&rcu_lock_map) ||
+			 lock_is_held(&rcu_sched_lock_map),
+			 "Illegal synchronize_rcu() in RCU read-side critical section");
 	if (!rcu_scheduler_active)
 		return;
 	if (rcu_gp_is_expedited())
diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
index a0a0dd03c73a..47268fb1d27b 100644
--- a/kernel/rcu/update.c
+++ b/kernel/rcu/update.c
@@ -589,8 +589,8 @@ EXPORT_SYMBOL_GPL(call_rcu_tasks);
 void synchronize_rcu_tasks(void)
 {
 	/* Complain if the scheduler has not started.  */
-	rcu_lockdep_assert(!rcu_scheduler_active,
-			   "synchronize_rcu_tasks called too soon");
+	RCU_LOCKDEP_WARN(rcu_scheduler_active,
+			 "synchronize_rcu_tasks called too soon");
 
 	/* Wait for the grace period. */
 	wait_rcu_gp(call_rcu_tasks);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 78b4bad10081..5e73c79fadd0 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2200,8 +2200,8 @@ unsigned long to_ratio(u64 period, u64 runtime)
 #ifdef CONFIG_SMP
 inline struct dl_bw *dl_bw_of(int i)
 {
-	rcu_lockdep_assert(rcu_read_lock_sched_held(),
-			   "sched RCU must be held");
+	RCU_LOCKDEP_WARN(!rcu_read_lock_sched_held(),
+			 "sched RCU must be held");
 	return &cpu_rq(i)->rd->dl_bw;
 }
 
@@ -2210,8 +2210,8 @@ static inline int dl_bw_cpus(int i)
 	struct root_domain *rd = cpu_rq(i)->rd;
 	int cpus = 0;
 
-	rcu_lockdep_assert(rcu_read_lock_sched_held(),
-			   "sched RCU must be held");
+	RCU_LOCKDEP_WARN(!rcu_read_lock_sched_held(),
+			 "sched RCU must be held");
 	for_each_cpu_and(i, rd->span, cpu_active_mask)
 		cpus++;
 
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 4c4f06176f74..cb91c63b4f4a 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -338,20 +338,20 @@ static void workqueue_sysfs_unregister(struct workqueue_struct *wq);
 #include <trace/events/workqueue.h>
 
 #define assert_rcu_or_pool_mutex()					\
-	rcu_lockdep_assert(rcu_read_lock_sched_held() ||		\
-			   lockdep_is_held(&wq_pool_mutex),		\
-			   "sched RCU or wq_pool_mutex should be held")
+	RCU_LOCKDEP_WARN(!rcu_read_lock_sched_held() &&			\
+			 !lockdep_is_held(&wq_pool_mutex),		\
+			 "sched RCU or wq_pool_mutex should be held")
 
 #define assert_rcu_or_wq_mutex(wq)					\
-	rcu_lockdep_assert(rcu_read_lock_sched_held() ||		\
-			   lockdep_is_held(&wq->mutex),			\
-			   "sched RCU or wq->mutex should be held")
+	RCU_LOCKDEP_WARN(!rcu_read_lock_sched_held() &&			\
+			 !lockdep_is_held(&wq->mutex),			\
+			 "sched RCU or wq->mutex should be held")
 
 #define assert_rcu_or_wq_mutex_or_pool_mutex(wq)			\
-	rcu_lockdep_assert(rcu_read_lock_sched_held() ||		\
-			   lockdep_is_held(&wq->mutex) ||		\
-			   lockdep_is_held(&wq_pool_mutex),		\
-			   "sched RCU, wq->mutex or wq_pool_mutex should be held")
+	RCU_LOCKDEP_WARN(!rcu_read_lock_sched_held() &&			\
+			 !lockdep_is_held(&wq->mutex) &&		\
+			 !lockdep_is_held(&wq_pool_mutex),		\
+			 "sched RCU, wq->mutex or wq_pool_mutex should be held")
 
 #define for_each_cpu_worker_pool(pool, cpu)				\
 	for ((pool) = &per_cpu(cpu_worker_pools, cpu)[0];		\
diff --git a/security/device_cgroup.c b/security/device_cgroup.c
index 188c1d26393b..73455089feef 100644
--- a/security/device_cgroup.c
+++ b/security/device_cgroup.c
@@ -400,9 +400,9 @@ static bool verify_new_ex(struct dev_cgroup *dev_cgroup,
 {
 	bool match = false;
 
-	rcu_lockdep_assert(rcu_read_lock_held() ||
-			   lockdep_is_held(&devcgroup_mutex),
-			   "device_cgroup:verify_new_ex called without proper synchronization");
+	RCU_LOCKDEP_WARN(!rcu_read_lock_held() &&
+			 lockdep_is_held(&devcgroup_mutex),
+			 "device_cgroup:verify_new_ex called without proper synchronization");
 
 	if (dev_cgroup->behavior == DEVCG_DEFAULT_ALLOW) {
 		if (behavior == DEVCG_DEFAULT_ALLOW) {
-- 
cgit v1.2.3


From 655fe4effe0f1f40e4f6ca6b3cc64a7fe0032183 Mon Sep 17 00:00:00 2001
From: Kevin Strasser <kevin.strasser@linux.intel.com>
Date: Tue, 16 Jun 2015 10:35:30 -0700
Subject: usbcore: add sysfs support to xHCI usb3 hardware LPM

Add a sysfs node to make it easier to verify if LPM is supported and being
enabled for USB 3.0 devices.

Signed-off-by: Kevin Strasser <kevin.strasser@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 Documentation/ABI/testing/sysfs-bus-usb | 14 ++++++++++++++
 Documentation/usb/power-management.txt  | 15 +++++++++++++--
 drivers/usb/core/hub.c                  |  4 ++++
 drivers/usb/core/sysfs.c                | 31 +++++++++++++++++++++++++++++++
 4 files changed, 62 insertions(+), 2 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/ABI/testing/sysfs-bus-usb b/Documentation/ABI/testing/sysfs-bus-usb
index e5cc7633d013..e935625f5995 100644
--- a/Documentation/ABI/testing/sysfs-bus-usb
+++ b/Documentation/ABI/testing/sysfs-bus-usb
@@ -114,6 +114,20 @@ Description:
 		enabled for the device. Developer can write y/Y/1 or n/N/0 to
 		the file to enable/disable the feature.
 
+What:		/sys/bus/usb/devices/.../power/usb3_hardware_lpm
+Date:		June 2015
+Contact:	Kevin Strasser <kevin.strasser@linux.intel.com>
+Description:
+		If CONFIG_PM_RUNTIME is set and a USB 3.0 lpm-capable device is
+		plugged in to a xHCI host which supports link PM, it will check
+		if U1 and U2 exit latencies have been set in the BOS
+		descriptor; if the check is is passed and the host supports
+		USB3 hardware LPM, USB3 hardware LPM will be enabled for the
+		device and the USB device directory will contain a file named
+		power/usb3_hardware_lpm. The file holds a string value (enable
+		or disable) indicating whether or not USB3 hardware LPM is
+		enabled for the device.
+
 What:		/sys/bus/usb/devices/.../removable
 Date:		February 2012
 Contact:	Matthew Garrett <mjg@redhat.com>
diff --git a/Documentation/usb/power-management.txt b/Documentation/usb/power-management.txt
index b5f83911732a..4a15c90bc11d 100644
--- a/Documentation/usb/power-management.txt
+++ b/Documentation/usb/power-management.txt
@@ -521,10 +521,10 @@ enabling hardware LPM, the host can automatically put the device into
 lower power state(L1 for usb2.0 devices, or U1/U2 for usb3.0 devices),
 which state device can enter and resume very quickly.
 
-The user interface for controlling USB2 hardware LPM is located in the
+The user interface for controlling hardware LPM is located in the
 power/ subdirectory of each USB device's sysfs directory, that is, in
 /sys/bus/usb/devices/.../power/ where "..." is the device's ID. The
-relevant attribute files is usb2_hardware_lpm.
+relevant attribute files are usb2_hardware_lpm and usb3_hardware_lpm.
 
 	power/usb2_hardware_lpm
 
@@ -537,6 +537,17 @@ relevant attribute files is usb2_hardware_lpm.
 		can write y/Y/1 or n/N/0 to the file to	enable/disable
 		USB2 hardware LPM manually. This is for	test purpose mainly.
 
+	power/usb3_hardware_lpm
+
+		When a USB 3.0 lpm-capable device is plugged in to a
+		xHCI host which supports link PM, it will check if U1
+		and U2 exit latencies have been set in the BOS
+		descriptor; if the check is is passed and the host
+		supports USB3 hardware LPM, USB3 hardware LPM will be
+		enabled for the device and this file will be created.
+		The file holds a string value (enable or disable)
+		indicating whether or not USB3 hardware LPM is
+		enabled for the device.
 
 	USB Port Power Control
 	----------------------
diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
index 43cb2f2e3b43..d9ce8f9d9acc 100644
--- a/drivers/usb/core/hub.c
+++ b/drivers/usb/core/hub.c
@@ -3950,6 +3950,8 @@ int usb_disable_lpm(struct usb_device *udev)
 	if (usb_disable_link_state(hcd, udev, USB3_LPM_U2))
 		goto enable_lpm;
 
+	udev->usb3_lpm_enabled = 0;
+
 	return 0;
 
 enable_lpm:
@@ -4007,6 +4009,8 @@ void usb_enable_lpm(struct usb_device *udev)
 
 	usb_enable_link_state(hcd, udev, USB3_LPM_U1);
 	usb_enable_link_state(hcd, udev, USB3_LPM_U2);
+
+	udev->usb3_lpm_enabled = 1;
 }
 EXPORT_SYMBOL_GPL(usb_enable_lpm);
 
diff --git a/drivers/usb/core/sysfs.c b/drivers/usb/core/sysfs.c
index d26973844a4d..cfc68c11c3f5 100644
--- a/drivers/usb/core/sysfs.c
+++ b/drivers/usb/core/sysfs.c
@@ -531,6 +531,25 @@ static ssize_t usb2_lpm_besl_store(struct device *dev,
 }
 static DEVICE_ATTR_RW(usb2_lpm_besl);
 
+static ssize_t usb3_hardware_lpm_show(struct device *dev,
+				      struct device_attribute *attr, char *buf)
+{
+	struct usb_device *udev = to_usb_device(dev);
+	const char *p;
+
+	usb_lock_device(udev);
+
+	if (udev->usb3_lpm_enabled)
+		p = "enabled";
+	else
+		p = "disabled";
+
+	usb_unlock_device(udev);
+
+	return sprintf(buf, "%s\n", p);
+}
+static DEVICE_ATTR_RO(usb3_hardware_lpm);
+
 static struct attribute *usb2_hardware_lpm_attr[] = {
 	&dev_attr_usb2_hardware_lpm.attr,
 	&dev_attr_usb2_lpm_l1_timeout.attr,
@@ -542,6 +561,15 @@ static struct attribute_group usb2_hardware_lpm_attr_group = {
 	.attrs	= usb2_hardware_lpm_attr,
 };
 
+static struct attribute *usb3_hardware_lpm_attr[] = {
+	&dev_attr_usb3_hardware_lpm.attr,
+	NULL,
+};
+static struct attribute_group usb3_hardware_lpm_attr_group = {
+	.name	= power_group_name,
+	.attrs	= usb3_hardware_lpm_attr,
+};
+
 static struct attribute *power_attrs[] = {
 	&dev_attr_autosuspend.attr,
 	&dev_attr_level.attr,
@@ -564,6 +592,9 @@ static int add_power_attributes(struct device *dev)
 		if (udev->usb2_hw_lpm_capable == 1)
 			rc = sysfs_merge_group(&dev->kobj,
 					&usb2_hardware_lpm_attr_group);
+		if (udev->lpm_capable == 1)
+			rc = sysfs_merge_group(&dev->kobj,
+					&usb3_hardware_lpm_attr_group);
 	}
 
 	return rc;
-- 
cgit v1.2.3


From 2ce7918990641b07e70e1b25752d666369e2016e Mon Sep 17 00:00:00 2001
From: Andrey Smetanin <asmetanin@virtuozzo.com>
Date: Fri, 3 Jul 2015 15:01:41 +0300
Subject: kvm/x86: add sending hyper-v crash notification to user space

Sending of notification is done by exiting vcpu to user space
if KVM_REQ_HV_CRASH is enabled for vcpu. At exit to user space
the kvm_run structure contains system_event with type
KVM_SYSTEM_EVENT_CRASH to notify about guest crash occurred.

Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Peter Hornyack <peterhornyack@google.com>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Gleb Natapov <gleb@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 Documentation/virtual/kvm/api.txt | 5 +++++
 arch/x86/kvm/x86.c                | 6 ++++++
 include/linux/kvm_host.h          | 1 +
 include/uapi/linux/kvm.h          | 1 +
 4 files changed, 13 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index a7926a90156f..a4ebcb712375 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -3277,6 +3277,7 @@ should put the acknowledged interrupt vector into the 'epr' field.
 		struct {
 #define KVM_SYSTEM_EVENT_SHUTDOWN       1
 #define KVM_SYSTEM_EVENT_RESET          2
+#define KVM_SYSTEM_EVENT_CRASH          3
 			__u32 type;
 			__u64 flags;
 		} system_event;
@@ -3296,6 +3297,10 @@ Valid values for 'type' are:
   KVM_SYSTEM_EVENT_RESET -- the guest has requested a reset of the VM.
    As with SHUTDOWN, userspace can choose to ignore the request, or
    to schedule the reset to occur in the future and may call KVM_RUN again.
+  KVM_SYSTEM_EVENT_CRASH -- the guest crash occurred and the guest
+   has requested a crash condition maintenance. Userspace can choose
+   to ignore the request, or to gather VM memory core dump and/or
+   reset/shutdown of the VM.
 
 		/* Fix the size of the union. */
 		char padding[256];
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index cfa3e5a7d6be..28076c266a9a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6263,6 +6263,12 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 			vcpu_scan_ioapic(vcpu);
 		if (kvm_check_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu))
 			kvm_vcpu_reload_apic_access_page(vcpu);
+		if (kvm_check_request(KVM_REQ_HV_CRASH, vcpu)) {
+			vcpu->run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
+			vcpu->run->system_event.type = KVM_SYSTEM_EVENT_CRASH;
+			r = 0;
+			goto out;
+		}
 	}
 
 	if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 1d917d9b7f12..51103f0feb7e 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -139,6 +139,7 @@ static inline bool is_error_page(struct page *page)
 #define KVM_REQ_DISABLE_IBS       24
 #define KVM_REQ_APIC_PAGE_RELOAD  25
 #define KVM_REQ_SMI               26
+#define KVM_REQ_HV_CRASH          27
 
 #define KVM_USERSPACE_IRQ_SOURCE_ID		0
 #define KVM_IRQFD_RESAMPLE_IRQ_SOURCE_ID	1
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 716ad4ae4d4b..9ef19ebd9df4 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -317,6 +317,7 @@ struct kvm_run {
 		struct {
 #define KVM_SYSTEM_EVENT_SHUTDOWN       1
 #define KVM_SYSTEM_EVENT_RESET          2
+#define KVM_SYSTEM_EVENT_CRASH          3
 			__u32 type;
 			__u64 flags;
 		} system_event;
-- 
cgit v1.2.3


From ab4a936247561cd998913bab5f15e3d3eaed1f9e Mon Sep 17 00:00:00 2001
From: Linus Walleij <linus.walleij@linaro.org>
Date: Wed, 17 Jun 2015 23:10:21 +0200
Subject: pinctrl: nomadik: assure GPIO chips are populated

If the pin controller probes before the GPIO driver it needs to
populate the GPIO driver state containers ahead of the actual
driver probe as the addresses are used by both halves of the
driver.

Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
---
 .../devicetree/bindings/pinctrl/ste,nomadik.txt    |  7 ++++--
 drivers/pinctrl/nomadik/pinctrl-nomadik.c          | 25 ++++++++++++++++++++++
 2 files changed, 30 insertions(+), 2 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/pinctrl/ste,nomadik.txt b/Documentation/devicetree/bindings/pinctrl/ste,nomadik.txt
index f63fcb3ed352..2213802435e0 100644
--- a/Documentation/devicetree/bindings/pinctrl/ste,nomadik.txt
+++ b/Documentation/devicetree/bindings/pinctrl/ste,nomadik.txt
@@ -3,7 +3,9 @@ ST Ericsson Nomadik pinmux controller
 Required properties:
 - compatible: "stericsson,db8500-pinctrl", "stericsson,db8540-pinctrl",
               "stericsson,stn8815-pinctrl"
-- reg: Should contain the register physical address and length of the PRCMU.
+- nomadik-gpio-chips: array of phandles to the corresponding GPIO chips
+              (these have the register ranges used by the pin controller).
+- prcm: phandle to the PRCMU managing the back end of this pin controller
 
 Please refer to pinctrl-bindings.txt in this directory for details of the
 common pinctrl bindings used by client devices, including the meaning of the
@@ -74,7 +76,8 @@ Example board file extract:
 
 	pinctrl@80157000 {
 		compatible = "stericsson,db8500-pinctrl";
-		reg = <0x80157000 0x2000>;
+		nomadik-gpio-chips = <&gpio0>, <&gpio1>, <&gpio2>, <&gpio3>;
+		prcm = <&prcmu>;
 
 		pinctrl-names = "default";
 
diff --git a/drivers/pinctrl/nomadik/pinctrl-nomadik.c b/drivers/pinctrl/nomadik/pinctrl-nomadik.c
index 7b1160def285..143d1c06078c 100644
--- a/drivers/pinctrl/nomadik/pinctrl-nomadik.c
+++ b/drivers/pinctrl/nomadik/pinctrl-nomadik.c
@@ -2019,6 +2019,31 @@ static int nmk_pinctrl_probe(struct platform_device *pdev)
 	if (version == PINCTRL_NMK_DB8540)
 		nmk_pinctrl_db8540_init(&npct->soc);
 
+	/*
+	 * Since we depend on the GPIO chips to provide clock and register base
+	 * for the pin control operations, make sure that we have these
+	 * populated before we continue. Follow the phandles to instantiate
+	 * them. The GPIO portion of the actual hardware may be probed before
+	 * or after this point: it shouldn't matter as the APIs are orthogonal.
+	 */
+	for (i = 0; i < NMK_MAX_BANKS; i++) {
+		struct device_node *gpio_np;
+		struct nmk_gpio_chip *nmk_chip;
+
+		gpio_np = of_parse_phandle(np, "nomadik-gpio-chips", i);
+		if (gpio_np) {
+			dev_info(&pdev->dev,
+				 "populate NMK GPIO %d \"%s\"\n",
+				 i, gpio_np->name);
+			nmk_chip = nmk_gpio_populate_chip(gpio_np, pdev);
+			if (IS_ERR(nmk_chip))
+				dev_err(&pdev->dev,
+					"could not populate nmk chip struct "
+					"- continue anyway\n");
+			of_node_put(gpio_np);
+		}
+	}
+
 	prcm_np = of_parse_phandle(np, "prcm", 0);
 	if (prcm_np)
 		npct->prcm_base = of_iomap(prcm_np, 0);
-- 
cgit v1.2.3


From bea6356c12600b683de55a70e6a4c6cc36fa640f Mon Sep 17 00:00:00 2001
From: Lee Jones <lee.jones@linaro.org>
Date: Tue, 12 May 2015 13:58:12 +0100
Subject: clocksource: bindings: Provide bindings for ST's LPC Clocksource
 device

On current ST platforms the LPC controls a number of functions including
Watchdog and Real Time Clock.  This patch provides the bindings used to
configure LPC in Clocksource mode.

Signed-off-by: Lee Jones <lee.jones@linaro.org>
---
 .../devicetree/bindings/timer/st,stih407-lpc       | 28 ++++++++++++++++++++++
 1 file changed, 28 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/timer/st,stih407-lpc

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/timer/st,stih407-lpc b/Documentation/devicetree/bindings/timer/st,stih407-lpc
new file mode 100644
index 000000000000..72acb487b856
--- /dev/null
+++ b/Documentation/devicetree/bindings/timer/st,stih407-lpc
@@ -0,0 +1,28 @@
+STMicroelectronics Low Power Controller (LPC) - Clocksource
+===========================================================
+
+LPC currently supports Watchdog OR Real Time Clock OR Clocksource
+functionality.
+
+[See: ../watchdog/st_lpc_wdt.txt for Watchdog options]
+[See: ../rtc/rtc-st-lpc.txt for RTC options]
+
+Required properties
+
+- compatible   : Must be: "st,stih407-lpc"
+- reg          : LPC registers base address + size
+- interrupts   : LPC interrupt line number and associated flags
+- clocks       : Clock used by LPC device (See: ../clock/clock-bindings.txt)
+- st,lpc-mode  : The LPC can run either one of three modes:
+                  ST_LPC_MODE_RTC    [0]
+                  ST_LPC_MODE_WDT    [1]
+                  ST_LPC_MODE_CLKSRC [2]
+		 One (and only one) mode must be selected.
+
+Example:
+       lpc@fde05000 {
+               compatible      = "st,stih407-lpc";
+               reg             = <0xfde05000 0x1000>;
+               clocks          = <&clk_s_d3_flexgen CLK_LPC_0>;
+               st,lpc-mode     = <ST_LPC_MODE_CLKSRC>;
+       };
-- 
cgit v1.2.3


From 82c32c0326dc758bdf202c82cb05594ba0dd0d2e Mon Sep 17 00:00:00 2001
From: Lee Jones <lee.jones@linaro.org>
Date: Tue, 12 May 2015 13:58:14 +0100
Subject: watchdog: bindings: Supply knowledge of a third supported device -
 clocksource

Signed-off-by: Lee Jones <lee.jones@linaro.org>
---
 Documentation/devicetree/bindings/watchdog/st_lpc_wdt.txt | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/watchdog/st_lpc_wdt.txt b/Documentation/devicetree/bindings/watchdog/st_lpc_wdt.txt
index 388c88a01222..039c5ca45577 100644
--- a/Documentation/devicetree/bindings/watchdog/st_lpc_wdt.txt
+++ b/Documentation/devicetree/bindings/watchdog/st_lpc_wdt.txt
@@ -1,9 +1,11 @@
 STMicroelectronics Low Power Controller (LPC) - Watchdog
 ========================================================
 
-LPC currently supports Watchdog OR Real Time Clock functionality.
+LPC currently supports Watchdog OR Real Time Clock OR Clocksource
+functionality.
 
 [See: ../rtc/rtc-st-lpc.txt for RTC options]
+[See: ../timer/st,stih407-lpc for Clocksource options]
 
 Required properties
 
@@ -12,9 +14,11 @@ Required properties
 - reg		: LPC registers base address + size
 - interrupts    : LPC interrupt line number and associated flags
 - clocks	: Clock used by LPC device (See: ../clock/clock-bindings.txt)
-- st,lpc-mode	: The LPC can run either one of two modes ST_LPC_MODE_RTC [0] or
-		  ST_LPC_MODE_WDT [1].  One (and only one) mode must be
-		  selected.
+- st,lpc-mode	: The LPC can run either one of three modes:
+                  ST_LPC_MODE_RTC    [0]
+                  ST_LPC_MODE_WDT    [1]
+                  ST_LPC_MODE_CLKSRC [2]
+		 One (and only one) mode must be selected.
 
 Required properties [watchdog mode]
 
-- 
cgit v1.2.3


From c74a06dce5f82ea4e94a552d486293667c905267 Mon Sep 17 00:00:00 2001
From: Lee Jones <lee.jones@linaro.org>
Date: Tue, 12 May 2015 13:58:16 +0100
Subject: rtc: bindings: Supply knowledge of a third supported device -
 clocksource

Signed-off-by: Lee Jones <lee.jones@linaro.org>
---
 Documentation/devicetree/bindings/rtc/rtc-st-lpc.txt | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/rtc/rtc-st-lpc.txt b/Documentation/devicetree/bindings/rtc/rtc-st-lpc.txt
index 73407f502e4e..daf88265df32 100644
--- a/Documentation/devicetree/bindings/rtc/rtc-st-lpc.txt
+++ b/Documentation/devicetree/bindings/rtc/rtc-st-lpc.txt
@@ -1,20 +1,23 @@
 STMicroelectronics Low Power Controller (LPC) - RTC
 ===================================================
 
-LPC currently supports Watchdog OR Real Time Clock functionality.
+LPC currently supports Watchdog OR Real Time Clock OR Clocksource
+functionality.
 
 [See: ../watchdog/st_lpc_wdt.txt for Watchdog options]
+[See: ../timer/st,stih407-lpc for Clocksource options]
 
 Required properties
 
-- compatible 	: Must be one of: "st,stih407-lpc" "st,stih416-lpc"
-				  "st,stih415-lpc" "st,stid127-lpc"
+- compatible 	: Must be: "st,stih407-lpc"
 - reg		: LPC registers base address + size
 - interrupts    : LPC interrupt line number and associated flags
 - clocks	: Clock used by LPC device (See: ../clock/clock-bindings.txt)
-- st,lpc-mode	: The LPC can run either one of two modes ST_LPC_MODE_RTC [0] or
-		  ST_LPC_MODE_WDT [1].  One (and only one) mode must be
-		  selected.
+- st,lpc-mode	: The LPC can run either one of three modes:
+                  ST_LPC_MODE_RTC    [0]
+                  ST_LPC_MODE_WDT    [1]
+                  ST_LPC_MODE_CLKSRC [2]
+		 One (and only one) mode must be selected.
 
 Example:
 	lpc@fde05000 {
-- 
cgit v1.2.3


From afc257bc9ac9c2f9055c9f9829322b3ff7574165 Mon Sep 17 00:00:00 2001
From: Mars Cheng <mars.cheng@mediatek.com>
Date: Tue, 14 Jul 2015 14:58:12 +0800
Subject: Document: DT: Add bindings for mediatek MT6795 SoC Platform

This adds DT binding documentation for Mediatek MT6795.

Signed-off-by: Mars Cheng <mars.cheng@mediatek.com>
Signed-off-by: Matthias Brugger <matthias.bgg@gmail.com>
---
 Documentation/devicetree/bindings/arm/mediatek.txt               | 9 +++++++--
 .../devicetree/bindings/arm/mediatek/mediatek,sysirq.txt         | 3 ++-
 Documentation/devicetree/bindings/serial/mtk-uart.txt            | 5 +++--
 3 files changed, 12 insertions(+), 5 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/arm/mediatek.txt b/Documentation/devicetree/bindings/arm/mediatek.txt
index dd7550a29db6..ce26ff7aa9b0 100644
--- a/Documentation/devicetree/bindings/arm/mediatek.txt
+++ b/Documentation/devicetree/bindings/arm/mediatek.txt
@@ -1,12 +1,14 @@
-MediaTek mt65xx & mt81xx Platforms Device Tree Bindings
+MediaTek mt65xx, mt67xx & mt81xx Platforms Device Tree Bindings
 
-Boards with a MediaTek mt65xx/mt81xx SoC shall have the following property:
+Boards with a MediaTek mt65xx/mt67xx/mt81xx SoC shall have the
+following property:
 
 Required root node property:
 
 compatible: Must contain one of
    "mediatek,mt6589"
    "mediatek,mt6592"
+   "mediatek,mt6795"
    "mediatek,mt8127"
    "mediatek,mt8135"
    "mediatek,mt8173"
@@ -20,6 +22,9 @@ Supported boards:
 - Evaluation board for MT6592:
     Required root node properties:
       - compatible = "mediatek,mt6592-evb", "mediatek,mt6592";
+- Evaluation board for MT6795(Helio X10):
+    Required root node properties:
+      - compatible = "mediatek,mt6795-evb", "mediatek,mt6795";
 - MTK mt8127 tablet moose EVB:
     Required root node properties:
       - compatible = "mediatek,mt8127-moose", "mediatek,mt8127";
diff --git a/Documentation/devicetree/bindings/arm/mediatek/mediatek,sysirq.txt b/Documentation/devicetree/bindings/arm/mediatek/mediatek,sysirq.txt
index 4f5a5352ccd8..8136b6d44871 100644
--- a/Documentation/devicetree/bindings/arm/mediatek/mediatek,sysirq.txt
+++ b/Documentation/devicetree/bindings/arm/mediatek/mediatek,sysirq.txt
@@ -1,4 +1,4 @@
-Mediatek 65xx/81xx sysirq
++Mediatek 65xx/67xx/81xx sysirq
 
 Mediatek SOCs sysirq support controllable irq inverter for each GIC SPI
 interrupt.
@@ -8,6 +8,7 @@ Required properties:
 	"mediatek,mt8173-sysirq"
 	"mediatek,mt8135-sysirq"
 	"mediatek,mt8127-sysirq"
+	"mediatek,mt6795-sysirq"
 	"mediatek,mt6592-sysirq"
 	"mediatek,mt6589-sysirq"
 	"mediatek,mt6582-sysirq"
diff --git a/Documentation/devicetree/bindings/serial/mtk-uart.txt b/Documentation/devicetree/bindings/serial/mtk-uart.txt
index 8d63f1da07aa..5991511fa7ee 100644
--- a/Documentation/devicetree/bindings/serial/mtk-uart.txt
+++ b/Documentation/devicetree/bindings/serial/mtk-uart.txt
@@ -5,10 +5,11 @@ Required properties:
   * "mediatek,mt8135-uart" for MT8135 compatible UARTS
   * "mediatek,mt8127-uart" for MT8127 compatible UARTS
   * "mediatek,mt8173-uart" for MT8173 compatible UARTS
+  * "mediatek,mt6795-uart" for MT6795 compatible UARTS
   * "mediatek,mt6589-uart" for MT6589 compatible UARTS
   * "mediatek,mt6582-uart" for MT6582 compatible UARTS
-  * "mediatek,mt6577-uart" for all compatible UARTS (MT8173, MT6589, MT6582, 
-	MT6577)
+  * "mediatek,mt6577-uart" for all compatible UARTS (MT8173, MT6795, MT6589,
+	MT6582, MT6577)
 
 - reg: The base address of the UART register bank.
 
-- 
cgit v1.2.3


From 82ff50b222d8ac645cdeba974c612c9eef01c3dd Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.com>
Date: Tue, 14 Jul 2015 14:55:05 +0200
Subject: doc: Update doc about journalling layer

Documentation of journalling layer in
Documentation/DocBook/filesystems.tmpl speaks about JBD layer. Since
that is going away, update the documentation to speak about JBD2. Also
update the parts that have changed since someone last touched the
document and remove some parts which are just misleading and outdated.

Signed-off-by: Jan Kara <jack@suse.com>
---
 Documentation/DocBook/filesystems.tmpl | 178 +++++++++++++--------------------
 1 file changed, 67 insertions(+), 111 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/DocBook/filesystems.tmpl b/Documentation/DocBook/filesystems.tmpl
index bcdfdb9a9277..6006b6358c86 100644
--- a/Documentation/DocBook/filesystems.tmpl
+++ b/Documentation/DocBook/filesystems.tmpl
@@ -146,36 +146,30 @@
 The journalling layer is  easy to use. You need to
 first of all create a journal_t data structure. There are
 two calls to do this dependent on how you decide to allocate the physical
-media on which the journal resides. The journal_init_inode() call
-is for journals stored in filesystem inodes, or the journal_init_dev()
-call can be use for journal stored on a raw device (in a continuous range
+media on which the journal resides. The jbd2_journal_init_inode() call
+is for journals stored in filesystem inodes, or the jbd2_journal_init_dev()
+call can be used for journal stored on a raw device (in a continuous range
 of blocks). A journal_t is a typedef for a struct pointer, so when
-you are finally finished make sure you call journal_destroy() on it
+you are finally finished make sure you call jbd2_journal_destroy() on it
 to free up any used kernel memory.
 </para>
 
 <para>
 Once you have got your journal_t object you need to 'mount' or load the journal
-file, unless of course you haven't initialised it yet - in which case you
-need to call journal_create().
+file. The journalling layer expects the space for the journal was already
+allocated and initialized properly by the userspace tools.  When loading the
+journal you must call jbd2_journal_load() to process journal contents.  If the
+client file system detects the journal contents does not need to be processed
+(or even need not have valid contents), it may call jbd2_journal_wipe() to
+clear the journal contents before calling jbd2_journal_load().
 </para>
 
 <para>
-Most of the time however your journal file will already have been created, but
-before you load it you must call journal_wipe() to empty the journal file.
-Hang on, you say , what if the filesystem wasn't cleanly umount()'d . Well, it is the
-job of the client file system to detect this and skip the call to journal_wipe().
-</para>
-
-<para>
-In either case the next call should be to journal_load() which prepares the
-journal file for use. Note that journal_wipe(..,0) calls journal_skip_recovery()
-for you if it detects any outstanding transactions in the journal and similarly
-journal_load() will call journal_recover() if necessary.
-I would advise reading fs/ext3/super.c for examples on this stage.
-[RGG: Why is the journal_wipe() call necessary - doesn't this needlessly
-complicate the API. Or isn't a good idea for the journal layer to hide
-dirty mounts from the client fs]
+Note that jbd2_journal_wipe(..,0) calls jbd2_journal_skip_recovery() for you if
+it detects any outstanding transactions in the journal and similarly
+jbd2_journal_load() will call jbd2_journal_recover() if necessary.  I would
+advise reading ext4_load_journal() in fs/ext4/super.c for examples on this
+stage.
 </para>
 
 <para>
@@ -189,41 +183,41 @@ You still need to actually journal your filesystem changes, this
 is done by wrapping them into transactions. Additionally you
 also need to wrap the modification of each of the buffers
 with calls to the journal layer, so it knows what the modifications
-you are actually making are. To do this use  journal_start() which
+you are actually making are. To do this use jbd2_journal_start() which
 returns a transaction handle.
 </para>
 
 <para>
-journal_start()
-and its counterpart journal_stop(), which indicates the end of a transaction
-are nestable calls, so you can reenter a transaction if necessary,
-but remember you must call journal_stop() the same number of times as
-journal_start() before the transaction is completed (or more accurately
-leaves the update phase). Ext3/VFS makes use of this feature to simplify
-quota support.
+jbd2_journal_start()
+and its counterpart jbd2_journal_stop(), which indicates the end of a
+transaction are nestable calls, so you can reenter a transaction if necessary,
+but remember you must call jbd2_journal_stop() the same number of times as
+jbd2_journal_start() before the transaction is completed (or more accurately
+leaves the update phase). Ext4/VFS makes use of this feature to simplify
+handling of inode dirtying, quota support, etc.
 </para>
 
 <para>
 Inside each transaction you need to wrap the modifications to the
 individual buffers (blocks). Before you start to modify a buffer you
-need to call journal_get_{create,write,undo}_access() as appropriate,
+need to call jbd2_journal_get_{create,write,undo}_access() as appropriate,
 this allows the journalling layer to copy the unmodified data if it
 needs to. After all the buffer may be part of a previously uncommitted
 transaction.
 At this point you are at last ready to modify a buffer, and once
-you are have done so you need to call journal_dirty_{meta,}data().
+you are have done so you need to call jbd2_journal_dirty_{meta,}data().
 Or if you've asked for access to a buffer you now know is now longer
-required to be pushed back on the device you can call journal_forget()
+required to be pushed back on the device you can call jbd2_journal_forget()
 in much the same way as you might have used bforget() in the past.
 </para>
 
 <para>
-A journal_flush() may be called at any time to commit and checkpoint
+A jbd2_journal_flush() may be called at any time to commit and checkpoint
 all your transactions.
 </para>
 
 <para>
-Then at umount time , in your put_super() you can then call journal_destroy()
+Then at umount time , in your put_super() you can then call jbd2_journal_destroy()
 to clean up your in-core journal object.
 </para>
 
@@ -231,82 +225,74 @@ to clean up your in-core journal object.
 Unfortunately there a couple of ways the journal layer can cause a deadlock.
 The first thing to note is that each task can only have
 a single outstanding transaction at any one time, remember nothing
-commits until the outermost journal_stop(). This means
+commits until the outermost jbd2_journal_stop(). This means
 you must complete the transaction at the end of each file/inode/address
 etc. operation you perform, so that the journalling system isn't re-entered
 on another journal. Since transactions can't be nested/batched
 across differing journals, and another filesystem other than
-yours (say ext3) may be modified in a later syscall.
+yours (say ext4) may be modified in a later syscall.
 </para>
 
 <para>
-The second case to bear in mind is that journal_start() can
+The second case to bear in mind is that jbd2_journal_start() can
 block if there isn't enough space in the journal for your transaction
 (based on the passed nblocks param) - when it blocks it merely(!) needs to
 wait for transactions to complete and be committed from other tasks,
-so essentially we are waiting for journal_stop(). So to avoid
-deadlocks you must treat journal_start/stop() as if they
+so essentially we are waiting for jbd2_journal_stop(). So to avoid
+deadlocks you must treat jbd2_journal_start/stop() as if they
 were semaphores and include them in your semaphore ordering rules to prevent
-deadlocks. Note that journal_extend() has similar blocking behaviour to
-journal_start() so you can deadlock here just as easily as on journal_start().
+deadlocks. Note that jbd2_journal_extend() has similar blocking behaviour to
+jbd2_journal_start() so you can deadlock here just as easily as on
+jbd2_journal_start().
 </para>
 
 <para>
 Try to reserve the right number of blocks the first time. ;-). This will
 be the maximum number of blocks you are going to touch in this transaction.
-I advise having a look at at least ext3_jbd.h to see the basis on which
-ext3 uses to make these decisions.
+I advise having a look at at least ext4_jbd.h to see the basis on which
+ext4 uses to make these decisions.
 </para>
 
 <para>
 Another wriggle to watch out for is your on-disk block allocation strategy.
-why? Because, if you undo a delete, you need to ensure you haven't reused any
-of the freed blocks in a later transaction. One simple way of doing this
-is make sure any blocks you allocate only have checkpointed transactions
-listed against them. Ext3 does this in ext3_test_allocatable().
+Why? Because, if you do a delete, you need to ensure you haven't reused any
+of the freed blocks until the transaction freeing these blocks commits. If you
+reused these blocks and crash happens, there is no way to restore the contents
+of the reallocated blocks at the end of the last fully committed transaction.
+
+One simple way of doing this is to mark blocks as free in internal in-memory
+block allocation structures only after the transaction freeing them commits.
+Ext4 uses journal commit callback for this purpose.
+</para>
+
+<para>
+With journal commit callbacks you can ask the journalling layer to call a
+callback function when the transaction is finally committed to disk, so that
+you can do some of your own management. You ask the journalling layer for
+calling the callback by simply setting journal->j_commit_callback function
+pointer and that function is called after each transaction commit. You can also
+use transaction->t_private_list for attaching entries to a transaction that
+need processing when the transaction commits.
 </para>
 
 <para>
-Lock is also providing through journal_{un,}lock_updates(),
-ext3 uses this when it wants a window with a clean and stable fs for a moment.
-eg.
+JBD2 also provides a way to block all transaction updates via
+jbd2_journal_{un,}lock_updates(). Ext4 uses this when it wants a window with a
+clean and stable fs for a moment.  E.g.
 </para>
 
 <programlisting>
 
-	journal_lock_updates() //stop new stuff happening..
-	journal_flush()        // checkpoint everything.
+	jbd2_journal_lock_updates() //stop new stuff happening..
+	jbd2_journal_flush()        // checkpoint everything.
 	..do stuff on stable fs
-	journal_unlock_updates() // carry on with filesystem use.
+	jbd2_journal_unlock_updates() // carry on with filesystem use.
 </programlisting>
 
 <para>
 The opportunities for abuse and DOS attacks with this should be obvious,
 if you allow unprivileged userspace to trigger codepaths containing these
 calls.
-</para>
-
-<para>
-A new feature of jbd since 2.5.25 is commit callbacks with the new
-journal_callback_set() function you can now ask the journalling layer
-to call you back when the transaction is finally committed to disk, so that
-you can do some of your own management. The key to this is the journal_callback
-struct, this maintains the internal callback information but you can
-extend it like this:-
-</para>
-<programlisting>
-	struct  myfs_callback_s {
-		//Data structure element required by jbd..
-		struct journal_callback for_jbd;
-		// Stuff for myfs allocated together.
-		myfs_inode*    i_commited;
-
-	}
-</programlisting>
-
-<para>
-this would be useful if you needed to know when data was committed to a
-particular inode.
 </para>
 
     </sect2>
@@ -319,36 +305,6 @@ being each mount, each modification (transaction) and each changed buffer
 to tell the journalling layer about them.
 </para>
 
-<para>
-Here is a some pseudo code to give you an idea of how it works, as
-an example.
-</para>
-
-<programlisting>
-  journal_t* my_jnrl = journal_create();
-  journal_init_{dev,inode}(jnrl,...)
-  if (clean) journal_wipe();
-  journal_load();
-
-   foreach(transaction) { /*transactions must be
-                            completed before
-                            a syscall returns to
-                            userspace*/
-
-          handle_t * xct=journal_start(my_jnrl);
-          foreach(bh) {
-                journal_get_{create,write,undo}_access(xact,bh);
-                if ( myfs_modify(bh) ) { /* returns true
-                                        if makes changes */
-                           journal_dirty_{meta,}data(xact,bh);
-                } else {
-                           journal_forget(bh);
-                }
-          }
-          journal_stop(xct);
-   }
-   journal_destroy(my_jrnl);
-</programlisting>
     </sect2>
 
     </sect1>
@@ -357,13 +313,13 @@ an example.
      <title>Data Types</title>
      <para>
 	The journalling layer uses typedefs to 'hide' the concrete definitions
-	of the structures used. As a client of the JBD layer you can
+	of the structures used. As a client of the JBD2 layer you can
 	just rely on the using the pointer as a magic cookie  of some sort.
 
 	Obviously the hiding is not enforced as this is 'C'.
      </para>
 	<sect2 id="structures"><title>Structures</title>
-!Iinclude/linux/jbd.h
+!Iinclude/linux/jbd2.h
 	</sect2>
     </sect1>
 
@@ -375,11 +331,11 @@ an example.
 	manage transactions
      </para>
 	<sect2 id="journal_level"><title>Journal Level</title>
-!Efs/jbd/journal.c
-!Ifs/jbd/recovery.c
+!Efs/jbd2/journal.c
+!Ifs/jbd2/recovery.c
 	</sect2>
 	<sect2 id="transaction_level"><title>Transasction Level</title>
-!Efs/jbd/transaction.c
+!Efs/jbd2/transaction.c
 	</sect2>
     </sect1>
     <sect1 id="see_also">
-- 
cgit v1.2.3


From c290ea01abb7907fde602f3ba55905ef10a37477 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Thu, 18 Jun 2015 16:52:29 +0200
Subject: fs: Remove ext3 filesystem driver

The functionality of ext3 is fully supported by ext4 driver. Major
distributions (SUSE, RedHat) already use ext4 driver to handle ext3
filesystems for quite some time. There is some ugliness in mm resulting
from jbd cleaning buffers in a dirty page without cleaning page dirty
bit and also support for buffer bouncing in the block layer when stable
pages are required is there only because of jbd. So let's remove the
ext3 driver. This saves us some 28k lines of duplicated code.

Acked-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 Documentation/filesystems/ext2.txt |    4 +-
 Documentation/filesystems/ext3.txt |  209 +--
 Documentation/filesystems/vfs.txt  |    2 +-
 MAINTAINERS                        |   18 +-
 fs/Kconfig                         |    5 +-
 fs/Makefile                        |    2 -
 fs/ext3/Kconfig                    |   89 -
 fs/ext3/Makefile                   |   12 -
 fs/ext3/acl.c                      |  281 ---
 fs/ext3/acl.h                      |   72 -
 fs/ext3/balloc.c                   | 2158 ----------------------
 fs/ext3/bitmap.c                   |   20 -
 fs/ext3/dir.c                      |  537 ------
 fs/ext3/ext3.h                     | 1332 --------------
 fs/ext3/ext3_jbd.c                 |   59 -
 fs/ext3/file.c                     |   79 -
 fs/ext3/fsync.c                    |  109 --
 fs/ext3/hash.c                     |  206 ---
 fs/ext3/ialloc.c                   |  706 -------
 fs/ext3/inode.c                    | 3574 ------------------------------------
 fs/ext3/ioctl.c                    |  327 ----
 fs/ext3/namei.c                    | 2586 --------------------------
 fs/ext3/namei.h                    |   27 -
 fs/ext3/resize.c                   | 1117 -----------
 fs/ext3/super.c                    | 3165 -------------------------------
 fs/ext3/symlink.c                  |   46 -
 fs/ext3/xattr.c                    | 1330 --------------
 fs/ext3/xattr.h                    |  136 --
 fs/ext3/xattr_security.c           |   78 -
 fs/ext3/xattr_trusted.c            |   54 -
 fs/ext3/xattr_user.c               |   58 -
 fs/ext4/Kconfig                    |   41 +-
 fs/ext4/super.c                    |   14 +-
 fs/jbd/Kconfig                     |   30 -
 fs/jbd/Makefile                    |    7 -
 fs/jbd/checkpoint.c                |  782 --------
 fs/jbd/commit.c                    | 1021 ----------
 fs/jbd/journal.c                   | 2145 ----------------------
 fs/jbd/recovery.c                  |  594 ------
 fs/jbd/revoke.c                    |  733 --------
 fs/jbd/transaction.c               | 2237 ----------------------
 include/linux/jbd.h                | 1047 -----------
 include/linux/jbd2.h               |   41 +-
 include/linux/jbd_common.h         |   46 -
 include/trace/events/ext3.h        |  866 ---------
 include/trace/events/jbd.h         |  194 --
 46 files changed, 87 insertions(+), 28109 deletions(-)
 delete mode 100644 fs/ext3/Kconfig
 delete mode 100644 fs/ext3/Makefile
 delete mode 100644 fs/ext3/acl.c
 delete mode 100644 fs/ext3/acl.h
 delete mode 100644 fs/ext3/balloc.c
 delete mode 100644 fs/ext3/bitmap.c
 delete mode 100644 fs/ext3/dir.c
 delete mode 100644 fs/ext3/ext3.h
 delete mode 100644 fs/ext3/ext3_jbd.c
 delete mode 100644 fs/ext3/file.c
 delete mode 100644 fs/ext3/fsync.c
 delete mode 100644 fs/ext3/hash.c
 delete mode 100644 fs/ext3/ialloc.c
 delete mode 100644 fs/ext3/inode.c
 delete mode 100644 fs/ext3/ioctl.c
 delete mode 100644 fs/ext3/namei.c
 delete mode 100644 fs/ext3/namei.h
 delete mode 100644 fs/ext3/resize.c
 delete mode 100644 fs/ext3/super.c
 delete mode 100644 fs/ext3/symlink.c
 delete mode 100644 fs/ext3/xattr.c
 delete mode 100644 fs/ext3/xattr.h
 delete mode 100644 fs/ext3/xattr_security.c
 delete mode 100644 fs/ext3/xattr_trusted.c
 delete mode 100644 fs/ext3/xattr_user.c
 delete mode 100644 fs/jbd/Kconfig
 delete mode 100644 fs/jbd/Makefile
 delete mode 100644 fs/jbd/checkpoint.c
 delete mode 100644 fs/jbd/commit.c
 delete mode 100644 fs/jbd/journal.c
 delete mode 100644 fs/jbd/recovery.c
 delete mode 100644 fs/jbd/revoke.c
 delete mode 100644 fs/jbd/transaction.c
 delete mode 100644 include/linux/jbd.h
 delete mode 100644 include/linux/jbd_common.h
 delete mode 100644 include/trace/events/ext3.h
 delete mode 100644 include/trace/events/jbd.h

(limited to 'Documentation')

diff --git a/Documentation/filesystems/ext2.txt b/Documentation/filesystems/ext2.txt
index b9714569e472..55755395d3dc 100644
--- a/Documentation/filesystems/ext2.txt
+++ b/Documentation/filesystems/ext2.txt
@@ -360,8 +360,8 @@ and are copied into the filesystem.  If a transaction is incomplete at
 the time of the crash, then there is no guarantee of consistency for
 the blocks in that transaction so they are discarded (which means any
 filesystem changes they represent are also lost).
-Check Documentation/filesystems/ext3.txt if you want to read more about
-ext3 and journaling.
+Check Documentation/filesystems/ext4.txt if you want to read more about
+ext4 and journaling.
 
 References
 ==========
diff --git a/Documentation/filesystems/ext3.txt b/Documentation/filesystems/ext3.txt
index 7ed0d17d6721..58758fbef9e0 100644
--- a/Documentation/filesystems/ext3.txt
+++ b/Documentation/filesystems/ext3.txt
@@ -6,210 +6,7 @@ Ext3 was originally released in September 1999. Written by Stephen Tweedie
 for the 2.2 branch, and ported to 2.4 kernels by Peter Braam, Andreas Dilger,
 Andrew Morton, Alexander Viro, Ted Ts'o and Stephen Tweedie.
 
-Ext3 is the ext2 filesystem enhanced with journalling capabilities.
+Ext3 is the ext2 filesystem enhanced with journalling capabilities. The
+filesystem is a subset of ext4 filesystem so use ext4 driver for accessing
+ext3 filesystems.
 
-Options
-=======
-
-When mounting an ext3 filesystem, the following option are accepted:
-(*) == default
-
-ro			Mount filesystem read only. Note that ext3 will replay
-			the journal (and thus write to the partition) even when
-			mounted "read only". Mount options "ro,noload" can be
-			used to prevent writes to the filesystem.
-
-journal=update		Update the ext3 file system's journal to the current
-			format.
-
-journal=inum		When a journal already exists, this option is ignored.
-			Otherwise, it specifies the number of the inode which
-			will represent the ext3 file system's journal file.
-
-journal_path=path
-journal_dev=devnum	When the external journal device's major/minor numbers
-			have changed, these options allow the user to specify
-			the new journal location.  The journal device is
-			identified through either its new major/minor numbers
-			encoded in devnum, or via a path to the device.
-
-norecovery		Don't load the journal on mounting. Note that this forces
-noload			mount of inconsistent filesystem, which can lead to
-			various problems.
-
-data=journal		All data are committed into the journal prior to being
-			written into the main file system.
-
-data=ordered	(*)	All data are forced directly out to the main file
-			system prior to its metadata being committed to the
-			journal.
-
-data=writeback		Data ordering is not preserved, data may be written
-			into the main file system after its metadata has been
-			committed to the journal.
-
-commit=nrsec	(*)	Ext3 can be told to sync all its data and metadata
-			every 'nrsec' seconds. The default value is 5 seconds.
-			This means that if you lose your power, you will lose
-			as much as the latest 5 seconds of work (your
-			filesystem will not be damaged though, thanks to the
-			journaling).  This default value (or any low value)
-			will hurt performance, but it's good for data-safety.
-			Setting it to 0 will have the same effect as leaving
-			it at the default (5 seconds).
-			Setting it to very large values will improve
-			performance.
-
-barrier=<0|1(*)>	This enables/disables the use of write barriers in
-barrier	(*)		the jbd code.  barrier=0 disables, barrier=1 enables.
-nobarrier		This also requires an IO stack which can support
-			barriers, and if jbd gets an error on a barrier
-			write, it will disable again with a warning.
-			Write barriers enforce proper on-disk ordering
-			of journal commits, making volatile disk write caches
-			safe to use, at some performance penalty.  If
-			your disks are battery-backed in one way or another,
-			disabling barriers may safely improve performance.
-			The mount options "barrier" and "nobarrier" can
-			also be used to enable or disable barriers, for
-			consistency with other ext3 mount options.
-
-user_xattr		Enables Extended User Attributes.  Additionally, you
-			need to have extended attribute support enabled in the
-			kernel configuration (CONFIG_EXT3_FS_XATTR).  See the
-			attr(5) manual page and http://acl.bestbits.at/ to
-			learn more about extended attributes.
-
-nouser_xattr		Disables Extended User Attributes.
-
-acl			Enables POSIX Access Control Lists support.
-			Additionally, you need to have ACL support enabled in
-			the kernel configuration (CONFIG_EXT3_FS_POSIX_ACL).
-			See the acl(5) manual page and http://acl.bestbits.at/
-			for more information.
-
-noacl			This option disables POSIX Access Control List
-			support.
-
-reservation
-
-noreservation
-
-bsddf 		(*)	Make 'df' act like BSD.
-minixdf			Make 'df' act like Minix.
-
-check=none		Don't do extra checking of bitmaps on mount.
-nocheck
-
-debug			Extra debugging information is sent to syslog.
-
-errors=remount-ro	Remount the filesystem read-only on an error.
-errors=continue		Keep going on a filesystem error.
-errors=panic		Panic and halt the machine if an error occurs.
-			(These mount options override the errors behavior
-			specified in the superblock, which can be
-			configured using tune2fs.)
-
-data_err=ignore(*)	Just print an error message if an error occurs
-			in a file data buffer in ordered mode.
-data_err=abort		Abort the journal if an error occurs in a file
-			data buffer in ordered mode.
-
-grpid			Give objects the same group ID as their creator.
-bsdgroups
-
-nogrpid		(*)	New objects have the group ID of their creator.
-sysvgroups
-
-resgid=n		The group ID which may use the reserved blocks.
-
-resuid=n		The user ID which may use the reserved blocks.
-
-sb=n			Use alternate superblock at this location.
-
-quota			These options are ignored by the filesystem. They
-noquota			are used only by quota tools to recognize volumes
-grpquota		where quota should be turned on. See documentation
-usrquota		in the quota-tools package for more details
-			(http://sourceforge.net/projects/linuxquota).
-
-jqfmt=<quota type>	These options tell filesystem details about quota
-usrjquota=<file>	so that quota information can be properly updated
-grpjquota=<file>	during journal replay. They replace the above
-			quota options. See documentation in the quota-tools
-			package for more details
-			(http://sourceforge.net/projects/linuxquota).
-
-Specification
-=============
-Ext3 shares all disk implementation with the ext2 filesystem, and adds
-transactions capabilities to ext2.  Journaling is done by the Journaling Block
-Device layer.
-
-Journaling Block Device layer
------------------------------
-The Journaling Block Device layer (JBD) isn't ext3 specific.  It was designed
-to add journaling capabilities to a block device.  The ext3 filesystem code
-will inform the JBD of modifications it is performing (called a transaction).
-The journal supports the transactions start and stop, and in case of a crash,
-the journal can replay the transactions to quickly put the partition back into
-a consistent state.
-
-Handles represent a single atomic update to a filesystem.  JBD can handle an
-external journal on a block device.
-
-Data Mode
----------
-There are 3 different data modes:
-
-* writeback mode
-In data=writeback mode, ext3 does not journal data at all.  This mode provides
-a similar level of journaling as that of XFS, JFS, and ReiserFS in its default
-mode - metadata journaling.  A crash+recovery can cause incorrect data to
-appear in files which were written shortly before the crash.  This mode will
-typically provide the best ext3 performance.
-
-* ordered mode
-In data=ordered mode, ext3 only officially journals metadata, but it logically
-groups metadata and data blocks into a single unit called a transaction.  When
-it's time to write the new metadata out to disk, the associated data blocks
-are written first.  In general, this mode performs slightly slower than
-writeback but significantly faster than journal mode.
-
-* journal mode
-data=journal mode provides full data and metadata journaling.  All new data is
-written to the journal first, and then to its final location.
-In the event of a crash, the journal can be replayed, bringing both data and
-metadata into a consistent state.  This mode is the slowest except when data
-needs to be read from and written to disk at the same time where it
-outperforms all other modes.
-
-Compatibility
--------------
-
-Ext2 partitions can be easily convert to ext3, with `tune2fs -j <dev>`.
-Ext3 is fully compatible with Ext2.  Ext3 partitions can easily be mounted as
-Ext2.
-
-
-External Tools
-==============
-See manual pages to learn more.
-
-tune2fs: 	create a ext3 journal on a ext2 partition with the -j flag.
-mke2fs: 	create a ext3 partition with the -j flag.
-debugfs: 	ext2 and ext3 file system debugger.
-ext2online:	online (mounted) ext2 and ext3 filesystem resizer
-
-
-References
-==========
-
-kernel source:	<file:fs/ext3/>
-		<file:fs/jbd/>
-
-programs: 	http://e2fsprogs.sourceforge.net/
-		http://ext2resize.sourceforge.net
-
-useful links:	http://www.ibm.com/developerworks/library/l-fs7/index.html
-        http://www.ibm.com/developerworks/library/l-fs8/index.html
diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index 5eb8456fc41e..8c6f07ad373a 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -769,7 +769,7 @@ struct address_space_operations {
 	to stall to allow flushers a chance to complete some IO. Ordinarily
 	it can use PageDirty and PageWriteback but some filesystems have
 	more complex state (unstable pages in NFS prevent reclaim) or
-	do not set those flags due to locking problems (jbd). This callback
+	do not set those flags due to locking problems. This callback
 	allows a filesystem to indicate to the VM if a page should be
 	treated as dirty or writeback for the purposes of stalling.
 
diff --git a/MAINTAINERS b/MAINTAINERS
index a2264167791a..0555bdb72c0d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -4059,15 +4059,6 @@ F:	Documentation/filesystems/ext2.txt
 F:	fs/ext2/
 F:	include/linux/ext2*
 
-EXT3 FILE SYSTEM
-M:	Jan Kara <jack@suse.com>
-M:	Andrew Morton <akpm@linux-foundation.org>
-M:	Andreas Dilger <adilger.kernel@dilger.ca>
-L:	linux-ext4@vger.kernel.org
-S:	Maintained
-F:	Documentation/filesystems/ext3.txt
-F:	fs/ext3/
-
 EXT4 FILE SYSTEM
 M:	"Theodore Ts'o" <tytso@mit.edu>
 M:	Andreas Dilger <adilger.kernel@dilger.ca>
@@ -5751,16 +5742,9 @@ S:	Maintained
 F:	fs/jffs2/
 F:	include/uapi/linux/jffs2.h
 
-JOURNALLING LAYER FOR BLOCK DEVICES (JBD)
-M:	Andrew Morton <akpm@linux-foundation.org>
-M:	Jan Kara <jack@suse.com>
-L:	linux-ext4@vger.kernel.org
-S:	Maintained
-F:	fs/jbd/
-F:	include/linux/jbd.h
-
 JOURNALLING LAYER FOR BLOCK DEVICES (JBD2)
 M:	"Theodore Ts'o" <tytso@mit.edu>
+M:	Jan Kara <jack@suse.com>
 L:	linux-ext4@vger.kernel.org
 S:	Maintained
 F:	fs/jbd2/
diff --git a/fs/Kconfig b/fs/Kconfig
index 011f43365d7b..da3f32f1a4e4 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -11,18 +11,15 @@ config DCACHE_WORD_ACCESS
 if BLOCK
 
 source "fs/ext2/Kconfig"
-source "fs/ext3/Kconfig"
 source "fs/ext4/Kconfig"
-source "fs/jbd/Kconfig"
 source "fs/jbd2/Kconfig"
 
 config FS_MBCACHE
 # Meta block cache for Extended Attributes (ext2/ext3/ext4)
 	tristate
 	default y if EXT2_FS=y && EXT2_FS_XATTR
-	default y if EXT3_FS=y && EXT3_FS_XATTR
 	default y if EXT4_FS=y
-	default m if EXT2_FS_XATTR || EXT3_FS_XATTR || EXT4_FS
+	default m if EXT2_FS_XATTR || EXT4_FS
 
 source "fs/reiserfs/Kconfig"
 source "fs/jfs/Kconfig"
diff --git a/fs/Makefile b/fs/Makefile
index cb20e4bf2303..09e051fefc5b 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -62,12 +62,10 @@ obj-$(CONFIG_DLM)		+= dlm/
 # Do not add any filesystems before this line
 obj-$(CONFIG_FSCACHE)		+= fscache/
 obj-$(CONFIG_REISERFS_FS)	+= reiserfs/
-obj-$(CONFIG_EXT3_FS)		+= ext3/ # Before ext2 so root fs can be ext3
 obj-$(CONFIG_EXT2_FS)		+= ext2/
 # We place ext4 after ext2 so plain ext2 root fs's are mounted using ext2
 # unless explicitly requested by rootfstype
 obj-$(CONFIG_EXT4_FS)		+= ext4/
-obj-$(CONFIG_JBD)		+= jbd/
 obj-$(CONFIG_JBD2)		+= jbd2/
 obj-$(CONFIG_CRAMFS)		+= cramfs/
 obj-$(CONFIG_SQUASHFS)		+= squashfs/
diff --git a/fs/ext3/Kconfig b/fs/ext3/Kconfig
deleted file mode 100644
index e8c6ba0e4a3e..000000000000
--- a/fs/ext3/Kconfig
+++ /dev/null
@@ -1,89 +0,0 @@
-config EXT3_FS
-	tristate "Ext3 journalling file system support"
-	select JBD
-	help
-	  This is the journalling version of the Second extended file system
-	  (often called ext3), the de facto standard Linux file system
-	  (method to organize files on a storage device) for hard disks.
-
-	  The journalling code included in this driver means you do not have
-	  to run e2fsck (file system checker) on your file systems after a
-	  crash.  The journal keeps track of any changes that were being made
-	  at the time the system crashed, and can ensure that your file system
-	  is consistent without the need for a lengthy check.
-
-	  Other than adding the journal to the file system, the on-disk format
-	  of ext3 is identical to ext2.  It is possible to freely switch
-	  between using the ext3 driver and the ext2 driver, as long as the
-	  file system has been cleanly unmounted, or e2fsck is run on the file
-	  system.
-
-	  To add a journal on an existing ext2 file system or change the
-	  behavior of ext3 file systems, you can use the tune2fs utility ("man
-	  tune2fs").  To modify attributes of files and directories on ext3
-	  file systems, use chattr ("man chattr").  You need to be using
-	  e2fsprogs version 1.20 or later in order to create ext3 journals
-	  (available at <http://sourceforge.net/projects/e2fsprogs/>).
-
-	  To compile this file system support as a module, choose M here: the
-	  module will be called ext3.
-
-config EXT3_DEFAULTS_TO_ORDERED
-	bool "Default to 'data=ordered' in ext3"
-	depends on EXT3_FS
-	default y
-	help
-	  The journal mode options for ext3 have different tradeoffs
-	  between when data is guaranteed to be on disk and
-	  performance.	The use of "data=writeback" can cause
-	  unwritten data to appear in files after an system crash or
-	  power failure, which can be a security issue.	 However,
-	  "data=ordered" mode can also result in major performance
-	  problems, including seconds-long delays before an fsync()
-	  call returns.	 For details, see:
-
-	  http://ext4.wiki.kernel.org/index.php/Ext3_data_mode_tradeoffs
-
-	  If you have been historically happy with ext3's performance,
-	  data=ordered mode will be a safe choice and you should
-	  answer 'y' here.  If you understand the reliability and data
-	  privacy issues of data=writeback and are willing to make
-	  that trade off, answer 'n'.
-
-config EXT3_FS_XATTR
-	bool "Ext3 extended attributes"
-	depends on EXT3_FS
-	default y
-	help
-	  Extended attributes are name:value pairs associated with inodes by
-	  the kernel or by users (see the attr(5) manual page, or visit
-	  <http://acl.bestbits.at/> for details).
-
-	  If unsure, say N.
-
-	  You need this for POSIX ACL support on ext3.
-
-config EXT3_FS_POSIX_ACL
-	bool "Ext3 POSIX Access Control Lists"
-	depends on EXT3_FS_XATTR
-	select FS_POSIX_ACL
-	help
-	  Posix Access Control Lists (ACLs) support permissions for users and
-	  groups beyond the owner/group/world scheme.
-
-	  To learn more about Access Control Lists, visit the Posix ACLs for
-	  Linux website <http://acl.bestbits.at/>.
-
-	  If you don't know what Access Control Lists are, say N
-
-config EXT3_FS_SECURITY
-	bool "Ext3 Security Labels"
-	depends on EXT3_FS_XATTR
-	help
-	  Security labels support alternative access control models
-	  implemented by security modules like SELinux.  This option
-	  enables an extended attribute handler for file security
-	  labels in the ext3 filesystem.
-
-	  If you are not using a security module that requires using
-	  extended attributes for file security labels, say N.
diff --git a/fs/ext3/Makefile b/fs/ext3/Makefile
deleted file mode 100644
index e77766a8b3f0..000000000000
--- a/fs/ext3/Makefile
+++ /dev/null
@@ -1,12 +0,0 @@
-#
-# Makefile for the linux ext3-filesystem routines.
-#
-
-obj-$(CONFIG_EXT3_FS) += ext3.o
-
-ext3-y	:= balloc.o bitmap.o dir.o file.o fsync.o ialloc.o inode.o \
-	   ioctl.o namei.o super.o symlink.o hash.o resize.o ext3_jbd.o
-
-ext3-$(CONFIG_EXT3_FS_XATTR)	 += xattr.o xattr_user.o xattr_trusted.o
-ext3-$(CONFIG_EXT3_FS_POSIX_ACL) += acl.o
-ext3-$(CONFIG_EXT3_FS_SECURITY)	 += xattr_security.o
diff --git a/fs/ext3/acl.c b/fs/ext3/acl.c
deleted file mode 100644
index 8bbaf5bcf982..000000000000
--- a/fs/ext3/acl.c
+++ /dev/null
@@ -1,281 +0,0 @@
-/*
- * linux/fs/ext3/acl.c
- *
- * Copyright (C) 2001-2003 Andreas Gruenbacher, <agruen@suse.de>
- */
-
-#include "ext3.h"
-#include "xattr.h"
-#include "acl.h"
-
-/*
- * Convert from filesystem to in-memory representation.
- */
-static struct posix_acl *
-ext3_acl_from_disk(const void *value, size_t size)
-{
-	const char *end = (char *)value + size;
-	int n, count;
-	struct posix_acl *acl;
-
-	if (!value)
-		return NULL;
-	if (size < sizeof(ext3_acl_header))
-		 return ERR_PTR(-EINVAL);
-	if (((ext3_acl_header *)value)->a_version !=
-	    cpu_to_le32(EXT3_ACL_VERSION))
-		return ERR_PTR(-EINVAL);
-	value = (char *)value + sizeof(ext3_acl_header);
-	count = ext3_acl_count(size);
-	if (count < 0)
-		return ERR_PTR(-EINVAL);
-	if (count == 0)
-		return NULL;
-	acl = posix_acl_alloc(count, GFP_NOFS);
-	if (!acl)
-		return ERR_PTR(-ENOMEM);
-	for (n=0; n < count; n++) {
-		ext3_acl_entry *entry =
-			(ext3_acl_entry *)value;
-		if ((char *)value + sizeof(ext3_acl_entry_short) > end)
-			goto fail;
-		acl->a_entries[n].e_tag  = le16_to_cpu(entry->e_tag);
-		acl->a_entries[n].e_perm = le16_to_cpu(entry->e_perm);
-		switch(acl->a_entries[n].e_tag) {
-			case ACL_USER_OBJ:
-			case ACL_GROUP_OBJ:
-			case ACL_MASK:
-			case ACL_OTHER:
-				value = (char *)value +
-					sizeof(ext3_acl_entry_short);
-				break;
-
-			case ACL_USER:
-				value = (char *)value + sizeof(ext3_acl_entry);
-				if ((char *)value > end)
-					goto fail;
-				acl->a_entries[n].e_uid =
-					make_kuid(&init_user_ns,
-						  le32_to_cpu(entry->e_id));
-				break;
-			case ACL_GROUP:
-				value = (char *)value + sizeof(ext3_acl_entry);
-				if ((char *)value > end)
-					goto fail;
-				acl->a_entries[n].e_gid =
-					make_kgid(&init_user_ns,
-						  le32_to_cpu(entry->e_id));
-				break;
-
-			default:
-				goto fail;
-		}
-	}
-	if (value != end)
-		goto fail;
-	return acl;
-
-fail:
-	posix_acl_release(acl);
-	return ERR_PTR(-EINVAL);
-}
-
-/*
- * Convert from in-memory to filesystem representation.
- */
-static void *
-ext3_acl_to_disk(const struct posix_acl *acl, size_t *size)
-{
-	ext3_acl_header *ext_acl;
-	char *e;
-	size_t n;
-
-	*size = ext3_acl_size(acl->a_count);
-	ext_acl = kmalloc(sizeof(ext3_acl_header) + acl->a_count *
-			sizeof(ext3_acl_entry), GFP_NOFS);
-	if (!ext_acl)
-		return ERR_PTR(-ENOMEM);
-	ext_acl->a_version = cpu_to_le32(EXT3_ACL_VERSION);
-	e = (char *)ext_acl + sizeof(ext3_acl_header);
-	for (n=0; n < acl->a_count; n++) {
-		const struct posix_acl_entry *acl_e = &acl->a_entries[n];
-		ext3_acl_entry *entry = (ext3_acl_entry *)e;
-		entry->e_tag  = cpu_to_le16(acl_e->e_tag);
-		entry->e_perm = cpu_to_le16(acl_e->e_perm);
-		switch(acl_e->e_tag) {
-			case ACL_USER:
-				entry->e_id = cpu_to_le32(
-					from_kuid(&init_user_ns, acl_e->e_uid));
-				e += sizeof(ext3_acl_entry);
-				break;
-			case ACL_GROUP:
-				entry->e_id = cpu_to_le32(
-					from_kgid(&init_user_ns, acl_e->e_gid));
-				e += sizeof(ext3_acl_entry);
-				break;
-
-			case ACL_USER_OBJ:
-			case ACL_GROUP_OBJ:
-			case ACL_MASK:
-			case ACL_OTHER:
-				e += sizeof(ext3_acl_entry_short);
-				break;
-
-			default:
-				goto fail;
-		}
-	}
-	return (char *)ext_acl;
-
-fail:
-	kfree(ext_acl);
-	return ERR_PTR(-EINVAL);
-}
-
-/*
- * Inode operation get_posix_acl().
- *
- * inode->i_mutex: don't care
- */
-struct posix_acl *
-ext3_get_acl(struct inode *inode, int type)
-{
-	int name_index;
-	char *value = NULL;
-	struct posix_acl *acl;
-	int retval;
-
-	switch (type) {
-	case ACL_TYPE_ACCESS:
-		name_index = EXT3_XATTR_INDEX_POSIX_ACL_ACCESS;
-		break;
-	case ACL_TYPE_DEFAULT:
-		name_index = EXT3_XATTR_INDEX_POSIX_ACL_DEFAULT;
-		break;
-	default:
-		BUG();
-	}
-
-	retval = ext3_xattr_get(inode, name_index, "", NULL, 0);
-	if (retval > 0) {
-		value = kmalloc(retval, GFP_NOFS);
-		if (!value)
-			return ERR_PTR(-ENOMEM);
-		retval = ext3_xattr_get(inode, name_index, "", value, retval);
-	}
-	if (retval > 0)
-		acl = ext3_acl_from_disk(value, retval);
-	else if (retval == -ENODATA || retval == -ENOSYS)
-		acl = NULL;
-	else
-		acl = ERR_PTR(retval);
-	kfree(value);
-
-	if (!IS_ERR(acl))
-		set_cached_acl(inode, type, acl);
-
-	return acl;
-}
-
-/*
- * Set the access or default ACL of an inode.
- *
- * inode->i_mutex: down unless called from ext3_new_inode
- */
-static int
-__ext3_set_acl(handle_t *handle, struct inode *inode, int type,
-	     struct posix_acl *acl)
-{
-	int name_index;
-	void *value = NULL;
-	size_t size = 0;
-	int error;
-
-	switch(type) {
-		case ACL_TYPE_ACCESS:
-			name_index = EXT3_XATTR_INDEX_POSIX_ACL_ACCESS;
-			if (acl) {
-				error = posix_acl_equiv_mode(acl, &inode->i_mode);
-				if (error < 0)
-					return error;
-				else {
-					inode->i_ctime = CURRENT_TIME_SEC;
-					ext3_mark_inode_dirty(handle, inode);
-					if (error == 0)
-						acl = NULL;
-				}
-			}
-			break;
-
-		case ACL_TYPE_DEFAULT:
-			name_index = EXT3_XATTR_INDEX_POSIX_ACL_DEFAULT;
-			if (!S_ISDIR(inode->i_mode))
-				return acl ? -EACCES : 0;
-			break;
-
-		default:
-			return -EINVAL;
-	}
-	if (acl) {
-		value = ext3_acl_to_disk(acl, &size);
-		if (IS_ERR(value))
-			return (int)PTR_ERR(value);
-	}
-
-	error = ext3_xattr_set_handle(handle, inode, name_index, "",
-				      value, size, 0);
-
-	kfree(value);
-
-	if (!error)
-		set_cached_acl(inode, type, acl);
-
-	return error;
-}
-
-int
-ext3_set_acl(struct inode *inode, struct posix_acl *acl, int type)
-{
-	handle_t *handle;
-	int error, retries = 0;
-
-retry:
-	handle = ext3_journal_start(inode, EXT3_DATA_TRANS_BLOCKS(inode->i_sb));
-	if (IS_ERR(handle))
-		return PTR_ERR(handle);
-	error = __ext3_set_acl(handle, inode, type, acl);
-	ext3_journal_stop(handle);
-	if (error == -ENOSPC && ext3_should_retry_alloc(inode->i_sb, &retries))
-		goto retry;
-	return error;
-}
-
-/*
- * Initialize the ACLs of a new inode. Called from ext3_new_inode.
- *
- * dir->i_mutex: down
- * inode->i_mutex: up (access to inode is still exclusive)
- */
-int
-ext3_init_acl(handle_t *handle, struct inode *inode, struct inode *dir)
-{
-	struct posix_acl *default_acl, *acl;
-	int error;
-
-	error = posix_acl_create(dir, &inode->i_mode, &default_acl, &acl);
-	if (error)
-		return error;
-
-	if (default_acl) {
-		error = __ext3_set_acl(handle, inode, ACL_TYPE_DEFAULT,
-				       default_acl);
-		posix_acl_release(default_acl);
-	}
-	if (acl) {
-		if (!error)
-			error = __ext3_set_acl(handle, inode, ACL_TYPE_ACCESS,
-					       acl);
-		posix_acl_release(acl);
-	}
-	return error;
-}
diff --git a/fs/ext3/acl.h b/fs/ext3/acl.h
deleted file mode 100644
index ea1c69edab9e..000000000000
--- a/fs/ext3/acl.h
+++ /dev/null
@@ -1,72 +0,0 @@
-/*
-  File: fs/ext3/acl.h
-
-  (C) 2001 Andreas Gruenbacher, <a.gruenbacher@computer.org>
-*/
-
-#include <linux/posix_acl_xattr.h>
-
-#define EXT3_ACL_VERSION	0x0001
-
-typedef struct {
-	__le16		e_tag;
-	__le16		e_perm;
-	__le32		e_id;
-} ext3_acl_entry;
-
-typedef struct {
-	__le16		e_tag;
-	__le16		e_perm;
-} ext3_acl_entry_short;
-
-typedef struct {
-	__le32		a_version;
-} ext3_acl_header;
-
-static inline size_t ext3_acl_size(int count)
-{
-	if (count <= 4) {
-		return sizeof(ext3_acl_header) +
-		       count * sizeof(ext3_acl_entry_short);
-	} else {
-		return sizeof(ext3_acl_header) +
-		       4 * sizeof(ext3_acl_entry_short) +
-		       (count - 4) * sizeof(ext3_acl_entry);
-	}
-}
-
-static inline int ext3_acl_count(size_t size)
-{
-	ssize_t s;
-	size -= sizeof(ext3_acl_header);
-	s = size - 4 * sizeof(ext3_acl_entry_short);
-	if (s < 0) {
-		if (size % sizeof(ext3_acl_entry_short))
-			return -1;
-		return size / sizeof(ext3_acl_entry_short);
-	} else {
-		if (s % sizeof(ext3_acl_entry))
-			return -1;
-		return s / sizeof(ext3_acl_entry) + 4;
-	}
-}
-
-#ifdef CONFIG_EXT3_FS_POSIX_ACL
-
-/* acl.c */
-extern struct posix_acl *ext3_get_acl(struct inode *inode, int type);
-extern int ext3_set_acl(struct inode *inode, struct posix_acl *acl, int type);
-extern int ext3_init_acl (handle_t *, struct inode *, struct inode *);
-
-#else  /* CONFIG_EXT3_FS_POSIX_ACL */
-#include <linux/sched.h>
-#define ext3_get_acl NULL
-#define ext3_set_acl NULL
-
-static inline int
-ext3_init_acl(handle_t *handle, struct inode *inode, struct inode *dir)
-{
-	return 0;
-}
-#endif  /* CONFIG_EXT3_FS_POSIX_ACL */
-
diff --git a/fs/ext3/balloc.c b/fs/ext3/balloc.c
deleted file mode 100644
index 158b5d4ce067..000000000000
--- a/fs/ext3/balloc.c
+++ /dev/null
@@ -1,2158 +0,0 @@
-/*
- *  linux/fs/ext3/balloc.c
- *
- * Copyright (C) 1992, 1993, 1994, 1995
- * Remy Card (card@masi.ibp.fr)
- * Laboratoire MASI - Institut Blaise Pascal
- * Universite Pierre et Marie Curie (Paris VI)
- *
- *  Enhanced block allocation by Stephen Tweedie (sct@redhat.com), 1993
- *  Big-endian to little-endian byte-swapping/bitmaps by
- *        David S. Miller (davem@caip.rutgers.edu), 1995
- */
-
-#include <linux/quotaops.h>
-#include <linux/blkdev.h>
-#include "ext3.h"
-
-/*
- * balloc.c contains the blocks allocation and deallocation routines
- */
-
-/*
- * The free blocks are managed by bitmaps.  A file system contains several
- * blocks groups.  Each group contains 1 bitmap block for blocks, 1 bitmap
- * block for inodes, N blocks for the inode table and data blocks.
- *
- * The file system contains group descriptors which are located after the
- * super block.  Each descriptor contains the number of the bitmap block and
- * the free blocks count in the block.  The descriptors are loaded in memory
- * when a file system is mounted (see ext3_fill_super).
- */
-
-
-#define in_range(b, first, len)	((b) >= (first) && (b) <= (first) + (len) - 1)
-
-/*
- * Calculate the block group number and offset, given a block number
- */
-static void ext3_get_group_no_and_offset(struct super_block *sb,
-	ext3_fsblk_t blocknr, unsigned long *blockgrpp, ext3_grpblk_t *offsetp)
-{
-	struct ext3_super_block *es = EXT3_SB(sb)->s_es;
-
-	blocknr = blocknr - le32_to_cpu(es->s_first_data_block);
-	if (offsetp)
-		*offsetp = blocknr % EXT3_BLOCKS_PER_GROUP(sb);
-	if (blockgrpp)
-		*blockgrpp = blocknr / EXT3_BLOCKS_PER_GROUP(sb);
-}
-
-/**
- * ext3_get_group_desc() -- load group descriptor from disk
- * @sb:			super block
- * @block_group:	given block group
- * @bh:			pointer to the buffer head to store the block
- *			group descriptor
- */
-struct ext3_group_desc * ext3_get_group_desc(struct super_block * sb,
-					     unsigned int block_group,
-					     struct buffer_head ** bh)
-{
-	unsigned long group_desc;
-	unsigned long offset;
-	struct ext3_group_desc * desc;
-	struct ext3_sb_info *sbi = EXT3_SB(sb);
-
-	if (block_group >= sbi->s_groups_count) {
-		ext3_error (sb, "ext3_get_group_desc",
-			    "block_group >= groups_count - "
-			    "block_group = %d, groups_count = %lu",
-			    block_group, sbi->s_groups_count);
-
-		return NULL;
-	}
-	smp_rmb();
-
-	group_desc = block_group >> EXT3_DESC_PER_BLOCK_BITS(sb);
-	offset = block_group & (EXT3_DESC_PER_BLOCK(sb) - 1);
-	if (!sbi->s_group_desc[group_desc]) {
-		ext3_error (sb, "ext3_get_group_desc",
-			    "Group descriptor not loaded - "
-			    "block_group = %d, group_desc = %lu, desc = %lu",
-			     block_group, group_desc, offset);
-		return NULL;
-	}
-
-	desc = (struct ext3_group_desc *) sbi->s_group_desc[group_desc]->b_data;
-	if (bh)
-		*bh = sbi->s_group_desc[group_desc];
-	return desc + offset;
-}
-
-static int ext3_valid_block_bitmap(struct super_block *sb,
-					struct ext3_group_desc *desc,
-					unsigned int block_group,
-					struct buffer_head *bh)
-{
-	ext3_grpblk_t offset;
-	ext3_grpblk_t next_zero_bit;
-	ext3_fsblk_t bitmap_blk;
-	ext3_fsblk_t group_first_block;
-
-	group_first_block = ext3_group_first_block_no(sb, block_group);
-
-	/* check whether block bitmap block number is set */
-	bitmap_blk = le32_to_cpu(desc->bg_block_bitmap);
-	offset = bitmap_blk - group_first_block;
-	if (!ext3_test_bit(offset, bh->b_data))
-		/* bad block bitmap */
-		goto err_out;
-
-	/* check whether the inode bitmap block number is set */
-	bitmap_blk = le32_to_cpu(desc->bg_inode_bitmap);
-	offset = bitmap_blk - group_first_block;
-	if (!ext3_test_bit(offset, bh->b_data))
-		/* bad block bitmap */
-		goto err_out;
-
-	/* check whether the inode table block number is set */
-	bitmap_blk = le32_to_cpu(desc->bg_inode_table);
-	offset = bitmap_blk - group_first_block;
-	next_zero_bit = ext3_find_next_zero_bit(bh->b_data,
-				offset + EXT3_SB(sb)->s_itb_per_group,
-				offset);
-	if (next_zero_bit >= offset + EXT3_SB(sb)->s_itb_per_group)
-		/* good bitmap for inode tables */
-		return 1;
-
-err_out:
-	ext3_error(sb, __func__,
-			"Invalid block bitmap - "
-			"block_group = %d, block = %lu",
-			block_group, bitmap_blk);
-	return 0;
-}
-
-/**
- * read_block_bitmap()
- * @sb:			super block
- * @block_group:	given block group
- *
- * Read the bitmap for a given block_group,and validate the
- * bits for block/inode/inode tables are set in the bitmaps
- *
- * Return buffer_head on success or NULL in case of failure.
- */
-static struct buffer_head *
-read_block_bitmap(struct super_block *sb, unsigned int block_group)
-{
-	struct ext3_group_desc * desc;
-	struct buffer_head * bh = NULL;
-	ext3_fsblk_t bitmap_blk;
-
-	desc = ext3_get_group_desc(sb, block_group, NULL);
-	if (!desc)
-		return NULL;
-	trace_ext3_read_block_bitmap(sb, block_group);
-	bitmap_blk = le32_to_cpu(desc->bg_block_bitmap);
-	bh = sb_getblk(sb, bitmap_blk);
-	if (unlikely(!bh)) {
-		ext3_error(sb, __func__,
-			    "Cannot read block bitmap - "
-			    "block_group = %d, block_bitmap = %u",
-			    block_group, le32_to_cpu(desc->bg_block_bitmap));
-		return NULL;
-	}
-	if (likely(bh_uptodate_or_lock(bh)))
-		return bh;
-
-	if (bh_submit_read(bh) < 0) {
-		brelse(bh);
-		ext3_error(sb, __func__,
-			    "Cannot read block bitmap - "
-			    "block_group = %d, block_bitmap = %u",
-			    block_group, le32_to_cpu(desc->bg_block_bitmap));
-		return NULL;
-	}
-	ext3_valid_block_bitmap(sb, desc, block_group, bh);
-	/*
-	 * file system mounted not to panic on error, continue with corrupt
-	 * bitmap
-	 */
-	return bh;
-}
-/*
- * The reservation window structure operations
- * --------------------------------------------
- * Operations include:
- * dump, find, add, remove, is_empty, find_next_reservable_window, etc.
- *
- * We use a red-black tree to represent per-filesystem reservation
- * windows.
- *
- */
-
-/**
- * __rsv_window_dump() -- Dump the filesystem block allocation reservation map
- * @rb_root:		root of per-filesystem reservation rb tree
- * @verbose:		verbose mode
- * @fn:			function which wishes to dump the reservation map
- *
- * If verbose is turned on, it will print the whole block reservation
- * windows(start, end).	Otherwise, it will only print out the "bad" windows,
- * those windows that overlap with their immediate neighbors.
- */
-#if 1
-static void __rsv_window_dump(struct rb_root *root, int verbose,
-			      const char *fn)
-{
-	struct rb_node *n;
-	struct ext3_reserve_window_node *rsv, *prev;
-	int bad;
-
-restart:
-	n = rb_first(root);
-	bad = 0;
-	prev = NULL;
-
-	printk("Block Allocation Reservation Windows Map (%s):\n", fn);
-	while (n) {
-		rsv = rb_entry(n, struct ext3_reserve_window_node, rsv_node);
-		if (verbose)
-			printk("reservation window 0x%p "
-			       "start:  %lu, end:  %lu\n",
-			       rsv, rsv->rsv_start, rsv->rsv_end);
-		if (rsv->rsv_start && rsv->rsv_start >= rsv->rsv_end) {
-			printk("Bad reservation %p (start >= end)\n",
-			       rsv);
-			bad = 1;
-		}
-		if (prev && prev->rsv_end >= rsv->rsv_start) {
-			printk("Bad reservation %p (prev->end >= start)\n",
-			       rsv);
-			bad = 1;
-		}
-		if (bad) {
-			if (!verbose) {
-				printk("Restarting reservation walk in verbose mode\n");
-				verbose = 1;
-				goto restart;
-			}
-		}
-		n = rb_next(n);
-		prev = rsv;
-	}
-	printk("Window map complete.\n");
-	BUG_ON(bad);
-}
-#define rsv_window_dump(root, verbose) \
-	__rsv_window_dump((root), (verbose), __func__)
-#else
-#define rsv_window_dump(root, verbose) do {} while (0)
-#endif
-
-/**
- * goal_in_my_reservation()
- * @rsv:		inode's reservation window
- * @grp_goal:		given goal block relative to the allocation block group
- * @group:		the current allocation block group
- * @sb:			filesystem super block
- *
- * Test if the given goal block (group relative) is within the file's
- * own block reservation window range.
- *
- * If the reservation window is outside the goal allocation group, return 0;
- * grp_goal (given goal block) could be -1, which means no specific
- * goal block. In this case, always return 1.
- * If the goal block is within the reservation window, return 1;
- * otherwise, return 0;
- */
-static int
-goal_in_my_reservation(struct ext3_reserve_window *rsv, ext3_grpblk_t grp_goal,
-			unsigned int group, struct super_block * sb)
-{
-	ext3_fsblk_t group_first_block, group_last_block;
-
-	group_first_block = ext3_group_first_block_no(sb, group);
-	group_last_block = group_first_block + (EXT3_BLOCKS_PER_GROUP(sb) - 1);
-
-	if ((rsv->_rsv_start > group_last_block) ||
-	    (rsv->_rsv_end < group_first_block))
-		return 0;
-	if ((grp_goal >= 0) && ((grp_goal + group_first_block < rsv->_rsv_start)
-		|| (grp_goal + group_first_block > rsv->_rsv_end)))
-		return 0;
-	return 1;
-}
-
-/**
- * search_reserve_window()
- * @rb_root:		root of reservation tree
- * @goal:		target allocation block
- *
- * Find the reserved window which includes the goal, or the previous one
- * if the goal is not in any window.
- * Returns NULL if there are no windows or if all windows start after the goal.
- */
-static struct ext3_reserve_window_node *
-search_reserve_window(struct rb_root *root, ext3_fsblk_t goal)
-{
-	struct rb_node *n = root->rb_node;
-	struct ext3_reserve_window_node *rsv;
-
-	if (!n)
-		return NULL;
-
-	do {
-		rsv = rb_entry(n, struct ext3_reserve_window_node, rsv_node);
-
-		if (goal < rsv->rsv_start)
-			n = n->rb_left;
-		else if (goal > rsv->rsv_end)
-			n = n->rb_right;
-		else
-			return rsv;
-	} while (n);
-	/*
-	 * We've fallen off the end of the tree: the goal wasn't inside
-	 * any particular node.  OK, the previous node must be to one
-	 * side of the interval containing the goal.  If it's the RHS,
-	 * we need to back up one.
-	 */
-	if (rsv->rsv_start > goal) {
-		n = rb_prev(&rsv->rsv_node);
-		rsv = rb_entry(n, struct ext3_reserve_window_node, rsv_node);
-	}
-	return rsv;
-}
-
-/**
- * ext3_rsv_window_add() -- Insert a window to the block reservation rb tree.
- * @sb:			super block
- * @rsv:		reservation window to add
- *
- * Must be called with rsv_lock hold.
- */
-void ext3_rsv_window_add(struct super_block *sb,
-		    struct ext3_reserve_window_node *rsv)
-{
-	struct rb_root *root = &EXT3_SB(sb)->s_rsv_window_root;
-	struct rb_node *node = &rsv->rsv_node;
-	ext3_fsblk_t start = rsv->rsv_start;
-
-	struct rb_node ** p = &root->rb_node;
-	struct rb_node * parent = NULL;
-	struct ext3_reserve_window_node *this;
-
-	trace_ext3_rsv_window_add(sb, rsv);
-	while (*p)
-	{
-		parent = *p;
-		this = rb_entry(parent, struct ext3_reserve_window_node, rsv_node);
-
-		if (start < this->rsv_start)
-			p = &(*p)->rb_left;
-		else if (start > this->rsv_end)
-			p = &(*p)->rb_right;
-		else {
-			rsv_window_dump(root, 1);
-			BUG();
-		}
-	}
-
-	rb_link_node(node, parent, p);
-	rb_insert_color(node, root);
-}
-
-/**
- * ext3_rsv_window_remove() -- unlink a window from the reservation rb tree
- * @sb:			super block
- * @rsv:		reservation window to remove
- *
- * Mark the block reservation window as not allocated, and unlink it
- * from the filesystem reservation window rb tree. Must be called with
- * rsv_lock hold.
- */
-static void rsv_window_remove(struct super_block *sb,
-			      struct ext3_reserve_window_node *rsv)
-{
-	rsv->rsv_start = EXT3_RESERVE_WINDOW_NOT_ALLOCATED;
-	rsv->rsv_end = EXT3_RESERVE_WINDOW_NOT_ALLOCATED;
-	rsv->rsv_alloc_hit = 0;
-	rb_erase(&rsv->rsv_node, &EXT3_SB(sb)->s_rsv_window_root);
-}
-
-/*
- * rsv_is_empty() -- Check if the reservation window is allocated.
- * @rsv:		given reservation window to check
- *
- * returns 1 if the end block is EXT3_RESERVE_WINDOW_NOT_ALLOCATED.
- */
-static inline int rsv_is_empty(struct ext3_reserve_window *rsv)
-{
-	/* a valid reservation end block could not be 0 */
-	return rsv->_rsv_end == EXT3_RESERVE_WINDOW_NOT_ALLOCATED;
-}
-
-/**
- * ext3_init_block_alloc_info()
- * @inode:		file inode structure
- *
- * Allocate and initialize the	reservation window structure, and
- * link the window to the ext3 inode structure at last
- *
- * The reservation window structure is only dynamically allocated
- * and linked to ext3 inode the first time the open file
- * needs a new block. So, before every ext3_new_block(s) call, for
- * regular files, we should check whether the reservation window
- * structure exists or not. In the latter case, this function is called.
- * Fail to do so will result in block reservation being turned off for that
- * open file.
- *
- * This function is called from ext3_get_blocks_handle(), also called
- * when setting the reservation window size through ioctl before the file
- * is open for write (needs block allocation).
- *
- * Needs truncate_mutex protection prior to call this function.
- */
-void ext3_init_block_alloc_info(struct inode *inode)
-{
-	struct ext3_inode_info *ei = EXT3_I(inode);
-	struct ext3_block_alloc_info *block_i;
-	struct super_block *sb = inode->i_sb;
-
-	block_i = kmalloc(sizeof(*block_i), GFP_NOFS);
-	if (block_i) {
-		struct ext3_reserve_window_node *rsv = &block_i->rsv_window_node;
-
-		rsv->rsv_start = EXT3_RESERVE_WINDOW_NOT_ALLOCATED;
-		rsv->rsv_end = EXT3_RESERVE_WINDOW_NOT_ALLOCATED;
-
-		/*
-		 * if filesystem is mounted with NORESERVATION, the goal
-		 * reservation window size is set to zero to indicate
-		 * block reservation is off
-		 */
-		if (!test_opt(sb, RESERVATION))
-			rsv->rsv_goal_size = 0;
-		else
-			rsv->rsv_goal_size = EXT3_DEFAULT_RESERVE_BLOCKS;
-		rsv->rsv_alloc_hit = 0;
-		block_i->last_alloc_logical_block = 0;
-		block_i->last_alloc_physical_block = 0;
-	}
-	ei->i_block_alloc_info = block_i;
-}
-
-/**
- * ext3_discard_reservation()
- * @inode:		inode
- *
- * Discard(free) block reservation window on last file close, or truncate
- * or at last iput().
- *
- * It is being called in three cases:
- *	ext3_release_file(): last writer close the file
- *	ext3_clear_inode(): last iput(), when nobody link to this file.
- *	ext3_truncate(): when the block indirect map is about to change.
- *
- */
-void ext3_discard_reservation(struct inode *inode)
-{
-	struct ext3_inode_info *ei = EXT3_I(inode);
-	struct ext3_block_alloc_info *block_i = ei->i_block_alloc_info;
-	struct ext3_reserve_window_node *rsv;
-	spinlock_t *rsv_lock = &EXT3_SB(inode->i_sb)->s_rsv_window_lock;
-
-	if (!block_i)
-		return;
-
-	rsv = &block_i->rsv_window_node;
-	if (!rsv_is_empty(&rsv->rsv_window)) {
-		spin_lock(rsv_lock);
-		if (!rsv_is_empty(&rsv->rsv_window)) {
-			trace_ext3_discard_reservation(inode, rsv);
-			rsv_window_remove(inode->i_sb, rsv);
-		}
-		spin_unlock(rsv_lock);
-	}
-}
-
-/**
- * ext3_free_blocks_sb() -- Free given blocks and update quota
- * @handle:			handle to this transaction
- * @sb:				super block
- * @block:			start physical block to free
- * @count:			number of blocks to free
- * @pdquot_freed_blocks:	pointer to quota
- */
-void ext3_free_blocks_sb(handle_t *handle, struct super_block *sb,
-			 ext3_fsblk_t block, unsigned long count,
-			 unsigned long *pdquot_freed_blocks)
-{
-	struct buffer_head *bitmap_bh = NULL;
-	struct buffer_head *gd_bh;
-	unsigned long block_group;
-	ext3_grpblk_t bit;
-	unsigned long i;
-	unsigned long overflow;
-	struct ext3_group_desc * desc;
-	struct ext3_super_block * es;
-	struct ext3_sb_info *sbi;
-	int err = 0, ret;
-	ext3_grpblk_t group_freed;
-
-	*pdquot_freed_blocks = 0;
-	sbi = EXT3_SB(sb);
-	es = sbi->s_es;
-	if (block < le32_to_cpu(es->s_first_data_block) ||
-	    block + count < block ||
-	    block + count > le32_to_cpu(es->s_blocks_count)) {
-		ext3_error (sb, "ext3_free_blocks",
-			    "Freeing blocks not in datazone - "
-			    "block = "E3FSBLK", count = %lu", block, count);
-		goto error_return;
-	}
-
-	ext3_debug ("freeing block(s) %lu-%lu\n", block, block + count - 1);
-
-do_more:
-	overflow = 0;
-	block_group = (block - le32_to_cpu(es->s_first_data_block)) /
-		      EXT3_BLOCKS_PER_GROUP(sb);
-	bit = (block - le32_to_cpu(es->s_first_data_block)) %
-		      EXT3_BLOCKS_PER_GROUP(sb);
-	/*
-	 * Check to see if we are freeing blocks across a group
-	 * boundary.
-	 */
-	if (bit + count > EXT3_BLOCKS_PER_GROUP(sb)) {
-		overflow = bit + count - EXT3_BLOCKS_PER_GROUP(sb);
-		count -= overflow;
-	}
-	brelse(bitmap_bh);
-	bitmap_bh = read_block_bitmap(sb, block_group);
-	if (!bitmap_bh)
-		goto error_return;
-	desc = ext3_get_group_desc (sb, block_group, &gd_bh);
-	if (!desc)
-		goto error_return;
-
-	if (in_range (le32_to_cpu(desc->bg_block_bitmap), block, count) ||
-	    in_range (le32_to_cpu(desc->bg_inode_bitmap), block, count) ||
-	    in_range (block, le32_to_cpu(desc->bg_inode_table),
-		      sbi->s_itb_per_group) ||
-	    in_range (block + count - 1, le32_to_cpu(desc->bg_inode_table),
-		      sbi->s_itb_per_group)) {
-		ext3_error (sb, "ext3_free_blocks",
-			    "Freeing blocks in system zones - "
-			    "Block = "E3FSBLK", count = %lu",
-			    block, count);
-		goto error_return;
-	}
-
-	/*
-	 * We are about to start releasing blocks in the bitmap,
-	 * so we need undo access.
-	 */
-	/* @@@ check errors */
-	BUFFER_TRACE(bitmap_bh, "getting undo access");
-	err = ext3_journal_get_undo_access(handle, bitmap_bh);
-	if (err)
-		goto error_return;
-
-	/*
-	 * We are about to modify some metadata.  Call the journal APIs
-	 * to unshare ->b_data if a currently-committing transaction is
-	 * using it
-	 */
-	BUFFER_TRACE(gd_bh, "get_write_access");
-	err = ext3_journal_get_write_access(handle, gd_bh);
-	if (err)
-		goto error_return;
-
-	jbd_lock_bh_state(bitmap_bh);
-
-	for (i = 0, group_freed = 0; i < count; i++) {
-		/*
-		 * An HJ special.  This is expensive...
-		 */
-#ifdef CONFIG_JBD_DEBUG
-		jbd_unlock_bh_state(bitmap_bh);
-		{
-			struct buffer_head *debug_bh;
-			debug_bh = sb_find_get_block(sb, block + i);
-			if (debug_bh) {
-				BUFFER_TRACE(debug_bh, "Deleted!");
-				if (!bh2jh(bitmap_bh)->b_committed_data)
-					BUFFER_TRACE(debug_bh,
-						"No committed data in bitmap");
-				BUFFER_TRACE2(debug_bh, bitmap_bh, "bitmap");
-				__brelse(debug_bh);
-			}
-		}
-		jbd_lock_bh_state(bitmap_bh);
-#endif
-		if (need_resched()) {
-			jbd_unlock_bh_state(bitmap_bh);
-			cond_resched();
-			jbd_lock_bh_state(bitmap_bh);
-		}
-		/* @@@ This prevents newly-allocated data from being
-		 * freed and then reallocated within the same
-		 * transaction.
-		 *
-		 * Ideally we would want to allow that to happen, but to
-		 * do so requires making journal_forget() capable of
-		 * revoking the queued write of a data block, which
-		 * implies blocking on the journal lock.  *forget()
-		 * cannot block due to truncate races.
-		 *
-		 * Eventually we can fix this by making journal_forget()
-		 * return a status indicating whether or not it was able
-		 * to revoke the buffer.  On successful revoke, it is
-		 * safe not to set the allocation bit in the committed
-		 * bitmap, because we know that there is no outstanding
-		 * activity on the buffer any more and so it is safe to
-		 * reallocate it.
-		 */
-		BUFFER_TRACE(bitmap_bh, "set in b_committed_data");
-		J_ASSERT_BH(bitmap_bh,
-				bh2jh(bitmap_bh)->b_committed_data != NULL);
-		ext3_set_bit_atomic(sb_bgl_lock(sbi, block_group), bit + i,
-				bh2jh(bitmap_bh)->b_committed_data);
-
-		/*
-		 * We clear the bit in the bitmap after setting the committed
-		 * data bit, because this is the reverse order to that which
-		 * the allocator uses.
-		 */
-		BUFFER_TRACE(bitmap_bh, "clear bit");
-		if (!ext3_clear_bit_atomic(sb_bgl_lock(sbi, block_group),
-						bit + i, bitmap_bh->b_data)) {
-			jbd_unlock_bh_state(bitmap_bh);
-			ext3_error(sb, __func__,
-				"bit already cleared for block "E3FSBLK,
-				 block + i);
-			jbd_lock_bh_state(bitmap_bh);
-			BUFFER_TRACE(bitmap_bh, "bit already cleared");
-		} else {
-			group_freed++;
-		}
-	}
-	jbd_unlock_bh_state(bitmap_bh);
-
-	spin_lock(sb_bgl_lock(sbi, block_group));
-	le16_add_cpu(&desc->bg_free_blocks_count, group_freed);
-	spin_unlock(sb_bgl_lock(sbi, block_group));
-	percpu_counter_add(&sbi->s_freeblocks_counter, count);
-
-	/* We dirtied the bitmap block */
-	BUFFER_TRACE(bitmap_bh, "dirtied bitmap block");
-	err = ext3_journal_dirty_metadata(handle, bitmap_bh);
-
-	/* And the group descriptor block */
-	BUFFER_TRACE(gd_bh, "dirtied group descriptor block");
-	ret = ext3_journal_dirty_metadata(handle, gd_bh);
-	if (!err) err = ret;
-	*pdquot_freed_blocks += group_freed;
-
-	if (overflow && !err) {
-		block += count;
-		count = overflow;
-		goto do_more;
-	}
-
-error_return:
-	brelse(bitmap_bh);
-	ext3_std_error(sb, err);
-	return;
-}
-
-/**
- * ext3_free_blocks() -- Free given blocks and update quota
- * @handle:		handle for this transaction
- * @inode:		inode
- * @block:		start physical block to free
- * @count:		number of blocks to count
- */
-void ext3_free_blocks(handle_t *handle, struct inode *inode,
-			ext3_fsblk_t block, unsigned long count)
-{
-	struct super_block *sb = inode->i_sb;
-	unsigned long dquot_freed_blocks;
-
-	trace_ext3_free_blocks(inode, block, count);
-	ext3_free_blocks_sb(handle, sb, block, count, &dquot_freed_blocks);
-	if (dquot_freed_blocks)
-		dquot_free_block(inode, dquot_freed_blocks);
-	return;
-}
-
-/**
- * ext3_test_allocatable()
- * @nr:			given allocation block group
- * @bh:			bufferhead contains the bitmap of the given block group
- *
- * For ext3 allocations, we must not reuse any blocks which are
- * allocated in the bitmap buffer's "last committed data" copy.  This
- * prevents deletes from freeing up the page for reuse until we have
- * committed the delete transaction.
- *
- * If we didn't do this, then deleting something and reallocating it as
- * data would allow the old block to be overwritten before the
- * transaction committed (because we force data to disk before commit).
- * This would lead to corruption if we crashed between overwriting the
- * data and committing the delete.
- *
- * @@@ We may want to make this allocation behaviour conditional on
- * data-writes at some point, and disable it for metadata allocations or
- * sync-data inodes.
- */
-static int ext3_test_allocatable(ext3_grpblk_t nr, struct buffer_head *bh)
-{
-	int ret;
-	struct journal_head *jh = bh2jh(bh);
-
-	if (ext3_test_bit(nr, bh->b_data))
-		return 0;
-
-	jbd_lock_bh_state(bh);
-	if (!jh->b_committed_data)
-		ret = 1;
-	else
-		ret = !ext3_test_bit(nr, jh->b_committed_data);
-	jbd_unlock_bh_state(bh);
-	return ret;
-}
-
-/**
- * bitmap_search_next_usable_block()
- * @start:		the starting block (group relative) of the search
- * @bh:			bufferhead contains the block group bitmap
- * @maxblocks:		the ending block (group relative) of the reservation
- *
- * The bitmap search --- search forward alternately through the actual
- * bitmap on disk and the last-committed copy in journal, until we find a
- * bit free in both bitmaps.
- */
-static ext3_grpblk_t
-bitmap_search_next_usable_block(ext3_grpblk_t start, struct buffer_head *bh,
-					ext3_grpblk_t maxblocks)
-{
-	ext3_grpblk_t next;
-	struct journal_head *jh = bh2jh(bh);
-
-	while (start < maxblocks) {
-		next = ext3_find_next_zero_bit(bh->b_data, maxblocks, start);
-		if (next >= maxblocks)
-			return -1;
-		if (ext3_test_allocatable(next, bh))
-			return next;
-		jbd_lock_bh_state(bh);
-		if (jh->b_committed_data)
-			start = ext3_find_next_zero_bit(jh->b_committed_data,
-							maxblocks, next);
-		jbd_unlock_bh_state(bh);
-	}
-	return -1;
-}
-
-/**
- * find_next_usable_block()
- * @start:		the starting block (group relative) to find next
- *			allocatable block in bitmap.
- * @bh:			bufferhead contains the block group bitmap
- * @maxblocks:		the ending block (group relative) for the search
- *
- * Find an allocatable block in a bitmap.  We honor both the bitmap and
- * its last-committed copy (if that exists), and perform the "most
- * appropriate allocation" algorithm of looking for a free block near
- * the initial goal; then for a free byte somewhere in the bitmap; then
- * for any free bit in the bitmap.
- */
-static ext3_grpblk_t
-find_next_usable_block(ext3_grpblk_t start, struct buffer_head *bh,
-			ext3_grpblk_t maxblocks)
-{
-	ext3_grpblk_t here, next;
-	char *p, *r;
-
-	if (start > 0) {
-		/*
-		 * The goal was occupied; search forward for a free
-		 * block within the next XX blocks.
-		 *
-		 * end_goal is more or less random, but it has to be
-		 * less than EXT3_BLOCKS_PER_GROUP. Aligning up to the
-		 * next 64-bit boundary is simple..
-		 */
-		ext3_grpblk_t end_goal = (start + 63) & ~63;
-		if (end_goal > maxblocks)
-			end_goal = maxblocks;
-		here = ext3_find_next_zero_bit(bh->b_data, end_goal, start);
-		if (here < end_goal && ext3_test_allocatable(here, bh))
-			return here;
-		ext3_debug("Bit not found near goal\n");
-	}
-
-	here = start;
-	if (here < 0)
-		here = 0;
-
-	p = bh->b_data + (here >> 3);
-	r = memscan(p, 0, ((maxblocks + 7) >> 3) - (here >> 3));
-	next = (r - bh->b_data) << 3;
-
-	if (next < maxblocks && next >= start && ext3_test_allocatable(next, bh))
-		return next;
-
-	/*
-	 * The bitmap search --- search forward alternately through the actual
-	 * bitmap and the last-committed copy until we find a bit free in
-	 * both
-	 */
-	here = bitmap_search_next_usable_block(here, bh, maxblocks);
-	return here;
-}
-
-/**
- * claim_block()
- * @lock:		the spin lock for this block group
- * @block:		the free block (group relative) to allocate
- * @bh:			the buffer_head contains the block group bitmap
- *
- * We think we can allocate this block in this bitmap.  Try to set the bit.
- * If that succeeds then check that nobody has allocated and then freed the
- * block since we saw that is was not marked in b_committed_data.  If it _was_
- * allocated and freed then clear the bit in the bitmap again and return
- * zero (failure).
- */
-static inline int
-claim_block(spinlock_t *lock, ext3_grpblk_t block, struct buffer_head *bh)
-{
-	struct journal_head *jh = bh2jh(bh);
-	int ret;
-
-	if (ext3_set_bit_atomic(lock, block, bh->b_data))
-		return 0;
-	jbd_lock_bh_state(bh);
-	if (jh->b_committed_data && ext3_test_bit(block,jh->b_committed_data)) {
-		ext3_clear_bit_atomic(lock, block, bh->b_data);
-		ret = 0;
-	} else {
-		ret = 1;
-	}
-	jbd_unlock_bh_state(bh);
-	return ret;
-}
-
-/**
- * ext3_try_to_allocate()
- * @sb:			superblock
- * @handle:		handle to this transaction
- * @group:		given allocation block group
- * @bitmap_bh:		bufferhead holds the block bitmap
- * @grp_goal:		given target block within the group
- * @count:		target number of blocks to allocate
- * @my_rsv:		reservation window
- *
- * Attempt to allocate blocks within a give range. Set the range of allocation
- * first, then find the first free bit(s) from the bitmap (within the range),
- * and at last, allocate the blocks by claiming the found free bit as allocated.
- *
- * To set the range of this allocation:
- *	if there is a reservation window, only try to allocate block(s) from the
- *	file's own reservation window;
- *	Otherwise, the allocation range starts from the give goal block, ends at
- *	the block group's last block.
- *
- * If we failed to allocate the desired block then we may end up crossing to a
- * new bitmap.  In that case we must release write access to the old one via
- * ext3_journal_release_buffer(), else we'll run out of credits.
- */
-static ext3_grpblk_t
-ext3_try_to_allocate(struct super_block *sb, handle_t *handle, int group,
-			struct buffer_head *bitmap_bh, ext3_grpblk_t grp_goal,
-			unsigned long *count, struct ext3_reserve_window *my_rsv)
-{
-	ext3_fsblk_t group_first_block;
-	ext3_grpblk_t start, end;
-	unsigned long num = 0;
-
-	/* we do allocation within the reservation window if we have a window */
-	if (my_rsv) {
-		group_first_block = ext3_group_first_block_no(sb, group);
-		if (my_rsv->_rsv_start >= group_first_block)
-			start = my_rsv->_rsv_start - group_first_block;
-		else
-			/* reservation window cross group boundary */
-			start = 0;
-		end = my_rsv->_rsv_end - group_first_block + 1;
-		if (end > EXT3_BLOCKS_PER_GROUP(sb))
-			/* reservation window crosses group boundary */
-			end = EXT3_BLOCKS_PER_GROUP(sb);
-		if ((start <= grp_goal) && (grp_goal < end))
-			start = grp_goal;
-		else
-			grp_goal = -1;
-	} else {
-		if (grp_goal > 0)
-			start = grp_goal;
-		else
-			start = 0;
-		end = EXT3_BLOCKS_PER_GROUP(sb);
-	}
-
-	BUG_ON(start > EXT3_BLOCKS_PER_GROUP(sb));
-
-repeat:
-	if (grp_goal < 0 || !ext3_test_allocatable(grp_goal, bitmap_bh)) {
-		grp_goal = find_next_usable_block(start, bitmap_bh, end);
-		if (grp_goal < 0)
-			goto fail_access;
-		if (!my_rsv) {
-			int i;
-
-			for (i = 0; i < 7 && grp_goal > start &&
-					ext3_test_allocatable(grp_goal - 1,
-								bitmap_bh);
-					i++, grp_goal--)
-				;
-		}
-	}
-	start = grp_goal;
-
-	if (!claim_block(sb_bgl_lock(EXT3_SB(sb), group),
-		grp_goal, bitmap_bh)) {
-		/*
-		 * The block was allocated by another thread, or it was
-		 * allocated and then freed by another thread
-		 */
-		start++;
-		grp_goal++;
-		if (start >= end)
-			goto fail_access;
-		goto repeat;
-	}
-	num++;
-	grp_goal++;
-	while (num < *count && grp_goal < end
-		&& ext3_test_allocatable(grp_goal, bitmap_bh)
-		&& claim_block(sb_bgl_lock(EXT3_SB(sb), group),
-				grp_goal, bitmap_bh)) {
-		num++;
-		grp_goal++;
-	}
-	*count = num;
-	return grp_goal - num;
-fail_access:
-	*count = num;
-	return -1;
-}
-
-/**
- *	find_next_reservable_window():
- *		find a reservable space within the given range.
- *		It does not allocate the reservation window for now:
- *		alloc_new_reservation() will do the work later.
- *
- *	@search_head: the head of the searching list;
- *		This is not necessarily the list head of the whole filesystem
- *
- *		We have both head and start_block to assist the search
- *		for the reservable space. The list starts from head,
- *		but we will shift to the place where start_block is,
- *		then start from there, when looking for a reservable space.
- *
- *	@my_rsv: the reservation window
- *
- *	@sb: the super block
- *
- *	@start_block: the first block we consider to start
- *			the real search from
- *
- *	@last_block:
- *		the maximum block number that our goal reservable space
- *		could start from. This is normally the last block in this
- *		group. The search will end when we found the start of next
- *		possible reservable space is out of this boundary.
- *		This could handle the cross boundary reservation window
- *		request.
- *
- *	basically we search from the given range, rather than the whole
- *	reservation double linked list, (start_block, last_block)
- *	to find a free region that is of my size and has not
- *	been reserved.
- *
- */
-static int find_next_reservable_window(
-				struct ext3_reserve_window_node *search_head,
-				struct ext3_reserve_window_node *my_rsv,
-				struct super_block * sb,
-				ext3_fsblk_t start_block,
-				ext3_fsblk_t last_block)
-{
-	struct rb_node *next;
-	struct ext3_reserve_window_node *rsv, *prev;
-	ext3_fsblk_t cur;
-	int size = my_rsv->rsv_goal_size;
-
-	/* TODO: make the start of the reservation window byte-aligned */
-	/* cur = *start_block & ~7;*/
-	cur = start_block;
-	rsv = search_head;
-	if (!rsv)
-		return -1;
-
-	while (1) {
-		if (cur <= rsv->rsv_end)
-			cur = rsv->rsv_end + 1;
-
-		/* TODO?
-		 * in the case we could not find a reservable space
-		 * that is what is expected, during the re-search, we could
-		 * remember what's the largest reservable space we could have
-		 * and return that one.
-		 *
-		 * For now it will fail if we could not find the reservable
-		 * space with expected-size (or more)...
-		 */
-		if (cur > last_block)
-			return -1;		/* fail */
-
-		prev = rsv;
-		next = rb_next(&rsv->rsv_node);
-		rsv = rb_entry(next,struct ext3_reserve_window_node,rsv_node);
-
-		/*
-		 * Reached the last reservation, we can just append to the
-		 * previous one.
-		 */
-		if (!next)
-			break;
-
-		if (cur + size <= rsv->rsv_start) {
-			/*
-			 * Found a reserveable space big enough.  We could
-			 * have a reservation across the group boundary here
-			 */
-			break;
-		}
-	}
-	/*
-	 * we come here either :
-	 * when we reach the end of the whole list,
-	 * and there is empty reservable space after last entry in the list.
-	 * append it to the end of the list.
-	 *
-	 * or we found one reservable space in the middle of the list,
-	 * return the reservation window that we could append to.
-	 * succeed.
-	 */
-
-	if ((prev != my_rsv) && (!rsv_is_empty(&my_rsv->rsv_window)))
-		rsv_window_remove(sb, my_rsv);
-
-	/*
-	 * Let's book the whole available window for now.  We will check the
-	 * disk bitmap later and then, if there are free blocks then we adjust
-	 * the window size if it's larger than requested.
-	 * Otherwise, we will remove this node from the tree next time
-	 * call find_next_reservable_window.
-	 */
-	my_rsv->rsv_start = cur;
-	my_rsv->rsv_end = cur + size - 1;
-	my_rsv->rsv_alloc_hit = 0;
-
-	if (prev != my_rsv)
-		ext3_rsv_window_add(sb, my_rsv);
-
-	return 0;
-}
-
-/**
- *	alloc_new_reservation()--allocate a new reservation window
- *
- *		To make a new reservation, we search part of the filesystem
- *		reservation list (the list that inside the group). We try to
- *		allocate a new reservation window near the allocation goal,
- *		or the beginning of the group, if there is no goal.
- *
- *		We first find a reservable space after the goal, then from
- *		there, we check the bitmap for the first free block after
- *		it. If there is no free block until the end of group, then the
- *		whole group is full, we failed. Otherwise, check if the free
- *		block is inside the expected reservable space, if so, we
- *		succeed.
- *		If the first free block is outside the reservable space, then
- *		start from the first free block, we search for next available
- *		space, and go on.
- *
- *	on succeed, a new reservation will be found and inserted into the list
- *	It contains at least one free block, and it does not overlap with other
- *	reservation windows.
- *
- *	failed: we failed to find a reservation window in this group
- *
- *	@my_rsv: the reservation window
- *
- *	@grp_goal: The goal (group-relative).  It is where the search for a
- *		free reservable space should start from.
- *		if we have a grp_goal(grp_goal >0 ), then start from there,
- *		no grp_goal(grp_goal = -1), we start from the first block
- *		of the group.
- *
- *	@sb: the super block
- *	@group: the group we are trying to allocate in
- *	@bitmap_bh: the block group block bitmap
- *
- */
-static int alloc_new_reservation(struct ext3_reserve_window_node *my_rsv,
-		ext3_grpblk_t grp_goal, struct super_block *sb,
-		unsigned int group, struct buffer_head *bitmap_bh)
-{
-	struct ext3_reserve_window_node *search_head;
-	ext3_fsblk_t group_first_block, group_end_block, start_block;
-	ext3_grpblk_t first_free_block;
-	struct rb_root *fs_rsv_root = &EXT3_SB(sb)->s_rsv_window_root;
-	unsigned long size;
-	int ret;
-	spinlock_t *rsv_lock = &EXT3_SB(sb)->s_rsv_window_lock;
-
-	group_first_block = ext3_group_first_block_no(sb, group);
-	group_end_block = group_first_block + (EXT3_BLOCKS_PER_GROUP(sb) - 1);
-
-	if (grp_goal < 0)
-		start_block = group_first_block;
-	else
-		start_block = grp_goal + group_first_block;
-
-	trace_ext3_alloc_new_reservation(sb, start_block);
-	size = my_rsv->rsv_goal_size;
-
-	if (!rsv_is_empty(&my_rsv->rsv_window)) {
-		/*
-		 * if the old reservation is cross group boundary
-		 * and if the goal is inside the old reservation window,
-		 * we will come here when we just failed to allocate from
-		 * the first part of the window. We still have another part
-		 * that belongs to the next group. In this case, there is no
-		 * point to discard our window and try to allocate a new one
-		 * in this group(which will fail). we should
-		 * keep the reservation window, just simply move on.
-		 *
-		 * Maybe we could shift the start block of the reservation
-		 * window to the first block of next group.
-		 */
-
-		if ((my_rsv->rsv_start <= group_end_block) &&
-				(my_rsv->rsv_end > group_end_block) &&
-				(start_block >= my_rsv->rsv_start))
-			return -1;
-
-		if ((my_rsv->rsv_alloc_hit >
-		     (my_rsv->rsv_end - my_rsv->rsv_start + 1) / 2)) {
-			/*
-			 * if the previously allocation hit ratio is
-			 * greater than 1/2, then we double the size of
-			 * the reservation window the next time,
-			 * otherwise we keep the same size window
-			 */
-			size = size * 2;
-			if (size > EXT3_MAX_RESERVE_BLOCKS)
-				size = EXT3_MAX_RESERVE_BLOCKS;
-			my_rsv->rsv_goal_size= size;
-		}
-	}
-
-	spin_lock(rsv_lock);
-	/*
-	 * shift the search start to the window near the goal block
-	 */
-	search_head = search_reserve_window(fs_rsv_root, start_block);
-
-	/*
-	 * find_next_reservable_window() simply finds a reservable window
-	 * inside the given range(start_block, group_end_block).
-	 *
-	 * To make sure the reservation window has a free bit inside it, we
-	 * need to check the bitmap after we found a reservable window.
-	 */
-retry:
-	ret = find_next_reservable_window(search_head, my_rsv, sb,
-						start_block, group_end_block);
-
-	if (ret == -1) {
-		if (!rsv_is_empty(&my_rsv->rsv_window))
-			rsv_window_remove(sb, my_rsv);
-		spin_unlock(rsv_lock);
-		return -1;
-	}
-
-	/*
-	 * On success, find_next_reservable_window() returns the
-	 * reservation window where there is a reservable space after it.
-	 * Before we reserve this reservable space, we need
-	 * to make sure there is at least a free block inside this region.
-	 *
-	 * searching the first free bit on the block bitmap and copy of
-	 * last committed bitmap alternatively, until we found a allocatable
-	 * block. Search start from the start block of the reservable space
-	 * we just found.
-	 */
-	spin_unlock(rsv_lock);
-	first_free_block = bitmap_search_next_usable_block(
-			my_rsv->rsv_start - group_first_block,
-			bitmap_bh, group_end_block - group_first_block + 1);
-
-	if (first_free_block < 0) {
-		/*
-		 * no free block left on the bitmap, no point
-		 * to reserve the space. return failed.
-		 */
-		spin_lock(rsv_lock);
-		if (!rsv_is_empty(&my_rsv->rsv_window))
-			rsv_window_remove(sb, my_rsv);
-		spin_unlock(rsv_lock);
-		return -1;		/* failed */
-	}
-
-	start_block = first_free_block + group_first_block;
-	/*
-	 * check if the first free block is within the
-	 * free space we just reserved
-	 */
-	if (start_block >= my_rsv->rsv_start &&
-	    start_block <= my_rsv->rsv_end) {
-		trace_ext3_reserved(sb, start_block, my_rsv);
-		return 0;		/* success */
-	}
-	/*
-	 * if the first free bit we found is out of the reservable space
-	 * continue search for next reservable space,
-	 * start from where the free block is,
-	 * we also shift the list head to where we stopped last time
-	 */
-	search_head = my_rsv;
-	spin_lock(rsv_lock);
-	goto retry;
-}
-
-/**
- * try_to_extend_reservation()
- * @my_rsv:		given reservation window
- * @sb:			super block
- * @size:		the delta to extend
- *
- * Attempt to expand the reservation window large enough to have
- * required number of free blocks
- *
- * Since ext3_try_to_allocate() will always allocate blocks within
- * the reservation window range, if the window size is too small,
- * multiple blocks allocation has to stop at the end of the reservation
- * window. To make this more efficient, given the total number of
- * blocks needed and the current size of the window, we try to
- * expand the reservation window size if necessary on a best-effort
- * basis before ext3_new_blocks() tries to allocate blocks,
- */
-static void try_to_extend_reservation(struct ext3_reserve_window_node *my_rsv,
-			struct super_block *sb, int size)
-{
-	struct ext3_reserve_window_node *next_rsv;
-	struct rb_node *next;
-	spinlock_t *rsv_lock = &EXT3_SB(sb)->s_rsv_window_lock;
-
-	if (!spin_trylock(rsv_lock))
-		return;
-
-	next = rb_next(&my_rsv->rsv_node);
-
-	if (!next)
-		my_rsv->rsv_end += size;
-	else {
-		next_rsv = rb_entry(next, struct ext3_reserve_window_node, rsv_node);
-
-		if ((next_rsv->rsv_start - my_rsv->rsv_end - 1) >= size)
-			my_rsv->rsv_end += size;
-		else
-			my_rsv->rsv_end = next_rsv->rsv_start - 1;
-	}
-	spin_unlock(rsv_lock);
-}
-
-/**
- * ext3_try_to_allocate_with_rsv()
- * @sb:			superblock
- * @handle:		handle to this transaction
- * @group:		given allocation block group
- * @bitmap_bh:		bufferhead holds the block bitmap
- * @grp_goal:		given target block within the group
- * @my_rsv:		reservation window
- * @count:		target number of blocks to allocate
- * @errp:		pointer to store the error code
- *
- * This is the main function used to allocate a new block and its reservation
- * window.
- *
- * Each time when a new block allocation is need, first try to allocate from
- * its own reservation.  If it does not have a reservation window, instead of
- * looking for a free bit on bitmap first, then look up the reservation list to
- * see if it is inside somebody else's reservation window, we try to allocate a
- * reservation window for it starting from the goal first. Then do the block
- * allocation within the reservation window.
- *
- * This will avoid keeping on searching the reservation list again and
- * again when somebody is looking for a free block (without
- * reservation), and there are lots of free blocks, but they are all
- * being reserved.
- *
- * We use a red-black tree for the per-filesystem reservation list.
- *
- */
-static ext3_grpblk_t
-ext3_try_to_allocate_with_rsv(struct super_block *sb, handle_t *handle,
-			unsigned int group, struct buffer_head *bitmap_bh,
-			ext3_grpblk_t grp_goal,
-			struct ext3_reserve_window_node * my_rsv,
-			unsigned long *count, int *errp)
-{
-	ext3_fsblk_t group_first_block, group_last_block;
-	ext3_grpblk_t ret = 0;
-	int fatal;
-	unsigned long num = *count;
-
-	*errp = 0;
-
-	/*
-	 * Make sure we use undo access for the bitmap, because it is critical
-	 * that we do the frozen_data COW on bitmap buffers in all cases even
-	 * if the buffer is in BJ_Forget state in the committing transaction.
-	 */
-	BUFFER_TRACE(bitmap_bh, "get undo access for new block");
-	fatal = ext3_journal_get_undo_access(handle, bitmap_bh);
-	if (fatal) {
-		*errp = fatal;
-		return -1;
-	}
-
-	/*
-	 * we don't deal with reservation when
-	 * filesystem is mounted without reservation
-	 * or the file is not a regular file
-	 * or last attempt to allocate a block with reservation turned on failed
-	 */
-	if (my_rsv == NULL ) {
-		ret = ext3_try_to_allocate(sb, handle, group, bitmap_bh,
-						grp_goal, count, NULL);
-		goto out;
-	}
-	/*
-	 * grp_goal is a group relative block number (if there is a goal)
-	 * 0 <= grp_goal < EXT3_BLOCKS_PER_GROUP(sb)
-	 * first block is a filesystem wide block number
-	 * first block is the block number of the first block in this group
-	 */
-	group_first_block = ext3_group_first_block_no(sb, group);
-	group_last_block = group_first_block + (EXT3_BLOCKS_PER_GROUP(sb) - 1);
-
-	/*
-	 * Basically we will allocate a new block from inode's reservation
-	 * window.
-	 *
-	 * We need to allocate a new reservation window, if:
-	 * a) inode does not have a reservation window; or
-	 * b) last attempt to allocate a block from existing reservation
-	 *    failed; or
-	 * c) we come here with a goal and with a reservation window
-	 *
-	 * We do not need to allocate a new reservation window if we come here
-	 * at the beginning with a goal and the goal is inside the window, or
-	 * we don't have a goal but already have a reservation window.
-	 * then we could go to allocate from the reservation window directly.
-	 */
-	while (1) {
-		if (rsv_is_empty(&my_rsv->rsv_window) || (ret < 0) ||
-			!goal_in_my_reservation(&my_rsv->rsv_window,
-						grp_goal, group, sb)) {
-			if (my_rsv->rsv_goal_size < *count)
-				my_rsv->rsv_goal_size = *count;
-			ret = alloc_new_reservation(my_rsv, grp_goal, sb,
-							group, bitmap_bh);
-			if (ret < 0)
-				break;			/* failed */
-
-			if (!goal_in_my_reservation(&my_rsv->rsv_window,
-							grp_goal, group, sb))
-				grp_goal = -1;
-		} else if (grp_goal >= 0) {
-			int curr = my_rsv->rsv_end -
-					(grp_goal + group_first_block) + 1;
-
-			if (curr < *count)
-				try_to_extend_reservation(my_rsv, sb,
-							*count - curr);
-		}
-
-		if ((my_rsv->rsv_start > group_last_block) ||
-				(my_rsv->rsv_end < group_first_block)) {
-			rsv_window_dump(&EXT3_SB(sb)->s_rsv_window_root, 1);
-			BUG();
-		}
-		ret = ext3_try_to_allocate(sb, handle, group, bitmap_bh,
-					   grp_goal, &num, &my_rsv->rsv_window);
-		if (ret >= 0) {
-			my_rsv->rsv_alloc_hit += num;
-			*count = num;
-			break;				/* succeed */
-		}
-		num = *count;
-	}
-out:
-	if (ret >= 0) {
-		BUFFER_TRACE(bitmap_bh, "journal_dirty_metadata for "
-					"bitmap block");
-		fatal = ext3_journal_dirty_metadata(handle, bitmap_bh);
-		if (fatal) {
-			*errp = fatal;
-			return -1;
-		}
-		return ret;
-	}
-
-	BUFFER_TRACE(bitmap_bh, "journal_release_buffer");
-	ext3_journal_release_buffer(handle, bitmap_bh);
-	return ret;
-}
-
-/**
- * ext3_has_free_blocks()
- * @sbi:		in-core super block structure.
- *
- * Check if filesystem has at least 1 free block available for allocation.
- */
-static int ext3_has_free_blocks(struct ext3_sb_info *sbi, int use_reservation)
-{
-	ext3_fsblk_t free_blocks, root_blocks;
-
-	free_blocks = percpu_counter_read_positive(&sbi->s_freeblocks_counter);
-	root_blocks = le32_to_cpu(sbi->s_es->s_r_blocks_count);
-	if (free_blocks < root_blocks + 1 && !capable(CAP_SYS_RESOURCE) &&
-		!use_reservation && !uid_eq(sbi->s_resuid, current_fsuid()) &&
-		(gid_eq(sbi->s_resgid, GLOBAL_ROOT_GID) ||
-		 !in_group_p (sbi->s_resgid))) {
-		return 0;
-	}
-	return 1;
-}
-
-/**
- * ext3_should_retry_alloc()
- * @sb:			super block
- * @retries		number of attemps has been made
- *
- * ext3_should_retry_alloc() is called when ENOSPC is returned, and if
- * it is profitable to retry the operation, this function will wait
- * for the current or committing transaction to complete, and then
- * return TRUE.
- *
- * if the total number of retries exceed three times, return FALSE.
- */
-int ext3_should_retry_alloc(struct super_block *sb, int *retries)
-{
-	if (!ext3_has_free_blocks(EXT3_SB(sb), 0) || (*retries)++ > 3)
-		return 0;
-
-	jbd_debug(1, "%s: retrying operation after ENOSPC\n", sb->s_id);
-
-	return journal_force_commit_nested(EXT3_SB(sb)->s_journal);
-}
-
-/**
- * ext3_new_blocks() -- core block(s) allocation function
- * @handle:		handle to this transaction
- * @inode:		file inode
- * @goal:		given target block(filesystem wide)
- * @count:		target number of blocks to allocate
- * @errp:		error code
- *
- * ext3_new_blocks uses a goal block to assist allocation.  It tries to
- * allocate block(s) from the block group contains the goal block first. If that
- * fails, it will try to allocate block(s) from other block groups without
- * any specific goal block.
- *
- */
-ext3_fsblk_t ext3_new_blocks(handle_t *handle, struct inode *inode,
-			ext3_fsblk_t goal, unsigned long *count, int *errp)
-{
-	struct buffer_head *bitmap_bh = NULL;
-	struct buffer_head *gdp_bh;
-	int group_no;
-	int goal_group;
-	ext3_grpblk_t grp_target_blk;	/* blockgroup relative goal block */
-	ext3_grpblk_t grp_alloc_blk;	/* blockgroup-relative allocated block*/
-	ext3_fsblk_t ret_block;		/* filesyetem-wide allocated block */
-	int bgi;			/* blockgroup iteration index */
-	int fatal = 0, err;
-	int performed_allocation = 0;
-	ext3_grpblk_t free_blocks;	/* number of free blocks in a group */
-	struct super_block *sb;
-	struct ext3_group_desc *gdp;
-	struct ext3_super_block *es;
-	struct ext3_sb_info *sbi;
-	struct ext3_reserve_window_node *my_rsv = NULL;
-	struct ext3_block_alloc_info *block_i;
-	unsigned short windowsz = 0;
-#ifdef EXT3FS_DEBUG
-	static int goal_hits, goal_attempts;
-#endif
-	unsigned long ngroups;
-	unsigned long num = *count;
-
-	*errp = -ENOSPC;
-	sb = inode->i_sb;
-
-	/*
-	 * Check quota for allocation of this block.
-	 */
-	err = dquot_alloc_block(inode, num);
-	if (err) {
-		*errp = err;
-		return 0;
-	}
-
-	trace_ext3_request_blocks(inode, goal, num);
-
-	sbi = EXT3_SB(sb);
-	es = sbi->s_es;
-	ext3_debug("goal=%lu.\n", goal);
-	/*
-	 * Allocate a block from reservation only when
-	 * filesystem is mounted with reservation(default,-o reservation), and
-	 * it's a regular file, and
-	 * the desired window size is greater than 0 (One could use ioctl
-	 * command EXT3_IOC_SETRSVSZ to set the window size to 0 to turn off
-	 * reservation on that particular file)
-	 */
-	block_i = EXT3_I(inode)->i_block_alloc_info;
-	if (block_i && ((windowsz = block_i->rsv_window_node.rsv_goal_size) > 0))
-		my_rsv = &block_i->rsv_window_node;
-
-	if (!ext3_has_free_blocks(sbi, IS_NOQUOTA(inode))) {
-		*errp = -ENOSPC;
-		goto out;
-	}
-
-	/*
-	 * First, test whether the goal block is free.
-	 */
-	if (goal < le32_to_cpu(es->s_first_data_block) ||
-	    goal >= le32_to_cpu(es->s_blocks_count))
-		goal = le32_to_cpu(es->s_first_data_block);
-	group_no = (goal - le32_to_cpu(es->s_first_data_block)) /
-			EXT3_BLOCKS_PER_GROUP(sb);
-	goal_group = group_no;
-retry_alloc:
-	gdp = ext3_get_group_desc(sb, group_no, &gdp_bh);
-	if (!gdp)
-		goto io_error;
-
-	free_blocks = le16_to_cpu(gdp->bg_free_blocks_count);
-	/*
-	 * if there is not enough free blocks to make a new resevation
-	 * turn off reservation for this allocation
-	 */
-	if (my_rsv && (free_blocks < windowsz)
-		&& (free_blocks > 0)
-		&& (rsv_is_empty(&my_rsv->rsv_window)))
-		my_rsv = NULL;
-
-	if (free_blocks > 0) {
-		grp_target_blk = ((goal - le32_to_cpu(es->s_first_data_block)) %
-				EXT3_BLOCKS_PER_GROUP(sb));
-		bitmap_bh = read_block_bitmap(sb, group_no);
-		if (!bitmap_bh)
-			goto io_error;
-		grp_alloc_blk = ext3_try_to_allocate_with_rsv(sb, handle,
-					group_no, bitmap_bh, grp_target_blk,
-					my_rsv,	&num, &fatal);
-		if (fatal)
-			goto out;
-		if (grp_alloc_blk >= 0)
-			goto allocated;
-	}
-
-	ngroups = EXT3_SB(sb)->s_groups_count;
-	smp_rmb();
-
-	/*
-	 * Now search the rest of the groups.  We assume that
-	 * group_no and gdp correctly point to the last group visited.
-	 */
-	for (bgi = 0; bgi < ngroups; bgi++) {
-		group_no++;
-		if (group_no >= ngroups)
-			group_no = 0;
-		gdp = ext3_get_group_desc(sb, group_no, &gdp_bh);
-		if (!gdp)
-			goto io_error;
-		free_blocks = le16_to_cpu(gdp->bg_free_blocks_count);
-		/*
-		 * skip this group (and avoid loading bitmap) if there
-		 * are no free blocks
-		 */
-		if (!free_blocks)
-			continue;
-		/*
-		 * skip this group if the number of
-		 * free blocks is less than half of the reservation
-		 * window size.
-		 */
-		if (my_rsv && (free_blocks <= (windowsz/2)))
-			continue;
-
-		brelse(bitmap_bh);
-		bitmap_bh = read_block_bitmap(sb, group_no);
-		if (!bitmap_bh)
-			goto io_error;
-		/*
-		 * try to allocate block(s) from this group, without a goal(-1).
-		 */
-		grp_alloc_blk = ext3_try_to_allocate_with_rsv(sb, handle,
-					group_no, bitmap_bh, -1, my_rsv,
-					&num, &fatal);
-		if (fatal)
-			goto out;
-		if (grp_alloc_blk >= 0)
-			goto allocated;
-	}
-	/*
-	 * We may end up a bogus earlier ENOSPC error due to
-	 * filesystem is "full" of reservations, but
-	 * there maybe indeed free blocks available on disk
-	 * In this case, we just forget about the reservations
-	 * just do block allocation as without reservations.
-	 */
-	if (my_rsv) {
-		my_rsv = NULL;
-		windowsz = 0;
-		group_no = goal_group;
-		goto retry_alloc;
-	}
-	/* No space left on the device */
-	*errp = -ENOSPC;
-	goto out;
-
-allocated:
-
-	ext3_debug("using block group %d(%d)\n",
-			group_no, gdp->bg_free_blocks_count);
-
-	BUFFER_TRACE(gdp_bh, "get_write_access");
-	fatal = ext3_journal_get_write_access(handle, gdp_bh);
-	if (fatal)
-		goto out;
-
-	ret_block = grp_alloc_blk + ext3_group_first_block_no(sb, group_no);
-
-	if (in_range(le32_to_cpu(gdp->bg_block_bitmap), ret_block, num) ||
-	    in_range(le32_to_cpu(gdp->bg_inode_bitmap), ret_block, num) ||
-	    in_range(ret_block, le32_to_cpu(gdp->bg_inode_table),
-		      EXT3_SB(sb)->s_itb_per_group) ||
-	    in_range(ret_block + num - 1, le32_to_cpu(gdp->bg_inode_table),
-		      EXT3_SB(sb)->s_itb_per_group)) {
-		ext3_error(sb, "ext3_new_block",
-			    "Allocating block in system zone - "
-			    "blocks from "E3FSBLK", length %lu",
-			     ret_block, num);
-		/*
-		 * claim_block() marked the blocks we allocated as in use. So we
-		 * may want to selectively mark some of the blocks as free.
-		 */
-		goto retry_alloc;
-	}
-
-	performed_allocation = 1;
-
-#ifdef CONFIG_JBD_DEBUG
-	{
-		struct buffer_head *debug_bh;
-
-		/* Record bitmap buffer state in the newly allocated block */
-		debug_bh = sb_find_get_block(sb, ret_block);
-		if (debug_bh) {
-			BUFFER_TRACE(debug_bh, "state when allocated");
-			BUFFER_TRACE2(debug_bh, bitmap_bh, "bitmap state");
-			brelse(debug_bh);
-		}
-	}
-	jbd_lock_bh_state(bitmap_bh);
-	spin_lock(sb_bgl_lock(sbi, group_no));
-	if (buffer_jbd(bitmap_bh) && bh2jh(bitmap_bh)->b_committed_data) {
-		int i;
-
-		for (i = 0; i < num; i++) {
-			if (ext3_test_bit(grp_alloc_blk+i,
-					bh2jh(bitmap_bh)->b_committed_data)) {
-				printk("%s: block was unexpectedly set in "
-					"b_committed_data\n", __func__);
-			}
-		}
-	}
-	ext3_debug("found bit %d\n", grp_alloc_blk);
-	spin_unlock(sb_bgl_lock(sbi, group_no));
-	jbd_unlock_bh_state(bitmap_bh);
-#endif
-
-	if (ret_block + num - 1 >= le32_to_cpu(es->s_blocks_count)) {
-		ext3_error(sb, "ext3_new_block",
-			    "block("E3FSBLK") >= blocks count(%d) - "
-			    "block_group = %d, es == %p ", ret_block,
-			le32_to_cpu(es->s_blocks_count), group_no, es);
-		goto out;
-	}
-
-	/*
-	 * It is up to the caller to add the new buffer to a journal
-	 * list of some description.  We don't know in advance whether
-	 * the caller wants to use it as metadata or data.
-	 */
-	ext3_debug("allocating block %lu. Goal hits %d of %d.\n",
-			ret_block, goal_hits, goal_attempts);
-
-	spin_lock(sb_bgl_lock(sbi, group_no));
-	le16_add_cpu(&gdp->bg_free_blocks_count, -num);
-	spin_unlock(sb_bgl_lock(sbi, group_no));
-	percpu_counter_sub(&sbi->s_freeblocks_counter, num);
-
-	BUFFER_TRACE(gdp_bh, "journal_dirty_metadata for group descriptor");
-	fatal = ext3_journal_dirty_metadata(handle, gdp_bh);
-	if (fatal)
-		goto out;
-
-	*errp = 0;
-	brelse(bitmap_bh);
-
-	if (num < *count) {
-		dquot_free_block(inode, *count-num);
-		*count = num;
-	}
-
-	trace_ext3_allocate_blocks(inode, goal, num,
-				   (unsigned long long)ret_block);
-
-	return ret_block;
-
-io_error:
-	*errp = -EIO;
-out:
-	if (fatal) {
-		*errp = fatal;
-		ext3_std_error(sb, fatal);
-	}
-	/*
-	 * Undo the block allocation
-	 */
-	if (!performed_allocation)
-		dquot_free_block(inode, *count);
-	brelse(bitmap_bh);
-	return 0;
-}
-
-ext3_fsblk_t ext3_new_block(handle_t *handle, struct inode *inode,
-			ext3_fsblk_t goal, int *errp)
-{
-	unsigned long count = 1;
-
-	return ext3_new_blocks(handle, inode, goal, &count, errp);
-}
-
-/**
- * ext3_count_free_blocks() -- count filesystem free blocks
- * @sb:		superblock
- *
- * Adds up the number of free blocks from each block group.
- */
-ext3_fsblk_t ext3_count_free_blocks(struct super_block *sb)
-{
-	ext3_fsblk_t desc_count;
-	struct ext3_group_desc *gdp;
-	int i;
-	unsigned long ngroups = EXT3_SB(sb)->s_groups_count;
-#ifdef EXT3FS_DEBUG
-	struct ext3_super_block *es;
-	ext3_fsblk_t bitmap_count;
-	unsigned long x;
-	struct buffer_head *bitmap_bh = NULL;
-
-	es = EXT3_SB(sb)->s_es;
-	desc_count = 0;
-	bitmap_count = 0;
-	gdp = NULL;
-
-	smp_rmb();
-	for (i = 0; i < ngroups; i++) {
-		gdp = ext3_get_group_desc(sb, i, NULL);
-		if (!gdp)
-			continue;
-		desc_count += le16_to_cpu(gdp->bg_free_blocks_count);
-		brelse(bitmap_bh);
-		bitmap_bh = read_block_bitmap(sb, i);
-		if (bitmap_bh == NULL)
-			continue;
-
-		x = ext3_count_free(bitmap_bh, sb->s_blocksize);
-		printk("group %d: stored = %d, counted = %lu\n",
-			i, le16_to_cpu(gdp->bg_free_blocks_count), x);
-		bitmap_count += x;
-	}
-	brelse(bitmap_bh);
-	printk("ext3_count_free_blocks: stored = "E3FSBLK
-		", computed = "E3FSBLK", "E3FSBLK"\n",
-	       (ext3_fsblk_t)le32_to_cpu(es->s_free_blocks_count),
-		desc_count, bitmap_count);
-	return bitmap_count;
-#else
-	desc_count = 0;
-	smp_rmb();
-	for (i = 0; i < ngroups; i++) {
-		gdp = ext3_get_group_desc(sb, i, NULL);
-		if (!gdp)
-			continue;
-		desc_count += le16_to_cpu(gdp->bg_free_blocks_count);
-	}
-
-	return desc_count;
-#endif
-}
-
-static inline int test_root(int a, int b)
-{
-	int num = b;
-
-	while (a > num)
-		num *= b;
-	return num == a;
-}
-
-static int ext3_group_sparse(int group)
-{
-	if (group <= 1)
-		return 1;
-	if (!(group & 1))
-		return 0;
-	return (test_root(group, 7) || test_root(group, 5) ||
-		test_root(group, 3));
-}
-
-/**
- *	ext3_bg_has_super - number of blocks used by the superblock in group
- *	@sb: superblock for filesystem
- *	@group: group number to check
- *
- *	Return the number of blocks used by the superblock (primary or backup)
- *	in this group.  Currently this will be only 0 or 1.
- */
-int ext3_bg_has_super(struct super_block *sb, int group)
-{
-	if (EXT3_HAS_RO_COMPAT_FEATURE(sb,
-				EXT3_FEATURE_RO_COMPAT_SPARSE_SUPER) &&
-			!ext3_group_sparse(group))
-		return 0;
-	return 1;
-}
-
-static unsigned long ext3_bg_num_gdb_meta(struct super_block *sb, int group)
-{
-	unsigned long metagroup = group / EXT3_DESC_PER_BLOCK(sb);
-	unsigned long first = metagroup * EXT3_DESC_PER_BLOCK(sb);
-	unsigned long last = first + EXT3_DESC_PER_BLOCK(sb) - 1;
-
-	if (group == first || group == first + 1 || group == last)
-		return 1;
-	return 0;
-}
-
-static unsigned long ext3_bg_num_gdb_nometa(struct super_block *sb, int group)
-{
-	return ext3_bg_has_super(sb, group) ? EXT3_SB(sb)->s_gdb_count : 0;
-}
-
-/**
- *	ext3_bg_num_gdb - number of blocks used by the group table in group
- *	@sb: superblock for filesystem
- *	@group: group number to check
- *
- *	Return the number of blocks used by the group descriptor table
- *	(primary or backup) in this group.  In the future there may be a
- *	different number of descriptor blocks in each group.
- */
-unsigned long ext3_bg_num_gdb(struct super_block *sb, int group)
-{
-	unsigned long first_meta_bg =
-			le32_to_cpu(EXT3_SB(sb)->s_es->s_first_meta_bg);
-	unsigned long metagroup = group / EXT3_DESC_PER_BLOCK(sb);
-
-	if (!EXT3_HAS_INCOMPAT_FEATURE(sb,EXT3_FEATURE_INCOMPAT_META_BG) ||
-			metagroup < first_meta_bg)
-		return ext3_bg_num_gdb_nometa(sb,group);
-
-	return ext3_bg_num_gdb_meta(sb,group);
-
-}
-
-/**
- * ext3_trim_all_free -- function to trim all free space in alloc. group
- * @sb:			super block for file system
- * @group:		allocation group to trim
- * @start:		first group block to examine
- * @max:		last group block to examine
- * @gdp:		allocation group description structure
- * @minblocks:		minimum extent block count
- *
- * ext3_trim_all_free walks through group's block bitmap searching for free
- * blocks. When the free block is found, it tries to allocate this block and
- * consequent free block to get the biggest free extent possible, until it
- * reaches any used block. Then issue a TRIM command on this extent and free
- * the extent in the block bitmap. This is done until whole group is scanned.
- */
-static ext3_grpblk_t ext3_trim_all_free(struct super_block *sb,
-					unsigned int group,
-					ext3_grpblk_t start, ext3_grpblk_t max,
-					ext3_grpblk_t minblocks)
-{
-	handle_t *handle;
-	ext3_grpblk_t next, free_blocks, bit, freed, count = 0;
-	ext3_fsblk_t discard_block;
-	struct ext3_sb_info *sbi;
-	struct buffer_head *gdp_bh, *bitmap_bh = NULL;
-	struct ext3_group_desc *gdp;
-	int err = 0, ret = 0;
-
-	/*
-	 * We will update one block bitmap, and one group descriptor
-	 */
-	handle = ext3_journal_start_sb(sb, 2);
-	if (IS_ERR(handle))
-		return PTR_ERR(handle);
-
-	bitmap_bh = read_block_bitmap(sb, group);
-	if (!bitmap_bh) {
-		err = -EIO;
-		goto err_out;
-	}
-
-	BUFFER_TRACE(bitmap_bh, "getting undo access");
-	err = ext3_journal_get_undo_access(handle, bitmap_bh);
-	if (err)
-		goto err_out;
-
-	gdp = ext3_get_group_desc(sb, group, &gdp_bh);
-	if (!gdp) {
-		err = -EIO;
-		goto err_out;
-	}
-
-	BUFFER_TRACE(gdp_bh, "get_write_access");
-	err = ext3_journal_get_write_access(handle, gdp_bh);
-	if (err)
-		goto err_out;
-
-	free_blocks = le16_to_cpu(gdp->bg_free_blocks_count);
-	sbi = EXT3_SB(sb);
-
-	 /* Walk through the whole group */
-	while (start <= max) {
-		start = bitmap_search_next_usable_block(start, bitmap_bh, max);
-		if (start < 0)
-			break;
-		next = start;
-
-		/*
-		 * Allocate contiguous free extents by setting bits in the
-		 * block bitmap
-		 */
-		while (next <= max
-			&& claim_block(sb_bgl_lock(sbi, group),
-					next, bitmap_bh)) {
-			next++;
-		}
-
-		 /* We did not claim any blocks */
-		if (next == start)
-			continue;
-
-		discard_block = (ext3_fsblk_t)start +
-				ext3_group_first_block_no(sb, group);
-
-		/* Update counters */
-		spin_lock(sb_bgl_lock(sbi, group));
-		le16_add_cpu(&gdp->bg_free_blocks_count, start - next);
-		spin_unlock(sb_bgl_lock(sbi, group));
-		percpu_counter_sub(&sbi->s_freeblocks_counter, next - start);
-
-		free_blocks -= next - start;
-		/* Do not issue a TRIM on extents smaller than minblocks */
-		if ((next - start) < minblocks)
-			goto free_extent;
-
-		trace_ext3_discard_blocks(sb, discard_block, next - start);
-		 /* Send the TRIM command down to the device */
-		err = sb_issue_discard(sb, discard_block, next - start,
-				       GFP_NOFS, 0);
-		count += (next - start);
-free_extent:
-		freed = 0;
-
-		/*
-		 * Clear bits in the bitmap
-		 */
-		for (bit = start; bit < next; bit++) {
-			BUFFER_TRACE(bitmap_bh, "clear bit");
-			if (!ext3_clear_bit_atomic(sb_bgl_lock(sbi, group),
-						bit, bitmap_bh->b_data)) {
-				ext3_error(sb, __func__,
-					"bit already cleared for block "E3FSBLK,
-					 (unsigned long)bit);
-				BUFFER_TRACE(bitmap_bh, "bit already cleared");
-			} else {
-				freed++;
-			}
-		}
-
-		/* Update couters */
-		spin_lock(sb_bgl_lock(sbi, group));
-		le16_add_cpu(&gdp->bg_free_blocks_count, freed);
-		spin_unlock(sb_bgl_lock(sbi, group));
-		percpu_counter_add(&sbi->s_freeblocks_counter, freed);
-
-		start = next;
-		if (err < 0) {
-			if (err != -EOPNOTSUPP)
-				ext3_warning(sb, __func__, "Discard command "
-					     "returned error %d\n", err);
-			break;
-		}
-
-		if (fatal_signal_pending(current)) {
-			err = -ERESTARTSYS;
-			break;
-		}
-
-		cond_resched();
-
-		/* No more suitable extents */
-		if (free_blocks < minblocks)
-			break;
-	}
-
-	/* We dirtied the bitmap block */
-	BUFFER_TRACE(bitmap_bh, "dirtied bitmap block");
-	ret = ext3_journal_dirty_metadata(handle, bitmap_bh);
-	if (!err)
-		err = ret;
-
-	/* And the group descriptor block */
-	BUFFER_TRACE(gdp_bh, "dirtied group descriptor block");
-	ret = ext3_journal_dirty_metadata(handle, gdp_bh);
-	if (!err)
-		err = ret;
-
-	ext3_debug("trimmed %d blocks in the group %d\n",
-		count, group);
-
-err_out:
-	if (err)
-		count = err;
-	ext3_journal_stop(handle);
-	brelse(bitmap_bh);
-
-	return count;
-}
-
-/**
- * ext3_trim_fs() -- trim ioctl handle function
- * @sb:			superblock for filesystem
- * @start:		First Byte to trim
- * @len:		number of Bytes to trim from start
- * @minlen:		minimum extent length in Bytes
- *
- * ext3_trim_fs goes through all allocation groups containing Bytes from
- * start to start+len. For each such a group ext3_trim_all_free function
- * is invoked to trim all free space.
- */
-int ext3_trim_fs(struct super_block *sb, struct fstrim_range *range)
-{
-	ext3_grpblk_t last_block, first_block;
-	unsigned long group, first_group, last_group;
-	struct ext3_group_desc *gdp;
-	struct ext3_super_block *es = EXT3_SB(sb)->s_es;
-	uint64_t start, minlen, end, trimmed = 0;
-	ext3_fsblk_t first_data_blk =
-			le32_to_cpu(EXT3_SB(sb)->s_es->s_first_data_block);
-	ext3_fsblk_t max_blks = le32_to_cpu(es->s_blocks_count);
-	int ret = 0;
-
-	start = range->start >> sb->s_blocksize_bits;
-	end = start + (range->len >> sb->s_blocksize_bits) - 1;
-	minlen = range->minlen >> sb->s_blocksize_bits;
-
-	if (minlen > EXT3_BLOCKS_PER_GROUP(sb) ||
-	    start >= max_blks ||
-	    range->len < sb->s_blocksize)
-		return -EINVAL;
-	if (end >= max_blks)
-		end = max_blks - 1;
-	if (end <= first_data_blk)
-		goto out;
-	if (start < first_data_blk)
-		start = first_data_blk;
-
-	smp_rmb();
-
-	/* Determine first and last group to examine based on start and len */
-	ext3_get_group_no_and_offset(sb, (ext3_fsblk_t) start,
-				     &first_group, &first_block);
-	ext3_get_group_no_and_offset(sb, (ext3_fsblk_t) end,
-				     &last_group, &last_block);
-
-	/* end now represents the last block to discard in this group */
-	end = EXT3_BLOCKS_PER_GROUP(sb) - 1;
-
-	for (group = first_group; group <= last_group; group++) {
-		gdp = ext3_get_group_desc(sb, group, NULL);
-		if (!gdp)
-			break;
-
-		/*
-		 * For all the groups except the last one, last block will
-		 * always be EXT3_BLOCKS_PER_GROUP(sb)-1, so we only need to
-		 * change it for the last group, note that last_block is
-		 * already computed earlier by ext3_get_group_no_and_offset()
-		 */
-		if (group == last_group)
-			end = last_block;
-
-		if (le16_to_cpu(gdp->bg_free_blocks_count) >= minlen) {
-			ret = ext3_trim_all_free(sb, group, first_block,
-						 end, minlen);
-			if (ret < 0)
-				break;
-			trimmed += ret;
-		}
-
-		/*
-		 * For every group except the first one, we are sure
-		 * that the first block to discard will be block #0.
-		 */
-		first_block = 0;
-	}
-
-	if (ret > 0)
-		ret = 0;
-
-out:
-	range->len = trimmed * sb->s_blocksize;
-	return ret;
-}
diff --git a/fs/ext3/bitmap.c b/fs/ext3/bitmap.c
deleted file mode 100644
index ef9c643e8e9d..000000000000
--- a/fs/ext3/bitmap.c
+++ /dev/null
@@ -1,20 +0,0 @@
-/*
- *  linux/fs/ext3/bitmap.c
- *
- * Copyright (C) 1992, 1993, 1994, 1995
- * Remy Card (card@masi.ibp.fr)
- * Laboratoire MASI - Institut Blaise Pascal
- * Universite Pierre et Marie Curie (Paris VI)
- */
-
-#include "ext3.h"
-
-#ifdef EXT3FS_DEBUG
-
-unsigned long ext3_count_free (struct buffer_head * map, unsigned int numchars)
-{
-	return numchars * BITS_PER_BYTE - memweight(map->b_data, numchars);
-}
-
-#endif  /*  EXT3FS_DEBUG  */
-
diff --git a/fs/ext3/dir.c b/fs/ext3/dir.c
deleted file mode 100644
index 17742eed2c16..000000000000
--- a/fs/ext3/dir.c
+++ /dev/null
@@ -1,537 +0,0 @@
-/*
- *  linux/fs/ext3/dir.c
- *
- * Copyright (C) 1992, 1993, 1994, 1995
- * Remy Card (card@masi.ibp.fr)
- * Laboratoire MASI - Institut Blaise Pascal
- * Universite Pierre et Marie Curie (Paris VI)
- *
- *  from
- *
- *  linux/fs/minix/dir.c
- *
- *  Copyright (C) 1991, 1992  Linus Torvalds
- *
- *  ext3 directory handling functions
- *
- *  Big-endian to little-endian byte-swapping/bitmaps by
- *        David S. Miller (davem@caip.rutgers.edu), 1995
- *
- * Hash Tree Directory indexing (c) 2001  Daniel Phillips
- *
- */
-
-#include <linux/compat.h>
-#include "ext3.h"
-
-static unsigned char ext3_filetype_table[] = {
-	DT_UNKNOWN, DT_REG, DT_DIR, DT_CHR, DT_BLK, DT_FIFO, DT_SOCK, DT_LNK
-};
-
-static int ext3_dx_readdir(struct file *, struct dir_context *);
-
-static unsigned char get_dtype(struct super_block *sb, int filetype)
-{
-	if (!EXT3_HAS_INCOMPAT_FEATURE(sb, EXT3_FEATURE_INCOMPAT_FILETYPE) ||
-	    (filetype >= EXT3_FT_MAX))
-		return DT_UNKNOWN;
-
-	return (ext3_filetype_table[filetype]);
-}
-
-/**
- * Check if the given dir-inode refers to an htree-indexed directory
- * (or a directory which could potentially get converted to use htree
- * indexing).
- *
- * Return 1 if it is a dx dir, 0 if not
- */
-static int is_dx_dir(struct inode *inode)
-{
-	struct super_block *sb = inode->i_sb;
-
-	if (EXT3_HAS_COMPAT_FEATURE(inode->i_sb,
-		     EXT3_FEATURE_COMPAT_DIR_INDEX) &&
-	    ((EXT3_I(inode)->i_flags & EXT3_INDEX_FL) ||
-	     ((inode->i_size >> sb->s_blocksize_bits) == 1)))
-		return 1;
-
-	return 0;
-}
-
-int ext3_check_dir_entry (const char * function, struct inode * dir,
-			  struct ext3_dir_entry_2 * de,
-			  struct buffer_head * bh,
-			  unsigned long offset)
-{
-	const char * error_msg = NULL;
-	const int rlen = ext3_rec_len_from_disk(de->rec_len);
-
-	if (unlikely(rlen < EXT3_DIR_REC_LEN(1)))
-		error_msg = "rec_len is smaller than minimal";
-	else if (unlikely(rlen % 4 != 0))
-		error_msg = "rec_len % 4 != 0";
-	else if (unlikely(rlen < EXT3_DIR_REC_LEN(de->name_len)))
-		error_msg = "rec_len is too small for name_len";
-	else if (unlikely((((char *) de - bh->b_data) + rlen > dir->i_sb->s_blocksize)))
-		error_msg = "directory entry across blocks";
-	else if (unlikely(le32_to_cpu(de->inode) >
-			le32_to_cpu(EXT3_SB(dir->i_sb)->s_es->s_inodes_count)))
-		error_msg = "inode out of bounds";
-
-	if (unlikely(error_msg != NULL))
-		ext3_error (dir->i_sb, function,
-			"bad entry in directory #%lu: %s - "
-			"offset=%lu, inode=%lu, rec_len=%d, name_len=%d",
-			dir->i_ino, error_msg, offset,
-			(unsigned long) le32_to_cpu(de->inode),
-			rlen, de->name_len);
-
-	return error_msg == NULL ? 1 : 0;
-}
-
-static int ext3_readdir(struct file *file, struct dir_context *ctx)
-{
-	unsigned long offset;
-	int i;
-	struct ext3_dir_entry_2 *de;
-	int err;
-	struct inode *inode = file_inode(file);
-	struct super_block *sb = inode->i_sb;
-	int dir_has_error = 0;
-
-	if (is_dx_dir(inode)) {
-		err = ext3_dx_readdir(file, ctx);
-		if (err != ERR_BAD_DX_DIR)
-			return err;
-		/*
-		 * We don't set the inode dirty flag since it's not
-		 * critical that it get flushed back to the disk.
-		 */
-		EXT3_I(inode)->i_flags &= ~EXT3_INDEX_FL;
-	}
-	offset = ctx->pos & (sb->s_blocksize - 1);
-
-	while (ctx->pos < inode->i_size) {
-		unsigned long blk = ctx->pos >> EXT3_BLOCK_SIZE_BITS(sb);
-		struct buffer_head map_bh;
-		struct buffer_head *bh = NULL;
-
-		map_bh.b_state = 0;
-		err = ext3_get_blocks_handle(NULL, inode, blk, 1, &map_bh, 0);
-		if (err > 0) {
-			pgoff_t index = map_bh.b_blocknr >>
-					(PAGE_CACHE_SHIFT - inode->i_blkbits);
-			if (!ra_has_index(&file->f_ra, index))
-				page_cache_sync_readahead(
-					sb->s_bdev->bd_inode->i_mapping,
-					&file->f_ra, file,
-					index, 1);
-			file->f_ra.prev_pos = (loff_t)index << PAGE_CACHE_SHIFT;
-			bh = ext3_bread(NULL, inode, blk, 0, &err);
-		}
-
-		/*
-		 * We ignore I/O errors on directories so users have a chance
-		 * of recovering data when there's a bad sector
-		 */
-		if (!bh) {
-			if (!dir_has_error) {
-				ext3_error(sb, __func__, "directory #%lu "
-					"contains a hole at offset %lld",
-					inode->i_ino, ctx->pos);
-				dir_has_error = 1;
-			}
-			/* corrupt size?  Maybe no more blocks to read */
-			if (ctx->pos > inode->i_blocks << 9)
-				break;
-			ctx->pos += sb->s_blocksize - offset;
-			continue;
-		}
-
-		/* If the dir block has changed since the last call to
-		 * readdir(2), then we might be pointing to an invalid
-		 * dirent right now.  Scan from the start of the block
-		 * to make sure. */
-		if (offset && file->f_version != inode->i_version) {
-			for (i = 0; i < sb->s_blocksize && i < offset; ) {
-				de = (struct ext3_dir_entry_2 *)
-					(bh->b_data + i);
-				/* It's too expensive to do a full
-				 * dirent test each time round this
-				 * loop, but we do have to test at
-				 * least that it is non-zero.  A
-				 * failure will be detected in the
-				 * dirent test below. */
-				if (ext3_rec_len_from_disk(de->rec_len) <
-						EXT3_DIR_REC_LEN(1))
-					break;
-				i += ext3_rec_len_from_disk(de->rec_len);
-			}
-			offset = i;
-			ctx->pos = (ctx->pos & ~(sb->s_blocksize - 1))
-				| offset;
-			file->f_version = inode->i_version;
-		}
-
-		while (ctx->pos < inode->i_size
-		       && offset < sb->s_blocksize) {
-			de = (struct ext3_dir_entry_2 *) (bh->b_data + offset);
-			if (!ext3_check_dir_entry ("ext3_readdir", inode, de,
-						   bh, offset)) {
-				/* On error, skip the to the
-                                   next block. */
-				ctx->pos = (ctx->pos |
-						(sb->s_blocksize - 1)) + 1;
-				break;
-			}
-			offset += ext3_rec_len_from_disk(de->rec_len);
-			if (le32_to_cpu(de->inode)) {
-				if (!dir_emit(ctx, de->name, de->name_len,
-					      le32_to_cpu(de->inode),
-					      get_dtype(sb, de->file_type))) {
-					brelse(bh);
-					return 0;
-				}
-			}
-			ctx->pos += ext3_rec_len_from_disk(de->rec_len);
-		}
-		offset = 0;
-		brelse (bh);
-		if (ctx->pos < inode->i_size)
-			if (!dir_relax(inode))
-				return 0;
-	}
-	return 0;
-}
-
-static inline int is_32bit_api(void)
-{
-#ifdef CONFIG_COMPAT
-	return is_compat_task();
-#else
-	return (BITS_PER_LONG == 32);
-#endif
-}
-
-/*
- * These functions convert from the major/minor hash to an f_pos
- * value for dx directories
- *
- * Upper layer (for example NFS) should specify FMODE_32BITHASH or
- * FMODE_64BITHASH explicitly. On the other hand, we allow ext3 to be mounted
- * directly on both 32-bit and 64-bit nodes, under such case, neither
- * FMODE_32BITHASH nor FMODE_64BITHASH is specified.
- */
-static inline loff_t hash2pos(struct file *filp, __u32 major, __u32 minor)
-{
-	if ((filp->f_mode & FMODE_32BITHASH) ||
-	    (!(filp->f_mode & FMODE_64BITHASH) && is_32bit_api()))
-		return major >> 1;
-	else
-		return ((__u64)(major >> 1) << 32) | (__u64)minor;
-}
-
-static inline __u32 pos2maj_hash(struct file *filp, loff_t pos)
-{
-	if ((filp->f_mode & FMODE_32BITHASH) ||
-	    (!(filp->f_mode & FMODE_64BITHASH) && is_32bit_api()))
-		return (pos << 1) & 0xffffffff;
-	else
-		return ((pos >> 32) << 1) & 0xffffffff;
-}
-
-static inline __u32 pos2min_hash(struct file *filp, loff_t pos)
-{
-	if ((filp->f_mode & FMODE_32BITHASH) ||
-	    (!(filp->f_mode & FMODE_64BITHASH) && is_32bit_api()))
-		return 0;
-	else
-		return pos & 0xffffffff;
-}
-
-/*
- * Return 32- or 64-bit end-of-file for dx directories
- */
-static inline loff_t ext3_get_htree_eof(struct file *filp)
-{
-	if ((filp->f_mode & FMODE_32BITHASH) ||
-	    (!(filp->f_mode & FMODE_64BITHASH) && is_32bit_api()))
-		return EXT3_HTREE_EOF_32BIT;
-	else
-		return EXT3_HTREE_EOF_64BIT;
-}
-
-
-/*
- * ext3_dir_llseek() calls generic_file_llseek[_size]() to handle both
- * non-htree and htree directories, where the "offset" is in terms
- * of the filename hash value instead of the byte offset.
- *
- * Because we may return a 64-bit hash that is well beyond s_maxbytes,
- * we need to pass the max hash as the maximum allowable offset in
- * the htree directory case.
- *
- * NOTE: offsets obtained *before* ext3_set_inode_flag(dir, EXT3_INODE_INDEX)
- *       will be invalid once the directory was converted into a dx directory
- */
-static loff_t ext3_dir_llseek(struct file *file, loff_t offset, int whence)
-{
-	struct inode *inode = file->f_mapping->host;
-	int dx_dir = is_dx_dir(inode);
-	loff_t htree_max = ext3_get_htree_eof(file);
-
-	if (likely(dx_dir))
-		return generic_file_llseek_size(file, offset, whence,
-					        htree_max, htree_max);
-	else
-		return generic_file_llseek(file, offset, whence);
-}
-
-/*
- * This structure holds the nodes of the red-black tree used to store
- * the directory entry in hash order.
- */
-struct fname {
-	__u32		hash;
-	__u32		minor_hash;
-	struct rb_node	rb_hash;
-	struct fname	*next;
-	__u32		inode;
-	__u8		name_len;
-	__u8		file_type;
-	char		name[0];
-};
-
-/*
- * This functoin implements a non-recursive way of freeing all of the
- * nodes in the red-black tree.
- */
-static void free_rb_tree_fname(struct rb_root *root)
-{
-	struct fname *fname, *next;
-
-	rbtree_postorder_for_each_entry_safe(fname, next, root, rb_hash)
-		do {
-			struct fname *old = fname;
-			fname = fname->next;
-			kfree(old);
-		} while (fname);
-
-	*root = RB_ROOT;
-}
-
-static struct dir_private_info *ext3_htree_create_dir_info(struct file *filp,
-							   loff_t pos)
-{
-	struct dir_private_info *p;
-
-	p = kzalloc(sizeof(struct dir_private_info), GFP_KERNEL);
-	if (!p)
-		return NULL;
-	p->curr_hash = pos2maj_hash(filp, pos);
-	p->curr_minor_hash = pos2min_hash(filp, pos);
-	return p;
-}
-
-void ext3_htree_free_dir_info(struct dir_private_info *p)
-{
-	free_rb_tree_fname(&p->root);
-	kfree(p);
-}
-
-/*
- * Given a directory entry, enter it into the fname rb tree.
- */
-int ext3_htree_store_dirent(struct file *dir_file, __u32 hash,
-			     __u32 minor_hash,
-			     struct ext3_dir_entry_2 *dirent)
-{
-	struct rb_node **p, *parent = NULL;
-	struct fname * fname, *new_fn;
-	struct dir_private_info *info;
-	int len;
-
-	info = (struct dir_private_info *) dir_file->private_data;
-	p = &info->root.rb_node;
-
-	/* Create and allocate the fname structure */
-	len = sizeof(struct fname) + dirent->name_len + 1;
-	new_fn = kzalloc(len, GFP_KERNEL);
-	if (!new_fn)
-		return -ENOMEM;
-	new_fn->hash = hash;
-	new_fn->minor_hash = minor_hash;
-	new_fn->inode = le32_to_cpu(dirent->inode);
-	new_fn->name_len = dirent->name_len;
-	new_fn->file_type = dirent->file_type;
-	memcpy(new_fn->name, dirent->name, dirent->name_len);
-	new_fn->name[dirent->name_len] = 0;
-
-	while (*p) {
-		parent = *p;
-		fname = rb_entry(parent, struct fname, rb_hash);
-
-		/*
-		 * If the hash and minor hash match up, then we put
-		 * them on a linked list.  This rarely happens...
-		 */
-		if ((new_fn->hash == fname->hash) &&
-		    (new_fn->minor_hash == fname->minor_hash)) {
-			new_fn->next = fname->next;
-			fname->next = new_fn;
-			return 0;
-		}
-
-		if (new_fn->hash < fname->hash)
-			p = &(*p)->rb_left;
-		else if (new_fn->hash > fname->hash)
-			p = &(*p)->rb_right;
-		else if (new_fn->minor_hash < fname->minor_hash)
-			p = &(*p)->rb_left;
-		else /* if (new_fn->minor_hash > fname->minor_hash) */
-			p = &(*p)->rb_right;
-	}
-
-	rb_link_node(&new_fn->rb_hash, parent, p);
-	rb_insert_color(&new_fn->rb_hash, &info->root);
-	return 0;
-}
-
-
-
-/*
- * This is a helper function for ext3_dx_readdir.  It calls filldir
- * for all entres on the fname linked list.  (Normally there is only
- * one entry on the linked list, unless there are 62 bit hash collisions.)
- */
-static bool call_filldir(struct file *file, struct dir_context *ctx,
-			struct fname *fname)
-{
-	struct dir_private_info *info = file->private_data;
-	struct inode *inode = file_inode(file);
-	struct super_block *sb = inode->i_sb;
-
-	if (!fname) {
-		printk("call_filldir: called with null fname?!?\n");
-		return true;
-	}
-	ctx->pos = hash2pos(file, fname->hash, fname->minor_hash);
-	while (fname) {
-		if (!dir_emit(ctx, fname->name, fname->name_len,
-				fname->inode,
-				get_dtype(sb, fname->file_type))) {
-			info->extra_fname = fname;
-			return false;
-		}
-		fname = fname->next;
-	}
-	return true;
-}
-
-static int ext3_dx_readdir(struct file *file, struct dir_context *ctx)
-{
-	struct dir_private_info *info = file->private_data;
-	struct inode *inode = file_inode(file);
-	struct fname *fname;
-	int	ret;
-
-	if (!info) {
-		info = ext3_htree_create_dir_info(file, ctx->pos);
-		if (!info)
-			return -ENOMEM;
-		file->private_data = info;
-	}
-
-	if (ctx->pos == ext3_get_htree_eof(file))
-		return 0;	/* EOF */
-
-	/* Some one has messed with f_pos; reset the world */
-	if (info->last_pos != ctx->pos) {
-		free_rb_tree_fname(&info->root);
-		info->curr_node = NULL;
-		info->extra_fname = NULL;
-		info->curr_hash = pos2maj_hash(file, ctx->pos);
-		info->curr_minor_hash = pos2min_hash(file, ctx->pos);
-	}
-
-	/*
-	 * If there are any leftover names on the hash collision
-	 * chain, return them first.
-	 */
-	if (info->extra_fname) {
-		if (!call_filldir(file, ctx, info->extra_fname))
-			goto finished;
-		info->extra_fname = NULL;
-		goto next_node;
-	} else if (!info->curr_node)
-		info->curr_node = rb_first(&info->root);
-
-	while (1) {
-		/*
-		 * Fill the rbtree if we have no more entries,
-		 * or the inode has changed since we last read in the
-		 * cached entries.
-		 */
-		if ((!info->curr_node) ||
-		    (file->f_version != inode->i_version)) {
-			info->curr_node = NULL;
-			free_rb_tree_fname(&info->root);
-			file->f_version = inode->i_version;
-			ret = ext3_htree_fill_tree(file, info->curr_hash,
-						   info->curr_minor_hash,
-						   &info->next_hash);
-			if (ret < 0)
-				return ret;
-			if (ret == 0) {
-				ctx->pos = ext3_get_htree_eof(file);
-				break;
-			}
-			info->curr_node = rb_first(&info->root);
-		}
-
-		fname = rb_entry(info->curr_node, struct fname, rb_hash);
-		info->curr_hash = fname->hash;
-		info->curr_minor_hash = fname->minor_hash;
-		if (!call_filldir(file, ctx, fname))
-			break;
-	next_node:
-		info->curr_node = rb_next(info->curr_node);
-		if (info->curr_node) {
-			fname = rb_entry(info->curr_node, struct fname,
-					 rb_hash);
-			info->curr_hash = fname->hash;
-			info->curr_minor_hash = fname->minor_hash;
-		} else {
-			if (info->next_hash == ~0) {
-				ctx->pos = ext3_get_htree_eof(file);
-				break;
-			}
-			info->curr_hash = info->next_hash;
-			info->curr_minor_hash = 0;
-		}
-	}
-finished:
-	info->last_pos = ctx->pos;
-	return 0;
-}
-
-static int ext3_release_dir (struct inode * inode, struct file * filp)
-{
-       if (filp->private_data)
-		ext3_htree_free_dir_info(filp->private_data);
-
-	return 0;
-}
-
-const struct file_operations ext3_dir_operations = {
-	.llseek		= ext3_dir_llseek,
-	.read		= generic_read_dir,
-	.iterate	= ext3_readdir,
-	.unlocked_ioctl = ext3_ioctl,
-#ifdef CONFIG_COMPAT
-	.compat_ioctl	= ext3_compat_ioctl,
-#endif
-	.fsync		= ext3_sync_file,
-	.release	= ext3_release_dir,
-};
diff --git a/fs/ext3/ext3.h b/fs/ext3/ext3.h
deleted file mode 100644
index f483a80b3fe7..000000000000
--- a/fs/ext3/ext3.h
+++ /dev/null
@@ -1,1332 +0,0 @@
-/*
- * Written by Stephen C. Tweedie <sct@redhat.com>, 1999
- *
- * Copyright 1998--1999 Red Hat corp --- All Rights Reserved
- *
- * This file is part of the Linux kernel and is made available under
- * the terms of the GNU General Public License, version 2, or at your
- * option, any later version, incorporated herein by reference.
- *
- * Copyright (C) 1992, 1993, 1994, 1995
- * Remy Card (card@masi.ibp.fr)
- * Laboratoire MASI - Institut Blaise Pascal
- * Universite Pierre et Marie Curie (Paris VI)
- *
- *  from
- *
- *  linux/include/linux/minix_fs.h
- *
- *  Copyright (C) 1991, 1992  Linus Torvalds
- */
-
-#include <linux/fs.h>
-#include <linux/jbd.h>
-#include <linux/magic.h>
-#include <linux/bug.h>
-#include <linux/blockgroup_lock.h>
-
-/*
- * The second extended filesystem constants/structures
- */
-
-/*
- * Define EXT3FS_DEBUG to produce debug messages
- */
-#undef EXT3FS_DEBUG
-
-/*
- * Define EXT3_RESERVATION to reserve data blocks for expanding files
- */
-#define EXT3_DEFAULT_RESERVE_BLOCKS     8
-/*max window size: 1024(direct blocks) + 3([t,d]indirect blocks) */
-#define EXT3_MAX_RESERVE_BLOCKS         1027
-#define EXT3_RESERVE_WINDOW_NOT_ALLOCATED 0
-
-/*
- * Debug code
- */
-#ifdef EXT3FS_DEBUG
-#define ext3_debug(f, a...)						\
-	do {								\
-		printk (KERN_DEBUG "EXT3-fs DEBUG (%s, %d): %s:",	\
-			__FILE__, __LINE__, __func__);		\
-		printk (KERN_DEBUG f, ## a);				\
-	} while (0)
-#else
-#define ext3_debug(f, a...)	do {} while (0)
-#endif
-
-/*
- * Special inodes numbers
- */
-#define	EXT3_BAD_INO		 1	/* Bad blocks inode */
-#define EXT3_ROOT_INO		 2	/* Root inode */
-#define EXT3_BOOT_LOADER_INO	 5	/* Boot loader inode */
-#define EXT3_UNDEL_DIR_INO	 6	/* Undelete directory inode */
-#define EXT3_RESIZE_INO		 7	/* Reserved group descriptors inode */
-#define EXT3_JOURNAL_INO	 8	/* Journal inode */
-
-/* First non-reserved inode for old ext3 filesystems */
-#define EXT3_GOOD_OLD_FIRST_INO	11
-
-/*
- * Maximal count of links to a file
- */
-#define EXT3_LINK_MAX		32000
-
-/*
- * Macro-instructions used to manage several block sizes
- */
-#define EXT3_MIN_BLOCK_SIZE		1024
-#define	EXT3_MAX_BLOCK_SIZE		65536
-#define EXT3_MIN_BLOCK_LOG_SIZE		10
-#define EXT3_BLOCK_SIZE(s)		((s)->s_blocksize)
-#define	EXT3_ADDR_PER_BLOCK(s)		(EXT3_BLOCK_SIZE(s) / sizeof (__u32))
-#define EXT3_BLOCK_SIZE_BITS(s)	((s)->s_blocksize_bits)
-#define	EXT3_ADDR_PER_BLOCK_BITS(s)	(EXT3_SB(s)->s_addr_per_block_bits)
-#define EXT3_INODE_SIZE(s)		(EXT3_SB(s)->s_inode_size)
-#define EXT3_FIRST_INO(s)		(EXT3_SB(s)->s_first_ino)
-
-/*
- * Macro-instructions used to manage fragments
- */
-#define EXT3_MIN_FRAG_SIZE		1024
-#define	EXT3_MAX_FRAG_SIZE		4096
-#define EXT3_MIN_FRAG_LOG_SIZE		  10
-#define EXT3_FRAG_SIZE(s)		(EXT3_SB(s)->s_frag_size)
-#define EXT3_FRAGS_PER_BLOCK(s)		(EXT3_SB(s)->s_frags_per_block)
-
-/*
- * Structure of a blocks group descriptor
- */
-struct ext3_group_desc
-{
-	__le32	bg_block_bitmap;		/* Blocks bitmap block */
-	__le32	bg_inode_bitmap;		/* Inodes bitmap block */
-	__le32	bg_inode_table;		/* Inodes table block */
-	__le16	bg_free_blocks_count;	/* Free blocks count */
-	__le16	bg_free_inodes_count;	/* Free inodes count */
-	__le16	bg_used_dirs_count;	/* Directories count */
-	__u16	bg_pad;
-	__le32	bg_reserved[3];
-};
-
-/*
- * Macro-instructions used to manage group descriptors
- */
-#define EXT3_BLOCKS_PER_GROUP(s)	(EXT3_SB(s)->s_blocks_per_group)
-#define EXT3_DESC_PER_BLOCK(s)		(EXT3_SB(s)->s_desc_per_block)
-#define EXT3_INODES_PER_GROUP(s)	(EXT3_SB(s)->s_inodes_per_group)
-#define EXT3_DESC_PER_BLOCK_BITS(s)	(EXT3_SB(s)->s_desc_per_block_bits)
-
-/*
- * Constants relative to the data blocks
- */
-#define	EXT3_NDIR_BLOCKS		12
-#define	EXT3_IND_BLOCK			EXT3_NDIR_BLOCKS
-#define	EXT3_DIND_BLOCK			(EXT3_IND_BLOCK + 1)
-#define	EXT3_TIND_BLOCK			(EXT3_DIND_BLOCK + 1)
-#define	EXT3_N_BLOCKS			(EXT3_TIND_BLOCK + 1)
-
-/*
- * Inode flags
- */
-#define	EXT3_SECRM_FL			0x00000001 /* Secure deletion */
-#define	EXT3_UNRM_FL			0x00000002 /* Undelete */
-#define	EXT3_COMPR_FL			0x00000004 /* Compress file */
-#define EXT3_SYNC_FL			0x00000008 /* Synchronous updates */
-#define EXT3_IMMUTABLE_FL		0x00000010 /* Immutable file */
-#define EXT3_APPEND_FL			0x00000020 /* writes to file may only append */
-#define EXT3_NODUMP_FL			0x00000040 /* do not dump file */
-#define EXT3_NOATIME_FL			0x00000080 /* do not update atime */
-/* Reserved for compression usage... */
-#define EXT3_DIRTY_FL			0x00000100
-#define EXT3_COMPRBLK_FL		0x00000200 /* One or more compressed clusters */
-#define EXT3_NOCOMPR_FL			0x00000400 /* Don't compress */
-#define EXT3_ECOMPR_FL			0x00000800 /* Compression error */
-/* End compression flags --- maybe not all used */
-#define EXT3_INDEX_FL			0x00001000 /* hash-indexed directory */
-#define EXT3_IMAGIC_FL			0x00002000 /* AFS directory */
-#define EXT3_JOURNAL_DATA_FL		0x00004000 /* file data should be journaled */
-#define EXT3_NOTAIL_FL			0x00008000 /* file tail should not be merged */
-#define EXT3_DIRSYNC_FL			0x00010000 /* dirsync behaviour (directories only) */
-#define EXT3_TOPDIR_FL			0x00020000 /* Top of directory hierarchies*/
-#define EXT3_RESERVED_FL		0x80000000 /* reserved for ext3 lib */
-
-#define EXT3_FL_USER_VISIBLE		0x0003DFFF /* User visible flags */
-#define EXT3_FL_USER_MODIFIABLE		0x000380FF /* User modifiable flags */
-
-/* Flags that should be inherited by new inodes from their parent. */
-#define EXT3_FL_INHERITED (EXT3_SECRM_FL | EXT3_UNRM_FL | EXT3_COMPR_FL |\
-			   EXT3_SYNC_FL | EXT3_NODUMP_FL |\
-			   EXT3_NOATIME_FL | EXT3_COMPRBLK_FL |\
-			   EXT3_NOCOMPR_FL | EXT3_JOURNAL_DATA_FL |\
-			   EXT3_NOTAIL_FL | EXT3_DIRSYNC_FL)
-
-/* Flags that are appropriate for regular files (all but dir-specific ones). */
-#define EXT3_REG_FLMASK (~(EXT3_DIRSYNC_FL | EXT3_TOPDIR_FL))
-
-/* Flags that are appropriate for non-directories/regular files. */
-#define EXT3_OTHER_FLMASK (EXT3_NODUMP_FL | EXT3_NOATIME_FL)
-
-/* Mask out flags that are inappropriate for the given type of inode. */
-static inline __u32 ext3_mask_flags(umode_t mode, __u32 flags)
-{
-	if (S_ISDIR(mode))
-		return flags;
-	else if (S_ISREG(mode))
-		return flags & EXT3_REG_FLMASK;
-	else
-		return flags & EXT3_OTHER_FLMASK;
-}
-
-/* Used to pass group descriptor data when online resize is done */
-struct ext3_new_group_input {
-	__u32 group;            /* Group number for this data */
-	__u32 block_bitmap;     /* Absolute block number of block bitmap */
-	__u32 inode_bitmap;     /* Absolute block number of inode bitmap */
-	__u32 inode_table;      /* Absolute block number of inode table start */
-	__u32 blocks_count;     /* Total number of blocks in this group */
-	__u16 reserved_blocks;  /* Number of reserved blocks in this group */
-	__u16 unused;
-};
-
-/* The struct ext3_new_group_input in kernel space, with free_blocks_count */
-struct ext3_new_group_data {
-	__u32 group;
-	__u32 block_bitmap;
-	__u32 inode_bitmap;
-	__u32 inode_table;
-	__u32 blocks_count;
-	__u16 reserved_blocks;
-	__u16 unused;
-	__u32 free_blocks_count;
-};
-
-
-/*
- * ioctl commands
- */
-#define	EXT3_IOC_GETFLAGS		FS_IOC_GETFLAGS
-#define	EXT3_IOC_SETFLAGS		FS_IOC_SETFLAGS
-#define	EXT3_IOC_GETVERSION		_IOR('f', 3, long)
-#define	EXT3_IOC_SETVERSION		_IOW('f', 4, long)
-#define EXT3_IOC_GROUP_EXTEND		_IOW('f', 7, unsigned long)
-#define EXT3_IOC_GROUP_ADD		_IOW('f', 8,struct ext3_new_group_input)
-#define	EXT3_IOC_GETVERSION_OLD		FS_IOC_GETVERSION
-#define	EXT3_IOC_SETVERSION_OLD		FS_IOC_SETVERSION
-#ifdef CONFIG_JBD_DEBUG
-#define EXT3_IOC_WAIT_FOR_READONLY	_IOR('f', 99, long)
-#endif
-#define EXT3_IOC_GETRSVSZ		_IOR('f', 5, long)
-#define EXT3_IOC_SETRSVSZ		_IOW('f', 6, long)
-
-/*
- * ioctl commands in 32 bit emulation
- */
-#define EXT3_IOC32_GETFLAGS		FS_IOC32_GETFLAGS
-#define EXT3_IOC32_SETFLAGS		FS_IOC32_SETFLAGS
-#define EXT3_IOC32_GETVERSION		_IOR('f', 3, int)
-#define EXT3_IOC32_SETVERSION		_IOW('f', 4, int)
-#define EXT3_IOC32_GETRSVSZ		_IOR('f', 5, int)
-#define EXT3_IOC32_SETRSVSZ		_IOW('f', 6, int)
-#define EXT3_IOC32_GROUP_EXTEND		_IOW('f', 7, unsigned int)
-#ifdef CONFIG_JBD_DEBUG
-#define EXT3_IOC32_WAIT_FOR_READONLY	_IOR('f', 99, int)
-#endif
-#define EXT3_IOC32_GETVERSION_OLD	FS_IOC32_GETVERSION
-#define EXT3_IOC32_SETVERSION_OLD	FS_IOC32_SETVERSION
-
-/* Number of supported quota types */
-#define EXT3_MAXQUOTAS 2
-
-/*
- *  Mount options
- */
-struct ext3_mount_options {
-	unsigned long s_mount_opt;
-	kuid_t s_resuid;
-	kgid_t s_resgid;
-	unsigned long s_commit_interval;
-#ifdef CONFIG_QUOTA
-	int s_jquota_fmt;
-	char *s_qf_names[EXT3_MAXQUOTAS];
-#endif
-};
-
-/*
- * Structure of an inode on the disk
- */
-struct ext3_inode {
-	__le16	i_mode;		/* File mode */
-	__le16	i_uid;		/* Low 16 bits of Owner Uid */
-	__le32	i_size;		/* Size in bytes */
-	__le32	i_atime;	/* Access time */
-	__le32	i_ctime;	/* Creation time */
-	__le32	i_mtime;	/* Modification time */
-	__le32	i_dtime;	/* Deletion Time */
-	__le16	i_gid;		/* Low 16 bits of Group Id */
-	__le16	i_links_count;	/* Links count */
-	__le32	i_blocks;	/* Blocks count */
-	__le32	i_flags;	/* File flags */
-	union {
-		struct {
-			__u32  l_i_reserved1;
-		} linux1;
-		struct {
-			__u32  h_i_translator;
-		} hurd1;
-		struct {
-			__u32  m_i_reserved1;
-		} masix1;
-	} osd1;				/* OS dependent 1 */
-	__le32	i_block[EXT3_N_BLOCKS];/* Pointers to blocks */
-	__le32	i_generation;	/* File version (for NFS) */
-	__le32	i_file_acl;	/* File ACL */
-	__le32	i_dir_acl;	/* Directory ACL */
-	__le32	i_faddr;	/* Fragment address */
-	union {
-		struct {
-			__u8	l_i_frag;	/* Fragment number */
-			__u8	l_i_fsize;	/* Fragment size */
-			__u16	i_pad1;
-			__le16	l_i_uid_high;	/* these 2 fields    */
-			__le16	l_i_gid_high;	/* were reserved2[0] */
-			__u32	l_i_reserved2;
-		} linux2;
-		struct {
-			__u8	h_i_frag;	/* Fragment number */
-			__u8	h_i_fsize;	/* Fragment size */
-			__u16	h_i_mode_high;
-			__u16	h_i_uid_high;
-			__u16	h_i_gid_high;
-			__u32	h_i_author;
-		} hurd2;
-		struct {
-			__u8	m_i_frag;	/* Fragment number */
-			__u8	m_i_fsize;	/* Fragment size */
-			__u16	m_pad1;
-			__u32	m_i_reserved2[2];
-		} masix2;
-	} osd2;				/* OS dependent 2 */
-	__le16	i_extra_isize;
-	__le16	i_pad1;
-};
-
-#define i_size_high	i_dir_acl
-
-#define i_reserved1	osd1.linux1.l_i_reserved1
-#define i_frag		osd2.linux2.l_i_frag
-#define i_fsize		osd2.linux2.l_i_fsize
-#define i_uid_low	i_uid
-#define i_gid_low	i_gid
-#define i_uid_high	osd2.linux2.l_i_uid_high
-#define i_gid_high	osd2.linux2.l_i_gid_high
-#define i_reserved2	osd2.linux2.l_i_reserved2
-
-/*
- * File system states
- */
-#define	EXT3_VALID_FS			0x0001	/* Unmounted cleanly */
-#define	EXT3_ERROR_FS			0x0002	/* Errors detected */
-#define	EXT3_ORPHAN_FS			0x0004	/* Orphans being recovered */
-
-/*
- * Misc. filesystem flags
- */
-#define EXT2_FLAGS_SIGNED_HASH		0x0001  /* Signed dirhash in use */
-#define EXT2_FLAGS_UNSIGNED_HASH	0x0002  /* Unsigned dirhash in use */
-#define EXT2_FLAGS_TEST_FILESYS		0x0004	/* to test development code */
-
-/*
- * Mount flags
- */
-#define EXT3_MOUNT_CHECK		0x00001	/* Do mount-time checks */
-/* EXT3_MOUNT_OLDALLOC was there */
-#define EXT3_MOUNT_GRPID		0x00004	/* Create files with directory's group */
-#define EXT3_MOUNT_DEBUG		0x00008	/* Some debugging messages */
-#define EXT3_MOUNT_ERRORS_CONT		0x00010	/* Continue on errors */
-#define EXT3_MOUNT_ERRORS_RO		0x00020	/* Remount fs ro on errors */
-#define EXT3_MOUNT_ERRORS_PANIC		0x00040	/* Panic on errors */
-#define EXT3_MOUNT_MINIX_DF		0x00080	/* Mimics the Minix statfs */
-#define EXT3_MOUNT_NOLOAD		0x00100	/* Don't use existing journal*/
-#define EXT3_MOUNT_ABORT		0x00200	/* Fatal error detected */
-#define EXT3_MOUNT_DATA_FLAGS		0x00C00	/* Mode for data writes: */
-#define EXT3_MOUNT_JOURNAL_DATA		0x00400	/* Write data to journal */
-#define EXT3_MOUNT_ORDERED_DATA		0x00800	/* Flush data before commit */
-#define EXT3_MOUNT_WRITEBACK_DATA	0x00C00	/* No data ordering */
-#define EXT3_MOUNT_UPDATE_JOURNAL	0x01000	/* Update the journal format */
-#define EXT3_MOUNT_NO_UID32		0x02000  /* Disable 32-bit UIDs */
-#define EXT3_MOUNT_XATTR_USER		0x04000	/* Extended user attributes */
-#define EXT3_MOUNT_POSIX_ACL		0x08000	/* POSIX Access Control Lists */
-#define EXT3_MOUNT_RESERVATION		0x10000	/* Preallocation */
-#define EXT3_MOUNT_BARRIER		0x20000 /* Use block barriers */
-#define EXT3_MOUNT_QUOTA		0x80000 /* Some quota option set */
-#define EXT3_MOUNT_USRQUOTA		0x100000 /* "old" user quota */
-#define EXT3_MOUNT_GRPQUOTA		0x200000 /* "old" group quota */
-#define EXT3_MOUNT_DATA_ERR_ABORT	0x400000 /* Abort on file data write
-						  * error in ordered mode */
-
-/* Compatibility, for having both ext2_fs.h and ext3_fs.h included at once */
-#ifndef _LINUX_EXT2_FS_H
-#define clear_opt(o, opt)		o &= ~EXT3_MOUNT_##opt
-#define set_opt(o, opt)			o |= EXT3_MOUNT_##opt
-#define test_opt(sb, opt)		(EXT3_SB(sb)->s_mount_opt & \
-					 EXT3_MOUNT_##opt)
-#else
-#define EXT2_MOUNT_NOLOAD		EXT3_MOUNT_NOLOAD
-#define EXT2_MOUNT_ABORT		EXT3_MOUNT_ABORT
-#define EXT2_MOUNT_DATA_FLAGS		EXT3_MOUNT_DATA_FLAGS
-#endif
-
-#define ext3_set_bit			__set_bit_le
-#define ext3_set_bit_atomic		ext2_set_bit_atomic
-#define ext3_clear_bit			__clear_bit_le
-#define ext3_clear_bit_atomic		ext2_clear_bit_atomic
-#define ext3_test_bit			test_bit_le
-#define ext3_find_next_zero_bit		find_next_zero_bit_le
-
-/*
- * Maximal mount counts between two filesystem checks
- */
-#define EXT3_DFL_MAX_MNT_COUNT		20	/* Allow 20 mounts */
-#define EXT3_DFL_CHECKINTERVAL		0	/* Don't use interval check */
-
-/*
- * Behaviour when detecting errors
- */
-#define EXT3_ERRORS_CONTINUE		1	/* Continue execution */
-#define EXT3_ERRORS_RO			2	/* Remount fs read-only */
-#define EXT3_ERRORS_PANIC		3	/* Panic */
-#define EXT3_ERRORS_DEFAULT		EXT3_ERRORS_CONTINUE
-
-/*
- * Structure of the super block
- */
-struct ext3_super_block {
-/*00*/	__le32	s_inodes_count;		/* Inodes count */
-	__le32	s_blocks_count;		/* Blocks count */
-	__le32	s_r_blocks_count;	/* Reserved blocks count */
-	__le32	s_free_blocks_count;	/* Free blocks count */
-/*10*/	__le32	s_free_inodes_count;	/* Free inodes count */
-	__le32	s_first_data_block;	/* First Data Block */
-	__le32	s_log_block_size;	/* Block size */
-	__le32	s_log_frag_size;	/* Fragment size */
-/*20*/	__le32	s_blocks_per_group;	/* # Blocks per group */
-	__le32	s_frags_per_group;	/* # Fragments per group */
-	__le32	s_inodes_per_group;	/* # Inodes per group */
-	__le32	s_mtime;		/* Mount time */
-/*30*/	__le32	s_wtime;		/* Write time */
-	__le16	s_mnt_count;		/* Mount count */
-	__le16	s_max_mnt_count;	/* Maximal mount count */
-	__le16	s_magic;		/* Magic signature */
-	__le16	s_state;		/* File system state */
-	__le16	s_errors;		/* Behaviour when detecting errors */
-	__le16	s_minor_rev_level;	/* minor revision level */
-/*40*/	__le32	s_lastcheck;		/* time of last check */
-	__le32	s_checkinterval;	/* max. time between checks */
-	__le32	s_creator_os;		/* OS */
-	__le32	s_rev_level;		/* Revision level */
-/*50*/	__le16	s_def_resuid;		/* Default uid for reserved blocks */
-	__le16	s_def_resgid;		/* Default gid for reserved blocks */
-	/*
-	 * These fields are for EXT3_DYNAMIC_REV superblocks only.
-	 *
-	 * Note: the difference between the compatible feature set and
-	 * the incompatible feature set is that if there is a bit set
-	 * in the incompatible feature set that the kernel doesn't
-	 * know about, it should refuse to mount the filesystem.
-	 *
-	 * e2fsck's requirements are more strict; if it doesn't know
-	 * about a feature in either the compatible or incompatible
-	 * feature set, it must abort and not try to meddle with
-	 * things it doesn't understand...
-	 */
-	__le32	s_first_ino;		/* First non-reserved inode */
-	__le16   s_inode_size;		/* size of inode structure */
-	__le16	s_block_group_nr;	/* block group # of this superblock */
-	__le32	s_feature_compat;	/* compatible feature set */
-/*60*/	__le32	s_feature_incompat;	/* incompatible feature set */
-	__le32	s_feature_ro_compat;	/* readonly-compatible feature set */
-/*68*/	__u8	s_uuid[16];		/* 128-bit uuid for volume */
-/*78*/	char	s_volume_name[16];	/* volume name */
-/*88*/	char	s_last_mounted[64];	/* directory where last mounted */
-/*C8*/	__le32	s_algorithm_usage_bitmap; /* For compression */
-	/*
-	 * Performance hints.  Directory preallocation should only
-	 * happen if the EXT3_FEATURE_COMPAT_DIR_PREALLOC flag is on.
-	 */
-	__u8	s_prealloc_blocks;	/* Nr of blocks to try to preallocate*/
-	__u8	s_prealloc_dir_blocks;	/* Nr to preallocate for dirs */
-	__le16	s_reserved_gdt_blocks;	/* Per group desc for online growth */
-	/*
-	 * Journaling support valid if EXT3_FEATURE_COMPAT_HAS_JOURNAL set.
-	 */
-/*D0*/	__u8	s_journal_uuid[16];	/* uuid of journal superblock */
-/*E0*/	__le32	s_journal_inum;		/* inode number of journal file */
-	__le32	s_journal_dev;		/* device number of journal file */
-	__le32	s_last_orphan;		/* start of list of inodes to delete */
-	__le32	s_hash_seed[4];		/* HTREE hash seed */
-	__u8	s_def_hash_version;	/* Default hash version to use */
-	__u8	s_reserved_char_pad;
-	__u16	s_reserved_word_pad;
-	__le32	s_default_mount_opts;
-	__le32	s_first_meta_bg;	/* First metablock block group */
-	__le32	s_mkfs_time;		/* When the filesystem was created */
-	__le32	s_jnl_blocks[17];	/* Backup of the journal inode */
-	/* 64bit support valid if EXT4_FEATURE_COMPAT_64BIT */
-/*150*/	__le32	s_blocks_count_hi;	/* Blocks count */
-	__le32	s_r_blocks_count_hi;	/* Reserved blocks count */
-	__le32	s_free_blocks_count_hi;	/* Free blocks count */
-	__le16	s_min_extra_isize;	/* All inodes have at least # bytes */
-	__le16	s_want_extra_isize; 	/* New inodes should reserve # bytes */
-	__le32	s_flags;		/* Miscellaneous flags */
-	__le16  s_raid_stride;		/* RAID stride */
-	__le16  s_mmp_interval;         /* # seconds to wait in MMP checking */
-	__le64  s_mmp_block;            /* Block for multi-mount protection */
-	__le32  s_raid_stripe_width;    /* blocks on all data disks (N*stride)*/
-	__u8	s_log_groups_per_flex;  /* FLEX_BG group size */
-	__u8	s_reserved_char_pad2;
-	__le16  s_reserved_pad;
-	__u32   s_reserved[162];        /* Padding to the end of the block */
-};
-
-/* data type for block offset of block group */
-typedef int ext3_grpblk_t;
-
-/* data type for filesystem-wide blocks number */
-typedef unsigned long ext3_fsblk_t;
-
-#define E3FSBLK "%lu"
-
-struct ext3_reserve_window {
-	ext3_fsblk_t	_rsv_start;	/* First byte reserved */
-	ext3_fsblk_t	_rsv_end;	/* Last byte reserved or 0 */
-};
-
-struct ext3_reserve_window_node {
-	struct rb_node		rsv_node;
-	__u32			rsv_goal_size;
-	__u32			rsv_alloc_hit;
-	struct ext3_reserve_window	rsv_window;
-};
-
-struct ext3_block_alloc_info {
-	/* information about reservation window */
-	struct ext3_reserve_window_node	rsv_window_node;
-	/*
-	 * was i_next_alloc_block in ext3_inode_info
-	 * is the logical (file-relative) number of the
-	 * most-recently-allocated block in this file.
-	 * We use this for detecting linearly ascending allocation requests.
-	 */
-	__u32                   last_alloc_logical_block;
-	/*
-	 * Was i_next_alloc_goal in ext3_inode_info
-	 * is the *physical* companion to i_next_alloc_block.
-	 * it the physical block number of the block which was most-recentl
-	 * allocated to this file.  This give us the goal (target) for the next
-	 * allocation when we detect linearly ascending requests.
-	 */
-	ext3_fsblk_t		last_alloc_physical_block;
-};
-
-#define rsv_start rsv_window._rsv_start
-#define rsv_end rsv_window._rsv_end
-
-/*
- * third extended file system inode data in memory
- */
-struct ext3_inode_info {
-	__le32	i_data[15];	/* unconverted */
-	__u32	i_flags;
-#ifdef EXT3_FRAGMENTS
-	__u32	i_faddr;
-	__u8	i_frag_no;
-	__u8	i_frag_size;
-#endif
-	ext3_fsblk_t	i_file_acl;
-	__u32	i_dir_acl;
-	__u32	i_dtime;
-
-	/*
-	 * i_block_group is the number of the block group which contains
-	 * this file's inode.  Constant across the lifetime of the inode,
-	 * it is ued for making block allocation decisions - we try to
-	 * place a file's data blocks near its inode block, and new inodes
-	 * near to their parent directory's inode.
-	 */
-	__u32	i_block_group;
-	unsigned long	i_state_flags;	/* Dynamic state flags for ext3 */
-
-	/* block reservation info */
-	struct ext3_block_alloc_info *i_block_alloc_info;
-
-	__u32	i_dir_start_lookup;
-#ifdef CONFIG_EXT3_FS_XATTR
-	/*
-	 * Extended attributes can be read independently of the main file
-	 * data. Taking i_mutex even when reading would cause contention
-	 * between readers of EAs and writers of regular file data, so
-	 * instead we synchronize on xattr_sem when reading or changing
-	 * EAs.
-	 */
-	struct rw_semaphore xattr_sem;
-#endif
-
-	struct list_head i_orphan;	/* unlinked but open inodes */
-
-	/*
-	 * i_disksize keeps track of what the inode size is ON DISK, not
-	 * in memory.  During truncate, i_size is set to the new size by
-	 * the VFS prior to calling ext3_truncate(), but the filesystem won't
-	 * set i_disksize to 0 until the truncate is actually under way.
-	 *
-	 * The intent is that i_disksize always represents the blocks which
-	 * are used by this file.  This allows recovery to restart truncate
-	 * on orphans if we crash during truncate.  We actually write i_disksize
-	 * into the on-disk inode when writing inodes out, instead of i_size.
-	 *
-	 * The only time when i_disksize and i_size may be different is when
-	 * a truncate is in progress.  The only things which change i_disksize
-	 * are ext3_get_block (growth) and ext3_truncate (shrinkth).
-	 */
-	loff_t	i_disksize;
-
-	/* on-disk additional length */
-	__u16 i_extra_isize;
-
-	/*
-	 * truncate_mutex is for serialising ext3_truncate() against
-	 * ext3_getblock().  In the 2.4 ext2 design, great chunks of inode's
-	 * data tree are chopped off during truncate. We can't do that in
-	 * ext3 because whenever we perform intermediate commits during
-	 * truncate, the inode and all the metadata blocks *must* be in a
-	 * consistent state which allows truncation of the orphans to restart
-	 * during recovery.  Hence we must fix the get_block-vs-truncate race
-	 * by other means, so we have truncate_mutex.
-	 */
-	struct mutex truncate_mutex;
-
-	/*
-	 * Transactions that contain inode's metadata needed to complete
-	 * fsync and fdatasync, respectively.
-	 */
-	atomic_t i_sync_tid;
-	atomic_t i_datasync_tid;
-
-#ifdef CONFIG_QUOTA
-	struct dquot *i_dquot[MAXQUOTAS];
-#endif
-
-	struct inode vfs_inode;
-};
-
-/*
- * third extended-fs super-block data in memory
- */
-struct ext3_sb_info {
-	unsigned long s_frag_size;	/* Size of a fragment in bytes */
-	unsigned long s_frags_per_block;/* Number of fragments per block */
-	unsigned long s_inodes_per_block;/* Number of inodes per block */
-	unsigned long s_frags_per_group;/* Number of fragments in a group */
-	unsigned long s_blocks_per_group;/* Number of blocks in a group */
-	unsigned long s_inodes_per_group;/* Number of inodes in a group */
-	unsigned long s_itb_per_group;	/* Number of inode table blocks per group */
-	unsigned long s_gdb_count;	/* Number of group descriptor blocks */
-	unsigned long s_desc_per_block;	/* Number of group descriptors per block */
-	unsigned long s_groups_count;	/* Number of groups in the fs */
-	unsigned long s_overhead_last;  /* Last calculated overhead */
-	unsigned long s_blocks_last;    /* Last seen block count */
-	struct buffer_head * s_sbh;	/* Buffer containing the super block */
-	struct ext3_super_block * s_es;	/* Pointer to the super block in the buffer */
-	struct buffer_head ** s_group_desc;
-	unsigned long  s_mount_opt;
-	ext3_fsblk_t s_sb_block;
-	kuid_t s_resuid;
-	kgid_t s_resgid;
-	unsigned short s_mount_state;
-	unsigned short s_pad;
-	int s_addr_per_block_bits;
-	int s_desc_per_block_bits;
-	int s_inode_size;
-	int s_first_ino;
-	spinlock_t s_next_gen_lock;
-	u32 s_next_generation;
-	u32 s_hash_seed[4];
-	int s_def_hash_version;
-	int s_hash_unsigned;	/* 3 if hash should be signed, 0 if not */
-	struct percpu_counter s_freeblocks_counter;
-	struct percpu_counter s_freeinodes_counter;
-	struct percpu_counter s_dirs_counter;
-	struct blockgroup_lock *s_blockgroup_lock;
-
-	/* root of the per fs reservation window tree */
-	spinlock_t s_rsv_window_lock;
-	struct rb_root s_rsv_window_root;
-	struct ext3_reserve_window_node s_rsv_window_head;
-
-	/* Journaling */
-	struct inode * s_journal_inode;
-	struct journal_s * s_journal;
-	struct list_head s_orphan;
-	struct mutex s_orphan_lock;
-	struct mutex s_resize_lock;
-	unsigned long s_commit_interval;
-	struct block_device *journal_bdev;
-#ifdef CONFIG_QUOTA
-	char *s_qf_names[EXT3_MAXQUOTAS];	/* Names of quota files with journalled quota */
-	int s_jquota_fmt;			/* Format of quota to use */
-#endif
-};
-
-static inline spinlock_t *
-sb_bgl_lock(struct ext3_sb_info *sbi, unsigned int block_group)
-{
-	return bgl_lock_ptr(sbi->s_blockgroup_lock, block_group);
-}
-
-static inline struct ext3_sb_info * EXT3_SB(struct super_block *sb)
-{
-	return sb->s_fs_info;
-}
-static inline struct ext3_inode_info *EXT3_I(struct inode *inode)
-{
-	return container_of(inode, struct ext3_inode_info, vfs_inode);
-}
-
-static inline int ext3_valid_inum(struct super_block *sb, unsigned long ino)
-{
-	return ino == EXT3_ROOT_INO ||
-		ino == EXT3_JOURNAL_INO ||
-		ino == EXT3_RESIZE_INO ||
-		(ino >= EXT3_FIRST_INO(sb) &&
-		 ino <= le32_to_cpu(EXT3_SB(sb)->s_es->s_inodes_count));
-}
-
-/*
- * Inode dynamic state flags
- */
-enum {
-	EXT3_STATE_JDATA,		/* journaled data exists */
-	EXT3_STATE_NEW,			/* inode is newly created */
-	EXT3_STATE_XATTR,		/* has in-inode xattrs */
-	EXT3_STATE_FLUSH_ON_CLOSE,	/* flush dirty pages on close */
-};
-
-static inline int ext3_test_inode_state(struct inode *inode, int bit)
-{
-	return test_bit(bit, &EXT3_I(inode)->i_state_flags);
-}
-
-static inline void ext3_set_inode_state(struct inode *inode, int bit)
-{
-	set_bit(bit, &EXT3_I(inode)->i_state_flags);
-}
-
-static inline void ext3_clear_inode_state(struct inode *inode, int bit)
-{
-	clear_bit(bit, &EXT3_I(inode)->i_state_flags);
-}
-
-#define NEXT_ORPHAN(inode) EXT3_I(inode)->i_dtime
-
-/*
- * Codes for operating systems
- */
-#define EXT3_OS_LINUX		0
-#define EXT3_OS_HURD		1
-#define EXT3_OS_MASIX		2
-#define EXT3_OS_FREEBSD		3
-#define EXT3_OS_LITES		4
-
-/*
- * Revision levels
- */
-#define EXT3_GOOD_OLD_REV	0	/* The good old (original) format */
-#define EXT3_DYNAMIC_REV	1	/* V2 format w/ dynamic inode sizes */
-
-#define EXT3_CURRENT_REV	EXT3_GOOD_OLD_REV
-#define EXT3_MAX_SUPP_REV	EXT3_DYNAMIC_REV
-
-#define EXT3_GOOD_OLD_INODE_SIZE 128
-
-/*
- * Feature set definitions
- */
-
-#define EXT3_HAS_COMPAT_FEATURE(sb,mask)			\
-	( EXT3_SB(sb)->s_es->s_feature_compat & cpu_to_le32(mask) )
-#define EXT3_HAS_RO_COMPAT_FEATURE(sb,mask)			\
-	( EXT3_SB(sb)->s_es->s_feature_ro_compat & cpu_to_le32(mask) )
-#define EXT3_HAS_INCOMPAT_FEATURE(sb,mask)			\
-	( EXT3_SB(sb)->s_es->s_feature_incompat & cpu_to_le32(mask) )
-#define EXT3_SET_COMPAT_FEATURE(sb,mask)			\
-	EXT3_SB(sb)->s_es->s_feature_compat |= cpu_to_le32(mask)
-#define EXT3_SET_RO_COMPAT_FEATURE(sb,mask)			\
-	EXT3_SB(sb)->s_es->s_feature_ro_compat |= cpu_to_le32(mask)
-#define EXT3_SET_INCOMPAT_FEATURE(sb,mask)			\
-	EXT3_SB(sb)->s_es->s_feature_incompat |= cpu_to_le32(mask)
-#define EXT3_CLEAR_COMPAT_FEATURE(sb,mask)			\
-	EXT3_SB(sb)->s_es->s_feature_compat &= ~cpu_to_le32(mask)
-#define EXT3_CLEAR_RO_COMPAT_FEATURE(sb,mask)			\
-	EXT3_SB(sb)->s_es->s_feature_ro_compat &= ~cpu_to_le32(mask)
-#define EXT3_CLEAR_INCOMPAT_FEATURE(sb,mask)			\
-	EXT3_SB(sb)->s_es->s_feature_incompat &= ~cpu_to_le32(mask)
-
-#define EXT3_FEATURE_COMPAT_DIR_PREALLOC	0x0001
-#define EXT3_FEATURE_COMPAT_IMAGIC_INODES	0x0002
-#define EXT3_FEATURE_COMPAT_HAS_JOURNAL		0x0004
-#define EXT3_FEATURE_COMPAT_EXT_ATTR		0x0008
-#define EXT3_FEATURE_COMPAT_RESIZE_INODE	0x0010
-#define EXT3_FEATURE_COMPAT_DIR_INDEX		0x0020
-
-#define EXT3_FEATURE_RO_COMPAT_SPARSE_SUPER	0x0001
-#define EXT3_FEATURE_RO_COMPAT_LARGE_FILE	0x0002
-#define EXT3_FEATURE_RO_COMPAT_BTREE_DIR	0x0004
-
-#define EXT3_FEATURE_INCOMPAT_COMPRESSION	0x0001
-#define EXT3_FEATURE_INCOMPAT_FILETYPE		0x0002
-#define EXT3_FEATURE_INCOMPAT_RECOVER		0x0004 /* Needs recovery */
-#define EXT3_FEATURE_INCOMPAT_JOURNAL_DEV	0x0008 /* Journal device */
-#define EXT3_FEATURE_INCOMPAT_META_BG		0x0010
-
-#define EXT3_FEATURE_COMPAT_SUPP	EXT2_FEATURE_COMPAT_EXT_ATTR
-#define EXT3_FEATURE_INCOMPAT_SUPP	(EXT3_FEATURE_INCOMPAT_FILETYPE| \
-					 EXT3_FEATURE_INCOMPAT_RECOVER| \
-					 EXT3_FEATURE_INCOMPAT_META_BG)
-#define EXT3_FEATURE_RO_COMPAT_SUPP	(EXT3_FEATURE_RO_COMPAT_SPARSE_SUPER| \
-					 EXT3_FEATURE_RO_COMPAT_LARGE_FILE| \
-					 EXT3_FEATURE_RO_COMPAT_BTREE_DIR)
-
-/*
- * Default values for user and/or group using reserved blocks
- */
-#define	EXT3_DEF_RESUID		0
-#define	EXT3_DEF_RESGID		0
-
-/*
- * Default mount options
- */
-#define EXT3_DEFM_DEBUG		0x0001
-#define EXT3_DEFM_BSDGROUPS	0x0002
-#define EXT3_DEFM_XATTR_USER	0x0004
-#define EXT3_DEFM_ACL		0x0008
-#define EXT3_DEFM_UID16		0x0010
-#define EXT3_DEFM_JMODE		0x0060
-#define EXT3_DEFM_JMODE_DATA	0x0020
-#define EXT3_DEFM_JMODE_ORDERED	0x0040
-#define EXT3_DEFM_JMODE_WBACK	0x0060
-
-/*
- * Structure of a directory entry
- */
-#define EXT3_NAME_LEN 255
-
-struct ext3_dir_entry {
-	__le32	inode;			/* Inode number */
-	__le16	rec_len;		/* Directory entry length */
-	__le16	name_len;		/* Name length */
-	char	name[EXT3_NAME_LEN];	/* File name */
-};
-
-/*
- * The new version of the directory entry.  Since EXT3 structures are
- * stored in intel byte order, and the name_len field could never be
- * bigger than 255 chars, it's safe to reclaim the extra byte for the
- * file_type field.
- */
-struct ext3_dir_entry_2 {
-	__le32	inode;			/* Inode number */
-	__le16	rec_len;		/* Directory entry length */
-	__u8	name_len;		/* Name length */
-	__u8	file_type;
-	char	name[EXT3_NAME_LEN];	/* File name */
-};
-
-/*
- * Ext3 directory file types.  Only the low 3 bits are used.  The
- * other bits are reserved for now.
- */
-#define EXT3_FT_UNKNOWN		0
-#define EXT3_FT_REG_FILE	1
-#define EXT3_FT_DIR		2
-#define EXT3_FT_CHRDEV		3
-#define EXT3_FT_BLKDEV		4
-#define EXT3_FT_FIFO		5
-#define EXT3_FT_SOCK		6
-#define EXT3_FT_SYMLINK		7
-
-#define EXT3_FT_MAX		8
-
-/*
- * EXT3_DIR_PAD defines the directory entries boundaries
- *
- * NOTE: It must be a multiple of 4
- */
-#define EXT3_DIR_PAD			4
-#define EXT3_DIR_ROUND			(EXT3_DIR_PAD - 1)
-#define EXT3_DIR_REC_LEN(name_len)	(((name_len) + 8 + EXT3_DIR_ROUND) & \
-					 ~EXT3_DIR_ROUND)
-#define EXT3_MAX_REC_LEN		((1<<16)-1)
-
-/*
- * Tests against MAX_REC_LEN etc were put in place for 64k block
- * sizes; if that is not possible on this arch, we can skip
- * those tests and speed things up.
- */
-static inline unsigned ext3_rec_len_from_disk(__le16 dlen)
-{
-	unsigned len = le16_to_cpu(dlen);
-
-#if (PAGE_CACHE_SIZE >= 65536)
-	if (len == EXT3_MAX_REC_LEN)
-		return 1 << 16;
-#endif
-	return len;
-}
-
-static inline __le16 ext3_rec_len_to_disk(unsigned len)
-{
-#if (PAGE_CACHE_SIZE >= 65536)
-	if (len == (1 << 16))
-		return cpu_to_le16(EXT3_MAX_REC_LEN);
-	else if (len > (1 << 16))
-		BUG();
-#endif
-	return cpu_to_le16(len);
-}
-
-/*
- * Hash Tree Directory indexing
- * (c) Daniel Phillips, 2001
- */
-
-#define is_dx(dir) (EXT3_HAS_COMPAT_FEATURE(dir->i_sb, \
-				      EXT3_FEATURE_COMPAT_DIR_INDEX) && \
-		      (EXT3_I(dir)->i_flags & EXT3_INDEX_FL))
-#define EXT3_DIR_LINK_MAX(dir) (!is_dx(dir) && (dir)->i_nlink >= EXT3_LINK_MAX)
-#define EXT3_DIR_LINK_EMPTY(dir) ((dir)->i_nlink == 2 || (dir)->i_nlink == 1)
-
-/* Legal values for the dx_root hash_version field: */
-
-#define DX_HASH_LEGACY		0
-#define DX_HASH_HALF_MD4	1
-#define DX_HASH_TEA		2
-#define DX_HASH_LEGACY_UNSIGNED	3
-#define DX_HASH_HALF_MD4_UNSIGNED	4
-#define DX_HASH_TEA_UNSIGNED		5
-
-/* hash info structure used by the directory hash */
-struct dx_hash_info
-{
-	u32		hash;
-	u32		minor_hash;
-	int		hash_version;
-	u32		*seed;
-};
-
-
-/* 32 and 64 bit signed EOF for dx directories */
-#define EXT3_HTREE_EOF_32BIT   ((1UL  << (32 - 1)) - 1)
-#define EXT3_HTREE_EOF_64BIT   ((1ULL << (64 - 1)) - 1)
-
-
-/*
- * Control parameters used by ext3_htree_next_block
- */
-#define HASH_NB_ALWAYS		1
-
-
-/*
- * Describe an inode's exact location on disk and in memory
- */
-struct ext3_iloc
-{
-	struct buffer_head *bh;
-	unsigned long offset;
-	unsigned long block_group;
-};
-
-static inline struct ext3_inode *ext3_raw_inode(struct ext3_iloc *iloc)
-{
-	return (struct ext3_inode *) (iloc->bh->b_data + iloc->offset);
-}
-
-/*
- * This structure is stuffed into the struct file's private_data field
- * for directories.  It is where we put information so that we can do
- * readdir operations in hash tree order.
- */
-struct dir_private_info {
-	struct rb_root	root;
-	struct rb_node	*curr_node;
-	struct fname	*extra_fname;
-	loff_t		last_pos;
-	__u32		curr_hash;
-	__u32		curr_minor_hash;
-	__u32		next_hash;
-};
-
-/* calculate the first block number of the group */
-static inline ext3_fsblk_t
-ext3_group_first_block_no(struct super_block *sb, unsigned long group_no)
-{
-	return group_no * (ext3_fsblk_t)EXT3_BLOCKS_PER_GROUP(sb) +
-		le32_to_cpu(EXT3_SB(sb)->s_es->s_first_data_block);
-}
-
-/*
- * Special error return code only used by dx_probe() and its callers.
- */
-#define ERR_BAD_DX_DIR	-75000
-
-/*
- * Function prototypes
- */
-
-/*
- * Ok, these declarations are also in <linux/kernel.h> but none of the
- * ext3 source programs needs to include it so they are duplicated here.
- */
-# define NORET_TYPE    /**/
-# define ATTRIB_NORET  __attribute__((noreturn))
-# define NORET_AND     noreturn,
-
-/* balloc.c */
-extern int ext3_bg_has_super(struct super_block *sb, int group);
-extern unsigned long ext3_bg_num_gdb(struct super_block *sb, int group);
-extern ext3_fsblk_t ext3_new_block (handle_t *handle, struct inode *inode,
-			ext3_fsblk_t goal, int *errp);
-extern ext3_fsblk_t ext3_new_blocks (handle_t *handle, struct inode *inode,
-			ext3_fsblk_t goal, unsigned long *count, int *errp);
-extern void ext3_free_blocks (handle_t *handle, struct inode *inode,
-			ext3_fsblk_t block, unsigned long count);
-extern void ext3_free_blocks_sb (handle_t *handle, struct super_block *sb,
-				 ext3_fsblk_t block, unsigned long count,
-				unsigned long *pdquot_freed_blocks);
-extern ext3_fsblk_t ext3_count_free_blocks (struct super_block *);
-extern void ext3_check_blocks_bitmap (struct super_block *);
-extern struct ext3_group_desc * ext3_get_group_desc(struct super_block * sb,
-						    unsigned int block_group,
-						    struct buffer_head ** bh);
-extern int ext3_should_retry_alloc(struct super_block *sb, int *retries);
-extern void ext3_init_block_alloc_info(struct inode *);
-extern void ext3_rsv_window_add(struct super_block *sb, struct ext3_reserve_window_node *rsv);
-extern int ext3_trim_fs(struct super_block *sb, struct fstrim_range *range);
-
-/* dir.c */
-extern int ext3_check_dir_entry(const char *, struct inode *,
-				struct ext3_dir_entry_2 *,
-				struct buffer_head *, unsigned long);
-extern int ext3_htree_store_dirent(struct file *dir_file, __u32 hash,
-				    __u32 minor_hash,
-				    struct ext3_dir_entry_2 *dirent);
-extern void ext3_htree_free_dir_info(struct dir_private_info *p);
-
-/* fsync.c */
-extern int ext3_sync_file(struct file *, loff_t, loff_t, int);
-
-/* hash.c */
-extern int ext3fs_dirhash(const char *name, int len, struct
-			  dx_hash_info *hinfo);
-
-/* ialloc.c */
-extern struct inode * ext3_new_inode (handle_t *, struct inode *,
-				      const struct qstr *, umode_t);
-extern void ext3_free_inode (handle_t *, struct inode *);
-extern struct inode * ext3_orphan_get (struct super_block *, unsigned long);
-extern unsigned long ext3_count_free_inodes (struct super_block *);
-extern unsigned long ext3_count_dirs (struct super_block *);
-extern void ext3_check_inodes_bitmap (struct super_block *);
-extern unsigned long ext3_count_free (struct buffer_head *, unsigned);
-
-
-/* inode.c */
-int ext3_forget(handle_t *handle, int is_metadata, struct inode *inode,
-		struct buffer_head *bh, ext3_fsblk_t blocknr);
-struct buffer_head * ext3_getblk (handle_t *, struct inode *, long, int, int *);
-struct buffer_head * ext3_bread (handle_t *, struct inode *, int, int, int *);
-int ext3_get_blocks_handle(handle_t *handle, struct inode *inode,
-	sector_t iblock, unsigned long maxblocks, struct buffer_head *bh_result,
-	int create);
-
-extern struct inode *ext3_iget(struct super_block *, unsigned long);
-extern int  ext3_write_inode (struct inode *, struct writeback_control *);
-extern int  ext3_setattr (struct dentry *, struct iattr *);
-extern void ext3_evict_inode (struct inode *);
-extern int  ext3_sync_inode (handle_t *, struct inode *);
-extern void ext3_discard_reservation (struct inode *);
-extern void ext3_dirty_inode(struct inode *, int);
-extern int ext3_change_inode_journal_flag(struct inode *, int);
-extern int ext3_get_inode_loc(struct inode *, struct ext3_iloc *);
-extern int ext3_can_truncate(struct inode *inode);
-extern void ext3_truncate(struct inode *inode);
-extern void ext3_set_inode_flags(struct inode *);
-extern void ext3_get_inode_flags(struct ext3_inode_info *);
-extern void ext3_set_aops(struct inode *inode);
-extern int ext3_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
-		       u64 start, u64 len);
-
-/* ioctl.c */
-extern long ext3_ioctl(struct file *, unsigned int, unsigned long);
-extern long ext3_compat_ioctl(struct file *, unsigned int, unsigned long);
-
-/* namei.c */
-extern int ext3_orphan_add(handle_t *, struct inode *);
-extern int ext3_orphan_del(handle_t *, struct inode *);
-extern int ext3_htree_fill_tree(struct file *dir_file, __u32 start_hash,
-				__u32 start_minor_hash, __u32 *next_hash);
-
-/* resize.c */
-extern int ext3_group_add(struct super_block *sb,
-				struct ext3_new_group_data *input);
-extern int ext3_group_extend(struct super_block *sb,
-				struct ext3_super_block *es,
-				ext3_fsblk_t n_blocks_count);
-
-/* super.c */
-extern __printf(3, 4)
-void ext3_error(struct super_block *, const char *, const char *, ...);
-extern void __ext3_std_error (struct super_block *, const char *, int);
-extern __printf(3, 4)
-void ext3_abort(struct super_block *, const char *, const char *, ...);
-extern __printf(3, 4)
-void ext3_warning(struct super_block *, const char *, const char *, ...);
-extern __printf(3, 4)
-void ext3_msg(struct super_block *, const char *, const char *, ...);
-extern void ext3_update_dynamic_rev (struct super_block *sb);
-
-#define ext3_std_error(sb, errno)				\
-do {								\
-	if ((errno))						\
-		__ext3_std_error((sb), __func__, (errno));	\
-} while (0)
-
-/*
- * Inodes and files operations
- */
-
-/* dir.c */
-extern const struct file_operations ext3_dir_operations;
-
-/* file.c */
-extern const struct inode_operations ext3_file_inode_operations;
-extern const struct file_operations ext3_file_operations;
-
-/* namei.c */
-extern const struct inode_operations ext3_dir_inode_operations;
-extern const struct inode_operations ext3_special_inode_operations;
-
-/* symlink.c */
-extern const struct inode_operations ext3_symlink_inode_operations;
-extern const struct inode_operations ext3_fast_symlink_inode_operations;
-
-#define EXT3_JOURNAL(inode)	(EXT3_SB((inode)->i_sb)->s_journal)
-
-/* Define the number of blocks we need to account to a transaction to
- * modify one block of data.
- *
- * We may have to touch one inode, one bitmap buffer, up to three
- * indirection blocks, the group and superblock summaries, and the data
- * block to complete the transaction.  */
-
-#define EXT3_SINGLEDATA_TRANS_BLOCKS	8U
-
-/* Extended attribute operations touch at most two data buffers,
- * two bitmap buffers, and two group summaries, in addition to the inode
- * and the superblock, which are already accounted for. */
-
-#define EXT3_XATTR_TRANS_BLOCKS		6U
-
-/* Define the minimum size for a transaction which modifies data.  This
- * needs to take into account the fact that we may end up modifying two
- * quota files too (one for the group, one for the user quota).  The
- * superblock only gets updated once, of course, so don't bother
- * counting that again for the quota updates. */
-
-#define EXT3_DATA_TRANS_BLOCKS(sb)	(EXT3_SINGLEDATA_TRANS_BLOCKS + \
-					 EXT3_XATTR_TRANS_BLOCKS - 2 + \
-					 EXT3_MAXQUOTAS_TRANS_BLOCKS(sb))
-
-/* Delete operations potentially hit one directory's namespace plus an
- * entire inode, plus arbitrary amounts of bitmap/indirection data.  Be
- * generous.  We can grow the delete transaction later if necessary. */
-
-#define EXT3_DELETE_TRANS_BLOCKS(sb)   (EXT3_MAXQUOTAS_TRANS_BLOCKS(sb) + 64)
-
-/* Define an arbitrary limit for the amount of data we will anticipate
- * writing to any given transaction.  For unbounded transactions such as
- * write(2) and truncate(2) we can write more than this, but we always
- * start off at the maximum transaction size and grow the transaction
- * optimistically as we go. */
-
-#define EXT3_MAX_TRANS_DATA		64U
-
-/* We break up a large truncate or write transaction once the handle's
- * buffer credits gets this low, we need either to extend the
- * transaction or to start a new one.  Reserve enough space here for
- * inode, bitmap, superblock, group and indirection updates for at least
- * one block, plus two quota updates.  Quota allocations are not
- * needed. */
-
-#define EXT3_RESERVE_TRANS_BLOCKS	12U
-
-#define EXT3_INDEX_EXTRA_TRANS_BLOCKS	8
-
-#ifdef CONFIG_QUOTA
-/* Amount of blocks needed for quota update - we know that the structure was
- * allocated so we need to update only inode+data */
-#define EXT3_QUOTA_TRANS_BLOCKS(sb) (test_opt(sb, QUOTA) ? 2 : 0)
-/* Amount of blocks needed for quota insert/delete - we do some block writes
- * but inode, sb and group updates are done only once */
-#define EXT3_QUOTA_INIT_BLOCKS(sb) (test_opt(sb, QUOTA) ? (DQUOT_INIT_ALLOC*\
-		(EXT3_SINGLEDATA_TRANS_BLOCKS-3)+3+DQUOT_INIT_REWRITE) : 0)
-#define EXT3_QUOTA_DEL_BLOCKS(sb) (test_opt(sb, QUOTA) ? (DQUOT_DEL_ALLOC*\
-		(EXT3_SINGLEDATA_TRANS_BLOCKS-3)+3+DQUOT_DEL_REWRITE) : 0)
-#else
-#define EXT3_QUOTA_TRANS_BLOCKS(sb) 0
-#define EXT3_QUOTA_INIT_BLOCKS(sb) 0
-#define EXT3_QUOTA_DEL_BLOCKS(sb) 0
-#endif
-#define EXT3_MAXQUOTAS_TRANS_BLOCKS(sb) (EXT3_MAXQUOTAS*EXT3_QUOTA_TRANS_BLOCKS(sb))
-#define EXT3_MAXQUOTAS_INIT_BLOCKS(sb) (EXT3_MAXQUOTAS*EXT3_QUOTA_INIT_BLOCKS(sb))
-#define EXT3_MAXQUOTAS_DEL_BLOCKS(sb) (EXT3_MAXQUOTAS*EXT3_QUOTA_DEL_BLOCKS(sb))
-
-int
-ext3_mark_iloc_dirty(handle_t *handle,
-		     struct inode *inode,
-		     struct ext3_iloc *iloc);
-
-/*
- * On success, We end up with an outstanding reference count against
- * iloc->bh.  This _must_ be cleaned up later.
- */
-
-int ext3_reserve_inode_write(handle_t *handle, struct inode *inode,
-			struct ext3_iloc *iloc);
-
-int ext3_mark_inode_dirty(handle_t *handle, struct inode *inode);
-
-/*
- * Wrapper functions with which ext3 calls into JBD.  The intent here is
- * to allow these to be turned into appropriate stubs so ext3 can control
- * ext2 filesystems, so ext2+ext3 systems only nee one fs.  This work hasn't
- * been done yet.
- */
-
-static inline void ext3_journal_release_buffer(handle_t *handle,
-						struct buffer_head *bh)
-{
-	journal_release_buffer(handle, bh);
-}
-
-void ext3_journal_abort_handle(const char *caller, const char *err_fn,
-		struct buffer_head *bh, handle_t *handle, int err);
-
-int __ext3_journal_get_undo_access(const char *where, handle_t *handle,
-				struct buffer_head *bh);
-
-int __ext3_journal_get_write_access(const char *where, handle_t *handle,
-				struct buffer_head *bh);
-
-int __ext3_journal_forget(const char *where, handle_t *handle,
-				struct buffer_head *bh);
-
-int __ext3_journal_revoke(const char *where, handle_t *handle,
-				unsigned long blocknr, struct buffer_head *bh);
-
-int __ext3_journal_get_create_access(const char *where,
-				handle_t *handle, struct buffer_head *bh);
-
-int __ext3_journal_dirty_metadata(const char *where,
-				handle_t *handle, struct buffer_head *bh);
-
-#define ext3_journal_get_undo_access(handle, bh) \
-	__ext3_journal_get_undo_access(__func__, (handle), (bh))
-#define ext3_journal_get_write_access(handle, bh) \
-	__ext3_journal_get_write_access(__func__, (handle), (bh))
-#define ext3_journal_revoke(handle, blocknr, bh) \
-	__ext3_journal_revoke(__func__, (handle), (blocknr), (bh))
-#define ext3_journal_get_create_access(handle, bh) \
-	__ext3_journal_get_create_access(__func__, (handle), (bh))
-#define ext3_journal_dirty_metadata(handle, bh) \
-	__ext3_journal_dirty_metadata(__func__, (handle), (bh))
-#define ext3_journal_forget(handle, bh) \
-	__ext3_journal_forget(__func__, (handle), (bh))
-
-int ext3_journal_dirty_data(handle_t *handle, struct buffer_head *bh);
-
-handle_t *ext3_journal_start_sb(struct super_block *sb, int nblocks);
-int __ext3_journal_stop(const char *where, handle_t *handle);
-
-static inline handle_t *ext3_journal_start(struct inode *inode, int nblocks)
-{
-	return ext3_journal_start_sb(inode->i_sb, nblocks);
-}
-
-#define ext3_journal_stop(handle) \
-	__ext3_journal_stop(__func__, (handle))
-
-static inline handle_t *ext3_journal_current_handle(void)
-{
-	return journal_current_handle();
-}
-
-static inline int ext3_journal_extend(handle_t *handle, int nblocks)
-{
-	return journal_extend(handle, nblocks);
-}
-
-static inline int ext3_journal_restart(handle_t *handle, int nblocks)
-{
-	return journal_restart(handle, nblocks);
-}
-
-static inline int ext3_journal_blocks_per_page(struct inode *inode)
-{
-	return journal_blocks_per_page(inode);
-}
-
-static inline int ext3_journal_force_commit(journal_t *journal)
-{
-	return journal_force_commit(journal);
-}
-
-/* super.c */
-int ext3_force_commit(struct super_block *sb);
-
-static inline int ext3_should_journal_data(struct inode *inode)
-{
-	if (!S_ISREG(inode->i_mode))
-		return 1;
-	if (test_opt(inode->i_sb, DATA_FLAGS) == EXT3_MOUNT_JOURNAL_DATA)
-		return 1;
-	if (EXT3_I(inode)->i_flags & EXT3_JOURNAL_DATA_FL)
-		return 1;
-	return 0;
-}
-
-static inline int ext3_should_order_data(struct inode *inode)
-{
-	if (!S_ISREG(inode->i_mode))
-		return 0;
-	if (EXT3_I(inode)->i_flags & EXT3_JOURNAL_DATA_FL)
-		return 0;
-	if (test_opt(inode->i_sb, DATA_FLAGS) == EXT3_MOUNT_ORDERED_DATA)
-		return 1;
-	return 0;
-}
-
-static inline int ext3_should_writeback_data(struct inode *inode)
-{
-	if (!S_ISREG(inode->i_mode))
-		return 0;
-	if (EXT3_I(inode)->i_flags & EXT3_JOURNAL_DATA_FL)
-		return 0;
-	if (test_opt(inode->i_sb, DATA_FLAGS) == EXT3_MOUNT_WRITEBACK_DATA)
-		return 1;
-	return 0;
-}
-
-#include <trace/events/ext3.h>
diff --git a/fs/ext3/ext3_jbd.c b/fs/ext3/ext3_jbd.c
deleted file mode 100644
index 785a3261a26c..000000000000
--- a/fs/ext3/ext3_jbd.c
+++ /dev/null
@@ -1,59 +0,0 @@
-/*
- * Interface between ext3 and JBD
- */
-
-#include "ext3.h"
-
-int __ext3_journal_get_undo_access(const char *where, handle_t *handle,
-				struct buffer_head *bh)
-{
-	int err = journal_get_undo_access(handle, bh);
-	if (err)
-		ext3_journal_abort_handle(where, __func__, bh, handle,err);
-	return err;
-}
-
-int __ext3_journal_get_write_access(const char *where, handle_t *handle,
-				struct buffer_head *bh)
-{
-	int err = journal_get_write_access(handle, bh);
-	if (err)
-		ext3_journal_abort_handle(where, __func__, bh, handle,err);
-	return err;
-}
-
-int __ext3_journal_forget(const char *where, handle_t *handle,
-				struct buffer_head *bh)
-{
-	int err = journal_forget(handle, bh);
-	if (err)
-		ext3_journal_abort_handle(where, __func__, bh, handle,err);
-	return err;
-}
-
-int __ext3_journal_revoke(const char *where, handle_t *handle,
-				unsigned long blocknr, struct buffer_head *bh)
-{
-	int err = journal_revoke(handle, blocknr, bh);
-	if (err)
-		ext3_journal_abort_handle(where, __func__, bh, handle,err);
-	return err;
-}
-
-int __ext3_journal_get_create_access(const char *where,
-				handle_t *handle, struct buffer_head *bh)
-{
-	int err = journal_get_create_access(handle, bh);
-	if (err)
-		ext3_journal_abort_handle(where, __func__, bh, handle,err);
-	return err;
-}
-
-int __ext3_journal_dirty_metadata(const char *where,
-				handle_t *handle, struct buffer_head *bh)
-{
-	int err = journal_dirty_metadata(handle, bh);
-	if (err)
-		ext3_journal_abort_handle(where, __func__, bh, handle,err);
-	return err;
-}
diff --git a/fs/ext3/file.c b/fs/ext3/file.c
deleted file mode 100644
index 3b8f650de22c..000000000000
--- a/fs/ext3/file.c
+++ /dev/null
@@ -1,79 +0,0 @@
-/*
- *  linux/fs/ext3/file.c
- *
- * Copyright (C) 1992, 1993, 1994, 1995
- * Remy Card (card@masi.ibp.fr)
- * Laboratoire MASI - Institut Blaise Pascal
- * Universite Pierre et Marie Curie (Paris VI)
- *
- *  from
- *
- *  linux/fs/minix/file.c
- *
- *  Copyright (C) 1991, 1992  Linus Torvalds
- *
- *  ext3 fs regular file handling primitives
- *
- *  64-bit file support on 64-bit platforms by Jakub Jelinek
- *	(jj@sunsite.ms.mff.cuni.cz)
- */
-
-#include <linux/quotaops.h>
-#include "ext3.h"
-#include "xattr.h"
-#include "acl.h"
-
-/*
- * Called when an inode is released. Note that this is different
- * from ext3_file_open: open gets called at every open, but release
- * gets called only when /all/ the files are closed.
- */
-static int ext3_release_file (struct inode * inode, struct file * filp)
-{
-	if (ext3_test_inode_state(inode, EXT3_STATE_FLUSH_ON_CLOSE)) {
-		filemap_flush(inode->i_mapping);
-		ext3_clear_inode_state(inode, EXT3_STATE_FLUSH_ON_CLOSE);
-	}
-	/* if we are the last writer on the inode, drop the block reservation */
-	if ((filp->f_mode & FMODE_WRITE) &&
-			(atomic_read(&inode->i_writecount) == 1))
-	{
-		mutex_lock(&EXT3_I(inode)->truncate_mutex);
-		ext3_discard_reservation(inode);
-		mutex_unlock(&EXT3_I(inode)->truncate_mutex);
-	}
-	if (is_dx(inode) && filp->private_data)
-		ext3_htree_free_dir_info(filp->private_data);
-
-	return 0;
-}
-
-const struct file_operations ext3_file_operations = {
-	.llseek		= generic_file_llseek,
-	.read_iter	= generic_file_read_iter,
-	.write_iter	= generic_file_write_iter,
-	.unlocked_ioctl	= ext3_ioctl,
-#ifdef CONFIG_COMPAT
-	.compat_ioctl	= ext3_compat_ioctl,
-#endif
-	.mmap		= generic_file_mmap,
-	.open		= dquot_file_open,
-	.release	= ext3_release_file,
-	.fsync		= ext3_sync_file,
-	.splice_read	= generic_file_splice_read,
-	.splice_write	= iter_file_splice_write,
-};
-
-const struct inode_operations ext3_file_inode_operations = {
-	.setattr	= ext3_setattr,
-#ifdef CONFIG_EXT3_FS_XATTR
-	.setxattr	= generic_setxattr,
-	.getxattr	= generic_getxattr,
-	.listxattr	= ext3_listxattr,
-	.removexattr	= generic_removexattr,
-#endif
-	.get_acl	= ext3_get_acl,
-	.set_acl	= ext3_set_acl,
-	.fiemap		= ext3_fiemap,
-};
-
diff --git a/fs/ext3/fsync.c b/fs/ext3/fsync.c
deleted file mode 100644
index 1cb9c7e10c6f..000000000000
--- a/fs/ext3/fsync.c
+++ /dev/null
@@ -1,109 +0,0 @@
-/*
- *  linux/fs/ext3/fsync.c
- *
- *  Copyright (C) 1993  Stephen Tweedie (sct@redhat.com)
- *  from
- *  Copyright (C) 1992  Remy Card (card@masi.ibp.fr)
- *                      Laboratoire MASI - Institut Blaise Pascal
- *                      Universite Pierre et Marie Curie (Paris VI)
- *  from
- *  linux/fs/minix/truncate.c   Copyright (C) 1991, 1992  Linus Torvalds
- *
- *  ext3fs fsync primitive
- *
- *  Big-endian to little-endian byte-swapping/bitmaps by
- *        David S. Miller (davem@caip.rutgers.edu), 1995
- *
- *  Removed unnecessary code duplication for little endian machines
- *  and excessive __inline__s.
- *        Andi Kleen, 1997
- *
- * Major simplications and cleanup - we only need to do the metadata, because
- * we can depend on generic_block_fdatasync() to sync the data blocks.
- */
-
-#include <linux/blkdev.h>
-#include <linux/writeback.h>
-#include "ext3.h"
-
-/*
- * akpm: A new design for ext3_sync_file().
- *
- * This is only called from sys_fsync(), sys_fdatasync() and sys_msync().
- * There cannot be a transaction open by this task.
- * Another task could have dirtied this inode.  Its data can be in any
- * state in the journalling system.
- *
- * What we do is just kick off a commit and wait on it.  This will snapshot the
- * inode to disk.
- */
-
-int ext3_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
-{
-	struct inode *inode = file->f_mapping->host;
-	struct ext3_inode_info *ei = EXT3_I(inode);
-	journal_t *journal = EXT3_SB(inode->i_sb)->s_journal;
-	int ret, needs_barrier = 0;
-	tid_t commit_tid;
-
-	trace_ext3_sync_file_enter(file, datasync);
-
-	if (inode->i_sb->s_flags & MS_RDONLY) {
-		/* Make sure that we read updated state */
-		smp_rmb();
-		if (EXT3_SB(inode->i_sb)->s_mount_state & EXT3_ERROR_FS)
-			return -EROFS;
-		return 0;
-	}
-	ret = filemap_write_and_wait_range(inode->i_mapping, start, end);
-	if (ret)
-		goto out;
-
-	J_ASSERT(ext3_journal_current_handle() == NULL);
-
-	/*
-	 * data=writeback,ordered:
-	 *  The caller's filemap_fdatawrite()/wait will sync the data.
-	 *  Metadata is in the journal, we wait for a proper transaction
-	 *  to commit here.
-	 *
-	 * data=journal:
-	 *  filemap_fdatawrite won't do anything (the buffers are clean).
-	 *  ext3_force_commit will write the file data into the journal and
-	 *  will wait on that.
-	 *  filemap_fdatawait() will encounter a ton of newly-dirtied pages
-	 *  (they were dirtied by commit).  But that's OK - the blocks are
-	 *  safe in-journal, which is all fsync() needs to ensure.
-	 */
-	if (ext3_should_journal_data(inode)) {
-		ret = ext3_force_commit(inode->i_sb);
-		goto out;
-	}
-
-	if (datasync)
-		commit_tid = atomic_read(&ei->i_datasync_tid);
-	else
-		commit_tid = atomic_read(&ei->i_sync_tid);
-
-	if (test_opt(inode->i_sb, BARRIER) &&
-	    !journal_trans_will_send_data_barrier(journal, commit_tid))
-		needs_barrier = 1;
-	log_start_commit(journal, commit_tid);
-	ret = log_wait_commit(journal, commit_tid);
-
-	/*
-	 * In case we didn't commit a transaction, we have to flush
-	 * disk caches manually so that data really is on persistent
-	 * storage
-	 */
-	if (needs_barrier) {
-		int err;
-
-		err = blkdev_issue_flush(inode->i_sb->s_bdev, GFP_KERNEL, NULL);
-		if (!ret)
-			ret = err;
-	}
-out:
-	trace_ext3_sync_file_exit(inode, ret);
-	return ret;
-}
diff --git a/fs/ext3/hash.c b/fs/ext3/hash.c
deleted file mode 100644
index ede315cdf126..000000000000
--- a/fs/ext3/hash.c
+++ /dev/null
@@ -1,206 +0,0 @@
-/*
- *  linux/fs/ext3/hash.c
- *
- * Copyright (C) 2002 by Theodore Ts'o
- *
- * This file is released under the GPL v2.
- *
- * This file may be redistributed under the terms of the GNU Public
- * License.
- */
-
-#include "ext3.h"
-#include <linux/cryptohash.h>
-
-#define DELTA 0x9E3779B9
-
-static void TEA_transform(__u32 buf[4], __u32 const in[])
-{
-	__u32	sum = 0;
-	__u32	b0 = buf[0], b1 = buf[1];
-	__u32	a = in[0], b = in[1], c = in[2], d = in[3];
-	int	n = 16;
-
-	do {
-		sum += DELTA;
-		b0 += ((b1 << 4)+a) ^ (b1+sum) ^ ((b1 >> 5)+b);
-		b1 += ((b0 << 4)+c) ^ (b0+sum) ^ ((b0 >> 5)+d);
-	} while(--n);
-
-	buf[0] += b0;
-	buf[1] += b1;
-}
-
-
-/* The old legacy hash */
-static __u32 dx_hack_hash_unsigned(const char *name, int len)
-{
-	__u32 hash, hash0 = 0x12a3fe2d, hash1 = 0x37abe8f9;
-	const unsigned char *ucp = (const unsigned char *) name;
-
-	while (len--) {
-		hash = hash1 + (hash0 ^ (((int) *ucp++) * 7152373));
-
-		if (hash & 0x80000000)
-			hash -= 0x7fffffff;
-		hash1 = hash0;
-		hash0 = hash;
-	}
-	return hash0 << 1;
-}
-
-static __u32 dx_hack_hash_signed(const char *name, int len)
-{
-	__u32 hash, hash0 = 0x12a3fe2d, hash1 = 0x37abe8f9;
-	const signed char *scp = (const signed char *) name;
-
-	while (len--) {
-		hash = hash1 + (hash0 ^ (((int) *scp++) * 7152373));
-
-		if (hash & 0x80000000)
-			hash -= 0x7fffffff;
-		hash1 = hash0;
-		hash0 = hash;
-	}
-	return hash0 << 1;
-}
-
-static void str2hashbuf_signed(const char *msg, int len, __u32 *buf, int num)
-{
-	__u32	pad, val;
-	int	i;
-	const signed char *scp = (const signed char *) msg;
-
-	pad = (__u32)len | ((__u32)len << 8);
-	pad |= pad << 16;
-
-	val = pad;
-	if (len > num*4)
-		len = num * 4;
-	for (i = 0; i < len; i++) {
-		if ((i % 4) == 0)
-			val = pad;
-		val = ((int) scp[i]) + (val << 8);
-		if ((i % 4) == 3) {
-			*buf++ = val;
-			val = pad;
-			num--;
-		}
-	}
-	if (--num >= 0)
-		*buf++ = val;
-	while (--num >= 0)
-		*buf++ = pad;
-}
-
-static void str2hashbuf_unsigned(const char *msg, int len, __u32 *buf, int num)
-{
-	__u32	pad, val;
-	int	i;
-	const unsigned char *ucp = (const unsigned char *) msg;
-
-	pad = (__u32)len | ((__u32)len << 8);
-	pad |= pad << 16;
-
-	val = pad;
-	if (len > num*4)
-		len = num * 4;
-	for (i=0; i < len; i++) {
-		if ((i % 4) == 0)
-			val = pad;
-		val = ((int) ucp[i]) + (val << 8);
-		if ((i % 4) == 3) {
-			*buf++ = val;
-			val = pad;
-			num--;
-		}
-	}
-	if (--num >= 0)
-		*buf++ = val;
-	while (--num >= 0)
-		*buf++ = pad;
-}
-
-/*
- * Returns the hash of a filename.  If len is 0 and name is NULL, then
- * this function can be used to test whether or not a hash version is
- * supported.
- *
- * The seed is an 4 longword (32 bits) "secret" which can be used to
- * uniquify a hash.  If the seed is all zero's, then some default seed
- * may be used.
- *
- * A particular hash version specifies whether or not the seed is
- * represented, and whether or not the returned hash is 32 bits or 64
- * bits.  32 bit hashes will return 0 for the minor hash.
- */
-int ext3fs_dirhash(const char *name, int len, struct dx_hash_info *hinfo)
-{
-	__u32	hash;
-	__u32	minor_hash = 0;
-	const char	*p;
-	int		i;
-	__u32		in[8], buf[4];
-	void		(*str2hashbuf)(const char *, int, __u32 *, int) =
-				str2hashbuf_signed;
-
-	/* Initialize the default seed for the hash checksum functions */
-	buf[0] = 0x67452301;
-	buf[1] = 0xefcdab89;
-	buf[2] = 0x98badcfe;
-	buf[3] = 0x10325476;
-
-	/* Check to see if the seed is all zero's */
-	if (hinfo->seed) {
-		for (i=0; i < 4; i++) {
-			if (hinfo->seed[i])
-				break;
-		}
-		if (i < 4)
-			memcpy(buf, hinfo->seed, sizeof(buf));
-	}
-
-	switch (hinfo->hash_version) {
-	case DX_HASH_LEGACY_UNSIGNED:
-		hash = dx_hack_hash_unsigned(name, len);
-		break;
-	case DX_HASH_LEGACY:
-		hash = dx_hack_hash_signed(name, len);
-		break;
-	case DX_HASH_HALF_MD4_UNSIGNED:
-		str2hashbuf = str2hashbuf_unsigned;
-	case DX_HASH_HALF_MD4:
-		p = name;
-		while (len > 0) {
-			(*str2hashbuf)(p, len, in, 8);
-			half_md4_transform(buf, in);
-			len -= 32;
-			p += 32;
-		}
-		minor_hash = buf[2];
-		hash = buf[1];
-		break;
-	case DX_HASH_TEA_UNSIGNED:
-		str2hashbuf = str2hashbuf_unsigned;
-	case DX_HASH_TEA:
-		p = name;
-		while (len > 0) {
-			(*str2hashbuf)(p, len, in, 4);
-			TEA_transform(buf, in);
-			len -= 16;
-			p += 16;
-		}
-		hash = buf[0];
-		minor_hash = buf[1];
-		break;
-	default:
-		hinfo->hash = 0;
-		return -1;
-	}
-	hash = hash & ~1;
-	if (hash == (EXT3_HTREE_EOF_32BIT << 1))
-		hash = (EXT3_HTREE_EOF_32BIT - 1) << 1;
-	hinfo->hash = hash;
-	hinfo->minor_hash = minor_hash;
-	return 0;
-}
diff --git a/fs/ext3/ialloc.c b/fs/ext3/ialloc.c
deleted file mode 100644
index 3ad242e5840e..000000000000
--- a/fs/ext3/ialloc.c
+++ /dev/null
@@ -1,706 +0,0 @@
-/*
- *  linux/fs/ext3/ialloc.c
- *
- * Copyright (C) 1992, 1993, 1994, 1995
- * Remy Card (card@masi.ibp.fr)
- * Laboratoire MASI - Institut Blaise Pascal
- * Universite Pierre et Marie Curie (Paris VI)
- *
- *  BSD ufs-inspired inode and directory allocation by
- *  Stephen Tweedie (sct@redhat.com), 1993
- *  Big-endian to little-endian byte-swapping/bitmaps by
- *        David S. Miller (davem@caip.rutgers.edu), 1995
- */
-
-#include <linux/quotaops.h>
-#include <linux/random.h>
-
-#include "ext3.h"
-#include "xattr.h"
-#include "acl.h"
-
-/*
- * ialloc.c contains the inodes allocation and deallocation routines
- */
-
-/*
- * The free inodes are managed by bitmaps.  A file system contains several
- * blocks groups.  Each group contains 1 bitmap block for blocks, 1 bitmap
- * block for inodes, N blocks for the inode table and data blocks.
- *
- * The file system contains group descriptors which are located after the
- * super block.  Each descriptor contains the number of the bitmap block and
- * the free blocks count in the block.
- */
-
-
-/*
- * Read the inode allocation bitmap for a given block_group, reading
- * into the specified slot in the superblock's bitmap cache.
- *
- * Return buffer_head of bitmap on success or NULL.
- */
-static struct buffer_head *
-read_inode_bitmap(struct super_block * sb, unsigned long block_group)
-{
-	struct ext3_group_desc *desc;
-	struct buffer_head *bh = NULL;
-
-	desc = ext3_get_group_desc(sb, block_group, NULL);
-	if (!desc)
-		goto error_out;
-
-	bh = sb_bread(sb, le32_to_cpu(desc->bg_inode_bitmap));
-	if (!bh)
-		ext3_error(sb, "read_inode_bitmap",
-			    "Cannot read inode bitmap - "
-			    "block_group = %lu, inode_bitmap = %u",
-			    block_group, le32_to_cpu(desc->bg_inode_bitmap));
-error_out:
-	return bh;
-}
-
-/*
- * NOTE! When we get the inode, we're the only people
- * that have access to it, and as such there are no
- * race conditions we have to worry about. The inode
- * is not on the hash-lists, and it cannot be reached
- * through the filesystem because the directory entry
- * has been deleted earlier.
- *
- * HOWEVER: we must make sure that we get no aliases,
- * which means that we have to call "clear_inode()"
- * _before_ we mark the inode not in use in the inode
- * bitmaps. Otherwise a newly created file might use
- * the same inode number (not actually the same pointer
- * though), and then we'd have two inodes sharing the
- * same inode number and space on the harddisk.
- */
-void ext3_free_inode (handle_t *handle, struct inode * inode)
-{
-	struct super_block * sb = inode->i_sb;
-	int is_directory;
-	unsigned long ino;
-	struct buffer_head *bitmap_bh = NULL;
-	struct buffer_head *bh2;
-	unsigned long block_group;
-	unsigned long bit;
-	struct ext3_group_desc * gdp;
-	struct ext3_super_block * es;
-	struct ext3_sb_info *sbi;
-	int fatal = 0, err;
-
-	if (atomic_read(&inode->i_count) > 1) {
-		printk ("ext3_free_inode: inode has count=%d\n",
-					atomic_read(&inode->i_count));
-		return;
-	}
-	if (inode->i_nlink) {
-		printk ("ext3_free_inode: inode has nlink=%d\n",
-			inode->i_nlink);
-		return;
-	}
-	if (!sb) {
-		printk("ext3_free_inode: inode on nonexistent device\n");
-		return;
-	}
-	sbi = EXT3_SB(sb);
-
-	ino = inode->i_ino;
-	ext3_debug ("freeing inode %lu\n", ino);
-	trace_ext3_free_inode(inode);
-
-	is_directory = S_ISDIR(inode->i_mode);
-
-	es = EXT3_SB(sb)->s_es;
-	if (ino < EXT3_FIRST_INO(sb) || ino > le32_to_cpu(es->s_inodes_count)) {
-		ext3_error (sb, "ext3_free_inode",
-			    "reserved or nonexistent inode %lu", ino);
-		goto error_return;
-	}
-	block_group = (ino - 1) / EXT3_INODES_PER_GROUP(sb);
-	bit = (ino - 1) % EXT3_INODES_PER_GROUP(sb);
-	bitmap_bh = read_inode_bitmap(sb, block_group);
-	if (!bitmap_bh)
-		goto error_return;
-
-	BUFFER_TRACE(bitmap_bh, "get_write_access");
-	fatal = ext3_journal_get_write_access(handle, bitmap_bh);
-	if (fatal)
-		goto error_return;
-
-	/* Ok, now we can actually update the inode bitmaps.. */
-	if (!ext3_clear_bit_atomic(sb_bgl_lock(sbi, block_group),
-					bit, bitmap_bh->b_data))
-		ext3_error (sb, "ext3_free_inode",
-			      "bit already cleared for inode %lu", ino);
-	else {
-		gdp = ext3_get_group_desc (sb, block_group, &bh2);
-
-		BUFFER_TRACE(bh2, "get_write_access");
-		fatal = ext3_journal_get_write_access(handle, bh2);
-		if (fatal) goto error_return;
-
-		if (gdp) {
-			spin_lock(sb_bgl_lock(sbi, block_group));
-			le16_add_cpu(&gdp->bg_free_inodes_count, 1);
-			if (is_directory)
-				le16_add_cpu(&gdp->bg_used_dirs_count, -1);
-			spin_unlock(sb_bgl_lock(sbi, block_group));
-			percpu_counter_inc(&sbi->s_freeinodes_counter);
-			if (is_directory)
-				percpu_counter_dec(&sbi->s_dirs_counter);
-
-		}
-		BUFFER_TRACE(bh2, "call ext3_journal_dirty_metadata");
-		err = ext3_journal_dirty_metadata(handle, bh2);
-		if (!fatal) fatal = err;
-	}
-	BUFFER_TRACE(bitmap_bh, "call ext3_journal_dirty_metadata");
-	err = ext3_journal_dirty_metadata(handle, bitmap_bh);
-	if (!fatal)
-		fatal = err;
-
-error_return:
-	brelse(bitmap_bh);
-	ext3_std_error(sb, fatal);
-}
-
-/*
- * Orlov's allocator for directories.
- *
- * We always try to spread first-level directories.
- *
- * If there are blockgroups with both free inodes and free blocks counts
- * not worse than average we return one with smallest directory count.
- * Otherwise we simply return a random group.
- *
- * For the rest rules look so:
- *
- * It's OK to put directory into a group unless
- * it has too many directories already (max_dirs) or
- * it has too few free inodes left (min_inodes) or
- * it has too few free blocks left (min_blocks).
- * Parent's group is preferred, if it doesn't satisfy these
- * conditions we search cyclically through the rest. If none
- * of the groups look good we just look for a group with more
- * free inodes than average (starting at parent's group).
- *
- * Debt is incremented each time we allocate a directory and decremented
- * when we allocate an inode, within 0--255.
- */
-
-static int find_group_orlov(struct super_block *sb, struct inode *parent)
-{
-	int parent_group = EXT3_I(parent)->i_block_group;
-	struct ext3_sb_info *sbi = EXT3_SB(sb);
-	int ngroups = sbi->s_groups_count;
-	int inodes_per_group = EXT3_INODES_PER_GROUP(sb);
-	unsigned int freei, avefreei;
-	ext3_fsblk_t freeb, avefreeb;
-	unsigned int ndirs;
-	int max_dirs, min_inodes;
-	ext3_grpblk_t min_blocks;
-	int group = -1, i;
-	struct ext3_group_desc *desc;
-
-	freei = percpu_counter_read_positive(&sbi->s_freeinodes_counter);
-	avefreei = freei / ngroups;
-	freeb = percpu_counter_read_positive(&sbi->s_freeblocks_counter);
-	avefreeb = freeb / ngroups;
-	ndirs = percpu_counter_read_positive(&sbi->s_dirs_counter);
-
-	if ((parent == d_inode(sb->s_root)) ||
-	    (EXT3_I(parent)->i_flags & EXT3_TOPDIR_FL)) {
-		int best_ndir = inodes_per_group;
-		int best_group = -1;
-
-		group = prandom_u32();
-		parent_group = (unsigned)group % ngroups;
-		for (i = 0; i < ngroups; i++) {
-			group = (parent_group + i) % ngroups;
-			desc = ext3_get_group_desc (sb, group, NULL);
-			if (!desc || !desc->bg_free_inodes_count)
-				continue;
-			if (le16_to_cpu(desc->bg_used_dirs_count) >= best_ndir)
-				continue;
-			if (le16_to_cpu(desc->bg_free_inodes_count) < avefreei)
-				continue;
-			if (le16_to_cpu(desc->bg_free_blocks_count) < avefreeb)
-				continue;
-			best_group = group;
-			best_ndir = le16_to_cpu(desc->bg_used_dirs_count);
-		}
-		if (best_group >= 0)
-			return best_group;
-		goto fallback;
-	}
-
-	max_dirs = ndirs / ngroups + inodes_per_group / 16;
-	min_inodes = avefreei - inodes_per_group / 4;
-	min_blocks = avefreeb - EXT3_BLOCKS_PER_GROUP(sb) / 4;
-
-	for (i = 0; i < ngroups; i++) {
-		group = (parent_group + i) % ngroups;
-		desc = ext3_get_group_desc (sb, group, NULL);
-		if (!desc || !desc->bg_free_inodes_count)
-			continue;
-		if (le16_to_cpu(desc->bg_used_dirs_count) >= max_dirs)
-			continue;
-		if (le16_to_cpu(desc->bg_free_inodes_count) < min_inodes)
-			continue;
-		if (le16_to_cpu(desc->bg_free_blocks_count) < min_blocks)
-			continue;
-		return group;
-	}
-
-fallback:
-	for (i = 0; i < ngroups; i++) {
-		group = (parent_group + i) % ngroups;
-		desc = ext3_get_group_desc (sb, group, NULL);
-		if (!desc || !desc->bg_free_inodes_count)
-			continue;
-		if (le16_to_cpu(desc->bg_free_inodes_count) >= avefreei)
-			return group;
-	}
-
-	if (avefreei) {
-		/*
-		 * The free-inodes counter is approximate, and for really small
-		 * filesystems the above test can fail to find any blockgroups
-		 */
-		avefreei = 0;
-		goto fallback;
-	}
-
-	return -1;
-}
-
-static int find_group_other(struct super_block *sb, struct inode *parent)
-{
-	int parent_group = EXT3_I(parent)->i_block_group;
-	int ngroups = EXT3_SB(sb)->s_groups_count;
-	struct ext3_group_desc *desc;
-	int group, i;
-
-	/*
-	 * Try to place the inode in its parent directory
-	 */
-	group = parent_group;
-	desc = ext3_get_group_desc (sb, group, NULL);
-	if (desc && le16_to_cpu(desc->bg_free_inodes_count) &&
-			le16_to_cpu(desc->bg_free_blocks_count))
-		return group;
-
-	/*
-	 * We're going to place this inode in a different blockgroup from its
-	 * parent.  We want to cause files in a common directory to all land in
-	 * the same blockgroup.  But we want files which are in a different
-	 * directory which shares a blockgroup with our parent to land in a
-	 * different blockgroup.
-	 *
-	 * So add our directory's i_ino into the starting point for the hash.
-	 */
-	group = (group + parent->i_ino) % ngroups;
-
-	/*
-	 * Use a quadratic hash to find a group with a free inode and some free
-	 * blocks.
-	 */
-	for (i = 1; i < ngroups; i <<= 1) {
-		group += i;
-		if (group >= ngroups)
-			group -= ngroups;
-		desc = ext3_get_group_desc (sb, group, NULL);
-		if (desc && le16_to_cpu(desc->bg_free_inodes_count) &&
-				le16_to_cpu(desc->bg_free_blocks_count))
-			return group;
-	}
-
-	/*
-	 * That failed: try linear search for a free inode, even if that group
-	 * has no free blocks.
-	 */
-	group = parent_group;
-	for (i = 0; i < ngroups; i++) {
-		if (++group >= ngroups)
-			group = 0;
-		desc = ext3_get_group_desc (sb, group, NULL);
-		if (desc && le16_to_cpu(desc->bg_free_inodes_count))
-			return group;
-	}
-
-	return -1;
-}
-
-/*
- * There are two policies for allocating an inode.  If the new inode is
- * a directory, then a forward search is made for a block group with both
- * free space and a low directory-to-inode ratio; if that fails, then of
- * the groups with above-average free space, that group with the fewest
- * directories already is chosen.
- *
- * For other inodes, search forward from the parent directory's block
- * group to find a free inode.
- */
-struct inode *ext3_new_inode(handle_t *handle, struct inode * dir,
-			     const struct qstr *qstr, umode_t mode)
-{
-	struct super_block *sb;
-	struct buffer_head *bitmap_bh = NULL;
-	struct buffer_head *bh2;
-	int group;
-	unsigned long ino = 0;
-	struct inode * inode;
-	struct ext3_group_desc * gdp = NULL;
-	struct ext3_super_block * es;
-	struct ext3_inode_info *ei;
-	struct ext3_sb_info *sbi;
-	int err = 0;
-	struct inode *ret;
-	int i;
-
-	/* Cannot create files in a deleted directory */
-	if (!dir || !dir->i_nlink)
-		return ERR_PTR(-EPERM);
-
-	sb = dir->i_sb;
-	trace_ext3_request_inode(dir, mode);
-	inode = new_inode(sb);
-	if (!inode)
-		return ERR_PTR(-ENOMEM);
-	ei = EXT3_I(inode);
-
-	sbi = EXT3_SB(sb);
-	es = sbi->s_es;
-	if (S_ISDIR(mode))
-		group = find_group_orlov(sb, dir);
-	else
-		group = find_group_other(sb, dir);
-
-	err = -ENOSPC;
-	if (group == -1)
-		goto out;
-
-	for (i = 0; i < sbi->s_groups_count; i++) {
-		err = -EIO;
-
-		gdp = ext3_get_group_desc(sb, group, &bh2);
-		if (!gdp)
-			goto fail;
-
-		brelse(bitmap_bh);
-		bitmap_bh = read_inode_bitmap(sb, group);
-		if (!bitmap_bh)
-			goto fail;
-
-		ino = 0;
-
-repeat_in_this_group:
-		ino = ext3_find_next_zero_bit((unsigned long *)
-				bitmap_bh->b_data, EXT3_INODES_PER_GROUP(sb), ino);
-		if (ino < EXT3_INODES_PER_GROUP(sb)) {
-
-			BUFFER_TRACE(bitmap_bh, "get_write_access");
-			err = ext3_journal_get_write_access(handle, bitmap_bh);
-			if (err)
-				goto fail;
-
-			if (!ext3_set_bit_atomic(sb_bgl_lock(sbi, group),
-						ino, bitmap_bh->b_data)) {
-				/* we won it */
-				BUFFER_TRACE(bitmap_bh,
-					"call ext3_journal_dirty_metadata");
-				err = ext3_journal_dirty_metadata(handle,
-								bitmap_bh);
-				if (err)
-					goto fail;
-				goto got;
-			}
-			/* we lost it */
-			journal_release_buffer(handle, bitmap_bh);
-
-			if (++ino < EXT3_INODES_PER_GROUP(sb))
-				goto repeat_in_this_group;
-		}
-
-		/*
-		 * This case is possible in concurrent environment.  It is very
-		 * rare.  We cannot repeat the find_group_xxx() call because
-		 * that will simply return the same blockgroup, because the
-		 * group descriptor metadata has not yet been updated.
-		 * So we just go onto the next blockgroup.
-		 */
-		if (++group == sbi->s_groups_count)
-			group = 0;
-	}
-	err = -ENOSPC;
-	goto out;
-
-got:
-	ino += group * EXT3_INODES_PER_GROUP(sb) + 1;
-	if (ino < EXT3_FIRST_INO(sb) || ino > le32_to_cpu(es->s_inodes_count)) {
-		ext3_error (sb, "ext3_new_inode",
-			    "reserved inode or inode > inodes count - "
-			    "block_group = %d, inode=%lu", group, ino);
-		err = -EIO;
-		goto fail;
-	}
-
-	BUFFER_TRACE(bh2, "get_write_access");
-	err = ext3_journal_get_write_access(handle, bh2);
-	if (err) goto fail;
-	spin_lock(sb_bgl_lock(sbi, group));
-	le16_add_cpu(&gdp->bg_free_inodes_count, -1);
-	if (S_ISDIR(mode)) {
-		le16_add_cpu(&gdp->bg_used_dirs_count, 1);
-	}
-	spin_unlock(sb_bgl_lock(sbi, group));
-	BUFFER_TRACE(bh2, "call ext3_journal_dirty_metadata");
-	err = ext3_journal_dirty_metadata(handle, bh2);
-	if (err) goto fail;
-
-	percpu_counter_dec(&sbi->s_freeinodes_counter);
-	if (S_ISDIR(mode))
-		percpu_counter_inc(&sbi->s_dirs_counter);
-
-
-	if (test_opt(sb, GRPID)) {
-		inode->i_mode = mode;
-		inode->i_uid = current_fsuid();
-		inode->i_gid = dir->i_gid;
-	} else
-		inode_init_owner(inode, dir, mode);
-
-	inode->i_ino = ino;
-	/* This is the optimal IO size (for stat), not the fs block size */
-	inode->i_blocks = 0;
-	inode->i_mtime = inode->i_atime = inode->i_ctime = CURRENT_TIME_SEC;
-
-	memset(ei->i_data, 0, sizeof(ei->i_data));
-	ei->i_dir_start_lookup = 0;
-	ei->i_disksize = 0;
-
-	ei->i_flags =
-		ext3_mask_flags(mode, EXT3_I(dir)->i_flags & EXT3_FL_INHERITED);
-#ifdef EXT3_FRAGMENTS
-	ei->i_faddr = 0;
-	ei->i_frag_no = 0;
-	ei->i_frag_size = 0;
-#endif
-	ei->i_file_acl = 0;
-	ei->i_dir_acl = 0;
-	ei->i_dtime = 0;
-	ei->i_block_alloc_info = NULL;
-	ei->i_block_group = group;
-
-	ext3_set_inode_flags(inode);
-	if (IS_DIRSYNC(inode))
-		handle->h_sync = 1;
-	if (insert_inode_locked(inode) < 0) {
-		/*
-		 * Likely a bitmap corruption causing inode to be allocated
-		 * twice.
-		 */
-		err = -EIO;
-		goto fail;
-	}
-	spin_lock(&sbi->s_next_gen_lock);
-	inode->i_generation = sbi->s_next_generation++;
-	spin_unlock(&sbi->s_next_gen_lock);
-
-	ei->i_state_flags = 0;
-	ext3_set_inode_state(inode, EXT3_STATE_NEW);
-
-	/* See comment in ext3_iget for explanation */
-	if (ino >= EXT3_FIRST_INO(sb) + 1 &&
-	    EXT3_INODE_SIZE(sb) > EXT3_GOOD_OLD_INODE_SIZE) {
-		ei->i_extra_isize =
-			sizeof(struct ext3_inode) - EXT3_GOOD_OLD_INODE_SIZE;
-	} else {
-		ei->i_extra_isize = 0;
-	}
-
-	ret = inode;
-	dquot_initialize(inode);
-	err = dquot_alloc_inode(inode);
-	if (err)
-		goto fail_drop;
-
-	err = ext3_init_acl(handle, inode, dir);
-	if (err)
-		goto fail_free_drop;
-
-	err = ext3_init_security(handle, inode, dir, qstr);
-	if (err)
-		goto fail_free_drop;
-
-	err = ext3_mark_inode_dirty(handle, inode);
-	if (err) {
-		ext3_std_error(sb, err);
-		goto fail_free_drop;
-	}
-
-	ext3_debug("allocating inode %lu\n", inode->i_ino);
-	trace_ext3_allocate_inode(inode, dir, mode);
-	goto really_out;
-fail:
-	ext3_std_error(sb, err);
-out:
-	iput(inode);
-	ret = ERR_PTR(err);
-really_out:
-	brelse(bitmap_bh);
-	return ret;
-
-fail_free_drop:
-	dquot_free_inode(inode);
-
-fail_drop:
-	dquot_drop(inode);
-	inode->i_flags |= S_NOQUOTA;
-	clear_nlink(inode);
-	unlock_new_inode(inode);
-	iput(inode);
-	brelse(bitmap_bh);
-	return ERR_PTR(err);
-}
-
-/* Verify that we are loading a valid orphan from disk */
-struct inode *ext3_orphan_get(struct super_block *sb, unsigned long ino)
-{
-	unsigned long max_ino = le32_to_cpu(EXT3_SB(sb)->s_es->s_inodes_count);
-	unsigned long block_group;
-	int bit;
-	struct buffer_head *bitmap_bh;
-	struct inode *inode = NULL;
-	long err = -EIO;
-
-	/* Error cases - e2fsck has already cleaned up for us */
-	if (ino > max_ino) {
-		ext3_warning(sb, __func__,
-			     "bad orphan ino %lu!  e2fsck was run?", ino);
-		goto error;
-	}
-
-	block_group = (ino - 1) / EXT3_INODES_PER_GROUP(sb);
-	bit = (ino - 1) % EXT3_INODES_PER_GROUP(sb);
-	bitmap_bh = read_inode_bitmap(sb, block_group);
-	if (!bitmap_bh) {
-		ext3_warning(sb, __func__,
-			     "inode bitmap error for orphan %lu", ino);
-		goto error;
-	}
-
-	/* Having the inode bit set should be a 100% indicator that this
-	 * is a valid orphan (no e2fsck run on fs).  Orphans also include
-	 * inodes that were being truncated, so we can't check i_nlink==0.
-	 */
-	if (!ext3_test_bit(bit, bitmap_bh->b_data))
-		goto bad_orphan;
-
-	inode = ext3_iget(sb, ino);
-	if (IS_ERR(inode))
-		goto iget_failed;
-
-	/*
-	 * If the orphans has i_nlinks > 0 then it should be able to be
-	 * truncated, otherwise it won't be removed from the orphan list
-	 * during processing and an infinite loop will result.
-	 */
-	if (inode->i_nlink && !ext3_can_truncate(inode))
-		goto bad_orphan;
-
-	if (NEXT_ORPHAN(inode) > max_ino)
-		goto bad_orphan;
-	brelse(bitmap_bh);
-	return inode;
-
-iget_failed:
-	err = PTR_ERR(inode);
-	inode = NULL;
-bad_orphan:
-	ext3_warning(sb, __func__,
-		     "bad orphan inode %lu!  e2fsck was run?", ino);
-	printk(KERN_NOTICE "ext3_test_bit(bit=%d, block=%llu) = %d\n",
-	       bit, (unsigned long long)bitmap_bh->b_blocknr,
-	       ext3_test_bit(bit, bitmap_bh->b_data));
-	printk(KERN_NOTICE "inode=%p\n", inode);
-	if (inode) {
-		printk(KERN_NOTICE "is_bad_inode(inode)=%d\n",
-		       is_bad_inode(inode));
-		printk(KERN_NOTICE "NEXT_ORPHAN(inode)=%u\n",
-		       NEXT_ORPHAN(inode));
-		printk(KERN_NOTICE "max_ino=%lu\n", max_ino);
-		printk(KERN_NOTICE "i_nlink=%u\n", inode->i_nlink);
-		/* Avoid freeing blocks if we got a bad deleted inode */
-		if (inode->i_nlink == 0)
-			inode->i_blocks = 0;
-		iput(inode);
-	}
-	brelse(bitmap_bh);
-error:
-	return ERR_PTR(err);
-}
-
-unsigned long ext3_count_free_inodes (struct super_block * sb)
-{
-	unsigned long desc_count;
-	struct ext3_group_desc *gdp;
-	int i;
-#ifdef EXT3FS_DEBUG
-	struct ext3_super_block *es;
-	unsigned long bitmap_count, x;
-	struct buffer_head *bitmap_bh = NULL;
-
-	es = EXT3_SB(sb)->s_es;
-	desc_count = 0;
-	bitmap_count = 0;
-	gdp = NULL;
-	for (i = 0; i < EXT3_SB(sb)->s_groups_count; i++) {
-		gdp = ext3_get_group_desc (sb, i, NULL);
-		if (!gdp)
-			continue;
-		desc_count += le16_to_cpu(gdp->bg_free_inodes_count);
-		brelse(bitmap_bh);
-		bitmap_bh = read_inode_bitmap(sb, i);
-		if (!bitmap_bh)
-			continue;
-
-		x = ext3_count_free(bitmap_bh, EXT3_INODES_PER_GROUP(sb) / 8);
-		printk("group %d: stored = %d, counted = %lu\n",
-			i, le16_to_cpu(gdp->bg_free_inodes_count), x);
-		bitmap_count += x;
-	}
-	brelse(bitmap_bh);
-	printk("ext3_count_free_inodes: stored = %u, computed = %lu, %lu\n",
-		le32_to_cpu(es->s_free_inodes_count), desc_count, bitmap_count);
-	return desc_count;
-#else
-	desc_count = 0;
-	for (i = 0; i < EXT3_SB(sb)->s_groups_count; i++) {
-		gdp = ext3_get_group_desc (sb, i, NULL);
-		if (!gdp)
-			continue;
-		desc_count += le16_to_cpu(gdp->bg_free_inodes_count);
-		cond_resched();
-	}
-	return desc_count;
-#endif
-}
-
-/* Called at mount-time, super-block is locked */
-unsigned long ext3_count_dirs (struct super_block * sb)
-{
-	unsigned long count = 0;
-	int i;
-
-	for (i = 0; i < EXT3_SB(sb)->s_groups_count; i++) {
-		struct ext3_group_desc *gdp = ext3_get_group_desc (sb, i, NULL);
-		if (!gdp)
-			continue;
-		count += le16_to_cpu(gdp->bg_used_dirs_count);
-	}
-	return count;
-}
-
diff --git a/fs/ext3/inode.c b/fs/ext3/inode.c
deleted file mode 100644
index 6c7e5468a2f8..000000000000
--- a/fs/ext3/inode.c
+++ /dev/null
@@ -1,3574 +0,0 @@
-/*
- *  linux/fs/ext3/inode.c
- *
- * Copyright (C) 1992, 1993, 1994, 1995
- * Remy Card (card@masi.ibp.fr)
- * Laboratoire MASI - Institut Blaise Pascal
- * Universite Pierre et Marie Curie (Paris VI)
- *
- *  from
- *
- *  linux/fs/minix/inode.c
- *
- *  Copyright (C) 1991, 1992  Linus Torvalds
- *
- *  Goal-directed block allocation by Stephen Tweedie
- *	(sct@redhat.com), 1993, 1998
- *  Big-endian to little-endian byte-swapping/bitmaps by
- *        David S. Miller (davem@caip.rutgers.edu), 1995
- *  64-bit file support on 64-bit platforms by Jakub Jelinek
- *	(jj@sunsite.ms.mff.cuni.cz)
- *
- *  Assorted race fixes, rewrite of ext3_get_block() by Al Viro, 2000
- */
-
-#include <linux/highuid.h>
-#include <linux/quotaops.h>
-#include <linux/writeback.h>
-#include <linux/mpage.h>
-#include <linux/namei.h>
-#include <linux/uio.h>
-#include "ext3.h"
-#include "xattr.h"
-#include "acl.h"
-
-static int ext3_writepage_trans_blocks(struct inode *inode);
-static int ext3_block_truncate_page(struct inode *inode, loff_t from);
-
-/*
- * Test whether an inode is a fast symlink.
- */
-static int ext3_inode_is_fast_symlink(struct inode *inode)
-{
-	int ea_blocks = EXT3_I(inode)->i_file_acl ?
-		(inode->i_sb->s_blocksize >> 9) : 0;
-
-	return (S_ISLNK(inode->i_mode) && inode->i_blocks - ea_blocks == 0);
-}
-
-/*
- * The ext3 forget function must perform a revoke if we are freeing data
- * which has been journaled.  Metadata (eg. indirect blocks) must be
- * revoked in all cases.
- *
- * "bh" may be NULL: a metadata block may have been freed from memory
- * but there may still be a record of it in the journal, and that record
- * still needs to be revoked.
- */
-int ext3_forget(handle_t *handle, int is_metadata, struct inode *inode,
-			struct buffer_head *bh, ext3_fsblk_t blocknr)
-{
-	int err;
-
-	might_sleep();
-
-	trace_ext3_forget(inode, is_metadata, blocknr);
-	BUFFER_TRACE(bh, "enter");
-
-	jbd_debug(4, "forgetting bh %p: is_metadata = %d, mode %o, "
-		  "data mode %lx\n",
-		  bh, is_metadata, inode->i_mode,
-		  test_opt(inode->i_sb, DATA_FLAGS));
-
-	/* Never use the revoke function if we are doing full data
-	 * journaling: there is no need to, and a V1 superblock won't
-	 * support it.  Otherwise, only skip the revoke on un-journaled
-	 * data blocks. */
-
-	if (test_opt(inode->i_sb, DATA_FLAGS) == EXT3_MOUNT_JOURNAL_DATA ||
-	    (!is_metadata && !ext3_should_journal_data(inode))) {
-		if (bh) {
-			BUFFER_TRACE(bh, "call journal_forget");
-			return ext3_journal_forget(handle, bh);
-		}
-		return 0;
-	}
-
-	/*
-	 * data!=journal && (is_metadata || should_journal_data(inode))
-	 */
-	BUFFER_TRACE(bh, "call ext3_journal_revoke");
-	err = ext3_journal_revoke(handle, blocknr, bh);
-	if (err)
-		ext3_abort(inode->i_sb, __func__,
-			   "error %d when attempting revoke", err);
-	BUFFER_TRACE(bh, "exit");
-	return err;
-}
-
-/*
- * Work out how many blocks we need to proceed with the next chunk of a
- * truncate transaction.
- */
-static unsigned long blocks_for_truncate(struct inode *inode)
-{
-	unsigned long needed;
-
-	needed = inode->i_blocks >> (inode->i_sb->s_blocksize_bits - 9);
-
-	/* Give ourselves just enough room to cope with inodes in which
-	 * i_blocks is corrupt: we've seen disk corruptions in the past
-	 * which resulted in random data in an inode which looked enough
-	 * like a regular file for ext3 to try to delete it.  Things
-	 * will go a bit crazy if that happens, but at least we should
-	 * try not to panic the whole kernel. */
-	if (needed < 2)
-		needed = 2;
-
-	/* But we need to bound the transaction so we don't overflow the
-	 * journal. */
-	if (needed > EXT3_MAX_TRANS_DATA)
-		needed = EXT3_MAX_TRANS_DATA;
-
-	return EXT3_DATA_TRANS_BLOCKS(inode->i_sb) + needed;
-}
-
-/*
- * Truncate transactions can be complex and absolutely huge.  So we need to
- * be able to restart the transaction at a conventient checkpoint to make
- * sure we don't overflow the journal.
- *
- * start_transaction gets us a new handle for a truncate transaction,
- * and extend_transaction tries to extend the existing one a bit.  If
- * extend fails, we need to propagate the failure up and restart the
- * transaction in the top-level truncate loop. --sct
- */
-static handle_t *start_transaction(struct inode *inode)
-{
-	handle_t *result;
-
-	result = ext3_journal_start(inode, blocks_for_truncate(inode));
-	if (!IS_ERR(result))
-		return result;
-
-	ext3_std_error(inode->i_sb, PTR_ERR(result));
-	return result;
-}
-
-/*
- * Try to extend this transaction for the purposes of truncation.
- *
- * Returns 0 if we managed to create more room.  If we can't create more
- * room, and the transaction must be restarted we return 1.
- */
-static int try_to_extend_transaction(handle_t *handle, struct inode *inode)
-{
-	if (handle->h_buffer_credits > EXT3_RESERVE_TRANS_BLOCKS)
-		return 0;
-	if (!ext3_journal_extend(handle, blocks_for_truncate(inode)))
-		return 0;
-	return 1;
-}
-
-/*
- * Restart the transaction associated with *handle.  This does a commit,
- * so before we call here everything must be consistently dirtied against
- * this transaction.
- */
-static int truncate_restart_transaction(handle_t *handle, struct inode *inode)
-{
-	int ret;
-
-	jbd_debug(2, "restarting handle %p\n", handle);
-	/*
-	 * Drop truncate_mutex to avoid deadlock with ext3_get_blocks_handle
-	 * At this moment, get_block can be called only for blocks inside
-	 * i_size since page cache has been already dropped and writes are
-	 * blocked by i_mutex. So we can safely drop the truncate_mutex.
-	 */
-	mutex_unlock(&EXT3_I(inode)->truncate_mutex);
-	ret = ext3_journal_restart(handle, blocks_for_truncate(inode));
-	mutex_lock(&EXT3_I(inode)->truncate_mutex);
-	return ret;
-}
-
-/*
- * Called at inode eviction from icache
- */
-void ext3_evict_inode (struct inode *inode)
-{
-	struct ext3_inode_info *ei = EXT3_I(inode);
-	struct ext3_block_alloc_info *rsv;
-	handle_t *handle;
-	int want_delete = 0;
-
-	trace_ext3_evict_inode(inode);
-	if (!inode->i_nlink && !is_bad_inode(inode)) {
-		dquot_initialize(inode);
-		want_delete = 1;
-	}
-
-	/*
-	 * When journalling data dirty buffers are tracked only in the journal.
-	 * So although mm thinks everything is clean and ready for reaping the
-	 * inode might still have some pages to write in the running
-	 * transaction or waiting to be checkpointed. Thus calling
-	 * journal_invalidatepage() (via truncate_inode_pages()) to discard
-	 * these buffers can cause data loss. Also even if we did not discard
-	 * these buffers, we would have no way to find them after the inode
-	 * is reaped and thus user could see stale data if he tries to read
-	 * them before the transaction is checkpointed. So be careful and
-	 * force everything to disk here... We use ei->i_datasync_tid to
-	 * store the newest transaction containing inode's data.
-	 *
-	 * Note that directories do not have this problem because they don't
-	 * use page cache.
-	 *
-	 * The s_journal check handles the case when ext3_get_journal() fails
-	 * and puts the journal inode.
-	 */
-	if (inode->i_nlink && ext3_should_journal_data(inode) &&
-	    EXT3_SB(inode->i_sb)->s_journal &&
-	    (S_ISLNK(inode->i_mode) || S_ISREG(inode->i_mode)) &&
-	    inode->i_ino != EXT3_JOURNAL_INO) {
-		tid_t commit_tid = atomic_read(&ei->i_datasync_tid);
-		journal_t *journal = EXT3_SB(inode->i_sb)->s_journal;
-
-		log_start_commit(journal, commit_tid);
-		log_wait_commit(journal, commit_tid);
-		filemap_write_and_wait(&inode->i_data);
-	}
-	truncate_inode_pages_final(&inode->i_data);
-
-	ext3_discard_reservation(inode);
-	rsv = ei->i_block_alloc_info;
-	ei->i_block_alloc_info = NULL;
-	if (unlikely(rsv))
-		kfree(rsv);
-
-	if (!want_delete)
-		goto no_delete;
-
-	handle = start_transaction(inode);
-	if (IS_ERR(handle)) {
-		/*
-		 * If we're going to skip the normal cleanup, we still need to
-		 * make sure that the in-core orphan linked list is properly
-		 * cleaned up.
-		 */
-		ext3_orphan_del(NULL, inode);
-		goto no_delete;
-	}
-
-	if (IS_SYNC(inode))
-		handle->h_sync = 1;
-	inode->i_size = 0;
-	if (inode->i_blocks)
-		ext3_truncate(inode);
-	/*
-	 * Kill off the orphan record created when the inode lost the last
-	 * link.  Note that ext3_orphan_del() has to be able to cope with the
-	 * deletion of a non-existent orphan - ext3_truncate() could
-	 * have removed the record.
-	 */
-	ext3_orphan_del(handle, inode);
-	ei->i_dtime = get_seconds();
-
-	/*
-	 * One subtle ordering requirement: if anything has gone wrong
-	 * (transaction abort, IO errors, whatever), then we can still
-	 * do these next steps (the fs will already have been marked as
-	 * having errors), but we can't free the inode if the mark_dirty
-	 * fails.
-	 */
-	if (ext3_mark_inode_dirty(handle, inode)) {
-		/* If that failed, just dquot_drop() and be done with that */
-		dquot_drop(inode);
-		clear_inode(inode);
-	} else {
-		ext3_xattr_delete_inode(handle, inode);
-		dquot_free_inode(inode);
-		dquot_drop(inode);
-		clear_inode(inode);
-		ext3_free_inode(handle, inode);
-	}
-	ext3_journal_stop(handle);
-	return;
-no_delete:
-	clear_inode(inode);
-	dquot_drop(inode);
-}
-
-typedef struct {
-	__le32	*p;
-	__le32	key;
-	struct buffer_head *bh;
-} Indirect;
-
-static inline void add_chain(Indirect *p, struct buffer_head *bh, __le32 *v)
-{
-	p->key = *(p->p = v);
-	p->bh = bh;
-}
-
-static int verify_chain(Indirect *from, Indirect *to)
-{
-	while (from <= to && from->key == *from->p)
-		from++;
-	return (from > to);
-}
-
-/**
- *	ext3_block_to_path - parse the block number into array of offsets
- *	@inode: inode in question (we are only interested in its superblock)
- *	@i_block: block number to be parsed
- *	@offsets: array to store the offsets in
- *      @boundary: set this non-zero if the referred-to block is likely to be
- *             followed (on disk) by an indirect block.
- *
- *	To store the locations of file's data ext3 uses a data structure common
- *	for UNIX filesystems - tree of pointers anchored in the inode, with
- *	data blocks at leaves and indirect blocks in intermediate nodes.
- *	This function translates the block number into path in that tree -
- *	return value is the path length and @offsets[n] is the offset of
- *	pointer to (n+1)th node in the nth one. If @block is out of range
- *	(negative or too large) warning is printed and zero returned.
- *
- *	Note: function doesn't find node addresses, so no IO is needed. All
- *	we need to know is the capacity of indirect blocks (taken from the
- *	inode->i_sb).
- */
-
-/*
- * Portability note: the last comparison (check that we fit into triple
- * indirect block) is spelled differently, because otherwise on an
- * architecture with 32-bit longs and 8Kb pages we might get into trouble
- * if our filesystem had 8Kb blocks. We might use long long, but that would
- * kill us on x86. Oh, well, at least the sign propagation does not matter -
- * i_block would have to be negative in the very beginning, so we would not
- * get there at all.
- */
-
-static int ext3_block_to_path(struct inode *inode,
-			long i_block, int offsets[4], int *boundary)
-{
-	int ptrs = EXT3_ADDR_PER_BLOCK(inode->i_sb);
-	int ptrs_bits = EXT3_ADDR_PER_BLOCK_BITS(inode->i_sb);
-	const long direct_blocks = EXT3_NDIR_BLOCKS,
-		indirect_blocks = ptrs,
-		double_blocks = (1 << (ptrs_bits * 2));
-	int n = 0;
-	int final = 0;
-
-	if (i_block < 0) {
-		ext3_warning (inode->i_sb, "ext3_block_to_path", "block < 0");
-	} else if (i_block < direct_blocks) {
-		offsets[n++] = i_block;
-		final = direct_blocks;
-	} else if ( (i_block -= direct_blocks) < indirect_blocks) {
-		offsets[n++] = EXT3_IND_BLOCK;
-		offsets[n++] = i_block;
-		final = ptrs;
-	} else if ((i_block -= indirect_blocks) < double_blocks) {
-		offsets[n++] = EXT3_DIND_BLOCK;
-		offsets[n++] = i_block >> ptrs_bits;
-		offsets[n++] = i_block & (ptrs - 1);
-		final = ptrs;
-	} else if (((i_block -= double_blocks) >> (ptrs_bits * 2)) < ptrs) {
-		offsets[n++] = EXT3_TIND_BLOCK;
-		offsets[n++] = i_block >> (ptrs_bits * 2);
-		offsets[n++] = (i_block >> ptrs_bits) & (ptrs - 1);
-		offsets[n++] = i_block & (ptrs - 1);
-		final = ptrs;
-	} else {
-		ext3_warning(inode->i_sb, "ext3_block_to_path", "block > big");
-	}
-	if (boundary)
-		*boundary = final - 1 - (i_block & (ptrs - 1));
-	return n;
-}
-
-/**
- *	ext3_get_branch - read the chain of indirect blocks leading to data
- *	@inode: inode in question
- *	@depth: depth of the chain (1 - direct pointer, etc.)
- *	@offsets: offsets of pointers in inode/indirect blocks
- *	@chain: place to store the result
- *	@err: here we store the error value
- *
- *	Function fills the array of triples <key, p, bh> and returns %NULL
- *	if everything went OK or the pointer to the last filled triple
- *	(incomplete one) otherwise. Upon the return chain[i].key contains
- *	the number of (i+1)-th block in the chain (as it is stored in memory,
- *	i.e. little-endian 32-bit), chain[i].p contains the address of that
- *	number (it points into struct inode for i==0 and into the bh->b_data
- *	for i>0) and chain[i].bh points to the buffer_head of i-th indirect
- *	block for i>0 and NULL for i==0. In other words, it holds the block
- *	numbers of the chain, addresses they were taken from (and where we can
- *	verify that chain did not change) and buffer_heads hosting these
- *	numbers.
- *
- *	Function stops when it stumbles upon zero pointer (absent block)
- *		(pointer to last triple returned, *@err == 0)
- *	or when it gets an IO error reading an indirect block
- *		(ditto, *@err == -EIO)
- *	or when it notices that chain had been changed while it was reading
- *		(ditto, *@err == -EAGAIN)
- *	or when it reads all @depth-1 indirect blocks successfully and finds
- *	the whole chain, all way to the data (returns %NULL, *err == 0).
- */
-static Indirect *ext3_get_branch(struct inode *inode, int depth, int *offsets,
-				 Indirect chain[4], int *err)
-{
-	struct super_block *sb = inode->i_sb;
-	Indirect *p = chain;
-	struct buffer_head *bh;
-
-	*err = 0;
-	/* i_data is not going away, no lock needed */
-	add_chain (chain, NULL, EXT3_I(inode)->i_data + *offsets);
-	if (!p->key)
-		goto no_block;
-	while (--depth) {
-		bh = sb_bread(sb, le32_to_cpu(p->key));
-		if (!bh)
-			goto failure;
-		/* Reader: pointers */
-		if (!verify_chain(chain, p))
-			goto changed;
-		add_chain(++p, bh, (__le32*)bh->b_data + *++offsets);
-		/* Reader: end */
-		if (!p->key)
-			goto no_block;
-	}
-	return NULL;
-
-changed:
-	brelse(bh);
-	*err = -EAGAIN;
-	goto no_block;
-failure:
-	*err = -EIO;
-no_block:
-	return p;
-}
-
-/**
- *	ext3_find_near - find a place for allocation with sufficient locality
- *	@inode: owner
- *	@ind: descriptor of indirect block.
- *
- *	This function returns the preferred place for block allocation.
- *	It is used when heuristic for sequential allocation fails.
- *	Rules are:
- *	  + if there is a block to the left of our position - allocate near it.
- *	  + if pointer will live in indirect block - allocate near that block.
- *	  + if pointer will live in inode - allocate in the same
- *	    cylinder group.
- *
- * In the latter case we colour the starting block by the callers PID to
- * prevent it from clashing with concurrent allocations for a different inode
- * in the same block group.   The PID is used here so that functionally related
- * files will be close-by on-disk.
- *
- *	Caller must make sure that @ind is valid and will stay that way.
- */
-static ext3_fsblk_t ext3_find_near(struct inode *inode, Indirect *ind)
-{
-	struct ext3_inode_info *ei = EXT3_I(inode);
-	__le32 *start = ind->bh ? (__le32*) ind->bh->b_data : ei->i_data;
-	__le32 *p;
-	ext3_fsblk_t bg_start;
-	ext3_grpblk_t colour;
-
-	/* Try to find previous block */
-	for (p = ind->p - 1; p >= start; p--) {
-		if (*p)
-			return le32_to_cpu(*p);
-	}
-
-	/* No such thing, so let's try location of indirect block */
-	if (ind->bh)
-		return ind->bh->b_blocknr;
-
-	/*
-	 * It is going to be referred to from the inode itself? OK, just put it
-	 * into the same cylinder group then.
-	 */
-	bg_start = ext3_group_first_block_no(inode->i_sb, ei->i_block_group);
-	colour = (current->pid % 16) *
-			(EXT3_BLOCKS_PER_GROUP(inode->i_sb) / 16);
-	return bg_start + colour;
-}
-
-/**
- *	ext3_find_goal - find a preferred place for allocation.
- *	@inode: owner
- *	@block:  block we want
- *	@partial: pointer to the last triple within a chain
- *
- *	Normally this function find the preferred place for block allocation,
- *	returns it.
- */
-
-static ext3_fsblk_t ext3_find_goal(struct inode *inode, long block,
-				   Indirect *partial)
-{
-	struct ext3_block_alloc_info *block_i;
-
-	block_i =  EXT3_I(inode)->i_block_alloc_info;
-
-	/*
-	 * try the heuristic for sequential allocation,
-	 * failing that at least try to get decent locality.
-	 */
-	if (block_i && (block == block_i->last_alloc_logical_block + 1)
-		&& (block_i->last_alloc_physical_block != 0)) {
-		return block_i->last_alloc_physical_block + 1;
-	}
-
-	return ext3_find_near(inode, partial);
-}
-
-/**
- *	ext3_blks_to_allocate - Look up the block map and count the number
- *	of direct blocks need to be allocated for the given branch.
- *
- *	@branch: chain of indirect blocks
- *	@k: number of blocks need for indirect blocks
- *	@blks: number of data blocks to be mapped.
- *	@blocks_to_boundary:  the offset in the indirect block
- *
- *	return the total number of blocks to be allocate, including the
- *	direct and indirect blocks.
- */
-static int ext3_blks_to_allocate(Indirect *branch, int k, unsigned long blks,
-		int blocks_to_boundary)
-{
-	unsigned long count = 0;
-
-	/*
-	 * Simple case, [t,d]Indirect block(s) has not allocated yet
-	 * then it's clear blocks on that path have not allocated
-	 */
-	if (k > 0) {
-		/* right now we don't handle cross boundary allocation */
-		if (blks < blocks_to_boundary + 1)
-			count += blks;
-		else
-			count += blocks_to_boundary + 1;
-		return count;
-	}
-
-	count++;
-	while (count < blks && count <= blocks_to_boundary &&
-		le32_to_cpu(*(branch[0].p + count)) == 0) {
-		count++;
-	}
-	return count;
-}
-
-/**
- *	ext3_alloc_blocks - multiple allocate blocks needed for a branch
- *	@handle: handle for this transaction
- *	@inode: owner
- *	@goal: preferred place for allocation
- *	@indirect_blks: the number of blocks need to allocate for indirect
- *			blocks
- *	@blks:	number of blocks need to allocated for direct blocks
- *	@new_blocks: on return it will store the new block numbers for
- *	the indirect blocks(if needed) and the first direct block,
- *	@err: here we store the error value
- *
- *	return the number of direct blocks allocated
- */
-static int ext3_alloc_blocks(handle_t *handle, struct inode *inode,
-			ext3_fsblk_t goal, int indirect_blks, int blks,
-			ext3_fsblk_t new_blocks[4], int *err)
-{
-	int target, i;
-	unsigned long count = 0;
-	int index = 0;
-	ext3_fsblk_t current_block = 0;
-	int ret = 0;
-
-	/*
-	 * Here we try to allocate the requested multiple blocks at once,
-	 * on a best-effort basis.
-	 * To build a branch, we should allocate blocks for
-	 * the indirect blocks(if not allocated yet), and at least
-	 * the first direct block of this branch.  That's the
-	 * minimum number of blocks need to allocate(required)
-	 */
-	target = blks + indirect_blks;
-
-	while (1) {
-		count = target;
-		/* allocating blocks for indirect blocks and direct blocks */
-		current_block = ext3_new_blocks(handle,inode,goal,&count,err);
-		if (*err)
-			goto failed_out;
-
-		target -= count;
-		/* allocate blocks for indirect blocks */
-		while (index < indirect_blks && count) {
-			new_blocks[index++] = current_block++;
-			count--;
-		}
-
-		if (count > 0)
-			break;
-	}
-
-	/* save the new block number for the first direct block */
-	new_blocks[index] = current_block;
-
-	/* total number of blocks allocated for direct blocks */
-	ret = count;
-	*err = 0;
-	return ret;
-failed_out:
-	for (i = 0; i <index; i++)
-		ext3_free_blocks(handle, inode, new_blocks[i], 1);
-	return ret;
-}
-
-/**
- *	ext3_alloc_branch - allocate and set up a chain of blocks.
- *	@handle: handle for this transaction
- *	@inode: owner
- *	@indirect_blks: number of allocated indirect blocks
- *	@blks: number of allocated direct blocks
- *	@goal: preferred place for allocation
- *	@offsets: offsets (in the blocks) to store the pointers to next.
- *	@branch: place to store the chain in.
- *
- *	This function allocates blocks, zeroes out all but the last one,
- *	links them into chain and (if we are synchronous) writes them to disk.
- *	In other words, it prepares a branch that can be spliced onto the
- *	inode. It stores the information about that chain in the branch[], in
- *	the same format as ext3_get_branch() would do. We are calling it after
- *	we had read the existing part of chain and partial points to the last
- *	triple of that (one with zero ->key). Upon the exit we have the same
- *	picture as after the successful ext3_get_block(), except that in one
- *	place chain is disconnected - *branch->p is still zero (we did not
- *	set the last link), but branch->key contains the number that should
- *	be placed into *branch->p to fill that gap.
- *
- *	If allocation fails we free all blocks we've allocated (and forget
- *	their buffer_heads) and return the error value the from failed
- *	ext3_alloc_block() (normally -ENOSPC). Otherwise we set the chain
- *	as described above and return 0.
- */
-static int ext3_alloc_branch(handle_t *handle, struct inode *inode,
-			int indirect_blks, int *blks, ext3_fsblk_t goal,
-			int *offsets, Indirect *branch)
-{
-	int blocksize = inode->i_sb->s_blocksize;
-	int i, n = 0;
-	int err = 0;
-	struct buffer_head *bh;
-	int num;
-	ext3_fsblk_t new_blocks[4];
-	ext3_fsblk_t current_block;
-
-	num = ext3_alloc_blocks(handle, inode, goal, indirect_blks,
-				*blks, new_blocks, &err);
-	if (err)
-		return err;
-
-	branch[0].key = cpu_to_le32(new_blocks[0]);
-	/*
-	 * metadata blocks and data blocks are allocated.
-	 */
-	for (n = 1; n <= indirect_blks;  n++) {
-		/*
-		 * Get buffer_head for parent block, zero it out
-		 * and set the pointer to new one, then send
-		 * parent to disk.
-		 */
-		bh = sb_getblk(inode->i_sb, new_blocks[n-1]);
-		if (unlikely(!bh)) {
-			err = -ENOMEM;
-			goto failed;
-		}
-		branch[n].bh = bh;
-		lock_buffer(bh);
-		BUFFER_TRACE(bh, "call get_create_access");
-		err = ext3_journal_get_create_access(handle, bh);
-		if (err) {
-			unlock_buffer(bh);
-			brelse(bh);
-			goto failed;
-		}
-
-		memset(bh->b_data, 0, blocksize);
-		branch[n].p = (__le32 *) bh->b_data + offsets[n];
-		branch[n].key = cpu_to_le32(new_blocks[n]);
-		*branch[n].p = branch[n].key;
-		if ( n == indirect_blks) {
-			current_block = new_blocks[n];
-			/*
-			 * End of chain, update the last new metablock of
-			 * the chain to point to the new allocated
-			 * data blocks numbers
-			 */
-			for (i=1; i < num; i++)
-				*(branch[n].p + i) = cpu_to_le32(++current_block);
-		}
-		BUFFER_TRACE(bh, "marking uptodate");
-		set_buffer_uptodate(bh);
-		unlock_buffer(bh);
-
-		BUFFER_TRACE(bh, "call ext3_journal_dirty_metadata");
-		err = ext3_journal_dirty_metadata(handle, bh);
-		if (err)
-			goto failed;
-	}
-	*blks = num;
-	return err;
-failed:
-	/* Allocation failed, free what we already allocated */
-	for (i = 1; i <= n ; i++) {
-		BUFFER_TRACE(branch[i].bh, "call journal_forget");
-		ext3_journal_forget(handle, branch[i].bh);
-	}
-	for (i = 0; i < indirect_blks; i++)
-		ext3_free_blocks(handle, inode, new_blocks[i], 1);
-
-	ext3_free_blocks(handle, inode, new_blocks[i], num);
-
-	return err;
-}
-
-/**
- * ext3_splice_branch - splice the allocated branch onto inode.
- * @handle: handle for this transaction
- * @inode: owner
- * @block: (logical) number of block we are adding
- * @where: location of missing link
- * @num:   number of indirect blocks we are adding
- * @blks:  number of direct blocks we are adding
- *
- * This function fills the missing link and does all housekeeping needed in
- * inode (->i_blocks, etc.). In case of success we end up with the full
- * chain to new block and return 0.
- */
-static int ext3_splice_branch(handle_t *handle, struct inode *inode,
-			long block, Indirect *where, int num, int blks)
-{
-	int i;
-	int err = 0;
-	struct ext3_block_alloc_info *block_i;
-	ext3_fsblk_t current_block;
-	struct ext3_inode_info *ei = EXT3_I(inode);
-	struct timespec now;
-
-	block_i = ei->i_block_alloc_info;
-	/*
-	 * If we're splicing into a [td]indirect block (as opposed to the
-	 * inode) then we need to get write access to the [td]indirect block
-	 * before the splice.
-	 */
-	if (where->bh) {
-		BUFFER_TRACE(where->bh, "get_write_access");
-		err = ext3_journal_get_write_access(handle, where->bh);
-		if (err)
-			goto err_out;
-	}
-	/* That's it */
-
-	*where->p = where->key;
-
-	/*
-	 * Update the host buffer_head or inode to point to more just allocated
-	 * direct blocks blocks
-	 */
-	if (num == 0 && blks > 1) {
-		current_block = le32_to_cpu(where->key) + 1;
-		for (i = 1; i < blks; i++)
-			*(where->p + i ) = cpu_to_le32(current_block++);
-	}
-
-	/*
-	 * update the most recently allocated logical & physical block
-	 * in i_block_alloc_info, to assist find the proper goal block for next
-	 * allocation
-	 */
-	if (block_i) {
-		block_i->last_alloc_logical_block = block + blks - 1;
-		block_i->last_alloc_physical_block =
-				le32_to_cpu(where[num].key) + blks - 1;
-	}
-
-	/* We are done with atomic stuff, now do the rest of housekeeping */
-	now = CURRENT_TIME_SEC;
-	if (!timespec_equal(&inode->i_ctime, &now) || !where->bh) {
-		inode->i_ctime = now;
-		ext3_mark_inode_dirty(handle, inode);
-	}
-	/* ext3_mark_inode_dirty already updated i_sync_tid */
-	atomic_set(&ei->i_datasync_tid, handle->h_transaction->t_tid);
-
-	/* had we spliced it onto indirect block? */
-	if (where->bh) {
-		/*
-		 * If we spliced it onto an indirect block, we haven't
-		 * altered the inode.  Note however that if it is being spliced
-		 * onto an indirect block at the very end of the file (the
-		 * file is growing) then we *will* alter the inode to reflect
-		 * the new i_size.  But that is not done here - it is done in
-		 * generic_commit_write->__mark_inode_dirty->ext3_dirty_inode.
-		 */
-		jbd_debug(5, "splicing indirect only\n");
-		BUFFER_TRACE(where->bh, "call ext3_journal_dirty_metadata");
-		err = ext3_journal_dirty_metadata(handle, where->bh);
-		if (err)
-			goto err_out;
-	} else {
-		/*
-		 * OK, we spliced it into the inode itself on a direct block.
-		 * Inode was dirtied above.
-		 */
-		jbd_debug(5, "splicing direct\n");
-	}
-	return err;
-
-err_out:
-	for (i = 1; i <= num; i++) {
-		BUFFER_TRACE(where[i].bh, "call journal_forget");
-		ext3_journal_forget(handle, where[i].bh);
-		ext3_free_blocks(handle,inode,le32_to_cpu(where[i-1].key),1);
-	}
-	ext3_free_blocks(handle, inode, le32_to_cpu(where[num].key), blks);
-
-	return err;
-}
-
-/*
- * Allocation strategy is simple: if we have to allocate something, we will
- * have to go the whole way to leaf. So let's do it before attaching anything
- * to tree, set linkage between the newborn blocks, write them if sync is
- * required, recheck the path, free and repeat if check fails, otherwise
- * set the last missing link (that will protect us from any truncate-generated
- * removals - all blocks on the path are immune now) and possibly force the
- * write on the parent block.
- * That has a nice additional property: no special recovery from the failed
- * allocations is needed - we simply release blocks and do not touch anything
- * reachable from inode.
- *
- * `handle' can be NULL if create == 0.
- *
- * The BKL may not be held on entry here.  Be sure to take it early.
- * return > 0, # of blocks mapped or allocated.
- * return = 0, if plain lookup failed.
- * return < 0, error case.
- */
-int ext3_get_blocks_handle(handle_t *handle, struct inode *inode,
-		sector_t iblock, unsigned long maxblocks,
-		struct buffer_head *bh_result,
-		int create)
-{
-	int err = -EIO;
-	int offsets[4];
-	Indirect chain[4];
-	Indirect *partial;
-	ext3_fsblk_t goal;
-	int indirect_blks;
-	int blocks_to_boundary = 0;
-	int depth;
-	struct ext3_inode_info *ei = EXT3_I(inode);
-	int count = 0;
-	ext3_fsblk_t first_block = 0;
-
-
-	trace_ext3_get_blocks_enter(inode, iblock, maxblocks, create);
-	J_ASSERT(handle != NULL || create == 0);
-	depth = ext3_block_to_path(inode,iblock,offsets,&blocks_to_boundary);
-
-	if (depth == 0)
-		goto out;
-
-	partial = ext3_get_branch(inode, depth, offsets, chain, &err);
-
-	/* Simplest case - block found, no allocation needed */
-	if (!partial) {
-		first_block = le32_to_cpu(chain[depth - 1].key);
-		clear_buffer_new(bh_result);
-		count++;
-		/*map more blocks*/
-		while (count < maxblocks && count <= blocks_to_boundary) {
-			ext3_fsblk_t blk;
-
-			if (!verify_chain(chain, chain + depth - 1)) {
-				/*
-				 * Indirect block might be removed by
-				 * truncate while we were reading it.
-				 * Handling of that case: forget what we've
-				 * got now. Flag the err as EAGAIN, so it
-				 * will reread.
-				 */
-				err = -EAGAIN;
-				count = 0;
-				break;
-			}
-			blk = le32_to_cpu(*(chain[depth-1].p + count));
-
-			if (blk == first_block + count)
-				count++;
-			else
-				break;
-		}
-		if (err != -EAGAIN)
-			goto got_it;
-	}
-
-	/* Next simple case - plain lookup or failed read of indirect block */
-	if (!create || err == -EIO)
-		goto cleanup;
-
-	/*
-	 * Block out ext3_truncate while we alter the tree
-	 */
-	mutex_lock(&ei->truncate_mutex);
-
-	/*
-	 * If the indirect block is missing while we are reading
-	 * the chain(ext3_get_branch() returns -EAGAIN err), or
-	 * if the chain has been changed after we grab the semaphore,
-	 * (either because another process truncated this branch, or
-	 * another get_block allocated this branch) re-grab the chain to see if
-	 * the request block has been allocated or not.
-	 *
-	 * Since we already block the truncate/other get_block
-	 * at this point, we will have the current copy of the chain when we
-	 * splice the branch into the tree.
-	 */
-	if (err == -EAGAIN || !verify_chain(chain, partial)) {
-		while (partial > chain) {
-			brelse(partial->bh);
-			partial--;
-		}
-		partial = ext3_get_branch(inode, depth, offsets, chain, &err);
-		if (!partial) {
-			count++;
-			mutex_unlock(&ei->truncate_mutex);
-			if (err)
-				goto cleanup;
-			clear_buffer_new(bh_result);
-			goto got_it;
-		}
-	}
-
-	/*
-	 * Okay, we need to do block allocation.  Lazily initialize the block
-	 * allocation info here if necessary
-	*/
-	if (S_ISREG(inode->i_mode) && (!ei->i_block_alloc_info))
-		ext3_init_block_alloc_info(inode);
-
-	goal = ext3_find_goal(inode, iblock, partial);
-
-	/* the number of blocks need to allocate for [d,t]indirect blocks */
-	indirect_blks = (chain + depth) - partial - 1;
-
-	/*
-	 * Next look up the indirect map to count the totoal number of
-	 * direct blocks to allocate for this branch.
-	 */
-	count = ext3_blks_to_allocate(partial, indirect_blks,
-					maxblocks, blocks_to_boundary);
-	err = ext3_alloc_branch(handle, inode, indirect_blks, &count, goal,
-				offsets + (partial - chain), partial);
-
-	/*
-	 * The ext3_splice_branch call will free and forget any buffers
-	 * on the new chain if there is a failure, but that risks using
-	 * up transaction credits, especially for bitmaps where the
-	 * credits cannot be returned.  Can we handle this somehow?  We
-	 * may need to return -EAGAIN upwards in the worst case.  --sct
-	 */
-	if (!err)
-		err = ext3_splice_branch(handle, inode, iblock,
-					partial, indirect_blks, count);
-	mutex_unlock(&ei->truncate_mutex);
-	if (err)
-		goto cleanup;
-
-	set_buffer_new(bh_result);
-got_it:
-	map_bh(bh_result, inode->i_sb, le32_to_cpu(chain[depth-1].key));
-	if (count > blocks_to_boundary)
-		set_buffer_boundary(bh_result);
-	err = count;
-	/* Clean up and exit */
-	partial = chain + depth - 1;	/* the whole chain */
-cleanup:
-	while (partial > chain) {
-		BUFFER_TRACE(partial->bh, "call brelse");
-		brelse(partial->bh);
-		partial--;
-	}
-	BUFFER_TRACE(bh_result, "returned");
-out:
-	trace_ext3_get_blocks_exit(inode, iblock,
-				   depth ? le32_to_cpu(chain[depth-1].key) : 0,
-				   count, err);
-	return err;
-}
-
-/* Maximum number of blocks we map for direct IO at once. */
-#define DIO_MAX_BLOCKS 4096
-/*
- * Number of credits we need for writing DIO_MAX_BLOCKS:
- * We need sb + group descriptor + bitmap + inode -> 4
- * For B blocks with A block pointers per block we need:
- * 1 (triple ind.) + (B/A/A + 2) (doubly ind.) + (B/A + 2) (indirect).
- * If we plug in 4096 for B and 256 for A (for 1KB block size), we get 25.
- */
-#define DIO_CREDITS 25
-
-static int ext3_get_block(struct inode *inode, sector_t iblock,
-			struct buffer_head *bh_result, int create)
-{
-	handle_t *handle = ext3_journal_current_handle();
-	int ret = 0, started = 0;
-	unsigned max_blocks = bh_result->b_size >> inode->i_blkbits;
-
-	if (create && !handle) {	/* Direct IO write... */
-		if (max_blocks > DIO_MAX_BLOCKS)
-			max_blocks = DIO_MAX_BLOCKS;
-		handle = ext3_journal_start(inode, DIO_CREDITS +
-				EXT3_MAXQUOTAS_TRANS_BLOCKS(inode->i_sb));
-		if (IS_ERR(handle)) {
-			ret = PTR_ERR(handle);
-			goto out;
-		}
-		started = 1;
-	}
-
-	ret = ext3_get_blocks_handle(handle, inode, iblock,
-					max_blocks, bh_result, create);
-	if (ret > 0) {
-		bh_result->b_size = (ret << inode->i_blkbits);
-		ret = 0;
-	}
-	if (started)
-		ext3_journal_stop(handle);
-out:
-	return ret;
-}
-
-int ext3_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
-		u64 start, u64 len)
-{
-	return generic_block_fiemap(inode, fieinfo, start, len,
-				    ext3_get_block);
-}
-
-/*
- * `handle' can be NULL if create is zero
- */
-struct buffer_head *ext3_getblk(handle_t *handle, struct inode *inode,
-				long block, int create, int *errp)
-{
-	struct buffer_head dummy;
-	int fatal = 0, err;
-
-	J_ASSERT(handle != NULL || create == 0);
-
-	dummy.b_state = 0;
-	dummy.b_blocknr = -1000;
-	buffer_trace_init(&dummy.b_history);
-	err = ext3_get_blocks_handle(handle, inode, block, 1,
-					&dummy, create);
-	/*
-	 * ext3_get_blocks_handle() returns number of blocks
-	 * mapped. 0 in case of a HOLE.
-	 */
-	if (err > 0) {
-		WARN_ON(err > 1);
-		err = 0;
-	}
-	*errp = err;
-	if (!err && buffer_mapped(&dummy)) {
-		struct buffer_head *bh;
-		bh = sb_getblk(inode->i_sb, dummy.b_blocknr);
-		if (unlikely(!bh)) {
-			*errp = -ENOMEM;
-			goto err;
-		}
-		if (buffer_new(&dummy)) {
-			J_ASSERT(create != 0);
-			J_ASSERT(handle != NULL);
-
-			/*
-			 * Now that we do not always journal data, we should
-			 * keep in mind whether this should always journal the
-			 * new buffer as metadata.  For now, regular file
-			 * writes use ext3_get_block instead, so it's not a
-			 * problem.
-			 */
-			lock_buffer(bh);
-			BUFFER_TRACE(bh, "call get_create_access");
-			fatal = ext3_journal_get_create_access(handle, bh);
-			if (!fatal && !buffer_uptodate(bh)) {
-				memset(bh->b_data,0,inode->i_sb->s_blocksize);
-				set_buffer_uptodate(bh);
-			}
-			unlock_buffer(bh);
-			BUFFER_TRACE(bh, "call ext3_journal_dirty_metadata");
-			err = ext3_journal_dirty_metadata(handle, bh);
-			if (!fatal)
-				fatal = err;
-		} else {
-			BUFFER_TRACE(bh, "not a new buffer");
-		}
-		if (fatal) {
-			*errp = fatal;
-			brelse(bh);
-			bh = NULL;
-		}
-		return bh;
-	}
-err:
-	return NULL;
-}
-
-struct buffer_head *ext3_bread(handle_t *handle, struct inode *inode,
-			       int block, int create, int *err)
-{
-	struct buffer_head * bh;
-
-	bh = ext3_getblk(handle, inode, block, create, err);
-	if (!bh)
-		return bh;
-	if (bh_uptodate_or_lock(bh))
-		return bh;
-	get_bh(bh);
-	bh->b_end_io = end_buffer_read_sync;
-	submit_bh(READ | REQ_META | REQ_PRIO, bh);
-	wait_on_buffer(bh);
-	if (buffer_uptodate(bh))
-		return bh;
-	put_bh(bh);
-	*err = -EIO;
-	return NULL;
-}
-
-static int walk_page_buffers(	handle_t *handle,
-				struct buffer_head *head,
-				unsigned from,
-				unsigned to,
-				int *partial,
-				int (*fn)(	handle_t *handle,
-						struct buffer_head *bh))
-{
-	struct buffer_head *bh;
-	unsigned block_start, block_end;
-	unsigned blocksize = head->b_size;
-	int err, ret = 0;
-	struct buffer_head *next;
-
-	for (	bh = head, block_start = 0;
-		ret == 0 && (bh != head || !block_start);
-		block_start = block_end, bh = next)
-	{
-		next = bh->b_this_page;
-		block_end = block_start + blocksize;
-		if (block_end <= from || block_start >= to) {
-			if (partial && !buffer_uptodate(bh))
-				*partial = 1;
-			continue;
-		}
-		err = (*fn)(handle, bh);
-		if (!ret)
-			ret = err;
-	}
-	return ret;
-}
-
-/*
- * To preserve ordering, it is essential that the hole instantiation and
- * the data write be encapsulated in a single transaction.  We cannot
- * close off a transaction and start a new one between the ext3_get_block()
- * and the commit_write().  So doing the journal_start at the start of
- * prepare_write() is the right place.
- *
- * Also, this function can nest inside ext3_writepage() ->
- * block_write_full_page(). In that case, we *know* that ext3_writepage()
- * has generated enough buffer credits to do the whole page.  So we won't
- * block on the journal in that case, which is good, because the caller may
- * be PF_MEMALLOC.
- *
- * By accident, ext3 can be reentered when a transaction is open via
- * quota file writes.  If we were to commit the transaction while thus
- * reentered, there can be a deadlock - we would be holding a quota
- * lock, and the commit would never complete if another thread had a
- * transaction open and was blocking on the quota lock - a ranking
- * violation.
- *
- * So what we do is to rely on the fact that journal_stop/journal_start
- * will _not_ run commit under these circumstances because handle->h_ref
- * is elevated.  We'll still have enough credits for the tiny quotafile
- * write.
- */
-static int do_journal_get_write_access(handle_t *handle,
-					struct buffer_head *bh)
-{
-	int dirty = buffer_dirty(bh);
-	int ret;
-
-	if (!buffer_mapped(bh) || buffer_freed(bh))
-		return 0;
-	/*
-	 * __block_prepare_write() could have dirtied some buffers. Clean
-	 * the dirty bit as jbd2_journal_get_write_access() could complain
-	 * otherwise about fs integrity issues. Setting of the dirty bit
-	 * by __block_prepare_write() isn't a real problem here as we clear
-	 * the bit before releasing a page lock and thus writeback cannot
-	 * ever write the buffer.
-	 */
-	if (dirty)
-		clear_buffer_dirty(bh);
-	ret = ext3_journal_get_write_access(handle, bh);
-	if (!ret && dirty)
-		ret = ext3_journal_dirty_metadata(handle, bh);
-	return ret;
-}
-
-/*
- * Truncate blocks that were not used by write. We have to truncate the
- * pagecache as well so that corresponding buffers get properly unmapped.
- */
-static void ext3_truncate_failed_write(struct inode *inode)
-{
-	truncate_inode_pages(inode->i_mapping, inode->i_size);
-	ext3_truncate(inode);
-}
-
-/*
- * Truncate blocks that were not used by direct IO write. We have to zero out
- * the last file block as well because direct IO might have written to it.
- */
-static void ext3_truncate_failed_direct_write(struct inode *inode)
-{
-	ext3_block_truncate_page(inode, inode->i_size);
-	ext3_truncate(inode);
-}
-
-static int ext3_write_begin(struct file *file, struct address_space *mapping,
-				loff_t pos, unsigned len, unsigned flags,
-				struct page **pagep, void **fsdata)
-{
-	struct inode *inode = mapping->host;
-	int ret;
-	handle_t *handle;
-	int retries = 0;
-	struct page *page;
-	pgoff_t index;
-	unsigned from, to;
-	/* Reserve one block more for addition to orphan list in case
-	 * we allocate blocks but write fails for some reason */
-	int needed_blocks = ext3_writepage_trans_blocks(inode) + 1;
-
-	trace_ext3_write_begin(inode, pos, len, flags);
-
-	index = pos >> PAGE_CACHE_SHIFT;
-	from = pos & (PAGE_CACHE_SIZE - 1);
-	to = from + len;
-
-retry:
-	page = grab_cache_page_write_begin(mapping, index, flags);
-	if (!page)
-		return -ENOMEM;
-	*pagep = page;
-
-	handle = ext3_journal_start(inode, needed_blocks);
-	if (IS_ERR(handle)) {
-		unlock_page(page);
-		page_cache_release(page);
-		ret = PTR_ERR(handle);
-		goto out;
-	}
-	ret = __block_write_begin(page, pos, len, ext3_get_block);
-	if (ret)
-		goto write_begin_failed;
-
-	if (ext3_should_journal_data(inode)) {
-		ret = walk_page_buffers(handle, page_buffers(page),
-				from, to, NULL, do_journal_get_write_access);
-	}
-write_begin_failed:
-	if (ret) {
-		/*
-		 * block_write_begin may have instantiated a few blocks
-		 * outside i_size.  Trim these off again. Don't need
-		 * i_size_read because we hold i_mutex.
-		 *
-		 * Add inode to orphan list in case we crash before truncate
-		 * finishes. Do this only if ext3_can_truncate() agrees so
-		 * that orphan processing code is happy.
-		 */
-		if (pos + len > inode->i_size && ext3_can_truncate(inode))
-			ext3_orphan_add(handle, inode);
-		ext3_journal_stop(handle);
-		unlock_page(page);
-		page_cache_release(page);
-		if (pos + len > inode->i_size)
-			ext3_truncate_failed_write(inode);
-	}
-	if (ret == -ENOSPC && ext3_should_retry_alloc(inode->i_sb, &retries))
-		goto retry;
-out:
-	return ret;
-}
-
-
-int ext3_journal_dirty_data(handle_t *handle, struct buffer_head *bh)
-{
-	int err = journal_dirty_data(handle, bh);
-	if (err)
-		ext3_journal_abort_handle(__func__, __func__,
-						bh, handle, err);
-	return err;
-}
-
-/* For ordered writepage and write_end functions */
-static int journal_dirty_data_fn(handle_t *handle, struct buffer_head *bh)
-{
-	/*
-	 * Write could have mapped the buffer but it didn't copy the data in
-	 * yet. So avoid filing such buffer into a transaction.
-	 */
-	if (buffer_mapped(bh) && buffer_uptodate(bh))
-		return ext3_journal_dirty_data(handle, bh);
-	return 0;
-}
-
-/* For write_end() in data=journal mode */
-static int write_end_fn(handle_t *handle, struct buffer_head *bh)
-{
-	if (!buffer_mapped(bh) || buffer_freed(bh))
-		return 0;
-	set_buffer_uptodate(bh);
-	return ext3_journal_dirty_metadata(handle, bh);
-}
-
-/*
- * This is nasty and subtle: ext3_write_begin() could have allocated blocks
- * for the whole page but later we failed to copy the data in. Update inode
- * size according to what we managed to copy. The rest is going to be
- * truncated in write_end function.
- */
-static void update_file_sizes(struct inode *inode, loff_t pos, unsigned copied)
-{
-	/* What matters to us is i_disksize. We don't write i_size anywhere */
-	if (pos + copied > inode->i_size)
-		i_size_write(inode, pos + copied);
-	if (pos + copied > EXT3_I(inode)->i_disksize) {
-		EXT3_I(inode)->i_disksize = pos + copied;
-		mark_inode_dirty(inode);
-	}
-}
-
-/*
- * We need to pick up the new inode size which generic_commit_write gave us
- * `file' can be NULL - eg, when called from page_symlink().
- *
- * ext3 never places buffers on inode->i_mapping->private_list.  metadata
- * buffers are managed internally.
- */
-static int ext3_ordered_write_end(struct file *file,
-				struct address_space *mapping,
-				loff_t pos, unsigned len, unsigned copied,
-				struct page *page, void *fsdata)
-{
-	handle_t *handle = ext3_journal_current_handle();
-	struct inode *inode = file->f_mapping->host;
-	unsigned from, to;
-	int ret = 0, ret2;
-
-	trace_ext3_ordered_write_end(inode, pos, len, copied);
-	copied = block_write_end(file, mapping, pos, len, copied, page, fsdata);
-
-	from = pos & (PAGE_CACHE_SIZE - 1);
-	to = from + copied;
-	ret = walk_page_buffers(handle, page_buffers(page),
-		from, to, NULL, journal_dirty_data_fn);
-
-	if (ret == 0)
-		update_file_sizes(inode, pos, copied);
-	/*
-	 * There may be allocated blocks outside of i_size because
-	 * we failed to copy some data. Prepare for truncate.
-	 */
-	if (pos + len > inode->i_size && ext3_can_truncate(inode))
-		ext3_orphan_add(handle, inode);
-	ret2 = ext3_journal_stop(handle);
-	if (!ret)
-		ret = ret2;
-	unlock_page(page);
-	page_cache_release(page);
-
-	if (pos + len > inode->i_size)
-		ext3_truncate_failed_write(inode);
-	return ret ? ret : copied;
-}
-
-static int ext3_writeback_write_end(struct file *file,
-				struct address_space *mapping,
-				loff_t pos, unsigned len, unsigned copied,
-				struct page *page, void *fsdata)
-{
-	handle_t *handle = ext3_journal_current_handle();
-	struct inode *inode = file->f_mapping->host;
-	int ret;
-
-	trace_ext3_writeback_write_end(inode, pos, len, copied);
-	copied = block_write_end(file, mapping, pos, len, copied, page, fsdata);
-	update_file_sizes(inode, pos, copied);
-	/*
-	 * There may be allocated blocks outside of i_size because
-	 * we failed to copy some data. Prepare for truncate.
-	 */
-	if (pos + len > inode->i_size && ext3_can_truncate(inode))
-		ext3_orphan_add(handle, inode);
-	ret = ext3_journal_stop(handle);
-	unlock_page(page);
-	page_cache_release(page);
-
-	if (pos + len > inode->i_size)
-		ext3_truncate_failed_write(inode);
-	return ret ? ret : copied;
-}
-
-static int ext3_journalled_write_end(struct file *file,
-				struct address_space *mapping,
-				loff_t pos, unsigned len, unsigned copied,
-				struct page *page, void *fsdata)
-{
-	handle_t *handle = ext3_journal_current_handle();
-	struct inode *inode = mapping->host;
-	struct ext3_inode_info *ei = EXT3_I(inode);
-	int ret = 0, ret2;
-	int partial = 0;
-	unsigned from, to;
-
-	trace_ext3_journalled_write_end(inode, pos, len, copied);
-	from = pos & (PAGE_CACHE_SIZE - 1);
-	to = from + len;
-
-	if (copied < len) {
-		if (!PageUptodate(page))
-			copied = 0;
-		page_zero_new_buffers(page, from + copied, to);
-		to = from + copied;
-	}
-
-	ret = walk_page_buffers(handle, page_buffers(page), from,
-				to, &partial, write_end_fn);
-	if (!partial)
-		SetPageUptodate(page);
-
-	if (pos + copied > inode->i_size)
-		i_size_write(inode, pos + copied);
-	/*
-	 * There may be allocated blocks outside of i_size because
-	 * we failed to copy some data. Prepare for truncate.
-	 */
-	if (pos + len > inode->i_size && ext3_can_truncate(inode))
-		ext3_orphan_add(handle, inode);
-	ext3_set_inode_state(inode, EXT3_STATE_JDATA);
-	atomic_set(&ei->i_datasync_tid, handle->h_transaction->t_tid);
-	if (inode->i_size > ei->i_disksize) {
-		ei->i_disksize = inode->i_size;
-		ret2 = ext3_mark_inode_dirty(handle, inode);
-		if (!ret)
-			ret = ret2;
-	}
-
-	ret2 = ext3_journal_stop(handle);
-	if (!ret)
-		ret = ret2;
-	unlock_page(page);
-	page_cache_release(page);
-
-	if (pos + len > inode->i_size)
-		ext3_truncate_failed_write(inode);
-	return ret ? ret : copied;
-}
-
-/*
- * bmap() is special.  It gets used by applications such as lilo and by
- * the swapper to find the on-disk block of a specific piece of data.
- *
- * Naturally, this is dangerous if the block concerned is still in the
- * journal.  If somebody makes a swapfile on an ext3 data-journaling
- * filesystem and enables swap, then they may get a nasty shock when the
- * data getting swapped to that swapfile suddenly gets overwritten by
- * the original zero's written out previously to the journal and
- * awaiting writeback in the kernel's buffer cache.
- *
- * So, if we see any bmap calls here on a modified, data-journaled file,
- * take extra steps to flush any blocks which might be in the cache.
- */
-static sector_t ext3_bmap(struct address_space *mapping, sector_t block)
-{
-	struct inode *inode = mapping->host;
-	journal_t *journal;
-	int err;
-
-	if (ext3_test_inode_state(inode, EXT3_STATE_JDATA)) {
-		/*
-		 * This is a REALLY heavyweight approach, but the use of
-		 * bmap on dirty files is expected to be extremely rare:
-		 * only if we run lilo or swapon on a freshly made file
-		 * do we expect this to happen.
-		 *
-		 * (bmap requires CAP_SYS_RAWIO so this does not
-		 * represent an unprivileged user DOS attack --- we'd be
-		 * in trouble if mortal users could trigger this path at
-		 * will.)
-		 *
-		 * NB. EXT3_STATE_JDATA is not set on files other than
-		 * regular files.  If somebody wants to bmap a directory
-		 * or symlink and gets confused because the buffer
-		 * hasn't yet been flushed to disk, they deserve
-		 * everything they get.
-		 */
-
-		ext3_clear_inode_state(inode, EXT3_STATE_JDATA);
-		journal = EXT3_JOURNAL(inode);
-		journal_lock_updates(journal);
-		err = journal_flush(journal);
-		journal_unlock_updates(journal);
-
-		if (err)
-			return 0;
-	}
-
-	return generic_block_bmap(mapping,block,ext3_get_block);
-}
-
-static int bget_one(handle_t *handle, struct buffer_head *bh)
-{
-	get_bh(bh);
-	return 0;
-}
-
-static int bput_one(handle_t *handle, struct buffer_head *bh)
-{
-	put_bh(bh);
-	return 0;
-}
-
-static int buffer_unmapped(handle_t *handle, struct buffer_head *bh)
-{
-	return !buffer_mapped(bh);
-}
-
-/*
- * Note that whenever we need to map blocks we start a transaction even if
- * we're not journalling data.  This is to preserve ordering: any hole
- * instantiation within __block_write_full_page -> ext3_get_block() should be
- * journalled along with the data so we don't crash and then get metadata which
- * refers to old data.
- *
- * In all journalling modes block_write_full_page() will start the I/O.
- *
- * We don't honour synchronous mounts for writepage().  That would be
- * disastrous.  Any write() or metadata operation will sync the fs for
- * us.
- */
-static int ext3_ordered_writepage(struct page *page,
-				struct writeback_control *wbc)
-{
-	struct inode *inode = page->mapping->host;
-	struct buffer_head *page_bufs;
-	handle_t *handle = NULL;
-	int ret = 0;
-	int err;
-
-	J_ASSERT(PageLocked(page));
-	/*
-	 * We don't want to warn for emergency remount. The condition is
-	 * ordered to avoid dereferencing inode->i_sb in non-error case to
-	 * avoid slow-downs.
-	 */
-	WARN_ON_ONCE(IS_RDONLY(inode) &&
-		     !(EXT3_SB(inode->i_sb)->s_mount_state & EXT3_ERROR_FS));
-
-	/*
-	 * We give up here if we're reentered, because it might be for a
-	 * different filesystem.
-	 */
-	if (ext3_journal_current_handle())
-		goto out_fail;
-
-	trace_ext3_ordered_writepage(page);
-	if (!page_has_buffers(page)) {
-		create_empty_buffers(page, inode->i_sb->s_blocksize,
-				(1 << BH_Dirty)|(1 << BH_Uptodate));
-		page_bufs = page_buffers(page);
-	} else {
-		page_bufs = page_buffers(page);
-		if (!walk_page_buffers(NULL, page_bufs, 0, PAGE_CACHE_SIZE,
-				       NULL, buffer_unmapped)) {
-			/* Provide NULL get_block() to catch bugs if buffers
-			 * weren't really mapped */
-			return block_write_full_page(page, NULL, wbc);
-		}
-	}
-	handle = ext3_journal_start(inode, ext3_writepage_trans_blocks(inode));
-
-	if (IS_ERR(handle)) {
-		ret = PTR_ERR(handle);
-		goto out_fail;
-	}
-
-	walk_page_buffers(handle, page_bufs, 0,
-			PAGE_CACHE_SIZE, NULL, bget_one);
-
-	ret = block_write_full_page(page, ext3_get_block, wbc);
-
-	/*
-	 * The page can become unlocked at any point now, and
-	 * truncate can then come in and change things.  So we
-	 * can't touch *page from now on.  But *page_bufs is
-	 * safe due to elevated refcount.
-	 */
-
-	/*
-	 * And attach them to the current transaction.  But only if
-	 * block_write_full_page() succeeded.  Otherwise they are unmapped,
-	 * and generally junk.
-	 */
-	if (ret == 0)
-		ret = walk_page_buffers(handle, page_bufs, 0, PAGE_CACHE_SIZE,
-					NULL, journal_dirty_data_fn);
-	walk_page_buffers(handle, page_bufs, 0,
-			PAGE_CACHE_SIZE, NULL, bput_one);
-	err = ext3_journal_stop(handle);
-	if (!ret)
-		ret = err;
-	return ret;
-
-out_fail:
-	redirty_page_for_writepage(wbc, page);
-	unlock_page(page);
-	return ret;
-}
-
-static int ext3_writeback_writepage(struct page *page,
-				struct writeback_control *wbc)
-{
-	struct inode *inode = page->mapping->host;
-	handle_t *handle = NULL;
-	int ret = 0;
-	int err;
-
-	J_ASSERT(PageLocked(page));
-	/*
-	 * We don't want to warn for emergency remount. The condition is
-	 * ordered to avoid dereferencing inode->i_sb in non-error case to
-	 * avoid slow-downs.
-	 */
-	WARN_ON_ONCE(IS_RDONLY(inode) &&
-		     !(EXT3_SB(inode->i_sb)->s_mount_state & EXT3_ERROR_FS));
-
-	if (ext3_journal_current_handle())
-		goto out_fail;
-
-	trace_ext3_writeback_writepage(page);
-	if (page_has_buffers(page)) {
-		if (!walk_page_buffers(NULL, page_buffers(page), 0,
-				      PAGE_CACHE_SIZE, NULL, buffer_unmapped)) {
-			/* Provide NULL get_block() to catch bugs if buffers
-			 * weren't really mapped */
-			return block_write_full_page(page, NULL, wbc);
-		}
-	}
-
-	handle = ext3_journal_start(inode, ext3_writepage_trans_blocks(inode));
-	if (IS_ERR(handle)) {
-		ret = PTR_ERR(handle);
-		goto out_fail;
-	}
-
-	ret = block_write_full_page(page, ext3_get_block, wbc);
-
-	err = ext3_journal_stop(handle);
-	if (!ret)
-		ret = err;
-	return ret;
-
-out_fail:
-	redirty_page_for_writepage(wbc, page);
-	unlock_page(page);
-	return ret;
-}
-
-static int ext3_journalled_writepage(struct page *page,
-				struct writeback_control *wbc)
-{
-	struct inode *inode = page->mapping->host;
-	handle_t *handle = NULL;
-	int ret = 0;
-	int err;
-
-	J_ASSERT(PageLocked(page));
-	/*
-	 * We don't want to warn for emergency remount. The condition is
-	 * ordered to avoid dereferencing inode->i_sb in non-error case to
-	 * avoid slow-downs.
-	 */
-	WARN_ON_ONCE(IS_RDONLY(inode) &&
-		     !(EXT3_SB(inode->i_sb)->s_mount_state & EXT3_ERROR_FS));
-
-	trace_ext3_journalled_writepage(page);
-	if (!page_has_buffers(page) || PageChecked(page)) {
-		if (ext3_journal_current_handle())
-			goto no_write;
-
-		handle = ext3_journal_start(inode,
-					    ext3_writepage_trans_blocks(inode));
-		if (IS_ERR(handle)) {
-			ret = PTR_ERR(handle);
-			goto no_write;
-		}
-		/*
-		 * It's mmapped pagecache.  Add buffers and journal it.  There
-		 * doesn't seem much point in redirtying the page here.
-		 */
-		ClearPageChecked(page);
-		ret = __block_write_begin(page, 0, PAGE_CACHE_SIZE,
-					  ext3_get_block);
-		if (ret != 0) {
-			ext3_journal_stop(handle);
-			goto out_unlock;
-		}
-		ret = walk_page_buffers(handle, page_buffers(page), 0,
-			PAGE_CACHE_SIZE, NULL, do_journal_get_write_access);
-
-		err = walk_page_buffers(handle, page_buffers(page), 0,
-				PAGE_CACHE_SIZE, NULL, write_end_fn);
-		if (ret == 0)
-			ret = err;
-		ext3_set_inode_state(inode, EXT3_STATE_JDATA);
-		atomic_set(&EXT3_I(inode)->i_datasync_tid,
-			   handle->h_transaction->t_tid);
-		unlock_page(page);
-		err = ext3_journal_stop(handle);
-		if (!ret)
-			ret = err;
-	} else {
-		/*
-		 * It is a page full of checkpoint-mode buffers. Go and write
-		 * them. They should have been already mapped when they went
-		 * to the journal so provide NULL get_block function to catch
-		 * errors.
-		 */
-		ret = block_write_full_page(page, NULL, wbc);
-	}
-out:
-	return ret;
-
-no_write:
-	redirty_page_for_writepage(wbc, page);
-out_unlock:
-	unlock_page(page);
-	goto out;
-}
-
-static int ext3_readpage(struct file *file, struct page *page)
-{
-	trace_ext3_readpage(page);
-	return mpage_readpage(page, ext3_get_block);
-}
-
-static int
-ext3_readpages(struct file *file, struct address_space *mapping,
-		struct list_head *pages, unsigned nr_pages)
-{
-	return mpage_readpages(mapping, pages, nr_pages, ext3_get_block);
-}
-
-static void ext3_invalidatepage(struct page *page, unsigned int offset,
-				unsigned int length)
-{
-	journal_t *journal = EXT3_JOURNAL(page->mapping->host);
-
-	trace_ext3_invalidatepage(page, offset, length);
-
-	/*
-	 * If it's a full truncate we just forget about the pending dirtying
-	 */
-	if (offset == 0 && length == PAGE_CACHE_SIZE)
-		ClearPageChecked(page);
-
-	journal_invalidatepage(journal, page, offset, length);
-}
-
-static int ext3_releasepage(struct page *page, gfp_t wait)
-{
-	journal_t *journal = EXT3_JOURNAL(page->mapping->host);
-
-	trace_ext3_releasepage(page);
-	WARN_ON(PageChecked(page));
-	if (!page_has_buffers(page))
-		return 0;
-	return journal_try_to_free_buffers(journal, page, wait);
-}
-
-/*
- * If the O_DIRECT write will extend the file then add this inode to the
- * orphan list.  So recovery will truncate it back to the original size
- * if the machine crashes during the write.
- *
- * If the O_DIRECT write is intantiating holes inside i_size and the machine
- * crashes then stale disk data _may_ be exposed inside the file. But current
- * VFS code falls back into buffered path in that case so we are safe.
- */
-static ssize_t ext3_direct_IO(struct kiocb *iocb, struct iov_iter *iter,
-			      loff_t offset)
-{
-	struct file *file = iocb->ki_filp;
-	struct inode *inode = file->f_mapping->host;
-	struct ext3_inode_info *ei = EXT3_I(inode);
-	handle_t *handle;
-	ssize_t ret;
-	int orphan = 0;
-	size_t count = iov_iter_count(iter);
-	int retries = 0;
-
-	trace_ext3_direct_IO_enter(inode, offset, count, iov_iter_rw(iter));
-
-	if (iov_iter_rw(iter) == WRITE) {
-		loff_t final_size = offset + count;
-
-		if (final_size > inode->i_size) {
-			/* Credits for sb + inode write */
-			handle = ext3_journal_start(inode, 2);
-			if (IS_ERR(handle)) {
-				ret = PTR_ERR(handle);
-				goto out;
-			}
-			ret = ext3_orphan_add(handle, inode);
-			if (ret) {
-				ext3_journal_stop(handle);
-				goto out;
-			}
-			orphan = 1;
-			ei->i_disksize = inode->i_size;
-			ext3_journal_stop(handle);
-		}
-	}
-
-retry:
-	ret = blockdev_direct_IO(iocb, inode, iter, offset, ext3_get_block);
-	/*
-	 * In case of error extending write may have instantiated a few
-	 * blocks outside i_size. Trim these off again.
-	 */
-	if (unlikely(iov_iter_rw(iter) == WRITE && ret < 0)) {
-		loff_t isize = i_size_read(inode);
-		loff_t end = offset + count;
-
-		if (end > isize)
-			ext3_truncate_failed_direct_write(inode);
-	}
-	if (ret == -ENOSPC && ext3_should_retry_alloc(inode->i_sb, &retries))
-		goto retry;
-
-	if (orphan) {
-		int err;
-
-		/* Credits for sb + inode write */
-		handle = ext3_journal_start(inode, 2);
-		if (IS_ERR(handle)) {
-			/* This is really bad luck. We've written the data
-			 * but cannot extend i_size. Truncate allocated blocks
-			 * and pretend the write failed... */
-			ext3_truncate_failed_direct_write(inode);
-			ret = PTR_ERR(handle);
-			if (inode->i_nlink)
-				ext3_orphan_del(NULL, inode);
-			goto out;
-		}
-		if (inode->i_nlink)
-			ext3_orphan_del(handle, inode);
-		if (ret > 0) {
-			loff_t end = offset + ret;
-			if (end > inode->i_size) {
-				ei->i_disksize = end;
-				i_size_write(inode, end);
-				/*
-				 * We're going to return a positive `ret'
-				 * here due to non-zero-length I/O, so there's
-				 * no way of reporting error returns from
-				 * ext3_mark_inode_dirty() to userspace.  So
-				 * ignore it.
-				 */
-				ext3_mark_inode_dirty(handle, inode);
-			}
-		}
-		err = ext3_journal_stop(handle);
-		if (ret == 0)
-			ret = err;
-	}
-out:
-	trace_ext3_direct_IO_exit(inode, offset, count, iov_iter_rw(iter), ret);
-	return ret;
-}
-
-/*
- * Pages can be marked dirty completely asynchronously from ext3's journalling
- * activity.  By filemap_sync_pte(), try_to_unmap_one(), etc.  We cannot do
- * much here because ->set_page_dirty is called under VFS locks.  The page is
- * not necessarily locked.
- *
- * We cannot just dirty the page and leave attached buffers clean, because the
- * buffers' dirty state is "definitive".  We cannot just set the buffers dirty
- * or jbddirty because all the journalling code will explode.
- *
- * So what we do is to mark the page "pending dirty" and next time writepage
- * is called, propagate that into the buffers appropriately.
- */
-static int ext3_journalled_set_page_dirty(struct page *page)
-{
-	SetPageChecked(page);
-	return __set_page_dirty_nobuffers(page);
-}
-
-static const struct address_space_operations ext3_ordered_aops = {
-	.readpage		= ext3_readpage,
-	.readpages		= ext3_readpages,
-	.writepage		= ext3_ordered_writepage,
-	.write_begin		= ext3_write_begin,
-	.write_end		= ext3_ordered_write_end,
-	.bmap			= ext3_bmap,
-	.invalidatepage		= ext3_invalidatepage,
-	.releasepage		= ext3_releasepage,
-	.direct_IO		= ext3_direct_IO,
-	.migratepage		= buffer_migrate_page,
-	.is_partially_uptodate  = block_is_partially_uptodate,
-	.is_dirty_writeback	= buffer_check_dirty_writeback,
-	.error_remove_page	= generic_error_remove_page,
-};
-
-static const struct address_space_operations ext3_writeback_aops = {
-	.readpage		= ext3_readpage,
-	.readpages		= ext3_readpages,
-	.writepage		= ext3_writeback_writepage,
-	.write_begin		= ext3_write_begin,
-	.write_end		= ext3_writeback_write_end,
-	.bmap			= ext3_bmap,
-	.invalidatepage		= ext3_invalidatepage,
-	.releasepage		= ext3_releasepage,
-	.direct_IO		= ext3_direct_IO,
-	.migratepage		= buffer_migrate_page,
-	.is_partially_uptodate  = block_is_partially_uptodate,
-	.error_remove_page	= generic_error_remove_page,
-};
-
-static const struct address_space_operations ext3_journalled_aops = {
-	.readpage		= ext3_readpage,
-	.readpages		= ext3_readpages,
-	.writepage		= ext3_journalled_writepage,
-	.write_begin		= ext3_write_begin,
-	.write_end		= ext3_journalled_write_end,
-	.set_page_dirty		= ext3_journalled_set_page_dirty,
-	.bmap			= ext3_bmap,
-	.invalidatepage		= ext3_invalidatepage,
-	.releasepage		= ext3_releasepage,
-	.is_partially_uptodate  = block_is_partially_uptodate,
-	.error_remove_page	= generic_error_remove_page,
-};
-
-void ext3_set_aops(struct inode *inode)
-{
-	if (ext3_should_order_data(inode))
-		inode->i_mapping->a_ops = &ext3_ordered_aops;
-	else if (ext3_should_writeback_data(inode))
-		inode->i_mapping->a_ops = &ext3_writeback_aops;
-	else
-		inode->i_mapping->a_ops = &ext3_journalled_aops;
-}
-
-/*
- * ext3_block_truncate_page() zeroes out a mapping from file offset `from'
- * up to the end of the block which corresponds to `from'.
- * This required during truncate. We need to physically zero the tail end
- * of that block so it doesn't yield old data if the file is later grown.
- */
-static int ext3_block_truncate_page(struct inode *inode, loff_t from)
-{
-	ext3_fsblk_t index = from >> PAGE_CACHE_SHIFT;
-	unsigned offset = from & (PAGE_CACHE_SIZE - 1);
-	unsigned blocksize, iblock, length, pos;
-	struct page *page;
-	handle_t *handle = NULL;
-	struct buffer_head *bh;
-	int err = 0;
-
-	/* Truncated on block boundary - nothing to do */
-	blocksize = inode->i_sb->s_blocksize;
-	if ((from & (blocksize - 1)) == 0)
-		return 0;
-
-	page = grab_cache_page(inode->i_mapping, index);
-	if (!page)
-		return -ENOMEM;
-	length = blocksize - (offset & (blocksize - 1));
-	iblock = index << (PAGE_CACHE_SHIFT - inode->i_sb->s_blocksize_bits);
-
-	if (!page_has_buffers(page))
-		create_empty_buffers(page, blocksize, 0);
-
-	/* Find the buffer that contains "offset" */
-	bh = page_buffers(page);
-	pos = blocksize;
-	while (offset >= pos) {
-		bh = bh->b_this_page;
-		iblock++;
-		pos += blocksize;
-	}
-
-	err = 0;
-	if (buffer_freed(bh)) {
-		BUFFER_TRACE(bh, "freed: skip");
-		goto unlock;
-	}
-
-	if (!buffer_mapped(bh)) {
-		BUFFER_TRACE(bh, "unmapped");
-		ext3_get_block(inode, iblock, bh, 0);
-		/* unmapped? It's a hole - nothing to do */
-		if (!buffer_mapped(bh)) {
-			BUFFER_TRACE(bh, "still unmapped");
-			goto unlock;
-		}
-	}
-
-	/* Ok, it's mapped. Make sure it's up-to-date */
-	if (PageUptodate(page))
-		set_buffer_uptodate(bh);
-
-	if (!bh_uptodate_or_lock(bh)) {
-		err = bh_submit_read(bh);
-		/* Uhhuh. Read error. Complain and punt. */
-		if (err)
-			goto unlock;
-	}
-
-	/* data=writeback mode doesn't need transaction to zero-out data */
-	if (!ext3_should_writeback_data(inode)) {
-		/* We journal at most one block */
-		handle = ext3_journal_start(inode, 1);
-		if (IS_ERR(handle)) {
-			clear_highpage(page);
-			flush_dcache_page(page);
-			err = PTR_ERR(handle);
-			goto unlock;
-		}
-	}
-
-	if (ext3_should_journal_data(inode)) {
-		BUFFER_TRACE(bh, "get write access");
-		err = ext3_journal_get_write_access(handle, bh);
-		if (err)
-			goto stop;
-	}
-
-	zero_user(page, offset, length);
-	BUFFER_TRACE(bh, "zeroed end of block");
-
-	err = 0;
-	if (ext3_should_journal_data(inode)) {
-		err = ext3_journal_dirty_metadata(handle, bh);
-	} else {
-		if (ext3_should_order_data(inode))
-			err = ext3_journal_dirty_data(handle, bh);
-		mark_buffer_dirty(bh);
-	}
-stop:
-	if (handle)
-		ext3_journal_stop(handle);
-
-unlock:
-	unlock_page(page);
-	page_cache_release(page);
-	return err;
-}
-
-/*
- * Probably it should be a library function... search for first non-zero word
- * or memcmp with zero_page, whatever is better for particular architecture.
- * Linus?
- */
-static inline int all_zeroes(__le32 *p, __le32 *q)
-{
-	while (p < q)
-		if (*p++)
-			return 0;
-	return 1;
-}
-
-/**
- *	ext3_find_shared - find the indirect blocks for partial truncation.
- *	@inode:	  inode in question
- *	@depth:	  depth of the affected branch
- *	@offsets: offsets of pointers in that branch (see ext3_block_to_path)
- *	@chain:	  place to store the pointers to partial indirect blocks
- *	@top:	  place to the (detached) top of branch
- *
- *	This is a helper function used by ext3_truncate().
- *
- *	When we do truncate() we may have to clean the ends of several
- *	indirect blocks but leave the blocks themselves alive. Block is
- *	partially truncated if some data below the new i_size is referred
- *	from it (and it is on the path to the first completely truncated
- *	data block, indeed).  We have to free the top of that path along
- *	with everything to the right of the path. Since no allocation
- *	past the truncation point is possible until ext3_truncate()
- *	finishes, we may safely do the latter, but top of branch may
- *	require special attention - pageout below the truncation point
- *	might try to populate it.
- *
- *	We atomically detach the top of branch from the tree, store the
- *	block number of its root in *@top, pointers to buffer_heads of
- *	partially truncated blocks - in @chain[].bh and pointers to
- *	their last elements that should not be removed - in
- *	@chain[].p. Return value is the pointer to last filled element
- *	of @chain.
- *
- *	The work left to caller to do the actual freeing of subtrees:
- *		a) free the subtree starting from *@top
- *		b) free the subtrees whose roots are stored in
- *			(@chain[i].p+1 .. end of @chain[i].bh->b_data)
- *		c) free the subtrees growing from the inode past the @chain[0].
- *			(no partially truncated stuff there).  */
-
-static Indirect *ext3_find_shared(struct inode *inode, int depth,
-			int offsets[4], Indirect chain[4], __le32 *top)
-{
-	Indirect *partial, *p;
-	int k, err;
-
-	*top = 0;
-	/* Make k index the deepest non-null offset + 1 */
-	for (k = depth; k > 1 && !offsets[k-1]; k--)
-		;
-	partial = ext3_get_branch(inode, k, offsets, chain, &err);
-	/* Writer: pointers */
-	if (!partial)
-		partial = chain + k-1;
-	/*
-	 * If the branch acquired continuation since we've looked at it -
-	 * fine, it should all survive and (new) top doesn't belong to us.
-	 */
-	if (!partial->key && *partial->p)
-		/* Writer: end */
-		goto no_top;
-	for (p=partial; p>chain && all_zeroes((__le32*)p->bh->b_data,p->p); p--)
-		;
-	/*
-	 * OK, we've found the last block that must survive. The rest of our
-	 * branch should be detached before unlocking. However, if that rest
-	 * of branch is all ours and does not grow immediately from the inode
-	 * it's easier to cheat and just decrement partial->p.
-	 */
-	if (p == chain + k - 1 && p > chain) {
-		p->p--;
-	} else {
-		*top = *p->p;
-		/* Nope, don't do this in ext3.  Must leave the tree intact */
-#if 0
-		*p->p = 0;
-#endif
-	}
-	/* Writer: end */
-
-	while(partial > p) {
-		brelse(partial->bh);
-		partial--;
-	}
-no_top:
-	return partial;
-}
-
-/*
- * Zero a number of block pointers in either an inode or an indirect block.
- * If we restart the transaction we must again get write access to the
- * indirect block for further modification.
- *
- * We release `count' blocks on disk, but (last - first) may be greater
- * than `count' because there can be holes in there.
- */
-static void ext3_clear_blocks(handle_t *handle, struct inode *inode,
-		struct buffer_head *bh, ext3_fsblk_t block_to_free,
-		unsigned long count, __le32 *first, __le32 *last)
-{
-	__le32 *p;
-	if (try_to_extend_transaction(handle, inode)) {
-		if (bh) {
-			BUFFER_TRACE(bh, "call ext3_journal_dirty_metadata");
-			if (ext3_journal_dirty_metadata(handle, bh))
-				return;
-		}
-		ext3_mark_inode_dirty(handle, inode);
-		truncate_restart_transaction(handle, inode);
-		if (bh) {
-			BUFFER_TRACE(bh, "retaking write access");
-			if (ext3_journal_get_write_access(handle, bh))
-				return;
-		}
-	}
-
-	/*
-	 * Any buffers which are on the journal will be in memory. We find
-	 * them on the hash table so journal_revoke() will run journal_forget()
-	 * on them.  We've already detached each block from the file, so
-	 * bforget() in journal_forget() should be safe.
-	 *
-	 * AKPM: turn on bforget in journal_forget()!!!
-	 */
-	for (p = first; p < last; p++) {
-		u32 nr = le32_to_cpu(*p);
-		if (nr) {
-			struct buffer_head *bh;
-
-			*p = 0;
-			bh = sb_find_get_block(inode->i_sb, nr);
-			ext3_forget(handle, 0, inode, bh, nr);
-		}
-	}
-
-	ext3_free_blocks(handle, inode, block_to_free, count);
-}
-
-/**
- * ext3_free_data - free a list of data blocks
- * @handle:	handle for this transaction
- * @inode:	inode we are dealing with
- * @this_bh:	indirect buffer_head which contains *@first and *@last
- * @first:	array of block numbers
- * @last:	points immediately past the end of array
- *
- * We are freeing all blocks referred from that array (numbers are stored as
- * little-endian 32-bit) and updating @inode->i_blocks appropriately.
- *
- * We accumulate contiguous runs of blocks to free.  Conveniently, if these
- * blocks are contiguous then releasing them at one time will only affect one
- * or two bitmap blocks (+ group descriptor(s) and superblock) and we won't
- * actually use a lot of journal space.
- *
- * @this_bh will be %NULL if @first and @last point into the inode's direct
- * block pointers.
- */
-static void ext3_free_data(handle_t *handle, struct inode *inode,
-			   struct buffer_head *this_bh,
-			   __le32 *first, __le32 *last)
-{
-	ext3_fsblk_t block_to_free = 0;    /* Starting block # of a run */
-	unsigned long count = 0;	    /* Number of blocks in the run */
-	__le32 *block_to_free_p = NULL;	    /* Pointer into inode/ind
-					       corresponding to
-					       block_to_free */
-	ext3_fsblk_t nr;		    /* Current block # */
-	__le32 *p;			    /* Pointer into inode/ind
-					       for current block */
-	int err;
-
-	if (this_bh) {				/* For indirect block */
-		BUFFER_TRACE(this_bh, "get_write_access");
-		err = ext3_journal_get_write_access(handle, this_bh);
-		/* Important: if we can't update the indirect pointers
-		 * to the blocks, we can't free them. */
-		if (err)
-			return;
-	}
-
-	for (p = first; p < last; p++) {
-		nr = le32_to_cpu(*p);
-		if (nr) {
-			/* accumulate blocks to free if they're contiguous */
-			if (count == 0) {
-				block_to_free = nr;
-				block_to_free_p = p;
-				count = 1;
-			} else if (nr == block_to_free + count) {
-				count++;
-			} else {
-				ext3_clear_blocks(handle, inode, this_bh,
-						  block_to_free,
-						  count, block_to_free_p, p);
-				block_to_free = nr;
-				block_to_free_p = p;
-				count = 1;
-			}
-		}
-	}
-
-	if (count > 0)
-		ext3_clear_blocks(handle, inode, this_bh, block_to_free,
-				  count, block_to_free_p, p);
-
-	if (this_bh) {
-		BUFFER_TRACE(this_bh, "call ext3_journal_dirty_metadata");
-
-		/*
-		 * The buffer head should have an attached journal head at this
-		 * point. However, if the data is corrupted and an indirect
-		 * block pointed to itself, it would have been detached when
-		 * the block was cleared. Check for this instead of OOPSing.
-		 */
-		if (bh2jh(this_bh))
-			ext3_journal_dirty_metadata(handle, this_bh);
-		else
-			ext3_error(inode->i_sb, "ext3_free_data",
-				   "circular indirect block detected, "
-				   "inode=%lu, block=%llu",
-				   inode->i_ino,
-				   (unsigned long long)this_bh->b_blocknr);
-	}
-}
-
-/**
- *	ext3_free_branches - free an array of branches
- *	@handle: JBD handle for this transaction
- *	@inode:	inode we are dealing with
- *	@parent_bh: the buffer_head which contains *@first and *@last
- *	@first:	array of block numbers
- *	@last:	pointer immediately past the end of array
- *	@depth:	depth of the branches to free
- *
- *	We are freeing all blocks referred from these branches (numbers are
- *	stored as little-endian 32-bit) and updating @inode->i_blocks
- *	appropriately.
- */
-static void ext3_free_branches(handle_t *handle, struct inode *inode,
-			       struct buffer_head *parent_bh,
-			       __le32 *first, __le32 *last, int depth)
-{
-	ext3_fsblk_t nr;
-	__le32 *p;
-
-	if (is_handle_aborted(handle))
-		return;
-
-	if (depth--) {
-		struct buffer_head *bh;
-		int addr_per_block = EXT3_ADDR_PER_BLOCK(inode->i_sb);
-		p = last;
-		while (--p >= first) {
-			nr = le32_to_cpu(*p);
-			if (!nr)
-				continue;		/* A hole */
-
-			/* Go read the buffer for the next level down */
-			bh = sb_bread(inode->i_sb, nr);
-
-			/*
-			 * A read failure? Report error and clear slot
-			 * (should be rare).
-			 */
-			if (!bh) {
-				ext3_error(inode->i_sb, "ext3_free_branches",
-					   "Read failure, inode=%lu, block="E3FSBLK,
-					   inode->i_ino, nr);
-				continue;
-			}
-
-			/* This zaps the entire block.  Bottom up. */
-			BUFFER_TRACE(bh, "free child branches");
-			ext3_free_branches(handle, inode, bh,
-					   (__le32*)bh->b_data,
-					   (__le32*)bh->b_data + addr_per_block,
-					   depth);
-
-			/*
-			 * Everything below this this pointer has been
-			 * released.  Now let this top-of-subtree go.
-			 *
-			 * We want the freeing of this indirect block to be
-			 * atomic in the journal with the updating of the
-			 * bitmap block which owns it.  So make some room in
-			 * the journal.
-			 *
-			 * We zero the parent pointer *after* freeing its
-			 * pointee in the bitmaps, so if extend_transaction()
-			 * for some reason fails to put the bitmap changes and
-			 * the release into the same transaction, recovery
-			 * will merely complain about releasing a free block,
-			 * rather than leaking blocks.
-			 */
-			if (is_handle_aborted(handle))
-				return;
-			if (try_to_extend_transaction(handle, inode)) {
-				ext3_mark_inode_dirty(handle, inode);
-				truncate_restart_transaction(handle, inode);
-			}
-
-			/*
-			 * We've probably journalled the indirect block several
-			 * times during the truncate.  But it's no longer
-			 * needed and we now drop it from the transaction via
-			 * journal_revoke().
-			 *
-			 * That's easy if it's exclusively part of this
-			 * transaction.  But if it's part of the committing
-			 * transaction then journal_forget() will simply
-			 * brelse() it.  That means that if the underlying
-			 * block is reallocated in ext3_get_block(),
-			 * unmap_underlying_metadata() will find this block
-			 * and will try to get rid of it.  damn, damn. Thus
-			 * we don't allow a block to be reallocated until
-			 * a transaction freeing it has fully committed.
-			 *
-			 * We also have to make sure journal replay after a
-			 * crash does not overwrite non-journaled data blocks
-			 * with old metadata when the block got reallocated for
-			 * data.  Thus we have to store a revoke record for a
-			 * block in the same transaction in which we free the
-			 * block.
-			 */
-			ext3_forget(handle, 1, inode, bh, bh->b_blocknr);
-
-			ext3_free_blocks(handle, inode, nr, 1);
-
-			if (parent_bh) {
-				/*
-				 * The block which we have just freed is
-				 * pointed to by an indirect block: journal it
-				 */
-				BUFFER_TRACE(parent_bh, "get_write_access");
-				if (!ext3_journal_get_write_access(handle,
-								   parent_bh)){
-					*p = 0;
-					BUFFER_TRACE(parent_bh,
-					"call ext3_journal_dirty_metadata");
-					ext3_journal_dirty_metadata(handle,
-								    parent_bh);
-				}
-			}
-		}
-	} else {
-		/* We have reached the bottom of the tree. */
-		BUFFER_TRACE(parent_bh, "free data blocks");
-		ext3_free_data(handle, inode, parent_bh, first, last);
-	}
-}
-
-int ext3_can_truncate(struct inode *inode)
-{
-	if (S_ISREG(inode->i_mode))
-		return 1;
-	if (S_ISDIR(inode->i_mode))
-		return 1;
-	if (S_ISLNK(inode->i_mode))
-		return !ext3_inode_is_fast_symlink(inode);
-	return 0;
-}
-
-/*
- * ext3_truncate()
- *
- * We block out ext3_get_block() block instantiations across the entire
- * transaction, and VFS/VM ensures that ext3_truncate() cannot run
- * simultaneously on behalf of the same inode.
- *
- * As we work through the truncate and commit bits of it to the journal there
- * is one core, guiding principle: the file's tree must always be consistent on
- * disk.  We must be able to restart the truncate after a crash.
- *
- * The file's tree may be transiently inconsistent in memory (although it
- * probably isn't), but whenever we close off and commit a journal transaction,
- * the contents of (the filesystem + the journal) must be consistent and
- * restartable.  It's pretty simple, really: bottom up, right to left (although
- * left-to-right works OK too).
- *
- * Note that at recovery time, journal replay occurs *before* the restart of
- * truncate against the orphan inode list.
- *
- * The committed inode has the new, desired i_size (which is the same as
- * i_disksize in this case).  After a crash, ext3_orphan_cleanup() will see
- * that this inode's truncate did not complete and it will again call
- * ext3_truncate() to have another go.  So there will be instantiated blocks
- * to the right of the truncation point in a crashed ext3 filesystem.  But
- * that's fine - as long as they are linked from the inode, the post-crash
- * ext3_truncate() run will find them and release them.
- */
-void ext3_truncate(struct inode *inode)
-{
-	handle_t *handle;
-	struct ext3_inode_info *ei = EXT3_I(inode);
-	__le32 *i_data = ei->i_data;
-	int addr_per_block = EXT3_ADDR_PER_BLOCK(inode->i_sb);
-	int offsets[4];
-	Indirect chain[4];
-	Indirect *partial;
-	__le32 nr = 0;
-	int n;
-	long last_block;
-	unsigned blocksize = inode->i_sb->s_blocksize;
-
-	trace_ext3_truncate_enter(inode);
-
-	if (!ext3_can_truncate(inode))
-		goto out_notrans;
-
-	if (inode->i_size == 0 && ext3_should_writeback_data(inode))
-		ext3_set_inode_state(inode, EXT3_STATE_FLUSH_ON_CLOSE);
-
-	handle = start_transaction(inode);
-	if (IS_ERR(handle))
-		goto out_notrans;
-
-	last_block = (inode->i_size + blocksize-1)
-					>> EXT3_BLOCK_SIZE_BITS(inode->i_sb);
-	n = ext3_block_to_path(inode, last_block, offsets, NULL);
-	if (n == 0)
-		goto out_stop;	/* error */
-
-	/*
-	 * OK.  This truncate is going to happen.  We add the inode to the
-	 * orphan list, so that if this truncate spans multiple transactions,
-	 * and we crash, we will resume the truncate when the filesystem
-	 * recovers.  It also marks the inode dirty, to catch the new size.
-	 *
-	 * Implication: the file must always be in a sane, consistent
-	 * truncatable state while each transaction commits.
-	 */
-	if (ext3_orphan_add(handle, inode))
-		goto out_stop;
-
-	/*
-	 * The orphan list entry will now protect us from any crash which
-	 * occurs before the truncate completes, so it is now safe to propagate
-	 * the new, shorter inode size (held for now in i_size) into the
-	 * on-disk inode. We do this via i_disksize, which is the value which
-	 * ext3 *really* writes onto the disk inode.
-	 */
-	ei->i_disksize = inode->i_size;
-
-	/*
-	 * From here we block out all ext3_get_block() callers who want to
-	 * modify the block allocation tree.
-	 */
-	mutex_lock(&ei->truncate_mutex);
-
-	if (n == 1) {		/* direct blocks */
-		ext3_free_data(handle, inode, NULL, i_data+offsets[0],
-			       i_data + EXT3_NDIR_BLOCKS);
-		goto do_indirects;
-	}
-
-	partial = ext3_find_shared(inode, n, offsets, chain, &nr);
-	/* Kill the top of shared branch (not detached) */
-	if (nr) {
-		if (partial == chain) {
-			/* Shared branch grows from the inode */
-			ext3_free_branches(handle, inode, NULL,
-					   &nr, &nr+1, (chain+n-1) - partial);
-			*partial->p = 0;
-			/*
-			 * We mark the inode dirty prior to restart,
-			 * and prior to stop.  No need for it here.
-			 */
-		} else {
-			/* Shared branch grows from an indirect block */
-			ext3_free_branches(handle, inode, partial->bh,
-					partial->p,
-					partial->p+1, (chain+n-1) - partial);
-		}
-	}
-	/* Clear the ends of indirect blocks on the shared branch */
-	while (partial > chain) {
-		ext3_free_branches(handle, inode, partial->bh, partial->p + 1,
-				   (__le32*)partial->bh->b_data+addr_per_block,
-				   (chain+n-1) - partial);
-		BUFFER_TRACE(partial->bh, "call brelse");
-		brelse (partial->bh);
-		partial--;
-	}
-do_indirects:
-	/* Kill the remaining (whole) subtrees */
-	switch (offsets[0]) {
-	default:
-		nr = i_data[EXT3_IND_BLOCK];
-		if (nr) {
-			ext3_free_branches(handle, inode, NULL, &nr, &nr+1, 1);
-			i_data[EXT3_IND_BLOCK] = 0;
-		}
-	case EXT3_IND_BLOCK:
-		nr = i_data[EXT3_DIND_BLOCK];
-		if (nr) {
-			ext3_free_branches(handle, inode, NULL, &nr, &nr+1, 2);
-			i_data[EXT3_DIND_BLOCK] = 0;
-		}
-	case EXT3_DIND_BLOCK:
-		nr = i_data[EXT3_TIND_BLOCK];
-		if (nr) {
-			ext3_free_branches(handle, inode, NULL, &nr, &nr+1, 3);
-			i_data[EXT3_TIND_BLOCK] = 0;
-		}
-	case EXT3_TIND_BLOCK:
-		;
-	}
-
-	ext3_discard_reservation(inode);
-
-	mutex_unlock(&ei->truncate_mutex);
-	inode->i_mtime = inode->i_ctime = CURRENT_TIME_SEC;
-	ext3_mark_inode_dirty(handle, inode);
-
-	/*
-	 * In a multi-transaction truncate, we only make the final transaction
-	 * synchronous
-	 */
-	if (IS_SYNC(inode))
-		handle->h_sync = 1;
-out_stop:
-	/*
-	 * If this was a simple ftruncate(), and the file will remain alive
-	 * then we need to clear up the orphan record which we created above.
-	 * However, if this was a real unlink then we were called by
-	 * ext3_evict_inode(), and we allow that function to clean up the
-	 * orphan info for us.
-	 */
-	if (inode->i_nlink)
-		ext3_orphan_del(handle, inode);
-
-	ext3_journal_stop(handle);
-	trace_ext3_truncate_exit(inode);
-	return;
-out_notrans:
-	/*
-	 * Delete the inode from orphan list so that it doesn't stay there
-	 * forever and trigger assertion on umount.
-	 */
-	if (inode->i_nlink)
-		ext3_orphan_del(NULL, inode);
-	trace_ext3_truncate_exit(inode);
-}
-
-static ext3_fsblk_t ext3_get_inode_block(struct super_block *sb,
-		unsigned long ino, struct ext3_iloc *iloc)
-{
-	unsigned long block_group;
-	unsigned long offset;
-	ext3_fsblk_t block;
-	struct ext3_group_desc *gdp;
-
-	if (!ext3_valid_inum(sb, ino)) {
-		/*
-		 * This error is already checked for in namei.c unless we are
-		 * looking at an NFS filehandle, in which case no error
-		 * report is needed
-		 */
-		return 0;
-	}
-
-	block_group = (ino - 1) / EXT3_INODES_PER_GROUP(sb);
-	gdp = ext3_get_group_desc(sb, block_group, NULL);
-	if (!gdp)
-		return 0;
-	/*
-	 * Figure out the offset within the block group inode table
-	 */
-	offset = ((ino - 1) % EXT3_INODES_PER_GROUP(sb)) *
-		EXT3_INODE_SIZE(sb);
-	block = le32_to_cpu(gdp->bg_inode_table) +
-		(offset >> EXT3_BLOCK_SIZE_BITS(sb));
-
-	iloc->block_group = block_group;
-	iloc->offset = offset & (EXT3_BLOCK_SIZE(sb) - 1);
-	return block;
-}
-
-/*
- * ext3_get_inode_loc returns with an extra refcount against the inode's
- * underlying buffer_head on success. If 'in_mem' is true, we have all
- * data in memory that is needed to recreate the on-disk version of this
- * inode.
- */
-static int __ext3_get_inode_loc(struct inode *inode,
-				struct ext3_iloc *iloc, int in_mem)
-{
-	ext3_fsblk_t block;
-	struct buffer_head *bh;
-
-	block = ext3_get_inode_block(inode->i_sb, inode->i_ino, iloc);
-	if (!block)
-		return -EIO;
-
-	bh = sb_getblk(inode->i_sb, block);
-	if (unlikely(!bh)) {
-		ext3_error (inode->i_sb, "ext3_get_inode_loc",
-				"unable to read inode block - "
-				"inode=%lu, block="E3FSBLK,
-				 inode->i_ino, block);
-		return -ENOMEM;
-	}
-	if (!buffer_uptodate(bh)) {
-		lock_buffer(bh);
-
-		/*
-		 * If the buffer has the write error flag, we have failed
-		 * to write out another inode in the same block.  In this
-		 * case, we don't have to read the block because we may
-		 * read the old inode data successfully.
-		 */
-		if (buffer_write_io_error(bh) && !buffer_uptodate(bh))
-			set_buffer_uptodate(bh);
-
-		if (buffer_uptodate(bh)) {
-			/* someone brought it uptodate while we waited */
-			unlock_buffer(bh);
-			goto has_buffer;
-		}
-
-		/*
-		 * If we have all information of the inode in memory and this
-		 * is the only valid inode in the block, we need not read the
-		 * block.
-		 */
-		if (in_mem) {
-			struct buffer_head *bitmap_bh;
-			struct ext3_group_desc *desc;
-			int inodes_per_buffer;
-			int inode_offset, i;
-			int block_group;
-			int start;
-
-			block_group = (inode->i_ino - 1) /
-					EXT3_INODES_PER_GROUP(inode->i_sb);
-			inodes_per_buffer = bh->b_size /
-				EXT3_INODE_SIZE(inode->i_sb);
-			inode_offset = ((inode->i_ino - 1) %
-					EXT3_INODES_PER_GROUP(inode->i_sb));
-			start = inode_offset & ~(inodes_per_buffer - 1);
-
-			/* Is the inode bitmap in cache? */
-			desc = ext3_get_group_desc(inode->i_sb,
-						block_group, NULL);
-			if (!desc)
-				goto make_io;
-
-			bitmap_bh = sb_getblk(inode->i_sb,
-					le32_to_cpu(desc->bg_inode_bitmap));
-			if (unlikely(!bitmap_bh))
-				goto make_io;
-
-			/*
-			 * If the inode bitmap isn't in cache then the
-			 * optimisation may end up performing two reads instead
-			 * of one, so skip it.
-			 */
-			if (!buffer_uptodate(bitmap_bh)) {
-				brelse(bitmap_bh);
-				goto make_io;
-			}
-			for (i = start; i < start + inodes_per_buffer; i++) {
-				if (i == inode_offset)
-					continue;
-				if (ext3_test_bit(i, bitmap_bh->b_data))
-					break;
-			}
-			brelse(bitmap_bh);
-			if (i == start + inodes_per_buffer) {
-				/* all other inodes are free, so skip I/O */
-				memset(bh->b_data, 0, bh->b_size);
-				set_buffer_uptodate(bh);
-				unlock_buffer(bh);
-				goto has_buffer;
-			}
-		}
-
-make_io:
-		/*
-		 * There are other valid inodes in the buffer, this inode
-		 * has in-inode xattrs, or we don't have this inode in memory.
-		 * Read the block from disk.
-		 */
-		trace_ext3_load_inode(inode);
-		get_bh(bh);
-		bh->b_end_io = end_buffer_read_sync;
-		submit_bh(READ | REQ_META | REQ_PRIO, bh);
-		wait_on_buffer(bh);
-		if (!buffer_uptodate(bh)) {
-			ext3_error(inode->i_sb, "ext3_get_inode_loc",
-					"unable to read inode block - "
-					"inode=%lu, block="E3FSBLK,
-					inode->i_ino, block);
-			brelse(bh);
-			return -EIO;
-		}
-	}
-has_buffer:
-	iloc->bh = bh;
-	return 0;
-}
-
-int ext3_get_inode_loc(struct inode *inode, struct ext3_iloc *iloc)
-{
-	/* We have all inode data except xattrs in memory here. */
-	return __ext3_get_inode_loc(inode, iloc,
-		!ext3_test_inode_state(inode, EXT3_STATE_XATTR));
-}
-
-void ext3_set_inode_flags(struct inode *inode)
-{
-	unsigned int flags = EXT3_I(inode)->i_flags;
-
-	inode->i_flags &= ~(S_SYNC|S_APPEND|S_IMMUTABLE|S_NOATIME|S_DIRSYNC);
-	if (flags & EXT3_SYNC_FL)
-		inode->i_flags |= S_SYNC;
-	if (flags & EXT3_APPEND_FL)
-		inode->i_flags |= S_APPEND;
-	if (flags & EXT3_IMMUTABLE_FL)
-		inode->i_flags |= S_IMMUTABLE;
-	if (flags & EXT3_NOATIME_FL)
-		inode->i_flags |= S_NOATIME;
-	if (flags & EXT3_DIRSYNC_FL)
-		inode->i_flags |= S_DIRSYNC;
-}
-
-/* Propagate flags from i_flags to EXT3_I(inode)->i_flags */
-void ext3_get_inode_flags(struct ext3_inode_info *ei)
-{
-	unsigned int flags = ei->vfs_inode.i_flags;
-
-	ei->i_flags &= ~(EXT3_SYNC_FL|EXT3_APPEND_FL|
-			EXT3_IMMUTABLE_FL|EXT3_NOATIME_FL|EXT3_DIRSYNC_FL);
-	if (flags & S_SYNC)
-		ei->i_flags |= EXT3_SYNC_FL;
-	if (flags & S_APPEND)
-		ei->i_flags |= EXT3_APPEND_FL;
-	if (flags & S_IMMUTABLE)
-		ei->i_flags |= EXT3_IMMUTABLE_FL;
-	if (flags & S_NOATIME)
-		ei->i_flags |= EXT3_NOATIME_FL;
-	if (flags & S_DIRSYNC)
-		ei->i_flags |= EXT3_DIRSYNC_FL;
-}
-
-struct inode *ext3_iget(struct super_block *sb, unsigned long ino)
-{
-	struct ext3_iloc iloc;
-	struct ext3_inode *raw_inode;
-	struct ext3_inode_info *ei;
-	struct buffer_head *bh;
-	struct inode *inode;
-	journal_t *journal = EXT3_SB(sb)->s_journal;
-	transaction_t *transaction;
-	long ret;
-	int block;
-	uid_t i_uid;
-	gid_t i_gid;
-
-	inode = iget_locked(sb, ino);
-	if (!inode)
-		return ERR_PTR(-ENOMEM);
-	if (!(inode->i_state & I_NEW))
-		return inode;
-
-	ei = EXT3_I(inode);
-	ei->i_block_alloc_info = NULL;
-
-	ret = __ext3_get_inode_loc(inode, &iloc, 0);
-	if (ret < 0)
-		goto bad_inode;
-	bh = iloc.bh;
-	raw_inode = ext3_raw_inode(&iloc);
-	inode->i_mode = le16_to_cpu(raw_inode->i_mode);
-	i_uid = (uid_t)le16_to_cpu(raw_inode->i_uid_low);
-	i_gid = (gid_t)le16_to_cpu(raw_inode->i_gid_low);
-	if(!(test_opt (inode->i_sb, NO_UID32))) {
-		i_uid |= le16_to_cpu(raw_inode->i_uid_high) << 16;
-		i_gid |= le16_to_cpu(raw_inode->i_gid_high) << 16;
-	}
-	i_uid_write(inode, i_uid);
-	i_gid_write(inode, i_gid);
-	set_nlink(inode, le16_to_cpu(raw_inode->i_links_count));
-	inode->i_size = le32_to_cpu(raw_inode->i_size);
-	inode->i_atime.tv_sec = (signed)le32_to_cpu(raw_inode->i_atime);
-	inode->i_ctime.tv_sec = (signed)le32_to_cpu(raw_inode->i_ctime);
-	inode->i_mtime.tv_sec = (signed)le32_to_cpu(raw_inode->i_mtime);
-	inode->i_atime.tv_nsec = inode->i_ctime.tv_nsec = inode->i_mtime.tv_nsec = 0;
-
-	ei->i_state_flags = 0;
-	ei->i_dir_start_lookup = 0;
-	ei->i_dtime = le32_to_cpu(raw_inode->i_dtime);
-	/* We now have enough fields to check if the inode was active or not.
-	 * This is needed because nfsd might try to access dead inodes
-	 * the test is that same one that e2fsck uses
-	 * NeilBrown 1999oct15
-	 */
-	if (inode->i_nlink == 0) {
-		if (inode->i_mode == 0 ||
-		    !(EXT3_SB(inode->i_sb)->s_mount_state & EXT3_ORPHAN_FS)) {
-			/* this inode is deleted */
-			brelse (bh);
-			ret = -ESTALE;
-			goto bad_inode;
-		}
-		/* The only unlinked inodes we let through here have
-		 * valid i_mode and are being read by the orphan
-		 * recovery code: that's fine, we're about to complete
-		 * the process of deleting those. */
-	}
-	inode->i_blocks = le32_to_cpu(raw_inode->i_blocks);
-	ei->i_flags = le32_to_cpu(raw_inode->i_flags);
-#ifdef EXT3_FRAGMENTS
-	ei->i_faddr = le32_to_cpu(raw_inode->i_faddr);
-	ei->i_frag_no = raw_inode->i_frag;
-	ei->i_frag_size = raw_inode->i_fsize;
-#endif
-	ei->i_file_acl = le32_to_cpu(raw_inode->i_file_acl);
-	if (!S_ISREG(inode->i_mode)) {
-		ei->i_dir_acl = le32_to_cpu(raw_inode->i_dir_acl);
-	} else {
-		inode->i_size |=
-			((__u64)le32_to_cpu(raw_inode->i_size_high)) << 32;
-	}
-	ei->i_disksize = inode->i_size;
-	inode->i_generation = le32_to_cpu(raw_inode->i_generation);
-	ei->i_block_group = iloc.block_group;
-	/*
-	 * NOTE! The in-memory inode i_data array is in little-endian order
-	 * even on big-endian machines: we do NOT byteswap the block numbers!
-	 */
-	for (block = 0; block < EXT3_N_BLOCKS; block++)
-		ei->i_data[block] = raw_inode->i_block[block];
-	INIT_LIST_HEAD(&ei->i_orphan);
-
-	/*
-	 * Set transaction id's of transactions that have to be committed
-	 * to finish f[data]sync. We set them to currently running transaction
-	 * as we cannot be sure that the inode or some of its metadata isn't
-	 * part of the transaction - the inode could have been reclaimed and
-	 * now it is reread from disk.
-	 */
-	if (journal) {
-		tid_t tid;
-
-		spin_lock(&journal->j_state_lock);
-		if (journal->j_running_transaction)
-			transaction = journal->j_running_transaction;
-		else
-			transaction = journal->j_committing_transaction;
-		if (transaction)
-			tid = transaction->t_tid;
-		else
-			tid = journal->j_commit_sequence;
-		spin_unlock(&journal->j_state_lock);
-		atomic_set(&ei->i_sync_tid, tid);
-		atomic_set(&ei->i_datasync_tid, tid);
-	}
-
-	if (inode->i_ino >= EXT3_FIRST_INO(inode->i_sb) + 1 &&
-	    EXT3_INODE_SIZE(inode->i_sb) > EXT3_GOOD_OLD_INODE_SIZE) {
-		/*
-		 * When mke2fs creates big inodes it does not zero out
-		 * the unused bytes above EXT3_GOOD_OLD_INODE_SIZE,
-		 * so ignore those first few inodes.
-		 */
-		ei->i_extra_isize = le16_to_cpu(raw_inode->i_extra_isize);
-		if (EXT3_GOOD_OLD_INODE_SIZE + ei->i_extra_isize >
-		    EXT3_INODE_SIZE(inode->i_sb)) {
-			brelse (bh);
-			ret = -EIO;
-			goto bad_inode;
-		}
-		if (ei->i_extra_isize == 0) {
-			/* The extra space is currently unused. Use it. */
-			ei->i_extra_isize = sizeof(struct ext3_inode) -
-					    EXT3_GOOD_OLD_INODE_SIZE;
-		} else {
-			__le32 *magic = (void *)raw_inode +
-					EXT3_GOOD_OLD_INODE_SIZE +
-					ei->i_extra_isize;
-			if (*magic == cpu_to_le32(EXT3_XATTR_MAGIC))
-				 ext3_set_inode_state(inode, EXT3_STATE_XATTR);
-		}
-	} else
-		ei->i_extra_isize = 0;
-
-	if (S_ISREG(inode->i_mode)) {
-		inode->i_op = &ext3_file_inode_operations;
-		inode->i_fop = &ext3_file_operations;
-		ext3_set_aops(inode);
-	} else if (S_ISDIR(inode->i_mode)) {
-		inode->i_op = &ext3_dir_inode_operations;
-		inode->i_fop = &ext3_dir_operations;
-	} else if (S_ISLNK(inode->i_mode)) {
-		if (ext3_inode_is_fast_symlink(inode)) {
-			inode->i_op = &ext3_fast_symlink_inode_operations;
-			nd_terminate_link(ei->i_data, inode->i_size,
-				sizeof(ei->i_data) - 1);
-			inode->i_link = (char *)ei->i_data;
-		} else {
-			inode->i_op = &ext3_symlink_inode_operations;
-			ext3_set_aops(inode);
-		}
-	} else {
-		inode->i_op = &ext3_special_inode_operations;
-		if (raw_inode->i_block[0])
-			init_special_inode(inode, inode->i_mode,
-			   old_decode_dev(le32_to_cpu(raw_inode->i_block[0])));
-		else
-			init_special_inode(inode, inode->i_mode,
-			   new_decode_dev(le32_to_cpu(raw_inode->i_block[1])));
-	}
-	brelse (iloc.bh);
-	ext3_set_inode_flags(inode);
-	unlock_new_inode(inode);
-	return inode;
-
-bad_inode:
-	iget_failed(inode);
-	return ERR_PTR(ret);
-}
-
-/*
- * Post the struct inode info into an on-disk inode location in the
- * buffer-cache.  This gobbles the caller's reference to the
- * buffer_head in the inode location struct.
- *
- * The caller must have write access to iloc->bh.
- */
-static int ext3_do_update_inode(handle_t *handle,
-				struct inode *inode,
-				struct ext3_iloc *iloc)
-{
-	struct ext3_inode *raw_inode = ext3_raw_inode(iloc);
-	struct ext3_inode_info *ei = EXT3_I(inode);
-	struct buffer_head *bh = iloc->bh;
-	int err = 0, rc, block;
-	int need_datasync = 0;
-	__le32 disksize;
-	uid_t i_uid;
-	gid_t i_gid;
-
-again:
-	/* we can't allow multiple procs in here at once, its a bit racey */
-	lock_buffer(bh);
-
-	/* For fields not not tracking in the in-memory inode,
-	 * initialise them to zero for new inodes. */
-	if (ext3_test_inode_state(inode, EXT3_STATE_NEW))
-		memset(raw_inode, 0, EXT3_SB(inode->i_sb)->s_inode_size);
-
-	ext3_get_inode_flags(ei);
-	raw_inode->i_mode = cpu_to_le16(inode->i_mode);
-	i_uid = i_uid_read(inode);
-	i_gid = i_gid_read(inode);
-	if(!(test_opt(inode->i_sb, NO_UID32))) {
-		raw_inode->i_uid_low = cpu_to_le16(low_16_bits(i_uid));
-		raw_inode->i_gid_low = cpu_to_le16(low_16_bits(i_gid));
-/*
- * Fix up interoperability with old kernels. Otherwise, old inodes get
- * re-used with the upper 16 bits of the uid/gid intact
- */
-		if(!ei->i_dtime) {
-			raw_inode->i_uid_high =
-				cpu_to_le16(high_16_bits(i_uid));
-			raw_inode->i_gid_high =
-				cpu_to_le16(high_16_bits(i_gid));
-		} else {
-			raw_inode->i_uid_high = 0;
-			raw_inode->i_gid_high = 0;
-		}
-	} else {
-		raw_inode->i_uid_low =
-			cpu_to_le16(fs_high2lowuid(i_uid));
-		raw_inode->i_gid_low =
-			cpu_to_le16(fs_high2lowgid(i_gid));
-		raw_inode->i_uid_high = 0;
-		raw_inode->i_gid_high = 0;
-	}
-	raw_inode->i_links_count = cpu_to_le16(inode->i_nlink);
-	disksize = cpu_to_le32(ei->i_disksize);
-	if (disksize != raw_inode->i_size) {
-		need_datasync = 1;
-		raw_inode->i_size = disksize;
-	}
-	raw_inode->i_atime = cpu_to_le32(inode->i_atime.tv_sec);
-	raw_inode->i_ctime = cpu_to_le32(inode->i_ctime.tv_sec);
-	raw_inode->i_mtime = cpu_to_le32(inode->i_mtime.tv_sec);
-	raw_inode->i_blocks = cpu_to_le32(inode->i_blocks);
-	raw_inode->i_dtime = cpu_to_le32(ei->i_dtime);
-	raw_inode->i_flags = cpu_to_le32(ei->i_flags);
-#ifdef EXT3_FRAGMENTS
-	raw_inode->i_faddr = cpu_to_le32(ei->i_faddr);
-	raw_inode->i_frag = ei->i_frag_no;
-	raw_inode->i_fsize = ei->i_frag_size;
-#endif
-	raw_inode->i_file_acl = cpu_to_le32(ei->i_file_acl);
-	if (!S_ISREG(inode->i_mode)) {
-		raw_inode->i_dir_acl = cpu_to_le32(ei->i_dir_acl);
-	} else {
-		disksize = cpu_to_le32(ei->i_disksize >> 32);
-		if (disksize != raw_inode->i_size_high) {
-			raw_inode->i_size_high = disksize;
-			need_datasync = 1;
-		}
-		if (ei->i_disksize > 0x7fffffffULL) {
-			struct super_block *sb = inode->i_sb;
-			if (!EXT3_HAS_RO_COMPAT_FEATURE(sb,
-					EXT3_FEATURE_RO_COMPAT_LARGE_FILE) ||
-			    EXT3_SB(sb)->s_es->s_rev_level ==
-					cpu_to_le32(EXT3_GOOD_OLD_REV)) {
-			       /* If this is the first large file
-				* created, add a flag to the superblock.
-				*/
-				unlock_buffer(bh);
-				err = ext3_journal_get_write_access(handle,
-						EXT3_SB(sb)->s_sbh);
-				if (err)
-					goto out_brelse;
-
-				ext3_update_dynamic_rev(sb);
-				EXT3_SET_RO_COMPAT_FEATURE(sb,
-					EXT3_FEATURE_RO_COMPAT_LARGE_FILE);
-				handle->h_sync = 1;
-				err = ext3_journal_dirty_metadata(handle,
-						EXT3_SB(sb)->s_sbh);
-				/* get our lock and start over */
-				goto again;
-			}
-		}
-	}
-	raw_inode->i_generation = cpu_to_le32(inode->i_generation);
-	if (S_ISCHR(inode->i_mode) || S_ISBLK(inode->i_mode)) {
-		if (old_valid_dev(inode->i_rdev)) {
-			raw_inode->i_block[0] =
-				cpu_to_le32(old_encode_dev(inode->i_rdev));
-			raw_inode->i_block[1] = 0;
-		} else {
-			raw_inode->i_block[0] = 0;
-			raw_inode->i_block[1] =
-				cpu_to_le32(new_encode_dev(inode->i_rdev));
-			raw_inode->i_block[2] = 0;
-		}
-	} else for (block = 0; block < EXT3_N_BLOCKS; block++)
-		raw_inode->i_block[block] = ei->i_data[block];
-
-	if (ei->i_extra_isize)
-		raw_inode->i_extra_isize = cpu_to_le16(ei->i_extra_isize);
-
-	BUFFER_TRACE(bh, "call ext3_journal_dirty_metadata");
-	unlock_buffer(bh);
-	rc = ext3_journal_dirty_metadata(handle, bh);
-	if (!err)
-		err = rc;
-	ext3_clear_inode_state(inode, EXT3_STATE_NEW);
-
-	atomic_set(&ei->i_sync_tid, handle->h_transaction->t_tid);
-	if (need_datasync)
-		atomic_set(&ei->i_datasync_tid, handle->h_transaction->t_tid);
-out_brelse:
-	brelse (bh);
-	ext3_std_error(inode->i_sb, err);
-	return err;
-}
-
-/*
- * ext3_write_inode()
- *
- * We are called from a few places:
- *
- * - Within generic_file_aio_write() -> generic_write_sync() for O_SYNC files.
- *   Here, there will be no transaction running. We wait for any running
- *   transaction to commit.
- *
- * - Within flush work (for sys_sync(), kupdate and such).
- *   We wait on commit, if told to.
- *
- * - Within iput_final() -> write_inode_now()
- *   We wait on commit, if told to.
- *
- * In all cases it is actually safe for us to return without doing anything,
- * because the inode has been copied into a raw inode buffer in
- * ext3_mark_inode_dirty().  This is a correctness thing for WB_SYNC_ALL
- * writeback.
- *
- * Note that we are absolutely dependent upon all inode dirtiers doing the
- * right thing: they *must* call mark_inode_dirty() after dirtying info in
- * which we are interested.
- *
- * It would be a bug for them to not do this.  The code:
- *
- *	mark_inode_dirty(inode)
- *	stuff();
- *	inode->i_size = expr;
- *
- * is in error because write_inode() could occur while `stuff()' is running,
- * and the new i_size will be lost.  Plus the inode will no longer be on the
- * superblock's dirty inode list.
- */
-int ext3_write_inode(struct inode *inode, struct writeback_control *wbc)
-{
-	if (WARN_ON_ONCE(current->flags & PF_MEMALLOC))
-		return 0;
-
-	if (ext3_journal_current_handle()) {
-		jbd_debug(1, "called recursively, non-PF_MEMALLOC!\n");
-		dump_stack();
-		return -EIO;
-	}
-
-	/*
-	 * No need to force transaction in WB_SYNC_NONE mode. Also
-	 * ext3_sync_fs() will force the commit after everything is
-	 * written.
-	 */
-	if (wbc->sync_mode != WB_SYNC_ALL || wbc->for_sync)
-		return 0;
-
-	return ext3_force_commit(inode->i_sb);
-}
-
-/*
- * ext3_setattr()
- *
- * Called from notify_change.
- *
- * We want to trap VFS attempts to truncate the file as soon as
- * possible.  In particular, we want to make sure that when the VFS
- * shrinks i_size, we put the inode on the orphan list and modify
- * i_disksize immediately, so that during the subsequent flushing of
- * dirty pages and freeing of disk blocks, we can guarantee that any
- * commit will leave the blocks being flushed in an unused state on
- * disk.  (On recovery, the inode will get truncated and the blocks will
- * be freed, so we have a strong guarantee that no future commit will
- * leave these blocks visible to the user.)
- *
- * Called with inode->sem down.
- */
-int ext3_setattr(struct dentry *dentry, struct iattr *attr)
-{
-	struct inode *inode = d_inode(dentry);
-	int error, rc = 0;
-	const unsigned int ia_valid = attr->ia_valid;
-
-	error = inode_change_ok(inode, attr);
-	if (error)
-		return error;
-
-	if (is_quota_modification(inode, attr))
-		dquot_initialize(inode);
-	if ((ia_valid & ATTR_UID && !uid_eq(attr->ia_uid, inode->i_uid)) ||
-	    (ia_valid & ATTR_GID && !gid_eq(attr->ia_gid, inode->i_gid))) {
-		handle_t *handle;
-
-		/* (user+group)*(old+new) structure, inode write (sb,
-		 * inode block, ? - but truncate inode update has it) */
-		handle = ext3_journal_start(inode, EXT3_MAXQUOTAS_INIT_BLOCKS(inode->i_sb)+
-					EXT3_MAXQUOTAS_DEL_BLOCKS(inode->i_sb)+3);
-		if (IS_ERR(handle)) {
-			error = PTR_ERR(handle);
-			goto err_out;
-		}
-		error = dquot_transfer(inode, attr);
-		if (error) {
-			ext3_journal_stop(handle);
-			return error;
-		}
-		/* Update corresponding info in inode so that everything is in
-		 * one transaction */
-		if (attr->ia_valid & ATTR_UID)
-			inode->i_uid = attr->ia_uid;
-		if (attr->ia_valid & ATTR_GID)
-			inode->i_gid = attr->ia_gid;
-		error = ext3_mark_inode_dirty(handle, inode);
-		ext3_journal_stop(handle);
-	}
-
-	if (attr->ia_valid & ATTR_SIZE)
-		inode_dio_wait(inode);
-
-	if (S_ISREG(inode->i_mode) &&
-	    attr->ia_valid & ATTR_SIZE && attr->ia_size < inode->i_size) {
-		handle_t *handle;
-
-		handle = ext3_journal_start(inode, 3);
-		if (IS_ERR(handle)) {
-			error = PTR_ERR(handle);
-			goto err_out;
-		}
-
-		error = ext3_orphan_add(handle, inode);
-		if (error) {
-			ext3_journal_stop(handle);
-			goto err_out;
-		}
-		EXT3_I(inode)->i_disksize = attr->ia_size;
-		error = ext3_mark_inode_dirty(handle, inode);
-		ext3_journal_stop(handle);
-		if (error) {
-			/* Some hard fs error must have happened. Bail out. */
-			ext3_orphan_del(NULL, inode);
-			goto err_out;
-		}
-		rc = ext3_block_truncate_page(inode, attr->ia_size);
-		if (rc) {
-			/* Cleanup orphan list and exit */
-			handle = ext3_journal_start(inode, 3);
-			if (IS_ERR(handle)) {
-				ext3_orphan_del(NULL, inode);
-				goto err_out;
-			}
-			ext3_orphan_del(handle, inode);
-			ext3_journal_stop(handle);
-			goto err_out;
-		}
-	}
-
-	if ((attr->ia_valid & ATTR_SIZE) &&
-	    attr->ia_size != i_size_read(inode)) {
-		truncate_setsize(inode, attr->ia_size);
-		ext3_truncate(inode);
-	}
-
-	setattr_copy(inode, attr);
-	mark_inode_dirty(inode);
-
-	if (ia_valid & ATTR_MODE)
-		rc = posix_acl_chmod(inode, inode->i_mode);
-
-err_out:
-	ext3_std_error(inode->i_sb, error);
-	if (!error)
-		error = rc;
-	return error;
-}
-
-
-/*
- * How many blocks doth make a writepage()?
- *
- * With N blocks per page, it may be:
- * N data blocks
- * 2 indirect block
- * 2 dindirect
- * 1 tindirect
- * N+5 bitmap blocks (from the above)
- * N+5 group descriptor summary blocks
- * 1 inode block
- * 1 superblock.
- * 2 * EXT3_SINGLEDATA_TRANS_BLOCKS for the quote files
- *
- * 3 * (N + 5) + 2 + 2 * EXT3_SINGLEDATA_TRANS_BLOCKS
- *
- * With ordered or writeback data it's the same, less the N data blocks.
- *
- * If the inode's direct blocks can hold an integral number of pages then a
- * page cannot straddle two indirect blocks, and we can only touch one indirect
- * and dindirect block, and the "5" above becomes "3".
- *
- * This still overestimates under most circumstances.  If we were to pass the
- * start and end offsets in here as well we could do block_to_path() on each
- * block and work out the exact number of indirects which are touched.  Pah.
- */
-
-static int ext3_writepage_trans_blocks(struct inode *inode)
-{
-	int bpp = ext3_journal_blocks_per_page(inode);
-	int indirects = (EXT3_NDIR_BLOCKS % bpp) ? 5 : 3;
-	int ret;
-
-	if (ext3_should_journal_data(inode))
-		ret = 3 * (bpp + indirects) + 2;
-	else
-		ret = 2 * (bpp + indirects) + indirects + 2;
-
-#ifdef CONFIG_QUOTA
-	/* We know that structure was already allocated during dquot_initialize so
-	 * we will be updating only the data blocks + inodes */
-	ret += EXT3_MAXQUOTAS_TRANS_BLOCKS(inode->i_sb);
-#endif
-
-	return ret;
-}
-
-/*
- * The caller must have previously called ext3_reserve_inode_write().
- * Give this, we know that the caller already has write access to iloc->bh.
- */
-int ext3_mark_iloc_dirty(handle_t *handle,
-		struct inode *inode, struct ext3_iloc *iloc)
-{
-	int err = 0;
-
-	/* the do_update_inode consumes one bh->b_count */
-	get_bh(iloc->bh);
-
-	/* ext3_do_update_inode() does journal_dirty_metadata */
-	err = ext3_do_update_inode(handle, inode, iloc);
-	put_bh(iloc->bh);
-	return err;
-}
-
-/*
- * On success, We end up with an outstanding reference count against
- * iloc->bh.  This _must_ be cleaned up later.
- */
-
-int
-ext3_reserve_inode_write(handle_t *handle, struct inode *inode,
-			 struct ext3_iloc *iloc)
-{
-	int err = 0;
-	if (handle) {
-		err = ext3_get_inode_loc(inode, iloc);
-		if (!err) {
-			BUFFER_TRACE(iloc->bh, "get_write_access");
-			err = ext3_journal_get_write_access(handle, iloc->bh);
-			if (err) {
-				brelse(iloc->bh);
-				iloc->bh = NULL;
-			}
-		}
-	}
-	ext3_std_error(inode->i_sb, err);
-	return err;
-}
-
-/*
- * What we do here is to mark the in-core inode as clean with respect to inode
- * dirtiness (it may still be data-dirty).
- * This means that the in-core inode may be reaped by prune_icache
- * without having to perform any I/O.  This is a very good thing,
- * because *any* task may call prune_icache - even ones which
- * have a transaction open against a different journal.
- *
- * Is this cheating?  Not really.  Sure, we haven't written the
- * inode out, but prune_icache isn't a user-visible syncing function.
- * Whenever the user wants stuff synced (sys_sync, sys_msync, sys_fsync)
- * we start and wait on commits.
- */
-int ext3_mark_inode_dirty(handle_t *handle, struct inode *inode)
-{
-	struct ext3_iloc iloc;
-	int err;
-
-	might_sleep();
-	trace_ext3_mark_inode_dirty(inode, _RET_IP_);
-	err = ext3_reserve_inode_write(handle, inode, &iloc);
-	if (!err)
-		err = ext3_mark_iloc_dirty(handle, inode, &iloc);
-	return err;
-}
-
-/*
- * ext3_dirty_inode() is called from __mark_inode_dirty()
- *
- * We're really interested in the case where a file is being extended.
- * i_size has been changed by generic_commit_write() and we thus need
- * to include the updated inode in the current transaction.
- *
- * Also, dquot_alloc_space() will always dirty the inode when blocks
- * are allocated to the file.
- *
- * If the inode is marked synchronous, we don't honour that here - doing
- * so would cause a commit on atime updates, which we don't bother doing.
- * We handle synchronous inodes at the highest possible level.
- */
-void ext3_dirty_inode(struct inode *inode, int flags)
-{
-	handle_t *current_handle = ext3_journal_current_handle();
-	handle_t *handle;
-
-	handle = ext3_journal_start(inode, 2);
-	if (IS_ERR(handle))
-		goto out;
-	if (current_handle &&
-		current_handle->h_transaction != handle->h_transaction) {
-		/* This task has a transaction open against a different fs */
-		printk(KERN_EMERG "%s: transactions do not match!\n",
-		       __func__);
-	} else {
-		jbd_debug(5, "marking dirty.  outer handle=%p\n",
-				current_handle);
-		ext3_mark_inode_dirty(handle, inode);
-	}
-	ext3_journal_stop(handle);
-out:
-	return;
-}
-
-#if 0
-/*
- * Bind an inode's backing buffer_head into this transaction, to prevent
- * it from being flushed to disk early.  Unlike
- * ext3_reserve_inode_write, this leaves behind no bh reference and
- * returns no iloc structure, so the caller needs to repeat the iloc
- * lookup to mark the inode dirty later.
- */
-static int ext3_pin_inode(handle_t *handle, struct inode *inode)
-{
-	struct ext3_iloc iloc;
-
-	int err = 0;
-	if (handle) {
-		err = ext3_get_inode_loc(inode, &iloc);
-		if (!err) {
-			BUFFER_TRACE(iloc.bh, "get_write_access");
-			err = journal_get_write_access(handle, iloc.bh);
-			if (!err)
-				err = ext3_journal_dirty_metadata(handle,
-								  iloc.bh);
-			brelse(iloc.bh);
-		}
-	}
-	ext3_std_error(inode->i_sb, err);
-	return err;
-}
-#endif
-
-int ext3_change_inode_journal_flag(struct inode *inode, int val)
-{
-	journal_t *journal;
-	handle_t *handle;
-	int err;
-
-	/*
-	 * We have to be very careful here: changing a data block's
-	 * journaling status dynamically is dangerous.  If we write a
-	 * data block to the journal, change the status and then delete
-	 * that block, we risk forgetting to revoke the old log record
-	 * from the journal and so a subsequent replay can corrupt data.
-	 * So, first we make sure that the journal is empty and that
-	 * nobody is changing anything.
-	 */
-
-	journal = EXT3_JOURNAL(inode);
-	if (is_journal_aborted(journal))
-		return -EROFS;
-
-	journal_lock_updates(journal);
-	journal_flush(journal);
-
-	/*
-	 * OK, there are no updates running now, and all cached data is
-	 * synced to disk.  We are now in a completely consistent state
-	 * which doesn't have anything in the journal, and we know that
-	 * no filesystem updates are running, so it is safe to modify
-	 * the inode's in-core data-journaling state flag now.
-	 */
-
-	if (val)
-		EXT3_I(inode)->i_flags |= EXT3_JOURNAL_DATA_FL;
-	else
-		EXT3_I(inode)->i_flags &= ~EXT3_JOURNAL_DATA_FL;
-	ext3_set_aops(inode);
-
-	journal_unlock_updates(journal);
-
-	/* Finally we can mark the inode as dirty. */
-
-	handle = ext3_journal_start(inode, 1);
-	if (IS_ERR(handle))
-		return PTR_ERR(handle);
-
-	err = ext3_mark_inode_dirty(handle, inode);
-	handle->h_sync = 1;
-	ext3_journal_stop(handle);
-	ext3_std_error(inode->i_sb, err);
-
-	return err;
-}
diff --git a/fs/ext3/ioctl.c b/fs/ext3/ioctl.c
deleted file mode 100644
index 4d96e9a64532..000000000000
--- a/fs/ext3/ioctl.c
+++ /dev/null
@@ -1,327 +0,0 @@
-/*
- * linux/fs/ext3/ioctl.c
- *
- * Copyright (C) 1993, 1994, 1995
- * Remy Card (card@masi.ibp.fr)
- * Laboratoire MASI - Institut Blaise Pascal
- * Universite Pierre et Marie Curie (Paris VI)
- */
-
-#include <linux/mount.h>
-#include <linux/compat.h>
-#include <asm/uaccess.h>
-#include "ext3.h"
-
-long ext3_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
-{
-	struct inode *inode = file_inode(filp);
-	struct ext3_inode_info *ei = EXT3_I(inode);
-	unsigned int flags;
-	unsigned short rsv_window_size;
-
-	ext3_debug ("cmd = %u, arg = %lu\n", cmd, arg);
-
-	switch (cmd) {
-	case EXT3_IOC_GETFLAGS:
-		ext3_get_inode_flags(ei);
-		flags = ei->i_flags & EXT3_FL_USER_VISIBLE;
-		return put_user(flags, (int __user *) arg);
-	case EXT3_IOC_SETFLAGS: {
-		handle_t *handle = NULL;
-		int err;
-		struct ext3_iloc iloc;
-		unsigned int oldflags;
-		unsigned int jflag;
-
-		if (!inode_owner_or_capable(inode))
-			return -EACCES;
-
-		if (get_user(flags, (int __user *) arg))
-			return -EFAULT;
-
-		err = mnt_want_write_file(filp);
-		if (err)
-			return err;
-
-		flags = ext3_mask_flags(inode->i_mode, flags);
-
-		mutex_lock(&inode->i_mutex);
-
-		/* Is it quota file? Do not allow user to mess with it */
-		err = -EPERM;
-		if (IS_NOQUOTA(inode))
-			goto flags_out;
-
-		oldflags = ei->i_flags;
-
-		/* The JOURNAL_DATA flag is modifiable only by root */
-		jflag = flags & EXT3_JOURNAL_DATA_FL;
-
-		/*
-		 * The IMMUTABLE and APPEND_ONLY flags can only be changed by
-		 * the relevant capability.
-		 *
-		 * This test looks nicer. Thanks to Pauline Middelink
-		 */
-		if ((flags ^ oldflags) & (EXT3_APPEND_FL | EXT3_IMMUTABLE_FL)) {
-			if (!capable(CAP_LINUX_IMMUTABLE))
-				goto flags_out;
-		}
-
-		/*
-		 * The JOURNAL_DATA flag can only be changed by
-		 * the relevant capability.
-		 */
-		if ((jflag ^ oldflags) & (EXT3_JOURNAL_DATA_FL)) {
-			if (!capable(CAP_SYS_RESOURCE))
-				goto flags_out;
-		}
-
-		handle = ext3_journal_start(inode, 1);
-		if (IS_ERR(handle)) {
-			err = PTR_ERR(handle);
-			goto flags_out;
-		}
-		if (IS_SYNC(inode))
-			handle->h_sync = 1;
-		err = ext3_reserve_inode_write(handle, inode, &iloc);
-		if (err)
-			goto flags_err;
-
-		flags = flags & EXT3_FL_USER_MODIFIABLE;
-		flags |= oldflags & ~EXT3_FL_USER_MODIFIABLE;
-		ei->i_flags = flags;
-
-		ext3_set_inode_flags(inode);
-		inode->i_ctime = CURRENT_TIME_SEC;
-
-		err = ext3_mark_iloc_dirty(handle, inode, &iloc);
-flags_err:
-		ext3_journal_stop(handle);
-		if (err)
-			goto flags_out;
-
-		if ((jflag ^ oldflags) & (EXT3_JOURNAL_DATA_FL))
-			err = ext3_change_inode_journal_flag(inode, jflag);
-flags_out:
-		mutex_unlock(&inode->i_mutex);
-		mnt_drop_write_file(filp);
-		return err;
-	}
-	case EXT3_IOC_GETVERSION:
-	case EXT3_IOC_GETVERSION_OLD:
-		return put_user(inode->i_generation, (int __user *) arg);
-	case EXT3_IOC_SETVERSION:
-	case EXT3_IOC_SETVERSION_OLD: {
-		handle_t *handle;
-		struct ext3_iloc iloc;
-		__u32 generation;
-		int err;
-
-		if (!inode_owner_or_capable(inode))
-			return -EPERM;
-
-		err = mnt_want_write_file(filp);
-		if (err)
-			return err;
-		if (get_user(generation, (int __user *) arg)) {
-			err = -EFAULT;
-			goto setversion_out;
-		}
-
-		mutex_lock(&inode->i_mutex);
-		handle = ext3_journal_start(inode, 1);
-		if (IS_ERR(handle)) {
-			err = PTR_ERR(handle);
-			goto unlock_out;
-		}
-		err = ext3_reserve_inode_write(handle, inode, &iloc);
-		if (err == 0) {
-			inode->i_ctime = CURRENT_TIME_SEC;
-			inode->i_generation = generation;
-			err = ext3_mark_iloc_dirty(handle, inode, &iloc);
-		}
-		ext3_journal_stop(handle);
-
-unlock_out:
-		mutex_unlock(&inode->i_mutex);
-setversion_out:
-		mnt_drop_write_file(filp);
-		return err;
-	}
-	case EXT3_IOC_GETRSVSZ:
-		if (test_opt(inode->i_sb, RESERVATION)
-			&& S_ISREG(inode->i_mode)
-			&& ei->i_block_alloc_info) {
-			rsv_window_size = ei->i_block_alloc_info->rsv_window_node.rsv_goal_size;
-			return put_user(rsv_window_size, (int __user *)arg);
-		}
-		return -ENOTTY;
-	case EXT3_IOC_SETRSVSZ: {
-		int err;
-
-		if (!test_opt(inode->i_sb, RESERVATION) ||!S_ISREG(inode->i_mode))
-			return -ENOTTY;
-
-		err = mnt_want_write_file(filp);
-		if (err)
-			return err;
-
-		if (!inode_owner_or_capable(inode)) {
-			err = -EACCES;
-			goto setrsvsz_out;
-		}
-
-		if (get_user(rsv_window_size, (int __user *)arg)) {
-			err = -EFAULT;
-			goto setrsvsz_out;
-		}
-
-		if (rsv_window_size > EXT3_MAX_RESERVE_BLOCKS)
-			rsv_window_size = EXT3_MAX_RESERVE_BLOCKS;
-
-		/*
-		 * need to allocate reservation structure for this inode
-		 * before set the window size
-		 */
-		mutex_lock(&ei->truncate_mutex);
-		if (!ei->i_block_alloc_info)
-			ext3_init_block_alloc_info(inode);
-
-		if (ei->i_block_alloc_info){
-			struct ext3_reserve_window_node *rsv = &ei->i_block_alloc_info->rsv_window_node;
-			rsv->rsv_goal_size = rsv_window_size;
-		}
-		mutex_unlock(&ei->truncate_mutex);
-setrsvsz_out:
-		mnt_drop_write_file(filp);
-		return err;
-	}
-	case EXT3_IOC_GROUP_EXTEND: {
-		ext3_fsblk_t n_blocks_count;
-		struct super_block *sb = inode->i_sb;
-		int err, err2;
-
-		if (!capable(CAP_SYS_RESOURCE))
-			return -EPERM;
-
-		err = mnt_want_write_file(filp);
-		if (err)
-			return err;
-
-		if (get_user(n_blocks_count, (__u32 __user *)arg)) {
-			err = -EFAULT;
-			goto group_extend_out;
-		}
-		err = ext3_group_extend(sb, EXT3_SB(sb)->s_es, n_blocks_count);
-		journal_lock_updates(EXT3_SB(sb)->s_journal);
-		err2 = journal_flush(EXT3_SB(sb)->s_journal);
-		journal_unlock_updates(EXT3_SB(sb)->s_journal);
-		if (err == 0)
-			err = err2;
-group_extend_out:
-		mnt_drop_write_file(filp);
-		return err;
-	}
-	case EXT3_IOC_GROUP_ADD: {
-		struct ext3_new_group_data input;
-		struct super_block *sb = inode->i_sb;
-		int err, err2;
-
-		if (!capable(CAP_SYS_RESOURCE))
-			return -EPERM;
-
-		err = mnt_want_write_file(filp);
-		if (err)
-			return err;
-
-		if (copy_from_user(&input, (struct ext3_new_group_input __user *)arg,
-				sizeof(input))) {
-			err = -EFAULT;
-			goto group_add_out;
-		}
-
-		err = ext3_group_add(sb, &input);
-		journal_lock_updates(EXT3_SB(sb)->s_journal);
-		err2 = journal_flush(EXT3_SB(sb)->s_journal);
-		journal_unlock_updates(EXT3_SB(sb)->s_journal);
-		if (err == 0)
-			err = err2;
-group_add_out:
-		mnt_drop_write_file(filp);
-		return err;
-	}
-	case FITRIM: {
-
-		struct super_block *sb = inode->i_sb;
-		struct fstrim_range range;
-		int ret = 0;
-
-		if (!capable(CAP_SYS_ADMIN))
-			return -EPERM;
-
-		if (copy_from_user(&range, (struct fstrim_range __user *)arg,
-				   sizeof(range)))
-			return -EFAULT;
-
-		ret = ext3_trim_fs(sb, &range);
-		if (ret < 0)
-			return ret;
-
-		if (copy_to_user((struct fstrim_range __user *)arg, &range,
-				 sizeof(range)))
-			return -EFAULT;
-
-		return 0;
-	}
-
-	default:
-		return -ENOTTY;
-	}
-}
-
-#ifdef CONFIG_COMPAT
-long ext3_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
-{
-	/* These are just misnamed, they actually get/put from/to user an int */
-	switch (cmd) {
-	case EXT3_IOC32_GETFLAGS:
-		cmd = EXT3_IOC_GETFLAGS;
-		break;
-	case EXT3_IOC32_SETFLAGS:
-		cmd = EXT3_IOC_SETFLAGS;
-		break;
-	case EXT3_IOC32_GETVERSION:
-		cmd = EXT3_IOC_GETVERSION;
-		break;
-	case EXT3_IOC32_SETVERSION:
-		cmd = EXT3_IOC_SETVERSION;
-		break;
-	case EXT3_IOC32_GROUP_EXTEND:
-		cmd = EXT3_IOC_GROUP_EXTEND;
-		break;
-	case EXT3_IOC32_GETVERSION_OLD:
-		cmd = EXT3_IOC_GETVERSION_OLD;
-		break;
-	case EXT3_IOC32_SETVERSION_OLD:
-		cmd = EXT3_IOC_SETVERSION_OLD;
-		break;
-#ifdef CONFIG_JBD_DEBUG
-	case EXT3_IOC32_WAIT_FOR_READONLY:
-		cmd = EXT3_IOC_WAIT_FOR_READONLY;
-		break;
-#endif
-	case EXT3_IOC32_GETRSVSZ:
-		cmd = EXT3_IOC_GETRSVSZ;
-		break;
-	case EXT3_IOC32_SETRSVSZ:
-		cmd = EXT3_IOC_SETRSVSZ;
-		break;
-	case EXT3_IOC_GROUP_ADD:
-		break;
-	default:
-		return -ENOIOCTLCMD;
-	}
-	return ext3_ioctl(file, cmd, (unsigned long) compat_ptr(arg));
-}
-#endif
diff --git a/fs/ext3/namei.c b/fs/ext3/namei.c
deleted file mode 100644
index c9e767cd4b67..000000000000
--- a/fs/ext3/namei.c
+++ /dev/null
@@ -1,2586 +0,0 @@
-/*
- *  linux/fs/ext3/namei.c
- *
- * Copyright (C) 1992, 1993, 1994, 1995
- * Remy Card (card@masi.ibp.fr)
- * Laboratoire MASI - Institut Blaise Pascal
- * Universite Pierre et Marie Curie (Paris VI)
- *
- *  from
- *
- *  linux/fs/minix/namei.c
- *
- *  Copyright (C) 1991, 1992  Linus Torvalds
- *
- *  Big-endian to little-endian byte-swapping/bitmaps by
- *        David S. Miller (davem@caip.rutgers.edu), 1995
- *  Directory entry file type support and forward compatibility hooks
- *	for B-tree directories by Theodore Ts'o (tytso@mit.edu), 1998
- *  Hash Tree Directory indexing (c)
- *	Daniel Phillips, 2001
- *  Hash Tree Directory indexing porting
- *	Christopher Li, 2002
- *  Hash Tree Directory indexing cleanup
- *	Theodore Ts'o, 2002
- */
-
-#include <linux/quotaops.h>
-#include "ext3.h"
-#include "namei.h"
-#include "xattr.h"
-#include "acl.h"
-
-/*
- * define how far ahead to read directories while searching them.
- */
-#define NAMEI_RA_CHUNKS  2
-#define NAMEI_RA_BLOCKS  4
-#define NAMEI_RA_SIZE        (NAMEI_RA_CHUNKS * NAMEI_RA_BLOCKS)
-
-static struct buffer_head *ext3_append(handle_t *handle,
-					struct inode *inode,
-					u32 *block, int *err)
-{
-	struct buffer_head *bh;
-
-	*block = inode->i_size >> inode->i_sb->s_blocksize_bits;
-
-	if ((bh = ext3_dir_bread(handle, inode, *block, 1, err))) {
-		inode->i_size += inode->i_sb->s_blocksize;
-		EXT3_I(inode)->i_disksize = inode->i_size;
-		*err = ext3_journal_get_write_access(handle, bh);
-		if (*err) {
-			brelse(bh);
-			bh = NULL;
-		}
-	}
-	return bh;
-}
-
-#ifndef assert
-#define assert(test) J_ASSERT(test)
-#endif
-
-#ifdef DX_DEBUG
-#define dxtrace(command) command
-#else
-#define dxtrace(command)
-#endif
-
-struct fake_dirent
-{
-	__le32 inode;
-	__le16 rec_len;
-	u8 name_len;
-	u8 file_type;
-};
-
-struct dx_countlimit
-{
-	__le16 limit;
-	__le16 count;
-};
-
-struct dx_entry
-{
-	__le32 hash;
-	__le32 block;
-};
-
-/*
- * dx_root_info is laid out so that if it should somehow get overlaid by a
- * dirent the two low bits of the hash version will be zero.  Therefore, the
- * hash version mod 4 should never be 0.  Sincerely, the paranoia department.
- */
-
-struct dx_root
-{
-	struct fake_dirent dot;
-	char dot_name[4];
-	struct fake_dirent dotdot;
-	char dotdot_name[4];
-	struct dx_root_info
-	{
-		__le32 reserved_zero;
-		u8 hash_version;
-		u8 info_length; /* 8 */
-		u8 indirect_levels;
-		u8 unused_flags;
-	}
-	info;
-	struct dx_entry	entries[0];
-};
-
-struct dx_node
-{
-	struct fake_dirent fake;
-	struct dx_entry	entries[0];
-};
-
-
-struct dx_frame
-{
-	struct buffer_head *bh;
-	struct dx_entry *entries;
-	struct dx_entry *at;
-};
-
-struct dx_map_entry
-{
-	u32 hash;
-	u16 offs;
-	u16 size;
-};
-
-static inline unsigned dx_get_block (struct dx_entry *entry);
-static void dx_set_block (struct dx_entry *entry, unsigned value);
-static inline unsigned dx_get_hash (struct dx_entry *entry);
-static void dx_set_hash (struct dx_entry *entry, unsigned value);
-static unsigned dx_get_count (struct dx_entry *entries);
-static unsigned dx_get_limit (struct dx_entry *entries);
-static void dx_set_count (struct dx_entry *entries, unsigned value);
-static void dx_set_limit (struct dx_entry *entries, unsigned value);
-static unsigned dx_root_limit (struct inode *dir, unsigned infosize);
-static unsigned dx_node_limit (struct inode *dir);
-static struct dx_frame *dx_probe(struct qstr *entry,
-				 struct inode *dir,
-				 struct dx_hash_info *hinfo,
-				 struct dx_frame *frame,
-				 int *err);
-static void dx_release (struct dx_frame *frames);
-static int dx_make_map(struct ext3_dir_entry_2 *de, unsigned blocksize,
-			struct dx_hash_info *hinfo, struct dx_map_entry map[]);
-static void dx_sort_map(struct dx_map_entry *map, unsigned count);
-static struct ext3_dir_entry_2 *dx_move_dirents (char *from, char *to,
-		struct dx_map_entry *offsets, int count);
-static struct ext3_dir_entry_2 *dx_pack_dirents(char *base, unsigned blocksize);
-static void dx_insert_block (struct dx_frame *frame, u32 hash, u32 block);
-static int ext3_htree_next_block(struct inode *dir, __u32 hash,
-				 struct dx_frame *frame,
-				 struct dx_frame *frames,
-				 __u32 *start_hash);
-static struct buffer_head * ext3_dx_find_entry(struct inode *dir,
-			struct qstr *entry, struct ext3_dir_entry_2 **res_dir,
-			int *err);
-static int ext3_dx_add_entry(handle_t *handle, struct dentry *dentry,
-			     struct inode *inode);
-
-/*
- * p is at least 6 bytes before the end of page
- */
-static inline struct ext3_dir_entry_2 *
-ext3_next_entry(struct ext3_dir_entry_2 *p)
-{
-	return (struct ext3_dir_entry_2 *)((char *)p +
-		ext3_rec_len_from_disk(p->rec_len));
-}
-
-/*
- * Future: use high four bits of block for coalesce-on-delete flags
- * Mask them off for now.
- */
-
-static inline unsigned dx_get_block (struct dx_entry *entry)
-{
-	return le32_to_cpu(entry->block) & 0x00ffffff;
-}
-
-static inline void dx_set_block (struct dx_entry *entry, unsigned value)
-{
-	entry->block = cpu_to_le32(value);
-}
-
-static inline unsigned dx_get_hash (struct dx_entry *entry)
-{
-	return le32_to_cpu(entry->hash);
-}
-
-static inline void dx_set_hash (struct dx_entry *entry, unsigned value)
-{
-	entry->hash = cpu_to_le32(value);
-}
-
-static inline unsigned dx_get_count (struct dx_entry *entries)
-{
-	return le16_to_cpu(((struct dx_countlimit *) entries)->count);
-}
-
-static inline unsigned dx_get_limit (struct dx_entry *entries)
-{
-	return le16_to_cpu(((struct dx_countlimit *) entries)->limit);
-}
-
-static inline void dx_set_count (struct dx_entry *entries, unsigned value)
-{
-	((struct dx_countlimit *) entries)->count = cpu_to_le16(value);
-}
-
-static inline void dx_set_limit (struct dx_entry *entries, unsigned value)
-{
-	((struct dx_countlimit *) entries)->limit = cpu_to_le16(value);
-}
-
-static inline unsigned dx_root_limit (struct inode *dir, unsigned infosize)
-{
-	unsigned entry_space = dir->i_sb->s_blocksize - EXT3_DIR_REC_LEN(1) -
-		EXT3_DIR_REC_LEN(2) - infosize;
-	return entry_space / sizeof(struct dx_entry);
-}
-
-static inline unsigned dx_node_limit (struct inode *dir)
-{
-	unsigned entry_space = dir->i_sb->s_blocksize - EXT3_DIR_REC_LEN(0);
-	return entry_space / sizeof(struct dx_entry);
-}
-
-/*
- * Debug
- */
-#ifdef DX_DEBUG
-static void dx_show_index (char * label, struct dx_entry *entries)
-{
-        int i, n = dx_get_count (entries);
-        printk("%s index ", label);
-        for (i = 0; i < n; i++)
-        {
-                printk("%x->%u ", i? dx_get_hash(entries + i): 0, dx_get_block(entries + i));
-        }
-        printk("\n");
-}
-
-struct stats
-{
-	unsigned names;
-	unsigned space;
-	unsigned bcount;
-};
-
-static struct stats dx_show_leaf(struct dx_hash_info *hinfo, struct ext3_dir_entry_2 *de,
-				 int size, int show_names)
-{
-	unsigned names = 0, space = 0;
-	char *base = (char *) de;
-	struct dx_hash_info h = *hinfo;
-
-	printk("names: ");
-	while ((char *) de < base + size)
-	{
-		if (de->inode)
-		{
-			if (show_names)
-			{
-				int len = de->name_len;
-				char *name = de->name;
-				while (len--) printk("%c", *name++);
-				ext3fs_dirhash(de->name, de->name_len, &h);
-				printk(":%x.%u ", h.hash,
-				       (unsigned) ((char *) de - base));
-			}
-			space += EXT3_DIR_REC_LEN(de->name_len);
-			names++;
-		}
-		de = ext3_next_entry(de);
-	}
-	printk("(%i)\n", names);
-	return (struct stats) { names, space, 1 };
-}
-
-struct stats dx_show_entries(struct dx_hash_info *hinfo, struct inode *dir,
-			     struct dx_entry *entries, int levels)
-{
-	unsigned blocksize = dir->i_sb->s_blocksize;
-	unsigned count = dx_get_count (entries), names = 0, space = 0, i;
-	unsigned bcount = 0;
-	struct buffer_head *bh;
-	int err;
-	printk("%i indexed blocks...\n", count);
-	for (i = 0; i < count; i++, entries++)
-	{
-		u32 block = dx_get_block(entries), hash = i? dx_get_hash(entries): 0;
-		u32 range = i < count - 1? (dx_get_hash(entries + 1) - hash): ~hash;
-		struct stats stats;
-		printk("%s%3u:%03u hash %8x/%8x ",levels?"":"   ", i, block, hash, range);
-		if (!(bh = ext3_bread (NULL,dir, block, 0,&err))) continue;
-		stats = levels?
-		   dx_show_entries(hinfo, dir, ((struct dx_node *) bh->b_data)->entries, levels - 1):
-		   dx_show_leaf(hinfo, (struct ext3_dir_entry_2 *) bh->b_data, blocksize, 0);
-		names += stats.names;
-		space += stats.space;
-		bcount += stats.bcount;
-		brelse (bh);
-	}
-	if (bcount)
-		printk("%snames %u, fullness %u (%u%%)\n", levels?"":"   ",
-			names, space/bcount,(space/bcount)*100/blocksize);
-	return (struct stats) { names, space, bcount};
-}
-#endif /* DX_DEBUG */
-
-/*
- * Probe for a directory leaf block to search.
- *
- * dx_probe can return ERR_BAD_DX_DIR, which means there was a format
- * error in the directory index, and the caller should fall back to
- * searching the directory normally.  The callers of dx_probe **MUST**
- * check for this error code, and make sure it never gets reflected
- * back to userspace.
- */
-static struct dx_frame *
-dx_probe(struct qstr *entry, struct inode *dir,
-	 struct dx_hash_info *hinfo, struct dx_frame *frame_in, int *err)
-{
-	unsigned count, indirect;
-	struct dx_entry *at, *entries, *p, *q, *m;
-	struct dx_root *root;
-	struct buffer_head *bh;
-	struct dx_frame *frame = frame_in;
-	u32 hash;
-
-	frame->bh = NULL;
-	if (!(bh = ext3_dir_bread(NULL, dir, 0, 0, err))) {
-		*err = ERR_BAD_DX_DIR;
-		goto fail;
-	}
-	root = (struct dx_root *) bh->b_data;
-	if (root->info.hash_version != DX_HASH_TEA &&
-	    root->info.hash_version != DX_HASH_HALF_MD4 &&
-	    root->info.hash_version != DX_HASH_LEGACY) {
-		ext3_warning(dir->i_sb, __func__,
-			     "Unrecognised inode hash code %d",
-			     root->info.hash_version);
-		brelse(bh);
-		*err = ERR_BAD_DX_DIR;
-		goto fail;
-	}
-	hinfo->hash_version = root->info.hash_version;
-	if (hinfo->hash_version <= DX_HASH_TEA)
-		hinfo->hash_version += EXT3_SB(dir->i_sb)->s_hash_unsigned;
-	hinfo->seed = EXT3_SB(dir->i_sb)->s_hash_seed;
-	if (entry)
-		ext3fs_dirhash(entry->name, entry->len, hinfo);
-	hash = hinfo->hash;
-
-	if (root->info.unused_flags & 1) {
-		ext3_warning(dir->i_sb, __func__,
-			     "Unimplemented inode hash flags: %#06x",
-			     root->info.unused_flags);
-		brelse(bh);
-		*err = ERR_BAD_DX_DIR;
-		goto fail;
-	}
-
-	if ((indirect = root->info.indirect_levels) > 1) {
-		ext3_warning(dir->i_sb, __func__,
-			     "Unimplemented inode hash depth: %#06x",
-			     root->info.indirect_levels);
-		brelse(bh);
-		*err = ERR_BAD_DX_DIR;
-		goto fail;
-	}
-
-	entries = (struct dx_entry *) (((char *)&root->info) +
-				       root->info.info_length);
-
-	if (dx_get_limit(entries) != dx_root_limit(dir,
-						   root->info.info_length)) {
-		ext3_warning(dir->i_sb, __func__,
-			     "dx entry: limit != root limit");
-		brelse(bh);
-		*err = ERR_BAD_DX_DIR;
-		goto fail;
-	}
-
-	dxtrace (printk("Look up %x", hash));
-	while (1)
-	{
-		count = dx_get_count(entries);
-		if (!count || count > dx_get_limit(entries)) {
-			ext3_warning(dir->i_sb, __func__,
-				     "dx entry: no count or count > limit");
-			brelse(bh);
-			*err = ERR_BAD_DX_DIR;
-			goto fail2;
-		}
-
-		p = entries + 1;
-		q = entries + count - 1;
-		while (p <= q)
-		{
-			m = p + (q - p)/2;
-			dxtrace(printk("."));
-			if (dx_get_hash(m) > hash)
-				q = m - 1;
-			else
-				p = m + 1;
-		}
-
-		if (0) // linear search cross check
-		{
-			unsigned n = count - 1;
-			at = entries;
-			while (n--)
-			{
-				dxtrace(printk(","));
-				if (dx_get_hash(++at) > hash)
-				{
-					at--;
-					break;
-				}
-			}
-			assert (at == p - 1);
-		}
-
-		at = p - 1;
-		dxtrace(printk(" %x->%u\n", at == entries? 0: dx_get_hash(at), dx_get_block(at)));
-		frame->bh = bh;
-		frame->entries = entries;
-		frame->at = at;
-		if (!indirect--) return frame;
-		if (!(bh = ext3_dir_bread(NULL, dir, dx_get_block(at), 0, err))) {
-			*err = ERR_BAD_DX_DIR;
-			goto fail2;
-		}
-		at = entries = ((struct dx_node *) bh->b_data)->entries;
-		if (dx_get_limit(entries) != dx_node_limit (dir)) {
-			ext3_warning(dir->i_sb, __func__,
-				     "dx entry: limit != node limit");
-			brelse(bh);
-			*err = ERR_BAD_DX_DIR;
-			goto fail2;
-		}
-		frame++;
-		frame->bh = NULL;
-	}
-fail2:
-	while (frame >= frame_in) {
-		brelse(frame->bh);
-		frame--;
-	}
-fail:
-	if (*err == ERR_BAD_DX_DIR)
-		ext3_warning(dir->i_sb, __func__,
-			     "Corrupt dir inode %ld, running e2fsck is "
-			     "recommended.", dir->i_ino);
-	return NULL;
-}
-
-static void dx_release (struct dx_frame *frames)
-{
-	if (frames[0].bh == NULL)
-		return;
-
-	if (((struct dx_root *) frames[0].bh->b_data)->info.indirect_levels)
-		brelse(frames[1].bh);
-	brelse(frames[0].bh);
-}
-
-/*
- * This function increments the frame pointer to search the next leaf
- * block, and reads in the necessary intervening nodes if the search
- * should be necessary.  Whether or not the search is necessary is
- * controlled by the hash parameter.  If the hash value is even, then
- * the search is only continued if the next block starts with that
- * hash value.  This is used if we are searching for a specific file.
- *
- * If the hash value is HASH_NB_ALWAYS, then always go to the next block.
- *
- * This function returns 1 if the caller should continue to search,
- * or 0 if it should not.  If there is an error reading one of the
- * index blocks, it will a negative error code.
- *
- * If start_hash is non-null, it will be filled in with the starting
- * hash of the next page.
- */
-static int ext3_htree_next_block(struct inode *dir, __u32 hash,
-				 struct dx_frame *frame,
-				 struct dx_frame *frames,
-				 __u32 *start_hash)
-{
-	struct dx_frame *p;
-	struct buffer_head *bh;
-	int err, num_frames = 0;
-	__u32 bhash;
-
-	p = frame;
-	/*
-	 * Find the next leaf page by incrementing the frame pointer.
-	 * If we run out of entries in the interior node, loop around and
-	 * increment pointer in the parent node.  When we break out of
-	 * this loop, num_frames indicates the number of interior
-	 * nodes need to be read.
-	 */
-	while (1) {
-		if (++(p->at) < p->entries + dx_get_count(p->entries))
-			break;
-		if (p == frames)
-			return 0;
-		num_frames++;
-		p--;
-	}
-
-	/*
-	 * If the hash is 1, then continue only if the next page has a
-	 * continuation hash of any value.  This is used for readdir
-	 * handling.  Otherwise, check to see if the hash matches the
-	 * desired contiuation hash.  If it doesn't, return since
-	 * there's no point to read in the successive index pages.
-	 */
-	bhash = dx_get_hash(p->at);
-	if (start_hash)
-		*start_hash = bhash;
-	if ((hash & 1) == 0) {
-		if ((bhash & ~1) != hash)
-			return 0;
-	}
-	/*
-	 * If the hash is HASH_NB_ALWAYS, we always go to the next
-	 * block so no check is necessary
-	 */
-	while (num_frames--) {
-		if (!(bh = ext3_dir_bread(NULL, dir, dx_get_block(p->at),
-					  0, &err)))
-			return err; /* Failure */
-		p++;
-		brelse (p->bh);
-		p->bh = bh;
-		p->at = p->entries = ((struct dx_node *) bh->b_data)->entries;
-	}
-	return 1;
-}
-
-
-/*
- * This function fills a red-black tree with information from a
- * directory block.  It returns the number directory entries loaded
- * into the tree.  If there is an error it is returned in err.
- */
-static int htree_dirblock_to_tree(struct file *dir_file,
-				  struct inode *dir, int block,
-				  struct dx_hash_info *hinfo,
-				  __u32 start_hash, __u32 start_minor_hash)
-{
-	struct buffer_head *bh;
-	struct ext3_dir_entry_2 *de, *top;
-	int err = 0, count = 0;
-
-	dxtrace(printk("In htree dirblock_to_tree: block %d\n", block));
-
-	if (!(bh = ext3_dir_bread(NULL, dir, block, 0, &err)))
-		return err;
-
-	de = (struct ext3_dir_entry_2 *) bh->b_data;
-	top = (struct ext3_dir_entry_2 *) ((char *) de +
-					   dir->i_sb->s_blocksize -
-					   EXT3_DIR_REC_LEN(0));
-	for (; de < top; de = ext3_next_entry(de)) {
-		if (!ext3_check_dir_entry("htree_dirblock_to_tree", dir, de, bh,
-					(block<<EXT3_BLOCK_SIZE_BITS(dir->i_sb))
-						+((char *)de - bh->b_data))) {
-			/* silently ignore the rest of the block */
-			break;
-		}
-		ext3fs_dirhash(de->name, de->name_len, hinfo);
-		if ((hinfo->hash < start_hash) ||
-		    ((hinfo->hash == start_hash) &&
-		     (hinfo->minor_hash < start_minor_hash)))
-			continue;
-		if (de->inode == 0)
-			continue;
-		if ((err = ext3_htree_store_dirent(dir_file,
-				   hinfo->hash, hinfo->minor_hash, de)) != 0) {
-			brelse(bh);
-			return err;
-		}
-		count++;
-	}
-	brelse(bh);
-	return count;
-}
-
-
-/*
- * This function fills a red-black tree with information from a
- * directory.  We start scanning the directory in hash order, starting
- * at start_hash and start_minor_hash.
- *
- * This function returns the number of entries inserted into the tree,
- * or a negative error code.
- */
-int ext3_htree_fill_tree(struct file *dir_file, __u32 start_hash,
-			 __u32 start_minor_hash, __u32 *next_hash)
-{
-	struct dx_hash_info hinfo;
-	struct ext3_dir_entry_2 *de;
-	struct dx_frame frames[2], *frame;
-	struct inode *dir;
-	int block, err;
-	int count = 0;
-	int ret;
-	__u32 hashval;
-
-	dxtrace(printk("In htree_fill_tree, start hash: %x:%x\n", start_hash,
-		       start_minor_hash));
-	dir = file_inode(dir_file);
-	if (!(EXT3_I(dir)->i_flags & EXT3_INDEX_FL)) {
-		hinfo.hash_version = EXT3_SB(dir->i_sb)->s_def_hash_version;
-		if (hinfo.hash_version <= DX_HASH_TEA)
-			hinfo.hash_version +=
-				EXT3_SB(dir->i_sb)->s_hash_unsigned;
-		hinfo.seed = EXT3_SB(dir->i_sb)->s_hash_seed;
-		count = htree_dirblock_to_tree(dir_file, dir, 0, &hinfo,
-					       start_hash, start_minor_hash);
-		*next_hash = ~0;
-		return count;
-	}
-	hinfo.hash = start_hash;
-	hinfo.minor_hash = 0;
-	frame = dx_probe(NULL, file_inode(dir_file), &hinfo, frames, &err);
-	if (!frame)
-		return err;
-
-	/* Add '.' and '..' from the htree header */
-	if (!start_hash && !start_minor_hash) {
-		de = (struct ext3_dir_entry_2 *) frames[0].bh->b_data;
-		if ((err = ext3_htree_store_dirent(dir_file, 0, 0, de)) != 0)
-			goto errout;
-		count++;
-	}
-	if (start_hash < 2 || (start_hash ==2 && start_minor_hash==0)) {
-		de = (struct ext3_dir_entry_2 *) frames[0].bh->b_data;
-		de = ext3_next_entry(de);
-		if ((err = ext3_htree_store_dirent(dir_file, 2, 0, de)) != 0)
-			goto errout;
-		count++;
-	}
-
-	while (1) {
-		block = dx_get_block(frame->at);
-		ret = htree_dirblock_to_tree(dir_file, dir, block, &hinfo,
-					     start_hash, start_minor_hash);
-		if (ret < 0) {
-			err = ret;
-			goto errout;
-		}
-		count += ret;
-		hashval = ~0;
-		ret = ext3_htree_next_block(dir, HASH_NB_ALWAYS,
-					    frame, frames, &hashval);
-		*next_hash = hashval;
-		if (ret < 0) {
-			err = ret;
-			goto errout;
-		}
-		/*
-		 * Stop if:  (a) there are no more entries, or
-		 * (b) we have inserted at least one entry and the
-		 * next hash value is not a continuation
-		 */
-		if ((ret == 0) ||
-		    (count && ((hashval & 1) == 0)))
-			break;
-	}
-	dx_release(frames);
-	dxtrace(printk("Fill tree: returned %d entries, next hash: %x\n",
-		       count, *next_hash));
-	return count;
-errout:
-	dx_release(frames);
-	return (err);
-}
-
-
-/*
- * Directory block splitting, compacting
- */
-
-/*
- * Create map of hash values, offsets, and sizes, stored at end of block.
- * Returns number of entries mapped.
- */
-static int dx_make_map(struct ext3_dir_entry_2 *de, unsigned blocksize,
-		struct dx_hash_info *hinfo, struct dx_map_entry *map_tail)
-{
-	int count = 0;
-	char *base = (char *) de;
-	struct dx_hash_info h = *hinfo;
-
-	while ((char *) de < base + blocksize)
-	{
-		if (de->name_len && de->inode) {
-			ext3fs_dirhash(de->name, de->name_len, &h);
-			map_tail--;
-			map_tail->hash = h.hash;
-			map_tail->offs = (u16) ((char *) de - base);
-			map_tail->size = le16_to_cpu(de->rec_len);
-			count++;
-			cond_resched();
-		}
-		/* XXX: do we need to check rec_len == 0 case? -Chris */
-		de = ext3_next_entry(de);
-	}
-	return count;
-}
-
-/* Sort map by hash value */
-static void dx_sort_map (struct dx_map_entry *map, unsigned count)
-{
-        struct dx_map_entry *p, *q, *top = map + count - 1;
-        int more;
-        /* Combsort until bubble sort doesn't suck */
-        while (count > 2)
-	{
-                count = count*10/13;
-                if (count - 9 < 2) /* 9, 10 -> 11 */
-                        count = 11;
-                for (p = top, q = p - count; q >= map; p--, q--)
-                        if (p->hash < q->hash)
-                                swap(*p, *q);
-        }
-        /* Garden variety bubble sort */
-        do {
-                more = 0;
-                q = top;
-                while (q-- > map)
-		{
-                        if (q[1].hash >= q[0].hash)
-				continue;
-                        swap(*(q+1), *q);
-                        more = 1;
-		}
-	} while(more);
-}
-
-static void dx_insert_block(struct dx_frame *frame, u32 hash, u32 block)
-{
-	struct dx_entry *entries = frame->entries;
-	struct dx_entry *old = frame->at, *new = old + 1;
-	int count = dx_get_count(entries);
-
-	assert(count < dx_get_limit(entries));
-	assert(old < entries + count);
-	memmove(new + 1, new, (char *)(entries + count) - (char *)(new));
-	dx_set_hash(new, hash);
-	dx_set_block(new, block);
-	dx_set_count(entries, count + 1);
-}
-
-static void ext3_update_dx_flag(struct inode *inode)
-{
-	if (!EXT3_HAS_COMPAT_FEATURE(inode->i_sb,
-				     EXT3_FEATURE_COMPAT_DIR_INDEX))
-		EXT3_I(inode)->i_flags &= ~EXT3_INDEX_FL;
-}
-
-/*
- * NOTE! unlike strncmp, ext3_match returns 1 for success, 0 for failure.
- *
- * `len <= EXT3_NAME_LEN' is guaranteed by caller.
- * `de != NULL' is guaranteed by caller.
- */
-static inline int ext3_match (int len, const char * const name,
-			      struct ext3_dir_entry_2 * de)
-{
-	if (len != de->name_len)
-		return 0;
-	if (!de->inode)
-		return 0;
-	return !memcmp(name, de->name, len);
-}
-
-/*
- * Returns 0 if not found, -1 on failure, and 1 on success
- */
-static inline int search_dirblock(struct buffer_head * bh,
-				  struct inode *dir,
-				  struct qstr *child,
-				  unsigned long offset,
-				  struct ext3_dir_entry_2 ** res_dir)
-{
-	struct ext3_dir_entry_2 * de;
-	char * dlimit;
-	int de_len;
-	const char *name = child->name;
-	int namelen = child->len;
-
-	de = (struct ext3_dir_entry_2 *) bh->b_data;
-	dlimit = bh->b_data + dir->i_sb->s_blocksize;
-	while ((char *) de < dlimit) {
-		/* this code is executed quadratically often */
-		/* do minimal checking `by hand' */
-
-		if ((char *) de + namelen <= dlimit &&
-		    ext3_match (namelen, name, de)) {
-			/* found a match - just to be sure, do a full check */
-			if (!ext3_check_dir_entry("ext3_find_entry",
-						  dir, de, bh, offset))
-				return -1;
-			*res_dir = de;
-			return 1;
-		}
-		/* prevent looping on a bad block */
-		de_len = ext3_rec_len_from_disk(de->rec_len);
-		if (de_len <= 0)
-			return -1;
-		offset += de_len;
-		de = (struct ext3_dir_entry_2 *) ((char *) de + de_len);
-	}
-	return 0;
-}
-
-
-/*
- *	ext3_find_entry()
- *
- * finds an entry in the specified directory with the wanted name. It
- * returns the cache buffer in which the entry was found, and the entry
- * itself (as a parameter - res_dir). It does NOT read the inode of the
- * entry - you'll have to do that yourself if you want to.
- *
- * The returned buffer_head has ->b_count elevated.  The caller is expected
- * to brelse() it when appropriate.
- */
-static struct buffer_head *ext3_find_entry(struct inode *dir,
-					struct qstr *entry,
-					struct ext3_dir_entry_2 **res_dir)
-{
-	struct super_block * sb;
-	struct buffer_head * bh_use[NAMEI_RA_SIZE];
-	struct buffer_head * bh, *ret = NULL;
-	unsigned long start, block, b;
-	const u8 *name = entry->name;
-	int ra_max = 0;		/* Number of bh's in the readahead
-				   buffer, bh_use[] */
-	int ra_ptr = 0;		/* Current index into readahead
-				   buffer */
-	int num = 0;
-	int nblocks, i, err;
-	int namelen;
-
-	*res_dir = NULL;
-	sb = dir->i_sb;
-	namelen = entry->len;
-	if (namelen > EXT3_NAME_LEN)
-		return NULL;
-	if ((namelen <= 2) && (name[0] == '.') &&
-	    (name[1] == '.' || name[1] == 0)) {
-		/*
-		 * "." or ".." will only be in the first block
-		 * NFS may look up ".."; "." should be handled by the VFS
-		 */
-		block = start = 0;
-		nblocks = 1;
-		goto restart;
-	}
-	if (is_dx(dir)) {
-		bh = ext3_dx_find_entry(dir, entry, res_dir, &err);
-		/*
-		 * On success, or if the error was file not found,
-		 * return.  Otherwise, fall back to doing a search the
-		 * old fashioned way.
-		 */
-		if (bh || (err != ERR_BAD_DX_DIR))
-			return bh;
-		dxtrace(printk("ext3_find_entry: dx failed, falling back\n"));
-	}
-	nblocks = dir->i_size >> EXT3_BLOCK_SIZE_BITS(sb);
-	start = EXT3_I(dir)->i_dir_start_lookup;
-	if (start >= nblocks)
-		start = 0;
-	block = start;
-restart:
-	do {
-		/*
-		 * We deal with the read-ahead logic here.
-		 */
-		if (ra_ptr >= ra_max) {
-			/* Refill the readahead buffer */
-			ra_ptr = 0;
-			b = block;
-			for (ra_max = 0; ra_max < NAMEI_RA_SIZE; ra_max++) {
-				/*
-				 * Terminate if we reach the end of the
-				 * directory and must wrap, or if our
-				 * search has finished at this block.
-				 */
-				if (b >= nblocks || (num && block == start)) {
-					bh_use[ra_max] = NULL;
-					break;
-				}
-				num++;
-				bh = ext3_getblk(NULL, dir, b++, 0, &err);
-				bh_use[ra_max] = bh;
-				if (bh && !bh_uptodate_or_lock(bh)) {
-					get_bh(bh);
-					bh->b_end_io = end_buffer_read_sync;
-					submit_bh(READ | REQ_META | REQ_PRIO,
-						  bh);
-				}
-			}
-		}
-		if ((bh = bh_use[ra_ptr++]) == NULL)
-			goto next;
-		wait_on_buffer(bh);
-		if (!buffer_uptodate(bh)) {
-			/* read error, skip block & hope for the best */
-			ext3_error(sb, __func__, "reading directory #%lu "
-				   "offset %lu", dir->i_ino, block);
-			brelse(bh);
-			goto next;
-		}
-		i = search_dirblock(bh, dir, entry,
-			    block << EXT3_BLOCK_SIZE_BITS(sb), res_dir);
-		if (i == 1) {
-			EXT3_I(dir)->i_dir_start_lookup = block;
-			ret = bh;
-			goto cleanup_and_exit;
-		} else {
-			brelse(bh);
-			if (i < 0)
-				goto cleanup_and_exit;
-		}
-	next:
-		if (++block >= nblocks)
-			block = 0;
-	} while (block != start);
-
-	/*
-	 * If the directory has grown while we were searching, then
-	 * search the last part of the directory before giving up.
-	 */
-	block = nblocks;
-	nblocks = dir->i_size >> EXT3_BLOCK_SIZE_BITS(sb);
-	if (block < nblocks) {
-		start = 0;
-		goto restart;
-	}
-
-cleanup_and_exit:
-	/* Clean up the read-ahead blocks */
-	for (; ra_ptr < ra_max; ra_ptr++)
-		brelse (bh_use[ra_ptr]);
-	return ret;
-}
-
-static struct buffer_head * ext3_dx_find_entry(struct inode *dir,
-			struct qstr *entry, struct ext3_dir_entry_2 **res_dir,
-			int *err)
-{
-	struct super_block *sb = dir->i_sb;
-	struct dx_hash_info	hinfo;
-	struct dx_frame frames[2], *frame;
-	struct buffer_head *bh;
-	unsigned long block;
-	int retval;
-
-	if (!(frame = dx_probe(entry, dir, &hinfo, frames, err)))
-		return NULL;
-	do {
-		block = dx_get_block(frame->at);
-		if (!(bh = ext3_dir_bread (NULL, dir, block, 0, err)))
-			goto errout;
-
-		retval = search_dirblock(bh, dir, entry,
-					 block << EXT3_BLOCK_SIZE_BITS(sb),
-					 res_dir);
-		if (retval == 1) {
-			dx_release(frames);
-			return bh;
-		}
-		brelse(bh);
-		if (retval == -1) {
-			*err = ERR_BAD_DX_DIR;
-			goto errout;
-		}
-
-		/* Check to see if we should continue to search */
-		retval = ext3_htree_next_block(dir, hinfo.hash, frame,
-					       frames, NULL);
-		if (retval < 0) {
-			ext3_warning(sb, __func__,
-			     "error reading index page in directory #%lu",
-			     dir->i_ino);
-			*err = retval;
-			goto errout;
-		}
-	} while (retval == 1);
-
-	*err = -ENOENT;
-errout:
-	dxtrace(printk("%s not found\n", entry->name));
-	dx_release (frames);
-	return NULL;
-}
-
-static struct dentry *ext3_lookup(struct inode * dir, struct dentry *dentry, unsigned int flags)
-{
-	struct inode * inode;
-	struct ext3_dir_entry_2 * de;
-	struct buffer_head * bh;
-
-	if (dentry->d_name.len > EXT3_NAME_LEN)
-		return ERR_PTR(-ENAMETOOLONG);
-
-	bh = ext3_find_entry(dir, &dentry->d_name, &de);
-	inode = NULL;
-	if (bh) {
-		unsigned long ino = le32_to_cpu(de->inode);
-		brelse (bh);
-		if (!ext3_valid_inum(dir->i_sb, ino)) {
-			ext3_error(dir->i_sb, "ext3_lookup",
-				   "bad inode number: %lu", ino);
-			return ERR_PTR(-EIO);
-		}
-		inode = ext3_iget(dir->i_sb, ino);
-		if (inode == ERR_PTR(-ESTALE)) {
-			ext3_error(dir->i_sb, __func__,
-					"deleted inode referenced: %lu",
-					ino);
-			return ERR_PTR(-EIO);
-		}
-	}
-	return d_splice_alias(inode, dentry);
-}
-
-
-struct dentry *ext3_get_parent(struct dentry *child)
-{
-	unsigned long ino;
-	struct qstr dotdot = QSTR_INIT("..", 2);
-	struct ext3_dir_entry_2 * de;
-	struct buffer_head *bh;
-
-	bh = ext3_find_entry(d_inode(child), &dotdot, &de);
-	if (!bh)
-		return ERR_PTR(-ENOENT);
-	ino = le32_to_cpu(de->inode);
-	brelse(bh);
-
-	if (!ext3_valid_inum(d_inode(child)->i_sb, ino)) {
-		ext3_error(d_inode(child)->i_sb, "ext3_get_parent",
-			   "bad inode number: %lu", ino);
-		return ERR_PTR(-EIO);
-	}
-
-	return d_obtain_alias(ext3_iget(d_inode(child)->i_sb, ino));
-}
-
-#define S_SHIFT 12
-static unsigned char ext3_type_by_mode[S_IFMT >> S_SHIFT] = {
-	[S_IFREG >> S_SHIFT]	= EXT3_FT_REG_FILE,
-	[S_IFDIR >> S_SHIFT]	= EXT3_FT_DIR,
-	[S_IFCHR >> S_SHIFT]	= EXT3_FT_CHRDEV,
-	[S_IFBLK >> S_SHIFT]	= EXT3_FT_BLKDEV,
-	[S_IFIFO >> S_SHIFT]	= EXT3_FT_FIFO,
-	[S_IFSOCK >> S_SHIFT]	= EXT3_FT_SOCK,
-	[S_IFLNK >> S_SHIFT]	= EXT3_FT_SYMLINK,
-};
-
-static inline void ext3_set_de_type(struct super_block *sb,
-				struct ext3_dir_entry_2 *de,
-				umode_t mode) {
-	if (EXT3_HAS_INCOMPAT_FEATURE(sb, EXT3_FEATURE_INCOMPAT_FILETYPE))
-		de->file_type = ext3_type_by_mode[(mode & S_IFMT)>>S_SHIFT];
-}
-
-/*
- * Move count entries from end of map between two memory locations.
- * Returns pointer to last entry moved.
- */
-static struct ext3_dir_entry_2 *
-dx_move_dirents(char *from, char *to, struct dx_map_entry *map, int count)
-{
-	unsigned rec_len = 0;
-
-	while (count--) {
-		struct ext3_dir_entry_2 *de = (struct ext3_dir_entry_2 *) (from + map->offs);
-		rec_len = EXT3_DIR_REC_LEN(de->name_len);
-		memcpy (to, de, rec_len);
-		((struct ext3_dir_entry_2 *) to)->rec_len =
-				ext3_rec_len_to_disk(rec_len);
-		de->inode = 0;
-		map++;
-		to += rec_len;
-	}
-	return (struct ext3_dir_entry_2 *) (to - rec_len);
-}
-
-/*
- * Compact each dir entry in the range to the minimal rec_len.
- * Returns pointer to last entry in range.
- */
-static struct ext3_dir_entry_2 *dx_pack_dirents(char *base, unsigned blocksize)
-{
-	struct ext3_dir_entry_2 *next, *to, *prev;
-	struct ext3_dir_entry_2 *de = (struct ext3_dir_entry_2 *)base;
-	unsigned rec_len = 0;
-
-	prev = to = de;
-	while ((char *)de < base + blocksize) {
-		next = ext3_next_entry(de);
-		if (de->inode && de->name_len) {
-			rec_len = EXT3_DIR_REC_LEN(de->name_len);
-			if (de > to)
-				memmove(to, de, rec_len);
-			to->rec_len = ext3_rec_len_to_disk(rec_len);
-			prev = to;
-			to = (struct ext3_dir_entry_2 *) (((char *) to) + rec_len);
-		}
-		de = next;
-	}
-	return prev;
-}
-
-/*
- * Split a full leaf block to make room for a new dir entry.
- * Allocate a new block, and move entries so that they are approx. equally full.
- * Returns pointer to de in block into which the new entry will be inserted.
- */
-static struct ext3_dir_entry_2 *do_split(handle_t *handle, struct inode *dir,
-			struct buffer_head **bh,struct dx_frame *frame,
-			struct dx_hash_info *hinfo, int *error)
-{
-	unsigned blocksize = dir->i_sb->s_blocksize;
-	unsigned count, continued;
-	struct buffer_head *bh2;
-	u32 newblock;
-	u32 hash2;
-	struct dx_map_entry *map;
-	char *data1 = (*bh)->b_data, *data2;
-	unsigned split, move, size;
-	struct ext3_dir_entry_2 *de = NULL, *de2;
-	int	err = 0, i;
-
-	bh2 = ext3_append (handle, dir, &newblock, &err);
-	if (!(bh2)) {
-		brelse(*bh);
-		*bh = NULL;
-		goto errout;
-	}
-
-	BUFFER_TRACE(*bh, "get_write_access");
-	err = ext3_journal_get_write_access(handle, *bh);
-	if (err)
-		goto journal_error;
-
-	BUFFER_TRACE(frame->bh, "get_write_access");
-	err = ext3_journal_get_write_access(handle, frame->bh);
-	if (err)
-		goto journal_error;
-
-	data2 = bh2->b_data;
-
-	/* create map in the end of data2 block */
-	map = (struct dx_map_entry *) (data2 + blocksize);
-	count = dx_make_map ((struct ext3_dir_entry_2 *) data1,
-			     blocksize, hinfo, map);
-	map -= count;
-	dx_sort_map (map, count);
-	/* Split the existing block in the middle, size-wise */
-	size = 0;
-	move = 0;
-	for (i = count-1; i >= 0; i--) {
-		/* is more than half of this entry in 2nd half of the block? */
-		if (size + map[i].size/2 > blocksize/2)
-			break;
-		size += map[i].size;
-		move++;
-	}
-	/* map index at which we will split */
-	split = count - move;
-	hash2 = map[split].hash;
-	continued = hash2 == map[split - 1].hash;
-	dxtrace(printk("Split block %i at %x, %i/%i\n",
-		dx_get_block(frame->at), hash2, split, count-split));
-
-	/* Fancy dance to stay within two buffers */
-	de2 = dx_move_dirents(data1, data2, map + split, count - split);
-	de = dx_pack_dirents(data1,blocksize);
-	de->rec_len = ext3_rec_len_to_disk(data1 + blocksize - (char *) de);
-	de2->rec_len = ext3_rec_len_to_disk(data2 + blocksize - (char *) de2);
-	dxtrace(dx_show_leaf (hinfo, (struct ext3_dir_entry_2 *) data1, blocksize, 1));
-	dxtrace(dx_show_leaf (hinfo, (struct ext3_dir_entry_2 *) data2, blocksize, 1));
-
-	/* Which block gets the new entry? */
-	if (hinfo->hash >= hash2)
-	{
-		swap(*bh, bh2);
-		de = de2;
-	}
-	dx_insert_block (frame, hash2 + continued, newblock);
-	err = ext3_journal_dirty_metadata (handle, bh2);
-	if (err)
-		goto journal_error;
-	err = ext3_journal_dirty_metadata (handle, frame->bh);
-	if (err)
-		goto journal_error;
-	brelse (bh2);
-	dxtrace(dx_show_index ("frame", frame->entries));
-	return de;
-
-journal_error:
-	brelse(*bh);
-	brelse(bh2);
-	*bh = NULL;
-	ext3_std_error(dir->i_sb, err);
-errout:
-	*error = err;
-	return NULL;
-}
-
-
-/*
- * Add a new entry into a directory (leaf) block.  If de is non-NULL,
- * it points to a directory entry which is guaranteed to be large
- * enough for new directory entry.  If de is NULL, then
- * add_dirent_to_buf will attempt search the directory block for
- * space.  It will return -ENOSPC if no space is available, and -EIO
- * and -EEXIST if directory entry already exists.
- *
- * NOTE!  bh is NOT released in the case where ENOSPC is returned.  In
- * all other cases bh is released.
- */
-static int add_dirent_to_buf(handle_t *handle, struct dentry *dentry,
-			     struct inode *inode, struct ext3_dir_entry_2 *de,
-			     struct buffer_head * bh)
-{
-	struct inode	*dir = d_inode(dentry->d_parent);
-	const char	*name = dentry->d_name.name;
-	int		namelen = dentry->d_name.len;
-	unsigned long	offset = 0;
-	unsigned short	reclen;
-	int		nlen, rlen, err;
-	char		*top;
-
-	reclen = EXT3_DIR_REC_LEN(namelen);
-	if (!de) {
-		de = (struct ext3_dir_entry_2 *)bh->b_data;
-		top = bh->b_data + dir->i_sb->s_blocksize - reclen;
-		while ((char *) de <= top) {
-			if (!ext3_check_dir_entry("ext3_add_entry", dir, de,
-						  bh, offset)) {
-				brelse (bh);
-				return -EIO;
-			}
-			if (ext3_match (namelen, name, de)) {
-				brelse (bh);
-				return -EEXIST;
-			}
-			nlen = EXT3_DIR_REC_LEN(de->name_len);
-			rlen = ext3_rec_len_from_disk(de->rec_len);
-			if ((de->inode? rlen - nlen: rlen) >= reclen)
-				break;
-			de = (struct ext3_dir_entry_2 *)((char *)de + rlen);
-			offset += rlen;
-		}
-		if ((char *) de > top)
-			return -ENOSPC;
-	}
-	BUFFER_TRACE(bh, "get_write_access");
-	err = ext3_journal_get_write_access(handle, bh);
-	if (err) {
-		ext3_std_error(dir->i_sb, err);
-		brelse(bh);
-		return err;
-	}
-
-	/* By now the buffer is marked for journaling */
-	nlen = EXT3_DIR_REC_LEN(de->name_len);
-	rlen = ext3_rec_len_from_disk(de->rec_len);
-	if (de->inode) {
-		struct ext3_dir_entry_2 *de1 = (struct ext3_dir_entry_2 *)((char *)de + nlen);
-		de1->rec_len = ext3_rec_len_to_disk(rlen - nlen);
-		de->rec_len = ext3_rec_len_to_disk(nlen);
-		de = de1;
-	}
-	de->file_type = EXT3_FT_UNKNOWN;
-	if (inode) {
-		de->inode = cpu_to_le32(inode->i_ino);
-		ext3_set_de_type(dir->i_sb, de, inode->i_mode);
-	} else
-		de->inode = 0;
-	de->name_len = namelen;
-	memcpy (de->name, name, namelen);
-	/*
-	 * XXX shouldn't update any times until successful
-	 * completion of syscall, but too many callers depend
-	 * on this.
-	 *
-	 * XXX similarly, too many callers depend on
-	 * ext3_new_inode() setting the times, but error
-	 * recovery deletes the inode, so the worst that can
-	 * happen is that the times are slightly out of date
-	 * and/or different from the directory change time.
-	 */
-	dir->i_mtime = dir->i_ctime = CURRENT_TIME_SEC;
-	ext3_update_dx_flag(dir);
-	dir->i_version++;
-	ext3_mark_inode_dirty(handle, dir);
-	BUFFER_TRACE(bh, "call ext3_journal_dirty_metadata");
-	err = ext3_journal_dirty_metadata(handle, bh);
-	if (err)
-		ext3_std_error(dir->i_sb, err);
-	brelse(bh);
-	return 0;
-}
-
-/*
- * This converts a one block unindexed directory to a 3 block indexed
- * directory, and adds the dentry to the indexed directory.
- */
-static int make_indexed_dir(handle_t *handle, struct dentry *dentry,
-			    struct inode *inode, struct buffer_head *bh)
-{
-	struct inode	*dir = d_inode(dentry->d_parent);
-	const char	*name = dentry->d_name.name;
-	int		namelen = dentry->d_name.len;
-	struct buffer_head *bh2;
-	struct dx_root	*root;
-	struct dx_frame	frames[2], *frame;
-	struct dx_entry *entries;
-	struct ext3_dir_entry_2	*de, *de2;
-	char		*data1, *top;
-	unsigned	len;
-	int		retval;
-	unsigned	blocksize;
-	struct dx_hash_info hinfo;
-	u32		block;
-	struct fake_dirent *fde;
-
-	blocksize =  dir->i_sb->s_blocksize;
-	dxtrace(printk(KERN_DEBUG "Creating index: inode %lu\n", dir->i_ino));
-	retval = ext3_journal_get_write_access(handle, bh);
-	if (retval) {
-		ext3_std_error(dir->i_sb, retval);
-		brelse(bh);
-		return retval;
-	}
-	root = (struct dx_root *) bh->b_data;
-
-	/* The 0th block becomes the root, move the dirents out */
-	fde = &root->dotdot;
-	de = (struct ext3_dir_entry_2 *)((char *)fde +
-			ext3_rec_len_from_disk(fde->rec_len));
-	if ((char *) de >= (((char *) root) + blocksize)) {
-		ext3_error(dir->i_sb, __func__,
-			   "invalid rec_len for '..' in inode %lu",
-			   dir->i_ino);
-		brelse(bh);
-		return -EIO;
-	}
-	len = ((char *) root) + blocksize - (char *) de;
-
-	bh2 = ext3_append (handle, dir, &block, &retval);
-	if (!(bh2)) {
-		brelse(bh);
-		return retval;
-	}
-	EXT3_I(dir)->i_flags |= EXT3_INDEX_FL;
-	data1 = bh2->b_data;
-
-	memcpy (data1, de, len);
-	de = (struct ext3_dir_entry_2 *) data1;
-	top = data1 + len;
-	while ((char *)(de2 = ext3_next_entry(de)) < top)
-		de = de2;
-	de->rec_len = ext3_rec_len_to_disk(data1 + blocksize - (char *) de);
-	/* Initialize the root; the dot dirents already exist */
-	de = (struct ext3_dir_entry_2 *) (&root->dotdot);
-	de->rec_len = ext3_rec_len_to_disk(blocksize - EXT3_DIR_REC_LEN(2));
-	memset (&root->info, 0, sizeof(root->info));
-	root->info.info_length = sizeof(root->info);
-	root->info.hash_version = EXT3_SB(dir->i_sb)->s_def_hash_version;
-	entries = root->entries;
-	dx_set_block (entries, 1);
-	dx_set_count (entries, 1);
-	dx_set_limit (entries, dx_root_limit(dir, sizeof(root->info)));
-
-	/* Initialize as for dx_probe */
-	hinfo.hash_version = root->info.hash_version;
-	if (hinfo.hash_version <= DX_HASH_TEA)
-		hinfo.hash_version += EXT3_SB(dir->i_sb)->s_hash_unsigned;
-	hinfo.seed = EXT3_SB(dir->i_sb)->s_hash_seed;
-	ext3fs_dirhash(name, namelen, &hinfo);
-	frame = frames;
-	frame->entries = entries;
-	frame->at = entries;
-	frame->bh = bh;
-	bh = bh2;
-	/*
-	 * Mark buffers dirty here so that if do_split() fails we write a
-	 * consistent set of buffers to disk.
-	 */
-	ext3_journal_dirty_metadata(handle, frame->bh);
-	ext3_journal_dirty_metadata(handle, bh);
-	de = do_split(handle,dir, &bh, frame, &hinfo, &retval);
-	if (!de) {
-		ext3_mark_inode_dirty(handle, dir);
-		dx_release(frames);
-		return retval;
-	}
-	dx_release(frames);
-
-	return add_dirent_to_buf(handle, dentry, inode, de, bh);
-}
-
-/*
- *	ext3_add_entry()
- *
- * adds a file entry to the specified directory, using the same
- * semantics as ext3_find_entry(). It returns NULL if it failed.
- *
- * NOTE!! The inode part of 'de' is left at 0 - which means you
- * may not sleep between calling this and putting something into
- * the entry, as someone else might have used it while you slept.
- */
-static int ext3_add_entry (handle_t *handle, struct dentry *dentry,
-	struct inode *inode)
-{
-	struct inode *dir = d_inode(dentry->d_parent);
-	struct buffer_head * bh;
-	struct ext3_dir_entry_2 *de;
-	struct super_block * sb;
-	int	retval;
-	int	dx_fallback=0;
-	unsigned blocksize;
-	u32 block, blocks;
-
-	sb = dir->i_sb;
-	blocksize = sb->s_blocksize;
-	if (!dentry->d_name.len)
-		return -EINVAL;
-	if (is_dx(dir)) {
-		retval = ext3_dx_add_entry(handle, dentry, inode);
-		if (!retval || (retval != ERR_BAD_DX_DIR))
-			return retval;
-		EXT3_I(dir)->i_flags &= ~EXT3_INDEX_FL;
-		dx_fallback++;
-		ext3_mark_inode_dirty(handle, dir);
-	}
-	blocks = dir->i_size >> sb->s_blocksize_bits;
-	for (block = 0; block < blocks; block++) {
-		if (!(bh = ext3_dir_bread(handle, dir, block, 0, &retval)))
-			return retval;
-
-		retval = add_dirent_to_buf(handle, dentry, inode, NULL, bh);
-		if (retval != -ENOSPC)
-			return retval;
-
-		if (blocks == 1 && !dx_fallback &&
-		    EXT3_HAS_COMPAT_FEATURE(sb, EXT3_FEATURE_COMPAT_DIR_INDEX))
-			return make_indexed_dir(handle, dentry, inode, bh);
-		brelse(bh);
-	}
-	bh = ext3_append(handle, dir, &block, &retval);
-	if (!bh)
-		return retval;
-	de = (struct ext3_dir_entry_2 *) bh->b_data;
-	de->inode = 0;
-	de->rec_len = ext3_rec_len_to_disk(blocksize);
-	return add_dirent_to_buf(handle, dentry, inode, de, bh);
-}
-
-/*
- * Returns 0 for success, or a negative error value
- */
-static int ext3_dx_add_entry(handle_t *handle, struct dentry *dentry,
-			     struct inode *inode)
-{
-	struct dx_frame frames[2], *frame;
-	struct dx_entry *entries, *at;
-	struct dx_hash_info hinfo;
-	struct buffer_head * bh;
-	struct inode *dir = d_inode(dentry->d_parent);
-	struct super_block * sb = dir->i_sb;
-	struct ext3_dir_entry_2 *de;
-	int err;
-
-	frame = dx_probe(&dentry->d_name, dir, &hinfo, frames, &err);
-	if (!frame)
-		return err;
-	entries = frame->entries;
-	at = frame->at;
-
-	if (!(bh = ext3_dir_bread(handle, dir, dx_get_block(frame->at), 0, &err)))
-		goto cleanup;
-
-	BUFFER_TRACE(bh, "get_write_access");
-	err = ext3_journal_get_write_access(handle, bh);
-	if (err)
-		goto journal_error;
-
-	err = add_dirent_to_buf(handle, dentry, inode, NULL, bh);
-	if (err != -ENOSPC) {
-		bh = NULL;
-		goto cleanup;
-	}
-
-	/* Block full, should compress but for now just split */
-	dxtrace(printk("using %u of %u node entries\n",
-		       dx_get_count(entries), dx_get_limit(entries)));
-	/* Need to split index? */
-	if (dx_get_count(entries) == dx_get_limit(entries)) {
-		u32 newblock;
-		unsigned icount = dx_get_count(entries);
-		int levels = frame - frames;
-		struct dx_entry *entries2;
-		struct dx_node *node2;
-		struct buffer_head *bh2;
-
-		if (levels && (dx_get_count(frames->entries) ==
-			       dx_get_limit(frames->entries))) {
-			ext3_warning(sb, __func__,
-				     "Directory index full!");
-			err = -ENOSPC;
-			goto cleanup;
-		}
-		bh2 = ext3_append (handle, dir, &newblock, &err);
-		if (!(bh2))
-			goto cleanup;
-		node2 = (struct dx_node *)(bh2->b_data);
-		entries2 = node2->entries;
-		memset(&node2->fake, 0, sizeof(struct fake_dirent));
-		node2->fake.rec_len = ext3_rec_len_to_disk(sb->s_blocksize);
-		BUFFER_TRACE(frame->bh, "get_write_access");
-		err = ext3_journal_get_write_access(handle, frame->bh);
-		if (err)
-			goto journal_error;
-		if (levels) {
-			unsigned icount1 = icount/2, icount2 = icount - icount1;
-			unsigned hash2 = dx_get_hash(entries + icount1);
-			dxtrace(printk("Split index %i/%i\n", icount1, icount2));
-
-			BUFFER_TRACE(frame->bh, "get_write_access"); /* index root */
-			err = ext3_journal_get_write_access(handle,
-							     frames[0].bh);
-			if (err)
-				goto journal_error;
-
-			memcpy ((char *) entries2, (char *) (entries + icount1),
-				icount2 * sizeof(struct dx_entry));
-			dx_set_count (entries, icount1);
-			dx_set_count (entries2, icount2);
-			dx_set_limit (entries2, dx_node_limit(dir));
-
-			/* Which index block gets the new entry? */
-			if (at - entries >= icount1) {
-				frame->at = at = at - entries - icount1 + entries2;
-				frame->entries = entries = entries2;
-				swap(frame->bh, bh2);
-			}
-			dx_insert_block (frames + 0, hash2, newblock);
-			dxtrace(dx_show_index ("node", frames[1].entries));
-			dxtrace(dx_show_index ("node",
-			       ((struct dx_node *) bh2->b_data)->entries));
-			err = ext3_journal_dirty_metadata(handle, bh2);
-			if (err)
-				goto journal_error;
-			brelse (bh2);
-		} else {
-			dxtrace(printk("Creating second level index...\n"));
-			memcpy((char *) entries2, (char *) entries,
-			       icount * sizeof(struct dx_entry));
-			dx_set_limit(entries2, dx_node_limit(dir));
-
-			/* Set up root */
-			dx_set_count(entries, 1);
-			dx_set_block(entries + 0, newblock);
-			((struct dx_root *) frames[0].bh->b_data)->info.indirect_levels = 1;
-
-			/* Add new access path frame */
-			frame = frames + 1;
-			frame->at = at = at - entries + entries2;
-			frame->entries = entries = entries2;
-			frame->bh = bh2;
-			err = ext3_journal_get_write_access(handle,
-							     frame->bh);
-			if (err)
-				goto journal_error;
-		}
-		err = ext3_journal_dirty_metadata(handle, frames[0].bh);
-		if (err)
-			goto journal_error;
-	}
-	de = do_split(handle, dir, &bh, frame, &hinfo, &err);
-	if (!de)
-		goto cleanup;
-	err = add_dirent_to_buf(handle, dentry, inode, de, bh);
-	bh = NULL;
-	goto cleanup;
-
-journal_error:
-	ext3_std_error(dir->i_sb, err);
-cleanup:
-	if (bh)
-		brelse(bh);
-	dx_release(frames);
-	return err;
-}
-
-/*
- * ext3_delete_entry deletes a directory entry by merging it with the
- * previous entry
- */
-static int ext3_delete_entry (handle_t *handle,
-			      struct inode * dir,
-			      struct ext3_dir_entry_2 * de_del,
-			      struct buffer_head * bh)
-{
-	struct ext3_dir_entry_2 * de, * pde;
-	int i;
-
-	i = 0;
-	pde = NULL;
-	de = (struct ext3_dir_entry_2 *) bh->b_data;
-	while (i < bh->b_size) {
-		if (!ext3_check_dir_entry("ext3_delete_entry", dir, de, bh, i))
-			return -EIO;
-		if (de == de_del)  {
-			int err;
-
-			BUFFER_TRACE(bh, "get_write_access");
-			err = ext3_journal_get_write_access(handle, bh);
-			if (err)
-				goto journal_error;
-
-			if (pde)
-				pde->rec_len = ext3_rec_len_to_disk(
-					ext3_rec_len_from_disk(pde->rec_len) +
-					ext3_rec_len_from_disk(de->rec_len));
-			else
-				de->inode = 0;
-			dir->i_version++;
-			BUFFER_TRACE(bh, "call ext3_journal_dirty_metadata");
-			err = ext3_journal_dirty_metadata(handle, bh);
-			if (err) {
-journal_error:
-				ext3_std_error(dir->i_sb, err);
-				return err;
-			}
-			return 0;
-		}
-		i += ext3_rec_len_from_disk(de->rec_len);
-		pde = de;
-		de = ext3_next_entry(de);
-	}
-	return -ENOENT;
-}
-
-static int ext3_add_nondir(handle_t *handle,
-		struct dentry *dentry, struct inode *inode)
-{
-	int err = ext3_add_entry(handle, dentry, inode);
-	if (!err) {
-		ext3_mark_inode_dirty(handle, inode);
-		unlock_new_inode(inode);
-		d_instantiate(dentry, inode);
-		return 0;
-	}
-	drop_nlink(inode);
-	unlock_new_inode(inode);
-	iput(inode);
-	return err;
-}
-
-/*
- * By the time this is called, we already have created
- * the directory cache entry for the new file, but it
- * is so far negative - it has no inode.
- *
- * If the create succeeds, we fill in the inode information
- * with d_instantiate().
- */
-static int ext3_create (struct inode * dir, struct dentry * dentry, umode_t mode,
-		bool excl)
-{
-	handle_t *handle;
-	struct inode * inode;
-	int err, retries = 0;
-
-	dquot_initialize(dir);
-
-retry:
-	handle = ext3_journal_start(dir, EXT3_DATA_TRANS_BLOCKS(dir->i_sb) +
-					EXT3_INDEX_EXTRA_TRANS_BLOCKS + 3 +
-					EXT3_MAXQUOTAS_INIT_BLOCKS(dir->i_sb));
-	if (IS_ERR(handle))
-		return PTR_ERR(handle);
-
-	if (IS_DIRSYNC(dir))
-		handle->h_sync = 1;
-
-	inode = ext3_new_inode (handle, dir, &dentry->d_name, mode);
-	err = PTR_ERR(inode);
-	if (!IS_ERR(inode)) {
-		inode->i_op = &ext3_file_inode_operations;
-		inode->i_fop = &ext3_file_operations;
-		ext3_set_aops(inode);
-		err = ext3_add_nondir(handle, dentry, inode);
-	}
-	ext3_journal_stop(handle);
-	if (err == -ENOSPC && ext3_should_retry_alloc(dir->i_sb, &retries))
-		goto retry;
-	return err;
-}
-
-static int ext3_mknod (struct inode * dir, struct dentry *dentry,
-			umode_t mode, dev_t rdev)
-{
-	handle_t *handle;
-	struct inode *inode;
-	int err, retries = 0;
-
-	if (!new_valid_dev(rdev))
-		return -EINVAL;
-
-	dquot_initialize(dir);
-
-retry:
-	handle = ext3_journal_start(dir, EXT3_DATA_TRANS_BLOCKS(dir->i_sb) +
-					EXT3_INDEX_EXTRA_TRANS_BLOCKS + 3 +
-					EXT3_MAXQUOTAS_INIT_BLOCKS(dir->i_sb));
-	if (IS_ERR(handle))
-		return PTR_ERR(handle);
-
-	if (IS_DIRSYNC(dir))
-		handle->h_sync = 1;
-
-	inode = ext3_new_inode (handle, dir, &dentry->d_name, mode);
-	err = PTR_ERR(inode);
-	if (!IS_ERR(inode)) {
-		init_special_inode(inode, inode->i_mode, rdev);
-#ifdef CONFIG_EXT3_FS_XATTR
-		inode->i_op = &ext3_special_inode_operations;
-#endif
-		err = ext3_add_nondir(handle, dentry, inode);
-	}
-	ext3_journal_stop(handle);
-	if (err == -ENOSPC && ext3_should_retry_alloc(dir->i_sb, &retries))
-		goto retry;
-	return err;
-}
-
-static int ext3_tmpfile(struct inode *dir, struct dentry *dentry, umode_t mode)
-{
-	handle_t *handle;
-	struct inode *inode;
-	int err, retries = 0;
-
-	dquot_initialize(dir);
-
-retry:
-	handle = ext3_journal_start(dir, EXT3_MAXQUOTAS_INIT_BLOCKS(dir->i_sb) +
-			  4 + EXT3_XATTR_TRANS_BLOCKS);
-
-	if (IS_ERR(handle))
-		return PTR_ERR(handle);
-
-	inode = ext3_new_inode (handle, dir, NULL, mode);
-	err = PTR_ERR(inode);
-	if (!IS_ERR(inode)) {
-		inode->i_op = &ext3_file_inode_operations;
-		inode->i_fop = &ext3_file_operations;
-		ext3_set_aops(inode);
-		d_tmpfile(dentry, inode);
-		err = ext3_orphan_add(handle, inode);
-		if (err)
-			goto err_unlock_inode;
-		mark_inode_dirty(inode);
-		unlock_new_inode(inode);
-	}
-	ext3_journal_stop(handle);
-	if (err == -ENOSPC && ext3_should_retry_alloc(dir->i_sb, &retries))
-		goto retry;
-	return err;
-err_unlock_inode:
-	ext3_journal_stop(handle);
-	unlock_new_inode(inode);
-	return err;
-}
-
-static int ext3_mkdir(struct inode * dir, struct dentry * dentry, umode_t mode)
-{
-	handle_t *handle;
-	struct inode * inode;
-	struct buffer_head * dir_block = NULL;
-	struct ext3_dir_entry_2 * de;
-	int err, retries = 0;
-
-	if (dir->i_nlink >= EXT3_LINK_MAX)
-		return -EMLINK;
-
-	dquot_initialize(dir);
-
-retry:
-	handle = ext3_journal_start(dir, EXT3_DATA_TRANS_BLOCKS(dir->i_sb) +
-					EXT3_INDEX_EXTRA_TRANS_BLOCKS + 3 +
-					EXT3_MAXQUOTAS_INIT_BLOCKS(dir->i_sb));
-	if (IS_ERR(handle))
-		return PTR_ERR(handle);
-
-	if (IS_DIRSYNC(dir))
-		handle->h_sync = 1;
-
-	inode = ext3_new_inode (handle, dir, &dentry->d_name, S_IFDIR | mode);
-	err = PTR_ERR(inode);
-	if (IS_ERR(inode))
-		goto out_stop;
-
-	inode->i_op = &ext3_dir_inode_operations;
-	inode->i_fop = &ext3_dir_operations;
-	inode->i_size = EXT3_I(inode)->i_disksize = inode->i_sb->s_blocksize;
-	if (!(dir_block = ext3_dir_bread(handle, inode, 0, 1, &err)))
-		goto out_clear_inode;
-
-	BUFFER_TRACE(dir_block, "get_write_access");
-	err = ext3_journal_get_write_access(handle, dir_block);
-	if (err)
-		goto out_clear_inode;
-
-	de = (struct ext3_dir_entry_2 *) dir_block->b_data;
-	de->inode = cpu_to_le32(inode->i_ino);
-	de->name_len = 1;
-	de->rec_len = ext3_rec_len_to_disk(EXT3_DIR_REC_LEN(de->name_len));
-	strcpy (de->name, ".");
-	ext3_set_de_type(dir->i_sb, de, S_IFDIR);
-	de = ext3_next_entry(de);
-	de->inode = cpu_to_le32(dir->i_ino);
-	de->rec_len = ext3_rec_len_to_disk(inode->i_sb->s_blocksize -
-					EXT3_DIR_REC_LEN(1));
-	de->name_len = 2;
-	strcpy (de->name, "..");
-	ext3_set_de_type(dir->i_sb, de, S_IFDIR);
-	set_nlink(inode, 2);
-	BUFFER_TRACE(dir_block, "call ext3_journal_dirty_metadata");
-	err = ext3_journal_dirty_metadata(handle, dir_block);
-	if (err)
-		goto out_clear_inode;
-
-	err = ext3_mark_inode_dirty(handle, inode);
-	if (!err)
-		err = ext3_add_entry (handle, dentry, inode);
-
-	if (err) {
-out_clear_inode:
-		clear_nlink(inode);
-		unlock_new_inode(inode);
-		ext3_mark_inode_dirty(handle, inode);
-		iput (inode);
-		goto out_stop;
-	}
-	inc_nlink(dir);
-	ext3_update_dx_flag(dir);
-	err = ext3_mark_inode_dirty(handle, dir);
-	if (err)
-		goto out_clear_inode;
-
-	unlock_new_inode(inode);
-	d_instantiate(dentry, inode);
-out_stop:
-	brelse(dir_block);
-	ext3_journal_stop(handle);
-	if (err == -ENOSPC && ext3_should_retry_alloc(dir->i_sb, &retries))
-		goto retry;
-	return err;
-}
-
-/*
- * routine to check that the specified directory is empty (for rmdir)
- */
-static int empty_dir (struct inode * inode)
-{
-	unsigned long offset;
-	struct buffer_head * bh;
-	struct ext3_dir_entry_2 * de, * de1;
-	struct super_block * sb;
-	int err = 0;
-
-	sb = inode->i_sb;
-	if (inode->i_size < EXT3_DIR_REC_LEN(1) + EXT3_DIR_REC_LEN(2) ||
-	    !(bh = ext3_dir_bread(NULL, inode, 0, 0, &err))) {
-		if (err)
-			ext3_error(inode->i_sb, __func__,
-				   "error %d reading directory #%lu offset 0",
-				   err, inode->i_ino);
-		else
-			ext3_warning(inode->i_sb, __func__,
-				     "bad directory (dir #%lu) - no data block",
-				     inode->i_ino);
-		return 1;
-	}
-	de = (struct ext3_dir_entry_2 *) bh->b_data;
-	de1 = ext3_next_entry(de);
-	if (le32_to_cpu(de->inode) != inode->i_ino ||
-			!le32_to_cpu(de1->inode) ||
-			strcmp (".", de->name) ||
-			strcmp ("..", de1->name)) {
-		ext3_warning (inode->i_sb, "empty_dir",
-			      "bad directory (dir #%lu) - no `.' or `..'",
-			      inode->i_ino);
-		brelse (bh);
-		return 1;
-	}
-	offset = ext3_rec_len_from_disk(de->rec_len) +
-			ext3_rec_len_from_disk(de1->rec_len);
-	de = ext3_next_entry(de1);
-	while (offset < inode->i_size ) {
-		if (!bh ||
-			(void *) de >= (void *) (bh->b_data+sb->s_blocksize)) {
-			err = 0;
-			brelse (bh);
-			if (!(bh = ext3_dir_bread (NULL, inode,
-				offset >> EXT3_BLOCK_SIZE_BITS(sb), 0, &err))) {
-				if (err)
-					ext3_error(sb, __func__,
-						   "error %d reading directory"
-						   " #%lu offset %lu",
-						   err, inode->i_ino, offset);
-				offset += sb->s_blocksize;
-				continue;
-			}
-			de = (struct ext3_dir_entry_2 *) bh->b_data;
-		}
-		if (!ext3_check_dir_entry("empty_dir", inode, de, bh, offset)) {
-			de = (struct ext3_dir_entry_2 *)(bh->b_data +
-							 sb->s_blocksize);
-			offset = (offset | (sb->s_blocksize - 1)) + 1;
-			continue;
-		}
-		if (le32_to_cpu(de->inode)) {
-			brelse (bh);
-			return 0;
-		}
-		offset += ext3_rec_len_from_disk(de->rec_len);
-		de = ext3_next_entry(de);
-	}
-	brelse (bh);
-	return 1;
-}
-
-/* ext3_orphan_add() links an unlinked or truncated inode into a list of
- * such inodes, starting at the superblock, in case we crash before the
- * file is closed/deleted, or in case the inode truncate spans multiple
- * transactions and the last transaction is not recovered after a crash.
- *
- * At filesystem recovery time, we walk this list deleting unlinked
- * inodes and truncating linked inodes in ext3_orphan_cleanup().
- */
-int ext3_orphan_add(handle_t *handle, struct inode *inode)
-{
-	struct super_block *sb = inode->i_sb;
-	struct ext3_iloc iloc;
-	int err = 0, rc;
-
-	mutex_lock(&EXT3_SB(sb)->s_orphan_lock);
-	if (!list_empty(&EXT3_I(inode)->i_orphan))
-		goto out_unlock;
-
-	/* Orphan handling is only valid for files with data blocks
-	 * being truncated, or files being unlinked. */
-
-	/* @@@ FIXME: Observation from aviro:
-	 * I think I can trigger J_ASSERT in ext3_orphan_add().  We block
-	 * here (on s_orphan_lock), so race with ext3_link() which might bump
-	 * ->i_nlink. For, say it, character device. Not a regular file,
-	 * not a directory, not a symlink and ->i_nlink > 0.
-	 *
-	 * tytso, 4/25/2009: I'm not sure how that could happen;
-	 * shouldn't the fs core protect us from these sort of
-	 * unlink()/link() races?
-	 */
-	J_ASSERT ((S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode) ||
-		S_ISLNK(inode->i_mode)) || inode->i_nlink == 0);
-
-	BUFFER_TRACE(EXT3_SB(sb)->s_sbh, "get_write_access");
-	err = ext3_journal_get_write_access(handle, EXT3_SB(sb)->s_sbh);
-	if (err)
-		goto out_unlock;
-
-	err = ext3_reserve_inode_write(handle, inode, &iloc);
-	if (err)
-		goto out_unlock;
-
-	/* Insert this inode at the head of the on-disk orphan list... */
-	NEXT_ORPHAN(inode) = le32_to_cpu(EXT3_SB(sb)->s_es->s_last_orphan);
-	EXT3_SB(sb)->s_es->s_last_orphan = cpu_to_le32(inode->i_ino);
-	err = ext3_journal_dirty_metadata(handle, EXT3_SB(sb)->s_sbh);
-	rc = ext3_mark_iloc_dirty(handle, inode, &iloc);
-	if (!err)
-		err = rc;
-
-	/* Only add to the head of the in-memory list if all the
-	 * previous operations succeeded.  If the orphan_add is going to
-	 * fail (possibly taking the journal offline), we can't risk
-	 * leaving the inode on the orphan list: stray orphan-list
-	 * entries can cause panics at unmount time.
-	 *
-	 * This is safe: on error we're going to ignore the orphan list
-	 * anyway on the next recovery. */
-	if (!err)
-		list_add(&EXT3_I(inode)->i_orphan, &EXT3_SB(sb)->s_orphan);
-
-	jbd_debug(4, "superblock will point to %lu\n", inode->i_ino);
-	jbd_debug(4, "orphan inode %lu will point to %d\n",
-			inode->i_ino, NEXT_ORPHAN(inode));
-out_unlock:
-	mutex_unlock(&EXT3_SB(sb)->s_orphan_lock);
-	ext3_std_error(inode->i_sb, err);
-	return err;
-}
-
-/*
- * ext3_orphan_del() removes an unlinked or truncated inode from the list
- * of such inodes stored on disk, because it is finally being cleaned up.
- */
-int ext3_orphan_del(handle_t *handle, struct inode *inode)
-{
-	struct list_head *prev;
-	struct ext3_inode_info *ei = EXT3_I(inode);
-	struct ext3_sb_info *sbi;
-	unsigned long ino_next;
-	struct ext3_iloc iloc;
-	int err = 0;
-
-	mutex_lock(&EXT3_SB(inode->i_sb)->s_orphan_lock);
-	if (list_empty(&ei->i_orphan))
-		goto out;
-
-	ino_next = NEXT_ORPHAN(inode);
-	prev = ei->i_orphan.prev;
-	sbi = EXT3_SB(inode->i_sb);
-
-	jbd_debug(4, "remove inode %lu from orphan list\n", inode->i_ino);
-
-	list_del_init(&ei->i_orphan);
-
-	/* If we're on an error path, we may not have a valid
-	 * transaction handle with which to update the orphan list on
-	 * disk, but we still need to remove the inode from the linked
-	 * list in memory. */
-	if (!handle)
-		goto out;
-
-	err = ext3_reserve_inode_write(handle, inode, &iloc);
-	if (err)
-		goto out_err;
-
-	if (prev == &sbi->s_orphan) {
-		jbd_debug(4, "superblock will point to %lu\n", ino_next);
-		BUFFER_TRACE(sbi->s_sbh, "get_write_access");
-		err = ext3_journal_get_write_access(handle, sbi->s_sbh);
-		if (err)
-			goto out_brelse;
-		sbi->s_es->s_last_orphan = cpu_to_le32(ino_next);
-		err = ext3_journal_dirty_metadata(handle, sbi->s_sbh);
-	} else {
-		struct ext3_iloc iloc2;
-		struct inode *i_prev =
-			&list_entry(prev, struct ext3_inode_info, i_orphan)->vfs_inode;
-
-		jbd_debug(4, "orphan inode %lu will point to %lu\n",
-			  i_prev->i_ino, ino_next);
-		err = ext3_reserve_inode_write(handle, i_prev, &iloc2);
-		if (err)
-			goto out_brelse;
-		NEXT_ORPHAN(i_prev) = ino_next;
-		err = ext3_mark_iloc_dirty(handle, i_prev, &iloc2);
-	}
-	if (err)
-		goto out_brelse;
-	NEXT_ORPHAN(inode) = 0;
-	err = ext3_mark_iloc_dirty(handle, inode, &iloc);
-
-out_err:
-	ext3_std_error(inode->i_sb, err);
-out:
-	mutex_unlock(&EXT3_SB(inode->i_sb)->s_orphan_lock);
-	return err;
-
-out_brelse:
-	brelse(iloc.bh);
-	goto out_err;
-}
-
-static int ext3_rmdir (struct inode * dir, struct dentry *dentry)
-{
-	int retval;
-	struct inode * inode;
-	struct buffer_head * bh;
-	struct ext3_dir_entry_2 * de;
-	handle_t *handle;
-
-	/* Initialize quotas before so that eventual writes go in
-	 * separate transaction */
-	dquot_initialize(dir);
-	dquot_initialize(d_inode(dentry));
-
-	handle = ext3_journal_start(dir, EXT3_DELETE_TRANS_BLOCKS(dir->i_sb));
-	if (IS_ERR(handle))
-		return PTR_ERR(handle);
-
-	retval = -ENOENT;
-	bh = ext3_find_entry(dir, &dentry->d_name, &de);
-	if (!bh)
-		goto end_rmdir;
-
-	if (IS_DIRSYNC(dir))
-		handle->h_sync = 1;
-
-	inode = d_inode(dentry);
-
-	retval = -EIO;
-	if (le32_to_cpu(de->inode) != inode->i_ino)
-		goto end_rmdir;
-
-	retval = -ENOTEMPTY;
-	if (!empty_dir (inode))
-		goto end_rmdir;
-
-	retval = ext3_delete_entry(handle, dir, de, bh);
-	if (retval)
-		goto end_rmdir;
-	if (inode->i_nlink != 2)
-		ext3_warning (inode->i_sb, "ext3_rmdir",
-			      "empty directory has nlink!=2 (%d)",
-			      inode->i_nlink);
-	inode->i_version++;
-	clear_nlink(inode);
-	/* There's no need to set i_disksize: the fact that i_nlink is
-	 * zero will ensure that the right thing happens during any
-	 * recovery. */
-	inode->i_size = 0;
-	ext3_orphan_add(handle, inode);
-	inode->i_ctime = dir->i_ctime = dir->i_mtime = CURRENT_TIME_SEC;
-	ext3_mark_inode_dirty(handle, inode);
-	drop_nlink(dir);
-	ext3_update_dx_flag(dir);
-	ext3_mark_inode_dirty(handle, dir);
-
-end_rmdir:
-	ext3_journal_stop(handle);
-	brelse (bh);
-	return retval;
-}
-
-static int ext3_unlink(struct inode * dir, struct dentry *dentry)
-{
-	int retval;
-	struct inode * inode;
-	struct buffer_head * bh;
-	struct ext3_dir_entry_2 * de;
-	handle_t *handle;
-
-	trace_ext3_unlink_enter(dir, dentry);
-	/* Initialize quotas before so that eventual writes go
-	 * in separate transaction */
-	dquot_initialize(dir);
-	dquot_initialize(d_inode(dentry));
-
-	handle = ext3_journal_start(dir, EXT3_DELETE_TRANS_BLOCKS(dir->i_sb));
-	if (IS_ERR(handle))
-		return PTR_ERR(handle);
-
-	if (IS_DIRSYNC(dir))
-		handle->h_sync = 1;
-
-	retval = -ENOENT;
-	bh = ext3_find_entry(dir, &dentry->d_name, &de);
-	if (!bh)
-		goto end_unlink;
-
-	inode = d_inode(dentry);
-
-	retval = -EIO;
-	if (le32_to_cpu(de->inode) != inode->i_ino)
-		goto end_unlink;
-
-	if (!inode->i_nlink) {
-		ext3_warning (inode->i_sb, "ext3_unlink",
-			      "Deleting nonexistent file (%lu), %d",
-			      inode->i_ino, inode->i_nlink);
-		set_nlink(inode, 1);
-	}
-	retval = ext3_delete_entry(handle, dir, de, bh);
-	if (retval)
-		goto end_unlink;
-	dir->i_ctime = dir->i_mtime = CURRENT_TIME_SEC;
-	ext3_update_dx_flag(dir);
-	ext3_mark_inode_dirty(handle, dir);
-	drop_nlink(inode);
-	if (!inode->i_nlink)
-		ext3_orphan_add(handle, inode);
-	inode->i_ctime = dir->i_ctime;
-	ext3_mark_inode_dirty(handle, inode);
-	retval = 0;
-
-end_unlink:
-	ext3_journal_stop(handle);
-	brelse (bh);
-	trace_ext3_unlink_exit(dentry, retval);
-	return retval;
-}
-
-static int ext3_symlink (struct inode * dir,
-		struct dentry *dentry, const char * symname)
-{
-	handle_t *handle;
-	struct inode * inode;
-	int l, err, retries = 0;
-	int credits;
-
-	l = strlen(symname)+1;
-	if (l > dir->i_sb->s_blocksize)
-		return -ENAMETOOLONG;
-
-	dquot_initialize(dir);
-
-	if (l > EXT3_N_BLOCKS * 4) {
-		/*
-		 * For non-fast symlinks, we just allocate inode and put it on
-		 * orphan list in the first transaction => we need bitmap,
-		 * group descriptor, sb, inode block, quota blocks, and
-		 * possibly selinux xattr blocks.
-		 */
-		credits = 4 + EXT3_MAXQUOTAS_INIT_BLOCKS(dir->i_sb) +
-			  EXT3_XATTR_TRANS_BLOCKS;
-	} else {
-		/*
-		 * Fast symlink. We have to add entry to directory
-		 * (EXT3_DATA_TRANS_BLOCKS + EXT3_INDEX_EXTRA_TRANS_BLOCKS),
-		 * allocate new inode (bitmap, group descriptor, inode block,
-		 * quota blocks, sb is already counted in previous macros).
-		 */
-		credits = EXT3_DATA_TRANS_BLOCKS(dir->i_sb) +
-			  EXT3_INDEX_EXTRA_TRANS_BLOCKS + 3 +
-			  EXT3_MAXQUOTAS_INIT_BLOCKS(dir->i_sb);
-	}
-retry:
-	handle = ext3_journal_start(dir, credits);
-	if (IS_ERR(handle))
-		return PTR_ERR(handle);
-
-	if (IS_DIRSYNC(dir))
-		handle->h_sync = 1;
-
-	inode = ext3_new_inode (handle, dir, &dentry->d_name, S_IFLNK|S_IRWXUGO);
-	err = PTR_ERR(inode);
-	if (IS_ERR(inode))
-		goto out_stop;
-
-	if (l > EXT3_N_BLOCKS * 4) {
-		inode->i_op = &ext3_symlink_inode_operations;
-		ext3_set_aops(inode);
-		/*
-		 * We cannot call page_symlink() with transaction started
-		 * because it calls into ext3_write_begin() which acquires page
-		 * lock which ranks below transaction start (and it can also
-		 * wait for journal commit if we are running out of space). So
-		 * we have to stop transaction now and restart it when symlink
-		 * contents is written. 
-		 *
-		 * To keep fs consistent in case of crash, we have to put inode
-		 * to orphan list in the mean time.
-		 */
-		drop_nlink(inode);
-		err = ext3_orphan_add(handle, inode);
-		ext3_journal_stop(handle);
-		if (err)
-			goto err_drop_inode;
-		err = __page_symlink(inode, symname, l, 1);
-		if (err)
-			goto err_drop_inode;
-		/*
-		 * Now inode is being linked into dir (EXT3_DATA_TRANS_BLOCKS
-		 * + EXT3_INDEX_EXTRA_TRANS_BLOCKS), inode is also modified
-		 */
-		handle = ext3_journal_start(dir,
-				EXT3_DATA_TRANS_BLOCKS(dir->i_sb) +
-				EXT3_INDEX_EXTRA_TRANS_BLOCKS + 1);
-		if (IS_ERR(handle)) {
-			err = PTR_ERR(handle);
-			goto err_drop_inode;
-		}
-		set_nlink(inode, 1);
-		err = ext3_orphan_del(handle, inode);
-		if (err) {
-			ext3_journal_stop(handle);
-			drop_nlink(inode);
-			goto err_drop_inode;
-		}
-	} else {
-		inode->i_op = &ext3_fast_symlink_inode_operations;
-		inode->i_link = (char*)&EXT3_I(inode)->i_data;
-		memcpy(inode->i_link, symname, l);
-		inode->i_size = l-1;
-	}
-	EXT3_I(inode)->i_disksize = inode->i_size;
-	err = ext3_add_nondir(handle, dentry, inode);
-out_stop:
-	ext3_journal_stop(handle);
-	if (err == -ENOSPC && ext3_should_retry_alloc(dir->i_sb, &retries))
-		goto retry;
-	return err;
-err_drop_inode:
-	unlock_new_inode(inode);
-	iput(inode);
-	return err;
-}
-
-static int ext3_link (struct dentry * old_dentry,
-		struct inode * dir, struct dentry *dentry)
-{
-	handle_t *handle;
-	struct inode *inode = d_inode(old_dentry);
-	int err, retries = 0;
-
-	if (inode->i_nlink >= EXT3_LINK_MAX)
-		return -EMLINK;
-
-	dquot_initialize(dir);
-
-retry:
-	handle = ext3_journal_start(dir, EXT3_DATA_TRANS_BLOCKS(dir->i_sb) +
-					EXT3_INDEX_EXTRA_TRANS_BLOCKS + 1);
-	if (IS_ERR(handle))
-		return PTR_ERR(handle);
-
-	if (IS_DIRSYNC(dir))
-		handle->h_sync = 1;
-
-	inode->i_ctime = CURRENT_TIME_SEC;
-	inc_nlink(inode);
-	ihold(inode);
-
-	err = ext3_add_entry(handle, dentry, inode);
-	if (!err) {
-		ext3_mark_inode_dirty(handle, inode);
-		/* this can happen only for tmpfile being
-		 * linked the first time
-		 */
-		if (inode->i_nlink == 1)
-			ext3_orphan_del(handle, inode);
-		d_instantiate(dentry, inode);
-	} else {
-		drop_nlink(inode);
-		iput(inode);
-	}
-	ext3_journal_stop(handle);
-	if (err == -ENOSPC && ext3_should_retry_alloc(dir->i_sb, &retries))
-		goto retry;
-	return err;
-}
-
-#define PARENT_INO(buffer) \
-	(ext3_next_entry((struct ext3_dir_entry_2 *)(buffer))->inode)
-
-/*
- * Anybody can rename anything with this: the permission checks are left to the
- * higher-level routines.
- */
-static int ext3_rename (struct inode * old_dir, struct dentry *old_dentry,
-			   struct inode * new_dir,struct dentry *new_dentry)
-{
-	handle_t *handle;
-	struct inode * old_inode, * new_inode;
-	struct buffer_head * old_bh, * new_bh, * dir_bh;
-	struct ext3_dir_entry_2 * old_de, * new_de;
-	int retval, flush_file = 0;
-
-	dquot_initialize(old_dir);
-	dquot_initialize(new_dir);
-
-	old_bh = new_bh = dir_bh = NULL;
-
-	/* Initialize quotas before so that eventual writes go
-	 * in separate transaction */
-	if (d_really_is_positive(new_dentry))
-		dquot_initialize(d_inode(new_dentry));
-	handle = ext3_journal_start(old_dir, 2 *
-					EXT3_DATA_TRANS_BLOCKS(old_dir->i_sb) +
-					EXT3_INDEX_EXTRA_TRANS_BLOCKS + 2);
-	if (IS_ERR(handle))
-		return PTR_ERR(handle);
-
-	if (IS_DIRSYNC(old_dir) || IS_DIRSYNC(new_dir))
-		handle->h_sync = 1;
-
-	old_bh = ext3_find_entry(old_dir, &old_dentry->d_name, &old_de);
-	/*
-	 *  Check for inode number is _not_ due to possible IO errors.
-	 *  We might rmdir the source, keep it as pwd of some process
-	 *  and merrily kill the link to whatever was created under the
-	 *  same name. Goodbye sticky bit ;-<
-	 */
-	old_inode = d_inode(old_dentry);
-	retval = -ENOENT;
-	if (!old_bh || le32_to_cpu(old_de->inode) != old_inode->i_ino)
-		goto end_rename;
-
-	new_inode = d_inode(new_dentry);
-	new_bh = ext3_find_entry(new_dir, &new_dentry->d_name, &new_de);
-	if (new_bh) {
-		if (!new_inode) {
-			brelse (new_bh);
-			new_bh = NULL;
-		}
-	}
-	if (S_ISDIR(old_inode->i_mode)) {
-		if (new_inode) {
-			retval = -ENOTEMPTY;
-			if (!empty_dir (new_inode))
-				goto end_rename;
-		}
-		retval = -EIO;
-		dir_bh = ext3_dir_bread(handle, old_inode, 0, 0, &retval);
-		if (!dir_bh)
-			goto end_rename;
-		if (le32_to_cpu(PARENT_INO(dir_bh->b_data)) != old_dir->i_ino)
-			goto end_rename;
-		retval = -EMLINK;
-		if (!new_inode && new_dir!=old_dir &&
-				new_dir->i_nlink >= EXT3_LINK_MAX)
-			goto end_rename;
-	}
-	if (!new_bh) {
-		retval = ext3_add_entry (handle, new_dentry, old_inode);
-		if (retval)
-			goto end_rename;
-	} else {
-		BUFFER_TRACE(new_bh, "get write access");
-		retval = ext3_journal_get_write_access(handle, new_bh);
-		if (retval)
-			goto journal_error;
-		new_de->inode = cpu_to_le32(old_inode->i_ino);
-		if (EXT3_HAS_INCOMPAT_FEATURE(new_dir->i_sb,
-					      EXT3_FEATURE_INCOMPAT_FILETYPE))
-			new_de->file_type = old_de->file_type;
-		new_dir->i_version++;
-		new_dir->i_ctime = new_dir->i_mtime = CURRENT_TIME_SEC;
-		ext3_mark_inode_dirty(handle, new_dir);
-		BUFFER_TRACE(new_bh, "call ext3_journal_dirty_metadata");
-		retval = ext3_journal_dirty_metadata(handle, new_bh);
-		if (retval)
-			goto journal_error;
-		brelse(new_bh);
-		new_bh = NULL;
-	}
-
-	/*
-	 * Like most other Unix systems, set the ctime for inodes on a
-	 * rename.
-	 */
-	old_inode->i_ctime = CURRENT_TIME_SEC;
-	ext3_mark_inode_dirty(handle, old_inode);
-
-	/*
-	 * ok, that's it
-	 */
-	if (le32_to_cpu(old_de->inode) != old_inode->i_ino ||
-	    old_de->name_len != old_dentry->d_name.len ||
-	    strncmp(old_de->name, old_dentry->d_name.name, old_de->name_len) ||
-	    (retval = ext3_delete_entry(handle, old_dir,
-					old_de, old_bh)) == -ENOENT) {
-		/* old_de could have moved from under us during htree split, so
-		 * make sure that we are deleting the right entry.  We might
-		 * also be pointing to a stale entry in the unused part of
-		 * old_bh so just checking inum and the name isn't enough. */
-		struct buffer_head *old_bh2;
-		struct ext3_dir_entry_2 *old_de2;
-
-		old_bh2 = ext3_find_entry(old_dir, &old_dentry->d_name,
-					  &old_de2);
-		if (old_bh2) {
-			retval = ext3_delete_entry(handle, old_dir,
-						   old_de2, old_bh2);
-			brelse(old_bh2);
-		}
-	}
-	if (retval) {
-		ext3_warning(old_dir->i_sb, "ext3_rename",
-				"Deleting old file (%lu), %d, error=%d",
-				old_dir->i_ino, old_dir->i_nlink, retval);
-	}
-
-	if (new_inode) {
-		drop_nlink(new_inode);
-		new_inode->i_ctime = CURRENT_TIME_SEC;
-	}
-	old_dir->i_ctime = old_dir->i_mtime = CURRENT_TIME_SEC;
-	ext3_update_dx_flag(old_dir);
-	if (dir_bh) {
-		BUFFER_TRACE(dir_bh, "get_write_access");
-		retval = ext3_journal_get_write_access(handle, dir_bh);
-		if (retval)
-			goto journal_error;
-		PARENT_INO(dir_bh->b_data) = cpu_to_le32(new_dir->i_ino);
-		BUFFER_TRACE(dir_bh, "call ext3_journal_dirty_metadata");
-		retval = ext3_journal_dirty_metadata(handle, dir_bh);
-		if (retval) {
-journal_error:
-			ext3_std_error(new_dir->i_sb, retval);
-			goto end_rename;
-		}
-		drop_nlink(old_dir);
-		if (new_inode) {
-			drop_nlink(new_inode);
-		} else {
-			inc_nlink(new_dir);
-			ext3_update_dx_flag(new_dir);
-			ext3_mark_inode_dirty(handle, new_dir);
-		}
-	}
-	ext3_mark_inode_dirty(handle, old_dir);
-	if (new_inode) {
-		ext3_mark_inode_dirty(handle, new_inode);
-		if (!new_inode->i_nlink)
-			ext3_orphan_add(handle, new_inode);
-		if (ext3_should_writeback_data(new_inode))
-			flush_file = 1;
-	}
-	retval = 0;
-
-end_rename:
-	brelse (dir_bh);
-	brelse (old_bh);
-	brelse (new_bh);
-	ext3_journal_stop(handle);
-	if (retval == 0 && flush_file)
-		filemap_flush(old_inode->i_mapping);
-	return retval;
-}
-
-/*
- * directories can handle most operations...
- */
-const struct inode_operations ext3_dir_inode_operations = {
-	.create		= ext3_create,
-	.lookup		= ext3_lookup,
-	.link		= ext3_link,
-	.unlink		= ext3_unlink,
-	.symlink	= ext3_symlink,
-	.mkdir		= ext3_mkdir,
-	.rmdir		= ext3_rmdir,
-	.mknod		= ext3_mknod,
-	.tmpfile	= ext3_tmpfile,
-	.rename		= ext3_rename,
-	.setattr	= ext3_setattr,
-#ifdef CONFIG_EXT3_FS_XATTR
-	.setxattr	= generic_setxattr,
-	.getxattr	= generic_getxattr,
-	.listxattr	= ext3_listxattr,
-	.removexattr	= generic_removexattr,
-#endif
-	.get_acl	= ext3_get_acl,
-	.set_acl	= ext3_set_acl,
-};
-
-const struct inode_operations ext3_special_inode_operations = {
-	.setattr	= ext3_setattr,
-#ifdef CONFIG_EXT3_FS_XATTR
-	.setxattr	= generic_setxattr,
-	.getxattr	= generic_getxattr,
-	.listxattr	= ext3_listxattr,
-	.removexattr	= generic_removexattr,
-#endif
-	.get_acl	= ext3_get_acl,
-	.set_acl	= ext3_set_acl,
-};
diff --git a/fs/ext3/namei.h b/fs/ext3/namei.h
deleted file mode 100644
index 46304d8c9f0a..000000000000
--- a/fs/ext3/namei.h
+++ /dev/null
@@ -1,27 +0,0 @@
-/*  linux/fs/ext3/namei.h
- *
- * Copyright (C) 2005 Simtec Electronics
- *	Ben Dooks <ben@simtec.co.uk>
- *
-*/
-
-extern struct dentry *ext3_get_parent(struct dentry *child);
-
-static inline struct buffer_head *ext3_dir_bread(handle_t *handle,
-						 struct inode *inode,
-						 int block, int create,
-						 int *err)
-{
-	struct buffer_head *bh;
-
-	bh = ext3_bread(handle, inode, block, create, err);
-
-	if (!bh && !(*err)) {
-		*err = -EIO;
-		ext3_error(inode->i_sb, __func__,
-			   "Directory hole detected on inode %lu\n",
-			   inode->i_ino);
-		return NULL;
-	}
-	return bh;
-}
diff --git a/fs/ext3/resize.c b/fs/ext3/resize.c
deleted file mode 100644
index 27105655502c..000000000000
--- a/fs/ext3/resize.c
+++ /dev/null
@@ -1,1117 +0,0 @@
-/*
- *  linux/fs/ext3/resize.c
- *
- * Support for resizing an ext3 filesystem while it is mounted.
- *
- * Copyright (C) 2001, 2002 Andreas Dilger <adilger@clusterfs.com>
- *
- * This could probably be made into a module, because it is not often in use.
- */
-
-
-#define EXT3FS_DEBUG
-
-#include "ext3.h"
-
-
-#define outside(b, first, last)	((b) < (first) || (b) >= (last))
-#define inside(b, first, last)	((b) >= (first) && (b) < (last))
-
-static int verify_group_input(struct super_block *sb,
-			      struct ext3_new_group_data *input)
-{
-	struct ext3_sb_info *sbi = EXT3_SB(sb);
-	struct ext3_super_block *es = sbi->s_es;
-	ext3_fsblk_t start = le32_to_cpu(es->s_blocks_count);
-	ext3_fsblk_t end = start + input->blocks_count;
-	unsigned group = input->group;
-	ext3_fsblk_t itend = input->inode_table + sbi->s_itb_per_group;
-	unsigned overhead = ext3_bg_has_super(sb, group) ?
-		(1 + ext3_bg_num_gdb(sb, group) +
-		 le16_to_cpu(es->s_reserved_gdt_blocks)) : 0;
-	ext3_fsblk_t metaend = start + overhead;
-	struct buffer_head *bh = NULL;
-	ext3_grpblk_t free_blocks_count;
-	int err = -EINVAL;
-
-	input->free_blocks_count = free_blocks_count =
-		input->blocks_count - 2 - overhead - sbi->s_itb_per_group;
-
-	if (test_opt(sb, DEBUG))
-		printk(KERN_DEBUG "EXT3-fs: adding %s group %u: %u blocks "
-		       "(%d free, %u reserved)\n",
-		       ext3_bg_has_super(sb, input->group) ? "normal" :
-		       "no-super", input->group, input->blocks_count,
-		       free_blocks_count, input->reserved_blocks);
-
-	if (group != sbi->s_groups_count)
-		ext3_warning(sb, __func__,
-			     "Cannot add at group %u (only %lu groups)",
-			     input->group, sbi->s_groups_count);
-	else if ((start - le32_to_cpu(es->s_first_data_block)) %
-		 EXT3_BLOCKS_PER_GROUP(sb))
-		ext3_warning(sb, __func__, "Last group not full");
-	else if (input->reserved_blocks > input->blocks_count / 5)
-		ext3_warning(sb, __func__, "Reserved blocks too high (%u)",
-			     input->reserved_blocks);
-	else if (free_blocks_count < 0)
-		ext3_warning(sb, __func__, "Bad blocks count %u",
-			     input->blocks_count);
-	else if (!(bh = sb_bread(sb, end - 1)))
-		ext3_warning(sb, __func__,
-			     "Cannot read last block ("E3FSBLK")",
-			     end - 1);
-	else if (outside(input->block_bitmap, start, end))
-		ext3_warning(sb, __func__,
-			     "Block bitmap not in group (block %u)",
-			     input->block_bitmap);
-	else if (outside(input->inode_bitmap, start, end))
-		ext3_warning(sb, __func__,
-			     "Inode bitmap not in group (block %u)",
-			     input->inode_bitmap);
-	else if (outside(input->inode_table, start, end) ||
-	         outside(itend - 1, start, end))
-		ext3_warning(sb, __func__,
-			     "Inode table not in group (blocks %u-"E3FSBLK")",
-			     input->inode_table, itend - 1);
-	else if (input->inode_bitmap == input->block_bitmap)
-		ext3_warning(sb, __func__,
-			     "Block bitmap same as inode bitmap (%u)",
-			     input->block_bitmap);
-	else if (inside(input->block_bitmap, input->inode_table, itend))
-		ext3_warning(sb, __func__,
-			     "Block bitmap (%u) in inode table (%u-"E3FSBLK")",
-			     input->block_bitmap, input->inode_table, itend-1);
-	else if (inside(input->inode_bitmap, input->inode_table, itend))
-		ext3_warning(sb, __func__,
-			     "Inode bitmap (%u) in inode table (%u-"E3FSBLK")",
-			     input->inode_bitmap, input->inode_table, itend-1);
-	else if (inside(input->block_bitmap, start, metaend))
-		ext3_warning(sb, __func__,
-			     "Block bitmap (%u) in GDT table"
-			     " ("E3FSBLK"-"E3FSBLK")",
-			     input->block_bitmap, start, metaend - 1);
-	else if (inside(input->inode_bitmap, start, metaend))
-		ext3_warning(sb, __func__,
-			     "Inode bitmap (%u) in GDT table"
-			     " ("E3FSBLK"-"E3FSBLK")",
-			     input->inode_bitmap, start, metaend - 1);
-	else if (inside(input->inode_table, start, metaend) ||
-	         inside(itend - 1, start, metaend))
-		ext3_warning(sb, __func__,
-			     "Inode table (%u-"E3FSBLK") overlaps"
-			     "GDT table ("E3FSBLK"-"E3FSBLK")",
-			     input->inode_table, itend - 1, start, metaend - 1);
-	else
-		err = 0;
-	brelse(bh);
-
-	return err;
-}
-
-static struct buffer_head *bclean(handle_t *handle, struct super_block *sb,
-				  ext3_fsblk_t blk)
-{
-	struct buffer_head *bh;
-	int err;
-
-	bh = sb_getblk(sb, blk);
-	if (unlikely(!bh))
-		return ERR_PTR(-ENOMEM);
-	if ((err = ext3_journal_get_write_access(handle, bh))) {
-		brelse(bh);
-		bh = ERR_PTR(err);
-	} else {
-		lock_buffer(bh);
-		memset(bh->b_data, 0, sb->s_blocksize);
-		set_buffer_uptodate(bh);
-		unlock_buffer(bh);
-	}
-
-	return bh;
-}
-
-/*
- * To avoid calling the atomic setbit hundreds or thousands of times, we only
- * need to use it within a single byte (to ensure we get endianness right).
- * We can use memset for the rest of the bitmap as there are no other users.
- */
-static void mark_bitmap_end(int start_bit, int end_bit, char *bitmap)
-{
-	int i;
-
-	if (start_bit >= end_bit)
-		return;
-
-	ext3_debug("mark end bits +%d through +%d used\n", start_bit, end_bit);
-	for (i = start_bit; i < ((start_bit + 7) & ~7UL); i++)
-		ext3_set_bit(i, bitmap);
-	if (i < end_bit)
-		memset(bitmap + (i >> 3), 0xff, (end_bit - i) >> 3);
-}
-
-/*
- * If we have fewer than thresh credits, extend by EXT3_MAX_TRANS_DATA.
- * If that fails, restart the transaction & regain write access for the
- * buffer head which is used for block_bitmap modifications.
- */
-static int extend_or_restart_transaction(handle_t *handle, int thresh,
-					 struct buffer_head *bh)
-{
-	int err;
-
-	if (handle->h_buffer_credits >= thresh)
-		return 0;
-
-	err = ext3_journal_extend(handle, EXT3_MAX_TRANS_DATA);
-	if (err < 0)
-		return err;
-	if (err) {
-		err = ext3_journal_restart(handle, EXT3_MAX_TRANS_DATA);
-		if (err)
-			return err;
-		err = ext3_journal_get_write_access(handle, bh);
-		if (err)
-			return err;
-	}
-
-	return 0;
-}
-
-/*
- * Set up the block and inode bitmaps, and the inode table for the new group.
- * This doesn't need to be part of the main transaction, since we are only
- * changing blocks outside the actual filesystem.  We still do journaling to
- * ensure the recovery is correct in case of a failure just after resize.
- * If any part of this fails, we simply abort the resize.
- */
-static int setup_new_group_blocks(struct super_block *sb,
-				  struct ext3_new_group_data *input)
-{
-	struct ext3_sb_info *sbi = EXT3_SB(sb);
-	ext3_fsblk_t start = ext3_group_first_block_no(sb, input->group);
-	int reserved_gdb = ext3_bg_has_super(sb, input->group) ?
-		le16_to_cpu(sbi->s_es->s_reserved_gdt_blocks) : 0;
-	unsigned long gdblocks = ext3_bg_num_gdb(sb, input->group);
-	struct buffer_head *bh;
-	handle_t *handle;
-	ext3_fsblk_t block;
-	ext3_grpblk_t bit;
-	int i;
-	int err = 0, err2;
-
-	/* This transaction may be extended/restarted along the way */
-	handle = ext3_journal_start_sb(sb, EXT3_MAX_TRANS_DATA);
-
-	if (IS_ERR(handle))
-		return PTR_ERR(handle);
-
-	mutex_lock(&sbi->s_resize_lock);
-	if (input->group != sbi->s_groups_count) {
-		err = -EBUSY;
-		goto exit_journal;
-	}
-
-	if (IS_ERR(bh = bclean(handle, sb, input->block_bitmap))) {
-		err = PTR_ERR(bh);
-		goto exit_journal;
-	}
-
-	if (ext3_bg_has_super(sb, input->group)) {
-		ext3_debug("mark backup superblock %#04lx (+0)\n", start);
-		ext3_set_bit(0, bh->b_data);
-	}
-
-	/* Copy all of the GDT blocks into the backup in this group */
-	for (i = 0, bit = 1, block = start + 1;
-	     i < gdblocks; i++, block++, bit++) {
-		struct buffer_head *gdb;
-
-		ext3_debug("update backup group %#04lx (+%d)\n", block, bit);
-
-		err = extend_or_restart_transaction(handle, 1, bh);
-		if (err)
-			goto exit_bh;
-
-		gdb = sb_getblk(sb, block);
-		if (unlikely(!gdb)) {
-			err = -ENOMEM;
-			goto exit_bh;
-		}
-		if ((err = ext3_journal_get_write_access(handle, gdb))) {
-			brelse(gdb);
-			goto exit_bh;
-		}
-		lock_buffer(gdb);
-		memcpy(gdb->b_data, sbi->s_group_desc[i]->b_data, gdb->b_size);
-		set_buffer_uptodate(gdb);
-		unlock_buffer(gdb);
-		err = ext3_journal_dirty_metadata(handle, gdb);
-		if (err) {
-			brelse(gdb);
-			goto exit_bh;
-		}
-		ext3_set_bit(bit, bh->b_data);
-		brelse(gdb);
-	}
-
-	/* Zero out all of the reserved backup group descriptor table blocks */
-	for (i = 0, bit = gdblocks + 1, block = start + bit;
-	     i < reserved_gdb; i++, block++, bit++) {
-		struct buffer_head *gdb;
-
-		ext3_debug("clear reserved block %#04lx (+%d)\n", block, bit);
-
-		err = extend_or_restart_transaction(handle, 1, bh);
-		if (err)
-			goto exit_bh;
-
-		if (IS_ERR(gdb = bclean(handle, sb, block))) {
-			err = PTR_ERR(gdb);
-			goto exit_bh;
-		}
-		err = ext3_journal_dirty_metadata(handle, gdb);
-		if (err) {
-			brelse(gdb);
-			goto exit_bh;
-		}
-		ext3_set_bit(bit, bh->b_data);
-		brelse(gdb);
-	}
-	ext3_debug("mark block bitmap %#04x (+%ld)\n", input->block_bitmap,
-		   input->block_bitmap - start);
-	ext3_set_bit(input->block_bitmap - start, bh->b_data);
-	ext3_debug("mark inode bitmap %#04x (+%ld)\n", input->inode_bitmap,
-		   input->inode_bitmap - start);
-	ext3_set_bit(input->inode_bitmap - start, bh->b_data);
-
-	/* Zero out all of the inode table blocks */
-	for (i = 0, block = input->inode_table, bit = block - start;
-	     i < sbi->s_itb_per_group; i++, bit++, block++) {
-		struct buffer_head *it;
-
-		ext3_debug("clear inode block %#04lx (+%d)\n", block, bit);
-
-		err = extend_or_restart_transaction(handle, 1, bh);
-		if (err)
-			goto exit_bh;
-
-		if (IS_ERR(it = bclean(handle, sb, block))) {
-			err = PTR_ERR(it);
-			goto exit_bh;
-		}
-		err = ext3_journal_dirty_metadata(handle, it);
-		if (err) {
-			brelse(it);
-			goto exit_bh;
-		}
-		brelse(it);
-		ext3_set_bit(bit, bh->b_data);
-	}
-
-	err = extend_or_restart_transaction(handle, 2, bh);
-	if (err)
-		goto exit_bh;
-
-	mark_bitmap_end(input->blocks_count, EXT3_BLOCKS_PER_GROUP(sb),
-			bh->b_data);
-	err = ext3_journal_dirty_metadata(handle, bh);
-	if (err)
-		goto exit_bh;
-	brelse(bh);
-
-	/* Mark unused entries in inode bitmap used */
-	ext3_debug("clear inode bitmap %#04x (+%ld)\n",
-		   input->inode_bitmap, input->inode_bitmap - start);
-	if (IS_ERR(bh = bclean(handle, sb, input->inode_bitmap))) {
-		err = PTR_ERR(bh);
-		goto exit_journal;
-	}
-
-	mark_bitmap_end(EXT3_INODES_PER_GROUP(sb), EXT3_BLOCKS_PER_GROUP(sb),
-			bh->b_data);
-	err = ext3_journal_dirty_metadata(handle, bh);
-exit_bh:
-	brelse(bh);
-
-exit_journal:
-	mutex_unlock(&sbi->s_resize_lock);
-	if ((err2 = ext3_journal_stop(handle)) && !err)
-		err = err2;
-
-	return err;
-}
-
-/*
- * Iterate through the groups which hold BACKUP superblock/GDT copies in an
- * ext3 filesystem.  The counters should be initialized to 1, 5, and 7 before
- * calling this for the first time.  In a sparse filesystem it will be the
- * sequence of powers of 3, 5, and 7: 1, 3, 5, 7, 9, 25, 27, 49, 81, ...
- * For a non-sparse filesystem it will be every group: 1, 2, 3, 4, ...
- */
-static unsigned ext3_list_backups(struct super_block *sb, unsigned *three,
-				  unsigned *five, unsigned *seven)
-{
-	unsigned *min = three;
-	int mult = 3;
-	unsigned ret;
-
-	if (!EXT3_HAS_RO_COMPAT_FEATURE(sb,
-					EXT3_FEATURE_RO_COMPAT_SPARSE_SUPER)) {
-		ret = *min;
-		*min += 1;
-		return ret;
-	}
-
-	if (*five < *min) {
-		min = five;
-		mult = 5;
-	}
-	if (*seven < *min) {
-		min = seven;
-		mult = 7;
-	}
-
-	ret = *min;
-	*min *= mult;
-
-	return ret;
-}
-
-/*
- * Check that all of the backup GDT blocks are held in the primary GDT block.
- * It is assumed that they are stored in group order.  Returns the number of
- * groups in current filesystem that have BACKUPS, or -ve error code.
- */
-static int verify_reserved_gdb(struct super_block *sb,
-			       struct buffer_head *primary)
-{
-	const ext3_fsblk_t blk = primary->b_blocknr;
-	const unsigned long end = EXT3_SB(sb)->s_groups_count;
-	unsigned three = 1;
-	unsigned five = 5;
-	unsigned seven = 7;
-	unsigned grp;
-	__le32 *p = (__le32 *)primary->b_data;
-	int gdbackups = 0;
-
-	while ((grp = ext3_list_backups(sb, &three, &five, &seven)) < end) {
-		if (le32_to_cpu(*p++) != grp * EXT3_BLOCKS_PER_GROUP(sb) + blk){
-			ext3_warning(sb, __func__,
-				     "reserved GDT "E3FSBLK
-				     " missing grp %d ("E3FSBLK")",
-				     blk, grp,
-				     grp * EXT3_BLOCKS_PER_GROUP(sb) + blk);
-			return -EINVAL;
-		}
-		if (++gdbackups > EXT3_ADDR_PER_BLOCK(sb))
-			return -EFBIG;
-	}
-
-	return gdbackups;
-}
-
-/*
- * Called when we need to bring a reserved group descriptor table block into
- * use from the resize inode.  The primary copy of the new GDT block currently
- * is an indirect block (under the double indirect block in the resize inode).
- * The new backup GDT blocks will be stored as leaf blocks in this indirect
- * block, in group order.  Even though we know all the block numbers we need,
- * we check to ensure that the resize inode has actually reserved these blocks.
- *
- * Don't need to update the block bitmaps because the blocks are still in use.
- *
- * We get all of the error cases out of the way, so that we are sure to not
- * fail once we start modifying the data on disk, because JBD has no rollback.
- */
-static int add_new_gdb(handle_t *handle, struct inode *inode,
-		       struct ext3_new_group_data *input,
-		       struct buffer_head **primary)
-{
-	struct super_block *sb = inode->i_sb;
-	struct ext3_super_block *es = EXT3_SB(sb)->s_es;
-	unsigned long gdb_num = input->group / EXT3_DESC_PER_BLOCK(sb);
-	ext3_fsblk_t gdblock = EXT3_SB(sb)->s_sbh->b_blocknr + 1 + gdb_num;
-	struct buffer_head **o_group_desc, **n_group_desc;
-	struct buffer_head *dind;
-	int gdbackups;
-	struct ext3_iloc iloc;
-	__le32 *data;
-	int err;
-
-	if (test_opt(sb, DEBUG))
-		printk(KERN_DEBUG
-		       "EXT3-fs: ext3_add_new_gdb: adding group block %lu\n",
-		       gdb_num);
-
-	/*
-	 * If we are not using the primary superblock/GDT copy don't resize,
-	 * because the user tools have no way of handling this.  Probably a
-	 * bad time to do it anyways.
-	 */
-	if (EXT3_SB(sb)->s_sbh->b_blocknr !=
-	    le32_to_cpu(EXT3_SB(sb)->s_es->s_first_data_block)) {
-		ext3_warning(sb, __func__,
-			"won't resize using backup superblock at %llu",
-			(unsigned long long)EXT3_SB(sb)->s_sbh->b_blocknr);
-		return -EPERM;
-	}
-
-	*primary = sb_bread(sb, gdblock);
-	if (!*primary)
-		return -EIO;
-
-	if ((gdbackups = verify_reserved_gdb(sb, *primary)) < 0) {
-		err = gdbackups;
-		goto exit_bh;
-	}
-
-	data = EXT3_I(inode)->i_data + EXT3_DIND_BLOCK;
-	dind = sb_bread(sb, le32_to_cpu(*data));
-	if (!dind) {
-		err = -EIO;
-		goto exit_bh;
-	}
-
-	data = (__le32 *)dind->b_data;
-	if (le32_to_cpu(data[gdb_num % EXT3_ADDR_PER_BLOCK(sb)]) != gdblock) {
-		ext3_warning(sb, __func__,
-			     "new group %u GDT block "E3FSBLK" not reserved",
-			     input->group, gdblock);
-		err = -EINVAL;
-		goto exit_dind;
-	}
-
-	if ((err = ext3_journal_get_write_access(handle, EXT3_SB(sb)->s_sbh)))
-		goto exit_dind;
-
-	if ((err = ext3_journal_get_write_access(handle, *primary)))
-		goto exit_sbh;
-
-	if ((err = ext3_journal_get_write_access(handle, dind)))
-		goto exit_primary;
-
-	/* ext3_reserve_inode_write() gets a reference on the iloc */
-	if ((err = ext3_reserve_inode_write(handle, inode, &iloc)))
-		goto exit_dindj;
-
-	n_group_desc = kmalloc((gdb_num + 1) * sizeof(struct buffer_head *),
-			GFP_NOFS);
-	if (!n_group_desc) {
-		err = -ENOMEM;
-		ext3_warning (sb, __func__,
-			      "not enough memory for %lu groups", gdb_num + 1);
-		goto exit_inode;
-	}
-
-	/*
-	 * Finally, we have all of the possible failures behind us...
-	 *
-	 * Remove new GDT block from inode double-indirect block and clear out
-	 * the new GDT block for use (which also "frees" the backup GDT blocks
-	 * from the reserved inode).  We don't need to change the bitmaps for
-	 * these blocks, because they are marked as in-use from being in the
-	 * reserved inode, and will become GDT blocks (primary and backup).
-	 */
-	data[gdb_num % EXT3_ADDR_PER_BLOCK(sb)] = 0;
-	err = ext3_journal_dirty_metadata(handle, dind);
-	if (err)
-		goto exit_group_desc;
-	brelse(dind);
-	dind = NULL;
-	inode->i_blocks -= (gdbackups + 1) * sb->s_blocksize >> 9;
-	err = ext3_mark_iloc_dirty(handle, inode, &iloc);
-	if (err)
-		goto exit_group_desc;
-	memset((*primary)->b_data, 0, sb->s_blocksize);
-	err = ext3_journal_dirty_metadata(handle, *primary);
-	if (err)
-		goto exit_group_desc;
-
-	o_group_desc = EXT3_SB(sb)->s_group_desc;
-	memcpy(n_group_desc, o_group_desc,
-	       EXT3_SB(sb)->s_gdb_count * sizeof(struct buffer_head *));
-	n_group_desc[gdb_num] = *primary;
-	EXT3_SB(sb)->s_group_desc = n_group_desc;
-	EXT3_SB(sb)->s_gdb_count++;
-	kfree(o_group_desc);
-
-	le16_add_cpu(&es->s_reserved_gdt_blocks, -1);
-	err = ext3_journal_dirty_metadata(handle, EXT3_SB(sb)->s_sbh);
-	if (err)
-		goto exit_inode;
-
-	return 0;
-
-exit_group_desc:
-	kfree(n_group_desc);
-exit_inode:
-	//ext3_journal_release_buffer(handle, iloc.bh);
-	brelse(iloc.bh);
-exit_dindj:
-	//ext3_journal_release_buffer(handle, dind);
-exit_primary:
-	//ext3_journal_release_buffer(handle, *primary);
-exit_sbh:
-	//ext3_journal_release_buffer(handle, *primary);
-exit_dind:
-	brelse(dind);
-exit_bh:
-	brelse(*primary);
-
-	ext3_debug("leaving with error %d\n", err);
-	return err;
-}
-
-/*
- * Called when we are adding a new group which has a backup copy of each of
- * the GDT blocks (i.e. sparse group) and there are reserved GDT blocks.
- * We need to add these reserved backup GDT blocks to the resize inode, so
- * that they are kept for future resizing and not allocated to files.
- *
- * Each reserved backup GDT block will go into a different indirect block.
- * The indirect blocks are actually the primary reserved GDT blocks,
- * so we know in advance what their block numbers are.  We only get the
- * double-indirect block to verify it is pointing to the primary reserved
- * GDT blocks so we don't overwrite a data block by accident.  The reserved
- * backup GDT blocks are stored in their reserved primary GDT block.
- */
-static int reserve_backup_gdb(handle_t *handle, struct inode *inode,
-			      struct ext3_new_group_data *input)
-{
-	struct super_block *sb = inode->i_sb;
-	int reserved_gdb =le16_to_cpu(EXT3_SB(sb)->s_es->s_reserved_gdt_blocks);
-	struct buffer_head **primary;
-	struct buffer_head *dind;
-	struct ext3_iloc iloc;
-	ext3_fsblk_t blk;
-	__le32 *data, *end;
-	int gdbackups = 0;
-	int res, i;
-	int err;
-
-	primary = kmalloc(reserved_gdb * sizeof(*primary), GFP_NOFS);
-	if (!primary)
-		return -ENOMEM;
-
-	data = EXT3_I(inode)->i_data + EXT3_DIND_BLOCK;
-	dind = sb_bread(sb, le32_to_cpu(*data));
-	if (!dind) {
-		err = -EIO;
-		goto exit_free;
-	}
-
-	blk = EXT3_SB(sb)->s_sbh->b_blocknr + 1 + EXT3_SB(sb)->s_gdb_count;
-	data = (__le32 *)dind->b_data + (EXT3_SB(sb)->s_gdb_count %
-					 EXT3_ADDR_PER_BLOCK(sb));
-	end = (__le32 *)dind->b_data + EXT3_ADDR_PER_BLOCK(sb);
-
-	/* Get each reserved primary GDT block and verify it holds backups */
-	for (res = 0; res < reserved_gdb; res++, blk++) {
-		if (le32_to_cpu(*data) != blk) {
-			ext3_warning(sb, __func__,
-				     "reserved block "E3FSBLK
-				     " not at offset %ld",
-				     blk,
-				     (long)(data - (__le32 *)dind->b_data));
-			err = -EINVAL;
-			goto exit_bh;
-		}
-		primary[res] = sb_bread(sb, blk);
-		if (!primary[res]) {
-			err = -EIO;
-			goto exit_bh;
-		}
-		if ((gdbackups = verify_reserved_gdb(sb, primary[res])) < 0) {
-			brelse(primary[res]);
-			err = gdbackups;
-			goto exit_bh;
-		}
-		if (++data >= end)
-			data = (__le32 *)dind->b_data;
-	}
-
-	for (i = 0; i < reserved_gdb; i++) {
-		if ((err = ext3_journal_get_write_access(handle, primary[i]))) {
-			/*
-			int j;
-			for (j = 0; j < i; j++)
-				ext3_journal_release_buffer(handle, primary[j]);
-			 */
-			goto exit_bh;
-		}
-	}
-
-	if ((err = ext3_reserve_inode_write(handle, inode, &iloc)))
-		goto exit_bh;
-
-	/*
-	 * Finally we can add each of the reserved backup GDT blocks from
-	 * the new group to its reserved primary GDT block.
-	 */
-	blk = input->group * EXT3_BLOCKS_PER_GROUP(sb);
-	for (i = 0; i < reserved_gdb; i++) {
-		int err2;
-		data = (__le32 *)primary[i]->b_data;
-		/* printk("reserving backup %lu[%u] = %lu\n",
-		       primary[i]->b_blocknr, gdbackups,
-		       blk + primary[i]->b_blocknr); */
-		data[gdbackups] = cpu_to_le32(blk + primary[i]->b_blocknr);
-		err2 = ext3_journal_dirty_metadata(handle, primary[i]);
-		if (!err)
-			err = err2;
-	}
-	inode->i_blocks += reserved_gdb * sb->s_blocksize >> 9;
-	ext3_mark_iloc_dirty(handle, inode, &iloc);
-
-exit_bh:
-	while (--res >= 0)
-		brelse(primary[res]);
-	brelse(dind);
-
-exit_free:
-	kfree(primary);
-
-	return err;
-}
-
-/*
- * Update the backup copies of the ext3 metadata.  These don't need to be part
- * of the main resize transaction, because e2fsck will re-write them if there
- * is a problem (basically only OOM will cause a problem).  However, we
- * _should_ update the backups if possible, in case the primary gets trashed
- * for some reason and we need to run e2fsck from a backup superblock.  The
- * important part is that the new block and inode counts are in the backup
- * superblocks, and the location of the new group metadata in the GDT backups.
- *
- * We do not need take the s_resize_lock for this, because these
- * blocks are not otherwise touched by the filesystem code when it is
- * mounted.  We don't need to worry about last changing from
- * sbi->s_groups_count, because the worst that can happen is that we
- * do not copy the full number of backups at this time.  The resize
- * which changed s_groups_count will backup again.
- */
-static void update_backups(struct super_block *sb,
-			   int blk_off, char *data, int size)
-{
-	struct ext3_sb_info *sbi = EXT3_SB(sb);
-	const unsigned long last = sbi->s_groups_count;
-	const int bpg = EXT3_BLOCKS_PER_GROUP(sb);
-	unsigned three = 1;
-	unsigned five = 5;
-	unsigned seven = 7;
-	unsigned group;
-	int rest = sb->s_blocksize - size;
-	handle_t *handle;
-	int err = 0, err2;
-
-	handle = ext3_journal_start_sb(sb, EXT3_MAX_TRANS_DATA);
-	if (IS_ERR(handle)) {
-		group = 1;
-		err = PTR_ERR(handle);
-		goto exit_err;
-	}
-
-	while ((group = ext3_list_backups(sb, &three, &five, &seven)) < last) {
-		struct buffer_head *bh;
-
-		/* Out of journal space, and can't get more - abort - so sad */
-		if (handle->h_buffer_credits == 0 &&
-		    ext3_journal_extend(handle, EXT3_MAX_TRANS_DATA) &&
-		    (err = ext3_journal_restart(handle, EXT3_MAX_TRANS_DATA)))
-			break;
-
-		bh = sb_getblk(sb, group * bpg + blk_off);
-		if (unlikely(!bh)) {
-			err = -ENOMEM;
-			break;
-		}
-		ext3_debug("update metadata backup %#04lx\n",
-			  (unsigned long)bh->b_blocknr);
-		if ((err = ext3_journal_get_write_access(handle, bh))) {
-			brelse(bh);
-			break;
-		}
-		lock_buffer(bh);
-		memcpy(bh->b_data, data, size);
-		if (rest)
-			memset(bh->b_data + size, 0, rest);
-		set_buffer_uptodate(bh);
-		unlock_buffer(bh);
-		err = ext3_journal_dirty_metadata(handle, bh);
-		brelse(bh);
-		if (err)
-			break;
-	}
-	if ((err2 = ext3_journal_stop(handle)) && !err)
-		err = err2;
-
-	/*
-	 * Ugh! Need to have e2fsck write the backup copies.  It is too
-	 * late to revert the resize, we shouldn't fail just because of
-	 * the backup copies (they are only needed in case of corruption).
-	 *
-	 * However, if we got here we have a journal problem too, so we
-	 * can't really start a transaction to mark the superblock.
-	 * Chicken out and just set the flag on the hope it will be written
-	 * to disk, and if not - we will simply wait until next fsck.
-	 */
-exit_err:
-	if (err) {
-		ext3_warning(sb, __func__,
-			     "can't update backup for group %d (err %d), "
-			     "forcing fsck on next reboot", group, err);
-		sbi->s_mount_state &= ~EXT3_VALID_FS;
-		sbi->s_es->s_state &= cpu_to_le16(~EXT3_VALID_FS);
-		mark_buffer_dirty(sbi->s_sbh);
-	}
-}
-
-/* Add group descriptor data to an existing or new group descriptor block.
- * Ensure we handle all possible error conditions _before_ we start modifying
- * the filesystem, because we cannot abort the transaction and not have it
- * write the data to disk.
- *
- * If we are on a GDT block boundary, we need to get the reserved GDT block.
- * Otherwise, we may need to add backup GDT blocks for a sparse group.
- *
- * We only need to hold the superblock lock while we are actually adding
- * in the new group's counts to the superblock.  Prior to that we have
- * not really "added" the group at all.  We re-check that we are still
- * adding in the last group in case things have changed since verifying.
- */
-int ext3_group_add(struct super_block *sb, struct ext3_new_group_data *input)
-{
-	struct ext3_sb_info *sbi = EXT3_SB(sb);
-	struct ext3_super_block *es = sbi->s_es;
-	int reserved_gdb = ext3_bg_has_super(sb, input->group) ?
-		le16_to_cpu(es->s_reserved_gdt_blocks) : 0;
-	struct buffer_head *primary = NULL;
-	struct ext3_group_desc *gdp;
-	struct inode *inode = NULL;
-	handle_t *handle;
-	int gdb_off, gdb_num;
-	int err, err2;
-
-	gdb_num = input->group / EXT3_DESC_PER_BLOCK(sb);
-	gdb_off = input->group % EXT3_DESC_PER_BLOCK(sb);
-
-	if (gdb_off == 0 && !EXT3_HAS_RO_COMPAT_FEATURE(sb,
-					EXT3_FEATURE_RO_COMPAT_SPARSE_SUPER)) {
-		ext3_warning(sb, __func__,
-			     "Can't resize non-sparse filesystem further");
-		return -EPERM;
-	}
-
-	if (le32_to_cpu(es->s_blocks_count) + input->blocks_count <
-	    le32_to_cpu(es->s_blocks_count)) {
-		ext3_warning(sb, __func__, "blocks_count overflow\n");
-		return -EINVAL;
-	}
-
-	if (le32_to_cpu(es->s_inodes_count) + EXT3_INODES_PER_GROUP(sb) <
-	    le32_to_cpu(es->s_inodes_count)) {
-		ext3_warning(sb, __func__, "inodes_count overflow\n");
-		return -EINVAL;
-	}
-
-	if (reserved_gdb || gdb_off == 0) {
-		if (!EXT3_HAS_COMPAT_FEATURE(sb,
-					     EXT3_FEATURE_COMPAT_RESIZE_INODE)
-		    || !le16_to_cpu(es->s_reserved_gdt_blocks)) {
-			ext3_warning(sb, __func__,
-				     "No reserved GDT blocks, can't resize");
-			return -EPERM;
-		}
-		inode = ext3_iget(sb, EXT3_RESIZE_INO);
-		if (IS_ERR(inode)) {
-			ext3_warning(sb, __func__,
-				     "Error opening resize inode");
-			return PTR_ERR(inode);
-		}
-	}
-
-	if ((err = verify_group_input(sb, input)))
-		goto exit_put;
-
-	if ((err = setup_new_group_blocks(sb, input)))
-		goto exit_put;
-
-	/*
-	 * We will always be modifying at least the superblock and a GDT
-	 * block.  If we are adding a group past the last current GDT block,
-	 * we will also modify the inode and the dindirect block.  If we
-	 * are adding a group with superblock/GDT backups  we will also
-	 * modify each of the reserved GDT dindirect blocks.
-	 */
-	handle = ext3_journal_start_sb(sb,
-				       ext3_bg_has_super(sb, input->group) ?
-				       3 + reserved_gdb : 4);
-	if (IS_ERR(handle)) {
-		err = PTR_ERR(handle);
-		goto exit_put;
-	}
-
-	mutex_lock(&sbi->s_resize_lock);
-	if (input->group != sbi->s_groups_count) {
-		ext3_warning(sb, __func__,
-			     "multiple resizers run on filesystem!");
-		err = -EBUSY;
-		goto exit_journal;
-	}
-
-	if ((err = ext3_journal_get_write_access(handle, sbi->s_sbh)))
-		goto exit_journal;
-
-	/*
-	 * We will only either add reserved group blocks to a backup group
-	 * or remove reserved blocks for the first group in a new group block.
-	 * Doing both would be mean more complex code, and sane people don't
-	 * use non-sparse filesystems anymore.  This is already checked above.
-	 */
-	if (gdb_off) {
-		primary = sbi->s_group_desc[gdb_num];
-		if ((err = ext3_journal_get_write_access(handle, primary)))
-			goto exit_journal;
-
-		if (reserved_gdb && ext3_bg_num_gdb(sb, input->group) &&
-		    (err = reserve_backup_gdb(handle, inode, input)))
-			goto exit_journal;
-	} else if ((err = add_new_gdb(handle, inode, input, &primary)))
-		goto exit_journal;
-
-	/*
-	 * OK, now we've set up the new group.  Time to make it active.
-	 *
-	 * We do not lock all allocations via s_resize_lock
-	 * so we have to be safe wrt. concurrent accesses the group
-	 * data.  So we need to be careful to set all of the relevant
-	 * group descriptor data etc. *before* we enable the group.
-	 *
-	 * The key field here is sbi->s_groups_count: as long as
-	 * that retains its old value, nobody is going to access the new
-	 * group.
-	 *
-	 * So first we update all the descriptor metadata for the new
-	 * group; then we update the total disk blocks count; then we
-	 * update the groups count to enable the group; then finally we
-	 * update the free space counts so that the system can start
-	 * using the new disk blocks.
-	 */
-
-	/* Update group descriptor block for new group */
-	gdp = (struct ext3_group_desc *)primary->b_data + gdb_off;
-
-	gdp->bg_block_bitmap = cpu_to_le32(input->block_bitmap);
-	gdp->bg_inode_bitmap = cpu_to_le32(input->inode_bitmap);
-	gdp->bg_inode_table = cpu_to_le32(input->inode_table);
-	gdp->bg_free_blocks_count = cpu_to_le16(input->free_blocks_count);
-	gdp->bg_free_inodes_count = cpu_to_le16(EXT3_INODES_PER_GROUP(sb));
-
-	/*
-	 * Make the new blocks and inodes valid next.  We do this before
-	 * increasing the group count so that once the group is enabled,
-	 * all of its blocks and inodes are already valid.
-	 *
-	 * We always allocate group-by-group, then block-by-block or
-	 * inode-by-inode within a group, so enabling these
-	 * blocks/inodes before the group is live won't actually let us
-	 * allocate the new space yet.
-	 */
-	le32_add_cpu(&es->s_blocks_count, input->blocks_count);
-	le32_add_cpu(&es->s_inodes_count, EXT3_INODES_PER_GROUP(sb));
-
-	/*
-	 * We need to protect s_groups_count against other CPUs seeing
-	 * inconsistent state in the superblock.
-	 *
-	 * The precise rules we use are:
-	 *
-	 * * Writers of s_groups_count *must* hold s_resize_lock
-	 * AND
-	 * * Writers must perform a smp_wmb() after updating all dependent
-	 *   data and before modifying the groups count
-	 *
-	 * * Readers must hold s_resize_lock over the access
-	 * OR
-	 * * Readers must perform an smp_rmb() after reading the groups count
-	 *   and before reading any dependent data.
-	 *
-	 * NB. These rules can be relaxed when checking the group count
-	 * while freeing data, as we can only allocate from a block
-	 * group after serialising against the group count, and we can
-	 * only then free after serialising in turn against that
-	 * allocation.
-	 */
-	smp_wmb();
-
-	/* Update the global fs size fields */
-	sbi->s_groups_count++;
-
-	err = ext3_journal_dirty_metadata(handle, primary);
-	if (err)
-		goto exit_journal;
-
-	/* Update the reserved block counts only once the new group is
-	 * active. */
-	le32_add_cpu(&es->s_r_blocks_count, input->reserved_blocks);
-
-	/* Update the free space counts */
-	percpu_counter_add(&sbi->s_freeblocks_counter,
-			   input->free_blocks_count);
-	percpu_counter_add(&sbi->s_freeinodes_counter,
-			   EXT3_INODES_PER_GROUP(sb));
-
-	err = ext3_journal_dirty_metadata(handle, sbi->s_sbh);
-
-exit_journal:
-	mutex_unlock(&sbi->s_resize_lock);
-	if ((err2 = ext3_journal_stop(handle)) && !err)
-		err = err2;
-	if (!err) {
-		update_backups(sb, sbi->s_sbh->b_blocknr, (char *)es,
-			       sizeof(struct ext3_super_block));
-		update_backups(sb, primary->b_blocknr, primary->b_data,
-			       primary->b_size);
-	}
-exit_put:
-	iput(inode);
-	return err;
-} /* ext3_group_add */
-
-/* Extend the filesystem to the new number of blocks specified.  This entry
- * point is only used to extend the current filesystem to the end of the last
- * existing group.  It can be accessed via ioctl, or by "remount,resize=<size>"
- * for emergencies (because it has no dependencies on reserved blocks).
- *
- * If we _really_ wanted, we could use default values to call ext3_group_add()
- * allow the "remount" trick to work for arbitrary resizing, assuming enough
- * GDT blocks are reserved to grow to the desired size.
- */
-int ext3_group_extend(struct super_block *sb, struct ext3_super_block *es,
-		      ext3_fsblk_t n_blocks_count)
-{
-	ext3_fsblk_t o_blocks_count;
-	ext3_grpblk_t last;
-	ext3_grpblk_t add;
-	struct buffer_head * bh;
-	handle_t *handle;
-	int err;
-	unsigned long freed_blocks;
-
-	/* We don't need to worry about locking wrt other resizers just
-	 * yet: we're going to revalidate es->s_blocks_count after
-	 * taking the s_resize_lock below. */
-	o_blocks_count = le32_to_cpu(es->s_blocks_count);
-
-	if (test_opt(sb, DEBUG))
-		printk(KERN_DEBUG "EXT3-fs: extending last group from "E3FSBLK
-		       " up to "E3FSBLK" blocks\n",
-		       o_blocks_count, n_blocks_count);
-
-	if (n_blocks_count == 0 || n_blocks_count == o_blocks_count)
-		return 0;
-
-	if (n_blocks_count > (sector_t)(~0ULL) >> (sb->s_blocksize_bits - 9)) {
-		printk(KERN_ERR "EXT3-fs: filesystem on %s:"
-			" too large to resize to "E3FSBLK" blocks safely\n",
-			sb->s_id, n_blocks_count);
-		if (sizeof(sector_t) < 8)
-			ext3_warning(sb, __func__,
-			"CONFIG_LBDAF not enabled\n");
-		return -EINVAL;
-	}
-
-	if (n_blocks_count < o_blocks_count) {
-		ext3_warning(sb, __func__,
-			     "can't shrink FS - resize aborted");
-		return -EBUSY;
-	}
-
-	/* Handle the remaining blocks in the last group only. */
-	last = (o_blocks_count - le32_to_cpu(es->s_first_data_block)) %
-		EXT3_BLOCKS_PER_GROUP(sb);
-
-	if (last == 0) {
-		ext3_warning(sb, __func__,
-			     "need to use ext2online to resize further");
-		return -EPERM;
-	}
-
-	add = EXT3_BLOCKS_PER_GROUP(sb) - last;
-
-	if (o_blocks_count + add < o_blocks_count) {
-		ext3_warning(sb, __func__, "blocks_count overflow");
-		return -EINVAL;
-	}
-
-	if (o_blocks_count + add > n_blocks_count)
-		add = n_blocks_count - o_blocks_count;
-
-	if (o_blocks_count + add < n_blocks_count)
-		ext3_warning(sb, __func__,
-			     "will only finish group ("E3FSBLK
-			     " blocks, %u new)",
-			     o_blocks_count + add, add);
-
-	/* See if the device is actually as big as what was requested */
-	bh = sb_bread(sb, o_blocks_count + add -1);
-	if (!bh) {
-		ext3_warning(sb, __func__,
-			     "can't read last block, resize aborted");
-		return -ENOSPC;
-	}
-	brelse(bh);
-
-	/* We will update the superblock, one block bitmap, and
-	 * one group descriptor via ext3_free_blocks().
-	 */
-	handle = ext3_journal_start_sb(sb, 3);
-	if (IS_ERR(handle)) {
-		err = PTR_ERR(handle);
-		ext3_warning(sb, __func__, "error %d on journal start",err);
-		goto exit_put;
-	}
-
-	mutex_lock(&EXT3_SB(sb)->s_resize_lock);
-	if (o_blocks_count != le32_to_cpu(es->s_blocks_count)) {
-		ext3_warning(sb, __func__,
-			     "multiple resizers run on filesystem!");
-		mutex_unlock(&EXT3_SB(sb)->s_resize_lock);
-		ext3_journal_stop(handle);
-		err = -EBUSY;
-		goto exit_put;
-	}
-
-	if ((err = ext3_journal_get_write_access(handle,
-						 EXT3_SB(sb)->s_sbh))) {
-		ext3_warning(sb, __func__,
-			     "error %d on journal write access", err);
-		mutex_unlock(&EXT3_SB(sb)->s_resize_lock);
-		ext3_journal_stop(handle);
-		goto exit_put;
-	}
-	es->s_blocks_count = cpu_to_le32(o_blocks_count + add);
-	err = ext3_journal_dirty_metadata(handle, EXT3_SB(sb)->s_sbh);
-	mutex_unlock(&EXT3_SB(sb)->s_resize_lock);
-	if (err) {
-		ext3_warning(sb, __func__,
-			     "error %d on journal dirty metadata", err);
-		ext3_journal_stop(handle);
-		goto exit_put;
-	}
-	ext3_debug("freeing blocks "E3FSBLK" through "E3FSBLK"\n",
-		   o_blocks_count, o_blocks_count + add);
-	ext3_free_blocks_sb(handle, sb, o_blocks_count, add, &freed_blocks);
-	ext3_debug("freed blocks "E3FSBLK" through "E3FSBLK"\n",
-		   o_blocks_count, o_blocks_count + add);
-	if ((err = ext3_journal_stop(handle)))
-		goto exit_put;
-	if (test_opt(sb, DEBUG))
-		printk(KERN_DEBUG "EXT3-fs: extended group to %u blocks\n",
-		       le32_to_cpu(es->s_blocks_count));
-	update_backups(sb, EXT3_SB(sb)->s_sbh->b_blocknr, (char *)es,
-		       sizeof(struct ext3_super_block));
-exit_put:
-	return err;
-} /* ext3_group_extend */
diff --git a/fs/ext3/super.c b/fs/ext3/super.c
deleted file mode 100644
index 5ed0044fbb37..000000000000
--- a/fs/ext3/super.c
+++ /dev/null
@@ -1,3165 +0,0 @@
-/*
- *  linux/fs/ext3/super.c
- *
- * Copyright (C) 1992, 1993, 1994, 1995
- * Remy Card (card@masi.ibp.fr)
- * Laboratoire MASI - Institut Blaise Pascal
- * Universite Pierre et Marie Curie (Paris VI)
- *
- *  from
- *
- *  linux/fs/minix/inode.c
- *
- *  Copyright (C) 1991, 1992  Linus Torvalds
- *
- *  Big-endian to little-endian byte-swapping/bitmaps by
- *        David S. Miller (davem@caip.rutgers.edu), 1995
- */
-
-#include <linux/module.h>
-#include <linux/blkdev.h>
-#include <linux/parser.h>
-#include <linux/exportfs.h>
-#include <linux/statfs.h>
-#include <linux/random.h>
-#include <linux/mount.h>
-#include <linux/quotaops.h>
-#include <linux/seq_file.h>
-#include <linux/log2.h>
-#include <linux/cleancache.h>
-#include <linux/namei.h>
-
-#include <asm/uaccess.h>
-
-#define CREATE_TRACE_POINTS
-
-#include "ext3.h"
-#include "xattr.h"
-#include "acl.h"
-#include "namei.h"
-
-#ifdef CONFIG_EXT3_DEFAULTS_TO_ORDERED
-  #define EXT3_MOUNT_DEFAULT_DATA_MODE EXT3_MOUNT_ORDERED_DATA
-#else
-  #define EXT3_MOUNT_DEFAULT_DATA_MODE EXT3_MOUNT_WRITEBACK_DATA
-#endif
-
-static int ext3_load_journal(struct super_block *, struct ext3_super_block *,
-			     unsigned long journal_devnum);
-static int ext3_create_journal(struct super_block *, struct ext3_super_block *,
-			       unsigned int);
-static int ext3_commit_super(struct super_block *sb,
-			       struct ext3_super_block *es,
-			       int sync);
-static void ext3_mark_recovery_complete(struct super_block * sb,
-					struct ext3_super_block * es);
-static void ext3_clear_journal_err(struct super_block * sb,
-				   struct ext3_super_block * es);
-static int ext3_sync_fs(struct super_block *sb, int wait);
-static const char *ext3_decode_error(struct super_block * sb, int errno,
-				     char nbuf[16]);
-static int ext3_remount (struct super_block * sb, int * flags, char * data);
-static int ext3_statfs (struct dentry * dentry, struct kstatfs * buf);
-static int ext3_unfreeze(struct super_block *sb);
-static int ext3_freeze(struct super_block *sb);
-
-/*
- * Wrappers for journal_start/end.
- */
-handle_t *ext3_journal_start_sb(struct super_block *sb, int nblocks)
-{
-	journal_t *journal;
-
-	if (sb->s_flags & MS_RDONLY)
-		return ERR_PTR(-EROFS);
-
-	/* Special case here: if the journal has aborted behind our
-	 * backs (eg. EIO in the commit thread), then we still need to
-	 * take the FS itself readonly cleanly. */
-	journal = EXT3_SB(sb)->s_journal;
-	if (is_journal_aborted(journal)) {
-		ext3_abort(sb, __func__,
-			   "Detected aborted journal");
-		return ERR_PTR(-EROFS);
-	}
-
-	return journal_start(journal, nblocks);
-}
-
-int __ext3_journal_stop(const char *where, handle_t *handle)
-{
-	struct super_block *sb;
-	int err;
-	int rc;
-
-	sb = handle->h_transaction->t_journal->j_private;
-	err = handle->h_err;
-	rc = journal_stop(handle);
-
-	if (!err)
-		err = rc;
-	if (err)
-		__ext3_std_error(sb, where, err);
-	return err;
-}
-
-void ext3_journal_abort_handle(const char *caller, const char *err_fn,
-		struct buffer_head *bh, handle_t *handle, int err)
-{
-	char nbuf[16];
-	const char *errstr = ext3_decode_error(NULL, err, nbuf);
-
-	if (bh)
-		BUFFER_TRACE(bh, "abort");
-
-	if (!handle->h_err)
-		handle->h_err = err;
-
-	if (is_handle_aborted(handle))
-		return;
-
-	printk(KERN_ERR "EXT3-fs: %s: aborting transaction: %s in %s\n",
-		caller, errstr, err_fn);
-
-	journal_abort_handle(handle);
-}
-
-void ext3_msg(struct super_block *sb, const char *prefix,
-		const char *fmt, ...)
-{
-	struct va_format vaf;
-	va_list args;
-
-	va_start(args, fmt);
-
-	vaf.fmt = fmt;
-	vaf.va = &args;
-
-	printk("%sEXT3-fs (%s): %pV\n", prefix, sb->s_id, &vaf);
-
-	va_end(args);
-}
-
-/* Deal with the reporting of failure conditions on a filesystem such as
- * inconsistencies detected or read IO failures.
- *
- * On ext2, we can store the error state of the filesystem in the
- * superblock.  That is not possible on ext3, because we may have other
- * write ordering constraints on the superblock which prevent us from
- * writing it out straight away; and given that the journal is about to
- * be aborted, we can't rely on the current, or future, transactions to
- * write out the superblock safely.
- *
- * We'll just use the journal_abort() error code to record an error in
- * the journal instead.  On recovery, the journal will complain about
- * that error until we've noted it down and cleared it.
- */
-
-static void ext3_handle_error(struct super_block *sb)
-{
-	struct ext3_super_block *es = EXT3_SB(sb)->s_es;
-
-	EXT3_SB(sb)->s_mount_state |= EXT3_ERROR_FS;
-	es->s_state |= cpu_to_le16(EXT3_ERROR_FS);
-
-	if (sb->s_flags & MS_RDONLY)
-		return;
-
-	if (!test_opt (sb, ERRORS_CONT)) {
-		journal_t *journal = EXT3_SB(sb)->s_journal;
-
-		set_opt(EXT3_SB(sb)->s_mount_opt, ABORT);
-		if (journal)
-			journal_abort(journal, -EIO);
-	}
-	if (test_opt (sb, ERRORS_RO)) {
-		ext3_msg(sb, KERN_CRIT,
-			"error: remounting filesystem read-only");
-		/*
-		 * Make sure updated value of ->s_mount_state will be visible
-		 * before ->s_flags update.
-		 */
-		smp_wmb();
-		sb->s_flags |= MS_RDONLY;
-	}
-	ext3_commit_super(sb, es, 1);
-	if (test_opt(sb, ERRORS_PANIC))
-		panic("EXT3-fs (%s): panic forced after error\n",
-			sb->s_id);
-}
-
-void ext3_error(struct super_block *sb, const char *function,
-		const char *fmt, ...)
-{
-	struct va_format vaf;
-	va_list args;
-
-	va_start(args, fmt);
-
-	vaf.fmt = fmt;
-	vaf.va = &args;
-
-	printk(KERN_CRIT "EXT3-fs error (device %s): %s: %pV\n",
-	       sb->s_id, function, &vaf);
-
-	va_end(args);
-
-	ext3_handle_error(sb);
-}
-
-static const char *ext3_decode_error(struct super_block * sb, int errno,
-				     char nbuf[16])
-{
-	char *errstr = NULL;
-
-	switch (errno) {
-	case -EIO:
-		errstr = "IO failure";
-		break;
-	case -ENOMEM:
-		errstr = "Out of memory";
-		break;
-	case -EROFS:
-		if (!sb || EXT3_SB(sb)->s_journal->j_flags & JFS_ABORT)
-			errstr = "Journal has aborted";
-		else
-			errstr = "Readonly filesystem";
-		break;
-	default:
-		/* If the caller passed in an extra buffer for unknown
-		 * errors, textualise them now.  Else we just return
-		 * NULL. */
-		if (nbuf) {
-			/* Check for truncated error codes... */
-			if (snprintf(nbuf, 16, "error %d", -errno) >= 0)
-				errstr = nbuf;
-		}
-		break;
-	}
-
-	return errstr;
-}
-
-/* __ext3_std_error decodes expected errors from journaling functions
- * automatically and invokes the appropriate error response.  */
-
-void __ext3_std_error (struct super_block * sb, const char * function,
-		       int errno)
-{
-	char nbuf[16];
-	const char *errstr;
-
-	/* Special case: if the error is EROFS, and we're not already
-	 * inside a transaction, then there's really no point in logging
-	 * an error. */
-	if (errno == -EROFS && journal_current_handle() == NULL &&
-	    (sb->s_flags & MS_RDONLY))
-		return;
-
-	errstr = ext3_decode_error(sb, errno, nbuf);
-	ext3_msg(sb, KERN_CRIT, "error in %s: %s", function, errstr);
-
-	ext3_handle_error(sb);
-}
-
-/*
- * ext3_abort is a much stronger failure handler than ext3_error.  The
- * abort function may be used to deal with unrecoverable failures such
- * as journal IO errors or ENOMEM at a critical moment in log management.
- *
- * We unconditionally force the filesystem into an ABORT|READONLY state,
- * unless the error response on the fs has been set to panic in which
- * case we take the easy way out and panic immediately.
- */
-
-void ext3_abort(struct super_block *sb, const char *function,
-		 const char *fmt, ...)
-{
-	struct va_format vaf;
-	va_list args;
-
-	va_start(args, fmt);
-
-	vaf.fmt = fmt;
-	vaf.va = &args;
-
-	printk(KERN_CRIT "EXT3-fs (%s): error: %s: %pV\n",
-	       sb->s_id, function, &vaf);
-
-	va_end(args);
-
-	if (test_opt(sb, ERRORS_PANIC))
-		panic("EXT3-fs: panic from previous error\n");
-
-	if (sb->s_flags & MS_RDONLY)
-		return;
-
-	ext3_msg(sb, KERN_CRIT,
-		"error: remounting filesystem read-only");
-	EXT3_SB(sb)->s_mount_state |= EXT3_ERROR_FS;
-	set_opt(EXT3_SB(sb)->s_mount_opt, ABORT);
-	/*
-	 * Make sure updated value of ->s_mount_state will be visible
-	 * before ->s_flags update.
-	 */
-	smp_wmb();
-	sb->s_flags |= MS_RDONLY;
-
-	if (EXT3_SB(sb)->s_journal)
-		journal_abort(EXT3_SB(sb)->s_journal, -EIO);
-}
-
-void ext3_warning(struct super_block *sb, const char *function,
-		  const char *fmt, ...)
-{
-	struct va_format vaf;
-	va_list args;
-
-	va_start(args, fmt);
-
-	vaf.fmt = fmt;
-	vaf.va = &args;
-
-	printk(KERN_WARNING "EXT3-fs (%s): warning: %s: %pV\n",
-	       sb->s_id, function, &vaf);
-
-	va_end(args);
-}
-
-void ext3_update_dynamic_rev(struct super_block *sb)
-{
-	struct ext3_super_block *es = EXT3_SB(sb)->s_es;
-
-	if (le32_to_cpu(es->s_rev_level) > EXT3_GOOD_OLD_REV)
-		return;
-
-	ext3_msg(sb, KERN_WARNING,
-		"warning: updating to rev %d because of "
-		"new feature flag, running e2fsck is recommended",
-		EXT3_DYNAMIC_REV);
-
-	es->s_first_ino = cpu_to_le32(EXT3_GOOD_OLD_FIRST_INO);
-	es->s_inode_size = cpu_to_le16(EXT3_GOOD_OLD_INODE_SIZE);
-	es->s_rev_level = cpu_to_le32(EXT3_DYNAMIC_REV);
-	/* leave es->s_feature_*compat flags alone */
-	/* es->s_uuid will be set by e2fsck if empty */
-
-	/*
-	 * The rest of the superblock fields should be zero, and if not it
-	 * means they are likely already in use, so leave them alone.  We
-	 * can leave it up to e2fsck to clean up any inconsistencies there.
-	 */
-}
-
-/*
- * Open the external journal device
- */
-static struct block_device *ext3_blkdev_get(dev_t dev, struct super_block *sb)
-{
-	struct block_device *bdev;
-	char b[BDEVNAME_SIZE];
-
-	bdev = blkdev_get_by_dev(dev, FMODE_READ|FMODE_WRITE|FMODE_EXCL, sb);
-	if (IS_ERR(bdev))
-		goto fail;
-	return bdev;
-
-fail:
-	ext3_msg(sb, KERN_ERR, "error: failed to open journal device %s: %ld",
-		__bdevname(dev, b), PTR_ERR(bdev));
-
-	return NULL;
-}
-
-/*
- * Release the journal device
- */
-static void ext3_blkdev_put(struct block_device *bdev)
-{
-	blkdev_put(bdev, FMODE_READ|FMODE_WRITE|FMODE_EXCL);
-}
-
-static void ext3_blkdev_remove(struct ext3_sb_info *sbi)
-{
-	struct block_device *bdev;
-	bdev = sbi->journal_bdev;
-	if (bdev) {
-		ext3_blkdev_put(bdev);
-		sbi->journal_bdev = NULL;
-	}
-}
-
-static inline struct inode *orphan_list_entry(struct list_head *l)
-{
-	return &list_entry(l, struct ext3_inode_info, i_orphan)->vfs_inode;
-}
-
-static void dump_orphan_list(struct super_block *sb, struct ext3_sb_info *sbi)
-{
-	struct list_head *l;
-
-	ext3_msg(sb, KERN_ERR, "error: sb orphan head is %d",
-	       le32_to_cpu(sbi->s_es->s_last_orphan));
-
-	ext3_msg(sb, KERN_ERR, "sb_info orphan list:");
-	list_for_each(l, &sbi->s_orphan) {
-		struct inode *inode = orphan_list_entry(l);
-		ext3_msg(sb, KERN_ERR, "  "
-		       "inode %s:%lu at %p: mode %o, nlink %d, next %d\n",
-		       inode->i_sb->s_id, inode->i_ino, inode,
-		       inode->i_mode, inode->i_nlink,
-		       NEXT_ORPHAN(inode));
-	}
-}
-
-static void ext3_put_super (struct super_block * sb)
-{
-	struct ext3_sb_info *sbi = EXT3_SB(sb);
-	struct ext3_super_block *es = sbi->s_es;
-	int i, err;
-
-	dquot_disable(sb, -1, DQUOT_USAGE_ENABLED | DQUOT_LIMITS_ENABLED);
-	ext3_xattr_put_super(sb);
-	err = journal_destroy(sbi->s_journal);
-	sbi->s_journal = NULL;
-	if (err < 0)
-		ext3_abort(sb, __func__, "Couldn't clean up the journal");
-
-	if (!(sb->s_flags & MS_RDONLY)) {
-		EXT3_CLEAR_INCOMPAT_FEATURE(sb, EXT3_FEATURE_INCOMPAT_RECOVER);
-		es->s_state = cpu_to_le16(sbi->s_mount_state);
-		BUFFER_TRACE(sbi->s_sbh, "marking dirty");
-		mark_buffer_dirty(sbi->s_sbh);
-		ext3_commit_super(sb, es, 1);
-	}
-
-	for (i = 0; i < sbi->s_gdb_count; i++)
-		brelse(sbi->s_group_desc[i]);
-	kfree(sbi->s_group_desc);
-	percpu_counter_destroy(&sbi->s_freeblocks_counter);
-	percpu_counter_destroy(&sbi->s_freeinodes_counter);
-	percpu_counter_destroy(&sbi->s_dirs_counter);
-	brelse(sbi->s_sbh);
-#ifdef CONFIG_QUOTA
-	for (i = 0; i < EXT3_MAXQUOTAS; i++)
-		kfree(sbi->s_qf_names[i]);
-#endif
-
-	/* Debugging code just in case the in-memory inode orphan list
-	 * isn't empty.  The on-disk one can be non-empty if we've
-	 * detected an error and taken the fs readonly, but the
-	 * in-memory list had better be clean by this point. */
-	if (!list_empty(&sbi->s_orphan))
-		dump_orphan_list(sb, sbi);
-	J_ASSERT(list_empty(&sbi->s_orphan));
-
-	invalidate_bdev(sb->s_bdev);
-	if (sbi->journal_bdev && sbi->journal_bdev != sb->s_bdev) {
-		/*
-		 * Invalidate the journal device's buffers.  We don't want them
-		 * floating about in memory - the physical journal device may
-		 * hotswapped, and it breaks the `ro-after' testing code.
-		 */
-		sync_blockdev(sbi->journal_bdev);
-		invalidate_bdev(sbi->journal_bdev);
-		ext3_blkdev_remove(sbi);
-	}
-	sb->s_fs_info = NULL;
-	kfree(sbi->s_blockgroup_lock);
-	mutex_destroy(&sbi->s_orphan_lock);
-	mutex_destroy(&sbi->s_resize_lock);
-	kfree(sbi);
-}
-
-static struct kmem_cache *ext3_inode_cachep;
-
-/*
- * Called inside transaction, so use GFP_NOFS
- */
-static struct inode *ext3_alloc_inode(struct super_block *sb)
-{
-	struct ext3_inode_info *ei;
-
-	ei = kmem_cache_alloc(ext3_inode_cachep, GFP_NOFS);
-	if (!ei)
-		return NULL;
-	ei->i_block_alloc_info = NULL;
-	ei->vfs_inode.i_version = 1;
-	atomic_set(&ei->i_datasync_tid, 0);
-	atomic_set(&ei->i_sync_tid, 0);
-#ifdef CONFIG_QUOTA
-	memset(&ei->i_dquot, 0, sizeof(ei->i_dquot));
-#endif
-
-	return &ei->vfs_inode;
-}
-
-static int ext3_drop_inode(struct inode *inode)
-{
-	int drop = generic_drop_inode(inode);
-
-	trace_ext3_drop_inode(inode, drop);
-	return drop;
-}
-
-static void ext3_i_callback(struct rcu_head *head)
-{
-	struct inode *inode = container_of(head, struct inode, i_rcu);
-	kmem_cache_free(ext3_inode_cachep, EXT3_I(inode));
-}
-
-static void ext3_destroy_inode(struct inode *inode)
-{
-	if (!list_empty(&(EXT3_I(inode)->i_orphan))) {
-		printk("EXT3 Inode %p: orphan list check failed!\n",
-			EXT3_I(inode));
-		print_hex_dump(KERN_INFO, "", DUMP_PREFIX_ADDRESS, 16, 4,
-				EXT3_I(inode), sizeof(struct ext3_inode_info),
-				false);
-		dump_stack();
-	}
-	call_rcu(&inode->i_rcu, ext3_i_callback);
-}
-
-static void init_once(void *foo)
-{
-	struct ext3_inode_info *ei = (struct ext3_inode_info *) foo;
-
-	INIT_LIST_HEAD(&ei->i_orphan);
-#ifdef CONFIG_EXT3_FS_XATTR
-	init_rwsem(&ei->xattr_sem);
-#endif
-	mutex_init(&ei->truncate_mutex);
-	inode_init_once(&ei->vfs_inode);
-}
-
-static int __init init_inodecache(void)
-{
-	ext3_inode_cachep = kmem_cache_create("ext3_inode_cache",
-					     sizeof(struct ext3_inode_info),
-					     0, (SLAB_RECLAIM_ACCOUNT|
-						SLAB_MEM_SPREAD),
-					     init_once);
-	if (ext3_inode_cachep == NULL)
-		return -ENOMEM;
-	return 0;
-}
-
-static void destroy_inodecache(void)
-{
-	/*
-	 * Make sure all delayed rcu free inodes are flushed before we
-	 * destroy cache.
-	 */
-	rcu_barrier();
-	kmem_cache_destroy(ext3_inode_cachep);
-}
-
-static inline void ext3_show_quota_options(struct seq_file *seq, struct super_block *sb)
-{
-#if defined(CONFIG_QUOTA)
-	struct ext3_sb_info *sbi = EXT3_SB(sb);
-
-	if (sbi->s_jquota_fmt) {
-		char *fmtname = "";
-
-		switch (sbi->s_jquota_fmt) {
-		case QFMT_VFS_OLD:
-			fmtname = "vfsold";
-			break;
-		case QFMT_VFS_V0:
-			fmtname = "vfsv0";
-			break;
-		case QFMT_VFS_V1:
-			fmtname = "vfsv1";
-			break;
-		}
-		seq_printf(seq, ",jqfmt=%s", fmtname);
-	}
-
-	if (sbi->s_qf_names[USRQUOTA])
-		seq_printf(seq, ",usrjquota=%s", sbi->s_qf_names[USRQUOTA]);
-
-	if (sbi->s_qf_names[GRPQUOTA])
-		seq_printf(seq, ",grpjquota=%s", sbi->s_qf_names[GRPQUOTA]);
-
-	if (test_opt(sb, USRQUOTA))
-		seq_puts(seq, ",usrquota");
-
-	if (test_opt(sb, GRPQUOTA))
-		seq_puts(seq, ",grpquota");
-#endif
-}
-
-static char *data_mode_string(unsigned long mode)
-{
-	switch (mode) {
-	case EXT3_MOUNT_JOURNAL_DATA:
-		return "journal";
-	case EXT3_MOUNT_ORDERED_DATA:
-		return "ordered";
-	case EXT3_MOUNT_WRITEBACK_DATA:
-		return "writeback";
-	}
-	return "unknown";
-}
-
-/*
- * Show an option if
- *  - it's set to a non-default value OR
- *  - if the per-sb default is different from the global default
- */
-static int ext3_show_options(struct seq_file *seq, struct dentry *root)
-{
-	struct super_block *sb = root->d_sb;
-	struct ext3_sb_info *sbi = EXT3_SB(sb);
-	struct ext3_super_block *es = sbi->s_es;
-	unsigned long def_mount_opts;
-
-	def_mount_opts = le32_to_cpu(es->s_default_mount_opts);
-
-	if (sbi->s_sb_block != 1)
-		seq_printf(seq, ",sb=%lu", sbi->s_sb_block);
-	if (test_opt(sb, MINIX_DF))
-		seq_puts(seq, ",minixdf");
-	if (test_opt(sb, GRPID))
-		seq_puts(seq, ",grpid");
-	if (!test_opt(sb, GRPID) && (def_mount_opts & EXT3_DEFM_BSDGROUPS))
-		seq_puts(seq, ",nogrpid");
-	if (!uid_eq(sbi->s_resuid, make_kuid(&init_user_ns, EXT3_DEF_RESUID)) ||
-	    le16_to_cpu(es->s_def_resuid) != EXT3_DEF_RESUID) {
-		seq_printf(seq, ",resuid=%u",
-				from_kuid_munged(&init_user_ns, sbi->s_resuid));
-	}
-	if (!gid_eq(sbi->s_resgid, make_kgid(&init_user_ns, EXT3_DEF_RESGID)) ||
-	    le16_to_cpu(es->s_def_resgid) != EXT3_DEF_RESGID) {
-		seq_printf(seq, ",resgid=%u",
-				from_kgid_munged(&init_user_ns, sbi->s_resgid));
-	}
-	if (test_opt(sb, ERRORS_RO)) {
-		int def_errors = le16_to_cpu(es->s_errors);
-
-		if (def_errors == EXT3_ERRORS_PANIC ||
-		    def_errors == EXT3_ERRORS_CONTINUE) {
-			seq_puts(seq, ",errors=remount-ro");
-		}
-	}
-	if (test_opt(sb, ERRORS_CONT))
-		seq_puts(seq, ",errors=continue");
-	if (test_opt(sb, ERRORS_PANIC))
-		seq_puts(seq, ",errors=panic");
-	if (test_opt(sb, NO_UID32))
-		seq_puts(seq, ",nouid32");
-	if (test_opt(sb, DEBUG))
-		seq_puts(seq, ",debug");
-#ifdef CONFIG_EXT3_FS_XATTR
-	if (test_opt(sb, XATTR_USER))
-		seq_puts(seq, ",user_xattr");
-	if (!test_opt(sb, XATTR_USER) &&
-	    (def_mount_opts & EXT3_DEFM_XATTR_USER)) {
-		seq_puts(seq, ",nouser_xattr");
-	}
-#endif
-#ifdef CONFIG_EXT3_FS_POSIX_ACL
-	if (test_opt(sb, POSIX_ACL))
-		seq_puts(seq, ",acl");
-	if (!test_opt(sb, POSIX_ACL) && (def_mount_opts & EXT3_DEFM_ACL))
-		seq_puts(seq, ",noacl");
-#endif
-	if (!test_opt(sb, RESERVATION))
-		seq_puts(seq, ",noreservation");
-	if (sbi->s_commit_interval) {
-		seq_printf(seq, ",commit=%u",
-			   (unsigned) (sbi->s_commit_interval / HZ));
-	}
-
-	/*
-	 * Always display barrier state so it's clear what the status is.
-	 */
-	seq_puts(seq, ",barrier=");
-	seq_puts(seq, test_opt(sb, BARRIER) ? "1" : "0");
-	seq_printf(seq, ",data=%s", data_mode_string(test_opt(sb, DATA_FLAGS)));
-	if (test_opt(sb, DATA_ERR_ABORT))
-		seq_puts(seq, ",data_err=abort");
-
-	if (test_opt(sb, NOLOAD))
-		seq_puts(seq, ",norecovery");
-
-	ext3_show_quota_options(seq, sb);
-
-	return 0;
-}
-
-
-static struct inode *ext3_nfs_get_inode(struct super_block *sb,
-		u64 ino, u32 generation)
-{
-	struct inode *inode;
-
-	if (ino < EXT3_FIRST_INO(sb) && ino != EXT3_ROOT_INO)
-		return ERR_PTR(-ESTALE);
-	if (ino > le32_to_cpu(EXT3_SB(sb)->s_es->s_inodes_count))
-		return ERR_PTR(-ESTALE);
-
-	/* iget isn't really right if the inode is currently unallocated!!
-	 *
-	 * ext3_read_inode will return a bad_inode if the inode had been
-	 * deleted, so we should be safe.
-	 *
-	 * Currently we don't know the generation for parent directory, so
-	 * a generation of 0 means "accept any"
-	 */
-	inode = ext3_iget(sb, ino);
-	if (IS_ERR(inode))
-		return ERR_CAST(inode);
-	if (generation && inode->i_generation != generation) {
-		iput(inode);
-		return ERR_PTR(-ESTALE);
-	}
-
-	return inode;
-}
-
-static struct dentry *ext3_fh_to_dentry(struct super_block *sb, struct fid *fid,
-		int fh_len, int fh_type)
-{
-	return generic_fh_to_dentry(sb, fid, fh_len, fh_type,
-				    ext3_nfs_get_inode);
-}
-
-static struct dentry *ext3_fh_to_parent(struct super_block *sb, struct fid *fid,
-		int fh_len, int fh_type)
-{
-	return generic_fh_to_parent(sb, fid, fh_len, fh_type,
-				    ext3_nfs_get_inode);
-}
-
-/*
- * Try to release metadata pages (indirect blocks, directories) which are
- * mapped via the block device.  Since these pages could have journal heads
- * which would prevent try_to_free_buffers() from freeing them, we must use
- * jbd layer's try_to_free_buffers() function to release them.
- */
-static int bdev_try_to_free_page(struct super_block *sb, struct page *page,
-				 gfp_t wait)
-{
-	journal_t *journal = EXT3_SB(sb)->s_journal;
-
-	WARN_ON(PageChecked(page));
-	if (!page_has_buffers(page))
-		return 0;
-	if (journal)
-		return journal_try_to_free_buffers(journal, page, 
-						   wait & ~__GFP_WAIT);
-	return try_to_free_buffers(page);
-}
-
-#ifdef CONFIG_QUOTA
-#define QTYPE2NAME(t) ((t)==USRQUOTA?"user":"group")
-#define QTYPE2MOPT(on, t) ((t)==USRQUOTA?((on)##USRJQUOTA):((on)##GRPJQUOTA))
-
-static int ext3_write_dquot(struct dquot *dquot);
-static int ext3_acquire_dquot(struct dquot *dquot);
-static int ext3_release_dquot(struct dquot *dquot);
-static int ext3_mark_dquot_dirty(struct dquot *dquot);
-static int ext3_write_info(struct super_block *sb, int type);
-static int ext3_quota_on(struct super_block *sb, int type, int format_id,
-			 struct path *path);
-static int ext3_quota_on_mount(struct super_block *sb, int type);
-static ssize_t ext3_quota_read(struct super_block *sb, int type, char *data,
-			       size_t len, loff_t off);
-static ssize_t ext3_quota_write(struct super_block *sb, int type,
-				const char *data, size_t len, loff_t off);
-static struct dquot **ext3_get_dquots(struct inode *inode)
-{
-	return EXT3_I(inode)->i_dquot;
-}
-
-static const struct dquot_operations ext3_quota_operations = {
-	.write_dquot	= ext3_write_dquot,
-	.acquire_dquot	= ext3_acquire_dquot,
-	.release_dquot	= ext3_release_dquot,
-	.mark_dirty	= ext3_mark_dquot_dirty,
-	.write_info	= ext3_write_info,
-	.alloc_dquot	= dquot_alloc,
-	.destroy_dquot	= dquot_destroy,
-};
-
-static const struct quotactl_ops ext3_qctl_operations = {
-	.quota_on	= ext3_quota_on,
-	.quota_off	= dquot_quota_off,
-	.quota_sync	= dquot_quota_sync,
-	.get_state	= dquot_get_state,
-	.set_info	= dquot_set_dqinfo,
-	.get_dqblk	= dquot_get_dqblk,
-	.set_dqblk	= dquot_set_dqblk
-};
-#endif
-
-static const struct super_operations ext3_sops = {
-	.alloc_inode	= ext3_alloc_inode,
-	.destroy_inode	= ext3_destroy_inode,
-	.write_inode	= ext3_write_inode,
-	.dirty_inode	= ext3_dirty_inode,
-	.drop_inode	= ext3_drop_inode,
-	.evict_inode	= ext3_evict_inode,
-	.put_super	= ext3_put_super,
-	.sync_fs	= ext3_sync_fs,
-	.freeze_fs	= ext3_freeze,
-	.unfreeze_fs	= ext3_unfreeze,
-	.statfs		= ext3_statfs,
-	.remount_fs	= ext3_remount,
-	.show_options	= ext3_show_options,
-#ifdef CONFIG_QUOTA
-	.quota_read	= ext3_quota_read,
-	.quota_write	= ext3_quota_write,
-	.get_dquots	= ext3_get_dquots,
-#endif
-	.bdev_try_to_free_page = bdev_try_to_free_page,
-};
-
-static const struct export_operations ext3_export_ops = {
-	.fh_to_dentry = ext3_fh_to_dentry,
-	.fh_to_parent = ext3_fh_to_parent,
-	.get_parent = ext3_get_parent,
-};
-
-enum {
-	Opt_bsd_df, Opt_minix_df, Opt_grpid, Opt_nogrpid,
-	Opt_resgid, Opt_resuid, Opt_sb, Opt_err_cont, Opt_err_panic, Opt_err_ro,
-	Opt_nouid32, Opt_nocheck, Opt_debug, Opt_oldalloc, Opt_orlov,
-	Opt_user_xattr, Opt_nouser_xattr, Opt_acl, Opt_noacl,
-	Opt_reservation, Opt_noreservation, Opt_noload, Opt_nobh, Opt_bh,
-	Opt_commit, Opt_journal_update, Opt_journal_inum, Opt_journal_dev,
-	Opt_journal_path,
-	Opt_abort, Opt_data_journal, Opt_data_ordered, Opt_data_writeback,
-	Opt_data_err_abort, Opt_data_err_ignore,
-	Opt_usrjquota, Opt_grpjquota, Opt_offusrjquota, Opt_offgrpjquota,
-	Opt_jqfmt_vfsold, Opt_jqfmt_vfsv0, Opt_jqfmt_vfsv1, Opt_quota,
-	Opt_noquota, Opt_ignore, Opt_barrier, Opt_nobarrier, Opt_err,
-	Opt_resize, Opt_usrquota, Opt_grpquota
-};
-
-static const match_table_t tokens = {
-	{Opt_bsd_df, "bsddf"},
-	{Opt_minix_df, "minixdf"},
-	{Opt_grpid, "grpid"},
-	{Opt_grpid, "bsdgroups"},
-	{Opt_nogrpid, "nogrpid"},
-	{Opt_nogrpid, "sysvgroups"},
-	{Opt_resgid, "resgid=%u"},
-	{Opt_resuid, "resuid=%u"},
-	{Opt_sb, "sb=%u"},
-	{Opt_err_cont, "errors=continue"},
-	{Opt_err_panic, "errors=panic"},
-	{Opt_err_ro, "errors=remount-ro"},
-	{Opt_nouid32, "nouid32"},
-	{Opt_nocheck, "nocheck"},
-	{Opt_nocheck, "check=none"},
-	{Opt_debug, "debug"},
-	{Opt_oldalloc, "oldalloc"},
-	{Opt_orlov, "orlov"},
-	{Opt_user_xattr, "user_xattr"},
-	{Opt_nouser_xattr, "nouser_xattr"},
-	{Opt_acl, "acl"},
-	{Opt_noacl, "noacl"},
-	{Opt_reservation, "reservation"},
-	{Opt_noreservation, "noreservation"},
-	{Opt_noload, "noload"},
-	{Opt_noload, "norecovery"},
-	{Opt_nobh, "nobh"},
-	{Opt_bh, "bh"},
-	{Opt_commit, "commit=%u"},
-	{Opt_journal_update, "journal=update"},
-	{Opt_journal_inum, "journal=%u"},
-	{Opt_journal_dev, "journal_dev=%u"},
-	{Opt_journal_path, "journal_path=%s"},
-	{Opt_abort, "abort"},
-	{Opt_data_journal, "data=journal"},
-	{Opt_data_ordered, "data=ordered"},
-	{Opt_data_writeback, "data=writeback"},
-	{Opt_data_err_abort, "data_err=abort"},
-	{Opt_data_err_ignore, "data_err=ignore"},
-	{Opt_offusrjquota, "usrjquota="},
-	{Opt_usrjquota, "usrjquota=%s"},
-	{Opt_offgrpjquota, "grpjquota="},
-	{Opt_grpjquota, "grpjquota=%s"},
-	{Opt_jqfmt_vfsold, "jqfmt=vfsold"},
-	{Opt_jqfmt_vfsv0, "jqfmt=vfsv0"},
-	{Opt_jqfmt_vfsv1, "jqfmt=vfsv1"},
-	{Opt_grpquota, "grpquota"},
-	{Opt_noquota, "noquota"},
-	{Opt_quota, "quota"},
-	{Opt_usrquota, "usrquota"},
-	{Opt_barrier, "barrier=%u"},
-	{Opt_barrier, "barrier"},
-	{Opt_nobarrier, "nobarrier"},
-	{Opt_resize, "resize"},
-	{Opt_err, NULL},
-};
-
-static ext3_fsblk_t get_sb_block(void **data, struct super_block *sb)
-{
-	ext3_fsblk_t	sb_block;
-	char		*options = (char *) *data;
-
-	if (!options || strncmp(options, "sb=", 3) != 0)
-		return 1;	/* Default location */
-	options += 3;
-	/*todo: use simple_strtoll with >32bit ext3 */
-	sb_block = simple_strtoul(options, &options, 0);
-	if (*options && *options != ',') {
-		ext3_msg(sb, KERN_ERR, "error: invalid sb specification: %s",
-		       (char *) *data);
-		return 1;
-	}
-	if (*options == ',')
-		options++;
-	*data = (void *) options;
-	return sb_block;
-}
-
-#ifdef CONFIG_QUOTA
-static int set_qf_name(struct super_block *sb, int qtype, substring_t *args)
-{
-	struct ext3_sb_info *sbi = EXT3_SB(sb);
-	char *qname;
-
-	if (sb_any_quota_loaded(sb) &&
-		!sbi->s_qf_names[qtype]) {
-		ext3_msg(sb, KERN_ERR,
-			"Cannot change journaled "
-			"quota options when quota turned on");
-		return 0;
-	}
-	qname = match_strdup(args);
-	if (!qname) {
-		ext3_msg(sb, KERN_ERR,
-			"Not enough memory for storing quotafile name");
-		return 0;
-	}
-	if (sbi->s_qf_names[qtype]) {
-		int same = !strcmp(sbi->s_qf_names[qtype], qname);
-
-		kfree(qname);
-		if (!same) {
-			ext3_msg(sb, KERN_ERR,
-				 "%s quota file already specified",
-				 QTYPE2NAME(qtype));
-		}
-		return same;
-	}
-	if (strchr(qname, '/')) {
-		ext3_msg(sb, KERN_ERR,
-			"quotafile must be on filesystem root");
-		kfree(qname);
-		return 0;
-	}
-	sbi->s_qf_names[qtype] = qname;
-	set_opt(sbi->s_mount_opt, QUOTA);
-	return 1;
-}
-
-static int clear_qf_name(struct super_block *sb, int qtype) {
-
-	struct ext3_sb_info *sbi = EXT3_SB(sb);
-
-	if (sb_any_quota_loaded(sb) &&
-		sbi->s_qf_names[qtype]) {
-		ext3_msg(sb, KERN_ERR, "Cannot change journaled quota options"
-			" when quota turned on");
-		return 0;
-	}
-	if (sbi->s_qf_names[qtype]) {
-		kfree(sbi->s_qf_names[qtype]);
-		sbi->s_qf_names[qtype] = NULL;
-	}
-	return 1;
-}
-#endif
-
-static int parse_options (char *options, struct super_block *sb,
-			  unsigned int *inum, unsigned long *journal_devnum,
-			  ext3_fsblk_t *n_blocks_count, int is_remount)
-{
-	struct ext3_sb_info *sbi = EXT3_SB(sb);
-	char * p;
-	substring_t args[MAX_OPT_ARGS];
-	int data_opt = 0;
-	int option;
-	kuid_t uid;
-	kgid_t gid;
-	char *journal_path;
-	struct inode *journal_inode;
-	struct path path;
-	int error;
-
-#ifdef CONFIG_QUOTA
-	int qfmt;
-#endif
-
-	if (!options)
-		return 1;
-
-	while ((p = strsep (&options, ",")) != NULL) {
-		int token;
-		if (!*p)
-			continue;
-		/*
-		 * Initialize args struct so we know whether arg was
-		 * found; some options take optional arguments.
-		 */
-		args[0].to = args[0].from = NULL;
-		token = match_token(p, tokens, args);
-		switch (token) {
-		case Opt_bsd_df:
-			clear_opt (sbi->s_mount_opt, MINIX_DF);
-			break;
-		case Opt_minix_df:
-			set_opt (sbi->s_mount_opt, MINIX_DF);
-			break;
-		case Opt_grpid:
-			set_opt (sbi->s_mount_opt, GRPID);
-			break;
-		case Opt_nogrpid:
-			clear_opt (sbi->s_mount_opt, GRPID);
-			break;
-		case Opt_resuid:
-			if (match_int(&args[0], &option))
-				return 0;
-			uid = make_kuid(current_user_ns(), option);
-			if (!uid_valid(uid)) {
-				ext3_msg(sb, KERN_ERR, "Invalid uid value %d", option);
-				return 0;
-
-			}
-			sbi->s_resuid = uid;
-			break;
-		case Opt_resgid:
-			if (match_int(&args[0], &option))
-				return 0;
-			gid = make_kgid(current_user_ns(), option);
-			if (!gid_valid(gid)) {
-				ext3_msg(sb, KERN_ERR, "Invalid gid value %d", option);
-				return 0;
-			}
-			sbi->s_resgid = gid;
-			break;
-		case Opt_sb:
-			/* handled by get_sb_block() instead of here */
-			/* *sb_block = match_int(&args[0]); */
-			break;
-		case Opt_err_panic:
-			clear_opt (sbi->s_mount_opt, ERRORS_CONT);
-			clear_opt (sbi->s_mount_opt, ERRORS_RO);
-			set_opt (sbi->s_mount_opt, ERRORS_PANIC);
-			break;
-		case Opt_err_ro:
-			clear_opt (sbi->s_mount_opt, ERRORS_CONT);
-			clear_opt (sbi->s_mount_opt, ERRORS_PANIC);
-			set_opt (sbi->s_mount_opt, ERRORS_RO);
-			break;
-		case Opt_err_cont:
-			clear_opt (sbi->s_mount_opt, ERRORS_RO);
-			clear_opt (sbi->s_mount_opt, ERRORS_PANIC);
-			set_opt (sbi->s_mount_opt, ERRORS_CONT);
-			break;
-		case Opt_nouid32:
-			set_opt (sbi->s_mount_opt, NO_UID32);
-			break;
-		case Opt_nocheck:
-			clear_opt (sbi->s_mount_opt, CHECK);
-			break;
-		case Opt_debug:
-			set_opt (sbi->s_mount_opt, DEBUG);
-			break;
-		case Opt_oldalloc:
-			ext3_msg(sb, KERN_WARNING,
-				"Ignoring deprecated oldalloc option");
-			break;
-		case Opt_orlov:
-			ext3_msg(sb, KERN_WARNING,
-				"Ignoring deprecated orlov option");
-			break;
-#ifdef CONFIG_EXT3_FS_XATTR
-		case Opt_user_xattr:
-			set_opt (sbi->s_mount_opt, XATTR_USER);
-			break;
-		case Opt_nouser_xattr:
-			clear_opt (sbi->s_mount_opt, XATTR_USER);
-			break;
-#else
-		case Opt_user_xattr:
-		case Opt_nouser_xattr:
-			ext3_msg(sb, KERN_INFO,
-				"(no)user_xattr options not supported");
-			break;
-#endif
-#ifdef CONFIG_EXT3_FS_POSIX_ACL
-		case Opt_acl:
-			set_opt(sbi->s_mount_opt, POSIX_ACL);
-			break;
-		case Opt_noacl:
-			clear_opt(sbi->s_mount_opt, POSIX_ACL);
-			break;
-#else
-		case Opt_acl:
-		case Opt_noacl:
-			ext3_msg(sb, KERN_INFO,
-				"(no)acl options not supported");
-			break;
-#endif
-		case Opt_reservation:
-			set_opt(sbi->s_mount_opt, RESERVATION);
-			break;
-		case Opt_noreservation:
-			clear_opt(sbi->s_mount_opt, RESERVATION);
-			break;
-		case Opt_journal_update:
-			/* @@@ FIXME */
-			/* Eventually we will want to be able to create
-			   a journal file here.  For now, only allow the
-			   user to specify an existing inode to be the
-			   journal file. */
-			if (is_remount) {
-				ext3_msg(sb, KERN_ERR, "error: cannot specify "
-					"journal on remount");
-				return 0;
-			}
-			set_opt (sbi->s_mount_opt, UPDATE_JOURNAL);
-			break;
-		case Opt_journal_inum:
-			if (is_remount) {
-				ext3_msg(sb, KERN_ERR, "error: cannot specify "
-				       "journal on remount");
-				return 0;
-			}
-			if (match_int(&args[0], &option))
-				return 0;
-			*inum = option;
-			break;
-		case Opt_journal_dev:
-			if (is_remount) {
-				ext3_msg(sb, KERN_ERR, "error: cannot specify "
-				       "journal on remount");
-				return 0;
-			}
-			if (match_int(&args[0], &option))
-				return 0;
-			*journal_devnum = option;
-			break;
-		case Opt_journal_path:
-			if (is_remount) {
-				ext3_msg(sb, KERN_ERR, "error: cannot specify "
-				       "journal on remount");
-				return 0;
-			}
-
-			journal_path = match_strdup(&args[0]);
-			if (!journal_path) {
-				ext3_msg(sb, KERN_ERR, "error: could not dup "
-					"journal device string");
-				return 0;
-			}
-
-			error = kern_path(journal_path, LOOKUP_FOLLOW, &path);
-			if (error) {
-				ext3_msg(sb, KERN_ERR, "error: could not find "
-					"journal device path: error %d", error);
-				kfree(journal_path);
-				return 0;
-			}
-
-			journal_inode = d_inode(path.dentry);
-			if (!S_ISBLK(journal_inode->i_mode)) {
-				ext3_msg(sb, KERN_ERR, "error: journal path %s "
-					"is not a block device", journal_path);
-				path_put(&path);
-				kfree(journal_path);
-				return 0;
-			}
-
-			*journal_devnum = new_encode_dev(journal_inode->i_rdev);
-			path_put(&path);
-			kfree(journal_path);
-			break;
-		case Opt_noload:
-			set_opt (sbi->s_mount_opt, NOLOAD);
-			break;
-		case Opt_commit:
-			if (match_int(&args[0], &option))
-				return 0;
-			if (option < 0)
-				return 0;
-			if (option == 0)
-				option = JBD_DEFAULT_MAX_COMMIT_AGE;
-			sbi->s_commit_interval = HZ * option;
-			break;
-		case Opt_data_journal:
-			data_opt = EXT3_MOUNT_JOURNAL_DATA;
-			goto datacheck;
-		case Opt_data_ordered:
-			data_opt = EXT3_MOUNT_ORDERED_DATA;
-			goto datacheck;
-		case Opt_data_writeback:
-			data_opt = EXT3_MOUNT_WRITEBACK_DATA;
-		datacheck:
-			if (is_remount) {
-				if (test_opt(sb, DATA_FLAGS) == data_opt)
-					break;
-				ext3_msg(sb, KERN_ERR,
-					"error: cannot change "
-					"data mode on remount. The filesystem "
-					"is mounted in data=%s mode and you "
-					"try to remount it in data=%s mode.",
-					data_mode_string(test_opt(sb,
-							DATA_FLAGS)),
-					data_mode_string(data_opt));
-				return 0;
-			} else {
-				clear_opt(sbi->s_mount_opt, DATA_FLAGS);
-				sbi->s_mount_opt |= data_opt;
-			}
-			break;
-		case Opt_data_err_abort:
-			set_opt(sbi->s_mount_opt, DATA_ERR_ABORT);
-			break;
-		case Opt_data_err_ignore:
-			clear_opt(sbi->s_mount_opt, DATA_ERR_ABORT);
-			break;
-#ifdef CONFIG_QUOTA
-		case Opt_usrjquota:
-			if (!set_qf_name(sb, USRQUOTA, &args[0]))
-				return 0;
-			break;
-		case Opt_grpjquota:
-			if (!set_qf_name(sb, GRPQUOTA, &args[0]))
-				return 0;
-			break;
-		case Opt_offusrjquota:
-			if (!clear_qf_name(sb, USRQUOTA))
-				return 0;
-			break;
-		case Opt_offgrpjquota:
-			if (!clear_qf_name(sb, GRPQUOTA))
-				return 0;
-			break;
-		case Opt_jqfmt_vfsold:
-			qfmt = QFMT_VFS_OLD;
-			goto set_qf_format;
-		case Opt_jqfmt_vfsv0:
-			qfmt = QFMT_VFS_V0;
-			goto set_qf_format;
-		case Opt_jqfmt_vfsv1:
-			qfmt = QFMT_VFS_V1;
-set_qf_format:
-			if (sb_any_quota_loaded(sb) &&
-			    sbi->s_jquota_fmt != qfmt) {
-				ext3_msg(sb, KERN_ERR, "error: cannot change "
-					"journaled quota options when "
-					"quota turned on.");
-				return 0;
-			}
-			sbi->s_jquota_fmt = qfmt;
-			break;
-		case Opt_quota:
-		case Opt_usrquota:
-			set_opt(sbi->s_mount_opt, QUOTA);
-			set_opt(sbi->s_mount_opt, USRQUOTA);
-			break;
-		case Opt_grpquota:
-			set_opt(sbi->s_mount_opt, QUOTA);
-			set_opt(sbi->s_mount_opt, GRPQUOTA);
-			break;
-		case Opt_noquota:
-			if (sb_any_quota_loaded(sb)) {
-				ext3_msg(sb, KERN_ERR, "error: cannot change "
-					"quota options when quota turned on.");
-				return 0;
-			}
-			clear_opt(sbi->s_mount_opt, QUOTA);
-			clear_opt(sbi->s_mount_opt, USRQUOTA);
-			clear_opt(sbi->s_mount_opt, GRPQUOTA);
-			break;
-#else
-		case Opt_quota:
-		case Opt_usrquota:
-		case Opt_grpquota:
-			ext3_msg(sb, KERN_ERR,
-				"error: quota options not supported.");
-			break;
-		case Opt_usrjquota:
-		case Opt_grpjquota:
-		case Opt_offusrjquota:
-		case Opt_offgrpjquota:
-		case Opt_jqfmt_vfsold:
-		case Opt_jqfmt_vfsv0:
-		case Opt_jqfmt_vfsv1:
-			ext3_msg(sb, KERN_ERR,
-				"error: journaled quota options not "
-				"supported.");
-			break;
-		case Opt_noquota:
-			break;
-#endif
-		case Opt_abort:
-			set_opt(sbi->s_mount_opt, ABORT);
-			break;
-		case Opt_nobarrier:
-			clear_opt(sbi->s_mount_opt, BARRIER);
-			break;
-		case Opt_barrier:
-			if (args[0].from) {
-				if (match_int(&args[0], &option))
-					return 0;
-			} else
-				option = 1;	/* No argument, default to 1 */
-			if (option)
-				set_opt(sbi->s_mount_opt, BARRIER);
-			else
-				clear_opt(sbi->s_mount_opt, BARRIER);
-			break;
-		case Opt_ignore:
-			break;
-		case Opt_resize:
-			if (!is_remount) {
-				ext3_msg(sb, KERN_ERR,
-					"error: resize option only available "
-					"for remount");
-				return 0;
-			}
-			if (match_int(&args[0], &option) != 0)
-				return 0;
-			*n_blocks_count = option;
-			break;
-		case Opt_nobh:
-			ext3_msg(sb, KERN_WARNING,
-				"warning: ignoring deprecated nobh option");
-			break;
-		case Opt_bh:
-			ext3_msg(sb, KERN_WARNING,
-				"warning: ignoring deprecated bh option");
-			break;
-		default:
-			ext3_msg(sb, KERN_ERR,
-				"error: unrecognized mount option \"%s\" "
-				"or missing value", p);
-			return 0;
-		}
-	}
-#ifdef CONFIG_QUOTA
-	if (sbi->s_qf_names[USRQUOTA] || sbi->s_qf_names[GRPQUOTA]) {
-		if (test_opt(sb, USRQUOTA) && sbi->s_qf_names[USRQUOTA])
-			clear_opt(sbi->s_mount_opt, USRQUOTA);
-		if (test_opt(sb, GRPQUOTA) && sbi->s_qf_names[GRPQUOTA])
-			clear_opt(sbi->s_mount_opt, GRPQUOTA);
-
-		if (test_opt(sb, GRPQUOTA) || test_opt(sb, USRQUOTA)) {
-			ext3_msg(sb, KERN_ERR, "error: old and new quota "
-					"format mixing.");
-			return 0;
-		}
-
-		if (!sbi->s_jquota_fmt) {
-			ext3_msg(sb, KERN_ERR, "error: journaled quota format "
-					"not specified.");
-			return 0;
-		}
-	}
-#endif
-	return 1;
-}
-
-static int ext3_setup_super(struct super_block *sb, struct ext3_super_block *es,
-			    int read_only)
-{
-	struct ext3_sb_info *sbi = EXT3_SB(sb);
-	int res = 0;
-
-	if (le32_to_cpu(es->s_rev_level) > EXT3_MAX_SUPP_REV) {
-		ext3_msg(sb, KERN_ERR,
-			"error: revision level too high, "
-			"forcing read-only mode");
-		res = MS_RDONLY;
-	}
-	if (read_only)
-		return res;
-	if (!(sbi->s_mount_state & EXT3_VALID_FS))
-		ext3_msg(sb, KERN_WARNING,
-			"warning: mounting unchecked fs, "
-			"running e2fsck is recommended");
-	else if ((sbi->s_mount_state & EXT3_ERROR_FS))
-		ext3_msg(sb, KERN_WARNING,
-			"warning: mounting fs with errors, "
-			"running e2fsck is recommended");
-	else if ((__s16) le16_to_cpu(es->s_max_mnt_count) > 0 &&
-		 le16_to_cpu(es->s_mnt_count) >=
-			le16_to_cpu(es->s_max_mnt_count))
-		ext3_msg(sb, KERN_WARNING,
-			"warning: maximal mount count reached, "
-			"running e2fsck is recommended");
-	else if (le32_to_cpu(es->s_checkinterval) &&
-		(le32_to_cpu(es->s_lastcheck) +
-			le32_to_cpu(es->s_checkinterval) <= get_seconds()))
-		ext3_msg(sb, KERN_WARNING,
-			"warning: checktime reached, "
-			"running e2fsck is recommended");
-#if 0
-		/* @@@ We _will_ want to clear the valid bit if we find
-                   inconsistencies, to force a fsck at reboot.  But for
-                   a plain journaled filesystem we can keep it set as
-                   valid forever! :) */
-	es->s_state &= cpu_to_le16(~EXT3_VALID_FS);
-#endif
-	if (!le16_to_cpu(es->s_max_mnt_count))
-		es->s_max_mnt_count = cpu_to_le16(EXT3_DFL_MAX_MNT_COUNT);
-	le16_add_cpu(&es->s_mnt_count, 1);
-	es->s_mtime = cpu_to_le32(get_seconds());
-	ext3_update_dynamic_rev(sb);
-	EXT3_SET_INCOMPAT_FEATURE(sb, EXT3_FEATURE_INCOMPAT_RECOVER);
-
-	ext3_commit_super(sb, es, 1);
-	if (test_opt(sb, DEBUG))
-		ext3_msg(sb, KERN_INFO, "[bs=%lu, gc=%lu, "
-				"bpg=%lu, ipg=%lu, mo=%04lx]",
-			sb->s_blocksize,
-			sbi->s_groups_count,
-			EXT3_BLOCKS_PER_GROUP(sb),
-			EXT3_INODES_PER_GROUP(sb),
-			sbi->s_mount_opt);
-
-	if (EXT3_SB(sb)->s_journal->j_inode == NULL) {
-		char b[BDEVNAME_SIZE];
-		ext3_msg(sb, KERN_INFO, "using external journal on %s",
-			bdevname(EXT3_SB(sb)->s_journal->j_dev, b));
-	} else {
-		ext3_msg(sb, KERN_INFO, "using internal journal");
-	}
-	cleancache_init_fs(sb);
-	return res;
-}
-
-/* Called at mount-time, super-block is locked */
-static int ext3_check_descriptors(struct super_block *sb)
-{
-	struct ext3_sb_info *sbi = EXT3_SB(sb);
-	int i;
-
-	ext3_debug ("Checking group descriptors");
-
-	for (i = 0; i < sbi->s_groups_count; i++) {
-		struct ext3_group_desc *gdp = ext3_get_group_desc(sb, i, NULL);
-		ext3_fsblk_t first_block = ext3_group_first_block_no(sb, i);
-		ext3_fsblk_t last_block;
-
-		if (i == sbi->s_groups_count - 1)
-			last_block = le32_to_cpu(sbi->s_es->s_blocks_count) - 1;
-		else
-			last_block = first_block +
-				(EXT3_BLOCKS_PER_GROUP(sb) - 1);
-
-		if (le32_to_cpu(gdp->bg_block_bitmap) < first_block ||
-		    le32_to_cpu(gdp->bg_block_bitmap) > last_block)
-		{
-			ext3_error (sb, "ext3_check_descriptors",
-				    "Block bitmap for group %d"
-				    " not in group (block %lu)!",
-				    i, (unsigned long)
-					le32_to_cpu(gdp->bg_block_bitmap));
-			return 0;
-		}
-		if (le32_to_cpu(gdp->bg_inode_bitmap) < first_block ||
-		    le32_to_cpu(gdp->bg_inode_bitmap) > last_block)
-		{
-			ext3_error (sb, "ext3_check_descriptors",
-				    "Inode bitmap for group %d"
-				    " not in group (block %lu)!",
-				    i, (unsigned long)
-					le32_to_cpu(gdp->bg_inode_bitmap));
-			return 0;
-		}
-		if (le32_to_cpu(gdp->bg_inode_table) < first_block ||
-		    le32_to_cpu(gdp->bg_inode_table) + sbi->s_itb_per_group - 1 >
-		    last_block)
-		{
-			ext3_error (sb, "ext3_check_descriptors",
-				    "Inode table for group %d"
-				    " not in group (block %lu)!",
-				    i, (unsigned long)
-					le32_to_cpu(gdp->bg_inode_table));
-			return 0;
-		}
-	}
-
-	sbi->s_es->s_free_blocks_count=cpu_to_le32(ext3_count_free_blocks(sb));
-	sbi->s_es->s_free_inodes_count=cpu_to_le32(ext3_count_free_inodes(sb));
-	return 1;
-}
-
-
-/* ext3_orphan_cleanup() walks a singly-linked list of inodes (starting at
- * the superblock) which were deleted from all directories, but held open by
- * a process at the time of a crash.  We walk the list and try to delete these
- * inodes at recovery time (only with a read-write filesystem).
- *
- * In order to keep the orphan inode chain consistent during traversal (in
- * case of crash during recovery), we link each inode into the superblock
- * orphan list_head and handle it the same way as an inode deletion during
- * normal operation (which journals the operations for us).
- *
- * We only do an iget() and an iput() on each inode, which is very safe if we
- * accidentally point at an in-use or already deleted inode.  The worst that
- * can happen in this case is that we get a "bit already cleared" message from
- * ext3_free_inode().  The only reason we would point at a wrong inode is if
- * e2fsck was run on this filesystem, and it must have already done the orphan
- * inode cleanup for us, so we can safely abort without any further action.
- */
-static void ext3_orphan_cleanup (struct super_block * sb,
-				 struct ext3_super_block * es)
-{
-	unsigned int s_flags = sb->s_flags;
-	int nr_orphans = 0, nr_truncates = 0;
-#ifdef CONFIG_QUOTA
-	int i;
-#endif
-	if (!es->s_last_orphan) {
-		jbd_debug(4, "no orphan inodes to clean up\n");
-		return;
-	}
-
-	if (bdev_read_only(sb->s_bdev)) {
-		ext3_msg(sb, KERN_ERR, "error: write access "
-			"unavailable, skipping orphan cleanup.");
-		return;
-	}
-
-	/* Check if feature set allows readwrite operations */
-	if (EXT3_HAS_RO_COMPAT_FEATURE(sb, ~EXT3_FEATURE_RO_COMPAT_SUPP)) {
-		ext3_msg(sb, KERN_INFO, "Skipping orphan cleanup due to "
-			 "unknown ROCOMPAT features");
-		return;
-	}
-
-	if (EXT3_SB(sb)->s_mount_state & EXT3_ERROR_FS) {
-		/* don't clear list on RO mount w/ errors */
-		if (es->s_last_orphan && !(s_flags & MS_RDONLY)) {
-			jbd_debug(1, "Errors on filesystem, "
-				  "clearing orphan list.\n");
-			es->s_last_orphan = 0;
-		}
-		jbd_debug(1, "Skipping orphan recovery on fs with errors.\n");
-		return;
-	}
-
-	if (s_flags & MS_RDONLY) {
-		ext3_msg(sb, KERN_INFO, "orphan cleanup on readonly fs");
-		sb->s_flags &= ~MS_RDONLY;
-	}
-#ifdef CONFIG_QUOTA
-	/* Needed for iput() to work correctly and not trash data */
-	sb->s_flags |= MS_ACTIVE;
-	/* Turn on quotas so that they are updated correctly */
-	for (i = 0; i < EXT3_MAXQUOTAS; i++) {
-		if (EXT3_SB(sb)->s_qf_names[i]) {
-			int ret = ext3_quota_on_mount(sb, i);
-			if (ret < 0)
-				ext3_msg(sb, KERN_ERR,
-					"error: cannot turn on journaled "
-					"quota: %d", ret);
-		}
-	}
-#endif
-
-	while (es->s_last_orphan) {
-		struct inode *inode;
-
-		inode = ext3_orphan_get(sb, le32_to_cpu(es->s_last_orphan));
-		if (IS_ERR(inode)) {
-			es->s_last_orphan = 0;
-			break;
-		}
-
-		list_add(&EXT3_I(inode)->i_orphan, &EXT3_SB(sb)->s_orphan);
-		dquot_initialize(inode);
-		if (inode->i_nlink) {
-			printk(KERN_DEBUG
-				"%s: truncating inode %lu to %Ld bytes\n",
-				__func__, inode->i_ino, inode->i_size);
-			jbd_debug(2, "truncating inode %lu to %Ld bytes\n",
-				  inode->i_ino, inode->i_size);
-			ext3_truncate(inode);
-			nr_truncates++;
-		} else {
-			printk(KERN_DEBUG
-				"%s: deleting unreferenced inode %lu\n",
-				__func__, inode->i_ino);
-			jbd_debug(2, "deleting unreferenced inode %lu\n",
-				  inode->i_ino);
-			nr_orphans++;
-		}
-		iput(inode);  /* The delete magic happens here! */
-	}
-
-#define PLURAL(x) (x), ((x)==1) ? "" : "s"
-
-	if (nr_orphans)
-		ext3_msg(sb, KERN_INFO, "%d orphan inode%s deleted",
-		       PLURAL(nr_orphans));
-	if (nr_truncates)
-		ext3_msg(sb, KERN_INFO, "%d truncate%s cleaned up",
-		       PLURAL(nr_truncates));
-#ifdef CONFIG_QUOTA
-	/* Turn quotas off */
-	for (i = 0; i < EXT3_MAXQUOTAS; i++) {
-		if (sb_dqopt(sb)->files[i])
-			dquot_quota_off(sb, i);
-	}
-#endif
-	sb->s_flags = s_flags; /* Restore MS_RDONLY status */
-}
-
-/*
- * Maximal file size.  There is a direct, and {,double-,triple-}indirect
- * block limit, and also a limit of (2^32 - 1) 512-byte sectors in i_blocks.
- * We need to be 1 filesystem block less than the 2^32 sector limit.
- */
-static loff_t ext3_max_size(int bits)
-{
-	loff_t res = EXT3_NDIR_BLOCKS;
-	int meta_blocks;
-	loff_t upper_limit;
-
-	/* This is calculated to be the largest file size for a
-	 * dense, file such that the total number of
-	 * sectors in the file, including data and all indirect blocks,
-	 * does not exceed 2^32 -1
-	 * __u32 i_blocks representing the total number of
-	 * 512 bytes blocks of the file
-	 */
-	upper_limit = (1LL << 32) - 1;
-
-	/* total blocks in file system block size */
-	upper_limit >>= (bits - 9);
-
-
-	/* indirect blocks */
-	meta_blocks = 1;
-	/* double indirect blocks */
-	meta_blocks += 1 + (1LL << (bits-2));
-	/* tripple indirect blocks */
-	meta_blocks += 1 + (1LL << (bits-2)) + (1LL << (2*(bits-2)));
-
-	upper_limit -= meta_blocks;
-	upper_limit <<= bits;
-
-	res += 1LL << (bits-2);
-	res += 1LL << (2*(bits-2));
-	res += 1LL << (3*(bits-2));
-	res <<= bits;
-	if (res > upper_limit)
-		res = upper_limit;
-
-	if (res > MAX_LFS_FILESIZE)
-		res = MAX_LFS_FILESIZE;
-
-	return res;
-}
-
-static ext3_fsblk_t descriptor_loc(struct super_block *sb,
-				    ext3_fsblk_t logic_sb_block,
-				    int nr)
-{
-	struct ext3_sb_info *sbi = EXT3_SB(sb);
-	unsigned long bg, first_meta_bg;
-	int has_super = 0;
-
-	first_meta_bg = le32_to_cpu(sbi->s_es->s_first_meta_bg);
-
-	if (!EXT3_HAS_INCOMPAT_FEATURE(sb, EXT3_FEATURE_INCOMPAT_META_BG) ||
-	    nr < first_meta_bg)
-		return (logic_sb_block + nr + 1);
-	bg = sbi->s_desc_per_block * nr;
-	if (ext3_bg_has_super(sb, bg))
-		has_super = 1;
-	return (has_super + ext3_group_first_block_no(sb, bg));
-}
-
-
-static int ext3_fill_super (struct super_block *sb, void *data, int silent)
-{
-	struct buffer_head * bh;
-	struct ext3_super_block *es = NULL;
-	struct ext3_sb_info *sbi;
-	ext3_fsblk_t block;
-	ext3_fsblk_t sb_block = get_sb_block(&data, sb);
-	ext3_fsblk_t logic_sb_block;
-	unsigned long offset = 0;
-	unsigned int journal_inum = 0;
-	unsigned long journal_devnum = 0;
-	unsigned long def_mount_opts;
-	struct inode *root;
-	int blocksize;
-	int hblock;
-	int db_count;
-	int i;
-	int needs_recovery;
-	int ret = -EINVAL;
-	__le32 features;
-	int err;
-
-	sbi = kzalloc(sizeof(*sbi), GFP_KERNEL);
-	if (!sbi)
-		return -ENOMEM;
-
-	sbi->s_blockgroup_lock =
-		kzalloc(sizeof(struct blockgroup_lock), GFP_KERNEL);
-	if (!sbi->s_blockgroup_lock) {
-		kfree(sbi);
-		return -ENOMEM;
-	}
-	sb->s_fs_info = sbi;
-	sbi->s_sb_block = sb_block;
-
-	blocksize = sb_min_blocksize(sb, EXT3_MIN_BLOCK_SIZE);
-	if (!blocksize) {
-		ext3_msg(sb, KERN_ERR, "error: unable to set blocksize");
-		goto out_fail;
-	}
-
-	/*
-	 * The ext3 superblock will not be buffer aligned for other than 1kB
-	 * block sizes.  We need to calculate the offset from buffer start.
-	 */
-	if (blocksize != EXT3_MIN_BLOCK_SIZE) {
-		logic_sb_block = (sb_block * EXT3_MIN_BLOCK_SIZE) / blocksize;
-		offset = (sb_block * EXT3_MIN_BLOCK_SIZE) % blocksize;
-	} else {
-		logic_sb_block = sb_block;
-	}
-
-	if (!(bh = sb_bread(sb, logic_sb_block))) {
-		ext3_msg(sb, KERN_ERR, "error: unable to read superblock");
-		goto out_fail;
-	}
-	/*
-	 * Note: s_es must be initialized as soon as possible because
-	 *       some ext3 macro-instructions depend on its value
-	 */
-	es = (struct ext3_super_block *) (bh->b_data + offset);
-	sbi->s_es = es;
-	sb->s_magic = le16_to_cpu(es->s_magic);
-	if (sb->s_magic != EXT3_SUPER_MAGIC)
-		goto cantfind_ext3;
-
-	/* Set defaults before we parse the mount options */
-	def_mount_opts = le32_to_cpu(es->s_default_mount_opts);
-	if (def_mount_opts & EXT3_DEFM_DEBUG)
-		set_opt(sbi->s_mount_opt, DEBUG);
-	if (def_mount_opts & EXT3_DEFM_BSDGROUPS)
-		set_opt(sbi->s_mount_opt, GRPID);
-	if (def_mount_opts & EXT3_DEFM_UID16)
-		set_opt(sbi->s_mount_opt, NO_UID32);
-#ifdef CONFIG_EXT3_FS_XATTR
-	if (def_mount_opts & EXT3_DEFM_XATTR_USER)
-		set_opt(sbi->s_mount_opt, XATTR_USER);
-#endif
-#ifdef CONFIG_EXT3_FS_POSIX_ACL
-	if (def_mount_opts & EXT3_DEFM_ACL)
-		set_opt(sbi->s_mount_opt, POSIX_ACL);
-#endif
-	if ((def_mount_opts & EXT3_DEFM_JMODE) == EXT3_DEFM_JMODE_DATA)
-		set_opt(sbi->s_mount_opt, JOURNAL_DATA);
-	else if ((def_mount_opts & EXT3_DEFM_JMODE) == EXT3_DEFM_JMODE_ORDERED)
-		set_opt(sbi->s_mount_opt, ORDERED_DATA);
-	else if ((def_mount_opts & EXT3_DEFM_JMODE) == EXT3_DEFM_JMODE_WBACK)
-		set_opt(sbi->s_mount_opt, WRITEBACK_DATA);
-
-	if (le16_to_cpu(sbi->s_es->s_errors) == EXT3_ERRORS_PANIC)
-		set_opt(sbi->s_mount_opt, ERRORS_PANIC);
-	else if (le16_to_cpu(sbi->s_es->s_errors) == EXT3_ERRORS_CONTINUE)
-		set_opt(sbi->s_mount_opt, ERRORS_CONT);
-	else
-		set_opt(sbi->s_mount_opt, ERRORS_RO);
-
-	sbi->s_resuid = make_kuid(&init_user_ns, le16_to_cpu(es->s_def_resuid));
-	sbi->s_resgid = make_kgid(&init_user_ns, le16_to_cpu(es->s_def_resgid));
-
-	/* enable barriers by default */
-	set_opt(sbi->s_mount_opt, BARRIER);
-	set_opt(sbi->s_mount_opt, RESERVATION);
-
-	if (!parse_options ((char *) data, sb, &journal_inum, &journal_devnum,
-			    NULL, 0))
-		goto failed_mount;
-
-	sb->s_flags = (sb->s_flags & ~MS_POSIXACL) |
-		(test_opt(sb, POSIX_ACL) ? MS_POSIXACL : 0);
-
-	if (le32_to_cpu(es->s_rev_level) == EXT3_GOOD_OLD_REV &&
-	    (EXT3_HAS_COMPAT_FEATURE(sb, ~0U) ||
-	     EXT3_HAS_RO_COMPAT_FEATURE(sb, ~0U) ||
-	     EXT3_HAS_INCOMPAT_FEATURE(sb, ~0U)))
-		ext3_msg(sb, KERN_WARNING,
-			"warning: feature flags set on rev 0 fs, "
-			"running e2fsck is recommended");
-	/*
-	 * Check feature flags regardless of the revision level, since we
-	 * previously didn't change the revision level when setting the flags,
-	 * so there is a chance incompat flags are set on a rev 0 filesystem.
-	 */
-	features = EXT3_HAS_INCOMPAT_FEATURE(sb, ~EXT3_FEATURE_INCOMPAT_SUPP);
-	if (features) {
-		ext3_msg(sb, KERN_ERR,
-			"error: couldn't mount because of unsupported "
-			"optional features (%x)", le32_to_cpu(features));
-		goto failed_mount;
-	}
-	features = EXT3_HAS_RO_COMPAT_FEATURE(sb, ~EXT3_FEATURE_RO_COMPAT_SUPP);
-	if (!(sb->s_flags & MS_RDONLY) && features) {
-		ext3_msg(sb, KERN_ERR,
-			"error: couldn't mount RDWR because of unsupported "
-			"optional features (%x)", le32_to_cpu(features));
-		goto failed_mount;
-	}
-	blocksize = BLOCK_SIZE << le32_to_cpu(es->s_log_block_size);
-
-	if (blocksize < EXT3_MIN_BLOCK_SIZE ||
-	    blocksize > EXT3_MAX_BLOCK_SIZE) {
-		ext3_msg(sb, KERN_ERR,
-			"error: couldn't mount because of unsupported "
-			"filesystem blocksize %d", blocksize);
-		goto failed_mount;
-	}
-
-	hblock = bdev_logical_block_size(sb->s_bdev);
-	if (sb->s_blocksize != blocksize) {
-		/*
-		 * Make sure the blocksize for the filesystem is larger
-		 * than the hardware sectorsize for the machine.
-		 */
-		if (blocksize < hblock) {
-			ext3_msg(sb, KERN_ERR,
-				"error: fsblocksize %d too small for "
-				"hardware sectorsize %d", blocksize, hblock);
-			goto failed_mount;
-		}
-
-		brelse (bh);
-		if (!sb_set_blocksize(sb, blocksize)) {
-			ext3_msg(sb, KERN_ERR,
-				"error: bad blocksize %d", blocksize);
-			goto out_fail;
-		}
-		logic_sb_block = (sb_block * EXT3_MIN_BLOCK_SIZE) / blocksize;
-		offset = (sb_block * EXT3_MIN_BLOCK_SIZE) % blocksize;
-		bh = sb_bread(sb, logic_sb_block);
-		if (!bh) {
-			ext3_msg(sb, KERN_ERR,
-			       "error: can't read superblock on 2nd try");
-			goto failed_mount;
-		}
-		es = (struct ext3_super_block *)(bh->b_data + offset);
-		sbi->s_es = es;
-		if (es->s_magic != cpu_to_le16(EXT3_SUPER_MAGIC)) {
-			ext3_msg(sb, KERN_ERR,
-				"error: magic mismatch");
-			goto failed_mount;
-		}
-	}
-
-	sb->s_maxbytes = ext3_max_size(sb->s_blocksize_bits);
-
-	if (le32_to_cpu(es->s_rev_level) == EXT3_GOOD_OLD_REV) {
-		sbi->s_inode_size = EXT3_GOOD_OLD_INODE_SIZE;
-		sbi->s_first_ino = EXT3_GOOD_OLD_FIRST_INO;
-	} else {
-		sbi->s_inode_size = le16_to_cpu(es->s_inode_size);
-		sbi->s_first_ino = le32_to_cpu(es->s_first_ino);
-		if ((sbi->s_inode_size < EXT3_GOOD_OLD_INODE_SIZE) ||
-		    (!is_power_of_2(sbi->s_inode_size)) ||
-		    (sbi->s_inode_size > blocksize)) {
-			ext3_msg(sb, KERN_ERR,
-				"error: unsupported inode size: %d",
-				sbi->s_inode_size);
-			goto failed_mount;
-		}
-	}
-	sbi->s_frag_size = EXT3_MIN_FRAG_SIZE <<
-				   le32_to_cpu(es->s_log_frag_size);
-	if (blocksize != sbi->s_frag_size) {
-		ext3_msg(sb, KERN_ERR,
-		       "error: fragsize %lu != blocksize %u (unsupported)",
-		       sbi->s_frag_size, blocksize);
-		goto failed_mount;
-	}
-	sbi->s_frags_per_block = 1;
-	sbi->s_blocks_per_group = le32_to_cpu(es->s_blocks_per_group);
-	sbi->s_frags_per_group = le32_to_cpu(es->s_frags_per_group);
-	sbi->s_inodes_per_group = le32_to_cpu(es->s_inodes_per_group);
-	if (EXT3_INODE_SIZE(sb) == 0 || EXT3_INODES_PER_GROUP(sb) == 0)
-		goto cantfind_ext3;
-	sbi->s_inodes_per_block = blocksize / EXT3_INODE_SIZE(sb);
-	if (sbi->s_inodes_per_block == 0)
-		goto cantfind_ext3;
-	sbi->s_itb_per_group = sbi->s_inodes_per_group /
-					sbi->s_inodes_per_block;
-	sbi->s_desc_per_block = blocksize / sizeof(struct ext3_group_desc);
-	sbi->s_sbh = bh;
-	sbi->s_mount_state = le16_to_cpu(es->s_state);
-	sbi->s_addr_per_block_bits = ilog2(EXT3_ADDR_PER_BLOCK(sb));
-	sbi->s_desc_per_block_bits = ilog2(EXT3_DESC_PER_BLOCK(sb));
-	for (i = 0; i < 4; i++)
-		sbi->s_hash_seed[i] = le32_to_cpu(es->s_hash_seed[i]);
-	sbi->s_def_hash_version = es->s_def_hash_version;
-	i = le32_to_cpu(es->s_flags);
-	if (i & EXT2_FLAGS_UNSIGNED_HASH)
-		sbi->s_hash_unsigned = 3;
-	else if ((i & EXT2_FLAGS_SIGNED_HASH) == 0) {
-#ifdef __CHAR_UNSIGNED__
-		es->s_flags |= cpu_to_le32(EXT2_FLAGS_UNSIGNED_HASH);
-		sbi->s_hash_unsigned = 3;
-#else
-		es->s_flags |= cpu_to_le32(EXT2_FLAGS_SIGNED_HASH);
-#endif
-	}
-
-	if (sbi->s_blocks_per_group > blocksize * 8) {
-		ext3_msg(sb, KERN_ERR,
-			"#blocks per group too big: %lu",
-			sbi->s_blocks_per_group);
-		goto failed_mount;
-	}
-	if (sbi->s_frags_per_group > blocksize * 8) {
-		ext3_msg(sb, KERN_ERR,
-			"error: #fragments per group too big: %lu",
-			sbi->s_frags_per_group);
-		goto failed_mount;
-	}
-	if (sbi->s_inodes_per_group > blocksize * 8) {
-		ext3_msg(sb, KERN_ERR,
-			"error: #inodes per group too big: %lu",
-			sbi->s_inodes_per_group);
-		goto failed_mount;
-	}
-
-	err = generic_check_addressable(sb->s_blocksize_bits,
-					le32_to_cpu(es->s_blocks_count));
-	if (err) {
-		ext3_msg(sb, KERN_ERR,
-			"error: filesystem is too large to mount safely");
-		if (sizeof(sector_t) < 8)
-			ext3_msg(sb, KERN_ERR,
-				"error: CONFIG_LBDAF not enabled");
-		ret = err;
-		goto failed_mount;
-	}
-
-	if (EXT3_BLOCKS_PER_GROUP(sb) == 0)
-		goto cantfind_ext3;
-	sbi->s_groups_count = ((le32_to_cpu(es->s_blocks_count) -
-			       le32_to_cpu(es->s_first_data_block) - 1)
-				       / EXT3_BLOCKS_PER_GROUP(sb)) + 1;
-	db_count = DIV_ROUND_UP(sbi->s_groups_count, EXT3_DESC_PER_BLOCK(sb));
-	sbi->s_group_desc = kmalloc(db_count * sizeof (struct buffer_head *),
-				    GFP_KERNEL);
-	if (sbi->s_group_desc == NULL) {
-		ext3_msg(sb, KERN_ERR,
-			"error: not enough memory");
-		ret = -ENOMEM;
-		goto failed_mount;
-	}
-
-	bgl_lock_init(sbi->s_blockgroup_lock);
-
-	for (i = 0; i < db_count; i++) {
-		block = descriptor_loc(sb, logic_sb_block, i);
-		sbi->s_group_desc[i] = sb_bread(sb, block);
-		if (!sbi->s_group_desc[i]) {
-			ext3_msg(sb, KERN_ERR,
-				"error: can't read group descriptor %d", i);
-			db_count = i;
-			goto failed_mount2;
-		}
-	}
-	if (!ext3_check_descriptors (sb)) {
-		ext3_msg(sb, KERN_ERR,
-			"error: group descriptors corrupted");
-		goto failed_mount2;
-	}
-	sbi->s_gdb_count = db_count;
-	get_random_bytes(&sbi->s_next_generation, sizeof(u32));
-	spin_lock_init(&sbi->s_next_gen_lock);
-
-	/* per fileystem reservation list head & lock */
-	spin_lock_init(&sbi->s_rsv_window_lock);
-	sbi->s_rsv_window_root = RB_ROOT;
-	/* Add a single, static dummy reservation to the start of the
-	 * reservation window list --- it gives us a placeholder for
-	 * append-at-start-of-list which makes the allocation logic
-	 * _much_ simpler. */
-	sbi->s_rsv_window_head.rsv_start = EXT3_RESERVE_WINDOW_NOT_ALLOCATED;
-	sbi->s_rsv_window_head.rsv_end = EXT3_RESERVE_WINDOW_NOT_ALLOCATED;
-	sbi->s_rsv_window_head.rsv_alloc_hit = 0;
-	sbi->s_rsv_window_head.rsv_goal_size = 0;
-	ext3_rsv_window_add(sb, &sbi->s_rsv_window_head);
-
-	/*
-	 * set up enough so that it can read an inode
-	 */
-	sb->s_op = &ext3_sops;
-	sb->s_export_op = &ext3_export_ops;
-	sb->s_xattr = ext3_xattr_handlers;
-#ifdef CONFIG_QUOTA
-	sb->s_qcop = &ext3_qctl_operations;
-	sb->dq_op = &ext3_quota_operations;
-	sb->s_quota_types = QTYPE_MASK_USR | QTYPE_MASK_GRP;
-#endif
-	memcpy(sb->s_uuid, es->s_uuid, sizeof(es->s_uuid));
-	INIT_LIST_HEAD(&sbi->s_orphan); /* unlinked but open files */
-	mutex_init(&sbi->s_orphan_lock);
-	mutex_init(&sbi->s_resize_lock);
-
-	sb->s_root = NULL;
-
-	needs_recovery = (es->s_last_orphan != 0 ||
-			  EXT3_HAS_INCOMPAT_FEATURE(sb,
-				    EXT3_FEATURE_INCOMPAT_RECOVER));
-
-	/*
-	 * The first inode we look at is the journal inode.  Don't try
-	 * root first: it may be modified in the journal!
-	 */
-	if (!test_opt(sb, NOLOAD) &&
-	    EXT3_HAS_COMPAT_FEATURE(sb, EXT3_FEATURE_COMPAT_HAS_JOURNAL)) {
-		if (ext3_load_journal(sb, es, journal_devnum))
-			goto failed_mount2;
-	} else if (journal_inum) {
-		if (ext3_create_journal(sb, es, journal_inum))
-			goto failed_mount2;
-	} else {
-		if (!silent)
-			ext3_msg(sb, KERN_ERR,
-				"error: no journal found. "
-				"mounting ext3 over ext2?");
-		goto failed_mount2;
-	}
-	err = percpu_counter_init(&sbi->s_freeblocks_counter,
-			ext3_count_free_blocks(sb), GFP_KERNEL);
-	if (!err) {
-		err = percpu_counter_init(&sbi->s_freeinodes_counter,
-				ext3_count_free_inodes(sb), GFP_KERNEL);
-	}
-	if (!err) {
-		err = percpu_counter_init(&sbi->s_dirs_counter,
-				ext3_count_dirs(sb), GFP_KERNEL);
-	}
-	if (err) {
-		ext3_msg(sb, KERN_ERR, "error: insufficient memory");
-		ret = err;
-		goto failed_mount3;
-	}
-
-	/* We have now updated the journal if required, so we can
-	 * validate the data journaling mode. */
-	switch (test_opt(sb, DATA_FLAGS)) {
-	case 0:
-		/* No mode set, assume a default based on the journal
-                   capabilities: ORDERED_DATA if the journal can
-                   cope, else JOURNAL_DATA */
-		if (journal_check_available_features
-		    (sbi->s_journal, 0, 0, JFS_FEATURE_INCOMPAT_REVOKE))
-			set_opt(sbi->s_mount_opt, DEFAULT_DATA_MODE);
-		else
-			set_opt(sbi->s_mount_opt, JOURNAL_DATA);
-		break;
-
-	case EXT3_MOUNT_ORDERED_DATA:
-	case EXT3_MOUNT_WRITEBACK_DATA:
-		if (!journal_check_available_features
-		    (sbi->s_journal, 0, 0, JFS_FEATURE_INCOMPAT_REVOKE)) {
-			ext3_msg(sb, KERN_ERR,
-				"error: journal does not support "
-				"requested data journaling mode");
-			goto failed_mount3;
-		}
-	default:
-		break;
-	}
-
-	/*
-	 * The journal_load will have done any necessary log recovery,
-	 * so we can safely mount the rest of the filesystem now.
-	 */
-
-	root = ext3_iget(sb, EXT3_ROOT_INO);
-	if (IS_ERR(root)) {
-		ext3_msg(sb, KERN_ERR, "error: get root inode failed");
-		ret = PTR_ERR(root);
-		goto failed_mount3;
-	}
-	if (!S_ISDIR(root->i_mode) || !root->i_blocks || !root->i_size) {
-		iput(root);
-		ext3_msg(sb, KERN_ERR, "error: corrupt root inode, run e2fsck");
-		goto failed_mount3;
-	}
-	sb->s_root = d_make_root(root);
-	if (!sb->s_root) {
-		ext3_msg(sb, KERN_ERR, "error: get root dentry failed");
-		ret = -ENOMEM;
-		goto failed_mount3;
-	}
-
-	if (ext3_setup_super(sb, es, sb->s_flags & MS_RDONLY))
-		sb->s_flags |= MS_RDONLY;
-
-	EXT3_SB(sb)->s_mount_state |= EXT3_ORPHAN_FS;
-	ext3_orphan_cleanup(sb, es);
-	EXT3_SB(sb)->s_mount_state &= ~EXT3_ORPHAN_FS;
-	if (needs_recovery) {
-		ext3_mark_recovery_complete(sb, es);
-		ext3_msg(sb, KERN_INFO, "recovery complete");
-	}
-	ext3_msg(sb, KERN_INFO, "mounted filesystem with %s data mode",
-		test_opt(sb,DATA_FLAGS) == EXT3_MOUNT_JOURNAL_DATA ? "journal":
-		test_opt(sb,DATA_FLAGS) == EXT3_MOUNT_ORDERED_DATA ? "ordered":
-		"writeback");
-
-	return 0;
-
-cantfind_ext3:
-	if (!silent)
-		ext3_msg(sb, KERN_INFO,
-			"error: can't find ext3 filesystem on dev %s.",
-		       sb->s_id);
-	goto failed_mount;
-
-failed_mount3:
-	percpu_counter_destroy(&sbi->s_freeblocks_counter);
-	percpu_counter_destroy(&sbi->s_freeinodes_counter);
-	percpu_counter_destroy(&sbi->s_dirs_counter);
-	journal_destroy(sbi->s_journal);
-failed_mount2:
-	for (i = 0; i < db_count; i++)
-		brelse(sbi->s_group_desc[i]);
-	kfree(sbi->s_group_desc);
-failed_mount:
-#ifdef CONFIG_QUOTA
-	for (i = 0; i < EXT3_MAXQUOTAS; i++)
-		kfree(sbi->s_qf_names[i]);
-#endif
-	ext3_blkdev_remove(sbi);
-	brelse(bh);
-out_fail:
-	sb->s_fs_info = NULL;
-	kfree(sbi->s_blockgroup_lock);
-	kfree(sbi);
-	return ret;
-}
-
-/*
- * Setup any per-fs journal parameters now.  We'll do this both on
- * initial mount, once the journal has been initialised but before we've
- * done any recovery; and again on any subsequent remount.
- */
-static void ext3_init_journal_params(struct super_block *sb, journal_t *journal)
-{
-	struct ext3_sb_info *sbi = EXT3_SB(sb);
-
-	if (sbi->s_commit_interval)
-		journal->j_commit_interval = sbi->s_commit_interval;
-	/* We could also set up an ext3-specific default for the commit
-	 * interval here, but for now we'll just fall back to the jbd
-	 * default. */
-
-	spin_lock(&journal->j_state_lock);
-	if (test_opt(sb, BARRIER))
-		journal->j_flags |= JFS_BARRIER;
-	else
-		journal->j_flags &= ~JFS_BARRIER;
-	if (test_opt(sb, DATA_ERR_ABORT))
-		journal->j_flags |= JFS_ABORT_ON_SYNCDATA_ERR;
-	else
-		journal->j_flags &= ~JFS_ABORT_ON_SYNCDATA_ERR;
-	spin_unlock(&journal->j_state_lock);
-}
-
-static journal_t *ext3_get_journal(struct super_block *sb,
-				   unsigned int journal_inum)
-{
-	struct inode *journal_inode;
-	journal_t *journal;
-
-	/* First, test for the existence of a valid inode on disk.  Bad
-	 * things happen if we iget() an unused inode, as the subsequent
-	 * iput() will try to delete it. */
-
-	journal_inode = ext3_iget(sb, journal_inum);
-	if (IS_ERR(journal_inode)) {
-		ext3_msg(sb, KERN_ERR, "error: no journal found");
-		return NULL;
-	}
-	if (!journal_inode->i_nlink) {
-		make_bad_inode(journal_inode);
-		iput(journal_inode);
-		ext3_msg(sb, KERN_ERR, "error: journal inode is deleted");
-		return NULL;
-	}
-
-	jbd_debug(2, "Journal inode found at %p: %Ld bytes\n",
-		  journal_inode, journal_inode->i_size);
-	if (!S_ISREG(journal_inode->i_mode)) {
-		ext3_msg(sb, KERN_ERR, "error: invalid journal inode");
-		iput(journal_inode);
-		return NULL;
-	}
-
-	journal = journal_init_inode(journal_inode);
-	if (!journal) {
-		ext3_msg(sb, KERN_ERR, "error: could not load journal inode");
-		iput(journal_inode);
-		return NULL;
-	}
-	journal->j_private = sb;
-	ext3_init_journal_params(sb, journal);
-	return journal;
-}
-
-static journal_t *ext3_get_dev_journal(struct super_block *sb,
-				       dev_t j_dev)
-{
-	struct buffer_head * bh;
-	journal_t *journal;
-	ext3_fsblk_t start;
-	ext3_fsblk_t len;
-	int hblock, blocksize;
-	ext3_fsblk_t sb_block;
-	unsigned long offset;
-	struct ext3_super_block * es;
-	struct block_device *bdev;
-
-	bdev = ext3_blkdev_get(j_dev, sb);
-	if (bdev == NULL)
-		return NULL;
-
-	blocksize = sb->s_blocksize;
-	hblock = bdev_logical_block_size(bdev);
-	if (blocksize < hblock) {
-		ext3_msg(sb, KERN_ERR,
-			"error: blocksize too small for journal device");
-		goto out_bdev;
-	}
-
-	sb_block = EXT3_MIN_BLOCK_SIZE / blocksize;
-	offset = EXT3_MIN_BLOCK_SIZE % blocksize;
-	set_blocksize(bdev, blocksize);
-	if (!(bh = __bread(bdev, sb_block, blocksize))) {
-		ext3_msg(sb, KERN_ERR, "error: couldn't read superblock of "
-			"external journal");
-		goto out_bdev;
-	}
-
-	es = (struct ext3_super_block *) (bh->b_data + offset);
-	if ((le16_to_cpu(es->s_magic) != EXT3_SUPER_MAGIC) ||
-	    !(le32_to_cpu(es->s_feature_incompat) &
-	      EXT3_FEATURE_INCOMPAT_JOURNAL_DEV)) {
-		ext3_msg(sb, KERN_ERR, "error: external journal has "
-			"bad superblock");
-		brelse(bh);
-		goto out_bdev;
-	}
-
-	if (memcmp(EXT3_SB(sb)->s_es->s_journal_uuid, es->s_uuid, 16)) {
-		ext3_msg(sb, KERN_ERR, "error: journal UUID does not match");
-		brelse(bh);
-		goto out_bdev;
-	}
-
-	len = le32_to_cpu(es->s_blocks_count);
-	start = sb_block + 1;
-	brelse(bh);	/* we're done with the superblock */
-
-	journal = journal_init_dev(bdev, sb->s_bdev,
-					start, len, blocksize);
-	if (!journal) {
-		ext3_msg(sb, KERN_ERR,
-			"error: failed to create device journal");
-		goto out_bdev;
-	}
-	journal->j_private = sb;
-	if (!bh_uptodate_or_lock(journal->j_sb_buffer)) {
-		if (bh_submit_read(journal->j_sb_buffer)) {
-			ext3_msg(sb, KERN_ERR, "I/O error on journal device");
-			goto out_journal;
-		}
-	}
-	if (be32_to_cpu(journal->j_superblock->s_nr_users) != 1) {
-		ext3_msg(sb, KERN_ERR,
-			"error: external journal has more than one "
-			"user (unsupported) - %d",
-			be32_to_cpu(journal->j_superblock->s_nr_users));
-		goto out_journal;
-	}
-	EXT3_SB(sb)->journal_bdev = bdev;
-	ext3_init_journal_params(sb, journal);
-	return journal;
-out_journal:
-	journal_destroy(journal);
-out_bdev:
-	ext3_blkdev_put(bdev);
-	return NULL;
-}
-
-static int ext3_load_journal(struct super_block *sb,
-			     struct ext3_super_block *es,
-			     unsigned long journal_devnum)
-{
-	journal_t *journal;
-	unsigned int journal_inum = le32_to_cpu(es->s_journal_inum);
-	dev_t journal_dev;
-	int err = 0;
-	int really_read_only;
-
-	if (journal_devnum &&
-	    journal_devnum != le32_to_cpu(es->s_journal_dev)) {
-		ext3_msg(sb, KERN_INFO, "external journal device major/minor "
-			"numbers have changed");
-		journal_dev = new_decode_dev(journal_devnum);
-	} else
-		journal_dev = new_decode_dev(le32_to_cpu(es->s_journal_dev));
-
-	really_read_only = bdev_read_only(sb->s_bdev);
-
-	/*
-	 * Are we loading a blank journal or performing recovery after a
-	 * crash?  For recovery, we need to check in advance whether we
-	 * can get read-write access to the device.
-	 */
-
-	if (EXT3_HAS_INCOMPAT_FEATURE(sb, EXT3_FEATURE_INCOMPAT_RECOVER)) {
-		if (sb->s_flags & MS_RDONLY) {
-			ext3_msg(sb, KERN_INFO,
-				"recovery required on readonly filesystem");
-			if (really_read_only) {
-				ext3_msg(sb, KERN_ERR, "error: write access "
-					"unavailable, cannot proceed");
-				return -EROFS;
-			}
-			ext3_msg(sb, KERN_INFO,
-				"write access will be enabled during recovery");
-		}
-	}
-
-	if (journal_inum && journal_dev) {
-		ext3_msg(sb, KERN_ERR, "error: filesystem has both journal "
-		       "and inode journals");
-		return -EINVAL;
-	}
-
-	if (journal_inum) {
-		if (!(journal = ext3_get_journal(sb, journal_inum)))
-			return -EINVAL;
-	} else {
-		if (!(journal = ext3_get_dev_journal(sb, journal_dev)))
-			return -EINVAL;
-	}
-
-	if (!(journal->j_flags & JFS_BARRIER))
-		printk(KERN_INFO "EXT3-fs: barriers not enabled\n");
-
-	if (!really_read_only && test_opt(sb, UPDATE_JOURNAL)) {
-		err = journal_update_format(journal);
-		if (err)  {
-			ext3_msg(sb, KERN_ERR, "error updating journal");
-			journal_destroy(journal);
-			return err;
-		}
-	}
-
-	if (!EXT3_HAS_INCOMPAT_FEATURE(sb, EXT3_FEATURE_INCOMPAT_RECOVER))
-		err = journal_wipe(journal, !really_read_only);
-	if (!err)
-		err = journal_load(journal);
-
-	if (err) {
-		ext3_msg(sb, KERN_ERR, "error loading journal");
-		journal_destroy(journal);
-		return err;
-	}
-
-	EXT3_SB(sb)->s_journal = journal;
-	ext3_clear_journal_err(sb, es);
-
-	if (!really_read_only && journal_devnum &&
-	    journal_devnum != le32_to_cpu(es->s_journal_dev)) {
-		es->s_journal_dev = cpu_to_le32(journal_devnum);
-
-		/* Make sure we flush the recovery flag to disk. */
-		ext3_commit_super(sb, es, 1);
-	}
-
-	return 0;
-}
-
-static int ext3_create_journal(struct super_block *sb,
-			       struct ext3_super_block *es,
-			       unsigned int journal_inum)
-{
-	journal_t *journal;
-	int err;
-
-	if (sb->s_flags & MS_RDONLY) {
-		ext3_msg(sb, KERN_ERR,
-			"error: readonly filesystem when trying to "
-			"create journal");
-		return -EROFS;
-	}
-
-	journal = ext3_get_journal(sb, journal_inum);
-	if (!journal)
-		return -EINVAL;
-
-	ext3_msg(sb, KERN_INFO, "creating new journal on inode %u",
-	       journal_inum);
-
-	err = journal_create(journal);
-	if (err) {
-		ext3_msg(sb, KERN_ERR, "error creating journal");
-		journal_destroy(journal);
-		return -EIO;
-	}
-
-	EXT3_SB(sb)->s_journal = journal;
-
-	ext3_update_dynamic_rev(sb);
-	EXT3_SET_INCOMPAT_FEATURE(sb, EXT3_FEATURE_INCOMPAT_RECOVER);
-	EXT3_SET_COMPAT_FEATURE(sb, EXT3_FEATURE_COMPAT_HAS_JOURNAL);
-
-	es->s_journal_inum = cpu_to_le32(journal_inum);
-
-	/* Make sure we flush the recovery flag to disk. */
-	ext3_commit_super(sb, es, 1);
-
-	return 0;
-}
-
-static int ext3_commit_super(struct super_block *sb,
-			       struct ext3_super_block *es,
-			       int sync)
-{
-	struct buffer_head *sbh = EXT3_SB(sb)->s_sbh;
-	int error = 0;
-
-	if (!sbh)
-		return error;
-
-	if (buffer_write_io_error(sbh)) {
-		/*
-		 * Oh, dear.  A previous attempt to write the
-		 * superblock failed.  This could happen because the
-		 * USB device was yanked out.  Or it could happen to
-		 * be a transient write error and maybe the block will
-		 * be remapped.  Nothing we can do but to retry the
-		 * write and hope for the best.
-		 */
-		ext3_msg(sb, KERN_ERR, "previous I/O error to "
-		       "superblock detected");
-		clear_buffer_write_io_error(sbh);
-		set_buffer_uptodate(sbh);
-	}
-	/*
-	 * If the file system is mounted read-only, don't update the
-	 * superblock write time.  This avoids updating the superblock
-	 * write time when we are mounting the root file system
-	 * read/only but we need to replay the journal; at that point,
-	 * for people who are east of GMT and who make their clock
-	 * tick in localtime for Windows bug-for-bug compatibility,
-	 * the clock is set in the future, and this will cause e2fsck
-	 * to complain and force a full file system check.
-	 */
-	if (!(sb->s_flags & MS_RDONLY))
-		es->s_wtime = cpu_to_le32(get_seconds());
-	es->s_free_blocks_count = cpu_to_le32(ext3_count_free_blocks(sb));
-	es->s_free_inodes_count = cpu_to_le32(ext3_count_free_inodes(sb));
-	BUFFER_TRACE(sbh, "marking dirty");
-	mark_buffer_dirty(sbh);
-	if (sync) {
-		error = sync_dirty_buffer(sbh);
-		if (buffer_write_io_error(sbh)) {
-			ext3_msg(sb, KERN_ERR, "I/O error while writing "
-			       "superblock");
-			clear_buffer_write_io_error(sbh);
-			set_buffer_uptodate(sbh);
-		}
-	}
-	return error;
-}
-
-
-/*
- * Have we just finished recovery?  If so, and if we are mounting (or
- * remounting) the filesystem readonly, then we will end up with a
- * consistent fs on disk.  Record that fact.
- */
-static void ext3_mark_recovery_complete(struct super_block * sb,
-					struct ext3_super_block * es)
-{
-	journal_t *journal = EXT3_SB(sb)->s_journal;
-
-	journal_lock_updates(journal);
-	if (journal_flush(journal) < 0)
-		goto out;
-
-	if (EXT3_HAS_INCOMPAT_FEATURE(sb, EXT3_FEATURE_INCOMPAT_RECOVER) &&
-	    sb->s_flags & MS_RDONLY) {
-		EXT3_CLEAR_INCOMPAT_FEATURE(sb, EXT3_FEATURE_INCOMPAT_RECOVER);
-		ext3_commit_super(sb, es, 1);
-	}
-
-out:
-	journal_unlock_updates(journal);
-}
-
-/*
- * If we are mounting (or read-write remounting) a filesystem whose journal
- * has recorded an error from a previous lifetime, move that error to the
- * main filesystem now.
- */
-static void ext3_clear_journal_err(struct super_block *sb,
-				   struct ext3_super_block *es)
-{
-	journal_t *journal;
-	int j_errno;
-	const char *errstr;
-
-	journal = EXT3_SB(sb)->s_journal;
-
-	/*
-	 * Now check for any error status which may have been recorded in the
-	 * journal by a prior ext3_error() or ext3_abort()
-	 */
-
-	j_errno = journal_errno(journal);
-	if (j_errno) {
-		char nbuf[16];
-
-		errstr = ext3_decode_error(sb, j_errno, nbuf);
-		ext3_warning(sb, __func__, "Filesystem error recorded "
-			     "from previous mount: %s", errstr);
-		ext3_warning(sb, __func__, "Marking fs in need of "
-			     "filesystem check.");
-
-		EXT3_SB(sb)->s_mount_state |= EXT3_ERROR_FS;
-		es->s_state |= cpu_to_le16(EXT3_ERROR_FS);
-		ext3_commit_super (sb, es, 1);
-
-		journal_clear_err(journal);
-	}
-}
-
-/*
- * Force the running and committing transactions to commit,
- * and wait on the commit.
- */
-int ext3_force_commit(struct super_block *sb)
-{
-	journal_t *journal;
-	int ret;
-
-	if (sb->s_flags & MS_RDONLY)
-		return 0;
-
-	journal = EXT3_SB(sb)->s_journal;
-	ret = ext3_journal_force_commit(journal);
-	return ret;
-}
-
-static int ext3_sync_fs(struct super_block *sb, int wait)
-{
-	tid_t target;
-
-	trace_ext3_sync_fs(sb, wait);
-	/*
-	 * Writeback quota in non-journalled quota case - journalled quota has
-	 * no dirty dquots
-	 */
-	dquot_writeback_dquots(sb, -1);
-	if (journal_start_commit(EXT3_SB(sb)->s_journal, &target)) {
-		if (wait)
-			log_wait_commit(EXT3_SB(sb)->s_journal, target);
-	}
-	return 0;
-}
-
-/*
- * LVM calls this function before a (read-only) snapshot is created.  This
- * gives us a chance to flush the journal completely and mark the fs clean.
- */
-static int ext3_freeze(struct super_block *sb)
-{
-	int error = 0;
-	journal_t *journal;
-
-	if (!(sb->s_flags & MS_RDONLY)) {
-		journal = EXT3_SB(sb)->s_journal;
-
-		/* Now we set up the journal barrier. */
-		journal_lock_updates(journal);
-
-		/*
-		 * We don't want to clear needs_recovery flag when we failed
-		 * to flush the journal.
-		 */
-		error = journal_flush(journal);
-		if (error < 0)
-			goto out;
-
-		/* Journal blocked and flushed, clear needs_recovery flag. */
-		EXT3_CLEAR_INCOMPAT_FEATURE(sb, EXT3_FEATURE_INCOMPAT_RECOVER);
-		error = ext3_commit_super(sb, EXT3_SB(sb)->s_es, 1);
-		if (error)
-			goto out;
-	}
-	return 0;
-
-out:
-	journal_unlock_updates(journal);
-	return error;
-}
-
-/*
- * Called by LVM after the snapshot is done.  We need to reset the RECOVER
- * flag here, even though the filesystem is not technically dirty yet.
- */
-static int ext3_unfreeze(struct super_block *sb)
-{
-	if (!(sb->s_flags & MS_RDONLY)) {
-		/* Reser the needs_recovery flag before the fs is unlocked. */
-		EXT3_SET_INCOMPAT_FEATURE(sb, EXT3_FEATURE_INCOMPAT_RECOVER);
-		ext3_commit_super(sb, EXT3_SB(sb)->s_es, 1);
-		journal_unlock_updates(EXT3_SB(sb)->s_journal);
-	}
-	return 0;
-}
-
-static int ext3_remount (struct super_block * sb, int * flags, char * data)
-{
-	struct ext3_super_block * es;
-	struct ext3_sb_info *sbi = EXT3_SB(sb);
-	ext3_fsblk_t n_blocks_count = 0;
-	unsigned long old_sb_flags;
-	struct ext3_mount_options old_opts;
-	int enable_quota = 0;
-	int err;
-#ifdef CONFIG_QUOTA
-	int i;
-#endif
-
-	sync_filesystem(sb);
-
-	/* Store the original options */
-	old_sb_flags = sb->s_flags;
-	old_opts.s_mount_opt = sbi->s_mount_opt;
-	old_opts.s_resuid = sbi->s_resuid;
-	old_opts.s_resgid = sbi->s_resgid;
-	old_opts.s_commit_interval = sbi->s_commit_interval;
-#ifdef CONFIG_QUOTA
-	old_opts.s_jquota_fmt = sbi->s_jquota_fmt;
-	for (i = 0; i < EXT3_MAXQUOTAS; i++)
-		if (sbi->s_qf_names[i]) {
-			old_opts.s_qf_names[i] = kstrdup(sbi->s_qf_names[i],
-							 GFP_KERNEL);
-			if (!old_opts.s_qf_names[i]) {
-				int j;
-
-				for (j = 0; j < i; j++)
-					kfree(old_opts.s_qf_names[j]);
-				return -ENOMEM;
-			}
-		} else
-			old_opts.s_qf_names[i] = NULL;
-#endif
-
-	/*
-	 * Allow the "check" option to be passed as a remount option.
-	 */
-	if (!parse_options(data, sb, NULL, NULL, &n_blocks_count, 1)) {
-		err = -EINVAL;
-		goto restore_opts;
-	}
-
-	if (test_opt(sb, ABORT))
-		ext3_abort(sb, __func__, "Abort forced by user");
-
-	sb->s_flags = (sb->s_flags & ~MS_POSIXACL) |
-		(test_opt(sb, POSIX_ACL) ? MS_POSIXACL : 0);
-
-	es = sbi->s_es;
-
-	ext3_init_journal_params(sb, sbi->s_journal);
-
-	if ((*flags & MS_RDONLY) != (sb->s_flags & MS_RDONLY) ||
-		n_blocks_count > le32_to_cpu(es->s_blocks_count)) {
-		if (test_opt(sb, ABORT)) {
-			err = -EROFS;
-			goto restore_opts;
-		}
-
-		if (*flags & MS_RDONLY) {
-			err = dquot_suspend(sb, -1);
-			if (err < 0)
-				goto restore_opts;
-
-			/*
-			 * First of all, the unconditional stuff we have to do
-			 * to disable replay of the journal when we next remount
-			 */
-			sb->s_flags |= MS_RDONLY;
-
-			/*
-			 * OK, test if we are remounting a valid rw partition
-			 * readonly, and if so set the rdonly flag and then
-			 * mark the partition as valid again.
-			 */
-			if (!(es->s_state & cpu_to_le16(EXT3_VALID_FS)) &&
-			    (sbi->s_mount_state & EXT3_VALID_FS))
-				es->s_state = cpu_to_le16(sbi->s_mount_state);
-
-			ext3_mark_recovery_complete(sb, es);
-		} else {
-			__le32 ret;
-			if ((ret = EXT3_HAS_RO_COMPAT_FEATURE(sb,
-					~EXT3_FEATURE_RO_COMPAT_SUPP))) {
-				ext3_msg(sb, KERN_WARNING,
-					"warning: couldn't remount RDWR "
-					"because of unsupported optional "
-					"features (%x)", le32_to_cpu(ret));
-				err = -EROFS;
-				goto restore_opts;
-			}
-
-			/*
-			 * If we have an unprocessed orphan list hanging
-			 * around from a previously readonly bdev mount,
-			 * require a full umount & mount for now.
-			 */
-			if (es->s_last_orphan) {
-				ext3_msg(sb, KERN_WARNING, "warning: couldn't "
-				       "remount RDWR because of unprocessed "
-				       "orphan inode list.  Please "
-				       "umount & mount instead.");
-				err = -EINVAL;
-				goto restore_opts;
-			}
-
-			/*
-			 * Mounting a RDONLY partition read-write, so reread
-			 * and store the current valid flag.  (It may have
-			 * been changed by e2fsck since we originally mounted
-			 * the partition.)
-			 */
-			ext3_clear_journal_err(sb, es);
-			sbi->s_mount_state = le16_to_cpu(es->s_state);
-			if ((err = ext3_group_extend(sb, es, n_blocks_count)))
-				goto restore_opts;
-			if (!ext3_setup_super (sb, es, 0))
-				sb->s_flags &= ~MS_RDONLY;
-			enable_quota = 1;
-		}
-	}
-#ifdef CONFIG_QUOTA
-	/* Release old quota file names */
-	for (i = 0; i < EXT3_MAXQUOTAS; i++)
-		kfree(old_opts.s_qf_names[i]);
-#endif
-	if (enable_quota)
-		dquot_resume(sb, -1);
-	return 0;
-restore_opts:
-	sb->s_flags = old_sb_flags;
-	sbi->s_mount_opt = old_opts.s_mount_opt;
-	sbi->s_resuid = old_opts.s_resuid;
-	sbi->s_resgid = old_opts.s_resgid;
-	sbi->s_commit_interval = old_opts.s_commit_interval;
-#ifdef CONFIG_QUOTA
-	sbi->s_jquota_fmt = old_opts.s_jquota_fmt;
-	for (i = 0; i < EXT3_MAXQUOTAS; i++) {
-		kfree(sbi->s_qf_names[i]);
-		sbi->s_qf_names[i] = old_opts.s_qf_names[i];
-	}
-#endif
-	return err;
-}
-
-static int ext3_statfs (struct dentry * dentry, struct kstatfs * buf)
-{
-	struct super_block *sb = dentry->d_sb;
-	struct ext3_sb_info *sbi = EXT3_SB(sb);
-	struct ext3_super_block *es = sbi->s_es;
-	u64 fsid;
-
-	if (test_opt(sb, MINIX_DF)) {
-		sbi->s_overhead_last = 0;
-	} else if (sbi->s_blocks_last != le32_to_cpu(es->s_blocks_count)) {
-		unsigned long ngroups = sbi->s_groups_count, i;
-		ext3_fsblk_t overhead = 0;
-		smp_rmb();
-
-		/*
-		 * Compute the overhead (FS structures).  This is constant
-		 * for a given filesystem unless the number of block groups
-		 * changes so we cache the previous value until it does.
-		 */
-
-		/*
-		 * All of the blocks before first_data_block are
-		 * overhead
-		 */
-		overhead = le32_to_cpu(es->s_first_data_block);
-
-		/*
-		 * Add the overhead attributed to the superblock and
-		 * block group descriptors.  If the sparse superblocks
-		 * feature is turned on, then not all groups have this.
-		 */
-		for (i = 0; i < ngroups; i++) {
-			overhead += ext3_bg_has_super(sb, i) +
-				ext3_bg_num_gdb(sb, i);
-			cond_resched();
-		}
-
-		/*
-		 * Every block group has an inode bitmap, a block
-		 * bitmap, and an inode table.
-		 */
-		overhead += ngroups * (2 + sbi->s_itb_per_group);
-
-		/* Add the internal journal blocks as well */
-		if (sbi->s_journal && !sbi->journal_bdev)
-			overhead += sbi->s_journal->j_maxlen;
-
-		sbi->s_overhead_last = overhead;
-		smp_wmb();
-		sbi->s_blocks_last = le32_to_cpu(es->s_blocks_count);
-	}
-
-	buf->f_type = EXT3_SUPER_MAGIC;
-	buf->f_bsize = sb->s_blocksize;
-	buf->f_blocks = le32_to_cpu(es->s_blocks_count) - sbi->s_overhead_last;
-	buf->f_bfree = percpu_counter_sum_positive(&sbi->s_freeblocks_counter);
-	buf->f_bavail = buf->f_bfree - le32_to_cpu(es->s_r_blocks_count);
-	if (buf->f_bfree < le32_to_cpu(es->s_r_blocks_count))
-		buf->f_bavail = 0;
-	buf->f_files = le32_to_cpu(es->s_inodes_count);
-	buf->f_ffree = percpu_counter_sum_positive(&sbi->s_freeinodes_counter);
-	buf->f_namelen = EXT3_NAME_LEN;
-	fsid = le64_to_cpup((void *)es->s_uuid) ^
-	       le64_to_cpup((void *)es->s_uuid + sizeof(u64));
-	buf->f_fsid.val[0] = fsid & 0xFFFFFFFFUL;
-	buf->f_fsid.val[1] = (fsid >> 32) & 0xFFFFFFFFUL;
-	return 0;
-}
-
-/* Helper function for writing quotas on sync - we need to start transaction before quota file
- * is locked for write. Otherwise the are possible deadlocks:
- * Process 1                         Process 2
- * ext3_create()                     quota_sync()
- *   journal_start()                   write_dquot()
- *   dquot_initialize()                       down(dqio_mutex)
- *     down(dqio_mutex)                    journal_start()
- *
- */
-
-#ifdef CONFIG_QUOTA
-
-static inline struct inode *dquot_to_inode(struct dquot *dquot)
-{
-	return sb_dqopt(dquot->dq_sb)->files[dquot->dq_id.type];
-}
-
-static int ext3_write_dquot(struct dquot *dquot)
-{
-	int ret, err;
-	handle_t *handle;
-	struct inode *inode;
-
-	inode = dquot_to_inode(dquot);
-	handle = ext3_journal_start(inode,
-					EXT3_QUOTA_TRANS_BLOCKS(dquot->dq_sb));
-	if (IS_ERR(handle))
-		return PTR_ERR(handle);
-	ret = dquot_commit(dquot);
-	err = ext3_journal_stop(handle);
-	if (!ret)
-		ret = err;
-	return ret;
-}
-
-static int ext3_acquire_dquot(struct dquot *dquot)
-{
-	int ret, err;
-	handle_t *handle;
-
-	handle = ext3_journal_start(dquot_to_inode(dquot),
-					EXT3_QUOTA_INIT_BLOCKS(dquot->dq_sb));
-	if (IS_ERR(handle))
-		return PTR_ERR(handle);
-	ret = dquot_acquire(dquot);
-	err = ext3_journal_stop(handle);
-	if (!ret)
-		ret = err;
-	return ret;
-}
-
-static int ext3_release_dquot(struct dquot *dquot)
-{
-	int ret, err;
-	handle_t *handle;
-
-	handle = ext3_journal_start(dquot_to_inode(dquot),
-					EXT3_QUOTA_DEL_BLOCKS(dquot->dq_sb));
-	if (IS_ERR(handle)) {
-		/* Release dquot anyway to avoid endless cycle in dqput() */
-		dquot_release(dquot);
-		return PTR_ERR(handle);
-	}
-	ret = dquot_release(dquot);
-	err = ext3_journal_stop(handle);
-	if (!ret)
-		ret = err;
-	return ret;
-}
-
-static int ext3_mark_dquot_dirty(struct dquot *dquot)
-{
-	/* Are we journaling quotas? */
-	if (EXT3_SB(dquot->dq_sb)->s_qf_names[USRQUOTA] ||
-	    EXT3_SB(dquot->dq_sb)->s_qf_names[GRPQUOTA]) {
-		dquot_mark_dquot_dirty(dquot);
-		return ext3_write_dquot(dquot);
-	} else {
-		return dquot_mark_dquot_dirty(dquot);
-	}
-}
-
-static int ext3_write_info(struct super_block *sb, int type)
-{
-	int ret, err;
-	handle_t *handle;
-
-	/* Data block + inode block */
-	handle = ext3_journal_start(d_inode(sb->s_root), 2);
-	if (IS_ERR(handle))
-		return PTR_ERR(handle);
-	ret = dquot_commit_info(sb, type);
-	err = ext3_journal_stop(handle);
-	if (!ret)
-		ret = err;
-	return ret;
-}
-
-/*
- * Turn on quotas during mount time - we need to find
- * the quota file and such...
- */
-static int ext3_quota_on_mount(struct super_block *sb, int type)
-{
-	return dquot_quota_on_mount(sb, EXT3_SB(sb)->s_qf_names[type],
-					EXT3_SB(sb)->s_jquota_fmt, type);
-}
-
-/*
- * Standard function to be called on quota_on
- */
-static int ext3_quota_on(struct super_block *sb, int type, int format_id,
-			 struct path *path)
-{
-	int err;
-
-	if (!test_opt(sb, QUOTA))
-		return -EINVAL;
-
-	/* Quotafile not on the same filesystem? */
-	if (path->dentry->d_sb != sb)
-		return -EXDEV;
-	/* Journaling quota? */
-	if (EXT3_SB(sb)->s_qf_names[type]) {
-		/* Quotafile not of fs root? */
-		if (path->dentry->d_parent != sb->s_root)
-			ext3_msg(sb, KERN_WARNING,
-				"warning: Quota file not on filesystem root. "
-				"Journaled quota will not work.");
-	}
-
-	/*
-	 * When we journal data on quota file, we have to flush journal to see
-	 * all updates to the file when we bypass pagecache...
-	 */
-	if (ext3_should_journal_data(d_inode(path->dentry))) {
-		/*
-		 * We don't need to lock updates but journal_flush() could
-		 * otherwise be livelocked...
-		 */
-		journal_lock_updates(EXT3_SB(sb)->s_journal);
-		err = journal_flush(EXT3_SB(sb)->s_journal);
-		journal_unlock_updates(EXT3_SB(sb)->s_journal);
-		if (err)
-			return err;
-	}
-
-	return dquot_quota_on(sb, type, format_id, path);
-}
-
-/* Read data from quotafile - avoid pagecache and such because we cannot afford
- * acquiring the locks... As quota files are never truncated and quota code
- * itself serializes the operations (and no one else should touch the files)
- * we don't have to be afraid of races */
-static ssize_t ext3_quota_read(struct super_block *sb, int type, char *data,
-			       size_t len, loff_t off)
-{
-	struct inode *inode = sb_dqopt(sb)->files[type];
-	sector_t blk = off >> EXT3_BLOCK_SIZE_BITS(sb);
-	int err = 0;
-	int offset = off & (sb->s_blocksize - 1);
-	int tocopy;
-	size_t toread;
-	struct buffer_head *bh;
-	loff_t i_size = i_size_read(inode);
-
-	if (off > i_size)
-		return 0;
-	if (off+len > i_size)
-		len = i_size-off;
-	toread = len;
-	while (toread > 0) {
-		tocopy = sb->s_blocksize - offset < toread ?
-				sb->s_blocksize - offset : toread;
-		bh = ext3_bread(NULL, inode, blk, 0, &err);
-		if (err)
-			return err;
-		if (!bh)	/* A hole? */
-			memset(data, 0, tocopy);
-		else
-			memcpy(data, bh->b_data+offset, tocopy);
-		brelse(bh);
-		offset = 0;
-		toread -= tocopy;
-		data += tocopy;
-		blk++;
-	}
-	return len;
-}
-
-/* Write to quotafile (we know the transaction is already started and has
- * enough credits) */
-static ssize_t ext3_quota_write(struct super_block *sb, int type,
-				const char *data, size_t len, loff_t off)
-{
-	struct inode *inode = sb_dqopt(sb)->files[type];
-	sector_t blk = off >> EXT3_BLOCK_SIZE_BITS(sb);
-	int err = 0;
-	int offset = off & (sb->s_blocksize - 1);
-	int journal_quota = EXT3_SB(sb)->s_qf_names[type] != NULL;
-	struct buffer_head *bh;
-	handle_t *handle = journal_current_handle();
-
-	if (!handle) {
-		ext3_msg(sb, KERN_WARNING,
-			"warning: quota write (off=%llu, len=%llu)"
-			" cancelled because transaction is not started.",
-			(unsigned long long)off, (unsigned long long)len);
-		return -EIO;
-	}
-
-	/*
-	 * Since we account only one data block in transaction credits,
-	 * then it is impossible to cross a block boundary.
-	 */
-	if (sb->s_blocksize - offset < len) {
-		ext3_msg(sb, KERN_WARNING, "Quota write (off=%llu, len=%llu)"
-			" cancelled because not block aligned",
-			(unsigned long long)off, (unsigned long long)len);
-		return -EIO;
-	}
-	bh = ext3_bread(handle, inode, blk, 1, &err);
-	if (!bh)
-		goto out;
-	if (journal_quota) {
-		err = ext3_journal_get_write_access(handle, bh);
-		if (err) {
-			brelse(bh);
-			goto out;
-		}
-	}
-	lock_buffer(bh);
-	memcpy(bh->b_data+offset, data, len);
-	flush_dcache_page(bh->b_page);
-	unlock_buffer(bh);
-	if (journal_quota)
-		err = ext3_journal_dirty_metadata(handle, bh);
-	else {
-		/* Always do at least ordered writes for quotas */
-		err = ext3_journal_dirty_data(handle, bh);
-		mark_buffer_dirty(bh);
-	}
-	brelse(bh);
-out:
-	if (err)
-		return err;
-	if (inode->i_size < off + len) {
-		i_size_write(inode, off + len);
-		EXT3_I(inode)->i_disksize = inode->i_size;
-	}
-	inode->i_version++;
-	inode->i_mtime = inode->i_ctime = CURRENT_TIME;
-	ext3_mark_inode_dirty(handle, inode);
-	return len;
-}
-
-#endif
-
-static struct dentry *ext3_mount(struct file_system_type *fs_type,
-	int flags, const char *dev_name, void *data)
-{
-	return mount_bdev(fs_type, flags, dev_name, data, ext3_fill_super);
-}
-
-static struct file_system_type ext3_fs_type = {
-	.owner		= THIS_MODULE,
-	.name		= "ext3",
-	.mount		= ext3_mount,
-	.kill_sb	= kill_block_super,
-	.fs_flags	= FS_REQUIRES_DEV,
-};
-MODULE_ALIAS_FS("ext3");
-
-static int __init init_ext3_fs(void)
-{
-	int err = init_ext3_xattr();
-	if (err)
-		return err;
-	err = init_inodecache();
-	if (err)
-		goto out1;
-        err = register_filesystem(&ext3_fs_type);
-	if (err)
-		goto out;
-	return 0;
-out:
-	destroy_inodecache();
-out1:
-	exit_ext3_xattr();
-	return err;
-}
-
-static void __exit exit_ext3_fs(void)
-{
-	unregister_filesystem(&ext3_fs_type);
-	destroy_inodecache();
-	exit_ext3_xattr();
-}
-
-MODULE_AUTHOR("Remy Card, Stephen Tweedie, Andrew Morton, Andreas Dilger, Theodore Ts'o and others");
-MODULE_DESCRIPTION("Second Extended Filesystem with journaling extensions");
-MODULE_LICENSE("GPL");
-module_init(init_ext3_fs)
-module_exit(exit_ext3_fs)
diff --git a/fs/ext3/symlink.c b/fs/ext3/symlink.c
deleted file mode 100644
index c08c59094ae6..000000000000
--- a/fs/ext3/symlink.c
+++ /dev/null
@@ -1,46 +0,0 @@
-/*
- *  linux/fs/ext3/symlink.c
- *
- * Only fast symlinks left here - the rest is done by generic code. AV, 1999
- *
- * Copyright (C) 1992, 1993, 1994, 1995
- * Remy Card (card@masi.ibp.fr)
- * Laboratoire MASI - Institut Blaise Pascal
- * Universite Pierre et Marie Curie (Paris VI)
- *
- *  from
- *
- *  linux/fs/minix/symlink.c
- *
- *  Copyright (C) 1991, 1992  Linus Torvalds
- *
- *  ext3 symlink handling code
- */
-
-#include "ext3.h"
-#include "xattr.h"
-
-const struct inode_operations ext3_symlink_inode_operations = {
-	.readlink	= generic_readlink,
-	.follow_link	= page_follow_link_light,
-	.put_link	= page_put_link,
-	.setattr	= ext3_setattr,
-#ifdef CONFIG_EXT3_FS_XATTR
-	.setxattr	= generic_setxattr,
-	.getxattr	= generic_getxattr,
-	.listxattr	= ext3_listxattr,
-	.removexattr	= generic_removexattr,
-#endif
-};
-
-const struct inode_operations ext3_fast_symlink_inode_operations = {
-	.readlink	= generic_readlink,
-	.follow_link	= simple_follow_link,
-	.setattr	= ext3_setattr,
-#ifdef CONFIG_EXT3_FS_XATTR
-	.setxattr	= generic_setxattr,
-	.getxattr	= generic_getxattr,
-	.listxattr	= ext3_listxattr,
-	.removexattr	= generic_removexattr,
-#endif
-};
diff --git a/fs/ext3/xattr.c b/fs/ext3/xattr.c
deleted file mode 100644
index 7cf36501ccf4..000000000000
--- a/fs/ext3/xattr.c
+++ /dev/null
@@ -1,1330 +0,0 @@
-/*
- * linux/fs/ext3/xattr.c
- *
- * Copyright (C) 2001-2003 Andreas Gruenbacher, <agruen@suse.de>
- *
- * Fix by Harrison Xing <harrison@mountainviewdata.com>.
- * Ext3 code with a lot of help from Eric Jarman <ejarman@acm.org>.
- * Extended attributes for symlinks and special files added per
- *  suggestion of Luka Renko <luka.renko@hermes.si>.
- * xattr consolidation Copyright (c) 2004 James Morris <jmorris@redhat.com>,
- *  Red Hat Inc.
- * ea-in-inode support by Alex Tomas <alex@clusterfs.com> aka bzzz
- *  and Andreas Gruenbacher <agruen@suse.de>.
- */
-
-/*
- * Extended attributes are stored directly in inodes (on file systems with
- * inodes bigger than 128 bytes) and on additional disk blocks. The i_file_acl
- * field contains the block number if an inode uses an additional block. All
- * attributes must fit in the inode and one additional block. Blocks that
- * contain the identical set of attributes may be shared among several inodes.
- * Identical blocks are detected by keeping a cache of blocks that have
- * recently been accessed.
- *
- * The attributes in inodes and on blocks have a different header; the entries
- * are stored in the same format:
- *
- *   +------------------+
- *   | header           |
- *   | entry 1          | |
- *   | entry 2          | | growing downwards
- *   | entry 3          | v
- *   | four null bytes  |
- *   | . . .            |
- *   | value 1          | ^
- *   | value 3          | | growing upwards
- *   | value 2          | |
- *   +------------------+
- *
- * The header is followed by multiple entry descriptors. In disk blocks, the
- * entry descriptors are kept sorted. In inodes, they are unsorted. The
- * attribute values are aligned to the end of the block in no specific order.
- *
- * Locking strategy
- * ----------------
- * EXT3_I(inode)->i_file_acl is protected by EXT3_I(inode)->xattr_sem.
- * EA blocks are only changed if they are exclusive to an inode, so
- * holding xattr_sem also means that nothing but the EA block's reference
- * count can change. Multiple writers to the same block are synchronized
- * by the buffer lock.
- */
-
-#include "ext3.h"
-#include <linux/mbcache.h>
-#include <linux/quotaops.h>
-#include "xattr.h"
-#include "acl.h"
-
-#define BHDR(bh) ((struct ext3_xattr_header *)((bh)->b_data))
-#define ENTRY(ptr) ((struct ext3_xattr_entry *)(ptr))
-#define BFIRST(bh) ENTRY(BHDR(bh)+1)
-#define IS_LAST_ENTRY(entry) (*(__u32 *)(entry) == 0)
-
-#define IHDR(inode, raw_inode) \
-	((struct ext3_xattr_ibody_header *) \
-		((void *)raw_inode + \
-		 EXT3_GOOD_OLD_INODE_SIZE + \
-		 EXT3_I(inode)->i_extra_isize))
-#define IFIRST(hdr) ((struct ext3_xattr_entry *)((hdr)+1))
-
-#ifdef EXT3_XATTR_DEBUG
-# define ea_idebug(inode, f...) do { \
-		printk(KERN_DEBUG "inode %s:%lu: ", \
-			inode->i_sb->s_id, inode->i_ino); \
-		printk(f); \
-		printk("\n"); \
-	} while (0)
-# define ea_bdebug(bh, f...) do { \
-		char b[BDEVNAME_SIZE]; \
-		printk(KERN_DEBUG "block %s:%lu: ", \
-			bdevname(bh->b_bdev, b), \
-			(unsigned long) bh->b_blocknr); \
-		printk(f); \
-		printk("\n"); \
-	} while (0)
-#else
-# define ea_idebug(f...)
-# define ea_bdebug(f...)
-#endif
-
-static void ext3_xattr_cache_insert(struct buffer_head *);
-static struct buffer_head *ext3_xattr_cache_find(struct inode *,
-						 struct ext3_xattr_header *,
-						 struct mb_cache_entry **);
-static void ext3_xattr_rehash(struct ext3_xattr_header *,
-			      struct ext3_xattr_entry *);
-static int ext3_xattr_list(struct dentry *dentry, char *buffer,
-			   size_t buffer_size);
-
-static struct mb_cache *ext3_xattr_cache;
-
-static const struct xattr_handler *ext3_xattr_handler_map[] = {
-	[EXT3_XATTR_INDEX_USER]		     = &ext3_xattr_user_handler,
-#ifdef CONFIG_EXT3_FS_POSIX_ACL
-	[EXT3_XATTR_INDEX_POSIX_ACL_ACCESS]  = &posix_acl_access_xattr_handler,
-	[EXT3_XATTR_INDEX_POSIX_ACL_DEFAULT] = &posix_acl_default_xattr_handler,
-#endif
-	[EXT3_XATTR_INDEX_TRUSTED]	     = &ext3_xattr_trusted_handler,
-#ifdef CONFIG_EXT3_FS_SECURITY
-	[EXT3_XATTR_INDEX_SECURITY]	     = &ext3_xattr_security_handler,
-#endif
-};
-
-const struct xattr_handler *ext3_xattr_handlers[] = {
-	&ext3_xattr_user_handler,
-	&ext3_xattr_trusted_handler,
-#ifdef CONFIG_EXT3_FS_POSIX_ACL
-	&posix_acl_access_xattr_handler,
-	&posix_acl_default_xattr_handler,
-#endif
-#ifdef CONFIG_EXT3_FS_SECURITY
-	&ext3_xattr_security_handler,
-#endif
-	NULL
-};
-
-static inline const struct xattr_handler *
-ext3_xattr_handler(int name_index)
-{
-	const struct xattr_handler *handler = NULL;
-
-	if (name_index > 0 && name_index < ARRAY_SIZE(ext3_xattr_handler_map))
-		handler = ext3_xattr_handler_map[name_index];
-	return handler;
-}
-
-/*
- * Inode operation listxattr()
- *
- * d_inode(dentry)->i_mutex: don't care
- */
-ssize_t
-ext3_listxattr(struct dentry *dentry, char *buffer, size_t size)
-{
-	return ext3_xattr_list(dentry, buffer, size);
-}
-
-static int
-ext3_xattr_check_names(struct ext3_xattr_entry *entry, void *end)
-{
-	while (!IS_LAST_ENTRY(entry)) {
-		struct ext3_xattr_entry *next = EXT3_XATTR_NEXT(entry);
-		if ((void *)next >= end)
-			return -EIO;
-		entry = next;
-	}
-	return 0;
-}
-
-static inline int
-ext3_xattr_check_block(struct buffer_head *bh)
-{
-	int error;
-
-	if (BHDR(bh)->h_magic != cpu_to_le32(EXT3_XATTR_MAGIC) ||
-	    BHDR(bh)->h_blocks != cpu_to_le32(1))
-		return -EIO;
-	error = ext3_xattr_check_names(BFIRST(bh), bh->b_data + bh->b_size);
-	return error;
-}
-
-static inline int
-ext3_xattr_check_entry(struct ext3_xattr_entry *entry, size_t size)
-{
-	size_t value_size = le32_to_cpu(entry->e_value_size);
-
-	if (entry->e_value_block != 0 || value_size > size ||
-	    le16_to_cpu(entry->e_value_offs) + value_size > size)
-		return -EIO;
-	return 0;
-}
-
-static int
-ext3_xattr_find_entry(struct ext3_xattr_entry **pentry, int name_index,
-		      const char *name, size_t size, int sorted)
-{
-	struct ext3_xattr_entry *entry;
-	size_t name_len;
-	int cmp = 1;
-
-	if (name == NULL)
-		return -EINVAL;
-	name_len = strlen(name);
-	entry = *pentry;
-	for (; !IS_LAST_ENTRY(entry); entry = EXT3_XATTR_NEXT(entry)) {
-		cmp = name_index - entry->e_name_index;
-		if (!cmp)
-			cmp = name_len - entry->e_name_len;
-		if (!cmp)
-			cmp = memcmp(name, entry->e_name, name_len);
-		if (cmp <= 0 && (sorted || cmp == 0))
-			break;
-	}
-	*pentry = entry;
-	if (!cmp && ext3_xattr_check_entry(entry, size))
-			return -EIO;
-	return cmp ? -ENODATA : 0;
-}
-
-static int
-ext3_xattr_block_get(struct inode *inode, int name_index, const char *name,
-		     void *buffer, size_t buffer_size)
-{
-	struct buffer_head *bh = NULL;
-	struct ext3_xattr_entry *entry;
-	size_t size;
-	int error;
-
-	ea_idebug(inode, "name=%d.%s, buffer=%p, buffer_size=%ld",
-		  name_index, name, buffer, (long)buffer_size);
-
-	error = -ENODATA;
-	if (!EXT3_I(inode)->i_file_acl)
-		goto cleanup;
-	ea_idebug(inode, "reading block %u", EXT3_I(inode)->i_file_acl);
-	bh = sb_bread(inode->i_sb, EXT3_I(inode)->i_file_acl);
-	if (!bh)
-		goto cleanup;
-	ea_bdebug(bh, "b_count=%d, refcount=%d",
-		atomic_read(&(bh->b_count)), le32_to_cpu(BHDR(bh)->h_refcount));
-	if (ext3_xattr_check_block(bh)) {
-bad_block:	ext3_error(inode->i_sb, __func__,
-			   "inode %lu: bad block "E3FSBLK, inode->i_ino,
-			   EXT3_I(inode)->i_file_acl);
-		error = -EIO;
-		goto cleanup;
-	}
-	ext3_xattr_cache_insert(bh);
-	entry = BFIRST(bh);
-	error = ext3_xattr_find_entry(&entry, name_index, name, bh->b_size, 1);
-	if (error == -EIO)
-		goto bad_block;
-	if (error)
-		goto cleanup;
-	size = le32_to_cpu(entry->e_value_size);
-	if (buffer) {
-		error = -ERANGE;
-		if (size > buffer_size)
-			goto cleanup;
-		memcpy(buffer, bh->b_data + le16_to_cpu(entry->e_value_offs),
-		       size);
-	}
-	error = size;
-
-cleanup:
-	brelse(bh);
-	return error;
-}
-
-static int
-ext3_xattr_ibody_get(struct inode *inode, int name_index, const char *name,
-		     void *buffer, size_t buffer_size)
-{
-	struct ext3_xattr_ibody_header *header;
-	struct ext3_xattr_entry *entry;
-	struct ext3_inode *raw_inode;
-	struct ext3_iloc iloc;
-	size_t size;
-	void *end;
-	int error;
-
-	if (!ext3_test_inode_state(inode, EXT3_STATE_XATTR))
-		return -ENODATA;
-	error = ext3_get_inode_loc(inode, &iloc);
-	if (error)
-		return error;
-	raw_inode = ext3_raw_inode(&iloc);
-	header = IHDR(inode, raw_inode);
-	entry = IFIRST(header);
-	end = (void *)raw_inode + EXT3_SB(inode->i_sb)->s_inode_size;
-	error = ext3_xattr_check_names(entry, end);
-	if (error)
-		goto cleanup;
-	error = ext3_xattr_find_entry(&entry, name_index, name,
-				      end - (void *)entry, 0);
-	if (error)
-		goto cleanup;
-	size = le32_to_cpu(entry->e_value_size);
-	if (buffer) {
-		error = -ERANGE;
-		if (size > buffer_size)
-			goto cleanup;
-		memcpy(buffer, (void *)IFIRST(header) +
-		       le16_to_cpu(entry->e_value_offs), size);
-	}
-	error = size;
-
-cleanup:
-	brelse(iloc.bh);
-	return error;
-}
-
-/*
- * ext3_xattr_get()
- *
- * Copy an extended attribute into the buffer
- * provided, or compute the buffer size required.
- * Buffer is NULL to compute the size of the buffer required.
- *
- * Returns a negative error number on failure, or the number of bytes
- * used / required on success.
- */
-int
-ext3_xattr_get(struct inode *inode, int name_index, const char *name,
-	       void *buffer, size_t buffer_size)
-{
-	int error;
-
-	down_read(&EXT3_I(inode)->xattr_sem);
-	error = ext3_xattr_ibody_get(inode, name_index, name, buffer,
-				     buffer_size);
-	if (error == -ENODATA)
-		error = ext3_xattr_block_get(inode, name_index, name, buffer,
-					     buffer_size);
-	up_read(&EXT3_I(inode)->xattr_sem);
-	return error;
-}
-
-static int
-ext3_xattr_list_entries(struct dentry *dentry, struct ext3_xattr_entry *entry,
-			char *buffer, size_t buffer_size)
-{
-	size_t rest = buffer_size;
-
-	for (; !IS_LAST_ENTRY(entry); entry = EXT3_XATTR_NEXT(entry)) {
-		const struct xattr_handler *handler =
-			ext3_xattr_handler(entry->e_name_index);
-
-		if (handler) {
-			size_t size = handler->list(dentry, buffer, rest,
-						    entry->e_name,
-						    entry->e_name_len,
-						    handler->flags);
-			if (buffer) {
-				if (size > rest)
-					return -ERANGE;
-				buffer += size;
-			}
-			rest -= size;
-		}
-	}
-	return buffer_size - rest;
-}
-
-static int
-ext3_xattr_block_list(struct dentry *dentry, char *buffer, size_t buffer_size)
-{
-	struct inode *inode = d_inode(dentry);
-	struct buffer_head *bh = NULL;
-	int error;
-
-	ea_idebug(inode, "buffer=%p, buffer_size=%ld",
-		  buffer, (long)buffer_size);
-
-	error = 0;
-	if (!EXT3_I(inode)->i_file_acl)
-		goto cleanup;
-	ea_idebug(inode, "reading block %u", EXT3_I(inode)->i_file_acl);
-	bh = sb_bread(inode->i_sb, EXT3_I(inode)->i_file_acl);
-	error = -EIO;
-	if (!bh)
-		goto cleanup;
-	ea_bdebug(bh, "b_count=%d, refcount=%d",
-		atomic_read(&(bh->b_count)), le32_to_cpu(BHDR(bh)->h_refcount));
-	if (ext3_xattr_check_block(bh)) {
-		ext3_error(inode->i_sb, __func__,
-			   "inode %lu: bad block "E3FSBLK, inode->i_ino,
-			   EXT3_I(inode)->i_file_acl);
-		error = -EIO;
-		goto cleanup;
-	}
-	ext3_xattr_cache_insert(bh);
-	error = ext3_xattr_list_entries(dentry, BFIRST(bh), buffer, buffer_size);
-
-cleanup:
-	brelse(bh);
-
-	return error;
-}
-
-static int
-ext3_xattr_ibody_list(struct dentry *dentry, char *buffer, size_t buffer_size)
-{
-	struct inode *inode = d_inode(dentry);
-	struct ext3_xattr_ibody_header *header;
-	struct ext3_inode *raw_inode;
-	struct ext3_iloc iloc;
-	void *end;
-	int error;
-
-	if (!ext3_test_inode_state(inode, EXT3_STATE_XATTR))
-		return 0;
-	error = ext3_get_inode_loc(inode, &iloc);
-	if (error)
-		return error;
-	raw_inode = ext3_raw_inode(&iloc);
-	header = IHDR(inode, raw_inode);
-	end = (void *)raw_inode + EXT3_SB(inode->i_sb)->s_inode_size;
-	error = ext3_xattr_check_names(IFIRST(header), end);
-	if (error)
-		goto cleanup;
-	error = ext3_xattr_list_entries(dentry, IFIRST(header),
-					buffer, buffer_size);
-
-cleanup:
-	brelse(iloc.bh);
-	return error;
-}
-
-/*
- * ext3_xattr_list()
- *
- * Copy a list of attribute names into the buffer
- * provided, or compute the buffer size required.
- * Buffer is NULL to compute the size of the buffer required.
- *
- * Returns a negative error number on failure, or the number of bytes
- * used / required on success.
- */
-static int
-ext3_xattr_list(struct dentry *dentry, char *buffer, size_t buffer_size)
-{
-	int i_error, b_error;
-
-	down_read(&EXT3_I(d_inode(dentry))->xattr_sem);
-	i_error = ext3_xattr_ibody_list(dentry, buffer, buffer_size);
-	if (i_error < 0) {
-		b_error = 0;
-	} else {
-		if (buffer) {
-			buffer += i_error;
-			buffer_size -= i_error;
-		}
-		b_error = ext3_xattr_block_list(dentry, buffer, buffer_size);
-		if (b_error < 0)
-			i_error = 0;
-	}
-	up_read(&EXT3_I(d_inode(dentry))->xattr_sem);
-	return i_error + b_error;
-}
-
-/*
- * If the EXT3_FEATURE_COMPAT_EXT_ATTR feature of this file system is
- * not set, set it.
- */
-static void ext3_xattr_update_super_block(handle_t *handle,
-					  struct super_block *sb)
-{
-	if (EXT3_HAS_COMPAT_FEATURE(sb, EXT3_FEATURE_COMPAT_EXT_ATTR))
-		return;
-
-	if (ext3_journal_get_write_access(handle, EXT3_SB(sb)->s_sbh) == 0) {
-		EXT3_SET_COMPAT_FEATURE(sb, EXT3_FEATURE_COMPAT_EXT_ATTR);
-		ext3_journal_dirty_metadata(handle, EXT3_SB(sb)->s_sbh);
-	}
-}
-
-/*
- * Release the xattr block BH: If the reference count is > 1, decrement
- * it; otherwise free the block.
- */
-static void
-ext3_xattr_release_block(handle_t *handle, struct inode *inode,
-			 struct buffer_head *bh)
-{
-	struct mb_cache_entry *ce = NULL;
-	int error = 0;
-
-	ce = mb_cache_entry_get(ext3_xattr_cache, bh->b_bdev, bh->b_blocknr);
-	error = ext3_journal_get_write_access(handle, bh);
-	if (error)
-		 goto out;
-
-	lock_buffer(bh);
-
-	if (BHDR(bh)->h_refcount == cpu_to_le32(1)) {
-		ea_bdebug(bh, "refcount now=0; freeing");
-		if (ce)
-			mb_cache_entry_free(ce);
-		ext3_free_blocks(handle, inode, bh->b_blocknr, 1);
-		get_bh(bh);
-		ext3_forget(handle, 1, inode, bh, bh->b_blocknr);
-	} else {
-		le32_add_cpu(&BHDR(bh)->h_refcount, -1);
-		error = ext3_journal_dirty_metadata(handle, bh);
-		if (IS_SYNC(inode))
-			handle->h_sync = 1;
-		dquot_free_block(inode, 1);
-		ea_bdebug(bh, "refcount now=%d; releasing",
-			  le32_to_cpu(BHDR(bh)->h_refcount));
-		if (ce)
-			mb_cache_entry_release(ce);
-	}
-	unlock_buffer(bh);
-out:
-	ext3_std_error(inode->i_sb, error);
-	return;
-}
-
-struct ext3_xattr_info {
-	int name_index;
-	const char *name;
-	const void *value;
-	size_t value_len;
-};
-
-struct ext3_xattr_search {
-	struct ext3_xattr_entry *first;
-	void *base;
-	void *end;
-	struct ext3_xattr_entry *here;
-	int not_found;
-};
-
-static int
-ext3_xattr_set_entry(struct ext3_xattr_info *i, struct ext3_xattr_search *s)
-{
-	struct ext3_xattr_entry *last;
-	size_t free, min_offs = s->end - s->base, name_len = strlen(i->name);
-
-	/* Compute min_offs and last. */
-	last = s->first;
-	for (; !IS_LAST_ENTRY(last); last = EXT3_XATTR_NEXT(last)) {
-		if (!last->e_value_block && last->e_value_size) {
-			size_t offs = le16_to_cpu(last->e_value_offs);
-			if (offs < min_offs)
-				min_offs = offs;
-		}
-	}
-	free = min_offs - ((void *)last - s->base) - sizeof(__u32);
-	if (!s->not_found) {
-		if (!s->here->e_value_block && s->here->e_value_size) {
-			size_t size = le32_to_cpu(s->here->e_value_size);
-			free += EXT3_XATTR_SIZE(size);
-		}
-		free += EXT3_XATTR_LEN(name_len);
-	}
-	if (i->value) {
-		if (free < EXT3_XATTR_LEN(name_len) +
-			   EXT3_XATTR_SIZE(i->value_len))
-			return -ENOSPC;
-	}
-
-	if (i->value && s->not_found) {
-		/* Insert the new name. */
-		size_t size = EXT3_XATTR_LEN(name_len);
-		size_t rest = (void *)last - (void *)s->here + sizeof(__u32);
-		memmove((void *)s->here + size, s->here, rest);
-		memset(s->here, 0, size);
-		s->here->e_name_index = i->name_index;
-		s->here->e_name_len = name_len;
-		memcpy(s->here->e_name, i->name, name_len);
-	} else {
-		if (!s->here->e_value_block && s->here->e_value_size) {
-			void *first_val = s->base + min_offs;
-			size_t offs = le16_to_cpu(s->here->e_value_offs);
-			void *val = s->base + offs;
-			size_t size = EXT3_XATTR_SIZE(
-				le32_to_cpu(s->here->e_value_size));
-
-			if (i->value && size == EXT3_XATTR_SIZE(i->value_len)) {
-				/* The old and the new value have the same
-				   size. Just replace. */
-				s->here->e_value_size =
-					cpu_to_le32(i->value_len);
-				memset(val + size - EXT3_XATTR_PAD, 0,
-				       EXT3_XATTR_PAD); /* Clear pad bytes. */
-				memcpy(val, i->value, i->value_len);
-				return 0;
-			}
-
-			/* Remove the old value. */
-			memmove(first_val + size, first_val, val - first_val);
-			memset(first_val, 0, size);
-			s->here->e_value_size = 0;
-			s->here->e_value_offs = 0;
-			min_offs += size;
-
-			/* Adjust all value offsets. */
-			last = s->first;
-			while (!IS_LAST_ENTRY(last)) {
-				size_t o = le16_to_cpu(last->e_value_offs);
-				if (!last->e_value_block &&
-				    last->e_value_size && o < offs)
-					last->e_value_offs =
-						cpu_to_le16(o + size);
-				last = EXT3_XATTR_NEXT(last);
-			}
-		}
-		if (!i->value) {
-			/* Remove the old name. */
-			size_t size = EXT3_XATTR_LEN(name_len);
-			last = ENTRY((void *)last - size);
-			memmove(s->here, (void *)s->here + size,
-				(void *)last - (void *)s->here + sizeof(__u32));
-			memset(last, 0, size);
-		}
-	}
-
-	if (i->value) {
-		/* Insert the new value. */
-		s->here->e_value_size = cpu_to_le32(i->value_len);
-		if (i->value_len) {
-			size_t size = EXT3_XATTR_SIZE(i->value_len);
-			void *val = s->base + min_offs - size;
-			s->here->e_value_offs = cpu_to_le16(min_offs - size);
-			memset(val + size - EXT3_XATTR_PAD, 0,
-			       EXT3_XATTR_PAD); /* Clear the pad bytes. */
-			memcpy(val, i->value, i->value_len);
-		}
-	}
-	return 0;
-}
-
-struct ext3_xattr_block_find {
-	struct ext3_xattr_search s;
-	struct buffer_head *bh;
-};
-
-static int
-ext3_xattr_block_find(struct inode *inode, struct ext3_xattr_info *i,
-		      struct ext3_xattr_block_find *bs)
-{
-	struct super_block *sb = inode->i_sb;
-	int error;
-
-	ea_idebug(inode, "name=%d.%s, value=%p, value_len=%ld",
-		  i->name_index, i->name, i->value, (long)i->value_len);
-
-	if (EXT3_I(inode)->i_file_acl) {
-		/* The inode already has an extended attribute block. */
-		bs->bh = sb_bread(sb, EXT3_I(inode)->i_file_acl);
-		error = -EIO;
-		if (!bs->bh)
-			goto cleanup;
-		ea_bdebug(bs->bh, "b_count=%d, refcount=%d",
-			atomic_read(&(bs->bh->b_count)),
-			le32_to_cpu(BHDR(bs->bh)->h_refcount));
-		if (ext3_xattr_check_block(bs->bh)) {
-			ext3_error(sb, __func__,
-				"inode %lu: bad block "E3FSBLK, inode->i_ino,
-				EXT3_I(inode)->i_file_acl);
-			error = -EIO;
-			goto cleanup;
-		}
-		/* Find the named attribute. */
-		bs->s.base = BHDR(bs->bh);
-		bs->s.first = BFIRST(bs->bh);
-		bs->s.end = bs->bh->b_data + bs->bh->b_size;
-		bs->s.here = bs->s.first;
-		error = ext3_xattr_find_entry(&bs->s.here, i->name_index,
-					      i->name, bs->bh->b_size, 1);
-		if (error && error != -ENODATA)
-			goto cleanup;
-		bs->s.not_found = error;
-	}
-	error = 0;
-
-cleanup:
-	return error;
-}
-
-static int
-ext3_xattr_block_set(handle_t *handle, struct inode *inode,
-		     struct ext3_xattr_info *i,
-		     struct ext3_xattr_block_find *bs)
-{
-	struct super_block *sb = inode->i_sb;
-	struct buffer_head *new_bh = NULL;
-	struct ext3_xattr_search *s = &bs->s;
-	struct mb_cache_entry *ce = NULL;
-	int error = 0;
-
-#define header(x) ((struct ext3_xattr_header *)(x))
-
-	if (i->value && i->value_len > sb->s_blocksize)
-		return -ENOSPC;
-	if (s->base) {
-		ce = mb_cache_entry_get(ext3_xattr_cache, bs->bh->b_bdev,
-					bs->bh->b_blocknr);
-		error = ext3_journal_get_write_access(handle, bs->bh);
-		if (error)
-			goto cleanup;
-		lock_buffer(bs->bh);
-
-		if (header(s->base)->h_refcount == cpu_to_le32(1)) {
-			if (ce) {
-				mb_cache_entry_free(ce);
-				ce = NULL;
-			}
-			ea_bdebug(bs->bh, "modifying in-place");
-			error = ext3_xattr_set_entry(i, s);
-			if (!error) {
-				if (!IS_LAST_ENTRY(s->first))
-					ext3_xattr_rehash(header(s->base),
-							  s->here);
-				ext3_xattr_cache_insert(bs->bh);
-			}
-			unlock_buffer(bs->bh);
-			if (error == -EIO)
-				goto bad_block;
-			if (!error)
-				error = ext3_journal_dirty_metadata(handle,
-								    bs->bh);
-			if (error)
-				goto cleanup;
-			goto inserted;
-		} else {
-			int offset = (char *)s->here - bs->bh->b_data;
-
-			unlock_buffer(bs->bh);
-			journal_release_buffer(handle, bs->bh);
-
-			if (ce) {
-				mb_cache_entry_release(ce);
-				ce = NULL;
-			}
-			ea_bdebug(bs->bh, "cloning");
-			s->base = kmalloc(bs->bh->b_size, GFP_NOFS);
-			error = -ENOMEM;
-			if (s->base == NULL)
-				goto cleanup;
-			memcpy(s->base, BHDR(bs->bh), bs->bh->b_size);
-			s->first = ENTRY(header(s->base)+1);
-			header(s->base)->h_refcount = cpu_to_le32(1);
-			s->here = ENTRY(s->base + offset);
-			s->end = s->base + bs->bh->b_size;
-		}
-	} else {
-		/* Allocate a buffer where we construct the new block. */
-		s->base = kzalloc(sb->s_blocksize, GFP_NOFS);
-		/* assert(header == s->base) */
-		error = -ENOMEM;
-		if (s->base == NULL)
-			goto cleanup;
-		header(s->base)->h_magic = cpu_to_le32(EXT3_XATTR_MAGIC);
-		header(s->base)->h_blocks = cpu_to_le32(1);
-		header(s->base)->h_refcount = cpu_to_le32(1);
-		s->first = ENTRY(header(s->base)+1);
-		s->here = ENTRY(header(s->base)+1);
-		s->end = s->base + sb->s_blocksize;
-	}
-
-	error = ext3_xattr_set_entry(i, s);
-	if (error == -EIO)
-		goto bad_block;
-	if (error)
-		goto cleanup;
-	if (!IS_LAST_ENTRY(s->first))
-		ext3_xattr_rehash(header(s->base), s->here);
-
-inserted:
-	if (!IS_LAST_ENTRY(s->first)) {
-		new_bh = ext3_xattr_cache_find(inode, header(s->base), &ce);
-		if (new_bh) {
-			/* We found an identical block in the cache. */
-			if (new_bh == bs->bh)
-				ea_bdebug(new_bh, "keeping");
-			else {
-				/* The old block is released after updating
-				   the inode. */
-				error = dquot_alloc_block(inode, 1);
-				if (error)
-					goto cleanup;
-				error = ext3_journal_get_write_access(handle,
-								      new_bh);
-				if (error)
-					goto cleanup_dquot;
-				lock_buffer(new_bh);
-				le32_add_cpu(&BHDR(new_bh)->h_refcount, 1);
-				ea_bdebug(new_bh, "reusing; refcount now=%d",
-					le32_to_cpu(BHDR(new_bh)->h_refcount));
-				unlock_buffer(new_bh);
-				error = ext3_journal_dirty_metadata(handle,
-								    new_bh);
-				if (error)
-					goto cleanup_dquot;
-			}
-			mb_cache_entry_release(ce);
-			ce = NULL;
-		} else if (bs->bh && s->base == bs->bh->b_data) {
-			/* We were modifying this block in-place. */
-			ea_bdebug(bs->bh, "keeping this block");
-			new_bh = bs->bh;
-			get_bh(new_bh);
-		} else {
-			/* We need to allocate a new block */
-			ext3_fsblk_t goal = ext3_group_first_block_no(sb,
-						EXT3_I(inode)->i_block_group);
-			ext3_fsblk_t block;
-
-			/*
-			 * Protect us agaist concurrent allocations to the
-			 * same inode from ext3_..._writepage(). Reservation
-			 * code does not expect racing allocations.
-			 */
-			mutex_lock(&EXT3_I(inode)->truncate_mutex);
-			block = ext3_new_block(handle, inode, goal, &error);
-			mutex_unlock(&EXT3_I(inode)->truncate_mutex);
-			if (error)
-				goto cleanup;
-			ea_idebug(inode, "creating block %d", block);
-
-			new_bh = sb_getblk(sb, block);
-			if (unlikely(!new_bh)) {
-getblk_failed:
-				ext3_free_blocks(handle, inode, block, 1);
-				error = -ENOMEM;
-				goto cleanup;
-			}
-			lock_buffer(new_bh);
-			error = ext3_journal_get_create_access(handle, new_bh);
-			if (error) {
-				unlock_buffer(new_bh);
-				goto getblk_failed;
-			}
-			memcpy(new_bh->b_data, s->base, new_bh->b_size);
-			set_buffer_uptodate(new_bh);
-			unlock_buffer(new_bh);
-			ext3_xattr_cache_insert(new_bh);
-			error = ext3_journal_dirty_metadata(handle, new_bh);
-			if (error)
-				goto cleanup;
-		}
-	}
-
-	/* Update the inode. */
-	EXT3_I(inode)->i_file_acl = new_bh ? new_bh->b_blocknr : 0;
-
-	/* Drop the previous xattr block. */
-	if (bs->bh && bs->bh != new_bh)
-		ext3_xattr_release_block(handle, inode, bs->bh);
-	error = 0;
-
-cleanup:
-	if (ce)
-		mb_cache_entry_release(ce);
-	brelse(new_bh);
-	if (!(bs->bh && s->base == bs->bh->b_data))
-		kfree(s->base);
-
-	return error;
-
-cleanup_dquot:
-	dquot_free_block(inode, 1);
-	goto cleanup;
-
-bad_block:
-	ext3_error(inode->i_sb, __func__,
-		   "inode %lu: bad block "E3FSBLK, inode->i_ino,
-		   EXT3_I(inode)->i_file_acl);
-	goto cleanup;
-
-#undef header
-}
-
-struct ext3_xattr_ibody_find {
-	struct ext3_xattr_search s;
-	struct ext3_iloc iloc;
-};
-
-static int
-ext3_xattr_ibody_find(struct inode *inode, struct ext3_xattr_info *i,
-		      struct ext3_xattr_ibody_find *is)
-{
-	struct ext3_xattr_ibody_header *header;
-	struct ext3_inode *raw_inode;
-	int error;
-
-	if (EXT3_I(inode)->i_extra_isize == 0)
-		return 0;
-	raw_inode = ext3_raw_inode(&is->iloc);
-	header = IHDR(inode, raw_inode);
-	is->s.base = is->s.first = IFIRST(header);
-	is->s.here = is->s.first;
-	is->s.end = (void *)raw_inode + EXT3_SB(inode->i_sb)->s_inode_size;
-	if (ext3_test_inode_state(inode, EXT3_STATE_XATTR)) {
-		error = ext3_xattr_check_names(IFIRST(header), is->s.end);
-		if (error)
-			return error;
-		/* Find the named attribute. */
-		error = ext3_xattr_find_entry(&is->s.here, i->name_index,
-					      i->name, is->s.end -
-					      (void *)is->s.base, 0);
-		if (error && error != -ENODATA)
-			return error;
-		is->s.not_found = error;
-	}
-	return 0;
-}
-
-static int
-ext3_xattr_ibody_set(handle_t *handle, struct inode *inode,
-		     struct ext3_xattr_info *i,
-		     struct ext3_xattr_ibody_find *is)
-{
-	struct ext3_xattr_ibody_header *header;
-	struct ext3_xattr_search *s = &is->s;
-	int error;
-
-	if (EXT3_I(inode)->i_extra_isize == 0)
-		return -ENOSPC;
-	error = ext3_xattr_set_entry(i, s);
-	if (error)
-		return error;
-	header = IHDR(inode, ext3_raw_inode(&is->iloc));
-	if (!IS_LAST_ENTRY(s->first)) {
-		header->h_magic = cpu_to_le32(EXT3_XATTR_MAGIC);
-		ext3_set_inode_state(inode, EXT3_STATE_XATTR);
-	} else {
-		header->h_magic = cpu_to_le32(0);
-		ext3_clear_inode_state(inode, EXT3_STATE_XATTR);
-	}
-	return 0;
-}
-
-/*
- * ext3_xattr_set_handle()
- *
- * Create, replace or remove an extended attribute for this inode.  Value
- * is NULL to remove an existing extended attribute, and non-NULL to
- * either replace an existing extended attribute, or create a new extended
- * attribute. The flags XATTR_REPLACE and XATTR_CREATE
- * specify that an extended attribute must exist and must not exist
- * previous to the call, respectively.
- *
- * Returns 0, or a negative error number on failure.
- */
-int
-ext3_xattr_set_handle(handle_t *handle, struct inode *inode, int name_index,
-		      const char *name, const void *value, size_t value_len,
-		      int flags)
-{
-	struct ext3_xattr_info i = {
-		.name_index = name_index,
-		.name = name,
-		.value = value,
-		.value_len = value_len,
-
-	};
-	struct ext3_xattr_ibody_find is = {
-		.s = { .not_found = -ENODATA, },
-	};
-	struct ext3_xattr_block_find bs = {
-		.s = { .not_found = -ENODATA, },
-	};
-	int error;
-
-	if (!name)
-		return -EINVAL;
-	if (strlen(name) > 255)
-		return -ERANGE;
-	down_write(&EXT3_I(inode)->xattr_sem);
-	error = ext3_get_inode_loc(inode, &is.iloc);
-	if (error)
-		goto cleanup;
-
-	error = ext3_journal_get_write_access(handle, is.iloc.bh);
-	if (error)
-		goto cleanup;
-
-	if (ext3_test_inode_state(inode, EXT3_STATE_NEW)) {
-		struct ext3_inode *raw_inode = ext3_raw_inode(&is.iloc);
-		memset(raw_inode, 0, EXT3_SB(inode->i_sb)->s_inode_size);
-		ext3_clear_inode_state(inode, EXT3_STATE_NEW);
-	}
-
-	error = ext3_xattr_ibody_find(inode, &i, &is);
-	if (error)
-		goto cleanup;
-	if (is.s.not_found)
-		error = ext3_xattr_block_find(inode, &i, &bs);
-	if (error)
-		goto cleanup;
-	if (is.s.not_found && bs.s.not_found) {
-		error = -ENODATA;
-		if (flags & XATTR_REPLACE)
-			goto cleanup;
-		error = 0;
-		if (!value)
-			goto cleanup;
-	} else {
-		error = -EEXIST;
-		if (flags & XATTR_CREATE)
-			goto cleanup;
-	}
-	if (!value) {
-		if (!is.s.not_found)
-			error = ext3_xattr_ibody_set(handle, inode, &i, &is);
-		else if (!bs.s.not_found)
-			error = ext3_xattr_block_set(handle, inode, &i, &bs);
-	} else {
-		error = ext3_xattr_ibody_set(handle, inode, &i, &is);
-		if (!error && !bs.s.not_found) {
-			i.value = NULL;
-			error = ext3_xattr_block_set(handle, inode, &i, &bs);
-		} else if (error == -ENOSPC) {
-			if (EXT3_I(inode)->i_file_acl && !bs.s.base) {
-				error = ext3_xattr_block_find(inode, &i, &bs);
-				if (error)
-					goto cleanup;
-			}
-			error = ext3_xattr_block_set(handle, inode, &i, &bs);
-			if (error)
-				goto cleanup;
-			if (!is.s.not_found) {
-				i.value = NULL;
-				error = ext3_xattr_ibody_set(handle, inode, &i,
-							     &is);
-			}
-		}
-	}
-	if (!error) {
-		ext3_xattr_update_super_block(handle, inode->i_sb);
-		inode->i_ctime = CURRENT_TIME_SEC;
-		error = ext3_mark_iloc_dirty(handle, inode, &is.iloc);
-		/*
-		 * The bh is consumed by ext3_mark_iloc_dirty, even with
-		 * error != 0.
-		 */
-		is.iloc.bh = NULL;
-		if (IS_SYNC(inode))
-			handle->h_sync = 1;
-	}
-
-cleanup:
-	brelse(is.iloc.bh);
-	brelse(bs.bh);
-	up_write(&EXT3_I(inode)->xattr_sem);
-	return error;
-}
-
-/*
- * ext3_xattr_set()
- *
- * Like ext3_xattr_set_handle, but start from an inode. This extended
- * attribute modification is a filesystem transaction by itself.
- *
- * Returns 0, or a negative error number on failure.
- */
-int
-ext3_xattr_set(struct inode *inode, int name_index, const char *name,
-	       const void *value, size_t value_len, int flags)
-{
-	handle_t *handle;
-	int error, retries = 0;
-
-retry:
-	handle = ext3_journal_start(inode, EXT3_DATA_TRANS_BLOCKS(inode->i_sb));
-	if (IS_ERR(handle)) {
-		error = PTR_ERR(handle);
-	} else {
-		int error2;
-
-		error = ext3_xattr_set_handle(handle, inode, name_index, name,
-					      value, value_len, flags);
-		error2 = ext3_journal_stop(handle);
-		if (error == -ENOSPC &&
-		    ext3_should_retry_alloc(inode->i_sb, &retries))
-			goto retry;
-		if (error == 0)
-			error = error2;
-	}
-
-	return error;
-}
-
-/*
- * ext3_xattr_delete_inode()
- *
- * Free extended attribute resources associated with this inode. This
- * is called immediately before an inode is freed. We have exclusive
- * access to the inode.
- */
-void
-ext3_xattr_delete_inode(handle_t *handle, struct inode *inode)
-{
-	struct buffer_head *bh = NULL;
-
-	if (!EXT3_I(inode)->i_file_acl)
-		goto cleanup;
-	bh = sb_bread(inode->i_sb, EXT3_I(inode)->i_file_acl);
-	if (!bh) {
-		ext3_error(inode->i_sb, __func__,
-			"inode %lu: block "E3FSBLK" read error", inode->i_ino,
-			EXT3_I(inode)->i_file_acl);
-		goto cleanup;
-	}
-	if (BHDR(bh)->h_magic != cpu_to_le32(EXT3_XATTR_MAGIC) ||
-	    BHDR(bh)->h_blocks != cpu_to_le32(1)) {
-		ext3_error(inode->i_sb, __func__,
-			"inode %lu: bad block "E3FSBLK, inode->i_ino,
-			EXT3_I(inode)->i_file_acl);
-		goto cleanup;
-	}
-	ext3_xattr_release_block(handle, inode, bh);
-	EXT3_I(inode)->i_file_acl = 0;
-
-cleanup:
-	brelse(bh);
-}
-
-/*
- * ext3_xattr_put_super()
- *
- * This is called when a file system is unmounted.
- */
-void
-ext3_xattr_put_super(struct super_block *sb)
-{
-	mb_cache_shrink(sb->s_bdev);
-}
-
-/*
- * ext3_xattr_cache_insert()
- *
- * Create a new entry in the extended attribute cache, and insert
- * it unless such an entry is already in the cache.
- *
- * Returns 0, or a negative error number on failure.
- */
-static void
-ext3_xattr_cache_insert(struct buffer_head *bh)
-{
-	__u32 hash = le32_to_cpu(BHDR(bh)->h_hash);
-	struct mb_cache_entry *ce;
-	int error;
-
-	ce = mb_cache_entry_alloc(ext3_xattr_cache, GFP_NOFS);
-	if (!ce) {
-		ea_bdebug(bh, "out of memory");
-		return;
-	}
-	error = mb_cache_entry_insert(ce, bh->b_bdev, bh->b_blocknr, hash);
-	if (error) {
-		mb_cache_entry_free(ce);
-		if (error == -EBUSY) {
-			ea_bdebug(bh, "already in cache");
-			error = 0;
-		}
-	} else {
-		ea_bdebug(bh, "inserting [%x]", (int)hash);
-		mb_cache_entry_release(ce);
-	}
-}
-
-/*
- * ext3_xattr_cmp()
- *
- * Compare two extended attribute blocks for equality.
- *
- * Returns 0 if the blocks are equal, 1 if they differ, and
- * a negative error number on errors.
- */
-static int
-ext3_xattr_cmp(struct ext3_xattr_header *header1,
-	       struct ext3_xattr_header *header2)
-{
-	struct ext3_xattr_entry *entry1, *entry2;
-
-	entry1 = ENTRY(header1+1);
-	entry2 = ENTRY(header2+1);
-	while (!IS_LAST_ENTRY(entry1)) {
-		if (IS_LAST_ENTRY(entry2))
-			return 1;
-		if (entry1->e_hash != entry2->e_hash ||
-		    entry1->e_name_index != entry2->e_name_index ||
-		    entry1->e_name_len != entry2->e_name_len ||
-		    entry1->e_value_size != entry2->e_value_size ||
-		    memcmp(entry1->e_name, entry2->e_name, entry1->e_name_len))
-			return 1;
-		if (entry1->e_value_block != 0 || entry2->e_value_block != 0)
-			return -EIO;
-		if (memcmp((char *)header1 + le16_to_cpu(entry1->e_value_offs),
-			   (char *)header2 + le16_to_cpu(entry2->e_value_offs),
-			   le32_to_cpu(entry1->e_value_size)))
-			return 1;
-
-		entry1 = EXT3_XATTR_NEXT(entry1);
-		entry2 = EXT3_XATTR_NEXT(entry2);
-	}
-	if (!IS_LAST_ENTRY(entry2))
-		return 1;
-	return 0;
-}
-
-/*
- * ext3_xattr_cache_find()
- *
- * Find an identical extended attribute block.
- *
- * Returns a pointer to the block found, or NULL if such a block was
- * not found or an error occurred.
- */
-static struct buffer_head *
-ext3_xattr_cache_find(struct inode *inode, struct ext3_xattr_header *header,
-		      struct mb_cache_entry **pce)
-{
-	__u32 hash = le32_to_cpu(header->h_hash);
-	struct mb_cache_entry *ce;
-
-	if (!header->h_hash)
-		return NULL;  /* never share */
-	ea_idebug(inode, "looking for cached blocks [%x]", (int)hash);
-again:
-	ce = mb_cache_entry_find_first(ext3_xattr_cache, inode->i_sb->s_bdev,
-				       hash);
-	while (ce) {
-		struct buffer_head *bh;
-
-		if (IS_ERR(ce)) {
-			if (PTR_ERR(ce) == -EAGAIN)
-				goto again;
-			break;
-		}
-		bh = sb_bread(inode->i_sb, ce->e_block);
-		if (!bh) {
-			ext3_error(inode->i_sb, __func__,
-				"inode %lu: block %lu read error",
-				inode->i_ino, (unsigned long) ce->e_block);
-		} else if (le32_to_cpu(BHDR(bh)->h_refcount) >=
-				EXT3_XATTR_REFCOUNT_MAX) {
-			ea_idebug(inode, "block %lu refcount %d>=%d",
-				  (unsigned long) ce->e_block,
-				  le32_to_cpu(BHDR(bh)->h_refcount),
-					  EXT3_XATTR_REFCOUNT_MAX);
-		} else if (ext3_xattr_cmp(header, BHDR(bh)) == 0) {
-			*pce = ce;
-			return bh;
-		}
-		brelse(bh);
-		ce = mb_cache_entry_find_next(ce, inode->i_sb->s_bdev, hash);
-	}
-	return NULL;
-}
-
-#define NAME_HASH_SHIFT 5
-#define VALUE_HASH_SHIFT 16
-
-/*
- * ext3_xattr_hash_entry()
- *
- * Compute the hash of an extended attribute.
- */
-static inline void ext3_xattr_hash_entry(struct ext3_xattr_header *header,
-					 struct ext3_xattr_entry *entry)
-{
-	__u32 hash = 0;
-	char *name = entry->e_name;
-	int n;
-
-	for (n=0; n < entry->e_name_len; n++) {
-		hash = (hash << NAME_HASH_SHIFT) ^
-		       (hash >> (8*sizeof(hash) - NAME_HASH_SHIFT)) ^
-		       *name++;
-	}
-
-	if (entry->e_value_block == 0 && entry->e_value_size != 0) {
-		__le32 *value = (__le32 *)((char *)header +
-			le16_to_cpu(entry->e_value_offs));
-		for (n = (le32_to_cpu(entry->e_value_size) +
-		     EXT3_XATTR_ROUND) >> EXT3_XATTR_PAD_BITS; n; n--) {
-			hash = (hash << VALUE_HASH_SHIFT) ^
-			       (hash >> (8*sizeof(hash) - VALUE_HASH_SHIFT)) ^
-			       le32_to_cpu(*value++);
-		}
-	}
-	entry->e_hash = cpu_to_le32(hash);
-}
-
-#undef NAME_HASH_SHIFT
-#undef VALUE_HASH_SHIFT
-
-#define BLOCK_HASH_SHIFT 16
-
-/*
- * ext3_xattr_rehash()
- *
- * Re-compute the extended attribute hash value after an entry has changed.
- */
-static void ext3_xattr_rehash(struct ext3_xattr_header *header,
-			      struct ext3_xattr_entry *entry)
-{
-	struct ext3_xattr_entry *here;
-	__u32 hash = 0;
-
-	ext3_xattr_hash_entry(header, entry);
-	here = ENTRY(header+1);
-	while (!IS_LAST_ENTRY(here)) {
-		if (!here->e_hash) {
-			/* Block is not shared if an entry's hash value == 0 */
-			hash = 0;
-			break;
-		}
-		hash = (hash << BLOCK_HASH_SHIFT) ^
-		       (hash >> (8*sizeof(hash) - BLOCK_HASH_SHIFT)) ^
-		       le32_to_cpu(here->e_hash);
-		here = EXT3_XATTR_NEXT(here);
-	}
-	header->h_hash = cpu_to_le32(hash);
-}
-
-#undef BLOCK_HASH_SHIFT
-
-int __init
-init_ext3_xattr(void)
-{
-	ext3_xattr_cache = mb_cache_create("ext3_xattr", 6);
-	if (!ext3_xattr_cache)
-		return -ENOMEM;
-	return 0;
-}
-
-void
-exit_ext3_xattr(void)
-{
-	if (ext3_xattr_cache)
-		mb_cache_destroy(ext3_xattr_cache);
-	ext3_xattr_cache = NULL;
-}
diff --git a/fs/ext3/xattr.h b/fs/ext3/xattr.h
deleted file mode 100644
index 32e93ebf8031..000000000000
--- a/fs/ext3/xattr.h
+++ /dev/null
@@ -1,136 +0,0 @@
-/*
-  File: fs/ext3/xattr.h
-
-  On-disk format of extended attributes for the ext3 filesystem.
-
-  (C) 2001 Andreas Gruenbacher, <a.gruenbacher@computer.org>
-*/
-
-#include <linux/xattr.h>
-
-/* Magic value in attribute blocks */
-#define EXT3_XATTR_MAGIC		0xEA020000
-
-/* Maximum number of references to one attribute block */
-#define EXT3_XATTR_REFCOUNT_MAX		1024
-
-/* Name indexes */
-#define EXT3_XATTR_INDEX_USER			1
-#define EXT3_XATTR_INDEX_POSIX_ACL_ACCESS	2
-#define EXT3_XATTR_INDEX_POSIX_ACL_DEFAULT	3
-#define EXT3_XATTR_INDEX_TRUSTED		4
-#define	EXT3_XATTR_INDEX_LUSTRE			5
-#define EXT3_XATTR_INDEX_SECURITY	        6
-
-struct ext3_xattr_header {
-	__le32	h_magic;	/* magic number for identification */
-	__le32	h_refcount;	/* reference count */
-	__le32	h_blocks;	/* number of disk blocks used */
-	__le32	h_hash;		/* hash value of all attributes */
-	__u32	h_reserved[4];	/* zero right now */
-};
-
-struct ext3_xattr_ibody_header {
-	__le32	h_magic;	/* magic number for identification */
-};
-
-struct ext3_xattr_entry {
-	__u8	e_name_len;	/* length of name */
-	__u8	e_name_index;	/* attribute name index */
-	__le16	e_value_offs;	/* offset in disk block of value */
-	__le32	e_value_block;	/* disk block attribute is stored on (n/i) */
-	__le32	e_value_size;	/* size of attribute value */
-	__le32	e_hash;		/* hash value of name and value */
-	char	e_name[0];	/* attribute name */
-};
-
-#define EXT3_XATTR_PAD_BITS		2
-#define EXT3_XATTR_PAD		(1<<EXT3_XATTR_PAD_BITS)
-#define EXT3_XATTR_ROUND		(EXT3_XATTR_PAD-1)
-#define EXT3_XATTR_LEN(name_len) \
-	(((name_len) + EXT3_XATTR_ROUND + \
-	sizeof(struct ext3_xattr_entry)) & ~EXT3_XATTR_ROUND)
-#define EXT3_XATTR_NEXT(entry) \
-	( (struct ext3_xattr_entry *)( \
-	  (char *)(entry) + EXT3_XATTR_LEN((entry)->e_name_len)) )
-#define EXT3_XATTR_SIZE(size) \
-	(((size) + EXT3_XATTR_ROUND) & ~EXT3_XATTR_ROUND)
-
-# ifdef CONFIG_EXT3_FS_XATTR
-
-extern const struct xattr_handler ext3_xattr_user_handler;
-extern const struct xattr_handler ext3_xattr_trusted_handler;
-extern const struct xattr_handler ext3_xattr_security_handler;
-
-extern ssize_t ext3_listxattr(struct dentry *, char *, size_t);
-
-extern int ext3_xattr_get(struct inode *, int, const char *, void *, size_t);
-extern int ext3_xattr_set(struct inode *, int, const char *, const void *, size_t, int);
-extern int ext3_xattr_set_handle(handle_t *, struct inode *, int, const char *, const void *, size_t, int);
-
-extern void ext3_xattr_delete_inode(handle_t *, struct inode *);
-extern void ext3_xattr_put_super(struct super_block *);
-
-extern int init_ext3_xattr(void);
-extern void exit_ext3_xattr(void);
-
-extern const struct xattr_handler *ext3_xattr_handlers[];
-
-# else  /* CONFIG_EXT3_FS_XATTR */
-
-static inline int
-ext3_xattr_get(struct inode *inode, int name_index, const char *name,
-	       void *buffer, size_t size, int flags)
-{
-	return -EOPNOTSUPP;
-}
-
-static inline int
-ext3_xattr_set(struct inode *inode, int name_index, const char *name,
-	       const void *value, size_t size, int flags)
-{
-	return -EOPNOTSUPP;
-}
-
-static inline int
-ext3_xattr_set_handle(handle_t *handle, struct inode *inode, int name_index,
-	       const char *name, const void *value, size_t size, int flags)
-{
-	return -EOPNOTSUPP;
-}
-
-static inline void
-ext3_xattr_delete_inode(handle_t *handle, struct inode *inode)
-{
-}
-
-static inline void
-ext3_xattr_put_super(struct super_block *sb)
-{
-}
-
-static inline int
-init_ext3_xattr(void)
-{
-	return 0;
-}
-
-static inline void
-exit_ext3_xattr(void)
-{
-}
-
-#define ext3_xattr_handlers	NULL
-
-# endif  /* CONFIG_EXT3_FS_XATTR */
-
-#ifdef CONFIG_EXT3_FS_SECURITY
-extern int ext3_init_security(handle_t *handle, struct inode *inode,
-			      struct inode *dir, const struct qstr *qstr);
-#else
-static inline int ext3_init_security(handle_t *handle, struct inode *inode,
-				     struct inode *dir, const struct qstr *qstr)
-{
-	return 0;
-}
-#endif
diff --git a/fs/ext3/xattr_security.c b/fs/ext3/xattr_security.c
deleted file mode 100644
index c9506d5e3b13..000000000000
--- a/fs/ext3/xattr_security.c
+++ /dev/null
@@ -1,78 +0,0 @@
-/*
- * linux/fs/ext3/xattr_security.c
- * Handler for storing security labels as extended attributes.
- */
-
-#include <linux/security.h>
-#include "ext3.h"
-#include "xattr.h"
-
-static size_t
-ext3_xattr_security_list(struct dentry *dentry, char *list, size_t list_size,
-			 const char *name, size_t name_len, int type)
-{
-	const size_t prefix_len = XATTR_SECURITY_PREFIX_LEN;
-	const size_t total_len = prefix_len + name_len + 1;
-
-
-	if (list && total_len <= list_size) {
-		memcpy(list, XATTR_SECURITY_PREFIX, prefix_len);
-		memcpy(list+prefix_len, name, name_len);
-		list[prefix_len + name_len] = '\0';
-	}
-	return total_len;
-}
-
-static int
-ext3_xattr_security_get(struct dentry *dentry, const char *name,
-		void *buffer, size_t size, int type)
-{
-	if (strcmp(name, "") == 0)
-		return -EINVAL;
-	return ext3_xattr_get(d_inode(dentry), EXT3_XATTR_INDEX_SECURITY,
-			      name, buffer, size);
-}
-
-static int
-ext3_xattr_security_set(struct dentry *dentry, const char *name,
-		const void *value, size_t size, int flags, int type)
-{
-	if (strcmp(name, "") == 0)
-		return -EINVAL;
-	return ext3_xattr_set(d_inode(dentry), EXT3_XATTR_INDEX_SECURITY,
-			      name, value, size, flags);
-}
-
-static int ext3_initxattrs(struct inode *inode,
-			   const struct xattr *xattr_array,
-			   void *fs_info)
-{
-	const struct xattr *xattr;
-	handle_t *handle = fs_info;
-	int err = 0;
-
-	for (xattr = xattr_array; xattr->name != NULL; xattr++) {
-		err = ext3_xattr_set_handle(handle, inode,
-					    EXT3_XATTR_INDEX_SECURITY,
-					    xattr->name, xattr->value,
-					    xattr->value_len, 0);
-		if (err < 0)
-			break;
-	}
-	return err;
-}
-
-int
-ext3_init_security(handle_t *handle, struct inode *inode, struct inode *dir,
-		   const struct qstr *qstr)
-{
-	return security_inode_init_security(inode, dir, qstr,
-					    &ext3_initxattrs, handle);
-}
-
-const struct xattr_handler ext3_xattr_security_handler = {
-	.prefix	= XATTR_SECURITY_PREFIX,
-	.list	= ext3_xattr_security_list,
-	.get	= ext3_xattr_security_get,
-	.set	= ext3_xattr_security_set,
-};
diff --git a/fs/ext3/xattr_trusted.c b/fs/ext3/xattr_trusted.c
deleted file mode 100644
index 206cc66dc285..000000000000
--- a/fs/ext3/xattr_trusted.c
+++ /dev/null
@@ -1,54 +0,0 @@
-/*
- * linux/fs/ext3/xattr_trusted.c
- * Handler for trusted extended attributes.
- *
- * Copyright (C) 2003 by Andreas Gruenbacher, <a.gruenbacher@computer.org>
- */
-
-#include "ext3.h"
-#include "xattr.h"
-
-static size_t
-ext3_xattr_trusted_list(struct dentry *dentry, char *list, size_t list_size,
-		const char *name, size_t name_len, int type)
-{
-	const size_t prefix_len = XATTR_TRUSTED_PREFIX_LEN;
-	const size_t total_len = prefix_len + name_len + 1;
-
-	if (!capable(CAP_SYS_ADMIN))
-		return 0;
-
-	if (list && total_len <= list_size) {
-		memcpy(list, XATTR_TRUSTED_PREFIX, prefix_len);
-		memcpy(list+prefix_len, name, name_len);
-		list[prefix_len + name_len] = '\0';
-	}
-	return total_len;
-}
-
-static int
-ext3_xattr_trusted_get(struct dentry *dentry, const char *name,
-		       void *buffer, size_t size, int type)
-{
-	if (strcmp(name, "") == 0)
-		return -EINVAL;
-	return ext3_xattr_get(d_inode(dentry), EXT3_XATTR_INDEX_TRUSTED,
-			      name, buffer, size);
-}
-
-static int
-ext3_xattr_trusted_set(struct dentry *dentry, const char *name,
-		const void *value, size_t size, int flags, int type)
-{
-	if (strcmp(name, "") == 0)
-		return -EINVAL;
-	return ext3_xattr_set(d_inode(dentry), EXT3_XATTR_INDEX_TRUSTED, name,
-			      value, size, flags);
-}
-
-const struct xattr_handler ext3_xattr_trusted_handler = {
-	.prefix	= XATTR_TRUSTED_PREFIX,
-	.list	= ext3_xattr_trusted_list,
-	.get	= ext3_xattr_trusted_get,
-	.set	= ext3_xattr_trusted_set,
-};
diff --git a/fs/ext3/xattr_user.c b/fs/ext3/xattr_user.c
deleted file mode 100644
index 021508ad1616..000000000000
--- a/fs/ext3/xattr_user.c
+++ /dev/null
@@ -1,58 +0,0 @@
-/*
- * linux/fs/ext3/xattr_user.c
- * Handler for extended user attributes.
- *
- * Copyright (C) 2001 by Andreas Gruenbacher, <a.gruenbacher@computer.org>
- */
-
-#include "ext3.h"
-#include "xattr.h"
-
-static size_t
-ext3_xattr_user_list(struct dentry *dentry, char *list, size_t list_size,
-		const char *name, size_t name_len, int type)
-{
-	const size_t prefix_len = XATTR_USER_PREFIX_LEN;
-	const size_t total_len = prefix_len + name_len + 1;
-
-	if (!test_opt(dentry->d_sb, XATTR_USER))
-		return 0;
-
-	if (list && total_len <= list_size) {
-		memcpy(list, XATTR_USER_PREFIX, prefix_len);
-		memcpy(list+prefix_len, name, name_len);
-		list[prefix_len + name_len] = '\0';
-	}
-	return total_len;
-}
-
-static int
-ext3_xattr_user_get(struct dentry *dentry, const char *name, void *buffer,
-		size_t size, int type)
-{
-	if (strcmp(name, "") == 0)
-		return -EINVAL;
-	if (!test_opt(dentry->d_sb, XATTR_USER))
-		return -EOPNOTSUPP;
-	return ext3_xattr_get(d_inode(dentry), EXT3_XATTR_INDEX_USER,
-			      name, buffer, size);
-}
-
-static int
-ext3_xattr_user_set(struct dentry *dentry, const char *name,
-		const void *value, size_t size, int flags, int type)
-{
-	if (strcmp(name, "") == 0)
-		return -EINVAL;
-	if (!test_opt(dentry->d_sb, XATTR_USER))
-		return -EOPNOTSUPP;
-	return ext3_xattr_set(d_inode(dentry), EXT3_XATTR_INDEX_USER,
-			      name, value, size, flags);
-}
-
-const struct xattr_handler ext3_xattr_user_handler = {
-	.prefix	= XATTR_USER_PREFIX,
-	.list	= ext3_xattr_user_list,
-	.get	= ext3_xattr_user_get,
-	.set	= ext3_xattr_user_set,
-};
diff --git a/fs/ext4/Kconfig b/fs/ext4/Kconfig
index bf8bc8aba471..219a190ccae9 100644
--- a/fs/ext4/Kconfig
+++ b/fs/ext4/Kconfig
@@ -1,5 +1,38 @@
+# Ext3 configs are here for backward compatibility with old configs which may
+# have EXT3_FS set but not EXT4_FS set and thus would result in non-bootable
+# kernels after the removal of ext3 driver.
+config EXT3_FS
+	tristate "The Extended 3 (ext3) filesystem"
+	# These must match EXT4_FS selects...
+	select EXT4_FS
+	select JBD2
+	select CRC16
+	select CRYPTO
+	select CRYPTO_CRC32C
+	help
+	  This config option is here only for backward compatibility. ext3
+	  filesystem is now handled by the ext4 driver.
+
+config EXT3_FS_POSIX_ACL
+	bool "Ext3 POSIX Access Control Lists"
+	depends on EXT3_FS
+	select EXT4_FS_POSIX_ACL
+	select FS_POSIX_ACL
+	help
+	  This config option is here only for backward compatibility. ext3
+	  filesystem is now handled by the ext4 driver.
+
+config EXT3_FS_SECURITY
+	bool "Ext3 Security Labels"
+	depends on EXT3_FS
+	select EXT4_FS_SECURITY
+	help
+	  This config option is here only for backward compatibility. ext3
+	  filesystem is now handled by the ext4 driver.
+
 config EXT4_FS
 	tristate "The Extended 4 (ext4) filesystem"
+	# Please update EXT3_FS selects when changing these
 	select JBD2
 	select CRC16
 	select CRYPTO
@@ -28,14 +61,14 @@ config EXT4_FS
 
 	  If unsure, say N.
 
-config EXT4_USE_FOR_EXT23
+config EXT4_USE_FOR_EXT2
 	bool "Use ext4 for ext2/ext3 file systems"
 	depends on EXT4_FS
-	depends on EXT3_FS=n || EXT2_FS=n
+	depends on EXT2_FS=n
 	default y
 	help
-	  Allow the ext4 file system driver code to be used for ext2 or
-	  ext3 file system mounts.  This allows users to reduce their
+	  Allow the ext4 file system driver code to be used for ext2
+	  file system mounts.  This allows users to reduce their
 	  compiled kernel size by using one file system driver for
 	  ext2, ext3, and ext4 file systems.
 
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 58987b5c514b..06b4b14e8aa0 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -84,7 +84,7 @@ static void ext4_unregister_li_request(struct super_block *sb);
 static void ext4_clear_request_list(void);
 static int ext4_reserve_clusters(struct ext4_sb_info *, ext4_fsblk_t);
 
-#if !defined(CONFIG_EXT2_FS) && !defined(CONFIG_EXT2_FS_MODULE) && defined(CONFIG_EXT4_USE_FOR_EXT23)
+#if !defined(CONFIG_EXT2_FS) && !defined(CONFIG_EXT2_FS_MODULE) && defined(CONFIG_EXT4_USE_FOR_EXT2)
 static struct file_system_type ext2_fs_type = {
 	.owner		= THIS_MODULE,
 	.name		= "ext2",
@@ -100,7 +100,6 @@ MODULE_ALIAS("ext2");
 #endif
 
 
-#if !defined(CONFIG_EXT3_FS) && !defined(CONFIG_EXT3_FS_MODULE) && defined(CONFIG_EXT4_USE_FOR_EXT23)
 static struct file_system_type ext3_fs_type = {
 	.owner		= THIS_MODULE,
 	.name		= "ext3",
@@ -111,9 +110,6 @@ static struct file_system_type ext3_fs_type = {
 MODULE_ALIAS_FS("ext3");
 MODULE_ALIAS("ext3");
 #define IS_EXT3_SB(sb) ((sb)->s_bdev->bd_holder == &ext3_fs_type)
-#else
-#define IS_EXT3_SB(sb) (0)
-#endif
 
 static int ext4_verify_csum_type(struct super_block *sb,
 				 struct ext4_super_block *es)
@@ -5500,7 +5496,7 @@ static struct dentry *ext4_mount(struct file_system_type *fs_type, int flags,
 	return mount_bdev(fs_type, flags, dev_name, data, ext4_fill_super);
 }
 
-#if !defined(CONFIG_EXT2_FS) && !defined(CONFIG_EXT2_FS_MODULE) && defined(CONFIG_EXT4_USE_FOR_EXT23)
+#if !defined(CONFIG_EXT2_FS) && !defined(CONFIG_EXT2_FS_MODULE) && defined(CONFIG_EXT4_USE_FOR_EXT2)
 static inline void register_as_ext2(void)
 {
 	int err = register_filesystem(&ext2_fs_type);
@@ -5530,7 +5526,6 @@ static inline void unregister_as_ext2(void) { }
 static inline int ext2_feature_set_ok(struct super_block *sb) { return 0; }
 #endif
 
-#if !defined(CONFIG_EXT3_FS) && !defined(CONFIG_EXT3_FS_MODULE) && defined(CONFIG_EXT4_USE_FOR_EXT23)
 static inline void register_as_ext3(void)
 {
 	int err = register_filesystem(&ext3_fs_type);
@@ -5556,11 +5551,6 @@ static inline int ext3_feature_set_ok(struct super_block *sb)
 		return 0;
 	return 1;
 }
-#else
-static inline void register_as_ext3(void) { }
-static inline void unregister_as_ext3(void) { }
-static inline int ext3_feature_set_ok(struct super_block *sb) { return 0; }
-#endif
 
 static struct file_system_type ext4_fs_type = {
 	.owner		= THIS_MODULE,
diff --git a/fs/jbd/Kconfig b/fs/jbd/Kconfig
deleted file mode 100644
index 4e28beeed157..000000000000
--- a/fs/jbd/Kconfig
+++ /dev/null
@@ -1,30 +0,0 @@
-config JBD
-	tristate
-	help
-	  This is a generic journalling layer for block devices.  It is
-	  currently used by the ext3 file system, but it could also be
-	  used to add journal support to other file systems or block
-	  devices such as RAID or LVM.
-
-	  If you are using the ext3 file system, you need to say Y here.
-	  If you are not using ext3 then you will probably want to say N.
-
-	  To compile this device as a module, choose M here: the module will be
-	  called jbd.  If you are compiling ext3 into the kernel, you
-	  cannot compile this code as a module.
-
-config JBD_DEBUG
-	bool "JBD (ext3) debugging support"
-	depends on JBD && DEBUG_FS
-	help
-	  If you are using the ext3 journaled file system (or potentially any
-	  other file system/device using JBD), this option allows you to
-	  enable debugging output while the system is running, in order to
-	  help track down any problems you are having.  By default the
-	  debugging output will be turned off.
-
-	  If you select Y here, then you will be able to turn on debugging
-	  with "echo N > /sys/kernel/debug/jbd/jbd-debug", where N is a
-	  number between 1 and 5, the higher the number, the more debugging
-	  output is generated.  To turn debugging off again, do
-	  "echo 0 > /sys/kernel/debug/jbd/jbd-debug".
diff --git a/fs/jbd/Makefile b/fs/jbd/Makefile
deleted file mode 100644
index 54aca4868a36..000000000000
--- a/fs/jbd/Makefile
+++ /dev/null
@@ -1,7 +0,0 @@
-#
-# Makefile for the linux journaling routines.
-#
-
-obj-$(CONFIG_JBD) += jbd.o
-
-jbd-objs := transaction.o commit.o recovery.o checkpoint.o revoke.o journal.o
diff --git a/fs/jbd/checkpoint.c b/fs/jbd/checkpoint.c
deleted file mode 100644
index 08c03044abdd..000000000000
--- a/fs/jbd/checkpoint.c
+++ /dev/null
@@ -1,782 +0,0 @@
-/*
- * linux/fs/jbd/checkpoint.c
- *
- * Written by Stephen C. Tweedie <sct@redhat.com>, 1999
- *
- * Copyright 1999 Red Hat Software --- All Rights Reserved
- *
- * This file is part of the Linux kernel and is made available under
- * the terms of the GNU General Public License, version 2, or at your
- * option, any later version, incorporated herein by reference.
- *
- * Checkpoint routines for the generic filesystem journaling code.
- * Part of the ext2fs journaling system.
- *
- * Checkpointing is the process of ensuring that a section of the log is
- * committed fully to disk, so that that portion of the log can be
- * reused.
- */
-
-#include <linux/time.h>
-#include <linux/fs.h>
-#include <linux/jbd.h>
-#include <linux/errno.h>
-#include <linux/slab.h>
-#include <linux/blkdev.h>
-#include <trace/events/jbd.h>
-
-/*
- * Unlink a buffer from a transaction checkpoint list.
- *
- * Called with j_list_lock held.
- */
-static inline void __buffer_unlink_first(struct journal_head *jh)
-{
-	transaction_t *transaction = jh->b_cp_transaction;
-
-	jh->b_cpnext->b_cpprev = jh->b_cpprev;
-	jh->b_cpprev->b_cpnext = jh->b_cpnext;
-	if (transaction->t_checkpoint_list == jh) {
-		transaction->t_checkpoint_list = jh->b_cpnext;
-		if (transaction->t_checkpoint_list == jh)
-			transaction->t_checkpoint_list = NULL;
-	}
-}
-
-/*
- * Unlink a buffer from a transaction checkpoint(io) list.
- *
- * Called with j_list_lock held.
- */
-static inline void __buffer_unlink(struct journal_head *jh)
-{
-	transaction_t *transaction = jh->b_cp_transaction;
-
-	__buffer_unlink_first(jh);
-	if (transaction->t_checkpoint_io_list == jh) {
-		transaction->t_checkpoint_io_list = jh->b_cpnext;
-		if (transaction->t_checkpoint_io_list == jh)
-			transaction->t_checkpoint_io_list = NULL;
-	}
-}
-
-/*
- * Move a buffer from the checkpoint list to the checkpoint io list
- *
- * Called with j_list_lock held
- */
-static inline void __buffer_relink_io(struct journal_head *jh)
-{
-	transaction_t *transaction = jh->b_cp_transaction;
-
-	__buffer_unlink_first(jh);
-
-	if (!transaction->t_checkpoint_io_list) {
-		jh->b_cpnext = jh->b_cpprev = jh;
-	} else {
-		jh->b_cpnext = transaction->t_checkpoint_io_list;
-		jh->b_cpprev = transaction->t_checkpoint_io_list->b_cpprev;
-		jh->b_cpprev->b_cpnext = jh;
-		jh->b_cpnext->b_cpprev = jh;
-	}
-	transaction->t_checkpoint_io_list = jh;
-}
-
-/*
- * Try to release a checkpointed buffer from its transaction.
- * Returns 1 if we released it and 2 if we also released the
- * whole transaction.
- *
- * Requires j_list_lock
- * Called under jbd_lock_bh_state(jh2bh(jh)), and drops it
- */
-static int __try_to_free_cp_buf(struct journal_head *jh)
-{
-	int ret = 0;
-	struct buffer_head *bh = jh2bh(jh);
-
-	if (jh->b_jlist == BJ_None && !buffer_locked(bh) &&
-	    !buffer_dirty(bh) && !buffer_write_io_error(bh)) {
-		/*
-		 * Get our reference so that bh cannot be freed before
-		 * we unlock it
-		 */
-		get_bh(bh);
-		JBUFFER_TRACE(jh, "remove from checkpoint list");
-		ret = __journal_remove_checkpoint(jh) + 1;
-		jbd_unlock_bh_state(bh);
-		BUFFER_TRACE(bh, "release");
-		__brelse(bh);
-	} else {
-		jbd_unlock_bh_state(bh);
-	}
-	return ret;
-}
-
-/*
- * __log_wait_for_space: wait until there is space in the journal.
- *
- * Called under j-state_lock *only*.  It will be unlocked if we have to wait
- * for a checkpoint to free up some space in the log.
- */
-void __log_wait_for_space(journal_t *journal)
-{
-	int nblocks, space_left;
-	assert_spin_locked(&journal->j_state_lock);
-
-	nblocks = jbd_space_needed(journal);
-	while (__log_space_left(journal) < nblocks) {
-		if (journal->j_flags & JFS_ABORT)
-			return;
-		spin_unlock(&journal->j_state_lock);
-		mutex_lock(&journal->j_checkpoint_mutex);
-
-		/*
-		 * Test again, another process may have checkpointed while we
-		 * were waiting for the checkpoint lock. If there are no
-		 * transactions ready to be checkpointed, try to recover
-		 * journal space by calling cleanup_journal_tail(), and if
-		 * that doesn't work, by waiting for the currently committing
-		 * transaction to complete.  If there is absolutely no way
-		 * to make progress, this is either a BUG or corrupted
-		 * filesystem, so abort the journal and leave a stack
-		 * trace for forensic evidence.
-		 */
-		spin_lock(&journal->j_state_lock);
-		spin_lock(&journal->j_list_lock);
-		nblocks = jbd_space_needed(journal);
-		space_left = __log_space_left(journal);
-		if (space_left < nblocks) {
-			int chkpt = journal->j_checkpoint_transactions != NULL;
-			tid_t tid = 0;
-
-			if (journal->j_committing_transaction)
-				tid = journal->j_committing_transaction->t_tid;
-			spin_unlock(&journal->j_list_lock);
-			spin_unlock(&journal->j_state_lock);
-			if (chkpt) {
-				log_do_checkpoint(journal);
-			} else if (cleanup_journal_tail(journal) == 0) {
-				/* We were able to recover space; yay! */
-				;
-			} else if (tid) {
-				log_wait_commit(journal, tid);
-			} else {
-				printk(KERN_ERR "%s: needed %d blocks and "
-				       "only had %d space available\n",
-				       __func__, nblocks, space_left);
-				printk(KERN_ERR "%s: no way to get more "
-				       "journal space\n", __func__);
-				WARN_ON(1);
-				journal_abort(journal, 0);
-			}
-			spin_lock(&journal->j_state_lock);
-		} else {
-			spin_unlock(&journal->j_list_lock);
-		}
-		mutex_unlock(&journal->j_checkpoint_mutex);
-	}
-}
-
-/*
- * We were unable to perform jbd_trylock_bh_state() inside j_list_lock.
- * The caller must restart a list walk.  Wait for someone else to run
- * jbd_unlock_bh_state().
- */
-static void jbd_sync_bh(journal_t *journal, struct buffer_head *bh)
-	__releases(journal->j_list_lock)
-{
-	get_bh(bh);
-	spin_unlock(&journal->j_list_lock);
-	jbd_lock_bh_state(bh);
-	jbd_unlock_bh_state(bh);
-	put_bh(bh);
-}
-
-/*
- * Clean up transaction's list of buffers submitted for io.
- * We wait for any pending IO to complete and remove any clean
- * buffers. Note that we take the buffers in the opposite ordering
- * from the one in which they were submitted for IO.
- *
- * Return 0 on success, and return <0 if some buffers have failed
- * to be written out.
- *
- * Called with j_list_lock held.
- */
-static int __wait_cp_io(journal_t *journal, transaction_t *transaction)
-{
-	struct journal_head *jh;
-	struct buffer_head *bh;
-	tid_t this_tid;
-	int released = 0;
-	int ret = 0;
-
-	this_tid = transaction->t_tid;
-restart:
-	/* Did somebody clean up the transaction in the meanwhile? */
-	if (journal->j_checkpoint_transactions != transaction ||
-			transaction->t_tid != this_tid)
-		return ret;
-	while (!released && transaction->t_checkpoint_io_list) {
-		jh = transaction->t_checkpoint_io_list;
-		bh = jh2bh(jh);
-		if (!jbd_trylock_bh_state(bh)) {
-			jbd_sync_bh(journal, bh);
-			spin_lock(&journal->j_list_lock);
-			goto restart;
-		}
-		get_bh(bh);
-		if (buffer_locked(bh)) {
-			spin_unlock(&journal->j_list_lock);
-			jbd_unlock_bh_state(bh);
-			wait_on_buffer(bh);
-			/* the journal_head may have gone by now */
-			BUFFER_TRACE(bh, "brelse");
-			__brelse(bh);
-			spin_lock(&journal->j_list_lock);
-			goto restart;
-		}
-		if (unlikely(buffer_write_io_error(bh)))
-			ret = -EIO;
-
-		/*
-		 * Now in whatever state the buffer currently is, we know that
-		 * it has been written out and so we can drop it from the list
-		 */
-		released = __journal_remove_checkpoint(jh);
-		jbd_unlock_bh_state(bh);
-		__brelse(bh);
-	}
-
-	return ret;
-}
-
-#define NR_BATCH	64
-
-static void
-__flush_batch(journal_t *journal, struct buffer_head **bhs, int *batch_count)
-{
-	int i;
-	struct blk_plug plug;
-
-	blk_start_plug(&plug);
-	for (i = 0; i < *batch_count; i++)
-		write_dirty_buffer(bhs[i], WRITE_SYNC);
-	blk_finish_plug(&plug);
-
-	for (i = 0; i < *batch_count; i++) {
-		struct buffer_head *bh = bhs[i];
-		clear_buffer_jwrite(bh);
-		BUFFER_TRACE(bh, "brelse");
-		__brelse(bh);
-	}
-	*batch_count = 0;
-}
-
-/*
- * Try to flush one buffer from the checkpoint list to disk.
- *
- * Return 1 if something happened which requires us to abort the current
- * scan of the checkpoint list.  Return <0 if the buffer has failed to
- * be written out.
- *
- * Called with j_list_lock held and drops it if 1 is returned
- * Called under jbd_lock_bh_state(jh2bh(jh)), and drops it
- */
-static int __process_buffer(journal_t *journal, struct journal_head *jh,
-			struct buffer_head **bhs, int *batch_count)
-{
-	struct buffer_head *bh = jh2bh(jh);
-	int ret = 0;
-
-	if (buffer_locked(bh)) {
-		get_bh(bh);
-		spin_unlock(&journal->j_list_lock);
-		jbd_unlock_bh_state(bh);
-		wait_on_buffer(bh);
-		/* the journal_head may have gone by now */
-		BUFFER_TRACE(bh, "brelse");
-		__brelse(bh);
-		ret = 1;
-	} else if (jh->b_transaction != NULL) {
-		transaction_t *t = jh->b_transaction;
-		tid_t tid = t->t_tid;
-
-		spin_unlock(&journal->j_list_lock);
-		jbd_unlock_bh_state(bh);
-		log_start_commit(journal, tid);
-		log_wait_commit(journal, tid);
-		ret = 1;
-	} else if (!buffer_dirty(bh)) {
-		ret = 1;
-		if (unlikely(buffer_write_io_error(bh)))
-			ret = -EIO;
-		get_bh(bh);
-		J_ASSERT_JH(jh, !buffer_jbddirty(bh));
-		BUFFER_TRACE(bh, "remove from checkpoint");
-		__journal_remove_checkpoint(jh);
-		spin_unlock(&journal->j_list_lock);
-		jbd_unlock_bh_state(bh);
-		__brelse(bh);
-	} else {
-		/*
-		 * Important: we are about to write the buffer, and
-		 * possibly block, while still holding the journal lock.
-		 * We cannot afford to let the transaction logic start
-		 * messing around with this buffer before we write it to
-		 * disk, as that would break recoverability.
-		 */
-		BUFFER_TRACE(bh, "queue");
-		get_bh(bh);
-		J_ASSERT_BH(bh, !buffer_jwrite(bh));
-		set_buffer_jwrite(bh);
-		bhs[*batch_count] = bh;
-		__buffer_relink_io(jh);
-		jbd_unlock_bh_state(bh);
-		(*batch_count)++;
-		if (*batch_count == NR_BATCH) {
-			spin_unlock(&journal->j_list_lock);
-			__flush_batch(journal, bhs, batch_count);
-			ret = 1;
-		}
-	}
-	return ret;
-}
-
-/*
- * Perform an actual checkpoint. We take the first transaction on the
- * list of transactions to be checkpointed and send all its buffers
- * to disk. We submit larger chunks of data at once.
- *
- * The journal should be locked before calling this function.
- * Called with j_checkpoint_mutex held.
- */
-int log_do_checkpoint(journal_t *journal)
-{
-	transaction_t *transaction;
-	tid_t this_tid;
-	int result;
-
-	jbd_debug(1, "Start checkpoint\n");
-
-	/*
-	 * First thing: if there are any transactions in the log which
-	 * don't need checkpointing, just eliminate them from the
-	 * journal straight away.
-	 */
-	result = cleanup_journal_tail(journal);
-	trace_jbd_checkpoint(journal, result);
-	jbd_debug(1, "cleanup_journal_tail returned %d\n", result);
-	if (result <= 0)
-		return result;
-
-	/*
-	 * OK, we need to start writing disk blocks.  Take one transaction
-	 * and write it.
-	 */
-	result = 0;
-	spin_lock(&journal->j_list_lock);
-	if (!journal->j_checkpoint_transactions)
-		goto out;
-	transaction = journal->j_checkpoint_transactions;
-	this_tid = transaction->t_tid;
-restart:
-	/*
-	 * If someone cleaned up this transaction while we slept, we're
-	 * done (maybe it's a new transaction, but it fell at the same
-	 * address).
-	 */
-	if (journal->j_checkpoint_transactions == transaction &&
-			transaction->t_tid == this_tid) {
-		int batch_count = 0;
-		struct buffer_head *bhs[NR_BATCH];
-		struct journal_head *jh;
-		int retry = 0, err;
-
-		while (!retry && transaction->t_checkpoint_list) {
-			struct buffer_head *bh;
-
-			jh = transaction->t_checkpoint_list;
-			bh = jh2bh(jh);
-			if (!jbd_trylock_bh_state(bh)) {
-				jbd_sync_bh(journal, bh);
-				retry = 1;
-				break;
-			}
-			retry = __process_buffer(journal, jh, bhs,&batch_count);
-			if (retry < 0 && !result)
-				result = retry;
-			if (!retry && (need_resched() ||
-				spin_needbreak(&journal->j_list_lock))) {
-				spin_unlock(&journal->j_list_lock);
-				retry = 1;
-				break;
-			}
-		}
-
-		if (batch_count) {
-			if (!retry) {
-				spin_unlock(&journal->j_list_lock);
-				retry = 1;
-			}
-			__flush_batch(journal, bhs, &batch_count);
-		}
-
-		if (retry) {
-			spin_lock(&journal->j_list_lock);
-			goto restart;
-		}
-		/*
-		 * Now we have cleaned up the first transaction's checkpoint
-		 * list. Let's clean up the second one
-		 */
-		err = __wait_cp_io(journal, transaction);
-		if (!result)
-			result = err;
-	}
-out:
-	spin_unlock(&journal->j_list_lock);
-	if (result < 0)
-		journal_abort(journal, result);
-	else
-		result = cleanup_journal_tail(journal);
-
-	return (result < 0) ? result : 0;
-}
-
-/*
- * Check the list of checkpoint transactions for the journal to see if
- * we have already got rid of any since the last update of the log tail
- * in the journal superblock.  If so, we can instantly roll the
- * superblock forward to remove those transactions from the log.
- *
- * Return <0 on error, 0 on success, 1 if there was nothing to clean up.
- *
- * This is the only part of the journaling code which really needs to be
- * aware of transaction aborts.  Checkpointing involves writing to the
- * main filesystem area rather than to the journal, so it can proceed
- * even in abort state, but we must not update the super block if
- * checkpointing may have failed.  Otherwise, we would lose some metadata
- * buffers which should be written-back to the filesystem.
- */
-
-int cleanup_journal_tail(journal_t *journal)
-{
-	transaction_t * transaction;
-	tid_t		first_tid;
-	unsigned int	blocknr, freed;
-
-	if (is_journal_aborted(journal))
-		return 1;
-
-	/*
-	 * OK, work out the oldest transaction remaining in the log, and
-	 * the log block it starts at.
-	 *
-	 * If the log is now empty, we need to work out which is the
-	 * next transaction ID we will write, and where it will
-	 * start.
-	 */
-	spin_lock(&journal->j_state_lock);
-	spin_lock(&journal->j_list_lock);
-	transaction = journal->j_checkpoint_transactions;
-	if (transaction) {
-		first_tid = transaction->t_tid;
-		blocknr = transaction->t_log_start;
-	} else if ((transaction = journal->j_committing_transaction) != NULL) {
-		first_tid = transaction->t_tid;
-		blocknr = transaction->t_log_start;
-	} else if ((transaction = journal->j_running_transaction) != NULL) {
-		first_tid = transaction->t_tid;
-		blocknr = journal->j_head;
-	} else {
-		first_tid = journal->j_transaction_sequence;
-		blocknr = journal->j_head;
-	}
-	spin_unlock(&journal->j_list_lock);
-	J_ASSERT(blocknr != 0);
-
-	/* If the oldest pinned transaction is at the tail of the log
-           already then there's not much we can do right now. */
-	if (journal->j_tail_sequence == first_tid) {
-		spin_unlock(&journal->j_state_lock);
-		return 1;
-	}
-	spin_unlock(&journal->j_state_lock);
-
-	/*
-	 * We need to make sure that any blocks that were recently written out
-	 * --- perhaps by log_do_checkpoint() --- are flushed out before we
-	 * drop the transactions from the journal. Similarly we need to be sure
-	 * superblock makes it to disk before next transaction starts reusing
-	 * freed space (otherwise we could replay some blocks of the new
-	 * transaction thinking they belong to the old one). So we use
-	 * WRITE_FLUSH_FUA. It's unlikely this will be necessary, especially
-	 * with an appropriately sized journal, but we need this to guarantee
-	 * correctness.  Fortunately cleanup_journal_tail() doesn't get called
-	 * all that often.
-	 */
-	journal_update_sb_log_tail(journal, first_tid, blocknr,
-				   WRITE_FLUSH_FUA);
-
-	spin_lock(&journal->j_state_lock);
-	/* OK, update the superblock to recover the freed space.
-	 * Physical blocks come first: have we wrapped beyond the end of
-	 * the log?  */
-	freed = blocknr - journal->j_tail;
-	if (blocknr < journal->j_tail)
-		freed = freed + journal->j_last - journal->j_first;
-
-	trace_jbd_cleanup_journal_tail(journal, first_tid, blocknr, freed);
-	jbd_debug(1,
-		  "Cleaning journal tail from %d to %d (offset %u), "
-		  "freeing %u\n",
-		  journal->j_tail_sequence, first_tid, blocknr, freed);
-
-	journal->j_free += freed;
-	journal->j_tail_sequence = first_tid;
-	journal->j_tail = blocknr;
-	spin_unlock(&journal->j_state_lock);
-	return 0;
-}
-
-
-/* Checkpoint list management */
-
-/*
- * journal_clean_one_cp_list
- *
- * Find all the written-back checkpoint buffers in the given list and release
- * them.
- *
- * Called with j_list_lock held.
- * Returns number of buffers reaped (for debug)
- */
-
-static int journal_clean_one_cp_list(struct journal_head *jh, int *released)
-{
-	struct journal_head *last_jh;
-	struct journal_head *next_jh = jh;
-	int ret, freed = 0;
-
-	*released = 0;
-	if (!jh)
-		return 0;
-
-	last_jh = jh->b_cpprev;
-	do {
-		jh = next_jh;
-		next_jh = jh->b_cpnext;
-		/* Use trylock because of the ranking */
-		if (jbd_trylock_bh_state(jh2bh(jh))) {
-			ret = __try_to_free_cp_buf(jh);
-			if (ret) {
-				freed++;
-				if (ret == 2) {
-					*released = 1;
-					return freed;
-				}
-			}
-		}
-		/*
-		 * This function only frees up some memory
-		 * if possible so we dont have an obligation
-		 * to finish processing. Bail out if preemption
-		 * requested:
-		 */
-		if (need_resched())
-			return freed;
-	} while (jh != last_jh);
-
-	return freed;
-}
-
-/*
- * journal_clean_checkpoint_list
- *
- * Find all the written-back checkpoint buffers in the journal and release them.
- *
- * Called with the journal locked.
- * Called with j_list_lock held.
- * Returns number of buffers reaped (for debug)
- */
-
-int __journal_clean_checkpoint_list(journal_t *journal)
-{
-	transaction_t *transaction, *last_transaction, *next_transaction;
-	int ret = 0;
-	int released;
-
-	transaction = journal->j_checkpoint_transactions;
-	if (!transaction)
-		goto out;
-
-	last_transaction = transaction->t_cpprev;
-	next_transaction = transaction;
-	do {
-		transaction = next_transaction;
-		next_transaction = transaction->t_cpnext;
-		ret += journal_clean_one_cp_list(transaction->
-				t_checkpoint_list, &released);
-		/*
-		 * This function only frees up some memory if possible so we
-		 * dont have an obligation to finish processing. Bail out if
-		 * preemption requested:
-		 */
-		if (need_resched())
-			goto out;
-		if (released)
-			continue;
-		/*
-		 * It is essential that we are as careful as in the case of
-		 * t_checkpoint_list with removing the buffer from the list as
-		 * we can possibly see not yet submitted buffers on io_list
-		 */
-		ret += journal_clean_one_cp_list(transaction->
-				t_checkpoint_io_list, &released);
-		if (need_resched())
-			goto out;
-	} while (transaction != last_transaction);
-out:
-	return ret;
-}
-
-/*
- * journal_remove_checkpoint: called after a buffer has been committed
- * to disk (either by being write-back flushed to disk, or being
- * committed to the log).
- *
- * We cannot safely clean a transaction out of the log until all of the
- * buffer updates committed in that transaction have safely been stored
- * elsewhere on disk.  To achieve this, all of the buffers in a
- * transaction need to be maintained on the transaction's checkpoint
- * lists until they have been rewritten, at which point this function is
- * called to remove the buffer from the existing transaction's
- * checkpoint lists.
- *
- * The function returns 1 if it frees the transaction, 0 otherwise.
- * The function can free jh and bh.
- *
- * This function is called with j_list_lock held.
- * This function is called with jbd_lock_bh_state(jh2bh(jh))
- */
-
-int __journal_remove_checkpoint(struct journal_head *jh)
-{
-	transaction_t *transaction;
-	journal_t *journal;
-	int ret = 0;
-
-	JBUFFER_TRACE(jh, "entry");
-
-	if ((transaction = jh->b_cp_transaction) == NULL) {
-		JBUFFER_TRACE(jh, "not on transaction");
-		goto out;
-	}
-	journal = transaction->t_journal;
-
-	JBUFFER_TRACE(jh, "removing from transaction");
-	__buffer_unlink(jh);
-	jh->b_cp_transaction = NULL;
-	journal_put_journal_head(jh);
-
-	if (transaction->t_checkpoint_list != NULL ||
-	    transaction->t_checkpoint_io_list != NULL)
-		goto out;
-
-	/*
-	 * There is one special case to worry about: if we have just pulled the
-	 * buffer off a running or committing transaction's checkpoing list,
-	 * then even if the checkpoint list is empty, the transaction obviously
-	 * cannot be dropped!
-	 *
-	 * The locking here around t_state is a bit sleazy.
-	 * See the comment at the end of journal_commit_transaction().
-	 */
-	if (transaction->t_state != T_FINISHED)
-		goto out;
-
-	/* OK, that was the last buffer for the transaction: we can now
-	   safely remove this transaction from the log */
-
-	__journal_drop_transaction(journal, transaction);
-
-	/* Just in case anybody was waiting for more transactions to be
-           checkpointed... */
-	wake_up(&journal->j_wait_logspace);
-	ret = 1;
-out:
-	return ret;
-}
-
-/*
- * journal_insert_checkpoint: put a committed buffer onto a checkpoint
- * list so that we know when it is safe to clean the transaction out of
- * the log.
- *
- * Called with the journal locked.
- * Called with j_list_lock held.
- */
-void __journal_insert_checkpoint(struct journal_head *jh,
-			       transaction_t *transaction)
-{
-	JBUFFER_TRACE(jh, "entry");
-	J_ASSERT_JH(jh, buffer_dirty(jh2bh(jh)) || buffer_jbddirty(jh2bh(jh)));
-	J_ASSERT_JH(jh, jh->b_cp_transaction == NULL);
-
-	/* Get reference for checkpointing transaction */
-	journal_grab_journal_head(jh2bh(jh));
-	jh->b_cp_transaction = transaction;
-
-	if (!transaction->t_checkpoint_list) {
-		jh->b_cpnext = jh->b_cpprev = jh;
-	} else {
-		jh->b_cpnext = transaction->t_checkpoint_list;
-		jh->b_cpprev = transaction->t_checkpoint_list->b_cpprev;
-		jh->b_cpprev->b_cpnext = jh;
-		jh->b_cpnext->b_cpprev = jh;
-	}
-	transaction->t_checkpoint_list = jh;
-}
-
-/*
- * We've finished with this transaction structure: adios...
- *
- * The transaction must have no links except for the checkpoint by this
- * point.
- *
- * Called with the journal locked.
- * Called with j_list_lock held.
- */
-
-void __journal_drop_transaction(journal_t *journal, transaction_t *transaction)
-{
-	assert_spin_locked(&journal->j_list_lock);
-	if (transaction->t_cpnext) {
-		transaction->t_cpnext->t_cpprev = transaction->t_cpprev;
-		transaction->t_cpprev->t_cpnext = transaction->t_cpnext;
-		if (journal->j_checkpoint_transactions == transaction)
-			journal->j_checkpoint_transactions =
-				transaction->t_cpnext;
-		if (journal->j_checkpoint_transactions == transaction)
-			journal->j_checkpoint_transactions = NULL;
-	}
-
-	J_ASSERT(transaction->t_state == T_FINISHED);
-	J_ASSERT(transaction->t_buffers == NULL);
-	J_ASSERT(transaction->t_sync_datalist == NULL);
-	J_ASSERT(transaction->t_forget == NULL);
-	J_ASSERT(transaction->t_iobuf_list == NULL);
-	J_ASSERT(transaction->t_shadow_list == NULL);
-	J_ASSERT(transaction->t_log_list == NULL);
-	J_ASSERT(transaction->t_checkpoint_list == NULL);
-	J_ASSERT(transaction->t_checkpoint_io_list == NULL);
-	J_ASSERT(transaction->t_updates == 0);
-	J_ASSERT(journal->j_committing_transaction != transaction);
-	J_ASSERT(journal->j_running_transaction != transaction);
-
-	trace_jbd_drop_transaction(journal, transaction);
-	jbd_debug(1, "Dropping transaction %d, all done\n", transaction->t_tid);
-	kfree(transaction);
-}
diff --git a/fs/jbd/commit.c b/fs/jbd/commit.c
deleted file mode 100644
index bb217dcb41af..000000000000
--- a/fs/jbd/commit.c
+++ /dev/null
@@ -1,1021 +0,0 @@
-/*
- * linux/fs/jbd/commit.c
- *
- * Written by Stephen C. Tweedie <sct@redhat.com>, 1998
- *
- * Copyright 1998 Red Hat corp --- All Rights Reserved
- *
- * This file is part of the Linux kernel and is made available under
- * the terms of the GNU General Public License, version 2, or at your
- * option, any later version, incorporated herein by reference.
- *
- * Journal commit routines for the generic filesystem journaling code;
- * part of the ext2fs journaling system.
- */
-
-#include <linux/time.h>
-#include <linux/fs.h>
-#include <linux/jbd.h>
-#include <linux/errno.h>
-#include <linux/mm.h>
-#include <linux/pagemap.h>
-#include <linux/bio.h>
-#include <linux/blkdev.h>
-#include <trace/events/jbd.h>
-
-/*
- * Default IO end handler for temporary BJ_IO buffer_heads.
- */
-static void journal_end_buffer_io_sync(struct buffer_head *bh, int uptodate)
-{
-	BUFFER_TRACE(bh, "");
-	if (uptodate)
-		set_buffer_uptodate(bh);
-	else
-		clear_buffer_uptodate(bh);
-	unlock_buffer(bh);
-}
-
-/*
- * When an ext3-ordered file is truncated, it is possible that many pages are
- * not successfully freed, because they are attached to a committing transaction.
- * After the transaction commits, these pages are left on the LRU, with no
- * ->mapping, and with attached buffers.  These pages are trivially reclaimable
- * by the VM, but their apparent absence upsets the VM accounting, and it makes
- * the numbers in /proc/meminfo look odd.
- *
- * So here, we have a buffer which has just come off the forget list.  Look to
- * see if we can strip all buffers from the backing page.
- *
- * Called under journal->j_list_lock.  The caller provided us with a ref
- * against the buffer, and we drop that here.
- */
-static void release_buffer_page(struct buffer_head *bh)
-{
-	struct page *page;
-
-	if (buffer_dirty(bh))
-		goto nope;
-	if (atomic_read(&bh->b_count) != 1)
-		goto nope;
-	page = bh->b_page;
-	if (!page)
-		goto nope;
-	if (page->mapping)
-		goto nope;
-
-	/* OK, it's a truncated page */
-	if (!trylock_page(page))
-		goto nope;
-
-	page_cache_get(page);
-	__brelse(bh);
-	try_to_free_buffers(page);
-	unlock_page(page);
-	page_cache_release(page);
-	return;
-
-nope:
-	__brelse(bh);
-}
-
-/*
- * Decrement reference counter for data buffer. If it has been marked
- * 'BH_Freed', release it and the page to which it belongs if possible.
- */
-static void release_data_buffer(struct buffer_head *bh)
-{
-	if (buffer_freed(bh)) {
-		WARN_ON_ONCE(buffer_dirty(bh));
-		clear_buffer_freed(bh);
-		clear_buffer_mapped(bh);
-		clear_buffer_new(bh);
-		clear_buffer_req(bh);
-		bh->b_bdev = NULL;
-		release_buffer_page(bh);
-	} else
-		put_bh(bh);
-}
-
-/*
- * Try to acquire jbd_lock_bh_state() against the buffer, when j_list_lock is
- * held.  For ranking reasons we must trylock.  If we lose, schedule away and
- * return 0.  j_list_lock is dropped in this case.
- */
-static int inverted_lock(journal_t *journal, struct buffer_head *bh)
-{
-	if (!jbd_trylock_bh_state(bh)) {
-		spin_unlock(&journal->j_list_lock);
-		schedule();
-		return 0;
-	}
-	return 1;
-}
-
-/* Done it all: now write the commit record.  We should have
- * cleaned up our previous buffers by now, so if we are in abort
- * mode we can now just skip the rest of the journal write
- * entirely.
- *
- * Returns 1 if the journal needs to be aborted or 0 on success
- */
-static int journal_write_commit_record(journal_t *journal,
-					transaction_t *commit_transaction)
-{
-	struct journal_head *descriptor;
-	struct buffer_head *bh;
-	journal_header_t *header;
-	int ret;
-
-	if (is_journal_aborted(journal))
-		return 0;
-
-	descriptor = journal_get_descriptor_buffer(journal);
-	if (!descriptor)
-		return 1;
-
-	bh = jh2bh(descriptor);
-
-	header = (journal_header_t *)(bh->b_data);
-	header->h_magic = cpu_to_be32(JFS_MAGIC_NUMBER);
-	header->h_blocktype = cpu_to_be32(JFS_COMMIT_BLOCK);
-	header->h_sequence = cpu_to_be32(commit_transaction->t_tid);
-
-	JBUFFER_TRACE(descriptor, "write commit block");
-	set_buffer_dirty(bh);
-
-	if (journal->j_flags & JFS_BARRIER)
-		ret = __sync_dirty_buffer(bh, WRITE_SYNC | WRITE_FLUSH_FUA);
-	else
-		ret = sync_dirty_buffer(bh);
-
-	put_bh(bh);		/* One for getblk() */
-	journal_put_journal_head(descriptor);
-
-	return (ret == -EIO);
-}
-
-static void journal_do_submit_data(struct buffer_head **wbuf, int bufs,
-				   int write_op)
-{
-	int i;
-
-	for (i = 0; i < bufs; i++) {
-		wbuf[i]->b_end_io = end_buffer_write_sync;
-		/*
-		 * Here we write back pagecache data that may be mmaped. Since
-		 * we cannot afford to clean the page and set PageWriteback
-		 * here due to lock ordering (page lock ranks above transaction
-		 * start), the data can change while IO is in flight. Tell the
-		 * block layer it should bounce the bio pages if stable data
-		 * during write is required.
-		 *
-		 * We use up our safety reference in submit_bh().
-		 */
-		_submit_bh(write_op, wbuf[i], 1 << BIO_SNAP_STABLE);
-	}
-}
-
-/*
- *  Submit all the data buffers to disk
- */
-static int journal_submit_data_buffers(journal_t *journal,
-				       transaction_t *commit_transaction,
-				       int write_op)
-{
-	struct journal_head *jh;
-	struct buffer_head *bh;
-	int locked;
-	int bufs = 0;
-	struct buffer_head **wbuf = journal->j_wbuf;
-	int err = 0;
-
-	/*
-	 * Whenever we unlock the journal and sleep, things can get added
-	 * onto ->t_sync_datalist, so we have to keep looping back to
-	 * write_out_data until we *know* that the list is empty.
-	 *
-	 * Cleanup any flushed data buffers from the data list.  Even in
-	 * abort mode, we want to flush this out as soon as possible.
-	 */
-write_out_data:
-	cond_resched();
-	spin_lock(&journal->j_list_lock);
-
-	while (commit_transaction->t_sync_datalist) {
-		jh = commit_transaction->t_sync_datalist;
-		bh = jh2bh(jh);
-		locked = 0;
-
-		/* Get reference just to make sure buffer does not disappear
-		 * when we are forced to drop various locks */
-		get_bh(bh);
-		/* If the buffer is dirty, we need to submit IO and hence
-		 * we need the buffer lock. We try to lock the buffer without
-		 * blocking. If we fail, we need to drop j_list_lock and do
-		 * blocking lock_buffer().
-		 */
-		if (buffer_dirty(bh)) {
-			if (!trylock_buffer(bh)) {
-				BUFFER_TRACE(bh, "needs blocking lock");
-				spin_unlock(&journal->j_list_lock);
-				trace_jbd_do_submit_data(journal,
-						     commit_transaction);
-				/* Write out all data to prevent deadlocks */
-				journal_do_submit_data(wbuf, bufs, write_op);
-				bufs = 0;
-				lock_buffer(bh);
-				spin_lock(&journal->j_list_lock);
-			}
-			locked = 1;
-		}
-		/* We have to get bh_state lock. Again out of order, sigh. */
-		if (!inverted_lock(journal, bh)) {
-			jbd_lock_bh_state(bh);
-			spin_lock(&journal->j_list_lock);
-		}
-		/* Someone already cleaned up the buffer? */
-		if (!buffer_jbd(bh) || bh2jh(bh) != jh
-			|| jh->b_transaction != commit_transaction
-			|| jh->b_jlist != BJ_SyncData) {
-			jbd_unlock_bh_state(bh);
-			if (locked)
-				unlock_buffer(bh);
-			BUFFER_TRACE(bh, "already cleaned up");
-			release_data_buffer(bh);
-			continue;
-		}
-		if (locked && test_clear_buffer_dirty(bh)) {
-			BUFFER_TRACE(bh, "needs writeout, adding to array");
-			wbuf[bufs++] = bh;
-			__journal_file_buffer(jh, commit_transaction,
-						BJ_Locked);
-			jbd_unlock_bh_state(bh);
-			if (bufs == journal->j_wbufsize) {
-				spin_unlock(&journal->j_list_lock);
-				trace_jbd_do_submit_data(journal,
-						     commit_transaction);
-				journal_do_submit_data(wbuf, bufs, write_op);
-				bufs = 0;
-				goto write_out_data;
-			}
-		} else if (!locked && buffer_locked(bh)) {
-			__journal_file_buffer(jh, commit_transaction,
-						BJ_Locked);
-			jbd_unlock_bh_state(bh);
-			put_bh(bh);
-		} else {
-			BUFFER_TRACE(bh, "writeout complete: unfile");
-			if (unlikely(!buffer_uptodate(bh)))
-				err = -EIO;
-			__journal_unfile_buffer(jh);
-			jbd_unlock_bh_state(bh);
-			if (locked)
-				unlock_buffer(bh);
-			release_data_buffer(bh);
-		}
-
-		if (need_resched() || spin_needbreak(&journal->j_list_lock)) {
-			spin_unlock(&journal->j_list_lock);
-			goto write_out_data;
-		}
-	}
-	spin_unlock(&journal->j_list_lock);
-	trace_jbd_do_submit_data(journal, commit_transaction);
-	journal_do_submit_data(wbuf, bufs, write_op);
-
-	return err;
-}
-
-/*
- * journal_commit_transaction
- *
- * The primary function for committing a transaction to the log.  This
- * function is called by the journal thread to begin a complete commit.
- */
-void journal_commit_transaction(journal_t *journal)
-{
-	transaction_t *commit_transaction;
-	struct journal_head *jh, *new_jh, *descriptor;
-	struct buffer_head **wbuf = journal->j_wbuf;
-	int bufs;
-	int flags;
-	int err;
-	unsigned int blocknr;
-	ktime_t start_time;
-	u64 commit_time;
-	char *tagp = NULL;
-	journal_header_t *header;
-	journal_block_tag_t *tag = NULL;
-	int space_left = 0;
-	int first_tag = 0;
-	int tag_flag;
-	int i;
-	struct blk_plug plug;
-	int write_op = WRITE;
-
-	/*
-	 * First job: lock down the current transaction and wait for
-	 * all outstanding updates to complete.
-	 */
-
-	/* Do we need to erase the effects of a prior journal_flush? */
-	if (journal->j_flags & JFS_FLUSHED) {
-		jbd_debug(3, "super block updated\n");
-		mutex_lock(&journal->j_checkpoint_mutex);
-		/*
-		 * We hold j_checkpoint_mutex so tail cannot change under us.
-		 * We don't need any special data guarantees for writing sb
-		 * since journal is empty and it is ok for write to be
-		 * flushed only with transaction commit.
-		 */
-		journal_update_sb_log_tail(journal, journal->j_tail_sequence,
-					   journal->j_tail, WRITE_SYNC);
-		mutex_unlock(&journal->j_checkpoint_mutex);
-	} else {
-		jbd_debug(3, "superblock not updated\n");
-	}
-
-	J_ASSERT(journal->j_running_transaction != NULL);
-	J_ASSERT(journal->j_committing_transaction == NULL);
-
-	commit_transaction = journal->j_running_transaction;
-
-	trace_jbd_start_commit(journal, commit_transaction);
-	jbd_debug(1, "JBD: starting commit of transaction %d\n",
-			commit_transaction->t_tid);
-
-	spin_lock(&journal->j_state_lock);
-	J_ASSERT(commit_transaction->t_state == T_RUNNING);
-	commit_transaction->t_state = T_LOCKED;
-
-	trace_jbd_commit_locking(journal, commit_transaction);
-	spin_lock(&commit_transaction->t_handle_lock);
-	while (commit_transaction->t_updates) {
-		DEFINE_WAIT(wait);
-
-		prepare_to_wait(&journal->j_wait_updates, &wait,
-					TASK_UNINTERRUPTIBLE);
-		if (commit_transaction->t_updates) {
-			spin_unlock(&commit_transaction->t_handle_lock);
-			spin_unlock(&journal->j_state_lock);
-			schedule();
-			spin_lock(&journal->j_state_lock);
-			spin_lock(&commit_transaction->t_handle_lock);
-		}
-		finish_wait(&journal->j_wait_updates, &wait);
-	}
-	spin_unlock(&commit_transaction->t_handle_lock);
-
-	J_ASSERT (commit_transaction->t_outstanding_credits <=
-			journal->j_max_transaction_buffers);
-
-	/*
-	 * First thing we are allowed to do is to discard any remaining
-	 * BJ_Reserved buffers.  Note, it is _not_ permissible to assume
-	 * that there are no such buffers: if a large filesystem
-	 * operation like a truncate needs to split itself over multiple
-	 * transactions, then it may try to do a journal_restart() while
-	 * there are still BJ_Reserved buffers outstanding.  These must
-	 * be released cleanly from the current transaction.
-	 *
-	 * In this case, the filesystem must still reserve write access
-	 * again before modifying the buffer in the new transaction, but
-	 * we do not require it to remember exactly which old buffers it
-	 * has reserved.  This is consistent with the existing behaviour
-	 * that multiple journal_get_write_access() calls to the same
-	 * buffer are perfectly permissible.
-	 */
-	while (commit_transaction->t_reserved_list) {
-		jh = commit_transaction->t_reserved_list;
-		JBUFFER_TRACE(jh, "reserved, unused: refile");
-		/*
-		 * A journal_get_undo_access()+journal_release_buffer() may
-		 * leave undo-committed data.
-		 */
-		if (jh->b_committed_data) {
-			struct buffer_head *bh = jh2bh(jh);
-
-			jbd_lock_bh_state(bh);
-			jbd_free(jh->b_committed_data, bh->b_size);
-			jh->b_committed_data = NULL;
-			jbd_unlock_bh_state(bh);
-		}
-		journal_refile_buffer(journal, jh);
-	}
-
-	/*
-	 * Now try to drop any written-back buffers from the journal's
-	 * checkpoint lists.  We do this *before* commit because it potentially
-	 * frees some memory
-	 */
-	spin_lock(&journal->j_list_lock);
-	__journal_clean_checkpoint_list(journal);
-	spin_unlock(&journal->j_list_lock);
-
-	jbd_debug (3, "JBD: commit phase 1\n");
-
-	/*
-	 * Clear revoked flag to reflect there is no revoked buffers
-	 * in the next transaction which is going to be started.
-	 */
-	journal_clear_buffer_revoked_flags(journal);
-
-	/*
-	 * Switch to a new revoke table.
-	 */
-	journal_switch_revoke_table(journal);
-
-	trace_jbd_commit_flushing(journal, commit_transaction);
-	commit_transaction->t_state = T_FLUSH;
-	journal->j_committing_transaction = commit_transaction;
-	journal->j_running_transaction = NULL;
-	start_time = ktime_get();
-	commit_transaction->t_log_start = journal->j_head;
-	wake_up(&journal->j_wait_transaction_locked);
-	spin_unlock(&journal->j_state_lock);
-
-	jbd_debug (3, "JBD: commit phase 2\n");
-
-	if (tid_geq(journal->j_commit_waited, commit_transaction->t_tid))
-		write_op = WRITE_SYNC;
-
-	/*
-	 * Now start flushing things to disk, in the order they appear
-	 * on the transaction lists.  Data blocks go first.
-	 */
-	blk_start_plug(&plug);
-	err = journal_submit_data_buffers(journal, commit_transaction,
-					  write_op);
-	blk_finish_plug(&plug);
-
-	/*
-	 * Wait for all previously submitted IO to complete.
-	 */
-	spin_lock(&journal->j_list_lock);
-	while (commit_transaction->t_locked_list) {
-		struct buffer_head *bh;
-
-		jh = commit_transaction->t_locked_list->b_tprev;
-		bh = jh2bh(jh);
-		get_bh(bh);
-		if (buffer_locked(bh)) {
-			spin_unlock(&journal->j_list_lock);
-			wait_on_buffer(bh);
-			spin_lock(&journal->j_list_lock);
-		}
-		if (unlikely(!buffer_uptodate(bh))) {
-			if (!trylock_page(bh->b_page)) {
-				spin_unlock(&journal->j_list_lock);
-				lock_page(bh->b_page);
-				spin_lock(&journal->j_list_lock);
-			}
-			if (bh->b_page->mapping)
-				set_bit(AS_EIO, &bh->b_page->mapping->flags);
-
-			unlock_page(bh->b_page);
-			SetPageError(bh->b_page);
-			err = -EIO;
-		}
-		if (!inverted_lock(journal, bh)) {
-			put_bh(bh);
-			spin_lock(&journal->j_list_lock);
-			continue;
-		}
-		if (buffer_jbd(bh) && bh2jh(bh) == jh &&
-		    jh->b_transaction == commit_transaction &&
-		    jh->b_jlist == BJ_Locked)
-			__journal_unfile_buffer(jh);
-		jbd_unlock_bh_state(bh);
-		release_data_buffer(bh);
-		cond_resched_lock(&journal->j_list_lock);
-	}
-	spin_unlock(&journal->j_list_lock);
-
-	if (err) {
-		char b[BDEVNAME_SIZE];
-
-		printk(KERN_WARNING
-			"JBD: Detected IO errors while flushing file data "
-			"on %s\n", bdevname(journal->j_fs_dev, b));
-		if (journal->j_flags & JFS_ABORT_ON_SYNCDATA_ERR)
-			journal_abort(journal, err);
-		err = 0;
-	}
-
-	blk_start_plug(&plug);
-
-	journal_write_revoke_records(journal, commit_transaction, write_op);
-
-	/*
-	 * If we found any dirty or locked buffers, then we should have
-	 * looped back up to the write_out_data label.  If there weren't
-	 * any then journal_clean_data_list should have wiped the list
-	 * clean by now, so check that it is in fact empty.
-	 */
-	J_ASSERT (commit_transaction->t_sync_datalist == NULL);
-
-	jbd_debug (3, "JBD: commit phase 3\n");
-
-	/*
-	 * Way to go: we have now written out all of the data for a
-	 * transaction!  Now comes the tricky part: we need to write out
-	 * metadata.  Loop over the transaction's entire buffer list:
-	 */
-	spin_lock(&journal->j_state_lock);
-	commit_transaction->t_state = T_COMMIT;
-	spin_unlock(&journal->j_state_lock);
-
-	trace_jbd_commit_logging(journal, commit_transaction);
-	J_ASSERT(commit_transaction->t_nr_buffers <=
-		 commit_transaction->t_outstanding_credits);
-
-	descriptor = NULL;
-	bufs = 0;
-	while (commit_transaction->t_buffers) {
-
-		/* Find the next buffer to be journaled... */
-
-		jh = commit_transaction->t_buffers;
-
-		/* If we're in abort mode, we just un-journal the buffer and
-		   release it. */
-
-		if (is_journal_aborted(journal)) {
-			clear_buffer_jbddirty(jh2bh(jh));
-			JBUFFER_TRACE(jh, "journal is aborting: refile");
-			journal_refile_buffer(journal, jh);
-			/* If that was the last one, we need to clean up
-			 * any descriptor buffers which may have been
-			 * already allocated, even if we are now
-			 * aborting. */
-			if (!commit_transaction->t_buffers)
-				goto start_journal_io;
-			continue;
-		}
-
-		/* Make sure we have a descriptor block in which to
-		   record the metadata buffer. */
-
-		if (!descriptor) {
-			struct buffer_head *bh;
-
-			J_ASSERT (bufs == 0);
-
-			jbd_debug(4, "JBD: get descriptor\n");
-
-			descriptor = journal_get_descriptor_buffer(journal);
-			if (!descriptor) {
-				journal_abort(journal, -EIO);
-				continue;
-			}
-
-			bh = jh2bh(descriptor);
-			jbd_debug(4, "JBD: got buffer %llu (%p)\n",
-				(unsigned long long)bh->b_blocknr, bh->b_data);
-			header = (journal_header_t *)&bh->b_data[0];
-			header->h_magic     = cpu_to_be32(JFS_MAGIC_NUMBER);
-			header->h_blocktype = cpu_to_be32(JFS_DESCRIPTOR_BLOCK);
-			header->h_sequence  = cpu_to_be32(commit_transaction->t_tid);
-
-			tagp = &bh->b_data[sizeof(journal_header_t)];
-			space_left = bh->b_size - sizeof(journal_header_t);
-			first_tag = 1;
-			set_buffer_jwrite(bh);
-			set_buffer_dirty(bh);
-			wbuf[bufs++] = bh;
-
-			/* Record it so that we can wait for IO
-                           completion later */
-			BUFFER_TRACE(bh, "ph3: file as descriptor");
-			journal_file_buffer(descriptor, commit_transaction,
-					BJ_LogCtl);
-		}
-
-		/* Where is the buffer to be written? */
-
-		err = journal_next_log_block(journal, &blocknr);
-		/* If the block mapping failed, just abandon the buffer
-		   and repeat this loop: we'll fall into the
-		   refile-on-abort condition above. */
-		if (err) {
-			journal_abort(journal, err);
-			continue;
-		}
-
-		/*
-		 * start_this_handle() uses t_outstanding_credits to determine
-		 * the free space in the log, but this counter is changed
-		 * by journal_next_log_block() also.
-		 */
-		commit_transaction->t_outstanding_credits--;
-
-		/* Bump b_count to prevent truncate from stumbling over
-                   the shadowed buffer!  @@@ This can go if we ever get
-                   rid of the BJ_IO/BJ_Shadow pairing of buffers. */
-		get_bh(jh2bh(jh));
-
-		/* Make a temporary IO buffer with which to write it out
-                   (this will requeue both the metadata buffer and the
-                   temporary IO buffer). new_bh goes on BJ_IO*/
-
-		set_buffer_jwrite(jh2bh(jh));
-		/*
-		 * akpm: journal_write_metadata_buffer() sets
-		 * new_bh->b_transaction to commit_transaction.
-		 * We need to clean this up before we release new_bh
-		 * (which is of type BJ_IO)
-		 */
-		JBUFFER_TRACE(jh, "ph3: write metadata");
-		flags = journal_write_metadata_buffer(commit_transaction,
-						      jh, &new_jh, blocknr);
-		set_buffer_jwrite(jh2bh(new_jh));
-		wbuf[bufs++] = jh2bh(new_jh);
-
-		/* Record the new block's tag in the current descriptor
-                   buffer */
-
-		tag_flag = 0;
-		if (flags & 1)
-			tag_flag |= JFS_FLAG_ESCAPE;
-		if (!first_tag)
-			tag_flag |= JFS_FLAG_SAME_UUID;
-
-		tag = (journal_block_tag_t *) tagp;
-		tag->t_blocknr = cpu_to_be32(jh2bh(jh)->b_blocknr);
-		tag->t_flags = cpu_to_be32(tag_flag);
-		tagp += sizeof(journal_block_tag_t);
-		space_left -= sizeof(journal_block_tag_t);
-
-		if (first_tag) {
-			memcpy (tagp, journal->j_uuid, 16);
-			tagp += 16;
-			space_left -= 16;
-			first_tag = 0;
-		}
-
-		/* If there's no more to do, or if the descriptor is full,
-		   let the IO rip! */
-
-		if (bufs == journal->j_wbufsize ||
-		    commit_transaction->t_buffers == NULL ||
-		    space_left < sizeof(journal_block_tag_t) + 16) {
-
-			jbd_debug(4, "JBD: Submit %d IOs\n", bufs);
-
-			/* Write an end-of-descriptor marker before
-                           submitting the IOs.  "tag" still points to
-                           the last tag we set up. */
-
-			tag->t_flags |= cpu_to_be32(JFS_FLAG_LAST_TAG);
-
-start_journal_io:
-			for (i = 0; i < bufs; i++) {
-				struct buffer_head *bh = wbuf[i];
-				lock_buffer(bh);
-				clear_buffer_dirty(bh);
-				set_buffer_uptodate(bh);
-				bh->b_end_io = journal_end_buffer_io_sync;
-				/*
-				 * In data=journal mode, here we can end up
-				 * writing pagecache data that might be
-				 * mmapped. Since we can't afford to clean the
-				 * page and set PageWriteback (see the comment
-				 * near the other use of _submit_bh()), the
-				 * data can change while the write is in
-				 * flight.  Tell the block layer to bounce the
-				 * bio pages if stable pages are required.
-				 */
-				_submit_bh(write_op, bh, 1 << BIO_SNAP_STABLE);
-			}
-			cond_resched();
-
-			/* Force a new descriptor to be generated next
-                           time round the loop. */
-			descriptor = NULL;
-			bufs = 0;
-		}
-	}
-
-	blk_finish_plug(&plug);
-
-	/* Lo and behold: we have just managed to send a transaction to
-           the log.  Before we can commit it, wait for the IO so far to
-           complete.  Control buffers being written are on the
-           transaction's t_log_list queue, and metadata buffers are on
-           the t_iobuf_list queue.
-
-	   Wait for the buffers in reverse order.  That way we are
-	   less likely to be woken up until all IOs have completed, and
-	   so we incur less scheduling load.
-	*/
-
-	jbd_debug(3, "JBD: commit phase 4\n");
-
-	/*
-	 * akpm: these are BJ_IO, and j_list_lock is not needed.
-	 * See __journal_try_to_free_buffer.
-	 */
-wait_for_iobuf:
-	while (commit_transaction->t_iobuf_list != NULL) {
-		struct buffer_head *bh;
-
-		jh = commit_transaction->t_iobuf_list->b_tprev;
-		bh = jh2bh(jh);
-		if (buffer_locked(bh)) {
-			wait_on_buffer(bh);
-			goto wait_for_iobuf;
-		}
-		if (cond_resched())
-			goto wait_for_iobuf;
-
-		if (unlikely(!buffer_uptodate(bh)))
-			err = -EIO;
-
-		clear_buffer_jwrite(bh);
-
-		JBUFFER_TRACE(jh, "ph4: unfile after journal write");
-		journal_unfile_buffer(journal, jh);
-
-		/*
-		 * ->t_iobuf_list should contain only dummy buffer_heads
-		 * which were created by journal_write_metadata_buffer().
-		 */
-		BUFFER_TRACE(bh, "dumping temporary bh");
-		journal_put_journal_head(jh);
-		__brelse(bh);
-		J_ASSERT_BH(bh, atomic_read(&bh->b_count) == 0);
-		free_buffer_head(bh);
-
-		/* We also have to unlock and free the corresponding
-                   shadowed buffer */
-		jh = commit_transaction->t_shadow_list->b_tprev;
-		bh = jh2bh(jh);
-		clear_buffer_jwrite(bh);
-		J_ASSERT_BH(bh, buffer_jbddirty(bh));
-
-		/* The metadata is now released for reuse, but we need
-                   to remember it against this transaction so that when
-                   we finally commit, we can do any checkpointing
-                   required. */
-		JBUFFER_TRACE(jh, "file as BJ_Forget");
-		journal_file_buffer(jh, commit_transaction, BJ_Forget);
-		/*
-		 * Wake up any transactions which were waiting for this
-		 * IO to complete. The barrier must be here so that changes
-		 * by journal_file_buffer() take effect before wake_up_bit()
-		 * does the waitqueue check.
-		 */
-		smp_mb();
-		wake_up_bit(&bh->b_state, BH_Unshadow);
-		JBUFFER_TRACE(jh, "brelse shadowed buffer");
-		__brelse(bh);
-	}
-
-	J_ASSERT (commit_transaction->t_shadow_list == NULL);
-
-	jbd_debug(3, "JBD: commit phase 5\n");
-
-	/* Here we wait for the revoke record and descriptor record buffers */
- wait_for_ctlbuf:
-	while (commit_transaction->t_log_list != NULL) {
-		struct buffer_head *bh;
-
-		jh = commit_transaction->t_log_list->b_tprev;
-		bh = jh2bh(jh);
-		if (buffer_locked(bh)) {
-			wait_on_buffer(bh);
-			goto wait_for_ctlbuf;
-		}
-		if (cond_resched())
-			goto wait_for_ctlbuf;
-
-		if (unlikely(!buffer_uptodate(bh)))
-			err = -EIO;
-
-		BUFFER_TRACE(bh, "ph5: control buffer writeout done: unfile");
-		clear_buffer_jwrite(bh);
-		journal_unfile_buffer(journal, jh);
-		journal_put_journal_head(jh);
-		__brelse(bh);		/* One for getblk */
-		/* AKPM: bforget here */
-	}
-
-	if (err)
-		journal_abort(journal, err);
-
-	jbd_debug(3, "JBD: commit phase 6\n");
-
-	/* All metadata is written, now write commit record and do cleanup */
-	spin_lock(&journal->j_state_lock);
-	J_ASSERT(commit_transaction->t_state == T_COMMIT);
-	commit_transaction->t_state = T_COMMIT_RECORD;
-	spin_unlock(&journal->j_state_lock);
-
-	if (journal_write_commit_record(journal, commit_transaction))
-		err = -EIO;
-
-	if (err)
-		journal_abort(journal, err);
-
-	/* End of a transaction!  Finally, we can do checkpoint
-           processing: any buffers committed as a result of this
-           transaction can be removed from any checkpoint list it was on
-           before. */
-
-	jbd_debug(3, "JBD: commit phase 7\n");
-
-	J_ASSERT(commit_transaction->t_sync_datalist == NULL);
-	J_ASSERT(commit_transaction->t_buffers == NULL);
-	J_ASSERT(commit_transaction->t_checkpoint_list == NULL);
-	J_ASSERT(commit_transaction->t_iobuf_list == NULL);
-	J_ASSERT(commit_transaction->t_shadow_list == NULL);
-	J_ASSERT(commit_transaction->t_log_list == NULL);
-
-restart_loop:
-	/*
-	 * As there are other places (journal_unmap_buffer()) adding buffers
-	 * to this list we have to be careful and hold the j_list_lock.
-	 */
-	spin_lock(&journal->j_list_lock);
-	while (commit_transaction->t_forget) {
-		transaction_t *cp_transaction;
-		struct buffer_head *bh;
-		int try_to_free = 0;
-
-		jh = commit_transaction->t_forget;
-		spin_unlock(&journal->j_list_lock);
-		bh = jh2bh(jh);
-		/*
-		 * Get a reference so that bh cannot be freed before we are
-		 * done with it.
-		 */
-		get_bh(bh);
-		jbd_lock_bh_state(bh);
-		J_ASSERT_JH(jh,	jh->b_transaction == commit_transaction ||
-			jh->b_transaction == journal->j_running_transaction);
-
-		/*
-		 * If there is undo-protected committed data against
-		 * this buffer, then we can remove it now.  If it is a
-		 * buffer needing such protection, the old frozen_data
-		 * field now points to a committed version of the
-		 * buffer, so rotate that field to the new committed
-		 * data.
-		 *
-		 * Otherwise, we can just throw away the frozen data now.
-		 */
-		if (jh->b_committed_data) {
-			jbd_free(jh->b_committed_data, bh->b_size);
-			jh->b_committed_data = NULL;
-			if (jh->b_frozen_data) {
-				jh->b_committed_data = jh->b_frozen_data;
-				jh->b_frozen_data = NULL;
-			}
-		} else if (jh->b_frozen_data) {
-			jbd_free(jh->b_frozen_data, bh->b_size);
-			jh->b_frozen_data = NULL;
-		}
-
-		spin_lock(&journal->j_list_lock);
-		cp_transaction = jh->b_cp_transaction;
-		if (cp_transaction) {
-			JBUFFER_TRACE(jh, "remove from old cp transaction");
-			__journal_remove_checkpoint(jh);
-		}
-
-		/* Only re-checkpoint the buffer_head if it is marked
-		 * dirty.  If the buffer was added to the BJ_Forget list
-		 * by journal_forget, it may no longer be dirty and
-		 * there's no point in keeping a checkpoint record for
-		 * it. */
-
-		/*
-		 * A buffer which has been freed while still being journaled by
-		 * a previous transaction.
-		 */
-		if (buffer_freed(bh)) {
-			/*
-			 * If the running transaction is the one containing
-			 * "add to orphan" operation (b_next_transaction !=
-			 * NULL), we have to wait for that transaction to
-			 * commit before we can really get rid of the buffer.
-			 * So just clear b_modified to not confuse transaction
-			 * credit accounting and refile the buffer to
-			 * BJ_Forget of the running transaction. If the just
-			 * committed transaction contains "add to orphan"
-			 * operation, we can completely invalidate the buffer
-			 * now. We are rather throughout in that since the
-			 * buffer may be still accessible when blocksize <
-			 * pagesize and it is attached to the last partial
-			 * page.
-			 */
-			jh->b_modified = 0;
-			if (!jh->b_next_transaction) {
-				clear_buffer_freed(bh);
-				clear_buffer_jbddirty(bh);
-				clear_buffer_mapped(bh);
-				clear_buffer_new(bh);
-				clear_buffer_req(bh);
-				bh->b_bdev = NULL;
-			}
-		}
-
-		if (buffer_jbddirty(bh)) {
-			JBUFFER_TRACE(jh, "add to new checkpointing trans");
-			__journal_insert_checkpoint(jh, commit_transaction);
-			if (is_journal_aborted(journal))
-				clear_buffer_jbddirty(bh);
-		} else {
-			J_ASSERT_BH(bh, !buffer_dirty(bh));
-			/*
-			 * The buffer on BJ_Forget list and not jbddirty means
-			 * it has been freed by this transaction and hence it
-			 * could not have been reallocated until this
-			 * transaction has committed. *BUT* it could be
-			 * reallocated once we have written all the data to
-			 * disk and before we process the buffer on BJ_Forget
-			 * list.
-			 */
-			if (!jh->b_next_transaction)
-				try_to_free = 1;
-		}
-		JBUFFER_TRACE(jh, "refile or unfile freed buffer");
-		__journal_refile_buffer(jh);
-		jbd_unlock_bh_state(bh);
-		if (try_to_free)
-			release_buffer_page(bh);
-		else
-			__brelse(bh);
-		cond_resched_lock(&journal->j_list_lock);
-	}
-	spin_unlock(&journal->j_list_lock);
-	/*
-	 * This is a bit sleazy.  We use j_list_lock to protect transition
-	 * of a transaction into T_FINISHED state and calling
-	 * __journal_drop_transaction(). Otherwise we could race with
-	 * other checkpointing code processing the transaction...
-	 */
-	spin_lock(&journal->j_state_lock);
-	spin_lock(&journal->j_list_lock);
-	/*
-	 * Now recheck if some buffers did not get attached to the transaction
-	 * while the lock was dropped...
-	 */
-	if (commit_transaction->t_forget) {
-		spin_unlock(&journal->j_list_lock);
-		spin_unlock(&journal->j_state_lock);
-		goto restart_loop;
-	}
-
-	/* Done with this transaction! */
-
-	jbd_debug(3, "JBD: commit phase 8\n");
-
-	J_ASSERT(commit_transaction->t_state == T_COMMIT_RECORD);
-
-	commit_transaction->t_state = T_FINISHED;
-	J_ASSERT(commit_transaction == journal->j_committing_transaction);
-	journal->j_commit_sequence = commit_transaction->t_tid;
-	journal->j_committing_transaction = NULL;
-	commit_time = ktime_to_ns(ktime_sub(ktime_get(), start_time));
-
-	/*
-	 * weight the commit time higher than the average time so we don't
-	 * react too strongly to vast changes in commit time
-	 */
-	if (likely(journal->j_average_commit_time))
-		journal->j_average_commit_time = (commit_time*3 +
-				journal->j_average_commit_time) / 4;
-	else
-		journal->j_average_commit_time = commit_time;
-
-	spin_unlock(&journal->j_state_lock);
-
-	if (commit_transaction->t_checkpoint_list == NULL &&
-	    commit_transaction->t_checkpoint_io_list == NULL) {
-		__journal_drop_transaction(journal, commit_transaction);
-	} else {
-		if (journal->j_checkpoint_transactions == NULL) {
-			journal->j_checkpoint_transactions = commit_transaction;
-			commit_transaction->t_cpnext = commit_transaction;
-			commit_transaction->t_cpprev = commit_transaction;
-		} else {
-			commit_transaction->t_cpnext =
-				journal->j_checkpoint_transactions;
-			commit_transaction->t_cpprev =
-				commit_transaction->t_cpnext->t_cpprev;
-			commit_transaction->t_cpnext->t_cpprev =
-				commit_transaction;
-			commit_transaction->t_cpprev->t_cpnext =
-				commit_transaction;
-		}
-	}
-	spin_unlock(&journal->j_list_lock);
-
-	trace_jbd_end_commit(journal, commit_transaction);
-	jbd_debug(1, "JBD: commit %d complete, head %d\n",
-		  journal->j_commit_sequence, journal->j_tail_sequence);
-
-	wake_up(&journal->j_wait_done_commit);
-}
diff --git a/fs/jbd/journal.c b/fs/jbd/journal.c
deleted file mode 100644
index c46a79adb6ad..000000000000
--- a/fs/jbd/journal.c
+++ /dev/null
@@ -1,2145 +0,0 @@
-/*
- * linux/fs/jbd/journal.c
- *
- * Written by Stephen C. Tweedie <sct@redhat.com>, 1998
- *
- * Copyright 1998 Red Hat corp --- All Rights Reserved
- *
- * This file is part of the Linux kernel and is made available under
- * the terms of the GNU General Public License, version 2, or at your
- * option, any later version, incorporated herein by reference.
- *
- * Generic filesystem journal-writing code; part of the ext2fs
- * journaling system.
- *
- * This file manages journals: areas of disk reserved for logging
- * transactional updates.  This includes the kernel journaling thread
- * which is responsible for scheduling updates to the log.
- *
- * We do not actually manage the physical storage of the journal in this
- * file: that is left to a per-journal policy function, which allows us
- * to store the journal within a filesystem-specified area for ext2
- * journaling (ext2 can use a reserved inode for storing the log).
- */
-
-#include <linux/module.h>
-#include <linux/time.h>
-#include <linux/fs.h>
-#include <linux/jbd.h>
-#include <linux/errno.h>
-#include <linux/slab.h>
-#include <linux/init.h>
-#include <linux/mm.h>
-#include <linux/freezer.h>
-#include <linux/pagemap.h>
-#include <linux/kthread.h>
-#include <linux/poison.h>
-#include <linux/proc_fs.h>
-#include <linux/debugfs.h>
-#include <linux/ratelimit.h>
-
-#define CREATE_TRACE_POINTS
-#include <trace/events/jbd.h>
-
-#include <asm/uaccess.h>
-#include <asm/page.h>
-
-EXPORT_SYMBOL(journal_start);
-EXPORT_SYMBOL(journal_restart);
-EXPORT_SYMBOL(journal_extend);
-EXPORT_SYMBOL(journal_stop);
-EXPORT_SYMBOL(journal_lock_updates);
-EXPORT_SYMBOL(journal_unlock_updates);
-EXPORT_SYMBOL(journal_get_write_access);
-EXPORT_SYMBOL(journal_get_create_access);
-EXPORT_SYMBOL(journal_get_undo_access);
-EXPORT_SYMBOL(journal_dirty_data);
-EXPORT_SYMBOL(journal_dirty_metadata);
-EXPORT_SYMBOL(journal_release_buffer);
-EXPORT_SYMBOL(journal_forget);
-#if 0
-EXPORT_SYMBOL(journal_sync_buffer);
-#endif
-EXPORT_SYMBOL(journal_flush);
-EXPORT_SYMBOL(journal_revoke);
-
-EXPORT_SYMBOL(journal_init_dev);
-EXPORT_SYMBOL(journal_init_inode);
-EXPORT_SYMBOL(journal_update_format);
-EXPORT_SYMBOL(journal_check_used_features);
-EXPORT_SYMBOL(journal_check_available_features);
-EXPORT_SYMBOL(journal_set_features);
-EXPORT_SYMBOL(journal_create);
-EXPORT_SYMBOL(journal_load);
-EXPORT_SYMBOL(journal_destroy);
-EXPORT_SYMBOL(journal_abort);
-EXPORT_SYMBOL(journal_errno);
-EXPORT_SYMBOL(journal_ack_err);
-EXPORT_SYMBOL(journal_clear_err);
-EXPORT_SYMBOL(log_wait_commit);
-EXPORT_SYMBOL(log_start_commit);
-EXPORT_SYMBOL(journal_start_commit);
-EXPORT_SYMBOL(journal_force_commit_nested);
-EXPORT_SYMBOL(journal_wipe);
-EXPORT_SYMBOL(journal_blocks_per_page);
-EXPORT_SYMBOL(journal_invalidatepage);
-EXPORT_SYMBOL(journal_try_to_free_buffers);
-EXPORT_SYMBOL(journal_force_commit);
-
-static int journal_convert_superblock_v1(journal_t *, journal_superblock_t *);
-static void __journal_abort_soft (journal_t *journal, int errno);
-static const char *journal_dev_name(journal_t *journal, char *buffer);
-
-#ifdef CONFIG_JBD_DEBUG
-void __jbd_debug(int level, const char *file, const char *func,
-		 unsigned int line, const char *fmt, ...)
-{
-	struct va_format vaf;
-	va_list args;
-
-	if (level > journal_enable_debug)
-		return;
-	va_start(args, fmt);
-	vaf.fmt = fmt;
-	vaf.va = &args;
-	printk(KERN_DEBUG "%s: (%s, %u): %pV\n", file, func, line, &vaf);
-	va_end(args);
-}
-EXPORT_SYMBOL(__jbd_debug);
-#endif
-
-/*
- * Helper function used to manage commit timeouts
- */
-
-static void commit_timeout(unsigned long __data)
-{
-	struct task_struct * p = (struct task_struct *) __data;
-
-	wake_up_process(p);
-}
-
-/*
- * kjournald: The main thread function used to manage a logging device
- * journal.
- *
- * This kernel thread is responsible for two things:
- *
- * 1) COMMIT:  Every so often we need to commit the current state of the
- *    filesystem to disk.  The journal thread is responsible for writing
- *    all of the metadata buffers to disk.
- *
- * 2) CHECKPOINT: We cannot reuse a used section of the log file until all
- *    of the data in that part of the log has been rewritten elsewhere on
- *    the disk.  Flushing these old buffers to reclaim space in the log is
- *    known as checkpointing, and this thread is responsible for that job.
- */
-
-static int kjournald(void *arg)
-{
-	journal_t *journal = arg;
-	transaction_t *transaction;
-
-	/*
-	 * Set up an interval timer which can be used to trigger a commit wakeup
-	 * after the commit interval expires
-	 */
-	setup_timer(&journal->j_commit_timer, commit_timeout,
-			(unsigned long)current);
-
-	set_freezable();
-
-	/* Record that the journal thread is running */
-	journal->j_task = current;
-	wake_up(&journal->j_wait_done_commit);
-
-	printk(KERN_INFO "kjournald starting.  Commit interval %ld seconds\n",
-			journal->j_commit_interval / HZ);
-
-	/*
-	 * And now, wait forever for commit wakeup events.
-	 */
-	spin_lock(&journal->j_state_lock);
-
-loop:
-	if (journal->j_flags & JFS_UNMOUNT)
-		goto end_loop;
-
-	jbd_debug(1, "commit_sequence=%d, commit_request=%d\n",
-		journal->j_commit_sequence, journal->j_commit_request);
-
-	if (journal->j_commit_sequence != journal->j_commit_request) {
-		jbd_debug(1, "OK, requests differ\n");
-		spin_unlock(&journal->j_state_lock);
-		del_timer_sync(&journal->j_commit_timer);
-		journal_commit_transaction(journal);
-		spin_lock(&journal->j_state_lock);
-		goto loop;
-	}
-
-	wake_up(&journal->j_wait_done_commit);
-	if (freezing(current)) {
-		/*
-		 * The simpler the better. Flushing journal isn't a
-		 * good idea, because that depends on threads that may
-		 * be already stopped.
-		 */
-		jbd_debug(1, "Now suspending kjournald\n");
-		spin_unlock(&journal->j_state_lock);
-		try_to_freeze();
-		spin_lock(&journal->j_state_lock);
-	} else {
-		/*
-		 * We assume on resume that commits are already there,
-		 * so we don't sleep
-		 */
-		DEFINE_WAIT(wait);
-		int should_sleep = 1;
-
-		prepare_to_wait(&journal->j_wait_commit, &wait,
-				TASK_INTERRUPTIBLE);
-		if (journal->j_commit_sequence != journal->j_commit_request)
-			should_sleep = 0;
-		transaction = journal->j_running_transaction;
-		if (transaction && time_after_eq(jiffies,
-						transaction->t_expires))
-			should_sleep = 0;
-		if (journal->j_flags & JFS_UNMOUNT)
-			should_sleep = 0;
-		if (should_sleep) {
-			spin_unlock(&journal->j_state_lock);
-			schedule();
-			spin_lock(&journal->j_state_lock);
-		}
-		finish_wait(&journal->j_wait_commit, &wait);
-	}
-
-	jbd_debug(1, "kjournald wakes\n");
-
-	/*
-	 * Were we woken up by a commit wakeup event?
-	 */
-	transaction = journal->j_running_transaction;
-	if (transaction && time_after_eq(jiffies, transaction->t_expires)) {
-		journal->j_commit_request = transaction->t_tid;
-		jbd_debug(1, "woke because of timeout\n");
-	}
-	goto loop;
-
-end_loop:
-	spin_unlock(&journal->j_state_lock);
-	del_timer_sync(&journal->j_commit_timer);
-	journal->j_task = NULL;
-	wake_up(&journal->j_wait_done_commit);
-	jbd_debug(1, "Journal thread exiting.\n");
-	return 0;
-}
-
-static int journal_start_thread(journal_t *journal)
-{
-	struct task_struct *t;
-
-	t = kthread_run(kjournald, journal, "kjournald");
-	if (IS_ERR(t))
-		return PTR_ERR(t);
-
-	wait_event(journal->j_wait_done_commit, journal->j_task != NULL);
-	return 0;
-}
-
-static void journal_kill_thread(journal_t *journal)
-{
-	spin_lock(&journal->j_state_lock);
-	journal->j_flags |= JFS_UNMOUNT;
-
-	while (journal->j_task) {
-		wake_up(&journal->j_wait_commit);
-		spin_unlock(&journal->j_state_lock);
-		wait_event(journal->j_wait_done_commit,
-				journal->j_task == NULL);
-		spin_lock(&journal->j_state_lock);
-	}
-	spin_unlock(&journal->j_state_lock);
-}
-
-/*
- * journal_write_metadata_buffer: write a metadata buffer to the journal.
- *
- * Writes a metadata buffer to a given disk block.  The actual IO is not
- * performed but a new buffer_head is constructed which labels the data
- * to be written with the correct destination disk block.
- *
- * Any magic-number escaping which needs to be done will cause a
- * copy-out here.  If the buffer happens to start with the
- * JFS_MAGIC_NUMBER, then we can't write it to the log directly: the
- * magic number is only written to the log for descripter blocks.  In
- * this case, we copy the data and replace the first word with 0, and we
- * return a result code which indicates that this buffer needs to be
- * marked as an escaped buffer in the corresponding log descriptor
- * block.  The missing word can then be restored when the block is read
- * during recovery.
- *
- * If the source buffer has already been modified by a new transaction
- * since we took the last commit snapshot, we use the frozen copy of
- * that data for IO.  If we end up using the existing buffer_head's data
- * for the write, then we *have* to lock the buffer to prevent anyone
- * else from using and possibly modifying it while the IO is in
- * progress.
- *
- * The function returns a pointer to the buffer_heads to be used for IO.
- *
- * We assume that the journal has already been locked in this function.
- *
- * Return value:
- *  <0: Error
- * >=0: Finished OK
- *
- * On success:
- * Bit 0 set == escape performed on the data
- * Bit 1 set == buffer copy-out performed (kfree the data after IO)
- */
-
-int journal_write_metadata_buffer(transaction_t *transaction,
-				  struct journal_head  *jh_in,
-				  struct journal_head **jh_out,
-				  unsigned int blocknr)
-{
-	int need_copy_out = 0;
-	int done_copy_out = 0;
-	int do_escape = 0;
-	char *mapped_data;
-	struct buffer_head *new_bh;
-	struct journal_head *new_jh;
-	struct page *new_page;
-	unsigned int new_offset;
-	struct buffer_head *bh_in = jh2bh(jh_in);
-	journal_t *journal = transaction->t_journal;
-
-	/*
-	 * The buffer really shouldn't be locked: only the current committing
-	 * transaction is allowed to write it, so nobody else is allowed
-	 * to do any IO.
-	 *
-	 * akpm: except if we're journalling data, and write() output is
-	 * also part of a shared mapping, and another thread has
-	 * decided to launch a writepage() against this buffer.
-	 */
-	J_ASSERT_BH(bh_in, buffer_jbddirty(bh_in));
-
-	new_bh = alloc_buffer_head(GFP_NOFS|__GFP_NOFAIL);
-	/* keep subsequent assertions sane */
-	atomic_set(&new_bh->b_count, 1);
-	new_jh = journal_add_journal_head(new_bh);	/* This sleeps */
-
-	/*
-	 * If a new transaction has already done a buffer copy-out, then
-	 * we use that version of the data for the commit.
-	 */
-	jbd_lock_bh_state(bh_in);
-repeat:
-	if (jh_in->b_frozen_data) {
-		done_copy_out = 1;
-		new_page = virt_to_page(jh_in->b_frozen_data);
-		new_offset = offset_in_page(jh_in->b_frozen_data);
-	} else {
-		new_page = jh2bh(jh_in)->b_page;
-		new_offset = offset_in_page(jh2bh(jh_in)->b_data);
-	}
-
-	mapped_data = kmap_atomic(new_page);
-	/*
-	 * Check for escaping
-	 */
-	if (*((__be32 *)(mapped_data + new_offset)) ==
-				cpu_to_be32(JFS_MAGIC_NUMBER)) {
-		need_copy_out = 1;
-		do_escape = 1;
-	}
-	kunmap_atomic(mapped_data);
-
-	/*
-	 * Do we need to do a data copy?
-	 */
-	if (need_copy_out && !done_copy_out) {
-		char *tmp;
-
-		jbd_unlock_bh_state(bh_in);
-		tmp = jbd_alloc(bh_in->b_size, GFP_NOFS);
-		jbd_lock_bh_state(bh_in);
-		if (jh_in->b_frozen_data) {
-			jbd_free(tmp, bh_in->b_size);
-			goto repeat;
-		}
-
-		jh_in->b_frozen_data = tmp;
-		mapped_data = kmap_atomic(new_page);
-		memcpy(tmp, mapped_data + new_offset, jh2bh(jh_in)->b_size);
-		kunmap_atomic(mapped_data);
-
-		new_page = virt_to_page(tmp);
-		new_offset = offset_in_page(tmp);
-		done_copy_out = 1;
-	}
-
-	/*
-	 * Did we need to do an escaping?  Now we've done all the
-	 * copying, we can finally do so.
-	 */
-	if (do_escape) {
-		mapped_data = kmap_atomic(new_page);
-		*((unsigned int *)(mapped_data + new_offset)) = 0;
-		kunmap_atomic(mapped_data);
-	}
-
-	set_bh_page(new_bh, new_page, new_offset);
-	new_jh->b_transaction = NULL;
-	new_bh->b_size = jh2bh(jh_in)->b_size;
-	new_bh->b_bdev = transaction->t_journal->j_dev;
-	new_bh->b_blocknr = blocknr;
-	set_buffer_mapped(new_bh);
-	set_buffer_dirty(new_bh);
-
-	*jh_out = new_jh;
-
-	/*
-	 * The to-be-written buffer needs to get moved to the io queue,
-	 * and the original buffer whose contents we are shadowing or
-	 * copying is moved to the transaction's shadow queue.
-	 */
-	JBUFFER_TRACE(jh_in, "file as BJ_Shadow");
-	spin_lock(&journal->j_list_lock);
-	__journal_file_buffer(jh_in, transaction, BJ_Shadow);
-	spin_unlock(&journal->j_list_lock);
-	jbd_unlock_bh_state(bh_in);
-
-	JBUFFER_TRACE(new_jh, "file as BJ_IO");
-	journal_file_buffer(new_jh, transaction, BJ_IO);
-
-	return do_escape | (done_copy_out << 1);
-}
-
-/*
- * Allocation code for the journal file.  Manage the space left in the
- * journal, so that we can begin checkpointing when appropriate.
- */
-
-/*
- * __log_space_left: Return the number of free blocks left in the journal.
- *
- * Called with the journal already locked.
- *
- * Called under j_state_lock
- */
-
-int __log_space_left(journal_t *journal)
-{
-	int left = journal->j_free;
-
-	assert_spin_locked(&journal->j_state_lock);
-
-	/*
-	 * Be pessimistic here about the number of those free blocks which
-	 * might be required for log descriptor control blocks.
-	 */
-
-#define MIN_LOG_RESERVED_BLOCKS 32 /* Allow for rounding errors */
-
-	left -= MIN_LOG_RESERVED_BLOCKS;
-
-	if (left <= 0)
-		return 0;
-	left -= (left >> 3);
-	return left;
-}
-
-/*
- * Called under j_state_lock.  Returns true if a transaction commit was started.
- */
-int __log_start_commit(journal_t *journal, tid_t target)
-{
-	/*
-	 * The only transaction we can possibly wait upon is the
-	 * currently running transaction (if it exists).  Otherwise,
-	 * the target tid must be an old one.
-	 */
-	if (journal->j_commit_request != target &&
-	    journal->j_running_transaction &&
-	    journal->j_running_transaction->t_tid == target) {
-		/*
-		 * We want a new commit: OK, mark the request and wakeup the
-		 * commit thread.  We do _not_ do the commit ourselves.
-		 */
-
-		journal->j_commit_request = target;
-		jbd_debug(1, "JBD: requesting commit %d/%d\n",
-			  journal->j_commit_request,
-			  journal->j_commit_sequence);
-		wake_up(&journal->j_wait_commit);
-		return 1;
-	} else if (!tid_geq(journal->j_commit_request, target))
-		/* This should never happen, but if it does, preserve
-		   the evidence before kjournald goes into a loop and
-		   increments j_commit_sequence beyond all recognition. */
-		WARN_ONCE(1, "jbd: bad log_start_commit: %u %u %u %u\n",
-		    journal->j_commit_request, journal->j_commit_sequence,
-		    target, journal->j_running_transaction ?
-		    journal->j_running_transaction->t_tid : 0);
-	return 0;
-}
-
-int log_start_commit(journal_t *journal, tid_t tid)
-{
-	int ret;
-
-	spin_lock(&journal->j_state_lock);
-	ret = __log_start_commit(journal, tid);
-	spin_unlock(&journal->j_state_lock);
-	return ret;
-}
-
-/*
- * Force and wait upon a commit if the calling process is not within
- * transaction.  This is used for forcing out undo-protected data which contains
- * bitmaps, when the fs is running out of space.
- *
- * We can only force the running transaction if we don't have an active handle;
- * otherwise, we will deadlock.
- *
- * Returns true if a transaction was started.
- */
-int journal_force_commit_nested(journal_t *journal)
-{
-	transaction_t *transaction = NULL;
-	tid_t tid;
-
-	spin_lock(&journal->j_state_lock);
-	if (journal->j_running_transaction && !current->journal_info) {
-		transaction = journal->j_running_transaction;
-		__log_start_commit(journal, transaction->t_tid);
-	} else if (journal->j_committing_transaction)
-		transaction = journal->j_committing_transaction;
-
-	if (!transaction) {
-		spin_unlock(&journal->j_state_lock);
-		return 0;	/* Nothing to retry */
-	}
-
-	tid = transaction->t_tid;
-	spin_unlock(&journal->j_state_lock);
-	log_wait_commit(journal, tid);
-	return 1;
-}
-
-/*
- * Start a commit of the current running transaction (if any).  Returns true
- * if a transaction is going to be committed (or is currently already
- * committing), and fills its tid in at *ptid
- */
-int journal_start_commit(journal_t *journal, tid_t *ptid)
-{
-	int ret = 0;
-
-	spin_lock(&journal->j_state_lock);
-	if (journal->j_running_transaction) {
-		tid_t tid = journal->j_running_transaction->t_tid;
-
-		__log_start_commit(journal, tid);
-		/* There's a running transaction and we've just made sure
-		 * it's commit has been scheduled. */
-		if (ptid)
-			*ptid = tid;
-		ret = 1;
-	} else if (journal->j_committing_transaction) {
-		/*
-		 * If commit has been started, then we have to wait for
-		 * completion of that transaction.
-		 */
-		if (ptid)
-			*ptid = journal->j_committing_transaction->t_tid;
-		ret = 1;
-	}
-	spin_unlock(&journal->j_state_lock);
-	return ret;
-}
-
-/*
- * Wait for a specified commit to complete.
- * The caller may not hold the journal lock.
- */
-int log_wait_commit(journal_t *journal, tid_t tid)
-{
-	int err = 0;
-
-#ifdef CONFIG_JBD_DEBUG
-	spin_lock(&journal->j_state_lock);
-	if (!tid_geq(journal->j_commit_request, tid)) {
-		printk(KERN_ERR
-		       "%s: error: j_commit_request=%d, tid=%d\n",
-		       __func__, journal->j_commit_request, tid);
-	}
-	spin_unlock(&journal->j_state_lock);
-#endif
-	spin_lock(&journal->j_state_lock);
-	/*
-	 * Not running or committing trans? Must be already committed. This
-	 * saves us from waiting for a *long* time when tid overflows.
-	 */
-	if (!((journal->j_running_transaction &&
-	       journal->j_running_transaction->t_tid == tid) ||
-	      (journal->j_committing_transaction &&
-	       journal->j_committing_transaction->t_tid == tid)))
-		goto out_unlock;
-
-	if (!tid_geq(journal->j_commit_waited, tid))
-		journal->j_commit_waited = tid;
-	while (tid_gt(tid, journal->j_commit_sequence)) {
-		jbd_debug(1, "JBD: want %d, j_commit_sequence=%d\n",
-				  tid, journal->j_commit_sequence);
-		wake_up(&journal->j_wait_commit);
-		spin_unlock(&journal->j_state_lock);
-		wait_event(journal->j_wait_done_commit,
-				!tid_gt(tid, journal->j_commit_sequence));
-		spin_lock(&journal->j_state_lock);
-	}
-out_unlock:
-	spin_unlock(&journal->j_state_lock);
-
-	if (unlikely(is_journal_aborted(journal)))
-		err = -EIO;
-	return err;
-}
-
-/*
- * Return 1 if a given transaction has not yet sent barrier request
- * connected with a transaction commit. If 0 is returned, transaction
- * may or may not have sent the barrier. Used to avoid sending barrier
- * twice in common cases.
- */
-int journal_trans_will_send_data_barrier(journal_t *journal, tid_t tid)
-{
-	int ret = 0;
-	transaction_t *commit_trans;
-
-	if (!(journal->j_flags & JFS_BARRIER))
-		return 0;
-	spin_lock(&journal->j_state_lock);
-	/* Transaction already committed? */
-	if (tid_geq(journal->j_commit_sequence, tid))
-		goto out;
-	/*
-	 * Transaction is being committed and we already proceeded to
-	 * writing commit record?
-	 */
-	commit_trans = journal->j_committing_transaction;
-	if (commit_trans && commit_trans->t_tid == tid &&
-	    commit_trans->t_state >= T_COMMIT_RECORD)
-		goto out;
-	ret = 1;
-out:
-	spin_unlock(&journal->j_state_lock);
-	return ret;
-}
-EXPORT_SYMBOL(journal_trans_will_send_data_barrier);
-
-/*
- * Log buffer allocation routines:
- */
-
-int journal_next_log_block(journal_t *journal, unsigned int *retp)
-{
-	unsigned int blocknr;
-
-	spin_lock(&journal->j_state_lock);
-	J_ASSERT(journal->j_free > 1);
-
-	blocknr = journal->j_head;
-	journal->j_head++;
-	journal->j_free--;
-	if (journal->j_head == journal->j_last)
-		journal->j_head = journal->j_first;
-	spin_unlock(&journal->j_state_lock);
-	return journal_bmap(journal, blocknr, retp);
-}
-
-/*
- * Conversion of logical to physical block numbers for the journal
- *
- * On external journals the journal blocks are identity-mapped, so
- * this is a no-op.  If needed, we can use j_blk_offset - everything is
- * ready.
- */
-int journal_bmap(journal_t *journal, unsigned int blocknr,
-		 unsigned int *retp)
-{
-	int err = 0;
-	unsigned int ret;
-
-	if (journal->j_inode) {
-		ret = bmap(journal->j_inode, blocknr);
-		if (ret)
-			*retp = ret;
-		else {
-			char b[BDEVNAME_SIZE];
-
-			printk(KERN_ALERT "%s: journal block not found "
-					"at offset %u on %s\n",
-				__func__,
-				blocknr,
-				bdevname(journal->j_dev, b));
-			err = -EIO;
-			__journal_abort_soft(journal, err);
-		}
-	} else {
-		*retp = blocknr; /* +journal->j_blk_offset */
-	}
-	return err;
-}
-
-/*
- * We play buffer_head aliasing tricks to write data/metadata blocks to
- * the journal without copying their contents, but for journal
- * descriptor blocks we do need to generate bona fide buffers.
- *
- * After the caller of journal_get_descriptor_buffer() has finished modifying
- * the buffer's contents they really should run flush_dcache_page(bh->b_page).
- * But we don't bother doing that, so there will be coherency problems with
- * mmaps of blockdevs which hold live JBD-controlled filesystems.
- */
-struct journal_head *journal_get_descriptor_buffer(journal_t *journal)
-{
-	struct buffer_head *bh;
-	unsigned int blocknr;
-	int err;
-
-	err = journal_next_log_block(journal, &blocknr);
-
-	if (err)
-		return NULL;
-
-	bh = __getblk(journal->j_dev, blocknr, journal->j_blocksize);
-	if (!bh)
-		return NULL;
-	lock_buffer(bh);
-	memset(bh->b_data, 0, journal->j_blocksize);
-	set_buffer_uptodate(bh);
-	unlock_buffer(bh);
-	BUFFER_TRACE(bh, "return this buffer");
-	return journal_add_journal_head(bh);
-}
-
-/*
- * Management for journal control blocks: functions to create and
- * destroy journal_t structures, and to initialise and read existing
- * journal blocks from disk.  */
-
-/* First: create and setup a journal_t object in memory.  We initialise
- * very few fields yet: that has to wait until we have created the
- * journal structures from from scratch, or loaded them from disk. */
-
-static journal_t * journal_init_common (void)
-{
-	journal_t *journal;
-	int err;
-
-	journal = kzalloc(sizeof(*journal), GFP_KERNEL);
-	if (!journal)
-		goto fail;
-
-	init_waitqueue_head(&journal->j_wait_transaction_locked);
-	init_waitqueue_head(&journal->j_wait_logspace);
-	init_waitqueue_head(&journal->j_wait_done_commit);
-	init_waitqueue_head(&journal->j_wait_checkpoint);
-	init_waitqueue_head(&journal->j_wait_commit);
-	init_waitqueue_head(&journal->j_wait_updates);
-	mutex_init(&journal->j_checkpoint_mutex);
-	spin_lock_init(&journal->j_revoke_lock);
-	spin_lock_init(&journal->j_list_lock);
-	spin_lock_init(&journal->j_state_lock);
-
-	journal->j_commit_interval = (HZ * JBD_DEFAULT_MAX_COMMIT_AGE);
-
-	/* The journal is marked for error until we succeed with recovery! */
-	journal->j_flags = JFS_ABORT;
-
-	/* Set up a default-sized revoke table for the new mount. */
-	err = journal_init_revoke(journal, JOURNAL_REVOKE_DEFAULT_HASH);
-	if (err) {
-		kfree(journal);
-		goto fail;
-	}
-	return journal;
-fail:
-	return NULL;
-}
-
-/* journal_init_dev and journal_init_inode:
- *
- * Create a journal structure assigned some fixed set of disk blocks to
- * the journal.  We don't actually touch those disk blocks yet, but we
- * need to set up all of the mapping information to tell the journaling
- * system where the journal blocks are.
- *
- */
-
-/**
- *  journal_t * journal_init_dev() - creates and initialises a journal structure
- *  @bdev: Block device on which to create the journal
- *  @fs_dev: Device which hold journalled filesystem for this journal.
- *  @start: Block nr Start of journal.
- *  @len:  Length of the journal in blocks.
- *  @blocksize: blocksize of journalling device
- *
- *  Returns: a newly created journal_t *
- *
- *  journal_init_dev creates a journal which maps a fixed contiguous
- *  range of blocks on an arbitrary block device.
- *
- */
-journal_t * journal_init_dev(struct block_device *bdev,
-			struct block_device *fs_dev,
-			int start, int len, int blocksize)
-{
-	journal_t *journal = journal_init_common();
-	struct buffer_head *bh;
-	int n;
-
-	if (!journal)
-		return NULL;
-
-	/* journal descriptor can store up to n blocks -bzzz */
-	journal->j_blocksize = blocksize;
-	n = journal->j_blocksize / sizeof(journal_block_tag_t);
-	journal->j_wbufsize = n;
-	journal->j_wbuf = kmalloc(n * sizeof(struct buffer_head*), GFP_KERNEL);
-	if (!journal->j_wbuf) {
-		printk(KERN_ERR "%s: Can't allocate bhs for commit thread\n",
-			__func__);
-		goto out_err;
-	}
-	journal->j_dev = bdev;
-	journal->j_fs_dev = fs_dev;
-	journal->j_blk_offset = start;
-	journal->j_maxlen = len;
-
-	bh = __getblk(journal->j_dev, start, journal->j_blocksize);
-	if (!bh) {
-		printk(KERN_ERR
-		       "%s: Cannot get buffer for journal superblock\n",
-		       __func__);
-		goto out_err;
-	}
-	journal->j_sb_buffer = bh;
-	journal->j_superblock = (journal_superblock_t *)bh->b_data;
-
-	return journal;
-out_err:
-	kfree(journal->j_wbuf);
-	kfree(journal);
-	return NULL;
-}
-
-/**
- *  journal_t * journal_init_inode () - creates a journal which maps to a inode.
- *  @inode: An inode to create the journal in
- *
- * journal_init_inode creates a journal which maps an on-disk inode as
- * the journal.  The inode must exist already, must support bmap() and
- * must have all data blocks preallocated.
- */
-journal_t * journal_init_inode (struct inode *inode)
-{
-	struct buffer_head *bh;
-	journal_t *journal = journal_init_common();
-	int err;
-	int n;
-	unsigned int blocknr;
-
-	if (!journal)
-		return NULL;
-
-	journal->j_dev = journal->j_fs_dev = inode->i_sb->s_bdev;
-	journal->j_inode = inode;
-	jbd_debug(1,
-		  "journal %p: inode %s/%ld, size %Ld, bits %d, blksize %ld\n",
-		  journal, inode->i_sb->s_id, inode->i_ino,
-		  (long long) inode->i_size,
-		  inode->i_sb->s_blocksize_bits, inode->i_sb->s_blocksize);
-
-	journal->j_maxlen = inode->i_size >> inode->i_sb->s_blocksize_bits;
-	journal->j_blocksize = inode->i_sb->s_blocksize;
-
-	/* journal descriptor can store up to n blocks -bzzz */
-	n = journal->j_blocksize / sizeof(journal_block_tag_t);
-	journal->j_wbufsize = n;
-	journal->j_wbuf = kmalloc(n * sizeof(struct buffer_head*), GFP_KERNEL);
-	if (!journal->j_wbuf) {
-		printk(KERN_ERR "%s: Can't allocate bhs for commit thread\n",
-			__func__);
-		goto out_err;
-	}
-
-	err = journal_bmap(journal, 0, &blocknr);
-	/* If that failed, give up */
-	if (err) {
-		printk(KERN_ERR "%s: Cannot locate journal superblock\n",
-		       __func__);
-		goto out_err;
-	}
-
-	bh = getblk_unmovable(journal->j_dev, blocknr, journal->j_blocksize);
-	if (!bh) {
-		printk(KERN_ERR
-		       "%s: Cannot get buffer for journal superblock\n",
-		       __func__);
-		goto out_err;
-	}
-	journal->j_sb_buffer = bh;
-	journal->j_superblock = (journal_superblock_t *)bh->b_data;
-
-	return journal;
-out_err:
-	kfree(journal->j_wbuf);
-	kfree(journal);
-	return NULL;
-}
-
-/*
- * If the journal init or create aborts, we need to mark the journal
- * superblock as being NULL to prevent the journal destroy from writing
- * back a bogus superblock.
- */
-static void journal_fail_superblock (journal_t *journal)
-{
-	struct buffer_head *bh = journal->j_sb_buffer;
-	brelse(bh);
-	journal->j_sb_buffer = NULL;
-}
-
-/*
- * Given a journal_t structure, initialise the various fields for
- * startup of a new journaling session.  We use this both when creating
- * a journal, and after recovering an old journal to reset it for
- * subsequent use.
- */
-
-static int journal_reset(journal_t *journal)
-{
-	journal_superblock_t *sb = journal->j_superblock;
-	unsigned int first, last;
-
-	first = be32_to_cpu(sb->s_first);
-	last = be32_to_cpu(sb->s_maxlen);
-	if (first + JFS_MIN_JOURNAL_BLOCKS > last + 1) {
-		printk(KERN_ERR "JBD: Journal too short (blocks %u-%u).\n",
-		       first, last);
-		journal_fail_superblock(journal);
-		return -EINVAL;
-	}
-
-	journal->j_first = first;
-	journal->j_last = last;
-
-	journal->j_head = first;
-	journal->j_tail = first;
-	journal->j_free = last - first;
-
-	journal->j_tail_sequence = journal->j_transaction_sequence;
-	journal->j_commit_sequence = journal->j_transaction_sequence - 1;
-	journal->j_commit_request = journal->j_commit_sequence;
-
-	journal->j_max_transaction_buffers = journal->j_maxlen / 4;
-
-	/*
-	 * As a special case, if the on-disk copy is already marked as needing
-	 * no recovery (s_start == 0), then we can safely defer the superblock
-	 * update until the next commit by setting JFS_FLUSHED.  This avoids
-	 * attempting a write to a potential-readonly device.
-	 */
-	if (sb->s_start == 0) {
-		jbd_debug(1,"JBD: Skipping superblock update on recovered sb "
-			"(start %u, seq %d, errno %d)\n",
-			journal->j_tail, journal->j_tail_sequence,
-			journal->j_errno);
-		journal->j_flags |= JFS_FLUSHED;
-	} else {
-		/* Lock here to make assertions happy... */
-		mutex_lock(&journal->j_checkpoint_mutex);
-		/*
-		 * Update log tail information. We use WRITE_FUA since new
-		 * transaction will start reusing journal space and so we
-		 * must make sure information about current log tail is on
-		 * disk before that.
-		 */
-		journal_update_sb_log_tail(journal,
-					   journal->j_tail_sequence,
-					   journal->j_tail,
-					   WRITE_FUA);
-		mutex_unlock(&journal->j_checkpoint_mutex);
-	}
-	return journal_start_thread(journal);
-}
-
-/**
- * int journal_create() - Initialise the new journal file
- * @journal: Journal to create. This structure must have been initialised
- *
- * Given a journal_t structure which tells us which disk blocks we can
- * use, create a new journal superblock and initialise all of the
- * journal fields from scratch.
- **/
-int journal_create(journal_t *journal)
-{
-	unsigned int blocknr;
-	struct buffer_head *bh;
-	journal_superblock_t *sb;
-	int i, err;
-
-	if (journal->j_maxlen < JFS_MIN_JOURNAL_BLOCKS) {
-		printk (KERN_ERR "Journal length (%d blocks) too short.\n",
-			journal->j_maxlen);
-		journal_fail_superblock(journal);
-		return -EINVAL;
-	}
-
-	if (journal->j_inode == NULL) {
-		/*
-		 * We don't know what block to start at!
-		 */
-		printk(KERN_EMERG
-		       "%s: creation of journal on external device!\n",
-		       __func__);
-		BUG();
-	}
-
-	/* Zero out the entire journal on disk.  We cannot afford to
-	   have any blocks on disk beginning with JFS_MAGIC_NUMBER. */
-	jbd_debug(1, "JBD: Zeroing out journal blocks...\n");
-	for (i = 0; i < journal->j_maxlen; i++) {
-		err = journal_bmap(journal, i, &blocknr);
-		if (err)
-			return err;
-		bh = __getblk(journal->j_dev, blocknr, journal->j_blocksize);
-		if (unlikely(!bh))
-			return -ENOMEM;
-		lock_buffer(bh);
-		memset (bh->b_data, 0, journal->j_blocksize);
-		BUFFER_TRACE(bh, "marking dirty");
-		mark_buffer_dirty(bh);
-		BUFFER_TRACE(bh, "marking uptodate");
-		set_buffer_uptodate(bh);
-		unlock_buffer(bh);
-		__brelse(bh);
-	}
-
-	sync_blockdev(journal->j_dev);
-	jbd_debug(1, "JBD: journal cleared.\n");
-
-	/* OK, fill in the initial static fields in the new superblock */
-	sb = journal->j_superblock;
-
-	sb->s_header.h_magic	 = cpu_to_be32(JFS_MAGIC_NUMBER);
-	sb->s_header.h_blocktype = cpu_to_be32(JFS_SUPERBLOCK_V2);
-
-	sb->s_blocksize	= cpu_to_be32(journal->j_blocksize);
-	sb->s_maxlen	= cpu_to_be32(journal->j_maxlen);
-	sb->s_first	= cpu_to_be32(1);
-
-	journal->j_transaction_sequence = 1;
-
-	journal->j_flags &= ~JFS_ABORT;
-	journal->j_format_version = 2;
-
-	return journal_reset(journal);
-}
-
-static void journal_write_superblock(journal_t *journal, int write_op)
-{
-	struct buffer_head *bh = journal->j_sb_buffer;
-	int ret;
-
-	trace_journal_write_superblock(journal, write_op);
-	if (!(journal->j_flags & JFS_BARRIER))
-		write_op &= ~(REQ_FUA | REQ_FLUSH);
-	lock_buffer(bh);
-	if (buffer_write_io_error(bh)) {
-		char b[BDEVNAME_SIZE];
-		/*
-		 * Oh, dear.  A previous attempt to write the journal
-		 * superblock failed.  This could happen because the
-		 * USB device was yanked out.  Or it could happen to
-		 * be a transient write error and maybe the block will
-		 * be remapped.  Nothing we can do but to retry the
-		 * write and hope for the best.
-		 */
-		printk(KERN_ERR "JBD: previous I/O error detected "
-		       "for journal superblock update for %s.\n",
-		       journal_dev_name(journal, b));
-		clear_buffer_write_io_error(bh);
-		set_buffer_uptodate(bh);
-	}
-
-	get_bh(bh);
-	bh->b_end_io = end_buffer_write_sync;
-	ret = submit_bh(write_op, bh);
-	wait_on_buffer(bh);
-	if (buffer_write_io_error(bh)) {
-		clear_buffer_write_io_error(bh);
-		set_buffer_uptodate(bh);
-		ret = -EIO;
-	}
-	if (ret) {
-		char b[BDEVNAME_SIZE];
-		printk(KERN_ERR "JBD: Error %d detected "
-		       "when updating journal superblock for %s.\n",
-		       ret, journal_dev_name(journal, b));
-	}
-}
-
-/**
- * journal_update_sb_log_tail() - Update log tail in journal sb on disk.
- * @journal: The journal to update.
- * @tail_tid: TID of the new transaction at the tail of the log
- * @tail_block: The first block of the transaction at the tail of the log
- * @write_op: With which operation should we write the journal sb
- *
- * Update a journal's superblock information about log tail and write it to
- * disk, waiting for the IO to complete.
- */
-void journal_update_sb_log_tail(journal_t *journal, tid_t tail_tid,
-				unsigned int tail_block, int write_op)
-{
-	journal_superblock_t *sb = journal->j_superblock;
-
-	BUG_ON(!mutex_is_locked(&journal->j_checkpoint_mutex));
-	jbd_debug(1,"JBD: updating superblock (start %u, seq %u)\n",
-		  tail_block, tail_tid);
-
-	sb->s_sequence = cpu_to_be32(tail_tid);
-	sb->s_start    = cpu_to_be32(tail_block);
-
-	journal_write_superblock(journal, write_op);
-
-	/* Log is no longer empty */
-	spin_lock(&journal->j_state_lock);
-	WARN_ON(!sb->s_sequence);
-	journal->j_flags &= ~JFS_FLUSHED;
-	spin_unlock(&journal->j_state_lock);
-}
-
-/**
- * mark_journal_empty() - Mark on disk journal as empty.
- * @journal: The journal to update.
- *
- * Update a journal's dynamic superblock fields to show that journal is empty.
- * Write updated superblock to disk waiting for IO to complete.
- */
-static void mark_journal_empty(journal_t *journal)
-{
-	journal_superblock_t *sb = journal->j_superblock;
-
-	BUG_ON(!mutex_is_locked(&journal->j_checkpoint_mutex));
-	spin_lock(&journal->j_state_lock);
-	/* Is it already empty? */
-	if (sb->s_start == 0) {
-		spin_unlock(&journal->j_state_lock);
-		return;
-	}
-	jbd_debug(1, "JBD: Marking journal as empty (seq %d)\n",
-        	  journal->j_tail_sequence);
-
-	sb->s_sequence = cpu_to_be32(journal->j_tail_sequence);
-	sb->s_start    = cpu_to_be32(0);
-	spin_unlock(&journal->j_state_lock);
-
-	journal_write_superblock(journal, WRITE_FUA);
-
-	spin_lock(&journal->j_state_lock);
-	/* Log is empty */
-	journal->j_flags |= JFS_FLUSHED;
-	spin_unlock(&journal->j_state_lock);
-}
-
-/**
- * journal_update_sb_errno() - Update error in the journal.
- * @journal: The journal to update.
- *
- * Update a journal's errno.  Write updated superblock to disk waiting for IO
- * to complete.
- */
-static void journal_update_sb_errno(journal_t *journal)
-{
-	journal_superblock_t *sb = journal->j_superblock;
-
-	spin_lock(&journal->j_state_lock);
-	jbd_debug(1, "JBD: updating superblock error (errno %d)\n",
-        	  journal->j_errno);
-	sb->s_errno = cpu_to_be32(journal->j_errno);
-	spin_unlock(&journal->j_state_lock);
-
-	journal_write_superblock(journal, WRITE_SYNC);
-}
-
-/*
- * Read the superblock for a given journal, performing initial
- * validation of the format.
- */
-
-static int journal_get_superblock(journal_t *journal)
-{
-	struct buffer_head *bh;
-	journal_superblock_t *sb;
-	int err = -EIO;
-
-	bh = journal->j_sb_buffer;
-
-	J_ASSERT(bh != NULL);
-	if (!buffer_uptodate(bh)) {
-		ll_rw_block(READ, 1, &bh);
-		wait_on_buffer(bh);
-		if (!buffer_uptodate(bh)) {
-			printk (KERN_ERR
-				"JBD: IO error reading journal superblock\n");
-			goto out;
-		}
-	}
-
-	sb = journal->j_superblock;
-
-	err = -EINVAL;
-
-	if (sb->s_header.h_magic != cpu_to_be32(JFS_MAGIC_NUMBER) ||
-	    sb->s_blocksize != cpu_to_be32(journal->j_blocksize)) {
-		printk(KERN_WARNING "JBD: no valid journal superblock found\n");
-		goto out;
-	}
-
-	switch(be32_to_cpu(sb->s_header.h_blocktype)) {
-	case JFS_SUPERBLOCK_V1:
-		journal->j_format_version = 1;
-		break;
-	case JFS_SUPERBLOCK_V2:
-		journal->j_format_version = 2;
-		break;
-	default:
-		printk(KERN_WARNING "JBD: unrecognised superblock format ID\n");
-		goto out;
-	}
-
-	if (be32_to_cpu(sb->s_maxlen) < journal->j_maxlen)
-		journal->j_maxlen = be32_to_cpu(sb->s_maxlen);
-	else if (be32_to_cpu(sb->s_maxlen) > journal->j_maxlen) {
-		printk (KERN_WARNING "JBD: journal file too short\n");
-		goto out;
-	}
-
-	if (be32_to_cpu(sb->s_first) == 0 ||
-	    be32_to_cpu(sb->s_first) >= journal->j_maxlen) {
-		printk(KERN_WARNING
-			"JBD: Invalid start block of journal: %u\n",
-			be32_to_cpu(sb->s_first));
-		goto out;
-	}
-
-	return 0;
-
-out:
-	journal_fail_superblock(journal);
-	return err;
-}
-
-/*
- * Load the on-disk journal superblock and read the key fields into the
- * journal_t.
- */
-
-static int load_superblock(journal_t *journal)
-{
-	int err;
-	journal_superblock_t *sb;
-
-	err = journal_get_superblock(journal);
-	if (err)
-		return err;
-
-	sb = journal->j_superblock;
-
-	journal->j_tail_sequence = be32_to_cpu(sb->s_sequence);
-	journal->j_tail = be32_to_cpu(sb->s_start);
-	journal->j_first = be32_to_cpu(sb->s_first);
-	journal->j_last = be32_to_cpu(sb->s_maxlen);
-	journal->j_errno = be32_to_cpu(sb->s_errno);
-
-	return 0;
-}
-
-
-/**
- * int journal_load() - Read journal from disk.
- * @journal: Journal to act on.
- *
- * Given a journal_t structure which tells us which disk blocks contain
- * a journal, read the journal from disk to initialise the in-memory
- * structures.
- */
-int journal_load(journal_t *journal)
-{
-	int err;
-	journal_superblock_t *sb;
-
-	err = load_superblock(journal);
-	if (err)
-		return err;
-
-	sb = journal->j_superblock;
-	/* If this is a V2 superblock, then we have to check the
-	 * features flags on it. */
-
-	if (journal->j_format_version >= 2) {
-		if ((sb->s_feature_ro_compat &
-		     ~cpu_to_be32(JFS_KNOWN_ROCOMPAT_FEATURES)) ||
-		    (sb->s_feature_incompat &
-		     ~cpu_to_be32(JFS_KNOWN_INCOMPAT_FEATURES))) {
-			printk (KERN_WARNING
-				"JBD: Unrecognised features on journal\n");
-			return -EINVAL;
-		}
-	}
-
-	/* Let the recovery code check whether it needs to recover any
-	 * data from the journal. */
-	if (journal_recover(journal))
-		goto recovery_error;
-
-	/* OK, we've finished with the dynamic journal bits:
-	 * reinitialise the dynamic contents of the superblock in memory
-	 * and reset them on disk. */
-	if (journal_reset(journal))
-		goto recovery_error;
-
-	journal->j_flags &= ~JFS_ABORT;
-	journal->j_flags |= JFS_LOADED;
-	return 0;
-
-recovery_error:
-	printk (KERN_WARNING "JBD: recovery failed\n");
-	return -EIO;
-}
-
-/**
- * void journal_destroy() - Release a journal_t structure.
- * @journal: Journal to act on.
- *
- * Release a journal_t structure once it is no longer in use by the
- * journaled object.
- * Return <0 if we couldn't clean up the journal.
- */
-int journal_destroy(journal_t *journal)
-{
-	int err = 0;
-
-	
-	/* Wait for the commit thread to wake up and die. */
-	journal_kill_thread(journal);
-
-	/* Force a final log commit */
-	if (journal->j_running_transaction)
-		journal_commit_transaction(journal);
-
-	/* Force any old transactions to disk */
-
-	/* We cannot race with anybody but must keep assertions happy */
-	mutex_lock(&journal->j_checkpoint_mutex);
-	/* Totally anal locking here... */
-	spin_lock(&journal->j_list_lock);
-	while (journal->j_checkpoint_transactions != NULL) {
-		spin_unlock(&journal->j_list_lock);
-		log_do_checkpoint(journal);
-		spin_lock(&journal->j_list_lock);
-	}
-
-	J_ASSERT(journal->j_running_transaction == NULL);
-	J_ASSERT(journal->j_committing_transaction == NULL);
-	J_ASSERT(journal->j_checkpoint_transactions == NULL);
-	spin_unlock(&journal->j_list_lock);
-
-	if (journal->j_sb_buffer) {
-		if (!is_journal_aborted(journal)) {
-			journal->j_tail_sequence =
-				++journal->j_transaction_sequence;
-			mark_journal_empty(journal);
-		} else
-			err = -EIO;
-		brelse(journal->j_sb_buffer);
-	}
-	mutex_unlock(&journal->j_checkpoint_mutex);
-
-	iput(journal->j_inode);
-	if (journal->j_revoke)
-		journal_destroy_revoke(journal);
-	kfree(journal->j_wbuf);
-	kfree(journal);
-
-	return err;
-}
-
-
-/**
- *int journal_check_used_features () - Check if features specified are used.
- * @journal: Journal to check.
- * @compat: bitmask of compatible features
- * @ro: bitmask of features that force read-only mount
- * @incompat: bitmask of incompatible features
- *
- * Check whether the journal uses all of a given set of
- * features.  Return true (non-zero) if it does.
- **/
-
-int journal_check_used_features (journal_t *journal, unsigned long compat,
-				 unsigned long ro, unsigned long incompat)
-{
-	journal_superblock_t *sb;
-
-	if (!compat && !ro && !incompat)
-		return 1;
-	if (journal->j_format_version == 1)
-		return 0;
-
-	sb = journal->j_superblock;
-
-	if (((be32_to_cpu(sb->s_feature_compat) & compat) == compat) &&
-	    ((be32_to_cpu(sb->s_feature_ro_compat) & ro) == ro) &&
-	    ((be32_to_cpu(sb->s_feature_incompat) & incompat) == incompat))
-		return 1;
-
-	return 0;
-}
-
-/**
- * int journal_check_available_features() - Check feature set in journalling layer
- * @journal: Journal to check.
- * @compat: bitmask of compatible features
- * @ro: bitmask of features that force read-only mount
- * @incompat: bitmask of incompatible features
- *
- * Check whether the journaling code supports the use of
- * all of a given set of features on this journal.  Return true
- * (non-zero) if it can. */
-
-int journal_check_available_features (journal_t *journal, unsigned long compat,
-				      unsigned long ro, unsigned long incompat)
-{
-	if (!compat && !ro && !incompat)
-		return 1;
-
-	/* We can support any known requested features iff the
-	 * superblock is in version 2.  Otherwise we fail to support any
-	 * extended sb features. */
-
-	if (journal->j_format_version != 2)
-		return 0;
-
-	if ((compat   & JFS_KNOWN_COMPAT_FEATURES) == compat &&
-	    (ro       & JFS_KNOWN_ROCOMPAT_FEATURES) == ro &&
-	    (incompat & JFS_KNOWN_INCOMPAT_FEATURES) == incompat)
-		return 1;
-
-	return 0;
-}
-
-/**
- * int journal_set_features () - Mark a given journal feature in the superblock
- * @journal: Journal to act on.
- * @compat: bitmask of compatible features
- * @ro: bitmask of features that force read-only mount
- * @incompat: bitmask of incompatible features
- *
- * Mark a given journal feature as present on the
- * superblock.  Returns true if the requested features could be set.
- *
- */
-
-int journal_set_features (journal_t *journal, unsigned long compat,
-			  unsigned long ro, unsigned long incompat)
-{
-	journal_superblock_t *sb;
-
-	if (journal_check_used_features(journal, compat, ro, incompat))
-		return 1;
-
-	if (!journal_check_available_features(journal, compat, ro, incompat))
-		return 0;
-
-	jbd_debug(1, "Setting new features 0x%lx/0x%lx/0x%lx\n",
-		  compat, ro, incompat);
-
-	sb = journal->j_superblock;
-
-	sb->s_feature_compat    |= cpu_to_be32(compat);
-	sb->s_feature_ro_compat |= cpu_to_be32(ro);
-	sb->s_feature_incompat  |= cpu_to_be32(incompat);
-
-	return 1;
-}
-
-
-/**
- * int journal_update_format () - Update on-disk journal structure.
- * @journal: Journal to act on.
- *
- * Given an initialised but unloaded journal struct, poke about in the
- * on-disk structure to update it to the most recent supported version.
- */
-int journal_update_format (journal_t *journal)
-{
-	journal_superblock_t *sb;
-	int err;
-
-	err = journal_get_superblock(journal);
-	if (err)
-		return err;
-
-	sb = journal->j_superblock;
-
-	switch (be32_to_cpu(sb->s_header.h_blocktype)) {
-	case JFS_SUPERBLOCK_V2:
-		return 0;
-	case JFS_SUPERBLOCK_V1:
-		return journal_convert_superblock_v1(journal, sb);
-	default:
-		break;
-	}
-	return -EINVAL;
-}
-
-static int journal_convert_superblock_v1(journal_t *journal,
-					 journal_superblock_t *sb)
-{
-	int offset, blocksize;
-	struct buffer_head *bh;
-
-	printk(KERN_WARNING
-		"JBD: Converting superblock from version 1 to 2.\n");
-
-	/* Pre-initialise new fields to zero */
-	offset = ((char *) &(sb->s_feature_compat)) - ((char *) sb);
-	blocksize = be32_to_cpu(sb->s_blocksize);
-	memset(&sb->s_feature_compat, 0, blocksize-offset);
-
-	sb->s_nr_users = cpu_to_be32(1);
-	sb->s_header.h_blocktype = cpu_to_be32(JFS_SUPERBLOCK_V2);
-	journal->j_format_version = 2;
-
-	bh = journal->j_sb_buffer;
-	BUFFER_TRACE(bh, "marking dirty");
-	mark_buffer_dirty(bh);
-	sync_dirty_buffer(bh);
-	return 0;
-}
-
-
-/**
- * int journal_flush () - Flush journal
- * @journal: Journal to act on.
- *
- * Flush all data for a given journal to disk and empty the journal.
- * Filesystems can use this when remounting readonly to ensure that
- * recovery does not need to happen on remount.
- */
-
-int journal_flush(journal_t *journal)
-{
-	int err = 0;
-	transaction_t *transaction = NULL;
-
-	spin_lock(&journal->j_state_lock);
-
-	/* Force everything buffered to the log... */
-	if (journal->j_running_transaction) {
-		transaction = journal->j_running_transaction;
-		__log_start_commit(journal, transaction->t_tid);
-	} else if (journal->j_committing_transaction)
-		transaction = journal->j_committing_transaction;
-
-	/* Wait for the log commit to complete... */
-	if (transaction) {
-		tid_t tid = transaction->t_tid;
-
-		spin_unlock(&journal->j_state_lock);
-		log_wait_commit(journal, tid);
-	} else {
-		spin_unlock(&journal->j_state_lock);
-	}
-
-	/* ...and flush everything in the log out to disk. */
-	spin_lock(&journal->j_list_lock);
-	while (!err && journal->j_checkpoint_transactions != NULL) {
-		spin_unlock(&journal->j_list_lock);
-		mutex_lock(&journal->j_checkpoint_mutex);
-		err = log_do_checkpoint(journal);
-		mutex_unlock(&journal->j_checkpoint_mutex);
-		spin_lock(&journal->j_list_lock);
-	}
-	spin_unlock(&journal->j_list_lock);
-
-	if (is_journal_aborted(journal))
-		return -EIO;
-
-	mutex_lock(&journal->j_checkpoint_mutex);
-	cleanup_journal_tail(journal);
-
-	/* Finally, mark the journal as really needing no recovery.
-	 * This sets s_start==0 in the underlying superblock, which is
-	 * the magic code for a fully-recovered superblock.  Any future
-	 * commits of data to the journal will restore the current
-	 * s_start value. */
-	mark_journal_empty(journal);
-	mutex_unlock(&journal->j_checkpoint_mutex);
-	spin_lock(&journal->j_state_lock);
-	J_ASSERT(!journal->j_running_transaction);
-	J_ASSERT(!journal->j_committing_transaction);
-	J_ASSERT(!journal->j_checkpoint_transactions);
-	J_ASSERT(journal->j_head == journal->j_tail);
-	J_ASSERT(journal->j_tail_sequence == journal->j_transaction_sequence);
-	spin_unlock(&journal->j_state_lock);
-	return 0;
-}
-
-/**
- * int journal_wipe() - Wipe journal contents
- * @journal: Journal to act on.
- * @write: flag (see below)
- *
- * Wipe out all of the contents of a journal, safely.  This will produce
- * a warning if the journal contains any valid recovery information.
- * Must be called between journal_init_*() and journal_load().
- *
- * If 'write' is non-zero, then we wipe out the journal on disk; otherwise
- * we merely suppress recovery.
- */
-
-int journal_wipe(journal_t *journal, int write)
-{
-	int err = 0;
-
-	J_ASSERT (!(journal->j_flags & JFS_LOADED));
-
-	err = load_superblock(journal);
-	if (err)
-		return err;
-
-	if (!journal->j_tail)
-		goto no_recovery;
-
-	printk (KERN_WARNING "JBD: %s recovery information on journal\n",
-		write ? "Clearing" : "Ignoring");
-
-	err = journal_skip_recovery(journal);
-	if (write) {
-		/* Lock to make assertions happy... */
-		mutex_lock(&journal->j_checkpoint_mutex);
-		mark_journal_empty(journal);
-		mutex_unlock(&journal->j_checkpoint_mutex);
-	}
-
- no_recovery:
-	return err;
-}
-
-/*
- * journal_dev_name: format a character string to describe on what
- * device this journal is present.
- */
-
-static const char *journal_dev_name(journal_t *journal, char *buffer)
-{
-	struct block_device *bdev;
-
-	if (journal->j_inode)
-		bdev = journal->j_inode->i_sb->s_bdev;
-	else
-		bdev = journal->j_dev;
-
-	return bdevname(bdev, buffer);
-}
-
-/*
- * Journal abort has very specific semantics, which we describe
- * for journal abort.
- *
- * Two internal function, which provide abort to te jbd layer
- * itself are here.
- */
-
-/*
- * Quick version for internal journal use (doesn't lock the journal).
- * Aborts hard --- we mark the abort as occurred, but do _nothing_ else,
- * and don't attempt to make any other journal updates.
- */
-static void __journal_abort_hard(journal_t *journal)
-{
-	transaction_t *transaction;
-	char b[BDEVNAME_SIZE];
-
-	if (journal->j_flags & JFS_ABORT)
-		return;
-
-	printk(KERN_ERR "Aborting journal on device %s.\n",
-		journal_dev_name(journal, b));
-
-	spin_lock(&journal->j_state_lock);
-	journal->j_flags |= JFS_ABORT;
-	transaction = journal->j_running_transaction;
-	if (transaction)
-		__log_start_commit(journal, transaction->t_tid);
-	spin_unlock(&journal->j_state_lock);
-}
-
-/* Soft abort: record the abort error status in the journal superblock,
- * but don't do any other IO. */
-static void __journal_abort_soft (journal_t *journal, int errno)
-{
-	if (journal->j_flags & JFS_ABORT)
-		return;
-
-	if (!journal->j_errno)
-		journal->j_errno = errno;
-
-	__journal_abort_hard(journal);
-
-	if (errno)
-		journal_update_sb_errno(journal);
-}
-
-/**
- * void journal_abort () - Shutdown the journal immediately.
- * @journal: the journal to shutdown.
- * @errno:   an error number to record in the journal indicating
- *           the reason for the shutdown.
- *
- * Perform a complete, immediate shutdown of the ENTIRE
- * journal (not of a single transaction).  This operation cannot be
- * undone without closing and reopening the journal.
- *
- * The journal_abort function is intended to support higher level error
- * recovery mechanisms such as the ext2/ext3 remount-readonly error
- * mode.
- *
- * Journal abort has very specific semantics.  Any existing dirty,
- * unjournaled buffers in the main filesystem will still be written to
- * disk by bdflush, but the journaling mechanism will be suspended
- * immediately and no further transaction commits will be honoured.
- *
- * Any dirty, journaled buffers will be written back to disk without
- * hitting the journal.  Atomicity cannot be guaranteed on an aborted
- * filesystem, but we _do_ attempt to leave as much data as possible
- * behind for fsck to use for cleanup.
- *
- * Any attempt to get a new transaction handle on a journal which is in
- * ABORT state will just result in an -EROFS error return.  A
- * journal_stop on an existing handle will return -EIO if we have
- * entered abort state during the update.
- *
- * Recursive transactions are not disturbed by journal abort until the
- * final journal_stop, which will receive the -EIO error.
- *
- * Finally, the journal_abort call allows the caller to supply an errno
- * which will be recorded (if possible) in the journal superblock.  This
- * allows a client to record failure conditions in the middle of a
- * transaction without having to complete the transaction to record the
- * failure to disk.  ext3_error, for example, now uses this
- * functionality.
- *
- * Errors which originate from within the journaling layer will NOT
- * supply an errno; a null errno implies that absolutely no further
- * writes are done to the journal (unless there are any already in
- * progress).
- *
- */
-
-void journal_abort(journal_t *journal, int errno)
-{
-	__journal_abort_soft(journal, errno);
-}
-
-/**
- * int journal_errno () - returns the journal's error state.
- * @journal: journal to examine.
- *
- * This is the errno numbet set with journal_abort(), the last
- * time the journal was mounted - if the journal was stopped
- * without calling abort this will be 0.
- *
- * If the journal has been aborted on this mount time -EROFS will
- * be returned.
- */
-int journal_errno(journal_t *journal)
-{
-	int err;
-
-	spin_lock(&journal->j_state_lock);
-	if (journal->j_flags & JFS_ABORT)
-		err = -EROFS;
-	else
-		err = journal->j_errno;
-	spin_unlock(&journal->j_state_lock);
-	return err;
-}
-
-/**
- * int journal_clear_err () - clears the journal's error state
- * @journal: journal to act on.
- *
- * An error must be cleared or Acked to take a FS out of readonly
- * mode.
- */
-int journal_clear_err(journal_t *journal)
-{
-	int err = 0;
-
-	spin_lock(&journal->j_state_lock);
-	if (journal->j_flags & JFS_ABORT)
-		err = -EROFS;
-	else
-		journal->j_errno = 0;
-	spin_unlock(&journal->j_state_lock);
-	return err;
-}
-
-/**
- * void journal_ack_err() - Ack journal err.
- * @journal: journal to act on.
- *
- * An error must be cleared or Acked to take a FS out of readonly
- * mode.
- */
-void journal_ack_err(journal_t *journal)
-{
-	spin_lock(&journal->j_state_lock);
-	if (journal->j_errno)
-		journal->j_flags |= JFS_ACK_ERR;
-	spin_unlock(&journal->j_state_lock);
-}
-
-int journal_blocks_per_page(struct inode *inode)
-{
-	return 1 << (PAGE_CACHE_SHIFT - inode->i_sb->s_blocksize_bits);
-}
-
-/*
- * Journal_head storage management
- */
-static struct kmem_cache *journal_head_cache;
-#ifdef CONFIG_JBD_DEBUG
-static atomic_t nr_journal_heads = ATOMIC_INIT(0);
-#endif
-
-static int journal_init_journal_head_cache(void)
-{
-	int retval;
-
-	J_ASSERT(journal_head_cache == NULL);
-	journal_head_cache = kmem_cache_create("journal_head",
-				sizeof(struct journal_head),
-				0,		/* offset */
-				SLAB_TEMPORARY,	/* flags */
-				NULL);		/* ctor */
-	retval = 0;
-	if (!journal_head_cache) {
-		retval = -ENOMEM;
-		printk(KERN_EMERG "JBD: no memory for journal_head cache\n");
-	}
-	return retval;
-}
-
-static void journal_destroy_journal_head_cache(void)
-{
-	if (journal_head_cache) {
-		kmem_cache_destroy(journal_head_cache);
-		journal_head_cache = NULL;
-	}
-}
-
-/*
- * journal_head splicing and dicing
- */
-static struct journal_head *journal_alloc_journal_head(void)
-{
-	struct journal_head *ret;
-
-#ifdef CONFIG_JBD_DEBUG
-	atomic_inc(&nr_journal_heads);
-#endif
-	ret = kmem_cache_zalloc(journal_head_cache, GFP_NOFS);
-	if (ret == NULL) {
-		jbd_debug(1, "out of memory for journal_head\n");
-		printk_ratelimited(KERN_NOTICE "ENOMEM in %s, retrying.\n",
-				   __func__);
-
-		while (ret == NULL) {
-			yield();
-			ret = kmem_cache_zalloc(journal_head_cache, GFP_NOFS);
-		}
-	}
-	return ret;
-}
-
-static void journal_free_journal_head(struct journal_head *jh)
-{
-#ifdef CONFIG_JBD_DEBUG
-	atomic_dec(&nr_journal_heads);
-	memset(jh, JBD_POISON_FREE, sizeof(*jh));
-#endif
-	kmem_cache_free(journal_head_cache, jh);
-}
-
-/*
- * A journal_head is attached to a buffer_head whenever JBD has an
- * interest in the buffer.
- *
- * Whenever a buffer has an attached journal_head, its ->b_state:BH_JBD bit
- * is set.  This bit is tested in core kernel code where we need to take
- * JBD-specific actions.  Testing the zeroness of ->b_private is not reliable
- * there.
- *
- * When a buffer has its BH_JBD bit set, its ->b_count is elevated by one.
- *
- * When a buffer has its BH_JBD bit set it is immune from being released by
- * core kernel code, mainly via ->b_count.
- *
- * A journal_head is detached from its buffer_head when the journal_head's
- * b_jcount reaches zero. Running transaction (b_transaction) and checkpoint
- * transaction (b_cp_transaction) hold their references to b_jcount.
- *
- * Various places in the kernel want to attach a journal_head to a buffer_head
- * _before_ attaching the journal_head to a transaction.  To protect the
- * journal_head in this situation, journal_add_journal_head elevates the
- * journal_head's b_jcount refcount by one.  The caller must call
- * journal_put_journal_head() to undo this.
- *
- * So the typical usage would be:
- *
- *	(Attach a journal_head if needed.  Increments b_jcount)
- *	struct journal_head *jh = journal_add_journal_head(bh);
- *	...
- *      (Get another reference for transaction)
- *      journal_grab_journal_head(bh);
- *      jh->b_transaction = xxx;
- *      (Put original reference)
- *      journal_put_journal_head(jh);
- */
-
-/*
- * Give a buffer_head a journal_head.
- *
- * May sleep.
- */
-struct journal_head *journal_add_journal_head(struct buffer_head *bh)
-{
-	struct journal_head *jh;
-	struct journal_head *new_jh = NULL;
-
-repeat:
-	if (!buffer_jbd(bh))
-		new_jh = journal_alloc_journal_head();
-
-	jbd_lock_bh_journal_head(bh);
-	if (buffer_jbd(bh)) {
-		jh = bh2jh(bh);
-	} else {
-		J_ASSERT_BH(bh,
-			(atomic_read(&bh->b_count) > 0) ||
-			(bh->b_page && bh->b_page->mapping));
-
-		if (!new_jh) {
-			jbd_unlock_bh_journal_head(bh);
-			goto repeat;
-		}
-
-		jh = new_jh;
-		new_jh = NULL;		/* We consumed it */
-		set_buffer_jbd(bh);
-		bh->b_private = jh;
-		jh->b_bh = bh;
-		get_bh(bh);
-		BUFFER_TRACE(bh, "added journal_head");
-	}
-	jh->b_jcount++;
-	jbd_unlock_bh_journal_head(bh);
-	if (new_jh)
-		journal_free_journal_head(new_jh);
-	return bh->b_private;
-}
-
-/*
- * Grab a ref against this buffer_head's journal_head.  If it ended up not
- * having a journal_head, return NULL
- */
-struct journal_head *journal_grab_journal_head(struct buffer_head *bh)
-{
-	struct journal_head *jh = NULL;
-
-	jbd_lock_bh_journal_head(bh);
-	if (buffer_jbd(bh)) {
-		jh = bh2jh(bh);
-		jh->b_jcount++;
-	}
-	jbd_unlock_bh_journal_head(bh);
-	return jh;
-}
-
-static void __journal_remove_journal_head(struct buffer_head *bh)
-{
-	struct journal_head *jh = bh2jh(bh);
-
-	J_ASSERT_JH(jh, jh->b_jcount >= 0);
-	J_ASSERT_JH(jh, jh->b_transaction == NULL);
-	J_ASSERT_JH(jh, jh->b_next_transaction == NULL);
-	J_ASSERT_JH(jh, jh->b_cp_transaction == NULL);
-	J_ASSERT_JH(jh, jh->b_jlist == BJ_None);
-	J_ASSERT_BH(bh, buffer_jbd(bh));
-	J_ASSERT_BH(bh, jh2bh(jh) == bh);
-	BUFFER_TRACE(bh, "remove journal_head");
-	if (jh->b_frozen_data) {
-		printk(KERN_WARNING "%s: freeing b_frozen_data\n", __func__);
-		jbd_free(jh->b_frozen_data, bh->b_size);
-	}
-	if (jh->b_committed_data) {
-		printk(KERN_WARNING "%s: freeing b_committed_data\n", __func__);
-		jbd_free(jh->b_committed_data, bh->b_size);
-	}
-	bh->b_private = NULL;
-	jh->b_bh = NULL;	/* debug, really */
-	clear_buffer_jbd(bh);
-	journal_free_journal_head(jh);
-}
-
-/*
- * Drop a reference on the passed journal_head.  If it fell to zero then
- * release the journal_head from the buffer_head.
- */
-void journal_put_journal_head(struct journal_head *jh)
-{
-	struct buffer_head *bh = jh2bh(jh);
-
-	jbd_lock_bh_journal_head(bh);
-	J_ASSERT_JH(jh, jh->b_jcount > 0);
-	--jh->b_jcount;
-	if (!jh->b_jcount) {
-		__journal_remove_journal_head(bh);
-		jbd_unlock_bh_journal_head(bh);
-		__brelse(bh);
-	} else
-		jbd_unlock_bh_journal_head(bh);
-}
-
-/*
- * debugfs tunables
- */
-#ifdef CONFIG_JBD_DEBUG
-
-u8 journal_enable_debug __read_mostly;
-EXPORT_SYMBOL(journal_enable_debug);
-
-static struct dentry *jbd_debugfs_dir;
-static struct dentry *jbd_debug;
-
-static void __init jbd_create_debugfs_entry(void)
-{
-	jbd_debugfs_dir = debugfs_create_dir("jbd", NULL);
-	if (jbd_debugfs_dir)
-		jbd_debug = debugfs_create_u8("jbd-debug", S_IRUGO | S_IWUSR,
-					       jbd_debugfs_dir,
-					       &journal_enable_debug);
-}
-
-static void __exit jbd_remove_debugfs_entry(void)
-{
-	debugfs_remove(jbd_debug);
-	debugfs_remove(jbd_debugfs_dir);
-}
-
-#else
-
-static inline void jbd_create_debugfs_entry(void)
-{
-}
-
-static inline void jbd_remove_debugfs_entry(void)
-{
-}
-
-#endif
-
-struct kmem_cache *jbd_handle_cache;
-
-static int __init journal_init_handle_cache(void)
-{
-	jbd_handle_cache = kmem_cache_create("journal_handle",
-				sizeof(handle_t),
-				0,		/* offset */
-				SLAB_TEMPORARY,	/* flags */
-				NULL);		/* ctor */
-	if (jbd_handle_cache == NULL) {
-		printk(KERN_EMERG "JBD: failed to create handle cache\n");
-		return -ENOMEM;
-	}
-	return 0;
-}
-
-static void journal_destroy_handle_cache(void)
-{
-	if (jbd_handle_cache)
-		kmem_cache_destroy(jbd_handle_cache);
-}
-
-/*
- * Module startup and shutdown
- */
-
-static int __init journal_init_caches(void)
-{
-	int ret;
-
-	ret = journal_init_revoke_caches();
-	if (ret == 0)
-		ret = journal_init_journal_head_cache();
-	if (ret == 0)
-		ret = journal_init_handle_cache();
-	return ret;
-}
-
-static void journal_destroy_caches(void)
-{
-	journal_destroy_revoke_caches();
-	journal_destroy_journal_head_cache();
-	journal_destroy_handle_cache();
-}
-
-static int __init journal_init(void)
-{
-	int ret;
-
-	BUILD_BUG_ON(sizeof(struct journal_superblock_s) != 1024);
-
-	ret = journal_init_caches();
-	if (ret != 0)
-		journal_destroy_caches();
-	jbd_create_debugfs_entry();
-	return ret;
-}
-
-static void __exit journal_exit(void)
-{
-#ifdef CONFIG_JBD_DEBUG
-	int n = atomic_read(&nr_journal_heads);
-	if (n)
-		printk(KERN_ERR "JBD: leaked %d journal_heads!\n", n);
-#endif
-	jbd_remove_debugfs_entry();
-	journal_destroy_caches();
-}
-
-MODULE_LICENSE("GPL");
-module_init(journal_init);
-module_exit(journal_exit);
-
diff --git a/fs/jbd/recovery.c b/fs/jbd/recovery.c
deleted file mode 100644
index a748fe21465a..000000000000
--- a/fs/jbd/recovery.c
+++ /dev/null
@@ -1,594 +0,0 @@
-/*
- * linux/fs/jbd/recovery.c
- *
- * Written by Stephen C. Tweedie <sct@redhat.com>, 1999
- *
- * Copyright 1999-2000 Red Hat Software --- All Rights Reserved
- *
- * This file is part of the Linux kernel and is made available under
- * the terms of the GNU General Public License, version 2, or at your
- * option, any later version, incorporated herein by reference.
- *
- * Journal recovery routines for the generic filesystem journaling code;
- * part of the ext2fs journaling system.
- */
-
-#ifndef __KERNEL__
-#include "jfs_user.h"
-#else
-#include <linux/time.h>
-#include <linux/fs.h>
-#include <linux/jbd.h>
-#include <linux/errno.h>
-#include <linux/blkdev.h>
-#endif
-
-/*
- * Maintain information about the progress of the recovery job, so that
- * the different passes can carry information between them.
- */
-struct recovery_info
-{
-	tid_t		start_transaction;
-	tid_t		end_transaction;
-
-	int		nr_replays;
-	int		nr_revokes;
-	int		nr_revoke_hits;
-};
-
-enum passtype {PASS_SCAN, PASS_REVOKE, PASS_REPLAY};
-static int do_one_pass(journal_t *journal,
-				struct recovery_info *info, enum passtype pass);
-static int scan_revoke_records(journal_t *, struct buffer_head *,
-				tid_t, struct recovery_info *);
-
-#ifdef __KERNEL__
-
-/* Release readahead buffers after use */
-static void journal_brelse_array(struct buffer_head *b[], int n)
-{
-	while (--n >= 0)
-		brelse (b[n]);
-}
-
-
-/*
- * When reading from the journal, we are going through the block device
- * layer directly and so there is no readahead being done for us.  We
- * need to implement any readahead ourselves if we want it to happen at
- * all.  Recovery is basically one long sequential read, so make sure we
- * do the IO in reasonably large chunks.
- *
- * This is not so critical that we need to be enormously clever about
- * the readahead size, though.  128K is a purely arbitrary, good-enough
- * fixed value.
- */
-
-#define MAXBUF 8
-static int do_readahead(journal_t *journal, unsigned int start)
-{
-	int err;
-	unsigned int max, nbufs, next;
-	unsigned int blocknr;
-	struct buffer_head *bh;
-
-	struct buffer_head * bufs[MAXBUF];
-
-	/* Do up to 128K of readahead */
-	max = start + (128 * 1024 / journal->j_blocksize);
-	if (max > journal->j_maxlen)
-		max = journal->j_maxlen;
-
-	/* Do the readahead itself.  We'll submit MAXBUF buffer_heads at
-	 * a time to the block device IO layer. */
-
-	nbufs = 0;
-
-	for (next = start; next < max; next++) {
-		err = journal_bmap(journal, next, &blocknr);
-
-		if (err) {
-			printk (KERN_ERR "JBD: bad block at offset %u\n",
-				next);
-			goto failed;
-		}
-
-		bh = __getblk(journal->j_dev, blocknr, journal->j_blocksize);
-		if (!bh) {
-			err = -ENOMEM;
-			goto failed;
-		}
-
-		if (!buffer_uptodate(bh) && !buffer_locked(bh)) {
-			bufs[nbufs++] = bh;
-			if (nbufs == MAXBUF) {
-				ll_rw_block(READ, nbufs, bufs);
-				journal_brelse_array(bufs, nbufs);
-				nbufs = 0;
-			}
-		} else
-			brelse(bh);
-	}
-
-	if (nbufs)
-		ll_rw_block(READ, nbufs, bufs);
-	err = 0;
-
-failed:
-	if (nbufs)
-		journal_brelse_array(bufs, nbufs);
-	return err;
-}
-
-#endif /* __KERNEL__ */
-
-
-/*
- * Read a block from the journal
- */
-
-static int jread(struct buffer_head **bhp, journal_t *journal,
-		 unsigned int offset)
-{
-	int err;
-	unsigned int blocknr;
-	struct buffer_head *bh;
-
-	*bhp = NULL;
-
-	if (offset >= journal->j_maxlen) {
-		printk(KERN_ERR "JBD: corrupted journal superblock\n");
-		return -EIO;
-	}
-
-	err = journal_bmap(journal, offset, &blocknr);
-
-	if (err) {
-		printk (KERN_ERR "JBD: bad block at offset %u\n",
-			offset);
-		return err;
-	}
-
-	bh = __getblk(journal->j_dev, blocknr, journal->j_blocksize);
-	if (!bh)
-		return -ENOMEM;
-
-	if (!buffer_uptodate(bh)) {
-		/* If this is a brand new buffer, start readahead.
-                   Otherwise, we assume we are already reading it.  */
-		if (!buffer_req(bh))
-			do_readahead(journal, offset);
-		wait_on_buffer(bh);
-	}
-
-	if (!buffer_uptodate(bh)) {
-		printk (KERN_ERR "JBD: Failed to read block at offset %u\n",
-			offset);
-		brelse(bh);
-		return -EIO;
-	}
-
-	*bhp = bh;
-	return 0;
-}
-
-
-/*
- * Count the number of in-use tags in a journal descriptor block.
- */
-
-static int count_tags(struct buffer_head *bh, int size)
-{
-	char *			tagp;
-	journal_block_tag_t *	tag;
-	int			nr = 0;
-
-	tagp = &bh->b_data[sizeof(journal_header_t)];
-
-	while ((tagp - bh->b_data + sizeof(journal_block_tag_t)) <= size) {
-		tag = (journal_block_tag_t *) tagp;
-
-		nr++;
-		tagp += sizeof(journal_block_tag_t);
-		if (!(tag->t_flags & cpu_to_be32(JFS_FLAG_SAME_UUID)))
-			tagp += 16;
-
-		if (tag->t_flags & cpu_to_be32(JFS_FLAG_LAST_TAG))
-			break;
-	}
-
-	return nr;
-}
-
-
-/* Make sure we wrap around the log correctly! */
-#define wrap(journal, var)						\
-do {									\
-	if (var >= (journal)->j_last)					\
-		var -= ((journal)->j_last - (journal)->j_first);	\
-} while (0)
-
-/**
- * journal_recover - recovers a on-disk journal
- * @journal: the journal to recover
- *
- * The primary function for recovering the log contents when mounting a
- * journaled device.
- *
- * Recovery is done in three passes.  In the first pass, we look for the
- * end of the log.  In the second, we assemble the list of revoke
- * blocks.  In the third and final pass, we replay any un-revoked blocks
- * in the log.
- */
-int journal_recover(journal_t *journal)
-{
-	int			err, err2;
-	journal_superblock_t *	sb;
-
-	struct recovery_info	info;
-
-	memset(&info, 0, sizeof(info));
-	sb = journal->j_superblock;
-
-	/*
-	 * The journal superblock's s_start field (the current log head)
-	 * is always zero if, and only if, the journal was cleanly
-	 * unmounted.
-	 */
-
-	if (!sb->s_start) {
-		jbd_debug(1, "No recovery required, last transaction %d\n",
-			  be32_to_cpu(sb->s_sequence));
-		journal->j_transaction_sequence = be32_to_cpu(sb->s_sequence) + 1;
-		return 0;
-	}
-
-	err = do_one_pass(journal, &info, PASS_SCAN);
-	if (!err)
-		err = do_one_pass(journal, &info, PASS_REVOKE);
-	if (!err)
-		err = do_one_pass(journal, &info, PASS_REPLAY);
-
-	jbd_debug(1, "JBD: recovery, exit status %d, "
-		  "recovered transactions %u to %u\n",
-		  err, info.start_transaction, info.end_transaction);
-	jbd_debug(1, "JBD: Replayed %d and revoked %d/%d blocks\n",
-		  info.nr_replays, info.nr_revoke_hits, info.nr_revokes);
-
-	/* Restart the log at the next transaction ID, thus invalidating
-	 * any existing commit records in the log. */
-	journal->j_transaction_sequence = ++info.end_transaction;
-
-	journal_clear_revoke(journal);
-	err2 = sync_blockdev(journal->j_fs_dev);
-	if (!err)
-		err = err2;
-	/* Flush disk caches to get replayed data on the permanent storage */
-	if (journal->j_flags & JFS_BARRIER) {
-		err2 = blkdev_issue_flush(journal->j_fs_dev, GFP_KERNEL, NULL);
-		if (!err)
-			err = err2;
-	}
-
-	return err;
-}
-
-/**
- * journal_skip_recovery - Start journal and wipe exiting records
- * @journal: journal to startup
- *
- * Locate any valid recovery information from the journal and set up the
- * journal structures in memory to ignore it (presumably because the
- * caller has evidence that it is out of date).
- * This function does'nt appear to be exorted..
- *
- * We perform one pass over the journal to allow us to tell the user how
- * much recovery information is being erased, and to let us initialise
- * the journal transaction sequence numbers to the next unused ID.
- */
-int journal_skip_recovery(journal_t *journal)
-{
-	int			err;
-	struct recovery_info	info;
-
-	memset (&info, 0, sizeof(info));
-
-	err = do_one_pass(journal, &info, PASS_SCAN);
-
-	if (err) {
-		printk(KERN_ERR "JBD: error %d scanning journal\n", err);
-		++journal->j_transaction_sequence;
-	} else {
-#ifdef CONFIG_JBD_DEBUG
-		int dropped = info.end_transaction -
-			      be32_to_cpu(journal->j_superblock->s_sequence);
-		jbd_debug(1,
-			  "JBD: ignoring %d transaction%s from the journal.\n",
-			  dropped, (dropped == 1) ? "" : "s");
-#endif
-		journal->j_transaction_sequence = ++info.end_transaction;
-	}
-
-	journal->j_tail = 0;
-	return err;
-}
-
-static int do_one_pass(journal_t *journal,
-			struct recovery_info *info, enum passtype pass)
-{
-	unsigned int		first_commit_ID, next_commit_ID;
-	unsigned int		next_log_block;
-	int			err, success = 0;
-	journal_superblock_t *	sb;
-	journal_header_t *	tmp;
-	struct buffer_head *	bh;
-	unsigned int		sequence;
-	int			blocktype;
-
-	/*
-	 * First thing is to establish what we expect to find in the log
-	 * (in terms of transaction IDs), and where (in terms of log
-	 * block offsets): query the superblock.
-	 */
-
-	sb = journal->j_superblock;
-	next_commit_ID = be32_to_cpu(sb->s_sequence);
-	next_log_block = be32_to_cpu(sb->s_start);
-
-	first_commit_ID = next_commit_ID;
-	if (pass == PASS_SCAN)
-		info->start_transaction = first_commit_ID;
-
-	jbd_debug(1, "Starting recovery pass %d\n", pass);
-
-	/*
-	 * Now we walk through the log, transaction by transaction,
-	 * making sure that each transaction has a commit block in the
-	 * expected place.  Each complete transaction gets replayed back
-	 * into the main filesystem.
-	 */
-
-	while (1) {
-		int			flags;
-		char *			tagp;
-		journal_block_tag_t *	tag;
-		struct buffer_head *	obh;
-		struct buffer_head *	nbh;
-
-		cond_resched();
-
-		/* If we already know where to stop the log traversal,
-		 * check right now that we haven't gone past the end of
-		 * the log. */
-
-		if (pass != PASS_SCAN)
-			if (tid_geq(next_commit_ID, info->end_transaction))
-				break;
-
-		jbd_debug(2, "Scanning for sequence ID %u at %u/%u\n",
-			  next_commit_ID, next_log_block, journal->j_last);
-
-		/* Skip over each chunk of the transaction looking
-		 * either the next descriptor block or the final commit
-		 * record. */
-
-		jbd_debug(3, "JBD: checking block %u\n", next_log_block);
-		err = jread(&bh, journal, next_log_block);
-		if (err)
-			goto failed;
-
-		next_log_block++;
-		wrap(journal, next_log_block);
-
-		/* What kind of buffer is it?
-		 *
-		 * If it is a descriptor block, check that it has the
-		 * expected sequence number.  Otherwise, we're all done
-		 * here. */
-
-		tmp = (journal_header_t *)bh->b_data;
-
-		if (tmp->h_magic != cpu_to_be32(JFS_MAGIC_NUMBER)) {
-			brelse(bh);
-			break;
-		}
-
-		blocktype = be32_to_cpu(tmp->h_blocktype);
-		sequence = be32_to_cpu(tmp->h_sequence);
-		jbd_debug(3, "Found magic %d, sequence %d\n",
-			  blocktype, sequence);
-
-		if (sequence != next_commit_ID) {
-			brelse(bh);
-			break;
-		}
-
-		/* OK, we have a valid descriptor block which matches
-		 * all of the sequence number checks.  What are we going
-		 * to do with it?  That depends on the pass... */
-
-		switch(blocktype) {
-		case JFS_DESCRIPTOR_BLOCK:
-			/* If it is a valid descriptor block, replay it
-			 * in pass REPLAY; otherwise, just skip over the
-			 * blocks it describes. */
-			if (pass != PASS_REPLAY) {
-				next_log_block +=
-					count_tags(bh, journal->j_blocksize);
-				wrap(journal, next_log_block);
-				brelse(bh);
-				continue;
-			}
-
-			/* A descriptor block: we can now write all of
-			 * the data blocks.  Yay, useful work is finally
-			 * getting done here! */
-
-			tagp = &bh->b_data[sizeof(journal_header_t)];
-			while ((tagp - bh->b_data +sizeof(journal_block_tag_t))
-			       <= journal->j_blocksize) {
-				unsigned int io_block;
-
-				tag = (journal_block_tag_t *) tagp;
-				flags = be32_to_cpu(tag->t_flags);
-
-				io_block = next_log_block++;
-				wrap(journal, next_log_block);
-				err = jread(&obh, journal, io_block);
-				if (err) {
-					/* Recover what we can, but
-					 * report failure at the end. */
-					success = err;
-					printk (KERN_ERR
-						"JBD: IO error %d recovering "
-						"block %u in log\n",
-						err, io_block);
-				} else {
-					unsigned int blocknr;
-
-					J_ASSERT(obh != NULL);
-					blocknr = be32_to_cpu(tag->t_blocknr);
-
-					/* If the block has been
-					 * revoked, then we're all done
-					 * here. */
-					if (journal_test_revoke
-					    (journal, blocknr,
-					     next_commit_ID)) {
-						brelse(obh);
-						++info->nr_revoke_hits;
-						goto skip_write;
-					}
-
-					/* Find a buffer for the new
-					 * data being restored */
-					nbh = __getblk(journal->j_fs_dev,
-							blocknr,
-							journal->j_blocksize);
-					if (nbh == NULL) {
-						printk(KERN_ERR
-						       "JBD: Out of memory "
-						       "during recovery.\n");
-						err = -ENOMEM;
-						brelse(bh);
-						brelse(obh);
-						goto failed;
-					}
-
-					lock_buffer(nbh);
-					memcpy(nbh->b_data, obh->b_data,
-							journal->j_blocksize);
-					if (flags & JFS_FLAG_ESCAPE) {
-						*((__be32 *)nbh->b_data) =
-						cpu_to_be32(JFS_MAGIC_NUMBER);
-					}
-
-					BUFFER_TRACE(nbh, "marking dirty");
-					set_buffer_uptodate(nbh);
-					mark_buffer_dirty(nbh);
-					BUFFER_TRACE(nbh, "marking uptodate");
-					++info->nr_replays;
-					/* ll_rw_block(WRITE, 1, &nbh); */
-					unlock_buffer(nbh);
-					brelse(obh);
-					brelse(nbh);
-				}
-
-			skip_write:
-				tagp += sizeof(journal_block_tag_t);
-				if (!(flags & JFS_FLAG_SAME_UUID))
-					tagp += 16;
-
-				if (flags & JFS_FLAG_LAST_TAG)
-					break;
-			}
-
-			brelse(bh);
-			continue;
-
-		case JFS_COMMIT_BLOCK:
-			/* Found an expected commit block: not much to
-			 * do other than move on to the next sequence
-			 * number. */
-			brelse(bh);
-			next_commit_ID++;
-			continue;
-
-		case JFS_REVOKE_BLOCK:
-			/* If we aren't in the REVOKE pass, then we can
-			 * just skip over this block. */
-			if (pass != PASS_REVOKE) {
-				brelse(bh);
-				continue;
-			}
-
-			err = scan_revoke_records(journal, bh,
-						  next_commit_ID, info);
-			brelse(bh);
-			if (err)
-				goto failed;
-			continue;
-
-		default:
-			jbd_debug(3, "Unrecognised magic %d, end of scan.\n",
-				  blocktype);
-			brelse(bh);
-			goto done;
-		}
-	}
-
- done:
-	/*
-	 * We broke out of the log scan loop: either we came to the
-	 * known end of the log or we found an unexpected block in the
-	 * log.  If the latter happened, then we know that the "current"
-	 * transaction marks the end of the valid log.
-	 */
-
-	if (pass == PASS_SCAN)
-		info->end_transaction = next_commit_ID;
-	else {
-		/* It's really bad news if different passes end up at
-		 * different places (but possible due to IO errors). */
-		if (info->end_transaction != next_commit_ID) {
-			printk (KERN_ERR "JBD: recovery pass %d ended at "
-				"transaction %u, expected %u\n",
-				pass, next_commit_ID, info->end_transaction);
-			if (!success)
-				success = -EIO;
-		}
-	}
-
-	return success;
-
- failed:
-	return err;
-}
-
-
-/* Scan a revoke record, marking all blocks mentioned as revoked. */
-
-static int scan_revoke_records(journal_t *journal, struct buffer_head *bh,
-			       tid_t sequence, struct recovery_info *info)
-{
-	journal_revoke_header_t *header;
-	int offset, max;
-
-	header = (journal_revoke_header_t *) bh->b_data;
-	offset = sizeof(journal_revoke_header_t);
-	max = be32_to_cpu(header->r_count);
-
-	while (offset < max) {
-		unsigned int blocknr;
-		int err;
-
-		blocknr = be32_to_cpu(* ((__be32 *) (bh->b_data+offset)));
-		offset += 4;
-		err = journal_set_revoke(journal, blocknr, sequence);
-		if (err)
-			return err;
-		++info->nr_revokes;
-	}
-	return 0;
-}
diff --git a/fs/jbd/revoke.c b/fs/jbd/revoke.c
deleted file mode 100644
index dcead636c33b..000000000000
--- a/fs/jbd/revoke.c
+++ /dev/null
@@ -1,733 +0,0 @@
-/*
- * linux/fs/jbd/revoke.c
- *
- * Written by Stephen C. Tweedie <sct@redhat.com>, 2000
- *
- * Copyright 2000 Red Hat corp --- All Rights Reserved
- *
- * This file is part of the Linux kernel and is made available under
- * the terms of the GNU General Public License, version 2, or at your
- * option, any later version, incorporated herein by reference.
- *
- * Journal revoke routines for the generic filesystem journaling code;
- * part of the ext2fs journaling system.
- *
- * Revoke is the mechanism used to prevent old log records for deleted
- * metadata from being replayed on top of newer data using the same
- * blocks.  The revoke mechanism is used in two separate places:
- *
- * + Commit: during commit we write the entire list of the current
- *   transaction's revoked blocks to the journal
- *
- * + Recovery: during recovery we record the transaction ID of all
- *   revoked blocks.  If there are multiple revoke records in the log
- *   for a single block, only the last one counts, and if there is a log
- *   entry for a block beyond the last revoke, then that log entry still
- *   gets replayed.
- *
- * We can get interactions between revokes and new log data within a
- * single transaction:
- *
- * Block is revoked and then journaled:
- *   The desired end result is the journaling of the new block, so we
- *   cancel the revoke before the transaction commits.
- *
- * Block is journaled and then revoked:
- *   The revoke must take precedence over the write of the block, so we
- *   need either to cancel the journal entry or to write the revoke
- *   later in the log than the log block.  In this case, we choose the
- *   latter: journaling a block cancels any revoke record for that block
- *   in the current transaction, so any revoke for that block in the
- *   transaction must have happened after the block was journaled and so
- *   the revoke must take precedence.
- *
- * Block is revoked and then written as data:
- *   The data write is allowed to succeed, but the revoke is _not_
- *   cancelled.  We still need to prevent old log records from
- *   overwriting the new data.  We don't even need to clear the revoke
- *   bit here.
- *
- * We cache revoke status of a buffer in the current transaction in b_states
- * bits.  As the name says, revokevalid flag indicates that the cached revoke
- * status of a buffer is valid and we can rely on the cached status.
- *
- * Revoke information on buffers is a tri-state value:
- *
- * RevokeValid clear:	no cached revoke status, need to look it up
- * RevokeValid set, Revoked clear:
- *			buffer has not been revoked, and cancel_revoke
- *			need do nothing.
- * RevokeValid set, Revoked set:
- *			buffer has been revoked.
- *
- * Locking rules:
- * We keep two hash tables of revoke records. One hashtable belongs to the
- * running transaction (is pointed to by journal->j_revoke), the other one
- * belongs to the committing transaction. Accesses to the second hash table
- * happen only from the kjournald and no other thread touches this table.  Also
- * journal_switch_revoke_table() which switches which hashtable belongs to the
- * running and which to the committing transaction is called only from
- * kjournald. Therefore we need no locks when accessing the hashtable belonging
- * to the committing transaction.
- *
- * All users operating on the hash table belonging to the running transaction
- * have a handle to the transaction. Therefore they are safe from kjournald
- * switching hash tables under them. For operations on the lists of entries in
- * the hash table j_revoke_lock is used.
- *
- * Finally, also replay code uses the hash tables but at this moment no one else
- * can touch them (filesystem isn't mounted yet) and hence no locking is
- * needed.
- */
-
-#ifndef __KERNEL__
-#include "jfs_user.h"
-#else
-#include <linux/time.h>
-#include <linux/fs.h>
-#include <linux/jbd.h>
-#include <linux/errno.h>
-#include <linux/slab.h>
-#include <linux/list.h>
-#include <linux/init.h>
-#include <linux/bio.h>
-#endif
-#include <linux/log2.h>
-#include <linux/hash.h>
-
-static struct kmem_cache *revoke_record_cache;
-static struct kmem_cache *revoke_table_cache;
-
-/* Each revoke record represents one single revoked block.  During
-   journal replay, this involves recording the transaction ID of the
-   last transaction to revoke this block. */
-
-struct jbd_revoke_record_s
-{
-	struct list_head  hash;
-	tid_t		  sequence;	/* Used for recovery only */
-	unsigned int	  blocknr;
-};
-
-
-/* The revoke table is just a simple hash table of revoke records. */
-struct jbd_revoke_table_s
-{
-	/* It is conceivable that we might want a larger hash table
-	 * for recovery.  Must be a power of two. */
-	int		  hash_size;
-	int		  hash_shift;
-	struct list_head *hash_table;
-};
-
-
-#ifdef __KERNEL__
-static void write_one_revoke_record(journal_t *, transaction_t *,
-				    struct journal_head **, int *,
-				    struct jbd_revoke_record_s *, int);
-static void flush_descriptor(journal_t *, struct journal_head *, int, int);
-#endif
-
-/* Utility functions to maintain the revoke table */
-
-static inline int hash(journal_t *journal, unsigned int block)
-{
-	struct jbd_revoke_table_s *table = journal->j_revoke;
-
-	return hash_32(block, table->hash_shift);
-}
-
-static int insert_revoke_hash(journal_t *journal, unsigned int blocknr,
-			      tid_t seq)
-{
-	struct list_head *hash_list;
-	struct jbd_revoke_record_s *record;
-
-repeat:
-	record = kmem_cache_alloc(revoke_record_cache, GFP_NOFS);
-	if (!record)
-		goto oom;
-
-	record->sequence = seq;
-	record->blocknr = blocknr;
-	hash_list = &journal->j_revoke->hash_table[hash(journal, blocknr)];
-	spin_lock(&journal->j_revoke_lock);
-	list_add(&record->hash, hash_list);
-	spin_unlock(&journal->j_revoke_lock);
-	return 0;
-
-oom:
-	if (!journal_oom_retry)
-		return -ENOMEM;
-	jbd_debug(1, "ENOMEM in %s, retrying\n", __func__);
-	yield();
-	goto repeat;
-}
-
-/* Find a revoke record in the journal's hash table. */
-
-static struct jbd_revoke_record_s *find_revoke_record(journal_t *journal,
-						      unsigned int blocknr)
-{
-	struct list_head *hash_list;
-	struct jbd_revoke_record_s *record;
-
-	hash_list = &journal->j_revoke->hash_table[hash(journal, blocknr)];
-
-	spin_lock(&journal->j_revoke_lock);
-	record = (struct jbd_revoke_record_s *) hash_list->next;
-	while (&(record->hash) != hash_list) {
-		if (record->blocknr == blocknr) {
-			spin_unlock(&journal->j_revoke_lock);
-			return record;
-		}
-		record = (struct jbd_revoke_record_s *) record->hash.next;
-	}
-	spin_unlock(&journal->j_revoke_lock);
-	return NULL;
-}
-
-void journal_destroy_revoke_caches(void)
-{
-	if (revoke_record_cache) {
-		kmem_cache_destroy(revoke_record_cache);
-		revoke_record_cache = NULL;
-	}
-	if (revoke_table_cache) {
-		kmem_cache_destroy(revoke_table_cache);
-		revoke_table_cache = NULL;
-	}
-}
-
-int __init journal_init_revoke_caches(void)
-{
-	J_ASSERT(!revoke_record_cache);
-	J_ASSERT(!revoke_table_cache);
-
-	revoke_record_cache = kmem_cache_create("revoke_record",
-					   sizeof(struct jbd_revoke_record_s),
-					   0,
-					   SLAB_HWCACHE_ALIGN|SLAB_TEMPORARY,
-					   NULL);
-	if (!revoke_record_cache)
-		goto record_cache_failure;
-
-	revoke_table_cache = kmem_cache_create("revoke_table",
-					   sizeof(struct jbd_revoke_table_s),
-					   0, SLAB_TEMPORARY, NULL);
-	if (!revoke_table_cache)
-		goto table_cache_failure;
-
-	return 0;
-
-table_cache_failure:
-	journal_destroy_revoke_caches();
-record_cache_failure:
-	return -ENOMEM;
-}
-
-static struct jbd_revoke_table_s *journal_init_revoke_table(int hash_size)
-{
-	int i;
-	struct jbd_revoke_table_s *table;
-
-	table = kmem_cache_alloc(revoke_table_cache, GFP_KERNEL);
-	if (!table)
-		goto out;
-
-	table->hash_size = hash_size;
-	table->hash_shift = ilog2(hash_size);
-	table->hash_table =
-		kmalloc(hash_size * sizeof(struct list_head), GFP_KERNEL);
-	if (!table->hash_table) {
-		kmem_cache_free(revoke_table_cache, table);
-		table = NULL;
-		goto out;
-	}
-
-	for (i = 0; i < hash_size; i++)
-		INIT_LIST_HEAD(&table->hash_table[i]);
-
-out:
-	return table;
-}
-
-static void journal_destroy_revoke_table(struct jbd_revoke_table_s *table)
-{
-	int i;
-	struct list_head *hash_list;
-
-	for (i = 0; i < table->hash_size; i++) {
-		hash_list = &table->hash_table[i];
-		J_ASSERT(list_empty(hash_list));
-	}
-
-	kfree(table->hash_table);
-	kmem_cache_free(revoke_table_cache, table);
-}
-
-/* Initialise the revoke table for a given journal to a given size. */
-int journal_init_revoke(journal_t *journal, int hash_size)
-{
-	J_ASSERT(journal->j_revoke_table[0] == NULL);
-	J_ASSERT(is_power_of_2(hash_size));
-
-	journal->j_revoke_table[0] = journal_init_revoke_table(hash_size);
-	if (!journal->j_revoke_table[0])
-		goto fail0;
-
-	journal->j_revoke_table[1] = journal_init_revoke_table(hash_size);
-	if (!journal->j_revoke_table[1])
-		goto fail1;
-
-	journal->j_revoke = journal->j_revoke_table[1];
-
-	spin_lock_init(&journal->j_revoke_lock);
-
-	return 0;
-
-fail1:
-	journal_destroy_revoke_table(journal->j_revoke_table[0]);
-fail0:
-	return -ENOMEM;
-}
-
-/* Destroy a journal's revoke table.  The table must already be empty! */
-void journal_destroy_revoke(journal_t *journal)
-{
-	journal->j_revoke = NULL;
-	if (journal->j_revoke_table[0])
-		journal_destroy_revoke_table(journal->j_revoke_table[0]);
-	if (journal->j_revoke_table[1])
-		journal_destroy_revoke_table(journal->j_revoke_table[1]);
-}
-
-
-#ifdef __KERNEL__
-
-/*
- * journal_revoke: revoke a given buffer_head from the journal.  This
- * prevents the block from being replayed during recovery if we take a
- * crash after this current transaction commits.  Any subsequent
- * metadata writes of the buffer in this transaction cancel the
- * revoke.
- *
- * Note that this call may block --- it is up to the caller to make
- * sure that there are no further calls to journal_write_metadata
- * before the revoke is complete.  In ext3, this implies calling the
- * revoke before clearing the block bitmap when we are deleting
- * metadata.
- *
- * Revoke performs a journal_forget on any buffer_head passed in as a
- * parameter, but does _not_ forget the buffer_head if the bh was only
- * found implicitly.
- *
- * bh_in may not be a journalled buffer - it may have come off
- * the hash tables without an attached journal_head.
- *
- * If bh_in is non-zero, journal_revoke() will decrement its b_count
- * by one.
- */
-
-int journal_revoke(handle_t *handle, unsigned int blocknr,
-		   struct buffer_head *bh_in)
-{
-	struct buffer_head *bh = NULL;
-	journal_t *journal;
-	struct block_device *bdev;
-	int err;
-
-	might_sleep();
-	if (bh_in)
-		BUFFER_TRACE(bh_in, "enter");
-
-	journal = handle->h_transaction->t_journal;
-	if (!journal_set_features(journal, 0, 0, JFS_FEATURE_INCOMPAT_REVOKE)){
-		J_ASSERT (!"Cannot set revoke feature!");
-		return -EINVAL;
-	}
-
-	bdev = journal->j_fs_dev;
-	bh = bh_in;
-
-	if (!bh) {
-		bh = __find_get_block(bdev, blocknr, journal->j_blocksize);
-		if (bh)
-			BUFFER_TRACE(bh, "found on hash");
-	}
-#ifdef JBD_EXPENSIVE_CHECKING
-	else {
-		struct buffer_head *bh2;
-
-		/* If there is a different buffer_head lying around in
-		 * memory anywhere... */
-		bh2 = __find_get_block(bdev, blocknr, journal->j_blocksize);
-		if (bh2) {
-			/* ... and it has RevokeValid status... */
-			if (bh2 != bh && buffer_revokevalid(bh2))
-				/* ...then it better be revoked too,
-				 * since it's illegal to create a revoke
-				 * record against a buffer_head which is
-				 * not marked revoked --- that would
-				 * risk missing a subsequent revoke
-				 * cancel. */
-				J_ASSERT_BH(bh2, buffer_revoked(bh2));
-			put_bh(bh2);
-		}
-	}
-#endif
-
-	/* We really ought not ever to revoke twice in a row without
-           first having the revoke cancelled: it's illegal to free a
-           block twice without allocating it in between! */
-	if (bh) {
-		if (!J_EXPECT_BH(bh, !buffer_revoked(bh),
-				 "inconsistent data on disk")) {
-			if (!bh_in)
-				brelse(bh);
-			return -EIO;
-		}
-		set_buffer_revoked(bh);
-		set_buffer_revokevalid(bh);
-		if (bh_in) {
-			BUFFER_TRACE(bh_in, "call journal_forget");
-			journal_forget(handle, bh_in);
-		} else {
-			BUFFER_TRACE(bh, "call brelse");
-			__brelse(bh);
-		}
-	}
-
-	jbd_debug(2, "insert revoke for block %u, bh_in=%p\n", blocknr, bh_in);
-	err = insert_revoke_hash(journal, blocknr,
-				handle->h_transaction->t_tid);
-	BUFFER_TRACE(bh_in, "exit");
-	return err;
-}
-
-/*
- * Cancel an outstanding revoke.  For use only internally by the
- * journaling code (called from journal_get_write_access).
- *
- * We trust buffer_revoked() on the buffer if the buffer is already
- * being journaled: if there is no revoke pending on the buffer, then we
- * don't do anything here.
- *
- * This would break if it were possible for a buffer to be revoked and
- * discarded, and then reallocated within the same transaction.  In such
- * a case we would have lost the revoked bit, but when we arrived here
- * the second time we would still have a pending revoke to cancel.  So,
- * do not trust the Revoked bit on buffers unless RevokeValid is also
- * set.
- */
-int journal_cancel_revoke(handle_t *handle, struct journal_head *jh)
-{
-	struct jbd_revoke_record_s *record;
-	journal_t *journal = handle->h_transaction->t_journal;
-	int need_cancel;
-	int did_revoke = 0;	/* akpm: debug */
-	struct buffer_head *bh = jh2bh(jh);
-
-	jbd_debug(4, "journal_head %p, cancelling revoke\n", jh);
-
-	/* Is the existing Revoke bit valid?  If so, we trust it, and
-	 * only perform the full cancel if the revoke bit is set.  If
-	 * not, we can't trust the revoke bit, and we need to do the
-	 * full search for a revoke record. */
-	if (test_set_buffer_revokevalid(bh)) {
-		need_cancel = test_clear_buffer_revoked(bh);
-	} else {
-		need_cancel = 1;
-		clear_buffer_revoked(bh);
-	}
-
-	if (need_cancel) {
-		record = find_revoke_record(journal, bh->b_blocknr);
-		if (record) {
-			jbd_debug(4, "cancelled existing revoke on "
-				  "blocknr %llu\n", (unsigned long long)bh->b_blocknr);
-			spin_lock(&journal->j_revoke_lock);
-			list_del(&record->hash);
-			spin_unlock(&journal->j_revoke_lock);
-			kmem_cache_free(revoke_record_cache, record);
-			did_revoke = 1;
-		}
-	}
-
-#ifdef JBD_EXPENSIVE_CHECKING
-	/* There better not be one left behind by now! */
-	record = find_revoke_record(journal, bh->b_blocknr);
-	J_ASSERT_JH(jh, record == NULL);
-#endif
-
-	/* Finally, have we just cleared revoke on an unhashed
-	 * buffer_head?  If so, we'd better make sure we clear the
-	 * revoked status on any hashed alias too, otherwise the revoke
-	 * state machine will get very upset later on. */
-	if (need_cancel) {
-		struct buffer_head *bh2;
-		bh2 = __find_get_block(bh->b_bdev, bh->b_blocknr, bh->b_size);
-		if (bh2) {
-			if (bh2 != bh)
-				clear_buffer_revoked(bh2);
-			__brelse(bh2);
-		}
-	}
-	return did_revoke;
-}
-
-/*
- * journal_clear_revoked_flags clears revoked flag of buffers in
- * revoke table to reflect there is no revoked buffer in the next
- * transaction which is going to be started.
- */
-void journal_clear_buffer_revoked_flags(journal_t *journal)
-{
-	struct jbd_revoke_table_s *revoke = journal->j_revoke;
-	int i = 0;
-
-	for (i = 0; i < revoke->hash_size; i++) {
-		struct list_head *hash_list;
-		struct list_head *list_entry;
-		hash_list = &revoke->hash_table[i];
-
-		list_for_each(list_entry, hash_list) {
-			struct jbd_revoke_record_s *record;
-			struct buffer_head *bh;
-			record = (struct jbd_revoke_record_s *)list_entry;
-			bh = __find_get_block(journal->j_fs_dev,
-					      record->blocknr,
-					      journal->j_blocksize);
-			if (bh) {
-				clear_buffer_revoked(bh);
-				__brelse(bh);
-			}
-		}
-	}
-}
-
-/* journal_switch_revoke table select j_revoke for next transaction
- * we do not want to suspend any processing until all revokes are
- * written -bzzz
- */
-void journal_switch_revoke_table(journal_t *journal)
-{
-	int i;
-
-	if (journal->j_revoke == journal->j_revoke_table[0])
-		journal->j_revoke = journal->j_revoke_table[1];
-	else
-		journal->j_revoke = journal->j_revoke_table[0];
-
-	for (i = 0; i < journal->j_revoke->hash_size; i++)
-		INIT_LIST_HEAD(&journal->j_revoke->hash_table[i]);
-}
-
-/*
- * Write revoke records to the journal for all entries in the current
- * revoke hash, deleting the entries as we go.
- */
-void journal_write_revoke_records(journal_t *journal,
-				  transaction_t *transaction, int write_op)
-{
-	struct journal_head *descriptor;
-	struct jbd_revoke_record_s *record;
-	struct jbd_revoke_table_s *revoke;
-	struct list_head *hash_list;
-	int i, offset, count;
-
-	descriptor = NULL;
-	offset = 0;
-	count = 0;
-
-	/* select revoke table for committing transaction */
-	revoke = journal->j_revoke == journal->j_revoke_table[0] ?
-		journal->j_revoke_table[1] : journal->j_revoke_table[0];
-
-	for (i = 0; i < revoke->hash_size; i++) {
-		hash_list = &revoke->hash_table[i];
-
-		while (!list_empty(hash_list)) {
-			record = (struct jbd_revoke_record_s *)
-				hash_list->next;
-			write_one_revoke_record(journal, transaction,
-						&descriptor, &offset,
-						record, write_op);
-			count++;
-			list_del(&record->hash);
-			kmem_cache_free(revoke_record_cache, record);
-		}
-	}
-	if (descriptor)
-		flush_descriptor(journal, descriptor, offset, write_op);
-	jbd_debug(1, "Wrote %d revoke records\n", count);
-}
-
-/*
- * Write out one revoke record.  We need to create a new descriptor
- * block if the old one is full or if we have not already created one.
- */
-
-static void write_one_revoke_record(journal_t *journal,
-				    transaction_t *transaction,
-				    struct journal_head **descriptorp,
-				    int *offsetp,
-				    struct jbd_revoke_record_s *record,
-				    int write_op)
-{
-	struct journal_head *descriptor;
-	int offset;
-	journal_header_t *header;
-
-	/* If we are already aborting, this all becomes a noop.  We
-           still need to go round the loop in
-           journal_write_revoke_records in order to free all of the
-           revoke records: only the IO to the journal is omitted. */
-	if (is_journal_aborted(journal))
-		return;
-
-	descriptor = *descriptorp;
-	offset = *offsetp;
-
-	/* Make sure we have a descriptor with space left for the record */
-	if (descriptor) {
-		if (offset == journal->j_blocksize) {
-			flush_descriptor(journal, descriptor, offset, write_op);
-			descriptor = NULL;
-		}
-	}
-
-	if (!descriptor) {
-		descriptor = journal_get_descriptor_buffer(journal);
-		if (!descriptor)
-			return;
-		header = (journal_header_t *) &jh2bh(descriptor)->b_data[0];
-		header->h_magic     = cpu_to_be32(JFS_MAGIC_NUMBER);
-		header->h_blocktype = cpu_to_be32(JFS_REVOKE_BLOCK);
-		header->h_sequence  = cpu_to_be32(transaction->t_tid);
-
-		/* Record it so that we can wait for IO completion later */
-		JBUFFER_TRACE(descriptor, "file as BJ_LogCtl");
-		journal_file_buffer(descriptor, transaction, BJ_LogCtl);
-
-		offset = sizeof(journal_revoke_header_t);
-		*descriptorp = descriptor;
-	}
-
-	* ((__be32 *)(&jh2bh(descriptor)->b_data[offset])) =
-		cpu_to_be32(record->blocknr);
-	offset += 4;
-	*offsetp = offset;
-}
-
-/*
- * Flush a revoke descriptor out to the journal.  If we are aborting,
- * this is a noop; otherwise we are generating a buffer which needs to
- * be waited for during commit, so it has to go onto the appropriate
- * journal buffer list.
- */
-
-static void flush_descriptor(journal_t *journal,
-			     struct journal_head *descriptor,
-			     int offset, int write_op)
-{
-	journal_revoke_header_t *header;
-	struct buffer_head *bh = jh2bh(descriptor);
-
-	if (is_journal_aborted(journal)) {
-		put_bh(bh);
-		return;
-	}
-
-	header = (journal_revoke_header_t *) jh2bh(descriptor)->b_data;
-	header->r_count = cpu_to_be32(offset);
-	set_buffer_jwrite(bh);
-	BUFFER_TRACE(bh, "write");
-	set_buffer_dirty(bh);
-	write_dirty_buffer(bh, write_op);
-}
-#endif
-
-/*
- * Revoke support for recovery.
- *
- * Recovery needs to be able to:
- *
- *  record all revoke records, including the tid of the latest instance
- *  of each revoke in the journal
- *
- *  check whether a given block in a given transaction should be replayed
- *  (ie. has not been revoked by a revoke record in that or a subsequent
- *  transaction)
- *
- *  empty the revoke table after recovery.
- */
-
-/*
- * First, setting revoke records.  We create a new revoke record for
- * every block ever revoked in the log as we scan it for recovery, and
- * we update the existing records if we find multiple revokes for a
- * single block.
- */
-
-int journal_set_revoke(journal_t *journal,
-		       unsigned int blocknr,
-		       tid_t sequence)
-{
-	struct jbd_revoke_record_s *record;
-
-	record = find_revoke_record(journal, blocknr);
-	if (record) {
-		/* If we have multiple occurrences, only record the
-		 * latest sequence number in the hashed record */
-		if (tid_gt(sequence, record->sequence))
-			record->sequence = sequence;
-		return 0;
-	}
-	return insert_revoke_hash(journal, blocknr, sequence);
-}
-
-/*
- * Test revoke records.  For a given block referenced in the log, has
- * that block been revoked?  A revoke record with a given transaction
- * sequence number revokes all blocks in that transaction and earlier
- * ones, but later transactions still need replayed.
- */
-
-int journal_test_revoke(journal_t *journal,
-			unsigned int blocknr,
-			tid_t sequence)
-{
-	struct jbd_revoke_record_s *record;
-
-	record = find_revoke_record(journal, blocknr);
-	if (!record)
-		return 0;
-	if (tid_gt(sequence, record->sequence))
-		return 0;
-	return 1;
-}
-
-/*
- * Finally, once recovery is over, we need to clear the revoke table so
- * that it can be reused by the running filesystem.
- */
-
-void journal_clear_revoke(journal_t *journal)
-{
-	int i;
-	struct list_head *hash_list;
-	struct jbd_revoke_record_s *record;
-	struct jbd_revoke_table_s *revoke;
-
-	revoke = journal->j_revoke;
-
-	for (i = 0; i < revoke->hash_size; i++) {
-		hash_list = &revoke->hash_table[i];
-		while (!list_empty(hash_list)) {
-			record = (struct jbd_revoke_record_s*) hash_list->next;
-			list_del(&record->hash);
-			kmem_cache_free(revoke_record_cache, record);
-		}
-	}
-}
diff --git a/fs/jbd/transaction.c b/fs/jbd/transaction.c
deleted file mode 100644
index 1695ba8334a2..000000000000
--- a/fs/jbd/transaction.c
+++ /dev/null
@@ -1,2237 +0,0 @@
-/*
- * linux/fs/jbd/transaction.c
- *
- * Written by Stephen C. Tweedie <sct@redhat.com>, 1998
- *
- * Copyright 1998 Red Hat corp --- All Rights Reserved
- *
- * This file is part of the Linux kernel and is made available under
- * the terms of the GNU General Public License, version 2, or at your
- * option, any later version, incorporated herein by reference.
- *
- * Generic filesystem transaction handling code; part of the ext2fs
- * journaling system.
- *
- * This file manages transactions (compound commits managed by the
- * journaling code) and handles (individual atomic operations by the
- * filesystem).
- */
-
-#include <linux/time.h>
-#include <linux/fs.h>
-#include <linux/jbd.h>
-#include <linux/errno.h>
-#include <linux/slab.h>
-#include <linux/timer.h>
-#include <linux/mm.h>
-#include <linux/highmem.h>
-#include <linux/hrtimer.h>
-
-static void __journal_temp_unlink_buffer(struct journal_head *jh);
-
-/*
- * get_transaction: obtain a new transaction_t object.
- *
- * Simply allocate and initialise a new transaction.  Create it in
- * RUNNING state and add it to the current journal (which should not
- * have an existing running transaction: we only make a new transaction
- * once we have started to commit the old one).
- *
- * Preconditions:
- *	The journal MUST be locked.  We don't perform atomic mallocs on the
- *	new transaction	and we can't block without protecting against other
- *	processes trying to touch the journal while it is in transition.
- *
- * Called under j_state_lock
- */
-
-static transaction_t *
-get_transaction(journal_t *journal, transaction_t *transaction)
-{
-	transaction->t_journal = journal;
-	transaction->t_state = T_RUNNING;
-	transaction->t_start_time = ktime_get();
-	transaction->t_tid = journal->j_transaction_sequence++;
-	transaction->t_expires = jiffies + journal->j_commit_interval;
-	spin_lock_init(&transaction->t_handle_lock);
-
-	/* Set up the commit timer for the new transaction. */
-	journal->j_commit_timer.expires =
-				round_jiffies_up(transaction->t_expires);
-	add_timer(&journal->j_commit_timer);
-
-	J_ASSERT(journal->j_running_transaction == NULL);
-	journal->j_running_transaction = transaction;
-
-	return transaction;
-}
-
-/*
- * Handle management.
- *
- * A handle_t is an object which represents a single atomic update to a
- * filesystem, and which tracks all of the modifications which form part
- * of that one update.
- */
-
-/*
- * start_this_handle: Given a handle, deal with any locking or stalling
- * needed to make sure that there is enough journal space for the handle
- * to begin.  Attach the handle to a transaction and set up the
- * transaction's buffer credits.
- */
-
-static int start_this_handle(journal_t *journal, handle_t *handle)
-{
-	transaction_t *transaction;
-	int needed;
-	int nblocks = handle->h_buffer_credits;
-	transaction_t *new_transaction = NULL;
-	int ret = 0;
-
-	if (nblocks > journal->j_max_transaction_buffers) {
-		printk(KERN_ERR "JBD: %s wants too many credits (%d > %d)\n",
-		       current->comm, nblocks,
-		       journal->j_max_transaction_buffers);
-		ret = -ENOSPC;
-		goto out;
-	}
-
-alloc_transaction:
-	if (!journal->j_running_transaction) {
-		new_transaction = kzalloc(sizeof(*new_transaction),
-						GFP_NOFS|__GFP_NOFAIL);
-		if (!new_transaction) {
-			ret = -ENOMEM;
-			goto out;
-		}
-	}
-
-	jbd_debug(3, "New handle %p going live.\n", handle);
-
-repeat:
-
-	/*
-	 * We need to hold j_state_lock until t_updates has been incremented,
-	 * for proper journal barrier handling
-	 */
-	spin_lock(&journal->j_state_lock);
-repeat_locked:
-	if (is_journal_aborted(journal) ||
-	    (journal->j_errno != 0 && !(journal->j_flags & JFS_ACK_ERR))) {
-		spin_unlock(&journal->j_state_lock);
-		ret = -EROFS;
-		goto out;
-	}
-
-	/* Wait on the journal's transaction barrier if necessary */
-	if (journal->j_barrier_count) {
-		spin_unlock(&journal->j_state_lock);
-		wait_event(journal->j_wait_transaction_locked,
-				journal->j_barrier_count == 0);
-		goto repeat;
-	}
-
-	if (!journal->j_running_transaction) {
-		if (!new_transaction) {
-			spin_unlock(&journal->j_state_lock);
-			goto alloc_transaction;
-		}
-		get_transaction(journal, new_transaction);
-		new_transaction = NULL;
-	}
-
-	transaction = journal->j_running_transaction;
-
-	/*
-	 * If the current transaction is locked down for commit, wait for the
-	 * lock to be released.
-	 */
-	if (transaction->t_state == T_LOCKED) {
-		DEFINE_WAIT(wait);
-
-		prepare_to_wait(&journal->j_wait_transaction_locked,
-					&wait, TASK_UNINTERRUPTIBLE);
-		spin_unlock(&journal->j_state_lock);
-		schedule();
-		finish_wait(&journal->j_wait_transaction_locked, &wait);
-		goto repeat;
-	}
-
-	/*
-	 * If there is not enough space left in the log to write all potential
-	 * buffers requested by this operation, we need to stall pending a log
-	 * checkpoint to free some more log space.
-	 */
-	spin_lock(&transaction->t_handle_lock);
-	needed = transaction->t_outstanding_credits + nblocks;
-
-	if (needed > journal->j_max_transaction_buffers) {
-		/*
-		 * If the current transaction is already too large, then start
-		 * to commit it: we can then go back and attach this handle to
-		 * a new transaction.
-		 */
-		DEFINE_WAIT(wait);
-
-		jbd_debug(2, "Handle %p starting new commit...\n", handle);
-		spin_unlock(&transaction->t_handle_lock);
-		prepare_to_wait(&journal->j_wait_transaction_locked, &wait,
-				TASK_UNINTERRUPTIBLE);
-		__log_start_commit(journal, transaction->t_tid);
-		spin_unlock(&journal->j_state_lock);
-		schedule();
-		finish_wait(&journal->j_wait_transaction_locked, &wait);
-		goto repeat;
-	}
-
-	/*
-	 * The commit code assumes that it can get enough log space
-	 * without forcing a checkpoint.  This is *critical* for
-	 * correctness: a checkpoint of a buffer which is also
-	 * associated with a committing transaction creates a deadlock,
-	 * so commit simply cannot force through checkpoints.
-	 *
-	 * We must therefore ensure the necessary space in the journal
-	 * *before* starting to dirty potentially checkpointed buffers
-	 * in the new transaction.
-	 *
-	 * The worst part is, any transaction currently committing can
-	 * reduce the free space arbitrarily.  Be careful to account for
-	 * those buffers when checkpointing.
-	 */
-
-	/*
-	 * @@@ AKPM: This seems rather over-defensive.  We're giving commit
-	 * a _lot_ of headroom: 1/4 of the journal plus the size of
-	 * the committing transaction.  Really, we only need to give it
-	 * committing_transaction->t_outstanding_credits plus "enough" for
-	 * the log control blocks.
-	 * Also, this test is inconsistent with the matching one in
-	 * journal_extend().
-	 */
-	if (__log_space_left(journal) < jbd_space_needed(journal)) {
-		jbd_debug(2, "Handle %p waiting for checkpoint...\n", handle);
-		spin_unlock(&transaction->t_handle_lock);
-		__log_wait_for_space(journal);
-		goto repeat_locked;
-	}
-
-	/* OK, account for the buffers that this operation expects to
-	 * use and add the handle to the running transaction. */
-
-	handle->h_transaction = transaction;
-	transaction->t_outstanding_credits += nblocks;
-	transaction->t_updates++;
-	transaction->t_handle_count++;
-	jbd_debug(4, "Handle %p given %d credits (total %d, free %d)\n",
-		  handle, nblocks, transaction->t_outstanding_credits,
-		  __log_space_left(journal));
-	spin_unlock(&transaction->t_handle_lock);
-	spin_unlock(&journal->j_state_lock);
-
-	lock_map_acquire(&handle->h_lockdep_map);
-out:
-	if (unlikely(new_transaction))		/* It's usually NULL */
-		kfree(new_transaction);
-	return ret;
-}
-
-static struct lock_class_key jbd_handle_key;
-
-/* Allocate a new handle.  This should probably be in a slab... */
-static handle_t *new_handle(int nblocks)
-{
-	handle_t *handle = jbd_alloc_handle(GFP_NOFS);
-	if (!handle)
-		return NULL;
-	handle->h_buffer_credits = nblocks;
-	handle->h_ref = 1;
-
-	lockdep_init_map(&handle->h_lockdep_map, "jbd_handle", &jbd_handle_key, 0);
-
-	return handle;
-}
-
-/**
- * handle_t *journal_start() - Obtain a new handle.
- * @journal: Journal to start transaction on.
- * @nblocks: number of block buffer we might modify
- *
- * We make sure that the transaction can guarantee at least nblocks of
- * modified buffers in the log.  We block until the log can guarantee
- * that much space.
- *
- * This function is visible to journal users (like ext3fs), so is not
- * called with the journal already locked.
- *
- * Return a pointer to a newly allocated handle, or an ERR_PTR() value
- * on failure.
- */
-handle_t *journal_start(journal_t *journal, int nblocks)
-{
-	handle_t *handle = journal_current_handle();
-	int err;
-
-	if (!journal)
-		return ERR_PTR(-EROFS);
-
-	if (handle) {
-		J_ASSERT(handle->h_transaction->t_journal == journal);
-		handle->h_ref++;
-		return handle;
-	}
-
-	handle = new_handle(nblocks);
-	if (!handle)
-		return ERR_PTR(-ENOMEM);
-
-	current->journal_info = handle;
-
-	err = start_this_handle(journal, handle);
-	if (err < 0) {
-		jbd_free_handle(handle);
-		current->journal_info = NULL;
-		handle = ERR_PTR(err);
-	}
-	return handle;
-}
-
-/**
- * int journal_extend() - extend buffer credits.
- * @handle:  handle to 'extend'
- * @nblocks: nr blocks to try to extend by.
- *
- * Some transactions, such as large extends and truncates, can be done
- * atomically all at once or in several stages.  The operation requests
- * a credit for a number of buffer modications in advance, but can
- * extend its credit if it needs more.
- *
- * journal_extend tries to give the running handle more buffer credits.
- * It does not guarantee that allocation - this is a best-effort only.
- * The calling process MUST be able to deal cleanly with a failure to
- * extend here.
- *
- * Return 0 on success, non-zero on failure.
- *
- * return code < 0 implies an error
- * return code > 0 implies normal transaction-full status.
- */
-int journal_extend(handle_t *handle, int nblocks)
-{
-	transaction_t *transaction = handle->h_transaction;
-	journal_t *journal = transaction->t_journal;
-	int result;
-	int wanted;
-
-	result = -EIO;
-	if (is_handle_aborted(handle))
-		goto out;
-
-	result = 1;
-
-	spin_lock(&journal->j_state_lock);
-
-	/* Don't extend a locked-down transaction! */
-	if (handle->h_transaction->t_state != T_RUNNING) {
-		jbd_debug(3, "denied handle %p %d blocks: "
-			  "transaction not running\n", handle, nblocks);
-		goto error_out;
-	}
-
-	spin_lock(&transaction->t_handle_lock);
-	wanted = transaction->t_outstanding_credits + nblocks;
-
-	if (wanted > journal->j_max_transaction_buffers) {
-		jbd_debug(3, "denied handle %p %d blocks: "
-			  "transaction too large\n", handle, nblocks);
-		goto unlock;
-	}
-
-	if (wanted > __log_space_left(journal)) {
-		jbd_debug(3, "denied handle %p %d blocks: "
-			  "insufficient log space\n", handle, nblocks);
-		goto unlock;
-	}
-
-	handle->h_buffer_credits += nblocks;
-	transaction->t_outstanding_credits += nblocks;
-	result = 0;
-
-	jbd_debug(3, "extended handle %p by %d\n", handle, nblocks);
-unlock:
-	spin_unlock(&transaction->t_handle_lock);
-error_out:
-	spin_unlock(&journal->j_state_lock);
-out:
-	return result;
-}
-
-
-/**
- * int journal_restart() - restart a handle.
- * @handle:  handle to restart
- * @nblocks: nr credits requested
- *
- * Restart a handle for a multi-transaction filesystem
- * operation.
- *
- * If the journal_extend() call above fails to grant new buffer credits
- * to a running handle, a call to journal_restart will commit the
- * handle's transaction so far and reattach the handle to a new
- * transaction capabable of guaranteeing the requested number of
- * credits.
- */
-
-int journal_restart(handle_t *handle, int nblocks)
-{
-	transaction_t *transaction = handle->h_transaction;
-	journal_t *journal = transaction->t_journal;
-	int ret;
-
-	/* If we've had an abort of any type, don't even think about
-	 * actually doing the restart! */
-	if (is_handle_aborted(handle))
-		return 0;
-
-	/*
-	 * First unlink the handle from its current transaction, and start the
-	 * commit on that.
-	 */
-	J_ASSERT(transaction->t_updates > 0);
-	J_ASSERT(journal_current_handle() == handle);
-
-	spin_lock(&journal->j_state_lock);
-	spin_lock(&transaction->t_handle_lock);
-	transaction->t_outstanding_credits -= handle->h_buffer_credits;
-	transaction->t_updates--;
-
-	if (!transaction->t_updates)
-		wake_up(&journal->j_wait_updates);
-	spin_unlock(&transaction->t_handle_lock);
-
-	jbd_debug(2, "restarting handle %p\n", handle);
-	__log_start_commit(journal, transaction->t_tid);
-	spin_unlock(&journal->j_state_lock);
-
-	lock_map_release(&handle->h_lockdep_map);
-	handle->h_buffer_credits = nblocks;
-	ret = start_this_handle(journal, handle);
-	return ret;
-}
-
-
-/**
- * void journal_lock_updates () - establish a transaction barrier.
- * @journal:  Journal to establish a barrier on.
- *
- * This locks out any further updates from being started, and blocks until all
- * existing updates have completed, returning only once the journal is in a
- * quiescent state with no updates running.
- *
- * We do not use simple mutex for synchronization as there are syscalls which
- * want to return with filesystem locked and that trips up lockdep. Also
- * hibernate needs to lock filesystem but locked mutex then blocks hibernation.
- * Since locking filesystem is rare operation, we use simple counter and
- * waitqueue for locking.
- */
-void journal_lock_updates(journal_t *journal)
-{
-	DEFINE_WAIT(wait);
-
-wait:
-	/* Wait for previous locked operation to finish */
-	wait_event(journal->j_wait_transaction_locked,
-		   journal->j_barrier_count == 0);
-
-	spin_lock(&journal->j_state_lock);
-	/*
-	 * Check reliably under the lock whether we are the ones winning the race
-	 * and locking the journal
-	 */
-	if (journal->j_barrier_count > 0) {
-		spin_unlock(&journal->j_state_lock);
-		goto wait;
-	}
-	++journal->j_barrier_count;
-
-	/* Wait until there are no running updates */
-	while (1) {
-		transaction_t *transaction = journal->j_running_transaction;
-
-		if (!transaction)
-			break;
-
-		spin_lock(&transaction->t_handle_lock);
-		if (!transaction->t_updates) {
-			spin_unlock(&transaction->t_handle_lock);
-			break;
-		}
-		prepare_to_wait(&journal->j_wait_updates, &wait,
-				TASK_UNINTERRUPTIBLE);
-		spin_unlock(&transaction->t_handle_lock);
-		spin_unlock(&journal->j_state_lock);
-		schedule();
-		finish_wait(&journal->j_wait_updates, &wait);
-		spin_lock(&journal->j_state_lock);
-	}
-	spin_unlock(&journal->j_state_lock);
-}
-
-/**
- * void journal_unlock_updates (journal_t* journal) - release barrier
- * @journal:  Journal to release the barrier on.
- *
- * Release a transaction barrier obtained with journal_lock_updates().
- */
-void journal_unlock_updates (journal_t *journal)
-{
-	J_ASSERT(journal->j_barrier_count != 0);
-
-	spin_lock(&journal->j_state_lock);
-	--journal->j_barrier_count;
-	spin_unlock(&journal->j_state_lock);
-	wake_up(&journal->j_wait_transaction_locked);
-}
-
-static void warn_dirty_buffer(struct buffer_head *bh)
-{
-	char b[BDEVNAME_SIZE];
-
-	printk(KERN_WARNING
-	       "JBD: Spotted dirty metadata buffer (dev = %s, blocknr = %llu). "
-	       "There's a risk of filesystem corruption in case of system "
-	       "crash.\n",
-	       bdevname(bh->b_bdev, b), (unsigned long long)bh->b_blocknr);
-}
-
-/*
- * If the buffer is already part of the current transaction, then there
- * is nothing we need to do.  If it is already part of a prior
- * transaction which we are still committing to disk, then we need to
- * make sure that we do not overwrite the old copy: we do copy-out to
- * preserve the copy going to disk.  We also account the buffer against
- * the handle's metadata buffer credits (unless the buffer is already
- * part of the transaction, that is).
- *
- */
-static int
-do_get_write_access(handle_t *handle, struct journal_head *jh,
-			int force_copy)
-{
-	struct buffer_head *bh;
-	transaction_t *transaction;
-	journal_t *journal;
-	int error;
-	char *frozen_buffer = NULL;
-	int need_copy = 0;
-
-	if (is_handle_aborted(handle))
-		return -EROFS;
-
-	transaction = handle->h_transaction;
-	journal = transaction->t_journal;
-
-	jbd_debug(5, "journal_head %p, force_copy %d\n", jh, force_copy);
-
-	JBUFFER_TRACE(jh, "entry");
-repeat:
-	bh = jh2bh(jh);
-
-	/* @@@ Need to check for errors here at some point. */
-
-	lock_buffer(bh);
-	jbd_lock_bh_state(bh);
-
-	/* We now hold the buffer lock so it is safe to query the buffer
-	 * state.  Is the buffer dirty?
-	 *
-	 * If so, there are two possibilities.  The buffer may be
-	 * non-journaled, and undergoing a quite legitimate writeback.
-	 * Otherwise, it is journaled, and we don't expect dirty buffers
-	 * in that state (the buffers should be marked JBD_Dirty
-	 * instead.)  So either the IO is being done under our own
-	 * control and this is a bug, or it's a third party IO such as
-	 * dump(8) (which may leave the buffer scheduled for read ---
-	 * ie. locked but not dirty) or tune2fs (which may actually have
-	 * the buffer dirtied, ugh.)  */
-
-	if (buffer_dirty(bh)) {
-		/*
-		 * First question: is this buffer already part of the current
-		 * transaction or the existing committing transaction?
-		 */
-		if (jh->b_transaction) {
-			J_ASSERT_JH(jh,
-				jh->b_transaction == transaction ||
-				jh->b_transaction ==
-					journal->j_committing_transaction);
-			if (jh->b_next_transaction)
-				J_ASSERT_JH(jh, jh->b_next_transaction ==
-							transaction);
-			warn_dirty_buffer(bh);
-		}
-		/*
-		 * In any case we need to clean the dirty flag and we must
-		 * do it under the buffer lock to be sure we don't race
-		 * with running write-out.
-		 */
-		JBUFFER_TRACE(jh, "Journalling dirty buffer");
-		clear_buffer_dirty(bh);
-		set_buffer_jbddirty(bh);
-	}
-
-	unlock_buffer(bh);
-
-	error = -EROFS;
-	if (is_handle_aborted(handle)) {
-		jbd_unlock_bh_state(bh);
-		goto out;
-	}
-	error = 0;
-
-	/*
-	 * The buffer is already part of this transaction if b_transaction or
-	 * b_next_transaction points to it
-	 */
-	if (jh->b_transaction == transaction ||
-	    jh->b_next_transaction == transaction)
-		goto done;
-
-	/*
-	 * this is the first time this transaction is touching this buffer,
-	 * reset the modified flag
-	 */
-	jh->b_modified = 0;
-
-	/*
-	 * If there is already a copy-out version of this buffer, then we don't
-	 * need to make another one
-	 */
-	if (jh->b_frozen_data) {
-		JBUFFER_TRACE(jh, "has frozen data");
-		J_ASSERT_JH(jh, jh->b_next_transaction == NULL);
-		jh->b_next_transaction = transaction;
-		goto done;
-	}
-
-	/* Is there data here we need to preserve? */
-
-	if (jh->b_transaction && jh->b_transaction != transaction) {
-		JBUFFER_TRACE(jh, "owned by older transaction");
-		J_ASSERT_JH(jh, jh->b_next_transaction == NULL);
-		J_ASSERT_JH(jh, jh->b_transaction ==
-					journal->j_committing_transaction);
-
-		/* There is one case we have to be very careful about.
-		 * If the committing transaction is currently writing
-		 * this buffer out to disk and has NOT made a copy-out,
-		 * then we cannot modify the buffer contents at all
-		 * right now.  The essence of copy-out is that it is the
-		 * extra copy, not the primary copy, which gets
-		 * journaled.  If the primary copy is already going to
-		 * disk then we cannot do copy-out here. */
-
-		if (jh->b_jlist == BJ_Shadow) {
-			DEFINE_WAIT_BIT(wait, &bh->b_state, BH_Unshadow);
-			wait_queue_head_t *wqh;
-
-			wqh = bit_waitqueue(&bh->b_state, BH_Unshadow);
-
-			JBUFFER_TRACE(jh, "on shadow: sleep");
-			jbd_unlock_bh_state(bh);
-			/* commit wakes up all shadow buffers after IO */
-			for ( ; ; ) {
-				prepare_to_wait(wqh, &wait.wait,
-						TASK_UNINTERRUPTIBLE);
-				if (jh->b_jlist != BJ_Shadow)
-					break;
-				schedule();
-			}
-			finish_wait(wqh, &wait.wait);
-			goto repeat;
-		}
-
-		/* Only do the copy if the currently-owning transaction
-		 * still needs it.  If it is on the Forget list, the
-		 * committing transaction is past that stage.  The
-		 * buffer had better remain locked during the kmalloc,
-		 * but that should be true --- we hold the journal lock
-		 * still and the buffer is already on the BUF_JOURNAL
-		 * list so won't be flushed.
-		 *
-		 * Subtle point, though: if this is a get_undo_access,
-		 * then we will be relying on the frozen_data to contain
-		 * the new value of the committed_data record after the
-		 * transaction, so we HAVE to force the frozen_data copy
-		 * in that case. */
-
-		if (jh->b_jlist != BJ_Forget || force_copy) {
-			JBUFFER_TRACE(jh, "generate frozen data");
-			if (!frozen_buffer) {
-				JBUFFER_TRACE(jh, "allocate memory for buffer");
-				jbd_unlock_bh_state(bh);
-				frozen_buffer =
-					jbd_alloc(jh2bh(jh)->b_size,
-							 GFP_NOFS);
-				if (!frozen_buffer) {
-					printk(KERN_ERR
-					       "%s: OOM for frozen_buffer\n",
-					       __func__);
-					JBUFFER_TRACE(jh, "oom!");
-					error = -ENOMEM;
-					jbd_lock_bh_state(bh);
-					goto done;
-				}
-				goto repeat;
-			}
-			jh->b_frozen_data = frozen_buffer;
-			frozen_buffer = NULL;
-			need_copy = 1;
-		}
-		jh->b_next_transaction = transaction;
-	}
-
-
-	/*
-	 * Finally, if the buffer is not journaled right now, we need to make
-	 * sure it doesn't get written to disk before the caller actually
-	 * commits the new data
-	 */
-	if (!jh->b_transaction) {
-		JBUFFER_TRACE(jh, "no transaction");
-		J_ASSERT_JH(jh, !jh->b_next_transaction);
-		JBUFFER_TRACE(jh, "file as BJ_Reserved");
-		spin_lock(&journal->j_list_lock);
-		__journal_file_buffer(jh, transaction, BJ_Reserved);
-		spin_unlock(&journal->j_list_lock);
-	}
-
-done:
-	if (need_copy) {
-		struct page *page;
-		int offset;
-		char *source;
-
-		J_EXPECT_JH(jh, buffer_uptodate(jh2bh(jh)),
-			    "Possible IO failure.\n");
-		page = jh2bh(jh)->b_page;
-		offset = offset_in_page(jh2bh(jh)->b_data);
-		source = kmap_atomic(page);
-		memcpy(jh->b_frozen_data, source+offset, jh2bh(jh)->b_size);
-		kunmap_atomic(source);
-	}
-	jbd_unlock_bh_state(bh);
-
-	/*
-	 * If we are about to journal a buffer, then any revoke pending on it is
-	 * no longer valid
-	 */
-	journal_cancel_revoke(handle, jh);
-
-out:
-	if (unlikely(frozen_buffer))	/* It's usually NULL */
-		jbd_free(frozen_buffer, bh->b_size);
-
-	JBUFFER_TRACE(jh, "exit");
-	return error;
-}
-
-/**
- * int journal_get_write_access() - notify intent to modify a buffer for metadata (not data) update.
- * @handle: transaction to add buffer modifications to
- * @bh:     bh to be used for metadata writes
- *
- * Returns an error code or 0 on success.
- *
- * In full data journalling mode the buffer may be of type BJ_AsyncData,
- * because we're write()ing a buffer which is also part of a shared mapping.
- */
-
-int journal_get_write_access(handle_t *handle, struct buffer_head *bh)
-{
-	struct journal_head *jh = journal_add_journal_head(bh);
-	int rc;
-
-	/* We do not want to get caught playing with fields which the
-	 * log thread also manipulates.  Make sure that the buffer
-	 * completes any outstanding IO before proceeding. */
-	rc = do_get_write_access(handle, jh, 0);
-	journal_put_journal_head(jh);
-	return rc;
-}
-
-
-/*
- * When the user wants to journal a newly created buffer_head
- * (ie. getblk() returned a new buffer and we are going to populate it
- * manually rather than reading off disk), then we need to keep the
- * buffer_head locked until it has been completely filled with new
- * data.  In this case, we should be able to make the assertion that
- * the bh is not already part of an existing transaction.
- *
- * The buffer should already be locked by the caller by this point.
- * There is no lock ranking violation: it was a newly created,
- * unlocked buffer beforehand. */
-
-/**
- * int journal_get_create_access () - notify intent to use newly created bh
- * @handle: transaction to new buffer to
- * @bh: new buffer.
- *
- * Call this if you create a new bh.
- */
-int journal_get_create_access(handle_t *handle, struct buffer_head *bh)
-{
-	transaction_t *transaction = handle->h_transaction;
-	journal_t *journal = transaction->t_journal;
-	struct journal_head *jh = journal_add_journal_head(bh);
-	int err;
-
-	jbd_debug(5, "journal_head %p\n", jh);
-	err = -EROFS;
-	if (is_handle_aborted(handle))
-		goto out;
-	err = 0;
-
-	JBUFFER_TRACE(jh, "entry");
-	/*
-	 * The buffer may already belong to this transaction due to pre-zeroing
-	 * in the filesystem's new_block code.  It may also be on the previous,
-	 * committing transaction's lists, but it HAS to be in Forget state in
-	 * that case: the transaction must have deleted the buffer for it to be
-	 * reused here.
-	 */
-	jbd_lock_bh_state(bh);
-	spin_lock(&journal->j_list_lock);
-	J_ASSERT_JH(jh, (jh->b_transaction == transaction ||
-		jh->b_transaction == NULL ||
-		(jh->b_transaction == journal->j_committing_transaction &&
-			  jh->b_jlist == BJ_Forget)));
-
-	J_ASSERT_JH(jh, jh->b_next_transaction == NULL);
-	J_ASSERT_JH(jh, buffer_locked(jh2bh(jh)));
-
-	if (jh->b_transaction == NULL) {
-		/*
-		 * Previous journal_forget() could have left the buffer
-		 * with jbddirty bit set because it was being committed. When
-		 * the commit finished, we've filed the buffer for
-		 * checkpointing and marked it dirty. Now we are reallocating
-		 * the buffer so the transaction freeing it must have
-		 * committed and so it's safe to clear the dirty bit.
-		 */
-		clear_buffer_dirty(jh2bh(jh));
-
-		/* first access by this transaction */
-		jh->b_modified = 0;
-
-		JBUFFER_TRACE(jh, "file as BJ_Reserved");
-		__journal_file_buffer(jh, transaction, BJ_Reserved);
-	} else if (jh->b_transaction == journal->j_committing_transaction) {
-		/* first access by this transaction */
-		jh->b_modified = 0;
-
-		JBUFFER_TRACE(jh, "set next transaction");
-		jh->b_next_transaction = transaction;
-	}
-	spin_unlock(&journal->j_list_lock);
-	jbd_unlock_bh_state(bh);
-
-	/*
-	 * akpm: I added this.  ext3_alloc_branch can pick up new indirect
-	 * blocks which contain freed but then revoked metadata.  We need
-	 * to cancel the revoke in case we end up freeing it yet again
-	 * and the reallocating as data - this would cause a second revoke,
-	 * which hits an assertion error.
-	 */
-	JBUFFER_TRACE(jh, "cancelling revoke");
-	journal_cancel_revoke(handle, jh);
-out:
-	journal_put_journal_head(jh);
-	return err;
-}
-
-/**
- * int journal_get_undo_access() - Notify intent to modify metadata with non-rewindable consequences
- * @handle: transaction
- * @bh: buffer to undo
- *
- * Sometimes there is a need to distinguish between metadata which has
- * been committed to disk and that which has not.  The ext3fs code uses
- * this for freeing and allocating space, we have to make sure that we
- * do not reuse freed space until the deallocation has been committed,
- * since if we overwrote that space we would make the delete
- * un-rewindable in case of a crash.
- *
- * To deal with that, journal_get_undo_access requests write access to a
- * buffer for parts of non-rewindable operations such as delete
- * operations on the bitmaps.  The journaling code must keep a copy of
- * the buffer's contents prior to the undo_access call until such time
- * as we know that the buffer has definitely been committed to disk.
- *
- * We never need to know which transaction the committed data is part
- * of, buffers touched here are guaranteed to be dirtied later and so
- * will be committed to a new transaction in due course, at which point
- * we can discard the old committed data pointer.
- *
- * Returns error number or 0 on success.
- */
-int journal_get_undo_access(handle_t *handle, struct buffer_head *bh)
-{
-	int err;
-	struct journal_head *jh = journal_add_journal_head(bh);
-	char *committed_data = NULL;
-
-	JBUFFER_TRACE(jh, "entry");
-
-	/*
-	 * Do this first --- it can drop the journal lock, so we want to
-	 * make sure that obtaining the committed_data is done
-	 * atomically wrt. completion of any outstanding commits.
-	 */
-	err = do_get_write_access(handle, jh, 1);
-	if (err)
-		goto out;
-
-repeat:
-	if (!jh->b_committed_data) {
-		committed_data = jbd_alloc(jh2bh(jh)->b_size, GFP_NOFS);
-		if (!committed_data) {
-			printk(KERN_ERR "%s: No memory for committed data\n",
-				__func__);
-			err = -ENOMEM;
-			goto out;
-		}
-	}
-
-	jbd_lock_bh_state(bh);
-	if (!jh->b_committed_data) {
-		/* Copy out the current buffer contents into the
-		 * preserved, committed copy. */
-		JBUFFER_TRACE(jh, "generate b_committed data");
-		if (!committed_data) {
-			jbd_unlock_bh_state(bh);
-			goto repeat;
-		}
-
-		jh->b_committed_data = committed_data;
-		committed_data = NULL;
-		memcpy(jh->b_committed_data, bh->b_data, bh->b_size);
-	}
-	jbd_unlock_bh_state(bh);
-out:
-	journal_put_journal_head(jh);
-	if (unlikely(committed_data))
-		jbd_free(committed_data, bh->b_size);
-	return err;
-}
-
-/**
- * int journal_dirty_data() - mark a buffer as containing dirty data to be flushed
- * @handle: transaction
- * @bh: bufferhead to mark
- *
- * Description:
- * Mark a buffer as containing dirty data which needs to be flushed before
- * we can commit the current transaction.
- *
- * The buffer is placed on the transaction's data list and is marked as
- * belonging to the transaction.
- *
- * Returns error number or 0 on success.
- *
- * journal_dirty_data() can be called via page_launder->ext3_writepage
- * by kswapd.
- */
-int journal_dirty_data(handle_t *handle, struct buffer_head *bh)
-{
-	journal_t *journal = handle->h_transaction->t_journal;
-	int need_brelse = 0;
-	struct journal_head *jh;
-	int ret = 0;
-
-	if (is_handle_aborted(handle))
-		return ret;
-
-	jh = journal_add_journal_head(bh);
-	JBUFFER_TRACE(jh, "entry");
-
-	/*
-	 * The buffer could *already* be dirty.  Writeout can start
-	 * at any time.
-	 */
-	jbd_debug(4, "jh: %p, tid:%d\n", jh, handle->h_transaction->t_tid);
-
-	/*
-	 * What if the buffer is already part of a running transaction?
-	 *
-	 * There are two cases:
-	 * 1) It is part of the current running transaction.  Refile it,
-	 *    just in case we have allocated it as metadata, deallocated
-	 *    it, then reallocated it as data.
-	 * 2) It is part of the previous, still-committing transaction.
-	 *    If all we want to do is to guarantee that the buffer will be
-	 *    written to disk before this new transaction commits, then
-	 *    being sure that the *previous* transaction has this same
-	 *    property is sufficient for us!  Just leave it on its old
-	 *    transaction.
-	 *
-	 * In case (2), the buffer must not already exist as metadata
-	 * --- that would violate write ordering (a transaction is free
-	 * to write its data at any point, even before the previous
-	 * committing transaction has committed).  The caller must
-	 * never, ever allow this to happen: there's nothing we can do
-	 * about it in this layer.
-	 */
-	jbd_lock_bh_state(bh);
-	spin_lock(&journal->j_list_lock);
-
-	/* Now that we have bh_state locked, are we really still mapped? */
-	if (!buffer_mapped(bh)) {
-		JBUFFER_TRACE(jh, "unmapped buffer, bailing out");
-		goto no_journal;
-	}
-
-	if (jh->b_transaction) {
-		JBUFFER_TRACE(jh, "has transaction");
-		if (jh->b_transaction != handle->h_transaction) {
-			JBUFFER_TRACE(jh, "belongs to older transaction");
-			J_ASSERT_JH(jh, jh->b_transaction ==
-					journal->j_committing_transaction);
-
-			/* @@@ IS THIS TRUE  ? */
-			/*
-			 * Not any more.  Scenario: someone does a write()
-			 * in data=journal mode.  The buffer's transaction has
-			 * moved into commit.  Then someone does another
-			 * write() to the file.  We do the frozen data copyout
-			 * and set b_next_transaction to point to j_running_t.
-			 * And while we're in that state, someone does a
-			 * writepage() in an attempt to pageout the same area
-			 * of the file via a shared mapping.  At present that
-			 * calls journal_dirty_data(), and we get right here.
-			 * It may be too late to journal the data.  Simply
-			 * falling through to the next test will suffice: the
-			 * data will be dirty and wil be checkpointed.  The
-			 * ordering comments in the next comment block still
-			 * apply.
-			 */
-			//J_ASSERT_JH(jh, jh->b_next_transaction == NULL);
-
-			/*
-			 * If we're journalling data, and this buffer was
-			 * subject to a write(), it could be metadata, forget
-			 * or shadow against the committing transaction.  Now,
-			 * someone has dirtied the same darn page via a mapping
-			 * and it is being writepage()'d.
-			 * We *could* just steal the page from commit, with some
-			 * fancy locking there.  Instead, we just skip it -
-			 * don't tie the page's buffers to the new transaction
-			 * at all.
-			 * Implication: if we crash before the writepage() data
-			 * is written into the filesystem, recovery will replay
-			 * the write() data.
-			 */
-			if (jh->b_jlist != BJ_None &&
-					jh->b_jlist != BJ_SyncData &&
-					jh->b_jlist != BJ_Locked) {
-				JBUFFER_TRACE(jh, "Not stealing");
-				goto no_journal;
-			}
-
-			/*
-			 * This buffer may be undergoing writeout in commit.  We
-			 * can't return from here and let the caller dirty it
-			 * again because that can cause the write-out loop in
-			 * commit to never terminate.
-			 */
-			if (buffer_dirty(bh)) {
-				get_bh(bh);
-				spin_unlock(&journal->j_list_lock);
-				jbd_unlock_bh_state(bh);
-				need_brelse = 1;
-				sync_dirty_buffer(bh);
-				jbd_lock_bh_state(bh);
-				spin_lock(&journal->j_list_lock);
-				/* Since we dropped the lock... */
-				if (!buffer_mapped(bh)) {
-					JBUFFER_TRACE(jh, "buffer got unmapped");
-					goto no_journal;
-				}
-				/* The buffer may become locked again at any
-				   time if it is redirtied */
-			}
-
-			/*
-			 * We cannot remove the buffer with io error from the
-			 * committing transaction, because otherwise it would
-			 * miss the error and the commit would not abort.
-			 */
-			if (unlikely(!buffer_uptodate(bh))) {
-				ret = -EIO;
-				goto no_journal;
-			}
-			/* We might have slept so buffer could be refiled now */
-			if (jh->b_transaction != NULL &&
-			    jh->b_transaction != handle->h_transaction) {
-				JBUFFER_TRACE(jh, "unfile from commit");
-				__journal_temp_unlink_buffer(jh);
-				/* It still points to the committing
-				 * transaction; move it to this one so
-				 * that the refile assert checks are
-				 * happy. */
-				jh->b_transaction = handle->h_transaction;
-			}
-			/* The buffer will be refiled below */
-
-		}
-		/*
-		 * Special case --- the buffer might actually have been
-		 * allocated and then immediately deallocated in the previous,
-		 * committing transaction, so might still be left on that
-		 * transaction's metadata lists.
-		 */
-		if (jh->b_jlist != BJ_SyncData && jh->b_jlist != BJ_Locked) {
-			JBUFFER_TRACE(jh, "not on correct data list: unfile");
-			J_ASSERT_JH(jh, jh->b_jlist != BJ_Shadow);
-			JBUFFER_TRACE(jh, "file as data");
-			__journal_file_buffer(jh, handle->h_transaction,
-						BJ_SyncData);
-		}
-	} else {
-		JBUFFER_TRACE(jh, "not on a transaction");
-		__journal_file_buffer(jh, handle->h_transaction, BJ_SyncData);
-	}
-no_journal:
-	spin_unlock(&journal->j_list_lock);
-	jbd_unlock_bh_state(bh);
-	if (need_brelse) {
-		BUFFER_TRACE(bh, "brelse");
-		__brelse(bh);
-	}
-	JBUFFER_TRACE(jh, "exit");
-	journal_put_journal_head(jh);
-	return ret;
-}
-
-/**
- * int journal_dirty_metadata() - mark a buffer as containing dirty metadata
- * @handle: transaction to add buffer to.
- * @bh: buffer to mark
- *
- * Mark dirty metadata which needs to be journaled as part of the current
- * transaction.
- *
- * The buffer is placed on the transaction's metadata list and is marked
- * as belonging to the transaction.
- *
- * Returns error number or 0 on success.
- *
- * Special care needs to be taken if the buffer already belongs to the
- * current committing transaction (in which case we should have frozen
- * data present for that commit).  In that case, we don't relink the
- * buffer: that only gets done when the old transaction finally
- * completes its commit.
- */
-int journal_dirty_metadata(handle_t *handle, struct buffer_head *bh)
-{
-	transaction_t *transaction = handle->h_transaction;
-	journal_t *journal = transaction->t_journal;
-	struct journal_head *jh = bh2jh(bh);
-
-	jbd_debug(5, "journal_head %p\n", jh);
-	JBUFFER_TRACE(jh, "entry");
-	if (is_handle_aborted(handle))
-		goto out;
-
-	jbd_lock_bh_state(bh);
-
-	if (jh->b_modified == 0) {
-		/*
-		 * This buffer's got modified and becoming part
-		 * of the transaction. This needs to be done
-		 * once a transaction -bzzz
-		 */
-		jh->b_modified = 1;
-		J_ASSERT_JH(jh, handle->h_buffer_credits > 0);
-		handle->h_buffer_credits--;
-	}
-
-	/*
-	 * fastpath, to avoid expensive locking.  If this buffer is already
-	 * on the running transaction's metadata list there is nothing to do.
-	 * Nobody can take it off again because there is a handle open.
-	 * I _think_ we're OK here with SMP barriers - a mistaken decision will
-	 * result in this test being false, so we go in and take the locks.
-	 */
-	if (jh->b_transaction == transaction && jh->b_jlist == BJ_Metadata) {
-		JBUFFER_TRACE(jh, "fastpath");
-		J_ASSERT_JH(jh, jh->b_transaction ==
-					journal->j_running_transaction);
-		goto out_unlock_bh;
-	}
-
-	set_buffer_jbddirty(bh);
-
-	/*
-	 * Metadata already on the current transaction list doesn't
-	 * need to be filed.  Metadata on another transaction's list must
-	 * be committing, and will be refiled once the commit completes:
-	 * leave it alone for now.
-	 */
-	if (jh->b_transaction != transaction) {
-		JBUFFER_TRACE(jh, "already on other transaction");
-		J_ASSERT_JH(jh, jh->b_transaction ==
-					journal->j_committing_transaction);
-		J_ASSERT_JH(jh, jh->b_next_transaction == transaction);
-		/* And this case is illegal: we can't reuse another
-		 * transaction's data buffer, ever. */
-		goto out_unlock_bh;
-	}
-
-	/* That test should have eliminated the following case: */
-	J_ASSERT_JH(jh, jh->b_frozen_data == NULL);
-
-	JBUFFER_TRACE(jh, "file as BJ_Metadata");
-	spin_lock(&journal->j_list_lock);
-	__journal_file_buffer(jh, handle->h_transaction, BJ_Metadata);
-	spin_unlock(&journal->j_list_lock);
-out_unlock_bh:
-	jbd_unlock_bh_state(bh);
-out:
-	JBUFFER_TRACE(jh, "exit");
-	return 0;
-}
-
-/*
- * journal_release_buffer: undo a get_write_access without any buffer
- * updates, if the update decided in the end that it didn't need access.
- *
- */
-void
-journal_release_buffer(handle_t *handle, struct buffer_head *bh)
-{
-	BUFFER_TRACE(bh, "entry");
-}
-
-/**
- * void journal_forget() - bforget() for potentially-journaled buffers.
- * @handle: transaction handle
- * @bh:     bh to 'forget'
- *
- * We can only do the bforget if there are no commits pending against the
- * buffer.  If the buffer is dirty in the current running transaction we
- * can safely unlink it.
- *
- * bh may not be a journalled buffer at all - it may be a non-JBD
- * buffer which came off the hashtable.  Check for this.
- *
- * Decrements bh->b_count by one.
- *
- * Allow this call even if the handle has aborted --- it may be part of
- * the caller's cleanup after an abort.
- */
-int journal_forget (handle_t *handle, struct buffer_head *bh)
-{
-	transaction_t *transaction = handle->h_transaction;
-	journal_t *journal = transaction->t_journal;
-	struct journal_head *jh;
-	int drop_reserve = 0;
-	int err = 0;
-	int was_modified = 0;
-
-	BUFFER_TRACE(bh, "entry");
-
-	jbd_lock_bh_state(bh);
-	spin_lock(&journal->j_list_lock);
-
-	if (!buffer_jbd(bh))
-		goto not_jbd;
-	jh = bh2jh(bh);
-
-	/* Critical error: attempting to delete a bitmap buffer, maybe?
-	 * Don't do any jbd operations, and return an error. */
-	if (!J_EXPECT_JH(jh, !jh->b_committed_data,
-			 "inconsistent data on disk")) {
-		err = -EIO;
-		goto not_jbd;
-	}
-
-	/* keep track of whether or not this transaction modified us */
-	was_modified = jh->b_modified;
-
-	/*
-	 * The buffer's going from the transaction, we must drop
-	 * all references -bzzz
-	 */
-	jh->b_modified = 0;
-
-	if (jh->b_transaction == handle->h_transaction) {
-		J_ASSERT_JH(jh, !jh->b_frozen_data);
-
-		/* If we are forgetting a buffer which is already part
-		 * of this transaction, then we can just drop it from
-		 * the transaction immediately. */
-		clear_buffer_dirty(bh);
-		clear_buffer_jbddirty(bh);
-
-		JBUFFER_TRACE(jh, "belongs to current transaction: unfile");
-
-		/*
-		 * we only want to drop a reference if this transaction
-		 * modified the buffer
-		 */
-		if (was_modified)
-			drop_reserve = 1;
-
-		/*
-		 * We are no longer going to journal this buffer.
-		 * However, the commit of this transaction is still
-		 * important to the buffer: the delete that we are now
-		 * processing might obsolete an old log entry, so by
-		 * committing, we can satisfy the buffer's checkpoint.
-		 *
-		 * So, if we have a checkpoint on the buffer, we should
-		 * now refile the buffer on our BJ_Forget list so that
-		 * we know to remove the checkpoint after we commit.
-		 */
-
-		if (jh->b_cp_transaction) {
-			__journal_temp_unlink_buffer(jh);
-			__journal_file_buffer(jh, transaction, BJ_Forget);
-		} else {
-			__journal_unfile_buffer(jh);
-			if (!buffer_jbd(bh)) {
-				spin_unlock(&journal->j_list_lock);
-				jbd_unlock_bh_state(bh);
-				__bforget(bh);
-				goto drop;
-			}
-		}
-	} else if (jh->b_transaction) {
-		J_ASSERT_JH(jh, (jh->b_transaction ==
-				 journal->j_committing_transaction));
-		/* However, if the buffer is still owned by a prior
-		 * (committing) transaction, we can't drop it yet... */
-		JBUFFER_TRACE(jh, "belongs to older transaction");
-		/* ... but we CAN drop it from the new transaction if we
-		 * have also modified it since the original commit. */
-
-		if (jh->b_next_transaction) {
-			J_ASSERT(jh->b_next_transaction == transaction);
-			jh->b_next_transaction = NULL;
-
-			/*
-			 * only drop a reference if this transaction modified
-			 * the buffer
-			 */
-			if (was_modified)
-				drop_reserve = 1;
-		}
-	}
-
-not_jbd:
-	spin_unlock(&journal->j_list_lock);
-	jbd_unlock_bh_state(bh);
-	__brelse(bh);
-drop:
-	if (drop_reserve) {
-		/* no need to reserve log space for this block -bzzz */
-		handle->h_buffer_credits++;
-	}
-	return err;
-}
-
-/**
- * int journal_stop() - complete a transaction
- * @handle: tranaction to complete.
- *
- * All done for a particular handle.
- *
- * There is not much action needed here.  We just return any remaining
- * buffer credits to the transaction and remove the handle.  The only
- * complication is that we need to start a commit operation if the
- * filesystem is marked for synchronous update.
- *
- * journal_stop itself will not usually return an error, but it may
- * do so in unusual circumstances.  In particular, expect it to
- * return -EIO if a journal_abort has been executed since the
- * transaction began.
- */
-int journal_stop(handle_t *handle)
-{
-	transaction_t *transaction = handle->h_transaction;
-	journal_t *journal = transaction->t_journal;
-	int err;
-	pid_t pid;
-
-	J_ASSERT(journal_current_handle() == handle);
-
-	if (is_handle_aborted(handle))
-		err = -EIO;
-	else {
-		J_ASSERT(transaction->t_updates > 0);
-		err = 0;
-	}
-
-	if (--handle->h_ref > 0) {
-		jbd_debug(4, "h_ref %d -> %d\n", handle->h_ref + 1,
-			  handle->h_ref);
-		return err;
-	}
-
-	jbd_debug(4, "Handle %p going down\n", handle);
-
-	/*
-	 * Implement synchronous transaction batching.  If the handle
-	 * was synchronous, don't force a commit immediately.  Let's
-	 * yield and let another thread piggyback onto this transaction.
-	 * Keep doing that while new threads continue to arrive.
-	 * It doesn't cost much - we're about to run a commit and sleep
-	 * on IO anyway.  Speeds up many-threaded, many-dir operations
-	 * by 30x or more...
-	 *
-	 * We try and optimize the sleep time against what the underlying disk
-	 * can do, instead of having a static sleep time.  This is useful for
-	 * the case where our storage is so fast that it is more optimal to go
-	 * ahead and force a flush and wait for the transaction to be committed
-	 * than it is to wait for an arbitrary amount of time for new writers to
-	 * join the transaction.  We achieve this by measuring how long it takes
-	 * to commit a transaction, and compare it with how long this
-	 * transaction has been running, and if run time < commit time then we
-	 * sleep for the delta and commit.  This greatly helps super fast disks
-	 * that would see slowdowns as more threads started doing fsyncs.
-	 *
-	 * But don't do this if this process was the most recent one to
-	 * perform a synchronous write.  We do this to detect the case where a
-	 * single process is doing a stream of sync writes.  No point in waiting
-	 * for joiners in that case.
-	 */
-	pid = current->pid;
-	if (handle->h_sync && journal->j_last_sync_writer != pid) {
-		u64 commit_time, trans_time;
-
-		journal->j_last_sync_writer = pid;
-
-		spin_lock(&journal->j_state_lock);
-		commit_time = journal->j_average_commit_time;
-		spin_unlock(&journal->j_state_lock);
-
-		trans_time = ktime_to_ns(ktime_sub(ktime_get(),
-						   transaction->t_start_time));
-
-		commit_time = min_t(u64, commit_time,
-				    1000*jiffies_to_usecs(1));
-
-		if (trans_time < commit_time) {
-			ktime_t expires = ktime_add_ns(ktime_get(),
-						       commit_time);
-			set_current_state(TASK_UNINTERRUPTIBLE);
-			schedule_hrtimeout(&expires, HRTIMER_MODE_ABS);
-		}
-	}
-
-	current->journal_info = NULL;
-	spin_lock(&journal->j_state_lock);
-	spin_lock(&transaction->t_handle_lock);
-	transaction->t_outstanding_credits -= handle->h_buffer_credits;
-	transaction->t_updates--;
-	if (!transaction->t_updates) {
-		wake_up(&journal->j_wait_updates);
-		if (journal->j_barrier_count)
-			wake_up(&journal->j_wait_transaction_locked);
-	}
-
-	/*
-	 * If the handle is marked SYNC, we need to set another commit
-	 * going!  We also want to force a commit if the current
-	 * transaction is occupying too much of the log, or if the
-	 * transaction is too old now.
-	 */
-	if (handle->h_sync ||
-			transaction->t_outstanding_credits >
-				journal->j_max_transaction_buffers ||
-			time_after_eq(jiffies, transaction->t_expires)) {
-		/* Do this even for aborted journals: an abort still
-		 * completes the commit thread, it just doesn't write
-		 * anything to disk. */
-		tid_t tid = transaction->t_tid;
-
-		spin_unlock(&transaction->t_handle_lock);
-		jbd_debug(2, "transaction too old, requesting commit for "
-					"handle %p\n", handle);
-		/* This is non-blocking */
-		__log_start_commit(journal, transaction->t_tid);
-		spin_unlock(&journal->j_state_lock);
-
-		/*
-		 * Special case: JFS_SYNC synchronous updates require us
-		 * to wait for the commit to complete.
-		 */
-		if (handle->h_sync && !(current->flags & PF_MEMALLOC))
-			err = log_wait_commit(journal, tid);
-	} else {
-		spin_unlock(&transaction->t_handle_lock);
-		spin_unlock(&journal->j_state_lock);
-	}
-
-	lock_map_release(&handle->h_lockdep_map);
-
-	jbd_free_handle(handle);
-	return err;
-}
-
-/**
- * int journal_force_commit() - force any uncommitted transactions
- * @journal: journal to force
- *
- * For synchronous operations: force any uncommitted transactions
- * to disk.  May seem kludgy, but it reuses all the handle batching
- * code in a very simple manner.
- */
-int journal_force_commit(journal_t *journal)
-{
-	handle_t *handle;
-	int ret;
-
-	handle = journal_start(journal, 1);
-	if (IS_ERR(handle)) {
-		ret = PTR_ERR(handle);
-	} else {
-		handle->h_sync = 1;
-		ret = journal_stop(handle);
-	}
-	return ret;
-}
-
-/*
- *
- * List management code snippets: various functions for manipulating the
- * transaction buffer lists.
- *
- */
-
-/*
- * Append a buffer to a transaction list, given the transaction's list head
- * pointer.
- *
- * j_list_lock is held.
- *
- * jbd_lock_bh_state(jh2bh(jh)) is held.
- */
-
-static inline void
-__blist_add_buffer(struct journal_head **list, struct journal_head *jh)
-{
-	if (!*list) {
-		jh->b_tnext = jh->b_tprev = jh;
-		*list = jh;
-	} else {
-		/* Insert at the tail of the list to preserve order */
-		struct journal_head *first = *list, *last = first->b_tprev;
-		jh->b_tprev = last;
-		jh->b_tnext = first;
-		last->b_tnext = first->b_tprev = jh;
-	}
-}
-
-/*
- * Remove a buffer from a transaction list, given the transaction's list
- * head pointer.
- *
- * Called with j_list_lock held, and the journal may not be locked.
- *
- * jbd_lock_bh_state(jh2bh(jh)) is held.
- */
-
-static inline void
-__blist_del_buffer(struct journal_head **list, struct journal_head *jh)
-{
-	if (*list == jh) {
-		*list = jh->b_tnext;
-		if (*list == jh)
-			*list = NULL;
-	}
-	jh->b_tprev->b_tnext = jh->b_tnext;
-	jh->b_tnext->b_tprev = jh->b_tprev;
-}
-
-/*
- * Remove a buffer from the appropriate transaction list.
- *
- * Note that this function can *change* the value of
- * bh->b_transaction->t_sync_datalist, t_buffers, t_forget,
- * t_iobuf_list, t_shadow_list, t_log_list or t_reserved_list.  If the caller
- * is holding onto a copy of one of thee pointers, it could go bad.
- * Generally the caller needs to re-read the pointer from the transaction_t.
- *
- * Called under j_list_lock.  The journal may not be locked.
- */
-static void __journal_temp_unlink_buffer(struct journal_head *jh)
-{
-	struct journal_head **list = NULL;
-	transaction_t *transaction;
-	struct buffer_head *bh = jh2bh(jh);
-
-	J_ASSERT_JH(jh, jbd_is_locked_bh_state(bh));
-	transaction = jh->b_transaction;
-	if (transaction)
-		assert_spin_locked(&transaction->t_journal->j_list_lock);
-
-	J_ASSERT_JH(jh, jh->b_jlist < BJ_Types);
-	if (jh->b_jlist != BJ_None)
-		J_ASSERT_JH(jh, transaction != NULL);
-
-	switch (jh->b_jlist) {
-	case BJ_None:
-		return;
-	case BJ_SyncData:
-		list = &transaction->t_sync_datalist;
-		break;
-	case BJ_Metadata:
-		transaction->t_nr_buffers--;
-		J_ASSERT_JH(jh, transaction->t_nr_buffers >= 0);
-		list = &transaction->t_buffers;
-		break;
-	case BJ_Forget:
-		list = &transaction->t_forget;
-		break;
-	case BJ_IO:
-		list = &transaction->t_iobuf_list;
-		break;
-	case BJ_Shadow:
-		list = &transaction->t_shadow_list;
-		break;
-	case BJ_LogCtl:
-		list = &transaction->t_log_list;
-		break;
-	case BJ_Reserved:
-		list = &transaction->t_reserved_list;
-		break;
-	case BJ_Locked:
-		list = &transaction->t_locked_list;
-		break;
-	}
-
-	__blist_del_buffer(list, jh);
-	jh->b_jlist = BJ_None;
-	if (test_clear_buffer_jbddirty(bh))
-		mark_buffer_dirty(bh);	/* Expose it to the VM */
-}
-
-/*
- * Remove buffer from all transactions.
- *
- * Called with bh_state lock and j_list_lock
- *
- * jh and bh may be already freed when this function returns.
- */
-void __journal_unfile_buffer(struct journal_head *jh)
-{
-	__journal_temp_unlink_buffer(jh);
-	jh->b_transaction = NULL;
-	journal_put_journal_head(jh);
-}
-
-void journal_unfile_buffer(journal_t *journal, struct journal_head *jh)
-{
-	struct buffer_head *bh = jh2bh(jh);
-
-	/* Get reference so that buffer cannot be freed before we unlock it */
-	get_bh(bh);
-	jbd_lock_bh_state(bh);
-	spin_lock(&journal->j_list_lock);
-	__journal_unfile_buffer(jh);
-	spin_unlock(&journal->j_list_lock);
-	jbd_unlock_bh_state(bh);
-	__brelse(bh);
-}
-
-/*
- * Called from journal_try_to_free_buffers().
- *
- * Called under jbd_lock_bh_state(bh)
- */
-static void
-__journal_try_to_free_buffer(journal_t *journal, struct buffer_head *bh)
-{
-	struct journal_head *jh;
-
-	jh = bh2jh(bh);
-
-	if (buffer_locked(bh) || buffer_dirty(bh))
-		goto out;
-
-	if (jh->b_next_transaction != NULL)
-		goto out;
-
-	spin_lock(&journal->j_list_lock);
-	if (jh->b_transaction != NULL && jh->b_cp_transaction == NULL) {
-		if (jh->b_jlist == BJ_SyncData || jh->b_jlist == BJ_Locked) {
-			/* A written-back ordered data buffer */
-			JBUFFER_TRACE(jh, "release data");
-			__journal_unfile_buffer(jh);
-		}
-	} else if (jh->b_cp_transaction != NULL && jh->b_transaction == NULL) {
-		/* written-back checkpointed metadata buffer */
-		if (jh->b_jlist == BJ_None) {
-			JBUFFER_TRACE(jh, "remove from checkpoint list");
-			__journal_remove_checkpoint(jh);
-		}
-	}
-	spin_unlock(&journal->j_list_lock);
-out:
-	return;
-}
-
-/**
- * int journal_try_to_free_buffers() - try to free page buffers.
- * @journal: journal for operation
- * @page: to try and free
- * @gfp_mask: we use the mask to detect how hard should we try to release
- * buffers. If __GFP_WAIT and __GFP_FS is set, we wait for commit code to
- * release the buffers.
- *
- *
- * For all the buffers on this page,
- * if they are fully written out ordered data, move them onto BUF_CLEAN
- * so try_to_free_buffers() can reap them.
- *
- * This function returns non-zero if we wish try_to_free_buffers()
- * to be called. We do this if the page is releasable by try_to_free_buffers().
- * We also do it if the page has locked or dirty buffers and the caller wants
- * us to perform sync or async writeout.
- *
- * This complicates JBD locking somewhat.  We aren't protected by the
- * BKL here.  We wish to remove the buffer from its committing or
- * running transaction's ->t_datalist via __journal_unfile_buffer.
- *
- * This may *change* the value of transaction_t->t_datalist, so anyone
- * who looks at t_datalist needs to lock against this function.
- *
- * Even worse, someone may be doing a journal_dirty_data on this
- * buffer.  So we need to lock against that.  journal_dirty_data()
- * will come out of the lock with the buffer dirty, which makes it
- * ineligible for release here.
- *
- * Who else is affected by this?  hmm...  Really the only contender
- * is do_get_write_access() - it could be looking at the buffer while
- * journal_try_to_free_buffer() is changing its state.  But that
- * cannot happen because we never reallocate freed data as metadata
- * while the data is part of a transaction.  Yes?
- *
- * Return 0 on failure, 1 on success
- */
-int journal_try_to_free_buffers(journal_t *journal,
-				struct page *page, gfp_t gfp_mask)
-{
-	struct buffer_head *head;
-	struct buffer_head *bh;
-	int ret = 0;
-
-	J_ASSERT(PageLocked(page));
-
-	head = page_buffers(page);
-	bh = head;
-	do {
-		struct journal_head *jh;
-
-		/*
-		 * We take our own ref against the journal_head here to avoid
-		 * having to add tons of locking around each instance of
-		 * journal_put_journal_head().
-		 */
-		jh = journal_grab_journal_head(bh);
-		if (!jh)
-			continue;
-
-		jbd_lock_bh_state(bh);
-		__journal_try_to_free_buffer(journal, bh);
-		journal_put_journal_head(jh);
-		jbd_unlock_bh_state(bh);
-		if (buffer_jbd(bh))
-			goto busy;
-	} while ((bh = bh->b_this_page) != head);
-
-	ret = try_to_free_buffers(page);
-
-busy:
-	return ret;
-}
-
-/*
- * This buffer is no longer needed.  If it is on an older transaction's
- * checkpoint list we need to record it on this transaction's forget list
- * to pin this buffer (and hence its checkpointing transaction) down until
- * this transaction commits.  If the buffer isn't on a checkpoint list, we
- * release it.
- * Returns non-zero if JBD no longer has an interest in the buffer.
- *
- * Called under j_list_lock.
- *
- * Called under jbd_lock_bh_state(bh).
- */
-static int __dispose_buffer(struct journal_head *jh, transaction_t *transaction)
-{
-	int may_free = 1;
-	struct buffer_head *bh = jh2bh(jh);
-
-	if (jh->b_cp_transaction) {
-		JBUFFER_TRACE(jh, "on running+cp transaction");
-		__journal_temp_unlink_buffer(jh);
-		/*
-		 * We don't want to write the buffer anymore, clear the
-		 * bit so that we don't confuse checks in
-		 * __journal_file_buffer
-		 */
-		clear_buffer_dirty(bh);
-		__journal_file_buffer(jh, transaction, BJ_Forget);
-		may_free = 0;
-	} else {
-		JBUFFER_TRACE(jh, "on running transaction");
-		__journal_unfile_buffer(jh);
-	}
-	return may_free;
-}
-
-/*
- * journal_invalidatepage
- *
- * This code is tricky.  It has a number of cases to deal with.
- *
- * There are two invariants which this code relies on:
- *
- * i_size must be updated on disk before we start calling invalidatepage on the
- * data.
- *
- *  This is done in ext3 by defining an ext3_setattr method which
- *  updates i_size before truncate gets going.  By maintaining this
- *  invariant, we can be sure that it is safe to throw away any buffers
- *  attached to the current transaction: once the transaction commits,
- *  we know that the data will not be needed.
- *
- *  Note however that we can *not* throw away data belonging to the
- *  previous, committing transaction!
- *
- * Any disk blocks which *are* part of the previous, committing
- * transaction (and which therefore cannot be discarded immediately) are
- * not going to be reused in the new running transaction
- *
- *  The bitmap committed_data images guarantee this: any block which is
- *  allocated in one transaction and removed in the next will be marked
- *  as in-use in the committed_data bitmap, so cannot be reused until
- *  the next transaction to delete the block commits.  This means that
- *  leaving committing buffers dirty is quite safe: the disk blocks
- *  cannot be reallocated to a different file and so buffer aliasing is
- *  not possible.
- *
- *
- * The above applies mainly to ordered data mode.  In writeback mode we
- * don't make guarantees about the order in which data hits disk --- in
- * particular we don't guarantee that new dirty data is flushed before
- * transaction commit --- so it is always safe just to discard data
- * immediately in that mode.  --sct
- */
-
-/*
- * The journal_unmap_buffer helper function returns zero if the buffer
- * concerned remains pinned as an anonymous buffer belonging to an older
- * transaction.
- *
- * We're outside-transaction here.  Either or both of j_running_transaction
- * and j_committing_transaction may be NULL.
- */
-static int journal_unmap_buffer(journal_t *journal, struct buffer_head *bh,
-				int partial_page)
-{
-	transaction_t *transaction;
-	struct journal_head *jh;
-	int may_free = 1;
-
-	BUFFER_TRACE(bh, "entry");
-
-retry:
-	/*
-	 * It is safe to proceed here without the j_list_lock because the
-	 * buffers cannot be stolen by try_to_free_buffers as long as we are
-	 * holding the page lock. --sct
-	 */
-
-	if (!buffer_jbd(bh))
-		goto zap_buffer_unlocked;
-
-	spin_lock(&journal->j_state_lock);
-	jbd_lock_bh_state(bh);
-	spin_lock(&journal->j_list_lock);
-
-	jh = journal_grab_journal_head(bh);
-	if (!jh)
-		goto zap_buffer_no_jh;
-
-	/*
-	 * We cannot remove the buffer from checkpoint lists until the
-	 * transaction adding inode to orphan list (let's call it T)
-	 * is committed.  Otherwise if the transaction changing the
-	 * buffer would be cleaned from the journal before T is
-	 * committed, a crash will cause that the correct contents of
-	 * the buffer will be lost.  On the other hand we have to
-	 * clear the buffer dirty bit at latest at the moment when the
-	 * transaction marking the buffer as freed in the filesystem
-	 * structures is committed because from that moment on the
-	 * block can be reallocated and used by a different page.
-	 * Since the block hasn't been freed yet but the inode has
-	 * already been added to orphan list, it is safe for us to add
-	 * the buffer to BJ_Forget list of the newest transaction.
-	 *
-	 * Also we have to clear buffer_mapped flag of a truncated buffer
-	 * because the buffer_head may be attached to the page straddling
-	 * i_size (can happen only when blocksize < pagesize) and thus the
-	 * buffer_head can be reused when the file is extended again. So we end
-	 * up keeping around invalidated buffers attached to transactions'
-	 * BJ_Forget list just to stop checkpointing code from cleaning up
-	 * the transaction this buffer was modified in.
-	 */
-	transaction = jh->b_transaction;
-	if (transaction == NULL) {
-		/* First case: not on any transaction.  If it
-		 * has no checkpoint link, then we can zap it:
-		 * it's a writeback-mode buffer so we don't care
-		 * if it hits disk safely. */
-		if (!jh->b_cp_transaction) {
-			JBUFFER_TRACE(jh, "not on any transaction: zap");
-			goto zap_buffer;
-		}
-
-		if (!buffer_dirty(bh)) {
-			/* bdflush has written it.  We can drop it now */
-			goto zap_buffer;
-		}
-
-		/* OK, it must be in the journal but still not
-		 * written fully to disk: it's metadata or
-		 * journaled data... */
-
-		if (journal->j_running_transaction) {
-			/* ... and once the current transaction has
-			 * committed, the buffer won't be needed any
-			 * longer. */
-			JBUFFER_TRACE(jh, "checkpointed: add to BJ_Forget");
-			may_free = __dispose_buffer(jh,
-					journal->j_running_transaction);
-			goto zap_buffer;
-		} else {
-			/* There is no currently-running transaction. So the
-			 * orphan record which we wrote for this file must have
-			 * passed into commit.  We must attach this buffer to
-			 * the committing transaction, if it exists. */
-			if (journal->j_committing_transaction) {
-				JBUFFER_TRACE(jh, "give to committing trans");
-				may_free = __dispose_buffer(jh,
-					journal->j_committing_transaction);
-				goto zap_buffer;
-			} else {
-				/* The orphan record's transaction has
-				 * committed.  We can cleanse this buffer */
-				clear_buffer_jbddirty(bh);
-				goto zap_buffer;
-			}
-		}
-	} else if (transaction == journal->j_committing_transaction) {
-		JBUFFER_TRACE(jh, "on committing transaction");
-		if (jh->b_jlist == BJ_Locked) {
-			/*
-			 * The buffer is on the committing transaction's locked
-			 * list.  We have the buffer locked, so I/O has
-			 * completed.  So we can nail the buffer now.
-			 */
-			may_free = __dispose_buffer(jh, transaction);
-			goto zap_buffer;
-		}
-		/*
-		 * The buffer is committing, we simply cannot touch
-		 * it. If the page is straddling i_size we have to wait
-		 * for commit and try again.
-		 */
-		if (partial_page) {
-			tid_t tid = journal->j_committing_transaction->t_tid;
-
-			journal_put_journal_head(jh);
-			spin_unlock(&journal->j_list_lock);
-			jbd_unlock_bh_state(bh);
-			spin_unlock(&journal->j_state_lock);
-			unlock_buffer(bh);
-			log_wait_commit(journal, tid);
-			lock_buffer(bh);
-			goto retry;
-		}
-		/*
-		 * OK, buffer won't be reachable after truncate. We just set
-		 * j_next_transaction to the running transaction (if there is
-		 * one) and mark buffer as freed so that commit code knows it
-		 * should clear dirty bits when it is done with the buffer.
-		 */
-		set_buffer_freed(bh);
-		if (journal->j_running_transaction && buffer_jbddirty(bh))
-			jh->b_next_transaction = journal->j_running_transaction;
-		journal_put_journal_head(jh);
-		spin_unlock(&journal->j_list_lock);
-		jbd_unlock_bh_state(bh);
-		spin_unlock(&journal->j_state_lock);
-		return 0;
-	} else {
-		/* Good, the buffer belongs to the running transaction.
-		 * We are writing our own transaction's data, not any
-		 * previous one's, so it is safe to throw it away
-		 * (remember that we expect the filesystem to have set
-		 * i_size already for this truncate so recovery will not
-		 * expose the disk blocks we are discarding here.) */
-		J_ASSERT_JH(jh, transaction == journal->j_running_transaction);
-		JBUFFER_TRACE(jh, "on running transaction");
-		may_free = __dispose_buffer(jh, transaction);
-	}
-
-zap_buffer:
-	/*
-	 * This is tricky. Although the buffer is truncated, it may be reused
-	 * if blocksize < pagesize and it is attached to the page straddling
-	 * EOF. Since the buffer might have been added to BJ_Forget list of the
-	 * running transaction, journal_get_write_access() won't clear
-	 * b_modified and credit accounting gets confused. So clear b_modified
-	 * here. */
-	jh->b_modified = 0;
-	journal_put_journal_head(jh);
-zap_buffer_no_jh:
-	spin_unlock(&journal->j_list_lock);
-	jbd_unlock_bh_state(bh);
-	spin_unlock(&journal->j_state_lock);
-zap_buffer_unlocked:
-	clear_buffer_dirty(bh);
-	J_ASSERT_BH(bh, !buffer_jbddirty(bh));
-	clear_buffer_mapped(bh);
-	clear_buffer_req(bh);
-	clear_buffer_new(bh);
-	bh->b_bdev = NULL;
-	return may_free;
-}
-
-/**
- * void journal_invalidatepage() - invalidate a journal page
- * @journal: journal to use for flush
- * @page:    page to flush
- * @offset:  offset of the range to invalidate
- * @length:  length of the range to invalidate
- *
- * Reap page buffers containing data in specified range in page.
- */
-void journal_invalidatepage(journal_t *journal,
-		      struct page *page,
-		      unsigned int offset,
-		      unsigned int length)
-{
-	struct buffer_head *head, *bh, *next;
-	unsigned int stop = offset + length;
-	unsigned int curr_off = 0;
-	int partial_page = (offset || length < PAGE_CACHE_SIZE);
-	int may_free = 1;
-
-	if (!PageLocked(page))
-		BUG();
-	if (!page_has_buffers(page))
-		return;
-
-	BUG_ON(stop > PAGE_CACHE_SIZE || stop < length);
-
-	/* We will potentially be playing with lists other than just the
-	 * data lists (especially for journaled data mode), so be
-	 * cautious in our locking. */
-
-	head = bh = page_buffers(page);
-	do {
-		unsigned int next_off = curr_off + bh->b_size;
-		next = bh->b_this_page;
-
-		if (next_off > stop)
-			return;
-
-		if (offset <= curr_off) {
-			/* This block is wholly outside the truncation point */
-			lock_buffer(bh);
-			may_free &= journal_unmap_buffer(journal, bh,
-							 partial_page);
-			unlock_buffer(bh);
-		}
-		curr_off = next_off;
-		bh = next;
-
-	} while (bh != head);
-
-	if (!partial_page) {
-		if (may_free && try_to_free_buffers(page))
-			J_ASSERT(!page_has_buffers(page));
-	}
-}
-
-/*
- * File a buffer on the given transaction list.
- */
-void __journal_file_buffer(struct journal_head *jh,
-			transaction_t *transaction, int jlist)
-{
-	struct journal_head **list = NULL;
-	int was_dirty = 0;
-	struct buffer_head *bh = jh2bh(jh);
-
-	J_ASSERT_JH(jh, jbd_is_locked_bh_state(bh));
-	assert_spin_locked(&transaction->t_journal->j_list_lock);
-
-	J_ASSERT_JH(jh, jh->b_jlist < BJ_Types);
-	J_ASSERT_JH(jh, jh->b_transaction == transaction ||
-				jh->b_transaction == NULL);
-
-	if (jh->b_transaction && jh->b_jlist == jlist)
-		return;
-
-	if (jlist == BJ_Metadata || jlist == BJ_Reserved ||
-	    jlist == BJ_Shadow || jlist == BJ_Forget) {
-		/*
-		 * For metadata buffers, we track dirty bit in buffer_jbddirty
-		 * instead of buffer_dirty. We should not see a dirty bit set
-		 * here because we clear it in do_get_write_access but e.g.
-		 * tune2fs can modify the sb and set the dirty bit at any time
-		 * so we try to gracefully handle that.
-		 */
-		if (buffer_dirty(bh))
-			warn_dirty_buffer(bh);
-		if (test_clear_buffer_dirty(bh) ||
-		    test_clear_buffer_jbddirty(bh))
-			was_dirty = 1;
-	}
-
-	if (jh->b_transaction)
-		__journal_temp_unlink_buffer(jh);
-	else
-		journal_grab_journal_head(bh);
-	jh->b_transaction = transaction;
-
-	switch (jlist) {
-	case BJ_None:
-		J_ASSERT_JH(jh, !jh->b_committed_data);
-		J_ASSERT_JH(jh, !jh->b_frozen_data);
-		return;
-	case BJ_SyncData:
-		list = &transaction->t_sync_datalist;
-		break;
-	case BJ_Metadata:
-		transaction->t_nr_buffers++;
-		list = &transaction->t_buffers;
-		break;
-	case BJ_Forget:
-		list = &transaction->t_forget;
-		break;
-	case BJ_IO:
-		list = &transaction->t_iobuf_list;
-		break;
-	case BJ_Shadow:
-		list = &transaction->t_shadow_list;
-		break;
-	case BJ_LogCtl:
-		list = &transaction->t_log_list;
-		break;
-	case BJ_Reserved:
-		list = &transaction->t_reserved_list;
-		break;
-	case BJ_Locked:
-		list =  &transaction->t_locked_list;
-		break;
-	}
-
-	__blist_add_buffer(list, jh);
-	jh->b_jlist = jlist;
-
-	if (was_dirty)
-		set_buffer_jbddirty(bh);
-}
-
-void journal_file_buffer(struct journal_head *jh,
-				transaction_t *transaction, int jlist)
-{
-	jbd_lock_bh_state(jh2bh(jh));
-	spin_lock(&transaction->t_journal->j_list_lock);
-	__journal_file_buffer(jh, transaction, jlist);
-	spin_unlock(&transaction->t_journal->j_list_lock);
-	jbd_unlock_bh_state(jh2bh(jh));
-}
-
-/*
- * Remove a buffer from its current buffer list in preparation for
- * dropping it from its current transaction entirely.  If the buffer has
- * already started to be used by a subsequent transaction, refile the
- * buffer on that transaction's metadata list.
- *
- * Called under j_list_lock
- * Called under jbd_lock_bh_state(jh2bh(jh))
- *
- * jh and bh may be already free when this function returns
- */
-void __journal_refile_buffer(struct journal_head *jh)
-{
-	int was_dirty, jlist;
-	struct buffer_head *bh = jh2bh(jh);
-
-	J_ASSERT_JH(jh, jbd_is_locked_bh_state(bh));
-	if (jh->b_transaction)
-		assert_spin_locked(&jh->b_transaction->t_journal->j_list_lock);
-
-	/* If the buffer is now unused, just drop it. */
-	if (jh->b_next_transaction == NULL) {
-		__journal_unfile_buffer(jh);
-		return;
-	}
-
-	/*
-	 * It has been modified by a later transaction: add it to the new
-	 * transaction's metadata list.
-	 */
-
-	was_dirty = test_clear_buffer_jbddirty(bh);
-	__journal_temp_unlink_buffer(jh);
-	/*
-	 * We set b_transaction here because b_next_transaction will inherit
-	 * our jh reference and thus __journal_file_buffer() must not take a
-	 * new one.
-	 */
-	jh->b_transaction = jh->b_next_transaction;
-	jh->b_next_transaction = NULL;
-	if (buffer_freed(bh))
-		jlist = BJ_Forget;
-	else if (jh->b_modified)
-		jlist = BJ_Metadata;
-	else
-		jlist = BJ_Reserved;
-	__journal_file_buffer(jh, jh->b_transaction, jlist);
-	J_ASSERT_JH(jh, jh->b_transaction->t_state == T_RUNNING);
-
-	if (was_dirty)
-		set_buffer_jbddirty(bh);
-}
-
-/*
- * __journal_refile_buffer() with necessary locking added. We take our bh
- * reference so that we can safely unlock bh.
- *
- * The jh and bh may be freed by this call.
- */
-void journal_refile_buffer(journal_t *journal, struct journal_head *jh)
-{
-	struct buffer_head *bh = jh2bh(jh);
-
-	/* Get reference so that buffer cannot be freed before we unlock it */
-	get_bh(bh);
-	jbd_lock_bh_state(bh);
-	spin_lock(&journal->j_list_lock);
-	__journal_refile_buffer(jh);
-	jbd_unlock_bh_state(bh);
-	spin_unlock(&journal->j_list_lock);
-	__brelse(bh);
-}
diff --git a/include/linux/jbd.h b/include/linux/jbd.h
deleted file mode 100644
index d32615280be9..000000000000
--- a/include/linux/jbd.h
+++ /dev/null
@@ -1,1047 +0,0 @@
-/*
- * linux/include/linux/jbd.h
- *
- * Written by Stephen C. Tweedie <sct@redhat.com>
- *
- * Copyright 1998-2000 Red Hat, Inc --- All Rights Reserved
- *
- * This file is part of the Linux kernel and is made available under
- * the terms of the GNU General Public License, version 2, or at your
- * option, any later version, incorporated herein by reference.
- *
- * Definitions for transaction data structures for the buffer cache
- * filesystem journaling support.
- */
-
-#ifndef _LINUX_JBD_H
-#define _LINUX_JBD_H
-
-/* Allow this file to be included directly into e2fsprogs */
-#ifndef __KERNEL__
-#include "jfs_compat.h"
-#define JFS_DEBUG
-#define jfs_debug jbd_debug
-#else
-
-#include <linux/types.h>
-#include <linux/buffer_head.h>
-#include <linux/journal-head.h>
-#include <linux/stddef.h>
-#include <linux/mutex.h>
-#include <linux/timer.h>
-#include <linux/lockdep.h>
-#include <linux/slab.h>
-
-#define journal_oom_retry 1
-
-/*
- * Define JBD_PARANOID_IOFAIL to cause a kernel BUG() if ext3 finds
- * certain classes of error which can occur due to failed IOs.  Under
- * normal use we want ext3 to continue after such errors, because
- * hardware _can_ fail, but for debugging purposes when running tests on
- * known-good hardware we may want to trap these errors.
- */
-#undef JBD_PARANOID_IOFAIL
-
-/*
- * The default maximum commit age, in seconds.
- */
-#define JBD_DEFAULT_MAX_COMMIT_AGE 5
-
-#ifdef CONFIG_JBD_DEBUG
-/*
- * Define JBD_EXPENSIVE_CHECKING to enable more expensive internal
- * consistency checks.  By default we don't do this unless
- * CONFIG_JBD_DEBUG is on.
- */
-#define JBD_EXPENSIVE_CHECKING
-extern u8 journal_enable_debug;
-
-void __jbd_debug(int level, const char *file, const char *func,
-		 unsigned int line, const char *fmt, ...);
-
-#define jbd_debug(n, fmt, a...) \
-	__jbd_debug((n), __FILE__, __func__, __LINE__, (fmt), ##a)
-#else
-#define jbd_debug(n, fmt, a...)    /**/
-#endif
-
-static inline void *jbd_alloc(size_t size, gfp_t flags)
-{
-	return (void *)__get_free_pages(flags, get_order(size));
-}
-
-static inline void jbd_free(void *ptr, size_t size)
-{
-	free_pages((unsigned long)ptr, get_order(size));
-}
-
-#define JFS_MIN_JOURNAL_BLOCKS 1024
-
-
-/**
- * typedef handle_t - The handle_t type represents a single atomic update being performed by some process.
- *
- * All filesystem modifications made by the process go
- * through this handle.  Recursive operations (such as quota operations)
- * are gathered into a single update.
- *
- * The buffer credits field is used to account for journaled buffers
- * being modified by the running process.  To ensure that there is
- * enough log space for all outstanding operations, we need to limit the
- * number of outstanding buffers possible at any time.  When the
- * operation completes, any buffer credits not used are credited back to
- * the transaction, so that at all times we know how many buffers the
- * outstanding updates on a transaction might possibly touch.
- *
- * This is an opaque datatype.
- **/
-typedef struct handle_s		handle_t;	/* Atomic operation type */
-
-
-/**
- * typedef journal_t - The journal_t maintains all of the journaling state information for a single filesystem.
- *
- * journal_t is linked to from the fs superblock structure.
- *
- * We use the journal_t to keep track of all outstanding transaction
- * activity on the filesystem, and to manage the state of the log
- * writing process.
- *
- * This is an opaque datatype.
- **/
-typedef struct journal_s	journal_t;	/* Journal control structure */
-#endif
-
-/*
- * Internal structures used by the logging mechanism:
- */
-
-#define JFS_MAGIC_NUMBER 0xc03b3998U /* The first 4 bytes of /dev/random! */
-
-/*
- * On-disk structures
- */
-
-/*
- * Descriptor block types:
- */
-
-#define JFS_DESCRIPTOR_BLOCK	1
-#define JFS_COMMIT_BLOCK	2
-#define JFS_SUPERBLOCK_V1	3
-#define JFS_SUPERBLOCK_V2	4
-#define JFS_REVOKE_BLOCK	5
-
-/*
- * Standard header for all descriptor blocks:
- */
-typedef struct journal_header_s
-{
-	__be32		h_magic;
-	__be32		h_blocktype;
-	__be32		h_sequence;
-} journal_header_t;
-
-
-/*
- * The block tag: used to describe a single buffer in the journal
- */
-typedef struct journal_block_tag_s
-{
-	__be32		t_blocknr;	/* The on-disk block number */
-	__be32		t_flags;	/* See below */
-} journal_block_tag_t;
-
-/*
- * The revoke descriptor: used on disk to describe a series of blocks to
- * be revoked from the log
- */
-typedef struct journal_revoke_header_s
-{
-	journal_header_t r_header;
-	__be32		 r_count;	/* Count of bytes used in the block */
-} journal_revoke_header_t;
-
-
-/* Definitions for the journal tag flags word: */
-#define JFS_FLAG_ESCAPE		1	/* on-disk block is escaped */
-#define JFS_FLAG_SAME_UUID	2	/* block has same uuid as previous */
-#define JFS_FLAG_DELETED	4	/* block deleted by this transaction */
-#define JFS_FLAG_LAST_TAG	8	/* last tag in this descriptor block */
-
-
-/*
- * The journal superblock.  All fields are in big-endian byte order.
- */
-typedef struct journal_superblock_s
-{
-/* 0x0000 */
-	journal_header_t s_header;
-
-/* 0x000C */
-	/* Static information describing the journal */
-	__be32	s_blocksize;		/* journal device blocksize */
-	__be32	s_maxlen;		/* total blocks in journal file */
-	__be32	s_first;		/* first block of log information */
-
-/* 0x0018 */
-	/* Dynamic information describing the current state of the log */
-	__be32	s_sequence;		/* first commit ID expected in log */
-	__be32	s_start;		/* blocknr of start of log */
-
-/* 0x0020 */
-	/* Error value, as set by journal_abort(). */
-	__be32	s_errno;
-
-/* 0x0024 */
-	/* Remaining fields are only valid in a version-2 superblock */
-	__be32	s_feature_compat;	/* compatible feature set */
-	__be32	s_feature_incompat;	/* incompatible feature set */
-	__be32	s_feature_ro_compat;	/* readonly-compatible feature set */
-/* 0x0030 */
-	__u8	s_uuid[16];		/* 128-bit uuid for journal */
-
-/* 0x0040 */
-	__be32	s_nr_users;		/* Nr of filesystems sharing log */
-
-	__be32	s_dynsuper;		/* Blocknr of dynamic superblock copy*/
-
-/* 0x0048 */
-	__be32	s_max_transaction;	/* Limit of journal blocks per trans.*/
-	__be32	s_max_trans_data;	/* Limit of data blocks per trans. */
-
-/* 0x0050 */
-	__u32	s_padding[44];
-
-/* 0x0100 */
-	__u8	s_users[16*48];		/* ids of all fs'es sharing the log */
-/* 0x0400 */
-} journal_superblock_t;
-
-#define JFS_HAS_COMPAT_FEATURE(j,mask)					\
-	((j)->j_format_version >= 2 &&					\
-	 ((j)->j_superblock->s_feature_compat & cpu_to_be32((mask))))
-#define JFS_HAS_RO_COMPAT_FEATURE(j,mask)				\
-	((j)->j_format_version >= 2 &&					\
-	 ((j)->j_superblock->s_feature_ro_compat & cpu_to_be32((mask))))
-#define JFS_HAS_INCOMPAT_FEATURE(j,mask)				\
-	((j)->j_format_version >= 2 &&					\
-	 ((j)->j_superblock->s_feature_incompat & cpu_to_be32((mask))))
-
-#define JFS_FEATURE_INCOMPAT_REVOKE	0x00000001
-
-/* Features known to this kernel version: */
-#define JFS_KNOWN_COMPAT_FEATURES	0
-#define JFS_KNOWN_ROCOMPAT_FEATURES	0
-#define JFS_KNOWN_INCOMPAT_FEATURES	JFS_FEATURE_INCOMPAT_REVOKE
-
-#ifdef __KERNEL__
-
-#include <linux/fs.h>
-#include <linux/sched.h>
-
-enum jbd_state_bits {
-	BH_JBD			/* Has an attached ext3 journal_head */
-	  = BH_PrivateStart,
-	BH_JWrite,		/* Being written to log (@@@ DEBUGGING) */
-	BH_Freed,		/* Has been freed (truncated) */
-	BH_Revoked,		/* Has been revoked from the log */
-	BH_RevokeValid,		/* Revoked flag is valid */
-	BH_JBDDirty,		/* Is dirty but journaled */
-	BH_State,		/* Pins most journal_head state */
-	BH_JournalHead,		/* Pins bh->b_private and jh->b_bh */
-	BH_Unshadow,		/* Dummy bit, for BJ_Shadow wakeup filtering */
-	BH_JBDPrivateStart,	/* First bit available for private use by FS */
-};
-
-BUFFER_FNS(JBD, jbd)
-BUFFER_FNS(JWrite, jwrite)
-BUFFER_FNS(JBDDirty, jbddirty)
-TAS_BUFFER_FNS(JBDDirty, jbddirty)
-BUFFER_FNS(Revoked, revoked)
-TAS_BUFFER_FNS(Revoked, revoked)
-BUFFER_FNS(RevokeValid, revokevalid)
-TAS_BUFFER_FNS(RevokeValid, revokevalid)
-BUFFER_FNS(Freed, freed)
-
-#include <linux/jbd_common.h>
-
-#define J_ASSERT(assert)	BUG_ON(!(assert))
-
-#define J_ASSERT_BH(bh, expr)	J_ASSERT(expr)
-#define J_ASSERT_JH(jh, expr)	J_ASSERT(expr)
-
-#if defined(JBD_PARANOID_IOFAIL)
-#define J_EXPECT(expr, why...)		J_ASSERT(expr)
-#define J_EXPECT_BH(bh, expr, why...)	J_ASSERT_BH(bh, expr)
-#define J_EXPECT_JH(jh, expr, why...)	J_ASSERT_JH(jh, expr)
-#else
-#define __journal_expect(expr, why...)					     \
-	({								     \
-		int val = (expr);					     \
-		if (!val) {						     \
-			printk(KERN_ERR					     \
-				"EXT3-fs unexpected failure: %s;\n",# expr); \
-			printk(KERN_ERR why "\n");			     \
-		}							     \
-		val;							     \
-	})
-#define J_EXPECT(expr, why...)		__journal_expect(expr, ## why)
-#define J_EXPECT_BH(bh, expr, why...)	__journal_expect(expr, ## why)
-#define J_EXPECT_JH(jh, expr, why...)	__journal_expect(expr, ## why)
-#endif
-
-struct jbd_revoke_table_s;
-
-/**
- * struct handle_s - this is the concrete type associated with handle_t.
- * @h_transaction: Which compound transaction is this update a part of?
- * @h_buffer_credits: Number of remaining buffers we are allowed to dirty.
- * @h_ref: Reference count on this handle
- * @h_err: Field for caller's use to track errors through large fs operations
- * @h_sync: flag for sync-on-close
- * @h_jdata: flag to force data journaling
- * @h_aborted: flag indicating fatal error on handle
- * @h_lockdep_map: lockdep info for debugging lock problems
- */
-struct handle_s
-{
-	/* Which compound transaction is this update a part of? */
-	transaction_t		*h_transaction;
-
-	/* Number of remaining buffers we are allowed to dirty: */
-	int			h_buffer_credits;
-
-	/* Reference count on this handle */
-	int			h_ref;
-
-	/* Field for caller's use to track errors through large fs */
-	/* operations */
-	int			h_err;
-
-	/* Flags [no locking] */
-	unsigned int	h_sync:		1;	/* sync-on-close */
-	unsigned int	h_jdata:	1;	/* force data journaling */
-	unsigned int	h_aborted:	1;	/* fatal error on handle */
-
-#ifdef CONFIG_DEBUG_LOCK_ALLOC
-	struct lockdep_map	h_lockdep_map;
-#endif
-};
-
-
-/* The transaction_t type is the guts of the journaling mechanism.  It
- * tracks a compound transaction through its various states:
- *
- * RUNNING:	accepting new updates
- * LOCKED:	Updates still running but we don't accept new ones
- * RUNDOWN:	Updates are tidying up but have finished requesting
- *		new buffers to modify (state not used for now)
- * FLUSH:       All updates complete, but we are still writing to disk
- * COMMIT:      All data on disk, writing commit record
- * FINISHED:	We still have to keep the transaction for checkpointing.
- *
- * The transaction keeps track of all of the buffers modified by a
- * running transaction, and all of the buffers committed but not yet
- * flushed to home for finished transactions.
- */
-
-/*
- * Lock ranking:
- *
- *    j_list_lock
- *      ->jbd_lock_bh_journal_head()	(This is "innermost")
- *
- *    j_state_lock
- *    ->jbd_lock_bh_state()
- *
- *    jbd_lock_bh_state()
- *    ->j_list_lock
- *
- *    j_state_lock
- *    ->t_handle_lock
- *
- *    j_state_lock
- *    ->j_list_lock			(journal_unmap_buffer)
- *
- */
-
-struct transaction_s
-{
-	/* Pointer to the journal for this transaction. [no locking] */
-	journal_t		*t_journal;
-
-	/* Sequence number for this transaction [no locking] */
-	tid_t			t_tid;
-
-	/*
-	 * Transaction's current state
-	 * [no locking - only kjournald alters this]
-	 * [j_list_lock] guards transition of a transaction into T_FINISHED
-	 * state and subsequent call of __journal_drop_transaction()
-	 * FIXME: needs barriers
-	 * KLUDGE: [use j_state_lock]
-	 */
-	enum {
-		T_RUNNING,
-		T_LOCKED,
-		T_FLUSH,
-		T_COMMIT,
-		T_COMMIT_RECORD,
-		T_FINISHED
-	}			t_state;
-
-	/*
-	 * Where in the log does this transaction's commit start? [no locking]
-	 */
-	unsigned int		t_log_start;
-
-	/* Number of buffers on the t_buffers list [j_list_lock] */
-	int			t_nr_buffers;
-
-	/*
-	 * Doubly-linked circular list of all buffers reserved but not yet
-	 * modified by this transaction [j_list_lock]
-	 */
-	struct journal_head	*t_reserved_list;
-
-	/*
-	 * Doubly-linked circular list of all buffers under writeout during
-	 * commit [j_list_lock]
-	 */
-	struct journal_head	*t_locked_list;
-
-	/*
-	 * Doubly-linked circular list of all metadata buffers owned by this
-	 * transaction [j_list_lock]
-	 */
-	struct journal_head	*t_buffers;
-
-	/*
-	 * Doubly-linked circular list of all data buffers still to be
-	 * flushed before this transaction can be committed [j_list_lock]
-	 */
-	struct journal_head	*t_sync_datalist;
-
-	/*
-	 * Doubly-linked circular list of all forget buffers (superseded
-	 * buffers which we can un-checkpoint once this transaction commits)
-	 * [j_list_lock]
-	 */
-	struct journal_head	*t_forget;
-
-	/*
-	 * Doubly-linked circular list of all buffers still to be flushed before
-	 * this transaction can be checkpointed. [j_list_lock]
-	 */
-	struct journal_head	*t_checkpoint_list;
-
-	/*
-	 * Doubly-linked circular list of all buffers submitted for IO while
-	 * checkpointing. [j_list_lock]
-	 */
-	struct journal_head	*t_checkpoint_io_list;
-
-	/*
-	 * Doubly-linked circular list of temporary buffers currently undergoing
-	 * IO in the log [j_list_lock]
-	 */
-	struct journal_head	*t_iobuf_list;
-
-	/*
-	 * Doubly-linked circular list of metadata buffers being shadowed by log
-	 * IO.  The IO buffers on the iobuf list and the shadow buffers on this
-	 * list match each other one for one at all times. [j_list_lock]
-	 */
-	struct journal_head	*t_shadow_list;
-
-	/*
-	 * Doubly-linked circular list of control buffers being written to the
-	 * log. [j_list_lock]
-	 */
-	struct journal_head	*t_log_list;
-
-	/*
-	 * Protects info related to handles
-	 */
-	spinlock_t		t_handle_lock;
-
-	/*
-	 * Number of outstanding updates running on this transaction
-	 * [t_handle_lock]
-	 */
-	int			t_updates;
-
-	/*
-	 * Number of buffers reserved for use by all handles in this transaction
-	 * handle but not yet modified. [t_handle_lock]
-	 */
-	int			t_outstanding_credits;
-
-	/*
-	 * Forward and backward links for the circular list of all transactions
-	 * awaiting checkpoint. [j_list_lock]
-	 */
-	transaction_t		*t_cpnext, *t_cpprev;
-
-	/*
-	 * When will the transaction expire (become due for commit), in jiffies?
-	 * [no locking]
-	 */
-	unsigned long		t_expires;
-
-	/*
-	 * When this transaction started, in nanoseconds [no locking]
-	 */
-	ktime_t			t_start_time;
-
-	/*
-	 * How many handles used this transaction? [t_handle_lock]
-	 */
-	int t_handle_count;
-};
-
-/**
- * struct journal_s - this is the concrete type associated with journal_t.
- * @j_flags:  General journaling state flags
- * @j_errno:  Is there an outstanding uncleared error on the journal (from a
- *     prior abort)?
- * @j_sb_buffer: First part of superblock buffer
- * @j_superblock: Second part of superblock buffer
- * @j_format_version: Version of the superblock format
- * @j_state_lock: Protect the various scalars in the journal
- * @j_barrier_count:  Number of processes waiting to create a barrier lock
- * @j_running_transaction: The current running transaction..
- * @j_committing_transaction: the transaction we are pushing to disk
- * @j_checkpoint_transactions: a linked circular list of all transactions
- *  waiting for checkpointing
- * @j_wait_transaction_locked: Wait queue for waiting for a locked transaction
- *  to start committing, or for a barrier lock to be released
- * @j_wait_logspace: Wait queue for waiting for checkpointing to complete
- * @j_wait_done_commit: Wait queue for waiting for commit to complete
- * @j_wait_checkpoint:  Wait queue to trigger checkpointing
- * @j_wait_commit: Wait queue to trigger commit
- * @j_wait_updates: Wait queue to wait for updates to complete
- * @j_checkpoint_mutex: Mutex for locking against concurrent checkpoints
- * @j_head: Journal head - identifies the first unused block in the journal
- * @j_tail: Journal tail - identifies the oldest still-used block in the
- *  journal.
- * @j_free: Journal free - how many free blocks are there in the journal?
- * @j_first: The block number of the first usable block
- * @j_last: The block number one beyond the last usable block
- * @j_dev: Device where we store the journal
- * @j_blocksize: blocksize for the location where we store the journal.
- * @j_blk_offset: starting block offset for into the device where we store the
- *     journal
- * @j_fs_dev: Device which holds the client fs.  For internal journal this will
- *     be equal to j_dev
- * @j_maxlen: Total maximum capacity of the journal region on disk.
- * @j_list_lock: Protects the buffer lists and internal buffer state.
- * @j_inode: Optional inode where we store the journal.  If present, all journal
- *     block numbers are mapped into this inode via bmap().
- * @j_tail_sequence:  Sequence number of the oldest transaction in the log
- * @j_transaction_sequence: Sequence number of the next transaction to grant
- * @j_commit_sequence: Sequence number of the most recently committed
- *  transaction
- * @j_commit_request: Sequence number of the most recent transaction wanting
- *     commit
- * @j_commit_waited: Sequence number of the most recent transaction someone
- *     is waiting for to commit.
- * @j_uuid: Uuid of client object.
- * @j_task: Pointer to the current commit thread for this journal
- * @j_max_transaction_buffers:  Maximum number of metadata buffers to allow in a
- *     single compound commit transaction
- * @j_commit_interval: What is the maximum transaction lifetime before we begin
- *  a commit?
- * @j_commit_timer:  The timer used to wakeup the commit thread
- * @j_revoke_lock: Protect the revoke table
- * @j_revoke: The revoke table - maintains the list of revoked blocks in the
- *     current transaction.
- * @j_revoke_table: alternate revoke tables for j_revoke
- * @j_wbuf: array of buffer_heads for journal_commit_transaction
- * @j_wbufsize: maximum number of buffer_heads allowed in j_wbuf, the
- *	number that will fit in j_blocksize
- * @j_last_sync_writer: most recent pid which did a synchronous write
- * @j_average_commit_time: the average amount of time in nanoseconds it
- *	takes to commit a transaction to the disk.
- * @j_private: An opaque pointer to fs-private information.
- */
-
-struct journal_s
-{
-	/* General journaling state flags [j_state_lock] */
-	unsigned long		j_flags;
-
-	/*
-	 * Is there an outstanding uncleared error on the journal (from a prior
-	 * abort)? [j_state_lock]
-	 */
-	int			j_errno;
-
-	/* The superblock buffer */
-	struct buffer_head	*j_sb_buffer;
-	journal_superblock_t	*j_superblock;
-
-	/* Version of the superblock format */
-	int			j_format_version;
-
-	/*
-	 * Protect the various scalars in the journal
-	 */
-	spinlock_t		j_state_lock;
-
-	/*
-	 * Number of processes waiting to create a barrier lock [j_state_lock]
-	 */
-	int			j_barrier_count;
-
-	/*
-	 * Transactions: The current running transaction...
-	 * [j_state_lock] [caller holding open handle]
-	 */
-	transaction_t		*j_running_transaction;
-
-	/*
-	 * the transaction we are pushing to disk
-	 * [j_state_lock] [caller holding open handle]
-	 */
-	transaction_t		*j_committing_transaction;
-
-	/*
-	 * ... and a linked circular list of all transactions waiting for
-	 * checkpointing. [j_list_lock]
-	 */
-	transaction_t		*j_checkpoint_transactions;
-
-	/*
-	 * Wait queue for waiting for a locked transaction to start committing,
-	 * or for a barrier lock to be released
-	 */
-	wait_queue_head_t	j_wait_transaction_locked;
-
-	/* Wait queue for waiting for checkpointing to complete */
-	wait_queue_head_t	j_wait_logspace;
-
-	/* Wait queue for waiting for commit to complete */
-	wait_queue_head_t	j_wait_done_commit;
-
-	/* Wait queue to trigger checkpointing */
-	wait_queue_head_t	j_wait_checkpoint;
-
-	/* Wait queue to trigger commit */
-	wait_queue_head_t	j_wait_commit;
-
-	/* Wait queue to wait for updates to complete */
-	wait_queue_head_t	j_wait_updates;
-
-	/* Semaphore for locking against concurrent checkpoints */
-	struct mutex		j_checkpoint_mutex;
-
-	/*
-	 * Journal head: identifies the first unused block in the journal.
-	 * [j_state_lock]
-	 */
-	unsigned int		j_head;
-
-	/*
-	 * Journal tail: identifies the oldest still-used block in the journal.
-	 * [j_state_lock]
-	 */
-	unsigned int		j_tail;
-
-	/*
-	 * Journal free: how many free blocks are there in the journal?
-	 * [j_state_lock]
-	 */
-	unsigned int		j_free;
-
-	/*
-	 * Journal start and end: the block numbers of the first usable block
-	 * and one beyond the last usable block in the journal. [j_state_lock]
-	 */
-	unsigned int		j_first;
-	unsigned int		j_last;
-
-	/*
-	 * Device, blocksize and starting block offset for the location where we
-	 * store the journal.
-	 */
-	struct block_device	*j_dev;
-	int			j_blocksize;
-	unsigned int		j_blk_offset;
-
-	/*
-	 * Device which holds the client fs.  For internal journal this will be
-	 * equal to j_dev.
-	 */
-	struct block_device	*j_fs_dev;
-
-	/* Total maximum capacity of the journal region on disk. */
-	unsigned int		j_maxlen;
-
-	/*
-	 * Protects the buffer lists and internal buffer state.
-	 */
-	spinlock_t		j_list_lock;
-
-	/* Optional inode where we store the journal.  If present, all */
-	/* journal block numbers are mapped into this inode via */
-	/* bmap(). */
-	struct inode		*j_inode;
-
-	/*
-	 * Sequence number of the oldest transaction in the log [j_state_lock]
-	 */
-	tid_t			j_tail_sequence;
-
-	/*
-	 * Sequence number of the next transaction to grant [j_state_lock]
-	 */
-	tid_t			j_transaction_sequence;
-
-	/*
-	 * Sequence number of the most recently committed transaction
-	 * [j_state_lock].
-	 */
-	tid_t			j_commit_sequence;
-
-	/*
-	 * Sequence number of the most recent transaction wanting commit
-	 * [j_state_lock]
-	 */
-	tid_t			j_commit_request;
-
-	/*
-	 * Sequence number of the most recent transaction someone is waiting
-	 * for to commit.
-	 * [j_state_lock]
-	 */
-	tid_t                   j_commit_waited;
-
-	/*
-	 * Journal uuid: identifies the object (filesystem, LVM volume etc)
-	 * backed by this journal.  This will eventually be replaced by an array
-	 * of uuids, allowing us to index multiple devices within a single
-	 * journal and to perform atomic updates across them.
-	 */
-	__u8			j_uuid[16];
-
-	/* Pointer to the current commit thread for this journal */
-	struct task_struct	*j_task;
-
-	/*
-	 * Maximum number of metadata buffers to allow in a single compound
-	 * commit transaction
-	 */
-	int			j_max_transaction_buffers;
-
-	/*
-	 * What is the maximum transaction lifetime before we begin a commit?
-	 */
-	unsigned long		j_commit_interval;
-
-	/* The timer used to wakeup the commit thread: */
-	struct timer_list	j_commit_timer;
-
-	/*
-	 * The revoke table: maintains the list of revoked blocks in the
-	 * current transaction.  [j_revoke_lock]
-	 */
-	spinlock_t		j_revoke_lock;
-	struct jbd_revoke_table_s *j_revoke;
-	struct jbd_revoke_table_s *j_revoke_table[2];
-
-	/*
-	 * array of bhs for journal_commit_transaction
-	 */
-	struct buffer_head	**j_wbuf;
-	int			j_wbufsize;
-
-	/*
-	 * this is the pid of the last person to run a synchronous operation
-	 * through the journal.
-	 */
-	pid_t			j_last_sync_writer;
-
-	/*
-	 * the average amount of time in nanoseconds it takes to commit a
-	 * transaction to the disk.  [j_state_lock]
-	 */
-	u64			j_average_commit_time;
-
-	/*
-	 * An opaque pointer to fs-private information.  ext3 puts its
-	 * superblock pointer here
-	 */
-	void *j_private;
-};
-
-/*
- * Journal flag definitions
- */
-#define JFS_UNMOUNT	0x001	/* Journal thread is being destroyed */
-#define JFS_ABORT	0x002	/* Journaling has been aborted for errors. */
-#define JFS_ACK_ERR	0x004	/* The errno in the sb has been acked */
-#define JFS_FLUSHED	0x008	/* The journal superblock has been flushed */
-#define JFS_LOADED	0x010	/* The journal superblock has been loaded */
-#define JFS_BARRIER	0x020	/* Use IDE barriers */
-#define JFS_ABORT_ON_SYNCDATA_ERR	0x040  /* Abort the journal on file
-						* data write error in ordered
-						* mode */
-
-/*
- * Function declarations for the journaling transaction and buffer
- * management
- */
-
-/* Filing buffers */
-extern void journal_unfile_buffer(journal_t *, struct journal_head *);
-extern void __journal_unfile_buffer(struct journal_head *);
-extern void __journal_refile_buffer(struct journal_head *);
-extern void journal_refile_buffer(journal_t *, struct journal_head *);
-extern void __journal_file_buffer(struct journal_head *, transaction_t *, int);
-extern void __journal_free_buffer(struct journal_head *bh);
-extern void journal_file_buffer(struct journal_head *, transaction_t *, int);
-extern void __journal_clean_data_list(transaction_t *transaction);
-
-/* Log buffer allocation */
-extern struct journal_head * journal_get_descriptor_buffer(journal_t *);
-int journal_next_log_block(journal_t *, unsigned int *);
-
-/* Commit management */
-extern void journal_commit_transaction(journal_t *);
-
-/* Checkpoint list management */
-int __journal_clean_checkpoint_list(journal_t *journal);
-int __journal_remove_checkpoint(struct journal_head *);
-void __journal_insert_checkpoint(struct journal_head *, transaction_t *);
-
-/* Buffer IO */
-extern int
-journal_write_metadata_buffer(transaction_t	  *transaction,
-			      struct journal_head  *jh_in,
-			      struct journal_head **jh_out,
-			      unsigned int blocknr);
-
-/* Transaction locking */
-extern void		__wait_on_journal (journal_t *);
-
-/*
- * Journal locking.
- *
- * We need to lock the journal during transaction state changes so that nobody
- * ever tries to take a handle on the running transaction while we are in the
- * middle of moving it to the commit phase.  j_state_lock does this.
- *
- * Note that the locking is completely interrupt unsafe.  We never touch
- * journal structures from interrupts.
- */
-
-static inline handle_t *journal_current_handle(void)
-{
-	return current->journal_info;
-}
-
-/* The journaling code user interface:
- *
- * Create and destroy handles
- * Register buffer modifications against the current transaction.
- */
-
-extern handle_t *journal_start(journal_t *, int nblocks);
-extern int	 journal_restart (handle_t *, int nblocks);
-extern int	 journal_extend (handle_t *, int nblocks);
-extern int	 journal_get_write_access(handle_t *, struct buffer_head *);
-extern int	 journal_get_create_access (handle_t *, struct buffer_head *);
-extern int	 journal_get_undo_access(handle_t *, struct buffer_head *);
-extern int	 journal_dirty_data (handle_t *, struct buffer_head *);
-extern int	 journal_dirty_metadata (handle_t *, struct buffer_head *);
-extern void	 journal_release_buffer (handle_t *, struct buffer_head *);
-extern int	 journal_forget (handle_t *, struct buffer_head *);
-extern void	 journal_sync_buffer (struct buffer_head *);
-extern void	 journal_invalidatepage(journal_t *,
-				struct page *, unsigned int, unsigned int);
-extern int	 journal_try_to_free_buffers(journal_t *, struct page *, gfp_t);
-extern int	 journal_stop(handle_t *);
-extern int	 journal_flush (journal_t *);
-extern void	 journal_lock_updates (journal_t *);
-extern void	 journal_unlock_updates (journal_t *);
-
-extern journal_t * journal_init_dev(struct block_device *bdev,
-				struct block_device *fs_dev,
-				int start, int len, int bsize);
-extern journal_t * journal_init_inode (struct inode *);
-extern int	   journal_update_format (journal_t *);
-extern int	   journal_check_used_features
-		   (journal_t *, unsigned long, unsigned long, unsigned long);
-extern int	   journal_check_available_features
-		   (journal_t *, unsigned long, unsigned long, unsigned long);
-extern int	   journal_set_features
-		   (journal_t *, unsigned long, unsigned long, unsigned long);
-extern int	   journal_create     (journal_t *);
-extern int	   journal_load       (journal_t *journal);
-extern int	   journal_destroy    (journal_t *);
-extern int	   journal_recover    (journal_t *journal);
-extern int	   journal_wipe       (journal_t *, int);
-extern int	   journal_skip_recovery	(journal_t *);
-extern void	   journal_update_sb_log_tail	(journal_t *, tid_t, unsigned int,
-						 int);
-extern void	   journal_abort      (journal_t *, int);
-extern int	   journal_errno      (journal_t *);
-extern void	   journal_ack_err    (journal_t *);
-extern int	   journal_clear_err  (journal_t *);
-extern int	   journal_bmap(journal_t *, unsigned int, unsigned int *);
-extern int	   journal_force_commit(journal_t *);
-
-/*
- * journal_head management
- */
-struct journal_head *journal_add_journal_head(struct buffer_head *bh);
-struct journal_head *journal_grab_journal_head(struct buffer_head *bh);
-void journal_put_journal_head(struct journal_head *jh);
-
-/*
- * handle management
- */
-extern struct kmem_cache *jbd_handle_cache;
-
-static inline handle_t *jbd_alloc_handle(gfp_t gfp_flags)
-{
-	return kmem_cache_zalloc(jbd_handle_cache, gfp_flags);
-}
-
-static inline void jbd_free_handle(handle_t *handle)
-{
-	kmem_cache_free(jbd_handle_cache, handle);
-}
-
-/* Primary revoke support */
-#define JOURNAL_REVOKE_DEFAULT_HASH 256
-extern int	   journal_init_revoke(journal_t *, int);
-extern void	   journal_destroy_revoke_caches(void);
-extern int	   journal_init_revoke_caches(void);
-
-extern void	   journal_destroy_revoke(journal_t *);
-extern int	   journal_revoke (handle_t *,
-				unsigned int, struct buffer_head *);
-extern int	   journal_cancel_revoke(handle_t *, struct journal_head *);
-extern void	   journal_write_revoke_records(journal_t *,
-						transaction_t *, int);
-
-/* Recovery revoke support */
-extern int	journal_set_revoke(journal_t *, unsigned int, tid_t);
-extern int	journal_test_revoke(journal_t *, unsigned int, tid_t);
-extern void	journal_clear_revoke(journal_t *);
-extern void	journal_switch_revoke_table(journal_t *journal);
-extern void	journal_clear_buffer_revoked_flags(journal_t *journal);
-
-/*
- * The log thread user interface:
- *
- * Request space in the current transaction, and force transaction commit
- * transitions on demand.
- */
-
-int __log_space_left(journal_t *); /* Called with journal locked */
-int log_start_commit(journal_t *journal, tid_t tid);
-int __log_start_commit(journal_t *journal, tid_t tid);
-int journal_start_commit(journal_t *journal, tid_t *tid);
-int journal_force_commit_nested(journal_t *journal);
-int log_wait_commit(journal_t *journal, tid_t tid);
-int log_do_checkpoint(journal_t *journal);
-int journal_trans_will_send_data_barrier(journal_t *journal, tid_t tid);
-
-void __log_wait_for_space(journal_t *journal);
-extern void	__journal_drop_transaction(journal_t *, transaction_t *);
-extern int	cleanup_journal_tail(journal_t *);
-
-/*
- * is_journal_abort
- *
- * Simple test wrapper function to test the JFS_ABORT state flag.  This
- * bit, when set, indicates that we have had a fatal error somewhere,
- * either inside the journaling layer or indicated to us by the client
- * (eg. ext3), and that we and should not commit any further
- * transactions.
- */
-
-static inline int is_journal_aborted(journal_t *journal)
-{
-	return journal->j_flags & JFS_ABORT;
-}
-
-static inline int is_handle_aborted(handle_t *handle)
-{
-	if (handle->h_aborted)
-		return 1;
-	return is_journal_aborted(handle->h_transaction->t_journal);
-}
-
-static inline void journal_abort_handle(handle_t *handle)
-{
-	handle->h_aborted = 1;
-}
-
-#endif /* __KERNEL__   */
-
-/* Comparison functions for transaction IDs: perform comparisons using
- * modulo arithmetic so that they work over sequence number wraps. */
-
-static inline int tid_gt(tid_t x, tid_t y)
-{
-	int difference = (x - y);
-	return (difference > 0);
-}
-
-static inline int tid_geq(tid_t x, tid_t y)
-{
-	int difference = (x - y);
-	return (difference >= 0);
-}
-
-extern int journal_blocks_per_page(struct inode *inode);
-
-/*
- * Return the minimum number of blocks which must be free in the journal
- * before a new transaction may be started.  Must be called under j_state_lock.
- */
-static inline int jbd_space_needed(journal_t *journal)
-{
-	int nblocks = journal->j_max_transaction_buffers;
-	if (journal->j_committing_transaction)
-		nblocks += journal->j_committing_transaction->
-					t_outstanding_credits;
-	return nblocks;
-}
-
-/*
- * Definitions which augment the buffer_head layer
- */
-
-/* journaling buffer types */
-#define BJ_None		0	/* Not journaled */
-#define BJ_SyncData	1	/* Normal data: flush before commit */
-#define BJ_Metadata	2	/* Normal journaled metadata */
-#define BJ_Forget	3	/* Buffer superseded by this transaction */
-#define BJ_IO		4	/* Buffer is for temporary IO use */
-#define BJ_Shadow	5	/* Buffer contents being shadowed to the log */
-#define BJ_LogCtl	6	/* Buffer contains log descriptors */
-#define BJ_Reserved	7	/* Buffer is reserved for access by journal */
-#define BJ_Locked	8	/* Locked for I/O during commit */
-#define BJ_Types	9
-
-extern int jbd_blocks_per_page(struct inode *inode);
-
-#ifdef __KERNEL__
-
-#define buffer_trace_init(bh)	do {} while (0)
-#define print_buffer_fields(bh)	do {} while (0)
-#define print_buffer_trace(bh)	do {} while (0)
-#define BUFFER_TRACE(bh, info)	do {} while (0)
-#define BUFFER_TRACE2(bh, bh2, info)	do {} while (0)
-#define JBUFFER_TRACE(jh, info)	do {} while (0)
-
-#endif	/* __KERNEL__ */
-
-#endif	/* _LINUX_JBD_H */
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index edb640ae9a94..ad4b28647298 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -29,6 +29,7 @@
 #include <linux/mutex.h>
 #include <linux/timer.h>
 #include <linux/slab.h>
+#include <linux/bit_spinlock.h>
 #include <crypto/hash.h>
 #endif
 
@@ -336,7 +337,45 @@ BUFFER_FNS(Freed, freed)
 BUFFER_FNS(Shadow, shadow)
 BUFFER_FNS(Verified, verified)
 
-#include <linux/jbd_common.h>
+static inline struct buffer_head *jh2bh(struct journal_head *jh)
+{
+	return jh->b_bh;
+}
+
+static inline struct journal_head *bh2jh(struct buffer_head *bh)
+{
+	return bh->b_private;
+}
+
+static inline void jbd_lock_bh_state(struct buffer_head *bh)
+{
+	bit_spin_lock(BH_State, &bh->b_state);
+}
+
+static inline int jbd_trylock_bh_state(struct buffer_head *bh)
+{
+	return bit_spin_trylock(BH_State, &bh->b_state);
+}
+
+static inline int jbd_is_locked_bh_state(struct buffer_head *bh)
+{
+	return bit_spin_is_locked(BH_State, &bh->b_state);
+}
+
+static inline void jbd_unlock_bh_state(struct buffer_head *bh)
+{
+	bit_spin_unlock(BH_State, &bh->b_state);
+}
+
+static inline void jbd_lock_bh_journal_head(struct buffer_head *bh)
+{
+	bit_spin_lock(BH_JournalHead, &bh->b_state);
+}
+
+static inline void jbd_unlock_bh_journal_head(struct buffer_head *bh)
+{
+	bit_spin_unlock(BH_JournalHead, &bh->b_state);
+}
 
 #define J_ASSERT(assert)	BUG_ON(!(assert))
 
diff --git a/include/linux/jbd_common.h b/include/linux/jbd_common.h
deleted file mode 100644
index 3dc53432355f..000000000000
--- a/include/linux/jbd_common.h
+++ /dev/null
@@ -1,46 +0,0 @@
-#ifndef _LINUX_JBD_STATE_H
-#define _LINUX_JBD_STATE_H
-
-#include <linux/bit_spinlock.h>
-
-static inline struct buffer_head *jh2bh(struct journal_head *jh)
-{
-	return jh->b_bh;
-}
-
-static inline struct journal_head *bh2jh(struct buffer_head *bh)
-{
-	return bh->b_private;
-}
-
-static inline void jbd_lock_bh_state(struct buffer_head *bh)
-{
-	bit_spin_lock(BH_State, &bh->b_state);
-}
-
-static inline int jbd_trylock_bh_state(struct buffer_head *bh)
-{
-	return bit_spin_trylock(BH_State, &bh->b_state);
-}
-
-static inline int jbd_is_locked_bh_state(struct buffer_head *bh)
-{
-	return bit_spin_is_locked(BH_State, &bh->b_state);
-}
-
-static inline void jbd_unlock_bh_state(struct buffer_head *bh)
-{
-	bit_spin_unlock(BH_State, &bh->b_state);
-}
-
-static inline void jbd_lock_bh_journal_head(struct buffer_head *bh)
-{
-	bit_spin_lock(BH_JournalHead, &bh->b_state);
-}
-
-static inline void jbd_unlock_bh_journal_head(struct buffer_head *bh)
-{
-	bit_spin_unlock(BH_JournalHead, &bh->b_state);
-}
-
-#endif
diff --git a/include/trace/events/ext3.h b/include/trace/events/ext3.h
deleted file mode 100644
index fc733d28117a..000000000000
--- a/include/trace/events/ext3.h
+++ /dev/null
@@ -1,866 +0,0 @@
-#undef TRACE_SYSTEM
-#define TRACE_SYSTEM ext3
-
-#if !defined(_TRACE_EXT3_H) || defined(TRACE_HEADER_MULTI_READ)
-#define _TRACE_EXT3_H
-
-#include <linux/tracepoint.h>
-
-TRACE_EVENT(ext3_free_inode,
-	TP_PROTO(struct inode *inode),
-
-	TP_ARGS(inode),
-
-	TP_STRUCT__entry(
-		__field(	dev_t,	dev			)
-		__field(	ino_t,	ino			)
-		__field(	umode_t, mode			)
-		__field(	uid_t,	uid			)
-		__field(	gid_t,	gid			)
-		__field(	blkcnt_t, blocks		)
-	),
-
-	TP_fast_assign(
-		__entry->dev	= inode->i_sb->s_dev;
-		__entry->ino	= inode->i_ino;
-		__entry->mode	= inode->i_mode;
-		__entry->uid	= i_uid_read(inode);
-		__entry->gid	= i_gid_read(inode);
-		__entry->blocks	= inode->i_blocks;
-	),
-
-	TP_printk("dev %d,%d ino %lu mode 0%o uid %u gid %u blocks %lu",
-		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  (unsigned long) __entry->ino,
-		  __entry->mode, __entry->uid, __entry->gid,
-		  (unsigned long) __entry->blocks)
-);
-
-TRACE_EVENT(ext3_request_inode,
-	TP_PROTO(struct inode *dir, int mode),
-
-	TP_ARGS(dir, mode),
-
-	TP_STRUCT__entry(
-		__field(	dev_t,	dev			)
-		__field(	ino_t,	dir			)
-		__field(	umode_t, mode			)
-	),
-
-	TP_fast_assign(
-		__entry->dev	= dir->i_sb->s_dev;
-		__entry->dir	= dir->i_ino;
-		__entry->mode	= mode;
-	),
-
-	TP_printk("dev %d,%d dir %lu mode 0%o",
-		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  (unsigned long) __entry->dir, __entry->mode)
-);
-
-TRACE_EVENT(ext3_allocate_inode,
-	TP_PROTO(struct inode *inode, struct inode *dir, int mode),
-
-	TP_ARGS(inode, dir, mode),
-
-	TP_STRUCT__entry(
-		__field(	dev_t,	dev			)
-		__field(	ino_t,	ino			)
-		__field(	ino_t,	dir			)
-		__field(	umode_t, mode			)
-	),
-
-	TP_fast_assign(
-		__entry->dev	= inode->i_sb->s_dev;
-		__entry->ino	= inode->i_ino;
-		__entry->dir	= dir->i_ino;
-		__entry->mode	= mode;
-	),
-
-	TP_printk("dev %d,%d ino %lu dir %lu mode 0%o",
-		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  (unsigned long) __entry->ino,
-		  (unsigned long) __entry->dir, __entry->mode)
-);
-
-TRACE_EVENT(ext3_evict_inode,
-	TP_PROTO(struct inode *inode),
-
-	TP_ARGS(inode),
-
-	TP_STRUCT__entry(
-		__field(	dev_t,	dev			)
-		__field(	ino_t,	ino			)
-		__field(	int,	nlink			)
-	),
-
-	TP_fast_assign(
-		__entry->dev	= inode->i_sb->s_dev;
-		__entry->ino	= inode->i_ino;
-		__entry->nlink	= inode->i_nlink;
-	),
-
-	TP_printk("dev %d,%d ino %lu nlink %d",
-		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  (unsigned long) __entry->ino, __entry->nlink)
-);
-
-TRACE_EVENT(ext3_drop_inode,
-	TP_PROTO(struct inode *inode, int drop),
-
-	TP_ARGS(inode, drop),
-
-	TP_STRUCT__entry(
-		__field(	dev_t,	dev			)
-		__field(	ino_t,	ino			)
-		__field(	int,	drop			)
-	),
-
-	TP_fast_assign(
-		__entry->dev	= inode->i_sb->s_dev;
-		__entry->ino	= inode->i_ino;
-		__entry->drop	= drop;
-	),
-
-	TP_printk("dev %d,%d ino %lu drop %d",
-		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  (unsigned long) __entry->ino, __entry->drop)
-);
-
-TRACE_EVENT(ext3_mark_inode_dirty,
-	TP_PROTO(struct inode *inode, unsigned long IP),
-
-	TP_ARGS(inode, IP),
-
-	TP_STRUCT__entry(
-		__field(	dev_t,	dev			)
-		__field(	ino_t,	ino			)
-		__field(unsigned long,	ip			)
-	),
-
-	TP_fast_assign(
-		__entry->dev	= inode->i_sb->s_dev;
-		__entry->ino	= inode->i_ino;
-		__entry->ip	= IP;
-	),
-
-	TP_printk("dev %d,%d ino %lu caller %pS",
-		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  (unsigned long) __entry->ino, (void *)__entry->ip)
-);
-
-TRACE_EVENT(ext3_write_begin,
-	TP_PROTO(struct inode *inode, loff_t pos, unsigned int len,
-		 unsigned int flags),
-
-	TP_ARGS(inode, pos, len, flags),
-
-	TP_STRUCT__entry(
-		__field(	dev_t,	dev			)
-		__field(	ino_t,	ino			)
-		__field(	loff_t,	pos			)
-		__field(	unsigned int, len		)
-		__field(	unsigned int, flags		)
-	),
-
-	TP_fast_assign(
-		__entry->dev	= inode->i_sb->s_dev;
-		__entry->ino	= inode->i_ino;
-		__entry->pos	= pos;
-		__entry->len	= len;
-		__entry->flags	= flags;
-	),
-
-	TP_printk("dev %d,%d ino %lu pos %llu len %u flags %u",
-		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  (unsigned long) __entry->ino,
-		  (unsigned long long) __entry->pos, __entry->len,
-		  __entry->flags)
-);
-
-DECLARE_EVENT_CLASS(ext3__write_end,
-	TP_PROTO(struct inode *inode, loff_t pos, unsigned int len,
-			unsigned int copied),
-
-	TP_ARGS(inode, pos, len, copied),
-
-	TP_STRUCT__entry(
-		__field(	dev_t,	dev			)
-		__field(	ino_t,	ino			)
-		__field(	loff_t,	pos			)
-		__field(	unsigned int, len		)
-		__field(	unsigned int, copied		)
-	),
-
-	TP_fast_assign(
-		__entry->dev	= inode->i_sb->s_dev;
-		__entry->ino	= inode->i_ino;
-		__entry->pos	= pos;
-		__entry->len	= len;
-		__entry->copied	= copied;
-	),
-
-	TP_printk("dev %d,%d ino %lu pos %llu len %u copied %u",
-		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  (unsigned long) __entry->ino,
-		  (unsigned long long) __entry->pos, __entry->len,
-		  __entry->copied)
-);
-
-DEFINE_EVENT(ext3__write_end, ext3_ordered_write_end,
-
-	TP_PROTO(struct inode *inode, loff_t pos, unsigned int len,
-		 unsigned int copied),
-
-	TP_ARGS(inode, pos, len, copied)
-);
-
-DEFINE_EVENT(ext3__write_end, ext3_writeback_write_end,
-
-	TP_PROTO(struct inode *inode, loff_t pos, unsigned int len,
-		 unsigned int copied),
-
-	TP_ARGS(inode, pos, len, copied)
-);
-
-DEFINE_EVENT(ext3__write_end, ext3_journalled_write_end,
-
-	TP_PROTO(struct inode *inode, loff_t pos, unsigned int len,
-		 unsigned int copied),
-
-	TP_ARGS(inode, pos, len, copied)
-);
-
-DECLARE_EVENT_CLASS(ext3__page_op,
-	TP_PROTO(struct page *page),
-
-	TP_ARGS(page),
-
-	TP_STRUCT__entry(
-		__field(	dev_t,	dev			)
-		__field(	ino_t,	ino			)
-		__field(	pgoff_t, index			)
-
-	),
-
-	TP_fast_assign(
-		__entry->index	= page->index;
-		__entry->ino	= page->mapping->host->i_ino;
-		__entry->dev	= page->mapping->host->i_sb->s_dev;
-	),
-
-	TP_printk("dev %d,%d ino %lu page_index %lu",
-		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  (unsigned long) __entry->ino, __entry->index)
-);
-
-DEFINE_EVENT(ext3__page_op, ext3_ordered_writepage,
-
-	TP_PROTO(struct page *page),
-
-	TP_ARGS(page)
-);
-
-DEFINE_EVENT(ext3__page_op, ext3_writeback_writepage,
-
-	TP_PROTO(struct page *page),
-
-	TP_ARGS(page)
-);
-
-DEFINE_EVENT(ext3__page_op, ext3_journalled_writepage,
-
-	TP_PROTO(struct page *page),
-
-	TP_ARGS(page)
-);
-
-DEFINE_EVENT(ext3__page_op, ext3_readpage,
-
-	TP_PROTO(struct page *page),
-
-	TP_ARGS(page)
-);
-
-DEFINE_EVENT(ext3__page_op, ext3_releasepage,
-
-	TP_PROTO(struct page *page),
-
-	TP_ARGS(page)
-);
-
-TRACE_EVENT(ext3_invalidatepage,
-	TP_PROTO(struct page *page, unsigned int offset, unsigned int length),
-
-	TP_ARGS(page, offset, length),
-
-	TP_STRUCT__entry(
-		__field(	pgoff_t, index			)
-		__field(	unsigned int, offset		)
-		__field(	unsigned int, length		)
-		__field(	ino_t,	ino			)
-		__field(	dev_t,	dev			)
-
-	),
-
-	TP_fast_assign(
-		__entry->index	= page->index;
-		__entry->offset	= offset;
-		__entry->length	= length;
-		__entry->ino	= page->mapping->host->i_ino;
-		__entry->dev	= page->mapping->host->i_sb->s_dev;
-	),
-
-	TP_printk("dev %d,%d ino %lu page_index %lu offset %u length %u",
-		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  (unsigned long) __entry->ino,
-		  __entry->index, __entry->offset, __entry->length)
-);
-
-TRACE_EVENT(ext3_discard_blocks,
-	TP_PROTO(struct super_block *sb, unsigned long blk,
-			unsigned long count),
-
-	TP_ARGS(sb, blk, count),
-
-	TP_STRUCT__entry(
-		__field(	dev_t,		dev		)
-		__field(	unsigned long,	blk		)
-		__field(	unsigned long,	count		)
-
-	),
-
-	TP_fast_assign(
-		__entry->dev	= sb->s_dev;
-		__entry->blk	= blk;
-		__entry->count	= count;
-	),
-
-	TP_printk("dev %d,%d blk %lu count %lu",
-		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  __entry->blk, __entry->count)
-);
-
-TRACE_EVENT(ext3_request_blocks,
-	TP_PROTO(struct inode *inode, unsigned long goal,
-		 unsigned long count),
-
-	TP_ARGS(inode, goal, count),
-
-	TP_STRUCT__entry(
-		__field(	dev_t,	dev			)
-		__field(	ino_t,	ino			)
-		__field(	unsigned long, count		)
-		__field(	unsigned long,	goal		)
-	),
-
-	TP_fast_assign(
-		__entry->dev	= inode->i_sb->s_dev;
-		__entry->ino	= inode->i_ino;
-		__entry->count	= count;
-		__entry->goal	= goal;
-	),
-
-	TP_printk("dev %d,%d ino %lu count %lu goal %lu ",
-		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  (unsigned long) __entry->ino,
-		  __entry->count, __entry->goal)
-);
-
-TRACE_EVENT(ext3_allocate_blocks,
-	TP_PROTO(struct inode *inode, unsigned long goal,
-		 unsigned long count, unsigned long block),
-
-	TP_ARGS(inode, goal, count, block),
-
-	TP_STRUCT__entry(
-		__field(	dev_t,	dev			)
-		__field(	ino_t,	ino			)
-		__field(	unsigned long,	block		)
-		__field(	unsigned long, count		)
-		__field(	unsigned long,	goal		)
-	),
-
-	TP_fast_assign(
-		__entry->dev	= inode->i_sb->s_dev;
-		__entry->ino	= inode->i_ino;
-		__entry->block	= block;
-		__entry->count	= count;
-		__entry->goal	= goal;
-	),
-
-	TP_printk("dev %d,%d ino %lu count %lu block %lu goal %lu",
-		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  (unsigned long) __entry->ino,
-		   __entry->count, __entry->block,
-		  __entry->goal)
-);
-
-TRACE_EVENT(ext3_free_blocks,
-	TP_PROTO(struct inode *inode, unsigned long block,
-		 unsigned long count),
-
-	TP_ARGS(inode, block, count),
-
-	TP_STRUCT__entry(
-		__field(	dev_t,	dev			)
-		__field(	ino_t,	ino			)
-		__field(	umode_t, mode			)
-		__field(	unsigned long,	block		)
-		__field(	unsigned long,	count		)
-	),
-
-	TP_fast_assign(
-		__entry->dev		= inode->i_sb->s_dev;
-		__entry->ino		= inode->i_ino;
-		__entry->mode		= inode->i_mode;
-		__entry->block		= block;
-		__entry->count		= count;
-	),
-
-	TP_printk("dev %d,%d ino %lu mode 0%o block %lu count %lu",
-		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  (unsigned long) __entry->ino,
-		  __entry->mode, __entry->block, __entry->count)
-);
-
-TRACE_EVENT(ext3_sync_file_enter,
-	TP_PROTO(struct file *file, int datasync),
-
-	TP_ARGS(file, datasync),
-
-	TP_STRUCT__entry(
-		__field(	dev_t,	dev			)
-		__field(	ino_t,	ino			)
-		__field(	ino_t,	parent			)
-		__field(	int,	datasync		)
-	),
-
-	TP_fast_assign(
-		struct dentry *dentry = file->f_path.dentry;
-
-		__entry->dev		= d_inode(dentry)->i_sb->s_dev;
-		__entry->ino		= d_inode(dentry)->i_ino;
-		__entry->datasync	= datasync;
-		__entry->parent		= d_inode(dentry->d_parent)->i_ino;
-	),
-
-	TP_printk("dev %d,%d ino %lu parent %ld datasync %d ",
-		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  (unsigned long) __entry->ino,
-		  (unsigned long) __entry->parent, __entry->datasync)
-);
-
-TRACE_EVENT(ext3_sync_file_exit,
-	TP_PROTO(struct inode *inode, int ret),
-
-	TP_ARGS(inode, ret),
-
-	TP_STRUCT__entry(
-		__field(	int,	ret			)
-		__field(	ino_t,	ino			)
-		__field(	dev_t,	dev			)
-	),
-
-	TP_fast_assign(
-		__entry->ret		= ret;
-		__entry->ino		= inode->i_ino;
-		__entry->dev		= inode->i_sb->s_dev;
-	),
-
-	TP_printk("dev %d,%d ino %lu ret %d",
-		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  (unsigned long) __entry->ino,
-		  __entry->ret)
-);
-
-TRACE_EVENT(ext3_sync_fs,
-	TP_PROTO(struct super_block *sb, int wait),
-
-	TP_ARGS(sb, wait),
-
-	TP_STRUCT__entry(
-		__field(	dev_t,	dev			)
-		__field(	int,	wait			)
-
-	),
-
-	TP_fast_assign(
-		__entry->dev	= sb->s_dev;
-		__entry->wait	= wait;
-	),
-
-	TP_printk("dev %d,%d wait %d",
-		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  __entry->wait)
-);
-
-TRACE_EVENT(ext3_rsv_window_add,
-	TP_PROTO(struct super_block *sb,
-		 struct ext3_reserve_window_node *rsv_node),
-
-	TP_ARGS(sb, rsv_node),
-
-	TP_STRUCT__entry(
-		__field(	unsigned long,	start		)
-		__field(	unsigned long,	end		)
-		__field(	dev_t,	dev			)
-	),
-
-	TP_fast_assign(
-		__entry->dev	= sb->s_dev;
-		__entry->start	= rsv_node->rsv_window._rsv_start;
-		__entry->end	= rsv_node->rsv_window._rsv_end;
-	),
-
-	TP_printk("dev %d,%d start %lu end %lu",
-		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  __entry->start, __entry->end)
-);
-
-TRACE_EVENT(ext3_discard_reservation,
-	TP_PROTO(struct inode *inode,
-		 struct ext3_reserve_window_node *rsv_node),
-
-	TP_ARGS(inode, rsv_node),
-
-	TP_STRUCT__entry(
-		__field(	unsigned long,	start		)
-		__field(	unsigned long,	end		)
-		__field(	ino_t,	ino			)
-		__field(	dev_t,	dev			)
-	),
-
-	TP_fast_assign(
-		__entry->start	= rsv_node->rsv_window._rsv_start;
-		__entry->end	= rsv_node->rsv_window._rsv_end;
-		__entry->ino	= inode->i_ino;
-		__entry->dev	= inode->i_sb->s_dev;
-	),
-
-	TP_printk("dev %d,%d ino %lu start %lu end %lu",
-		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  (unsigned long)__entry->ino, __entry->start,
-		  __entry->end)
-);
-
-TRACE_EVENT(ext3_alloc_new_reservation,
-	TP_PROTO(struct super_block *sb, unsigned long goal),
-
-	TP_ARGS(sb, goal),
-
-	TP_STRUCT__entry(
-		__field(	dev_t,	dev			)
-		__field(	unsigned long,	goal		)
-	),
-
-	TP_fast_assign(
-		__entry->dev	= sb->s_dev;
-		__entry->goal	= goal;
-	),
-
-	TP_printk("dev %d,%d goal %lu",
-		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  __entry->goal)
-);
-
-TRACE_EVENT(ext3_reserved,
-	TP_PROTO(struct super_block *sb, unsigned long block,
-		 struct ext3_reserve_window_node *rsv_node),
-
-	TP_ARGS(sb, block, rsv_node),
-
-	TP_STRUCT__entry(
-		__field(	unsigned long,	block		)
-		__field(	unsigned long,	start		)
-		__field(	unsigned long,	end		)
-		__field(	dev_t,	dev			)
-	),
-
-	TP_fast_assign(
-		__entry->block	= block;
-		__entry->start	= rsv_node->rsv_window._rsv_start;
-		__entry->end	= rsv_node->rsv_window._rsv_end;
-		__entry->dev	= sb->s_dev;
-	),
-
-	TP_printk("dev %d,%d block %lu, start %lu end %lu",
-		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  __entry->block, __entry->start, __entry->end)
-);
-
-TRACE_EVENT(ext3_forget,
-	TP_PROTO(struct inode *inode, int is_metadata, unsigned long block),
-
-	TP_ARGS(inode, is_metadata, block),
-
-	TP_STRUCT__entry(
-		__field(	dev_t,	dev			)
-		__field(	ino_t,	ino			)
-		__field(	umode_t, mode			)
-		__field(	int,	is_metadata		)
-		__field(	unsigned long,	block		)
-	),
-
-	TP_fast_assign(
-		__entry->dev	= inode->i_sb->s_dev;
-		__entry->ino	= inode->i_ino;
-		__entry->mode	= inode->i_mode;
-		__entry->is_metadata = is_metadata;
-		__entry->block	= block;
-	),
-
-	TP_printk("dev %d,%d ino %lu mode 0%o is_metadata %d block %lu",
-		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  (unsigned long) __entry->ino,
-		  __entry->mode, __entry->is_metadata, __entry->block)
-);
-
-TRACE_EVENT(ext3_read_block_bitmap,
-	TP_PROTO(struct super_block *sb, unsigned int group),
-
-	TP_ARGS(sb, group),
-
-	TP_STRUCT__entry(
-		__field(	dev_t,	dev			)
-		__field(	__u32,	group			)
-
-	),
-
-	TP_fast_assign(
-		__entry->dev	= sb->s_dev;
-		__entry->group	= group;
-	),
-
-	TP_printk("dev %d,%d group %u",
-		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  __entry->group)
-);
-
-TRACE_EVENT(ext3_direct_IO_enter,
-	TP_PROTO(struct inode *inode, loff_t offset, unsigned long len, int rw),
-
-	TP_ARGS(inode, offset, len, rw),
-
-	TP_STRUCT__entry(
-		__field(	ino_t,	ino			)
-		__field(	dev_t,	dev			)
-		__field(	loff_t,	pos			)
-		__field(	unsigned long,	len		)
-		__field(	int,	rw			)
-	),
-
-	TP_fast_assign(
-		__entry->ino	= inode->i_ino;
-		__entry->dev	= inode->i_sb->s_dev;
-		__entry->pos	= offset;
-		__entry->len	= len;
-		__entry->rw	= rw;
-	),
-
-	TP_printk("dev %d,%d ino %lu pos %llu len %lu rw %d",
-		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  (unsigned long) __entry->ino,
-		  (unsigned long long) __entry->pos, __entry->len,
-		  __entry->rw)
-);
-
-TRACE_EVENT(ext3_direct_IO_exit,
-	TP_PROTO(struct inode *inode, loff_t offset, unsigned long len,
-		 int rw, int ret),
-
-	TP_ARGS(inode, offset, len, rw, ret),
-
-	TP_STRUCT__entry(
-		__field(	ino_t,	ino			)
-		__field(	dev_t,	dev			)
-		__field(	loff_t,	pos			)
-		__field(	unsigned long,	len		)
-		__field(	int,	rw			)
-		__field(	int,	ret			)
-	),
-
-	TP_fast_assign(
-		__entry->ino	= inode->i_ino;
-		__entry->dev	= inode->i_sb->s_dev;
-		__entry->pos	= offset;
-		__entry->len	= len;
-		__entry->rw	= rw;
-		__entry->ret	= ret;
-	),
-
-	TP_printk("dev %d,%d ino %lu pos %llu len %lu rw %d ret %d",
-		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  (unsigned long) __entry->ino,
-		  (unsigned long long) __entry->pos, __entry->len,
-		  __entry->rw, __entry->ret)
-);
-
-TRACE_EVENT(ext3_unlink_enter,
-	TP_PROTO(struct inode *parent, struct dentry *dentry),
-
-	TP_ARGS(parent, dentry),
-
-	TP_STRUCT__entry(
-		__field(	ino_t,	parent			)
-		__field(	ino_t,	ino			)
-		__field(	loff_t,	size			)
-		__field(	dev_t,	dev			)
-	),
-
-	TP_fast_assign(
-		__entry->parent		= parent->i_ino;
-		__entry->ino		= d_inode(dentry)->i_ino;
-		__entry->size		= d_inode(dentry)->i_size;
-		__entry->dev		= d_inode(dentry)->i_sb->s_dev;
-	),
-
-	TP_printk("dev %d,%d ino %lu size %lld parent %ld",
-		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  (unsigned long) __entry->ino,
-		  (unsigned long long)__entry->size,
-		  (unsigned long) __entry->parent)
-);
-
-TRACE_EVENT(ext3_unlink_exit,
-	TP_PROTO(struct dentry *dentry, int ret),
-
-	TP_ARGS(dentry, ret),
-
-	TP_STRUCT__entry(
-		__field(	ino_t,	ino			)
-		__field(	dev_t,	dev			)
-		__field(	int,	ret			)
-	),
-
-	TP_fast_assign(
-		__entry->ino		= d_inode(dentry)->i_ino;
-		__entry->dev		= d_inode(dentry)->i_sb->s_dev;
-		__entry->ret		= ret;
-	),
-
-	TP_printk("dev %d,%d ino %lu ret %d",
-		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  (unsigned long) __entry->ino,
-		  __entry->ret)
-);
-
-DECLARE_EVENT_CLASS(ext3__truncate,
-	TP_PROTO(struct inode *inode),
-
-	TP_ARGS(inode),
-
-	TP_STRUCT__entry(
-		__field(	ino_t,		ino		)
-		__field(	dev_t,		dev		)
-		__field(	blkcnt_t,	blocks		)
-	),
-
-	TP_fast_assign(
-		__entry->ino    = inode->i_ino;
-		__entry->dev    = inode->i_sb->s_dev;
-		__entry->blocks	= inode->i_blocks;
-	),
-
-	TP_printk("dev %d,%d ino %lu blocks %lu",
-		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  (unsigned long) __entry->ino, (unsigned long) __entry->blocks)
-);
-
-DEFINE_EVENT(ext3__truncate, ext3_truncate_enter,
-
-	TP_PROTO(struct inode *inode),
-
-	TP_ARGS(inode)
-);
-
-DEFINE_EVENT(ext3__truncate, ext3_truncate_exit,
-
-	TP_PROTO(struct inode *inode),
-
-	TP_ARGS(inode)
-);
-
-TRACE_EVENT(ext3_get_blocks_enter,
-	TP_PROTO(struct inode *inode, unsigned long lblk,
-		 unsigned long len, int create),
-
-	TP_ARGS(inode, lblk, len, create),
-
-	TP_STRUCT__entry(
-		__field(	ino_t,		ino		)
-		__field(	dev_t,		dev		)
-		__field(	unsigned long,	lblk		)
-		__field(	unsigned long,	len		)
-		__field(	int,		create		)
-	),
-
-	TP_fast_assign(
-		__entry->ino    = inode->i_ino;
-		__entry->dev    = inode->i_sb->s_dev;
-		__entry->lblk	= lblk;
-		__entry->len	= len;
-		__entry->create	= create;
-	),
-
-	TP_printk("dev %d,%d ino %lu lblk %lu len %lu create %u",
-		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  (unsigned long) __entry->ino,
-		  __entry->lblk, __entry->len, __entry->create)
-);
-
-TRACE_EVENT(ext3_get_blocks_exit,
-	TP_PROTO(struct inode *inode, unsigned long lblk,
-		 unsigned long pblk, unsigned long len, int ret),
-
-	TP_ARGS(inode, lblk, pblk, len, ret),
-
-	TP_STRUCT__entry(
-		__field(	ino_t,		ino		)
-		__field(	dev_t,		dev		)
-		__field(	unsigned long,	lblk		)
-		__field(	unsigned long,	pblk		)
-		__field(	unsigned long,	len		)
-		__field(	int,		ret		)
-	),
-
-	TP_fast_assign(
-		__entry->ino    = inode->i_ino;
-		__entry->dev    = inode->i_sb->s_dev;
-		__entry->lblk	= lblk;
-		__entry->pblk	= pblk;
-		__entry->len	= len;
-		__entry->ret	= ret;
-	),
-
-	TP_printk("dev %d,%d ino %lu lblk %lu pblk %lu len %lu ret %d",
-		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  (unsigned long) __entry->ino,
-		   __entry->lblk, __entry->pblk,
-		  __entry->len, __entry->ret)
-);
-
-TRACE_EVENT(ext3_load_inode,
-	TP_PROTO(struct inode *inode),
-
-	TP_ARGS(inode),
-
-	TP_STRUCT__entry(
-		__field(	ino_t,	ino		)
-		__field(	dev_t,	dev		)
-	),
-
-	TP_fast_assign(
-		__entry->ino		= inode->i_ino;
-		__entry->dev		= inode->i_sb->s_dev;
-	),
-
-	TP_printk("dev %d,%d ino %lu",
-		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  (unsigned long) __entry->ino)
-);
-
-#endif /* _TRACE_EXT3_H */
-
-/* This part must be outside protection */
-#include <trace/define_trace.h>
diff --git a/include/trace/events/jbd.h b/include/trace/events/jbd.h
deleted file mode 100644
index da6f2591c25e..000000000000
--- a/include/trace/events/jbd.h
+++ /dev/null
@@ -1,194 +0,0 @@
-#undef TRACE_SYSTEM
-#define TRACE_SYSTEM jbd
-
-#if !defined(_TRACE_JBD_H) || defined(TRACE_HEADER_MULTI_READ)
-#define _TRACE_JBD_H
-
-#include <linux/jbd.h>
-#include <linux/tracepoint.h>
-
-TRACE_EVENT(jbd_checkpoint,
-
-	TP_PROTO(journal_t *journal, int result),
-
-	TP_ARGS(journal, result),
-
-	TP_STRUCT__entry(
-		__field(	dev_t,	dev			)
-		__field(	int,	result			)
-	),
-
-	TP_fast_assign(
-		__entry->dev		= journal->j_fs_dev->bd_dev;
-		__entry->result		= result;
-	),
-
-	TP_printk("dev %d,%d result %d",
-		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  __entry->result)
-);
-
-DECLARE_EVENT_CLASS(jbd_commit,
-
-	TP_PROTO(journal_t *journal, transaction_t *commit_transaction),
-
-	TP_ARGS(journal, commit_transaction),
-
-	TP_STRUCT__entry(
-		__field(	dev_t,	dev			)
-		__field(	int,	transaction		)
-	),
-
-	TP_fast_assign(
-		__entry->dev		= journal->j_fs_dev->bd_dev;
-		__entry->transaction	= commit_transaction->t_tid;
-	),
-
-	TP_printk("dev %d,%d transaction %d",
-		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  __entry->transaction)
-);
-
-DEFINE_EVENT(jbd_commit, jbd_start_commit,
-
-	TP_PROTO(journal_t *journal, transaction_t *commit_transaction),
-
-	TP_ARGS(journal, commit_transaction)
-);
-
-DEFINE_EVENT(jbd_commit, jbd_commit_locking,
-
-	TP_PROTO(journal_t *journal, transaction_t *commit_transaction),
-
-	TP_ARGS(journal, commit_transaction)
-);
-
-DEFINE_EVENT(jbd_commit, jbd_commit_flushing,
-
-	TP_PROTO(journal_t *journal, transaction_t *commit_transaction),
-
-	TP_ARGS(journal, commit_transaction)
-);
-
-DEFINE_EVENT(jbd_commit, jbd_commit_logging,
-
-	TP_PROTO(journal_t *journal, transaction_t *commit_transaction),
-
-	TP_ARGS(journal, commit_transaction)
-);
-
-TRACE_EVENT(jbd_drop_transaction,
-
-	TP_PROTO(journal_t *journal, transaction_t *commit_transaction),
-
-	TP_ARGS(journal, commit_transaction),
-
-	TP_STRUCT__entry(
-		__field(	dev_t,	dev			)
-		__field(	int,	transaction		)
-	),
-
-	TP_fast_assign(
-		__entry->dev		= journal->j_fs_dev->bd_dev;
-		__entry->transaction	= commit_transaction->t_tid;
-	),
-
-	TP_printk("dev %d,%d transaction %d",
-		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  __entry->transaction)
-);
-
-TRACE_EVENT(jbd_end_commit,
-	TP_PROTO(journal_t *journal, transaction_t *commit_transaction),
-
-	TP_ARGS(journal, commit_transaction),
-
-	TP_STRUCT__entry(
-		__field(	dev_t,	dev			)
-		__field(	int,	transaction		)
-		__field(	int,	head			)
-	),
-
-	TP_fast_assign(
-		__entry->dev		= journal->j_fs_dev->bd_dev;
-		__entry->transaction	= commit_transaction->t_tid;
-		__entry->head		= journal->j_tail_sequence;
-	),
-
-	TP_printk("dev %d,%d transaction %d head %d",
-		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  __entry->transaction, __entry->head)
-);
-
-TRACE_EVENT(jbd_do_submit_data,
-	TP_PROTO(journal_t *journal, transaction_t *commit_transaction),
-
-	TP_ARGS(journal, commit_transaction),
-
-	TP_STRUCT__entry(
-		__field(	dev_t,	dev			)
-		__field(	int,	transaction		)
-	),
-
-	TP_fast_assign(
-		__entry->dev		= journal->j_fs_dev->bd_dev;
-		__entry->transaction	= commit_transaction->t_tid;
-	),
-
-	TP_printk("dev %d,%d transaction %d",
-		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		   __entry->transaction)
-);
-
-TRACE_EVENT(jbd_cleanup_journal_tail,
-
-	TP_PROTO(journal_t *journal, tid_t first_tid,
-		 unsigned long block_nr, unsigned long freed),
-
-	TP_ARGS(journal, first_tid, block_nr, freed),
-
-	TP_STRUCT__entry(
-		__field(	dev_t,	dev			)
-		__field(	tid_t,	tail_sequence		)
-		__field(	tid_t,	first_tid		)
-		__field(unsigned long,	block_nr		)
-		__field(unsigned long,	freed			)
-	),
-
-	TP_fast_assign(
-		__entry->dev		= journal->j_fs_dev->bd_dev;
-		__entry->tail_sequence	= journal->j_tail_sequence;
-		__entry->first_tid	= first_tid;
-		__entry->block_nr	= block_nr;
-		__entry->freed		= freed;
-	),
-
-	TP_printk("dev %d,%d from %u to %u offset %lu freed %lu",
-		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  __entry->tail_sequence, __entry->first_tid,
-		  __entry->block_nr, __entry->freed)
-);
-
-TRACE_EVENT(journal_write_superblock,
-	TP_PROTO(journal_t *journal, int write_op),
-
-	TP_ARGS(journal, write_op),
-
-	TP_STRUCT__entry(
-		__field(	dev_t,	dev			)
-		__field(	int,	write_op		)
-	),
-
-	TP_fast_assign(
-		__entry->dev		= journal->j_fs_dev->bd_dev;
-		__entry->write_op	= write_op;
-	),
-
-	TP_printk("dev %d,%d write_op %x", MAJOR(__entry->dev),
-		  MINOR(__entry->dev), __entry->write_op)
-);
-
-#endif /* _TRACE_JBD_H */
-
-/* This part must be outside protection */
-#include <trace/define_trace.h>
-- 
cgit v1.2.3


From ddc05fa286068a2d0cf87a47739b155806d46670 Mon Sep 17 00:00:00 2001
From: Giuseppe Barba <giuseppe.barba@st.com>
Date: Tue, 21 Jul 2015 10:35:44 +0200
Subject: iio: st-accel: add support for lsm303agr accelerometer

This adds support for the lsm303agr accelerometer.

Signed-off-by: Giuseppe Barba <giuseppe.barba@st.com>
Acked-by: Denis Ciocca <denis.ciocca@st.com>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
---
 Documentation/devicetree/bindings/iio/st-sensors.txt | 1 +
 drivers/iio/accel/st_accel.h                         | 1 +
 drivers/iio/accel/st_accel_core.c                    | 1 +
 drivers/iio/accel/st_accel_i2c.c                     | 5 +++++
 drivers/iio/accel/st_accel_spi.c                     | 1 +
 5 files changed, 9 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/iio/st-sensors.txt b/Documentation/devicetree/bindings/iio/st-sensors.txt
index 8a6be3bdf267..d80bdaa75754 100644
--- a/Documentation/devicetree/bindings/iio/st-sensors.txt
+++ b/Documentation/devicetree/bindings/iio/st-sensors.txt
@@ -35,6 +35,7 @@ Accelerometers:
 - st,lsm303dl-accel
 - st,lsm303dlm-accel
 - st,lsm330-accel
+- st,lsm303agr-accel
 
 Gyroscopes:
 - st,l3g4200d-gyro
diff --git a/drivers/iio/accel/st_accel.h b/drivers/iio/accel/st_accel.h
index aa1001931d0c..468f21fa2950 100644
--- a/drivers/iio/accel/st_accel.h
+++ b/drivers/iio/accel/st_accel.h
@@ -26,6 +26,7 @@
 #define LSM303DLH_ACCEL_DEV_NAME	"lsm303dlh_accel"
 #define LSM303DLM_ACCEL_DEV_NAME	"lsm303dlm_accel"
 #define LSM330_ACCEL_DEV_NAME		"lsm330_accel"
+#define LSM303AGR_ACCEL_DEV_NAME	"lsm303agr_accel"
 
 /**
 * struct st_sensors_platform_data - default accel platform data
diff --git a/drivers/iio/accel/st_accel_core.c b/drivers/iio/accel/st_accel_core.c
index 12b42f6e70ab..ff30f8806880 100644
--- a/drivers/iio/accel/st_accel_core.c
+++ b/drivers/iio/accel/st_accel_core.c
@@ -233,6 +233,7 @@ static const struct st_sensor_settings st_accel_sensors_settings[] = {
 			[2] = LSM330D_ACCEL_DEV_NAME,
 			[3] = LSM330DL_ACCEL_DEV_NAME,
 			[4] = LSM330DLC_ACCEL_DEV_NAME,
+			[5] = LSM303AGR_ACCEL_DEV_NAME,
 		},
 		.ch = (struct iio_chan_spec *)st_accel_12bit_channels,
 		.odr = {
diff --git a/drivers/iio/accel/st_accel_i2c.c b/drivers/iio/accel/st_accel_i2c.c
index a2f1c20319eb..8b9cc84fd44f 100644
--- a/drivers/iio/accel/st_accel_i2c.c
+++ b/drivers/iio/accel/st_accel_i2c.c
@@ -68,6 +68,10 @@ static const struct of_device_id st_accel_of_match[] = {
 		.compatible = "st,lsm330-accel",
 		.data = LSM330_ACCEL_DEV_NAME,
 	},
+	{
+		.compatible = "st,lsm303agr-accel",
+		.data = LSM303AGR_ACCEL_DEV_NAME,
+	},
 	{},
 };
 MODULE_DEVICE_TABLE(of, st_accel_of_match);
@@ -116,6 +120,7 @@ static const struct i2c_device_id st_accel_id_table[] = {
 	{ LSM303DL_ACCEL_DEV_NAME },
 	{ LSM303DLM_ACCEL_DEV_NAME },
 	{ LSM330_ACCEL_DEV_NAME },
+	{ LSM303AGR_ACCEL_DEV_NAME },
 	{},
 };
 MODULE_DEVICE_TABLE(i2c, st_accel_id_table);
diff --git a/drivers/iio/accel/st_accel_spi.c b/drivers/iio/accel/st_accel_spi.c
index 12ec29389e4b..54b61a3961c3 100644
--- a/drivers/iio/accel/st_accel_spi.c
+++ b/drivers/iio/accel/st_accel_spi.c
@@ -57,6 +57,7 @@ static const struct spi_device_id st_accel_id_table[] = {
 	{ LSM303DL_ACCEL_DEV_NAME },
 	{ LSM303DLM_ACCEL_DEV_NAME },
 	{ LSM330_ACCEL_DEV_NAME },
+	{ LSM303AGR_ACCEL_DEV_NAME },
 	{},
 };
 MODULE_DEVICE_TABLE(spi, st_accel_id_table);
-- 
cgit v1.2.3


From 1e9676a8474b657b6a793f0abff2205783a1ce6f Mon Sep 17 00:00:00 2001
From: Giuseppe Barba <giuseppe.barba@st.com>
Date: Tue, 21 Jul 2015 10:35:45 +0200
Subject: iio: st-magn: add support for lsm303agr magnetometer

This adds support for the lsm303agr magnetometer.

Signed-off-by: Giuseppe Barba <giuseppe.barba@st.com>
Acked-by: Denis Ciocca <denis.ciocca@st.com>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
---
 .../devicetree/bindings/iio/st-sensors.txt         |  1 +
 drivers/iio/magnetometer/st_magn.h                 |  1 +
 drivers/iio/magnetometer/st_magn_core.c            | 82 ++++++++++++++++++++++
 drivers/iio/magnetometer/st_magn_i2c.c             |  5 ++
 drivers/iio/magnetometer/st_magn_spi.c             |  1 +
 5 files changed, 90 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/iio/st-sensors.txt b/Documentation/devicetree/bindings/iio/st-sensors.txt
index d80bdaa75754..d3ccdb190c53 100644
--- a/Documentation/devicetree/bindings/iio/st-sensors.txt
+++ b/Documentation/devicetree/bindings/iio/st-sensors.txt
@@ -47,6 +47,7 @@ Gyroscopes:
 - st,lsm330-gyro
 
 Magnetometers:
+- st,lsm303agr-magn
 - st,lsm303dlh-magn
 - st,lsm303dlhc-magn
 - st,lsm303dlm-magn
diff --git a/drivers/iio/magnetometer/st_magn.h b/drivers/iio/magnetometer/st_magn.h
index 604410236d99..06a4d9c35581 100644
--- a/drivers/iio/magnetometer/st_magn.h
+++ b/drivers/iio/magnetometer/st_magn.h
@@ -18,6 +18,7 @@
 #define LSM303DLHC_MAGN_DEV_NAME	"lsm303dlhc_magn"
 #define LSM303DLM_MAGN_DEV_NAME		"lsm303dlm_magn"
 #define LIS3MDL_MAGN_DEV_NAME		"lis3mdl"
+#define LSM303AGR_MAGN_DEV_NAME		"lsm303agr_magn"
 
 int st_magn_common_probe(struct iio_dev *indio_dev);
 void st_magn_common_remove(struct iio_dev *indio_dev);
diff --git a/drivers/iio/magnetometer/st_magn_core.c b/drivers/iio/magnetometer/st_magn_core.c
index 4c0cef865517..f8dc4b85d70c 100644
--- a/drivers/iio/magnetometer/st_magn_core.c
+++ b/drivers/iio/magnetometer/st_magn_core.c
@@ -43,6 +43,7 @@
 #define ST_MAGN_FS_AVL_8000MG			8000
 #define ST_MAGN_FS_AVL_8100MG			8100
 #define ST_MAGN_FS_AVL_12000MG			12000
+#define ST_MAGN_FS_AVL_15000MG			15000
 #define ST_MAGN_FS_AVL_16000MG			16000
 
 /* CUSTOM VALUES FOR SENSOR 0 */
@@ -157,6 +158,29 @@
 #define ST_MAGN_2_OUT_Y_L_ADDR			0x2a
 #define ST_MAGN_2_OUT_Z_L_ADDR			0x2c
 
+/* CUSTOM VALUES FOR SENSOR 3 */
+#define ST_MAGN_3_WAI_ADDR			0x4f
+#define ST_MAGN_3_WAI_EXP			0x40
+#define ST_MAGN_3_ODR_ADDR			0x60
+#define ST_MAGN_3_ODR_MASK			0x0c
+#define ST_MAGN_3_ODR_AVL_10HZ_VAL		0x00
+#define ST_MAGN_3_ODR_AVL_20HZ_VAL		0x01
+#define ST_MAGN_3_ODR_AVL_50HZ_VAL		0x02
+#define ST_MAGN_3_ODR_AVL_100HZ_VAL		0x03
+#define ST_MAGN_3_PW_ADDR			0x60
+#define ST_MAGN_3_PW_MASK			0x03
+#define ST_MAGN_3_PW_ON				0x00
+#define ST_MAGN_3_PW_OFF			0x03
+#define ST_MAGN_3_BDU_ADDR			0x62
+#define ST_MAGN_3_BDU_MASK			0x10
+#define ST_MAGN_3_DRDY_IRQ_ADDR			0x62
+#define ST_MAGN_3_DRDY_INT_MASK			0x01
+#define ST_MAGN_3_FS_AVL_15000_GAIN		1500
+#define ST_MAGN_3_MULTIREAD_BIT			false
+#define ST_MAGN_3_OUT_X_L_ADDR			0x68
+#define ST_MAGN_3_OUT_Y_L_ADDR			0x6a
+#define ST_MAGN_3_OUT_Z_L_ADDR			0x6c
+
 static const struct iio_chan_spec st_magn_16bit_channels[] = {
 	ST_SENSORS_LSM_CHANNELS(IIO_MAGN,
 			BIT(IIO_CHAN_INFO_RAW) | BIT(IIO_CHAN_INFO_SCALE),
@@ -189,6 +213,22 @@ static const struct iio_chan_spec st_magn_2_16bit_channels[] = {
 	IIO_CHAN_SOFT_TIMESTAMP(3)
 };
 
+static const struct iio_chan_spec st_magn_3_16bit_channels[] = {
+	ST_SENSORS_LSM_CHANNELS(IIO_MAGN,
+			BIT(IIO_CHAN_INFO_RAW) | BIT(IIO_CHAN_INFO_SCALE),
+			ST_SENSORS_SCAN_X, 1, IIO_MOD_X, 's', IIO_LE, 16, 16,
+			ST_MAGN_3_OUT_X_L_ADDR),
+	ST_SENSORS_LSM_CHANNELS(IIO_MAGN,
+			BIT(IIO_CHAN_INFO_RAW) | BIT(IIO_CHAN_INFO_SCALE),
+			ST_SENSORS_SCAN_Y, 1, IIO_MOD_Y, 's', IIO_LE, 16, 16,
+			ST_MAGN_3_OUT_Y_L_ADDR),
+	ST_SENSORS_LSM_CHANNELS(IIO_MAGN,
+			BIT(IIO_CHAN_INFO_RAW) | BIT(IIO_CHAN_INFO_SCALE),
+			ST_SENSORS_SCAN_Z, 1, IIO_MOD_Z, 's', IIO_LE, 16, 16,
+			ST_MAGN_3_OUT_Z_L_ADDR),
+	IIO_CHAN_SOFT_TIMESTAMP(3)
+};
+
 static const struct st_sensor_settings st_magn_sensors_settings[] = {
 	{
 		.wai = 0, /* This sensor has no valid WhoAmI report 0 */
@@ -402,6 +442,48 @@ static const struct st_sensor_settings st_magn_sensors_settings[] = {
 		.multi_read_bit = ST_MAGN_2_MULTIREAD_BIT,
 		.bootime = 2,
 	},
+	{
+		.wai = ST_MAGN_3_WAI_EXP,
+		.wai_addr = ST_MAGN_3_WAI_ADDR,
+		.sensors_supported = {
+			[0] = LSM303AGR_MAGN_DEV_NAME,
+		},
+		.ch = (struct iio_chan_spec *)st_magn_3_16bit_channels,
+		.odr = {
+			.addr = ST_MAGN_3_ODR_ADDR,
+			.mask = ST_MAGN_3_ODR_MASK,
+			.odr_avl = {
+				{ 10, ST_MAGN_3_ODR_AVL_10HZ_VAL, },
+				{ 20, ST_MAGN_3_ODR_AVL_20HZ_VAL, },
+				{ 50, ST_MAGN_3_ODR_AVL_50HZ_VAL, },
+				{ 100, ST_MAGN_3_ODR_AVL_100HZ_VAL, },
+			},
+		},
+		.pw = {
+			.addr = ST_MAGN_3_PW_ADDR,
+			.mask = ST_MAGN_3_PW_MASK,
+			.value_on = ST_MAGN_3_PW_ON,
+			.value_off = ST_MAGN_3_PW_OFF,
+		},
+		.fs = {
+			.fs_avl = {
+				[0] = {
+					.num = ST_MAGN_FS_AVL_15000MG,
+					.gain = ST_MAGN_3_FS_AVL_15000_GAIN,
+				},
+			},
+		},
+		.bdu = {
+			.addr = ST_MAGN_3_BDU_ADDR,
+			.mask = ST_MAGN_3_BDU_MASK,
+		},
+		.drdy_irq = {
+			.addr = ST_MAGN_3_DRDY_IRQ_ADDR,
+			.mask_int1 = ST_MAGN_3_DRDY_INT_MASK,
+		},
+		.multi_read_bit = ST_MAGN_3_MULTIREAD_BIT,
+		.bootime = 2,
+	},
 };
 
 static int st_magn_read_raw(struct iio_dev *indio_dev,
diff --git a/drivers/iio/magnetometer/st_magn_i2c.c b/drivers/iio/magnetometer/st_magn_i2c.c
index 28aa80731cea..8aa37af306ed 100644
--- a/drivers/iio/magnetometer/st_magn_i2c.c
+++ b/drivers/iio/magnetometer/st_magn_i2c.c
@@ -36,6 +36,10 @@ static const struct of_device_id st_magn_of_match[] = {
 		.compatible = "st,lis3mdl-magn",
 		.data = LIS3MDL_MAGN_DEV_NAME,
 	},
+	{
+		.compatible = "st,lsm303agr-magn",
+		.data = LSM303AGR_MAGN_DEV_NAME,
+	},
 	{},
 };
 MODULE_DEVICE_TABLE(of, st_magn_of_match);
@@ -79,6 +83,7 @@ static const struct i2c_device_id st_magn_id_table[] = {
 	{ LSM303DLHC_MAGN_DEV_NAME },
 	{ LSM303DLM_MAGN_DEV_NAME },
 	{ LIS3MDL_MAGN_DEV_NAME },
+	{ LSM303AGR_MAGN_DEV_NAME },
 	{},
 };
 MODULE_DEVICE_TABLE(i2c, st_magn_id_table);
diff --git a/drivers/iio/magnetometer/st_magn_spi.c b/drivers/iio/magnetometer/st_magn_spi.c
index 7adacf160146..0abca2c6afa6 100644
--- a/drivers/iio/magnetometer/st_magn_spi.c
+++ b/drivers/iio/magnetometer/st_magn_spi.c
@@ -51,6 +51,7 @@ static const struct spi_device_id st_magn_id_table[] = {
 	{ LSM303DLHC_MAGN_DEV_NAME },
 	{ LSM303DLM_MAGN_DEV_NAME },
 	{ LIS3MDL_MAGN_DEV_NAME },
+	{ LSM303AGR_MAGN_DEV_NAME },
 	{},
 };
 MODULE_DEVICE_TABLE(spi, st_magn_id_table);
-- 
cgit v1.2.3


From 13517157d16f98564b91658940ccd720724da62f Mon Sep 17 00:00:00 2001
From: Romain Perier <romain.perier@gmail.com>
Date: Thu, 23 Jul 2015 18:50:49 +0200
Subject: ARM: dts: rockchip: Add veyron-speedy board

Which is formally known as the Asus C201 chromebook

Signed-off-by: Romain Perier <romain.perier@gmail.com>
Reviewed-by: Doug Anderson <dianders@chromium.org>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
---
 Documentation/devicetree/bindings/arm/rockchip.txt |   8 ++
 arch/arm/boot/dts/Makefile                         |   3 +-
 arch/arm/boot/dts/rk3288-veyron-speedy.dts         | 155 +++++++++++++++++++++
 3 files changed, 165 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm/boot/dts/rk3288-veyron-speedy.dts

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/arm/rockchip.txt b/Documentation/devicetree/bindings/arm/rockchip.txt
index c7769400ef85..c7411cc53533 100644
--- a/Documentation/devicetree/bindings/arm/rockchip.txt
+++ b/Documentation/devicetree/bindings/arm/rockchip.txt
@@ -43,6 +43,14 @@ Rockchip platforms device tree bindings
       - compatible = "google,veyron-pinky-rev2", "google,veyron-pinky",
 		     "google,veyron", "rockchip,rk3288";
 
+- Google Speedy (Asus C201 Chromebook):
+    Required root node properties:
+      - compatible = "google,veyron-speedy-rev9", "google,veyron-speedy-rev8",
+		     "google,veyron-speedy-rev7", "google,veyron-speedy-rev6",
+		     "google,veyron-speedy-rev5", "google,veyron-speedy-rev4",
+		     "google,veyron-speedy-rev3", "google,veyron-speedy-rev2",
+		     "google,veyron-speedy", "google,veyron", "rockchip,rk3288";
+
 - Rockchip R88 board:
     Required root node properties:
       - compatible = "rockchip,r88", "rockchip,rk3368";
diff --git a/arch/arm/boot/dts/Makefile b/arch/arm/boot/dts/Makefile
index ce7e1a821a23..7017d8e3f4f5 100644
--- a/arch/arm/boot/dts/Makefile
+++ b/arch/arm/boot/dts/Makefile
@@ -491,7 +491,8 @@ dtb-$(CONFIG_ARCH_ROCKCHIP) += \
 	rk3288-firefly.dtb \
 	rk3288-r89.dtb \
 	rk3288-veyron-jerry.dtb \
-	rk3288-veyron-pinky.dtb
+	rk3288-veyron-pinky.dtb \
+	rk3288-veyron-speedy.dtb
 dtb-$(CONFIG_ARCH_S3C24XX) += \
 	s3c2416-smdk2416.dtb
 dtb-$(CONFIG_ARCH_S3C64XX) += \
diff --git a/arch/arm/boot/dts/rk3288-veyron-speedy.dts b/arch/arm/boot/dts/rk3288-veyron-speedy.dts
new file mode 100644
index 000000000000..a7ea7d06cf7f
--- /dev/null
+++ b/arch/arm/boot/dts/rk3288-veyron-speedy.dts
@@ -0,0 +1,155 @@
+/*
+ * Google Veyron Speedy Rev 1+ board device tree source
+ *
+ * Copyright 2015 Google, Inc
+ *
+ * This file is dual-licensed: you can use it either under the terms
+ * of the GPL or the X11 license, at your option. Note that this dual
+ * licensing only applies to this file, and not this project as a
+ * whole.
+ *
+ *  a) This file is free software; you can redistribute it and/or
+ *     modify it under the terms of the GNU General Public License as
+ *     published by the Free Software Foundation; either version 2 of the
+ *     License, or (at your option) any later version.
+ *
+ *     This file is distributed in the hope that it will be useful,
+ *     but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *     MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *     GNU General Public License for more details.
+ *
+ *  Or, alternatively,
+ *
+ *  b) Permission is hereby granted, free of charge, to any person
+ *     obtaining a copy of this software and associated documentation
+ *     files (the "Software"), to deal in the Software without
+ *     restriction, including without limitation the rights to use,
+ *     copy, modify, merge, publish, distribute, sublicense, and/or
+ *     sell copies of the Software, and to permit persons to whom the
+ *     Software is furnished to do so, subject to the following
+ *     conditions:
+ *
+ *     The above copyright notice and this permission notice shall be
+ *     included in all copies or substantial portions of the Software.
+ *
+ *     THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ *     EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
+ *     OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ *     NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
+ *     HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
+ *     WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ *     FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ *     OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+/dts-v1/;
+#include "rk3288-veyron-chromebook.dtsi"
+#include "cros-ec-sbs.dtsi"
+
+/ {
+	model = "Google Speedy";
+	compatible = "google,veyron-speedy-rev9", "google,veyron-speedy-rev8",
+		     "google,veyron-speedy-rev7", "google,veyron-speedy-rev6",
+		     "google,veyron-speedy-rev5", "google,veyron-speedy-rev4",
+		     "google,veyron-speedy-rev3", "google,veyron-speedy-rev2",
+		     "google,veyron-speedy", "google,veyron", "rockchip,rk3288";
+
+	panel_regulator: panel-regulator {
+		compatible = "regulator-fixed";
+		enable-active-high;
+		gpio = <&gpio7 14 GPIO_ACTIVE_HIGH>;
+		pinctrl-names = "default";
+		pinctrl-0 = <&lcd_enable_h>;
+		regulator-name = "panel_regulator";
+		vin-supply = <&vcc33_sys>;
+	};
+
+	vcc18_lcd: vcc18-lcd {
+		compatible = "regulator-fixed";
+		enable-active-high;
+		gpio = <&gpio2 13 GPIO_ACTIVE_HIGH>;
+		pinctrl-names = "default";
+		pinctrl-0 = <&avdd_1v8_disp_en>;
+		regulator-name = "vcc18_lcd";
+		regulator-always-on;
+		regulator-boot-on;
+		vin-supply = <&vcc18_wl>;
+	};
+
+	backlight_regulator: backlight-regulator {
+		compatible = "regulator-fixed";
+		enable-active-high;
+		gpio = <&gpio2 12 GPIO_ACTIVE_HIGH>;
+		pinctrl-names = "default";
+		pinctrl-0 = <&bl_pwr_en>;
+		regulator-name = "backlight_regulator";
+		vin-supply = <&vcc33_sys>;
+		startup-delay-us = <15000>;
+	};
+};
+
+&rk808 {
+	pinctrl-names = "default";
+	pinctrl-0 = <&pmic_int_l>;
+};
+
+&sdmmc {
+	disable-wp;
+	pinctrl-names = "default";
+	pinctrl-0 = <&sdmmc_clk &sdmmc_cmd &sdmmc_cd_disabled &sdmmc_cd_gpio
+			&sdmmc_bus4>;
+};
+
+&vcc_5v {
+	enable-active-high;
+	gpio = <&gpio7 21 GPIO_ACTIVE_HIGH>;
+	pinctrl-names = "default";
+	pinctrl-0 = <&drv_5v>;
+};
+
+&vcc50_hdmi {
+	enable-active-high;
+	gpio = <&gpio5 19 GPIO_ACTIVE_HIGH>;
+	pinctrl-names = "default";
+	pinctrl-0 = <&vcc50_hdmi_en>;
+};
+
+&pinctrl {
+	backlight {
+		bl_pwr_en: bl_pwr_en {
+			rockchip,pins = <2 12 RK_FUNC_GPIO &pcfg_pull_none>;
+		};
+	};
+
+	buck-5v {
+		drv_5v: drv-5v {
+			rockchip,pins = <7 21 RK_FUNC_GPIO &pcfg_pull_none>;
+		};
+	};
+
+	hdmi {
+		vcc50_hdmi_en: vcc50-hdmi-en {
+			rockchip,pins = <5 19 RK_FUNC_GPIO &pcfg_pull_none>;
+		};
+	};
+
+	lcd {
+		lcd_enable_h: lcd-en {
+			rockchip,pins = <7 14 RK_FUNC_GPIO &pcfg_pull_none>;
+		};
+
+		avdd_1v8_disp_en: avdd-1v8-disp-en {
+			rockchip,pins = <2 13 RK_FUNC_GPIO &pcfg_pull_none>;
+		};
+	};
+
+	pmic {
+		dvs_1: dvs-1 {
+			rockchip,pins = <7 12 RK_FUNC_GPIO &pcfg_pull_down>;
+		};
+
+		dvs_2: dvs-2 {
+			rockchip,pins = <7 15 RK_FUNC_GPIO &pcfg_pull_down>;
+		};
+	};
+};
-- 
cgit v1.2.3


From aeda5003d0b987085acef6ad4844f8e34851bb10 Mon Sep 17 00:00:00 2001
From: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Date: Thu, 16 Jul 2015 11:54:14 -0700
Subject: Input: matrix_keypad - change name of wakeup property to
 "wakeup-source"
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Wakeup property of device is not Linux-specific, it describes intended
system behavior regardless of the OS being used. Therefore let's drop
"linux," prefix, and, while at it, use the same name as I2C bus does:
"wakeup-source".

We keep parsing old name to keep compatibility with old DTSes.

Cc: Lothar Waßmann <LW@KARO-electronics.de>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
---
 Documentation/devicetree/bindings/input/gpio-matrix-keypad.txt | 2 +-
 drivers/input/keyboard/matrix_keypad.c                         | 6 ++++--
 2 files changed, 5 insertions(+), 3 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/input/gpio-matrix-keypad.txt b/Documentation/devicetree/bindings/input/gpio-matrix-keypad.txt
index ead641c65e0a..4d86059c370c 100644
--- a/Documentation/devicetree/bindings/input/gpio-matrix-keypad.txt
+++ b/Documentation/devicetree/bindings/input/gpio-matrix-keypad.txt
@@ -19,7 +19,7 @@ Required Properties:
 
 Optional Properties:
 - linux,no-autorepeat:	do no enable autorepeat feature.
-- linux,wakeup:		use any event on keypad as wakeup event.
+- wakeup-source:	use any event on keypad as wakeup event.
 - debounce-delay-ms:	debounce interval in milliseconds
 - col-scan-delay-us:	delay, measured in microseconds, that is needed
 			before we can scan keypad after activating column gpio
diff --git a/drivers/input/keyboard/matrix_keypad.c b/drivers/input/keyboard/matrix_keypad.c
index b370a59cb759..7f12b6579f82 100644
--- a/drivers/input/keyboard/matrix_keypad.c
+++ b/drivers/input/keyboard/matrix_keypad.c
@@ -425,8 +425,10 @@ matrix_keypad_parse_dt(struct device *dev)
 
 	if (of_get_property(np, "linux,no-autorepeat", NULL))
 		pdata->no_autorepeat = true;
-	if (of_get_property(np, "linux,wakeup", NULL))
-		pdata->wakeup = true;
+
+	pdata->wakeup = of_property_read_bool(np, "wakeup-source") ||
+			of_property_read_bool(np, "linux,wakeup"); /* legacy */
+
 	if (of_get_property(np, "gpio-activelow", NULL))
 		pdata->active_low = true;
 
-- 
cgit v1.2.3


From abf77a32288d4379dfede0b50be6a04be3cd1431 Mon Sep 17 00:00:00 2001
From: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Date: Thu, 16 Jul 2015 12:03:39 -0700
Subject: Input: ads7846 - change name of wakeup property to "wakeup-source"

Wakeup property of device is not Linux-specific, it describes intended
system behavior regardless of the OS being used. Therefore let's drop
"linux," prefix, and, while at it, use the same name as I2C bus does:
"wakeup-source".

We keep parsing old name to keep compatibility with old DTSes.

Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
---
 Documentation/devicetree/bindings/input/ads7846.txt | 2 +-
 drivers/input/touchscreen/ads7846.c                 | 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/input/ads7846.txt b/Documentation/devicetree/bindings/input/ads7846.txt
index 5f7619c22743..df8b1279491d 100644
--- a/Documentation/devicetree/bindings/input/ads7846.txt
+++ b/Documentation/devicetree/bindings/input/ads7846.txt
@@ -64,7 +64,7 @@ Optional properties:
 					pendown-gpio (u32).
 	pendown-gpio			GPIO handle describing the pin the !PENIRQ
 					line is connected to.
-	linux,wakeup			use any event on touchscreen as wakeup event.
+	wakeup-source			use any event on touchscreen as wakeup event.
 
 
 Example for a TSC2046 chip connected to an McSPI controller of an OMAP SoC::
diff --git a/drivers/input/touchscreen/ads7846.c b/drivers/input/touchscreen/ads7846.c
index e4eb8a6c658f..0f5f968592bd 100644
--- a/drivers/input/touchscreen/ads7846.c
+++ b/drivers/input/touchscreen/ads7846.c
@@ -1234,7 +1234,8 @@ static const struct ads7846_platform_data *ads7846_probe_dt(struct device *dev)
 	of_property_read_u32(node, "ti,pendown-gpio-debounce",
 			     &pdata->gpio_pendown_debounce);
 
-	pdata->wakeup = of_property_read_bool(node, "linux,wakeup");
+	pdata->wakeup = of_property_read_bool(node, "wakeup-source") ||
+			of_property_read_bool(node, "linux,wakeup");
 
 	pdata->gpio_pendown = of_get_named_gpio(dev->of_node, "pendown-gpio", 0);
 
-- 
cgit v1.2.3


From 274696521254855ba03f2d4f4575ae048a409256 Mon Sep 17 00:00:00 2001
From: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Date: Thu, 16 Jul 2015 12:15:24 -0700
Subject: Input: pmic8xxx-keypad - change name of wakeup property

Wakeup property of device is not Linux-specific, it describes intended
system behavior regardless of the OS being used. Therefore let's drop
"linux," prefix, and, while at it, use the same name as I2C bus does:
"wakeup-source".

We keep parsing old name to keep compatibility with old DTSes.

Acked-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
---
 Documentation/devicetree/bindings/input/qcom,pm8xxx-keypad.txt |  2 +-
 drivers/input/keyboard/pmic8xxx-keypad.c                       | 10 ++++++----
 2 files changed, 7 insertions(+), 5 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/input/qcom,pm8xxx-keypad.txt b/Documentation/devicetree/bindings/input/qcom,pm8xxx-keypad.txt
index 7d8cb92831d7..ee6215681182 100644
--- a/Documentation/devicetree/bindings/input/qcom,pm8xxx-keypad.txt
+++ b/Documentation/devicetree/bindings/input/qcom,pm8xxx-keypad.txt
@@ -33,7 +33,7 @@ PROPERTIES
 	Value type: <bool>
 	Definition: don't enable autorepeat feature.
 
-- linux,keypad-wakeup:
+- wakeup-source:
 	Usage: optional
 	Value type: <bool>
 	Definition: use any event on keypad as wakeup event.
diff --git a/drivers/input/keyboard/pmic8xxx-keypad.c b/drivers/input/keyboard/pmic8xxx-keypad.c
index 32580afecc26..5c68e3f096bc 100644
--- a/drivers/input/keyboard/pmic8xxx-keypad.c
+++ b/drivers/input/keyboard/pmic8xxx-keypad.c
@@ -507,6 +507,7 @@ static void pmic8xxx_kp_close(struct input_dev *dev)
  */
 static int pmic8xxx_kp_probe(struct platform_device *pdev)
 {
+	struct device_node *np = pdev->dev.of_node;
 	unsigned int rows, cols;
 	bool repeat;
 	bool wakeup;
@@ -524,10 +525,11 @@ static int pmic8xxx_kp_probe(struct platform_device *pdev)
 		return -EINVAL;
 	}
 
-	repeat = !of_property_read_bool(pdev->dev.of_node,
-					"linux,input-no-autorepeat");
-	wakeup = of_property_read_bool(pdev->dev.of_node,
-					"linux,keypad-wakeup");
+	repeat = !of_property_read_bool(np, "linux,input-no-autorepeat");
+
+	wakeup = of_property_read_bool(np, "wakeup-source") ||
+		 /* legacy name */
+		 of_property_read_bool(np, "linux,keypad-wakeup");
 
 	kp = devm_kzalloc(&pdev->dev, sizeof(*kp), GFP_KERNEL);
 	if (!kp)
-- 
cgit v1.2.3


From 99b4ffbd84ea4191e0b8d1709230656a1c33b848 Mon Sep 17 00:00:00 2001
From: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Date: Thu, 16 Jul 2015 12:10:05 -0700
Subject: Input: gpio_keys[_polled] - change name of wakeup property

Wakeup property of device is not Linux-specific, it describes intended
system behavior regardless of the OS being used. Therefore let's drop
"linux," prefix, and, while at it, use the same name as I2C bus does:
"wakeup-source".

We keep parsing old name to keep compatibility with old DTSes.

Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
---
 Documentation/devicetree/bindings/input/gpio-keys-polled.txt | 2 +-
 Documentation/devicetree/bindings/input/gpio-keys.txt        | 2 +-
 drivers/input/keyboard/gpio_keys.c                           | 4 +++-
 drivers/input/keyboard/gpio_keys_polled.c                    | 5 ++++-
 4 files changed, 9 insertions(+), 4 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/input/gpio-keys-polled.txt b/Documentation/devicetree/bindings/input/gpio-keys-polled.txt
index 313abefa37cc..5b91f5a3bd5c 100644
--- a/Documentation/devicetree/bindings/input/gpio-keys-polled.txt
+++ b/Documentation/devicetree/bindings/input/gpio-keys-polled.txt
@@ -20,7 +20,7 @@ Optional subnode-properties:
 	  If not specified defaults to <1> == EV_KEY.
 	- debounce-interval: Debouncing interval time in milliseconds.
 	  If not specified defaults to 5.
-	- gpio-key,wakeup: Boolean, button can wake-up the system.
+	- wakeup-source: Boolean, button can wake-up the system.
 
 Example nodes:
 
diff --git a/Documentation/devicetree/bindings/input/gpio-keys.txt b/Documentation/devicetree/bindings/input/gpio-keys.txt
index 44b705767aca..072bf7573c37 100644
--- a/Documentation/devicetree/bindings/input/gpio-keys.txt
+++ b/Documentation/devicetree/bindings/input/gpio-keys.txt
@@ -23,7 +23,7 @@ Optional subnode-properties:
 	  If not specified defaults to <1> == EV_KEY.
 	- debounce-interval: Debouncing interval time in milliseconds.
 	  If not specified defaults to 5.
-	- gpio-key,wakeup: Boolean, button can wake-up the system.
+	- wakeup-source: Boolean, button can wake-up the system.
 	- linux,can-disable: Boolean, indicates that button is connected
 	  to dedicated (not shared) interrupt which can be disabled to
 	  suppress events from the button.
diff --git a/drivers/input/keyboard/gpio_keys.c b/drivers/input/keyboard/gpio_keys.c
index ddf4045de084..1df4507c4c0b 100644
--- a/drivers/input/keyboard/gpio_keys.c
+++ b/drivers/input/keyboard/gpio_keys.c
@@ -655,7 +655,9 @@ gpio_keys_get_devtree_pdata(struct device *dev)
 		if (of_property_read_u32(pp, "linux,input-type", &button->type))
 			button->type = EV_KEY;
 
-		button->wakeup = !!of_get_property(pp, "gpio-key,wakeup", NULL);
+		button->wakeup = of_property_read_bool(pp, "wakeup-source") ||
+				 /* legacy name */
+				 of_property_read_bool(pp, "gpio-key,wakeup");
 
 		button->can_disable = !!of_get_property(pp, "linux,can-disable", NULL);
 
diff --git a/drivers/input/keyboard/gpio_keys_polled.c b/drivers/input/keyboard/gpio_keys_polled.c
index 097d7216d98e..5a0c99950659 100644
--- a/drivers/input/keyboard/gpio_keys_polled.c
+++ b/drivers/input/keyboard/gpio_keys_polled.c
@@ -152,7 +152,10 @@ static struct gpio_keys_platform_data *gpio_keys_polled_get_devtree_pdata(struct
 					     &button->type))
 			button->type = EV_KEY;
 
-		button->wakeup = fwnode_property_present(child, "gpio-key,wakeup");
+		button->wakeup =
+			fwnode_property_read_bool(child, "wakeup-source") ||
+			/* legacy name */
+			fwnode_property_read_bool(child, "gpio-key,wakeup");
 
 		if (fwnode_property_read_u32(child, "debounce-interval",
 					     &button->debounce_interval))
-- 
cgit v1.2.3


From d879aeaa416fea126081452e811c1d225fd25b00 Mon Sep 17 00:00:00 2001
From: Sekhar Nori <nsekhar@ti.com>
Date: Tue, 14 Jul 2015 13:32:04 +0530
Subject: Documentation: DT: omap_serial: document missing compatible

The compatible "ti,am4372-uart" is used in arch/arm/boot/dts/am4372.dtsi
but not documented. Add necessary documentation.

Signed-off-by: Sekhar Nori <nsekhar@ti.com>
Acked-by: Tony Lindgren <tony@atomide.com>
Reviewed-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 Documentation/devicetree/bindings/serial/omap_serial.txt | 1 +
 1 file changed, 1 insertion(+)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/serial/omap_serial.txt b/Documentation/devicetree/bindings/serial/omap_serial.txt
index 54c2a155c783..d3bd2b1ec401 100644
--- a/Documentation/devicetree/bindings/serial/omap_serial.txt
+++ b/Documentation/devicetree/bindings/serial/omap_serial.txt
@@ -4,6 +4,7 @@ Required properties:
 - compatible : should be "ti,omap2-uart" for OMAP2 controllers
 - compatible : should be "ti,omap3-uart" for OMAP3 controllers
 - compatible : should be "ti,omap4-uart" for OMAP4 controllers
+- compatible : should be "ti,am4372-uart" for AM437x controllers
 - reg : address and length of the register space
 - interrupts or interrupts-extended : Should contain the uart interrupt
                                       specifier or both the interrupt
-- 
cgit v1.2.3


From 4fcdff9bcabc135d72f69ed8723a14a9e514fe46 Mon Sep 17 00:00:00 2001
From: Sekhar Nori <nsekhar@ti.com>
Date: Tue, 14 Jul 2015 13:32:06 +0530
Subject: serial: 8250_omap: introduce "ti,am3352-uart" compatible property

Use of of_machine_is_compatible() for handling AM335x specific
"DMA kick" quirk in 8250_omap driver makes it ugly to extend the
quirk for other platforms. Instead use a new compatible.

The new compatible will also make it easier to take care of
other quirks on AM335x and like SoCs.

In order to not break backward DTB compatibility for users of
8250_omap driver on AM335x based boards, existing use of
of_machine_is_compatible() has not been removed.

Signed-off-by: Sekhar Nori <nsekhar@ti.com>
Acked-by: Tony Lindgren <tony@atomide.com>
Reviewed-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 .../devicetree/bindings/serial/omap_serial.txt     |  1 +
 arch/arm/boot/dts/am33xx.dtsi                      | 12 ++++----
 drivers/tty/serial/8250/8250_omap.c                | 32 ++++++++++++++--------
 3 files changed, 28 insertions(+), 17 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/serial/omap_serial.txt b/Documentation/devicetree/bindings/serial/omap_serial.txt
index d3bd2b1ec401..0ee88209b341 100644
--- a/Documentation/devicetree/bindings/serial/omap_serial.txt
+++ b/Documentation/devicetree/bindings/serial/omap_serial.txt
@@ -5,6 +5,7 @@ Required properties:
 - compatible : should be "ti,omap3-uart" for OMAP3 controllers
 - compatible : should be "ti,omap4-uart" for OMAP4 controllers
 - compatible : should be "ti,am4372-uart" for AM437x controllers
+- compatible : should be "ti,am3352-uart" for AM335x controllers
 - reg : address and length of the register space
 - interrupts or interrupts-extended : Should contain the uart interrupt
                                       specifier or both the interrupt
diff --git a/arch/arm/boot/dts/am33xx.dtsi b/arch/arm/boot/dts/am33xx.dtsi
index 21fcc440fc1a..b76f9a2ce05d 100644
--- a/arch/arm/boot/dts/am33xx.dtsi
+++ b/arch/arm/boot/dts/am33xx.dtsi
@@ -210,7 +210,7 @@
 		};
 
 		uart0: serial@44e09000 {
-			compatible = "ti,omap3-uart";
+			compatible = "ti,am3352-uart", "ti,omap3-uart";
 			ti,hwmods = "uart1";
 			clock-frequency = <48000000>;
 			reg = <0x44e09000 0x2000>;
@@ -221,7 +221,7 @@
 		};
 
 		uart1: serial@48022000 {
-			compatible = "ti,omap3-uart";
+			compatible = "ti,am3352-uart", "ti,omap3-uart";
 			ti,hwmods = "uart2";
 			clock-frequency = <48000000>;
 			reg = <0x48022000 0x2000>;
@@ -232,7 +232,7 @@
 		};
 
 		uart2: serial@48024000 {
-			compatible = "ti,omap3-uart";
+			compatible = "ti,am3352-uart", "ti,omap3-uart";
 			ti,hwmods = "uart3";
 			clock-frequency = <48000000>;
 			reg = <0x48024000 0x2000>;
@@ -243,7 +243,7 @@
 		};
 
 		uart3: serial@481a6000 {
-			compatible = "ti,omap3-uart";
+			compatible = "ti,am3352-uart", "ti,omap3-uart";
 			ti,hwmods = "uart4";
 			clock-frequency = <48000000>;
 			reg = <0x481a6000 0x2000>;
@@ -252,7 +252,7 @@
 		};
 
 		uart4: serial@481a8000 {
-			compatible = "ti,omap3-uart";
+			compatible = "ti,am3352-uart", "ti,omap3-uart";
 			ti,hwmods = "uart5";
 			clock-frequency = <48000000>;
 			reg = <0x481a8000 0x2000>;
@@ -261,7 +261,7 @@
 		};
 
 		uart5: serial@481aa000 {
-			compatible = "ti,omap3-uart";
+			compatible = "ti,am3352-uart", "ti,omap3-uart";
 			ti,hwmods = "uart6";
 			clock-frequency = <48000000>;
 			reg = <0x481aa000 0x2000>;
diff --git a/drivers/tty/serial/8250/8250_omap.c b/drivers/tty/serial/8250/8250_omap.c
index d9c96b993a84..fff026f3d0bb 100644
--- a/drivers/tty/serial/8250/8250_omap.c
+++ b/drivers/tty/serial/8250/8250_omap.c
@@ -16,6 +16,7 @@
 #include <linux/platform_device.h>
 #include <linux/slab.h>
 #include <linux/of.h>
+#include <linux/of_device.h>
 #include <linux/of_gpio.h>
 #include <linux/of_irq.h>
 #include <linux/delay.h>
@@ -537,14 +538,14 @@ static void omap_serial_fill_features_erratas(struct uart_8250_port *up,
 
 	switch (revision) {
 	case OMAP_UART_REV_46:
-		priv->habit = UART_ERRATA_i202_MDR1_ACCESS;
+		priv->habit |= UART_ERRATA_i202_MDR1_ACCESS;
 		break;
 	case OMAP_UART_REV_52:
-		priv->habit = UART_ERRATA_i202_MDR1_ACCESS |
+		priv->habit |= UART_ERRATA_i202_MDR1_ACCESS |
 				OMAP_UART_WER_HAS_TX_WAKEUP;
 		break;
 	case OMAP_UART_REV_63:
-		priv->habit = UART_ERRATA_i202_MDR1_ACCESS |
+		priv->habit |= UART_ERRATA_i202_MDR1_ACCESS |
 			OMAP_UART_WER_HAS_TX_WAKEUP;
 		break;
 	default:
@@ -1061,6 +1062,17 @@ static int omap8250_no_handle_irq(struct uart_port *port)
 	return 0;
 }
 
+static const u8 am3352_habit = OMAP_DMA_TX_KICK;
+
+static const struct of_device_id omap8250_dt_ids[] = {
+	{ .compatible = "ti,omap2-uart" },
+	{ .compatible = "ti,omap3-uart" },
+	{ .compatible = "ti,omap4-uart" },
+	{ .compatible = "ti,am3352-uart", .data = &am3352_habit, },
+	{},
+};
+MODULE_DEVICE_TABLE(of, omap8250_dt_ids);
+
 static int omap8250_probe(struct platform_device *pdev)
 {
 	struct resource *regs = platform_get_resource(pdev, IORESOURCE_MEM, 0);
@@ -1125,11 +1137,17 @@ static int omap8250_probe(struct platform_device *pdev)
 	up.port.unthrottle = omap_8250_unthrottle;
 
 	if (pdev->dev.of_node) {
+		const struct of_device_id *id;
+
 		ret = of_alias_get_id(pdev->dev.of_node, "serial");
 
 		of_property_read_u32(pdev->dev.of_node, "clock-frequency",
 				     &up.port.uartclk);
 		priv->wakeirq = irq_of_parse_and_map(pdev->dev.of_node, 1);
+
+		id = of_match_device(of_match_ptr(omap8250_dt_ids), &pdev->dev);
+		if (id && id->data)
+			priv->habit |= *(u8 *)id->data;
 	} else {
 		ret = pdev->id;
 	}
@@ -1374,14 +1392,6 @@ static const struct dev_pm_ops omap8250_dev_pm_ops = {
 	.complete       = omap8250_complete,
 };
 
-static const struct of_device_id omap8250_dt_ids[] = {
-	{ .compatible = "ti,omap2-uart" },
-	{ .compatible = "ti,omap3-uart" },
-	{ .compatible = "ti,omap4-uart" },
-	{},
-};
-MODULE_DEVICE_TABLE(of, omap8250_dt_ids);
-
 static struct platform_driver omap8250_platform_driver = {
 	.driver = {
 		.name		= "omap8250",
-- 
cgit v1.2.3


From 27c93af7e7d9ccdfa28e26767c09106bf0721998 Mon Sep 17 00:00:00 2001
From: Sekhar Nori <nsekhar@ti.com>
Date: Tue, 14 Jul 2015 13:32:08 +0530
Subject: serial: 8250_omap: workaround module disable errata on dra7x SoCs

Due to Advisory 21 as documented in AM437x errata document,
UART module cannot be disabled once DMA is used. The only
workaround is to softreset the module before disabling it.

DRA7x UARTs are compatible to AM437x UARTs in terms of
this errata and prescribed workaround.

Enable usage of workaround for this errata on DRA7x SoCs.

Signed-off-by: Sekhar Nori <nsekhar@ti.com>
Acked-by: Tony Lindgren <tony@atomide.com>
Reviewed-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 Documentation/devicetree/bindings/serial/omap_serial.txt | 1 +
 drivers/tty/serial/8250/8250_omap.c                      | 1 +
 2 files changed, 2 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/serial/omap_serial.txt b/Documentation/devicetree/bindings/serial/omap_serial.txt
index 0ee88209b341..7a71b5de77d6 100644
--- a/Documentation/devicetree/bindings/serial/omap_serial.txt
+++ b/Documentation/devicetree/bindings/serial/omap_serial.txt
@@ -6,6 +6,7 @@ Required properties:
 - compatible : should be "ti,omap4-uart" for OMAP4 controllers
 - compatible : should be "ti,am4372-uart" for AM437x controllers
 - compatible : should be "ti,am3352-uart" for AM335x controllers
+- compatible : should be "ti,dra742-uart" for DRA7x controllers
 - reg : address and length of the register space
 - interrupts or interrupts-extended : Should contain the uart interrupt
                                       specifier or both the interrupt
diff --git a/drivers/tty/serial/8250/8250_omap.c b/drivers/tty/serial/8250/8250_omap.c
index a2ee75dee070..d9a37191a1ae 100644
--- a/drivers/tty/serial/8250/8250_omap.c
+++ b/drivers/tty/serial/8250/8250_omap.c
@@ -1082,6 +1082,7 @@ static const struct of_device_id omap8250_dt_ids[] = {
 	{ .compatible = "ti,omap4-uart" },
 	{ .compatible = "ti,am3352-uart", .data = &am3352_habit, },
 	{ .compatible = "ti,am4372-uart", .data = &am4372_habit, },
+	{ .compatible = "ti,dra742-uart", .data = &am4372_habit, },
 	{},
 };
 MODULE_DEVICE_TABLE(of, omap8250_dt_ids);
-- 
cgit v1.2.3


From 43b7be3b8c0457c35a124d8538372f82f355bd29 Mon Sep 17 00:00:00 2001
From: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Date: Fri, 17 Jul 2015 17:04:10 -0700
Subject: Input: samsung-keypad - change name of wakeup property

Wakeup property of device is not Linux-specific, it describes intended
system behavior regardless of the OS being used. Therefore let's drop
"linux," prefix, and, while at it, use the same name as I2C bus does:
"wakeup-source".

We keep parsing old name to keep compatibility with old DTSes.

Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
---
 Documentation/devicetree/bindings/input/samsung-keypad.txt | 4 +++-
 drivers/input/keyboard/samsung-keypad.c                    | 6 ++++--
 2 files changed, 7 insertions(+), 3 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/input/samsung-keypad.txt b/Documentation/devicetree/bindings/input/samsung-keypad.txt
index 942d071baaa5..863e77f619dc 100644
--- a/Documentation/devicetree/bindings/input/samsung-keypad.txt
+++ b/Documentation/devicetree/bindings/input/samsung-keypad.txt
@@ -36,9 +36,11 @@ Required Board Specific Properties:
 - pinctrl-0: Should specify pin control groups used for this controller.
 - pinctrl-names: Should contain only one value - "default".
 
+Optional Properties:
+- wakeup-source: use any event on keypad as wakeup event.
+
 Optional Properties specific to linux:
 - linux,keypad-no-autorepeat: do no enable autorepeat feature.
-- linux,keypad-wakeup: use any event on keypad as wakeup event.
 
 
 Example:
diff --git a/drivers/input/keyboard/samsung-keypad.c b/drivers/input/keyboard/samsung-keypad.c
index 43e48dac7687..4e319eb9e19d 100644
--- a/drivers/input/keyboard/samsung-keypad.c
+++ b/drivers/input/keyboard/samsung-keypad.c
@@ -299,8 +299,10 @@ samsung_keypad_parse_dt(struct device *dev)
 	if (of_get_property(np, "linux,input-no-autorepeat", NULL))
 		pdata->no_autorepeat = true;
 
-	if (of_get_property(np, "linux,input-wakeup", NULL))
-		pdata->wakeup = true;
+	pdata->wakeup = of_property_read_bool(np, "wakeup-source") ||
+			/* legacy name */
+			of_property_read_bool(np, "linux,input-wakeup");
+
 
 	return pdata;
 }
-- 
cgit v1.2.3


From e571c73ee4836640d3019ac68b008f3662087e80 Mon Sep 17 00:00:00 2001
From: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Date: Fri, 17 Jul 2015 17:08:02 -0700
Subject: Input: tc3589x-keypad - change name of wakeup property

Wakeup property of device is not Linux-specific, it describes intended
system behavior regardless of the OS being used. Therefore let's drop
"linux," prefix, and, while at it, use the same name as I2C bus does:
"wakeup-source".

We keep parsing old name to keep compatibility with old DTSes.

Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
---
 Documentation/devicetree/bindings/mfd/tc3589x.txt | 4 ++--
 drivers/input/keyboard/tc3589x-keypad.c           | 5 ++++-
 2 files changed, 6 insertions(+), 3 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/mfd/tc3589x.txt b/Documentation/devicetree/bindings/mfd/tc3589x.txt
index 6fcedba46ae9..37bf7f1aa70a 100644
--- a/Documentation/devicetree/bindings/mfd/tc3589x.txt
+++ b/Documentation/devicetree/bindings/mfd/tc3589x.txt
@@ -55,7 +55,7 @@ Optional nodes:
  - linux,keymap: the definition can be found in
    bindings/input/matrix-keymap.txt
  - linux,no-autorepeat: do no enable autorepeat feature.
- - linux,wakeup: use any event on keypad as wakeup event.
+ - wakeup-source: use any event on keypad as wakeup event.
 
 Example:
 
@@ -84,7 +84,6 @@ tc35893@44 {
 		keypad,num-columns = <8>;
 		keypad,num-rows = <8>;
 		linux,no-autorepeat;
-		linux,wakeup;
 		linux,keymap = <0x0301006b
 				0x04010066
 				0x06040072
@@ -103,5 +102,6 @@ tc35893@44 {
 				0x01030039
 				0x07060069
 				0x050500d9>;
+		wakeup-source;
 	};
 };
diff --git a/drivers/input/keyboard/tc3589x-keypad.c b/drivers/input/keyboard/tc3589x-keypad.c
index 31c606a4dd31..565805e1ff94 100644
--- a/drivers/input/keyboard/tc3589x-keypad.c
+++ b/drivers/input/keyboard/tc3589x-keypad.c
@@ -352,7 +352,10 @@ tc3589x_keypad_of_probe(struct device *dev)
 	}
 
 	plat->no_autorepeat = of_property_read_bool(np, "linux,no-autorepeat");
-	plat->enable_wakeup = of_property_read_bool(np, "linux,wakeup");
+
+	plat->enable_wakeup = of_property_read_bool(np, "wakeup-source") ||
+			      /* legacy name */
+			      of_property_read_bool(np, "linux,wakeup");
 
 	/* The custom delay format is ms/16 */
 	of_property_read_u32(np, "debounce-delay-ms", &debounce_ms);
-- 
cgit v1.2.3


From 05a051436bdf37e121a627baca605977cd6d0730 Mon Sep 17 00:00:00 2001
From: Cyrille Pitchen <cyrille.pitchen@atmel.com>
Date: Thu, 2 Jul 2015 15:18:09 +0200
Subject: ARM: at91/dt: add a new DT property to support FIFOs on Atmel USARTs

This patch adds a new DT property, "atmel,fifo-size", to enable and set
the maximum number of data the RX and TX FIFOs can store on FIFO capable
USARTs.

Please be aware that the VERSION register can not be used to guess the
size of FIFOs. Indeed, for a given hardware version, the USARTs can be
integrated on Atmel SoCs with different FIFO sizes. Also the
"atmel,fifo-size" property is optional as older USARTs don't embed FIFO at
all.

Besides, the FIFO size can not be read or guessed from other registers:
When designing the FIFO feature, no dedicated registers were added to
store this size. Unsed spaces in the I/O register range are limited and
better reserved for future usages. Instead, the FIFO size of each
peripheral is documented in the programmer datasheet.

Finally, on a given SoC, there can be several instances of USART with
different FIFO sizes. This explain why we'd rather use a dedicated DT
property than use the "compatible" property.

Signed-off-by: Cyrille Pitchen <cyrille.pitchen@atmel.com>
Acked-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 Documentation/devicetree/bindings/serial/atmel-usart.txt | 3 +++
 1 file changed, 3 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/serial/atmel-usart.txt b/Documentation/devicetree/bindings/serial/atmel-usart.txt
index 90787aa2e648..e6e6142e33ac 100644
--- a/Documentation/devicetree/bindings/serial/atmel-usart.txt
+++ b/Documentation/devicetree/bindings/serial/atmel-usart.txt
@@ -22,6 +22,8 @@ Optional properties:
 		memory peripheral interface and USART DMA channel ID, FIFO configuration.
 		Refer to dma.txt and atmel-dma.txt for details.
 	- dma-names: "rx" for RX channel, "tx" for TX channel.
+- atmel,fifo-size: maximum number of data the RX and TX FIFOs can store for FIFO
+  capable USARTs.
 
 <chip> compatible description:
 - at91rm9200:  legacy USART support
@@ -57,4 +59,5 @@ Example:
 		dmas = <&dma0 2 0x3>,
 		       <&dma0 2 0x204>;
 		dma-names = "tx", "rx";
+		atmel,fifo-size = <32>;
 	};
-- 
cgit v1.2.3


From 678cdb2fbd7e44b8905a8f4e770c694eb77f0395 Mon Sep 17 00:00:00 2001
From: Henry Chen <henryc.chen@mediatek.com>
Date: Fri, 24 Jul 2015 13:24:40 +0800
Subject: regulator: mt6311: Add document for mt6311 regulator

This patch adds a list of supported regulator names to the devicetree
binding documentation for Mediatek MT6311 PMIC.

Signed-off-by: Henry Chen <henryc.chen@mediatek.com>
Reviewed-by: Javier Martinez Canillas <javier@osg.samsung.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
---
 .../bindings/regulator/mt6311-regulator.txt        | 35 ++++++++++++++++++++++
 1 file changed, 35 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/regulator/mt6311-regulator.txt

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/regulator/mt6311-regulator.txt b/Documentation/devicetree/bindings/regulator/mt6311-regulator.txt
new file mode 100644
index 000000000000..02649d8b3f5a
--- /dev/null
+++ b/Documentation/devicetree/bindings/regulator/mt6311-regulator.txt
@@ -0,0 +1,35 @@
+Mediatek MT6311 Regulator Driver
+
+Required properties:
+- compatible: "mediatek,mt6311-regulator"
+- reg: I2C slave address, usually 0x6b.
+- regulators: List of regulators provided by this controller. It is named
+  to VDVFS and VBIASN.
+  The definition for each of these nodes is defined using the standard binding
+  for regulators at Documentation/devicetree/bindings/regulator/regulator.txt.
+
+The valid names for regulators are:
+BUCK:
+  VDVFS
+LDO:
+  VBIASN
+
+Example:
+	mt6311: pmic@6b {
+		compatible = "mediatek,mt6311-regulator";
+		reg = <0x6b>;
+
+		regulators {
+			mt6311_vcpu_reg: VDVFS {
+				regulator-name = "VDVFS";
+				regulator-min-microvolt = < 600000>;
+				regulator-max-microvolt = <1400000>;
+				regulator-ramp-delay = <10000>;
+			};
+			mt6311_ldo_reg: VBIASN {
+				regulator-name = "VBIASN";
+				regulator-min-microvolt = <200000>;
+				regulator-max-microvolt = <800000>;
+			};
+		};
+	};
-- 
cgit v1.2.3


From a10726bb5472ff2ed95180cfb5e82091c45d7b19 Mon Sep 17 00:00:00 2001
From: Rabin Vincent <rabin.vincent@axis.com>
Date: Tue, 14 Jul 2015 07:35:11 +0200
Subject: Documentation: mm: fix location of extfrag_index

/proc/extfrag_index does not exist.  This file is in debugfs.  Fix the
description of extfrag_threshold to reflect this.

Signed-off-by: Rabin Vincent <rabin.vincent@axis.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/sysctl/vm.txt | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
index 9832ec52f859..9c3f2f8054b5 100644
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -225,11 +225,11 @@ with your system.  To disable them, echo 4 (bit 3) into drop_caches.
 extfrag_threshold
 
 This parameter affects whether the kernel will compact memory or direct
-reclaim to satisfy a high-order allocation. /proc/extfrag_index shows what
-the fragmentation index for each order is in each zone in the system. Values
-tending towards 0 imply allocations would fail due to lack of memory,
-values towards 1000 imply failures are due to fragmentation and -1 implies
-that the allocation will succeed as long as watermarks are met.
+reclaim to satisfy a high-order allocation. The extfrag/extfrag_index file in
+debugfs shows what the fragmentation index for each order is in each zone in
+the system. Values tending towards 0 imply allocations would fail due to lack
+of memory, values towards 1000 imply failures are due to fragmentation and -1
+implies that the allocation will succeed as long as watermarks are met.
 
 The kernel will not compact memory in a zone if the
 fragmentation index is <= extfrag_threshold. The default value is 500.
-- 
cgit v1.2.3


From 9e1aa7c8882050577c9223ba85c4ee49cd1da469 Mon Sep 17 00:00:00 2001
From: Wang Long <long.wanglong@huawei.com>
Date: Thu, 16 Jul 2015 06:31:16 +0000
Subject: Documentation: Update filesystems/debugfs.txt

This patch update the Documentation/filesystems/debugfs.txt
file. The main work is to add the description of the following
functions:
    debugfs_create_atomic_t
    debugfs_create_u32_array
    debugfs_create_devm_seqfile
    debugfs_create_file_size

Signed-off-by: Wang Long <long.wanglong@huawei.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/debugfs.txt | 40 +++++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/filesystems/debugfs.txt b/Documentation/filesystems/debugfs.txt
index 88ab81c79109..463f595733e8 100644
--- a/Documentation/filesystems/debugfs.txt
+++ b/Documentation/filesystems/debugfs.txt
@@ -51,6 +51,17 @@ operations should be provided; others can be included as needed.  Again,
 the return value will be a dentry pointer to the created file, NULL for
 error, or ERR_PTR(-ENODEV) if debugfs support is missing.
 
+Create a file with an initial size, the following function can be used
+instead:
+
+    struct dentry *debugfs_create_file_size(const char *name, umode_t mode,
+				struct dentry *parent, void *data,
+				const struct file_operations *fops,
+				loff_t file_size);
+
+file_size is the initial file size. The other parameters are the same
+as the function debugfs_create_file.
+
 In a number of cases, the creation of a set of file operations is not
 actually necessary; the debugfs code provides a number of helper functions
 for simple situations.  Files containing a single integer value can be
@@ -100,6 +111,14 @@ A read on the resulting file will yield either Y (for non-zero values) or
 N, followed by a newline.  If written to, it will accept either upper- or
 lower-case values, or 1 or 0.  Any other input will be silently ignored.
 
+Also, atomic_t values can be placed in debugfs with:
+
+    struct dentry *debugfs_create_atomic_t(const char *name, umode_t mode,
+				struct dentry *parent, atomic_t *value)
+
+A read of this file will get atomic_t values, and a write of this file
+will set atomic_t values.
+
 Another option is exporting a block of arbitrary binary data, with
 this structure and function:
 
@@ -147,6 +166,27 @@ The "base" argument may be 0, but you may want to build the reg32 array
 using __stringify, and a number of register names (macros) are actually
 byte offsets over a base for the register block.
 
+If you want to dump an u32 array in debugfs, you can create file with:
+
+    struct dentry *debugfs_create_u32_array(const char *name, umode_t mode,
+			struct dentry *parent,
+			u32 *array, u32 elements);
+
+The "array" argument provides data, and the "elements" argument is
+the number of elements in the array. Note: Once array is created its
+size can not be changed.
+
+There is a helper function to create device related seq_file:
+
+   struct dentry *debugfs_create_devm_seqfile(struct device *dev,
+				const char *name,
+				struct dentry *parent,
+				int (*read_fn)(struct seq_file *s,
+					void *data));
+
+The "dev" argument is the device related to this debugfs file, and
+the "read_fn" is a function pointer which to be called to print the
+seq_file content.
 
 There are a couple of other directory-oriented helper functions:
 
-- 
cgit v1.2.3


From b9f2f4594c77073ca306ef345e9f9c1cde292dd3 Mon Sep 17 00:00:00 2001
From: Johannes Thumshirn <jthumshirn@suse.de>
Date: Fri, 17 Jul 2015 12:23:01 +0200
Subject: Documentation: Add MCB documentation

Add basic introductory documentation for the MEN Chameleon Bus.

Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/men-chameleon-bus.txt | 162 ++++++++++++++++++++++++++++++++++++
 MAINTAINERS                         |   1 +
 2 files changed, 163 insertions(+)
 create mode 100644 Documentation/men-chameleon-bus.txt

(limited to 'Documentation')

diff --git a/Documentation/men-chameleon-bus.txt b/Documentation/men-chameleon-bus.txt
new file mode 100644
index 000000000000..6d7bdb5d7f12
--- /dev/null
+++ b/Documentation/men-chameleon-bus.txt
@@ -0,0 +1,162 @@
+                               MEN Chameleon Bus
+                               =================
+
+Table of Contents
+=================
+1 Introduction
+    1.1 Scope of this Document
+    1.2 Limitations of the current implementation
+2 Architecture
+    2.1 MEN Chameleon Bus
+    2.2 Carrier Devices
+    2.3 Parser
+3 Resource handling
+    3.1 Memory Resources
+    3.2 IRQs
+4 Writing a MCB driver
+    4.1 The driver structure
+    4.2 Probing and attaching
+    4.3 Initializing the driver
+
+
+1 Introduction
+===============
+  This document describes the architecture and implementation of the MEN
+  Chameleon Bus (called MCB throughout this document).
+
+1.1 Scope of this Document
+---------------------------
+  This document is intended to be a short overview of the current
+  implementation and does by no means describe to complete possibilities of MCB
+  based devices.
+
+1.2 Limitations of the current implementation
+----------------------------------------------
+  The current implementation is limited to PCI and PCIe based carrier devices
+  that only use a single memory resource and share the PCI legacy IRQ.  Not
+  implemented are:
+  - Multi-resource MCB devices like the VME Controller or M-Module carrier.
+  - MCB devices that need another MCB device, like SRAM for a DMA Controller's
+    buffer descriptors or a video controller's video memory.
+  - A per-carrier IRQ domain for carrier devices that have one (or more) IRQs
+    per MCB device like PCIe based carriers with MSI or MSI-X support.
+
+2 Architecture
+===============
+  MCB is divided in 3 functional blocks:
+  - The MEN Chameleon Bus itself,
+  - drivers for MCB Carrier Devices and
+  - the parser for the Chameleon table.
+
+2.1 MEN Chameleon Bus
+----------------------
+   The MEN Chameleon Bus is an artificial bus system that attaches to an MEN
+   Chameleon FPGA device. These devices are multi-function devices implemented
+   in a single FPGA and usually attached via some sort of PCI or PCIe link. Each
+   FPGA contains a header section describing the content of the FPGA. The header
+   lists the device id, PCI BAR, offset from the beginning of the PCI BAR, size
+   in the FPGA, interrupt number and some other properties currently not handled
+   by the MCB implementation.
+
+2.2 Carrier Devices
+--------------------
+   A carrier device is just an abstraction for the real world physical bus the
+   chameleon FPGA is attached to. Some IP Core drivers may need to interact with
+   properties of the carrier device (like querying the IRQ number of a PCI
+   device). To provide abstraction from the real hardware bus, an MCB carrier
+   device provides callback methods to translate the driver's MCB function calls
+   to hardware related function calls. For example a carrier device may
+   implement the get_irq() method which can be translate into a hardware bus
+   query for the IRQ number the device should use.
+
+2.3 Parser
+-----------
+   The parser reads the 1st 512 bytes of a chameleon device and parses the
+   chameleon table. Currently the parser only supports the Chameleon v2 variant
+   of the chameleon table but can easily be adopted to support an older or
+   possible future variant. While parsing the table's entries new MCB devices
+   are allocated and their resources are assigned according to the resource
+   assignment in the chameleon table. After resource assignment is finished, the
+   MCB devices are registered at the MCB and thus at the driver core of the
+   Linux kernel.
+
+3 Resource handling
+====================
+  The current implementation assigns exactly one memory and one IRQ resource
+  per MCB device. But this is likely going to change in the future.
+
+3.1 Memory Resources
+---------------------
+   Each MCB device has exactly one memory resource, which can be requested from
+   the MCB bus. This memory resource is the physical address of the MCB device
+   inside the carrier and is intended to be passed to ioremap() and friends. It
+   is already requested from the kernel by calling request_mem_region().
+
+3.2 IRQs
+---------
+   Each MCB device has exactly one IRQ resource, which can be requested from the
+   MCB bus. If a carrier device driver implements the ->get_irq() callback
+   method, the IRQ number assigned by the carrier device will be returned,
+   otherwise the IRQ number inside the chameleon table will be returned. This
+   number is suitable to be passed to request_irq().
+
+4 Writing a MCB driver
+=======================
+
+4.1 The driver structure
+-------------------------
+    Each MCB driver has a structure to identify the device driver as well as
+    device ids which identify the IP Core inside the FPGA. The driver structure
+    also contaings callback methods which get executed on driver probe and
+    removal from the system.
+
+
+  static const struct mcb_device_id foo_ids[] = {
+          { .device = 0x123 },
+          { }
+  };
+  MODULE_DEVICE_TABLE(mcb, foo_ids);
+
+  static struct mcb_driver foo_driver = {
+          driver = {
+                  .name = "foo-bar",
+                  .owner = THIS_MODULE,
+          },
+          .probe = foo_probe,
+          .remove = foo_remove,
+          .id_table = foo_ids,
+  };
+
+4.2 Probing and attaching
+--------------------------
+   When a driver is loaded and the MCB devices it services are found, the MCB
+   core will call the driver's probe callback method. When the driver is removed
+   from the system, the MCB core will call the driver's remove callback method.
+
+
+  static init foo_probe(struct mcb_device *mdev, const struct mcb_device_id *id);
+  static void foo_remove(struct mcb_device *mdev);
+
+4.3 Initializing the driver
+----------------------------
+   When the kernel is booted or your foo driver module is inserted, you have to
+   perform driver initialization. Usually it is enough to register your driver
+   module at the MCB core.
+
+
+  static int __init foo_init(void)
+  {
+          return mcb_register_driver(&foo_driver);
+  }
+  module_init(foo_init);
+
+  static void __exit foo_exit(void)
+  {
+          mcb_unregister_driver(&foo_driver);
+  }
+  module_exit(foo_exit);
+
+   The module_mcb_driver() macro can be used to reduce the above code.
+
+
+  module_mcb_driver(foo_driver);
diff --git a/MAINTAINERS b/MAINTAINERS
index 8133cefb6b6e..b9b91566380e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6665,6 +6665,7 @@ M:	Johannes Thumshirn <morbidrsa@gmail.com>
 S:	Maintained
 F:	drivers/mcb/
 F:	include/linux/mcb.h
+F:	Documentation/men-chameleon-bus.txt
 
 MEN F21BMC (Board Management Controller)
 M:	Andreas Werner <andreas.werner@men.de>
-- 
cgit v1.2.3


From 3d5583cc82e9d505e4b9a37615f7a28c7ea465b9 Mon Sep 17 00:00:00 2001
From: Ahmed Mohamed Abd EL Mawgood <ahmedsoliman0x666@gmail.com>
Date: Thu, 16 Jul 2015 23:17:36 +0200
Subject: fix Evolution submenu name in email-clients.txt

I have noticed that there is no submenu in Evolution called Heading
instead there is Paragraph style

Signed-off-by: Ahmed Mohamed <ahmedsoliman0x666@gmail.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/email-clients.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/email-clients.txt b/Documentation/email-clients.txt
index c7d49b885559..3fa450881ecb 100644
--- a/Documentation/email-clients.txt
+++ b/Documentation/email-clients.txt
@@ -93,7 +93,7 @@ Evolution (GUI)
 Some people use this successfully for patches.
 
 When composing mail select: Preformat
-  from Format->Heading->Preformatted (Ctrl-7)
+  from Format->Paragraph Style->Preformatted (Ctrl-7)
   or the toolbar
 
 Then use:
-- 
cgit v1.2.3


From fce96568f314d0fb72f5ca4bba2f64e7164a7817 Mon Sep 17 00:00:00 2001
From: Jim Davis <jim.epost@gmail.com>
Date: Thu, 16 Jul 2015 12:50:50 -0700
Subject: Documentation: installed man pages don't need to be executable

Install the man pages with mode 644 instead of 755

Signed-off-by: Jim Davis <jim.epost@gmail.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/DocBook/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/DocBook/Makefile b/Documentation/DocBook/Makefile
index b6a6a2e0dd3b..170042a1ea5f 100644
--- a/Documentation/DocBook/Makefile
+++ b/Documentation/DocBook/Makefile
@@ -60,7 +60,7 @@ mandocs: $(MAN)
 
 installmandocs: mandocs
 	mkdir -p /usr/local/man/man9/
-	install $(obj)/man/*.9.gz /usr/local/man/man9/
+	install -m 644 $(obj)/man/*.9.gz /usr/local/man/man9/
 
 ###
 #External programs used
-- 
cgit v1.2.3


From 013542caa898289f37695bea849f1e368254ec8b Mon Sep 17 00:00:00 2001
From: Benjamin Herr <ben@0x539.de>
Date: Sat, 18 Jul 2015 14:31:40 +0200
Subject: Doc: fix trivial typo in SubmittingPatches

This patch changes the tense of a verb in SubmittingPatches to ensure
grammatical validity of the containing sentence.

Signed-off-by: Benjamin Herr <ben@0x539.de>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/SubmittingPatches | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches
index 56ef2e6f79c2..80b9f54f87ef 100644
--- a/Documentation/SubmittingPatches
+++ b/Documentation/SubmittingPatches
@@ -90,7 +90,7 @@ patch.
 
 Make sure your patch does not include any extra files which do not
 belong in a patch submission.  Make sure to review your patch -after-
-generated it with diff(1), to ensure accuracy.
+generating it with diff(1), to ensure accuracy.
 
 If your changes produce a lot of deltas, you need to split them into
 individual patches which modify things in logical stages; see section
-- 
cgit v1.2.3


From 7c97211b61d8b91b8f88237e4bbb3a8959ecb18a Mon Sep 17 00:00:00 2001
From: Johannes Thumshirn <jthumshirn@suse.de>
Date: Thu, 23 Jul 2015 11:17:26 +0200
Subject: Documentation: Minor changes to men-chameleon-bus.txt

Change men-chameleon-bus.txt according to the comments made by Randy Dunlap in
https://lkml.org/lkml/2015/7/17/691.

These are:
* Some minor gramatical changes
* Spelling fixes
* Write the word "Chameleon" capitalized throughout the whole document
* Explain MEN as MEN Mikro Elektronik GmbH.

Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/men-chameleon-bus.txt | 39 +++++++++++++++++++------------------
 1 file changed, 20 insertions(+), 19 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/men-chameleon-bus.txt b/Documentation/men-chameleon-bus.txt
index 6d7bdb5d7f12..30ded732027e 100644
--- a/Documentation/men-chameleon-bus.txt
+++ b/Documentation/men-chameleon-bus.txt
@@ -13,7 +13,7 @@ Table of Contents
 3 Resource handling
     3.1 Memory Resources
     3.2 IRQs
-4 Writing a MCB driver
+4 Writing an MCB driver
     4.1 The driver structure
     4.2 Probing and attaching
     4.3 Initializing the driver
@@ -27,7 +27,7 @@ Table of Contents
 1.1 Scope of this Document
 ---------------------------
   This document is intended to be a short overview of the current
-  implementation and does by no means describe to complete possibilities of MCB
+  implementation and does by no means describe the complete possibilities of MCB
   based devices.
 
 1.2 Limitations of the current implementation
@@ -43,40 +43,41 @@ Table of Contents
 
 2 Architecture
 ===============
-  MCB is divided in 3 functional blocks:
+  MCB is divided into 3 functional blocks:
   - The MEN Chameleon Bus itself,
   - drivers for MCB Carrier Devices and
   - the parser for the Chameleon table.
 
 2.1 MEN Chameleon Bus
 ----------------------
-   The MEN Chameleon Bus is an artificial bus system that attaches to an MEN
-   Chameleon FPGA device. These devices are multi-function devices implemented
-   in a single FPGA and usually attached via some sort of PCI or PCIe link. Each
-   FPGA contains a header section describing the content of the FPGA. The header
-   lists the device id, PCI BAR, offset from the beginning of the PCI BAR, size
-   in the FPGA, interrupt number and some other properties currently not handled
-   by the MCB implementation.
+   The MEN Chameleon Bus is an artificial bus system that attaches to a so
+   called Chameleon FPGA device found on some hardware produced my MEN Mikro
+   Elektronik GmbH. These devices are multi-function devices implemented in a
+   single FPGA and usually attached via some sort of PCI or PCIe link. Each
+   FPGA contains a header section describing the content of the FPGA. The
+   header lists the device id, PCI BAR, offset from the beginning of the PCI
+   BAR, size in the FPGA, interrupt number and some other properties currently
+   not handled by the MCB implementation.
 
 2.2 Carrier Devices
 --------------------
    A carrier device is just an abstraction for the real world physical bus the
-   chameleon FPGA is attached to. Some IP Core drivers may need to interact with
+   Chameleon FPGA is attached to. Some IP Core drivers may need to interact with
    properties of the carrier device (like querying the IRQ number of a PCI
    device). To provide abstraction from the real hardware bus, an MCB carrier
    device provides callback methods to translate the driver's MCB function calls
    to hardware related function calls. For example a carrier device may
-   implement the get_irq() method which can be translate into a hardware bus
+   implement the get_irq() method which can be translated into a hardware bus
    query for the IRQ number the device should use.
 
 2.3 Parser
 -----------
-   The parser reads the 1st 512 bytes of a chameleon device and parses the
-   chameleon table. Currently the parser only supports the Chameleon v2 variant
-   of the chameleon table but can easily be adopted to support an older or
+   The parser reads the first 512 bytes of a Chameleon device and parses the
+   Chameleon table. Currently the parser only supports the Chameleon v2 variant
+   of the Chameleon table but can easily be adopted to support an older or
    possible future variant. While parsing the table's entries new MCB devices
    are allocated and their resources are assigned according to the resource
-   assignment in the chameleon table. After resource assignment is finished, the
+   assignment in the Chameleon table. After resource assignment is finished, the
    MCB devices are registered at the MCB and thus at the driver core of the
    Linux kernel.
 
@@ -97,17 +98,17 @@ Table of Contents
    Each MCB device has exactly one IRQ resource, which can be requested from the
    MCB bus. If a carrier device driver implements the ->get_irq() callback
    method, the IRQ number assigned by the carrier device will be returned,
-   otherwise the IRQ number inside the chameleon table will be returned. This
+   otherwise the IRQ number inside the Chameleon table will be returned. This
    number is suitable to be passed to request_irq().
 
-4 Writing a MCB driver
+4 Writing an MCB driver
 =======================
 
 4.1 The driver structure
 -------------------------
     Each MCB driver has a structure to identify the device driver as well as
     device ids which identify the IP Core inside the FPGA. The driver structure
-    also contaings callback methods which get executed on driver probe and
+    also contains callback methods which get executed on driver probe and
     removal from the system.
 
 
-- 
cgit v1.2.3


From fa466c91970a0207d9384016cc7884a7f61834b6 Mon Sep 17 00:00:00 2001
From: Franklin S Cooper Jr <fcooper@ti.com>
Date: Wed, 22 Jul 2015 07:32:22 -0500
Subject: spi: davinci: Choose correct pre-scaler limit based on SOC

Currently the pre-scaler limit is incorrect. The value differs slightly
for various devices so a single value can't be used. Using the compatible
field select the correct pre-scaler limit.

Add new compatible field value for Keystone devices to support their
unique pre-scaler limit value.

Signed-off-by: Franklin S Cooper Jr <fcooper@ti.com>
Reviewed-by: Sekhar Nori <nsekhar@ti.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
---
 .../devicetree/bindings/spi/spi-davinci.txt        |  2 +
 drivers/spi/spi-davinci.c                          | 43 ++++++++++++++++++----
 include/linux/platform_data/spi-davinci.h          |  1 +
 3 files changed, 39 insertions(+), 7 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/spi/spi-davinci.txt b/Documentation/devicetree/bindings/spi/spi-davinci.txt
index 12ecfe9e3599..d1e914adcf6e 100644
--- a/Documentation/devicetree/bindings/spi/spi-davinci.txt
+++ b/Documentation/devicetree/bindings/spi/spi-davinci.txt
@@ -12,6 +12,8 @@ Required properties:
 - compatible:
 	- "ti,dm6441-spi" for SPI used similar to that on DM644x SoC family
 	- "ti,da830-spi" for SPI used similar to that on DA8xx SoC family
+	- "ti,keystone-spi" for SPI used similar to that on Keystone2 SoC
+		family
 - reg: Offset and length of SPI controller register space
 - num-cs: Number of chip selects. This includes internal as well as
 	GPIO chip selects.
diff --git a/drivers/spi/spi-davinci.c b/drivers/spi/spi-davinci.c
index b4605c4158f4..3cf9faa6cc3f 100644
--- a/drivers/spi/spi-davinci.c
+++ b/drivers/spi/spi-davinci.c
@@ -139,6 +139,8 @@ struct davinci_spi {
 	u32			(*get_tx)(struct davinci_spi *);
 
 	u8			*bytes_per_word;
+
+	u8			prescaler_limit;
 };
 
 static struct davinci_spi_config davinci_spi_default_cfg;
@@ -266,7 +268,7 @@ static inline int davinci_spi_get_prescale(struct davinci_spi *dspi,
 	/* Subtract 1 to match what will be programmed into SPI register. */
 	ret = DIV_ROUND_UP(clk_get_rate(dspi->clk), max_speed_hz) - 1;
 
-	if (ret < 0 || ret > 255)
+	if (ret < dspi->prescaler_limit || ret > 255)
 		return -EINVAL;
 
 	return ret;
@@ -833,13 +835,40 @@ rx_dma_failed:
 }
 
 #if defined(CONFIG_OF)
+
+/* OF SPI data structure */
+struct davinci_spi_of_data {
+	u8	version;
+	u8	prescaler_limit;
+};
+
+static const struct davinci_spi_of_data dm6441_spi_data = {
+	.version = SPI_VERSION_1,
+	.prescaler_limit = 2,
+};
+
+static const struct davinci_spi_of_data da830_spi_data = {
+	.version = SPI_VERSION_2,
+	.prescaler_limit = 2,
+};
+
+static const struct davinci_spi_of_data keystone_spi_data = {
+	.version = SPI_VERSION_1,
+	.prescaler_limit = 0,
+};
+
 static const struct of_device_id davinci_spi_of_match[] = {
 	{
 		.compatible = "ti,dm6441-spi",
+		.data = &dm6441_spi_data,
 	},
 	{
 		.compatible = "ti,da830-spi",
-		.data = (void *)SPI_VERSION_2,
+		.data = &da830_spi_data,
+	},
+	{
+		.compatible = "ti,keystone-spi",
+		.data = &keystone_spi_data,
 	},
 	{ },
 };
@@ -858,21 +887,21 @@ static int spi_davinci_get_pdata(struct platform_device *pdev,
 			struct davinci_spi *dspi)
 {
 	struct device_node *node = pdev->dev.of_node;
+	struct davinci_spi_of_data *spi_data;
 	struct davinci_spi_platform_data *pdata;
 	unsigned int num_cs, intr_line = 0;
 	const struct of_device_id *match;
 
 	pdata = &dspi->pdata;
 
-	pdata->version = SPI_VERSION_1;
 	match = of_match_device(davinci_spi_of_match, &pdev->dev);
 	if (!match)
 		return -ENODEV;
 
-	/* match data has the SPI version number for SPI_VERSION_2 */
-	if (match->data == (void *)SPI_VERSION_2)
-		pdata->version = SPI_VERSION_2;
+	spi_data = (struct davinci_spi_of_data *)match->data;
 
+	pdata->version = spi_data->version;
+	pdata->prescaler_limit = spi_data->prescaler_limit;
 	/*
 	 * default num_cs is 1 and all chipsel are internal to the chip
 	 * indicated by chip_sel being NULL or cs_gpios being NULL or
@@ -992,7 +1021,7 @@ static int davinci_spi_probe(struct platform_device *pdev)
 
 	dspi->bitbang.chipselect = davinci_spi_chipselect;
 	dspi->bitbang.setup_transfer = davinci_spi_setup_transfer;
-
+	dspi->prescaler_limit = pdata->prescaler_limit;
 	dspi->version = pdata->version;
 
 	dspi->bitbang.flags = SPI_NO_CS | SPI_LSB_FIRST | SPI_LOOP;
diff --git a/include/linux/platform_data/spi-davinci.h b/include/linux/platform_data/spi-davinci.h
index 8dc2fa47a2aa..f4edcb03c40c 100644
--- a/include/linux/platform_data/spi-davinci.h
+++ b/include/linux/platform_data/spi-davinci.h
@@ -49,6 +49,7 @@ struct davinci_spi_platform_data {
 	u8			num_chipselect;
 	u8			intr_line;
 	u8			*chip_sel;
+	u8			prescaler_limit;
 	bool			cshold_bug;
 	enum dma_event_q	dma_event_q;
 };
-- 
cgit v1.2.3


From 3a003baeec246f604ed1d2e0087560d7f15edcc6 Mon Sep 17 00:00:00 2001
From: Stephen Boyd <sboyd@codeaurora.org>
Date: Fri, 17 Jul 2015 14:41:54 -0700
Subject: regulator: Add over current protection (OCP) support

Some regulators can automatically shut down when they detect an
over current event. Add an op (set_over_current_protection) and a
DT property + constraint to support this capability.

Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
---
 Documentation/devicetree/bindings/regulator/regulator.txt | 1 +
 drivers/regulator/core.c                                  | 9 +++++++++
 drivers/regulator/of_regulator.c                          | 3 +++
 include/linux/regulator/driver.h                          | 1 +
 include/linux/regulator/machine.h                         | 1 +
 5 files changed, 15 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/regulator/regulator.txt b/Documentation/devicetree/bindings/regulator/regulator.txt
index db88feb28c03..24bd422cecd5 100644
--- a/Documentation/devicetree/bindings/regulator/regulator.txt
+++ b/Documentation/devicetree/bindings/regulator/regulator.txt
@@ -42,6 +42,7 @@ Optional properties:
 - regulator-system-load: Load in uA present on regulator that is not captured by
   any consumer request.
 - regulator-pull-down: Enable pull down resistor when the regulator is disabled.
+- regulator-over-current-protection: Enable over current protection.
 
 Deprecated properties:
 - regulator-compatible: If a regulator chip contains multiple
diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
index c9f72019bd68..520413e2bca0 100644
--- a/drivers/regulator/core.c
+++ b/drivers/regulator/core.c
@@ -1081,6 +1081,15 @@ static int set_machine_constraints(struct regulator_dev *rdev,
 		}
 	}
 
+	if (rdev->constraints->over_current_protection
+		&& ops->set_over_current_protection) {
+		ret = ops->set_over_current_protection(rdev);
+		if (ret < 0) {
+			rdev_err(rdev, "failed to set over current protection\n");
+			goto out;
+		}
+	}
+
 	print_constraints(rdev);
 	return 0;
 out:
diff --git a/drivers/regulator/of_regulator.c b/drivers/regulator/of_regulator.c
index b1c485b24ab2..250700c853bf 100644
--- a/drivers/regulator/of_regulator.c
+++ b/drivers/regulator/of_regulator.c
@@ -107,6 +107,9 @@ static void of_get_regulation_constraints(struct device_node *np,
 	if (!of_property_read_u32(np, "regulator-system-load", &pval))
 		constraints->system_load = pval;
 
+	constraints->over_current_protection = of_property_read_bool(np,
+					"regulator-over-current-protection");
+
 	for (i = 0; i < ARRAY_SIZE(regulator_states); i++) {
 		switch (i) {
 		case PM_SUSPEND_MEM:
diff --git a/include/linux/regulator/driver.h b/include/linux/regulator/driver.h
index 4db9fbe4889d..45932228cbf5 100644
--- a/include/linux/regulator/driver.h
+++ b/include/linux/regulator/driver.h
@@ -148,6 +148,7 @@ struct regulator_ops {
 	int (*get_current_limit) (struct regulator_dev *);
 
 	int (*set_input_current_limit) (struct regulator_dev *, int lim_uA);
+	int (*set_over_current_protection) (struct regulator_dev *);
 
 	/* enable/disable regulator */
 	int (*enable) (struct regulator_dev *);
diff --git a/include/linux/regulator/machine.h b/include/linux/regulator/machine.h
index b11be1260129..a1067d0b3991 100644
--- a/include/linux/regulator/machine.h
+++ b/include/linux/regulator/machine.h
@@ -147,6 +147,7 @@ struct regulation_constraints {
 	unsigned ramp_disable:1; /* disable ramp delay */
 	unsigned soft_start:1;	/* ramp voltage slowly */
 	unsigned pull_down:1;	/* pull down resistor when regulator off */
+	unsigned over_current_protection:1; /* auto disable on over current */
 };
 
 /**
-- 
cgit v1.2.3


From e2adfacde619d8e39dc8c02919bd2524d3c39588 Mon Sep 17 00:00:00 2001
From: Stephen Boyd <sboyd@codeaurora.org>
Date: Fri, 17 Jul 2015 14:41:55 -0700
Subject: regulator: qcom-spmi: Add vendor specific configuration

Add support for over current protection (OCP), pin control
selection, soft start strength, and auto-mode.

Cc: <devicetree@vger.kernel.org>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
---
 .../bindings/regulator/qcom,spmi-regulator.txt     |  60 ++++++-
 drivers/regulator/qcom_spmi-regulator.c            | 200 ++++++++++++++++++++-
 2 files changed, 251 insertions(+), 9 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/regulator/qcom,spmi-regulator.txt b/Documentation/devicetree/bindings/regulator/qcom,spmi-regulator.txt
index 75b4604bad07..d00bfd8624a5 100644
--- a/Documentation/devicetree/bindings/regulator/qcom,spmi-regulator.txt
+++ b/Documentation/devicetree/bindings/regulator/qcom,spmi-regulator.txt
@@ -91,13 +91,65 @@ see regulator.txt - with additional custom properties described below:
 - regulator-initial-mode:
 	Usage: optional
 	Value type: <u32>
-	Descrption: 1 = Set initial mode to high power mode (HPM), also referred
-		    to as NPM.  HPM consumes more ground current than LPM, but
+	Description: 2 = Set initial mode to auto mode (automatically select
+		    between HPM and LPM); not available on boost type
+		    regulators.
+
+		    1 = Set initial mode to high power mode (HPM), also referred
+		    to as NPM. HPM consumes more ground current than LPM, but
 		    it can source significantly higher load current. HPM is not
 		    available on boost type regulators. For voltage switch type
 		    regulators, HPM implies that over current protection and
-		    soft start are active all the time. 0 = Set initial mode to
-		    low power mode (LPM).
+		    soft start are active all the time.
+
+		    0 = Set initial mode to low power mode (LPM).
+
+- qcom,ocp-max-retries:
+	Usage: optional
+	Value type: <u32>
+	Description: Maximum number of times to try toggling a voltage switch
+		     off and back on as a result of consecutive over current
+		     events.
+
+- qcom,ocp-retry-delay:
+	Usage: optional
+	Value type: <u32>
+	Description: Time to delay in milliseconds between each voltage switch
+		     toggle after an over current event takes place.
+
+- qcom,pin-ctrl-enable:
+	Usage: optional
+	Value type: <u32>
+	Description: Bit mask specifying which hardware pins should be used to
+		     enable the regulator, if any; supported bits are:
+			0 = ignore all hardware enable signals
+			BIT(0) = follow HW0_EN signal
+			BIT(1) = follow HW1_EN signal
+			BIT(2) = follow HW2_EN signal
+			BIT(3) = follow HW3_EN signal
+
+- qcom,pin-ctrl-hpm:
+	Usage: optional
+	Value type: <u32>
+	Description: Bit mask specifying which hardware pins should be used to
+		     force the regulator into high power mode, if any;
+		     supported bits are:
+			0 = ignore all hardware enable signals
+			BIT(0) = follow HW0_EN signal
+			BIT(1) = follow HW1_EN signal
+			BIT(2) = follow HW2_EN signal
+			BIT(3) = follow HW3_EN signal
+			BIT(4) = follow PMIC awake state
+
+- qcom,vs-soft-start-strength:
+	Usage: optional
+	Value type: <u32>
+	Description: This property sets the soft start strength for voltage
+		     switch type regulators; supported values are:
+			0 = 0.05 uA
+			1 = 0.25 uA
+			2 = 0.55 uA
+			3 = 0.75 uA
 
 Example:
 
diff --git a/drivers/regulator/qcom_spmi-regulator.c b/drivers/regulator/qcom_spmi-regulator.c
index 9ef0e2f28ec4..88a5dc88badc 100644
--- a/drivers/regulator/qcom_spmi-regulator.c
+++ b/drivers/regulator/qcom_spmi-regulator.c
@@ -26,6 +26,70 @@
 #include <linux/regmap.h>
 #include <linux/list.h>
 
+/* Pin control enable input pins. */
+#define SPMI_REGULATOR_PIN_CTRL_ENABLE_NONE		0x00
+#define SPMI_REGULATOR_PIN_CTRL_ENABLE_EN0		0x01
+#define SPMI_REGULATOR_PIN_CTRL_ENABLE_EN1		0x02
+#define SPMI_REGULATOR_PIN_CTRL_ENABLE_EN2		0x04
+#define SPMI_REGULATOR_PIN_CTRL_ENABLE_EN3		0x08
+#define SPMI_REGULATOR_PIN_CTRL_ENABLE_HW_DEFAULT	0x10
+
+/* Pin control high power mode input pins. */
+#define SPMI_REGULATOR_PIN_CTRL_HPM_NONE		0x00
+#define SPMI_REGULATOR_PIN_CTRL_HPM_EN0			0x01
+#define SPMI_REGULATOR_PIN_CTRL_HPM_EN1			0x02
+#define SPMI_REGULATOR_PIN_CTRL_HPM_EN2			0x04
+#define SPMI_REGULATOR_PIN_CTRL_HPM_EN3			0x08
+#define SPMI_REGULATOR_PIN_CTRL_HPM_SLEEP_B		0x10
+#define SPMI_REGULATOR_PIN_CTRL_HPM_HW_DEFAULT		0x20
+
+/*
+ * Used with enable parameters to specify that hardware default register values
+ * should be left unaltered.
+ */
+#define SPMI_REGULATOR_USE_HW_DEFAULT			2
+
+/* Soft start strength of a voltage switch type regulator */
+enum spmi_vs_soft_start_str {
+	SPMI_VS_SOFT_START_STR_0P05_UA = 0,
+	SPMI_VS_SOFT_START_STR_0P25_UA,
+	SPMI_VS_SOFT_START_STR_0P55_UA,
+	SPMI_VS_SOFT_START_STR_0P75_UA,
+	SPMI_VS_SOFT_START_STR_HW_DEFAULT,
+};
+
+/**
+ * struct spmi_regulator_init_data - spmi-regulator initialization data
+ * @pin_ctrl_enable:        Bit mask specifying which hardware pins should be
+ *				used to enable the regulator, if any
+ *			    Value should be an ORing of
+ *				SPMI_REGULATOR_PIN_CTRL_ENABLE_* constants.  If
+ *				the bit specified by
+ *				SPMI_REGULATOR_PIN_CTRL_ENABLE_HW_DEFAULT is
+ *				set, then pin control enable hardware registers
+ *				will not be modified.
+ * @pin_ctrl_hpm:           Bit mask specifying which hardware pins should be
+ *				used to force the regulator into high power
+ *				mode, if any
+ *			    Value should be an ORing of
+ *				SPMI_REGULATOR_PIN_CTRL_HPM_* constants.  If
+ *				the bit specified by
+ *				SPMI_REGULATOR_PIN_CTRL_HPM_HW_DEFAULT is
+ *				set, then pin control mode hardware registers
+ *				will not be modified.
+ * @vs_soft_start_strength: This parameter sets the soft start strength for
+ *				voltage switch type regulators.  Its value
+ *				should be one of SPMI_VS_SOFT_START_STR_*.  If
+ *				its value is SPMI_VS_SOFT_START_STR_HW_DEFAULT,
+ *				then the soft start strength will be left at its
+ *				default hardware value.
+ */
+struct spmi_regulator_init_data {
+	unsigned				pin_ctrl_enable;
+	unsigned				pin_ctrl_hpm;
+	enum spmi_vs_soft_start_str		vs_soft_start_strength;
+};
+
 /* These types correspond to unique register layouts. */
 enum spmi_regulator_logical_type {
 	SPMI_REGULATOR_LOGICAL_TYPE_SMPS,
@@ -458,6 +522,14 @@ static int spmi_regulator_vs_enable(struct regulator_dev *rdev)
 	return spmi_regulator_common_enable(rdev);
 }
 
+static int spmi_regulator_vs_ocp(struct regulator_dev *rdev)
+{
+	struct spmi_regulator *vreg = rdev_get_drvdata(rdev);
+	u8 reg = SPMI_VS_OCP_OVERRIDE;
+
+	return spmi_vreg_write(vreg, SPMI_VS_REG_OCP, &reg, 1);
+}
+
 static int spmi_regulator_common_disable(struct regulator_dev *rdev)
 {
 	struct spmi_regulator *vreg = rdev_get_drvdata(rdev);
@@ -791,6 +863,9 @@ static unsigned int spmi_regulator_common_get_mode(struct regulator_dev *rdev)
 	if (reg & SPMI_COMMON_MODE_HPM_MASK)
 		return REGULATOR_MODE_NORMAL;
 
+	if (reg & SPMI_COMMON_MODE_AUTO_MASK)
+		return REGULATOR_MODE_FAST;
+
 	return REGULATOR_MODE_IDLE;
 }
 
@@ -798,11 +873,13 @@ static int
 spmi_regulator_common_set_mode(struct regulator_dev *rdev, unsigned int mode)
 {
 	struct spmi_regulator *vreg = rdev_get_drvdata(rdev);
-	u8 mask = SPMI_COMMON_MODE_HPM_MASK;
+	u8 mask = SPMI_COMMON_MODE_HPM_MASK | SPMI_COMMON_MODE_AUTO_MASK;
 	u8 val = 0;
 
 	if (mode == REGULATOR_MODE_NORMAL)
-		val = mask;
+		val = SPMI_COMMON_MODE_HPM_MASK;
+	else if (mode == REGULATOR_MODE_FAST)
+		val = SPMI_COMMON_MODE_AUTO_MASK;
 
 	return spmi_vreg_update_bits(vreg, SPMI_COMMON_REG_MODE, val, mask);
 }
@@ -972,6 +1049,7 @@ static struct regulator_ops spmi_vs_ops = {
 	.is_enabled		= spmi_regulator_common_is_enabled,
 	.set_pull_down		= spmi_regulator_common_set_pull_down,
 	.set_soft_start		= spmi_regulator_common_set_soft_start,
+	.set_over_current_protection = spmi_regulator_vs_ocp,
 };
 
 static struct regulator_ops spmi_boost_ops = {
@@ -1202,10 +1280,111 @@ static int spmi_regulator_ftsmps_init_slew_rate(struct spmi_regulator *vreg)
 	return ret;
 }
 
+static int spmi_regulator_init_registers(struct spmi_regulator *vreg,
+				const struct spmi_regulator_init_data *data)
+{
+	int ret;
+	enum spmi_regulator_logical_type type;
+	u8 ctrl_reg[8], reg, mask;
+
+	type = vreg->logical_type;
+
+	ret = spmi_vreg_read(vreg, SPMI_COMMON_REG_VOLTAGE_RANGE, ctrl_reg, 8);
+	if (ret)
+		return ret;
+
+	/* Set up enable pin control. */
+	if ((type == SPMI_REGULATOR_LOGICAL_TYPE_SMPS
+	     || type == SPMI_REGULATOR_LOGICAL_TYPE_LDO
+	     || type == SPMI_REGULATOR_LOGICAL_TYPE_VS)
+	    && !(data->pin_ctrl_enable
+			& SPMI_REGULATOR_PIN_CTRL_ENABLE_HW_DEFAULT)) {
+		ctrl_reg[SPMI_COMMON_IDX_ENABLE] &=
+			~SPMI_COMMON_ENABLE_FOLLOW_ALL_MASK;
+		ctrl_reg[SPMI_COMMON_IDX_ENABLE] |=
+		    data->pin_ctrl_enable & SPMI_COMMON_ENABLE_FOLLOW_ALL_MASK;
+	}
+
+	/* Set up mode pin control. */
+	if ((type == SPMI_REGULATOR_LOGICAL_TYPE_SMPS
+	    || type == SPMI_REGULATOR_LOGICAL_TYPE_LDO)
+		&& !(data->pin_ctrl_hpm
+			& SPMI_REGULATOR_PIN_CTRL_HPM_HW_DEFAULT)) {
+		ctrl_reg[SPMI_COMMON_IDX_MODE] &=
+			~SPMI_COMMON_MODE_FOLLOW_ALL_MASK;
+		ctrl_reg[SPMI_COMMON_IDX_MODE] |=
+			data->pin_ctrl_hpm & SPMI_COMMON_MODE_FOLLOW_ALL_MASK;
+	}
+
+	if (type == SPMI_REGULATOR_LOGICAL_TYPE_VS
+	   && !(data->pin_ctrl_hpm & SPMI_REGULATOR_PIN_CTRL_HPM_HW_DEFAULT)) {
+		ctrl_reg[SPMI_COMMON_IDX_MODE] &=
+			~SPMI_COMMON_MODE_FOLLOW_AWAKE_MASK;
+		ctrl_reg[SPMI_COMMON_IDX_MODE] |=
+		       data->pin_ctrl_hpm & SPMI_COMMON_MODE_FOLLOW_AWAKE_MASK;
+	}
+
+	if ((type == SPMI_REGULATOR_LOGICAL_TYPE_ULT_LO_SMPS
+		|| type == SPMI_REGULATOR_LOGICAL_TYPE_ULT_HO_SMPS
+		|| type == SPMI_REGULATOR_LOGICAL_TYPE_ULT_LDO)
+		&& !(data->pin_ctrl_hpm
+			& SPMI_REGULATOR_PIN_CTRL_HPM_HW_DEFAULT)) {
+		ctrl_reg[SPMI_COMMON_IDX_MODE] &=
+			~SPMI_COMMON_MODE_FOLLOW_AWAKE_MASK;
+		ctrl_reg[SPMI_COMMON_IDX_MODE] |=
+		       data->pin_ctrl_hpm & SPMI_COMMON_MODE_FOLLOW_AWAKE_MASK;
+	}
+
+	/* Write back any control register values that were modified. */
+	ret = spmi_vreg_write(vreg, SPMI_COMMON_REG_VOLTAGE_RANGE, ctrl_reg, 8);
+	if (ret)
+		return ret;
+
+	/* Set soft start strength and over current protection for VS. */
+	if (type == SPMI_REGULATOR_LOGICAL_TYPE_VS) {
+		if (data->vs_soft_start_strength
+				!= SPMI_VS_SOFT_START_STR_HW_DEFAULT) {
+			reg = data->vs_soft_start_strength
+				& SPMI_VS_SOFT_START_SEL_MASK;
+			mask = SPMI_VS_SOFT_START_SEL_MASK;
+			return spmi_vreg_update_bits(vreg,
+						     SPMI_VS_REG_SOFT_START,
+						     reg, mask);
+		}
+	}
+
+	return 0;
+}
+
+static void spmi_regulator_get_dt_config(struct spmi_regulator *vreg,
+		struct device_node *node, struct spmi_regulator_init_data *data)
+{
+	/*
+	 * Initialize configuration parameters to use hardware default in case
+	 * no value is specified via device tree.
+	 */
+	data->pin_ctrl_enable	    = SPMI_REGULATOR_PIN_CTRL_ENABLE_HW_DEFAULT;
+	data->pin_ctrl_hpm	    = SPMI_REGULATOR_PIN_CTRL_HPM_HW_DEFAULT;
+	data->vs_soft_start_strength	= SPMI_VS_SOFT_START_STR_HW_DEFAULT;
+
+	/* These bindings are optional, so it is okay if they aren't found. */
+	of_property_read_u32(node, "qcom,ocp-max-retries",
+		&vreg->ocp_max_retries);
+	of_property_read_u32(node, "qcom,ocp-retry-delay",
+		&vreg->ocp_retry_delay_ms);
+	of_property_read_u32(node, "qcom,pin-ctrl-enable",
+		&data->pin_ctrl_enable);
+	of_property_read_u32(node, "qcom,pin-ctrl-hpm", &data->pin_ctrl_hpm);
+	of_property_read_u32(node, "qcom,vs-soft-start-strength",
+		&data->vs_soft_start_strength);
+}
+
 static unsigned int spmi_regulator_of_map_mode(unsigned int mode)
 {
-	if (mode)
+	if (mode == 1)
 		return REGULATOR_MODE_NORMAL;
+	if (mode == 2)
+		return REGULATOR_MODE_FAST;
 
 	return REGULATOR_MODE_IDLE;
 }
@@ -1214,12 +1393,23 @@ static int spmi_regulator_of_parse(struct device_node *node,
 			    const struct regulator_desc *desc,
 			    struct regulator_config *config)
 {
+	struct spmi_regulator_init_data data = { };
 	struct spmi_regulator *vreg = config->driver_data;
 	struct device *dev = config->dev;
 	int ret;
 
-	vreg->ocp_max_retries = SPMI_VS_OCP_DEFAULT_MAX_RETRIES;
-	vreg->ocp_retry_delay_ms = SPMI_VS_OCP_DEFAULT_RETRY_DELAY_MS;
+	spmi_regulator_get_dt_config(vreg, node, &data);
+
+	if (!vreg->ocp_max_retries)
+		vreg->ocp_max_retries = SPMI_VS_OCP_DEFAULT_MAX_RETRIES;
+	if (!vreg->ocp_retry_delay_ms)
+		vreg->ocp_retry_delay_ms = SPMI_VS_OCP_DEFAULT_RETRY_DELAY_MS;
+
+	ret = spmi_regulator_init_registers(vreg, &data);
+	if (ret) {
+		dev_err(dev, "common initialization failed, ret=%d\n", ret);
+		return ret;
+	}
 
 	if (vreg->logical_type == SPMI_REGULATOR_LOGICAL_TYPE_FTSMPS) {
 		ret = spmi_regulator_ftsmps_init_slew_rate(vreg);
-- 
cgit v1.2.3


From fc5462f8525b47fa219452289ecb22c921c16823 Mon Sep 17 00:00:00 2001
From: Azael Avalos <coproscefalo@gmail.com>
Date: Wed, 22 Jul 2015 18:09:11 -0600
Subject: toshiba_acpi: Add /dev/toshiba_acpi device

There were previous attempts to "merge" the toshiba SMM module to the
toshiba_acpi one, they were trying to imitate what the old toshiba
module does, however, some models (TOS1900 devices) come with a
"crippled" implementation and do not provide all the "features" a
"genuine" Toshiba BIOS does.

This patch adds a new device called toshiba_acpi, which aim is to
enable userspace to access the SMM on Toshiba laptops via ACPI calls.

Creating a new convenience _IOWR command to access the SCI functions
by opening/closing the SCI internally to avoid buggy BIOS, while at
the same time providing backwards compatibility.

Older programs (and new) who wish to access the SMM on newer models
can do it by pointing their path to /dev/toshiba_acpi (instead of
/dev/toshiba) as the toshiba.h header was modified to reflect these
changes as well as adds all the toshiba_acpi paths and command,
however, it is strongly recommended to use the new IOCTL for any
SCI command to avoid any buggy BIOS.

Signed-off-by: Azael Avalos <coproscefalo@gmail.com>
Signed-off-by: Darren Hart <dvhart@linux.intel.com>
---
 Documentation/ioctl/ioctl-number.txt |  2 +-
 drivers/platform/x86/toshiba_acpi.c  | 91 ++++++++++++++++++++++++++++++++++++
 include/uapi/linux/toshiba.h         | 32 +++++++++++--
 3 files changed, 121 insertions(+), 4 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/ioctl/ioctl-number.txt b/Documentation/ioctl/ioctl-number.txt
index 611c52267d24..21d2f27c886b 100644
--- a/Documentation/ioctl/ioctl-number.txt
+++ b/Documentation/ioctl/ioctl-number.txt
@@ -263,7 +263,7 @@ Code  Seq#(hex)	Include File		Comments
 's'	all	linux/cdk.h
 't'	00-7F	linux/ppp-ioctl.h
 't'	80-8F	linux/isdn_ppp.h
-'t'	90	linux/toshiba.h
+'t'	90-91	linux/toshiba.h		toshiba and toshiba_acpi SMM
 'u'	00-1F	linux/smb_fs.h		gone
 'u'	20-3F	linux/uvcvideo.h	USB video class host driver
 'v'	00-1F	linux/ext2_fs.h		conflict!
diff --git a/drivers/platform/x86/toshiba_acpi.c b/drivers/platform/x86/toshiba_acpi.c
index c3a0c4d0c1dc..802577f43a23 100644
--- a/drivers/platform/x86/toshiba_acpi.c
+++ b/drivers/platform/x86/toshiba_acpi.c
@@ -50,6 +50,8 @@
 #include <linux/acpi.h>
 #include <linux/dmi.h>
 #include <linux/uaccess.h>
+#include <linux/miscdevice.h>
+#include <linux/toshiba.h>
 #include <acpi/video.h>
 
 MODULE_AUTHOR("John Belmonte");
@@ -170,6 +172,7 @@ struct toshiba_acpi_dev {
 	struct led_classdev led_dev;
 	struct led_classdev kbd_led;
 	struct led_classdev eco_led;
+	struct miscdevice miscdev;
 
 	int force_fan;
 	int last_key_event;
@@ -2239,6 +2242,81 @@ static struct attribute_group toshiba_attr_group = {
 	.attrs = toshiba_attributes,
 };
 
+/*
+ * Misc device
+ */
+static int toshiba_acpi_smm_bridge(SMMRegisters *regs)
+{
+	u32 in[TCI_WORDS] = { regs->eax, regs->ebx, regs->ecx,
+			      regs->edx, regs->esi, regs->edi };
+	u32 out[TCI_WORDS];
+	acpi_status status;
+
+	status = tci_raw(toshiba_acpi, in, out);
+	if (ACPI_FAILURE(status)) {
+		pr_err("ACPI call to query SMM registers failed\n");
+		return -EIO;
+	}
+
+	/* Fillout the SMM struct with the TCI call results */
+	regs->eax = out[0];
+	regs->ebx = out[1];
+	regs->ecx = out[2];
+	regs->edx = out[3];
+	regs->esi = out[4];
+	regs->edi = out[5];
+
+	return 0;
+}
+
+static long toshiba_acpi_ioctl(struct file *fp, unsigned int cmd,
+			       unsigned long arg)
+{
+	SMMRegisters __user *argp = (SMMRegisters __user *)arg;
+	SMMRegisters regs;
+	int ret;
+
+	if (!argp)
+		return -EINVAL;
+
+	switch (cmd) {
+	case TOSH_SMM:
+		if (copy_from_user(&regs, argp, sizeof(SMMRegisters)))
+			return -EFAULT;
+		ret = toshiba_acpi_smm_bridge(&regs);
+		if (ret)
+			return ret;
+		if (copy_to_user(argp, &regs, sizeof(SMMRegisters)))
+			return -EFAULT;
+		break;
+	case TOSHIBA_ACPI_SCI:
+		if (copy_from_user(&regs, argp, sizeof(SMMRegisters)))
+			return -EFAULT;
+		/* Ensure we are being called with a SCI_{GET, SET} register */
+		if (regs.eax != SCI_GET && regs.eax != SCI_SET)
+			return -EINVAL;
+		if (!sci_open(toshiba_acpi))
+			return -EIO;
+		ret = toshiba_acpi_smm_bridge(&regs);
+		sci_close(toshiba_acpi);
+		if (ret)
+			return ret;
+		if (copy_to_user(argp, &regs, sizeof(SMMRegisters)))
+			return -EFAULT;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static const struct file_operations toshiba_acpi_fops = {
+	.owner		= THIS_MODULE,
+	.unlocked_ioctl = toshiba_acpi_ioctl,
+	.llseek		= noop_llseek,
+};
+
 /*
  * Hotkeys
  */
@@ -2540,6 +2618,8 @@ static int toshiba_acpi_remove(struct acpi_device *acpi_dev)
 {
 	struct toshiba_acpi_dev *dev = acpi_driver_data(acpi_dev);
 
+	misc_deregister(&dev->miscdev);
+
 	remove_toshiba_proc_entries(dev);
 
 	if (dev->sysfs_created)
@@ -2611,6 +2691,17 @@ static int toshiba_acpi_add(struct acpi_device *acpi_dev)
 		return -ENOMEM;
 	dev->acpi_dev = acpi_dev;
 	dev->method_hci = hci_method;
+	dev->miscdev.minor = MISC_DYNAMIC_MINOR;
+	dev->miscdev.name = "toshiba_acpi";
+	dev->miscdev.fops = &toshiba_acpi_fops;
+
+	ret = misc_register(&dev->miscdev);
+	if (ret) {
+		pr_err("Failed to register miscdevice\n");
+		kfree(dev);
+		return ret;
+	}
+
 	acpi_dev->driver_data = dev;
 	dev_set_drvdata(&acpi_dev->dev, dev);
 
diff --git a/include/uapi/linux/toshiba.h b/include/uapi/linux/toshiba.h
index e9bef5b2f91e..c58bf4b5bb26 100644
--- a/include/uapi/linux/toshiba.h
+++ b/include/uapi/linux/toshiba.h
@@ -1,6 +1,7 @@
 /* toshiba.h -- Linux driver for accessing the SMM on Toshiba laptops 
  *
  * Copyright (c) 1996-2000  Jonathan A. Buzzard (jonathan@buzzard.org.uk)
+ * Copyright (c) 2015  Azael Avalos <coproscefalo@gmail.com>
  *
  * Thanks to Juergen Heinzl <juergen@monocerus.demon.co.uk> for the pointers
  * on making sure the structure is aligned and packed.
@@ -20,9 +21,18 @@
 #ifndef _UAPI_LINUX_TOSHIBA_H
 #define _UAPI_LINUX_TOSHIBA_H
 
-#define TOSH_PROC "/proc/toshiba"
-#define TOSH_DEVICE "/dev/toshiba"
-#define TOSH_SMM _IOWR('t', 0x90, int)	/* broken: meant 24 bytes */
+/*
+ * Toshiba modules paths
+ */
+
+#define TOSH_PROC		"/proc/toshiba"
+#define TOSH_DEVICE		"/dev/toshiba"
+#define TOSHIBA_ACPI_PROC	"/proc/acpi/toshiba"
+#define TOSHIBA_ACPI_DEVICE	"/dev/toshiba_acpi"
+
+/*
+ * Toshiba SMM structure
+ */
 
 typedef struct {
 	unsigned int eax;
@@ -33,5 +43,21 @@ typedef struct {
 	unsigned int edi __attribute__ ((packed));
 } SMMRegisters;
 
+/*
+ * IOCTLs (0x90 - 0x91)
+ */
+
+#define TOSH_SMM		_IOWR('t', 0x90, SMMRegisters)
+/*
+ * Convenience toshiba_acpi command.
+ *
+ * The System Configuration Interface (SCI) is opened/closed internally
+ * to avoid userspace of buggy BIOSes.
+ *
+ * The toshiba_acpi module checks whether the eax register is set with
+ * SCI_GET (0xf300) or SCI_SET (0xf400), returning -EINVAL if not.
+ */
+#define TOSHIBA_ACPI_SCI	_IOWR('t', 0x91, SMMRegisters)
+
 
 #endif /* _UAPI_LINUX_TOSHIBA_H */
-- 
cgit v1.2.3


From 2d69a115a991fb7400c501dc643c47ca54cc4045 Mon Sep 17 00:00:00 2001
From: Joachim Eastwood <manabian@gmail.com>
Date: Thu, 9 Jul 2015 22:48:26 +0200
Subject: phy: dt bindings: add NXP LPC18xx/43xx USB OTG PHY bindings

Add binding documentation for NXP LPC18xx/43xx USB OTG PHY. This
PHY can found on NXP LPC18xx and LPC43xx devices with USB support.

Signed-off-by: Joachim Eastwood <manabian@gmail.com>
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
---
 .../bindings/phy/phy-lpc18xx-usb-otg.txt           | 26 ++++++++++++++++++++++
 1 file changed, 26 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/phy/phy-lpc18xx-usb-otg.txt

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/phy/phy-lpc18xx-usb-otg.txt b/Documentation/devicetree/bindings/phy/phy-lpc18xx-usb-otg.txt
new file mode 100644
index 000000000000..bd61b467e30a
--- /dev/null
+++ b/Documentation/devicetree/bindings/phy/phy-lpc18xx-usb-otg.txt
@@ -0,0 +1,26 @@
+NXP LPC18xx/43xx internal USB OTG PHY binding
+---------------------------------------------
+
+This file contains documentation for the internal USB OTG PHY found
+in NXP LPC18xx and LPC43xx SoCs.
+
+Required properties:
+- compatible	: must be "nxp,lpc1850-usb-otg-phy"
+- clocks	: must be exactly one entry
+See: Documentation/devicetree/bindings/clock/clock-bindings.txt
+- #phy-cells	: must be 0 for this phy
+See: Documentation/devicetree/bindings/phy/phy-bindings.txt
+
+The phy node must be a child of the creg syscon node.
+
+Example:
+creg: syscon@40043000 {
+	compatible = "nxp,lpc1850-creg", "syscon", "simple-mfd";
+	reg = <0x40043000 0x1000>;
+
+	usb0_otg_phy: phy@004 {
+		compatible = "nxp,lpc1850-usb-otg-phy";
+		clocks = <&ccu1 CLK_USB0>;
+		#phy-cells = <0>;
+	};
+};
-- 
cgit v1.2.3


From d2332303efc00d35af7f3d8b025663965862b0e3 Mon Sep 17 00:00:00 2001
From: Hans de Goede <hdegoede@redhat.com>
Date: Sat, 13 Jun 2015 14:37:45 +0200
Subject: phy-sun4i-usb: Add id and vbus detection support for the otg phy
 (phy0)

The usb0 phy is connected to an OTG controller, and as such needs some special
handling:

1) It allows explicit control over the pullups, enable these on phy_init and
disable them on phy_exit.

2) It has bits to signal id and vbus detect to the musb-core, add support for
for monitoring id and vbus detect gpio-s for use in dual role mode, and set
these bits to the correct values for operating in host only mode when no
gpios are specified in the devicetree.

While updating the devicetree binding documentation also add documentation
for the sofar undocumented usage of regulators for vbus for all 3 phys.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
---
 .../devicetree/bindings/phy/sun4i-usb-phy.txt      |  18 +-
 drivers/phy/phy-sun4i-usb.c                        | 250 ++++++++++++++++++++-
 2 files changed, 256 insertions(+), 12 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/phy/sun4i-usb-phy.txt b/Documentation/devicetree/bindings/phy/sun4i-usb-phy.txt
index 16528b9eb561..557fa9940164 100644
--- a/Documentation/devicetree/bindings/phy/sun4i-usb-phy.txt
+++ b/Documentation/devicetree/bindings/phy/sun4i-usb-phy.txt
@@ -23,6 +23,13 @@ Required properties:
   * "usb1_reset"
   * "usb2_reset" for sun4i, sun6i or sun7i
 
+Optional properties:
+- usb0_id_det-gpios : gpio phandle for reading the otg id pin value
+- usb0_vbus_det-gpios : gpio phandle for detecting the presence of usb0 vbus
+- usb0_vbus-supply : regulator phandle for controller usb0 vbus
+- usb1_vbus-supply : regulator phandle for controller usb1 vbus
+- usb2_vbus-supply : regulator phandle for controller usb2 vbus
+
 Example:
 	usbphy: phy@0x01c13400 {
 		#phy-cells = <1>;
@@ -32,6 +39,13 @@ Example:
 		reg-names = "phy_ctrl", "pmu1", "pmu2";
 		clocks = <&usb_clk 8>;
 		clock-names = "usb_phy";
-		resets = <&usb_clk 1>, <&usb_clk 2>;
-		reset-names = "usb1_reset", "usb2_reset";
+		resets = <&usb_clk 0>, <&usb_clk 1>, <&usb_clk 2>;
+		reset-names = "usb0_reset", "usb1_reset", "usb2_reset";
+		pinctrl-names = "default";
+		pinctrl-0 = <&usb0_id_detect_pin>, <&usb0_vbus_detect_pin>;
+		usb0_id_det-gpios = <&pio 7 19 GPIO_ACTIVE_HIGH>; /* PH19 */
+		usb0_vbus_det-gpios = <&pio 7 22 GPIO_ACTIVE_HIGH>; /* PH22 */
+		usb0_vbus-supply = <&reg_usb0_vbus>;
+		usb1_vbus-supply = <&reg_usb1_vbus>;
+		usb2_vbus-supply = <&reg_usb2_vbus>;
 	};
diff --git a/drivers/phy/phy-sun4i-usb.c b/drivers/phy/phy-sun4i-usb.c
index e17c539e4f6f..37f52f73fdba 100644
--- a/drivers/phy/phy-sun4i-usb.c
+++ b/drivers/phy/phy-sun4i-usb.c
@@ -1,7 +1,7 @@
 /*
  * Allwinner sun4i USB phy driver
  *
- * Copyright (C) 2014 Hans de Goede <hdegoede@redhat.com>
+ * Copyright (C) 2014-2015 Hans de Goede <hdegoede@redhat.com>
  *
  * Based on code from
  * Allwinner Technology Co., Ltd. <www.allwinnertech.com>
@@ -24,16 +24,19 @@
 #include <linux/clk.h>
 #include <linux/err.h>
 #include <linux/io.h>
+#include <linux/interrupt.h>
 #include <linux/kernel.h>
 #include <linux/module.h>
 #include <linux/mutex.h>
 #include <linux/of.h>
 #include <linux/of_address.h>
+#include <linux/of_gpio.h>
 #include <linux/phy/phy.h>
 #include <linux/phy/phy-sun4i-usb.h>
 #include <linux/platform_device.h>
 #include <linux/regulator/consumer.h>
 #include <linux/reset.h>
+#include <linux/workqueue.h>
 
 #define REG_ISCR			0x00
 #define REG_PHYCTL			0x04
@@ -47,6 +50,17 @@
 #define SUNXI_AHB_INCRX_ALIGN_EN	BIT(8)
 #define SUNXI_ULPI_BYPASS_EN		BIT(0)
 
+/* ISCR, Interface Status and Control bits */
+#define ISCR_ID_PULLUP_EN		(1 << 17)
+#define ISCR_DPDM_PULLUP_EN	(1 << 16)
+/* sunxi has the phy id/vbus pins not connected, so we use the force bits */
+#define ISCR_FORCE_ID_MASK	(3 << 14)
+#define ISCR_FORCE_ID_LOW		(2 << 14)
+#define ISCR_FORCE_ID_HIGH	(3 << 14)
+#define ISCR_FORCE_VBUS_MASK	(3 << 12)
+#define ISCR_FORCE_VBUS_LOW	(2 << 12)
+#define ISCR_FORCE_VBUS_HIGH	(3 << 12)
+
 /* Common Control Bits for Both PHYs */
 #define PHY_PLL_BW			0x03
 #define PHY_RES45_CAL_EN		0x0c
@@ -63,6 +77,13 @@
 
 #define MAX_PHYS			3
 
+/*
+ * Note do not raise the debounce time, we must report Vusb high within 100ms
+ * otherwise we get Vbus errors
+ */
+#define DEBOUNCE_TIME			msecs_to_jiffies(50)
+#define POLL_TIME			msecs_to_jiffies(250)
+
 struct sun4i_usb_phy_data {
 	void __iomem *base;
 	struct mutex mutex;
@@ -74,13 +95,56 @@ struct sun4i_usb_phy_data {
 		struct regulator *vbus;
 		struct reset_control *reset;
 		struct clk *clk;
+		bool regulator_on;
 		int index;
 	} phys[MAX_PHYS];
+	/* phy0 / otg related variables */
+	bool phy0_init;
+	bool phy0_poll;
+	struct gpio_desc *id_det_gpio;
+	struct gpio_desc *vbus_det_gpio;
+	int id_det_irq;
+	int vbus_det_irq;
+	int id_det;
+	int vbus_det;
+	struct delayed_work detect;
 };
 
 #define to_sun4i_usb_phy_data(phy) \
 	container_of((phy), struct sun4i_usb_phy_data, phys[(phy)->index])
 
+static void sun4i_usb_phy0_update_iscr(struct phy *_phy, u32 clr, u32 set)
+{
+	struct sun4i_usb_phy *phy = phy_get_drvdata(_phy);
+	struct sun4i_usb_phy_data *data = to_sun4i_usb_phy_data(phy);
+	u32 iscr;
+
+	iscr = readl(data->base + REG_ISCR);
+	iscr &= ~clr;
+	iscr |= set;
+	writel(iscr, data->base + REG_ISCR);
+}
+
+static void sun4i_usb_phy0_set_id_detect(struct phy *phy, u32 val)
+{
+	if (val)
+		val = ISCR_FORCE_ID_HIGH;
+	else
+		val = ISCR_FORCE_ID_LOW;
+
+	sun4i_usb_phy0_update_iscr(phy, ISCR_FORCE_ID_MASK, val);
+}
+
+static void sun4i_usb_phy0_set_vbus_detect(struct phy *phy, u32 val)
+{
+	if (val)
+		val = ISCR_FORCE_VBUS_HIGH;
+	else
+		val = ISCR_FORCE_VBUS_LOW;
+
+	sun4i_usb_phy0_update_iscr(phy, ISCR_FORCE_VBUS_MASK, val);
+}
+
 static void sun4i_usb_phy_write(struct sun4i_usb_phy *phy, u32 addr, u32 data,
 				int len)
 {
@@ -171,12 +235,39 @@ static int sun4i_usb_phy_init(struct phy *_phy)
 
 	sun4i_usb_phy_passby(phy, 1);
 
+	if (phy->index == 0) {
+		data->phy0_init = true;
+
+		/* Enable pull-ups */
+		sun4i_usb_phy0_update_iscr(_phy, 0, ISCR_DPDM_PULLUP_EN);
+		sun4i_usb_phy0_update_iscr(_phy, 0, ISCR_ID_PULLUP_EN);
+
+		if (data->id_det_gpio) {
+			/* OTG mode, force ISCR and cable state updates */
+			data->id_det = -1;
+			data->vbus_det = -1;
+			queue_delayed_work(system_wq, &data->detect, 0);
+		} else {
+			/* Host only mode */
+			sun4i_usb_phy0_set_id_detect(_phy, 0);
+			sun4i_usb_phy0_set_vbus_detect(_phy, 1);
+		}
+	}
+
 	return 0;
 }
 
 static int sun4i_usb_phy_exit(struct phy *_phy)
 {
 	struct sun4i_usb_phy *phy = phy_get_drvdata(_phy);
+	struct sun4i_usb_phy_data *data = to_sun4i_usb_phy_data(phy);
+
+	if (phy->index == 0) {
+		/* Disable pull-ups */
+		sun4i_usb_phy0_update_iscr(_phy, ISCR_DPDM_PULLUP_EN, 0);
+		sun4i_usb_phy0_update_iscr(_phy, ISCR_ID_PULLUP_EN, 0);
+		data->phy0_init = false;
+	}
 
 	sun4i_usb_phy_passby(phy, 0);
 	reset_control_assert(phy->reset);
@@ -188,20 +279,46 @@ static int sun4i_usb_phy_exit(struct phy *_phy)
 static int sun4i_usb_phy_power_on(struct phy *_phy)
 {
 	struct sun4i_usb_phy *phy = phy_get_drvdata(_phy);
-	int ret = 0;
+	struct sun4i_usb_phy_data *data = to_sun4i_usb_phy_data(phy);
+	int ret;
+
+	if (!phy->vbus || phy->regulator_on)
+		return 0;
+
+	/* For phy0 only turn on Vbus if we don't have an ext. Vbus */
+	if (phy->index == 0 && data->vbus_det)
+		return 0;
 
-	if (phy->vbus)
-		ret = regulator_enable(phy->vbus);
+	ret = regulator_enable(phy->vbus);
+	if (ret)
+		return ret;
 
-	return ret;
+	phy->regulator_on = true;
+
+	/* We must report Vbus high within OTG_TIME_A_WAIT_VRISE msec. */
+	if (phy->index == 0 && data->phy0_poll)
+		mod_delayed_work(system_wq, &data->detect, DEBOUNCE_TIME);
+
+	return 0;
 }
 
 static int sun4i_usb_phy_power_off(struct phy *_phy)
 {
 	struct sun4i_usb_phy *phy = phy_get_drvdata(_phy);
+	struct sun4i_usb_phy_data *data = to_sun4i_usb_phy_data(phy);
+
+	if (!phy->vbus || !phy->regulator_on)
+		return 0;
 
-	if (phy->vbus)
-		regulator_disable(phy->vbus);
+	regulator_disable(phy->vbus);
+	phy->regulator_on = false;
+
+	/*
+	 * phy0 vbus typically slowly discharges, sometimes this causes the
+	 * Vbus gpio to not trigger an edge irq on Vbus off, so force a rescan.
+	 */
+	if (phy->index == 0 && !data->phy0_poll)
+		mod_delayed_work(system_wq, &data->detect, POLL_TIME);
 
 	return 0;
 }
@@ -221,6 +338,49 @@ static struct phy_ops sun4i_usb_phy_ops = {
 	.owner		= THIS_MODULE,
 };
 
+static void sun4i_usb_phy0_id_vbus_det_scan(struct work_struct *work)
+{
+	struct sun4i_usb_phy_data *data =
+		container_of(work, struct sun4i_usb_phy_data, detect.work);
+	struct phy *phy0 = data->phys[0].phy;
+	int id_det, vbus_det;
+
+	id_det = gpiod_get_value_cansleep(data->id_det_gpio);
+	vbus_det = gpiod_get_value_cansleep(data->vbus_det_gpio);
+
+	mutex_lock(&phy0->mutex);
+
+	if (!data->phy0_init) {
+		mutex_unlock(&phy0->mutex);
+		return;
+	}
+
+	if (id_det != data->id_det) {
+		sun4i_usb_phy0_set_id_detect(phy0, id_det);
+		data->id_det = id_det;
+	}
+
+	if (vbus_det != data->vbus_det) {
+		sun4i_usb_phy0_set_vbus_detect(phy0, vbus_det);
+		data->vbus_det = vbus_det;
+	}
+
+	mutex_unlock(&phy0->mutex);
+
+	if (data->phy0_poll)
+		queue_delayed_work(system_wq, &data->detect, POLL_TIME);
+}
+
+static irqreturn_t sun4i_usb_phy0_id_vbus_det_irq(int irq, void *dev_id)
+{
+	struct sun4i_usb_phy_data *data = dev_id;
+
+	/* vbus or id changed, let the pins settle and then scan them */
+	mod_delayed_work(system_wq, &data->detect, DEBOUNCE_TIME);
+
+	return IRQ_HANDLED;
+}
+
 static struct phy *sun4i_usb_phy_xlate(struct device *dev,
 					struct of_phandle_args *args)
 {
@@ -232,6 +392,21 @@ static struct phy *sun4i_usb_phy_xlate(struct device *dev,
 	return data->phys[args->args[0]].phy;
 }
 
+static int sun4i_usb_phy_remove(struct platform_device *pdev)
+{
+	struct device *dev = &pdev->dev;
+	struct sun4i_usb_phy_data *data = dev_get_drvdata(dev);
+
+	if (data->id_det_irq >= 0)
+		devm_free_irq(dev, data->id_det_irq, data);
+	if (data->vbus_det_irq >= 0)
+		devm_free_irq(dev, data->vbus_det_irq, data);
+
+	cancel_delayed_work_sync(&data->detect);
+
+	return 0;
+}
+
 static int sun4i_usb_phy_probe(struct platform_device *pdev)
 {
 	struct sun4i_usb_phy_data *data;
@@ -240,13 +415,15 @@ static int sun4i_usb_phy_probe(struct platform_device *pdev)
 	struct phy_provider *phy_provider;
 	bool dedicated_clocks;
 	struct resource *res;
-	int i;
+	int i, ret;
 
 	data = devm_kzalloc(dev, sizeof(*data), GFP_KERNEL);
 	if (!data)
 		return -ENOMEM;
 
 	mutex_init(&data->mutex);
+	INIT_DELAYED_WORK(&data->detect, sun4i_usb_phy0_id_vbus_det_scan);
+	dev_set_drvdata(dev, data);
 
 	if (of_device_is_compatible(np, "allwinner,sun5i-a13-usb-phy"))
 		data->num_phys = 2;
@@ -269,6 +446,26 @@ static int sun4i_usb_phy_probe(struct platform_device *pdev)
 	if (IS_ERR(data->base))
 		return PTR_ERR(data->base);
 
+	data->id_det_gpio = devm_gpiod_get(dev, "usb0_id_det", GPIOD_IN);
+	if (IS_ERR(data->id_det_gpio)) {
+		if (PTR_ERR(data->id_det_gpio) == -EPROBE_DEFER)
+			return -EPROBE_DEFER;
+		data->id_det_gpio = NULL;
+	}
+
+	data->vbus_det_gpio = devm_gpiod_get(dev, "usb0_vbus_det", GPIOD_IN);
+	if (IS_ERR(data->vbus_det_gpio)) {
+		if (PTR_ERR(data->vbus_det_gpio) == -EPROBE_DEFER)
+			return -EPROBE_DEFER;
+		data->vbus_det_gpio = NULL;
+	}
+
+	/* We either want both gpio pins or neither (when in host mode) */
+	if (!data->id_det_gpio != !data->vbus_det_gpio) {
+		dev_err(dev, "failed to get id or vbus detect pin\n");
+		return -ENODEV;
+	}
+
 	for (i = 0; i < data->num_phys; i++) {
 		struct sun4i_usb_phy *phy = data->phys + i;
 		char name[16];
@@ -318,10 +515,42 @@ static int sun4i_usb_phy_probe(struct platform_device *pdev)
 		phy_set_drvdata(phy->phy, &data->phys[i]);
 	}
 
-	dev_set_drvdata(dev, data);
+	data->id_det_irq = gpiod_to_irq(data->id_det_gpio);
+	data->vbus_det_irq = gpiod_to_irq(data->vbus_det_gpio);
+	if (data->id_det_irq  < 0 || data->vbus_det_irq < 0)
+		data->phy0_poll = true;
+
+	if (data->id_det_irq >= 0) {
+		ret = devm_request_irq(dev, data->id_det_irq,
+				sun4i_usb_phy0_id_vbus_det_irq,
+				IRQF_TRIGGER_RISING | IRQF_TRIGGER_FALLING,
+				"usb0-id-det", data);
+		if (ret) {
+			dev_err(dev, "Err requesting id-det-irq: %d\n", ret);
+			return ret;
+		}
+	}
+
+	if (data->vbus_det_irq >= 0) {
+		ret = devm_request_irq(dev, data->vbus_det_irq,
+				sun4i_usb_phy0_id_vbus_det_irq,
+				IRQF_TRIGGER_RISING | IRQF_TRIGGER_FALLING,
+				"usb0-vbus-det", data);
+		if (ret) {
+			dev_err(dev, "Err requesting vbus-det-irq: %d\n", ret);
+			data->vbus_det_irq = -1;
+			sun4i_usb_phy_remove(pdev); /* Stop detect work */
+			return ret;
+		}
+	}
+
 	phy_provider = devm_of_phy_provider_register(dev, sun4i_usb_phy_xlate);
+	if (IS_ERR(phy_provider)) {
+		sun4i_usb_phy_remove(pdev); /* Stop detect work */
+		return PTR_ERR(phy_provider);
+	}
 
-	return PTR_ERR_OR_ZERO(phy_provider);
+	return 0;
 }
 
 static const struct of_device_id sun4i_usb_phy_of_match[] = {
@@ -335,6 +564,7 @@ MODULE_DEVICE_TABLE(of, sun4i_usb_phy_of_match);
 
 static struct platform_driver sun4i_usb_phy_driver = {
 	.probe	= sun4i_usb_phy_probe,
+	.remove	= sun4i_usb_phy_remove,
 	.driver = {
 		.of_match_table	= sun4i_usb_phy_of_match,
 		.name  = "sun4i-usb-phy",
-- 
cgit v1.2.3


From 123dfdbcfaf5eb8753cd6293ab50c41e9b8e9ebf Mon Sep 17 00:00:00 2001
From: Hans de Goede <hdegoede@redhat.com>
Date: Sat, 13 Jun 2015 14:37:48 +0200
Subject: phy-sun4i-usb: Add support for the usb-phys on the sun8i-a23 SoC

The usb-phys on the sun8i-a23 SoC have the same setup wrt clocks as on the
sun6i-a31 SoC, but there are only 2 instead of 3 like on the sun5i-a13 SoC.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
---
 Documentation/devicetree/bindings/phy/sun4i-usb-phy.txt | 2 ++
 drivers/phy/phy-sun4i-usb.c                             | 7 +++++--
 2 files changed, 7 insertions(+), 2 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/phy/sun4i-usb-phy.txt b/Documentation/devicetree/bindings/phy/sun4i-usb-phy.txt
index 557fa9940164..f0c640adee64 100644
--- a/Documentation/devicetree/bindings/phy/sun4i-usb-phy.txt
+++ b/Documentation/devicetree/bindings/phy/sun4i-usb-phy.txt
@@ -7,6 +7,7 @@ Required properties:
   * allwinner,sun5i-a13-usb-phy
   * allwinner,sun6i-a31-usb-phy
   * allwinner,sun7i-a20-usb-phy
+  * allwinner,sun8i-a23-usb-phy
 - reg : a list of offset + length pairs
 - reg-names :
   * "phy_ctrl"
@@ -17,6 +18,7 @@ Required properties:
 - clock-names :
   * "usb_phy" for sun4i, sun5i or sun7i
   * "usb0_phy", "usb1_phy" and "usb2_phy" for sun6i
+  * "usb0_phy", "usb1_phy" for sun8i
 - resets : a list of phandle + reset specifier pairs
 - reset-names :
   * "usb0_reset"
diff --git a/drivers/phy/phy-sun4i-usb.c b/drivers/phy/phy-sun4i-usb.c
index 50ecab9d73df..00424dde79c1 100644
--- a/drivers/phy/phy-sun4i-usb.c
+++ b/drivers/phy/phy-sun4i-usb.c
@@ -442,7 +442,8 @@ static int sun4i_usb_phy_probe(struct platform_device *pdev)
 	INIT_DELAYED_WORK(&data->detect, sun4i_usb_phy0_id_vbus_det_scan);
 	dev_set_drvdata(dev, data);
 
-	if (of_device_is_compatible(np, "allwinner,sun5i-a13-usb-phy"))
+	if (of_device_is_compatible(np, "allwinner,sun5i-a13-usb-phy") ||
+	    of_device_is_compatible(np, "allwinner,sun8i-a23-usb-phy"))
 		data->num_phys = 2;
 	else
 		data->num_phys = 3;
@@ -453,7 +454,8 @@ static int sun4i_usb_phy_probe(struct platform_device *pdev)
 	else
 		data->disc_thresh = 3;
 
-	if (of_device_is_compatible(np, "allwinner,sun6i-a31-usb-phy"))
+	if (of_device_is_compatible(np, "allwinner,sun6i-a31-usb-phy") ||
+	    of_device_is_compatible(np, "allwinner,sun8i-a23-usb-phy"))
 		dedicated_clocks = true;
 	else
 		dedicated_clocks = false;
@@ -588,6 +590,7 @@ static const struct of_device_id sun4i_usb_phy_of_match[] = {
 	{ .compatible = "allwinner,sun5i-a13-usb-phy" },
 	{ .compatible = "allwinner,sun6i-a31-usb-phy" },
 	{ .compatible = "allwinner,sun7i-a20-usb-phy" },
+	{ .compatible = "allwinner,sun8i-a23-usb-phy" },
 	{ },
 };
 MODULE_DEVICE_TABLE(of, sun4i_usb_phy_of_match);
-- 
cgit v1.2.3


From fc1f45ed3043da3aa901e88a9ef0995bbd320698 Mon Sep 17 00:00:00 2001
From: Hans de Goede <hdegoede@redhat.com>
Date: Sat, 13 Jun 2015 14:37:49 +0200
Subject: phy-sun4i-usb: Add support for the usb-phys on the sun8i-a33 SoC

The usb-phys on the sun8i-a33 SoC are mostly the same as sun8i-a23 but for
some reason (hw bug?) the phyctl register was moved to a different address
and is not initialized to 0 on reset.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
---
 .../devicetree/bindings/phy/sun4i-usb-phy.txt      |  1 +
 drivers/phy/phy-sun4i-usb.c                        | 39 ++++++++++++++++------
 2 files changed, 29 insertions(+), 11 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/phy/sun4i-usb-phy.txt b/Documentation/devicetree/bindings/phy/sun4i-usb-phy.txt
index f0c640adee64..5f48979b4edc 100644
--- a/Documentation/devicetree/bindings/phy/sun4i-usb-phy.txt
+++ b/Documentation/devicetree/bindings/phy/sun4i-usb-phy.txt
@@ -8,6 +8,7 @@ Required properties:
   * allwinner,sun6i-a31-usb-phy
   * allwinner,sun7i-a20-usb-phy
   * allwinner,sun8i-a23-usb-phy
+  * allwinner,sun8i-a33-usb-phy
 - reg : a list of offset + length pairs
 - reg-names :
   * "phy_ctrl"
diff --git a/drivers/phy/phy-sun4i-usb.c b/drivers/phy/phy-sun4i-usb.c
index 00424dde79c1..ddc399536c5c 100644
--- a/drivers/phy/phy-sun4i-usb.c
+++ b/drivers/phy/phy-sun4i-usb.c
@@ -40,9 +40,10 @@
 #include <linux/workqueue.h>
 
 #define REG_ISCR			0x00
-#define REG_PHYCTL			0x04
+#define REG_PHYCTL_A10			0x04
 #define REG_PHYBIST			0x08
 #define REG_PHYTUNE			0x0c
+#define REG_PHYCTL_A33			0x10
 
 #define PHYCTL_DATA			BIT(7)
 
@@ -90,6 +91,7 @@ struct sun4i_usb_phy_data {
 	struct mutex mutex;
 	int num_phys;
 	u32 disc_thresh;
+	bool has_a33_phyctl;
 	struct sun4i_usb_phy {
 		struct phy *phy;
 		void __iomem *pmu;
@@ -152,37 +154,46 @@ static void sun4i_usb_phy_write(struct sun4i_usb_phy *phy, u32 addr, u32 data,
 {
 	struct sun4i_usb_phy_data *phy_data = to_sun4i_usb_phy_data(phy);
 	u32 temp, usbc_bit = BIT(phy->index * 2);
+	void *phyctl;
 	int i;
 
 	mutex_lock(&phy_data->mutex);
 
+	if (phy_data->has_a33_phyctl) {
+		phyctl = phy_data->base + REG_PHYCTL_A33;
+		/* A33 needs us to set phyctl to 0 explicitly */
+		writel(0, phyctl);
+	} else {
+		phyctl = phy_data->base + REG_PHYCTL_A10;
+	}
+
 	for (i = 0; i < len; i++) {
-		temp = readl(phy_data->base + REG_PHYCTL);
+		temp = readl(phyctl);
 
 		/* clear the address portion */
 		temp &= ~(0xff << 8);
 
 		/* set the address */
 		temp |= ((addr + i) << 8);
-		writel(temp, phy_data->base + REG_PHYCTL);
+		writel(temp, phyctl);
 
 		/* set the data bit and clear usbc bit*/
-		temp = readb(phy_data->base + REG_PHYCTL);
+		temp = readb(phyctl);
 		if (data & 0x1)
 			temp |= PHYCTL_DATA;
 		else
 			temp &= ~PHYCTL_DATA;
 		temp &= ~usbc_bit;
-		writeb(temp, phy_data->base + REG_PHYCTL);
+		writeb(temp, phyctl);
 
 		/* pulse usbc_bit */
-		temp = readb(phy_data->base + REG_PHYCTL);
+		temp = readb(phyctl);
 		temp |= usbc_bit;
-		writeb(temp, phy_data->base + REG_PHYCTL);
+		writeb(temp, phyctl);
 
-		temp = readb(phy_data->base + REG_PHYCTL);
+		temp = readb(phyctl);
 		temp &= ~usbc_bit;
-		writeb(temp, phy_data->base + REG_PHYCTL);
+		writeb(temp, phyctl);
 
 		data >>= 1;
 	}
@@ -443,7 +454,8 @@ static int sun4i_usb_phy_probe(struct platform_device *pdev)
 	dev_set_drvdata(dev, data);
 
 	if (of_device_is_compatible(np, "allwinner,sun5i-a13-usb-phy") ||
-	    of_device_is_compatible(np, "allwinner,sun8i-a23-usb-phy"))
+	    of_device_is_compatible(np, "allwinner,sun8i-a23-usb-phy") ||
+	    of_device_is_compatible(np, "allwinner,sun8i-a33-usb-phy"))
 		data->num_phys = 2;
 	else
 		data->num_phys = 3;
@@ -455,11 +467,15 @@ static int sun4i_usb_phy_probe(struct platform_device *pdev)
 		data->disc_thresh = 3;
 
 	if (of_device_is_compatible(np, "allwinner,sun6i-a31-usb-phy") ||
-	    of_device_is_compatible(np, "allwinner,sun8i-a23-usb-phy"))
+	    of_device_is_compatible(np, "allwinner,sun8i-a23-usb-phy") ||
+	    of_device_is_compatible(np, "allwinner,sun8i-a33-usb-phy"))
 		dedicated_clocks = true;
 	else
 		dedicated_clocks = false;
 
+	if (of_device_is_compatible(np, "allwinner,sun8i-a33-usb-phy"))
+		data->has_a33_phyctl = true;
+
 	res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "phy_ctrl");
 	data->base = devm_ioremap_resource(dev, res);
 	if (IS_ERR(data->base))
@@ -591,6 +607,7 @@ static const struct of_device_id sun4i_usb_phy_of_match[] = {
 	{ .compatible = "allwinner,sun6i-a31-usb-phy" },
 	{ .compatible = "allwinner,sun7i-a20-usb-phy" },
 	{ .compatible = "allwinner,sun8i-a23-usb-phy" },
+	{ .compatible = "allwinner,sun8i-a33-usb-phy" },
 	{ },
 };
 MODULE_DEVICE_TABLE(of, sun4i_usb_phy_of_match);
-- 
cgit v1.2.3


From 8665c18bece7965c3690ac3824bef2f97bae3d71 Mon Sep 17 00:00:00 2001
From: Hans de Goede <hdegoede@redhat.com>
Date: Sat, 13 Jun 2015 14:37:51 +0200
Subject: phy-sun4i-usb: Add support for monitoring vbus via a power-supply

On some boards there is no vbus_det gpio pin, instead vbus-detection for
otg can be done via the pmic.

This commit adds support for monitoring vbus_det via the power_supply
exported by the pmic, enabling support for otg on these boards.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
---
 .../devicetree/bindings/phy/sun4i-usb-phy.txt      |  1 +
 drivers/phy/Kconfig                                |  1 +
 drivers/phy/phy-sun4i-usb.c                        | 76 ++++++++++++++++++++--
 3 files changed, 71 insertions(+), 7 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/phy/sun4i-usb-phy.txt b/Documentation/devicetree/bindings/phy/sun4i-usb-phy.txt
index 5f48979b4edc..0cebf7454517 100644
--- a/Documentation/devicetree/bindings/phy/sun4i-usb-phy.txt
+++ b/Documentation/devicetree/bindings/phy/sun4i-usb-phy.txt
@@ -29,6 +29,7 @@ Required properties:
 Optional properties:
 - usb0_id_det-gpios : gpio phandle for reading the otg id pin value
 - usb0_vbus_det-gpios : gpio phandle for detecting the presence of usb0 vbus
+- usb0_vbus_power-supply: power-supply phandle for usb0 vbus presence detect
 - usb0_vbus-supply : regulator phandle for controller usb0 vbus
 - usb1_vbus-supply : regulator phandle for controller usb1 vbus
 - usb2_vbus-supply : regulator phandle for controller usb2 vbus
diff --git a/drivers/phy/Kconfig b/drivers/phy/Kconfig
index c0c1ad2e2c01..0fe9bff7b37b 100644
--- a/drivers/phy/Kconfig
+++ b/drivers/phy/Kconfig
@@ -209,6 +209,7 @@ config PHY_SUN4I_USB
 	depends on ARCH_SUNXI && HAS_IOMEM && OF
 	depends on RESET_CONTROLLER
 	depends on EXTCON
+	depends on POWER_SUPPLY
 	select GENERIC_PHY
 	help
 	  Enable this to support the transceiver that is part of Allwinner
diff --git a/drivers/phy/phy-sun4i-usb.c b/drivers/phy/phy-sun4i-usb.c
index 4ba9856ac71a..5138843ec42d 100644
--- a/drivers/phy/phy-sun4i-usb.c
+++ b/drivers/phy/phy-sun4i-usb.c
@@ -36,6 +36,7 @@
 #include <linux/phy/phy.h>
 #include <linux/phy/phy-sun4i-usb.h>
 #include <linux/platform_device.h>
+#include <linux/power_supply.h>
 #include <linux/regulator/consumer.h>
 #include <linux/reset.h>
 #include <linux/workqueue.h>
@@ -108,6 +109,9 @@ struct sun4i_usb_phy_data {
 	bool phy0_poll;
 	struct gpio_desc *id_det_gpio;
 	struct gpio_desc *vbus_det_gpio;
+	struct power_supply *vbus_power_supply;
+	struct notifier_block vbus_power_nb;
+	bool vbus_power_nb_registered;
 	int id_det_irq;
 	int vbus_det_irq;
 	int id_det;
@@ -352,6 +356,30 @@ static struct phy_ops sun4i_usb_phy_ops = {
 	.owner		= THIS_MODULE,
 };
 
+static int sun4i_usb_phy0_get_vbus_det(struct sun4i_usb_phy_data *data)
+{
+	if (data->vbus_det_gpio)
+		return gpiod_get_value_cansleep(data->vbus_det_gpio);
+
+	if (data->vbus_power_supply) {
+		union power_supply_propval val;
+		int r;
+
+		r = power_supply_get_property(data->vbus_power_supply,
+					      POWER_SUPPLY_PROP_PRESENT, &val);
+		if (r == 0)
+			return val.intval;
+	}
+
+	/* Fallback: report vbus as high */
+	return 1;
+}
+
+static bool sun4i_usb_phy0_have_vbus_det(struct sun4i_usb_phy_data *data)
+{
+	return data->vbus_det_gpio || data->vbus_power_supply;
+}
+
 static void sun4i_usb_phy0_id_vbus_det_scan(struct work_struct *work)
 {
 	struct sun4i_usb_phy_data *data =
@@ -360,10 +388,7 @@ static void sun4i_usb_phy0_id_vbus_det_scan(struct work_struct *work)
 	int id_det, vbus_det, id_notify = 0, vbus_notify = 0;
 
 	id_det = gpiod_get_value_cansleep(data->id_det_gpio);
-	if (data->vbus_det_gpio)
-		vbus_det = gpiod_get_value_cansleep(data->vbus_det_gpio);
-	else
-		vbus_det = 1; /* Report vbus as high */
+	vbus_det = sun4i_usb_phy0_get_vbus_det(data);
 
 	mutex_lock(&phy0->mutex);
 
@@ -378,7 +403,7 @@ static void sun4i_usb_phy0_id_vbus_det_scan(struct work_struct *work)
 		 * without vbus detection report vbus low for long enough for
 		 * the musb-ip to end the current device session.
 		 */
-		if (!data->vbus_det_gpio && id_det == 0) {
+		if (!sun4i_usb_phy0_have_vbus_det(data) && id_det == 0) {
 			sun4i_usb_phy0_set_vbus_detect(phy0, 0);
 			msleep(200);
 			sun4i_usb_phy0_set_vbus_detect(phy0, 1);
@@ -404,7 +429,7 @@ static void sun4i_usb_phy0_id_vbus_det_scan(struct work_struct *work)
 		 * without vbus detection report vbus low for long enough to
 		 * the musb-ip to end the current host session.
 		 */
-		if (!data->vbus_det_gpio && id_det == 1) {
+		if (!sun4i_usb_phy0_have_vbus_det(data) && id_det == 1) {
 			mutex_lock(&phy0->mutex);
 			sun4i_usb_phy0_set_vbus_detect(phy0, 0);
 			msleep(1000);
@@ -430,6 +455,20 @@ static irqreturn_t sun4i_usb_phy0_id_vbus_det_irq(int irq, void *dev_id)
 	return IRQ_HANDLED;
 }
 
+static int sun4i_usb_phy0_vbus_notify(struct notifier_block *nb,
+				      unsigned long val, void *v)
+{
+	struct sun4i_usb_phy_data *data =
+		container_of(nb, struct sun4i_usb_phy_data, vbus_power_nb);
+	struct power_supply *psy = v;
+
+	/* Properties on the vbus_power_supply changed, scan vbus_det */
+	if (val == PSY_EVENT_PROP_CHANGED && psy == data->vbus_power_supply)
+		mod_delayed_work(system_wq, &data->detect, DEBOUNCE_TIME);
+
+	return NOTIFY_OK;
+}
+
 static struct phy *sun4i_usb_phy_xlate(struct device *dev,
 					struct of_phandle_args *args)
 {
@@ -446,6 +485,8 @@ static int sun4i_usb_phy_remove(struct platform_device *pdev)
 	struct device *dev = &pdev->dev;
 	struct sun4i_usb_phy_data *data = dev_get_drvdata(dev);
 
+	if (data->vbus_power_nb_registered)
+		power_supply_unreg_notifier(&data->vbus_power_nb);
 	if (data->id_det_irq >= 0)
 		devm_free_irq(dev, data->id_det_irq, data);
 	if (data->vbus_det_irq >= 0)
@@ -522,8 +563,18 @@ static int sun4i_usb_phy_probe(struct platform_device *pdev)
 		data->vbus_det_gpio = NULL;
 	}
 
+	if (of_find_property(np, "usb0_vbus_power-supply", NULL)) {
+		data->vbus_power_supply = devm_power_supply_get_by_phandle(dev,
+						     "usb0_vbus_power-supply");
+		if (IS_ERR(data->vbus_power_supply))
+			return PTR_ERR(data->vbus_power_supply);
+
+		if (!data->vbus_power_supply)
+			return -EPROBE_DEFER;
+	}
+
 	/* vbus_det without id_det makes no sense, and is not supported */
-	if (data->vbus_det_gpio && !data->id_det_gpio) {
+	if (sun4i_usb_phy0_have_vbus_det(data) && !data->id_det_gpio) {
 		dev_err(dev, "usb0_id_det missing or invalid\n");
 		return -ENODEV;
 	}
@@ -620,6 +671,17 @@ static int sun4i_usb_phy_probe(struct platform_device *pdev)
 		}
 	}
 
+	if (data->vbus_power_supply) {
+		data->vbus_power_nb.notifier_call = sun4i_usb_phy0_vbus_notify;
+		data->vbus_power_nb.priority = 0;
+		ret = power_supply_reg_notifier(&data->vbus_power_nb);
+		if (ret) {
+			sun4i_usb_phy_remove(pdev); /* Stop detect work */
+			return ret;
+		}
+		data->vbus_power_nb_registered = true;
+	}
+
 	phy_provider = devm_of_phy_provider_register(dev, sun4i_usb_phy_xlate);
 	if (IS_ERR(phy_provider)) {
 		sun4i_usb_phy_remove(pdev); /* Stop detect work */
-- 
cgit v1.2.3


From cbc59e26e90dab17e233c6cdf51235d5d25f5fba Mon Sep 17 00:00:00 2001
From: Baruch Siach <baruch@tkos.co.il>
Date: Tue, 5 May 2015 13:55:09 +0300
Subject: pinctrl: dt-binding: document Conexant CX92755 SoC

Add pinctrl device tree binding documentation for the General Purpose Pin
Mapping module of the Conexant CX92755 SoC.

Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
---
 .../bindings/pinctrl/cnxt,cx92755-pinctrl.txt      | 86 ++++++++++++++++++++++
 1 file changed, 86 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/pinctrl/cnxt,cx92755-pinctrl.txt

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/pinctrl/cnxt,cx92755-pinctrl.txt b/Documentation/devicetree/bindings/pinctrl/cnxt,cx92755-pinctrl.txt
new file mode 100644
index 000000000000..23ce8dc26990
--- /dev/null
+++ b/Documentation/devicetree/bindings/pinctrl/cnxt,cx92755-pinctrl.txt
@@ -0,0 +1,86 @@
+Conexant Digicolor CX92755 General Purpose Pin Mapping
+
+This document describes the device tree binding of the pin mapping hardware
+modules in the Conexant Digicolor CX92755 SoCs. The CX92755 in one of the
+Digicolor series of SoCs.
+
+=== Pin Controller Node ===
+
+Required Properties:
+
+- compatible: Must be "cnxt,cx92755-pinctrl"
+- reg: Base address of the General Purpose Pin Mapping register block and the
+  size of the block.
+- gpio-controller: Marks the device node as a GPIO controller.
+- #gpio-cells: Must be <2>. The first cell is the pin number and the
+  second cell is used to specify flags. See include/dt-bindings/gpio/gpio.h
+  for possible values.
+
+For example, the following is the bare minimum node:
+
+	pinctrl: pinctrl@f0000e20 {
+		compatible = "cnxt,cx92755-pinctrl";
+		reg = <0xf0000e20 0x100>;
+		gpio-controller;
+		#gpio-cells = <2>;
+	};
+
+As a pin controller device, in addition to the required properties, this node
+should also contain the pin configuration nodes that client devices reference,
+if any.
+
+For a general description of GPIO bindings, please refer to ../gpio/gpio.txt.
+
+=== Pin Configuration Node ===
+
+Each pin configuration node is a sub-node of the pin controller node and is a
+container of an arbitrary number of subnodes, called pin group nodes in this
+document.
+
+Please refer to the pinctrl-bindings.txt in this directory for details of the
+common pinctrl bindings used by client devices, including the definition of a
+"pin configuration node".
+
+=== Pin Group Node ===
+
+A pin group node specifies the desired pin mux for an arbitrary number of
+pins. The name of the pin group node is optional and not used.
+
+A pin group node only affects the properties specified in the node, and has no
+effect on any properties that are omitted.
+
+The pin group node accepts a subset of the generic pin config properties. For
+details generic pin config properties, please refer to pinctrl-bindings.txt
+and <include/linux/pinctrl/pinconfig-generic.h>.
+
+Required Pin Group Node Properties:
+
+- pins: Multiple strings. Specifies the name(s) of one or more pins to be
+  configured by this node. The format of a pin name string is "GP_xy", where x
+  is an uppercase character from 'A' to 'R', and y is a digit from 0 to 7.
+- function: String. Specifies the pin mux selection. Values must be one of:
+  "gpio", "client_a", "client_b", "client_c"
+
+Example:
+	pinctrl: pinctrl@f0000e20 {
+		compatible = "cnxt,cx92755-pinctrl";
+		reg = <0xf0000e20 0x100>;
+
+		uart0_default: uart0_active {
+			data_signals {
+				pins = "GP_O0", "GP_O1";
+				function = "client_b";
+			};
+		};
+	};
+
+	uart0: uart@f0000740 {
+		compatible = "cnxt,cx92755-usart";
+		...
+		pinctrl-0 = <&uart0_default>;
+		pinctrl-names = "default";
+	};
+
+In the example above, a single pin group configuration node defines the
+"client select" for the Rx and Tx signals of uart0. The uart0 node references
+that pin configuration node using the &uart0_default phandle.
-- 
cgit v1.2.3


From bb379ceb8d9f3599ab9162ebf58d731350f4177a Mon Sep 17 00:00:00 2001
From: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Date: Thu, 16 Jul 2015 21:08:24 +0200
Subject: dt-bindings: gpio: document bindings supported by gpio-mpc8xxx.c
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
---
 .../devicetree/bindings/gpio/gpio-mpc8xxx.txt      | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/gpio/gpio-mpc8xxx.txt

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/gpio/gpio-mpc8xxx.txt b/Documentation/devicetree/bindings/gpio/gpio-mpc8xxx.txt
new file mode 100644
index 000000000000..805ddcd79a57
--- /dev/null
+++ b/Documentation/devicetree/bindings/gpio/gpio-mpc8xxx.txt
@@ -0,0 +1,22 @@
+* Freescale MPC512x/MPC8xxx GPIO controller
+
+Required properties:
+- compatible : Should be "fsl,<soc>-gpio"
+  The following <soc>s are known to be supported:
+    mpc5121, mpc5125, mpc8349, mpc8572, mpc8610, pq3, qoriq
+- reg : Address and length of the register set for the device
+- interrupts : Should be the port interrupt shared by all 32 pins.
+- #gpio-cells : Should be two.  The first cell is the pin number and
+  the second cell is used to specify the gpio polarity:
+      0 = active high
+      1 = active low
+
+Example:
+
+gpio0: gpio@1100 {
+	compatible = "fsl,mpc5125-gpio";
+	#gpio-cells = <2>;
+	reg = <0x1100 0x080>;
+	interrupts = <78 0x8>;
+	status = "okay";
+};
-- 
cgit v1.2.3


From 8cd14702be9bcb2ec45e1ec30af04aea9b965708 Mon Sep 17 00:00:00 2001
From: Ulrich Hecht <ulrich.hecht+renesas@gmail.com>
Date: Tue, 21 Jul 2015 11:08:50 +0200
Subject: gpio: rcar: Add r8a7795 (R-Car H3) support

R-Car Gen3's GPIO blocks are identical to Gen2's in every respect.

Signed-off-by: Ulrich Hecht <ulrich.hecht+renesas@gmail.com>
Acked-by: Simon Horman <horms+renesas@verge.net.au>
Acked-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
---
 Documentation/devicetree/bindings/gpio/renesas,gpio-rcar.txt | 1 +
 drivers/gpio/gpio-rcar.c                                     | 4 ++++
 2 files changed, 5 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/gpio/renesas,gpio-rcar.txt b/Documentation/devicetree/bindings/gpio/renesas,gpio-rcar.txt
index 38fb86f28ba2..f60e2f477e93 100644
--- a/Documentation/devicetree/bindings/gpio/renesas,gpio-rcar.txt
+++ b/Documentation/devicetree/bindings/gpio/renesas,gpio-rcar.txt
@@ -9,6 +9,7 @@ Required Properties:
     - "renesas,gpio-r8a7791": for R8A7791 (R-Car M2-W) compatible GPIO controller.
     - "renesas,gpio-r8a7793": for R8A7793 (R-Car M2-N) compatible GPIO controller.
     - "renesas,gpio-r8a7794": for R8A7794 (R-Car E2) compatible GPIO controller.
+    - "renesas,gpio-r8a7795": for R8A7795 (R-Car H3) compatible GPIO controller.
     - "renesas,gpio-rcar": for generic R-Car GPIO controller.
 
   - reg: Base address and length of each memory resource used by the GPIO
diff --git a/drivers/gpio/gpio-rcar.c b/drivers/gpio/gpio-rcar.c
index 4fc13ce9c60a..2a8122444614 100644
--- a/drivers/gpio/gpio-rcar.c
+++ b/drivers/gpio/gpio-rcar.c
@@ -341,6 +341,10 @@ static const struct of_device_id gpio_rcar_of_table[] = {
 	}, {
 		.compatible = "renesas,gpio-r8a7794",
 		.data = &gpio_rcar_info_gen2,
+	}, {
+		.compatible = "renesas,gpio-r8a7795",
+		/* Gen3 GPIO is identical to Gen2. */
+		.data = &gpio_rcar_info_gen2,
 	}, {
 		.compatible = "renesas,gpio-rcar",
 		.data = &gpio_rcar_info_gen1,
-- 
cgit v1.2.3


From d8323c6b03533ac870fb665277e6dad7ebf7e4d3 Mon Sep 17 00:00:00 2001
From: Maxime Ripard <maxime.ripard@free-electrons.com>
Date: Mon, 27 Jul 2015 14:41:57 +0200
Subject: pinctrl: sunxi: Add custom irq_domain_ops

The current interrupt parsing code was working by accident, because the
default was actually parsing the first node of interrupts.

While that was mostly working (and the flags were actually ignored), this
binding has never been documented, and doesn't work with SoCs that have
multiple interrupt banks anyway.

Add a proper interrupt xlate function, that uses the same description than
the GPIOs (<bank> <pin> <flags>), that will make things less confusing.

The EINT number will still be used as the hwirq number, but won't be
exposed through the DT.

Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
---
 .../bindings/pinctrl/allwinner,sunxi-pinctrl.txt   | 37 +++++++++++++++++++++-
 drivers/pinctrl/sunxi/pinctrl-sunxi.c              | 35 ++++++++++++++++++--
 2 files changed, 69 insertions(+), 3 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/pinctrl/allwinner,sunxi-pinctrl.txt b/Documentation/devicetree/bindings/pinctrl/allwinner,sunxi-pinctrl.txt
index 9462ab7ddd1f..3c821cda1ad0 100644
--- a/Documentation/devicetree/bindings/pinctrl/allwinner,sunxi-pinctrl.txt
+++ b/Documentation/devicetree/bindings/pinctrl/allwinner,sunxi-pinctrl.txt
@@ -48,7 +48,7 @@ Optional subnode-properties:
 
 Examples:
 
-pinctrl@01c20800 {
+pio: pinctrl@01c20800 {
 	compatible = "allwinner,sun5i-a13-pinctrl";
 	reg = <0x01c20800 0x400>;
 	#address-cells = <1>;
@@ -68,3 +68,38 @@ pinctrl@01c20800 {
 		allwinner,pull = <0>;
 	};
 };
+
+
+GPIO and interrupt controller
+-----------------------------
+
+This hardware also acts as a GPIO controller and an interrupt
+controller.
+
+Consumers that would want to refer to one or the other (or both)
+should provide through the usual *-gpios and interrupts properties a
+cell with 3 arguments, first the number of the bank, then the pin
+inside that bank, and finally the flags for the GPIO/interrupts.
+
+Example:
+
+xio: gpio@38 {
+	compatible = "nxp,pcf8574a";
+	reg = <0x38>;
+
+	gpio-controller;
+	#gpio-cells = <2>;
+
+	interrupt-parent = <&pio>;
+	interrupts = <6 0 IRQ_TYPE_EDGE_FALLING>;
+	interrupt-controller;
+	#interrupt-cells = <2>;
+};
+
+reg_usb1_vbus: usb1-vbus {
+	compatible = "regulator-fixed";
+	regulator-name = "usb1-vbus";
+	regulator-min-microvolt = <5000000>;
+	regulator-max-microvolt = <5000000>;
+	gpio = <&pio 7 6 GPIO_ACTIVE_HIGH>;
+};
diff --git a/drivers/pinctrl/sunxi/pinctrl-sunxi.c b/drivers/pinctrl/sunxi/pinctrl-sunxi.c
index 7f7e7bb4f22c..fb4669c0ce0e 100644
--- a/drivers/pinctrl/sunxi/pinctrl-sunxi.c
+++ b/drivers/pinctrl/sunxi/pinctrl-sunxi.c
@@ -711,6 +711,37 @@ static struct irq_chip sunxi_pinctrl_level_irq_chip = {
 			  IRQCHIP_EOI_IF_HANDLED,
 };
 
+static int sunxi_pinctrl_irq_of_xlate(struct irq_domain *d,
+				      struct device_node *node,
+				      const u32 *intspec,
+				      unsigned int intsize,
+				      unsigned long *out_hwirq,
+				      unsigned int *out_type)
+{
+	struct sunxi_desc_function *desc;
+	int pin, base;
+
+	if (intsize < 3)
+		return -EINVAL;
+
+	base = PINS_PER_BANK * intspec[0];
+	pin = base + intspec[1];
+
+	desc = sunxi_pinctrl_desc_find_function_by_pin(d->host_data,
+						       pin, "irq");
+	if (!desc)
+		return -EINVAL;
+
+	*out_hwirq = desc->irqbank * PINS_PER_BANK + desc->irqnum;
+	*out_type = intspec[2];
+
+	return 0;
+}
+
+static struct irq_domain_ops sunxi_pinctrl_irq_domain_ops = {
+	.xlate		= sunxi_pinctrl_irq_of_xlate,
+};
+
 static void sunxi_pinctrl_irq_handler(unsigned __irq, struct irq_desc *desc)
 {
 	unsigned int irq = irq_desc_get_irq(desc);
@@ -986,8 +1017,8 @@ int sunxi_pinctrl_init(struct platform_device *pdev,
 
 	pctl->domain = irq_domain_add_linear(node,
 					     pctl->desc->irq_banks * IRQ_PER_BANK,
-					     &irq_domain_simple_ops,
-					     NULL);
+					     &sunxi_pinctrl_irq_domain_ops,
+					     pctl);
 	if (!pctl->domain) {
 		dev_err(&pdev->dev, "Couldn't register IRQ domain\n");
 		ret = -ENOMEM;
-- 
cgit v1.2.3


From d705073cdafa75286970dd30f722d0df584bae54 Mon Sep 17 00:00:00 2001
From: Rabin Vincent <rabin@rab.in>
Date: Wed, 22 Jul 2015 15:05:19 +0200
Subject: gpio: etraxfs: add support for ARTPEC-3

Add support for the GIO block in the ARTPEC-3 SoC.  The basic
functionality is essentialy the same as the version in the ETRAX FS,
except for a different set of ports, including a read-only port.

Cc: devicetree@vger.kernel.org
Signed-off-by: Rabin Vincent <rabin@rab.in>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
---
 .../devicetree/bindings/gpio/gpio-etraxfs.txt      |  3 +-
 drivers/gpio/gpio-etraxfs.c                        | 66 ++++++++++++++++++++--
 2 files changed, 62 insertions(+), 7 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/gpio/gpio-etraxfs.txt b/Documentation/devicetree/bindings/gpio/gpio-etraxfs.txt
index abf4db736c6e..170194af3027 100644
--- a/Documentation/devicetree/bindings/gpio/gpio-etraxfs.txt
+++ b/Documentation/devicetree/bindings/gpio/gpio-etraxfs.txt
@@ -2,8 +2,9 @@ Axis ETRAX FS General I/O controller bindings
 
 Required properties:
 
-- compatible:
+- compatible: one of:
   - "axis,etraxfs-gio"
+  - "axis,artpec3-gio"
 - reg: Physical base address and length of the controller's registers.
 - #gpio-cells: Should be 3
   - The first cell is the gpio offset number.
diff --git a/drivers/gpio/gpio-etraxfs.c b/drivers/gpio/gpio-etraxfs.c
index 625a9ed411da..27e5d8855205 100644
--- a/drivers/gpio/gpio-etraxfs.c
+++ b/drivers/gpio/gpio-etraxfs.c
@@ -26,6 +26,17 @@
 #define ETRAX_FS_r_pe_din	84
 #define ETRAX_FS_rw_pe_oe	88
 
+#define ARTPEC3_r_pa_din	0
+#define ARTPEC3_rw_pa_dout	4
+#define ARTPEC3_rw_pa_oe	8
+#define ARTPEC3_r_pb_din	44
+#define ARTPEC3_rw_pb_dout	48
+#define ARTPEC3_rw_pb_oe	52
+#define ARTPEC3_r_pc_din	88
+#define ARTPEC3_rw_pc_dout	92
+#define ARTPEC3_rw_pc_oe	96
+#define ARTPEC3_r_pd_din	116
+
 struct etraxfs_gpio_port {
 	const char *label;
 	unsigned int oe;
@@ -82,6 +93,40 @@ static const struct etraxfs_gpio_info etraxfs_gpio_etraxfs = {
 	.ports = etraxfs_gpio_etraxfs_ports,
 };
 
+static const struct etraxfs_gpio_port etraxfs_gpio_artpec3_ports[] = {
+	{
+		.label	= "A",
+		.ngpio	= 32,
+		.oe	= ARTPEC3_rw_pa_oe,
+		.dout	= ARTPEC3_rw_pa_dout,
+		.din	= ARTPEC3_r_pa_din,
+	},
+	{
+		.label	= "B",
+		.ngpio	= 32,
+		.oe	= ARTPEC3_rw_pb_oe,
+		.dout	= ARTPEC3_rw_pb_dout,
+		.din	= ARTPEC3_r_pb_din,
+	},
+	{
+		.label	= "C",
+		.ngpio	= 16,
+		.oe	= ARTPEC3_rw_pc_oe,
+		.dout	= ARTPEC3_rw_pc_dout,
+		.din	= ARTPEC3_r_pc_din,
+	},
+	{
+		.label	= "D",
+		.ngpio	= 32,
+		.din	= ARTPEC3_r_pd_din,
+	},
+};
+
+static const struct etraxfs_gpio_info etraxfs_gpio_artpec3 = {
+	.num_ports = ARRAY_SIZE(etraxfs_gpio_artpec3_ports),
+	.ports = etraxfs_gpio_artpec3_ports,
+};
+
 static int etraxfs_gpio_of_xlate(struct gpio_chip *gc,
 			       const struct of_phandle_args *gpiospec,
 			       u32 *flags)
@@ -101,6 +146,10 @@ static const struct of_device_id etraxfs_gpio_of_table[] = {
 		.compatible = "axis,etraxfs-gio",
 		.data = &etraxfs_gpio_etraxfs,
 	},
+	{
+		.compatible = "axis,artpec3-gio",
+		.data = &etraxfs_gpio_artpec3,
+	},
 	{},
 };
 
@@ -133,14 +182,19 @@ static int etraxfs_gpio_probe(struct platform_device *pdev)
 	for (i = 0; i < info->num_ports; i++) {
 		struct bgpio_chip *bgc = &chips[i];
 		const struct etraxfs_gpio_port *port = &info->ports[i];
+		unsigned long flags = BGPIOF_READ_OUTPUT_REG_SET;
+		void __iomem *dat = regs + port->din;
+		void __iomem *set = regs + port->dout;
+		void __iomem *dirout = regs + port->oe;
+
+		if (dirout == set) {
+			dirout = set = NULL;
+			flags = BGPIOF_NO_OUTPUT;
+		}
 
 		ret = bgpio_init(bgc, dev, 4,
-				 regs + port->din,	/* dat */
-				 regs + port->dout,	/* set */
-				 NULL,			/* clr */
-				 regs + port->oe,	/* dirout */
-				 NULL,			/* dirin */
-				 BGPIOF_READ_OUTPUT_REG_SET);
+				 dat, set, NULL, dirout, NULL,
+				 flags);
 		if (ret)
 			return ret;
 
-- 
cgit v1.2.3


From 0817b62cc037a56c5e4238c7eb7522299ea27aef Mon Sep 17 00:00:00 2001
From: Boris Brezillon <boris.brezillon@free-electrons.com>
Date: Tue, 7 Jul 2015 20:48:08 +0200
Subject: clk: change clk_ops' ->determine_rate() prototype
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Clock rates are stored in an unsigned long field, but ->determine_rate()
(which returns a rounded rate from a requested one) returns a long
value (errors are reported using negative error codes), which can lead
to long overflow if the clock rate exceed 2Ghz.

Change ->determine_rate() prototype to return 0 or an error code, and pass
a pointer to a clk_rate_request structure containing the expected target
rate and the rate constraints imposed by clk users.

The clk_rate_request structure might be extended in the future to contain
other kind of constraints like the rounding policy, the maximum clock
inaccuracy or other things that are not yet supported by the CCF
(power consumption constraints ?).

Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
CC: Jonathan Corbet <corbet@lwn.net>
CC: Tony Lindgren <tony@atomide.com>
CC: Ralf Baechle <ralf@linux-mips.org>
CC: "Emilio López" <emilio@elopez.com.ar>
CC: Maxime Ripard <maxime.ripard@free-electrons.com>
Acked-by: Tero Kristo <t-kristo@ti.com>
CC: Peter De Schrijver <pdeschrijver@nvidia.com>
CC: Prashant Gaikwad <pgaikwad@nvidia.com>
CC: Stephen Warren <swarren@wwwdotorg.org>
CC: Thierry Reding <thierry.reding@gmail.com>
CC: Alexandre Courbot <gnurou@gmail.com>
CC: linux-doc@vger.kernel.org
CC: linux-kernel@vger.kernel.org
CC: linux-arm-kernel@lists.infradead.org
CC: linux-omap@vger.kernel.org
CC: linux-mips@linux-mips.org
CC: linux-tegra@vger.kernel.org
[sboyd@codeaurora.org: Fix parent dereference problem in
__clk_determine_rate()]
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Tested-by: Romain Perier <romain.perier@gmail.com>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
[sboyd@codeaurora.org: Folded in fix from Heiko for fixed-rate
clocks without parents or a rate determining op]
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
---
 Documentation/clk.txt               |   8 +-
 arch/arm/mach-omap2/dpll3xxx.c      |  29 +++---
 arch/arm/mach-omap2/dpll44xx.c      |  30 +++---
 arch/mips/alchemy/common/clock.c    |  61 +++++--------
 drivers/clk/at91/clk-programmable.c |  25 ++---
 drivers/clk/at91/clk-usb.c          |  28 +++---
 drivers/clk/bcm/clk-kona.c          |  34 ++++---
 drivers/clk/clk-composite.c         |  48 +++++-----
 drivers/clk/clk.c                   | 176 ++++++++++++++++++++----------------
 drivers/clk/hisilicon/clk-hi3620.c  |  39 ++++----
 drivers/clk/mmp/clk-mix.c           |  20 ++--
 drivers/clk/qcom/clk-pll.c          |  18 ++--
 drivers/clk/qcom/clk-rcg.c          |  44 ++++-----
 drivers/clk/qcom/clk-rcg2.c         |  78 ++++++++--------
 drivers/clk/sunxi/clk-factors.c     |  21 ++---
 drivers/clk/sunxi/clk-sun6i-ar100.c |  21 ++---
 drivers/clk/sunxi/clk-sunxi.c       |  20 ++--
 drivers/clk/tegra/clk-emc.c         |  28 +++---
 include/linux/clk-provider.h        |  49 ++++++----
 include/linux/clk/ti.h              |  16 +---
 20 files changed, 392 insertions(+), 401 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/clk.txt b/Documentation/clk.txt
index f463bdc37f88..5c4bc4d01d0c 100644
--- a/Documentation/clk.txt
+++ b/Documentation/clk.txt
@@ -71,12 +71,8 @@ the operations defined in clk.h:
 		long		(*round_rate)(struct clk_hw *hw,
 						unsigned long rate,
 						unsigned long *parent_rate);
-		long		(*determine_rate)(struct clk_hw *hw,
-						unsigned long rate,
-						unsigned long min_rate,
-						unsigned long max_rate,
-						unsigned long *best_parent_rate,
-						struct clk_hw **best_parent_clk);
+		int		(*determine_rate)(struct clk_hw *hw,
+						  struct clk_rate_request *req);
 		int		(*set_parent)(struct clk_hw *hw, u8 index);
 		u8		(*get_parent)(struct clk_hw *hw);
 		int		(*set_rate)(struct clk_hw *hw,
diff --git a/arch/arm/mach-omap2/dpll3xxx.c b/arch/arm/mach-omap2/dpll3xxx.c
index 44e57ec225d4..8c57ace30421 100644
--- a/arch/arm/mach-omap2/dpll3xxx.c
+++ b/arch/arm/mach-omap2/dpll3xxx.c
@@ -462,43 +462,38 @@ void omap3_noncore_dpll_disable(struct clk_hw *hw)
 /**
  * omap3_noncore_dpll_determine_rate - determine rate for a DPLL
  * @hw: pointer to the clock to determine rate for
- * @rate: target rate for the DPLL
- * @best_parent_rate: pointer for returning best parent rate
- * @best_parent_clk: pointer for returning best parent clock
+ * @req: target rate request
  *
  * Determines which DPLL mode to use for reaching a desired target rate.
  * Checks whether the DPLL shall be in bypass or locked mode, and if
  * locked, calculates the M,N values for the DPLL via round-rate.
- * Returns a positive clock rate with success, negative error value
- * in failure.
+ * Returns a 0 on success, negative error value in failure.
  */
-long omap3_noncore_dpll_determine_rate(struct clk_hw *hw, unsigned long rate,
-				       unsigned long min_rate,
-				       unsigned long max_rate,
-				       unsigned long *best_parent_rate,
-				       struct clk_hw **best_parent_clk)
+int omap3_noncore_dpll_determine_rate(struct clk_hw *hw,
+				      struct clk_rate_request *req)
 {
 	struct clk_hw_omap *clk = to_clk_hw_omap(hw);
 	struct dpll_data *dd;
 
-	if (!hw || !rate)
+	if (!req->rate)
 		return -EINVAL;
 
 	dd = clk->dpll_data;
 	if (!dd)
 		return -EINVAL;
 
-	if (__clk_get_rate(dd->clk_bypass) == rate &&
+	if (__clk_get_rate(dd->clk_bypass) == req->rate &&
 	    (dd->modes & (1 << DPLL_LOW_POWER_BYPASS))) {
-		*best_parent_clk = __clk_get_hw(dd->clk_bypass);
+		req->best_parent_hw = __clk_get_hw(dd->clk_bypass);
 	} else {
-		rate = omap2_dpll_round_rate(hw, rate, best_parent_rate);
-		*best_parent_clk = __clk_get_hw(dd->clk_ref);
+		req->rate = omap2_dpll_round_rate(hw, req->rate,
+					  &req->best_parent_rate);
+		req->best_parent_hw = __clk_get_hw(dd->clk_ref);
 	}
 
-	*best_parent_rate = rate;
+	req->best_parent_rate = req->rate;
 
-	return rate;
+	return 0;
 }
 
 /**
diff --git a/arch/arm/mach-omap2/dpll44xx.c b/arch/arm/mach-omap2/dpll44xx.c
index f231be05b9a6..446a4e0d5a6a 100644
--- a/arch/arm/mach-omap2/dpll44xx.c
+++ b/arch/arm/mach-omap2/dpll44xx.c
@@ -191,42 +191,36 @@ out:
 /**
  * omap4_dpll_regm4xen_determine_rate - determine rate for a DPLL
  * @hw: pointer to the clock to determine rate for
- * @rate: target rate for the DPLL
- * @best_parent_rate: pointer for returning best parent rate
- * @best_parent_clk: pointer for returning best parent clock
+ * @req: target rate request
  *
  * Determines which DPLL mode to use for reaching a desired rate.
  * Checks whether the DPLL shall be in bypass or locked mode, and if
  * locked, calculates the M,N values for the DPLL via round-rate.
- * Returns a positive clock rate with success, negative error value
- * in failure.
+ * Returns 0 on success and a negative error value otherwise.
  */
-long omap4_dpll_regm4xen_determine_rate(struct clk_hw *hw, unsigned long rate,
-					unsigned long min_rate,
-					unsigned long max_rate,
-					unsigned long *best_parent_rate,
-					struct clk_hw **best_parent_clk)
+int omap4_dpll_regm4xen_determine_rate(struct clk_hw *hw,
+				       struct clk_rate_request *req)
 {
 	struct clk_hw_omap *clk = to_clk_hw_omap(hw);
 	struct dpll_data *dd;
 
-	if (!hw || !rate)
+	if (!req->rate)
 		return -EINVAL;
 
 	dd = clk->dpll_data;
 	if (!dd)
 		return -EINVAL;
 
-	if (__clk_get_rate(dd->clk_bypass) == rate &&
+	if (__clk_get_rate(dd->clk_bypass) == req->rate &&
 	    (dd->modes & (1 << DPLL_LOW_POWER_BYPASS))) {
-		*best_parent_clk = __clk_get_hw(dd->clk_bypass);
+		req->best_parent_hw = __clk_get_hw(dd->clk_bypass);
 	} else {
-		rate = omap4_dpll_regm4xen_round_rate(hw, rate,
-						      best_parent_rate);
-		*best_parent_clk = __clk_get_hw(dd->clk_ref);
+		req->rate = omap4_dpll_regm4xen_round_rate(hw, req->rate,
+						&req->best_parent_rate);
+		req->best_parent_hw = __clk_get_hw(dd->clk_ref);
 	}
 
-	*best_parent_rate = rate;
+	req->best_parent_rate = req->rate;
 
-	return rate;
+	return 0;
 }
diff --git a/arch/mips/alchemy/common/clock.c b/arch/mips/alchemy/common/clock.c
index 6e46abe0dac6..0b4cf3e9f005 100644
--- a/arch/mips/alchemy/common/clock.c
+++ b/arch/mips/alchemy/common/clock.c
@@ -389,10 +389,9 @@ static long alchemy_calc_div(unsigned long rate, unsigned long prate,
 	return div1;
 }
 
-static long alchemy_clk_fgcs_detr(struct clk_hw *hw, unsigned long rate,
-					unsigned long *best_parent_rate,
-					struct clk_hw **best_parent_clk,
-					int scale, int maxdiv)
+static int alchemy_clk_fgcs_detr(struct clk_hw *hw,
+				 struct clk_rate_request *req,
+				 int scale, int maxdiv)
 {
 	struct clk *pc, *bpc, *free;
 	long tdv, tpr, pr, nr, br, bpr, diff, lastdiff;
@@ -422,14 +421,14 @@ static long alchemy_clk_fgcs_detr(struct clk_hw *hw, unsigned long rate,
 		}
 
 		pr = clk_get_rate(pc);
-		if (pr < rate)
+		if (pr < req->rate)
 			continue;
 
 		/* what can hardware actually provide */
-		tdv = alchemy_calc_div(rate, pr, scale, maxdiv, NULL);
+		tdv = alchemy_calc_div(req->rate, pr, scale, maxdiv, NULL);
 		nr = pr / tdv;
-		diff = rate - nr;
-		if (nr > rate)
+		diff = req->rate - nr;
+		if (nr > req->rate)
 			continue;
 
 		if (diff < lastdiff) {
@@ -448,15 +447,16 @@ static long alchemy_clk_fgcs_detr(struct clk_hw *hw, unsigned long rate,
 	 */
 	if (lastdiff && free) {
 		for (j = (maxdiv == 4) ? 1 : scale; j <= maxdiv; j += scale) {
-			tpr = rate * j;
+			tpr = req->rate * j;
 			if (tpr < 0)
 				break;
 			pr = clk_round_rate(free, tpr);
 
-			tdv = alchemy_calc_div(rate, pr, scale, maxdiv, NULL);
+			tdv = alchemy_calc_div(req->rate, pr, scale, maxdiv,
+					       NULL);
 			nr = pr / tdv;
-			diff = rate - nr;
-			if (nr > rate)
+			diff = req->rate - nr;
+			if (nr > req->rate)
 				continue;
 			if (diff < lastdiff) {
 				lastdiff = diff;
@@ -469,9 +469,10 @@ static long alchemy_clk_fgcs_detr(struct clk_hw *hw, unsigned long rate,
 		}
 	}
 
-	*best_parent_rate = bpr;
-	*best_parent_clk = __clk_get_hw(bpc);
-	return br;
+	req->best_parent_rate = bpr;
+	req->best_parent_hw = __clk_get_hw(bpc);
+	req->rate = br;
+	return 0;
 }
 
 static int alchemy_clk_fgv1_en(struct clk_hw *hw)
@@ -562,14 +563,10 @@ static unsigned long alchemy_clk_fgv1_recalc(struct clk_hw *hw,
 	return parent_rate / v;
 }
 
-static long alchemy_clk_fgv1_detr(struct clk_hw *hw, unsigned long rate,
-					unsigned long min_rate,
-					unsigned long max_rate,
-					unsigned long *best_parent_rate,
-					struct clk_hw **best_parent_clk)
+static int alchemy_clk_fgv1_detr(struct clk_hw *hw,
+				 struct clk_rate_request *req)
 {
-	return alchemy_clk_fgcs_detr(hw, rate, best_parent_rate,
-				     best_parent_clk, 2, 512);
+	return alchemy_clk_fgcs_detr(hw, req, 2, 512);
 }
 
 /* Au1000, Au1100, Au15x0, Au12x0 */
@@ -696,11 +693,8 @@ static unsigned long alchemy_clk_fgv2_recalc(struct clk_hw *hw,
 	return t;
 }
 
-static long alchemy_clk_fgv2_detr(struct clk_hw *hw, unsigned long rate,
-					unsigned long min_rate,
-					unsigned long max_rate,
-					unsigned long *best_parent_rate,
-					struct clk_hw **best_parent_clk)
+static int alchemy_clk_fgv2_detr(struct clk_hw *hw,
+				 struct clk_rate_request *req)
 {
 	struct alchemy_fgcs_clk *c = to_fgcs_clk(hw);
 	int scale, maxdiv;
@@ -713,8 +707,7 @@ static long alchemy_clk_fgv2_detr(struct clk_hw *hw, unsigned long rate,
 		maxdiv = 512;
 	}
 
-	return alchemy_clk_fgcs_detr(hw, rate, best_parent_rate,
-				     best_parent_clk, scale, maxdiv);
+	return alchemy_clk_fgcs_detr(hw, req, scale, maxdiv);
 }
 
 /* Au1300 larger input mux, no separate disable bit, flexible divider */
@@ -917,17 +910,13 @@ static int alchemy_clk_csrc_setr(struct clk_hw *hw, unsigned long rate,
 	return 0;
 }
 
-static long alchemy_clk_csrc_detr(struct clk_hw *hw, unsigned long rate,
-					unsigned long min_rate,
-					unsigned long max_rate,
-					unsigned long *best_parent_rate,
-					struct clk_hw **best_parent_clk)
+static int alchemy_clk_csrc_detr(struct clk_hw *hw,
+				 struct clk_rate_request *req)
 {
 	struct alchemy_fgcs_clk *c = to_fgcs_clk(hw);
 	int scale = c->dt[2] == 3 ? 1 : 2; /* au1300 check */
 
-	return alchemy_clk_fgcs_detr(hw, rate, best_parent_rate,
-				     best_parent_clk, scale, 4);
+	return alchemy_clk_fgcs_detr(hw, req, scale, 4);
 }
 
 static struct clk_ops alchemy_clkops_csrc = {
diff --git a/drivers/clk/at91/clk-programmable.c b/drivers/clk/at91/clk-programmable.c
index 8c86c0f7847a..43dacad5c96d 100644
--- a/drivers/clk/at91/clk-programmable.c
+++ b/drivers/clk/at91/clk-programmable.c
@@ -54,12 +54,8 @@ static unsigned long clk_programmable_recalc_rate(struct clk_hw *hw,
 	return parent_rate >> pres;
 }
 
-static long clk_programmable_determine_rate(struct clk_hw *hw,
-					    unsigned long rate,
-					    unsigned long min_rate,
-					    unsigned long max_rate,
-					    unsigned long *best_parent_rate,
-					    struct clk_hw **best_parent_hw)
+static int clk_programmable_determine_rate(struct clk_hw *hw,
+					   struct clk_rate_request *req)
 {
 	struct clk *parent = NULL;
 	long best_rate = -EINVAL;
@@ -76,24 +72,29 @@ static long clk_programmable_determine_rate(struct clk_hw *hw,
 		parent_rate = __clk_get_rate(parent);
 		for (shift = 0; shift < PROG_PRES_MASK; shift++) {
 			tmp_rate = parent_rate >> shift;
-			if (tmp_rate <= rate)
+			if (tmp_rate <= req->rate)
 				break;
 		}
 
-		if (tmp_rate > rate)
+		if (tmp_rate > req->rate)
 			continue;
 
-		if (best_rate < 0 || (rate - tmp_rate) < (rate - best_rate)) {
+		if (best_rate < 0 ||
+		    (req->rate - tmp_rate) < (req->rate - best_rate)) {
 			best_rate = tmp_rate;
-			*best_parent_rate = parent_rate;
-			*best_parent_hw = __clk_get_hw(parent);
+			req->best_parent_rate = parent_rate;
+			req->best_parent_hw = __clk_get_hw(parent);
 		}
 
 		if (!best_rate)
 			break;
 	}
 
-	return best_rate;
+	if (best_rate < 0)
+		return best_rate;
+
+	req->rate = best_rate;
+	return 0;
 }
 
 static int clk_programmable_set_parent(struct clk_hw *hw, u8 index)
diff --git a/drivers/clk/at91/clk-usb.c b/drivers/clk/at91/clk-usb.c
index b0cbd2b1ff59..24747df97742 100644
--- a/drivers/clk/at91/clk-usb.c
+++ b/drivers/clk/at91/clk-usb.c
@@ -56,12 +56,8 @@ static unsigned long at91sam9x5_clk_usb_recalc_rate(struct clk_hw *hw,
 	return DIV_ROUND_CLOSEST(parent_rate, (usbdiv + 1));
 }
 
-static long at91sam9x5_clk_usb_determine_rate(struct clk_hw *hw,
-					      unsigned long rate,
-					      unsigned long min_rate,
-					      unsigned long max_rate,
-					      unsigned long *best_parent_rate,
-					      struct clk_hw **best_parent_hw)
+static int at91sam9x5_clk_usb_determine_rate(struct clk_hw *hw,
+					     struct clk_rate_request *req)
 {
 	struct clk *parent = NULL;
 	long best_rate = -EINVAL;
@@ -80,23 +76,23 @@ static long at91sam9x5_clk_usb_determine_rate(struct clk_hw *hw,
 		for (div = 1; div < SAM9X5_USB_MAX_DIV + 2; div++) {
 			unsigned long tmp_parent_rate;
 
-			tmp_parent_rate = rate * div;
+			tmp_parent_rate = req->rate * div;
 			tmp_parent_rate = __clk_round_rate(parent,
 							   tmp_parent_rate);
 			tmp_rate = DIV_ROUND_CLOSEST(tmp_parent_rate, div);
-			if (tmp_rate < rate)
-				tmp_diff = rate - tmp_rate;
+			if (tmp_rate < req->rate)
+				tmp_diff = req->rate - tmp_rate;
 			else
-				tmp_diff = tmp_rate - rate;
+				tmp_diff = tmp_rate - req->rate;
 
 			if (best_diff < 0 || best_diff > tmp_diff) {
 				best_rate = tmp_rate;
 				best_diff = tmp_diff;
-				*best_parent_rate = tmp_parent_rate;
-				*best_parent_hw = __clk_get_hw(parent);
+				req->best_parent_rate = tmp_parent_rate;
+				req->best_parent_hw = __clk_get_hw(parent);
 			}
 
-			if (!best_diff || tmp_rate < rate)
+			if (!best_diff || tmp_rate < req->rate)
 				break;
 		}
 
@@ -104,7 +100,11 @@ static long at91sam9x5_clk_usb_determine_rate(struct clk_hw *hw,
 			break;
 	}
 
-	return best_rate;
+	if (best_rate < 0)
+		return best_rate;
+
+	req->rate = best_rate;
+	return 0;
 }
 
 static int at91sam9x5_clk_usb_set_parent(struct clk_hw *hw, u8 index)
diff --git a/drivers/clk/bcm/clk-kona.c b/drivers/clk/bcm/clk-kona.c
index 79a98506c433..d9c039c1902c 100644
--- a/drivers/clk/bcm/clk-kona.c
+++ b/drivers/clk/bcm/clk-kona.c
@@ -1017,10 +1017,8 @@ static long kona_peri_clk_round_rate(struct clk_hw *hw, unsigned long rate,
 				rate ? rate : 1, *parent_rate, NULL);
 }
 
-static long kona_peri_clk_determine_rate(struct clk_hw *hw, unsigned long rate,
-		unsigned long min_rate,
-		unsigned long max_rate,
-		unsigned long *best_parent_rate, struct clk_hw **best_parent)
+static int kona_peri_clk_determine_rate(struct clk_hw *hw,
+					struct clk_rate_request *req)
 {
 	struct kona_clk *bcm_clk = to_kona_clk(hw);
 	struct clk *clk = hw->clk;
@@ -1029,6 +1027,7 @@ static long kona_peri_clk_determine_rate(struct clk_hw *hw, unsigned long rate,
 	unsigned long best_delta;
 	unsigned long best_rate;
 	u32 parent_count;
+	long rate;
 	u32 which;
 
 	/*
@@ -1037,14 +1036,21 @@ static long kona_peri_clk_determine_rate(struct clk_hw *hw, unsigned long rate,
 	 */
 	WARN_ON_ONCE(bcm_clk->init_data.flags & CLK_SET_RATE_NO_REPARENT);
 	parent_count = (u32)bcm_clk->init_data.num_parents;
-	if (parent_count < 2)
-		return kona_peri_clk_round_rate(hw, rate, best_parent_rate);
+	if (parent_count < 2) {
+		rate = kona_peri_clk_round_rate(hw, req->rate,
+						&req->best_parent_rate);
+		if (rate < 0)
+			return rate;
+
+		req->rate = rate;
+		return 0;
+	}
 
 	/* Unless we can do better, stick with current parent */
 	current_parent = clk_get_parent(clk);
 	parent_rate = __clk_get_rate(current_parent);
-	best_rate = kona_peri_clk_round_rate(hw, rate, &parent_rate);
-	best_delta = abs(best_rate - rate);
+	best_rate = kona_peri_clk_round_rate(hw, req->rate, &parent_rate);
+	best_delta = abs(best_rate - req->rate);
 
 	/* Check whether any other parent clock can produce a better result */
 	for (which = 0; which < parent_count; which++) {
@@ -1058,17 +1064,19 @@ static long kona_peri_clk_determine_rate(struct clk_hw *hw, unsigned long rate,
 
 		/* We don't support CLK_SET_RATE_PARENT */
 		parent_rate = __clk_get_rate(parent);
-		other_rate = kona_peri_clk_round_rate(hw, rate, &parent_rate);
-		delta = abs(other_rate - rate);
+		other_rate = kona_peri_clk_round_rate(hw, req->rate,
+						      &parent_rate);
+		delta = abs(other_rate - req->rate);
 		if (delta < best_delta) {
 			best_delta = delta;
 			best_rate = other_rate;
-			*best_parent = __clk_get_hw(parent);
-			*best_parent_rate = parent_rate;
+			req->best_parent_hw = __clk_get_hw(parent);
+			req->best_parent_rate = parent_rate;
 		}
 	}
 
-	return best_rate;
+	req->rate = best_rate;
+	return 0;
 }
 
 static int kona_peri_clk_set_parent(struct clk_hw *hw, u8 index)
diff --git a/drivers/clk/clk-composite.c b/drivers/clk/clk-composite.c
index 616f5aef3c26..9e69f346ecc6 100644
--- a/drivers/clk/clk-composite.c
+++ b/drivers/clk/clk-composite.c
@@ -55,11 +55,8 @@ static unsigned long clk_composite_recalc_rate(struct clk_hw *hw,
 	return rate_ops->recalc_rate(rate_hw, parent_rate);
 }
 
-static long clk_composite_determine_rate(struct clk_hw *hw, unsigned long rate,
-					unsigned long min_rate,
-					unsigned long max_rate,
-					unsigned long *best_parent_rate,
-					struct clk_hw **best_parent_p)
+static int clk_composite_determine_rate(struct clk_hw *hw,
+					struct clk_rate_request *req)
 {
 	struct clk_composite *composite = to_clk_composite(hw);
 	const struct clk_ops *rate_ops = composite->rate_ops;
@@ -71,25 +68,28 @@ static long clk_composite_determine_rate(struct clk_hw *hw, unsigned long rate,
 	long tmp_rate, best_rate = 0;
 	unsigned long rate_diff;
 	unsigned long best_rate_diff = ULONG_MAX;
+	long rate;
 	int i;
 
 	if (rate_hw && rate_ops && rate_ops->determine_rate) {
 		__clk_hw_set_clk(rate_hw, hw);
-		return rate_ops->determine_rate(rate_hw, rate, min_rate,
-						max_rate,
-						best_parent_rate,
-						best_parent_p);
+		return rate_ops->determine_rate(rate_hw, req);
 	} else if (rate_hw && rate_ops && rate_ops->round_rate &&
 		   mux_hw && mux_ops && mux_ops->set_parent) {
-		*best_parent_p = NULL;
+		req->best_parent_hw = NULL;
 
 		if (__clk_get_flags(hw->clk) & CLK_SET_RATE_NO_REPARENT) {
 			parent = clk_get_parent(mux_hw->clk);
-			*best_parent_p = __clk_get_hw(parent);
-			*best_parent_rate = __clk_get_rate(parent);
+			req->best_parent_hw = __clk_get_hw(parent);
+			req->best_parent_rate = __clk_get_rate(parent);
 
-			return rate_ops->round_rate(rate_hw, rate,
-						    best_parent_rate);
+			rate = rate_ops->round_rate(rate_hw, req->rate,
+						    &req->best_parent_rate);
+			if (rate < 0)
+				return rate;
+
+			req->rate = rate;
+			return 0;
 		}
 
 		for (i = 0; i < __clk_get_num_parents(mux_hw->clk); i++) {
@@ -99,33 +99,33 @@ static long clk_composite_determine_rate(struct clk_hw *hw, unsigned long rate,
 
 			parent_rate = __clk_get_rate(parent);
 
-			tmp_rate = rate_ops->round_rate(rate_hw, rate,
+			tmp_rate = rate_ops->round_rate(rate_hw, req->rate,
 							&parent_rate);
 			if (tmp_rate < 0)
 				continue;
 
-			rate_diff = abs(rate - tmp_rate);
+			rate_diff = abs(req->rate - tmp_rate);
 
-			if (!rate_diff || !*best_parent_p
+			if (!rate_diff || !req->best_parent_hw
 				       || best_rate_diff > rate_diff) {
-				*best_parent_p = __clk_get_hw(parent);
-				*best_parent_rate = parent_rate;
+				req->best_parent_hw = __clk_get_hw(parent);
+				req->best_parent_rate = parent_rate;
 				best_rate_diff = rate_diff;
 				best_rate = tmp_rate;
 			}
 
 			if (!rate_diff)
-				return rate;
+				return 0;
 		}
 
-		return best_rate;
+		req->rate = best_rate;
+		return 0;
 	} else if (mux_hw && mux_ops && mux_ops->determine_rate) {
 		__clk_hw_set_clk(mux_hw, hw);
-		return mux_ops->determine_rate(mux_hw, rate, min_rate,
-					       max_rate, best_parent_rate,
-					       best_parent_p);
+		return mux_ops->determine_rate(mux_hw, req);
 	} else {
 		pr_err("clk: clk_composite_determine_rate function called, but no mux or rate callback set!\n");
+		req->rate = 0;
 		return 0;
 	}
 }
diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
index ddb4b541016f..4e9ff928ef88 100644
--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -436,28 +436,31 @@ static bool mux_is_better_rate(unsigned long rate, unsigned long now,
 	return now <= rate && now > best;
 }
 
-static long
-clk_mux_determine_rate_flags(struct clk_hw *hw, unsigned long rate,
-			     unsigned long min_rate,
-			     unsigned long max_rate,
-			     unsigned long *best_parent_rate,
-			     struct clk_hw **best_parent_p,
+static int
+clk_mux_determine_rate_flags(struct clk_hw *hw, struct clk_rate_request *req,
 			     unsigned long flags)
 {
 	struct clk_core *core = hw->core, *parent, *best_parent = NULL;
-	int i, num_parents;
-	unsigned long parent_rate, best = 0;
+	int i, num_parents, ret;
+	unsigned long best = 0;
+	struct clk_rate_request parent_req = *req;
 
 	/* if NO_REPARENT flag set, pass through to current parent */
 	if (core->flags & CLK_SET_RATE_NO_REPARENT) {
 		parent = core->parent;
-		if (core->flags & CLK_SET_RATE_PARENT)
-			best = __clk_determine_rate(parent ? parent->hw : NULL,
-						    rate, min_rate, max_rate);
-		else if (parent)
+		if (core->flags & CLK_SET_RATE_PARENT) {
+			ret = __clk_determine_rate(parent ? parent->hw : NULL,
+						   &parent_req);
+			if (ret)
+				return ret;
+
+			best = parent_req.rate;
+		} else if (parent) {
 			best = clk_core_get_rate_nolock(parent);
-		else
+		} else {
 			best = clk_core_get_rate_nolock(core);
+		}
+
 		goto out;
 	}
 
@@ -467,24 +470,30 @@ clk_mux_determine_rate_flags(struct clk_hw *hw, unsigned long rate,
 		parent = clk_core_get_parent_by_index(core, i);
 		if (!parent)
 			continue;
-		if (core->flags & CLK_SET_RATE_PARENT)
-			parent_rate = __clk_determine_rate(parent->hw, rate,
-							   min_rate,
-							   max_rate);
-		else
-			parent_rate = clk_core_get_rate_nolock(parent);
-		if (mux_is_better_rate(rate, parent_rate, best, flags)) {
+
+		if (core->flags & CLK_SET_RATE_PARENT) {
+			parent_req = *req;
+			ret = __clk_determine_rate(parent->hw, &parent_req);
+			if (ret)
+				continue;
+		} else {
+			parent_req.rate = clk_core_get_rate_nolock(parent);
+		}
+
+		if (mux_is_better_rate(req->rate, parent_req.rate,
+				       best, flags)) {
 			best_parent = parent;
-			best = parent_rate;
+			best = parent_req.rate;
 		}
 	}
 
 out:
 	if (best_parent)
-		*best_parent_p = best_parent->hw;
-	*best_parent_rate = best;
+		req->best_parent_hw = best_parent->hw;
+	req->best_parent_rate = best;
+	req->rate = best;
 
-	return best;
+	return 0;
 }
 
 struct clk *__clk_lookup(const char *name)
@@ -515,28 +524,17 @@ static void clk_core_get_boundaries(struct clk_core *core,
  * directly as a determine_rate callback (e.g. for a mux), or from a more
  * complex clock that may combine a mux with other operations.
  */
-long __clk_mux_determine_rate(struct clk_hw *hw, unsigned long rate,
-			      unsigned long min_rate,
-			      unsigned long max_rate,
-			      unsigned long *best_parent_rate,
-			      struct clk_hw **best_parent_p)
+int __clk_mux_determine_rate(struct clk_hw *hw,
+			     struct clk_rate_request *req)
 {
-	return clk_mux_determine_rate_flags(hw, rate, min_rate, max_rate,
-					    best_parent_rate,
-					    best_parent_p, 0);
+	return clk_mux_determine_rate_flags(hw, req, 0);
 }
 EXPORT_SYMBOL_GPL(__clk_mux_determine_rate);
 
-long __clk_mux_determine_rate_closest(struct clk_hw *hw, unsigned long rate,
-			      unsigned long min_rate,
-			      unsigned long max_rate,
-			      unsigned long *best_parent_rate,
-			      struct clk_hw **best_parent_p)
+int __clk_mux_determine_rate_closest(struct clk_hw *hw,
+				     struct clk_rate_request *req)
 {
-	return clk_mux_determine_rate_flags(hw, rate, min_rate, max_rate,
-					    best_parent_rate,
-					    best_parent_p,
-					    CLK_MUX_ROUND_CLOSEST);
+	return clk_mux_determine_rate_flags(hw, req, CLK_MUX_ROUND_CLOSEST);
 }
 EXPORT_SYMBOL_GPL(__clk_mux_determine_rate_closest);
 
@@ -759,14 +757,11 @@ int clk_enable(struct clk *clk)
 }
 EXPORT_SYMBOL_GPL(clk_enable);
 
-static unsigned long clk_core_round_rate_nolock(struct clk_core *core,
-						unsigned long rate,
-						unsigned long min_rate,
-						unsigned long max_rate)
+static int clk_core_round_rate_nolock(struct clk_core *core,
+				      struct clk_rate_request *req)
 {
-	unsigned long parent_rate = 0;
 	struct clk_core *parent;
-	struct clk_hw *parent_hw;
+	long rate;
 
 	lockdep_assert_held(&prepare_lock);
 
@@ -774,21 +769,30 @@ static unsigned long clk_core_round_rate_nolock(struct clk_core *core,
 		return 0;
 
 	parent = core->parent;
-	if (parent)
-		parent_rate = parent->rate;
+	if (parent) {
+		req->best_parent_hw = parent->hw;
+		req->best_parent_rate = parent->rate;
+	} else {
+		req->best_parent_hw = NULL;
+		req->best_parent_rate = 0;
+	}
 
 	if (core->ops->determine_rate) {
-		parent_hw = parent ? parent->hw : NULL;
-		return core->ops->determine_rate(core->hw, rate,
-						min_rate, max_rate,
-						&parent_rate, &parent_hw);
-	} else if (core->ops->round_rate)
-		return core->ops->round_rate(core->hw, rate, &parent_rate);
-	else if (core->flags & CLK_SET_RATE_PARENT)
-		return clk_core_round_rate_nolock(core->parent, rate, min_rate,
-						  max_rate);
-	else
-		return core->rate;
+		return core->ops->determine_rate(core->hw, req);
+	} else if (core->ops->round_rate) {
+		rate = core->ops->round_rate(core->hw, req->rate,
+					     &req->best_parent_rate);
+		if (rate < 0)
+			return rate;
+
+		req->rate = rate;
+	} else if (core->flags & CLK_SET_RATE_PARENT) {
+		return clk_core_round_rate_nolock(parent, req);
+	} else {
+		req->rate = core->rate;
+	}
+
+	return 0;
 }
 
 /**
@@ -800,15 +804,14 @@ static unsigned long clk_core_round_rate_nolock(struct clk_core *core,
  *
  * Useful for clk_ops such as .set_rate and .determine_rate.
  */
-unsigned long __clk_determine_rate(struct clk_hw *hw,
-				   unsigned long rate,
-				   unsigned long min_rate,
-				   unsigned long max_rate)
+int __clk_determine_rate(struct clk_hw *hw, struct clk_rate_request *req)
 {
-	if (!hw)
+	if (!hw) {
+		req->rate = 0;
 		return 0;
+	}
 
-	return clk_core_round_rate_nolock(hw->core, rate, min_rate, max_rate);
+	return clk_core_round_rate_nolock(hw->core, req);
 }
 EXPORT_SYMBOL_GPL(__clk_determine_rate);
 
@@ -821,15 +824,20 @@ EXPORT_SYMBOL_GPL(__clk_determine_rate);
  */
 unsigned long __clk_round_rate(struct clk *clk, unsigned long rate)
 {
-	unsigned long min_rate;
-	unsigned long max_rate;
+	struct clk_rate_request req;
+	int ret;
 
 	if (!clk)
 		return 0;
 
-	clk_core_get_boundaries(clk->core, &min_rate, &max_rate);
+	clk_core_get_boundaries(clk->core, &req.min_rate, &req.max_rate);
+	req.rate = rate;
+
+	ret = clk_core_round_rate_nolock(clk->core, &req);
+	if (ret)
+		return 0;
 
-	return clk_core_round_rate_nolock(clk->core, rate, min_rate, max_rate);
+	return req.rate;
 }
 EXPORT_SYMBOL_GPL(__clk_round_rate);
 
@@ -1249,7 +1257,6 @@ static struct clk_core *clk_calc_new_rates(struct clk_core *core,
 {
 	struct clk_core *top = core;
 	struct clk_core *old_parent, *parent;
-	struct clk_hw *parent_hw;
 	unsigned long best_parent_rate = 0;
 	unsigned long new_rate;
 	unsigned long min_rate;
@@ -1270,20 +1277,29 @@ static struct clk_core *clk_calc_new_rates(struct clk_core *core,
 
 	/* find the closest rate and parent clk/rate */
 	if (core->ops->determine_rate) {
-		parent_hw = parent ? parent->hw : NULL;
-		ret = core->ops->determine_rate(core->hw, rate,
-					       min_rate,
-					       max_rate,
-					       &best_parent_rate,
-					       &parent_hw);
+		struct clk_rate_request req;
+
+		req.rate = rate;
+		req.min_rate = min_rate;
+		req.max_rate = max_rate;
+		if (parent) {
+			req.best_parent_hw = parent->hw;
+			req.best_parent_rate = parent->rate;
+		} else {
+			req.best_parent_hw = NULL;
+			req.best_parent_rate = 0;
+		}
+
+		ret = core->ops->determine_rate(core->hw, &req);
 		if (ret < 0)
 			return NULL;
 
-		new_rate = ret;
-		parent = parent_hw ? parent_hw->core : NULL;
+		best_parent_rate = req.best_parent_rate;
+		new_rate = req.rate;
+		parent = req.best_parent_hw ? req.best_parent_hw->core : NULL;
 	} else if (core->ops->round_rate) {
 		ret = core->ops->round_rate(core->hw, rate,
-					   &best_parent_rate);
+					    &best_parent_rate);
 		if (ret < 0)
 			return NULL;
 
diff --git a/drivers/clk/hisilicon/clk-hi3620.c b/drivers/clk/hisilicon/clk-hi3620.c
index 715d34a5ef9b..a0674ba6659e 100644
--- a/drivers/clk/hisilicon/clk-hi3620.c
+++ b/drivers/clk/hisilicon/clk-hi3620.c
@@ -294,34 +294,29 @@ static unsigned long mmc_clk_recalc_rate(struct clk_hw *hw,
 	}
 }
 
-static long mmc_clk_determine_rate(struct clk_hw *hw, unsigned long rate,
-			      unsigned long min_rate,
-			      unsigned long max_rate,
-			      unsigned long *best_parent_rate,
-			      struct clk_hw **best_parent_p)
+static int mmc_clk_determine_rate(struct clk_hw *hw,
+				  struct clk_rate_request *req)
 {
 	struct clk_mmc *mclk = to_mmc(hw);
-	unsigned long best = 0;
 
-	if ((rate <= 13000000) && (mclk->id == HI3620_MMC_CIUCLK1)) {
-		rate = 13000000;
-		best = 26000000;
-	} else if (rate <= 26000000) {
-		rate = 25000000;
-		best = 180000000;
-	} else if (rate <= 52000000) {
-		rate = 50000000;
-		best = 360000000;
-	} else if (rate <= 100000000) {
-		rate = 100000000;
-		best = 720000000;
+	if ((req->rate <= 13000000) && (mclk->id == HI3620_MMC_CIUCLK1)) {
+		req->rate = 13000000;
+		req->best_parent_rate = 26000000;
+	} else if (req->rate <= 26000000) {
+		req->rate = 25000000;
+		req->best_parent_rate = 180000000;
+	} else if (req->rate <= 52000000) {
+		req->rate = 50000000;
+		req->best_parent_rate = 360000000;
+	} else if (req->rate <= 100000000) {
+		req->rate = 100000000;
+		req->best_parent_rate = 720000000;
 	} else {
 		/* max is 180M */
-		rate = 180000000;
-		best = 1440000000;
+		req->rate = 180000000;
+		req->best_parent_rate = 1440000000;
 	}
-	*best_parent_rate = best;
-	return rate;
+	return 0;
 }
 
 static u32 mmc_clk_delay(u32 val, u32 para, u32 off, u32 len)
diff --git a/drivers/clk/mmp/clk-mix.c b/drivers/clk/mmp/clk-mix.c
index de6a873175d2..7a37432761f9 100644
--- a/drivers/clk/mmp/clk-mix.c
+++ b/drivers/clk/mmp/clk-mix.c
@@ -201,11 +201,8 @@ error:
 	return ret;
 }
 
-static long mmp_clk_mix_determine_rate(struct clk_hw *hw, unsigned long rate,
-					unsigned long min_rate,
-					unsigned long max_rate,
-					unsigned long *best_parent_rate,
-					struct clk_hw **best_parent_clk)
+static int mmp_clk_mix_determine_rate(struct clk_hw *hw,
+				      struct clk_rate_request *req)
 {
 	struct mmp_clk_mix *mix = to_clk_mix(hw);
 	struct mmp_clk_mix_clk_table *item;
@@ -221,7 +218,7 @@ static long mmp_clk_mix_determine_rate(struct clk_hw *hw, unsigned long rate,
 	parent = NULL;
 	mix_rate_best = 0;
 	parent_rate_best = 0;
-	gap_best = rate;
+	gap_best = req->rate;
 	parent_best = NULL;
 
 	if (mix->table) {
@@ -233,7 +230,7 @@ static long mmp_clk_mix_determine_rate(struct clk_hw *hw, unsigned long rate,
 							item->parent_index);
 			parent_rate = __clk_get_rate(parent);
 			mix_rate = parent_rate / item->divisor;
-			gap = abs(mix_rate - rate);
+			gap = abs(mix_rate - req->rate);
 			if (parent_best == NULL || gap < gap_best) {
 				parent_best = parent;
 				parent_rate_best = parent_rate;
@@ -251,7 +248,7 @@ static long mmp_clk_mix_determine_rate(struct clk_hw *hw, unsigned long rate,
 			for (j = 0; j < div_val_max; j++) {
 				div = _get_div(mix, j);
 				mix_rate = parent_rate / div;
-				gap = abs(mix_rate - rate);
+				gap = abs(mix_rate - req->rate);
 				if (parent_best == NULL || gap < gap_best) {
 					parent_best = parent;
 					parent_rate_best = parent_rate;
@@ -265,10 +262,11 @@ static long mmp_clk_mix_determine_rate(struct clk_hw *hw, unsigned long rate,
 	}
 
 found:
-	*best_parent_rate = parent_rate_best;
-	*best_parent_clk = __clk_get_hw(parent_best);
+	req->best_parent_rate = parent_rate_best;
+	req->best_parent_hw = __clk_get_hw(parent_best);
+	req->rate = mix_rate_best;
 
-	return mix_rate_best;
+	return 0;
 }
 
 static int mmp_clk_mix_set_rate_and_parent(struct clk_hw *hw,
diff --git a/drivers/clk/qcom/clk-pll.c b/drivers/clk/qcom/clk-pll.c
index 245d5063a385..6017a76b47c8 100644
--- a/drivers/clk/qcom/clk-pll.c
+++ b/drivers/clk/qcom/clk-pll.c
@@ -135,19 +135,23 @@ struct pll_freq_tbl *find_freq(const struct pll_freq_tbl *f, unsigned long rate)
 	return NULL;
 }
 
-static long
-clk_pll_determine_rate(struct clk_hw *hw, unsigned long rate,
-		       unsigned long min_rate, unsigned long max_rate,
-		       unsigned long *p_rate, struct clk_hw **p)
+static int
+clk_pll_determine_rate(struct clk_hw *hw, struct clk_rate_request *req)
 {
+	struct clk *parent = __clk_get_parent(hw->clk);
 	struct clk_pll *pll = to_clk_pll(hw);
 	const struct pll_freq_tbl *f;
 
-	f = find_freq(pll->freq_tbl, rate);
+	req->best_parent_hw = __clk_get_hw(parent);
+	req->best_parent_rate = __clk_get_rate(parent);
+
+	f = find_freq(pll->freq_tbl, req->rate);
 	if (!f)
-		return clk_pll_recalc_rate(hw, *p_rate);
+		req->rate = clk_pll_recalc_rate(hw, req->best_parent_rate);
+	else
+		req->rate = f->freq;
 
-	return f->freq;
+	return 0;
 }
 
 static int
diff --git a/drivers/clk/qcom/clk-rcg.c b/drivers/clk/qcom/clk-rcg.c
index 7b3d62674203..2bc42bb21b3d 100644
--- a/drivers/clk/qcom/clk-rcg.c
+++ b/drivers/clk/qcom/clk-rcg.c
@@ -404,13 +404,11 @@ clk_dyn_rcg_recalc_rate(struct clk_hw *hw, unsigned long parent_rate)
 	return calc_rate(parent_rate, m, n, mode, pre_div);
 }
 
-static long _freq_tbl_determine_rate(struct clk_hw *hw,
-		const struct freq_tbl *f, unsigned long rate,
-		unsigned long min_rate, unsigned long max_rate,
-		unsigned long *p_rate, struct clk_hw **p_hw,
+static int _freq_tbl_determine_rate(struct clk_hw *hw, const struct freq_tbl *f,
+		struct clk_rate_request *req,
 		const struct parent_map *parent_map)
 {
-	unsigned long clk_flags;
+	unsigned long clk_flags, rate = req->rate;
 	struct clk *p;
 	int index;
 
@@ -435,25 +433,24 @@ static long _freq_tbl_determine_rate(struct clk_hw *hw,
 	} else {
 		rate =  __clk_get_rate(p);
 	}
-	*p_hw = __clk_get_hw(p);
-	*p_rate = rate;
+	req->best_parent_hw = __clk_get_hw(p);
+	req->best_parent_rate = rate;
+	req->rate = f->freq;
 
-	return f->freq;
+	return 0;
 }
 
-static long clk_rcg_determine_rate(struct clk_hw *hw, unsigned long rate,
-		unsigned long min_rate, unsigned long max_rate,
-		unsigned long *p_rate, struct clk_hw **p)
+static int clk_rcg_determine_rate(struct clk_hw *hw,
+				  struct clk_rate_request *req)
 {
 	struct clk_rcg *rcg = to_clk_rcg(hw);
 
-	return _freq_tbl_determine_rate(hw, rcg->freq_tbl, rate, min_rate,
-			max_rate, p_rate, p, rcg->s.parent_map);
+	return _freq_tbl_determine_rate(hw, rcg->freq_tbl, req,
+					rcg->s.parent_map);
 }
 
-static long clk_dyn_rcg_determine_rate(struct clk_hw *hw, unsigned long rate,
-		unsigned long min_rate, unsigned long max_rate,
-		unsigned long *p_rate, struct clk_hw **p)
+static int clk_dyn_rcg_determine_rate(struct clk_hw *hw,
+				      struct clk_rate_request *req)
 {
 	struct clk_dyn_rcg *rcg = to_clk_dyn_rcg(hw);
 	u32 reg;
@@ -464,13 +461,11 @@ static long clk_dyn_rcg_determine_rate(struct clk_hw *hw, unsigned long rate,
 	bank = reg_to_bank(rcg, reg);
 	s = &rcg->s[bank];
 
-	return _freq_tbl_determine_rate(hw, rcg->freq_tbl, rate, min_rate,
-			max_rate, p_rate, p, s->parent_map);
+	return _freq_tbl_determine_rate(hw, rcg->freq_tbl, req, s->parent_map);
 }
 
-static long clk_rcg_bypass_determine_rate(struct clk_hw *hw, unsigned long rate,
-		unsigned long min_rate, unsigned long max_rate,
-		unsigned long *p_rate, struct clk_hw **p_hw)
+static int clk_rcg_bypass_determine_rate(struct clk_hw *hw,
+					 struct clk_rate_request *req)
 {
 	struct clk_rcg *rcg = to_clk_rcg(hw);
 	const struct freq_tbl *f = rcg->freq_tbl;
@@ -478,10 +473,11 @@ static long clk_rcg_bypass_determine_rate(struct clk_hw *hw, unsigned long rate,
 	int index = qcom_find_src_index(hw, rcg->s.parent_map, f->src);
 
 	p = clk_get_parent_by_index(hw->clk, index);
-	*p_hw = __clk_get_hw(p);
-	*p_rate = __clk_round_rate(p, rate);
+	req->best_parent_hw = __clk_get_hw(p);
+	req->best_parent_rate = __clk_round_rate(p, req->rate);
+	req->rate = req->best_parent_rate;
 
-	return *p_rate;
+	return 0;
 }
 
 static int __clk_rcg_set_rate(struct clk_rcg *rcg, const struct freq_tbl *f)
diff --git a/drivers/clk/qcom/clk-rcg2.c b/drivers/clk/qcom/clk-rcg2.c
index b95d17fbb8d7..aa6c3bdac040 100644
--- a/drivers/clk/qcom/clk-rcg2.c
+++ b/drivers/clk/qcom/clk-rcg2.c
@@ -176,11 +176,10 @@ clk_rcg2_recalc_rate(struct clk_hw *hw, unsigned long parent_rate)
 	return calc_rate(parent_rate, m, n, mode, hid_div);
 }
 
-static long _freq_tbl_determine_rate(struct clk_hw *hw,
-		const struct freq_tbl *f, unsigned long rate,
-		unsigned long *p_rate, struct clk_hw **p_hw)
+static int _freq_tbl_determine_rate(struct clk_hw *hw,
+		const struct freq_tbl *f, struct clk_rate_request *req)
 {
-	unsigned long clk_flags;
+	unsigned long clk_flags, rate = req->rate;
 	struct clk *p;
 	struct clk_rcg2 *rcg = to_clk_rcg2(hw);
 	int index;
@@ -210,19 +209,19 @@ static long _freq_tbl_determine_rate(struct clk_hw *hw,
 	} else {
 		rate =  __clk_get_rate(p);
 	}
-	*p_hw = __clk_get_hw(p);
-	*p_rate = rate;
+	req->best_parent_hw = __clk_get_hw(p);
+	req->best_parent_rate = rate;
+	req->rate = f->freq;
 
-	return f->freq;
+	return 0;
 }
 
-static long clk_rcg2_determine_rate(struct clk_hw *hw, unsigned long rate,
-		unsigned long min_rate, unsigned long max_rate,
-		unsigned long *p_rate, struct clk_hw **p)
+static int clk_rcg2_determine_rate(struct clk_hw *hw,
+				   struct clk_rate_request *req)
 {
 	struct clk_rcg2 *rcg = to_clk_rcg2(hw);
 
-	return _freq_tbl_determine_rate(hw, rcg->freq_tbl, rate, p_rate, p);
+	return _freq_tbl_determine_rate(hw, rcg->freq_tbl, req);
 }
 
 static int clk_rcg2_configure(struct clk_rcg2 *rcg, const struct freq_tbl *f)
@@ -374,35 +373,34 @@ static int clk_edp_pixel_set_rate_and_parent(struct clk_hw *hw,
 	return clk_edp_pixel_set_rate(hw, rate, parent_rate);
 }
 
-static long clk_edp_pixel_determine_rate(struct clk_hw *hw, unsigned long rate,
-				 unsigned long min_rate,
-				 unsigned long max_rate,
-				 unsigned long *p_rate, struct clk_hw **p)
+static int clk_edp_pixel_determine_rate(struct clk_hw *hw,
+					struct clk_rate_request *req)
 {
 	struct clk_rcg2 *rcg = to_clk_rcg2(hw);
 	const struct freq_tbl *f = rcg->freq_tbl;
 	const struct frac_entry *frac;
 	int delta = 100000;
-	s64 src_rate = *p_rate;
 	s64 request;
 	u32 mask = BIT(rcg->hid_width) - 1;
 	u32 hid_div;
 	int index = qcom_find_src_index(hw, rcg->parent_map, f->src);
+	struct clk *p = clk_get_parent_by_index(hw->clk, index);
 
 	/* Force the correct parent */
-	*p = __clk_get_hw(clk_get_parent_by_index(hw->clk, index));
+	req->best_parent_hw = __clk_get_hw(p);
+	req->best_parent_rate = __clk_get_rate(p);
 
-	if (src_rate == 810000000)
+	if (req->best_parent_rate == 810000000)
 		frac = frac_table_810m;
 	else
 		frac = frac_table_675m;
 
 	for (; frac->num; frac++) {
-		request = rate;
+		request = req->rate;
 		request *= frac->den;
 		request = div_s64(request, frac->num);
-		if ((src_rate < (request - delta)) ||
-		    (src_rate > (request + delta)))
+		if ((req->best_parent_rate < (request - delta)) ||
+		    (req->best_parent_rate > (request + delta)))
 			continue;
 
 		regmap_read(rcg->clkr.regmap, rcg->cmd_rcgr + CFG_REG,
@@ -410,8 +408,10 @@ static long clk_edp_pixel_determine_rate(struct clk_hw *hw, unsigned long rate,
 		hid_div >>= CFG_SRC_DIV_SHIFT;
 		hid_div &= mask;
 
-		return calc_rate(src_rate, frac->num, frac->den, !!frac->den,
-				 hid_div);
+		req->rate = calc_rate(req->best_parent_rate,
+				      frac->num, frac->den,
+				      !!frac->den, hid_div);
+		return 0;
 	}
 
 	return -EINVAL;
@@ -428,9 +428,8 @@ const struct clk_ops clk_edp_pixel_ops = {
 };
 EXPORT_SYMBOL_GPL(clk_edp_pixel_ops);
 
-static long clk_byte_determine_rate(struct clk_hw *hw, unsigned long rate,
-			 unsigned long min_rate, unsigned long max_rate,
-			 unsigned long *p_rate, struct clk_hw **p_hw)
+static int clk_byte_determine_rate(struct clk_hw *hw,
+				   struct clk_rate_request *req)
 {
 	struct clk_rcg2 *rcg = to_clk_rcg2(hw);
 	const struct freq_tbl *f = rcg->freq_tbl;
@@ -439,17 +438,19 @@ static long clk_byte_determine_rate(struct clk_hw *hw, unsigned long rate,
 	u32 mask = BIT(rcg->hid_width) - 1;
 	struct clk *p;
 
-	if (rate == 0)
+	if (req->rate == 0)
 		return -EINVAL;
 
 	p = clk_get_parent_by_index(hw->clk, index);
-	*p_hw = __clk_get_hw(p);
-	*p_rate = parent_rate = __clk_round_rate(p, rate);
+	req->best_parent_hw = __clk_get_hw(p);
+	req->best_parent_rate = parent_rate = __clk_round_rate(p, req->rate);
 
-	div = DIV_ROUND_UP((2 * parent_rate), rate) - 1;
+	div = DIV_ROUND_UP((2 * parent_rate), req->rate) - 1;
 	div = min_t(u32, div, mask);
 
-	return calc_rate(parent_rate, 0, 0, 0, div);
+	req->rate = calc_rate(parent_rate, 0, 0, 0, div);
+
+	return 0;
 }
 
 static int clk_byte_set_rate(struct clk_hw *hw, unsigned long rate,
@@ -494,10 +495,8 @@ static const struct frac_entry frac_table_pixel[] = {
 	{ }
 };
 
-static long clk_pixel_determine_rate(struct clk_hw *hw, unsigned long rate,
-				 unsigned long min_rate,
-				 unsigned long max_rate,
-				 unsigned long *p_rate, struct clk_hw **p)
+static int clk_pixel_determine_rate(struct clk_hw *hw,
+				    struct clk_rate_request *req)
 {
 	struct clk_rcg2 *rcg = to_clk_rcg2(hw);
 	unsigned long request, src_rate;
@@ -507,18 +506,19 @@ static long clk_pixel_determine_rate(struct clk_hw *hw, unsigned long rate,
 	int index = qcom_find_src_index(hw, rcg->parent_map, f->src);
 	struct clk *parent = clk_get_parent_by_index(hw->clk, index);
 
-	*p = __clk_get_hw(parent);
+	req->best_parent_hw = __clk_get_hw(parent);
 
 	for (; frac->num; frac++) {
-		request = (rate * frac->den) / frac->num;
+		request = (req->rate * frac->den) / frac->num;
 
 		src_rate = __clk_round_rate(parent, request);
 		if ((src_rate < (request - delta)) ||
 			(src_rate > (request + delta)))
 			continue;
 
-		*p_rate = src_rate;
-		return (src_rate * frac->num) / frac->den;
+		req->best_parent_rate = src_rate;
+		req->rate = (src_rate * frac->num) / frac->den;
+		return 0;
 	}
 
 	return -EINVAL;
diff --git a/drivers/clk/sunxi/clk-factors.c b/drivers/clk/sunxi/clk-factors.c
index 8c20190a3e9f..7a485870991d 100644
--- a/drivers/clk/sunxi/clk-factors.c
+++ b/drivers/clk/sunxi/clk-factors.c
@@ -79,11 +79,8 @@ static long clk_factors_round_rate(struct clk_hw *hw, unsigned long rate,
 	return rate;
 }
 
-static long clk_factors_determine_rate(struct clk_hw *hw, unsigned long rate,
-				       unsigned long min_rate,
-				       unsigned long max_rate,
-				       unsigned long *best_parent_rate,
-				       struct clk_hw **best_parent_p)
+static int clk_factors_determine_rate(struct clk_hw *hw,
+				      struct clk_rate_request *req)
 {
 	struct clk *clk = hw->clk, *parent, *best_parent = NULL;
 	int i, num_parents;
@@ -96,13 +93,14 @@ static long clk_factors_determine_rate(struct clk_hw *hw, unsigned long rate,
 		if (!parent)
 			continue;
 		if (__clk_get_flags(clk) & CLK_SET_RATE_PARENT)
-			parent_rate = __clk_round_rate(parent, rate);
+			parent_rate = __clk_round_rate(parent, req->rate);
 		else
 			parent_rate = __clk_get_rate(parent);
 
-		child_rate = clk_factors_round_rate(hw, rate, &parent_rate);
+		child_rate = clk_factors_round_rate(hw, req->rate,
+						    &parent_rate);
 
-		if (child_rate <= rate && child_rate > best_child_rate) {
+		if (child_rate <= req->rate && child_rate > best_child_rate) {
 			best_parent = parent;
 			best = parent_rate;
 			best_child_rate = child_rate;
@@ -110,10 +108,11 @@ static long clk_factors_determine_rate(struct clk_hw *hw, unsigned long rate,
 	}
 
 	if (best_parent)
-		*best_parent_p = __clk_get_hw(best_parent);
-	*best_parent_rate = best;
+		req->best_parent_hw = __clk_get_hw(best_parent);
+	req->best_parent_rate = best;
+	req->rate = best_child_rate;
 
-	return best_child_rate;
+	return 0;
 }
 
 static int clk_factors_set_rate(struct clk_hw *hw, unsigned long rate,
diff --git a/drivers/clk/sunxi/clk-sun6i-ar100.c b/drivers/clk/sunxi/clk-sun6i-ar100.c
index 63cf149195ae..d70c1ea345db 100644
--- a/drivers/clk/sunxi/clk-sun6i-ar100.c
+++ b/drivers/clk/sunxi/clk-sun6i-ar100.c
@@ -44,17 +44,14 @@ static unsigned long ar100_recalc_rate(struct clk_hw *hw,
 	return (parent_rate >> shift) / (div + 1);
 }
 
-static long ar100_determine_rate(struct clk_hw *hw, unsigned long rate,
-				 unsigned long min_rate,
-				 unsigned long max_rate,
-				 unsigned long *best_parent_rate,
-				 struct clk_hw **best_parent_clk)
+static int ar100_determine_rate(struct clk_hw *hw,
+				struct clk_rate_request *req)
 {
 	int nparents = __clk_get_num_parents(hw->clk);
 	long best_rate = -EINVAL;
 	int i;
 
-	*best_parent_clk = NULL;
+	req->best_parent_hw = NULL;
 
 	for (i = 0; i < nparents; i++) {
 		unsigned long parent_rate;
@@ -65,7 +62,7 @@ static long ar100_determine_rate(struct clk_hw *hw, unsigned long rate,
 
 		parent = clk_get_parent_by_index(hw->clk, i);
 		parent_rate = __clk_get_rate(parent);
-		div = DIV_ROUND_UP(parent_rate, rate);
+		div = DIV_ROUND_UP(parent_rate, req->rate);
 
 		/*
 		 * The AR100 clk contains 2 divisors:
@@ -101,14 +98,16 @@ static long ar100_determine_rate(struct clk_hw *hw, unsigned long rate,
 			continue;
 
 		tmp_rate = (parent_rate >> shift) / div;
-		if (!*best_parent_clk || tmp_rate > best_rate) {
-			*best_parent_clk = __clk_get_hw(parent);
-			*best_parent_rate = parent_rate;
+		if (!req->best_parent_hw || tmp_rate > best_rate) {
+			req->best_parent_hw = __clk_get_hw(parent);
+			req->best_parent_rate = parent_rate;
 			best_rate = tmp_rate;
 		}
 	}
 
-	return best_rate;
+	req->rate = best_rate;
+
+	return 0;
 }
 
 static int ar100_set_parent(struct clk_hw *hw, u8 index)
diff --git a/drivers/clk/sunxi/clk-sunxi.c b/drivers/clk/sunxi/clk-sunxi.c
index 9a82f17d2d73..d0f72a151bf1 100644
--- a/drivers/clk/sunxi/clk-sunxi.c
+++ b/drivers/clk/sunxi/clk-sunxi.c
@@ -118,11 +118,8 @@ static long sun6i_ahb1_clk_round(unsigned long rate, u8 *divp, u8 *pre_divp,
 	return (parent_rate / calcm) >> calcp;
 }
 
-static long sun6i_ahb1_clk_determine_rate(struct clk_hw *hw, unsigned long rate,
-					  unsigned long min_rate,
-					  unsigned long max_rate,
-					  unsigned long *best_parent_rate,
-					  struct clk_hw **best_parent_clk)
+static int sun6i_ahb1_clk_determine_rate(struct clk_hw *hw,
+					 struct clk_rate_request *req)
 {
 	struct clk *clk = hw->clk, *parent, *best_parent = NULL;
 	int i, num_parents;
@@ -135,14 +132,14 @@ static long sun6i_ahb1_clk_determine_rate(struct clk_hw *hw, unsigned long rate,
 		if (!parent)
 			continue;
 		if (__clk_get_flags(clk) & CLK_SET_RATE_PARENT)
-			parent_rate = __clk_round_rate(parent, rate);
+			parent_rate = __clk_round_rate(parent, req->rate);
 		else
 			parent_rate = __clk_get_rate(parent);
 
-		child_rate = sun6i_ahb1_clk_round(rate, NULL, NULL, i,
+		child_rate = sun6i_ahb1_clk_round(req->rate, NULL, NULL, i,
 						  parent_rate);
 
-		if (child_rate <= rate && child_rate > best_child_rate) {
+		if (child_rate <= req->rate && child_rate > best_child_rate) {
 			best_parent = parent;
 			best = parent_rate;
 			best_child_rate = child_rate;
@@ -150,10 +147,11 @@ static long sun6i_ahb1_clk_determine_rate(struct clk_hw *hw, unsigned long rate,
 	}
 
 	if (best_parent)
-		*best_parent_clk = __clk_get_hw(best_parent);
-	*best_parent_rate = best;
+		req->best_parent_hw = __clk_get_hw(best_parent);
+	req->best_parent_rate = best;
+	req->rate = best_child_rate;
 
-	return best_child_rate;
+	return 0;
 }
 
 static int sun6i_ahb1_clk_set_rate(struct clk_hw *hw, unsigned long rate,
diff --git a/drivers/clk/tegra/clk-emc.c b/drivers/clk/tegra/clk-emc.c
index 7649685c86bc..08ae518c9950 100644
--- a/drivers/clk/tegra/clk-emc.c
+++ b/drivers/clk/tegra/clk-emc.c
@@ -116,11 +116,7 @@ static unsigned long emc_recalc_rate(struct clk_hw *hw,
  * safer since things have EMC rate floors. Also don't touch parent_rate
  * since we don't want the CCF to play with our parent clocks.
  */
-static long emc_determine_rate(struct clk_hw *hw, unsigned long rate,
-			       unsigned long min_rate,
-			       unsigned long max_rate,
-			       unsigned long *best_parent_rate,
-			       struct clk_hw **best_parent_hw)
+static int emc_determine_rate(struct clk_hw *hw, struct clk_rate_request *req)
 {
 	struct tegra_clk_emc *tegra;
 	u8 ram_code = tegra_read_ram_code();
@@ -135,22 +131,28 @@ static long emc_determine_rate(struct clk_hw *hw, unsigned long rate,
 
 		timing = tegra->timings + i;
 
-		if (timing->rate > max_rate) {
+		if (timing->rate > req->max_rate) {
 			i = min(i, 1);
-			return tegra->timings[i - 1].rate;
+			req->rate = tegra->timings[i - 1].rate;
+			return 0;
 		}
 
-		if (timing->rate < min_rate)
+		if (timing->rate < req->min_rate)
 			continue;
 
-		if (timing->rate >= rate)
-			return timing->rate;
+		if (timing->rate >= req->rate) {
+			req->rate = timing->rate;
+			return 0;
+		}
 	}
 
-	if (timing)
-		return timing->rate;
+	if (timing) {
+		req->rate = timing->rate;
+		return 0;
+	}
 
-	return __clk_get_rate(hw->clk);
+	req->rate = __clk_get_rate(hw->clk);
+	return 0;
 }
 
 static u8 emc_get_parent(struct clk_hw *hw)
diff --git a/include/linux/clk-provider.h b/include/linux/clk-provider.h
index 78842f46f152..14998f05acf2 100644
--- a/include/linux/clk-provider.h
+++ b/include/linux/clk-provider.h
@@ -37,6 +37,28 @@ struct clk_hw;
 struct clk_core;
 struct dentry;
 
+/**
+ * struct clk_rate_request - Structure encoding the clk constraints that
+ * a clock user might require.
+ *
+ * @rate:		Requested clock rate. This field will be adjusted by
+ *			clock drivers according to hardware capabilities.
+ * @min_rate:		Minimum rate imposed by clk users.
+ * @max_rate:		Maximum rate a imposed by clk users.
+ * @best_parent_rate:	The best parent rate a parent can provide to fulfill the
+ *			requested constraints.
+ * @best_parent_hw:	The most appropriate parent clock that fulfills the
+ *			requested constraints.
+ *
+ */
+struct clk_rate_request {
+	unsigned long rate;
+	unsigned long min_rate;
+	unsigned long max_rate;
+	unsigned long best_parent_rate;
+	struct clk_hw *best_parent_hw;
+};
+
 /**
  * struct clk_ops -  Callback operations for hardware clocks; these are to
  * be provided by the clock implementation, and will be called by drivers
@@ -176,12 +198,8 @@ struct clk_ops {
 					unsigned long parent_rate);
 	long		(*round_rate)(struct clk_hw *hw, unsigned long rate,
 					unsigned long *parent_rate);
-	long		(*determine_rate)(struct clk_hw *hw,
-					  unsigned long rate,
-					  unsigned long min_rate,
-					  unsigned long max_rate,
-					  unsigned long *best_parent_rate,
-					  struct clk_hw **best_parent_hw);
+	int		(*determine_rate)(struct clk_hw *hw,
+					  struct clk_rate_request *req);
 	int		(*set_parent)(struct clk_hw *hw, u8 index);
 	u8		(*get_parent)(struct clk_hw *hw);
 	int		(*set_rate)(struct clk_hw *hw, unsigned long rate,
@@ -578,20 +596,11 @@ unsigned long __clk_get_flags(struct clk *clk);
 bool __clk_is_prepared(struct clk *clk);
 bool __clk_is_enabled(struct clk *clk);
 struct clk *__clk_lookup(const char *name);
-long __clk_mux_determine_rate(struct clk_hw *hw, unsigned long rate,
-			      unsigned long min_rate,
-			      unsigned long max_rate,
-			      unsigned long *best_parent_rate,
-			      struct clk_hw **best_parent_p);
-unsigned long __clk_determine_rate(struct clk_hw *core,
-				   unsigned long rate,
-				   unsigned long min_rate,
-				   unsigned long max_rate);
-long __clk_mux_determine_rate_closest(struct clk_hw *hw, unsigned long rate,
-			      unsigned long min_rate,
-			      unsigned long max_rate,
-			      unsigned long *best_parent_rate,
-			      struct clk_hw **best_parent_p);
+int __clk_mux_determine_rate(struct clk_hw *hw,
+			     struct clk_rate_request *req);
+int __clk_determine_rate(struct clk_hw *core, struct clk_rate_request *req);
+int __clk_mux_determine_rate_closest(struct clk_hw *hw,
+				     struct clk_rate_request *req);
 void clk_hw_reparent(struct clk_hw *hw, struct clk_hw *new_parent);
 
 static inline void __clk_hw_set_clk(struct clk_hw *dst, struct clk_hw *src)
diff --git a/include/linux/clk/ti.h b/include/linux/clk/ti.h
index 79b76e13d904..448b4f87b9eb 100644
--- a/include/linux/clk/ti.h
+++ b/include/linux/clk/ti.h
@@ -269,23 +269,15 @@ int omap3_noncore_dpll_set_rate_and_parent(struct clk_hw *hw,
 					   unsigned long rate,
 					   unsigned long parent_rate,
 					   u8 index);
-long omap3_noncore_dpll_determine_rate(struct clk_hw *hw,
-				       unsigned long rate,
-				       unsigned long min_rate,
-				       unsigned long max_rate,
-				       unsigned long *best_parent_rate,
-				       struct clk_hw **best_parent_clk);
+int omap3_noncore_dpll_determine_rate(struct clk_hw *hw,
+				      struct clk_rate_request *req);
 unsigned long omap4_dpll_regm4xen_recalc(struct clk_hw *hw,
 					 unsigned long parent_rate);
 long omap4_dpll_regm4xen_round_rate(struct clk_hw *hw,
 				    unsigned long target_rate,
 				    unsigned long *parent_rate);
-long omap4_dpll_regm4xen_determine_rate(struct clk_hw *hw,
-					unsigned long rate,
-					unsigned long min_rate,
-					unsigned long max_rate,
-					unsigned long *best_parent_rate,
-					struct clk_hw **best_parent_clk);
+int omap4_dpll_regm4xen_determine_rate(struct clk_hw *hw,
+				       struct clk_rate_request *req);
 u8 omap2_init_dpll_parent(struct clk_hw *hw);
 unsigned long omap3_dpll_recalc(struct clk_hw *hw, unsigned long parent_rate);
 long omap2_dpll_round_rate(struct clk_hw *hw, unsigned long target_rate,
-- 
cgit v1.2.3


From 730daa164e7c7e31c08fab940549f4acc3329432 Mon Sep 17 00:00:00 2001
From: Kees Cook <keescook@chromium.org>
Date: Thu, 23 Jul 2015 18:02:48 -0700
Subject: Yama: remove needless CONFIG_SECURITY_YAMA_STACKED

Now that minor LSMs can cleanly stack with major LSMs, remove the unneeded
config for Yama to be made to explicitly stack. Just selecting the main
Yama CONFIG will allow it to work, regardless of the major LSM. Since
distros using Yama are already forcing it to stack, this is effectively
a no-op change.

Additionally add MAINTAINERS entry.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: James Morris <james.l.morris@oracle.com>
---
 Documentation/security/Yama.txt       | 10 ++++------
 MAINTAINERS                           |  6 ++++++
 arch/mips/configs/pistachio_defconfig |  1 -
 include/linux/lsm_hooks.h             |  6 ++++--
 security/Kconfig                      |  5 -----
 security/security.c                   | 11 +++--------
 security/yama/Kconfig                 |  9 +--------
 security/yama/yama_lsm.c              | 32 ++++++++++----------------------
 8 files changed, 28 insertions(+), 52 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/security/Yama.txt b/Documentation/security/Yama.txt
index 227a63f018a2..d9ee7d7a6c7f 100644
--- a/Documentation/security/Yama.txt
+++ b/Documentation/security/Yama.txt
@@ -1,9 +1,7 @@
-Yama is a Linux Security Module that collects a number of system-wide DAC
-security protections that are not handled by the core kernel itself. To
-select it at boot time, specify "security=yama" (though this will disable
-any other LSM).
-
-Yama is controlled through sysctl in /proc/sys/kernel/yama:
+Yama is a Linux Security Module that collects system-wide DAC security
+protections that are not handled by the core kernel itself. This is
+selectable at build-time with CONFIG_SECURITY_YAMA, and can be controlled
+at run-time through sysctls in /proc/sys/kernel/yama:
 
 - ptrace_scope
 
diff --git a/MAINTAINERS b/MAINTAINERS
index a2264167791a..f8be2f797197 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9102,6 +9102,12 @@ T:	git git://git.kernel.org/pub/scm/linux/kernel/git/jj/apparmor-dev.git
 S:	Supported
 F:	security/apparmor/
 
+YAMA SECURITY MODULE
+M:	Kees Cook <keescook@chromium.org>
+T:	git git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git yama/tip
+S:	Supported
+F:	security/yama/
+
 SENSABLE PHANTOM
 M:	Jiri Slaby <jirislaby@gmail.com>
 S:	Maintained
diff --git a/arch/mips/configs/pistachio_defconfig b/arch/mips/configs/pistachio_defconfig
index 1646cce032c3..642b50946943 100644
--- a/arch/mips/configs/pistachio_defconfig
+++ b/arch/mips/configs/pistachio_defconfig
@@ -320,7 +320,6 @@ CONFIG_KEYS=y
 CONFIG_SECURITY=y
 CONFIG_SECURITY_NETWORK=y
 CONFIG_SECURITY_YAMA=y
-CONFIG_SECURITY_YAMA_STACKED=y
 CONFIG_DEFAULT_SECURITY_DAC=y
 CONFIG_CRYPTO_AUTHENC=y
 CONFIG_CRYPTO_HMAC=y
diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index 9429f054c323..ec3a6bab29de 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -1881,8 +1881,10 @@ static inline void security_delete_hooks(struct security_hook_list *hooks,
 
 extern int __init security_module_enable(const char *module);
 extern void __init capability_add_hooks(void);
-#ifdef CONFIG_SECURITY_YAMA_STACKED
-void __init yama_add_hooks(void);
+#ifdef CONFIG_SECURITY_YAMA
+extern void __init yama_add_hooks(void);
+#else
+static inline void __init yama_add_hooks(void) { }
 #endif
 
 #endif /* ! __LINUX_LSM_HOOKS_H */
diff --git a/security/Kconfig b/security/Kconfig
index bf4ec46474b6..e45237897b43 100644
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -132,7 +132,6 @@ choice
 	default DEFAULT_SECURITY_SMACK if SECURITY_SMACK
 	default DEFAULT_SECURITY_TOMOYO if SECURITY_TOMOYO
 	default DEFAULT_SECURITY_APPARMOR if SECURITY_APPARMOR
-	default DEFAULT_SECURITY_YAMA if SECURITY_YAMA
 	default DEFAULT_SECURITY_DAC
 
 	help
@@ -151,9 +150,6 @@ choice
 	config DEFAULT_SECURITY_APPARMOR
 		bool "AppArmor" if SECURITY_APPARMOR=y
 
-	config DEFAULT_SECURITY_YAMA
-		bool "Yama" if SECURITY_YAMA=y
-
 	config DEFAULT_SECURITY_DAC
 		bool "Unix Discretionary Access Controls"
 
@@ -165,7 +161,6 @@ config DEFAULT_SECURITY
 	default "smack" if DEFAULT_SECURITY_SMACK
 	default "tomoyo" if DEFAULT_SECURITY_TOMOYO
 	default "apparmor" if DEFAULT_SECURITY_APPARMOR
-	default "yama" if DEFAULT_SECURITY_YAMA
 	default "" if DEFAULT_SECURITY_DAC
 
 endmenu
diff --git a/security/security.c b/security/security.c
index 595fffab48b0..e693ffcf9266 100644
--- a/security/security.c
+++ b/security/security.c
@@ -56,18 +56,13 @@ int __init security_init(void)
 	pr_info("Security Framework initialized\n");
 
 	/*
-	 * Always load the capability module.
+	 * Load minor LSMs, with the capability module always first.
 	 */
 	capability_add_hooks();
-#ifdef CONFIG_SECURITY_YAMA_STACKED
-	/*
-	 * If Yama is configured for stacking load it next.
-	 */
 	yama_add_hooks();
-#endif
+
 	/*
-	 * Load the chosen module if there is one.
-	 * This will also find yama if it is stacking
+	 * Load all the remaining security modules.
 	 */
 	do_security_initcalls();
 
diff --git a/security/yama/Kconfig b/security/yama/Kconfig
index 3123e1da2fed..90c605eea892 100644
--- a/security/yama/Kconfig
+++ b/security/yama/Kconfig
@@ -6,14 +6,7 @@ config SECURITY_YAMA
 	  This selects Yama, which extends DAC support with additional
 	  system-wide security settings beyond regular Linux discretionary
 	  access controls. Currently available is ptrace scope restriction.
+	  Like capabilities, this security module stacks with other LSMs.
 	  Further information can be found in Documentation/security/Yama.txt.
 
 	  If you are unsure how to answer this question, answer N.
-
-config SECURITY_YAMA_STACKED
-	bool "Yama stacked with other LSMs"
-	depends on SECURITY_YAMA
-	default n
-	help
-	  When Yama is built into the kernel, force it to stack with the
-	  selected primary LSM.
diff --git a/security/yama/yama_lsm.c b/security/yama/yama_lsm.c
index 9ed32502470e..d3c19c970a06 100644
--- a/security/yama/yama_lsm.c
+++ b/security/yama/yama_lsm.c
@@ -353,11 +353,6 @@ static struct security_hook_list yama_hooks[] = {
 	LSM_HOOK_INIT(task_free, yama_task_free),
 };
 
-void __init yama_add_hooks(void)
-{
-	security_add_hooks(yama_hooks, ARRAY_SIZE(yama_hooks));
-}
-
 #ifdef CONFIG_SYSCTL
 static int yama_dointvec_minmax(struct ctl_table *table, int write,
 				void __user *buffer, size_t *lenp, loff_t *ppos)
@@ -396,25 +391,18 @@ static struct ctl_table yama_sysctl_table[] = {
 	},
 	{ }
 };
-#endif /* CONFIG_SYSCTL */
-
-static __init int yama_init(void)
+static void __init yama_init_sysctl(void)
 {
-#ifndef CONFIG_SECURITY_YAMA_STACKED
-	/*
-	 * If yama is being stacked this is already taken care of.
-	 */
-	if (!security_module_enable("yama"))
-		return 0;
-#endif
-	pr_info("Yama: becoming mindful.\n");
-
-#ifdef CONFIG_SYSCTL
 	if (!register_sysctl_paths(yama_sysctl_path, yama_sysctl_table))
 		panic("Yama: sysctl registration failed.\n");
-#endif
-
-	return 0;
 }
+#else
+static inline void yama_init_sysctl(void) { }
+#endif /* CONFIG_SYSCTL */
 
-security_initcall(yama_init);
+void __init yama_add_hooks(void)
+{
+	pr_info("Yama: becoming mindful.\n");
+	security_add_hooks(yama_hooks, ARRAY_SIZE(yama_hooks));
+	yama_init_sysctl();
+}
-- 
cgit v1.2.3


From 21abb1ec414c75abe32c3854848ff30e2b4a6113 Mon Sep 17 00:00:00 2001
From: Casey Schaufler <casey@schaufler-ca.com>
Date: Wed, 22 Jul 2015 14:25:31 -0700
Subject: Smack: IPv6 host labeling

IPv6 appears to be (finally) coming of age with the
influx of autonomous devices. In support of this, add
the ability to associate a Smack label with IPv6 addresses.

This patch also cleans up some of the conditional
compilation associated with the introduction of
secmark processing. It's now more obvious which bit
of code goes with which feature.

Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
---
 Documentation/security/Smack.txt |  27 ++-
 security/smack/smack.h           |  48 ++++-
 security/smack/smack_lsm.c       | 262 +++++++++++++++++-------
 security/smack/smackfs.c         | 428 ++++++++++++++++++++++++++++++++-------
 4 files changed, 604 insertions(+), 161 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/security/Smack.txt b/Documentation/security/Smack.txt
index de5e1aeca7fb..5e6d07fbed07 100644
--- a/Documentation/security/Smack.txt
+++ b/Documentation/security/Smack.txt
@@ -28,6 +28,10 @@ Smack kernels use the CIPSO IP option. Some network
 configurations are intolerant of IP options and can impede
 access to systems that use them as Smack does.
 
+Smack is used in the Tizen operating system. Please
+go to http://wiki.tizen.org for information about how
+Smack is used in Tizen.
+
 The current git repository for Smack user space is:
 
 	git://github.com/smack-team/smack.git
@@ -108,6 +112,8 @@ in the smackfs filesystem. This pseudo-filesystem is mounted
 on /sys/fs/smackfs.
 
 access
+	Provided for backward compatibility. The access2 interface
+	is preferred and should be used instead.
 	This interface reports whether a subject with the specified
 	Smack label has a particular access to an object with a
 	specified Smack label. Write a fixed format access rule to
@@ -136,6 +142,8 @@ change-rule
 	those in the fourth string. If there is no such rule it will be
 	created using the access specified in the third and the fourth strings.
 cipso
+	Provided for backward compatibility. The cipso2 interface
+	is preferred and should be used instead.
 	This interface allows a specific CIPSO header to be assigned
 	to a Smack label. The format accepted on write is:
 		"%24s%4d%4d"["%4d"]...
@@ -157,7 +165,19 @@ direct
 doi
 	This contains the CIPSO domain of interpretation used in
 	network packets.
+ipv6host
+	This interface allows specific IPv6 internet addresses to be
+	treated as single label hosts. Packets are sent to single
+	label hosts only from processes that have Smack write access
+	to the host label. All packets received from single label hosts
+	are given the specified label. The format accepted on write is:
+		"%h:%h:%h:%h:%h:%h:%h:%h label" or
+		"%h:%h:%h:%h:%h:%h:%h:%h/%d label".
+	The "::" address shortcut is not supported.
+	If label is "-DELETE" a matched entry will be deleted.
 load
+	Provided for backward compatibility. The load2 interface
+	is preferred and should be used instead.
 	This interface allows access control rules in addition to
 	the system defined rules to be specified. The format accepted
 	on write is:
@@ -181,6 +201,8 @@ load2
 	permissions that are not allowed. The string "r-x--" would
 	specify read and execute access.
 load-self
+	Provided for backward compatibility. The load-self2 interface
+	is preferred and should be used instead.
 	This interface allows process specific access rules to be
 	defined. These rules are only consulted if access would
 	otherwise be permitted, and are intended to provide additional
@@ -205,6 +227,8 @@ netlabel
 	received from single label hosts are given the specified
 	label. The format accepted on write is:
 		"%d.%d.%d.%d label" or "%d.%d.%d.%d/%d label".
+	If the label specified is "-CIPSO" the address is treated
+	as a host that supports CIPSO headers.
 onlycap
 	This contains labels processes must have for CAP_MAC_ADMIN
 	and CAP_MAC_OVERRIDE to be effective. If this file is empty
@@ -232,7 +256,8 @@ unconfined
 	is dangerous and can ruin the proper labeling of your system.
 	It should never be used in production.
 
-You can add access rules in /etc/smack/accesses. They take the form:
+If you are using the smackload utility
+you can add access rules in /etc/smack/accesses. They take the form:
 
     subjectlabel objectlabel access
 
diff --git a/security/smack/smack.h b/security/smack/smack.h
index 69ab9eb7d6d9..fff0c612bbb7 100644
--- a/security/smack/smack.h
+++ b/security/smack/smack.h
@@ -17,11 +17,26 @@
 #include <linux/spinlock.h>
 #include <linux/lsm_hooks.h>
 #include <linux/in.h>
+#if IS_ENABLED(CONFIG_IPV6)
+#include <linux/in6.h>
+#endif /* CONFIG_IPV6 */
 #include <net/netlabel.h>
 #include <linux/list.h>
 #include <linux/rculist.h>
 #include <linux/lsm_audit.h>
 
+/*
+ * Use IPv6 port labeling if IPv6 is enabled and secmarks
+ * are not being used.
+ */
+#if IS_ENABLED(CONFIG_IPV6) && !defined(CONFIG_SECURITY_SMACK_NETFILTER)
+#define SMACK_IPV6_PORT_LABELING 1
+#endif
+
+#if IS_ENABLED(CONFIG_IPV6) && defined(CONFIG_SECURITY_SMACK_NETFILTER)
+#define SMACK_IPV6_SECMARK_LABELING 1
+#endif
+
 /*
  * Smack labels were limited to 23 characters for a long time.
  */
@@ -118,15 +133,30 @@ struct smack_rule {
 };
 
 /*
- * An entry in the table identifying hosts.
+ * An entry in the table identifying IPv4 hosts.
  */
-struct smk_netlbladdr {
+struct smk_net4addr {
 	struct list_head	list;
-	struct sockaddr_in	smk_host;	/* network address */
+	struct in_addr		smk_host;	/* network address */
 	struct in_addr		smk_mask;	/* network mask */
+	int			smk_masks;	/* mask size */
+	struct smack_known	*smk_label;	/* label */
+};
+
+#if IS_ENABLED(CONFIG_IPV6)
+/*
+ * An entry in the table identifying IPv6 hosts.
+ */
+struct smk_net6addr {
+	struct list_head	list;
+	struct in6_addr		smk_host;	/* network address */
+	struct in6_addr		smk_mask;	/* network mask */
+	int			smk_masks;	/* mask size */
 	struct smack_known	*smk_label;	/* label */
 };
+#endif /* CONFIG_IPV6 */
 
+#ifdef SMACK_IPV6_PORT_LABELING
 /*
  * An entry in the table identifying ports.
  */
@@ -137,6 +167,7 @@ struct smk_port_label {
 	struct smack_known	*smk_in;	/* inbound label */
 	struct smack_known	*smk_out;	/* outgoing label */
 };
+#endif /* SMACK_IPV6_PORT_LABELING */
 
 struct smack_onlycap {
 	struct list_head	list;
@@ -170,6 +201,7 @@ enum {
 #define SMK_FSROOT	"smackfsroot="
 #define SMK_FSTRANS	"smackfstransmute="
 
+#define SMACK_DELETE_OPTION	"-DELETE"
 #define SMACK_CIPSO_OPTION 	"-CIPSO"
 
 /*
@@ -252,10 +284,6 @@ struct smk_audit_info {
 	struct smack_audit_data sad;
 #endif
 };
-/*
- * These functions are in smack_lsm.c
- */
-struct inode_smack *new_inode_smack(struct smack_known *);
 
 /*
  * These functions are in smack_access.c
@@ -285,7 +313,6 @@ extern struct smack_known *smack_syslog_label;
 #ifdef CONFIG_SECURITY_SMACK_BRINGUP
 extern struct smack_known *smack_unconfined;
 #endif
-extern struct smack_known smack_cipso_option;
 extern int smack_ptrace_rule;
 
 extern struct smack_known smack_known_floor;
@@ -297,7 +324,10 @@ extern struct smack_known smack_known_web;
 
 extern struct mutex	smack_known_lock;
 extern struct list_head smack_known_list;
-extern struct list_head smk_netlbladdr_list;
+extern struct list_head smk_net4addr_list;
+#if IS_ENABLED(CONFIG_IPV6)
+extern struct list_head smk_net6addr_list;
+#endif /* CONFIG_IPV6 */
 
 extern struct mutex     smack_onlycap_lock;
 extern struct list_head smack_onlycap_list;
diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c
index d962f887d3f4..cc390bccecd7 100644
--- a/security/smack/smack_lsm.c
+++ b/security/smack/smack_lsm.c
@@ -51,9 +51,9 @@
 #define SMK_RECEIVING	1
 #define SMK_SENDING	2
 
-#if IS_ENABLED(CONFIG_IPV6) && !defined(CONFIG_SECURITY_SMACK_NETFILTER)
+#ifdef SMACK_IPV6_PORT_LABELING
 LIST_HEAD(smk_ipv6_port_list);
-#endif /* CONFIG_IPV6 && !CONFIG_SECURITY_SMACK_NETFILTER */
+#endif
 static struct kmem_cache *smack_inode_cache;
 int smack_enabled;
 
@@ -2272,7 +2272,7 @@ static void smack_sk_free_security(struct sock *sk)
 }
 
 /**
-* smack_host_label - check host based restrictions
+* smack_ipv4host_label - check host based restrictions
 * @sip: the object end
 *
 * looks for host based access restrictions
@@ -2283,30 +2283,96 @@ static void smack_sk_free_security(struct sock *sk)
 *
 * Returns the label of the far end or NULL if it's not special.
 */
-static struct smack_known *smack_host_label(struct sockaddr_in *sip)
+static struct smack_known *smack_ipv4host_label(struct sockaddr_in *sip)
 {
-	struct smk_netlbladdr *snp;
+	struct smk_net4addr *snp;
 	struct in_addr *siap = &sip->sin_addr;
 
 	if (siap->s_addr == 0)
 		return NULL;
 
-	list_for_each_entry_rcu(snp, &smk_netlbladdr_list, list)
+	list_for_each_entry_rcu(snp, &smk_net4addr_list, list)
+		/*
+		 * we break after finding the first match because
+		 * the list is sorted from longest to shortest mask
+		 * so we have found the most specific match
+		 */
+		if (snp->smk_host.s_addr ==
+		    (siap->s_addr & snp->smk_mask.s_addr))
+			return snp->smk_label;
+
+	return NULL;
+}
+
+#if IS_ENABLED(CONFIG_IPV6)
+/*
+ * smk_ipv6_localhost - Check for local ipv6 host address
+ * @sip: the address
+ *
+ * Returns boolean true if this is the localhost address
+ */
+static bool smk_ipv6_localhost(struct sockaddr_in6 *sip)
+{
+	__be16 *be16p = (__be16 *)&sip->sin6_addr;
+	__be32 *be32p = (__be32 *)&sip->sin6_addr;
+
+	if (be32p[0] == 0 && be32p[1] == 0 && be32p[2] == 0 && be16p[6] == 0 &&
+	    ntohs(be16p[7]) == 1)
+		return true;
+	return false;
+}
+
+/**
+* smack_ipv6host_label - check host based restrictions
+* @sip: the object end
+*
+* looks for host based access restrictions
+*
+* This version will only be appropriate for really small sets of single label
+* hosts.  The caller is responsible for ensuring that the RCU read lock is
+* taken before calling this function.
+*
+* Returns the label of the far end or NULL if it's not special.
+*/
+static struct smack_known *smack_ipv6host_label(struct sockaddr_in6 *sip)
+{
+	struct smk_net6addr *snp;
+	struct in6_addr *sap = &sip->sin6_addr;
+	int i;
+	int found = 0;
+
+	/*
+	 * It's local. Don't look for a host label.
+	 */
+	if (smk_ipv6_localhost(sip))
+		return NULL;
+
+	list_for_each_entry_rcu(snp, &smk_net6addr_list, list) {
 		/*
 		* we break after finding the first match because
 		* the list is sorted from longest to shortest mask
 		* so we have found the most specific match
 		*/
-		if ((&snp->smk_host.sin_addr)->s_addr ==
-		    (siap->s_addr & (&snp->smk_mask)->s_addr)) {
-			/* we have found the special CIPSO option */
-			if (snp->smk_label == &smack_cipso_option)
-				return NULL;
-			return snp->smk_label;
+		for (found = 1, i = 0; i < 8; i++) {
+			/*
+			 * If the label is NULL the entry has
+			 * been renounced. Ignore it.
+			 */
+			if (snp->smk_label == NULL)
+				continue;
+			if ((sap->s6_addr16[i] & snp->smk_mask.s6_addr16[i]) !=
+			    snp->smk_host.s6_addr16[i]) {
+				found = 0;
+				break;
+			}
 		}
+		if (found)
+			return snp->smk_label;
+	}
 
 	return NULL;
 }
+#endif /* CONFIG_IPV6 */
 
 /**
  * smack_netlabel - Set the secattr on a socket
@@ -2370,7 +2436,7 @@ static int smack_netlabel_send(struct sock *sk, struct sockaddr_in *sap)
 	struct smk_audit_info ad;
 
 	rcu_read_lock();
-	hkp = smack_host_label(sap);
+	hkp = smack_ipv4host_label(sap);
 	if (hkp != NULL) {
 #ifdef CONFIG_AUDIT
 		struct lsm_network_audit net;
@@ -2395,7 +2461,42 @@ static int smack_netlabel_send(struct sock *sk, struct sockaddr_in *sap)
 	return smack_netlabel(sk, sk_lbl);
 }
 
-#if IS_ENABLED(CONFIG_IPV6) && !defined(CONFIG_SECURITY_SMACK_NETFILTER)
+#if IS_ENABLED(CONFIG_IPV6)
+/**
+ * smk_ipv6_check - check Smack access
+ * @subject: subject Smack label
+ * @object: object Smack label
+ * @address: address
+ * @act: the action being taken
+ *
+ * Check an IPv6 access
+ */
+static int smk_ipv6_check(struct smack_known *subject,
+				struct smack_known *object,
+				struct sockaddr_in6 *address, int act)
+{
+#ifdef CONFIG_AUDIT
+	struct lsm_network_audit net;
+#endif
+	struct smk_audit_info ad;
+	int rc;
+
+#ifdef CONFIG_AUDIT
+	smk_ad_init_net(&ad, __func__, LSM_AUDIT_DATA_NET, &net);
+	ad.a.u.net->family = PF_INET6;
+	ad.a.u.net->dport = ntohs(address->sin6_port);
+	if (act == SMK_RECEIVING)
+		ad.a.u.net->v6info.saddr = address->sin6_addr;
+	else
+		ad.a.u.net->v6info.daddr = address->sin6_addr;
+#endif
+	rc = smk_access(subject, object, MAY_WRITE, &ad);
+	rc = smk_bu_note("IPv6 check", subject, object, MAY_WRITE, rc);
+	return rc;
+}
+#endif /* CONFIG_IPV6 */
+
+#ifdef SMACK_IPV6_PORT_LABELING
 /**
  * smk_ipv6_port_label - Smack port access table management
  * @sock: socket
@@ -2479,48 +2580,43 @@ static void smk_ipv6_port_label(struct socket *sock, struct sockaddr *address)
 static int smk_ipv6_port_check(struct sock *sk, struct sockaddr_in6 *address,
 				int act)
 {
-	__be16 *bep;
-	__be32 *be32p;
 	struct smk_port_label *spp;
 	struct socket_smack *ssp = sk->sk_security;
-	struct smack_known *skp;
-	unsigned short port = 0;
+	struct smack_known *skp = NULL;
+	unsigned short port;
 	struct smack_known *object;
-	struct smk_audit_info ad;
-	int rc;
-#ifdef CONFIG_AUDIT
-	struct lsm_network_audit net;
-#endif
 
 	if (act == SMK_RECEIVING) {
-		skp = smack_net_ambient;
+		skp = smack_ipv6host_label(address);
 		object = ssp->smk_in;
 	} else {
 		skp = ssp->smk_out;
-		object = smack_net_ambient;
+		object = smack_ipv6host_label(address);
 	}
 
 	/*
-	 * Get the IP address and port from the address.
+	 * The other end is a single label host.
 	 */
-	port = ntohs(address->sin6_port);
-	bep = (__be16 *)(&address->sin6_addr);
-	be32p = (__be32 *)(&address->sin6_addr);
+	if (skp != NULL && object != NULL)
+		return smk_ipv6_check(skp, object, address, act);
+	if (skp == NULL)
+		skp = smack_net_ambient;
+	if (object == NULL)
+		object = smack_net_ambient;
 
 	/*
 	 * It's remote, so port lookup does no good.
 	 */
-	if (be32p[0] || be32p[1] || be32p[2] || bep[6] || ntohs(bep[7]) != 1)
-		goto auditout;
+	if (!smk_ipv6_localhost(address))
+		return smk_ipv6_check(skp, object, address, act);
 
 	/*
 	 * It's local so the send check has to have passed.
 	 */
-	if (act == SMK_RECEIVING) {
-		skp = &smack_known_web;
-		goto auditout;
-	}
+	if (act == SMK_RECEIVING)
+		return 0;
 
+	port = ntohs(address->sin6_port);
 	list_for_each_entry(spp, &smk_ipv6_port_list, list) {
 		if (spp->smk_port != port)
 			continue;
@@ -2530,22 +2626,9 @@ static int smk_ipv6_port_check(struct sock *sk, struct sockaddr_in6 *address,
 		break;
 	}
 
-auditout:
-
-#ifdef CONFIG_AUDIT
-	smk_ad_init_net(&ad, __func__, LSM_AUDIT_DATA_NET, &net);
-	ad.a.u.net->family = sk->sk_family;
-	ad.a.u.net->dport = port;
-	if (act == SMK_RECEIVING)
-		ad.a.u.net->v6info.saddr = address->sin6_addr;
-	else
-		ad.a.u.net->v6info.daddr = address->sin6_addr;
-#endif
-	rc = smk_access(skp, object, MAY_WRITE, &ad);
-	rc = smk_bu_note("IPv6 port check", skp, object, MAY_WRITE, rc);
-	return rc;
+	return smk_ipv6_check(skp, object, address, act);
 }
-#endif /* CONFIG_IPV6 && !CONFIG_SECURITY_SMACK_NETFILTER */
+#endif /* SMACK_IPV6_PORT_LABELING */
 
 /**
  * smack_inode_setsecurity - set smack xattrs
@@ -2606,10 +2689,10 @@ static int smack_inode_setsecurity(struct inode *inode, const char *name,
 	} else
 		return -EOPNOTSUPP;
 
-#if IS_ENABLED(CONFIG_IPV6) && !defined(CONFIG_SECURITY_SMACK_NETFILTER)
+#ifdef SMACK_IPV6_PORT_LABELING
 	if (sock->sk->sk_family == PF_INET6)
 		smk_ipv6_port_label(sock, NULL);
-#endif /* CONFIG_IPV6 && !CONFIG_SECURITY_SMACK_NETFILTER */
+#endif
 
 	return 0;
 }
@@ -2651,7 +2734,7 @@ static int smack_socket_post_create(struct socket *sock, int family,
 	return smack_netlabel(sock->sk, SMACK_CIPSO_SOCKET);
 }
 
-#ifndef CONFIG_SECURITY_SMACK_NETFILTER
+#ifdef SMACK_IPV6_PORT_LABELING
 /**
  * smack_socket_bind - record port binding information.
  * @sock: the socket
@@ -2665,14 +2748,11 @@ static int smack_socket_post_create(struct socket *sock, int family,
 static int smack_socket_bind(struct socket *sock, struct sockaddr *address,
 				int addrlen)
 {
-#if IS_ENABLED(CONFIG_IPV6)
 	if (sock->sk != NULL && sock->sk->sk_family == PF_INET6)
 		smk_ipv6_port_label(sock, address);
-#endif
-
 	return 0;
 }
-#endif /* !CONFIG_SECURITY_SMACK_NETFILTER */
+#endif /* SMACK_IPV6_PORT_LABELING */
 
 /**
  * smack_socket_connect - connect access check
@@ -2688,6 +2768,13 @@ static int smack_socket_connect(struct socket *sock, struct sockaddr *sap,
 				int addrlen)
 {
 	int rc = 0;
+#if IS_ENABLED(CONFIG_IPV6)
+	struct sockaddr_in6 *sip = (struct sockaddr_in6 *)sap;
+#endif
+#ifdef SMACK_IPV6_SECMARK_LABELING
+	struct smack_known *rsp;
+	struct socket_smack *ssp = sock->sk->sk_security;
+#endif
 
 	if (sock->sk == NULL)
 		return 0;
@@ -2701,10 +2788,15 @@ static int smack_socket_connect(struct socket *sock, struct sockaddr *sap,
 	case PF_INET6:
 		if (addrlen < sizeof(struct sockaddr_in6))
 			return -EINVAL;
-#if IS_ENABLED(CONFIG_IPV6) && !defined(CONFIG_SECURITY_SMACK_NETFILTER)
-		rc = smk_ipv6_port_check(sock->sk, (struct sockaddr_in6 *)sap,
+#ifdef SMACK_IPV6_SECMARK_LABELING
+		rsp = smack_ipv6host_label(sip);
+		if (rsp != NULL)
+			rc = smk_ipv6_check(ssp->smk_out, rsp, sip,
 						SMK_CONNECTING);
-#endif /* CONFIG_IPV6 && !CONFIG_SECURITY_SMACK_NETFILTER */
+#endif
+#ifdef SMACK_IPV6_PORT_LABELING
+		rc = smk_ipv6_port_check(sock->sk, sip, SMK_CONNECTING);
+#endif
 		break;
 	}
 	return rc;
@@ -3590,9 +3682,13 @@ static int smack_socket_sendmsg(struct socket *sock, struct msghdr *msg,
 				int size)
 {
 	struct sockaddr_in *sip = (struct sockaddr_in *) msg->msg_name;
-#if IS_ENABLED(CONFIG_IPV6) && !defined(CONFIG_SECURITY_SMACK_NETFILTER)
+#if IS_ENABLED(CONFIG_IPV6)
 	struct sockaddr_in6 *sap = (struct sockaddr_in6 *) msg->msg_name;
-#endif /* CONFIG_IPV6 && !CONFIG_SECURITY_SMACK_NETFILTER */
+#endif
+#ifdef SMACK_IPV6_SECMARK_LABELING
+	struct socket_smack *ssp = sock->sk->sk_security;
+	struct smack_known *rsp;
+#endif
 	int rc = 0;
 
 	/*
@@ -3606,9 +3702,15 @@ static int smack_socket_sendmsg(struct socket *sock, struct msghdr *msg,
 		rc = smack_netlabel_send(sock->sk, sip);
 		break;
 	case AF_INET6:
-#if IS_ENABLED(CONFIG_IPV6) && !defined(CONFIG_SECURITY_SMACK_NETFILTER)
+#ifdef SMACK_IPV6_SECMARK_LABELING
+		rsp = smack_ipv6host_label(sap);
+		if (rsp != NULL)
+			rc = smk_ipv6_check(ssp->smk_out, rsp, sap,
+						SMK_CONNECTING);
+#endif
+#ifdef SMACK_IPV6_PORT_LABELING
 		rc = smk_ipv6_port_check(sock->sk, sap, SMK_SENDING);
-#endif /* CONFIG_IPV6 && !CONFIG_SECURITY_SMACK_NETFILTER */
+#endif
 		break;
 	}
 	return rc;
@@ -3822,10 +3924,12 @@ access_check:
 		proto = smk_skb_to_addr_ipv6(skb, &sadd);
 		if (proto != IPPROTO_UDP && proto != IPPROTO_TCP)
 			break;
-#ifdef CONFIG_SECURITY_SMACK_NETFILTER
+#ifdef SMACK_IPV6_SECMARK_LABELING
 		if (skb && skb->secmark != 0)
 			skp = smack_from_secid(skb->secmark);
 		else
+			skp = smack_ipv6host_label(&sadd);
+		if (skp == NULL)
 			skp = smack_net_ambient;
 #ifdef CONFIG_AUDIT
 		smk_ad_init_net(&ad, __func__, LSM_AUDIT_DATA_NET, &net);
@@ -3836,9 +3940,10 @@ access_check:
 		rc = smk_access(skp, ssp->smk_in, MAY_WRITE, &ad);
 		rc = smk_bu_note("IPv6 delivery", skp, ssp->smk_in,
 					MAY_WRITE, rc);
-#else /* CONFIG_SECURITY_SMACK_NETFILTER */
+#endif /* SMACK_IPV6_SECMARK_LABELING */
+#ifdef SMACK_IPV6_PORT_LABELING
 		rc = smk_ipv6_port_check(sk, &sadd, SMK_RECEIVING);
-#endif /* CONFIG_SECURITY_SMACK_NETFILTER */
+#endif /* SMACK_IPV6_PORT_LABELING */
 		break;
 #endif /* CONFIG_IPV6 */
 	}
@@ -3936,13 +4041,11 @@ static int smack_socket_getpeersec_dgram(struct socket *sock,
 		}
 		netlbl_secattr_destroy(&secattr);
 		break;
-#if IS_ENABLED(CONFIG_IPV6)
 	case PF_INET6:
-#ifdef CONFIG_SECURITY_SMACK_NETFILTER
+#ifdef SMACK_IPV6_SECMARK_LABELING
 		s = skb->secmark;
-#endif /* CONFIG_SECURITY_SMACK_NETFILTER */
+#endif
 		break;
-#endif /* CONFIG_IPV6 */
 	}
 	*secid = s;
 	if (s == 0)
@@ -4065,7 +4168,7 @@ access_check:
 	hdr = ip_hdr(skb);
 	addr.sin_addr.s_addr = hdr->saddr;
 	rcu_read_lock();
-	hskp = smack_host_label(&addr);
+	hskp = smack_ipv4host_label(&addr);
 	rcu_read_unlock();
 
 	if (hskp == NULL)
@@ -4517,9 +4620,9 @@ struct security_hook_list smack_hooks[] = {
 	LSM_HOOK_INIT(unix_may_send, smack_unix_may_send),
 
 	LSM_HOOK_INIT(socket_post_create, smack_socket_post_create),
-#ifndef CONFIG_SECURITY_SMACK_NETFILTER
+#ifdef SMACK_IPV6_PORT_LABELING
 	LSM_HOOK_INIT(socket_bind, smack_socket_bind),
-#endif /* CONFIG_SECURITY_SMACK_NETFILTER */
+#endif
 	LSM_HOOK_INIT(socket_connect, smack_socket_connect),
 	LSM_HOOK_INIT(socket_sendmsg, smack_socket_sendmsg),
 	LSM_HOOK_INIT(socket_sock_rcv_skb, smack_socket_sock_rcv_skb),
@@ -4614,7 +4717,16 @@ static __init int smack_init(void)
 		return -ENOMEM;
 	}
 
-	printk(KERN_INFO "Smack:  Initializing.\n");
+	pr_info("Smack:  Initializing.\n");
+#ifdef CONFIG_SECURITY_SMACK_NETFILTER
+	pr_info("Smack:  Netfilter enabled.\n");
+#endif
+#ifdef SMACK_IPV6_PORT_LABELING
+	pr_info("Smack:  IPv6 port labeling enabled.\n");
+#endif
+#ifdef SMACK_IPV6_SECMARK_LABELING
+	pr_info("Smack:  IPv6 Netfilter enabled.\n");
+#endif
 
 	/*
 	 * Set the security state for the initial task.
diff --git a/security/smack/smackfs.c b/security/smack/smackfs.c
index 81a2888a9908..11b752b366ea 100644
--- a/security/smack/smackfs.c
+++ b/security/smack/smackfs.c
@@ -29,6 +29,7 @@
 #include <linux/magic.h>
 #include "smack.h"
 
+#define BEBITS	(sizeof(__be32) * 8)
 /*
  * smackfs pseudo filesystem.
  */
@@ -40,7 +41,7 @@ enum smk_inos {
 	SMK_DOI		= 5,	/* CIPSO DOI */
 	SMK_DIRECT	= 6,	/* CIPSO level indicating direct label */
 	SMK_AMBIENT	= 7,	/* internet ambient label */
-	SMK_NETLBLADDR	= 8,	/* single label hosts */
+	SMK_NET4ADDR	= 8,	/* single label hosts */
 	SMK_ONLYCAP	= 9,	/* the only "capable" label */
 	SMK_LOGGING	= 10,	/* logging */
 	SMK_LOAD_SELF	= 11,	/* task specific rules */
@@ -57,6 +58,9 @@ enum smk_inos {
 #ifdef CONFIG_SECURITY_SMACK_BRINGUP
 	SMK_UNCONFINED	= 22,	/* define an unconfined label */
 #endif
+#if IS_ENABLED(CONFIG_IPV6)
+	SMK_NET6ADDR	= 23,	/* single label IPv6 hosts */
+#endif /* CONFIG_IPV6 */
 };
 
 /*
@@ -64,7 +68,10 @@ enum smk_inos {
  */
 static DEFINE_MUTEX(smack_cipso_lock);
 static DEFINE_MUTEX(smack_ambient_lock);
-static DEFINE_MUTEX(smk_netlbladdr_lock);
+static DEFINE_MUTEX(smk_net4addr_lock);
+#if IS_ENABLED(CONFIG_IPV6)
+static DEFINE_MUTEX(smk_net6addr_lock);
+#endif /* CONFIG_IPV6 */
 
 /*
  * This is the "ambient" label for network traffic.
@@ -118,7 +125,10 @@ int smack_ptrace_rule = SMACK_PTRACE_DEFAULT;
  * can write to the specified label.
  */
 
-LIST_HEAD(smk_netlbladdr_list);
+LIST_HEAD(smk_net4addr_list);
+#if IS_ENABLED(CONFIG_IPV6)
+LIST_HEAD(smk_net6addr_list);
+#endif /* CONFIG_IPV6 */
 
 /*
  * Rule lists are maintained for each label.
@@ -140,11 +150,6 @@ struct smack_parsed_rule {
 
 static int smk_cipso_doi_value = SMACK_CIPSO_DOI_DEFAULT;
 
-struct smack_known smack_cipso_option = {
-	.smk_known	= SMACK_CIPSO_OPTION,
-	.smk_secid	= 0,
-};
-
 /*
  * Values for parsing cipso rules
  * SMK_DIGITLEN: Length of a digit field in a rule.
@@ -1047,92 +1052,90 @@ static const struct file_operations smk_cipso2_ops = {
  * Seq_file read operations for /smack/netlabel
  */
 
-static void *netlbladdr_seq_start(struct seq_file *s, loff_t *pos)
+static void *net4addr_seq_start(struct seq_file *s, loff_t *pos)
 {
-	return smk_seq_start(s, pos, &smk_netlbladdr_list);
+	return smk_seq_start(s, pos, &smk_net4addr_list);
 }
 
-static void *netlbladdr_seq_next(struct seq_file *s, void *v, loff_t *pos)
+static void *net4addr_seq_next(struct seq_file *s, void *v, loff_t *pos)
 {
-	return smk_seq_next(s, v, pos, &smk_netlbladdr_list);
+	return smk_seq_next(s, v, pos, &smk_net4addr_list);
 }
-#define BEBITS	(sizeof(__be32) * 8)
 
 /*
  * Print host/label pairs
  */
-static int netlbladdr_seq_show(struct seq_file *s, void *v)
+static int net4addr_seq_show(struct seq_file *s, void *v)
 {
 	struct list_head *list = v;
-	struct smk_netlbladdr *skp =
-			list_entry_rcu(list, struct smk_netlbladdr, list);
-	unsigned char *hp = (char *) &skp->smk_host.sin_addr.s_addr;
-	int maskn;
-	u32 temp_mask = be32_to_cpu(skp->smk_mask.s_addr);
-
-	for (maskn = 0; temp_mask; temp_mask <<= 1, maskn++);
+	struct smk_net4addr *skp =
+			list_entry_rcu(list, struct smk_net4addr, list);
+	char *kp = SMACK_CIPSO_OPTION;
 
-	seq_printf(s, "%u.%u.%u.%u/%d %s\n",
-		hp[0], hp[1], hp[2], hp[3], maskn, skp->smk_label->smk_known);
+	if (skp->smk_label != NULL)
+		kp = skp->smk_label->smk_known;
+	seq_printf(s, "%pI4/%d %s\n", &skp->smk_host.s_addr,
+			skp->smk_masks, kp);
 
 	return 0;
 }
 
-static const struct seq_operations netlbladdr_seq_ops = {
-	.start = netlbladdr_seq_start,
-	.next  = netlbladdr_seq_next,
-	.show  = netlbladdr_seq_show,
+static const struct seq_operations net4addr_seq_ops = {
+	.start = net4addr_seq_start,
+	.next  = net4addr_seq_next,
+	.show  = net4addr_seq_show,
 	.stop  = smk_seq_stop,
 };
 
 /**
- * smk_open_netlbladdr - open() for /smack/netlabel
+ * smk_open_net4addr - open() for /smack/netlabel
  * @inode: inode structure representing file
  * @file: "netlabel" file pointer
  *
- * Connect our netlbladdr_seq_* operations with /smack/netlabel
+ * Connect our net4addr_seq_* operations with /smack/netlabel
  * file_operations
  */
-static int smk_open_netlbladdr(struct inode *inode, struct file *file)
+static int smk_open_net4addr(struct inode *inode, struct file *file)
 {
-	return seq_open(file, &netlbladdr_seq_ops);
+	return seq_open(file, &net4addr_seq_ops);
 }
 
 /**
- * smk_netlbladdr_insert
+ * smk_net4addr_insert
  * @new : netlabel to insert
  *
- * This helper insert netlabel in the smack_netlbladdrs list
+ * This helper insert netlabel in the smack_net4addrs list
  * sorted by netmask length (longest to smallest)
- * locked by &smk_netlbladdr_lock in smk_write_netlbladdr
+ * locked by &smk_net4addr_lock in smk_write_net4addr
  *
  */
-static void smk_netlbladdr_insert(struct smk_netlbladdr *new)
+static void smk_net4addr_insert(struct smk_net4addr *new)
 {
-	struct smk_netlbladdr *m, *m_next;
+	struct smk_net4addr *m;
+	struct smk_net4addr *m_next;
 
-	if (list_empty(&smk_netlbladdr_list)) {
-		list_add_rcu(&new->list, &smk_netlbladdr_list);
+	if (list_empty(&smk_net4addr_list)) {
+		list_add_rcu(&new->list, &smk_net4addr_list);
 		return;
 	}
 
-	m = list_entry_rcu(smk_netlbladdr_list.next,
-			   struct smk_netlbladdr, list);
+	m = list_entry_rcu(smk_net4addr_list.next,
+			   struct smk_net4addr, list);
 
 	/* the comparison '>' is a bit hacky, but works */
-	if (new->smk_mask.s_addr > m->smk_mask.s_addr) {
-		list_add_rcu(&new->list, &smk_netlbladdr_list);
+	if (new->smk_masks > m->smk_masks) {
+		list_add_rcu(&new->list, &smk_net4addr_list);
 		return;
 	}
 
-	list_for_each_entry_rcu(m, &smk_netlbladdr_list, list) {
-		if (list_is_last(&m->list, &smk_netlbladdr_list)) {
+	list_for_each_entry_rcu(m, &smk_net4addr_list, list) {
+		if (list_is_last(&m->list, &smk_net4addr_list)) {
 			list_add_rcu(&new->list, &m->list);
 			return;
 		}
 		m_next = list_entry_rcu(m->list.next,
-					struct smk_netlbladdr, list);
-		if (new->smk_mask.s_addr > m_next->smk_mask.s_addr) {
+					struct smk_net4addr, list);
+		if (new->smk_masks > m_next->smk_masks) {
 			list_add_rcu(&new->list, &m->list);
 			return;
 		}
@@ -1141,28 +1144,29 @@ static void smk_netlbladdr_insert(struct smk_netlbladdr *new)
 
 
 /**
- * smk_write_netlbladdr - write() for /smack/netlabel
+ * smk_write_net4addr - write() for /smack/netlabel
  * @file: file pointer, not actually used
  * @buf: where to get the data from
  * @count: bytes sent
  * @ppos: where to start
  *
- * Accepts only one netlbladdr per write call.
+ * Accepts only one net4addr per write call.
  * Returns number of bytes written or error code, as appropriate
  */
-static ssize_t smk_write_netlbladdr(struct file *file, const char __user *buf,
+static ssize_t smk_write_net4addr(struct file *file, const char __user *buf,
 				size_t count, loff_t *ppos)
 {
-	struct smk_netlbladdr *snp;
+	struct smk_net4addr *snp;
 	struct sockaddr_in newname;
 	char *smack;
-	struct smack_known *skp;
+	struct smack_known *skp = NULL;
 	char *data;
 	char *host = (char *)&newname.sin_addr.s_addr;
 	int rc;
 	struct netlbl_audit audit_info;
 	struct in_addr mask;
 	unsigned int m;
+	unsigned int masks;
 	int found;
 	u32 mask_bits = (1<<31);
 	__be32 nsa;
@@ -1200,7 +1204,7 @@ static ssize_t smk_write_netlbladdr(struct file *file, const char __user *buf,
 	data[count] = '\0';
 
 	rc = sscanf(data, "%hhd.%hhd.%hhd.%hhd/%u %s",
-		&host[0], &host[1], &host[2], &host[3], &m, smack);
+		&host[0], &host[1], &host[2], &host[3], &masks, smack);
 	if (rc != 6) {
 		rc = sscanf(data, "%hhd.%hhd.%hhd.%hhd %s",
 			&host[0], &host[1], &host[2], &host[3], smack);
@@ -1209,8 +1213,9 @@ static ssize_t smk_write_netlbladdr(struct file *file, const char __user *buf,
 			goto free_out;
 		}
 		m = BEBITS;
+		masks = 32;
 	}
-	if (m > BEBITS) {
+	if (masks > BEBITS) {
 		rc = -EINVAL;
 		goto free_out;
 	}
@@ -1225,16 +1230,16 @@ static ssize_t smk_write_netlbladdr(struct file *file, const char __user *buf,
 			goto free_out;
 		}
 	} else {
-		/* check known options */
-		if (strcmp(smack, smack_cipso_option.smk_known) == 0)
-			skp = &smack_cipso_option;
-		else {
+		/*
+		 * Only the -CIPSO option is supported for IPv4
+		 */
+		if (strcmp(smack, SMACK_CIPSO_OPTION) != 0) {
 			rc = -EINVAL;
 			goto free_out;
 		}
 	}
 
-	for (temp_mask = 0; m > 0; m--) {
+	for (m = masks, temp_mask = 0; m > 0; m--) {
 		temp_mask |= mask_bits;
 		mask_bits >>= 1;
 	}
@@ -1245,14 +1250,13 @@ static ssize_t smk_write_netlbladdr(struct file *file, const char __user *buf,
 	 * Only allow one writer at a time. Writes should be
 	 * quite rare and small in any case.
 	 */
-	mutex_lock(&smk_netlbladdr_lock);
+	mutex_lock(&smk_net4addr_lock);
 
 	nsa = newname.sin_addr.s_addr;
 	/* try to find if the prefix is already in the list */
 	found = 0;
-	list_for_each_entry_rcu(snp, &smk_netlbladdr_list, list) {
-		if (snp->smk_host.sin_addr.s_addr == nsa &&
-		    snp->smk_mask.s_addr == mask.s_addr) {
+	list_for_each_entry_rcu(snp, &smk_net4addr_list, list) {
+		if (snp->smk_host.s_addr == nsa && snp->smk_masks == masks) {
 			found = 1;
 			break;
 		}
@@ -1265,17 +1269,20 @@ static ssize_t smk_write_netlbladdr(struct file *file, const char __user *buf,
 			rc = -ENOMEM;
 		else {
 			rc = 0;
-			snp->smk_host.sin_addr.s_addr = newname.sin_addr.s_addr;
+			snp->smk_host.s_addr = newname.sin_addr.s_addr;
 			snp->smk_mask.s_addr = mask.s_addr;
 			snp->smk_label = skp;
-			smk_netlbladdr_insert(snp);
+			snp->smk_masks = masks;
+			smk_net4addr_insert(snp);
 		}
 	} else {
-		/* we delete the unlabeled entry, only if the previous label
-		 * wasn't the special CIPSO option */
-		if (snp->smk_label != &smack_cipso_option)
+		/*
+		 * Delete the unlabeled entry, only if the previous label
+		 * wasn't the special CIPSO option
+		 */
+		if (snp->smk_label != NULL)
 			rc = netlbl_cfg_unlbl_static_del(&init_net, NULL,
-					&snp->smk_host.sin_addr, &snp->smk_mask,
+					&snp->smk_host, &snp->smk_mask,
 					PF_INET, &audit_info);
 		else
 			rc = 0;
@@ -1287,15 +1294,279 @@ static ssize_t smk_write_netlbladdr(struct file *file, const char __user *buf,
 	 * this host so that incoming packets get labeled.
 	 * but only if we didn't get the special CIPSO option
 	 */
-	if (rc == 0 && skp != &smack_cipso_option)
+	if (rc == 0 && skp != NULL)
 		rc = netlbl_cfg_unlbl_static_add(&init_net, NULL,
-			&snp->smk_host.sin_addr, &snp->smk_mask, PF_INET,
+			&snp->smk_host, &snp->smk_mask, PF_INET,
 			snp->smk_label->smk_secid, &audit_info);
 
 	if (rc == 0)
 		rc = count;
 
-	mutex_unlock(&smk_netlbladdr_lock);
+	mutex_unlock(&smk_net4addr_lock);
+
+free_out:
+	kfree(smack);
+free_data_out:
+	kfree(data);
+
+	return rc;
+}
+
+static const struct file_operations smk_net4addr_ops = {
+	.open           = smk_open_net4addr,
+	.read		= seq_read,
+	.llseek         = seq_lseek,
+	.write		= smk_write_net4addr,
+	.release        = seq_release,
+};
+
+#if IS_ENABLED(CONFIG_IPV6)
+/*
+ * Seq_file read operations for /smack/netlabel6
+ */
+
+static void *net6addr_seq_start(struct seq_file *s, loff_t *pos)
+{
+	return smk_seq_start(s, pos, &smk_net6addr_list);
+}
+
+static void *net6addr_seq_next(struct seq_file *s, void *v, loff_t *pos)
+{
+	return smk_seq_next(s, v, pos, &smk_net6addr_list);
+}
+
+/*
+ * Print host/label pairs
+ */
+static int net6addr_seq_show(struct seq_file *s, void *v)
+{
+	struct list_head *list = v;
+	struct smk_net6addr *skp =
+			 list_entry(list, struct smk_net6addr, list);
+
+	if (skp->smk_label != NULL)
+		seq_printf(s, "%pI6/%d %s\n", &skp->smk_host, skp->smk_masks,
+				skp->smk_label->smk_known);
+
+	return 0;
+}
+
+static const struct seq_operations net6addr_seq_ops = {
+	.start = net6addr_seq_start,
+	.next  = net6addr_seq_next,
+	.show  = net6addr_seq_show,
+	.stop  = smk_seq_stop,
+};
+
+/**
+ * smk_open_net6addr - open() for /smack/netlabel
+ * @inode: inode structure representing file
+ * @file: "netlabel" file pointer
+ *
+ * Connect our net6addr_seq_* operations with /smack/netlabel
+ * file_operations
+ */
+static int smk_open_net6addr(struct inode *inode, struct file *file)
+{
+	return seq_open(file, &net6addr_seq_ops);
+}
+
+/**
+ * smk_net6addr_insert
+ * @new : entry to insert
+ *
+ * This inserts an entry in the smack_net6addrs list
+ * sorted by netmask length (longest to smallest)
+ * locked by &smk_net6addr_lock in smk_write_net6addr
+ *
+ */
+static void smk_net6addr_insert(struct smk_net6addr *new)
+{
+	struct smk_net6addr *m_next;
+	struct smk_net6addr *m;
+
+	if (list_empty(&smk_net6addr_list)) {
+		list_add_rcu(&new->list, &smk_net6addr_list);
+		return;
+	}
+
+	m = list_entry_rcu(smk_net6addr_list.next,
+			   struct smk_net6addr, list);
+
+	if (new->smk_masks > m->smk_masks) {
+		list_add_rcu(&new->list, &smk_net6addr_list);
+		return;
+	}
+
+	list_for_each_entry_rcu(m, &smk_net6addr_list, list) {
+		if (list_is_last(&m->list, &smk_net6addr_list)) {
+			list_add_rcu(&new->list, &m->list);
+			return;
+		}
+		m_next = list_entry_rcu(m->list.next,
+					struct smk_net6addr, list);
+		if (new->smk_masks > m_next->smk_masks) {
+			list_add_rcu(&new->list, &m->list);
+			return;
+		}
+	}
+}
+
+
+/**
+ * smk_write_net6addr - write() for /smack/netlabel
+ * @file: file pointer, not actually used
+ * @buf: where to get the data from
+ * @count: bytes sent
+ * @ppos: where to start
+ *
+ * Accepts only one net6addr per write call.
+ * Returns number of bytes written or error code, as appropriate
+ */
+static ssize_t smk_write_net6addr(struct file *file, const char __user *buf,
+				size_t count, loff_t *ppos)
+{
+	struct smk_net6addr *snp;
+	struct in6_addr newname;
+	struct in6_addr fullmask;
+	struct smack_known *skp = NULL;
+	char *smack;
+	char *data;
+	int rc = 0;
+	int found = 0;
+	int i;
+	unsigned int scanned[8];
+	unsigned int m;
+	unsigned int mask = 128;
+
+	/*
+	 * Must have privilege.
+	 * No partial writes.
+	 * Enough data must be present.
+	 * "<addr/mask, as a:b:c:d:e:f:g:h/e><space><label>"
+	 * "<addr, as a:b:c:d:e:f:g:h><space><label>"
+	 */
+	if (!smack_privileged(CAP_MAC_ADMIN))
+		return -EPERM;
+	if (*ppos != 0)
+		return -EINVAL;
+	if (count < SMK_NETLBLADDRMIN)
+		return -EINVAL;
+
+	data = kzalloc(count + 1, GFP_KERNEL);
+	if (data == NULL)
+		return -ENOMEM;
+
+	if (copy_from_user(data, buf, count) != 0) {
+		rc = -EFAULT;
+		goto free_data_out;
+	}
+
+	smack = kzalloc(count + 1, GFP_KERNEL);
+	if (smack == NULL) {
+		rc = -ENOMEM;
+		goto free_data_out;
+	}
+
+	data[count] = '\0';
+
+	i = sscanf(data, "%x:%x:%x:%x:%x:%x:%x:%x/%u %s",
+			&scanned[0], &scanned[1], &scanned[2], &scanned[3],
+			&scanned[4], &scanned[5], &scanned[6], &scanned[7],
+			&mask, smack);
+	if (i != 10) {
+		i = sscanf(data, "%x:%x:%x:%x:%x:%x:%x:%x %s",
+				&scanned[0], &scanned[1], &scanned[2],
+				&scanned[3], &scanned[4], &scanned[5],
+				&scanned[6], &scanned[7], smack);
+		if (i != 9) {
+			rc = -EINVAL;
+			goto free_out;
+		}
+	}
+	if (mask > 128) {
+		rc = -EINVAL;
+		goto free_out;
+	}
+	for (i = 0; i < 8; i++) {
+		if (scanned[i] > 0xffff) {
+			rc = -EINVAL;
+			goto free_out;
+		}
+		newname.s6_addr16[i] = htons(scanned[i]);
+	}
+
+	/*
+	 * If smack begins with '-', it is an option, don't import it
+	 */
+	if (smack[0] != '-') {
+		skp = smk_import_entry(smack, 0);
+		if (skp == NULL) {
+			rc = -EINVAL;
+			goto free_out;
+		}
+	} else {
+		/*
+		 * Only -DELETE is supported for IPv6
+		 */
+		if (strcmp(smack, SMACK_DELETE_OPTION) != 0) {
+			rc = -EINVAL;
+			goto free_out;
+		}
+	}
+
+	for (i = 0, m = mask; i < 8; i++) {
+		if (m >= 16) {
+			fullmask.s6_addr16[i] = 0xffff;
+			m -= 16;
+		} else if (m > 0) {
+			fullmask.s6_addr16[i] = (1 << m) - 1;
+			m = 0;
+		} else
+			fullmask.s6_addr16[i] = 0;
+		newname.s6_addr16[i] &= fullmask.s6_addr16[i];
+	}
+
+	/*
+	 * Only allow one writer at a time. Writes should be
+	 * quite rare and small in any case.
+	 */
+	mutex_lock(&smk_net6addr_lock);
+	/*
+	 * Try to find the prefix in the list
+	 */
+	list_for_each_entry_rcu(snp, &smk_net6addr_list, list) {
+		if (mask != snp->smk_masks)
+			continue;
+		for (found = 1, i = 0; i < 8; i++) {
+			if (newname.s6_addr16[i] !=
+			    snp->smk_host.s6_addr16[i]) {
+				found = 0;
+				break;
+			}
+		}
+		if (found == 1)
+			break;
+	}
+	if (found == 0) {
+		snp = kzalloc(sizeof(*snp), GFP_KERNEL);
+		if (snp == NULL)
+			rc = -ENOMEM;
+		else {
+			snp->smk_host = newname;
+			snp->smk_mask = fullmask;
+			snp->smk_masks = mask;
+			snp->smk_label = skp;
+			smk_net6addr_insert(snp);
+		}
+	} else {
+		snp->smk_label = skp;
+	}
+
+	if (rc == 0)
+		rc = count;
+
+	mutex_unlock(&smk_net6addr_lock);
 
 free_out:
 	kfree(smack);
@@ -1305,13 +1576,14 @@ free_data_out:
 	return rc;
 }
 
-static const struct file_operations smk_netlbladdr_ops = {
-	.open           = smk_open_netlbladdr,
+static const struct file_operations smk_net6addr_ops = {
+	.open           = smk_open_net6addr,
 	.read		= seq_read,
 	.llseek         = seq_lseek,
-	.write		= smk_write_netlbladdr,
+	.write		= smk_write_net6addr,
 	.release        = seq_release,
 };
+#endif /* CONFIG_IPV6 */
 
 /**
  * smk_read_doi - read() for /smack/doi
@@ -2515,8 +2787,8 @@ static int smk_fill_super(struct super_block *sb, void *data, int silent)
 			"direct", &smk_direct_ops, S_IRUGO|S_IWUSR},
 		[SMK_AMBIENT] = {
 			"ambient", &smk_ambient_ops, S_IRUGO|S_IWUSR},
-		[SMK_NETLBLADDR] = {
-			"netlabel", &smk_netlbladdr_ops, S_IRUGO|S_IWUSR},
+		[SMK_NET4ADDR] = {
+			"netlabel", &smk_net4addr_ops, S_IRUGO|S_IWUSR},
 		[SMK_ONLYCAP] = {
 			"onlycap", &smk_onlycap_ops, S_IRUGO|S_IWUSR},
 		[SMK_LOGGING] = {
@@ -2548,6 +2820,10 @@ static int smk_fill_super(struct super_block *sb, void *data, int silent)
 		[SMK_UNCONFINED] = {
 			"unconfined", &smk_unconfined_ops, S_IRUGO|S_IWUSR},
 #endif
+#if IS_ENABLED(CONFIG_IPV6)
+		[SMK_NET6ADDR] = {
+			"ipv6host", &smk_net6addr_ops, S_IRUGO|S_IWUSR},
+#endif /* CONFIG_IPV6 */
 		/* last one */
 			{""}
 	};
-- 
cgit v1.2.3


From 5b8b64843a3b272499d014a1786d072c1174912d Mon Sep 17 00:00:00 2001
From: Lars Persson <lars.persson@axis.com>
Date: Tue, 28 Jul 2015 12:01:47 +0200
Subject: dwc_eth_qos: Add Synopsys DWC Ethernet QoS bindings

Add device tree binding documentation for the Synopsys DWC Ethernet
QoS driver supporting revision 4.10a of the hardware IP.

Signed-off-by: Lars Persson <larper@axis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 .../bindings/net/snps,dwc-qos-ethernet.txt         | 75 ++++++++++++++++++++++
 1 file changed, 75 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/net/snps,dwc-qos-ethernet.txt

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/net/snps,dwc-qos-ethernet.txt b/Documentation/devicetree/bindings/net/snps,dwc-qos-ethernet.txt
new file mode 100644
index 000000000000..51f8d2eba8d8
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/snps,dwc-qos-ethernet.txt
@@ -0,0 +1,75 @@
+* Synopsys DWC Ethernet QoS IP version 4.10 driver (GMAC)
+
+
+Required properties:
+- compatible: Should be "snps,dwc-qos-ethernet-4.10"
+- reg: Address and length of the register set for the device
+- clocks: Phandles to the reference clock and the bus clock
+- clock-names: Should be "phy_ref_clk" for the reference clock and "apb_pclk"
+  for the bus clock.
+- interrupt-parent: Should be the phandle for the interrupt controller
+  that services interrupts for this device
+- interrupts: Should contain the core's combined interrupt signal
+- phy-mode: See ethernet.txt file in the same directory
+
+Optional properties:
+- dma-coherent: Present if dma operations are coherent
+- mac-address: See ethernet.txt in the same directory
+- local-mac-address: See ethernet.txt in the same directory
+- snps,en-lpi: If present it enables use of the AXI low-power interface
+- snps,write-requests: Number of write requests that the AXI port can issue.
+  It depends on the SoC configuration.
+- snps,read-requests: Number of read requests that the AXI port can issue.
+  It depends on the SoC configuration.
+- snps,burst-map: Bitmap of allowed AXI burst lengts, with the LSB
+  representing 4, then 8 etc.
+- snps,txpbl: DMA Programmable burst length for the TX DMA
+- snps,rxpbl: DMA Programmable burst length for the RX DMA
+- snps,en-tx-lpi-clockgating: Enable gating of the MAC TX clock during
+  TX low-power mode.
+- phy-handle: See ethernet.txt file in the same directory
+- mdio device tree subnode: When the GMAC has a phy connected to its local
+    mdio, there must be device tree subnode with the following
+    required properties:
+    - compatible: Must be "snps,dwc-qos-ethernet-mdio".
+    - #address-cells: Must be <1>.
+    - #size-cells: Must be <0>.
+
+    For each phy on the mdio bus, there must be a node with the following
+    fields:
+
+    - reg: phy id used to communicate to phy.
+    - device_type: Must be "ethernet-phy".
+    - fixed-mode device tree subnode: see fixed-link.txt in the same directory
+
+Examples:
+ethernet2@40010000 {
+	clock-names = "phy_ref_clk", "apb_pclk";
+	clocks = <&clkc 17>, <&clkc 15>;
+	compatible = "snps,dwc-qos-ethernet-4.10";
+	interrupt-parent = <&intc>;
+	interrupts = <0x0 0x1e 0x4>;
+	reg = <0x40010000 0x4000>;
+	phy-handle = <&phy2>;
+	phy-mode = "gmii";
+
+	snps,en-tx-lpi-clockgating;
+	snps,en-lpi;
+	snps,write-requests = <2>;
+	snps,read-requests = <16>;
+	snps,burst-map = <0x7>;
+	snps,txpbl = <8>;
+	snps,rxpbl = <2>;
+
+	dma-coherent;
+
+	mdio {
+		#address-cells = <0x1>;
+		#size-cells = <0x0>;
+		phy2: phy@1 {
+			compatible = "ethernet-phy-ieee802.3-c22";
+			device_type = "ethernet-phy";
+			reg = <0x1>;
+		};
+	};
+};
-- 
cgit v1.2.3


From 0933328a1b8adb6c8b2b8c8b823dad0295659c40 Mon Sep 17 00:00:00 2001
From: Joachim Eastwood <manabian@gmail.com>
Date: Wed, 29 Jul 2015 00:09:02 +0200
Subject: stmmac: remove unused stmmac_of_data struct

As dwmac-* drivers that need OF match have been converted
to use their own internal OF match data structure this can
now be removed.

Signed-off-by: Joachim Eastwood <manabian@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 Documentation/networking/stmmac.txt |  2 --
 include/linux/stmmac.h              | 18 ------------------
 2 files changed, 20 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/networking/stmmac.txt b/Documentation/networking/stmmac.txt
index 5fddefa69baf..de5c42342ec3 100644
--- a/Documentation/networking/stmmac.txt
+++ b/Documentation/networking/stmmac.txt
@@ -274,8 +274,6 @@ capability register can replace what has been passed from the platform.
 Please see the following document:
 	Documentation/devicetree/bindings/net/stmmac.txt
 
-and the stmmac_of_data structure inside the include/linux/stmmac.h header file.
-
 4.11) This is a summary of the content of some relevant files:
  o stmmac_main.c: to implement the main network device driver;
  o stmmac_mdio.c: to provide mdio functions;
diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h
index c86a20047cb1..b43cd56b78e9 100644
--- a/include/linux/stmmac.h
+++ b/include/linux/stmmac.h
@@ -125,22 +125,4 @@ struct plat_stmmacenet_data {
 	void (*exit)(struct platform_device *pdev, void *priv);
 	void *bsp_priv;
 };
-
-/* of_data for SoC glue layer device tree bindings */
-
-struct stmmac_of_data {
-	int has_gmac;
-	int enh_desc;
-	int tx_coe;
-	int rx_coe;
-	int bugged_jumbo;
-	int pmt;
-	int riwt_off;
-	void (*fix_mac_speed)(void *priv, unsigned int speed);
-	void (*bus_setup)(void __iomem *ioaddr);
-	void *(*setup)(struct platform_device *pdev);
-	void (*free)(struct platform_device *pdev, void *priv);
-	int (*init)(struct platform_device *pdev, void *priv);
-	void (*exit)(struct platform_device *pdev, void *priv);
-};
 #endif
-- 
cgit v1.2.3


From 75fee59550a9899fd9438ebc0a64c972829a8dd2 Mon Sep 17 00:00:00 2001
From: Joachim Eastwood <manabian@gmail.com>
Date: Wed, 29 Jul 2015 00:09:03 +0200
Subject: stmmac: remove setup/free glue callbacks

As all dwmac-* drivers have been converted to have a proper probe
function the setup callback can now be removed. Also remove the
free callback that wasn't used by any driver.

New dwmac-* drivers should implement standard probe and remove
functions to preform any needed setup and teardown.

Signed-off-by: Joachim Eastwood <manabian@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 Documentation/networking/stmmac.txt                   | 8 ++------
 drivers/net/ethernet/stmicro/stmmac/dwmac-generic.c   | 7 -------
 drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c | 3 ---
 include/linux/stmmac.h                                | 2 --
 4 files changed, 2 insertions(+), 18 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/networking/stmmac.txt b/Documentation/networking/stmmac.txt
index de5c42342ec3..2903b1cf4d70 100644
--- a/Documentation/networking/stmmac.txt
+++ b/Documentation/networking/stmmac.txt
@@ -135,8 +135,6 @@ struct plat_stmmacenet_data {
 	int maxmtu;
 	void (*fix_mac_speed)(void *priv, unsigned int speed);
 	void (*bus_setup)(void __iomem *ioaddr);
-	void *(*setup)(struct platform_device *pdev);
-	void (*free)(struct platform_device *pdev, void *priv);
 	int (*init)(struct platform_device *pdev, void *priv);
 	void (*exit)(struct platform_device *pdev, void *priv);
 	void *bsp_priv;
@@ -177,12 +175,10 @@ Where:
  o bus_setup: perform HW setup of the bus. For example, on some ST platforms
 	     this field is used to configure the AMBA  bridge to generate more
 	     efficient STBus traffic.
- o setup/init/exit: callbacks used for calling a custom initialization;
+ o init/exit: callbacks used for calling a custom initialization;
 	     this is sometime necessary on some platforms (e.g. ST boxes)
 	     where the HW needs to have set some PIO lines or system cfg
-	     registers. setup should return a pointer to private data,
-	     which will be stored in bsp_priv, and then passed to init and
-	     exit callbacks. init/exit callbacks should not use or modify
+	     registers.  init/exit callbacks should not use or modify
 	     platform data.
  o bsp_priv: another private pointer.
 
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-generic.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-generic.c
index f4fe9f1a33b4..b1e5f24708c9 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-generic.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-generic.c
@@ -46,13 +46,6 @@ static int dwmac_generic_probe(struct platform_device *pdev)
 		plat_dat->unicast_filter_entries = 1;
 	}
 
-	/* Custom setup (if needed) */
-	if (plat_dat->setup) {
-		plat_dat->bsp_priv = plat_dat->setup(pdev);
-		if (IS_ERR(plat_dat->bsp_priv))
-			return PTR_ERR(plat_dat->bsp_priv);
-	}
-
 	/* Custom initialisation (if needed) */
 	if (plat_dat->init) {
 		ret = plat_dat->init(pdev, plat_dat->bsp_priv);
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
index 55e569b330b2..1cb660405f35 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
@@ -300,9 +300,6 @@ int stmmac_pltfr_remove(struct platform_device *pdev)
 	if (priv->plat->exit)
 		priv->plat->exit(pdev, priv->plat->bsp_priv);
 
-	if (priv->plat->free)
-		priv->plat->free(pdev, priv->plat->bsp_priv);
-
 	return ret;
 }
 EXPORT_SYMBOL_GPL(stmmac_pltfr_remove);
diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h
index b43cd56b78e9..eead8ab93c0a 100644
--- a/include/linux/stmmac.h
+++ b/include/linux/stmmac.h
@@ -119,8 +119,6 @@ struct plat_stmmacenet_data {
 	int rx_fifo_size;
 	void (*fix_mac_speed)(void *priv, unsigned int speed);
 	void (*bus_setup)(void __iomem *ioaddr);
-	void *(*setup)(struct platform_device *pdev);
-	void (*free)(struct platform_device *pdev, void *priv);
 	int (*init)(struct platform_device *pdev, void *priv);
 	void (*exit)(struct platform_device *pdev, void *priv);
 	void *bsp_priv;
-- 
cgit v1.2.3


From f2d2eb9d05b7d03acff0589e5fd8c13a0031cb6c Mon Sep 17 00:00:00 2001
From: Christian Borntraeger <borntraeger@de.ibm.com>
Date: Mon, 20 Jul 2015 22:05:55 +0200
Subject: KVM: s390: remove outdated documentation

The old Documentation/s390/kvm.txt file is either
outdated or described in Documentation/virtual/kvm/api.txt.
Let's get rid of it.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
---
 Documentation/s390/00-INDEX |   2 -
 Documentation/s390/kvm.txt  | 125 --------------------------------------------
 2 files changed, 127 deletions(-)
 delete mode 100644 Documentation/s390/kvm.txt

(limited to 'Documentation')

diff --git a/Documentation/s390/00-INDEX b/Documentation/s390/00-INDEX
index 10c874ebdfe5..9189535f6cd2 100644
--- a/Documentation/s390/00-INDEX
+++ b/Documentation/s390/00-INDEX
@@ -16,8 +16,6 @@ Debugging390.txt
 	- hints for debugging on s390 systems.
 driver-model.txt
 	- information on s390 devices and the driver model.
-kvm.txt
-	- ioctl calls to /dev/kvm on s390.
 monreader.txt
 	- information on accessing the z/VM monitor stream from Linux.
 qeth.txt
diff --git a/Documentation/s390/kvm.txt b/Documentation/s390/kvm.txt
deleted file mode 100644
index 85f3280d7ef6..000000000000
--- a/Documentation/s390/kvm.txt
+++ /dev/null
@@ -1,125 +0,0 @@
-*** BIG FAT WARNING ***
-The kvm module is currently in EXPERIMENTAL state for s390. This means that
-the interface to the module is not yet considered to remain stable. Thus, be
-prepared that we keep breaking your userspace application and guest
-compatibility over and over again until we feel happy with the result. Make sure
-your guest kernel, your host kernel, and your userspace launcher are in a
-consistent state.
-
-This Documentation describes the unique ioctl calls to /dev/kvm, the resulting
-kvm-vm file descriptors, and the kvm-vcpu file descriptors that differ from x86.
-
-1. ioctl calls to /dev/kvm
-KVM does support the following ioctls on s390 that are common with other
-architectures and do behave the same:
-KVM_GET_API_VERSION
-KVM_CREATE_VM		(*) see note
-KVM_CHECK_EXTENSION
-KVM_GET_VCPU_MMAP_SIZE
-
-Notes:
-* KVM_CREATE_VM may fail on s390, if the calling process has multiple
-threads and has not called KVM_S390_ENABLE_SIE before.
-
-In addition, on s390 the following architecture specific ioctls are supported:
-ioctl:		KVM_S390_ENABLE_SIE
-args:		none
-see also:	include/linux/kvm.h
-This call causes the kernel to switch on PGSTE in the user page table. This
-operation is needed in order to run a virtual machine, and it requires the
-calling process to be single-threaded. Note that the first call to KVM_CREATE_VM
-will implicitly try to switch on PGSTE if the user process has not called
-KVM_S390_ENABLE_SIE before. User processes that want to launch multiple threads
-before creating a virtual machine have to call KVM_S390_ENABLE_SIE, or will
-observe an error calling KVM_CREATE_VM. Switching on PGSTE is a one-time
-operation, is not reversible, and will persist over the entire lifetime of
-the calling process. It does not have any user-visible effect other than a small
-performance penalty.
-
-2. ioctl calls to the kvm-vm file descriptor
-KVM does support the following ioctls on s390 that are common with other
-architectures and do behave the same:
-KVM_CREATE_VCPU
-KVM_SET_USER_MEMORY_REGION      (*) see note
-KVM_GET_DIRTY_LOG		(**) see note
-
-Notes:
-*  kvm does only allow exactly one memory slot on s390, which has to start
-   at guest absolute address zero and at a user address that is aligned on any
-   page boundary. This hardware "limitation" allows us to have a few unique
-   optimizations. The memory slot doesn't have to be filled
-   with memory actually, it may contain sparse holes. That said, with different
-   user memory layout this does still allow a large flexibility when
-   doing the guest memory setup.
-** KVM_GET_DIRTY_LOG doesn't work properly yet. The user will receive an empty
-log. This ioctl call is only needed for guest migration, and we intend to
-implement this one in the future.
-
-In addition, on s390 the following architecture specific ioctls for the kvm-vm
-file descriptor are supported:
-ioctl:		KVM_S390_INTERRUPT
-args:		struct kvm_s390_interrupt *
-see also:	include/linux/kvm.h
-This ioctl is used to submit a floating interrupt for a virtual machine.
-Floating interrupts may be delivered to any virtual cpu in the configuration.
-Only some interrupt types defined in include/linux/kvm.h make sense when
-submitted as floating interrupts. The following interrupts are not considered
-to be useful as floating interrupts, and a call to inject them will result in
--EINVAL error code: program interrupts and interprocessor signals. Valid
-floating interrupts are:
-KVM_S390_INT_VIRTIO
-KVM_S390_INT_SERVICE
-
-3. ioctl calls to the kvm-vcpu file descriptor
-KVM does support the following ioctls on s390 that are common with other
-architectures and do behave the same:
-KVM_RUN
-KVM_GET_REGS
-KVM_SET_REGS
-KVM_GET_SREGS
-KVM_SET_SREGS
-KVM_GET_FPU
-KVM_SET_FPU
-
-In addition, on s390 the following architecture specific ioctls for the
-kvm-vcpu file descriptor are supported:
-ioctl:		KVM_S390_INTERRUPT
-args:		struct kvm_s390_interrupt *
-see also:	include/linux/kvm.h
-This ioctl is used to submit an interrupt for a specific virtual cpu.
-Only some interrupt types defined in include/linux/kvm.h make sense when
-submitted for a specific cpu. The following interrupts are not considered
-to be useful, and a call to inject them will result in -EINVAL error code:
-service processor calls and virtio interrupts. Valid interrupt types are:
-KVM_S390_PROGRAM_INT
-KVM_S390_SIGP_STOP
-KVM_S390_RESTART
-KVM_S390_SIGP_SET_PREFIX
-KVM_S390_INT_EMERGENCY
-
-ioctl:		KVM_S390_STORE_STATUS
-args:		unsigned long
-see also:	include/linux/kvm.h
-This ioctl stores the state of the cpu at the guest real address given as
-argument, unless one of the following values defined in include/linux/kvm.h
-is given as argument:
-KVM_S390_STORE_STATUS_NOADDR - the CPU stores its status to the save area in
-absolute lowcore as defined by the principles of operation
-KVM_S390_STORE_STATUS_PREFIXED - the CPU stores its status to the save area in
-its prefix page just like the dump tool that comes with zipl. This is useful
-to create a system dump for use with lkcdutils or crash.
-
-ioctl:		KVM_S390_SET_INITIAL_PSW
-args:		struct kvm_s390_psw *
-see also:	include/linux/kvm.h
-This ioctl can be used to set the processor status word (psw) of a stopped cpu
-prior to running it with KVM_RUN. Note that this call is not required to modify
-the psw during sie intercepts that fall back to userspace because struct kvm_run
-does contain the psw, and this value is evaluated during reentry of KVM_RUN
-after the intercept exit was recognized.
-
-ioctl:		KVM_S390_INITIAL_RESET
-args:		none
-see also:	include/linux/kvm.h
-This ioctl can be used to perform an initial cpu reset as defined by the
-principles of operation. The target cpu has to be in stopped state.
-- 
cgit v1.2.3


From 4246a0b63bd8f56a1469b12eafeb875b1041a451 Mon Sep 17 00:00:00 2001
From: Christoph Hellwig <hch@lst.de>
Date: Mon, 20 Jul 2015 15:29:37 +0200
Subject: block: add a bi_error field to struct bio

Currently we have two different ways to signal an I/O error on a BIO:

 (1) by clearing the BIO_UPTODATE flag
 (2) by returning a Linux errno value to the bi_end_io callback

The first one has the drawback of only communicating a single possible
error (-EIO), and the second one has the drawback of not beeing persistent
when bios are queued up, and are not passed along from child to parent
bio in the ever more popular chaining scenario.  Having both mechanisms
available has the additional drawback of utterly confusing driver authors
and introducing bugs where various I/O submitters only deal with one of
them, and the others have to add boilerplate code to deal with both kinds
of error returns.

So add a new bi_error field to store an errno value directly in struct
bio and remove the existing mechanisms to clean all this up.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
---
 Documentation/block/biodoc.txt      |  2 +-
 arch/m68k/emu/nfblock.c             |  2 +-
 arch/powerpc/sysdev/axonram.c       |  2 +-
 arch/xtensa/platforms/iss/simdisk.c | 12 ++-----
 block/bio-integrity.c               | 11 +++----
 block/bio.c                         | 43 +++++++++++--------------
 block/blk-core.c                    | 15 ++++-----
 block/blk-lib.c                     | 30 ++++++++----------
 block/blk-map.c                     |  2 +-
 block/blk-mq.c                      |  6 ++--
 block/bounce.c                      | 27 ++++++++--------
 drivers/block/aoe/aoecmd.c          | 10 +++---
 drivers/block/aoe/aoedev.c          |  2 +-
 drivers/block/brd.c                 | 13 +++++---
 drivers/block/drbd/drbd_actlog.c    |  4 +--
 drivers/block/drbd/drbd_bitmap.c    | 19 +++---------
 drivers/block/drbd/drbd_int.h       | 11 ++++---
 drivers/block/drbd/drbd_req.c       | 10 +++---
 drivers/block/drbd/drbd_worker.c    | 44 +++++++-------------------
 drivers/block/floppy.c              |  7 +++--
 drivers/block/null_blk.c            |  2 +-
 drivers/block/pktcdvd.c             | 32 +++++++++----------
 drivers/block/ps3vram.c             |  3 +-
 drivers/block/rsxx/dev.c            |  9 ++++--
 drivers/block/umem.c                |  4 +--
 drivers/block/xen-blkback/blkback.c |  4 +--
 drivers/block/xen-blkfront.c        |  9 ++----
 drivers/block/zram/zram_drv.c       |  5 ++-
 drivers/md/bcache/btree.c           | 10 +++---
 drivers/md/bcache/closure.h         |  2 +-
 drivers/md/bcache/io.c              |  8 ++---
 drivers/md/bcache/journal.c         |  8 ++---
 drivers/md/bcache/movinggc.c        |  8 ++---
 drivers/md/bcache/request.c         | 27 ++++++++--------
 drivers/md/bcache/super.c           | 14 ++++-----
 drivers/md/bcache/writeback.c       | 10 +++---
 drivers/md/dm-bio-prison.c          |  6 ++--
 drivers/md/dm-bufio.c               | 26 ++++++++++------
 drivers/md/dm-cache-target.c        | 24 +++++++-------
 drivers/md/dm-crypt.c               | 14 ++++-----
 drivers/md/dm-flakey.c              |  2 +-
 drivers/md/dm-io.c                  |  6 ++--
 drivers/md/dm-log-writes.c          | 11 +++----
 drivers/md/dm-raid1.c               | 24 +++++++-------
 drivers/md/dm-snap.c                |  6 ++--
 drivers/md/dm-stripe.c              |  2 +-
 drivers/md/dm-thin.c                | 41 +++++++++++++-----------
 drivers/md/dm-verity.c              |  9 +++---
 drivers/md/dm-zero.c                |  2 +-
 drivers/md/dm.c                     | 15 +++++----
 drivers/md/faulty.c                 |  4 +--
 drivers/md/linear.c                 |  2 +-
 drivers/md/md.c                     | 18 +++++------
 drivers/md/multipath.c              | 12 +++----
 drivers/md/raid0.c                  |  2 +-
 drivers/md/raid1.c                  | 53 ++++++++++++++++---------------
 drivers/md/raid10.c                 | 55 +++++++++++++++-----------------
 drivers/md/raid5.c                  | 52 +++++++++++++++----------------
 drivers/nvdimm/blk.c                |  5 +--
 drivers/nvdimm/btt.c                |  5 +--
 drivers/nvdimm/pmem.c               |  2 +-
 drivers/s390/block/dcssblk.c        |  2 +-
 drivers/s390/block/xpram.c          |  3 +-
 drivers/target/target_core_iblock.c | 21 +++++--------
 drivers/target/target_core_pscsi.c  |  6 ++--
 fs/btrfs/check-integrity.c          | 10 +++---
 fs/btrfs/compression.c              | 24 ++++++++------
 fs/btrfs/disk-io.c                  | 35 +++++++++++----------
 fs/btrfs/extent_io.c                | 30 +++++++-----------
 fs/btrfs/inode.c                    | 50 ++++++++++++++++--------------
 fs/btrfs/raid56.c                   | 62 +++++++++++++++++--------------------
 fs/btrfs/scrub.c                    | 22 ++++++-------
 fs/btrfs/volumes.c                  | 23 +++++++-------
 fs/buffer.c                         |  4 +--
 fs/direct-io.c                      | 13 ++++----
 fs/ext4/page-io.c                   | 15 ++++-----
 fs/ext4/readpage.c                  |  6 ++--
 fs/f2fs/data.c                      | 10 +++---
 fs/gfs2/lops.c                      | 10 +++---
 fs/gfs2/ops_fstype.c                |  6 ++--
 fs/jfs/jfs_logmgr.c                 |  8 ++---
 fs/jfs/jfs_metapage.c               |  8 ++---
 fs/logfs/dev_bdev.c                 | 12 +++----
 fs/mpage.c                          |  4 +--
 fs/nfs/blocklayout/blocklayout.c    | 14 ++++-----
 fs/nilfs2/segbuf.c                  |  5 ++-
 fs/ocfs2/cluster/heartbeat.c        |  9 +++---
 fs/xfs/xfs_aops.c                   |  5 ++-
 fs/xfs/xfs_buf.c                    |  7 ++---
 include/linux/bio.h                 | 13 +++++---
 include/linux/blk_types.h           |  4 +--
 include/linux/swap.h                |  4 +--
 kernel/power/swap.c                 | 12 +++----
 kernel/trace/blktrace.c             | 10 ++----
 mm/page_io.c                        | 12 +++----
 95 files changed, 622 insertions(+), 682 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/block/biodoc.txt b/Documentation/block/biodoc.txt
index fd12c0d835fd..5be8a7f4cc7f 100644
--- a/Documentation/block/biodoc.txt
+++ b/Documentation/block/biodoc.txt
@@ -1109,7 +1109,7 @@ it will loop and handle as many sectors (on a bio-segment granularity)
 as specified.
 
 Now bh->b_end_io is replaced by bio->bi_end_io, but most of the time the
-right thing to use is bio_endio(bio, uptodate) instead.
+right thing to use is bio_endio(bio) instead.
 
 If the driver is dropping the io_request_lock from its request_fn strategy,
 then it just needs to replace that with q->queue_lock instead.
diff --git a/arch/m68k/emu/nfblock.c b/arch/m68k/emu/nfblock.c
index 2d75ae246167..f2a00c591bf7 100644
--- a/arch/m68k/emu/nfblock.c
+++ b/arch/m68k/emu/nfblock.c
@@ -76,7 +76,7 @@ static void nfhd_make_request(struct request_queue *queue, struct bio *bio)
 				bvec_to_phys(&bvec));
 		sec += len;
 	}
-	bio_endio(bio, 0);
+	bio_endio(bio);
 }
 
 static int nfhd_getgeo(struct block_device *bdev, struct hd_geometry *geo)
diff --git a/arch/powerpc/sysdev/axonram.c b/arch/powerpc/sysdev/axonram.c
index ee90db17b097..f86250c48b53 100644
--- a/arch/powerpc/sysdev/axonram.c
+++ b/arch/powerpc/sysdev/axonram.c
@@ -132,7 +132,7 @@ axon_ram_make_request(struct request_queue *queue, struct bio *bio)
 		phys_mem += vec.bv_len;
 		transfered += vec.bv_len;
 	}
-	bio_endio(bio, 0);
+	bio_endio(bio);
 }
 
 /**
diff --git a/arch/xtensa/platforms/iss/simdisk.c b/arch/xtensa/platforms/iss/simdisk.c
index 48eebacdf5fe..fa84ca990caa 100644
--- a/arch/xtensa/platforms/iss/simdisk.c
+++ b/arch/xtensa/platforms/iss/simdisk.c
@@ -101,8 +101,9 @@ static void simdisk_transfer(struct simdisk *dev, unsigned long sector,
 	spin_unlock(&dev->lock);
 }
 
-static int simdisk_xfer_bio(struct simdisk *dev, struct bio *bio)
+static void simdisk_make_request(struct request_queue *q, struct bio *bio)
 {
+	struct simdisk *dev = q->queuedata;
 	struct bio_vec bvec;
 	struct bvec_iter iter;
 	sector_t sector = bio->bi_iter.bi_sector;
@@ -116,17 +117,10 @@ static int simdisk_xfer_bio(struct simdisk *dev, struct bio *bio)
 		sector += len;
 		__bio_kunmap_atomic(buffer);
 	}
-	return 0;
-}
 
-static void simdisk_make_request(struct request_queue *q, struct bio *bio)
-{
-	struct simdisk *dev = q->queuedata;
-	int status = simdisk_xfer_bio(dev, bio);
-	bio_endio(bio, status);
+	bio_endio(bio);
 }
 
-
 static int simdisk_open(struct block_device *bdev, fmode_t mode)
 {
 	struct simdisk *dev = bdev->bd_disk->private_data;
diff --git a/block/bio-integrity.c b/block/bio-integrity.c
index 719b7152aed1..4aecca79374a 100644
--- a/block/bio-integrity.c
+++ b/block/bio-integrity.c
@@ -355,13 +355,12 @@ static void bio_integrity_verify_fn(struct work_struct *work)
 		container_of(work, struct bio_integrity_payload, bip_work);
 	struct bio *bio = bip->bip_bio;
 	struct blk_integrity *bi = bdev_get_integrity(bio->bi_bdev);
-	int error;
 
-	error = bio_integrity_process(bio, bi->verify_fn);
+	bio->bi_error = bio_integrity_process(bio, bi->verify_fn);
 
 	/* Restore original bio completion handler */
 	bio->bi_end_io = bip->bip_end_io;
-	bio_endio(bio, error);
+	bio_endio(bio);
 }
 
 /**
@@ -376,7 +375,7 @@ static void bio_integrity_verify_fn(struct work_struct *work)
  * in process context.	This function postpones completion
  * accordingly.
  */
-void bio_integrity_endio(struct bio *bio, int error)
+void bio_integrity_endio(struct bio *bio)
 {
 	struct bio_integrity_payload *bip = bio_integrity(bio);
 
@@ -386,9 +385,9 @@ void bio_integrity_endio(struct bio *bio, int error)
 	 * integrity metadata.  Restore original bio end_io handler
 	 * and run it.
 	 */
-	if (error) {
+	if (bio->bi_error) {
 		bio->bi_end_io = bip->bip_end_io;
-		bio_endio(bio, error);
+		bio_endio(bio);
 
 		return;
 	}
diff --git a/block/bio.c b/block/bio.c
index 2a00d349cd68..a23f489f398f 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -269,7 +269,6 @@ static void bio_free(struct bio *bio)
 void bio_init(struct bio *bio)
 {
 	memset(bio, 0, sizeof(*bio));
-	bio->bi_flags = 1 << BIO_UPTODATE;
 	atomic_set(&bio->__bi_remaining, 1);
 	atomic_set(&bio->__bi_cnt, 1);
 }
@@ -292,14 +291,17 @@ void bio_reset(struct bio *bio)
 	__bio_free(bio);
 
 	memset(bio, 0, BIO_RESET_BYTES);
-	bio->bi_flags = flags | (1 << BIO_UPTODATE);
+	bio->bi_flags = flags;
 	atomic_set(&bio->__bi_remaining, 1);
 }
 EXPORT_SYMBOL(bio_reset);
 
-static void bio_chain_endio(struct bio *bio, int error)
+static void bio_chain_endio(struct bio *bio)
 {
-	bio_endio(bio->bi_private, error);
+	struct bio *parent = bio->bi_private;
+
+	parent->bi_error = bio->bi_error;
+	bio_endio(parent);
 	bio_put(bio);
 }
 
@@ -896,11 +898,11 @@ struct submit_bio_ret {
 	int error;
 };
 
-static void submit_bio_wait_endio(struct bio *bio, int error)
+static void submit_bio_wait_endio(struct bio *bio)
 {
 	struct submit_bio_ret *ret = bio->bi_private;
 
-	ret->error = error;
+	ret->error = bio->bi_error;
 	complete(&ret->event);
 }
 
@@ -1445,7 +1447,7 @@ void bio_unmap_user(struct bio *bio)
 }
 EXPORT_SYMBOL(bio_unmap_user);
 
-static void bio_map_kern_endio(struct bio *bio, int err)
+static void bio_map_kern_endio(struct bio *bio)
 {
 	bio_put(bio);
 }
@@ -1501,13 +1503,13 @@ struct bio *bio_map_kern(struct request_queue *q, void *data, unsigned int len,
 }
 EXPORT_SYMBOL(bio_map_kern);
 
-static void bio_copy_kern_endio(struct bio *bio, int err)
+static void bio_copy_kern_endio(struct bio *bio)
 {
 	bio_free_pages(bio);
 	bio_put(bio);
 }
 
-static void bio_copy_kern_endio_read(struct bio *bio, int err)
+static void bio_copy_kern_endio_read(struct bio *bio)
 {
 	char *p = bio->bi_private;
 	struct bio_vec *bvec;
@@ -1518,7 +1520,7 @@ static void bio_copy_kern_endio_read(struct bio *bio, int err)
 		p += bvec->bv_len;
 	}
 
-	bio_copy_kern_endio(bio, err);
+	bio_copy_kern_endio(bio);
 }
 
 /**
@@ -1778,25 +1780,15 @@ static inline bool bio_remaining_done(struct bio *bio)
 /**
  * bio_endio - end I/O on a bio
  * @bio:	bio
- * @error:	error, if any
  *
  * Description:
- *   bio_endio() will end I/O on the whole bio. bio_endio() is the
- *   preferred way to end I/O on a bio, it takes care of clearing
- *   BIO_UPTODATE on error. @error is 0 on success, and and one of the
- *   established -Exxxx (-EIO, for instance) error values in case
- *   something went wrong. No one should call bi_end_io() directly on a
- *   bio unless they own it and thus know that it has an end_io
- *   function.
+ *   bio_endio() will end I/O on the whole bio. bio_endio() is the preferred
+ *   way to end I/O on a bio. No one should call bi_end_io() directly on a
+ *   bio unless they own it and thus know that it has an end_io function.
  **/
-void bio_endio(struct bio *bio, int error)
+void bio_endio(struct bio *bio)
 {
 	while (bio) {
-		if (error)
-			clear_bit(BIO_UPTODATE, &bio->bi_flags);
-		else if (!test_bit(BIO_UPTODATE, &bio->bi_flags))
-			error = -EIO;
-
 		if (unlikely(!bio_remaining_done(bio)))
 			break;
 
@@ -1810,11 +1802,12 @@ void bio_endio(struct bio *bio, int error)
 		 */
 		if (bio->bi_end_io == bio_chain_endio) {
 			struct bio *parent = bio->bi_private;
+			parent->bi_error = bio->bi_error;
 			bio_put(bio);
 			bio = parent;
 		} else {
 			if (bio->bi_end_io)
-				bio->bi_end_io(bio, error);
+				bio->bi_end_io(bio);
 			bio = NULL;
 		}
 	}
diff --git a/block/blk-core.c b/block/blk-core.c
index 627ed0c593fb..7ef15b947b91 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -143,9 +143,7 @@ static void req_bio_endio(struct request *rq, struct bio *bio,
 			  unsigned int nbytes, int error)
 {
 	if (error)
-		clear_bit(BIO_UPTODATE, &bio->bi_flags);
-	else if (!test_bit(BIO_UPTODATE, &bio->bi_flags))
-		error = -EIO;
+		bio->bi_error = error;
 
 	if (unlikely(rq->cmd_flags & REQ_QUIET))
 		set_bit(BIO_QUIET, &bio->bi_flags);
@@ -154,7 +152,7 @@ static void req_bio_endio(struct request *rq, struct bio *bio,
 
 	/* don't actually finish bio if it's part of flush sequence */
 	if (bio->bi_iter.bi_size == 0 && !(rq->cmd_flags & REQ_FLUSH_SEQ))
-		bio_endio(bio, error);
+		bio_endio(bio);
 }
 
 void blk_dump_rq_flags(struct request *rq, char *msg)
@@ -1620,7 +1618,8 @@ static void blk_queue_bio(struct request_queue *q, struct bio *bio)
 	blk_queue_bounce(q, &bio);
 
 	if (bio_integrity_enabled(bio) && bio_integrity_prep(bio)) {
-		bio_endio(bio, -EIO);
+		bio->bi_error = -EIO;
+		bio_endio(bio);
 		return;
 	}
 
@@ -1673,7 +1672,8 @@ get_rq:
 	 */
 	req = get_request(q, rw_flags, bio, GFP_NOIO);
 	if (IS_ERR(req)) {
-		bio_endio(bio, PTR_ERR(req));	/* @q is dead */
+		bio->bi_error = PTR_ERR(req);
+		bio_endio(bio);
 		goto out_unlock;
 	}
 
@@ -1896,7 +1896,8 @@ generic_make_request_checks(struct bio *bio)
 	return true;
 
 end_io:
-	bio_endio(bio, err);
+	bio->bi_error = err;
+	bio_endio(bio);
 	return false;
 }
 
diff --git a/block/blk-lib.c b/block/blk-lib.c
index 7688ee3f5d72..6dee17443f14 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -11,16 +11,16 @@
 
 struct bio_batch {
 	atomic_t		done;
-	unsigned long		flags;
+	int			error;
 	struct completion	*wait;
 };
 
-static void bio_batch_end_io(struct bio *bio, int err)
+static void bio_batch_end_io(struct bio *bio)
 {
 	struct bio_batch *bb = bio->bi_private;
 
-	if (err && (err != -EOPNOTSUPP))
-		clear_bit(BIO_UPTODATE, &bb->flags);
+	if (bio->bi_error && bio->bi_error != -EOPNOTSUPP)
+		bb->error = bio->bi_error;
 	if (atomic_dec_and_test(&bb->done))
 		complete(bb->wait);
 	bio_put(bio);
@@ -78,7 +78,7 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 	}
 
 	atomic_set(&bb.done, 1);
-	bb.flags = 1 << BIO_UPTODATE;
+	bb.error = 0;
 	bb.wait = &wait;
 
 	blk_start_plug(&plug);
@@ -134,9 +134,8 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 	if (!atomic_dec_and_test(&bb.done))
 		wait_for_completion_io(&wait);
 
-	if (!test_bit(BIO_UPTODATE, &bb.flags))
-		ret = -EIO;
-
+	if (bb.error)
+		return bb.error;
 	return ret;
 }
 EXPORT_SYMBOL(blkdev_issue_discard);
@@ -172,7 +171,7 @@ int blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
 		return -EOPNOTSUPP;
 
 	atomic_set(&bb.done, 1);
-	bb.flags = 1 << BIO_UPTODATE;
+	bb.error = 0;
 	bb.wait = &wait;
 
 	while (nr_sects) {
@@ -208,9 +207,8 @@ int blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
 	if (!atomic_dec_and_test(&bb.done))
 		wait_for_completion_io(&wait);
 
-	if (!test_bit(BIO_UPTODATE, &bb.flags))
-		ret = -ENOTSUPP;
-
+	if (bb.error)
+		return bb.error;
 	return ret;
 }
 EXPORT_SYMBOL(blkdev_issue_write_same);
@@ -236,7 +234,7 @@ static int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
 	DECLARE_COMPLETION_ONSTACK(wait);
 
 	atomic_set(&bb.done, 1);
-	bb.flags = 1 << BIO_UPTODATE;
+	bb.error = 0;
 	bb.wait = &wait;
 
 	ret = 0;
@@ -270,10 +268,8 @@ static int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
 	if (!atomic_dec_and_test(&bb.done))
 		wait_for_completion_io(&wait);
 
-	if (!test_bit(BIO_UPTODATE, &bb.flags))
-		/* One of bios in the batch was completed with error.*/
-		ret = -EIO;
-
+	if (bb.error)
+		return bb.error;
 	return ret;
 }
 
diff --git a/block/blk-map.c b/block/blk-map.c
index da310a105429..5fe1c30bfba7 100644
--- a/block/blk-map.c
+++ b/block/blk-map.c
@@ -103,7 +103,7 @@ int blk_rq_map_user_iov(struct request_queue *q, struct request *rq,
 		 * normal IO completion path
 		 */
 		bio_get(bio);
-		bio_endio(bio, 0);
+		bio_endio(bio);
 		__blk_rq_unmap_user(bio);
 		return -EINVAL;
 	}
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 7d842db59699..94559025c5e6 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1199,7 +1199,7 @@ static struct request *blk_mq_map_request(struct request_queue *q,
 	struct blk_mq_alloc_data alloc_data;
 
 	if (unlikely(blk_mq_queue_enter(q, GFP_KERNEL))) {
-		bio_endio(bio, -EIO);
+		bio_io_error(bio);
 		return NULL;
 	}
 
@@ -1283,7 +1283,7 @@ static void blk_mq_make_request(struct request_queue *q, struct bio *bio)
 	blk_queue_bounce(q, &bio);
 
 	if (bio_integrity_enabled(bio) && bio_integrity_prep(bio)) {
-		bio_endio(bio, -EIO);
+		bio_io_error(bio);
 		return;
 	}
 
@@ -1368,7 +1368,7 @@ static void blk_sq_make_request(struct request_queue *q, struct bio *bio)
 	blk_queue_bounce(q, &bio);
 
 	if (bio_integrity_enabled(bio) && bio_integrity_prep(bio)) {
-		bio_endio(bio, -EIO);
+		bio_io_error(bio);
 		return;
 	}
 
diff --git a/block/bounce.c b/block/bounce.c
index b17311227c12..f4db245b9f3a 100644
--- a/block/bounce.c
+++ b/block/bounce.c
@@ -123,7 +123,7 @@ static void copy_to_high_bio_irq(struct bio *to, struct bio *from)
 	}
 }
 
-static void bounce_end_io(struct bio *bio, mempool_t *pool, int err)
+static void bounce_end_io(struct bio *bio, mempool_t *pool)
 {
 	struct bio *bio_orig = bio->bi_private;
 	struct bio_vec *bvec, *org_vec;
@@ -141,39 +141,40 @@ static void bounce_end_io(struct bio *bio, mempool_t *pool, int err)
 		mempool_free(bvec->bv_page, pool);
 	}
 
-	bio_endio(bio_orig, err);
+	bio_orig->bi_error = bio->bi_error;
+	bio_endio(bio_orig);
 	bio_put(bio);
 }
 
-static void bounce_end_io_write(struct bio *bio, int err)
+static void bounce_end_io_write(struct bio *bio)
 {
-	bounce_end_io(bio, page_pool, err);
+	bounce_end_io(bio, page_pool);
 }
 
-static void bounce_end_io_write_isa(struct bio *bio, int err)
+static void bounce_end_io_write_isa(struct bio *bio)
 {
 
-	bounce_end_io(bio, isa_page_pool, err);
+	bounce_end_io(bio, isa_page_pool);
 }
 
-static void __bounce_end_io_read(struct bio *bio, mempool_t *pool, int err)
+static void __bounce_end_io_read(struct bio *bio, mempool_t *pool)
 {
 	struct bio *bio_orig = bio->bi_private;
 
-	if (test_bit(BIO_UPTODATE, &bio->bi_flags))
+	if (!bio->bi_error)
 		copy_to_high_bio_irq(bio_orig, bio);
 
-	bounce_end_io(bio, pool, err);
+	bounce_end_io(bio, pool);
 }
 
-static void bounce_end_io_read(struct bio *bio, int err)
+static void bounce_end_io_read(struct bio *bio)
 {
-	__bounce_end_io_read(bio, page_pool, err);
+	__bounce_end_io_read(bio, page_pool);
 }
 
-static void bounce_end_io_read_isa(struct bio *bio, int err)
+static void bounce_end_io_read_isa(struct bio *bio)
 {
-	__bounce_end_io_read(bio, isa_page_pool, err);
+	__bounce_end_io_read(bio, isa_page_pool);
 }
 
 #ifdef CONFIG_NEED_BOUNCE_POOL
diff --git a/drivers/block/aoe/aoecmd.c b/drivers/block/aoe/aoecmd.c
index 422b7d84f686..ad80c85e0857 100644
--- a/drivers/block/aoe/aoecmd.c
+++ b/drivers/block/aoe/aoecmd.c
@@ -1110,7 +1110,7 @@ aoe_end_request(struct aoedev *d, struct request *rq, int fastfail)
 		d->ip.rq = NULL;
 	do {
 		bio = rq->bio;
-		bok = !fastfail && test_bit(BIO_UPTODATE, &bio->bi_flags);
+		bok = !fastfail && !bio->bi_error;
 	} while (__blk_end_request(rq, bok ? 0 : -EIO, bio->bi_iter.bi_size));
 
 	/* cf. http://lkml.org/lkml/2006/10/31/28 */
@@ -1172,7 +1172,7 @@ ktiocomplete(struct frame *f)
 			ahout->cmdstat, ahin->cmdstat,
 			d->aoemajor, d->aoeminor);
 noskb:		if (buf)
-			clear_bit(BIO_UPTODATE, &buf->bio->bi_flags);
+			buf->bio->bi_error = -EIO;
 		goto out;
 	}
 
@@ -1185,7 +1185,7 @@ noskb:		if (buf)
 				"aoe: runt data size in read from",
 				(long) d->aoemajor, d->aoeminor,
 			       skb->len, n);
-			clear_bit(BIO_UPTODATE, &buf->bio->bi_flags);
+			buf->bio->bi_error = -EIO;
 			break;
 		}
 		if (n > f->iter.bi_size) {
@@ -1193,7 +1193,7 @@ noskb:		if (buf)
 				"aoe: too-large data size in read from",
 				(long) d->aoemajor, d->aoeminor,
 				n, f->iter.bi_size);
-			clear_bit(BIO_UPTODATE, &buf->bio->bi_flags);
+			buf->bio->bi_error = -EIO;
 			break;
 		}
 		bvcpy(skb, f->buf->bio, f->iter, n);
@@ -1695,7 +1695,7 @@ aoe_failbuf(struct aoedev *d, struct buf *buf)
 	if (buf == NULL)
 		return;
 	buf->iter.bi_size = 0;
-	clear_bit(BIO_UPTODATE, &buf->bio->bi_flags);
+	buf->bio->bi_error = -EIO;
 	if (buf->nframesout == 0)
 		aoe_end_buf(d, buf);
 }
diff --git a/drivers/block/aoe/aoedev.c b/drivers/block/aoe/aoedev.c
index e774c50b6842..ffd1947500c6 100644
--- a/drivers/block/aoe/aoedev.c
+++ b/drivers/block/aoe/aoedev.c
@@ -170,7 +170,7 @@ aoe_failip(struct aoedev *d)
 	if (rq == NULL)
 		return;
 	while ((bio = d->ip.nxbio)) {
-		clear_bit(BIO_UPTODATE, &bio->bi_flags);
+		bio->bi_error = -EIO;
 		d->ip.nxbio = bio->bi_next;
 		n = (unsigned long) rq->special;
 		rq->special = (void *) --n;
diff --git a/drivers/block/brd.c b/drivers/block/brd.c
index e573e470bd8a..f9ab74505e69 100644
--- a/drivers/block/brd.c
+++ b/drivers/block/brd.c
@@ -331,14 +331,12 @@ static void brd_make_request(struct request_queue *q, struct bio *bio)
 	struct bio_vec bvec;
 	sector_t sector;
 	struct bvec_iter iter;
-	int err = -EIO;
 
 	sector = bio->bi_iter.bi_sector;
 	if (bio_end_sector(bio) > get_capacity(bdev->bd_disk))
-		goto out;
+		goto io_error;
 
 	if (unlikely(bio->bi_rw & REQ_DISCARD)) {
-		err = 0;
 		discard_from_brd(brd, sector, bio->bi_iter.bi_size);
 		goto out;
 	}
@@ -349,15 +347,20 @@ static void brd_make_request(struct request_queue *q, struct bio *bio)
 
 	bio_for_each_segment(bvec, bio, iter) {
 		unsigned int len = bvec.bv_len;
+		int err;
+
 		err = brd_do_bvec(brd, bvec.bv_page, len,
 					bvec.bv_offset, rw, sector);
 		if (err)
-			break;
+			goto io_error;
 		sector += len >> SECTOR_SHIFT;
 	}
 
 out:
-	bio_endio(bio, err);
+	bio_endio(bio);
+	return;
+io_error:
+	bio_io_error(bio);
 }
 
 static int brd_rw_page(struct block_device *bdev, sector_t sector,
diff --git a/drivers/block/drbd/drbd_actlog.c b/drivers/block/drbd/drbd_actlog.c
index 1318e3217cb0..b3868e7a1ffd 100644
--- a/drivers/block/drbd/drbd_actlog.c
+++ b/drivers/block/drbd/drbd_actlog.c
@@ -175,11 +175,11 @@ static int _drbd_md_sync_page_io(struct drbd_device *device,
 	atomic_inc(&device->md_io.in_use); /* drbd_md_put_buffer() is in the completion handler */
 	device->md_io.submit_jif = jiffies;
 	if (drbd_insert_fault(device, (rw & WRITE) ? DRBD_FAULT_MD_WR : DRBD_FAULT_MD_RD))
-		bio_endio(bio, -EIO);
+		bio_io_error(bio);
 	else
 		submit_bio(rw, bio);
 	wait_until_done_or_force_detached(device, bdev, &device->md_io.done);
-	if (bio_flagged(bio, BIO_UPTODATE))
+	if (!bio->bi_error)
 		err = device->md_io.error;
 
  out:
diff --git a/drivers/block/drbd/drbd_bitmap.c b/drivers/block/drbd/drbd_bitmap.c
index 434c77dcc99e..e5e0f19ceda0 100644
--- a/drivers/block/drbd/drbd_bitmap.c
+++ b/drivers/block/drbd/drbd_bitmap.c
@@ -941,36 +941,27 @@ static void drbd_bm_aio_ctx_destroy(struct kref *kref)
 }
 
 /* bv_page may be a copy, or may be the original */
-static void drbd_bm_endio(struct bio *bio, int error)
+static void drbd_bm_endio(struct bio *bio)
 {
 	struct drbd_bm_aio_ctx *ctx = bio->bi_private;
 	struct drbd_device *device = ctx->device;
 	struct drbd_bitmap *b = device->bitmap;
 	unsigned int idx = bm_page_to_idx(bio->bi_io_vec[0].bv_page);
-	int uptodate = bio_flagged(bio, BIO_UPTODATE);
-
-
-	/* strange behavior of some lower level drivers...
-	 * fail the request by clearing the uptodate flag,
-	 * but do not return any error?!
-	 * do we want to WARN() on this? */
-	if (!error && !uptodate)
-		error = -EIO;
 
 	if ((ctx->flags & BM_AIO_COPY_PAGES) == 0 &&
 	    !bm_test_page_unchanged(b->bm_pages[idx]))
 		drbd_warn(device, "bitmap page idx %u changed during IO!\n", idx);
 
-	if (error) {
+	if (bio->bi_error) {
 		/* ctx error will hold the completed-last non-zero error code,
 		 * in case error codes differ. */
-		ctx->error = error;
+		ctx->error = bio->bi_error;
 		bm_set_page_io_err(b->bm_pages[idx]);
 		/* Not identical to on disk version of it.
 		 * Is BM_PAGE_IO_ERROR enough? */
 		if (__ratelimit(&drbd_ratelimit_state))
 			drbd_err(device, "IO ERROR %d on bitmap page idx %u\n",
-					error, idx);
+					bio->bi_error, idx);
 	} else {
 		bm_clear_page_io_err(b->bm_pages[idx]);
 		dynamic_drbd_dbg(device, "bitmap page idx %u completed\n", idx);
@@ -1031,7 +1022,7 @@ static void bm_page_io_async(struct drbd_bm_aio_ctx *ctx, int page_nr) __must_ho
 
 	if (drbd_insert_fault(device, (rw & WRITE) ? DRBD_FAULT_MD_WR : DRBD_FAULT_MD_RD)) {
 		bio->bi_rw |= rw;
-		bio_endio(bio, -EIO);
+		bio_io_error(bio);
 	} else {
 		submit_bio(rw, bio);
 		/* this should not count as user activity and cause the
diff --git a/drivers/block/drbd/drbd_int.h b/drivers/block/drbd/drbd_int.h
index efd19c2da9c2..a08c4a9179f1 100644
--- a/drivers/block/drbd/drbd_int.h
+++ b/drivers/block/drbd/drbd_int.h
@@ -1481,9 +1481,9 @@ extern int drbd_khelper(struct drbd_device *device, char *cmd);
 
 /* drbd_worker.c */
 /* bi_end_io handlers */
-extern void drbd_md_endio(struct bio *bio, int error);
-extern void drbd_peer_request_endio(struct bio *bio, int error);
-extern void drbd_request_endio(struct bio *bio, int error);
+extern void drbd_md_endio(struct bio *bio);
+extern void drbd_peer_request_endio(struct bio *bio);
+extern void drbd_request_endio(struct bio *bio);
 extern int drbd_worker(struct drbd_thread *thi);
 enum drbd_ret_code drbd_resync_after_valid(struct drbd_device *device, int o_minor);
 void drbd_resync_after_changed(struct drbd_device *device);
@@ -1604,12 +1604,13 @@ static inline void drbd_generic_make_request(struct drbd_device *device,
 	__release(local);
 	if (!bio->bi_bdev) {
 		drbd_err(device, "drbd_generic_make_request: bio->bi_bdev == NULL\n");
-		bio_endio(bio, -ENODEV);
+		bio->bi_error = -ENODEV;
+		bio_endio(bio);
 		return;
 	}
 
 	if (drbd_insert_fault(device, fault_type))
-		bio_endio(bio, -EIO);
+		bio_io_error(bio);
 	else
 		generic_make_request(bio);
 }
diff --git a/drivers/block/drbd/drbd_req.c b/drivers/block/drbd/drbd_req.c
index 3907202fb9d9..9cb41166366e 100644
--- a/drivers/block/drbd/drbd_req.c
+++ b/drivers/block/drbd/drbd_req.c
@@ -201,7 +201,8 @@ void start_new_tl_epoch(struct drbd_connection *connection)
 void complete_master_bio(struct drbd_device *device,
 		struct bio_and_error *m)
 {
-	bio_endio(m->bio, m->error);
+	m->bio->bi_error = m->error;
+	bio_endio(m->bio);
 	dec_ap_bio(device);
 }
 
@@ -1153,12 +1154,12 @@ drbd_submit_req_private_bio(struct drbd_request *req)
 				      rw == WRITE ? DRBD_FAULT_DT_WR
 				    : rw == READ  ? DRBD_FAULT_DT_RD
 				    :               DRBD_FAULT_DT_RA))
-			bio_endio(bio, -EIO);
+			bio_io_error(bio);
 		else
 			generic_make_request(bio);
 		put_ldev(device);
 	} else
-		bio_endio(bio, -EIO);
+		bio_io_error(bio);
 }
 
 static void drbd_queue_write(struct drbd_device *device, struct drbd_request *req)
@@ -1191,7 +1192,8 @@ drbd_request_prepare(struct drbd_device *device, struct bio *bio, unsigned long
 		/* only pass the error to the upper layers.
 		 * if user cannot handle io errors, that's not our business. */
 		drbd_err(device, "could not kmalloc() req\n");
-		bio_endio(bio, -ENOMEM);
+		bio->bi_error = -ENOMEM;
+		bio_endio(bio);
 		return ERR_PTR(-ENOMEM);
 	}
 	req->start_jif = start_jif;
diff --git a/drivers/block/drbd/drbd_worker.c b/drivers/block/drbd/drbd_worker.c
index d0fae55d871d..5578c1477ba6 100644
--- a/drivers/block/drbd/drbd_worker.c
+++ b/drivers/block/drbd/drbd_worker.c
@@ -65,12 +65,12 @@ rwlock_t global_state_lock;
 /* used for synchronous meta data and bitmap IO
  * submitted by drbd_md_sync_page_io()
  */
-void drbd_md_endio(struct bio *bio, int error)
+void drbd_md_endio(struct bio *bio)
 {
 	struct drbd_device *device;
 
 	device = bio->bi_private;
-	device->md_io.error = error;
+	device->md_io.error = bio->bi_error;
 
 	/* We grabbed an extra reference in _drbd_md_sync_page_io() to be able
 	 * to timeout on the lower level device, and eventually detach from it.
@@ -170,31 +170,20 @@ void drbd_endio_write_sec_final(struct drbd_peer_request *peer_req) __releases(l
 /* writes on behalf of the partner, or resync writes,
  * "submitted" by the receiver.
  */
-void drbd_peer_request_endio(struct bio *bio, int error)
+void drbd_peer_request_endio(struct bio *bio)
 {
 	struct drbd_peer_request *peer_req = bio->bi_private;
 	struct drbd_device *device = peer_req->peer_device->device;
-	int uptodate = bio_flagged(bio, BIO_UPTODATE);
 	int is_write = bio_data_dir(bio) == WRITE;
 	int is_discard = !!(bio->bi_rw & REQ_DISCARD);
 
-	if (error && __ratelimit(&drbd_ratelimit_state))
+	if (bio->bi_error && __ratelimit(&drbd_ratelimit_state))
 		drbd_warn(device, "%s: error=%d s=%llus\n",
 				is_write ? (is_discard ? "discard" : "write")
-					: "read", error,
+					: "read", bio->bi_error,
 				(unsigned long long)peer_req->i.sector);
-	if (!error && !uptodate) {
-		if (__ratelimit(&drbd_ratelimit_state))
-			drbd_warn(device, "%s: setting error to -EIO s=%llus\n",
-					is_write ? "write" : "read",
-					(unsigned long long)peer_req->i.sector);
-		/* strange behavior of some lower level drivers...
-		 * fail the request by clearing the uptodate flag,
-		 * but do not return any error?! */
-		error = -EIO;
-	}
 
-	if (error)
+	if (bio->bi_error)
 		set_bit(__EE_WAS_ERROR, &peer_req->flags);
 
 	bio_put(bio); /* no need for the bio anymore */
@@ -208,24 +197,13 @@ void drbd_peer_request_endio(struct bio *bio, int error)
 
 /* read, readA or write requests on R_PRIMARY coming from drbd_make_request
  */
-void drbd_request_endio(struct bio *bio, int error)
+void drbd_request_endio(struct bio *bio)
 {
 	unsigned long flags;
 	struct drbd_request *req = bio->bi_private;
 	struct drbd_device *device = req->device;
 	struct bio_and_error m;
 	enum drbd_req_event what;
-	int uptodate = bio_flagged(bio, BIO_UPTODATE);
-
-	if (!error && !uptodate) {
-		drbd_warn(device, "p %s: setting error to -EIO\n",
-			 bio_data_dir(bio) == WRITE ? "write" : "read");
-		/* strange behavior of some lower level drivers...
-		 * fail the request by clearing the uptodate flag,
-		 * but do not return any error?! */
-		error = -EIO;
-	}
-
 
 	/* If this request was aborted locally before,
 	 * but now was completed "successfully",
@@ -259,14 +237,14 @@ void drbd_request_endio(struct bio *bio, int error)
 		if (__ratelimit(&drbd_ratelimit_state))
 			drbd_emerg(device, "delayed completion of aborted local request; disk-timeout may be too aggressive\n");
 
-		if (!error)
+		if (!bio->bi_error)
 			panic("possible random memory corruption caused by delayed completion of aborted local request\n");
 	}
 
 	/* to avoid recursion in __req_mod */
-	if (unlikely(error)) {
+	if (unlikely(bio->bi_error)) {
 		if (bio->bi_rw & REQ_DISCARD)
-			what = (error == -EOPNOTSUPP)
+			what = (bio->bi_error == -EOPNOTSUPP)
 				? DISCARD_COMPLETED_NOTSUPP
 				: DISCARD_COMPLETED_WITH_ERROR;
 		else
@@ -279,7 +257,7 @@ void drbd_request_endio(struct bio *bio, int error)
 		what = COMPLETED_OK;
 
 	bio_put(req->private_bio);
-	req->private_bio = ERR_PTR(error);
+	req->private_bio = ERR_PTR(bio->bi_error);
 
 	/* not req_mod(), we need irqsave here! */
 	spin_lock_irqsave(&device->resource->req_lock, flags);
diff --git a/drivers/block/floppy.c b/drivers/block/floppy.c
index a08cda955285..331363e7de0f 100644
--- a/drivers/block/floppy.c
+++ b/drivers/block/floppy.c
@@ -3771,13 +3771,14 @@ struct rb0_cbdata {
 	struct completion complete;
 };
 
-static void floppy_rb0_cb(struct bio *bio, int err)
+static void floppy_rb0_cb(struct bio *bio)
 {
 	struct rb0_cbdata *cbdata = (struct rb0_cbdata *)bio->bi_private;
 	int drive = cbdata->drive;
 
-	if (err) {
-		pr_info("floppy: error %d while reading block 0\n", err);
+	if (bio->bi_error) {
+		pr_info("floppy: error %d while reading block 0\n",
+			bio->bi_error);
 		set_bit(FD_OPEN_SHOULD_FAIL_BIT, &UDRS->flags);
 	}
 	complete(&cbdata->complete);
diff --git a/drivers/block/null_blk.c b/drivers/block/null_blk.c
index 69de41a87b74..016a59afcf24 100644
--- a/drivers/block/null_blk.c
+++ b/drivers/block/null_blk.c
@@ -222,7 +222,7 @@ static void end_cmd(struct nullb_cmd *cmd)
 		blk_end_request_all(cmd->rq, 0);
 		break;
 	case NULL_Q_BIO:
-		bio_endio(cmd->bio, 0);
+		bio_endio(cmd->bio);
 		break;
 	}
 
diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
index 4c20c228184c..a7a259e031da 100644
--- a/drivers/block/pktcdvd.c
+++ b/drivers/block/pktcdvd.c
@@ -977,7 +977,7 @@ static void pkt_make_local_copy(struct packet_data *pkt, struct bio_vec *bvec)
 	}
 }
 
-static void pkt_end_io_read(struct bio *bio, int err)
+static void pkt_end_io_read(struct bio *bio)
 {
 	struct packet_data *pkt = bio->bi_private;
 	struct pktcdvd_device *pd = pkt->pd;
@@ -985,9 +985,9 @@ static void pkt_end_io_read(struct bio *bio, int err)
 
 	pkt_dbg(2, pd, "bio=%p sec0=%llx sec=%llx err=%d\n",
 		bio, (unsigned long long)pkt->sector,
-		(unsigned long long)bio->bi_iter.bi_sector, err);
+		(unsigned long long)bio->bi_iter.bi_sector, bio->bi_error);
 
-	if (err)
+	if (bio->bi_error)
 		atomic_inc(&pkt->io_errors);
 	if (atomic_dec_and_test(&pkt->io_wait)) {
 		atomic_inc(&pkt->run_sm);
@@ -996,13 +996,13 @@ static void pkt_end_io_read(struct bio *bio, int err)
 	pkt_bio_finished(pd);
 }
 
-static void pkt_end_io_packet_write(struct bio *bio, int err)
+static void pkt_end_io_packet_write(struct bio *bio)
 {
 	struct packet_data *pkt = bio->bi_private;
 	struct pktcdvd_device *pd = pkt->pd;
 	BUG_ON(!pd);
 
-	pkt_dbg(2, pd, "id=%d, err=%d\n", pkt->id, err);
+	pkt_dbg(2, pd, "id=%d, err=%d\n", pkt->id, bio->bi_error);
 
 	pd->stats.pkt_ended++;
 
@@ -1340,22 +1340,22 @@ static void pkt_start_write(struct pktcdvd_device *pd, struct packet_data *pkt)
 	pkt_queue_bio(pd, pkt->w_bio);
 }
 
-static void pkt_finish_packet(struct packet_data *pkt, int uptodate)
+static void pkt_finish_packet(struct packet_data *pkt, int error)
 {
 	struct bio *bio;
 
-	if (!uptodate)
+	if (error)
 		pkt->cache_valid = 0;
 
 	/* Finish all bios corresponding to this packet */
-	while ((bio = bio_list_pop(&pkt->orig_bios)))
-		bio_endio(bio, uptodate ? 0 : -EIO);
+	while ((bio = bio_list_pop(&pkt->orig_bios))) {
+		bio->bi_error = error;
+		bio_endio(bio);
+	}
 }
 
 static void pkt_run_state_machine(struct pktcdvd_device *pd, struct packet_data *pkt)
 {
-	int uptodate;
-
 	pkt_dbg(2, pd, "pkt %d\n", pkt->id);
 
 	for (;;) {
@@ -1384,7 +1384,7 @@ static void pkt_run_state_machine(struct pktcdvd_device *pd, struct packet_data
 			if (atomic_read(&pkt->io_wait) > 0)
 				return;
 
-			if (test_bit(BIO_UPTODATE, &pkt->w_bio->bi_flags)) {
+			if (!pkt->w_bio->bi_error) {
 				pkt_set_state(pkt, PACKET_FINISHED_STATE);
 			} else {
 				pkt_set_state(pkt, PACKET_RECOVERY_STATE);
@@ -1401,8 +1401,7 @@ static void pkt_run_state_machine(struct pktcdvd_device *pd, struct packet_data
 			break;
 
 		case PACKET_FINISHED_STATE:
-			uptodate = test_bit(BIO_UPTODATE, &pkt->w_bio->bi_flags);
-			pkt_finish_packet(pkt, uptodate);
+			pkt_finish_packet(pkt, pkt->w_bio->bi_error);
 			return;
 
 		default:
@@ -2332,13 +2331,14 @@ static void pkt_close(struct gendisk *disk, fmode_t mode)
 }
 
 
-static void pkt_end_io_read_cloned(struct bio *bio, int err)
+static void pkt_end_io_read_cloned(struct bio *bio)
 {
 	struct packet_stacked_data *psd = bio->bi_private;
 	struct pktcdvd_device *pd = psd->pd;
 
+	psd->bio->bi_error = bio->bi_error;
 	bio_put(bio);
-	bio_endio(psd->bio, err);
+	bio_endio(psd->bio);
 	mempool_free(psd, psd_pool);
 	pkt_bio_finished(pd);
 }
diff --git a/drivers/block/ps3vram.c b/drivers/block/ps3vram.c
index b1612eb16172..49b4706b162c 100644
--- a/drivers/block/ps3vram.c
+++ b/drivers/block/ps3vram.c
@@ -593,7 +593,8 @@ out:
 	next = bio_list_peek(&priv->list);
 	spin_unlock_irq(&priv->lock);
 
-	bio_endio(bio, error);
+	bio->bi_error = error;
+	bio_endio(bio);
 	return next;
 }
 
diff --git a/drivers/block/rsxx/dev.c b/drivers/block/rsxx/dev.c
index ac8c62cb4875..63b9d2ffa8ee 100644
--- a/drivers/block/rsxx/dev.c
+++ b/drivers/block/rsxx/dev.c
@@ -137,7 +137,10 @@ static void bio_dma_done_cb(struct rsxx_cardinfo *card,
 		if (!card->eeh_state && card->gendisk)
 			disk_stats_complete(card, meta->bio, meta->start_time);
 
-		bio_endio(meta->bio, atomic_read(&meta->error) ? -EIO : 0);
+		if (atomic_read(&meta->error))
+			bio_io_error(meta->bio);
+		else
+			bio_endio(meta->bio);
 		kmem_cache_free(bio_meta_pool, meta);
 	}
 }
@@ -199,7 +202,9 @@ static void rsxx_make_request(struct request_queue *q, struct bio *bio)
 queue_err:
 	kmem_cache_free(bio_meta_pool, bio_meta);
 req_err:
-	bio_endio(bio, st);
+	if (st)
+		bio->bi_error = st;
+	bio_endio(bio);
 }
 
 /*----------------- Device Setup -------------------*/
diff --git a/drivers/block/umem.c b/drivers/block/umem.c
index 4cf81b5bf0f7..3b3afd2ec5d6 100644
--- a/drivers/block/umem.c
+++ b/drivers/block/umem.c
@@ -456,7 +456,7 @@ static void process_page(unsigned long data)
 				PCI_DMA_TODEVICE : PCI_DMA_FROMDEVICE);
 		if (control & DMASCR_HARD_ERROR) {
 			/* error */
-			clear_bit(BIO_UPTODATE, &bio->bi_flags);
+			bio->bi_error = -EIO;
 			dev_printk(KERN_WARNING, &card->dev->dev,
 				"I/O error on sector %d/%d\n",
 				le32_to_cpu(desc->local_addr)>>9,
@@ -505,7 +505,7 @@ static void process_page(unsigned long data)
 
 		return_bio = bio->bi_next;
 		bio->bi_next = NULL;
-		bio_endio(bio, 0);
+		bio_endio(bio);
 	}
 }
 
diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
index ced96777b677..662648e08596 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -1078,9 +1078,9 @@ static void __end_block_io_op(struct pending_req *pending_req, int error)
 /*
  * bio callback.
  */
-static void end_block_io_op(struct bio *bio, int error)
+static void end_block_io_op(struct bio *bio)
 {
-	__end_block_io_op(bio->bi_private, error);
+	__end_block_io_op(bio->bi_private, bio->bi_error);
 	bio_put(bio);
 }
 
diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 6d89ed35d80c..d542db7a6c73 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -82,7 +82,6 @@ struct blk_shadow {
 struct split_bio {
 	struct bio *bio;
 	atomic_t pending;
-	int err;
 };
 
 static DEFINE_MUTEX(blkfront_mutex);
@@ -1478,16 +1477,14 @@ static int blkfront_probe(struct xenbus_device *dev,
 	return 0;
 }
 
-static void split_bio_end(struct bio *bio, int error)
+static void split_bio_end(struct bio *bio)
 {
 	struct split_bio *split_bio = bio->bi_private;
 
-	if (error)
-		split_bio->err = error;
-
 	if (atomic_dec_and_test(&split_bio->pending)) {
 		split_bio->bio->bi_phys_segments = 0;
-		bio_endio(split_bio->bio, split_bio->err);
+		split_bio->bio->bi_error = bio->bi_error;
+		bio_endio(split_bio->bio);
 		kfree(split_bio);
 	}
 	bio_put(bio);
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index f439ad2800da..68c3d4800464 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -850,7 +850,7 @@ static void __zram_make_request(struct zram *zram, struct bio *bio)
 
 	if (unlikely(bio->bi_rw & REQ_DISCARD)) {
 		zram_bio_discard(zram, index, offset, bio);
-		bio_endio(bio, 0);
+		bio_endio(bio);
 		return;
 	}
 
@@ -883,8 +883,7 @@ static void __zram_make_request(struct zram *zram, struct bio *bio)
 		update_position(&index, &offset, &bvec);
 	}
 
-	set_bit(BIO_UPTODATE, &bio->bi_flags);
-	bio_endio(bio, 0);
+	bio_endio(bio);
 	return;
 
 out:
diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 00cde40db572..83392f856dfd 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -278,7 +278,7 @@ err:
 	goto out;
 }
 
-static void btree_node_read_endio(struct bio *bio, int error)
+static void btree_node_read_endio(struct bio *bio)
 {
 	struct closure *cl = bio->bi_private;
 	closure_put(cl);
@@ -305,7 +305,7 @@ static void bch_btree_node_read(struct btree *b)
 	bch_submit_bbio(bio, b->c, &b->key, 0);
 	closure_sync(&cl);
 
-	if (!test_bit(BIO_UPTODATE, &bio->bi_flags))
+	if (bio->bi_error)
 		set_btree_node_io_error(b);
 
 	bch_bbio_free(bio, b->c);
@@ -371,15 +371,15 @@ static void btree_node_write_done(struct closure *cl)
 	__btree_node_write_done(cl);
 }
 
-static void btree_node_write_endio(struct bio *bio, int error)
+static void btree_node_write_endio(struct bio *bio)
 {
 	struct closure *cl = bio->bi_private;
 	struct btree *b = container_of(cl, struct btree, io);
 
-	if (error)
+	if (bio->bi_error)
 		set_btree_node_io_error(b);
 
-	bch_bbio_count_io_errors(b->c, bio, error, "writing btree");
+	bch_bbio_count_io_errors(b->c, bio, bio->bi_error, "writing btree");
 	closure_put(cl);
 }
 
diff --git a/drivers/md/bcache/closure.h b/drivers/md/bcache/closure.h
index 79a6d63e8ed3..782cc2c8a185 100644
--- a/drivers/md/bcache/closure.h
+++ b/drivers/md/bcache/closure.h
@@ -38,7 +38,7 @@
  * they are running owned by the thread that is running them. Otherwise, suppose
  * you submit some bios and wish to have a function run when they all complete:
  *
- * foo_endio(struct bio *bio, int error)
+ * foo_endio(struct bio *bio)
  * {
  *	closure_put(cl);
  * }
diff --git a/drivers/md/bcache/io.c b/drivers/md/bcache/io.c
index bf6a9ca18403..9440df94bc83 100644
--- a/drivers/md/bcache/io.c
+++ b/drivers/md/bcache/io.c
@@ -55,19 +55,19 @@ static void bch_bio_submit_split_done(struct closure *cl)
 
 	s->bio->bi_end_io = s->bi_end_io;
 	s->bio->bi_private = s->bi_private;
-	bio_endio(s->bio, 0);
+	bio_endio(s->bio);
 
 	closure_debug_destroy(&s->cl);
 	mempool_free(s, s->p->bio_split_hook);
 }
 
-static void bch_bio_submit_split_endio(struct bio *bio, int error)
+static void bch_bio_submit_split_endio(struct bio *bio)
 {
 	struct closure *cl = bio->bi_private;
 	struct bio_split_hook *s = container_of(cl, struct bio_split_hook, cl);
 
-	if (error)
-		clear_bit(BIO_UPTODATE, &s->bio->bi_flags);
+	if (bio->bi_error)
+		s->bio->bi_error = bio->bi_error;
 
 	bio_put(bio);
 	closure_put(cl);
diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c
index 418607a6ba33..d6a4e16030a6 100644
--- a/drivers/md/bcache/journal.c
+++ b/drivers/md/bcache/journal.c
@@ -24,7 +24,7 @@
  * bit.
  */
 
-static void journal_read_endio(struct bio *bio, int error)
+static void journal_read_endio(struct bio *bio)
 {
 	struct closure *cl = bio->bi_private;
 	closure_put(cl);
@@ -401,7 +401,7 @@ retry:
 
 #define last_seq(j)	((j)->seq - fifo_used(&(j)->pin) + 1)
 
-static void journal_discard_endio(struct bio *bio, int error)
+static void journal_discard_endio(struct bio *bio)
 {
 	struct journal_device *ja =
 		container_of(bio, struct journal_device, discard_bio);
@@ -547,11 +547,11 @@ void bch_journal_next(struct journal *j)
 		pr_debug("journal_pin full (%zu)", fifo_used(&j->pin));
 }
 
-static void journal_write_endio(struct bio *bio, int error)
+static void journal_write_endio(struct bio *bio)
 {
 	struct journal_write *w = bio->bi_private;
 
-	cache_set_err_on(error, w->c, "journal io error");
+	cache_set_err_on(bio->bi_error, w->c, "journal io error");
 	closure_put(&w->c->journal.io);
 }
 
diff --git a/drivers/md/bcache/movinggc.c b/drivers/md/bcache/movinggc.c
index cd7490311e51..b929fc944e9c 100644
--- a/drivers/md/bcache/movinggc.c
+++ b/drivers/md/bcache/movinggc.c
@@ -60,20 +60,20 @@ static void write_moving_finish(struct closure *cl)
 	closure_return_with_destructor(cl, moving_io_destructor);
 }
 
-static void read_moving_endio(struct bio *bio, int error)
+static void read_moving_endio(struct bio *bio)
 {
 	struct bbio *b = container_of(bio, struct bbio, bio);
 	struct moving_io *io = container_of(bio->bi_private,
 					    struct moving_io, cl);
 
-	if (error)
-		io->op.error = error;
+	if (bio->bi_error)
+		io->op.error = bio->bi_error;
 	else if (!KEY_DIRTY(&b->key) &&
 		 ptr_stale(io->op.c, &b->key, 0)) {
 		io->op.error = -EINTR;
 	}
 
-	bch_bbio_endio(io->op.c, bio, error, "reading data to move");
+	bch_bbio_endio(io->op.c, bio, bio->bi_error, "reading data to move");
 }
 
 static void moving_init(struct moving_io *io)
diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
index f292790997d7..a09b9462ff49 100644
--- a/drivers/md/bcache/request.c
+++ b/drivers/md/bcache/request.c
@@ -173,22 +173,22 @@ static void bch_data_insert_error(struct closure *cl)
 	bch_data_insert_keys(cl);
 }
 
-static void bch_data_insert_endio(struct bio *bio, int error)
+static void bch_data_insert_endio(struct bio *bio)
 {
 	struct closure *cl = bio->bi_private;
 	struct data_insert_op *op = container_of(cl, struct data_insert_op, cl);
 
-	if (error) {
+	if (bio->bi_error) {
 		/* TODO: We could try to recover from this. */
 		if (op->writeback)
-			op->error = error;
+			op->error = bio->bi_error;
 		else if (!op->replace)
 			set_closure_fn(cl, bch_data_insert_error, op->wq);
 		else
 			set_closure_fn(cl, NULL, NULL);
 	}
 
-	bch_bbio_endio(op->c, bio, error, "writing data to cache");
+	bch_bbio_endio(op->c, bio, bio->bi_error, "writing data to cache");
 }
 
 static void bch_data_insert_start(struct closure *cl)
@@ -477,7 +477,7 @@ struct search {
 	struct data_insert_op	iop;
 };
 
-static void bch_cache_read_endio(struct bio *bio, int error)
+static void bch_cache_read_endio(struct bio *bio)
 {
 	struct bbio *b = container_of(bio, struct bbio, bio);
 	struct closure *cl = bio->bi_private;
@@ -490,15 +490,15 @@ static void bch_cache_read_endio(struct bio *bio, int error)
 	 * from the backing device.
 	 */
 
-	if (error)
-		s->iop.error = error;
+	if (bio->bi_error)
+		s->iop.error = bio->bi_error;
 	else if (!KEY_DIRTY(&b->key) &&
 		 ptr_stale(s->iop.c, &b->key, 0)) {
 		atomic_long_inc(&s->iop.c->cache_read_races);
 		s->iop.error = -EINTR;
 	}
 
-	bch_bbio_endio(s->iop.c, bio, error, "reading from cache");
+	bch_bbio_endio(s->iop.c, bio, bio->bi_error, "reading from cache");
 }
 
 /*
@@ -591,13 +591,13 @@ static void cache_lookup(struct closure *cl)
 
 /* Common code for the make_request functions */
 
-static void request_endio(struct bio *bio, int error)
+static void request_endio(struct bio *bio)
 {
 	struct closure *cl = bio->bi_private;
 
-	if (error) {
+	if (bio->bi_error) {
 		struct search *s = container_of(cl, struct search, cl);
-		s->iop.error = error;
+		s->iop.error = bio->bi_error;
 		/* Only cache read errors are recoverable */
 		s->recoverable = false;
 	}
@@ -613,7 +613,8 @@ static void bio_complete(struct search *s)
 				    &s->d->disk->part0, s->start_time);
 
 		trace_bcache_request_end(s->d, s->orig_bio);
-		bio_endio(s->orig_bio, s->iop.error);
+		s->orig_bio->bi_error = s->iop.error;
+		bio_endio(s->orig_bio);
 		s->orig_bio = NULL;
 	}
 }
@@ -992,7 +993,7 @@ static void cached_dev_make_request(struct request_queue *q, struct bio *bio)
 	} else {
 		if ((bio->bi_rw & REQ_DISCARD) &&
 		    !blk_queue_discard(bdev_get_queue(dc->bdev)))
-			bio_endio(bio, 0);
+			bio_endio(bio);
 		else
 			bch_generic_make_request(bio, &d->bio_split_hook);
 	}
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index fc8e545ced18..be01fd3c87f1 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -221,7 +221,7 @@ err:
 	return err;
 }
 
-static void write_bdev_super_endio(struct bio *bio, int error)
+static void write_bdev_super_endio(struct bio *bio)
 {
 	struct cached_dev *dc = bio->bi_private;
 	/* XXX: error checking */
@@ -290,11 +290,11 @@ void bch_write_bdev_super(struct cached_dev *dc, struct closure *parent)
 	closure_return_with_destructor(cl, bch_write_bdev_super_unlock);
 }
 
-static void write_super_endio(struct bio *bio, int error)
+static void write_super_endio(struct bio *bio)
 {
 	struct cache *ca = bio->bi_private;
 
-	bch_count_io_errors(ca, error, "writing superblock");
+	bch_count_io_errors(ca, bio->bi_error, "writing superblock");
 	closure_put(&ca->set->sb_write);
 }
 
@@ -339,12 +339,12 @@ void bcache_write_super(struct cache_set *c)
 
 /* UUID io */
 
-static void uuid_endio(struct bio *bio, int error)
+static void uuid_endio(struct bio *bio)
 {
 	struct closure *cl = bio->bi_private;
 	struct cache_set *c = container_of(cl, struct cache_set, uuid_write);
 
-	cache_set_err_on(error, c, "accessing uuids");
+	cache_set_err_on(bio->bi_error, c, "accessing uuids");
 	bch_bbio_free(bio, c);
 	closure_put(cl);
 }
@@ -512,11 +512,11 @@ static struct uuid_entry *uuid_find_empty(struct cache_set *c)
  * disk.
  */
 
-static void prio_endio(struct bio *bio, int error)
+static void prio_endio(struct bio *bio)
 {
 	struct cache *ca = bio->bi_private;
 
-	cache_set_err_on(error, ca->set, "accessing priorities");
+	cache_set_err_on(bio->bi_error, ca->set, "accessing priorities");
 	bch_bbio_free(bio, ca->set);
 	closure_put(&ca->prio);
 }
diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
index f1986bcd1bf0..b4fc874c30fd 100644
--- a/drivers/md/bcache/writeback.c
+++ b/drivers/md/bcache/writeback.c
@@ -166,12 +166,12 @@ static void write_dirty_finish(struct closure *cl)
 	closure_return_with_destructor(cl, dirty_io_destructor);
 }
 
-static void dirty_endio(struct bio *bio, int error)
+static void dirty_endio(struct bio *bio)
 {
 	struct keybuf_key *w = bio->bi_private;
 	struct dirty_io *io = w->private;
 
-	if (error)
+	if (bio->bi_error)
 		SET_KEY_DIRTY(&w->key, false);
 
 	closure_put(&io->cl);
@@ -193,15 +193,15 @@ static void write_dirty(struct closure *cl)
 	continue_at(cl, write_dirty_finish, system_wq);
 }
 
-static void read_dirty_endio(struct bio *bio, int error)
+static void read_dirty_endio(struct bio *bio)
 {
 	struct keybuf_key *w = bio->bi_private;
 	struct dirty_io *io = w->private;
 
 	bch_count_io_errors(PTR_CACHE(io->dc->disk.c, &w->key, 0),
-			    error, "reading dirty data from cache");
+			    bio->bi_error, "reading dirty data from cache");
 
-	dirty_endio(bio, error);
+	dirty_endio(bio);
 }
 
 static void read_dirty_submit(struct closure *cl)
diff --git a/drivers/md/dm-bio-prison.c b/drivers/md/dm-bio-prison.c
index cd6d1d21e057..03af174485d3 100644
--- a/drivers/md/dm-bio-prison.c
+++ b/drivers/md/dm-bio-prison.c
@@ -236,8 +236,10 @@ void dm_cell_error(struct dm_bio_prison *prison,
 	bio_list_init(&bios);
 	dm_cell_release(prison, cell, &bios);
 
-	while ((bio = bio_list_pop(&bios)))
-		bio_endio(bio, error);
+	while ((bio = bio_list_pop(&bios))) {
+		bio->bi_error = error;
+		bio_endio(bio);
+	}
 }
 EXPORT_SYMBOL_GPL(dm_cell_error);
 
diff --git a/drivers/md/dm-bufio.c b/drivers/md/dm-bufio.c
index 86dbbc737402..83cc52eaf56d 100644
--- a/drivers/md/dm-bufio.c
+++ b/drivers/md/dm-bufio.c
@@ -545,7 +545,8 @@ static void dmio_complete(unsigned long error, void *context)
 {
 	struct dm_buffer *b = context;
 
-	b->bio.bi_end_io(&b->bio, error ? -EIO : 0);
+	b->bio.bi_error = error ? -EIO : 0;
+	b->bio.bi_end_io(&b->bio);
 }
 
 static void use_dmio(struct dm_buffer *b, int rw, sector_t block,
@@ -575,13 +576,16 @@ static void use_dmio(struct dm_buffer *b, int rw, sector_t block,
 	b->bio.bi_end_io = end_io;
 
 	r = dm_io(&io_req, 1, &region, NULL);
-	if (r)
-		end_io(&b->bio, r);
+	if (r) {
+		b->bio.bi_error = r;
+		end_io(&b->bio);
+	}
 }
 
-static void inline_endio(struct bio *bio, int error)
+static void inline_endio(struct bio *bio)
 {
 	bio_end_io_t *end_fn = bio->bi_private;
+	int error = bio->bi_error;
 
 	/*
 	 * Reset the bio to free any attached resources
@@ -589,7 +593,8 @@ static void inline_endio(struct bio *bio, int error)
 	 */
 	bio_reset(bio);
 
-	end_fn(bio, error);
+	bio->bi_error = error;
+	end_fn(bio);
 }
 
 static void use_inline_bio(struct dm_buffer *b, int rw, sector_t block,
@@ -661,13 +666,14 @@ static void submit_io(struct dm_buffer *b, int rw, sector_t block,
  * Set the error, clear B_WRITING bit and wake anyone who was waiting on
  * it.
  */
-static void write_endio(struct bio *bio, int error)
+static void write_endio(struct bio *bio)
 {
 	struct dm_buffer *b = container_of(bio, struct dm_buffer, bio);
 
-	b->write_error = error;
-	if (unlikely(error)) {
+	b->write_error = bio->bi_error;
+	if (unlikely(bio->bi_error)) {
 		struct dm_bufio_client *c = b->c;
+		int error = bio->bi_error;
 		(void)cmpxchg(&c->async_write_error, 0, error);
 	}
 
@@ -1026,11 +1032,11 @@ found_buffer:
  * The endio routine for reading: set the error, clear the bit and wake up
  * anyone waiting on the buffer.
  */
-static void read_endio(struct bio *bio, int error)
+static void read_endio(struct bio *bio)
 {
 	struct dm_buffer *b = container_of(bio, struct dm_buffer, bio);
 
-	b->read_error = error;
+	b->read_error = bio->bi_error;
 
 	BUG_ON(!test_bit(B_READING, &b->state));
 
diff --git a/drivers/md/dm-cache-target.c b/drivers/md/dm-cache-target.c
index 1b4e1756b169..04d0dadc48b1 100644
--- a/drivers/md/dm-cache-target.c
+++ b/drivers/md/dm-cache-target.c
@@ -919,14 +919,14 @@ static void defer_writethrough_bio(struct cache *cache, struct bio *bio)
 	wake_worker(cache);
 }
 
-static void writethrough_endio(struct bio *bio, int err)
+static void writethrough_endio(struct bio *bio)
 {
 	struct per_bio_data *pb = get_per_bio_data(bio, PB_DATA_SIZE_WT);
 
 	dm_unhook_bio(&pb->hook_info, bio);
 
-	if (err) {
-		bio_endio(bio, err);
+	if (bio->bi_error) {
+		bio_endio(bio);
 		return;
 	}
 
@@ -1231,7 +1231,7 @@ static void migration_success_post_commit(struct dm_cache_migration *mg)
 			 * The block was promoted via an overwrite, so it's dirty.
 			 */
 			set_dirty(cache, mg->new_oblock, mg->cblock);
-			bio_endio(mg->new_ocell->holder, 0);
+			bio_endio(mg->new_ocell->holder);
 			cell_defer(cache, mg->new_ocell, false);
 		}
 		free_io_migration(mg);
@@ -1284,7 +1284,7 @@ static void issue_copy(struct dm_cache_migration *mg)
 	}
 }
 
-static void overwrite_endio(struct bio *bio, int err)
+static void overwrite_endio(struct bio *bio)
 {
 	struct dm_cache_migration *mg = bio->bi_private;
 	struct cache *cache = mg->cache;
@@ -1294,7 +1294,7 @@ static void overwrite_endio(struct bio *bio, int err)
 
 	dm_unhook_bio(&pb->hook_info, bio);
 
-	if (err)
+	if (bio->bi_error)
 		mg->err = true;
 
 	mg->requeue_holder = false;
@@ -1358,7 +1358,7 @@ static void issue_discard(struct dm_cache_migration *mg)
 		b = to_dblock(from_dblock(b) + 1);
 	}
 
-	bio_endio(bio, 0);
+	bio_endio(bio);
 	cell_defer(mg->cache, mg->new_ocell, false);
 	free_migration(mg);
 }
@@ -1631,7 +1631,7 @@ static void process_discard_bio(struct cache *cache, struct prealloc *structs,
 
 	calc_discard_block_range(cache, bio, &b, &e);
 	if (b == e) {
-		bio_endio(bio, 0);
+		bio_endio(bio);
 		return;
 	}
 
@@ -2213,8 +2213,10 @@ static void requeue_deferred_bios(struct cache *cache)
 	bio_list_merge(&bios, &cache->deferred_bios);
 	bio_list_init(&cache->deferred_bios);
 
-	while ((bio = bio_list_pop(&bios)))
-		bio_endio(bio, DM_ENDIO_REQUEUE);
+	while ((bio = bio_list_pop(&bios))) {
+		bio->bi_error = DM_ENDIO_REQUEUE;
+		bio_endio(bio);
+	}
 }
 
 static int more_work(struct cache *cache)
@@ -3119,7 +3121,7 @@ static int cache_map(struct dm_target *ti, struct bio *bio)
 			 * This is a duplicate writethrough io that is no
 			 * longer needed because the block has been demoted.
 			 */
-			bio_endio(bio, 0);
+			bio_endio(bio);
 			// FIXME: remap everything as a miss
 			cell_defer(cache, cell, false);
 			r = DM_MAPIO_SUBMITTED;
diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index 0f48fed44a17..744b80c608e5 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -1076,7 +1076,8 @@ static void crypt_dec_pending(struct dm_crypt_io *io)
 	if (io->ctx.req)
 		crypt_free_req(cc, io->ctx.req, base_bio);
 
-	bio_endio(base_bio, error);
+	base_bio->bi_error = error;
+	bio_endio(base_bio);
 }
 
 /*
@@ -1096,15 +1097,12 @@ static void crypt_dec_pending(struct dm_crypt_io *io)
  * The work is done per CPU global for all dm-crypt instances.
  * They should not depend on each other and do not block.
  */
-static void crypt_endio(struct bio *clone, int error)
+static void crypt_endio(struct bio *clone)
 {
 	struct dm_crypt_io *io = clone->bi_private;
 	struct crypt_config *cc = io->cc;
 	unsigned rw = bio_data_dir(clone);
 
-	if (unlikely(!bio_flagged(clone, BIO_UPTODATE) && !error))
-		error = -EIO;
-
 	/*
 	 * free the processed pages
 	 */
@@ -1113,13 +1111,13 @@ static void crypt_endio(struct bio *clone, int error)
 
 	bio_put(clone);
 
-	if (rw == READ && !error) {
+	if (rw == READ && !clone->bi_error) {
 		kcryptd_queue_crypt(io);
 		return;
 	}
 
-	if (unlikely(error))
-		io->error = error;
+	if (unlikely(clone->bi_error))
+		io->error = clone->bi_error;
 
 	crypt_dec_pending(io);
 }
diff --git a/drivers/md/dm-flakey.c b/drivers/md/dm-flakey.c
index b257e46876d3..04481247aab8 100644
--- a/drivers/md/dm-flakey.c
+++ b/drivers/md/dm-flakey.c
@@ -296,7 +296,7 @@ static int flakey_map(struct dm_target *ti, struct bio *bio)
 		 * Drop writes?
 		 */
 		if (test_bit(DROP_WRITES, &fc->flags)) {
-			bio_endio(bio, 0);
+			bio_endio(bio);
 			return DM_MAPIO_SUBMITTED;
 		}
 
diff --git a/drivers/md/dm-io.c b/drivers/md/dm-io.c
index 74adcd2c967e..efc6659f9d6a 100644
--- a/drivers/md/dm-io.c
+++ b/drivers/md/dm-io.c
@@ -134,12 +134,12 @@ static void dec_count(struct io *io, unsigned int region, int error)
 		complete_io(io);
 }
 
-static void endio(struct bio *bio, int error)
+static void endio(struct bio *bio)
 {
 	struct io *io;
 	unsigned region;
 
-	if (error && bio_data_dir(bio) == READ)
+	if (bio->bi_error && bio_data_dir(bio) == READ)
 		zero_fill_bio(bio);
 
 	/*
@@ -149,7 +149,7 @@ static void endio(struct bio *bio, int error)
 
 	bio_put(bio);
 
-	dec_count(io, region, error);
+	dec_count(io, region, bio->bi_error);
 }
 
 /*-----------------------------------------------------------------
diff --git a/drivers/md/dm-log-writes.c b/drivers/md/dm-log-writes.c
index ad1b049ae2ab..e9d17488d5e3 100644
--- a/drivers/md/dm-log-writes.c
+++ b/drivers/md/dm-log-writes.c
@@ -146,16 +146,16 @@ static void put_io_block(struct log_writes_c *lc)
 	}
 }
 
-static void log_end_io(struct bio *bio, int err)
+static void log_end_io(struct bio *bio)
 {
 	struct log_writes_c *lc = bio->bi_private;
 	struct bio_vec *bvec;
 	int i;
 
-	if (err) {
+	if (bio->bi_error) {
 		unsigned long flags;
 
-		DMERR("Error writing log block, error=%d", err);
+		DMERR("Error writing log block, error=%d", bio->bi_error);
 		spin_lock_irqsave(&lc->blocks_lock, flags);
 		lc->logging_enabled = false;
 		spin_unlock_irqrestore(&lc->blocks_lock, flags);
@@ -205,7 +205,6 @@ static int write_metadata(struct log_writes_c *lc, void *entry,
 	bio->bi_bdev = lc->logdev->bdev;
 	bio->bi_end_io = log_end_io;
 	bio->bi_private = lc;
-	set_bit(BIO_UPTODATE, &bio->bi_flags);
 
 	page = alloc_page(GFP_KERNEL);
 	if (!page) {
@@ -270,7 +269,6 @@ static int log_one_block(struct log_writes_c *lc,
 	bio->bi_bdev = lc->logdev->bdev;
 	bio->bi_end_io = log_end_io;
 	bio->bi_private = lc;
-	set_bit(BIO_UPTODATE, &bio->bi_flags);
 
 	for (i = 0; i < block->vec_cnt; i++) {
 		/*
@@ -292,7 +290,6 @@ static int log_one_block(struct log_writes_c *lc,
 			bio->bi_bdev = lc->logdev->bdev;
 			bio->bi_end_io = log_end_io;
 			bio->bi_private = lc;
-			set_bit(BIO_UPTODATE, &bio->bi_flags);
 
 			ret = bio_add_page(bio, block->vecs[i].bv_page,
 					   block->vecs[i].bv_len, 0);
@@ -606,7 +603,7 @@ static int log_writes_map(struct dm_target *ti, struct bio *bio)
 		WARN_ON(flush_bio || fua_bio);
 		if (lc->device_supports_discard)
 			goto map_bio;
-		bio_endio(bio, 0);
+		bio_endio(bio);
 		return DM_MAPIO_SUBMITTED;
 	}
 
diff --git a/drivers/md/dm-raid1.c b/drivers/md/dm-raid1.c
index d83696bf403b..e1eabfb2f52d 100644
--- a/drivers/md/dm-raid1.c
+++ b/drivers/md/dm-raid1.c
@@ -490,9 +490,11 @@ static void hold_bio(struct mirror_set *ms, struct bio *bio)
 		 * If device is suspended, complete the bio.
 		 */
 		if (dm_noflush_suspending(ms->ti))
-			bio_endio(bio, DM_ENDIO_REQUEUE);
+			bio->bi_error = DM_ENDIO_REQUEUE;
 		else
-			bio_endio(bio, -EIO);
+			bio->bi_error = -EIO;
+
+		bio_endio(bio);
 		return;
 	}
 
@@ -515,7 +517,7 @@ static void read_callback(unsigned long error, void *context)
 	bio_set_m(bio, NULL);
 
 	if (likely(!error)) {
-		bio_endio(bio, 0);
+		bio_endio(bio);
 		return;
 	}
 
@@ -531,7 +533,7 @@ static void read_callback(unsigned long error, void *context)
 
 	DMERR_LIMIT("Read failure on mirror device %s.  Failing I/O.",
 		    m->dev->name);
-	bio_endio(bio, -EIO);
+	bio_io_error(bio);
 }
 
 /* Asynchronous read. */
@@ -580,7 +582,7 @@ static void do_reads(struct mirror_set *ms, struct bio_list *reads)
 		if (likely(m))
 			read_async_bio(m, bio);
 		else
-			bio_endio(bio, -EIO);
+			bio_io_error(bio);
 	}
 }
 
@@ -598,7 +600,7 @@ static void do_reads(struct mirror_set *ms, struct bio_list *reads)
 
 static void write_callback(unsigned long error, void *context)
 {
-	unsigned i, ret = 0;
+	unsigned i;
 	struct bio *bio = (struct bio *) context;
 	struct mirror_set *ms;
 	int should_wake = 0;
@@ -614,7 +616,7 @@ static void write_callback(unsigned long error, void *context)
 	 * regions with the same code.
 	 */
 	if (likely(!error)) {
-		bio_endio(bio, ret);
+		bio_endio(bio);
 		return;
 	}
 
@@ -623,7 +625,8 @@ static void write_callback(unsigned long error, void *context)
 	 * degrade the array.
 	 */
 	if (bio->bi_rw & REQ_DISCARD) {
-		bio_endio(bio, -EOPNOTSUPP);
+		bio->bi_error = -EOPNOTSUPP;
+		bio_endio(bio);
 		return;
 	}
 
@@ -828,13 +831,12 @@ static void do_failures(struct mirror_set *ms, struct bio_list *failures)
 		 * be wrong if the failed leg returned after reboot and
 		 * got replicated back to the good legs.)
 		 */
-
 		if (unlikely(!get_valid_mirror(ms) || (keep_log(ms) && ms->log_failure)))
-			bio_endio(bio, -EIO);
+			bio_io_error(bio);
 		else if (errors_handled(ms) && !keep_log(ms))
 			hold_bio(ms, bio);
 		else
-			bio_endio(bio, 0);
+			bio_endio(bio);
 	}
 }
 
diff --git a/drivers/md/dm-snap.c b/drivers/md/dm-snap.c
index 7c82d3ccce87..dd8ca0bb0980 100644
--- a/drivers/md/dm-snap.c
+++ b/drivers/md/dm-snap.c
@@ -1490,7 +1490,7 @@ out:
 		error_bios(snapshot_bios);
 	} else {
 		if (full_bio)
-			bio_endio(full_bio, 0);
+			bio_endio(full_bio);
 		flush_bios(snapshot_bios);
 	}
 
@@ -1580,11 +1580,11 @@ static void start_copy(struct dm_snap_pending_exception *pe)
 	dm_kcopyd_copy(s->kcopyd_client, &src, 1, &dest, 0, copy_callback, pe);
 }
 
-static void full_bio_end_io(struct bio *bio, int error)
+static void full_bio_end_io(struct bio *bio)
 {
 	void *callback_data = bio->bi_private;
 
-	dm_kcopyd_do_callback(callback_data, 0, error ? 1 : 0);
+	dm_kcopyd_do_callback(callback_data, 0, bio->bi_error ? 1 : 0);
 }
 
 static void start_full_bio(struct dm_snap_pending_exception *pe,
diff --git a/drivers/md/dm-stripe.c b/drivers/md/dm-stripe.c
index a672a1502c14..4f94c7da82f6 100644
--- a/drivers/md/dm-stripe.c
+++ b/drivers/md/dm-stripe.c
@@ -273,7 +273,7 @@ static int stripe_map_range(struct stripe_c *sc, struct bio *bio,
 		return DM_MAPIO_REMAPPED;
 	} else {
 		/* The range doesn't map to the target stripe */
-		bio_endio(bio, 0);
+		bio_endio(bio);
 		return DM_MAPIO_SUBMITTED;
 	}
 }
diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
index c33f61a4cc28..2ade2c46dca9 100644
--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -614,8 +614,10 @@ static void error_bio_list(struct bio_list *bios, int error)
 {
 	struct bio *bio;
 
-	while ((bio = bio_list_pop(bios)))
-		bio_endio(bio, error);
+	while ((bio = bio_list_pop(bios))) {
+		bio->bi_error = error;
+		bio_endio(bio);
+	}
 }
 
 static void error_thin_bio_list(struct thin_c *tc, struct bio_list *master, int error)
@@ -864,14 +866,14 @@ static void copy_complete(int read_err, unsigned long write_err, void *context)
 	complete_mapping_preparation(m);
 }
 
-static void overwrite_endio(struct bio *bio, int err)
+static void overwrite_endio(struct bio *bio)
 {
 	struct dm_thin_endio_hook *h = dm_per_bio_data(bio, sizeof(struct dm_thin_endio_hook));
 	struct dm_thin_new_mapping *m = h->overwrite_mapping;
 
 	bio->bi_end_io = m->saved_bi_end_io;
 
-	m->err = err;
+	m->err = bio->bi_error;
 	complete_mapping_preparation(m);
 }
 
@@ -996,7 +998,7 @@ static void process_prepared_mapping(struct dm_thin_new_mapping *m)
 	 */
 	if (bio) {
 		inc_remap_and_issue_cell(tc, m->cell, m->data_block);
-		bio_endio(bio, 0);
+		bio_endio(bio);
 	} else {
 		inc_all_io_entry(tc->pool, m->cell->holder);
 		remap_and_issue(tc, m->cell->holder, m->data_block);
@@ -1026,7 +1028,7 @@ static void process_prepared_discard_fail(struct dm_thin_new_mapping *m)
 
 static void process_prepared_discard_success(struct dm_thin_new_mapping *m)
 {
-	bio_endio(m->bio, 0);
+	bio_endio(m->bio);
 	free_discard_mapping(m);
 }
 
@@ -1040,7 +1042,7 @@ static void process_prepared_discard_no_passdown(struct dm_thin_new_mapping *m)
 		metadata_operation_failed(tc->pool, "dm_thin_remove_range", r);
 		bio_io_error(m->bio);
 	} else
-		bio_endio(m->bio, 0);
+		bio_endio(m->bio);
 
 	cell_defer_no_holder(tc, m->cell);
 	mempool_free(m, tc->pool->mapping_pool);
@@ -1111,7 +1113,8 @@ static void process_prepared_discard_passdown(struct dm_thin_new_mapping *m)
 	 * Even if r is set, there could be sub discards in flight that we
 	 * need to wait for.
 	 */
-	bio_endio(m->bio, r);
+	m->bio->bi_error = r;
+	bio_endio(m->bio);
 	cell_defer_no_holder(tc, m->cell);
 	mempool_free(m, pool->mapping_pool);
 }
@@ -1487,9 +1490,10 @@ static void handle_unserviceable_bio(struct pool *pool, struct bio *bio)
 {
 	int error = should_error_unserviceable_bio(pool);
 
-	if (error)
-		bio_endio(bio, error);
-	else
+	if (error) {
+		bio->bi_error = error;
+		bio_endio(bio);
+	} else
 		retry_on_resume(bio);
 }
 
@@ -1625,7 +1629,7 @@ static void process_discard_cell_passdown(struct thin_c *tc, struct dm_bio_priso
 	 * will prevent completion until the sub range discards have
 	 * completed.
 	 */
-	bio_endio(bio, 0);
+	bio_endio(bio);
 }
 
 static void process_discard_bio(struct thin_c *tc, struct bio *bio)
@@ -1639,7 +1643,7 @@ static void process_discard_bio(struct thin_c *tc, struct bio *bio)
 		/*
 		 * The discard covers less than a block.
 		 */
-		bio_endio(bio, 0);
+		bio_endio(bio);
 		return;
 	}
 
@@ -1784,7 +1788,7 @@ static void provision_block(struct thin_c *tc, struct bio *bio, dm_block_t block
 	if (bio_data_dir(bio) == READ) {
 		zero_fill_bio(bio);
 		cell_defer_no_holder(tc, cell);
-		bio_endio(bio, 0);
+		bio_endio(bio);
 		return;
 	}
 
@@ -1849,7 +1853,7 @@ static void process_cell(struct thin_c *tc, struct dm_bio_prison_cell *cell)
 
 			} else {
 				zero_fill_bio(bio);
-				bio_endio(bio, 0);
+				bio_endio(bio);
 			}
 		} else
 			provision_block(tc, bio, block, cell);
@@ -1920,7 +1924,7 @@ static void __process_bio_read_only(struct thin_c *tc, struct bio *bio,
 		}
 
 		zero_fill_bio(bio);
-		bio_endio(bio, 0);
+		bio_endio(bio);
 		break;
 
 	default:
@@ -1945,7 +1949,7 @@ static void process_cell_read_only(struct thin_c *tc, struct dm_bio_prison_cell
 
 static void process_bio_success(struct thin_c *tc, struct bio *bio)
 {
-	bio_endio(bio, 0);
+	bio_endio(bio);
 }
 
 static void process_bio_fail(struct thin_c *tc, struct bio *bio)
@@ -2581,7 +2585,8 @@ static int thin_bio_map(struct dm_target *ti, struct bio *bio)
 	thin_hook_bio(tc, bio);
 
 	if (tc->requeue_mode) {
-		bio_endio(bio, DM_ENDIO_REQUEUE);
+		bio->bi_error = DM_ENDIO_REQUEUE;
+		bio_endio(bio);
 		return DM_MAPIO_SUBMITTED;
 	}
 
diff --git a/drivers/md/dm-verity.c b/drivers/md/dm-verity.c
index bb9c6a00e4b0..4b34df8fdb58 100644
--- a/drivers/md/dm-verity.c
+++ b/drivers/md/dm-verity.c
@@ -458,8 +458,9 @@ static void verity_finish_io(struct dm_verity_io *io, int error)
 
 	bio->bi_end_io = io->orig_bi_end_io;
 	bio->bi_private = io->orig_bi_private;
+	bio->bi_error = error;
 
-	bio_endio(bio, error);
+	bio_endio(bio);
 }
 
 static void verity_work(struct work_struct *w)
@@ -469,12 +470,12 @@ static void verity_work(struct work_struct *w)
 	verity_finish_io(io, verity_verify_io(io));
 }
 
-static void verity_end_io(struct bio *bio, int error)
+static void verity_end_io(struct bio *bio)
 {
 	struct dm_verity_io *io = bio->bi_private;
 
-	if (error) {
-		verity_finish_io(io, error);
+	if (bio->bi_error) {
+		verity_finish_io(io, bio->bi_error);
 		return;
 	}
 
diff --git a/drivers/md/dm-zero.c b/drivers/md/dm-zero.c
index b9a64bbce304..766bc93006e6 100644
--- a/drivers/md/dm-zero.c
+++ b/drivers/md/dm-zero.c
@@ -47,7 +47,7 @@ static int zero_map(struct dm_target *ti, struct bio *bio)
 		break;
 	}
 
-	bio_endio(bio, 0);
+	bio_endio(bio);
 
 	/* accepted bio, don't make new request */
 	return DM_MAPIO_SUBMITTED;
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index f331d888e7f5..7f367fcace03 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -944,7 +944,8 @@ static void dec_pending(struct dm_io *io, int error)
 		} else {
 			/* done with normal IO or empty flush */
 			trace_block_bio_complete(md->queue, bio, io_error);
-			bio_endio(bio, io_error);
+			bio->bi_error = io_error;
+			bio_endio(bio);
 		}
 	}
 }
@@ -957,17 +958,15 @@ static void disable_write_same(struct mapped_device *md)
 	limits->max_write_same_sectors = 0;
 }
 
-static void clone_endio(struct bio *bio, int error)
+static void clone_endio(struct bio *bio)
 {
+	int error = bio->bi_error;
 	int r = error;
 	struct dm_target_io *tio = container_of(bio, struct dm_target_io, clone);
 	struct dm_io *io = tio->io;
 	struct mapped_device *md = tio->io->md;
 	dm_endio_fn endio = tio->ti->type->end_io;
 
-	if (!bio_flagged(bio, BIO_UPTODATE) && !error)
-		error = -EIO;
-
 	if (endio) {
 		r = endio(tio->ti, bio, error);
 		if (r < 0 || r == DM_ENDIO_REQUEUE)
@@ -996,7 +995,7 @@ static void clone_endio(struct bio *bio, int error)
 /*
  * Partial completion handling for request-based dm
  */
-static void end_clone_bio(struct bio *clone, int error)
+static void end_clone_bio(struct bio *clone)
 {
 	struct dm_rq_clone_bio_info *info =
 		container_of(clone, struct dm_rq_clone_bio_info, clone);
@@ -1013,13 +1012,13 @@ static void end_clone_bio(struct bio *clone, int error)
 		 * the remainder.
 		 */
 		return;
-	else if (error) {
+	else if (bio->bi_error) {
 		/*
 		 * Don't notice the error to the upper layer yet.
 		 * The error handling decision is made by the target driver,
 		 * when the request is completed.
 		 */
-		tio->error = error;
+		tio->error = bio->bi_error;
 		return;
 	}
 
diff --git a/drivers/md/faulty.c b/drivers/md/faulty.c
index 1277eb26b58a..4a8e15058e8b 100644
--- a/drivers/md/faulty.c
+++ b/drivers/md/faulty.c
@@ -70,7 +70,7 @@
 #include <linux/seq_file.h>
 
 
-static void faulty_fail(struct bio *bio, int error)
+static void faulty_fail(struct bio *bio)
 {
 	struct bio *b = bio->bi_private;
 
@@ -181,7 +181,7 @@ static void make_request(struct mddev *mddev, struct bio *bio)
 			/* special case - don't decrement, don't generic_make_request,
 			 * just fail immediately
 			 */
-			bio_endio(bio, -EIO);
+			bio_io_error(bio);
 			return;
 		}
 
diff --git a/drivers/md/linear.c b/drivers/md/linear.c
index fa7d577f3d12..aefd66142eef 100644
--- a/drivers/md/linear.c
+++ b/drivers/md/linear.c
@@ -297,7 +297,7 @@ static void linear_make_request(struct mddev *mddev, struct bio *bio)
 		if (unlikely((split->bi_rw & REQ_DISCARD) &&
 			 !blk_queue_discard(bdev_get_queue(split->bi_bdev)))) {
 			/* Just ignore it */
-			bio_endio(split, 0);
+			bio_endio(split);
 		} else
 			generic_make_request(split);
 	} while (split != bio);
diff --git a/drivers/md/md.c b/drivers/md/md.c
index d429c30cd514..ac4381a6625c 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -263,7 +263,9 @@ static void md_make_request(struct request_queue *q, struct bio *bio)
 		return;
 	}
 	if (mddev->ro == 1 && unlikely(rw == WRITE)) {
-		bio_endio(bio, bio_sectors(bio) == 0 ? 0 : -EROFS);
+		if (bio_sectors(bio) != 0)
+			bio->bi_error = -EROFS;
+		bio_endio(bio);
 		return;
 	}
 	smp_rmb(); /* Ensure implications of  'active' are visible */
@@ -377,7 +379,7 @@ static int md_mergeable_bvec(struct request_queue *q,
  * Generic flush handling for md
  */
 
-static void md_end_flush(struct bio *bio, int err)
+static void md_end_flush(struct bio *bio)
 {
 	struct md_rdev *rdev = bio->bi_private;
 	struct mddev *mddev = rdev->mddev;
@@ -433,7 +435,7 @@ static void md_submit_flush_data(struct work_struct *ws)
 
 	if (bio->bi_iter.bi_size == 0)
 		/* an empty barrier - all done */
-		bio_endio(bio, 0);
+		bio_endio(bio);
 	else {
 		bio->bi_rw &= ~REQ_FLUSH;
 		mddev->pers->make_request(mddev, bio);
@@ -728,15 +730,13 @@ void md_rdev_clear(struct md_rdev *rdev)
 }
 EXPORT_SYMBOL_GPL(md_rdev_clear);
 
-static void super_written(struct bio *bio, int error)
+static void super_written(struct bio *bio)
 {
 	struct md_rdev *rdev = bio->bi_private;
 	struct mddev *mddev = rdev->mddev;
 
-	if (error || !test_bit(BIO_UPTODATE, &bio->bi_flags)) {
-		printk("md: super_written gets error=%d, uptodate=%d\n",
-		       error, test_bit(BIO_UPTODATE, &bio->bi_flags));
-		WARN_ON(test_bit(BIO_UPTODATE, &bio->bi_flags));
+	if (bio->bi_error) {
+		printk("md: super_written gets error=%d\n", bio->bi_error);
 		md_error(mddev, rdev);
 	}
 
@@ -791,7 +791,7 @@ int sync_page_io(struct md_rdev *rdev, sector_t sector, int size,
 	bio_add_page(bio, page, size, 0);
 	submit_bio_wait(rw, bio);
 
-	ret = test_bit(BIO_UPTODATE, &bio->bi_flags);
+	ret = !bio->bi_error;
 	bio_put(bio);
 	return ret;
 }
diff --git a/drivers/md/multipath.c b/drivers/md/multipath.c
index ac3ede2bd00e..082a489af9d3 100644
--- a/drivers/md/multipath.c
+++ b/drivers/md/multipath.c
@@ -77,18 +77,18 @@ static void multipath_end_bh_io (struct multipath_bh *mp_bh, int err)
 	struct bio *bio = mp_bh->master_bio;
 	struct mpconf *conf = mp_bh->mddev->private;
 
-	bio_endio(bio, err);
+	bio->bi_error = err;
+	bio_endio(bio);
 	mempool_free(mp_bh, conf->pool);
 }
 
-static void multipath_end_request(struct bio *bio, int error)
+static void multipath_end_request(struct bio *bio)
 {
-	int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags);
 	struct multipath_bh *mp_bh = bio->bi_private;
 	struct mpconf *conf = mp_bh->mddev->private;
 	struct md_rdev *rdev = conf->multipaths[mp_bh->path].rdev;
 
-	if (uptodate)
+	if (!bio->bi_error)
 		multipath_end_bh_io(mp_bh, 0);
 	else if (!(bio->bi_rw & REQ_RAHEAD)) {
 		/*
@@ -101,7 +101,7 @@ static void multipath_end_request(struct bio *bio, int error)
 		       (unsigned long long)bio->bi_iter.bi_sector);
 		multipath_reschedule_retry(mp_bh);
 	} else
-		multipath_end_bh_io(mp_bh, error);
+		multipath_end_bh_io(mp_bh, bio->bi_error);
 	rdev_dec_pending(rdev, conf->mddev);
 }
 
@@ -123,7 +123,7 @@ static void multipath_make_request(struct mddev *mddev, struct bio * bio)
 
 	mp_bh->path = multipath_map(conf);
 	if (mp_bh->path < 0) {
-		bio_endio(bio, -EIO);
+		bio_io_error(bio);
 		mempool_free(mp_bh, conf->pool);
 		return;
 	}
diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index efb654eb5399..e6e0ae56f66b 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -543,7 +543,7 @@ static void raid0_make_request(struct mddev *mddev, struct bio *bio)
 		if (unlikely((split->bi_rw & REQ_DISCARD) &&
 			 !blk_queue_discard(bdev_get_queue(split->bi_bdev)))) {
 			/* Just ignore it */
-			bio_endio(split, 0);
+			bio_endio(split);
 		} else
 			generic_make_request(split);
 	} while (split != bio);
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index f80f1af61ce7..9aa7d1fb2bc1 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -255,9 +255,10 @@ static void call_bio_endio(struct r1bio *r1_bio)
 		done = 1;
 
 	if (!test_bit(R1BIO_Uptodate, &r1_bio->state))
-		clear_bit(BIO_UPTODATE, &bio->bi_flags);
+		bio->bi_error = -EIO;
+
 	if (done) {
-		bio_endio(bio, 0);
+		bio_endio(bio);
 		/*
 		 * Wake up any possible resync thread that waits for the device
 		 * to go idle.
@@ -312,9 +313,9 @@ static int find_bio_disk(struct r1bio *r1_bio, struct bio *bio)
 	return mirror;
 }
 
-static void raid1_end_read_request(struct bio *bio, int error)
+static void raid1_end_read_request(struct bio *bio)
 {
-	int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags);
+	int uptodate = !bio->bi_error;
 	struct r1bio *r1_bio = bio->bi_private;
 	int mirror;
 	struct r1conf *conf = r1_bio->mddev->private;
@@ -397,9 +398,8 @@ static void r1_bio_write_done(struct r1bio *r1_bio)
 	}
 }
 
-static void raid1_end_write_request(struct bio *bio, int error)
+static void raid1_end_write_request(struct bio *bio)
 {
-	int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags);
 	struct r1bio *r1_bio = bio->bi_private;
 	int mirror, behind = test_bit(R1BIO_BehindIO, &r1_bio->state);
 	struct r1conf *conf = r1_bio->mddev->private;
@@ -410,7 +410,7 @@ static void raid1_end_write_request(struct bio *bio, int error)
 	/*
 	 * 'one mirror IO has finished' event handler:
 	 */
-	if (!uptodate) {
+	if (bio->bi_error) {
 		set_bit(WriteErrorSeen,
 			&conf->mirrors[mirror].rdev->flags);
 		if (!test_and_set_bit(WantReplacement,
@@ -793,7 +793,7 @@ static void flush_pending_writes(struct r1conf *conf)
 			if (unlikely((bio->bi_rw & REQ_DISCARD) &&
 			    !blk_queue_discard(bdev_get_queue(bio->bi_bdev))))
 				/* Just ignore it */
-				bio_endio(bio, 0);
+				bio_endio(bio);
 			else
 				generic_make_request(bio);
 			bio = next;
@@ -1068,7 +1068,7 @@ static void raid1_unplug(struct blk_plug_cb *cb, bool from_schedule)
 		if (unlikely((bio->bi_rw & REQ_DISCARD) &&
 		    !blk_queue_discard(bdev_get_queue(bio->bi_bdev))))
 			/* Just ignore it */
-			bio_endio(bio, 0);
+			bio_endio(bio);
 		else
 			generic_make_request(bio);
 		bio = next;
@@ -1734,7 +1734,7 @@ abort:
 	return err;
 }
 
-static void end_sync_read(struct bio *bio, int error)
+static void end_sync_read(struct bio *bio)
 {
 	struct r1bio *r1_bio = bio->bi_private;
 
@@ -1745,16 +1745,16 @@ static void end_sync_read(struct bio *bio, int error)
 	 * or re-read if the read failed.
 	 * We don't do much here, just schedule handling by raid1d
 	 */
-	if (test_bit(BIO_UPTODATE, &bio->bi_flags))
+	if (!bio->bi_error)
 		set_bit(R1BIO_Uptodate, &r1_bio->state);
 
 	if (atomic_dec_and_test(&r1_bio->remaining))
 		reschedule_retry(r1_bio);
 }
 
-static void end_sync_write(struct bio *bio, int error)
+static void end_sync_write(struct bio *bio)
 {
-	int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags);
+	int uptodate = !bio->bi_error;
 	struct r1bio *r1_bio = bio->bi_private;
 	struct mddev *mddev = r1_bio->mddev;
 	struct r1conf *conf = mddev->private;
@@ -1941,7 +1941,7 @@ static int fix_sync_read_error(struct r1bio *r1_bio)
 		idx ++;
 	}
 	set_bit(R1BIO_Uptodate, &r1_bio->state);
-	set_bit(BIO_UPTODATE, &bio->bi_flags);
+	bio->bi_error = 0;
 	return 1;
 }
 
@@ -1965,15 +1965,14 @@ static void process_checks(struct r1bio *r1_bio)
 	for (i = 0; i < conf->raid_disks * 2; i++) {
 		int j;
 		int size;
-		int uptodate;
+		int error;
 		struct bio *b = r1_bio->bios[i];
 		if (b->bi_end_io != end_sync_read)
 			continue;
-		/* fixup the bio for reuse, but preserve BIO_UPTODATE */
-		uptodate = test_bit(BIO_UPTODATE, &b->bi_flags);
+		/* fixup the bio for reuse, but preserve errno */
+		error = b->bi_error;
 		bio_reset(b);
-		if (!uptodate)
-			clear_bit(BIO_UPTODATE, &b->bi_flags);
+		b->bi_error = error;
 		b->bi_vcnt = vcnt;
 		b->bi_iter.bi_size = r1_bio->sectors << 9;
 		b->bi_iter.bi_sector = r1_bio->sector +
@@ -1996,7 +1995,7 @@ static void process_checks(struct r1bio *r1_bio)
 	}
 	for (primary = 0; primary < conf->raid_disks * 2; primary++)
 		if (r1_bio->bios[primary]->bi_end_io == end_sync_read &&
-		    test_bit(BIO_UPTODATE, &r1_bio->bios[primary]->bi_flags)) {
+		    !r1_bio->bios[primary]->bi_error) {
 			r1_bio->bios[primary]->bi_end_io = NULL;
 			rdev_dec_pending(conf->mirrors[primary].rdev, mddev);
 			break;
@@ -2006,14 +2005,14 @@ static void process_checks(struct r1bio *r1_bio)
 		int j;
 		struct bio *pbio = r1_bio->bios[primary];
 		struct bio *sbio = r1_bio->bios[i];
-		int uptodate = test_bit(BIO_UPTODATE, &sbio->bi_flags);
+		int error = sbio->bi_error;
 
 		if (sbio->bi_end_io != end_sync_read)
 			continue;
-		/* Now we can 'fixup' the BIO_UPTODATE flag */
-		set_bit(BIO_UPTODATE, &sbio->bi_flags);
+		/* Now we can 'fixup' the error value */
+		sbio->bi_error = 0;
 
-		if (uptodate) {
+		if (!error) {
 			for (j = vcnt; j-- ; ) {
 				struct page *p, *s;
 				p = pbio->bi_io_vec[j].bv_page;
@@ -2028,7 +2027,7 @@ static void process_checks(struct r1bio *r1_bio)
 		if (j >= 0)
 			atomic64_add(r1_bio->sectors, &mddev->resync_mismatches);
 		if (j < 0 || (test_bit(MD_RECOVERY_CHECK, &mddev->recovery)
-			      && uptodate)) {
+			      && !error)) {
 			/* No need to write to this device. */
 			sbio->bi_end_io = NULL;
 			rdev_dec_pending(conf->mirrors[i].rdev, mddev);
@@ -2269,11 +2268,11 @@ static void handle_sync_write_finished(struct r1conf *conf, struct r1bio *r1_bio
 		struct bio *bio = r1_bio->bios[m];
 		if (bio->bi_end_io == NULL)
 			continue;
-		if (test_bit(BIO_UPTODATE, &bio->bi_flags) &&
+		if (!bio->bi_error &&
 		    test_bit(R1BIO_MadeGood, &r1_bio->state)) {
 			rdev_clear_badblocks(rdev, r1_bio->sector, s, 0);
 		}
-		if (!test_bit(BIO_UPTODATE, &bio->bi_flags) &&
+		if (bio->bi_error &&
 		    test_bit(R1BIO_WriteError, &r1_bio->state)) {
 			if (!rdev_set_badblocks(rdev, r1_bio->sector, s, 0))
 				md_error(conf->mddev, rdev);
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 940f2f365461..929e9a26d81b 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -101,7 +101,7 @@ static int _enough(struct r10conf *conf, int previous, int ignore);
 static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr,
 				int *skipped);
 static void reshape_request_write(struct mddev *mddev, struct r10bio *r10_bio);
-static void end_reshape_write(struct bio *bio, int error);
+static void end_reshape_write(struct bio *bio);
 static void end_reshape(struct r10conf *conf);
 
 static void * r10bio_pool_alloc(gfp_t gfp_flags, void *data)
@@ -307,9 +307,9 @@ static void raid_end_bio_io(struct r10bio *r10_bio)
 	} else
 		done = 1;
 	if (!test_bit(R10BIO_Uptodate, &r10_bio->state))
-		clear_bit(BIO_UPTODATE, &bio->bi_flags);
+		bio->bi_error = -EIO;
 	if (done) {
-		bio_endio(bio, 0);
+		bio_endio(bio);
 		/*
 		 * Wake up any possible resync thread that waits for the device
 		 * to go idle.
@@ -358,9 +358,9 @@ static int find_bio_disk(struct r10conf *conf, struct r10bio *r10_bio,
 	return r10_bio->devs[slot].devnum;
 }
 
-static void raid10_end_read_request(struct bio *bio, int error)
+static void raid10_end_read_request(struct bio *bio)
 {
-	int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags);
+	int uptodate = !bio->bi_error;
 	struct r10bio *r10_bio = bio->bi_private;
 	int slot, dev;
 	struct md_rdev *rdev;
@@ -438,9 +438,8 @@ static void one_write_done(struct r10bio *r10_bio)
 	}
 }
 
-static void raid10_end_write_request(struct bio *bio, int error)
+static void raid10_end_write_request(struct bio *bio)
 {
-	int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags);
 	struct r10bio *r10_bio = bio->bi_private;
 	int dev;
 	int dec_rdev = 1;
@@ -460,7 +459,7 @@ static void raid10_end_write_request(struct bio *bio, int error)
 	/*
 	 * this branch is our 'one mirror IO has finished' event handler:
 	 */
-	if (!uptodate) {
+	if (bio->bi_error) {
 		if (repl)
 			/* Never record new bad blocks to replacement,
 			 * just fail it.
@@ -957,7 +956,7 @@ static void flush_pending_writes(struct r10conf *conf)
 			if (unlikely((bio->bi_rw & REQ_DISCARD) &&
 			    !blk_queue_discard(bdev_get_queue(bio->bi_bdev))))
 				/* Just ignore it */
-				bio_endio(bio, 0);
+				bio_endio(bio);
 			else
 				generic_make_request(bio);
 			bio = next;
@@ -1133,7 +1132,7 @@ static void raid10_unplug(struct blk_plug_cb *cb, bool from_schedule)
 		if (unlikely((bio->bi_rw & REQ_DISCARD) &&
 		    !blk_queue_discard(bdev_get_queue(bio->bi_bdev))))
 			/* Just ignore it */
-			bio_endio(bio, 0);
+			bio_endio(bio);
 		else
 			generic_make_request(bio);
 		bio = next;
@@ -1916,7 +1915,7 @@ abort:
 	return err;
 }
 
-static void end_sync_read(struct bio *bio, int error)
+static void end_sync_read(struct bio *bio)
 {
 	struct r10bio *r10_bio = bio->bi_private;
 	struct r10conf *conf = r10_bio->mddev->private;
@@ -1928,7 +1927,7 @@ static void end_sync_read(struct bio *bio, int error)
 	} else
 		d = find_bio_disk(conf, r10_bio, bio, NULL, NULL);
 
-	if (test_bit(BIO_UPTODATE, &bio->bi_flags))
+	if (!bio->bi_error)
 		set_bit(R10BIO_Uptodate, &r10_bio->state);
 	else
 		/* The write handler will notice the lack of
@@ -1977,9 +1976,8 @@ static void end_sync_request(struct r10bio *r10_bio)
 	}
 }
 
-static void end_sync_write(struct bio *bio, int error)
+static void end_sync_write(struct bio *bio)
 {
-	int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags);
 	struct r10bio *r10_bio = bio->bi_private;
 	struct mddev *mddev = r10_bio->mddev;
 	struct r10conf *conf = mddev->private;
@@ -1996,7 +1994,7 @@ static void end_sync_write(struct bio *bio, int error)
 	else
 		rdev = conf->mirrors[d].rdev;
 
-	if (!uptodate) {
+	if (bio->bi_error) {
 		if (repl)
 			md_error(mddev, rdev);
 		else {
@@ -2044,7 +2042,7 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio)
 
 	/* find the first device with a block */
 	for (i=0; i<conf->copies; i++)
-		if (test_bit(BIO_UPTODATE, &r10_bio->devs[i].bio->bi_flags))
+		if (!r10_bio->devs[i].bio->bi_error)
 			break;
 
 	if (i == conf->copies)
@@ -2064,7 +2062,7 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio)
 			continue;
 		if (i == first)
 			continue;
-		if (test_bit(BIO_UPTODATE, &r10_bio->devs[i].bio->bi_flags)) {
+		if (!r10_bio->devs[i].bio->bi_error) {
 			/* We know that the bi_io_vec layout is the same for
 			 * both 'first' and 'i', so we just compare them.
 			 * All vec entries are PAGE_SIZE;
@@ -2706,8 +2704,7 @@ static void handle_write_completed(struct r10conf *conf, struct r10bio *r10_bio)
 			rdev = conf->mirrors[dev].rdev;
 			if (r10_bio->devs[m].bio == NULL)
 				continue;
-			if (test_bit(BIO_UPTODATE,
-				     &r10_bio->devs[m].bio->bi_flags)) {
+			if (!r10_bio->devs[m].bio->bi_error) {
 				rdev_clear_badblocks(
 					rdev,
 					r10_bio->devs[m].addr,
@@ -2722,8 +2719,8 @@ static void handle_write_completed(struct r10conf *conf, struct r10bio *r10_bio)
 			rdev = conf->mirrors[dev].replacement;
 			if (r10_bio->devs[m].repl_bio == NULL)
 				continue;
-			if (test_bit(BIO_UPTODATE,
-				     &r10_bio->devs[m].repl_bio->bi_flags)) {
+
+			if (!r10_bio->devs[m].repl_bio->bi_error) {
 				rdev_clear_badblocks(
 					rdev,
 					r10_bio->devs[m].addr,
@@ -2748,8 +2745,7 @@ static void handle_write_completed(struct r10conf *conf, struct r10bio *r10_bio)
 					r10_bio->devs[m].addr,
 					r10_bio->sectors, 0);
 				rdev_dec_pending(rdev, conf->mddev);
-			} else if (bio != NULL &&
-				   !test_bit(BIO_UPTODATE, &bio->bi_flags)) {
+			} else if (bio != NULL && bio->bi_error) {
 				if (!narrow_write_error(r10_bio, m)) {
 					md_error(conf->mddev, rdev);
 					set_bit(R10BIO_Degraded,
@@ -3263,7 +3259,7 @@ static sector_t sync_request(struct mddev *mddev, sector_t sector_nr,
 
 			bio = r10_bio->devs[i].bio;
 			bio_reset(bio);
-			clear_bit(BIO_UPTODATE, &bio->bi_flags);
+			bio->bi_error = -EIO;
 			if (conf->mirrors[d].rdev == NULL ||
 			    test_bit(Faulty, &conf->mirrors[d].rdev->flags))
 				continue;
@@ -3300,7 +3296,7 @@ static sector_t sync_request(struct mddev *mddev, sector_t sector_nr,
 			/* Need to set up for writing to the replacement */
 			bio = r10_bio->devs[i].repl_bio;
 			bio_reset(bio);
-			clear_bit(BIO_UPTODATE, &bio->bi_flags);
+			bio->bi_error = -EIO;
 
 			sector = r10_bio->devs[i].addr;
 			atomic_inc(&conf->mirrors[d].rdev->nr_pending);
@@ -3377,7 +3373,7 @@ static sector_t sync_request(struct mddev *mddev, sector_t sector_nr,
 
 		if (bio->bi_end_io == end_sync_read) {
 			md_sync_acct(bio->bi_bdev, nr_sectors);
-			set_bit(BIO_UPTODATE, &bio->bi_flags);
+			bio->bi_error = 0;
 			generic_make_request(bio);
 		}
 	}
@@ -4380,7 +4376,7 @@ read_more:
 	read_bio->bi_end_io = end_sync_read;
 	read_bio->bi_rw = READ;
 	read_bio->bi_flags &= (~0UL << BIO_RESET_BITS);
-	__set_bit(BIO_UPTODATE, &read_bio->bi_flags);
+	read_bio->bi_error = 0;
 	read_bio->bi_vcnt = 0;
 	read_bio->bi_iter.bi_size = 0;
 	r10_bio->master_bio = read_bio;
@@ -4601,9 +4597,8 @@ static int handle_reshape_read_error(struct mddev *mddev,
 	return 0;
 }
 
-static void end_reshape_write(struct bio *bio, int error)
+static void end_reshape_write(struct bio *bio)
 {
-	int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags);
 	struct r10bio *r10_bio = bio->bi_private;
 	struct mddev *mddev = r10_bio->mddev;
 	struct r10conf *conf = mddev->private;
@@ -4620,7 +4615,7 @@ static void end_reshape_write(struct bio *bio, int error)
 		rdev = conf->mirrors[d].rdev;
 	}
 
-	if (!uptodate) {
+	if (bio->bi_error) {
 		/* FIXME should record badblock */
 		md_error(mddev, rdev);
 	}
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 59e44e99eef3..84d6eec1033e 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -233,7 +233,7 @@ static void return_io(struct bio *return_bi)
 		bi->bi_iter.bi_size = 0;
 		trace_block_bio_complete(bdev_get_queue(bi->bi_bdev),
 					 bi, 0);
-		bio_endio(bi, 0);
+		bio_endio(bi);
 		bi = return_bi;
 	}
 }
@@ -887,9 +887,9 @@ static int use_new_offset(struct r5conf *conf, struct stripe_head *sh)
 }
 
 static void
-raid5_end_read_request(struct bio *bi, int error);
+raid5_end_read_request(struct bio *bi);
 static void
-raid5_end_write_request(struct bio *bi, int error);
+raid5_end_write_request(struct bio *bi);
 
 static void ops_run_io(struct stripe_head *sh, struct stripe_head_state *s)
 {
@@ -2277,12 +2277,11 @@ static void shrink_stripes(struct r5conf *conf)
 	conf->slab_cache = NULL;
 }
 
-static void raid5_end_read_request(struct bio * bi, int error)
+static void raid5_end_read_request(struct bio * bi)
 {
 	struct stripe_head *sh = bi->bi_private;
 	struct r5conf *conf = sh->raid_conf;
 	int disks = sh->disks, i;
-	int uptodate = test_bit(BIO_UPTODATE, &bi->bi_flags);
 	char b[BDEVNAME_SIZE];
 	struct md_rdev *rdev = NULL;
 	sector_t s;
@@ -2291,9 +2290,9 @@ static void raid5_end_read_request(struct bio * bi, int error)
 		if (bi == &sh->dev[i].req)
 			break;
 
-	pr_debug("end_read_request %llu/%d, count: %d, uptodate %d.\n",
+	pr_debug("end_read_request %llu/%d, count: %d, error %d.\n",
 		(unsigned long long)sh->sector, i, atomic_read(&sh->count),
-		uptodate);
+		bi->bi_error);
 	if (i == disks) {
 		BUG();
 		return;
@@ -2312,7 +2311,7 @@ static void raid5_end_read_request(struct bio * bi, int error)
 		s = sh->sector + rdev->new_data_offset;
 	else
 		s = sh->sector + rdev->data_offset;
-	if (uptodate) {
+	if (!bi->bi_error) {
 		set_bit(R5_UPTODATE, &sh->dev[i].flags);
 		if (test_bit(R5_ReadError, &sh->dev[i].flags)) {
 			/* Note that this cannot happen on a
@@ -2400,13 +2399,12 @@ static void raid5_end_read_request(struct bio * bi, int error)
 	release_stripe(sh);
 }
 
-static void raid5_end_write_request(struct bio *bi, int error)
+static void raid5_end_write_request(struct bio *bi)
 {
 	struct stripe_head *sh = bi->bi_private;
 	struct r5conf *conf = sh->raid_conf;
 	int disks = sh->disks, i;
 	struct md_rdev *uninitialized_var(rdev);
-	int uptodate = test_bit(BIO_UPTODATE, &bi->bi_flags);
 	sector_t first_bad;
 	int bad_sectors;
 	int replacement = 0;
@@ -2429,23 +2427,23 @@ static void raid5_end_write_request(struct bio *bi, int error)
 			break;
 		}
 	}
-	pr_debug("end_write_request %llu/%d, count %d, uptodate: %d.\n",
+	pr_debug("end_write_request %llu/%d, count %d, error: %d.\n",
 		(unsigned long long)sh->sector, i, atomic_read(&sh->count),
-		uptodate);
+		bi->bi_error);
 	if (i == disks) {
 		BUG();
 		return;
 	}
 
 	if (replacement) {
-		if (!uptodate)
+		if (bi->bi_error)
 			md_error(conf->mddev, rdev);
 		else if (is_badblock(rdev, sh->sector,
 				     STRIPE_SECTORS,
 				     &first_bad, &bad_sectors))
 			set_bit(R5_MadeGoodRepl, &sh->dev[i].flags);
 	} else {
-		if (!uptodate) {
+		if (bi->bi_error) {
 			set_bit(STRIPE_DEGRADED, &sh->state);
 			set_bit(WriteErrorSeen, &rdev->flags);
 			set_bit(R5_WriteError, &sh->dev[i].flags);
@@ -2466,7 +2464,7 @@ static void raid5_end_write_request(struct bio *bi, int error)
 	}
 	rdev_dec_pending(rdev, conf->mddev);
 
-	if (sh->batch_head && !uptodate && !replacement)
+	if (sh->batch_head && bi->bi_error && !replacement)
 		set_bit(STRIPE_BATCH_ERR, &sh->batch_head->state);
 
 	if (!test_and_clear_bit(R5_DOUBLE_LOCKED, &sh->dev[i].flags))
@@ -3107,7 +3105,8 @@ handle_failed_stripe(struct r5conf *conf, struct stripe_head *sh,
 		while (bi && bi->bi_iter.bi_sector <
 			sh->dev[i].sector + STRIPE_SECTORS) {
 			struct bio *nextbi = r5_next_bio(bi, sh->dev[i].sector);
-			clear_bit(BIO_UPTODATE, &bi->bi_flags);
+
+			bi->bi_error = -EIO;
 			if (!raid5_dec_bi_active_stripes(bi)) {
 				md_write_end(conf->mddev);
 				bi->bi_next = *return_bi;
@@ -3131,7 +3130,8 @@ handle_failed_stripe(struct r5conf *conf, struct stripe_head *sh,
 		while (bi && bi->bi_iter.bi_sector <
 		       sh->dev[i].sector + STRIPE_SECTORS) {
 			struct bio *bi2 = r5_next_bio(bi, sh->dev[i].sector);
-			clear_bit(BIO_UPTODATE, &bi->bi_flags);
+
+			bi->bi_error = -EIO;
 			if (!raid5_dec_bi_active_stripes(bi)) {
 				md_write_end(conf->mddev);
 				bi->bi_next = *return_bi;
@@ -3156,7 +3156,8 @@ handle_failed_stripe(struct r5conf *conf, struct stripe_head *sh,
 			       sh->dev[i].sector + STRIPE_SECTORS) {
 				struct bio *nextbi =
 					r5_next_bio(bi, sh->dev[i].sector);
-				clear_bit(BIO_UPTODATE, &bi->bi_flags);
+
+				bi->bi_error = -EIO;
 				if (!raid5_dec_bi_active_stripes(bi)) {
 					bi->bi_next = *return_bi;
 					*return_bi = bi;
@@ -4749,12 +4750,11 @@ static struct bio *remove_bio_from_retry(struct r5conf *conf)
  *  first).
  *  If the read failed..
  */
-static void raid5_align_endio(struct bio *bi, int error)
+static void raid5_align_endio(struct bio *bi)
 {
 	struct bio* raid_bi  = bi->bi_private;
 	struct mddev *mddev;
 	struct r5conf *conf;
-	int uptodate = test_bit(BIO_UPTODATE, &bi->bi_flags);
 	struct md_rdev *rdev;
 
 	bio_put(bi);
@@ -4766,10 +4766,10 @@ static void raid5_align_endio(struct bio *bi, int error)
 
 	rdev_dec_pending(rdev, conf->mddev);
 
-	if (!error && uptodate) {
+	if (!bi->bi_error) {
 		trace_block_bio_complete(bdev_get_queue(raid_bi->bi_bdev),
 					 raid_bi, 0);
-		bio_endio(raid_bi, 0);
+		bio_endio(raid_bi);
 		if (atomic_dec_and_test(&conf->active_aligned_reads))
 			wake_up(&conf->wait_for_quiescent);
 		return;
@@ -5133,7 +5133,7 @@ static void make_discard_request(struct mddev *mddev, struct bio *bi)
 	remaining = raid5_dec_bi_active_stripes(bi);
 	if (remaining == 0) {
 		md_write_end(mddev);
-		bio_endio(bi, 0);
+		bio_endio(bi);
 	}
 }
 
@@ -5297,7 +5297,7 @@ static void make_request(struct mddev *mddev, struct bio * bi)
 			release_stripe_plug(mddev, sh);
 		} else {
 			/* cannot get stripe for read-ahead, just give-up */
-			clear_bit(BIO_UPTODATE, &bi->bi_flags);
+			bi->bi_error = -EIO;
 			break;
 		}
 	}
@@ -5311,7 +5311,7 @@ static void make_request(struct mddev *mddev, struct bio * bi)
 
 		trace_block_bio_complete(bdev_get_queue(bi->bi_bdev),
 					 bi, 0);
-		bio_endio(bi, 0);
+		bio_endio(bi);
 	}
 }
 
@@ -5707,7 +5707,7 @@ static int  retry_aligned_read(struct r5conf *conf, struct bio *raid_bio)
 	if (remaining == 0) {
 		trace_block_bio_complete(bdev_get_queue(raid_bio->bi_bdev),
 					 raid_bio, 0);
-		bio_endio(raid_bio, 0);
+		bio_endio(raid_bio);
 	}
 	if (atomic_dec_and_test(&conf->active_aligned_reads))
 		wake_up(&conf->wait_for_quiescent);
diff --git a/drivers/nvdimm/blk.c b/drivers/nvdimm/blk.c
index 4f97b248c236..0df77cb07df6 100644
--- a/drivers/nvdimm/blk.c
+++ b/drivers/nvdimm/blk.c
@@ -180,7 +180,7 @@ static void nd_blk_make_request(struct request_queue *q, struct bio *bio)
 	 * another kernel subsystem, and we just pass it through.
 	 */
 	if (bio_integrity_enabled(bio) && bio_integrity_prep(bio)) {
-		err = -EIO;
+		bio->bi_error = -EIO;
 		goto out;
 	}
 
@@ -199,6 +199,7 @@ static void nd_blk_make_request(struct request_queue *q, struct bio *bio)
 					"io error in %s sector %lld, len %d,\n",
 					(rw == READ) ? "READ" : "WRITE",
 					(unsigned long long) iter.bi_sector, len);
+			bio->bi_error = err;
 			break;
 		}
 	}
@@ -206,7 +207,7 @@ static void nd_blk_make_request(struct request_queue *q, struct bio *bio)
 		nd_iostat_end(bio, start);
 
  out:
-	bio_endio(bio, err);
+	bio_endio(bio);
 }
 
 static int nd_blk_rw_bytes(struct nd_namespace_common *ndns,
diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c
index 411c7b2bb37a..341202ed32b4 100644
--- a/drivers/nvdimm/btt.c
+++ b/drivers/nvdimm/btt.c
@@ -1189,7 +1189,7 @@ static void btt_make_request(struct request_queue *q, struct bio *bio)
 	 * another kernel subsystem, and we just pass it through.
 	 */
 	if (bio_integrity_enabled(bio) && bio_integrity_prep(bio)) {
-		err = -EIO;
+		bio->bi_error = -EIO;
 		goto out;
 	}
 
@@ -1211,6 +1211,7 @@ static void btt_make_request(struct request_queue *q, struct bio *bio)
 					"io error in %s sector %lld, len %d,\n",
 					(rw == READ) ? "READ" : "WRITE",
 					(unsigned long long) iter.bi_sector, len);
+			bio->bi_error = err;
 			break;
 		}
 	}
@@ -1218,7 +1219,7 @@ static void btt_make_request(struct request_queue *q, struct bio *bio)
 		nd_iostat_end(bio, start);
 
 out:
-	bio_endio(bio, err);
+	bio_endio(bio);
 }
 
 static int btt_rw_page(struct block_device *bdev, sector_t sector,
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index ade9eb917a4d..4c079d5cb539 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -77,7 +77,7 @@ static void pmem_make_request(struct request_queue *q, struct bio *bio)
 	if (bio_data_dir(bio))
 		wmb_pmem();
 
-	bio_endio(bio, 0);
+	bio_endio(bio);
 }
 
 static int pmem_rw_page(struct block_device *bdev, sector_t sector,
diff --git a/drivers/s390/block/dcssblk.c b/drivers/s390/block/dcssblk.c
index da212813f2d5..8bcb822b0bac 100644
--- a/drivers/s390/block/dcssblk.c
+++ b/drivers/s390/block/dcssblk.c
@@ -871,7 +871,7 @@ dcssblk_make_request(struct request_queue *q, struct bio *bio)
 		}
 		bytes_done += bvec.bv_len;
 	}
-	bio_endio(bio, 0);
+	bio_endio(bio);
 	return;
 fail:
 	bio_io_error(bio);
diff --git a/drivers/s390/block/xpram.c b/drivers/s390/block/xpram.c
index 7d4e9397ac31..93856b9b6214 100644
--- a/drivers/s390/block/xpram.c
+++ b/drivers/s390/block/xpram.c
@@ -220,8 +220,7 @@ static void xpram_make_request(struct request_queue *q, struct bio *bio)
 			index++;
 		}
 	}
-	set_bit(BIO_UPTODATE, &bio->bi_flags);
-	bio_endio(bio, 0);
+	bio_endio(bio);
 	return;
 fail:
 	bio_io_error(bio);
diff --git a/drivers/target/target_core_iblock.c b/drivers/target/target_core_iblock.c
index 6d88d24e6cce..5a9982f5d5d6 100644
--- a/drivers/target/target_core_iblock.c
+++ b/drivers/target/target_core_iblock.c
@@ -306,20 +306,13 @@ static void iblock_complete_cmd(struct se_cmd *cmd)
 	kfree(ibr);
 }
 
-static void iblock_bio_done(struct bio *bio, int err)
+static void iblock_bio_done(struct bio *bio)
 {
 	struct se_cmd *cmd = bio->bi_private;
 	struct iblock_req *ibr = cmd->priv;
 
-	/*
-	 * Set -EIO if !BIO_UPTODATE and the passed is still err=0
-	 */
-	if (!test_bit(BIO_UPTODATE, &bio->bi_flags) && !err)
-		err = -EIO;
-
-	if (err != 0) {
-		pr_err("test_bit(BIO_UPTODATE) failed for bio: %p,"
-			" err: %d\n", bio, err);
+	if (bio->bi_error) {
+		pr_err("bio error: %p,  err: %d\n", bio, bio->bi_error);
 		/*
 		 * Bump the ib_bio_err_cnt and release bio.
 		 */
@@ -370,15 +363,15 @@ static void iblock_submit_bios(struct bio_list *list, int rw)
 	blk_finish_plug(&plug);
 }
 
-static void iblock_end_io_flush(struct bio *bio, int err)
+static void iblock_end_io_flush(struct bio *bio)
 {
 	struct se_cmd *cmd = bio->bi_private;
 
-	if (err)
-		pr_err("IBLOCK: cache flush failed: %d\n", err);
+	if (bio->bi_error)
+		pr_err("IBLOCK: cache flush failed: %d\n", bio->bi_error);
 
 	if (cmd) {
-		if (err)
+		if (bio->bi_error)
 			target_complete_cmd(cmd, SAM_STAT_CHECK_CONDITION);
 		else
 			target_complete_cmd(cmd, SAM_STAT_GOOD);
diff --git a/drivers/target/target_core_pscsi.c b/drivers/target/target_core_pscsi.c
index 08e9084ee615..de18790eb21c 100644
--- a/drivers/target/target_core_pscsi.c
+++ b/drivers/target/target_core_pscsi.c
@@ -852,7 +852,7 @@ static ssize_t pscsi_show_configfs_dev_params(struct se_device *dev, char *b)
 	return bl;
 }
 
-static void pscsi_bi_endio(struct bio *bio, int error)
+static void pscsi_bi_endio(struct bio *bio)
 {
 	bio_put(bio);
 }
@@ -973,7 +973,7 @@ fail:
 	while (*hbio) {
 		bio = *hbio;
 		*hbio = (*hbio)->bi_next;
-		bio_endio(bio, 0);	/* XXX: should be error */
+		bio_endio(bio);
 	}
 	return TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE;
 }
@@ -1061,7 +1061,7 @@ fail_free_bio:
 	while (hbio) {
 		struct bio *bio = hbio;
 		hbio = hbio->bi_next;
-		bio_endio(bio, 0);	/* XXX: should be error */
+		bio_endio(bio);
 	}
 	ret = TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE;
 fail:
diff --git a/fs/btrfs/check-integrity.c b/fs/btrfs/check-integrity.c
index ce7dec88f4b8..541fbfaed276 100644
--- a/fs/btrfs/check-integrity.c
+++ b/fs/btrfs/check-integrity.c
@@ -343,7 +343,7 @@ static int btrfsic_process_written_superblock(
 		struct btrfsic_state *state,
 		struct btrfsic_block *const block,
 		struct btrfs_super_block *const super_hdr);
-static void btrfsic_bio_end_io(struct bio *bp, int bio_error_status);
+static void btrfsic_bio_end_io(struct bio *bp);
 static void btrfsic_bh_end_io(struct buffer_head *bh, int uptodate);
 static int btrfsic_is_block_ref_by_superblock(const struct btrfsic_state *state,
 					      const struct btrfsic_block *block,
@@ -2207,7 +2207,7 @@ continue_loop:
 	goto again;
 }
 
-static void btrfsic_bio_end_io(struct bio *bp, int bio_error_status)
+static void btrfsic_bio_end_io(struct bio *bp)
 {
 	struct btrfsic_block *block = (struct btrfsic_block *)bp->bi_private;
 	int iodone_w_error;
@@ -2215,7 +2215,7 @@ static void btrfsic_bio_end_io(struct bio *bp, int bio_error_status)
 	/* mutex is not held! This is not save if IO is not yet completed
 	 * on umount */
 	iodone_w_error = 0;
-	if (bio_error_status)
+	if (bp->bi_error)
 		iodone_w_error = 1;
 
 	BUG_ON(NULL == block);
@@ -2230,7 +2230,7 @@ static void btrfsic_bio_end_io(struct bio *bp, int bio_error_status)
 		     BTRFSIC_PRINT_MASK_END_IO_BIO_BH))
 			printk(KERN_INFO
 			       "bio_end_io(err=%d) for %c @%llu (%s/%llu/%d)\n",
-			       bio_error_status,
+			       bp->bi_error,
 			       btrfsic_get_block_type(dev_state->state, block),
 			       block->logical_bytenr, dev_state->name,
 			       block->dev_bytenr, block->mirror_num);
@@ -2252,7 +2252,7 @@ static void btrfsic_bio_end_io(struct bio *bp, int bio_error_status)
 		block = next_block;
 	} while (NULL != block);
 
-	bp->bi_end_io(bp, bio_error_status);
+	bp->bi_end_io(bp);
 }
 
 static void btrfsic_bh_end_io(struct buffer_head *bh, int uptodate)
diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index ce62324c78e7..302266ec2cdb 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -152,7 +152,7 @@ fail:
  * The compressed pages are freed here, and it must be run
  * in process context
  */
-static void end_compressed_bio_read(struct bio *bio, int err)
+static void end_compressed_bio_read(struct bio *bio)
 {
 	struct compressed_bio *cb = bio->bi_private;
 	struct inode *inode;
@@ -160,7 +160,7 @@ static void end_compressed_bio_read(struct bio *bio, int err)
 	unsigned long index;
 	int ret;
 
-	if (err)
+	if (bio->bi_error)
 		cb->errors = 1;
 
 	/* if there are more bios still pending for this compressed
@@ -210,7 +210,7 @@ csum_failed:
 		bio_for_each_segment_all(bvec, cb->orig_bio, i)
 			SetPageChecked(bvec->bv_page);
 
-		bio_endio(cb->orig_bio, 0);
+		bio_endio(cb->orig_bio);
 	}
 
 	/* finally free the cb struct */
@@ -266,7 +266,7 @@ static noinline void end_compressed_writeback(struct inode *inode,
  * This also calls the writeback end hooks for the file pages so that
  * metadata and checksums can be updated in the file.
  */
-static void end_compressed_bio_write(struct bio *bio, int err)
+static void end_compressed_bio_write(struct bio *bio)
 {
 	struct extent_io_tree *tree;
 	struct compressed_bio *cb = bio->bi_private;
@@ -274,7 +274,7 @@ static void end_compressed_bio_write(struct bio *bio, int err)
 	struct page *page;
 	unsigned long index;
 
-	if (err)
+	if (bio->bi_error)
 		cb->errors = 1;
 
 	/* if there are more bios still pending for this compressed
@@ -293,7 +293,7 @@ static void end_compressed_bio_write(struct bio *bio, int err)
 					 cb->start,
 					 cb->start + cb->len - 1,
 					 NULL,
-					 err ? 0 : 1);
+					 bio->bi_error ? 0 : 1);
 	cb->compressed_pages[0]->mapping = NULL;
 
 	end_compressed_writeback(inode, cb);
@@ -697,8 +697,10 @@ int btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 
 			ret = btrfs_map_bio(root, READ, comp_bio,
 					    mirror_num, 0);
-			if (ret)
-				bio_endio(comp_bio, ret);
+			if (ret) {
+				bio->bi_error = ret;
+				bio_endio(comp_bio);
+			}
 
 			bio_put(comp_bio);
 
@@ -724,8 +726,10 @@ int btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 	}
 
 	ret = btrfs_map_bio(root, READ, comp_bio, mirror_num, 0);
-	if (ret)
-		bio_endio(comp_bio, ret);
+	if (ret) {
+		bio->bi_error = ret;
+		bio_endio(comp_bio);
+	}
 
 	bio_put(comp_bio);
 	return 0;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index a9aadb2ad525..a8c0de888a9d 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -703,7 +703,7 @@ static int btree_io_failed_hook(struct page *page, int failed_mirror)
 	return -EIO;	/* we fixed nothing */
 }
 
-static void end_workqueue_bio(struct bio *bio, int err)
+static void end_workqueue_bio(struct bio *bio)
 {
 	struct btrfs_end_io_wq *end_io_wq = bio->bi_private;
 	struct btrfs_fs_info *fs_info;
@@ -711,7 +711,7 @@ static void end_workqueue_bio(struct bio *bio, int err)
 	btrfs_work_func_t func;
 
 	fs_info = end_io_wq->info;
-	end_io_wq->error = err;
+	end_io_wq->error = bio->bi_error;
 
 	if (bio->bi_rw & REQ_WRITE) {
 		if (end_io_wq->metadata == BTRFS_WQ_ENDIO_METADATA) {
@@ -808,7 +808,8 @@ static void run_one_async_done(struct btrfs_work *work)
 
 	/* If an error occured we just want to clean up the bio and move on */
 	if (async->error) {
-		bio_endio(async->bio, async->error);
+		async->bio->bi_error = async->error;
+		bio_endio(async->bio);
 		return;
 	}
 
@@ -908,8 +909,10 @@ static int __btree_submit_bio_done(struct inode *inode, int rw, struct bio *bio,
 	 * submission context.  Just jump into btrfs_map_bio
 	 */
 	ret = btrfs_map_bio(BTRFS_I(inode)->root, rw, bio, mirror_num, 1);
-	if (ret)
-		bio_endio(bio, ret);
+	if (ret) {
+		bio->bi_error = ret;
+		bio_endio(bio);
+	}
 	return ret;
 }
 
@@ -960,10 +963,13 @@ static int btree_submit_bio_hook(struct inode *inode, int rw, struct bio *bio,
 					  __btree_submit_bio_done);
 	}
 
-	if (ret) {
+	if (ret)
+		goto out_w_error;
+	return 0;
+
 out_w_error:
-		bio_endio(bio, ret);
-	}
+	bio->bi_error = ret;
+	bio_endio(bio);
 	return ret;
 }
 
@@ -1735,16 +1741,15 @@ static void end_workqueue_fn(struct btrfs_work *work)
 {
 	struct bio *bio;
 	struct btrfs_end_io_wq *end_io_wq;
-	int error;
 
 	end_io_wq = container_of(work, struct btrfs_end_io_wq, work);
 	bio = end_io_wq->bio;
 
-	error = end_io_wq->error;
+	bio->bi_error = end_io_wq->error;
 	bio->bi_private = end_io_wq->private;
 	bio->bi_end_io = end_io_wq->end_io;
 	kmem_cache_free(btrfs_end_io_wq_cache, end_io_wq);
-	bio_endio(bio, error);
+	bio_endio(bio);
 }
 
 static int cleaner_kthread(void *arg)
@@ -3323,10 +3328,8 @@ static int write_dev_supers(struct btrfs_device *device,
  * endio for the write_dev_flush, this will wake anyone waiting
  * for the barrier when it is done
  */
-static void btrfs_end_empty_barrier(struct bio *bio, int err)
+static void btrfs_end_empty_barrier(struct bio *bio)
 {
-	if (err)
-		clear_bit(BIO_UPTODATE, &bio->bi_flags);
 	if (bio->bi_private)
 		complete(bio->bi_private);
 	bio_put(bio);
@@ -3354,8 +3357,8 @@ static int write_dev_flush(struct btrfs_device *device, int wait)
 
 		wait_for_completion(&device->flush_wait);
 
-		if (!bio_flagged(bio, BIO_UPTODATE)) {
-			ret = -EIO;
+		if (bio->bi_error) {
+			ret = bio->bi_error;
 			btrfs_dev_stat_inc_and_print(device,
 				BTRFS_DEV_STAT_FLUSH_ERRS);
 		}
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 02d05817cbdf..c22f175ed024 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2486,7 +2486,7 @@ int end_extent_writepage(struct page *page, int err, u64 start, u64 end)
  * Scheduling is not allowed, so the extent state tree is expected
  * to have one and only one object corresponding to this IO.
  */
-static void end_bio_extent_writepage(struct bio *bio, int err)
+static void end_bio_extent_writepage(struct bio *bio)
 {
 	struct bio_vec *bvec;
 	u64 start;
@@ -2516,7 +2516,7 @@ static void end_bio_extent_writepage(struct bio *bio, int err)
 		start = page_offset(page);
 		end = start + bvec->bv_offset + bvec->bv_len - 1;
 
-		if (end_extent_writepage(page, err, start, end))
+		if (end_extent_writepage(page, bio->bi_error, start, end))
 			continue;
 
 		end_page_writeback(page);
@@ -2548,10 +2548,10 @@ endio_readpage_release_extent(struct extent_io_tree *tree, u64 start, u64 len,
  * Scheduling is not allowed, so the extent state tree is expected
  * to have one and only one object corresponding to this IO.
  */
-static void end_bio_extent_readpage(struct bio *bio, int err)
+static void end_bio_extent_readpage(struct bio *bio)
 {
 	struct bio_vec *bvec;
-	int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags);
+	int uptodate = !bio->bi_error;
 	struct btrfs_io_bio *io_bio = btrfs_io_bio(bio);
 	struct extent_io_tree *tree;
 	u64 offset = 0;
@@ -2564,16 +2564,13 @@ static void end_bio_extent_readpage(struct bio *bio, int err)
 	int ret;
 	int i;
 
-	if (err)
-		uptodate = 0;
-
 	bio_for_each_segment_all(bvec, bio, i) {
 		struct page *page = bvec->bv_page;
 		struct inode *inode = page->mapping->host;
 
 		pr_debug("end_bio_extent_readpage: bi_sector=%llu, err=%d, "
-			 "mirror=%u\n", (u64)bio->bi_iter.bi_sector, err,
-			 io_bio->mirror_num);
+			 "mirror=%u\n", (u64)bio->bi_iter.bi_sector,
+			 bio->bi_error, io_bio->mirror_num);
 		tree = &BTRFS_I(inode)->io_tree;
 
 		/* We always issue full-page reads, but if some block
@@ -2614,8 +2611,7 @@ static void end_bio_extent_readpage(struct bio *bio, int err)
 
 		if (tree->ops && tree->ops->readpage_io_failed_hook) {
 			ret = tree->ops->readpage_io_failed_hook(page, mirror);
-			if (!ret && !err &&
-			    test_bit(BIO_UPTODATE, &bio->bi_flags))
+			if (!ret && !bio->bi_error)
 				uptodate = 1;
 		} else {
 			/*
@@ -2631,10 +2627,7 @@ static void end_bio_extent_readpage(struct bio *bio, int err)
 			ret = bio_readpage_error(bio, offset, page, start, end,
 						 mirror);
 			if (ret == 0) {
-				uptodate =
-					test_bit(BIO_UPTODATE, &bio->bi_flags);
-				if (err)
-					uptodate = 0;
+				uptodate = !bio->bi_error;
 				offset += len;
 				continue;
 			}
@@ -2684,7 +2677,7 @@ readpage_ok:
 		endio_readpage_release_extent(tree, extent_start, extent_len,
 					      uptodate);
 	if (io_bio->end_io)
-		io_bio->end_io(io_bio, err);
+		io_bio->end_io(io_bio, bio->bi_error);
 	bio_put(bio);
 }
 
@@ -3696,7 +3689,7 @@ static void set_btree_ioerr(struct page *page)
 	}
 }
 
-static void end_bio_extent_buffer_writepage(struct bio *bio, int err)
+static void end_bio_extent_buffer_writepage(struct bio *bio)
 {
 	struct bio_vec *bvec;
 	struct extent_buffer *eb;
@@ -3709,7 +3702,8 @@ static void end_bio_extent_buffer_writepage(struct bio *bio, int err)
 		BUG_ON(!eb);
 		done = atomic_dec_and_test(&eb->io_pages);
 
-		if (err || test_bit(EXTENT_BUFFER_WRITE_ERR, &eb->bflags)) {
+		if (bio->bi_error ||
+		    test_bit(EXTENT_BUFFER_WRITE_ERR, &eb->bflags)) {
 			ClearPageUptodate(page);
 			set_btree_ioerr(page);
 		}
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index b33c0cf02668..6b8becfe2057 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1845,8 +1845,10 @@ static int __btrfs_submit_bio_done(struct inode *inode, int rw, struct bio *bio,
 	int ret;
 
 	ret = btrfs_map_bio(root, rw, bio, mirror_num, 1);
-	if (ret)
-		bio_endio(bio, ret);
+	if (ret) {
+		bio->bi_error = ret;
+		bio_endio(bio);
+	}
 	return ret;
 }
 
@@ -1906,8 +1908,10 @@ mapit:
 	ret = btrfs_map_bio(root, rw, bio, mirror_num, 0);
 
 out:
-	if (ret < 0)
-		bio_endio(bio, ret);
+	if (ret < 0) {
+		bio->bi_error = ret;
+		bio_endio(bio);
+	}
 	return ret;
 }
 
@@ -7689,13 +7693,13 @@ struct btrfs_retry_complete {
 	int uptodate;
 };
 
-static void btrfs_retry_endio_nocsum(struct bio *bio, int err)
+static void btrfs_retry_endio_nocsum(struct bio *bio)
 {
 	struct btrfs_retry_complete *done = bio->bi_private;
 	struct bio_vec *bvec;
 	int i;
 
-	if (err)
+	if (bio->bi_error)
 		goto end;
 
 	done->uptodate = 1;
@@ -7744,7 +7748,7 @@ try_again:
 	return 0;
 }
 
-static void btrfs_retry_endio(struct bio *bio, int err)
+static void btrfs_retry_endio(struct bio *bio)
 {
 	struct btrfs_retry_complete *done = bio->bi_private;
 	struct btrfs_io_bio *io_bio = btrfs_io_bio(bio);
@@ -7753,7 +7757,7 @@ static void btrfs_retry_endio(struct bio *bio, int err)
 	int ret;
 	int i;
 
-	if (err)
+	if (bio->bi_error)
 		goto end;
 
 	uptodate = 1;
@@ -7836,12 +7840,13 @@ static int btrfs_subio_endio_read(struct inode *inode,
 	}
 }
 
-static void btrfs_endio_direct_read(struct bio *bio, int err)
+static void btrfs_endio_direct_read(struct bio *bio)
 {
 	struct btrfs_dio_private *dip = bio->bi_private;
 	struct inode *inode = dip->inode;
 	struct bio *dio_bio;
 	struct btrfs_io_bio *io_bio = btrfs_io_bio(bio);
+	int err = bio->bi_error;
 
 	if (dip->flags & BTRFS_DIO_ORIG_BIO_SUBMITTED)
 		err = btrfs_subio_endio_read(inode, io_bio, err);
@@ -7852,17 +7857,14 @@ static void btrfs_endio_direct_read(struct bio *bio, int err)
 
 	kfree(dip);
 
-	/* If we had a csum failure make sure to clear the uptodate flag */
-	if (err)
-		clear_bit(BIO_UPTODATE, &dio_bio->bi_flags);
-	dio_end_io(dio_bio, err);
+	dio_end_io(dio_bio, bio->bi_error);
 
 	if (io_bio->end_io)
 		io_bio->end_io(io_bio, err);
 	bio_put(bio);
 }
 
-static void btrfs_endio_direct_write(struct bio *bio, int err)
+static void btrfs_endio_direct_write(struct bio *bio)
 {
 	struct btrfs_dio_private *dip = bio->bi_private;
 	struct inode *inode = dip->inode;
@@ -7876,7 +7878,8 @@ static void btrfs_endio_direct_write(struct bio *bio, int err)
 again:
 	ret = btrfs_dec_test_first_ordered_pending(inode, &ordered,
 						   &ordered_offset,
-						   ordered_bytes, !err);
+						   ordered_bytes,
+						   !bio->bi_error);
 	if (!ret)
 		goto out_test;
 
@@ -7899,10 +7902,7 @@ out_test:
 
 	kfree(dip);
 
-	/* If we had an error make sure to clear the uptodate flag */
-	if (err)
-		clear_bit(BIO_UPTODATE, &dio_bio->bi_flags);
-	dio_end_io(dio_bio, err);
+	dio_end_io(dio_bio, bio->bi_error);
 	bio_put(bio);
 }
 
@@ -7917,9 +7917,10 @@ static int __btrfs_submit_bio_start_direct_io(struct inode *inode, int rw,
 	return 0;
 }
 
-static void btrfs_end_dio_bio(struct bio *bio, int err)
+static void btrfs_end_dio_bio(struct bio *bio)
 {
 	struct btrfs_dio_private *dip = bio->bi_private;
+	int err = bio->bi_error;
 
 	if (err)
 		btrfs_warn(BTRFS_I(dip->inode)->root->fs_info,
@@ -7948,8 +7949,8 @@ static void btrfs_end_dio_bio(struct bio *bio, int err)
 	if (dip->errors) {
 		bio_io_error(dip->orig_bio);
 	} else {
-		set_bit(BIO_UPTODATE, &dip->dio_bio->bi_flags);
-		bio_endio(dip->orig_bio, 0);
+		dip->dio_bio->bi_error = 0;
+		bio_endio(dip->orig_bio);
 	}
 out:
 	bio_put(bio);
@@ -8220,7 +8221,8 @@ free_ordered:
 	 * callbacks - they require an allocated dip and a clone of dio_bio.
 	 */
 	if (io_bio && dip) {
-		bio_endio(io_bio, ret);
+		io_bio->bi_error = -EIO;
+		bio_endio(io_bio);
 		/*
 		 * The end io callbacks free our dip, do the final put on io_bio
 		 * and all the cleanup and final put for dio_bio (through
@@ -8247,7 +8249,7 @@ free_ordered:
 			unlock_extent(&BTRFS_I(inode)->io_tree, file_offset,
 			      file_offset + dio_bio->bi_iter.bi_size - 1);
 		}
-		clear_bit(BIO_UPTODATE, &dio_bio->bi_flags);
+		dio_bio->bi_error = -EIO;
 		/*
 		 * Releases and cleans up our dio_bio, no need to bio_put()
 		 * nor bio_endio()/bio_io_error() against dio_bio.
diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index fa72068bd256..0a02e24900aa 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -851,7 +851,7 @@ static void free_raid_bio(struct btrfs_raid_bio *rbio)
  * this frees the rbio and runs through all the bios in the
  * bio_list and calls end_io on them
  */
-static void rbio_orig_end_io(struct btrfs_raid_bio *rbio, int err, int uptodate)
+static void rbio_orig_end_io(struct btrfs_raid_bio *rbio, int err)
 {
 	struct bio *cur = bio_list_get(&rbio->bio_list);
 	struct bio *next;
@@ -864,9 +864,8 @@ static void rbio_orig_end_io(struct btrfs_raid_bio *rbio, int err, int uptodate)
 	while (cur) {
 		next = cur->bi_next;
 		cur->bi_next = NULL;
-		if (uptodate)
-			set_bit(BIO_UPTODATE, &cur->bi_flags);
-		bio_endio(cur, err);
+		cur->bi_error = err;
+		bio_endio(cur);
 		cur = next;
 	}
 }
@@ -875,9 +874,10 @@ static void rbio_orig_end_io(struct btrfs_raid_bio *rbio, int err, int uptodate)
  * end io function used by finish_rmw.  When we finally
  * get here, we've written a full stripe
  */
-static void raid_write_end_io(struct bio *bio, int err)
+static void raid_write_end_io(struct bio *bio)
 {
 	struct btrfs_raid_bio *rbio = bio->bi_private;
+	int err = bio->bi_error;
 
 	if (err)
 		fail_bio_stripe(rbio, bio);
@@ -893,7 +893,7 @@ static void raid_write_end_io(struct bio *bio, int err)
 	if (atomic_read(&rbio->error) > rbio->bbio->max_errors)
 		err = -EIO;
 
-	rbio_orig_end_io(rbio, err, 0);
+	rbio_orig_end_io(rbio, err);
 	return;
 }
 
@@ -1071,7 +1071,7 @@ static int rbio_add_io_page(struct btrfs_raid_bio *rbio,
 		 * devices or if they are not contiguous
 		 */
 		if (last_end == disk_start && stripe->dev->bdev &&
-		    test_bit(BIO_UPTODATE, &last->bi_flags) &&
+		    !last->bi_error &&
 		    last->bi_bdev == stripe->dev->bdev) {
 			ret = bio_add_page(last, page, PAGE_CACHE_SIZE, 0);
 			if (ret == PAGE_CACHE_SIZE)
@@ -1087,7 +1087,6 @@ static int rbio_add_io_page(struct btrfs_raid_bio *rbio,
 	bio->bi_iter.bi_size = 0;
 	bio->bi_bdev = stripe->dev->bdev;
 	bio->bi_iter.bi_sector = disk_start >> 9;
-	set_bit(BIO_UPTODATE, &bio->bi_flags);
 
 	bio_add_page(bio, page, PAGE_CACHE_SIZE, 0);
 	bio_list_add(bio_list, bio);
@@ -1312,13 +1311,12 @@ write_data:
 
 		bio->bi_private = rbio;
 		bio->bi_end_io = raid_write_end_io;
-		BUG_ON(!test_bit(BIO_UPTODATE, &bio->bi_flags));
 		submit_bio(WRITE, bio);
 	}
 	return;
 
 cleanup:
-	rbio_orig_end_io(rbio, -EIO, 0);
+	rbio_orig_end_io(rbio, -EIO);
 }
 
 /*
@@ -1441,11 +1439,11 @@ static void set_bio_pages_uptodate(struct bio *bio)
  * This will usually kick off finish_rmw once all the bios are read in, but it
  * may trigger parity reconstruction if we had any errors along the way
  */
-static void raid_rmw_end_io(struct bio *bio, int err)
+static void raid_rmw_end_io(struct bio *bio)
 {
 	struct btrfs_raid_bio *rbio = bio->bi_private;
 
-	if (err)
+	if (bio->bi_error)
 		fail_bio_stripe(rbio, bio);
 	else
 		set_bio_pages_uptodate(bio);
@@ -1455,7 +1453,6 @@ static void raid_rmw_end_io(struct bio *bio, int err)
 	if (!atomic_dec_and_test(&rbio->stripes_pending))
 		return;
 
-	err = 0;
 	if (atomic_read(&rbio->error) > rbio->bbio->max_errors)
 		goto cleanup;
 
@@ -1469,7 +1466,7 @@ static void raid_rmw_end_io(struct bio *bio, int err)
 
 cleanup:
 
-	rbio_orig_end_io(rbio, -EIO, 0);
+	rbio_orig_end_io(rbio, -EIO);
 }
 
 static void async_rmw_stripe(struct btrfs_raid_bio *rbio)
@@ -1572,14 +1569,13 @@ static int raid56_rmw_stripe(struct btrfs_raid_bio *rbio)
 		btrfs_bio_wq_end_io(rbio->fs_info, bio,
 				    BTRFS_WQ_ENDIO_RAID56);
 
-		BUG_ON(!test_bit(BIO_UPTODATE, &bio->bi_flags));
 		submit_bio(READ, bio);
 	}
 	/* the actual write will happen once the reads are done */
 	return 0;
 
 cleanup:
-	rbio_orig_end_io(rbio, -EIO, 0);
+	rbio_orig_end_io(rbio, -EIO);
 	return -EIO;
 
 finish:
@@ -1964,7 +1960,7 @@ cleanup_io:
 		else
 			clear_bit(RBIO_CACHE_READY_BIT, &rbio->flags);
 
-		rbio_orig_end_io(rbio, err, err == 0);
+		rbio_orig_end_io(rbio, err);
 	} else if (err == 0) {
 		rbio->faila = -1;
 		rbio->failb = -1;
@@ -1976,7 +1972,7 @@ cleanup_io:
 		else
 			BUG();
 	} else {
-		rbio_orig_end_io(rbio, err, 0);
+		rbio_orig_end_io(rbio, err);
 	}
 }
 
@@ -1984,7 +1980,7 @@ cleanup_io:
  * This is called only for stripes we've read from disk to
  * reconstruct the parity.
  */
-static void raid_recover_end_io(struct bio *bio, int err)
+static void raid_recover_end_io(struct bio *bio)
 {
 	struct btrfs_raid_bio *rbio = bio->bi_private;
 
@@ -1992,7 +1988,7 @@ static void raid_recover_end_io(struct bio *bio, int err)
 	 * we only read stripe pages off the disk, set them
 	 * up to date if there were no errors
 	 */
-	if (err)
+	if (bio->bi_error)
 		fail_bio_stripe(rbio, bio);
 	else
 		set_bio_pages_uptodate(bio);
@@ -2002,7 +1998,7 @@ static void raid_recover_end_io(struct bio *bio, int err)
 		return;
 
 	if (atomic_read(&rbio->error) > rbio->bbio->max_errors)
-		rbio_orig_end_io(rbio, -EIO, 0);
+		rbio_orig_end_io(rbio, -EIO);
 	else
 		__raid_recover_end_io(rbio);
 }
@@ -2094,7 +2090,6 @@ static int __raid56_parity_recover(struct btrfs_raid_bio *rbio)
 		btrfs_bio_wq_end_io(rbio->fs_info, bio,
 				    BTRFS_WQ_ENDIO_RAID56);
 
-		BUG_ON(!test_bit(BIO_UPTODATE, &bio->bi_flags));
 		submit_bio(READ, bio);
 	}
 out:
@@ -2102,7 +2097,7 @@ out:
 
 cleanup:
 	if (rbio->operation == BTRFS_RBIO_READ_REBUILD)
-		rbio_orig_end_io(rbio, -EIO, 0);
+		rbio_orig_end_io(rbio, -EIO);
 	return -EIO;
 }
 
@@ -2277,11 +2272,12 @@ static int alloc_rbio_essential_pages(struct btrfs_raid_bio *rbio)
  * end io function used by finish_rmw.  When we finally
  * get here, we've written a full stripe
  */
-static void raid_write_parity_end_io(struct bio *bio, int err)
+static void raid_write_parity_end_io(struct bio *bio)
 {
 	struct btrfs_raid_bio *rbio = bio->bi_private;
+	int err = bio->bi_error;
 
-	if (err)
+	if (bio->bi_error)
 		fail_bio_stripe(rbio, bio);
 
 	bio_put(bio);
@@ -2294,7 +2290,7 @@ static void raid_write_parity_end_io(struct bio *bio, int err)
 	if (atomic_read(&rbio->error))
 		err = -EIO;
 
-	rbio_orig_end_io(rbio, err, 0);
+	rbio_orig_end_io(rbio, err);
 }
 
 static noinline void finish_parity_scrub(struct btrfs_raid_bio *rbio,
@@ -2437,7 +2433,7 @@ submit_write:
 	nr_data = bio_list_size(&bio_list);
 	if (!nr_data) {
 		/* Every parity is right */
-		rbio_orig_end_io(rbio, 0, 0);
+		rbio_orig_end_io(rbio, 0);
 		return;
 	}
 
@@ -2450,13 +2446,12 @@ submit_write:
 
 		bio->bi_private = rbio;
 		bio->bi_end_io = raid_write_parity_end_io;
-		BUG_ON(!test_bit(BIO_UPTODATE, &bio->bi_flags));
 		submit_bio(WRITE, bio);
 	}
 	return;
 
 cleanup:
-	rbio_orig_end_io(rbio, -EIO, 0);
+	rbio_orig_end_io(rbio, -EIO);
 }
 
 static inline int is_data_stripe(struct btrfs_raid_bio *rbio, int stripe)
@@ -2524,7 +2519,7 @@ static void validate_rbio_for_parity_scrub(struct btrfs_raid_bio *rbio)
 	return;
 
 cleanup:
-	rbio_orig_end_io(rbio, -EIO, 0);
+	rbio_orig_end_io(rbio, -EIO);
 }
 
 /*
@@ -2535,11 +2530,11 @@ cleanup:
  * This will usually kick off finish_rmw once all the bios are read in, but it
  * may trigger parity reconstruction if we had any errors along the way
  */
-static void raid56_parity_scrub_end_io(struct bio *bio, int err)
+static void raid56_parity_scrub_end_io(struct bio *bio)
 {
 	struct btrfs_raid_bio *rbio = bio->bi_private;
 
-	if (err)
+	if (bio->bi_error)
 		fail_bio_stripe(rbio, bio);
 	else
 		set_bio_pages_uptodate(bio);
@@ -2632,14 +2627,13 @@ static void raid56_parity_scrub_stripe(struct btrfs_raid_bio *rbio)
 		btrfs_bio_wq_end_io(rbio->fs_info, bio,
 				    BTRFS_WQ_ENDIO_RAID56);
 
-		BUG_ON(!test_bit(BIO_UPTODATE, &bio->bi_flags));
 		submit_bio(READ, bio);
 	}
 	/* the actual write will happen once the reads are done */
 	return;
 
 cleanup:
-	rbio_orig_end_io(rbio, -EIO, 0);
+	rbio_orig_end_io(rbio, -EIO);
 	return;
 
 finish:
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 94db0fa5225a..ebb8260186fe 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -278,7 +278,7 @@ static int scrub_pages(struct scrub_ctx *sctx, u64 logical, u64 len,
 		       u64 physical, struct btrfs_device *dev, u64 flags,
 		       u64 gen, int mirror_num, u8 *csum, int force,
 		       u64 physical_for_dev_replace);
-static void scrub_bio_end_io(struct bio *bio, int err);
+static void scrub_bio_end_io(struct bio *bio);
 static void scrub_bio_end_io_worker(struct btrfs_work *work);
 static void scrub_block_complete(struct scrub_block *sblock);
 static void scrub_remap_extent(struct btrfs_fs_info *fs_info,
@@ -295,7 +295,7 @@ static void scrub_free_wr_ctx(struct scrub_wr_ctx *wr_ctx);
 static int scrub_add_page_to_wr_bio(struct scrub_ctx *sctx,
 				    struct scrub_page *spage);
 static void scrub_wr_submit(struct scrub_ctx *sctx);
-static void scrub_wr_bio_end_io(struct bio *bio, int err);
+static void scrub_wr_bio_end_io(struct bio *bio);
 static void scrub_wr_bio_end_io_worker(struct btrfs_work *work);
 static int write_page_nocow(struct scrub_ctx *sctx,
 			    u64 physical_for_dev_replace, struct page *page);
@@ -1429,11 +1429,11 @@ struct scrub_bio_ret {
 	int error;
 };
 
-static void scrub_bio_wait_endio(struct bio *bio, int error)
+static void scrub_bio_wait_endio(struct bio *bio)
 {
 	struct scrub_bio_ret *ret = bio->bi_private;
 
-	ret->error = error;
+	ret->error = bio->bi_error;
 	complete(&ret->event);
 }
 
@@ -1790,12 +1790,12 @@ static void scrub_wr_submit(struct scrub_ctx *sctx)
 	btrfsic_submit_bio(WRITE, sbio->bio);
 }
 
-static void scrub_wr_bio_end_io(struct bio *bio, int err)
+static void scrub_wr_bio_end_io(struct bio *bio)
 {
 	struct scrub_bio *sbio = bio->bi_private;
 	struct btrfs_fs_info *fs_info = sbio->dev->dev_root->fs_info;
 
-	sbio->err = err;
+	sbio->err = bio->bi_error;
 	sbio->bio = bio;
 
 	btrfs_init_work(&sbio->work, btrfs_scrubwrc_helper,
@@ -2098,7 +2098,7 @@ static void scrub_submit(struct scrub_ctx *sctx)
 		 */
 		printk_ratelimited(KERN_WARNING
 			"BTRFS: scrub_submit(bio bdev == NULL) is unexpected!\n");
-		bio_endio(sbio->bio, -EIO);
+		bio_io_error(sbio->bio);
 	} else {
 		btrfsic_submit_bio(READ, sbio->bio);
 	}
@@ -2260,12 +2260,12 @@ leave_nomem:
 	return 0;
 }
 
-static void scrub_bio_end_io(struct bio *bio, int err)
+static void scrub_bio_end_io(struct bio *bio)
 {
 	struct scrub_bio *sbio = bio->bi_private;
 	struct btrfs_fs_info *fs_info = sbio->dev->dev_root->fs_info;
 
-	sbio->err = err;
+	sbio->err = bio->bi_error;
 	sbio->bio = bio;
 
 	btrfs_queue_work(fs_info->scrub_workers, &sbio->work);
@@ -2672,11 +2672,11 @@ static void scrub_parity_bio_endio_worker(struct btrfs_work *work)
 	scrub_pending_bio_dec(sctx);
 }
 
-static void scrub_parity_bio_endio(struct bio *bio, int error)
+static void scrub_parity_bio_endio(struct bio *bio)
 {
 	struct scrub_parity *sparity = (struct scrub_parity *)bio->bi_private;
 
-	if (error)
+	if (bio->bi_error)
 		bitmap_or(sparity->ebitmap, sparity->ebitmap, sparity->dbitmap,
 			  sparity->nsectors);
 
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index fbe7c104531c..8f2ca18c71f4 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -5741,23 +5741,23 @@ int btrfs_rmap_block(struct btrfs_mapping_tree *map_tree,
 	return 0;
 }
 
-static inline void btrfs_end_bbio(struct btrfs_bio *bbio, struct bio *bio, int err)
+static inline void btrfs_end_bbio(struct btrfs_bio *bbio, struct bio *bio)
 {
 	bio->bi_private = bbio->private;
 	bio->bi_end_io = bbio->end_io;
-	bio_endio(bio, err);
+	bio_endio(bio);
 
 	btrfs_put_bbio(bbio);
 }
 
-static void btrfs_end_bio(struct bio *bio, int err)
+static void btrfs_end_bio(struct bio *bio)
 {
 	struct btrfs_bio *bbio = bio->bi_private;
 	int is_orig_bio = 0;
 
-	if (err) {
+	if (bio->bi_error) {
 		atomic_inc(&bbio->error);
-		if (err == -EIO || err == -EREMOTEIO) {
+		if (bio->bi_error == -EIO || bio->bi_error == -EREMOTEIO) {
 			unsigned int stripe_index =
 				btrfs_io_bio(bio)->stripe_index;
 			struct btrfs_device *dev;
@@ -5795,17 +5795,16 @@ static void btrfs_end_bio(struct bio *bio, int err)
 		 * beyond the tolerance of the btrfs bio
 		 */
 		if (atomic_read(&bbio->error) > bbio->max_errors) {
-			err = -EIO;
+			bio->bi_error = -EIO;
 		} else {
 			/*
 			 * this bio is actually up to date, we didn't
 			 * go over the max number of errors
 			 */
-			set_bit(BIO_UPTODATE, &bio->bi_flags);
-			err = 0;
+			bio->bi_error = 0;
 		}
 
-		btrfs_end_bbio(bbio, bio, err);
+		btrfs_end_bbio(bbio, bio);
 	} else if (!is_orig_bio) {
 		bio_put(bio);
 	}
@@ -5826,7 +5825,7 @@ static noinline void btrfs_schedule_bio(struct btrfs_root *root,
 	struct btrfs_pending_bios *pending_bios;
 
 	if (device->missing || !device->bdev) {
-		bio_endio(bio, -EIO);
+		bio_io_error(bio);
 		return;
 	}
 
@@ -5973,8 +5972,8 @@ static void bbio_error(struct btrfs_bio *bbio, struct bio *bio, u64 logical)
 
 		btrfs_io_bio(bio)->mirror_num = bbio->mirror_num;
 		bio->bi_iter.bi_sector = logical >> 9;
-
-		btrfs_end_bbio(bbio, bio, -EIO);
+		bio->bi_error = -EIO;
+		btrfs_end_bbio(bbio, bio);
 	}
 }
 
diff --git a/fs/buffer.c b/fs/buffer.c
index 1cf7a53a0277..7a49bb84ecb5 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2957,14 +2957,14 @@ sector_t generic_block_bmap(struct address_space *mapping, sector_t block,
 }
 EXPORT_SYMBOL(generic_block_bmap);
 
-static void end_bio_bh_io_sync(struct bio *bio, int err)
+static void end_bio_bh_io_sync(struct bio *bio)
 {
 	struct buffer_head *bh = bio->bi_private;
 
 	if (unlikely (test_bit(BIO_QUIET,&bio->bi_flags)))
 		set_bit(BH_Quiet, &bh->b_state);
 
-	bh->b_end_io(bh, test_bit(BIO_UPTODATE, &bio->bi_flags));
+	bh->b_end_io(bh, !bio->bi_error);
 	bio_put(bio);
 }
 
diff --git a/fs/direct-io.c b/fs/direct-io.c
index 745d2342651a..e1639c8c14d5 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -285,7 +285,7 @@ static int dio_bio_complete(struct dio *dio, struct bio *bio);
 /*
  * Asynchronous IO callback. 
  */
-static void dio_bio_end_aio(struct bio *bio, int error)
+static void dio_bio_end_aio(struct bio *bio)
 {
 	struct dio *dio = bio->bi_private;
 	unsigned long remaining;
@@ -318,7 +318,7 @@ static void dio_bio_end_aio(struct bio *bio, int error)
  * During I/O bi_private points at the dio.  After I/O, bi_private is used to
  * implement a singly-linked list of completed BIOs, at dio->bio_list.
  */
-static void dio_bio_end_io(struct bio *bio, int error)
+static void dio_bio_end_io(struct bio *bio)
 {
 	struct dio *dio = bio->bi_private;
 	unsigned long flags;
@@ -345,9 +345,9 @@ void dio_end_io(struct bio *bio, int error)
 	struct dio *dio = bio->bi_private;
 
 	if (dio->is_async)
-		dio_bio_end_aio(bio, error);
+		dio_bio_end_aio(bio);
 	else
-		dio_bio_end_io(bio, error);
+		dio_bio_end_io(bio);
 }
 EXPORT_SYMBOL_GPL(dio_end_io);
 
@@ -457,11 +457,10 @@ static struct bio *dio_await_one(struct dio *dio)
  */
 static int dio_bio_complete(struct dio *dio, struct bio *bio)
 {
-	const int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags);
 	struct bio_vec *bvec;
 	unsigned i;
 
-	if (!uptodate)
+	if (bio->bi_error)
 		dio->io_error = -EIO;
 
 	if (dio->is_async && dio->rw == READ) {
@@ -476,7 +475,7 @@ static int dio_bio_complete(struct dio *dio, struct bio *bio)
 		}
 		bio_put(bio);
 	}
-	return uptodate ? 0 : -EIO;
+	return bio->bi_error;
 }
 
 /*
diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c
index 5602450f03f6..aa95566f14be 100644
--- a/fs/ext4/page-io.c
+++ b/fs/ext4/page-io.c
@@ -61,7 +61,6 @@ static void buffer_io_error(struct buffer_head *bh)
 static void ext4_finish_bio(struct bio *bio)
 {
 	int i;
-	int error = !test_bit(BIO_UPTODATE, &bio->bi_flags);
 	struct bio_vec *bvec;
 
 	bio_for_each_segment_all(bvec, bio, i) {
@@ -88,7 +87,7 @@ static void ext4_finish_bio(struct bio *bio)
 		}
 #endif
 
-		if (error) {
+		if (bio->bi_error) {
 			SetPageError(page);
 			set_bit(AS_EIO, &page->mapping->flags);
 		}
@@ -107,7 +106,7 @@ static void ext4_finish_bio(struct bio *bio)
 				continue;
 			}
 			clear_buffer_async_write(bh);
-			if (error)
+			if (bio->bi_error)
 				buffer_io_error(bh);
 		} while ((bh = bh->b_this_page) != head);
 		bit_spin_unlock(BH_Uptodate_Lock, &head->b_state);
@@ -310,27 +309,25 @@ ext4_io_end_t *ext4_get_io_end(ext4_io_end_t *io_end)
 }
 
 /* BIO completion function for page writeback */
-static void ext4_end_bio(struct bio *bio, int error)
+static void ext4_end_bio(struct bio *bio)
 {
 	ext4_io_end_t *io_end = bio->bi_private;
 	sector_t bi_sector = bio->bi_iter.bi_sector;
 
 	BUG_ON(!io_end);
 	bio->bi_end_io = NULL;
-	if (test_bit(BIO_UPTODATE, &bio->bi_flags))
-		error = 0;
 
-	if (error) {
+	if (bio->bi_error) {
 		struct inode *inode = io_end->inode;
 
 		ext4_warning(inode->i_sb, "I/O error %d writing to inode %lu "
 			     "(offset %llu size %ld starting block %llu)",
-			     error, inode->i_ino,
+			     bio->bi_error, inode->i_ino,
 			     (unsigned long long) io_end->offset,
 			     (long) io_end->size,
 			     (unsigned long long)
 			     bi_sector >> (inode->i_blkbits - 9));
-		mapping_set_error(inode->i_mapping, error);
+		mapping_set_error(inode->i_mapping, bio->bi_error);
 	}
 
 	if (io_end->flag & EXT4_IO_END_UNWRITTEN) {
diff --git a/fs/ext4/readpage.c b/fs/ext4/readpage.c
index ec3ef93a52db..5de5b871c178 100644
--- a/fs/ext4/readpage.c
+++ b/fs/ext4/readpage.c
@@ -98,7 +98,7 @@ static inline bool ext4_bio_encrypted(struct bio *bio)
  * status of that page is hard.  See end_buffer_async_read() for the details.
  * There is no point in duplicating all that complexity.
  */
-static void mpage_end_io(struct bio *bio, int err)
+static void mpage_end_io(struct bio *bio)
 {
 	struct bio_vec *bv;
 	int i;
@@ -106,7 +106,7 @@ static void mpage_end_io(struct bio *bio, int err)
 	if (ext4_bio_encrypted(bio)) {
 		struct ext4_crypto_ctx *ctx = bio->bi_private;
 
-		if (err) {
+		if (bio->bi_error) {
 			ext4_release_crypto_ctx(ctx);
 		} else {
 			INIT_WORK(&ctx->r.work, completion_pages);
@@ -118,7 +118,7 @@ static void mpage_end_io(struct bio *bio, int err)
 	bio_for_each_segment_all(bv, bio, i) {
 		struct page *page = bv->bv_page;
 
-		if (!err) {
+		if (!bio->bi_error) {
 			SetPageUptodate(page);
 		} else {
 			ClearPageUptodate(page);
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 9bedfa8dd3a5..8f0baa7ffb50 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -29,13 +29,13 @@
 static struct kmem_cache *extent_tree_slab;
 static struct kmem_cache *extent_node_slab;
 
-static void f2fs_read_end_io(struct bio *bio, int err)
+static void f2fs_read_end_io(struct bio *bio)
 {
 	struct bio_vec *bvec;
 	int i;
 
 	if (f2fs_bio_encrypted(bio)) {
-		if (err) {
+		if (bio->bi_error) {
 			f2fs_release_crypto_ctx(bio->bi_private);
 		} else {
 			f2fs_end_io_crypto_work(bio->bi_private, bio);
@@ -46,7 +46,7 @@ static void f2fs_read_end_io(struct bio *bio, int err)
 	bio_for_each_segment_all(bvec, bio, i) {
 		struct page *page = bvec->bv_page;
 
-		if (!err) {
+		if (!bio->bi_error) {
 			SetPageUptodate(page);
 		} else {
 			ClearPageUptodate(page);
@@ -57,7 +57,7 @@ static void f2fs_read_end_io(struct bio *bio, int err)
 	bio_put(bio);
 }
 
-static void f2fs_write_end_io(struct bio *bio, int err)
+static void f2fs_write_end_io(struct bio *bio)
 {
 	struct f2fs_sb_info *sbi = bio->bi_private;
 	struct bio_vec *bvec;
@@ -68,7 +68,7 @@ static void f2fs_write_end_io(struct bio *bio, int err)
 
 		f2fs_restore_and_release_control_page(&page);
 
-		if (unlikely(err)) {
+		if (unlikely(bio->bi_error)) {
 			set_page_dirty(page);
 			set_bit(AS_EIO, &page->mapping->flags);
 			f2fs_stop_checkpoint(sbi);
diff --git a/fs/gfs2/lops.c b/fs/gfs2/lops.c
index 2c1ae861dc94..c0a1b967deba 100644
--- a/fs/gfs2/lops.c
+++ b/fs/gfs2/lops.c
@@ -202,22 +202,22 @@ static void gfs2_end_log_write_bh(struct gfs2_sbd *sdp, struct bio_vec *bvec,
  *
  */
 
-static void gfs2_end_log_write(struct bio *bio, int error)
+static void gfs2_end_log_write(struct bio *bio)
 {
 	struct gfs2_sbd *sdp = bio->bi_private;
 	struct bio_vec *bvec;
 	struct page *page;
 	int i;
 
-	if (error) {
-		sdp->sd_log_error = error;
-		fs_err(sdp, "Error %d writing to log\n", error);
+	if (bio->bi_error) {
+		sdp->sd_log_error = bio->bi_error;
+		fs_err(sdp, "Error %d writing to log\n", bio->bi_error);
 	}
 
 	bio_for_each_segment_all(bvec, bio, i) {
 		page = bvec->bv_page;
 		if (page_has_buffers(page))
-			gfs2_end_log_write_bh(sdp, bvec, error);
+			gfs2_end_log_write_bh(sdp, bvec, bio->bi_error);
 		else
 			mempool_free(page, gfs2_page_pool);
 	}
diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index 1e3a93f2f71d..02586e7eb964 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -171,14 +171,14 @@ static int gfs2_check_sb(struct gfs2_sbd *sdp, int silent)
 	return -EINVAL;
 }
 
-static void end_bio_io_page(struct bio *bio, int error)
+static void end_bio_io_page(struct bio *bio)
 {
 	struct page *page = bio->bi_private;
 
-	if (!error)
+	if (!bio->bi_error)
 		SetPageUptodate(page);
 	else
-		pr_warn("error %d reading superblock\n", error);
+		pr_warn("error %d reading superblock\n", bio->bi_error);
 	unlock_page(page);
 }
 
diff --git a/fs/jfs/jfs_logmgr.c b/fs/jfs/jfs_logmgr.c
index bc462dcd7a40..d301acfdb80d 100644
--- a/fs/jfs/jfs_logmgr.c
+++ b/fs/jfs/jfs_logmgr.c
@@ -2011,7 +2011,7 @@ static int lbmRead(struct jfs_log * log, int pn, struct lbuf ** bpp)
 	/*check if journaling to disk has been disabled*/
 	if (log->no_integrity) {
 		bio->bi_iter.bi_size = 0;
-		lbmIODone(bio, 0);
+		lbmIODone(bio);
 	} else {
 		submit_bio(READ_SYNC, bio);
 	}
@@ -2158,7 +2158,7 @@ static void lbmStartIO(struct lbuf * bp)
 	/* check if journaling to disk has been disabled */
 	if (log->no_integrity) {
 		bio->bi_iter.bi_size = 0;
-		lbmIODone(bio, 0);
+		lbmIODone(bio);
 	} else {
 		submit_bio(WRITE_SYNC, bio);
 		INCREMENT(lmStat.submitted);
@@ -2196,7 +2196,7 @@ static int lbmIOWait(struct lbuf * bp, int flag)
  *
  * executed at INTIODONE level
  */
-static void lbmIODone(struct bio *bio, int error)
+static void lbmIODone(struct bio *bio)
 {
 	struct lbuf *bp = bio->bi_private;
 	struct lbuf *nextbp, *tail;
@@ -2212,7 +2212,7 @@ static void lbmIODone(struct bio *bio, int error)
 
 	bp->l_flag |= lbmDONE;
 
-	if (!test_bit(BIO_UPTODATE, &bio->bi_flags)) {
+	if (bio->bi_error) {
 		bp->l_flag |= lbmERROR;
 
 		jfs_err("lbmIODone: I/O error in JFS log");
diff --git a/fs/jfs/jfs_metapage.c b/fs/jfs/jfs_metapage.c
index 16a0922beb59..a3eb316b1ac3 100644
--- a/fs/jfs/jfs_metapage.c
+++ b/fs/jfs/jfs_metapage.c
@@ -276,11 +276,11 @@ static void last_read_complete(struct page *page)
 	unlock_page(page);
 }
 
-static void metapage_read_end_io(struct bio *bio, int err)
+static void metapage_read_end_io(struct bio *bio)
 {
 	struct page *page = bio->bi_private;
 
-	if (!test_bit(BIO_UPTODATE, &bio->bi_flags)) {
+	if (bio->bi_error) {
 		printk(KERN_ERR "metapage_read_end_io: I/O error\n");
 		SetPageError(page);
 	}
@@ -331,13 +331,13 @@ static void last_write_complete(struct page *page)
 	end_page_writeback(page);
 }
 
-static void metapage_write_end_io(struct bio *bio, int err)
+static void metapage_write_end_io(struct bio *bio)
 {
 	struct page *page = bio->bi_private;
 
 	BUG_ON(!PagePrivate(page));
 
-	if (! test_bit(BIO_UPTODATE, &bio->bi_flags)) {
+	if (bio->bi_error) {
 		printk(KERN_ERR "metapage_write_end_io: I/O error\n");
 		SetPageError(page);
 	}
diff --git a/fs/logfs/dev_bdev.c b/fs/logfs/dev_bdev.c
index 76279e11982d..cea0cc9878b7 100644
--- a/fs/logfs/dev_bdev.c
+++ b/fs/logfs/dev_bdev.c
@@ -53,16 +53,14 @@ static int bdev_readpage(void *_sb, struct page *page)
 
 static DECLARE_WAIT_QUEUE_HEAD(wq);
 
-static void writeseg_end_io(struct bio *bio, int err)
+static void writeseg_end_io(struct bio *bio)
 {
-	const int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags);
 	struct bio_vec *bvec;
 	int i;
 	struct super_block *sb = bio->bi_private;
 	struct logfs_super *super = logfs_super(sb);
 
-	BUG_ON(!uptodate); /* FIXME: Retry io or write elsewhere */
-	BUG_ON(err);
+	BUG_ON(bio->bi_error); /* FIXME: Retry io or write elsewhere */
 
 	bio_for_each_segment_all(bvec, bio, i) {
 		end_page_writeback(bvec->bv_page);
@@ -153,14 +151,12 @@ static void bdev_writeseg(struct super_block *sb, u64 ofs, size_t len)
 }
 
 
-static void erase_end_io(struct bio *bio, int err) 
+static void erase_end_io(struct bio *bio)
 { 
-	const int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags); 
 	struct super_block *sb = bio->bi_private; 
 	struct logfs_super *super = logfs_super(sb); 
 
-	BUG_ON(!uptodate); /* FIXME: Retry io or write elsewhere */ 
-	BUG_ON(err); 
+	BUG_ON(bio->bi_error); /* FIXME: Retry io or write elsewhere */ 
 	BUG_ON(bio->bi_vcnt == 0); 
 	bio_put(bio); 
 	if (atomic_dec_and_test(&super->s_pending_writes))
diff --git a/fs/mpage.c b/fs/mpage.c
index ca0244b69de8..abac9361b3f1 100644
--- a/fs/mpage.c
+++ b/fs/mpage.c
@@ -42,14 +42,14 @@
  * status of that page is hard.  See end_buffer_async_read() for the details.
  * There is no point in duplicating all that complexity.
  */
-static void mpage_end_io(struct bio *bio, int err)
+static void mpage_end_io(struct bio *bio)
 {
 	struct bio_vec *bv;
 	int i;
 
 	bio_for_each_segment_all(bv, bio, i) {
 		struct page *page = bv->bv_page;
-		page_endio(page, bio_data_dir(bio), err);
+		page_endio(page, bio_data_dir(bio), bio->bi_error);
 	}
 
 	bio_put(bio);
diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c
index d2554fe140a3..9cd4eb3a1e22 100644
--- a/fs/nfs/blocklayout/blocklayout.c
+++ b/fs/nfs/blocklayout/blocklayout.c
@@ -116,7 +116,7 @@ bl_submit_bio(int rw, struct bio *bio)
 
 static struct bio *
 bl_alloc_init_bio(int npg, struct block_device *bdev, sector_t disk_sector,
-		void (*end_io)(struct bio *, int err), struct parallel_io *par)
+		bio_end_io_t end_io, struct parallel_io *par)
 {
 	struct bio *bio;
 
@@ -139,8 +139,7 @@ bl_alloc_init_bio(int npg, struct block_device *bdev, sector_t disk_sector,
 static struct bio *
 do_add_page_to_bio(struct bio *bio, int npg, int rw, sector_t isect,
 		struct page *page, struct pnfs_block_dev_map *map,
-		struct pnfs_block_extent *be,
-		void (*end_io)(struct bio *, int err),
+		struct pnfs_block_extent *be, bio_end_io_t end_io,
 		struct parallel_io *par, unsigned int offset, int *len)
 {
 	struct pnfs_block_dev *dev =
@@ -183,11 +182,11 @@ retry:
 	return bio;
 }
 
-static void bl_end_io_read(struct bio *bio, int err)
+static void bl_end_io_read(struct bio *bio)
 {
 	struct parallel_io *par = bio->bi_private;
 
-	if (err) {
+	if (bio->bi_error) {
 		struct nfs_pgio_header *header = par->data;
 
 		if (!header->pnfs_error)
@@ -316,13 +315,12 @@ out:
 	return PNFS_ATTEMPTED;
 }
 
-static void bl_end_io_write(struct bio *bio, int err)
+static void bl_end_io_write(struct bio *bio)
 {
 	struct parallel_io *par = bio->bi_private;
-	const int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags);
 	struct nfs_pgio_header *header = par->data;
 
-	if (!uptodate) {
+	if (bio->bi_error) {
 		if (!header->pnfs_error)
 			header->pnfs_error = -EIO;
 		pnfs_set_lo_fail(header->lseg);
diff --git a/fs/nilfs2/segbuf.c b/fs/nilfs2/segbuf.c
index 42468e5ab3e7..550b10efb14e 100644
--- a/fs/nilfs2/segbuf.c
+++ b/fs/nilfs2/segbuf.c
@@ -338,12 +338,11 @@ void nilfs_add_checksums_on_logs(struct list_head *logs, u32 seed)
 /*
  * BIO operations
  */
-static void nilfs_end_bio_write(struct bio *bio, int err)
+static void nilfs_end_bio_write(struct bio *bio)
 {
-	const int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags);
 	struct nilfs_segment_buffer *segbuf = bio->bi_private;
 
-	if (!uptodate)
+	if (bio->bi_error)
 		atomic_inc(&segbuf->sb_err);
 
 	bio_put(bio);
diff --git a/fs/ocfs2/cluster/heartbeat.c b/fs/ocfs2/cluster/heartbeat.c
index 16eff45727ee..140de3c93d2e 100644
--- a/fs/ocfs2/cluster/heartbeat.c
+++ b/fs/ocfs2/cluster/heartbeat.c
@@ -372,14 +372,13 @@ static void o2hb_wait_on_io(struct o2hb_region *reg,
 	wait_for_completion(&wc->wc_io_complete);
 }
 
-static void o2hb_bio_end_io(struct bio *bio,
-			   int error)
+static void o2hb_bio_end_io(struct bio *bio)
 {
 	struct o2hb_bio_wait_ctxt *wc = bio->bi_private;
 
-	if (error) {
-		mlog(ML_ERROR, "IO Error %d\n", error);
-		wc->wc_error = error;
+	if (bio->bi_error) {
+		mlog(ML_ERROR, "IO Error %d\n", bio->bi_error);
+		wc->wc_error = bio->bi_error;
 	}
 
 	o2hb_bio_wait_dec(wc, 1);
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 3859f5e27a4d..3714844a81d8 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -351,12 +351,11 @@ xfs_imap_valid(
  */
 STATIC void
 xfs_end_bio(
-	struct bio		*bio,
-	int			error)
+	struct bio		*bio)
 {
 	xfs_ioend_t		*ioend = bio->bi_private;
 
-	ioend->io_error = test_bit(BIO_UPTODATE, &bio->bi_flags) ? 0 : error;
+	ioend->io_error = bio->bi_error;
 
 	/* Toss bio and pass work off to an xfsdatad thread */
 	bio->bi_private = NULL;
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index a4b7d92e946c..01bd6781974e 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -1096,8 +1096,7 @@ xfs_bwrite(
 
 STATIC void
 xfs_buf_bio_end_io(
-	struct bio		*bio,
-	int			error)
+	struct bio		*bio)
 {
 	xfs_buf_t		*bp = (xfs_buf_t *)bio->bi_private;
 
@@ -1105,10 +1104,10 @@ xfs_buf_bio_end_io(
 	 * don't overwrite existing errors - otherwise we can lose errors on
 	 * buffers that require multiple bios to complete.
 	 */
-	if (error) {
+	if (bio->bi_error) {
 		spin_lock(&bp->b_lock);
 		if (!bp->b_io_error)
-			bp->b_io_error = error;
+			bp->b_io_error = bio->bi_error;
 		spin_unlock(&bp->b_lock);
 	}
 
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 5e963a6d7c14..6b918177002d 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -195,8 +195,6 @@ static inline bool bvec_gap_to_prev(struct bio_vec *bprv, unsigned int offset)
 	return offset || ((bprv->bv_offset + bprv->bv_len) & (PAGE_SIZE - 1));
 }
 
-#define bio_io_error(bio) bio_endio((bio), -EIO)
-
 /*
  * drivers should _never_ use the all version - the bio may have been split
  * before it got to the driver and the driver won't own all of it
@@ -426,7 +424,14 @@ static inline struct bio *bio_clone_kmalloc(struct bio *bio, gfp_t gfp_mask)
 
 }
 
-extern void bio_endio(struct bio *, int);
+extern void bio_endio(struct bio *);
+
+static inline void bio_io_error(struct bio *bio)
+{
+	bio->bi_error = -EIO;
+	bio_endio(bio);
+}
+
 struct request_queue;
 extern int bio_phys_segments(struct request_queue *, struct bio *);
 
@@ -717,7 +722,7 @@ extern void bio_integrity_free(struct bio *);
 extern int bio_integrity_add_page(struct bio *, struct page *, unsigned int, unsigned int);
 extern bool bio_integrity_enabled(struct bio *bio);
 extern int bio_integrity_prep(struct bio *);
-extern void bio_integrity_endio(struct bio *, int);
+extern void bio_integrity_endio(struct bio *);
 extern void bio_integrity_advance(struct bio *, unsigned int);
 extern void bio_integrity_trim(struct bio *, unsigned int, unsigned int);
 extern int bio_integrity_clone(struct bio *, struct bio *, gfp_t);
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 7303b3405520..6164fb8a817b 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -14,7 +14,7 @@ struct page;
 struct block_device;
 struct io_context;
 struct cgroup_subsys_state;
-typedef void (bio_end_io_t) (struct bio *, int);
+typedef void (bio_end_io_t) (struct bio *);
 typedef void (bio_destructor_t) (struct bio *);
 
 /*
@@ -53,6 +53,7 @@ struct bio {
 
 	struct bvec_iter	bi_iter;
 
+	int			bi_error;
 	/* Number of segments in this BIO after
 	 * physical address coalescing is performed.
 	 */
@@ -111,7 +112,6 @@ struct bio {
 /*
  * bio flags
  */
-#define BIO_UPTODATE	0	/* ok after I/O completion */
 #define BIO_SEG_VALID	1	/* bi_phys_segments valid */
 #define BIO_CLONED	2	/* doesn't own data */
 #define BIO_BOUNCED	3	/* bio is a bounce bio */
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 38874729dc5f..31496d201fdc 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -373,9 +373,9 @@ static inline void mem_cgroup_uncharge_swap(swp_entry_t entry)
 /* linux/mm/page_io.c */
 extern int swap_readpage(struct page *);
 extern int swap_writepage(struct page *page, struct writeback_control *wbc);
-extern void end_swap_bio_write(struct bio *bio, int err);
+extern void end_swap_bio_write(struct bio *bio);
 extern int __swap_writepage(struct page *page, struct writeback_control *wbc,
-	void (*end_write_func)(struct bio *, int));
+	bio_end_io_t end_write_func);
 extern int swap_set_page_dirty(struct page *page);
 
 int add_swap_extent(struct swap_info_struct *sis, unsigned long start_page,
diff --git a/kernel/power/swap.c b/kernel/power/swap.c
index 2f30ca91e4fa..b2066fb5b10f 100644
--- a/kernel/power/swap.c
+++ b/kernel/power/swap.c
@@ -227,27 +227,23 @@ static void hib_init_batch(struct hib_bio_batch *hb)
 	hb->error = 0;
 }
 
-static void hib_end_io(struct bio *bio, int error)
+static void hib_end_io(struct bio *bio)
 {
 	struct hib_bio_batch *hb = bio->bi_private;
-	const int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags);
 	struct page *page = bio->bi_io_vec[0].bv_page;
 
-	if (!uptodate || error) {
+	if (bio->bi_error) {
 		printk(KERN_ALERT "Read-error on swap-device (%u:%u:%Lu)\n",
 				imajor(bio->bi_bdev->bd_inode),
 				iminor(bio->bi_bdev->bd_inode),
 				(unsigned long long)bio->bi_iter.bi_sector);
-
-		if (!error)
-			error = -EIO;
 	}
 
 	if (bio_data_dir(bio) == WRITE)
 		put_page(page);
 
-	if (error && !hb->error)
-		hb->error = error;
+	if (bio->bi_error && !hb->error)
+		hb->error = bio->bi_error;
 	if (atomic_dec_and_test(&hb->count))
 		wake_up(&hb->wait);
 
diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c
index b3e6b39b6cf9..90e72a0c3047 100644
--- a/kernel/trace/blktrace.c
+++ b/kernel/trace/blktrace.c
@@ -778,9 +778,6 @@ static void blk_add_trace_bio(struct request_queue *q, struct bio *bio,
 	if (likely(!bt))
 		return;
 
-	if (!error && !bio_flagged(bio, BIO_UPTODATE))
-		error = EIO;
-
 	__blk_add_trace(bt, bio->bi_iter.bi_sector, bio->bi_iter.bi_size,
 			bio->bi_rw, what, error, 0, NULL);
 }
@@ -887,8 +884,7 @@ static void blk_add_trace_split(void *ignore,
 
 		__blk_add_trace(bt, bio->bi_iter.bi_sector,
 				bio->bi_iter.bi_size, bio->bi_rw, BLK_TA_SPLIT,
-				!bio_flagged(bio, BIO_UPTODATE),
-				sizeof(rpdu), &rpdu);
+				bio->bi_error, sizeof(rpdu), &rpdu);
 	}
 }
 
@@ -920,8 +916,8 @@ static void blk_add_trace_bio_remap(void *ignore,
 	r.sector_from = cpu_to_be64(from);
 
 	__blk_add_trace(bt, bio->bi_iter.bi_sector, bio->bi_iter.bi_size,
-			bio->bi_rw, BLK_TA_REMAP,
-			!bio_flagged(bio, BIO_UPTODATE), sizeof(r), &r);
+			bio->bi_rw, BLK_TA_REMAP, bio->bi_error,
+			sizeof(r), &r);
 }
 
 /**
diff --git a/mm/page_io.c b/mm/page_io.c
index 520baa4b04d7..338ce68942a0 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -43,12 +43,11 @@ static struct bio *get_swap_bio(gfp_t gfp_flags,
 	return bio;
 }
 
-void end_swap_bio_write(struct bio *bio, int err)
+void end_swap_bio_write(struct bio *bio)
 {
-	const int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags);
 	struct page *page = bio->bi_io_vec[0].bv_page;
 
-	if (!uptodate) {
+	if (bio->bi_error) {
 		SetPageError(page);
 		/*
 		 * We failed to write the page out to swap-space.
@@ -69,12 +68,11 @@ void end_swap_bio_write(struct bio *bio, int err)
 	bio_put(bio);
 }
 
-static void end_swap_bio_read(struct bio *bio, int err)
+static void end_swap_bio_read(struct bio *bio)
 {
-	const int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags);
 	struct page *page = bio->bi_io_vec[0].bv_page;
 
-	if (!uptodate) {
+	if (bio->bi_error) {
 		SetPageError(page);
 		ClearPageUptodate(page);
 		printk(KERN_ALERT "Read-error on swap-device (%u:%u:%Lu)\n",
@@ -254,7 +252,7 @@ static sector_t swap_page_sector(struct page *page)
 }
 
 int __swap_writepage(struct page *page, struct writeback_control *wbc,
-	void (*end_write_func)(struct bio *, int))
+		bio_end_io_t end_write_func)
 {
 	struct bio *bio;
 	int ret, rw = WRITE;
-- 
cgit v1.2.3


From 351169933ea22592fb42b97c76c4c130fe287cca Mon Sep 17 00:00:00 2001
From: "Ivan T. Ivanov" <ivan.ivanov@linaro.org>
Date: Wed, 8 Jul 2015 17:57:40 +0300
Subject: usb: phy: qcom: New APQ8016/MSM8916 USB transceiver driver

Driver handles PHY initialization, clock management, power
management and workarounds required after resetting the hardware.

Signed-off-by: Ivan T. Ivanov <ivan.ivanov@linaro.org>
Signed-off-by: Felipe Balbi <balbi@ti.com>
---
 .../devicetree/bindings/usb/qcom,usb-8x16-phy.txt  |  76 ++++
 drivers/usb/phy/Kconfig                            |  14 +
 drivers/usb/phy/Makefile                           |   1 +
 drivers/usb/phy/phy-qcom-8x16-usb.c                | 436 +++++++++++++++++++++
 4 files changed, 527 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/usb/qcom,usb-8x16-phy.txt
 create mode 100644 drivers/usb/phy/phy-qcom-8x16-usb.c

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/usb/qcom,usb-8x16-phy.txt b/Documentation/devicetree/bindings/usb/qcom,usb-8x16-phy.txt
new file mode 100644
index 000000000000..2cb2168cef41
--- /dev/null
+++ b/Documentation/devicetree/bindings/usb/qcom,usb-8x16-phy.txt
@@ -0,0 +1,76 @@
+Qualcomm's APQ8016/MSM8916 USB transceiver controller
+
+- compatible:
+    Usage: required
+    Value type: <string>
+    Definition: Should contain "qcom,usb-8x16-phy".
+
+- reg:
+    Usage: required
+    Value type: <prop-encoded-array>
+    Definition: USB PHY base address and length of the register map
+
+- clocks:
+    Usage: required
+    Value type: <prop-encoded-array>
+    Definition: See clock-bindings.txt section "consumers". List of
+                two clock specifiers for interface and core controller
+                clocks.
+
+- clock-names:
+    Usage: required
+    Value type: <string>
+    Definition: Must contain "iface" and "core" strings.
+
+- vddcx-supply:
+    Usage: required
+    Value type: <phandle>
+    Definition: phandle to the regulator VDCCX supply node.
+
+- v1p8-supply:
+    Usage: required
+    Value type: <phandle>
+    Definition: phandle to the regulator 1.8V supply node.
+
+- v3p3-supply:
+    Usage: required
+    Value type: <phandle>
+    Definition: phandle to the regulator 3.3V supply node.
+
+- resets:
+    Usage: required
+    Value type: <prop-encoded-array>
+    Definition: See reset.txt section "consumers". PHY reset specifier.
+
+- reset-names:
+    Usage: required
+    Value type: <string>
+    Definition: Must contain "phy" string.
+
+- switch-gpio:
+    Usage: optional
+    Value type: <prop-encoded-array>
+    Definition: Some boards are using Dual SPDT USB Switch, witch is
+                controlled by GPIO to de/multiplex D+/D- USB lines
+                between connectors.
+
+Example:
+	usb_phy: phy@78d9000 {
+		compatible = "qcom,usb-8x16-phy";
+		reg = <0x78d9000 0x400>;
+
+		vddcx-supply = <&pm8916_s1_corner>;
+		v1p8-supply = <&pm8916_l7>;
+		v3p3-supply = <&pm8916_l13>;
+
+		clocks = <&gcc GCC_USB_HS_AHB_CLK>,
+			     <&gcc GCC_USB_HS_SYSTEM_CLK>;
+		clock-names = "iface", "core";
+
+		resets = <&gcc GCC_USB2A_PHY_BCR>;
+		reset-names = "phy";
+
+		// D+/D- lines: 1 - Routed to HUB, 0 - Device connector
+		switch-gpio = <&pm8916_gpios 4 GPIO_ACTIVE_HIGH>;
+	};
+
diff --git a/drivers/usb/phy/Kconfig b/drivers/usb/phy/Kconfig
index 869c0cfcad98..7d3beee2a587 100644
--- a/drivers/usb/phy/Kconfig
+++ b/drivers/usb/phy/Kconfig
@@ -152,6 +152,20 @@ config USB_MSM_OTG
 	  This driver is not supported on boards like trout which
 	  has an external PHY.
 
+config USB_QCOM_8X16_PHY
+	tristate "Qualcomm APQ8016/MSM8916 on-chip USB PHY controller support"
+	depends on ARCH_QCOM || COMPILE_TEST
+	depends on RESET_CONTROLLER
+	select USB_PHY
+	select USB_ULPI_VIEWPORT
+	help
+	  Enable this to support the USB transceiver on Qualcomm 8x16 chipsets.
+	  It handles PHY initialization, clock management, power management,
+	  and workarounds required after resetting the hardware.
+
+	  To compile this driver as a module, choose M here: the
+	  module will be called phy-qcom-8x16-usb.
+
 config USB_MV_OTG
 	tristate "Marvell USB OTG support"
 	depends on USB_EHCI_MV && USB_MV_UDC && PM
diff --git a/drivers/usb/phy/Makefile b/drivers/usb/phy/Makefile
index e36ab1d46d8b..19c0dccbb116 100644
--- a/drivers/usb/phy/Makefile
+++ b/drivers/usb/phy/Makefile
@@ -20,6 +20,7 @@ obj-$(CONFIG_USB_EHCI_TEGRA)		+= phy-tegra-usb.o
 obj-$(CONFIG_USB_GPIO_VBUS)		+= phy-gpio-vbus-usb.o
 obj-$(CONFIG_USB_ISP1301)		+= phy-isp1301.o
 obj-$(CONFIG_USB_MSM_OTG)		+= phy-msm-usb.o
+obj-$(CONFIG_USB_QCOM_8X16_PHY)	+= phy-qcom-8x16-usb.o
 obj-$(CONFIG_USB_MV_OTG)		+= phy-mv-usb.o
 obj-$(CONFIG_USB_MXS_PHY)		+= phy-mxs-usb.o
 obj-$(CONFIG_USB_RCAR_PHY)		+= phy-rcar-usb.o
diff --git a/drivers/usb/phy/phy-qcom-8x16-usb.c b/drivers/usb/phy/phy-qcom-8x16-usb.c
new file mode 100644
index 000000000000..5d357a94599e
--- /dev/null
+++ b/drivers/usb/phy/phy-qcom-8x16-usb.c
@@ -0,0 +1,436 @@
+/*
+ * Copyright (c) 2015, Linaro Limited
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 and
+ * only version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+
+#include <linux/clk.h>
+#include <linux/delay.h>
+#include <linux/device.h>
+#include <linux/err.h>
+#include <linux/extcon.h>
+#include <linux/gpio/consumer.h>
+#include <linux/io.h>
+#include <linux/module.h>
+#include <linux/of.h>
+#include <linux/platform_device.h>
+#include <linux/reboot.h>
+#include <linux/regulator/consumer.h>
+#include <linux/reset.h>
+#include <linux/slab.h>
+#include <linux/usb.h>
+#include <linux/usb/ulpi.h>
+
+#define HSPHY_AHBBURST			0x0090
+#define HSPHY_AHBMODE			0x0098
+#define HSPHY_GENCONFIG			0x009c
+#define HSPHY_GENCONFIG_2		0x00a0
+
+#define HSPHY_USBCMD			0x0140
+#define HSPHY_ULPI_VIEWPORT		0x0170
+#define HSPHY_CTRL			0x0240
+
+#define HSPHY_TXFIFO_IDLE_FORCE_DIS	BIT(4)
+#define HSPHY_SESS_VLD_CTRL_EN		BIT(7)
+#define HSPHY_POR_ASSERT		BIT(0)
+#define HSPHY_RETEN			BIT(1)
+
+#define HSPHY_SESS_VLD_CTRL		BIT(25)
+
+#define ULPI_PWR_CLK_MNG_REG		0x88
+#define ULPI_PWR_OTG_COMP_DISABLE	BIT(0)
+
+#define ULPI_MISC_A			0x96
+#define ULPI_MISC_A_VBUSVLDEXTSEL	BIT(1)
+#define ULPI_MISC_A_VBUSVLDEXT		BIT(0)
+
+#define HSPHY_3P3_MIN			3050000 /* uV */
+#define HSPHY_3P3_MAX			3300000 /* uV */
+
+#define HSPHY_1P8_MIN			1800000 /* uV */
+#define HSPHY_1P8_MAX			1800000 /* uV */
+
+#define HSPHY_VDD_MIN			5
+#define HSPHY_VDD_MAX			7
+
+struct phy_8x16 {
+	struct usb_phy			phy;
+	void __iomem			*regs;
+	struct clk			*core_clk;
+	struct clk			*iface_clk;
+	struct regulator		*v3p3;
+	struct regulator		*v1p8;
+	struct regulator		*vdd;
+
+	struct reset_control		*phy_reset;
+
+	struct extcon_specific_cable_nb vbus_cable;
+	struct notifier_block		vbus_notify;
+
+	struct gpio_desc		*switch_gpio;
+	struct notifier_block		reboot_notify;
+};
+
+static int phy_8x16_regulators_enable(struct phy_8x16 *qphy)
+{
+	int ret;
+
+	ret = regulator_set_voltage(qphy->vdd, HSPHY_VDD_MIN, HSPHY_VDD_MAX);
+	if (ret)
+		return ret;
+
+	ret = regulator_enable(qphy->vdd);
+	if (ret)
+		return ret;
+
+	ret = regulator_set_voltage(qphy->v3p3, HSPHY_3P3_MIN, HSPHY_3P3_MAX);
+	if (ret)
+		goto off_vdd;
+
+	ret = regulator_enable(qphy->v3p3);
+	if (ret)
+		goto off_vdd;
+
+	ret = regulator_set_voltage(qphy->v1p8, HSPHY_1P8_MIN, HSPHY_1P8_MAX);
+	if (ret)
+		goto off_3p3;
+
+	ret = regulator_enable(qphy->v1p8);
+	if (ret)
+		goto off_3p3;
+
+	return 0;
+
+off_3p3:
+	regulator_disable(qphy->v3p3);
+off_vdd:
+	regulator_disable(qphy->vdd);
+
+	return ret;
+}
+
+static void phy_8x16_regulators_disable(struct phy_8x16 *qphy)
+{
+	regulator_disable(qphy->v1p8);
+	regulator_disable(qphy->v3p3);
+	regulator_disable(qphy->vdd);
+}
+
+static int phy_8x16_notify_connect(struct usb_phy *phy,
+				   enum usb_device_speed speed)
+{
+	struct phy_8x16 *qphy = container_of(phy, struct phy_8x16, phy);
+	u32 val;
+
+	val = ULPI_MISC_A_VBUSVLDEXTSEL | ULPI_MISC_A_VBUSVLDEXT;
+	usb_phy_io_write(&qphy->phy, val, ULPI_SET(ULPI_MISC_A));
+
+	val = readl(qphy->regs + HSPHY_USBCMD);
+	val |= HSPHY_SESS_VLD_CTRL;
+	writel(val, qphy->regs + HSPHY_USBCMD);
+
+	return 0;
+}
+
+static int phy_8x16_notify_disconnect(struct usb_phy *phy,
+				      enum usb_device_speed speed)
+{
+	struct phy_8x16 *qphy = container_of(phy, struct phy_8x16, phy);
+	u32 val;
+
+	val = ULPI_MISC_A_VBUSVLDEXT | ULPI_MISC_A_VBUSVLDEXTSEL;
+	usb_phy_io_write(&qphy->phy, val, ULPI_CLR(ULPI_MISC_A));
+
+	val = readl(qphy->regs + HSPHY_USBCMD);
+	val &= ~HSPHY_SESS_VLD_CTRL;
+	writel(val, qphy->regs + HSPHY_USBCMD);
+
+	return 0;
+}
+
+static int phy_8x16_vbus_on(struct phy_8x16 *qphy)
+{
+	phy_8x16_notify_connect(&qphy->phy, USB_SPEED_UNKNOWN);
+
+	/* Switch D+/D- lines to Device connector */
+	gpiod_set_value_cansleep(qphy->switch_gpio, 0);
+
+	return 0;
+}
+
+static int phy_8x16_vbus_off(struct phy_8x16 *qphy)
+{
+	phy_8x16_notify_disconnect(&qphy->phy, USB_SPEED_UNKNOWN);
+
+	/* Switch D+/D- lines to USB HUB */
+	gpiod_set_value_cansleep(qphy->switch_gpio, 1);
+
+	return 0;
+}
+
+static int phy_8x16_vbus_notify(struct notifier_block *nb, unsigned long event,
+				void *ptr)
+{
+	struct phy_8x16 *qphy = container_of(nb, struct phy_8x16, vbus_notify);
+
+	if (event)
+		phy_8x16_vbus_on(qphy);
+	else
+		phy_8x16_vbus_off(qphy);
+
+	return NOTIFY_DONE;
+}
+
+static int phy_8x16_init(struct usb_phy *phy)
+{
+	struct phy_8x16 *qphy = container_of(phy, struct phy_8x16, phy);
+	u32 val, init[] = {0x44, 0x6B, 0x24, 0x13};
+	u32 addr = ULPI_EXT_VENDOR_SPECIFIC;
+	int idx, state;
+
+	for (idx = 0; idx < ARRAY_SIZE(init); idx++)
+		usb_phy_io_write(phy, init[idx], addr + idx);
+
+	reset_control_reset(qphy->phy_reset);
+
+	/* Assert USB HSPHY_POR */
+	val = readl(qphy->regs + HSPHY_CTRL);
+	val |= HSPHY_POR_ASSERT;
+	writel(val, qphy->regs + HSPHY_CTRL);
+
+	/*
+	 * wait for minimum 10 microseconds as suggested in HPG.
+	 * Use a slightly larger value since the exact value didn't
+	 * work 100% of the time.
+	 */
+	usleep_range(12, 15);
+
+	/* Deassert USB HSPHY_POR */
+	val = readl(qphy->regs + HSPHY_CTRL);
+	val &= ~HSPHY_POR_ASSERT;
+	writel(val, qphy->regs + HSPHY_CTRL);
+
+	usleep_range(10, 15);
+
+	writel(0x00, qphy->regs + HSPHY_AHBBURST);
+	writel(0x08, qphy->regs + HSPHY_AHBMODE);
+
+	/* workaround for rx buffer collision issue */
+	val = readl(qphy->regs + HSPHY_GENCONFIG);
+	val &= ~HSPHY_TXFIFO_IDLE_FORCE_DIS;
+	writel(val, qphy->regs + HSPHY_GENCONFIG);
+
+	val = readl(qphy->regs + HSPHY_GENCONFIG_2);
+	val |= HSPHY_SESS_VLD_CTRL_EN;
+	writel(val, qphy->regs + HSPHY_GENCONFIG_2);
+
+	val = ULPI_PWR_OTG_COMP_DISABLE;
+	usb_phy_io_write(phy, val, ULPI_SET(ULPI_PWR_CLK_MNG_REG));
+
+	state = extcon_get_cable_state(qphy->vbus_cable.edev, "USB");
+	if (state)
+		phy_8x16_vbus_on(qphy);
+	else
+		phy_8x16_vbus_off(qphy);
+
+	val = usb_phy_io_read(&qphy->phy, ULPI_FUNC_CTRL);
+	val &= ~ULPI_FUNC_CTRL_OPMODE_MASK;
+	val |= ULPI_FUNC_CTRL_OPMODE_NORMAL;
+	usb_phy_io_write(&qphy->phy, val, ULPI_FUNC_CTRL);
+
+	return 0;
+}
+
+static void phy_8x16_shutdown(struct usb_phy *phy)
+{
+	u32 val;
+
+	/* Put the controller in non-driving mode */
+	val = usb_phy_io_read(phy, ULPI_FUNC_CTRL);
+	val &= ~ULPI_FUNC_CTRL_OPMODE_MASK;
+	val |= ULPI_FUNC_CTRL_OPMODE_NONDRIVING;
+	usb_phy_io_write(phy, val, ULPI_FUNC_CTRL);
+}
+
+static int phy_8x16_read_devicetree(struct phy_8x16 *qphy)
+{
+	struct regulator_bulk_data regs[3];
+	struct device *dev = qphy->phy.dev;
+	int ret;
+
+	qphy->core_clk = devm_clk_get(dev, "core");
+	if (IS_ERR(qphy->core_clk))
+		return PTR_ERR(qphy->core_clk);
+
+	qphy->iface_clk = devm_clk_get(dev, "iface");
+	if (IS_ERR(qphy->iface_clk))
+		return PTR_ERR(qphy->iface_clk);
+
+	regs[0].supply = "v3p3";
+	regs[1].supply = "v1p8";
+	regs[2].supply = "vddcx";
+
+	ret = devm_regulator_bulk_get(dev, ARRAY_SIZE(regs), regs);
+	if (ret)
+		return ret;
+
+	qphy->v3p3 = regs[0].consumer;
+	qphy->v1p8 = regs[1].consumer;
+	qphy->vdd  = regs[2].consumer;
+
+	qphy->phy_reset = devm_reset_control_get(dev, "phy");
+	if (IS_ERR(qphy->phy_reset))
+		return PTR_ERR(qphy->phy_reset);
+
+	qphy->switch_gpio = devm_gpiod_get_optional(dev, "switch",
+						   GPIOD_OUT_LOW);
+	if (IS_ERR(qphy->switch_gpio))
+		return PTR_ERR(qphy->switch_gpio);
+
+	return 0;
+}
+
+static int phy_8x16_reboot_notify(struct notifier_block *this,
+				  unsigned long code, void *unused)
+{
+	struct phy_8x16 *qphy;
+
+	qphy = container_of(this, struct phy_8x16, reboot_notify);
+
+	/*
+	 * Ensure that D+/D- lines are routed to uB connector, so
+	 * we could load bootloader/kernel at next reboot_notify
+	 */
+	gpiod_set_value_cansleep(qphy->switch_gpio, 0);
+	return NOTIFY_DONE;
+}
+
+static int phy_8x16_probe(struct platform_device *pdev)
+{
+	struct extcon_dev *edev;
+	struct phy_8x16 *qphy;
+	struct resource *res;
+	struct usb_phy *phy;
+	int ret;
+
+	qphy = devm_kzalloc(&pdev->dev, sizeof(*qphy), GFP_KERNEL);
+	if (!qphy)
+		return -ENOMEM;
+
+	platform_set_drvdata(pdev, qphy);
+
+	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	if (!res)
+		return -EINVAL;
+
+	qphy->regs = devm_ioremap(&pdev->dev, res->start, resource_size(res));
+	if (!qphy->regs)
+		return -ENOMEM;
+
+	phy			= &qphy->phy;
+	phy->dev		= &pdev->dev;
+	phy->label		= dev_name(&pdev->dev);
+	phy->init		= phy_8x16_init;
+	phy->shutdown		= phy_8x16_shutdown;
+	phy->notify_connect	= phy_8x16_notify_connect;
+	phy->notify_disconnect	= phy_8x16_notify_disconnect;
+	phy->io_priv		= qphy->regs + HSPHY_ULPI_VIEWPORT;
+	phy->io_ops		= &ulpi_viewport_access_ops;
+	phy->type		= USB_PHY_TYPE_USB2;
+
+	ret = phy_8x16_read_devicetree(qphy);
+	if (ret < 0)
+		return ret;
+
+	edev = extcon_get_edev_by_phandle(phy->dev, 0);
+	if (IS_ERR(edev))
+		return PTR_ERR(edev);
+
+	ret = clk_set_rate(qphy->core_clk, INT_MAX);
+	if (ret < 0)
+		dev_dbg(phy->dev, "Can't boost core clock\n");
+
+	ret = clk_prepare_enable(qphy->core_clk);
+	if (ret < 0)
+		return ret;
+
+	ret = clk_prepare_enable(qphy->iface_clk);
+	if (ret < 0)
+		goto off_core;
+
+	ret = phy_8x16_regulators_enable(qphy);
+	if (0 && ret)
+		goto off_clks;
+
+	qphy->vbus_notify.notifier_call = phy_8x16_vbus_notify;
+	ret = extcon_register_interest(&qphy->vbus_cable, edev->name,
+				       "USB", &qphy->vbus_notify);
+	if (ret < 0)
+		goto off_power;
+
+	ret = usb_add_phy_dev(&qphy->phy);
+	if (ret)
+		goto off_extcon;
+
+	qphy->reboot_notify.notifier_call = phy_8x16_reboot_notify;
+	register_reboot_notifier(&qphy->reboot_notify);
+
+	return 0;
+
+off_extcon:
+	extcon_unregister_interest(&qphy->vbus_cable);
+off_power:
+	phy_8x16_regulators_disable(qphy);
+off_clks:
+	clk_disable_unprepare(qphy->iface_clk);
+off_core:
+	clk_disable_unprepare(qphy->core_clk);
+	return ret;
+}
+
+static int phy_8x16_remove(struct platform_device *pdev)
+{
+	struct phy_8x16 *qphy = platform_get_drvdata(pdev);
+
+	unregister_reboot_notifier(&qphy->reboot_notify);
+	extcon_unregister_interest(&qphy->vbus_cable);
+
+	/*
+	 * Ensure that D+/D- lines are routed to uB connector, so
+	 * we could load bootloader/kernel at next reboot_notify
+	 */
+	gpiod_set_value_cansleep(qphy->switch_gpio, 0);
+
+	usb_remove_phy(&qphy->phy);
+
+	clk_disable_unprepare(qphy->iface_clk);
+	clk_disable_unprepare(qphy->core_clk);
+	phy_8x16_regulators_disable(qphy);
+	return 0;
+}
+
+static const struct of_device_id phy_8x16_dt_match[] = {
+	{ .compatible = "qcom,usb-8x16-phy" },
+	{ }
+};
+MODULE_DEVICE_TABLE(of, phy_8x16_dt_match);
+
+static struct platform_driver phy_8x16_driver = {
+	.probe	= phy_8x16_probe,
+	.remove = phy_8x16_remove,
+	.driver = {
+		.name = "phy-qcom-8x16-usb",
+		.of_match_table = phy_8x16_dt_match,
+	},
+};
+module_platform_driver(phy_8x16_driver);
+
+MODULE_LICENSE("GPL v2");
+MODULE_DESCRIPTION("Qualcomm APQ8016/MSM8916 chipsets USB transceiver driver");
-- 
cgit v1.2.3


From 744543c599c420bcddca08cd2e2713b82a008328 Mon Sep 17 00:00:00 2001
From: Hans de Goede <hdegoede@redhat.com>
Date: Wed, 8 Jul 2015 16:41:38 +0200
Subject: usb: musb: sunxi: Add support for the Allwinner sunxi musb controller

This is based on initial code to get the Allwinner sunxi musb controller
supported by Chen-Yu Tsai and Roman Byshko.

This adds support for the Allwinner sunxi musb controller in both host only
and otg mode. Peripheral only mode is not supported, as no boards use that.

This has been tested on a cubietruck (A20 SoC) and an UTOO P66 tablet
(A13 SoC) with a variety of devices in host mode and with the g_serial gadget
driver in peripheral mode, plugging otg / host cables in/out a lot of times
in all possible imaginable plug orders.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
---
 .../bindings/usb/allwinner,sun4i-a10-musb.txt      |  27 +
 drivers/usb/musb/Kconfig                           |  13 +-
 drivers/usb/musb/Makefile                          |   1 +
 drivers/usb/musb/sunxi.c                           | 703 +++++++++++++++++++++
 4 files changed, 743 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/devicetree/bindings/usb/allwinner,sun4i-a10-musb.txt
 create mode 100644 drivers/usb/musb/sunxi.c

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/usb/allwinner,sun4i-a10-musb.txt b/Documentation/devicetree/bindings/usb/allwinner,sun4i-a10-musb.txt
new file mode 100644
index 000000000000..9254a6c81808
--- /dev/null
+++ b/Documentation/devicetree/bindings/usb/allwinner,sun4i-a10-musb.txt
@@ -0,0 +1,27 @@
+Allwinner sun4i A10 musb DRC/OTG controller
+-------------------------------------------
+
+Required properties:
+ - compatible      : "allwinner,sun4i-a10-musb"
+ - reg             : mmio address range of the musb controller
+ - clocks          : clock specifier for the musb controller ahb gate clock
+ - interrupts      : interrupt to which the musb controller is connected
+ - interrupt-names : must be "mc"
+ - phys            : phy specifier for the otg phy
+ - phy-names       : must be "usb"
+ - dr_mode         : Dual-Role mode must be "host" or "otg"
+ - extcon          : extcon specifier for the otg phy
+
+Example:
+
+	usb_otg: usb@01c13000 {
+		compatible = "allwinner,sun4i-a10-musb";
+		reg = <0x01c13000 0x0400>;
+		clocks = <&ahb_gates 0>;
+		interrupts = <38>;
+		interrupt-names = "mc";
+		phys = <&usbphy 0>;
+		phy-names = "usb";
+		extcon = <&usbphy 0>;
+		status = "disabled";
+	};
diff --git a/drivers/usb/musb/Kconfig b/drivers/usb/musb/Kconfig
index 39db8b603627..37081ed8f295 100644
--- a/drivers/usb/musb/Kconfig
+++ b/drivers/usb/musb/Kconfig
@@ -5,7 +5,7 @@
 
 # (M)HDRC = (Multipoint) Highspeed Dual-Role Controller
 config USB_MUSB_HDRC
-	tristate 'Inventra Highspeed Dual Role Controller (TI, ADI, ...)'
+	tristate 'Inventra Highspeed Dual Role Controller (TI, ADI, AW, ...)'
 	depends on (USB || USB_GADGET)
 	help
 	  Say Y here if your system has a dual role high speed USB
@@ -20,6 +20,8 @@ config USB_MUSB_HDRC
 	  Analog Devices parts using this IP include Blackfin BF54x,
 	  BF525 and BF527.
 
+	  Allwinner SoCs using this IP include A10, A13, A20, ...
+
 	  If you do not know what this is, please say N.
 
 	  To compile this driver as a module, choose M here; the
@@ -60,6 +62,15 @@ endchoice
 
 comment "Platform Glue Layer"
 
+config USB_MUSB_SUNXI
+	tristate "Allwinner (sunxi)"
+	depends on ARCH_SUNXI
+	depends on NOP_USB_XCEIV
+	depends on PHY_SUN4I_USB
+	depends on EXTCON
+	depends on GENERIC_PHY
+	select SUNXI_SRAM
+
 config USB_MUSB_DAVINCI
 	tristate "DaVinci"
 	depends on ARCH_DAVINCI_DMx
diff --git a/drivers/usb/musb/Makefile b/drivers/usb/musb/Makefile
index ba495018b416..f95befe18cc1 100644
--- a/drivers/usb/musb/Makefile
+++ b/drivers/usb/musb/Makefile
@@ -20,6 +20,7 @@ obj-$(CONFIG_USB_MUSB_DA8XX)			+= da8xx.o
 obj-$(CONFIG_USB_MUSB_BLACKFIN)			+= blackfin.o
 obj-$(CONFIG_USB_MUSB_UX500)			+= ux500.o
 obj-$(CONFIG_USB_MUSB_JZ4740)			+= jz4740.o
+obj-$(CONFIG_USB_MUSB_SUNXI)			+= sunxi.o
 
 
 obj-$(CONFIG_USB_MUSB_AM335X_CHILD)		+= musb_am335x.o
diff --git a/drivers/usb/musb/sunxi.c b/drivers/usb/musb/sunxi.c
new file mode 100644
index 000000000000..00d7248a4840
--- /dev/null
+++ b/drivers/usb/musb/sunxi.c
@@ -0,0 +1,703 @@
+/*
+ * Allwinner sun4i MUSB Glue Layer
+ *
+ * Copyright (C) 2015 Hans de Goede <hdegoede@redhat.com>
+ *
+ * Based on code from
+ * Allwinner Technology Co., Ltd. <www.allwinnertech.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include <linux/clk.h>
+#include <linux/err.h>
+#include <linux/extcon.h>
+#include <linux/io.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/of.h>
+#include <linux/phy/phy-sun4i-usb.h>
+#include <linux/platform_device.h>
+#include <linux/soc/sunxi/sunxi_sram.h>
+#include <linux/usb/musb.h>
+#include <linux/usb/of.h>
+#include <linux/usb/usb_phy_generic.h>
+#include <linux/workqueue.h>
+#include "musb_core.h"
+
+/*
+ * Register offsets, note sunxi musb has a different layout then most
+ * musb implementations, we translate the layout in musb_readb & friends.
+ */
+#define SUNXI_MUSB_POWER			0x0040
+#define SUNXI_MUSB_DEVCTL			0x0041
+#define SUNXI_MUSB_INDEX			0x0042
+#define SUNXI_MUSB_VEND0			0x0043
+#define SUNXI_MUSB_INTRTX			0x0044
+#define SUNXI_MUSB_INTRRX			0x0046
+#define SUNXI_MUSB_INTRTXE			0x0048
+#define SUNXI_MUSB_INTRRXE			0x004a
+#define SUNXI_MUSB_INTRUSB			0x004c
+#define SUNXI_MUSB_INTRUSBE			0x0050
+#define SUNXI_MUSB_FRAME			0x0054
+#define SUNXI_MUSB_TXFIFOSZ			0x0090
+#define SUNXI_MUSB_TXFIFOADD			0x0092
+#define SUNXI_MUSB_RXFIFOSZ			0x0094
+#define SUNXI_MUSB_RXFIFOADD			0x0096
+#define SUNXI_MUSB_FADDR			0x0098
+#define SUNXI_MUSB_TXFUNCADDR			0x0098
+#define SUNXI_MUSB_TXHUBADDR			0x009a
+#define SUNXI_MUSB_TXHUBPORT			0x009b
+#define SUNXI_MUSB_RXFUNCADDR			0x009c
+#define SUNXI_MUSB_RXHUBADDR			0x009e
+#define SUNXI_MUSB_RXHUBPORT			0x009f
+#define SUNXI_MUSB_CONFIGDATA			0x00c0
+
+/* VEND0 bits */
+#define SUNXI_MUSB_VEND0_PIO_MODE		0
+
+/* flags */
+#define SUNXI_MUSB_FL_ENABLED			0
+#define SUNXI_MUSB_FL_HOSTMODE			1
+#define SUNXI_MUSB_FL_HOSTMODE_PEND		2
+#define SUNXI_MUSB_FL_VBUS_ON			3
+#define SUNXI_MUSB_FL_PHY_ON			4
+
+/* Our read/write methods need access and do not get passed in a musb ref :| */
+static struct musb *sunxi_musb;
+
+struct sunxi_glue {
+	struct device		*dev;
+	struct platform_device	*musb;
+	struct clk		*clk;
+	struct phy		*phy;
+	struct platform_device	*usb_phy;
+	struct usb_phy		*xceiv;
+	unsigned long		flags;
+	struct work_struct	work;
+	struct extcon_dev	*extcon;
+	struct notifier_block	host_nb;
+};
+
+/* phy_power_on / off may sleep, so we use a workqueue  */
+static void sunxi_musb_work(struct work_struct *work)
+{
+	struct sunxi_glue *glue = container_of(work, struct sunxi_glue, work);
+	bool vbus_on, phy_on;
+
+	if (!test_bit(SUNXI_MUSB_FL_ENABLED, &glue->flags))
+		return;
+
+	if (test_and_clear_bit(SUNXI_MUSB_FL_HOSTMODE_PEND, &glue->flags)) {
+		struct musb *musb = platform_get_drvdata(glue->musb);
+		unsigned long flags;
+		u8 devctl;
+
+		spin_lock_irqsave(&musb->lock, flags);
+
+		devctl = readb(musb->mregs + SUNXI_MUSB_DEVCTL);
+		if (test_bit(SUNXI_MUSB_FL_HOSTMODE, &glue->flags)) {
+			set_bit(SUNXI_MUSB_FL_VBUS_ON, &glue->flags);
+			musb->xceiv->otg->default_a = 1;
+			musb->xceiv->otg->state = OTG_STATE_A_IDLE;
+			MUSB_HST_MODE(musb);
+			devctl |= MUSB_DEVCTL_SESSION;
+		} else {
+			clear_bit(SUNXI_MUSB_FL_VBUS_ON, &glue->flags);
+			musb->xceiv->otg->default_a = 0;
+			musb->xceiv->otg->state = OTG_STATE_B_IDLE;
+			MUSB_DEV_MODE(musb);
+			devctl &= ~MUSB_DEVCTL_SESSION;
+		}
+		writeb(devctl, musb->mregs + SUNXI_MUSB_DEVCTL);
+
+		spin_unlock_irqrestore(&musb->lock, flags);
+	}
+
+	vbus_on = test_bit(SUNXI_MUSB_FL_VBUS_ON, &glue->flags);
+	phy_on = test_bit(SUNXI_MUSB_FL_PHY_ON, &glue->flags);
+
+	if (phy_on != vbus_on) {
+		if (vbus_on) {
+			phy_power_on(glue->phy);
+			set_bit(SUNXI_MUSB_FL_PHY_ON, &glue->flags);
+		} else {
+			phy_power_off(glue->phy);
+			clear_bit(SUNXI_MUSB_FL_PHY_ON, &glue->flags);
+		}
+	}
+}
+
+static void sunxi_musb_set_vbus(struct musb *musb, int is_on)
+{
+	struct sunxi_glue *glue = dev_get_drvdata(musb->controller->parent);
+
+	if (is_on)
+		set_bit(SUNXI_MUSB_FL_VBUS_ON, &glue->flags);
+	else
+		clear_bit(SUNXI_MUSB_FL_VBUS_ON, &glue->flags);
+
+	schedule_work(&glue->work);
+}
+
+static void sunxi_musb_pre_root_reset_end(struct musb *musb)
+{
+	struct sunxi_glue *glue = dev_get_drvdata(musb->controller->parent);
+
+	sun4i_usb_phy_set_squelch_detect(glue->phy, false);
+}
+
+static void sunxi_musb_post_root_reset_end(struct musb *musb)
+{
+	struct sunxi_glue *glue = dev_get_drvdata(musb->controller->parent);
+
+	sun4i_usb_phy_set_squelch_detect(glue->phy, true);
+}
+
+static irqreturn_t sunxi_musb_interrupt(int irq, void *__hci)
+{
+	struct musb *musb = __hci;
+	unsigned long flags;
+
+	spin_lock_irqsave(&musb->lock, flags);
+
+	musb->int_usb = readb(musb->mregs + SUNXI_MUSB_INTRUSB);
+	if (musb->int_usb)
+		writeb(musb->int_usb, musb->mregs + SUNXI_MUSB_INTRUSB);
+
+	/*
+	 * sunxi musb often signals babble on low / full speed device
+	 * disconnect, without ever raising MUSB_INTR_DISCONNECT, since
+	 * normally babble never happens treat it as disconnect.
+	 */
+	if ((musb->int_usb & MUSB_INTR_BABBLE) && is_host_active(musb)) {
+		musb->int_usb &= ~MUSB_INTR_BABBLE;
+		musb->int_usb |= MUSB_INTR_DISCONNECT;
+	}
+
+	if ((musb->int_usb & MUSB_INTR_RESET) && !is_host_active(musb)) {
+		/* ep0 FADDR must be 0 when (re)entering peripheral mode */
+		musb_ep_select(musb->mregs, 0);
+		musb_writeb(musb->mregs, MUSB_FADDR, 0);
+	}
+
+	musb->int_tx = readw(musb->mregs + SUNXI_MUSB_INTRTX);
+	if (musb->int_tx)
+		writew(musb->int_tx, musb->mregs + SUNXI_MUSB_INTRTX);
+
+	musb->int_rx = readw(musb->mregs + SUNXI_MUSB_INTRRX);
+	if (musb->int_rx)
+		writew(musb->int_rx, musb->mregs + SUNXI_MUSB_INTRRX);
+
+	musb_interrupt(musb);
+
+	spin_unlock_irqrestore(&musb->lock, flags);
+
+	return IRQ_HANDLED;
+}
+
+static int sunxi_musb_host_notifier(struct notifier_block *nb,
+				    unsigned long event, void *ptr)
+{
+	struct sunxi_glue *glue = container_of(nb, struct sunxi_glue, host_nb);
+
+	if (event)
+		set_bit(SUNXI_MUSB_FL_HOSTMODE, &glue->flags);
+	else
+		clear_bit(SUNXI_MUSB_FL_HOSTMODE, &glue->flags);
+
+	set_bit(SUNXI_MUSB_FL_HOSTMODE_PEND, &glue->flags);
+	schedule_work(&glue->work);
+
+	return NOTIFY_DONE;
+}
+
+static int sunxi_musb_init(struct musb *musb)
+{
+	struct sunxi_glue *glue = dev_get_drvdata(musb->controller->parent);
+	int ret;
+
+	sunxi_musb = musb;
+	musb->phy = glue->phy;
+	musb->xceiv = glue->xceiv;
+
+	ret = sunxi_sram_claim(musb->controller->parent);
+	if (ret)
+		return ret;
+
+	ret = clk_prepare_enable(glue->clk);
+	if (ret)
+		goto error_sram_release;
+
+	writeb(SUNXI_MUSB_VEND0_PIO_MODE, musb->mregs + SUNXI_MUSB_VEND0);
+
+	/* Register notifier before calling phy_init() */
+	if (musb->port_mode == MUSB_PORT_MODE_DUAL_ROLE) {
+		ret = extcon_register_notifier(glue->extcon, EXTCON_USB_HOST,
+					       &glue->host_nb);
+		if (ret)
+			goto error_clk_disable;
+	}
+
+	ret = phy_init(glue->phy);
+	if (ret)
+		goto error_unregister_notifier;
+
+	if (musb->port_mode == MUSB_PORT_MODE_HOST) {
+		ret = phy_power_on(glue->phy);
+		if (ret)
+			goto error_phy_exit;
+		set_bit(SUNXI_MUSB_FL_PHY_ON, &glue->flags);
+		/* Stop musb work from turning vbus off again */
+		set_bit(SUNXI_MUSB_FL_VBUS_ON, &glue->flags);
+	}
+
+	musb->isr = sunxi_musb_interrupt;
+
+	/* Stop the musb-core from doing runtime pm (not supported on sunxi) */
+	pm_runtime_get(musb->controller);
+
+	return 0;
+
+error_phy_exit:
+	phy_exit(glue->phy);
+error_unregister_notifier:
+	if (musb->port_mode == MUSB_PORT_MODE_DUAL_ROLE)
+		extcon_unregister_notifier(glue->extcon, EXTCON_USB_HOST,
+					   &glue->host_nb);
+error_clk_disable:
+	clk_disable_unprepare(glue->clk);
+error_sram_release:
+	sunxi_sram_release(musb->controller->parent);
+	return ret;
+}
+
+static int sunxi_musb_exit(struct musb *musb)
+{
+	struct sunxi_glue *glue = dev_get_drvdata(musb->controller->parent);
+
+	pm_runtime_put(musb->controller);
+
+	cancel_work_sync(&glue->work);
+	if (test_bit(SUNXI_MUSB_FL_PHY_ON, &glue->flags))
+		phy_power_off(glue->phy);
+
+	phy_exit(glue->phy);
+
+	if (musb->port_mode == MUSB_PORT_MODE_DUAL_ROLE)
+		extcon_unregister_notifier(glue->extcon, EXTCON_USB_HOST,
+					   &glue->host_nb);
+
+	clk_disable_unprepare(glue->clk);
+	sunxi_sram_release(musb->controller->parent);
+
+	return 0;
+}
+
+static void sunxi_musb_enable(struct musb *musb)
+{
+	struct sunxi_glue *glue = dev_get_drvdata(musb->controller->parent);
+
+	/* musb_core does not call us in a balanced manner */
+	if (test_and_set_bit(SUNXI_MUSB_FL_ENABLED, &glue->flags))
+		return;
+
+	schedule_work(&glue->work);
+}
+
+static void sunxi_musb_disable(struct musb *musb)
+{
+	struct sunxi_glue *glue = dev_get_drvdata(musb->controller->parent);
+
+	clear_bit(SUNXI_MUSB_FL_ENABLED, &glue->flags);
+}
+
+/*
+ * sunxi musb register layout
+ * 0x00 - 0x17	fifo regs, 1 long per fifo
+ * 0x40 - 0x57	generic control regs (power - frame)
+ * 0x80 - 0x8f	ep control regs (addressed through hw_ep->regs, indexed)
+ * 0x90 - 0x97	fifo control regs (indexed)
+ * 0x98 - 0x9f	multipoint / busctl regs (indexed)
+ * 0xc0		configdata reg
+ */
+
+static u32 sunxi_musb_fifo_offset(u8 epnum)
+{
+	return (epnum * 4);
+}
+
+static u32 sunxi_musb_ep_offset(u8 epnum, u16 offset)
+{
+	WARN_ONCE(offset != 0,
+		  "sunxi_musb_ep_offset called with non 0 offset\n");
+
+	return 0x80; /* indexed, so ignore epnum */
+}
+
+static u32 sunxi_musb_busctl_offset(u8 epnum, u16 offset)
+{
+	return SUNXI_MUSB_TXFUNCADDR + offset;
+}
+
+static u8 sunxi_musb_readb(const void __iomem *addr, unsigned offset)
+{
+	if (addr == sunxi_musb->mregs) {
+		/* generic control or fifo control reg access */
+		switch (offset) {
+		case MUSB_FADDR:
+			return readb(addr + SUNXI_MUSB_FADDR);
+		case MUSB_POWER:
+			return readb(addr + SUNXI_MUSB_POWER);
+		case MUSB_INTRUSB:
+			return readb(addr + SUNXI_MUSB_INTRUSB);
+		case MUSB_INTRUSBE:
+			return readb(addr + SUNXI_MUSB_INTRUSBE);
+		case MUSB_INDEX:
+			return readb(addr + SUNXI_MUSB_INDEX);
+		case MUSB_TESTMODE:
+			return 0; /* No testmode on sunxi */
+		case MUSB_DEVCTL:
+			return readb(addr + SUNXI_MUSB_DEVCTL);
+		case MUSB_TXFIFOSZ:
+			return readb(addr + SUNXI_MUSB_TXFIFOSZ);
+		case MUSB_RXFIFOSZ:
+			return readb(addr + SUNXI_MUSB_RXFIFOSZ);
+		case MUSB_CONFIGDATA + 0x10: /* See musb_read_configdata() */
+			return readb(addr + SUNXI_MUSB_CONFIGDATA);
+		/* Offset for these is fixed by sunxi_musb_busctl_offset() */
+		case SUNXI_MUSB_TXFUNCADDR:
+		case SUNXI_MUSB_TXHUBADDR:
+		case SUNXI_MUSB_TXHUBPORT:
+		case SUNXI_MUSB_RXFUNCADDR:
+		case SUNXI_MUSB_RXHUBADDR:
+		case SUNXI_MUSB_RXHUBPORT:
+			/* multipoint / busctl reg access */
+			return readb(addr + offset);
+		default:
+			dev_err(sunxi_musb->controller->parent,
+				"Error unknown readb offset %u\n", offset);
+			return 0;
+		}
+	} else if (addr == (sunxi_musb->mregs + 0x80)) {
+		/* ep control reg access */
+		/* sunxi has a 2 byte hole before the txtype register */
+		if (offset >= MUSB_TXTYPE)
+			offset += 2;
+		return readb(addr + offset);
+	}
+
+	dev_err(sunxi_musb->controller->parent,
+		"Error unknown readb at 0x%x bytes offset\n",
+		(int)(addr - sunxi_musb->mregs));
+	return 0;
+}
+
+static void sunxi_musb_writeb(void __iomem *addr, unsigned offset, u8 data)
+{
+	if (addr == sunxi_musb->mregs) {
+		/* generic control or fifo control reg access */
+		switch (offset) {
+		case MUSB_FADDR:
+			return writeb(data, addr + SUNXI_MUSB_FADDR);
+		case MUSB_POWER:
+			return writeb(data, addr + SUNXI_MUSB_POWER);
+		case MUSB_INTRUSB:
+			return writeb(data, addr + SUNXI_MUSB_INTRUSB);
+		case MUSB_INTRUSBE:
+			return writeb(data, addr + SUNXI_MUSB_INTRUSBE);
+		case MUSB_INDEX:
+			return writeb(data, addr + SUNXI_MUSB_INDEX);
+		case MUSB_TESTMODE:
+			if (data)
+				dev_warn(sunxi_musb->controller->parent,
+					"sunxi-musb does not have testmode\n");
+			return;
+		case MUSB_DEVCTL:
+			return writeb(data, addr + SUNXI_MUSB_DEVCTL);
+		case MUSB_TXFIFOSZ:
+			return writeb(data, addr + SUNXI_MUSB_TXFIFOSZ);
+		case MUSB_RXFIFOSZ:
+			return writeb(data, addr + SUNXI_MUSB_RXFIFOSZ);
+		/* Offset for these is fixed by sunxi_musb_busctl_offset() */
+		case SUNXI_MUSB_TXFUNCADDR:
+		case SUNXI_MUSB_TXHUBADDR:
+		case SUNXI_MUSB_TXHUBPORT:
+		case SUNXI_MUSB_RXFUNCADDR:
+		case SUNXI_MUSB_RXHUBADDR:
+		case SUNXI_MUSB_RXHUBPORT:
+			/* multipoint / busctl reg access */
+			return writeb(data, addr + offset);
+		default:
+			dev_err(sunxi_musb->controller->parent,
+				"Error unknown writeb offset %u\n", offset);
+			return;
+		}
+	} else if (addr == (sunxi_musb->mregs + 0x80)) {
+		/* ep control reg access */
+		if (offset >= MUSB_TXTYPE)
+			offset += 2;
+		return writeb(data, addr + offset);
+	}
+
+	dev_err(sunxi_musb->controller->parent,
+		"Error unknown writeb at 0x%x bytes offset\n",
+		(int)(addr - sunxi_musb->mregs));
+}
+
+static u16 sunxi_musb_readw(const void __iomem *addr, unsigned offset)
+{
+	if (addr == sunxi_musb->mregs) {
+		/* generic control or fifo control reg access */
+		switch (offset) {
+		case MUSB_INTRTX:
+			return readw(addr + SUNXI_MUSB_INTRTX);
+		case MUSB_INTRRX:
+			return readw(addr + SUNXI_MUSB_INTRRX);
+		case MUSB_INTRTXE:
+			return readw(addr + SUNXI_MUSB_INTRTXE);
+		case MUSB_INTRRXE:
+			return readw(addr + SUNXI_MUSB_INTRRXE);
+		case MUSB_FRAME:
+			return readw(addr + SUNXI_MUSB_FRAME);
+		case MUSB_TXFIFOADD:
+			return readw(addr + SUNXI_MUSB_TXFIFOADD);
+		case MUSB_RXFIFOADD:
+			return readw(addr + SUNXI_MUSB_RXFIFOADD);
+		case MUSB_HWVERS:
+			return 0; /* sunxi musb version is not known */
+		default:
+			dev_err(sunxi_musb->controller->parent,
+				"Error unknown readw offset %u\n", offset);
+			return 0;
+		}
+	} else if (addr == (sunxi_musb->mregs + 0x80)) {
+		/* ep control reg access */
+		return readw(addr + offset);
+	}
+
+	dev_err(sunxi_musb->controller->parent,
+		"Error unknown readw at 0x%x bytes offset\n",
+		(int)(addr - sunxi_musb->mregs));
+	return 0;
+}
+
+static void sunxi_musb_writew(void __iomem *addr, unsigned offset, u16 data)
+{
+	if (addr == sunxi_musb->mregs) {
+		/* generic control or fifo control reg access */
+		switch (offset) {
+		case MUSB_INTRTX:
+			return writew(data, addr + SUNXI_MUSB_INTRTX);
+		case MUSB_INTRRX:
+			return writew(data, addr + SUNXI_MUSB_INTRRX);
+		case MUSB_INTRTXE:
+			return writew(data, addr + SUNXI_MUSB_INTRTXE);
+		case MUSB_INTRRXE:
+			return writew(data, addr + SUNXI_MUSB_INTRRXE);
+		case MUSB_FRAME:
+			return writew(data, addr + SUNXI_MUSB_FRAME);
+		case MUSB_TXFIFOADD:
+			return writew(data, addr + SUNXI_MUSB_TXFIFOADD);
+		case MUSB_RXFIFOADD:
+			return writew(data, addr + SUNXI_MUSB_RXFIFOADD);
+		default:
+			dev_err(sunxi_musb->controller->parent,
+				"Error unknown writew offset %u\n", offset);
+			return;
+		}
+	} else if (addr == (sunxi_musb->mregs + 0x80)) {
+		/* ep control reg access */
+		return writew(data, addr + offset);
+	}
+
+	dev_err(sunxi_musb->controller->parent,
+		"Error unknown writew at 0x%x bytes offset\n",
+		(int)(addr - sunxi_musb->mregs));
+}
+
+static const struct musb_platform_ops sunxi_musb_ops = {
+	.quirks		= MUSB_INDEXED_EP,
+	.init		= sunxi_musb_init,
+	.exit		= sunxi_musb_exit,
+	.enable		= sunxi_musb_enable,
+	.disable	= sunxi_musb_disable,
+	.fifo_offset	= sunxi_musb_fifo_offset,
+	.ep_offset	= sunxi_musb_ep_offset,
+	.busctl_offset	= sunxi_musb_busctl_offset,
+	.readb		= sunxi_musb_readb,
+	.writeb		= sunxi_musb_writeb,
+	.readw		= sunxi_musb_readw,
+	.writew		= sunxi_musb_writew,
+	.set_vbus	= sunxi_musb_set_vbus,
+	.pre_root_reset_end = sunxi_musb_pre_root_reset_end,
+	.post_root_reset_end = sunxi_musb_post_root_reset_end,
+};
+
+/* Allwinner OTG supports up to 5 endpoints */
+#define SUNXI_MUSB_MAX_EP_NUM	6
+#define SUNXI_MUSB_RAM_BITS	11
+
+static struct musb_fifo_cfg sunxi_musb_mode_cfg[] = {
+	MUSB_EP_FIFO_SINGLE(1, FIFO_TX, 512),
+	MUSB_EP_FIFO_SINGLE(1, FIFO_RX, 512),
+	MUSB_EP_FIFO_SINGLE(2, FIFO_TX, 512),
+	MUSB_EP_FIFO_SINGLE(2, FIFO_RX, 512),
+	MUSB_EP_FIFO_SINGLE(3, FIFO_TX, 512),
+	MUSB_EP_FIFO_SINGLE(3, FIFO_RX, 512),
+	MUSB_EP_FIFO_SINGLE(4, FIFO_TX, 512),
+	MUSB_EP_FIFO_SINGLE(4, FIFO_RX, 512),
+	MUSB_EP_FIFO_SINGLE(5, FIFO_TX, 512),
+	MUSB_EP_FIFO_SINGLE(5, FIFO_RX, 512),
+};
+
+static struct musb_hdrc_config sunxi_musb_hdrc_config = {
+	.fifo_cfg       = sunxi_musb_mode_cfg,
+	.fifo_cfg_size  = ARRAY_SIZE(sunxi_musb_mode_cfg),
+	.multipoint	= true,
+	.dyn_fifo	= true,
+	.soft_con       = true,
+	.num_eps	= SUNXI_MUSB_MAX_EP_NUM,
+	.ram_bits	= SUNXI_MUSB_RAM_BITS,
+	.dma		= 0,
+};
+
+static int sunxi_musb_probe(struct platform_device *pdev)
+{
+	struct musb_hdrc_platform_data	pdata;
+	struct platform_device_info	pinfo;
+	struct sunxi_glue		*glue;
+	struct device_node		*np = pdev->dev.of_node;
+	int ret;
+
+	if (!np) {
+		dev_err(&pdev->dev, "Error no device tree node found\n");
+		return -EINVAL;
+	}
+
+	glue = devm_kzalloc(&pdev->dev, sizeof(*glue), GFP_KERNEL);
+	if (!glue)
+		return -ENOMEM;
+
+	memset(&pdata, 0, sizeof(pdata));
+	switch (of_usb_get_dr_mode(np)) {
+#if defined CONFIG_USB_MUSB_DUAL_ROLE || defined CONFIG_USB_MUSB_HOST
+	case USB_DR_MODE_HOST:
+		pdata.mode = MUSB_PORT_MODE_HOST;
+		break;
+#endif
+#ifdef CONFIG_USB_MUSB_DUAL_ROLE
+	case USB_DR_MODE_OTG:
+		glue->extcon = extcon_get_edev_by_phandle(&pdev->dev, 0);
+		if (IS_ERR(glue->extcon)) {
+			if (PTR_ERR(glue->extcon) == -EPROBE_DEFER)
+				return -EPROBE_DEFER;
+			dev_err(&pdev->dev, "Invalid or missing extcon\n");
+			return PTR_ERR(glue->extcon);
+		}
+		pdata.mode = MUSB_PORT_MODE_DUAL_ROLE;
+		break;
+#endif
+	default:
+		dev_err(&pdev->dev, "Invalid or missing 'dr_mode' property\n");
+		return -EINVAL;
+	}
+	pdata.platform_ops	= &sunxi_musb_ops;
+	pdata.config		= &sunxi_musb_hdrc_config;
+
+	glue->dev = &pdev->dev;
+	INIT_WORK(&glue->work, sunxi_musb_work);
+	glue->host_nb.notifier_call = sunxi_musb_host_notifier;
+
+	glue->clk = devm_clk_get(&pdev->dev, NULL);
+	if (IS_ERR(glue->clk)) {
+		dev_err(&pdev->dev, "Error getting clock: %ld\n",
+			PTR_ERR(glue->clk));
+		return PTR_ERR(glue->clk);
+	}
+
+	glue->phy = devm_phy_get(&pdev->dev, "usb");
+	if (IS_ERR(glue->phy)) {
+		if (PTR_ERR(glue->phy) == -EPROBE_DEFER)
+			return -EPROBE_DEFER;
+		dev_err(&pdev->dev, "Error getting phy %ld\n",
+			PTR_ERR(glue->phy));
+		return PTR_ERR(glue->phy);
+	}
+
+	glue->usb_phy = usb_phy_generic_register();
+	if (IS_ERR(glue->usb_phy)) {
+		dev_err(&pdev->dev, "Error registering usb-phy %ld\n",
+			PTR_ERR(glue->usb_phy));
+		return PTR_ERR(glue->usb_phy);
+	}
+
+	glue->xceiv = devm_usb_get_phy(&pdev->dev, USB_PHY_TYPE_USB2);
+	if (IS_ERR(glue->xceiv)) {
+		ret = PTR_ERR(glue->xceiv);
+		dev_err(&pdev->dev, "Error getting usb-phy %d\n", ret);
+		goto err_unregister_usb_phy;
+	}
+
+	platform_set_drvdata(pdev, glue);
+
+	memset(&pinfo, 0, sizeof(pinfo));
+	pinfo.name	 = "musb-hdrc";
+	pinfo.id	= PLATFORM_DEVID_AUTO;
+	pinfo.parent	= &pdev->dev;
+	pinfo.res	= pdev->resource;
+	pinfo.num_res	= pdev->num_resources;
+	pinfo.data	= &pdata;
+	pinfo.size_data = sizeof(pdata);
+
+	glue->musb = platform_device_register_full(&pinfo);
+	if (IS_ERR(glue->musb)) {
+		ret = PTR_ERR(glue->musb);
+		dev_err(&pdev->dev, "Error registering musb dev: %d\n", ret);
+		goto err_unregister_usb_phy;
+	}
+
+	return 0;
+
+err_unregister_usb_phy:
+	usb_phy_generic_unregister(glue->usb_phy);
+	return ret;
+}
+
+static int sunxi_musb_remove(struct platform_device *pdev)
+{
+	struct sunxi_glue *glue = platform_get_drvdata(pdev);
+	struct platform_device *usb_phy = glue->usb_phy;
+
+	platform_device_unregister(glue->musb); /* Frees glue ! */
+	usb_phy_generic_unregister(usb_phy);
+
+	return 0;
+}
+
+static const struct of_device_id sunxi_musb_match[] = {
+	{ .compatible = "allwinner,sun4i-a10-musb", },
+	{}
+};
+
+static struct platform_driver sunxi_musb_driver = {
+	.probe = sunxi_musb_probe,
+	.remove = sunxi_musb_remove,
+	.driver = {
+		.name = "musb-sunxi",
+		.of_match_table = sunxi_musb_match,
+	},
+};
+module_platform_driver(sunxi_musb_driver);
+
+MODULE_DESCRIPTION("Allwinner sunxi MUSB Glue Layer");
+MODULE_AUTHOR("Hans de Goede <hdegoede@redhat.com>");
+MODULE_LICENSE("GPL v2");
-- 
cgit v1.2.3


From 132e23775779cc895c37f7883c33a60a1a8a7cdd Mon Sep 17 00:00:00 2001
From: Hans de Goede <hdegoede@redhat.com>
Date: Wed, 8 Jul 2015 16:41:39 +0200
Subject: usb: musb: sunxi: Add support for musb controller in A31 SoC

The A31 SoC uses the same musb controller as found in earlier SoCs, but it
is hooked up slightly different. Its SRAM is private and no longer controlled
through the SRAM controller, and its reset is controlled via a separate
reset controller. This commit adds support for this setup.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
---
 .../bindings/usb/allwinner,sun4i-a10-musb.txt      |  3 +-
 drivers/usb/musb/sunxi.c                           | 50 +++++++++++++++++++---
 2 files changed, 46 insertions(+), 7 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/usb/allwinner,sun4i-a10-musb.txt b/Documentation/devicetree/bindings/usb/allwinner,sun4i-a10-musb.txt
index 9254a6c81808..fde180bfb162 100644
--- a/Documentation/devicetree/bindings/usb/allwinner,sun4i-a10-musb.txt
+++ b/Documentation/devicetree/bindings/usb/allwinner,sun4i-a10-musb.txt
@@ -2,9 +2,10 @@ Allwinner sun4i A10 musb DRC/OTG controller
 -------------------------------------------
 
 Required properties:
- - compatible      : "allwinner,sun4i-a10-musb"
+ - compatible      : "allwinner,sun4i-a10-musb" or "allwinner,sun6i-a31-musb"
  - reg             : mmio address range of the musb controller
  - clocks          : clock specifier for the musb controller ahb gate clock
+ - reset           : reset specifier for the ahb reset (A31 and newer only)
  - interrupts      : interrupt to which the musb controller is connected
  - interrupt-names : must be "mc"
  - phys            : phy specifier for the otg phy
diff --git a/drivers/usb/musb/sunxi.c b/drivers/usb/musb/sunxi.c
index 00d7248a4840..df2f75eb9895 100644
--- a/drivers/usb/musb/sunxi.c
+++ b/drivers/usb/musb/sunxi.c
@@ -26,6 +26,7 @@
 #include <linux/of.h>
 #include <linux/phy/phy-sun4i-usb.h>
 #include <linux/platform_device.h>
+#include <linux/reset.h>
 #include <linux/soc/sunxi/sunxi_sram.h>
 #include <linux/usb/musb.h>
 #include <linux/usb/of.h>
@@ -70,6 +71,8 @@
 #define SUNXI_MUSB_FL_HOSTMODE_PEND		2
 #define SUNXI_MUSB_FL_VBUS_ON			3
 #define SUNXI_MUSB_FL_PHY_ON			4
+#define SUNXI_MUSB_FL_HAS_SRAM			5
+#define SUNXI_MUSB_FL_HAS_RESET			6
 
 /* Our read/write methods need access and do not get passed in a musb ref :| */
 static struct musb *sunxi_musb;
@@ -78,6 +81,7 @@ struct sunxi_glue {
 	struct device		*dev;
 	struct platform_device	*musb;
 	struct clk		*clk;
+	struct reset_control	*rst;
 	struct phy		*phy;
 	struct platform_device	*usb_phy;
 	struct usb_phy		*xceiv;
@@ -229,14 +233,22 @@ static int sunxi_musb_init(struct musb *musb)
 	musb->phy = glue->phy;
 	musb->xceiv = glue->xceiv;
 
-	ret = sunxi_sram_claim(musb->controller->parent);
-	if (ret)
-		return ret;
+	if (test_bit(SUNXI_MUSB_FL_HAS_SRAM, &glue->flags)) {
+		ret = sunxi_sram_claim(musb->controller->parent);
+		if (ret)
+			return ret;
+	}
 
 	ret = clk_prepare_enable(glue->clk);
 	if (ret)
 		goto error_sram_release;
 
+	if (test_bit(SUNXI_MUSB_FL_HAS_RESET, &glue->flags)) {
+		ret = reset_control_deassert(glue->rst);
+		if (ret)
+			goto error_clk_disable;
+	}
+
 	writeb(SUNXI_MUSB_VEND0_PIO_MODE, musb->mregs + SUNXI_MUSB_VEND0);
 
 	/* Register notifier before calling phy_init() */
@@ -244,7 +256,7 @@ static int sunxi_musb_init(struct musb *musb)
 		ret = extcon_register_notifier(glue->extcon, EXTCON_USB_HOST,
 					       &glue->host_nb);
 		if (ret)
-			goto error_clk_disable;
+			goto error_reset_assert;
 	}
 
 	ret = phy_init(glue->phy);
@@ -273,10 +285,14 @@ error_unregister_notifier:
 	if (musb->port_mode == MUSB_PORT_MODE_DUAL_ROLE)
 		extcon_unregister_notifier(glue->extcon, EXTCON_USB_HOST,
 					   &glue->host_nb);
+error_reset_assert:
+	if (test_bit(SUNXI_MUSB_FL_HAS_RESET, &glue->flags))
+		reset_control_assert(glue->rst);
 error_clk_disable:
 	clk_disable_unprepare(glue->clk);
 error_sram_release:
-	sunxi_sram_release(musb->controller->parent);
+	if (test_bit(SUNXI_MUSB_FL_HAS_SRAM, &glue->flags))
+		sunxi_sram_release(musb->controller->parent);
 	return ret;
 }
 
@@ -296,8 +312,12 @@ static int sunxi_musb_exit(struct musb *musb)
 		extcon_unregister_notifier(glue->extcon, EXTCON_USB_HOST,
 					   &glue->host_nb);
 
+	if (test_bit(SUNXI_MUSB_FL_HAS_RESET, &glue->flags))
+		reset_control_assert(glue->rst);
+
 	clk_disable_unprepare(glue->clk);
-	sunxi_sram_release(musb->controller->parent);
+	if (test_bit(SUNXI_MUSB_FL_HAS_SRAM, &glue->flags))
+		sunxi_sram_release(musb->controller->parent);
 
 	return 0;
 }
@@ -617,6 +637,12 @@ static int sunxi_musb_probe(struct platform_device *pdev)
 	INIT_WORK(&glue->work, sunxi_musb_work);
 	glue->host_nb.notifier_call = sunxi_musb_host_notifier;
 
+	if (of_device_is_compatible(np, "allwinner,sun4i-a10-musb"))
+		set_bit(SUNXI_MUSB_FL_HAS_SRAM, &glue->flags);
+
+	if (of_device_is_compatible(np, "allwinner,sun6i-a31-musb"))
+		set_bit(SUNXI_MUSB_FL_HAS_RESET, &glue->flags);
+
 	glue->clk = devm_clk_get(&pdev->dev, NULL);
 	if (IS_ERR(glue->clk)) {
 		dev_err(&pdev->dev, "Error getting clock: %ld\n",
@@ -624,6 +650,17 @@ static int sunxi_musb_probe(struct platform_device *pdev)
 		return PTR_ERR(glue->clk);
 	}
 
+	if (test_bit(SUNXI_MUSB_FL_HAS_RESET, &glue->flags)) {
+		glue->rst = devm_reset_control_get(&pdev->dev, NULL);
+		if (IS_ERR(glue->rst)) {
+			if (PTR_ERR(glue->rst) == -EPROBE_DEFER)
+				return -EPROBE_DEFER;
+			dev_err(&pdev->dev, "Error getting reset %ld\n",
+				PTR_ERR(glue->rst));
+			return PTR_ERR(glue->rst);
+		}
+	}
+
 	glue->phy = devm_phy_get(&pdev->dev, "usb");
 	if (IS_ERR(glue->phy)) {
 		if (PTR_ERR(glue->phy) == -EPROBE_DEFER)
@@ -685,6 +722,7 @@ static int sunxi_musb_remove(struct platform_device *pdev)
 
 static const struct of_device_id sunxi_musb_match[] = {
 	{ .compatible = "allwinner,sun4i-a10-musb", },
+	{ .compatible = "allwinner,sun6i-a31-musb", },
 	{}
 };
 
-- 
cgit v1.2.3


From d91de093d94eca6e280e50c24b172ed598bb5724 Mon Sep 17 00:00:00 2001
From: Hans de Goede <hdegoede@redhat.com>
Date: Wed, 8 Jul 2015 16:41:40 +0200
Subject: usb: musb: sunxi: Add support for musb controller in A33 SoC

The A33 SoC uses the same musb controller as found on the A31 and later,
but allwinner has removed the configdata register, this commit adds special
handling for this.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
---
 .../devicetree/bindings/usb/allwinner,sun4i-a10-musb.txt  |  3 ++-
 drivers/usb/musb/sunxi.c                                  | 15 +++++++++++++++
 2 files changed, 17 insertions(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/usb/allwinner,sun4i-a10-musb.txt b/Documentation/devicetree/bindings/usb/allwinner,sun4i-a10-musb.txt
index fde180bfb162..862cd7c79805 100644
--- a/Documentation/devicetree/bindings/usb/allwinner,sun4i-a10-musb.txt
+++ b/Documentation/devicetree/bindings/usb/allwinner,sun4i-a10-musb.txt
@@ -2,7 +2,8 @@ Allwinner sun4i A10 musb DRC/OTG controller
 -------------------------------------------
 
 Required properties:
- - compatible      : "allwinner,sun4i-a10-musb" or "allwinner,sun6i-a31-musb"
+ - compatible      : "allwinner,sun4i-a10-musb", "allwinner,sun6i-a31-musb"
+                     or "allwinner,sun8i-a33-musb"
  - reg             : mmio address range of the musb controller
  - clocks          : clock specifier for the musb controller ahb gate clock
  - reset           : reset specifier for the ahb reset (A31 and newer only)
diff --git a/drivers/usb/musb/sunxi.c b/drivers/usb/musb/sunxi.c
index df2f75eb9895..f9f6304ad854 100644
--- a/drivers/usb/musb/sunxi.c
+++ b/drivers/usb/musb/sunxi.c
@@ -73,6 +73,7 @@
 #define SUNXI_MUSB_FL_PHY_ON			4
 #define SUNXI_MUSB_FL_HAS_SRAM			5
 #define SUNXI_MUSB_FL_HAS_RESET			6
+#define SUNXI_MUSB_FL_NO_CONFIGDATA		7
 
 /* Our read/write methods need access and do not get passed in a musb ref :| */
 static struct musb *sunxi_musb;
@@ -370,6 +371,8 @@ static u32 sunxi_musb_busctl_offset(u8 epnum, u16 offset)
 
 static u8 sunxi_musb_readb(const void __iomem *addr, unsigned offset)
 {
+	struct sunxi_glue *glue;
+
 	if (addr == sunxi_musb->mregs) {
 		/* generic control or fifo control reg access */
 		switch (offset) {
@@ -392,6 +395,12 @@ static u8 sunxi_musb_readb(const void __iomem *addr, unsigned offset)
 		case MUSB_RXFIFOSZ:
 			return readb(addr + SUNXI_MUSB_RXFIFOSZ);
 		case MUSB_CONFIGDATA + 0x10: /* See musb_read_configdata() */
+			glue = dev_get_drvdata(sunxi_musb->controller->parent);
+			/* A33 saves a reg, and we get to hardcode this */
+			if (test_bit(SUNXI_MUSB_FL_NO_CONFIGDATA,
+				     &glue->flags))
+				return 0xde;
+
 			return readb(addr + SUNXI_MUSB_CONFIGDATA);
 		/* Offset for these is fixed by sunxi_musb_busctl_offset() */
 		case SUNXI_MUSB_TXFUNCADDR:
@@ -643,6 +652,11 @@ static int sunxi_musb_probe(struct platform_device *pdev)
 	if (of_device_is_compatible(np, "allwinner,sun6i-a31-musb"))
 		set_bit(SUNXI_MUSB_FL_HAS_RESET, &glue->flags);
 
+	if (of_device_is_compatible(np, "allwinner,sun8i-a33-musb")) {
+		set_bit(SUNXI_MUSB_FL_HAS_RESET, &glue->flags);
+		set_bit(SUNXI_MUSB_FL_NO_CONFIGDATA, &glue->flags);
+	}
+
 	glue->clk = devm_clk_get(&pdev->dev, NULL);
 	if (IS_ERR(glue->clk)) {
 		dev_err(&pdev->dev, "Error getting clock: %ld\n",
@@ -723,6 +737,7 @@ static int sunxi_musb_remove(struct platform_device *pdev)
 static const struct of_device_id sunxi_musb_match[] = {
 	{ .compatible = "allwinner,sun4i-a10-musb", },
 	{ .compatible = "allwinner,sun6i-a31-musb", },
+	{ .compatible = "allwinner,sun8i-a33-musb", },
 	{}
 };
 
-- 
cgit v1.2.3


From b5513dede7c90ba2370ab5eb547568a22991907b Mon Sep 17 00:00:00 2001
From: Li Jun <jun.li@freescale.com>
Date: Thu, 9 Jul 2015 15:18:43 +0800
Subject: doc: dt-binding: usb: add otg related properties

Add otg version, srp, hnp and adp support for usb OTG port, then those OTG
features don't have to be decided by usb gadget drivers.

Signed-off-by: Li Jun <jun.li@freescale.com>
Reviewed-by: Roger Quadros <rogerq@ti.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
---
 Documentation/devicetree/bindings/usb/generic.txt | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/usb/generic.txt b/Documentation/devicetree/bindings/usb/generic.txt
index 477d5bb5e51c..bba825711873 100644
--- a/Documentation/devicetree/bindings/usb/generic.txt
+++ b/Documentation/devicetree/bindings/usb/generic.txt
@@ -11,6 +11,19 @@ Optional properties:
 			"peripheral" and "otg". In case this attribute isn't
 			passed via DT, USB DRD controllers should default to
 			OTG.
+ - otg-rev: tells usb driver the release number of the OTG and EH supplement
+			with which the device and its descriptors are compliant,
+			in binary-coded decimal (i.e. 2.0 is 0200H). This
+			property is used if any real OTG features(HNP/SRP/ADP)
+			is enabled, if ADP is required, otg-rev should be
+			0x0200 or above.
+ - hnp-disable: tells OTG controllers we want to disable OTG HNP, normally HNP
+			is the basic function of real OTG except you want it
+			to be a srp-capable only B device.
+ - srp-disable: tells OTG controllers we want to disable OTG SRP, SRP is
+			optional for OTG device.
+ - adp-disable: tells OTG controllers we want to disable OTG ADP, ADP is
+			optional for OTG device.
 
 This is an attribute to a USB controller such as:
 
@@ -21,4 +34,6 @@ dwc3@4a030000 {
 	usb-phy = <&usb2_phy>, <&usb3,phy>;
 	maximum-speed = "super-speed";
 	dr_mode = "otg";
+	otg-rev = <0x0200>;
+	adp-disable;
 };
-- 
cgit v1.2.3


From 6c020ea8dc3a8adee81b6f141428a7a75249706e Mon Sep 17 00:00:00 2001
From: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Date: Wed, 29 Jul 2015 12:30:39 +0100
Subject: arm64/Documentation: clarify wording regarding memory below the Image

Clarify that the memory below the start of the image but inside the
region covered by the linear mapping has no special significance to
the kernel, and may be used by the firmware provided that it is marked
as reserved.

Also, fix up some whitespace errors.

Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 Documentation/arm64/booting.txt | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/arm64/booting.txt b/Documentation/arm64/booting.txt
index 1690350f16e7..7d9d3c2286b2 100644
--- a/Documentation/arm64/booting.txt
+++ b/Documentation/arm64/booting.txt
@@ -81,7 +81,7 @@ The decompressed kernel image contains a 64-byte header as follows:
   u64 res3	= 0;		/* reserved */
   u64 res4	= 0;		/* reserved */
   u32 magic	= 0x644d5241;	/* Magic number, little endian, "ARM\x64" */
-  u32 res5;      		/* reserved (used for PE COFF offset) */
+  u32 res5;			/* reserved (used for PE COFF offset) */
 
 
 Header notes:
@@ -103,7 +103,7 @@ Header notes:
 
 - The flags field (introduced in v3.17) is a little-endian 64-bit field
   composed as follows:
-  Bit 0: 	Kernel endianness.  1 if BE, 0 if LE.
+  Bit 0:	Kernel endianness.  1 if BE, 0 if LE.
   Bits 1-63:	Reserved.
 
 - When image_size is zero, a bootloader should attempt to keep as much
@@ -115,11 +115,14 @@ The Image must be placed text_offset bytes from a 2MB aligned base
 address near the start of usable system RAM and called there. Memory
 below that base address is currently unusable by Linux, and therefore it
 is strongly recommended that this location is the start of system RAM.
+The region between the 2 MB aligned base address and the start of the
+image has no special significance to the kernel, and may be used for
+other purposes.
 At least image_size bytes from the start of the image must be free for
 use by the kernel.
 
-Any memory described to the kernel (even that below the 2MB aligned base
-address) which is not marked as reserved from the kernel e.g. with a
+Any memory described to the kernel (even that below the start of the
+image) which is not marked as reserved from the kernel (e.g., with a
 memreserve region in the device tree) will be considered as available to
 the kernel.
 
-- 
cgit v1.2.3


From 72c10fef98fc3b3c924ee022e451872517e61ecf Mon Sep 17 00:00:00 2001
From: Bjorn Andersson <bjorn.andersson@sonymobile.com>
Date: Mon, 27 Jul 2015 20:20:29 -0700
Subject: soc: qcom: Add device tree binding for Shared Memory Device

Add device tree binding documentation for the Qualcomm Shared Memory
Device, used for communication between the various CPUs in the Qualcomm
SoCs.

Signed-off-by: Bjorn Andersson <bjorn.andersson@sonymobile.com>
Signed-off-by: Andy Gross <agross@codeaurora.org>
---
 .../devicetree/bindings/soc/qcom/qcom,smd.txt      | 79 ++++++++++++++++++++++
 1 file changed, 79 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/soc/qcom/qcom,smd.txt

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/soc/qcom/qcom,smd.txt b/Documentation/devicetree/bindings/soc/qcom/qcom,smd.txt
new file mode 100644
index 000000000000..f65c76db9859
--- /dev/null
+++ b/Documentation/devicetree/bindings/soc/qcom/qcom,smd.txt
@@ -0,0 +1,79 @@
+Qualcomm Shared Memory Driver (SMD) binding
+
+This binding describes the Qualcomm Shared Memory Driver, a fifo based
+communication channel for sending data between the various subsystems in
+Qualcomm platforms.
+
+- compatible:
+	Usage: required
+	Value type: <stringlist>
+	Definition: must be "qcom,smd"
+
+= EDGES
+
+Each subnode of the SMD node represents a remote subsystem or a remote
+processor of some sort - or in SMD language an "edge". The name of the edges
+are not important.
+The edge is described by the following properties:
+
+- interrupts:
+	Usage: required
+	Value type: <prop-encoded-array>
+	Definition: should specify the IRQ used by the remote processor to
+		    signal this processor about communication related updates
+
+- qcom,ipc:
+	Usage: required
+	Value type: <prop-encoded-array>
+	Definition: three entries specifying the outgoing ipc bit used for
+		    signaling the remote processor:
+		    - phandle to a syscon node representing the apcs registers
+		    - u32 representing offset to the register within the syscon
+		    - u32 representing the ipc bit within the register
+
+- qcom,smd-edge:
+	Usage: required
+	Value type: <u32>
+	Definition: the identifier of the remote processor in the smd channel
+		    allocation table
+
+= SMD DEVICES
+
+In turn, subnodes of the "edges" represent devices tied to SMD channels on that
+"edge". The names of the devices are not important. The properties of these
+nodes are defined by the individual bindings for the SMD devices - but must
+contain the following property:
+
+- qcom,smd-channels:
+	Usage: required
+	Value type: <stringlist>
+	Definition: a list of channels tied to this device, used for matching
+		    the device to channels
+
+= EXAMPLE
+
+The following example represents a smd node, with one edge representing the
+"rpm" subsystem. For the "rpm" subsystem we have a device tied to the
+"rpm_request" channel.
+
+	apcs: syscon@f9011000 {
+		compatible = "syscon";
+		reg = <0xf9011000 0x1000>;
+	};
+
+	smd {
+		compatible = "qcom,smd";
+
+		rpm {
+			interrupts = <0 168 1>;
+			qcom,ipc = <&apcs 8 0>;
+			qcom,smd-edge = <15>;
+
+			rpm_requests {
+				compatible = "qcom,rpm-msm8974";
+				qcom,smd-channels = "rpm_requests";
+
+				...
+			};
+		};
+	};
-- 
cgit v1.2.3


From ba68227e610cec8e0bef7da7e04af3f479d9797d Mon Sep 17 00:00:00 2001
From: Bjorn Andersson <bjorn.andersson@sonymobile.com>
Date: Mon, 27 Jul 2015 20:20:31 -0700
Subject: devicetree: soc: Add Qualcomm SMD based RPM DT binding

Add binding documentation for the Qualcomm Resource Power Manager (RPM)
using shared memory (Qualcomm SMD) as transport mechanism. This is found
in 8974 and newer based devices.

The binding currently describes the rpm itself and the regulator
subnodes.

Signed-off-by: Bjorn Andersson <bjorn.andersson@sonymobile.com>
Signed-off-by: Andy Gross <agross@codeaurora.org>
---
 .../devicetree/bindings/soc/qcom,smd-rpm.txt       | 117 +++++++++++++++++++++
 1 file changed, 117 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/soc/qcom,smd-rpm.txt

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/soc/qcom,smd-rpm.txt b/Documentation/devicetree/bindings/soc/qcom,smd-rpm.txt
new file mode 100644
index 000000000000..e27f5c4c54fd
--- /dev/null
+++ b/Documentation/devicetree/bindings/soc/qcom,smd-rpm.txt
@@ -0,0 +1,117 @@
+Qualcomm Resource Power Manager (RPM) over SMD
+
+This driver is used to interface with the Resource Power Manager (RPM) found in
+various Qualcomm platforms. The RPM allows each component in the system to vote
+for state of the system resources, such as clocks, regulators and bus
+frequencies.
+
+- compatible:
+	Usage: required
+	Value type: <string>
+	Definition: must be one of:
+		    "qcom,rpm-msm8974"
+
+- qcom,smd-channels:
+	Usage: required
+	Value type: <stringlist>
+	Definition: Shared Memory channel used for communication with the RPM
+
+= SUBDEVICES
+
+The RPM exposes resources to its subnodes. The below bindings specify the set
+of valid subnodes that can operate on these resources.
+
+== Regulators
+
+Regulator nodes are identified by their compatible:
+
+- compatible:
+	Usage: required
+	Value type: <string>
+	Definition: must be one of:
+		    "qcom,rpm-pm8841-regulators"
+		    "qcom,rpm-pm8941-regulators"
+
+- vdd_s1-supply:
+- vdd_s2-supply:
+- vdd_s3-supply:
+- vdd_s4-supply:
+- vdd_s5-supply:
+- vdd_s6-supply:
+- vdd_s7-supply:
+- vdd_s8-supply:
+	Usage: optional (pm8841 only)
+	Value type: <phandle>
+	Definition: reference to regulator supplying the input pin, as
+		    described in the data sheet
+
+- vdd_s1-supply:
+- vdd_s2-supply:
+- vdd_s3-supply:
+- vdd_l1_l3-supply:
+- vdd_l2_lvs1_2_3-supply:
+- vdd_l4_l11-supply:
+- vdd_l5_l7-supply:
+- vdd_l6_l12_l14_l15-supply:
+- vdd_l8_l16_l18_l19-supply:
+- vdd_l9_l10_l17_l22-supply:
+- vdd_l13_l20_l23_l24-supply:
+- vdd_l21-supply:
+- vin_5vs-supply:
+	Usage: optional (pm8941 only)
+	Value type: <phandle>
+	Definition: reference to regulator supplying the input pin, as
+		    described in the data sheet
+
+The regulator node houses sub-nodes for each regulator within the device. Each
+sub-node is identified using the node's name, with valid values listed for each
+of the pmics below.
+
+pm8841:
+	s1, s2, s3, s4, s5, s6, s7, s8
+
+pm8941:
+	s1, s2, s3, s4, l1, l2, l3, l4, l5, l6, l7, l8, l9, l10, l11, l12, l13,
+	l14, l15, l16, l17, l18, l19, l20, l21, l22, l23, l24, lvs1, lvs2,
+	lvs3, 5vs1, 5vs2
+
+The content of each sub-node is defined by the standard binding for regulators -
+see regulator.txt.
+
+= EXAMPLE
+
+	smd {
+		compatible = "qcom,smd";
+
+		rpm {
+			interrupts = <0 168 1>;
+			qcom,ipc = <&apcs 8 0>;
+			qcom,smd-edge = <15>;
+
+			rpm_requests {
+				compatible = "qcom,rpm-msm8974";
+				qcom,smd-channels = "rpm_requests";
+
+				pm8941-regulators {
+					compatible = "qcom,rpm-pm8941-regulators";
+					vdd_l13_l20_l23_l24-supply = <&pm8941_boost>;
+
+					pm8941_s3: s3 {
+						regulator-min-microvolt = <1800000>;
+						regulator-max-microvolt = <1800000>;
+					};
+
+					pm8941_boost: s4 {
+						regulator-min-microvolt = <5000000>;
+						regulator-max-microvolt = <5000000>;
+					};
+
+					pm8941_l20: l20 {
+						regulator-min-microvolt = <2950000>;
+						regulator-max-microvolt = <2950000>;
+					};
+				};
+			};
+		};
+	};
+
-- 
cgit v1.2.3


From 6aad8bf99338fca5a6ef602f2d7e04546f79fd03 Mon Sep 17 00:00:00 2001
From: Ray Jui <rjui@broadcom.com>
Date: Mon, 27 Jul 2015 15:42:21 -0700
Subject: arm64: dts: Add Broadcom North Star 2 support

Add Broadcom NS2 device tree binding document. Also add initial device
tree dtsi for Broadcom North Star 2 (NS2) SoC and board support for NS2
SVK board

Signed-off-by: Jon Mason <jonmason@broadcom.com>
Signed-off-by: Ray Jui <rjui@broadcom.com>
Reviewed-by: Scott Branden <sbranden@broadcom.com>
Signed-off-by: Olof Johansson <olof@lixom.net>
---
 Documentation/devicetree/bindings/arm/bcm/ns2.txt |   9 ++
 arch/arm64/boot/dts/Makefile                      |   1 +
 arch/arm64/boot/dts/broadcom/Makefile             |   5 +
 arch/arm64/boot/dts/broadcom/ns2-svk.dts          |  59 +++++++++++
 arch/arm64/boot/dts/broadcom/ns2.dtsi             | 118 ++++++++++++++++++++++
 5 files changed, 192 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/arm/bcm/ns2.txt
 create mode 100644 arch/arm64/boot/dts/broadcom/Makefile
 create mode 100644 arch/arm64/boot/dts/broadcom/ns2-svk.dts
 create mode 100644 arch/arm64/boot/dts/broadcom/ns2.dtsi

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/arm/bcm/ns2.txt b/Documentation/devicetree/bindings/arm/bcm/ns2.txt
new file mode 100644
index 000000000000..35f056f4a1c3
--- /dev/null
+++ b/Documentation/devicetree/bindings/arm/bcm/ns2.txt
@@ -0,0 +1,9 @@
+Broadcom North Star 2 (NS2) device tree bindings
+------------------------------------------------
+
+Boards with NS2 shall have the following properties:
+
+Required root node property:
+
+NS2 SVK board
+compatible = "brcm,ns2-svk", "brcm,ns2";
diff --git a/arch/arm64/boot/dts/Makefile b/arch/arm64/boot/dts/Makefile
index 0c57290c8ba9..1560478d6eb2 100644
--- a/arch/arm64/boot/dts/Makefile
+++ b/arch/arm64/boot/dts/Makefile
@@ -1,6 +1,7 @@
 dts-dirs += amd
 dts-dirs += apm
 dts-dirs += arm
+dts-dirs += broadcom
 dts-dirs += cavium
 dts-dirs += exynos
 dts-dirs += freescale
diff --git a/arch/arm64/boot/dts/broadcom/Makefile b/arch/arm64/boot/dts/broadcom/Makefile
new file mode 100644
index 000000000000..e21fe66f1837
--- /dev/null
+++ b/arch/arm64/boot/dts/broadcom/Makefile
@@ -0,0 +1,5 @@
+dtb-$(CONFIG_ARCH_BCM_IPROC) += ns2-svk.dtb
+
+always		:= $(dtb-y)
+subdir-y	:= $(dts-dirs)
+clean-files	:= *.dtb
diff --git a/arch/arm64/boot/dts/broadcom/ns2-svk.dts b/arch/arm64/boot/dts/broadcom/ns2-svk.dts
new file mode 100644
index 000000000000..244baf879dc9
--- /dev/null
+++ b/arch/arm64/boot/dts/broadcom/ns2-svk.dts
@@ -0,0 +1,59 @@
+/*
+ *  BSD LICENSE
+ *
+ *  Copyright(c) 2015 Broadcom Corporation.  All rights reserved.
+ *
+ *  Redistribution and use in source and binary forms, with or without
+ *  modification, are permitted provided that the following conditions
+ *  are met:
+ *
+ *    * Redistributions of source code must retain the above copyright
+ *      notice, this list of conditions and the following disclaimer.
+ *    * Redistributions in binary form must reproduce the above copyright
+ *      notice, this list of conditions and the following disclaimer in
+ *      the documentation and/or other materials provided with the
+ *      distribution.
+ *    * Neither the name of Broadcom Corporation nor the names of its
+ *      contributors may be used to endorse or promote products derived
+ *      from this software without specific prior written permission.
+ *
+ *  THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *  "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *  LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *  A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *  OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *  SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *  LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *  DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *  THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *  (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/dts-v1/;
+
+#include "ns2.dtsi"
+
+/ {
+	model = "Broadcom NS2 SVK";
+	compatible = "brcm,ns2-svk", "brcm,ns2";
+
+	aliases {
+		serial0 = &uart3;
+	};
+
+	chosen {
+		stdout-path = "serial0:115200n8";
+	};
+
+	memory {
+		device_type = "memory";
+		reg = <0x000000000 0x80000000 0x00000000 0x40000000>;
+	};
+
+	soc: soc {
+		uart3: serial@66130000 {
+			status = "ok";
+		};
+	};
+};
diff --git a/arch/arm64/boot/dts/broadcom/ns2.dtsi b/arch/arm64/boot/dts/broadcom/ns2.dtsi
new file mode 100644
index 000000000000..3c92d92278e5
--- /dev/null
+++ b/arch/arm64/boot/dts/broadcom/ns2.dtsi
@@ -0,0 +1,118 @@
+/*
+ *  BSD LICENSE
+ *
+ *  Copyright(c) 2015 Broadcom Corporation.  All rights reserved.
+ *
+ *  Redistribution and use in source and binary forms, with or without
+ *  modification, are permitted provided that the following conditions
+ *  are met:
+ *
+ *    * Redistributions of source code must retain the above copyright
+ *      notice, this list of conditions and the following disclaimer.
+ *    * Redistributions in binary form must reproduce the above copyright
+ *      notice, this list of conditions and the following disclaimer in
+ *      the documentation and/or other materials provided with the
+ *      distribution.
+ *    * Neither the name of Broadcom Corporation nor the names of its
+ *      contributors may be used to endorse or promote products derived
+ *      from this software without specific prior written permission.
+ *
+ *  THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *  "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *  LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *  A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *  OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *  SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *  LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *  DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *  THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *  (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <dt-bindings/interrupt-controller/arm-gic.h>
+
+/memreserve/ 0x84b00000 0x00000008;
+
+/ {
+	compatible = "brcm,ns2";
+	interrupt-parent = <&gic>;
+	#address-cells = <2>;
+	#size-cells = <2>;
+
+	cpus {
+		#address-cells = <2>;
+		#size-cells = <0>;
+
+		cpu@0 {
+			device_type = "cpu";
+			compatible = "arm,cortex-a57", "arm,armv8";
+			reg = <0 0>;
+			enable-method = "spin-table";
+			cpu-release-addr = <0 0x84b00000>;
+		};
+
+		cpu@1 {
+			device_type = "cpu";
+			compatible = "arm,cortex-a57", "arm,armv8";
+			reg = <0 1>;
+			enable-method = "spin-table";
+			cpu-release-addr = <0 0x84b00000>;
+		};
+
+		cpu@2 {
+			device_type = "cpu";
+			compatible = "arm,cortex-a57", "arm,armv8";
+			reg = <0 2>;
+			enable-method = "spin-table";
+			cpu-release-addr = <0 0x84b00000>;
+		};
+
+		cpu@3 {
+			device_type = "cpu";
+			compatible = "arm,cortex-a57", "arm,armv8";
+			reg = <0 3>;
+			enable-method = "spin-table";
+			cpu-release-addr = <0 0x84b00000>;
+		};
+	};
+
+	timer {
+		compatible = "arm,armv8-timer";
+		interrupts = <GIC_PPI 13 (GIC_CPU_MASK_RAW(0xff) |
+			      IRQ_TYPE_EDGE_RISING)>,
+			     <GIC_PPI 14 (GIC_CPU_MASK_RAW(0xff) |
+			      IRQ_TYPE_EDGE_RISING)>,
+			     <GIC_PPI 11 (GIC_CPU_MASK_RAW(0xff) |
+			      IRQ_TYPE_EDGE_RISING)>,
+			     <GIC_PPI 10 (GIC_CPU_MASK_RAW(0xff) |
+			      IRQ_TYPE_EDGE_RISING)>;
+	};
+
+	soc: soc {
+		compatible = "simple-bus";
+		#address-cells = <1>;
+		#size-cells = <1>;
+		ranges = <0 0 0 0xffffffff>;
+
+		gic: interrupt-controller@65210000 {
+			compatible = "arm,gic-400";
+			#interrupt-cells = <3>;
+			interrupt-controller;
+			reg = <0x65210000 0x1000>,
+			      <0x65220000 0x1000>,
+			      <0x65240000 0x2000>,
+			      <0x65260000 0x1000>;
+		};
+
+		uart3: serial@66130000 {
+			compatible = "snps,dw-apb-uart";
+			reg = <0x66130000 0x100>;
+			interrupts = <GIC_SPI 393 IRQ_TYPE_LEVEL_HIGH>;
+			reg-shift = <2>;
+			reg-io-width = <4>;
+			clock-frequency = <23961600>;
+			status = "disabled";
+		};
+	};
+};
-- 
cgit v1.2.3


From ee2b7f60a5c5e391fdde907ac61f135d87e74622 Mon Sep 17 00:00:00 2001
From: Tim Bird <tim.bird@sonymobile.com>
Date: Thu, 16 Jul 2015 16:55:31 -0700
Subject: ARM: dts: qcom: Add binding for the qcom coincell charger

This binding is used to configure the driver for the coincell charger
found in Qualcomm PMICs.

Signed-off-by: Tim Bird <tim.bird@sonymobile.com>
Reviewed-by: Andy Gross <agross@codeaurora.org>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 .../bindings/power/qcom,coincell-charger.txt       | 48 ++++++++++++++++++++++
 1 file changed, 48 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/power/qcom,coincell-charger.txt

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/power/qcom,coincell-charger.txt b/Documentation/devicetree/bindings/power/qcom,coincell-charger.txt
new file mode 100644
index 000000000000..0e6d8754e7ec
--- /dev/null
+++ b/Documentation/devicetree/bindings/power/qcom,coincell-charger.txt
@@ -0,0 +1,48 @@
+Qualcomm Coincell Charger:
+
+The hardware block controls charging for a coincell or capacitor that is
+used to provide power backup for certain features of the power management
+IC (PMIC)
+
+- compatible:
+	Usage: required
+	Value type: <string>
+	Definition: must be: "qcom,pm8941-coincell"
+
+- reg:
+	Usage: required
+	Value type: <u32>
+	Definition: base address of the coincell charger registers
+
+- qcom,rset-ohms:
+	Usage: required
+	Value type: <u32>
+	Definition: resistance (in ohms) for current-limiting resistor
+		must be one of: 800, 1200, 1700, 2100
+
+- qcom,vset-millivolts:
+	Usage: required
+	Value type: <u32>
+	Definition: voltage (in millivolts) to apply for charging
+		must be one of: 2500, 3000, 3100, 3200
+
+- qcom,charger-disable:
+	Usage: optional
+	Value type: <boolean>
+	Definition: definining this property disables charging
+
+This charger is a sub-node of one of the 8941 PMIC blocks, and is specified
+as a child node in DTS of that node.  See ../mfd/qcom,spmi-pmic.txt and
+../mfd/qcom-pm8xxx.txt
+
+Example:
+
+	pm8941@0 {
+		coincell@2800 {
+			compatible = "qcom,pm8941-coincell";
+			reg = <0x2800>;
+
+			qcom,rset-ohms = <2100>;
+			qcom,vset-millivolts = <3000>;
+		};
+	};
-- 
cgit v1.2.3


From 71382bc0431ea5901640a3794fea4eeb71cbcb2e Mon Sep 17 00:00:00 2001
From: WingMan Kwok <w-kwok2@ti.com>
Date: Tue, 28 Jul 2015 16:01:11 -0400
Subject: net: netcp: Fixes efuse mac addr swap on k2e and k2l

On some of the K2E and K2L platforms, the two DWORDs in
efuse occupied by the pre-programmed mac address for
slave port 1 are swapped.  To workaround this issue,
this patch adds a new define NETCP_EFUSE_ADDR_SWAP (2)
which signifies the occurrence of such swapping so that
the driver can take proper action.  The flag can be
enabled in the corresponding netcp interface dts binding
as efuse-mac = <2>  under the corresponding netcp
interface node.

Signed-off-by: WingMan Kwok <w-kwok2@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 Documentation/devicetree/bindings/net/keystone-netcp.txt |  6 +++++-
 drivers/net/ethernet/ti/netcp_core.c                     | 15 +++++++++++++--
 2 files changed, 18 insertions(+), 3 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/net/keystone-netcp.txt b/Documentation/devicetree/bindings/net/keystone-netcp.txt
index d0e6fa38f335..b30ab6b5cbfa 100644
--- a/Documentation/devicetree/bindings/net/keystone-netcp.txt
+++ b/Documentation/devicetree/bindings/net/keystone-netcp.txt
@@ -130,7 +130,11 @@ Required properties:
 
 Optional properties:
 - efuse-mac:	If this is 1, then the MAC address for the interface is
-		obtained from the device efuse mac address register
+		obtained from the device efuse mac address register.
+		If this is 2, the two DWORDs occupied by the MAC address
+		are swapped.  The netcp driver will swap the two DWORDs
+		back to the proper order when this property is set to 2
+		when it obtains the mac address from efuse.
 - local-mac-address:	the driver is designed to use the of_get_mac_address api
 			only if efuse-mac is 0. When efuse-mac is 0, the MAC
 			address is obtained from local-mac-address. If this
diff --git a/drivers/net/ethernet/ti/netcp_core.c b/drivers/net/ethernet/ti/netcp_core.c
index 3ca87f26582a..6f2e151fbc73 100644
--- a/drivers/net/ethernet/ti/netcp_core.c
+++ b/drivers/net/ethernet/ti/netcp_core.c
@@ -51,6 +51,8 @@
 		    NETIF_MSG_PKTDATA	| NETIF_MSG_TX_QUEUED	|	\
 		    NETIF_MSG_RX_STATUS)
 
+#define NETCP_EFUSE_ADDR_SWAP	2
+
 #define knav_queue_get_id(q)	knav_queue_device_control(q, \
 				KNAV_QUEUE_GET_ID, (unsigned long)NULL)
 
@@ -172,13 +174,22 @@ static void set_words(u32 *words, int num_words, u32 *desc)
 }
 
 /* Read the e-fuse value as 32 bit values to be endian independent */
-static int emac_arch_get_mac_addr(char *x, void __iomem *efuse_mac)
+static int emac_arch_get_mac_addr(char *x, void __iomem *efuse_mac, u32 swap)
 {
 	unsigned int addr0, addr1;
 
 	addr1 = readl(efuse_mac + 4);
 	addr0 = readl(efuse_mac);
 
+	switch (swap) {
+	case NETCP_EFUSE_ADDR_SWAP:
+		addr0 = addr1;
+		addr1 = readl(efuse_mac);
+		break;
+	default:
+		break;
+	}
+
 	x[0] = (addr1 & 0x0000ff00) >> 8;
 	x[1] = addr1 & 0x000000ff;
 	x[2] = (addr0 & 0xff000000) >> 24;
@@ -1902,7 +1913,7 @@ static int netcp_create_interface(struct netcp_device *netcp_device,
 			goto quit;
 		}
 
-		emac_arch_get_mac_addr(efuse_mac_addr, efuse);
+		emac_arch_get_mac_addr(efuse_mac_addr, efuse, efuse_mac);
 		if (is_valid_ether_addr(efuse_mac_addr))
 			ether_addr_copy(ndev->dev_addr, efuse_mac_addr);
 		else
-- 
cgit v1.2.3


From a0ddef81f4aeeeec3326f6b6a255d8ea13b41908 Mon Sep 17 00:00:00 2001
From: Chris Metcalf <cmetcalf@ezchip.com>
Date: Wed, 22 Jul 2015 14:30:14 -0400
Subject: tile: enable full SECCOMP support

Signed-off-by: Chris Metcalf <cmetcalf@ezchip.com>
---
 .../seccomp/seccomp-filter/arch-support.txt        |  2 +-
 arch/tile/Kconfig                                  | 17 +++++++++++++
 arch/tile/include/asm/Kbuild                       |  1 +
 arch/tile/include/asm/elf.h                        |  4 +---
 arch/tile/include/asm/syscall.h                    | 28 +++++++++++++++++++++-
 arch/tile/kernel/intvec_32.S                       |  1 +
 arch/tile/kernel/intvec_64.S                       |  1 +
 arch/tile/kernel/ptrace.c                          |  3 +++
 include/uapi/linux/audit.h                         |  3 +++
 include/uapi/linux/elf-em.h                        |  2 ++
 10 files changed, 57 insertions(+), 5 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/features/seccomp/seccomp-filter/arch-support.txt b/Documentation/features/seccomp/seccomp-filter/arch-support.txt
index bea800910342..76d39d66a5d7 100644
--- a/Documentation/features/seccomp/seccomp-filter/arch-support.txt
+++ b/Documentation/features/seccomp/seccomp-filter/arch-support.txt
@@ -32,7 +32,7 @@
     |       score: | TODO |
     |          sh: | TODO |
     |       sparc: | TODO |
-    |        tile: | TODO |
+    |        tile: |  ok  |
     |          um: | TODO |
     |   unicore32: | TODO |
     |         x86: |  ok  |
diff --git a/arch/tile/Kconfig b/arch/tile/Kconfig
index 9def1f52d03a..2ba12d761723 100644
--- a/arch/tile/Kconfig
+++ b/arch/tile/Kconfig
@@ -32,6 +32,7 @@ config TILE
 	select EDAC_SUPPORT
 	select GENERIC_STRNCPY_FROM_USER
 	select GENERIC_STRNLEN_USER
+	select HAVE_ARCH_SECCOMP_FILTER
 
 # FIXME: investigate whether we need/want these options.
 #	select HAVE_IOREMAP_PROT
@@ -221,6 +222,22 @@ config COMPAT
 	  If enabled, the kernel will support running TILE-Gx binaries
 	  that were built with the -m32 option.
 
+config SECCOMP
+	bool "Enable seccomp to safely compute untrusted bytecode"
+	depends on PROC_FS
+	help
+	  This kernel feature is useful for number crunching applications
+	  that may need to compute untrusted bytecode during their
+	  execution. By using pipes or other transports made available to
+	  the process as file descriptors supporting the read/write
+	  syscalls, it's possible to isolate those applications in
+	  their own address space using seccomp. Once seccomp is
+	  enabled via prctl, it cannot be disabled and the task is only
+	  allowed to execute a few safe syscalls defined by each seccomp
+	  mode.
+
+	  If unsure, say N.
+
 config SYSVIPC_COMPAT
 	def_bool y
 	depends on COMPAT && SYSVIPC
diff --git a/arch/tile/include/asm/Kbuild b/arch/tile/include/asm/Kbuild
index d8a843163471..ba35c41c71ff 100644
--- a/arch/tile/include/asm/Kbuild
+++ b/arch/tile/include/asm/Kbuild
@@ -28,6 +28,7 @@ generic-y += poll.h
 generic-y += posix_types.h
 generic-y += preempt.h
 generic-y += resource.h
+generic-y += seccomp.h
 generic-y += sembuf.h
 generic-y += serial.h
 generic-y += shmbuf.h
diff --git a/arch/tile/include/asm/elf.h b/arch/tile/include/asm/elf.h
index 41d9878a9686..c505d77e4d06 100644
--- a/arch/tile/include/asm/elf.h
+++ b/arch/tile/include/asm/elf.h
@@ -22,6 +22,7 @@
 #include <arch/chip.h>
 
 #include <linux/ptrace.h>
+#include <linux/elf-em.h>
 #include <asm/byteorder.h>
 #include <asm/page.h>
 
@@ -30,9 +31,6 @@ typedef unsigned long elf_greg_t;
 #define ELF_NGREG (sizeof(struct pt_regs) / sizeof(elf_greg_t))
 typedef elf_greg_t elf_gregset_t[ELF_NGREG];
 
-#define EM_TILEPRO 188
-#define EM_TILEGX  191
-
 /* Provide a nominal data structure. */
 #define ELF_NFPREG	0
 typedef double elf_fpreg_t;
diff --git a/arch/tile/include/asm/syscall.h b/arch/tile/include/asm/syscall.h
index 9644b88f133d..373d73064ea1 100644
--- a/arch/tile/include/asm/syscall.h
+++ b/arch/tile/include/asm/syscall.h
@@ -20,6 +20,8 @@
 
 #include <linux/sched.h>
 #include <linux/err.h>
+#include <linux/audit.h>
+#include <linux/compat.h>
 #include <arch/abi.h>
 
 /* The array of function pointers for syscalls. */
@@ -61,7 +63,15 @@ static inline void syscall_set_return_value(struct task_struct *task,
 					    struct pt_regs *regs,
 					    int error, long val)
 {
-	regs->regs[0] = (long) error ?: val;
+	if (error) {
+		/* R0 is the passed-in negative error, R1 is positive. */
+		regs->regs[0] = error;
+		regs->regs[1] = -error;
+	} else {
+		/* R1 set to zero to indicate no error. */
+		regs->regs[0] = val;
+		regs->regs[1] = 0;
+	}
 }
 
 static inline void syscall_get_arguments(struct task_struct *task,
@@ -82,4 +92,20 @@ static inline void syscall_set_arguments(struct task_struct *task,
 	memcpy(&regs[i], args, n * sizeof(args[0]));
 }
 
+/*
+ * We don't care about endianness (__AUDIT_ARCH_LE bit) here because
+ * tile has the same system calls both on little- and big- endian.
+ */
+static inline int syscall_get_arch(void)
+{
+	if (is_compat_task())
+		return AUDIT_ARCH_TILEGX32;
+
+#ifdef CONFIG_TILEGX
+	return AUDIT_ARCH_TILEGX;
+#else
+	return AUDIT_ARCH_TILEPRO;
+#endif
+}
+
 #endif	/* _ASM_TILE_SYSCALL_H */
diff --git a/arch/tile/kernel/intvec_32.S b/arch/tile/kernel/intvec_32.S
index cdbda45a4e4b..fbbe2ea882ea 100644
--- a/arch/tile/kernel/intvec_32.S
+++ b/arch/tile/kernel/intvec_32.S
@@ -1224,6 +1224,7 @@ handle_syscall:
 	 jal    do_syscall_trace_enter
 	}
 	FEEDBACK_REENTER(handle_syscall)
+	blz     r0, .Lsyscall_sigreturn_skip
 
 	/*
 	 * We always reload our registers from the stack at this
diff --git a/arch/tile/kernel/intvec_64.S b/arch/tile/kernel/intvec_64.S
index 800b91d3f9dc..58964d209d4d 100644
--- a/arch/tile/kernel/intvec_64.S
+++ b/arch/tile/kernel/intvec_64.S
@@ -1247,6 +1247,7 @@ handle_syscall:
 	 jal    do_syscall_trace_enter
 	}
 	FEEDBACK_REENTER(handle_syscall)
+	bltz    r0, .Lsyscall_sigreturn_skip
 
 	/*
 	 * We always reload our registers from the stack at this
diff --git a/arch/tile/kernel/ptrace.c b/arch/tile/kernel/ptrace.c
index f84eed8243da..bdc126faf741 100644
--- a/arch/tile/kernel/ptrace.c
+++ b/arch/tile/kernel/ptrace.c
@@ -262,6 +262,9 @@ int do_syscall_trace_enter(struct pt_regs *regs)
 	if (work & _TIF_NOHZ)
 		user_exit();
 
+	if (secure_computing() == -1)
+		return -1;
+
 	if (work & _TIF_SYSCALL_TRACE) {
 		if (tracehook_report_syscall_entry(regs))
 			regs->regs[TREG_SYSCALL_NR] = -1;
diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h
index d3475e1f15ec..1f977dd4c370 100644
--- a/include/uapi/linux/audit.h
+++ b/include/uapi/linux/audit.h
@@ -382,6 +382,9 @@ enum {
 #define AUDIT_ARCH_SHEL64	(EM_SH|__AUDIT_ARCH_64BIT|__AUDIT_ARCH_LE)
 #define AUDIT_ARCH_SPARC	(EM_SPARC)
 #define AUDIT_ARCH_SPARC64	(EM_SPARCV9|__AUDIT_ARCH_64BIT)
+#define AUDIT_ARCH_TILEGX	(EM_TILEGX|__AUDIT_ARCH_64BIT|__AUDIT_ARCH_LE)
+#define AUDIT_ARCH_TILEGX32	(EM_TILEGX|__AUDIT_ARCH_LE)
+#define AUDIT_ARCH_TILEPRO	(EM_TILEPRO|__AUDIT_ARCH_LE)
 #define AUDIT_ARCH_X86_64	(EM_X86_64|__AUDIT_ARCH_64BIT|__AUDIT_ARCH_LE)
 
 #define AUDIT_PERM_EXEC		1
diff --git a/include/uapi/linux/elf-em.h b/include/uapi/linux/elf-em.h
index b08829667ed7..3429a3ba382b 100644
--- a/include/uapi/linux/elf-em.h
+++ b/include/uapi/linux/elf-em.h
@@ -38,6 +38,8 @@
 #define EM_ALTERA_NIOS2	113	/* Altera Nios II soft-core processor */
 #define EM_TI_C6000	140	/* TI C6X DSPs */
 #define EM_AARCH64	183	/* ARM 64 bit */
+#define EM_TILEPRO	188	/* Tilera TILEPro */
+#define EM_TILEGX	191	/* Tilera TILE-Gx */
 #define EM_FRV		0x5441	/* Fujitsu FR-V */
 #define EM_AVR32	0x18ad	/* Atmel AVR32 */
 
-- 
cgit v1.2.3


From 6f98f545b0b4effdf67e83e214a4eb13cd41fba2 Mon Sep 17 00:00:00 2001
From: "Ivan T. Ivanov" <ivan.ivanov@linaro.org>
Date: Tue, 28 Jul 2015 11:10:22 +0300
Subject: usb: phy: msm: Add D+/D- lines route control

apq8016-sbc board is using Dual SPDT USB Switch (TC7USB40MU),
witch is controlled by GPIO to de/multiplex D+/D- USB lines to
USB2513B Hub and uB connector. Add support for this.

Signed-off-by: Ivan T. Ivanov <ivan.ivanov@linaro.org>
Signed-off-by: Felipe Balbi <balbi@ti.com>
---
 .../devicetree/bindings/usb/msm-hsusb.txt          |  4 ++
 drivers/usb/phy/phy-msm-usb.c                      | 47 ++++++++++++++++++++++
 include/linux/usb/msm_hsusb.h                      |  7 ++++
 3 files changed, 58 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/usb/msm-hsusb.txt b/Documentation/devicetree/bindings/usb/msm-hsusb.txt
index bd8d9e753029..8654a3ec23e4 100644
--- a/Documentation/devicetree/bindings/usb/msm-hsusb.txt
+++ b/Documentation/devicetree/bindings/usb/msm-hsusb.txt
@@ -52,6 +52,10 @@ Required properties:
 Optional properties:
 - dr_mode:      One of "host", "peripheral" or "otg". Defaults to "otg"
 
+- switch-gpio:  A phandle + gpio-specifier pair. Some boards are using Dual
+                SPDT USB Switch, witch is cotrolled by GPIO to de/multiplex
+                D+/D- USB lines between connectors.
+
 - qcom,phy-init-sequence: PHY configuration sequence values. This is related to Device
                 Mode Eye Diagram test. Start address at which these values will be
                 written is ULPI_EXT_VENDOR_SPECIFIC. Value of -1 is reserved as
diff --git a/drivers/usb/phy/phy-msm-usb.c b/drivers/usb/phy/phy-msm-usb.c
index 61d86d8bf5b7..c58c3c0dbe35 100644
--- a/drivers/usb/phy/phy-msm-usb.c
+++ b/drivers/usb/phy/phy-msm-usb.c
@@ -18,6 +18,7 @@
 
 #include <linux/module.h>
 #include <linux/device.h>
+#include <linux/gpio/consumer.h>
 #include <linux/platform_device.h>
 #include <linux/clk.h>
 #include <linux/slab.h>
@@ -32,6 +33,7 @@
 #include <linux/pm_runtime.h>
 #include <linux/of.h>
 #include <linux/of_device.h>
+#include <linux/reboot.h>
 #include <linux/reset.h>
 
 #include <linux/usb.h>
@@ -1471,6 +1473,14 @@ static int msm_otg_vbus_notifier(struct notifier_block *nb, unsigned long event,
 	else
 		clear_bit(B_SESS_VLD, &motg->inputs);
 
+	if (test_bit(B_SESS_VLD, &motg->inputs)) {
+		/* Switch D+/D- lines to Device connector */
+		gpiod_set_value_cansleep(motg->switch_gpio, 0);
+	} else {
+		/* Switch D+/D- lines to Hub */
+		gpiod_set_value_cansleep(motg->switch_gpio, 1);
+	}
+
 	schedule_work(&motg->sm_work);
 
 	return NOTIFY_DONE;
@@ -1546,6 +1556,11 @@ static int msm_otg_read_dt(struct platform_device *pdev, struct msm_otg *motg)
 
 	motg->manual_pullup = of_property_read_bool(node, "qcom,manual-pullup");
 
+	motg->switch_gpio = devm_gpiod_get_optional(&pdev->dev, "switch",
+						    GPIOD_OUT_LOW);
+	if (IS_ERR(motg->switch_gpio))
+		return PTR_ERR(motg->switch_gpio);
+
 	ext_id = ERR_PTR(-ENODEV);
 	ext_vbus = ERR_PTR(-ENODEV);
 	if (of_property_read_bool(node, "extcon")) {
@@ -1617,6 +1632,19 @@ static int msm_otg_read_dt(struct platform_device *pdev, struct msm_otg *motg)
 	return 0;
 }
 
+static int msm_otg_reboot_notify(struct notifier_block *this,
+				 unsigned long code, void *unused)
+{
+	struct msm_otg *motg = container_of(this, struct msm_otg, reboot);
+
+	/*
+	 * Ensure that D+/D- lines are routed to uB connector, so
+	 * we could load bootloader/kernel at next reboot
+	 */
+	gpiod_set_value_cansleep(motg->switch_gpio, 0);
+	return NOTIFY_DONE;
+}
+
 static int msm_otg_probe(struct platform_device *pdev)
 {
 	struct regulator_bulk_data regs[3];
@@ -1781,6 +1809,17 @@ static int msm_otg_probe(struct platform_device *pdev)
 			dev_dbg(&pdev->dev, "Can not create mode change file\n");
 	}
 
+	if (test_bit(B_SESS_VLD, &motg->inputs)) {
+		/* Switch D+/D- lines to Device connector */
+		gpiod_set_value_cansleep(motg->switch_gpio, 0);
+	} else {
+		/* Switch D+/D- lines to Hub */
+		gpiod_set_value_cansleep(motg->switch_gpio, 1);
+	}
+
+	motg->reboot.notifier_call = msm_otg_reboot_notify;
+	register_reboot_notifier(&motg->reboot);
+
 	pm_runtime_set_active(&pdev->dev);
 	pm_runtime_enable(&pdev->dev);
 
@@ -1807,6 +1846,14 @@ static int msm_otg_remove(struct platform_device *pdev)
 	if (phy->otg->host || phy->otg->gadget)
 		return -EBUSY;
 
+	unregister_reboot_notifier(&motg->reboot);
+
+	/*
+	 * Ensure that D+/D- lines are routed to uB connector, so
+	 * we could load bootloader/kernel at next reboot
+	 */
+	gpiod_set_value_cansleep(motg->switch_gpio, 0);
+
 	extcon_unregister_notifier(motg->id.extcon, EXTCON_USB_HOST, &motg->id.nb);
 	extcon_unregister_notifier(motg->vbus.extcon, EXTCON_USB, &motg->vbus.nb);
 
diff --git a/include/linux/usb/msm_hsusb.h b/include/linux/usb/msm_hsusb.h
index 5df2c8f59aa0..8c8f6854c993 100644
--- a/include/linux/usb/msm_hsusb.h
+++ b/include/linux/usb/msm_hsusb.h
@@ -155,6 +155,10 @@ struct msm_usb_cable {
  *	starting controller using usbcmd run/stop bit.
  * @vbus: VBUS signal state trakining, using extcon framework
  * @id: ID signal state trakining, using extcon framework
+ * @switch_gpio: Descriptor for GPIO used to control external Dual
+ *               SPDT USB Switch.
+ * @reboot: Used to inform the driver to route USB D+/D- line to Device
+ *	    connector
  */
 struct msm_otg {
 	struct usb_phy phy;
@@ -188,6 +192,9 @@ struct msm_otg {
 
 	struct msm_usb_cable vbus;
 	struct msm_usb_cable id;
+
+	struct gpio_desc *switch_gpio;
+	struct notifier_block reboot;
 };
 
 #endif
-- 
cgit v1.2.3


From ddbb299db0e040aca8e26361ab778224eeacf9d3 Mon Sep 17 00:00:00 2001
From: Dudley Du <dudl@cypress.com>
Date: Thu, 30 Jul 2015 11:24:16 -0700
Subject: Input: cyapa - introduce device tree binding

Add device tree support for  Cypress trackpad devices.

Signed-off-by: Dudley Du <dudl@cypress.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
---
 .../devicetree/bindings/input/cypress,cyapa.txt    | 44 ++++++++++++++++++++++
 .../devicetree/bindings/vendor-prefixes.txt        |  1 +
 drivers/input/mouse/cyapa.c                        | 10 +++++
 3 files changed, 55 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/input/cypress,cyapa.txt

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/input/cypress,cyapa.txt b/Documentation/devicetree/bindings/input/cypress,cyapa.txt
new file mode 100644
index 000000000000..635a3b036630
--- /dev/null
+++ b/Documentation/devicetree/bindings/input/cypress,cyapa.txt
@@ -0,0 +1,44 @@
+Cypress I2C Touchpad
+
+Required properties:
+- compatible: must be "cypress,cyapa".
+- reg: I2C address of the chip.
+- interrupt-parent: a phandle for the interrupt controller (see interrupt
+	binding[0]).
+- interrupts: interrupt to which the chip is connected (see interrupt
+	binding[0]).
+
+Optional properties:
+- wakeup-source: touchpad can be used as a wakeup source.
+- pinctrl-names: should be "default" (see pinctrl binding [1]).
+- pinctrl-0: a phandle pointing to the pin settings for the device (see
+	pinctrl binding [1]).
+- vcc-supply: a phandle for the regulator supplying 3.3V power.
+
+[0]: Documentation/devicetree/bindings/interrupt-controller/interrupts.txt
+[1]: Documentation/devicetree/bindings/pinctrl/pinctrl-bindings.txt
+
+Example:
+	&i2c0 {
+		/* ... */
+
+		/* Cypress Gen3 touchpad */
+		touchpad@67 {
+			compatible = "cypress,cyapa";
+			reg = <0x24>;
+			interrupt-parent = <&gpio>;
+			interrupts = <2 IRQ_TYPE_EDGE_FALLING>;	/* GPIO 2 */
+			wakeup-source;
+		};
+
+		/* Cypress Gen5 and later touchpad */
+		touchpad@24 {
+			compatible = "cypress,cyapa";
+			reg = <0x24>;
+			interrupt-parent = <&gpio>;
+			interrupts = <2 IRQ_TYPE_EDGE_FALLING>;	/* GPIO 2 */
+			wakeup-source;
+		};
+
+		/* ... */
+	};
diff --git a/Documentation/devicetree/bindings/vendor-prefixes.txt b/Documentation/devicetree/bindings/vendor-prefixes.txt
index d444757c4d9e..57c0f5f8839c 100644
--- a/Documentation/devicetree/bindings/vendor-prefixes.txt
+++ b/Documentation/devicetree/bindings/vendor-prefixes.txt
@@ -54,6 +54,7 @@ cortina	Cortina Systems, Inc.
 cosmic	Cosmic Circuits
 crystalfontz	Crystalfontz America, Inc.
 cubietech	Cubietech, Ltd.
+cypress	Cypress Semiconductor Corporation
 dallas	Maxim Integrated Products (formerly Dallas Semiconductor)
 davicom	DAVICOM Semiconductor, Inc.
 delta	Delta Electronics, Inc.
diff --git a/drivers/input/mouse/cyapa.c b/drivers/input/mouse/cyapa.c
index 1479ca996647..eb76b61418f3 100644
--- a/drivers/input/mouse/cyapa.c
+++ b/drivers/input/mouse/cyapa.c
@@ -26,6 +26,7 @@
 #include <linux/uaccess.h>
 #include <linux/pm_runtime.h>
 #include <linux/acpi.h>
+#include <linux/of.h>
 #include "cyapa.h"
 
 
@@ -1487,11 +1488,20 @@ static const struct acpi_device_id cyapa_acpi_id[] = {
 MODULE_DEVICE_TABLE(acpi, cyapa_acpi_id);
 #endif
 
+#ifdef CONFIG_OF
+static const struct of_device_id cyapa_of_match[] = {
+	{ .compatible = "cypress,cyapa" },
+	{ /* sentinel */ }
+};
+MODULE_DEVICE_TABLE(of, cyapa_of_match);
+#endif
+
 static struct i2c_driver cyapa_driver = {
 	.driver = {
 		.name = "cyapa",
 		.pm = &cyapa_pm_ops,
 		.acpi_match_table = ACPI_PTR(cyapa_acpi_id),
+		.of_match_table = of_match_ptr(cyapa_of_match),
 	},
 
 	.probe = cyapa_probe,
-- 
cgit v1.2.3


From 8013d1d7eafb0589ca766db6b74026f76b7f5cb4 Mon Sep 17 00:00:00 2001
From: Hangbin Liu <liuhangbin@gmail.com>
Date: Thu, 30 Jul 2015 14:28:42 +0800
Subject: net/ipv6: add sysctl option accept_ra_min_hop_limit

Commit 6fd99094de2b ("ipv6: Don't reduce hop limit for an interface")
disabled accept hop limit from RA if it is smaller than the current hop
limit for security stuff. But this behavior kind of break the RFC definition.

RFC 4861, 6.3.4.  Processing Received Router Advertisements
   A Router Advertisement field (e.g., Cur Hop Limit, Reachable Time,
   and Retrans Timer) may contain a value denoting that it is
   unspecified.  In such cases, the parameter should be ignored and the
   host should continue using whatever value it is already using.

   If the received Cur Hop Limit value is non-zero, the host SHOULD set
   its CurHopLimit variable to the received value.

So add sysctl option accept_ra_min_hop_limit to let user choose the minimum
hop limit value they can accept from RA. And set default to 1 to meet RFC
standards.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: YOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 Documentation/networking/ip-sysctl.txt |  8 ++++++++
 include/linux/ipv6.h                   |  1 +
 include/uapi/linux/ipv6.h              |  1 +
 net/ipv6/addrconf.c                    | 10 ++++++++++
 net/ipv6/ndisc.c                       | 16 +++++++---------
 5 files changed, 27 insertions(+), 9 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 1a5ab21bcca5..00d26d919459 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -1340,6 +1340,14 @@ accept_ra_from_local - BOOLEAN
 	   disabled if accept_ra_from_local is disabled
                on a specific interface.
 
+accept_ra_min_hop_limit - INTEGER
+	Minimum hop limit Information in Router Advertisement.
+
+	Hop limit Information in Router Advertisement less than this
+	variable shall be ignored.
+
+	Default: 1
+
 accept_ra_pinfo - BOOLEAN
 	Learn Prefix Information in Router Advertisement.
 
diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index 06ed637225b8..cb9dcad72372 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -29,6 +29,7 @@ struct ipv6_devconf {
 	__s32		max_desync_factor;
 	__s32		max_addresses;
 	__s32		accept_ra_defrtr;
+	__s32		accept_ra_min_hop_limit;
 	__s32		accept_ra_pinfo;
 #ifdef CONFIG_IPV6_ROUTER_PREF
 	__s32		accept_ra_rtr_pref;
diff --git a/include/uapi/linux/ipv6.h b/include/uapi/linux/ipv6.h
index 641a146ead7d..80f3b74446a1 100644
--- a/include/uapi/linux/ipv6.h
+++ b/include/uapi/linux/ipv6.h
@@ -172,6 +172,7 @@ enum {
 	DEVCONF_ACCEPT_RA_MTU,
 	DEVCONF_STABLE_SECRET,
 	DEVCONF_USE_OIF_ADDRS_ONLY,
+	DEVCONF_ACCEPT_RA_MIN_HOP_LIMIT,
 	DEVCONF_MAX
 };
 
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index eb0c6a3a8a00..53e3a9d756b0 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -195,6 +195,7 @@ static struct ipv6_devconf ipv6_devconf __read_mostly = {
 	.max_addresses		= IPV6_MAX_ADDRESSES,
 	.accept_ra_defrtr	= 1,
 	.accept_ra_from_local	= 0,
+	.accept_ra_min_hop_limit= 1,
 	.accept_ra_pinfo	= 1,
 #ifdef CONFIG_IPV6_ROUTER_PREF
 	.accept_ra_rtr_pref	= 1,
@@ -237,6 +238,7 @@ static struct ipv6_devconf ipv6_devconf_dflt __read_mostly = {
 	.max_addresses		= IPV6_MAX_ADDRESSES,
 	.accept_ra_defrtr	= 1,
 	.accept_ra_from_local	= 0,
+	.accept_ra_min_hop_limit= 1,
 	.accept_ra_pinfo	= 1,
 #ifdef CONFIG_IPV6_ROUTER_PREF
 	.accept_ra_rtr_pref	= 1,
@@ -4588,6 +4590,7 @@ static inline void ipv6_store_devconf(struct ipv6_devconf *cnf,
 	array[DEVCONF_MAX_DESYNC_FACTOR] = cnf->max_desync_factor;
 	array[DEVCONF_MAX_ADDRESSES] = cnf->max_addresses;
 	array[DEVCONF_ACCEPT_RA_DEFRTR] = cnf->accept_ra_defrtr;
+	array[DEVCONF_ACCEPT_RA_MIN_HOP_LIMIT] = cnf->accept_ra_min_hop_limit;
 	array[DEVCONF_ACCEPT_RA_PINFO] = cnf->accept_ra_pinfo;
 #ifdef CONFIG_IPV6_ROUTER_PREF
 	array[DEVCONF_ACCEPT_RA_RTR_PREF] = cnf->accept_ra_rtr_pref;
@@ -5484,6 +5487,13 @@ static struct addrconf_sysctl_table
 			.mode		= 0644,
 			.proc_handler	= proc_dointvec,
 		},
+		{
+			.procname	= "accept_ra_min_hop_limit",
+			.data		= &ipv6_devconf.accept_ra_min_hop_limit,
+			.maxlen		= sizeof(int),
+			.mode		= 0644,
+			.proc_handler	= proc_dointvec,
+		},
 		{
 			.procname	= "accept_ra_pinfo",
 			.data		= &ipv6_devconf.accept_ra_pinfo,
diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
index 0a05b35a90fc..6e184e02fd3c 100644
--- a/net/ipv6/ndisc.c
+++ b/net/ipv6/ndisc.c
@@ -1225,18 +1225,16 @@ static void ndisc_router_discovery(struct sk_buff *skb)
 
 	if (rt)
 		rt6_set_expires(rt, jiffies + (HZ * lifetime));
-	if (ra_msg->icmph.icmp6_hop_limit) {
-		/* Only set hop_limit on the interface if it is higher than
-		 * the current hop_limit.
-		 */
-		if (in6_dev->cnf.hop_limit < ra_msg->icmph.icmp6_hop_limit) {
+	if (in6_dev->cnf.accept_ra_min_hop_limit < 256 &&
+	    ra_msg->icmph.icmp6_hop_limit) {
+		if (in6_dev->cnf.accept_ra_min_hop_limit <= ra_msg->icmph.icmp6_hop_limit) {
 			in6_dev->cnf.hop_limit = ra_msg->icmph.icmp6_hop_limit;
+			if (rt)
+				dst_metric_set(&rt->dst, RTAX_HOPLIMIT,
+					       ra_msg->icmph.icmp6_hop_limit);
 		} else {
-			ND_PRINTK(2, warn, "RA: Got route advertisement with lower hop_limit than current\n");
+			ND_PRINTK(2, warn, "RA: Got route advertisement with lower hop_limit than minimum\n");
 		}
-		if (rt)
-			dst_metric_set(&rt->dst, RTAX_HOPLIMIT,
-				       ra_msg->icmph.icmp6_hop_limit);
 	}
 
 skip_defrtr:
-- 
cgit v1.2.3


From 1837649fd350bf25e20d05dbdd4449160ad0219b Mon Sep 17 00:00:00 2001
From: Punnaiah Choudary Kalluri <punnaiah.choudary.kalluri@xilinx.com>
Date: Thu, 27 Nov 2014 20:47:33 +0530
Subject: Documentation: devicetree: Add ECC information to synopsys ddr
 controller

Add ECC information to synopsys ddr memory controller.

Signed-off-by: Punnaiah Choudary Kalluri <punnaia@xilinx.com>
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
---
 Documentation/devicetree/bindings/memory-controllers/synopsys.txt | 4 ++++
 1 file changed, 4 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/memory-controllers/synopsys.txt b/Documentation/devicetree/bindings/memory-controllers/synopsys.txt
index f9c6454146b6..a43d26d41e04 100644
--- a/Documentation/devicetree/bindings/memory-controllers/synopsys.txt
+++ b/Documentation/devicetree/bindings/memory-controllers/synopsys.txt
@@ -1,5 +1,9 @@
 Binding for Synopsys IntelliDDR Multi Protocol Memory Controller
 
+This controller has an optional ECC support in half-bus width (16-bit)
+configuration. The ECC controller corrects one bit error and detects
+two bit errors.
+
 Required properties:
  - compatible: Should be 'xlnx,zynq-ddrc-a05'
  - reg: Base address and size of the controllers memory area
-- 
cgit v1.2.3


From bae2c2d421cdea9dd8d62425eef99e389584cdb3 Mon Sep 17 00:00:00 2001
From: Robin Murphy <Robin.Murphy@arm.com>
Date: Wed, 29 Jul 2015 19:46:05 +0100
Subject: iommu/arm-smmu: Sort out coherency

Currently, we detect whether the SMMU has coherent page table walk
capability from the IDR0.CTTW field, and base our cache maintenance
decisions on that. In preparation for fixing the bogus DMA API usage,
however, we need to ensure that the DMA API agrees about this, which
necessitates deferring to the dma-coherent property in the device tree
for the final say.

As an added bonus, since systems exist where an external CTTW signal
has been tied off incorrectly at integration, allowing DT to override
it offers a neat workaround for coherency issues with such SMMUs.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 Documentation/devicetree/bindings/iommu/arm,smmu.txt |  6 ++++++
 drivers/iommu/arm-smmu.c                             | 20 +++++++++++++++++---
 2 files changed, 23 insertions(+), 3 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.txt b/Documentation/devicetree/bindings/iommu/arm,smmu.txt
index 06760503a819..718074501fcb 100644
--- a/Documentation/devicetree/bindings/iommu/arm,smmu.txt
+++ b/Documentation/devicetree/bindings/iommu/arm,smmu.txt
@@ -43,6 +43,12 @@ conditions.
 
 ** System MMU optional properties:
 
+- dma-coherent  : Present if page table walks made by the SMMU are
+                  cache coherent with the CPU.
+
+                  NOTE: this only applies to the SMMU itself, not
+                  masters connected upstream of the SMMU.
+
 - calxeda,smmu-secure-config-access : Enable proper handling of buggy
                   implementations that always use secure access to
                   SMMU configuration registers. In this case non-secure
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 4cd0c29cb585..0583ed2f33c0 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -37,6 +37,7 @@
 #include <linux/iopoll.h>
 #include <linux/module.h>
 #include <linux/of.h>
+#include <linux/of_address.h>
 #include <linux/pci.h>
 #include <linux/platform_device.h>
 #include <linux/slab.h>
@@ -1532,6 +1533,7 @@ static int arm_smmu_device_cfg_probe(struct arm_smmu_device *smmu)
 	unsigned long size;
 	void __iomem *gr0_base = ARM_SMMU_GR0(smmu);
 	u32 id;
+	bool cttw_dt, cttw_reg;
 
 	dev_notice(smmu->dev, "probing hardware configuration...\n");
 	dev_notice(smmu->dev, "SMMUv%d with:\n", smmu->version);
@@ -1571,10 +1573,22 @@ static int arm_smmu_device_cfg_probe(struct arm_smmu_device *smmu)
 		dev_notice(smmu->dev, "\taddress translation ops\n");
 	}
 
-	if (id & ID0_CTTW) {
+	/*
+	 * In order for DMA API calls to work properly, we must defer to what
+	 * the DT says about coherency, regardless of what the hardware claims.
+	 * Fortunately, this also opens up a workaround for systems where the
+	 * ID register value has ended up configured incorrectly.
+	 */
+	cttw_dt = of_dma_is_coherent(smmu->dev->of_node);
+	cttw_reg = !!(id & ID0_CTTW);
+	if (cttw_dt)
 		smmu->features |= ARM_SMMU_FEAT_COHERENT_WALK;
-		dev_notice(smmu->dev, "\tcoherent table walk\n");
-	}
+	if (cttw_dt || cttw_reg)
+		dev_notice(smmu->dev, "\t%scoherent table walk\n",
+			   cttw_dt ? "" : "non-");
+	if (cttw_dt != cttw_reg)
+		dev_notice(smmu->dev,
+			   "\t(IDR0.CTTW overridden by dma-coherent property)\n");
 
 	if (id & ID0_SMS) {
 		u32 smr, sid, mask;
-- 
cgit v1.2.3


From b6c084d7aa8bca21920cbbe13ad58572fa85ece6 Mon Sep 17 00:00:00 2001
From: Will Deacon <will.deacon@arm.com>
Date: Mon, 29 Jun 2015 13:59:01 +0100
Subject: ARM: perf: extend interrupt-affinity property for PPIs

On systems containing multiple, heterogeneous clusters we need a way to
associate a PMU "device" with the CPU(s) on which it exists. For PMUs
that signal overflow with SPIs, this relationship is determined via the
"interrupt-affinity" property, which contains a list of phandles to CPU
nodes for the PMU. For PMUs using PPIs, the per-cpu nature of the
interrupt isn't enough to determine the set of CPUs which actually
contain the device.

This patch allows the interrupt-affinity property to be specified on a
PMU node irrespective of the interrupt type. For PPIs, it identifies
the set of CPUs signalling the PPI in question.

Tested-by: Stephen Boyd <sboyd@codeaurora.org> # Krait PMU
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 Documentation/devicetree/bindings/arm/pmu.txt | 12 +++--
 arch/arm/kernel/perf_event.c                  | 65 ++++++++++++++++++---------
 2 files changed, 53 insertions(+), 24 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/arm/pmu.txt b/Documentation/devicetree/bindings/arm/pmu.txt
index 3b5f5d1088c6..435251fa9ce0 100644
--- a/Documentation/devicetree/bindings/arm/pmu.txt
+++ b/Documentation/devicetree/bindings/arm/pmu.txt
@@ -26,13 +26,19 @@ Required properties:
 
 Optional properties:
 
-- interrupt-affinity : Valid only when using SPIs, specifies a list of phandles
-                       to CPU nodes corresponding directly to the affinity of
+- interrupt-affinity : When using SPIs, specifies a list of phandles to CPU
+                       nodes corresponding directly to the affinity of
 		       the SPIs listed in the interrupts property.
 
-		       This property should be present when there is more than
+                       When using a PPI, specifies a list of phandles to CPU
+		       nodes corresponding to the set of CPUs which have
+		       a PMU of this type signalling the PPI listed in the
+		       interrupts property.
+
+                       This property should be present when there is more than
 		       a single SPI.
 
+
 - qcom,no-pc-write : Indicates that this PMU doesn't support the 0xc and 0xd
                      events.
 
diff --git a/arch/arm/kernel/perf_event.c b/arch/arm/kernel/perf_event.c
index 7d5379c1c443..5a8f17bfcc60 100644
--- a/arch/arm/kernel/perf_event.c
+++ b/arch/arm/kernel/perf_event.c
@@ -790,32 +790,39 @@ static int probe_current_pmu(struct arm_pmu *pmu,
 
 static int of_pmu_irq_cfg(struct arm_pmu *pmu)
 {
-	int i, irq, *irqs;
+	int *irqs, i = 0;
+	bool using_spi = false;
 	struct platform_device *pdev = pmu->plat_device;
 
-	/* Don't bother with PPIs; they're already affine */
-	irq = platform_get_irq(pdev, 0);
-	if (irq >= 0 && irq_is_percpu(irq)) {
-		cpumask_setall(&pmu->supported_cpus);
-		return 0;
-	}
-
 	irqs = kcalloc(pdev->num_resources, sizeof(*irqs), GFP_KERNEL);
 	if (!irqs)
 		return -ENOMEM;
 
-	for (i = 0; i < pdev->num_resources; ++i) {
+	do {
 		struct device_node *dn;
-		int cpu;
+		int cpu, irq;
 
-		dn = of_parse_phandle(pdev->dev.of_node, "interrupt-affinity",
-				      i);
-		if (!dn) {
-			pr_warn("Failed to parse %s/interrupt-affinity[%d]\n",
-				of_node_full_name(pdev->dev.of_node), i);
+		/* See if we have an affinity entry */
+		dn = of_parse_phandle(pdev->dev.of_node, "interrupt-affinity", i);
+		if (!dn)
 			break;
+
+		/* Check the IRQ type and prohibit a mix of PPIs and SPIs */
+		irq = platform_get_irq(pdev, i);
+		if (irq >= 0) {
+			bool spi = !irq_is_percpu(irq);
+
+			if (i > 0 && spi != using_spi) {
+				pr_err("PPI/SPI IRQ type mismatch for %s!\n",
+					dn->name);
+				kfree(irqs);
+				return -EINVAL;
+			}
+
+			using_spi = spi;
 		}
 
+		/* Now look up the logical CPU number */
 		for_each_possible_cpu(cpu)
 			if (arch_find_n_match_cpu_physical_id(dn, cpu, NULL))
 				break;
@@ -824,20 +831,36 @@ static int of_pmu_irq_cfg(struct arm_pmu *pmu)
 			pr_warn("Failed to find logical CPU for %s\n",
 				dn->name);
 			of_node_put(dn);
+			cpumask_setall(&pmu->supported_cpus);
 			break;
 		}
 		of_node_put(dn);
 
-		irqs[i] = cpu;
+		/* For SPIs, we need to track the affinity per IRQ */
+		if (using_spi) {
+			if (i >= pdev->num_resources) {
+				of_node_put(dn);
+				break;
+			}
+
+			irqs[i] = cpu;
+		}
+
+		/* Keep track of the CPUs containing this PMU type */
 		cpumask_set_cpu(cpu, &pmu->supported_cpus);
-	}
+		of_node_put(dn);
+		i++;
+	} while (1);
 
-	if (i == pdev->num_resources) {
+	/* If we didn't manage to parse anything, claim to support all CPUs */
+	if (cpumask_weight(&pmu->supported_cpus) == 0)
+		cpumask_setall(&pmu->supported_cpus);
+
+	/* If we matched up the IRQ affinities, use them to route the SPIs */
+	if (using_spi && i == pdev->num_resources)
 		pmu->irq_affinity = irqs;
-	} else {
+	else
 		kfree(irqs);
-		cpumask_setall(&pmu->supported_cpus);
-	}
 
 	return 0;
 }
-- 
cgit v1.2.3


From 8cd50626823c00ca7472b2f61cb8c0eb9798ddc0 Mon Sep 17 00:00:00 2001
From: Peter Chen <peter.chen@freescale.com>
Date: Fri, 31 Jul 2015 16:36:28 +0800
Subject: Doc: ABI: testing: configfs-usb-gadget-loopback

Fix the name of attribute

Cc: <stable@vger.kernel.org>
Signed-off-by: Peter Chen <peter.chen@freescale.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
---
 Documentation/ABI/testing/configfs-usb-gadget-loopback | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/ABI/testing/configfs-usb-gadget-loopback b/Documentation/ABI/testing/configfs-usb-gadget-loopback
index 9aae5bfb9908..06beefbcf061 100644
--- a/Documentation/ABI/testing/configfs-usb-gadget-loopback
+++ b/Documentation/ABI/testing/configfs-usb-gadget-loopback
@@ -5,4 +5,4 @@ Description:
 		The attributes:
 
 		qlen		- depth of loopback queue
-		bulk_buflen	- buffer length
+		buflen		- buffer length
-- 
cgit v1.2.3


From 4bc58eb16bb2352854b9c664cc36c1c68d2bfbb7 Mon Sep 17 00:00:00 2001
From: Peter Chen <peter.chen@freescale.com>
Date: Fri, 31 Jul 2015 16:36:29 +0800
Subject: Doc: ABI: testing: configfs-usb-gadget-sourcesink

Fix the name of attribute

Cc: <stable@vger.kernel.org>
Signed-off-by: Peter Chen <peter.chen@freescale.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
---
 Documentation/ABI/testing/configfs-usb-gadget-sourcesink | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/ABI/testing/configfs-usb-gadget-sourcesink b/Documentation/ABI/testing/configfs-usb-gadget-sourcesink
index 29477c319f61..bc7ff731aa0c 100644
--- a/Documentation/ABI/testing/configfs-usb-gadget-sourcesink
+++ b/Documentation/ABI/testing/configfs-usb-gadget-sourcesink
@@ -9,4 +9,4 @@ Description:
 		isoc_maxpacket	- 0 - 1023 (fs), 0 - 1024 (hs/ss)
 		isoc_mult	- 0..2 (hs/ss only)
 		isoc_maxburst	- 0..15 (ss only)
-		qlen		- buffer length
+		buflen		- buffer length
-- 
cgit v1.2.3


From f811a38300be3cdb603171aea5ad3fb42b71ca53 Mon Sep 17 00:00:00 2001
From: Peter Chen <peter.chen@freescale.com>
Date: Fri, 31 Jul 2015 16:36:30 +0800
Subject: doc: usb: gadget-testing: using the updated testusb.c

testusb.c at http://www.linux-usb.org/usbtest/ is out of date,
using the one at the kernel source folder.

Cc: <stable@vger.kernel.org>
Signed-off-by: Peter Chen <peter.chen@freescale.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
---
 Documentation/usb/gadget-testing.txt | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/usb/gadget-testing.txt b/Documentation/usb/gadget-testing.txt
index 592678009c15..b24d3ef89166 100644
--- a/Documentation/usb/gadget-testing.txt
+++ b/Documentation/usb/gadget-testing.txt
@@ -237,9 +237,7 @@ Testing the LOOPBACK function
 -----------------------------
 
 device: run the gadget
-host: test-usb
-
-http://www.linux-usb.org/usbtest/testusb.c
+host: test-usb (tools/usb/testusb.c)
 
 8. MASS STORAGE function
 ========================
@@ -586,9 +584,8 @@ Testing the SOURCESINK function
 -------------------------------
 
 device: run the gadget
-host: test-usb
+host: test-usb (tools/usb/testusb.c)
 
-http://www.linux-usb.org/usbtest/testusb.c
 
 16. UAC1 function
 =================
-- 
cgit v1.2.3


From 885fb909dc17f9240ab3e3420f7f135ae82be01f Mon Sep 17 00:00:00 2001
From: Geert Uytterhoeven <geert+renesas@glider.be>
Date: Mon, 16 Mar 2015 15:47:12 +0100
Subject: PM / Domains: Correct unit address in power-controller example

In example 2 of the generic PM domains DT bindings, the unit address of
the device node representing the child power controller doesn't match
its "reg" property. Correct it.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Kevin Hilman <khilman@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 Documentation/devicetree/bindings/power/power_domain.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/power/power_domain.txt b/Documentation/devicetree/bindings/power/power_domain.txt
index 0f8ed3710c66..025b5e7df61c 100644
--- a/Documentation/devicetree/bindings/power/power_domain.txt
+++ b/Documentation/devicetree/bindings/power/power_domain.txt
@@ -48,7 +48,7 @@ Example 2:
 		#power-domain-cells = <1>;
 	};
 
-	child: power-controller@12340000 {
+	child: power-controller@12341000 {
 		compatible = "foo,power-controller";
 		reg = <0x12341000 0x1000>;
 		power-domains = <&parent 0>;
-- 
cgit v1.2.3


From 42240901f7c438636715b9cb6ed93f4441ffc091 Mon Sep 17 00:00:00 2001
From: Tom Herbert <tom@herbertland.com>
Date: Fri, 31 Jul 2015 16:52:12 -0700
Subject: ipv6: Implement different admin modes for automatic flow labels

Change the meaning of net.ipv6.auto_flowlabels to provide a mode for
automatic flow labels generation. There are four modes:

0: flow labels are disabled
1: flow labels are enabled, sockets can opt-out
2: flow labels are allowed, sockets can opt-in
3: flow labels are enabled and enforced, no opt-out for sockets

np->autoflowlabel is initialized according to the sysctl value.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 Documentation/networking/ip-sysctl.txt | 20 ++++++++----
 include/net/ipv6.h                     | 59 ++++++++++++++++++++++++++--------
 net/ipv6/af_inet6.c                    |  3 +-
 net/ipv6/ip6_gre.c                     |  4 +--
 net/ipv6/ip6_tunnel.c                  |  2 +-
 net/ipv6/sysctl_net_ipv6.c             |  7 +++-
 6 files changed, 70 insertions(+), 25 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 00d26d919459..9ac3af3ab739 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -1215,14 +1215,20 @@ flowlabel_consistency - BOOLEAN
 	FALSE: disabled
 	Default: TRUE
 
-auto_flowlabels - BOOLEAN
-	Automatically generate flow labels based based on a flow hash
-	of the packet. This allows intermediate devices, such as routers,
-	to idenfify packet flows for mechanisms like Equal Cost Multipath
+auto_flowlabels - INTEGER
+	Automatically generate flow labels based on a flow hash of the
+	packet. This allows intermediate devices, such as routers, to
+	identify packet flows for mechanisms like Equal Cost Multipath
 	Routing (see RFC 6438).
-	TRUE: enabled
-	FALSE: disabled
-	Default: false
+	0: automatic flow labels are completely disabled
+	1: automatic flow labels are enabled by default, they can be
+	   disabled on a per socket basis using the IPV6_AUTOFLOWLABEL
+	   socket option
+	2: automatic flow labels are allowed, they may be enabled on a
+	   per socket basis using the IPV6_AUTOFLOWLABEL socket option
+	3: automatic flow labels are enabled and enforced, they cannot
+	   be disabled by the socket option
+	Default: 0
 
 flowlabel_state_ranges - BOOLEAN
 	Split the flow label number space into two ranges. 0-0x7FFFF is
diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index 3e334b33ef3a..c02c1c03363a 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -707,36 +707,69 @@ static inline void iph_to_flow_copy_v6addrs(struct flow_keys *flow,
 }
 
 #if IS_ENABLED(CONFIG_IPV6)
+
+/* Sysctl settings for net ipv6.auto_flowlabels */
+#define IP6_AUTO_FLOW_LABEL_OFF		0
+#define IP6_AUTO_FLOW_LABEL_OPTOUT	1
+#define IP6_AUTO_FLOW_LABEL_OPTIN	2
+#define IP6_AUTO_FLOW_LABEL_FORCED	3
+
+#define IP6_AUTO_FLOW_LABEL_MAX		IP6_AUTO_FLOW_LABEL_FORCED
+
+#define IP6_DEFAULT_AUTO_FLOW_LABELS	IP6_AUTO_FLOW_LABEL_OFF
+
 static inline __be32 ip6_make_flowlabel(struct net *net, struct sk_buff *skb,
 					__be32 flowlabel, bool autolabel,
 					struct flowi6 *fl6)
 {
-	if (!flowlabel && (autolabel || net->ipv6.sysctl.auto_flowlabels)) {
-		u32 hash;
+	u32 hash;
 
-		hash = skb_get_hash_flowi6(skb, fl6);
+	if (flowlabel ||
+	    net->ipv6.sysctl.auto_flowlabels == IP6_AUTO_FLOW_LABEL_OFF ||
+	    (!autolabel &&
+	     net->ipv6.sysctl.auto_flowlabels != IP6_AUTO_FLOW_LABEL_FORCED))
+		return flowlabel;
 
-		/* Since this is being sent on the wire obfuscate hash a bit
-		 * to minimize possbility that any useful information to an
-		 * attacker is leaked. Only lower 20 bits are relevant.
-		 */
-		hash ^= hash >> 12;
+	hash = skb_get_hash_flowi6(skb, fl6);
 
-		flowlabel = (__force __be32)hash & IPV6_FLOWLABEL_MASK;
+	/* Since this is being sent on the wire obfuscate hash a bit
+	 * to minimize possbility that any useful information to an
+	 * attacker is leaked. Only lower 20 bits are relevant.
+	 */
+	rol32(hash, 16);
 
-		if (net->ipv6.sysctl.flowlabel_state_ranges)
-			flowlabel |= IPV6_FLOWLABEL_STATELESS_FLAG;
-	}
+	flowlabel = (__force __be32)hash & IPV6_FLOWLABEL_MASK;
+
+	if (net->ipv6.sysctl.flowlabel_state_ranges)
+		flowlabel |= IPV6_FLOWLABEL_STATELESS_FLAG;
 
 	return flowlabel;
 }
+
+static inline int ip6_default_np_autolabel(struct net *net)
+{
+	switch (net->ipv6.sysctl.auto_flowlabels) {
+	case IP6_AUTO_FLOW_LABEL_OFF:
+	case IP6_AUTO_FLOW_LABEL_OPTIN:
+	default:
+		return 0;
+	case IP6_AUTO_FLOW_LABEL_OPTOUT:
+	case IP6_AUTO_FLOW_LABEL_FORCED:
+		return 1;
+	}
+}
 #else
 static inline void ip6_set_txhash(struct sock *sk) { }
 static inline __be32 ip6_make_flowlabel(struct net *net, struct sk_buff *skb,
-					__be32 flowlabel, bool autolabel)
+					__be32 flowlabel, bool autolabel,
+					struct flowi6 *fl6)
 {
 	return flowlabel;
 }
+static inline int ip6_default_np_autolabel(struct net *net)
+{
+	return 0;
+}
 #endif
 
 
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 7bc92ea4ae8f..3f0ae3a7c0b1 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -197,6 +197,7 @@ lookup_protocol:
 	np->mcast_hops	= IPV6_DEFAULT_MCASTHOPS;
 	np->mc_loop	= 1;
 	np->pmtudisc	= IPV6_PMTUDISC_WANT;
+	np->autoflowlabel = ip6_default_np_autolabel(sock_net(sk));
 	sk->sk_ipv6only	= net->ipv6.sysctl.bindv6only;
 
 	/* Init the ipv4 part of the socket since we can have sockets
@@ -767,7 +768,7 @@ static int __net_init inet6_net_init(struct net *net)
 	net->ipv6.sysctl.bindv6only = 0;
 	net->ipv6.sysctl.icmpv6_time = 1*HZ;
 	net->ipv6.sysctl.flowlabel_consistency = 1;
-	net->ipv6.sysctl.auto_flowlabels = 0;
+	net->ipv6.sysctl.auto_flowlabels = IP6_DEFAULT_AUTO_FLOW_LABELS;
 	net->ipv6.sysctl.idgen_retries = 3;
 	net->ipv6.sysctl.idgen_delay = 1 * HZ;
 	net->ipv6.sysctl.flowlabel_state_ranges = 1;
diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index a7d1ca2337a9..34f121812a14 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -728,7 +728,7 @@ static netdev_tx_t ip6gre_xmit2(struct sk_buff *skb,
 	 */
 	ipv6h = ipv6_hdr(skb);
 	ip6_flow_hdr(ipv6h, INET_ECN_encapsulate(0, dsfield),
-		     ip6_make_flowlabel(net, skb, fl6->flowlabel, false, fl6));
+		     ip6_make_flowlabel(net, skb, fl6->flowlabel, true, fl6));
 	ipv6h->hop_limit = tunnel->parms.hop_limit;
 	ipv6h->nexthdr = proto;
 	ipv6h->saddr = fl6->saddr;
@@ -1182,7 +1182,7 @@ static int ip6gre_header(struct sk_buff *skb, struct net_device *dev,
 
 	ip6_flow_hdr(ipv6h, 0,
 		     ip6_make_flowlabel(dev_net(dev), skb,
-					t->fl.u.ip6.flowlabel, false,
+					t->fl.u.ip6.flowlabel, true,
 					&t->fl.u.ip6));
 	ipv6h->hop_limit = t->parms.hop_limit;
 	ipv6h->nexthdr = NEXTHDR_GRE;
diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 54e694c4af0e..b0ab420612bc 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -1095,7 +1095,7 @@ static int ip6_tnl_xmit2(struct sk_buff *skb,
 	skb_reset_network_header(skb);
 	ipv6h = ipv6_hdr(skb);
 	ip6_flow_hdr(ipv6h, INET_ECN_encapsulate(0, dsfield),
-		     ip6_make_flowlabel(net, skb, fl6->flowlabel, false, fl6));
+		     ip6_make_flowlabel(net, skb, fl6->flowlabel, true, fl6));
 	ipv6h->hop_limit = t->parms.hop_limit;
 	ipv6h->nexthdr = proto;
 	ipv6h->saddr = fl6->saddr;
diff --git a/net/ipv6/sysctl_net_ipv6.c b/net/ipv6/sysctl_net_ipv6.c
index db48aebd9c47..45243bbe5253 100644
--- a/net/ipv6/sysctl_net_ipv6.c
+++ b/net/ipv6/sysctl_net_ipv6.c
@@ -17,6 +17,9 @@
 #include <net/inet_frag.h>
 
 static int one = 1;
+static int auto_flowlabels_min;
+static int auto_flowlabels_max = IP6_AUTO_FLOW_LABEL_MAX;
+
 
 static struct ctl_table ipv6_table_template[] = {
 	{
@@ -45,7 +48,9 @@ static struct ctl_table ipv6_table_template[] = {
 		.data		= &init_net.ipv6.sysctl.auto_flowlabels,
 		.maxlen		= sizeof(int),
 		.mode		= 0644,
-		.proc_handler	= proc_dointvec
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &auto_flowlabels_min,
+		.extra2		= &auto_flowlabels_max
 	},
 	{
 		.procname	= "fwmark_reflect",
-- 
cgit v1.2.3


From b56774163f994efce3f5603f35aa4e677c3e725a Mon Sep 17 00:00:00 2001
From: Tom Herbert <tom@herbertland.com>
Date: Fri, 31 Jul 2015 16:52:14 -0700
Subject: ipv6: Enable auto flow labels by default

Initialize auto_flowlabels to one. This enables automatic flow labels,
individual socket may disable them using the IPV6_AUTOFLOWLABEL socket
option.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 Documentation/networking/ip-sysctl.txt | 2 +-
 include/net/ipv6.h                     | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 9ac3af3ab739..56db1efd7189 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -1228,7 +1228,7 @@ auto_flowlabels - INTEGER
 	   per socket basis using the IPV6_AUTOFLOWLABEL socket option
 	3: automatic flow labels are enabled and enforced, they cannot
 	   be disabled by the socket option
-	Default: 0
+	Default: 1
 
 flowlabel_state_ranges - BOOLEAN
 	Split the flow label number space into two ranges. 0-0x7FFFF is
diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index c02c1c03363a..711cca428cc8 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -716,7 +716,7 @@ static inline void iph_to_flow_copy_v6addrs(struct flow_keys *flow,
 
 #define IP6_AUTO_FLOW_LABEL_MAX		IP6_AUTO_FLOW_LABEL_FORCED
 
-#define IP6_DEFAULT_AUTO_FLOW_LABELS	IP6_AUTO_FLOW_LABEL_OFF
+#define IP6_DEFAULT_AUTO_FLOW_LABELS	IP6_AUTO_FLOW_LABEL_OPTOUT
 
 static inline __be32 ip6_make_flowlabel(struct net *net, struct sk_buff *skb,
 					__be32 flowlabel, bool autolabel,
-- 
cgit v1.2.3


From 9b0a18626288ecd62ad09a9c36a5ba318ea30087 Mon Sep 17 00:00:00 2001
From: Javier Martinez Canillas <javier@osg.samsung.com>
Date: Mon, 13 Jul 2015 08:58:51 +0200
Subject: PM / devfreq: event: Remove incorrect property in exynos-ppmu DT
 binding

The exynos-ppmu driver is only a clock consumer and not a clock provider
but its Device Tree binding listed #clock-cells as an optional property.

Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com>
Reviewed-by: Chanwoo Choi <cw00.choi@samsung.com>
Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
---
 Documentation/devicetree/bindings/devfreq/event/exynos-ppmu.txt | 1 -
 1 file changed, 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/devfreq/event/exynos-ppmu.txt b/Documentation/devicetree/bindings/devfreq/event/exynos-ppmu.txt
index b54bf3a2ff57..aed486692880 100644
--- a/Documentation/devicetree/bindings/devfreq/event/exynos-ppmu.txt
+++ b/Documentation/devicetree/bindings/devfreq/event/exynos-ppmu.txt
@@ -17,7 +17,6 @@ Required properties:
 Optional properties:
 - clock-names : the name of clock used by the PPMU, "ppmu"
 - clocks : phandles for clock specified in "clock-names" property
-- #clock-cells: should be 1.
 
 Example1 : PPMU nodes in exynos3250.dtsi are listed below.
 
-- 
cgit v1.2.3


From a622789aeebdeaa52d68bbc3930edbc3e820b54a Mon Sep 17 00:00:00 2001
From: Chanwoo Choi <cw00.choi@samsung.com>
Date: Fri, 24 Jul 2015 13:17:25 +0900
Subject: PM / devfreq: exynos-ppmu: Update documentation to support PPMUv2

This patch updates the documentation to include the information of PPMUv2.
The PPMUv2 is used for Exynos5433 and Exynos7420 to monitor the performance
of each IP in Exynos SoC.

Cc: MyungJoo Ham <myungjoo.ham@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
---
 .../bindings/devfreq/event/exynos-ppmu.txt         | 42 ++++++++++++++++++++--
 1 file changed, 40 insertions(+), 2 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/devfreq/event/exynos-ppmu.txt b/Documentation/devicetree/bindings/devfreq/event/exynos-ppmu.txt
index aed486692880..3e36c1d11386 100644
--- a/Documentation/devicetree/bindings/devfreq/event/exynos-ppmu.txt
+++ b/Documentation/devicetree/bindings/devfreq/event/exynos-ppmu.txt
@@ -11,14 +11,14 @@ to various devfreq devices. The devfreq devices would use the event data when
 derterming the current state of each IP.
 
 Required properties:
-- compatible: Should be "samsung,exynos-ppmu".
+- compatible: Should be "samsung,exynos-ppmu" or "samsung,exynos-ppmu-v2.
 - reg: physical base address of each PPMU and length of memory mapped region.
 
 Optional properties:
 - clock-names : the name of clock used by the PPMU, "ppmu"
 - clocks : phandles for clock specified in "clock-names" property
 
-Example1 : PPMU nodes in exynos3250.dtsi are listed below.
+Example1 : PPMUv1 nodes in exynos3250.dtsi are listed below.
 
 		ppmu_dmc0: ppmu_dmc0@106a0000 {
 			compatible = "samsung,exynos-ppmu";
@@ -107,3 +107,41 @@ Example2 : Events of each PPMU node in exynos3250-rinato.dts are listed below.
 			};
 		};
 	};
+
+Example3 : PPMUv2 nodes in exynos5433.dtsi are listed below.
+
+		ppmu_d0_cpu: ppmu_d0_cpu@10480000 {
+			compatible = "samsung,exynos-ppmu-v2";
+			reg = <0x10480000 0x2000>;
+			status = "disabled";
+		};
+
+		ppmu_d0_general: ppmu_d0_general@10490000 {
+			compatible = "samsung,exynos-ppmu-v2";
+			reg = <0x10490000 0x2000>;
+			status = "disabled";
+		};
+
+		ppmu_d0_rt: ppmu_d0_rt@104a0000 {
+			compatible = "samsung,exynos-ppmu-v2";
+			reg = <0x104a0000 0x2000>;
+			status = "disabled";
+		};
+
+		ppmu_d1_cpu: ppmu_d1_cpu@104b0000 {
+			compatible = "samsung,exynos-ppmu-v2";
+			reg = <0x104b0000 0x2000>;
+			status = "disabled";
+		};
+
+		ppmu_d1_general: ppmu_d1_general@104c0000 {
+			compatible = "samsung,exynos-ppmu-v2";
+			reg = <0x104c0000 0x2000>;
+			status = "disabled";
+		};
+
+		ppmu_d1_rt: ppmu_d1_rt@104d0000 {
+			compatible = "samsung,exynos-ppmu-v2";
+			reg = <0x104d0000 0x2000>;
+			status = "disabled";
+		};
-- 
cgit v1.2.3


From ed2de9f74ecbbf3063d29b2334e7b455d7f35189 Mon Sep 17 00:00:00 2001
From: Will Deacon <will.deacon@arm.com>
Date: Thu, 16 Jul 2015 16:10:06 +0100
Subject: locking/Documentation: Clarify failed cmpxchg() memory ordering
 semantics

A failed cmpxchg does not provide any memory ordering guarantees, a
property that is used to optimise the cmpxchg implementations on Alpha,
PowerPC and arm64.

This patch updates atomic_ops.txt and memory-barriers.txt to reflect
this.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Douglas Hatch <doug.hatch@hp.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Scott J Norton <scott.norton@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <waiman.long@hp.com>
Link: http://lkml.kernel.org/r/20150716151006.GH26390@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 Documentation/atomic_ops.txt      | 4 +++-
 Documentation/memory-barriers.txt | 6 +++---
 2 files changed, 6 insertions(+), 4 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/atomic_ops.txt b/Documentation/atomic_ops.txt
index dab6da3382d9..b19fc34efdb1 100644
--- a/Documentation/atomic_ops.txt
+++ b/Documentation/atomic_ops.txt
@@ -266,7 +266,9 @@ with the given old and new values. Like all atomic_xxx operations,
 atomic_cmpxchg will only satisfy its atomicity semantics as long as all
 other accesses of *v are performed through atomic_xxx operations.
 
-atomic_cmpxchg must provide explicit memory barriers around the operation.
+atomic_cmpxchg must provide explicit memory barriers around the operation,
+although if the comparison fails then no memory ordering guarantees are
+required.
 
 The semantics for atomic_cmpxchg are the same as those defined for 'cas'
 below.
diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 13feb697271f..18fc860df1be 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -2383,9 +2383,7 @@ about the state (old or new) implies an SMP-conditional general memory barrier
 explicit lock operations, described later).  These include:
 
 	xchg();
-	cmpxchg();
 	atomic_xchg();			atomic_long_xchg();
-	atomic_cmpxchg();		atomic_long_cmpxchg();
 	atomic_inc_return();		atomic_long_inc_return();
 	atomic_dec_return();		atomic_long_dec_return();
 	atomic_add_return();		atomic_long_add_return();
@@ -2398,7 +2396,9 @@ explicit lock operations, described later).  These include:
 	test_and_clear_bit();
 	test_and_change_bit();
 
-	/* when succeeds (returns 1) */
+	/* when succeeds */
+	cmpxchg();
+	atomic_cmpxchg();		atomic_long_cmpxchg();
 	atomic_add_unless();		atomic_long_add_unless();
 
 These are used for such things as implementing ACQUIRE-class and RELEASE-class
-- 
cgit v1.2.3


From 412758cb26704e5087ca2976ec3b28fb2bdbfad4 Mon Sep 17 00:00:00 2001
From: Jason Baron <jbaron@akamai.com>
Date: Thu, 30 Jul 2015 03:59:48 +0000
Subject: jump label, locking/static_keys: Update docs

Signed-off-by: Jason Baron <jbaron@akamai.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: benh@kernel.crashing.org
Cc: bp@alien8.de
Cc: davem@davemloft.net
Cc: ddaney@caviumnetworks.com
Cc: heiko.carstens@de.ibm.com
Cc: linux-kernel@vger.kernel.org
Cc: liuj97@gmail.com
Cc: luto@amacapital.net
Cc: michael@ellerman.id.au
Cc: rabin@rab.in
Cc: ralf@linux-mips.org
Cc: rostedt@goodmis.org
Cc: vbabka@suse.cz
Cc: will.deacon@arm.com
Link: http://lkml.kernel.org/r/6b50f2f6423a2244f37f4b1d2d6c211b9dcdf4f8.1438227999.git.jbaron@akamai.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 Documentation/static-keys.txt | 99 +++++++++++++++++++++++--------------------
 include/linux/jump_label.h    | 67 ++++++++++++++++++++---------
 2 files changed, 98 insertions(+), 68 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/static-keys.txt b/Documentation/static-keys.txt
index c4407a41b0fc..f4cb0b2d5cd7 100644
--- a/Documentation/static-keys.txt
+++ b/Documentation/static-keys.txt
@@ -1,7 +1,22 @@
 			Static Keys
 			-----------
 
-By: Jason Baron <jbaron@redhat.com>
+DEPRECATED API:
+
+The use of 'struct static_key' directly, is now DEPRECATED. In addition
+static_key_{true,false}() is also DEPRECATED. IE DO NOT use the following:
+
+struct static_key false = STATIC_KEY_INIT_FALSE;
+struct static_key true = STATIC_KEY_INIT_TRUE;
+static_key_true()
+static_key_false()
+
+The updated API replacements are:
+
+DEFINE_STATIC_KEY_TRUE(key);
+DEFINE_STATIC_KEY_FALSE(key);
+static_key_likely()
+statick_key_unlikely()
 
 0) Abstract
 
@@ -9,22 +24,22 @@ Static keys allows the inclusion of seldom used features in
 performance-sensitive fast-path kernel code, via a GCC feature and a code
 patching technique. A quick example:
 
-	struct static_key key = STATIC_KEY_INIT_FALSE;
+	DEFINE_STATIC_KEY_FALSE(key);
 
 	...
 
-        if (static_key_false(&key))
+        if (static_branch_unlikely(&key))
                 do unlikely code
         else
                 do likely code
 
 	...
-	static_key_slow_inc();
+	static_branch_enable(&key);
 	...
-	static_key_slow_inc();
+	static_branch_disable(&key);
 	...
 
-The static_key_false() branch will be generated into the code with as little
+The static_branch_unlikely() branch will be generated into the code with as little
 impact to the likely code path as possible.
 
 
@@ -56,7 +71,7 @@ the branch site to change the branch direction.
 
 For example, if we have a simple branch that is disabled by default:
 
-	if (static_key_false(&key))
+	if (static_branch_unlikely(&key))
 		printk("I am the true branch\n");
 
 Thus, by default the 'printk' will not be emitted. And the code generated will
@@ -75,68 +90,55 @@ the basis for the static keys facility.
 
 In order to make use of this optimization you must first define a key:
 
-	struct static_key key;
-
-Which is initialized as:
-
-	struct static_key key = STATIC_KEY_INIT_TRUE;
+	DEFINE_STATIC_KEY_TRUE(key);
 
 or:
 
-	struct static_key key = STATIC_KEY_INIT_FALSE;
+	DEFINE_STATIC_KEY_FALSE(key);
+
 
-If the key is not initialized, it is default false. The 'struct static_key',
-must be a 'global'. That is, it can't be allocated on the stack or dynamically
+The key must be global, that is, it can't be allocated on the stack or dynamically
 allocated at run-time.
 
 The key is then used in code as:
 
-        if (static_key_false(&key))
+        if (static_branch_unlikely(&key))
                 do unlikely code
         else
                 do likely code
 
 Or:
 
-        if (static_key_true(&key))
+        if (static_branch_likely(&key))
                 do likely code
         else
                 do unlikely code
 
-A key that is initialized via 'STATIC_KEY_INIT_FALSE', must be used in a
-'static_key_false()' construct. Likewise, a key initialized via
-'STATIC_KEY_INIT_TRUE' must be used in a 'static_key_true()' construct. A
-single key can be used in many branches, but all the branches must match the
-way that the key has been initialized.
+Keys defined via DEFINE_STATIC_KEY_TRUE(), or DEFINE_STATIC_KEY_FALSE, may
+be used in either static_branch_likely() or static_branch_unlikely()
+statemnts.
 
-The branch(es) can then be switched via:
+Branch(es) can be set true via:
 
-	static_key_slow_inc(&key);
-	...
-	static_key_slow_dec(&key);
+static_branch_enable(&key);
 
-Thus, 'static_key_slow_inc()' means 'make the branch true', and
-'static_key_slow_dec()' means 'make the branch false' with appropriate
-reference counting. For example, if the key is initialized true, a
-static_key_slow_dec(), will switch the branch to false. And a subsequent
-static_key_slow_inc(), will change the branch back to true. Likewise, if the
-key is initialized false, a 'static_key_slow_inc()', will change the branch to
-true. And then a 'static_key_slow_dec()', will again make the branch false.
+or false via:
+
+static_branch_disable(&key);
 
-An example usage in the kernel is the implementation of tracepoints:
+The branch(es) can then be switched via reference counts:
 
-        static inline void trace_##name(proto)                          \
-        {                                                               \
-                if (static_key_false(&__tracepoint_##name.key))		\
-                        __DO_TRACE(&__tracepoint_##name,                \
-                                TP_PROTO(data_proto),                   \
-                                TP_ARGS(data_args),                     \
-                                TP_CONDITION(cond));                    \
-        }
+	static_branch_inc(&key);
+	...
+	static_branch_dec(&key);
 
-Tracepoints are disabled by default, and can be placed in performance critical
-pieces of the kernel. Thus, by using a static key, the tracepoints can have
-absolutely minimal impact when not in use.
+Thus, 'static_branch_inc()' means 'make the branch true', and
+'static_branch_dec()' means 'make the branch false' with appropriate
+reference counting. For example, if the key is initialized true, a
+static_branch_dec(), will switch the branch to false. And a subsequent
+static_branch_inc(), will change the branch back to true. Likewise, if the
+key is initialized false, a 'static_branch_inc()', will change the branch to
+true. And then a 'static_branch_dec()', will again make the branch false.
 
 
 4) Architecture level code patching interface, 'jump labels'
@@ -150,9 +152,12 @@ simply fall back to a traditional, load, test, and jump sequence.
 
 * #define JUMP_LABEL_NOP_SIZE, see: arch/x86/include/asm/jump_label.h
 
-* __always_inline bool arch_static_branch(struct static_key *key), see:
+* __always_inline bool arch_static_branch(struct static_key *key, bool branch), see:
 					arch/x86/include/asm/jump_label.h
 
+* __always_inline bool arch_static_branch_jump(struct static_key *key, bool branch),
+					see: arch/x86/include/asm/jump_label.h
+
 * void arch_jump_label_transform(struct jump_entry *entry, enum jump_label_type type),
 					see: arch/x86/kernel/jump_label.c
 
@@ -173,7 +178,7 @@ SYSCALL_DEFINE0(getppid)
 {
         int pid;
 
-+       if (static_key_false(&key))
++       if (static_branch_unlikely(&key))
 +               printk("I am the true branch\n");
 
         rcu_read_lock();
diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h
index e337a1961933..7f653e8f6690 100644
--- a/include/linux/jump_label.h
+++ b/include/linux/jump_label.h
@@ -7,17 +7,52 @@
  * Copyright (C) 2009-2012 Jason Baron <jbaron@redhat.com>
  * Copyright (C) 2011-2012 Peter Zijlstra <pzijlstr@redhat.com>
  *
+ * DEPRECATED API:
+ *
+ * The use of 'struct static_key' directly, is now DEPRECATED. In addition
+ * static_key_{true,false}() is also DEPRECATED. IE DO NOT use the following:
+ *
+ * struct static_key false = STATIC_KEY_INIT_FALSE;
+ * struct static_key true = STATIC_KEY_INIT_TRUE;
+ * static_key_true()
+ * static_key_false()
+ *
+ * The updated API replacements are:
+ *
+ * DEFINE_STATIC_KEY_TRUE(key);
+ * DEFINE_STATIC_KEY_FALSE(key);
+ * static_key_likely()
+ * statick_key_unlikely()
+ *
  * Jump labels provide an interface to generate dynamic branches using
- * self-modifying code. Assuming toolchain and architecture support, the result
- * of a "if (static_key_false(&key))" statement is an unconditional branch (which
- * defaults to false - and the true block is placed out of line).
+ * self-modifying code. Assuming toolchain and architecture support, if we
+ * define a "key" that is initially false via "DEFINE_STATIC_KEY_FALSE(key)",
+ * an "if (static_branch_unlikely(&key))" statement is an unconditional branch
+ * (which defaults to false - and the true block is placed out of line).
+ * Similarly, we can define an initially true key via
+ * "DEFINE_STATIC_KEY_TRUE(key)", and use it in the same
+ * "if (static_branch_unlikely(&key))", in which case we will generate an
+ * unconditional branch to the out-of-line true branch. Keys that are
+ * initially true or false can be using in both static_branch_unlikely()
+ * and static_branch_likely() statements.
  *
- * However at runtime we can change the branch target using
- * static_key_slow_{inc,dec}(). These function as a 'reference' count on the key
- * object, and for as long as there are references all branches referring to
- * that particular key will point to the (out of line) true block.
+ * At runtime we can change the branch target by setting the key
+ * to true via a call to static_branch_enable(), or false using
+ * static_branch_disable(). If the direction of the branch is switched by
+ * these calls then we run-time modify the branch target via a
+ * no-op -> jump or jump -> no-op conversion. For example, for an
+ * initially false key that is used in an "if (static_branch_unlikely(&key))"
+ * statement, setting the key to true requires us to patch in a jump
+ * to the out-of-line of true branch.
  *
- * Since this relies on modifying code, the static_key_slow_{inc,dec}() functions
+ * In addtion to static_branch_{enable,disable}, we can also reference count
+ * the key or branch direction via static_branch_{inc,dec}. Thus,
+ * static_branch_inc() can be thought of as a 'make more true' and
+ * static_branch_dec() as a 'make more false'. The inc()/dec()
+ * interface is meant to be used exclusively from the inc()/dec() for a given
+ * key.
+ *
+ * Since this relies on modifying code, the branch modifying functions
  * must be considered absolute slow paths (machine wide synchronization etc.).
  * OTOH, since the affected branches are unconditional, their runtime overhead
  * will be absolutely minimal, esp. in the default (off) case where the total
@@ -29,20 +64,10 @@
  * cause significant performance degradation. Struct static_key_deferred and
  * static_key_slow_dec_deferred() provide for this.
  *
- * Lacking toolchain and or architecture support, jump labels fall back to a simple
- * conditional branch.
- *
- * struct static_key my_key = STATIC_KEY_INIT_TRUE;
- *
- *   if (static_key_true(&my_key)) {
- *   }
- *
- * will result in the true case being in-line and starts the key with a single
- * reference. Mixing static_key_true() and static_key_false() on the same key is not
- * allowed.
+ * Lacking toolchain and or architecture support, static keys fall back to a
+ * simple conditional branch.
  *
- * Not initializing the key (static data is initialized to 0s anyway) is the
- * same as using STATIC_KEY_INIT_FALSE.
+ * Additional babbling in: Documentation/static-keys.txt
  */
 
 #if defined(CC_HAVE_ASM_GOTO) && defined(CONFIG_JUMP_LABEL)
-- 
cgit v1.2.3


From efdf5aa8f175c18def6407bb2b51643e4da88259 Mon Sep 17 00:00:00 2001
From: Philipp Zabel <p.zabel@pengutronix.de>
Date: Fri, 13 Feb 2015 12:20:49 +0100
Subject: ARM: STi: DT: Move reset controller constants into common location

By popular vote, the DT binding includes for reset controllers are located
in include/dt-bindings/reset/. Move the STi reset constants in there, too,
to avoid confusion.

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Acked-by: Patrice Chotard <patrice.chotard@st.com>
---
 .../bindings/reset/st,sti-picophyreset.txt         |  2 +-
 .../devicetree/bindings/reset/st,sti-powerdown.txt |  4 +-
 .../devicetree/bindings/reset/st,sti-softreset.txt |  4 +-
 arch/arm/boot/dts/stih407-family.dtsi              |  2 +-
 arch/arm/boot/dts/stih415.dtsi                     |  2 +-
 arch/arm/boot/dts/stih416.dtsi                     |  2 +-
 drivers/reset/sti/reset-stih407.c                  |  2 +-
 drivers/reset/sti/reset-stih415.c                  |  2 +-
 drivers/reset/sti/reset-stih416.c                  |  2 +-
 .../dt-bindings/reset-controller/stih407-resets.h  | 61 ----------------------
 .../dt-bindings/reset-controller/stih415-resets.h  | 27 ----------
 .../dt-bindings/reset-controller/stih416-resets.h  | 51 ------------------
 include/dt-bindings/reset/stih407-resets.h         | 61 ++++++++++++++++++++++
 include/dt-bindings/reset/stih415-resets.h         | 27 ++++++++++
 include/dt-bindings/reset/stih416-resets.h         | 51 ++++++++++++++++++
 15 files changed, 150 insertions(+), 150 deletions(-)
 delete mode 100644 include/dt-bindings/reset-controller/stih407-resets.h
 delete mode 100644 include/dt-bindings/reset-controller/stih415-resets.h
 delete mode 100644 include/dt-bindings/reset-controller/stih416-resets.h
 create mode 100644 include/dt-bindings/reset/stih407-resets.h
 create mode 100644 include/dt-bindings/reset/stih415-resets.h
 create mode 100644 include/dt-bindings/reset/stih416-resets.h

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/reset/st,sti-picophyreset.txt b/Documentation/devicetree/bindings/reset/st,sti-picophyreset.txt
index 54ae9f747e45..9ca27761f811 100644
--- a/Documentation/devicetree/bindings/reset/st,sti-picophyreset.txt
+++ b/Documentation/devicetree/bindings/reset/st,sti-picophyreset.txt
@@ -39,4 +39,4 @@ Example:
 	};
 
 Macro definitions for the supported reset channels can be found in:
-include/dt-bindings/reset-controller/stih407-resets.h
+include/dt-bindings/reset/stih407-resets.h
diff --git a/Documentation/devicetree/bindings/reset/st,sti-powerdown.txt b/Documentation/devicetree/bindings/reset/st,sti-powerdown.txt
index 5ab26b7e9d35..1cfd21d1dfa1 100644
--- a/Documentation/devicetree/bindings/reset/st,sti-powerdown.txt
+++ b/Documentation/devicetree/bindings/reset/st,sti-powerdown.txt
@@ -43,5 +43,5 @@ example:
 
 Macro definitions for the supported reset channels can be found in:
 
-include/dt-bindings/reset-controller/stih415-resets.h
-include/dt-bindings/reset-controller/stih416-resets.h
+include/dt-bindings/reset/stih415-resets.h
+include/dt-bindings/reset/stih416-resets.h
diff --git a/Documentation/devicetree/bindings/reset/st,sti-softreset.txt b/Documentation/devicetree/bindings/reset/st,sti-softreset.txt
index a8d3d3c25ca2..891a2fd85ed6 100644
--- a/Documentation/devicetree/bindings/reset/st,sti-softreset.txt
+++ b/Documentation/devicetree/bindings/reset/st,sti-softreset.txt
@@ -42,5 +42,5 @@ example:
 
 Macro definitions for the supported reset channels can be found in:
 
-include/dt-bindings/reset-controller/stih415-resets.h
-include/dt-bindings/reset-controller/stih416-resets.h
+include/dt-bindings/reset/stih415-resets.h
+include/dt-bindings/reset/stih416-resets.h
diff --git a/arch/arm/boot/dts/stih407-family.dtsi b/arch/arm/boot/dts/stih407-family.dtsi
index 838b812cbda1..eab3477e0a0e 100644
--- a/arch/arm/boot/dts/stih407-family.dtsi
+++ b/arch/arm/boot/dts/stih407-family.dtsi
@@ -9,7 +9,7 @@
 #include "stih407-pinctrl.dtsi"
 #include <dt-bindings/mfd/st-lpc.h>
 #include <dt-bindings/phy/phy.h>
-#include <dt-bindings/reset-controller/stih407-resets.h>
+#include <dt-bindings/reset/stih407-resets.h>
 #include <dt-bindings/interrupt-controller/irq-st.h>
 / {
 	#address-cells = <1>;
diff --git a/arch/arm/boot/dts/stih415.dtsi b/arch/arm/boot/dts/stih415.dtsi
index 19b019b5f30e..12427e651e5e 100644
--- a/arch/arm/boot/dts/stih415.dtsi
+++ b/arch/arm/boot/dts/stih415.dtsi
@@ -10,7 +10,7 @@
 #include "stih415-clock.dtsi"
 #include "stih415-pinctrl.dtsi"
 #include <dt-bindings/interrupt-controller/arm-gic.h>
-#include <dt-bindings/reset-controller/stih415-resets.h>
+#include <dt-bindings/reset/stih415-resets.h>
 / {
 
 	L2: cache-controller {
diff --git a/arch/arm/boot/dts/stih416.dtsi b/arch/arm/boot/dts/stih416.dtsi
index 9dca173e694a..9e3170ccd18c 100644
--- a/arch/arm/boot/dts/stih416.dtsi
+++ b/arch/arm/boot/dts/stih416.dtsi
@@ -12,7 +12,7 @@
 
 #include <dt-bindings/phy/phy.h>
 #include <dt-bindings/interrupt-controller/arm-gic.h>
-#include <dt-bindings/reset-controller/stih416-resets.h>
+#include <dt-bindings/reset/stih416-resets.h>
 #include <dt-bindings/interrupt-controller/irq-st.h>
 / {
 	L2: cache-controller {
diff --git a/drivers/reset/sti/reset-stih407.c b/drivers/reset/sti/reset-stih407.c
index d83db5d72d08..f7a6cb093983 100644
--- a/drivers/reset/sti/reset-stih407.c
+++ b/drivers/reset/sti/reset-stih407.c
@@ -11,7 +11,7 @@
 #include <linux/of.h>
 #include <linux/of_platform.h>
 #include <linux/platform_device.h>
-#include <dt-bindings/reset-controller/stih407-resets.h>
+#include <dt-bindings/reset/stih407-resets.h>
 #include "reset-syscfg.h"
 
 /* STiH407 Peripheral powerdown definitions. */
diff --git a/drivers/reset/sti/reset-stih415.c b/drivers/reset/sti/reset-stih415.c
index 8dad603d863c..f7b49d5a62dc 100644
--- a/drivers/reset/sti/reset-stih415.c
+++ b/drivers/reset/sti/reset-stih415.c
@@ -13,7 +13,7 @@
 #include <linux/of_platform.h>
 #include <linux/platform_device.h>
 
-#include <dt-bindings/reset-controller/stih415-resets.h>
+#include <dt-bindings/reset/stih415-resets.h>
 
 #include "reset-syscfg.h"
 
diff --git a/drivers/reset/sti/reset-stih416.c b/drivers/reset/sti/reset-stih416.c
index 79aed70a26c0..d65a82e74004 100644
--- a/drivers/reset/sti/reset-stih416.c
+++ b/drivers/reset/sti/reset-stih416.c
@@ -13,7 +13,7 @@
 #include <linux/of_platform.h>
 #include <linux/platform_device.h>
 
-#include <dt-bindings/reset-controller/stih416-resets.h>
+#include <dt-bindings/reset/stih416-resets.h>
 
 #include "reset-syscfg.h"
 
diff --git a/include/dt-bindings/reset-controller/stih407-resets.h b/include/dt-bindings/reset-controller/stih407-resets.h
deleted file mode 100644
index 02d4328fe479..000000000000
--- a/include/dt-bindings/reset-controller/stih407-resets.h
+++ /dev/null
@@ -1,61 +0,0 @@
-/*
- * This header provides constants for the reset controller
- * based peripheral powerdown requests on the STMicroelectronics
- * STiH407 SoC.
- */
-#ifndef _DT_BINDINGS_RESET_CONTROLLER_STIH407
-#define _DT_BINDINGS_RESET_CONTROLLER_STIH407
-
-/* Powerdown requests control 0 */
-#define STIH407_EMISS_POWERDOWN		0
-#define STIH407_NAND_POWERDOWN		1
-
-/* Synp GMAC PowerDown */
-#define STIH407_ETH1_POWERDOWN		2
-
-/* Powerdown requests control 1 */
-#define STIH407_USB3_POWERDOWN		3
-#define STIH407_USB2_PORT1_POWERDOWN	4
-#define STIH407_USB2_PORT0_POWERDOWN	5
-#define STIH407_PCIE1_POWERDOWN		6
-#define STIH407_PCIE0_POWERDOWN		7
-#define STIH407_SATA1_POWERDOWN		8
-#define STIH407_SATA0_POWERDOWN		9
-
-/* Reset defines */
-#define STIH407_ETH1_SOFTRESET		0
-#define STIH407_MMC1_SOFTRESET		1
-#define STIH407_PICOPHY_SOFTRESET	2
-#define STIH407_IRB_SOFTRESET		3
-#define STIH407_PCIE0_SOFTRESET		4
-#define STIH407_PCIE1_SOFTRESET		5
-#define STIH407_SATA0_SOFTRESET		6
-#define STIH407_SATA1_SOFTRESET		7
-#define STIH407_MIPHY0_SOFTRESET	8
-#define STIH407_MIPHY1_SOFTRESET	9
-#define STIH407_MIPHY2_SOFTRESET	10
-#define STIH407_SATA0_PWR_SOFTRESET	11
-#define STIH407_SATA1_PWR_SOFTRESET	12
-#define STIH407_DELTA_SOFTRESET		13
-#define STIH407_BLITTER_SOFTRESET	14
-#define STIH407_HDTVOUT_SOFTRESET	15
-#define STIH407_HDQVDP_SOFTRESET	16
-#define STIH407_VDP_AUX_SOFTRESET	17
-#define STIH407_COMPO_SOFTRESET		18
-#define STIH407_HDMI_TX_PHY_SOFTRESET	19
-#define STIH407_JPEG_DEC_SOFTRESET	20
-#define STIH407_VP8_DEC_SOFTRESET	21
-#define STIH407_GPU_SOFTRESET		22
-#define STIH407_HVA_SOFTRESET		23
-#define STIH407_ERAM_HVA_SOFTRESET	24
-#define STIH407_LPM_SOFTRESET		25
-#define STIH407_KEYSCAN_SOFTRESET	26
-#define STIH407_USB2_PORT0_SOFTRESET	27
-#define STIH407_USB2_PORT1_SOFTRESET	28
-
-/* Picophy reset defines */
-#define STIH407_PICOPHY0_RESET		0
-#define STIH407_PICOPHY1_RESET		1
-#define STIH407_PICOPHY2_RESET		2
-
-#endif /* _DT_BINDINGS_RESET_CONTROLLER_STIH407 */
diff --git a/include/dt-bindings/reset-controller/stih415-resets.h b/include/dt-bindings/reset-controller/stih415-resets.h
deleted file mode 100644
index c2329fe29cf6..000000000000
--- a/include/dt-bindings/reset-controller/stih415-resets.h
+++ /dev/null
@@ -1,27 +0,0 @@
-/*
- * This header provides constants for the reset controller
- * based peripheral powerdown requests on the STMicroelectronics
- * STiH415 SoC.
- */
-#ifndef _DT_BINDINGS_RESET_CONTROLLER_STIH415
-#define _DT_BINDINGS_RESET_CONTROLLER_STIH415
-
-#define STIH415_EMISS_POWERDOWN		0
-#define STIH415_NAND_POWERDOWN		1
-#define STIH415_KEYSCAN_POWERDOWN	2
-#define STIH415_USB0_POWERDOWN		3
-#define STIH415_USB1_POWERDOWN		4
-#define STIH415_USB2_POWERDOWN		5
-#define STIH415_SATA0_POWERDOWN		6
-#define STIH415_SATA1_POWERDOWN		7
-#define STIH415_PCIE_POWERDOWN		8
-
-#define STIH415_ETH0_SOFTRESET		0
-#define STIH415_ETH1_SOFTRESET		1
-#define STIH415_IRB_SOFTRESET		2
-#define STIH415_USB0_SOFTRESET		3
-#define STIH415_USB1_SOFTRESET		4
-#define STIH415_USB2_SOFTRESET		5
-#define STIH415_KEYSCAN_SOFTRESET	6
-
-#endif /* _DT_BINDINGS_RESET_CONTROLLER_STIH415 */
diff --git a/include/dt-bindings/reset-controller/stih416-resets.h b/include/dt-bindings/reset-controller/stih416-resets.h
deleted file mode 100644
index fcf9af1ac0b2..000000000000
--- a/include/dt-bindings/reset-controller/stih416-resets.h
+++ /dev/null
@@ -1,51 +0,0 @@
-/*
- * This header provides constants for the reset controller
- * based peripheral powerdown requests on the STMicroelectronics
- * STiH416 SoC.
- */
-#ifndef _DT_BINDINGS_RESET_CONTROLLER_STIH416
-#define _DT_BINDINGS_RESET_CONTROLLER_STIH416
-
-#define STIH416_EMISS_POWERDOWN		0
-#define STIH416_NAND_POWERDOWN		1
-#define STIH416_KEYSCAN_POWERDOWN	2
-#define STIH416_USB0_POWERDOWN		3
-#define STIH416_USB1_POWERDOWN		4
-#define STIH416_USB2_POWERDOWN		5
-#define STIH416_USB3_POWERDOWN		6
-#define STIH416_SATA0_POWERDOWN		7
-#define STIH416_SATA1_POWERDOWN		8
-#define STIH416_PCIE0_POWERDOWN		9
-#define STIH416_PCIE1_POWERDOWN		10
-
-#define STIH416_ETH0_SOFTRESET		0
-#define STIH416_ETH1_SOFTRESET		1
-#define STIH416_IRB_SOFTRESET		2
-#define STIH416_USB0_SOFTRESET		3
-#define STIH416_USB1_SOFTRESET		4
-#define STIH416_USB2_SOFTRESET		5
-#define STIH416_USB3_SOFTRESET		6
-#define STIH416_SATA0_SOFTRESET		7
-#define STIH416_SATA1_SOFTRESET		8
-#define STIH416_PCIE0_SOFTRESET		9
-#define STIH416_PCIE1_SOFTRESET		10
-#define STIH416_AUD_DAC_SOFTRESET	11
-#define STIH416_HDTVOUT_SOFTRESET	12
-#define STIH416_VTAC_M_RX_SOFTRESET	13
-#define STIH416_VTAC_A_RX_SOFTRESET	14
-#define STIH416_SYNC_HD_SOFTRESET	15
-#define STIH416_SYNC_SD_SOFTRESET	16
-#define STIH416_BLITTER_SOFTRESET	17
-#define STIH416_GPU_SOFTRESET		18
-#define STIH416_VTAC_M_TX_SOFTRESET	19
-#define STIH416_VTAC_A_TX_SOFTRESET	20
-#define STIH416_VTG_AUX_SOFTRESET	21
-#define STIH416_JPEG_DEC_SOFTRESET	22
-#define STIH416_HVA_SOFTRESET		23
-#define STIH416_COMPO_M_SOFTRESET	24
-#define STIH416_COMPO_A_SOFTRESET	25
-#define STIH416_VP8_DEC_SOFTRESET	26
-#define STIH416_VTG_MAIN_SOFTRESET	27
-#define STIH416_KEYSCAN_SOFTRESET	28
-
-#endif /* _DT_BINDINGS_RESET_CONTROLLER_STIH416 */
diff --git a/include/dt-bindings/reset/stih407-resets.h b/include/dt-bindings/reset/stih407-resets.h
new file mode 100644
index 000000000000..02d4328fe479
--- /dev/null
+++ b/include/dt-bindings/reset/stih407-resets.h
@@ -0,0 +1,61 @@
+/*
+ * This header provides constants for the reset controller
+ * based peripheral powerdown requests on the STMicroelectronics
+ * STiH407 SoC.
+ */
+#ifndef _DT_BINDINGS_RESET_CONTROLLER_STIH407
+#define _DT_BINDINGS_RESET_CONTROLLER_STIH407
+
+/* Powerdown requests control 0 */
+#define STIH407_EMISS_POWERDOWN		0
+#define STIH407_NAND_POWERDOWN		1
+
+/* Synp GMAC PowerDown */
+#define STIH407_ETH1_POWERDOWN		2
+
+/* Powerdown requests control 1 */
+#define STIH407_USB3_POWERDOWN		3
+#define STIH407_USB2_PORT1_POWERDOWN	4
+#define STIH407_USB2_PORT0_POWERDOWN	5
+#define STIH407_PCIE1_POWERDOWN		6
+#define STIH407_PCIE0_POWERDOWN		7
+#define STIH407_SATA1_POWERDOWN		8
+#define STIH407_SATA0_POWERDOWN		9
+
+/* Reset defines */
+#define STIH407_ETH1_SOFTRESET		0
+#define STIH407_MMC1_SOFTRESET		1
+#define STIH407_PICOPHY_SOFTRESET	2
+#define STIH407_IRB_SOFTRESET		3
+#define STIH407_PCIE0_SOFTRESET		4
+#define STIH407_PCIE1_SOFTRESET		5
+#define STIH407_SATA0_SOFTRESET		6
+#define STIH407_SATA1_SOFTRESET		7
+#define STIH407_MIPHY0_SOFTRESET	8
+#define STIH407_MIPHY1_SOFTRESET	9
+#define STIH407_MIPHY2_SOFTRESET	10
+#define STIH407_SATA0_PWR_SOFTRESET	11
+#define STIH407_SATA1_PWR_SOFTRESET	12
+#define STIH407_DELTA_SOFTRESET		13
+#define STIH407_BLITTER_SOFTRESET	14
+#define STIH407_HDTVOUT_SOFTRESET	15
+#define STIH407_HDQVDP_SOFTRESET	16
+#define STIH407_VDP_AUX_SOFTRESET	17
+#define STIH407_COMPO_SOFTRESET		18
+#define STIH407_HDMI_TX_PHY_SOFTRESET	19
+#define STIH407_JPEG_DEC_SOFTRESET	20
+#define STIH407_VP8_DEC_SOFTRESET	21
+#define STIH407_GPU_SOFTRESET		22
+#define STIH407_HVA_SOFTRESET		23
+#define STIH407_ERAM_HVA_SOFTRESET	24
+#define STIH407_LPM_SOFTRESET		25
+#define STIH407_KEYSCAN_SOFTRESET	26
+#define STIH407_USB2_PORT0_SOFTRESET	27
+#define STIH407_USB2_PORT1_SOFTRESET	28
+
+/* Picophy reset defines */
+#define STIH407_PICOPHY0_RESET		0
+#define STIH407_PICOPHY1_RESET		1
+#define STIH407_PICOPHY2_RESET		2
+
+#endif /* _DT_BINDINGS_RESET_CONTROLLER_STIH407 */
diff --git a/include/dt-bindings/reset/stih415-resets.h b/include/dt-bindings/reset/stih415-resets.h
new file mode 100644
index 000000000000..c2329fe29cf6
--- /dev/null
+++ b/include/dt-bindings/reset/stih415-resets.h
@@ -0,0 +1,27 @@
+/*
+ * This header provides constants for the reset controller
+ * based peripheral powerdown requests on the STMicroelectronics
+ * STiH415 SoC.
+ */
+#ifndef _DT_BINDINGS_RESET_CONTROLLER_STIH415
+#define _DT_BINDINGS_RESET_CONTROLLER_STIH415
+
+#define STIH415_EMISS_POWERDOWN		0
+#define STIH415_NAND_POWERDOWN		1
+#define STIH415_KEYSCAN_POWERDOWN	2
+#define STIH415_USB0_POWERDOWN		3
+#define STIH415_USB1_POWERDOWN		4
+#define STIH415_USB2_POWERDOWN		5
+#define STIH415_SATA0_POWERDOWN		6
+#define STIH415_SATA1_POWERDOWN		7
+#define STIH415_PCIE_POWERDOWN		8
+
+#define STIH415_ETH0_SOFTRESET		0
+#define STIH415_ETH1_SOFTRESET		1
+#define STIH415_IRB_SOFTRESET		2
+#define STIH415_USB0_SOFTRESET		3
+#define STIH415_USB1_SOFTRESET		4
+#define STIH415_USB2_SOFTRESET		5
+#define STIH415_KEYSCAN_SOFTRESET	6
+
+#endif /* _DT_BINDINGS_RESET_CONTROLLER_STIH415 */
diff --git a/include/dt-bindings/reset/stih416-resets.h b/include/dt-bindings/reset/stih416-resets.h
new file mode 100644
index 000000000000..fcf9af1ac0b2
--- /dev/null
+++ b/include/dt-bindings/reset/stih416-resets.h
@@ -0,0 +1,51 @@
+/*
+ * This header provides constants for the reset controller
+ * based peripheral powerdown requests on the STMicroelectronics
+ * STiH416 SoC.
+ */
+#ifndef _DT_BINDINGS_RESET_CONTROLLER_STIH416
+#define _DT_BINDINGS_RESET_CONTROLLER_STIH416
+
+#define STIH416_EMISS_POWERDOWN		0
+#define STIH416_NAND_POWERDOWN		1
+#define STIH416_KEYSCAN_POWERDOWN	2
+#define STIH416_USB0_POWERDOWN		3
+#define STIH416_USB1_POWERDOWN		4
+#define STIH416_USB2_POWERDOWN		5
+#define STIH416_USB3_POWERDOWN		6
+#define STIH416_SATA0_POWERDOWN		7
+#define STIH416_SATA1_POWERDOWN		8
+#define STIH416_PCIE0_POWERDOWN		9
+#define STIH416_PCIE1_POWERDOWN		10
+
+#define STIH416_ETH0_SOFTRESET		0
+#define STIH416_ETH1_SOFTRESET		1
+#define STIH416_IRB_SOFTRESET		2
+#define STIH416_USB0_SOFTRESET		3
+#define STIH416_USB1_SOFTRESET		4
+#define STIH416_USB2_SOFTRESET		5
+#define STIH416_USB3_SOFTRESET		6
+#define STIH416_SATA0_SOFTRESET		7
+#define STIH416_SATA1_SOFTRESET		8
+#define STIH416_PCIE0_SOFTRESET		9
+#define STIH416_PCIE1_SOFTRESET		10
+#define STIH416_AUD_DAC_SOFTRESET	11
+#define STIH416_HDTVOUT_SOFTRESET	12
+#define STIH416_VTAC_M_RX_SOFTRESET	13
+#define STIH416_VTAC_A_RX_SOFTRESET	14
+#define STIH416_SYNC_HD_SOFTRESET	15
+#define STIH416_SYNC_SD_SOFTRESET	16
+#define STIH416_BLITTER_SOFTRESET	17
+#define STIH416_GPU_SOFTRESET		18
+#define STIH416_VTAC_M_TX_SOFTRESET	19
+#define STIH416_VTAC_A_TX_SOFTRESET	20
+#define STIH416_VTG_AUX_SOFTRESET	21
+#define STIH416_JPEG_DEC_SOFTRESET	22
+#define STIH416_HVA_SOFTRESET		23
+#define STIH416_COMPO_M_SOFTRESET	24
+#define STIH416_COMPO_A_SOFTRESET	25
+#define STIH416_VP8_DEC_SOFTRESET	26
+#define STIH416_VTG_MAIN_SOFTRESET	27
+#define STIH416_KEYSCAN_SOFTRESET	28
+
+#endif /* _DT_BINDINGS_RESET_CONTROLLER_STIH416 */
-- 
cgit v1.2.3


From 27dc2fb18fe814fab0a5035859c77b279cdaec80 Mon Sep 17 00:00:00 2001
From: Joachim Eastwood <manabian@gmail.com>
Date: Wed, 6 May 2015 00:10:27 +0200
Subject: doc: dt: add documentation for lpc1850-rgu reset driver

Add device tree binding documentation for the Reset Generation Unit
(RGU) found on NXP LPC18xx and LPC43xx devies.

This documentation also includes a table which shows the RGU reset
number and the connected peripheral.

Signed-off-by: Joachim Eastwood <manabian@gmail.com>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
---
 .../devicetree/bindings/reset/nxp,lpc1850-rgu.txt  | 84 ++++++++++++++++++++++
 1 file changed, 84 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/reset/nxp,lpc1850-rgu.txt

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/reset/nxp,lpc1850-rgu.txt b/Documentation/devicetree/bindings/reset/nxp,lpc1850-rgu.txt
new file mode 100644
index 000000000000..b4e96a278445
--- /dev/null
+++ b/Documentation/devicetree/bindings/reset/nxp,lpc1850-rgu.txt
@@ -0,0 +1,84 @@
+NXP LPC1850  Reset Generation Unit (RGU)
+========================================
+
+Please also refer to reset.txt in this directory for common reset
+controller binding usage.
+
+Required properties:
+- compatible: Should be "nxp,lpc1850-rgu"
+- reg: register base and length
+- clocks: phandle and clock specifier to RGU clocks
+- clock-names: should contain "delay" and "reg"
+- #reset-cells: should be 1
+
+See table below for valid peripheral reset numbers. Numbers not
+in the table below are either reserved or not applicable for
+normal operation.
+
+Reset	Peripheral
+  9	System control unit (SCU)
+ 12	ARM Cortex-M0 subsystem core (LPC43xx only)
+ 13	CPU core
+ 16	LCD controller
+ 17	USB0
+ 18	USB1
+ 19	DMA
+ 20	SDIO
+ 21	External memory controller (EMC)
+ 22	Ethernet
+ 25	Flash bank A
+ 27	EEPROM
+ 28	GPIO
+ 29	Flash bank B
+ 32	Timer0
+ 33	Timer1
+ 34	Timer2
+ 35	Timer3
+ 36	Repetitive Interrupt timer (RIT)
+ 37	State Configurable Timer (SCT)
+ 38	Motor control PWM (MCPWM)
+ 39	QEI
+ 40	ADC0
+ 41	ADC1
+ 42	DAC
+ 44	USART0
+ 45	UART1
+ 46	USART2
+ 47	USART3
+ 48	I2C0
+ 49	I2C1
+ 50	SSP0
+ 51	SSP1
+ 52	I2S0 and I2S1
+ 53	Serial Flash Interface (SPIFI)
+ 54	C_CAN1
+ 55	C_CAN0
+ 56	ARM Cortex-M0 application core (LPC4370 only)
+ 57	SGPIO (LPC43xx only)
+ 58	SPI (LPC43xx only)
+ 60	ADCHS (12-bit ADC) (LPC4370 only)
+
+Refer to NXP LPC18xx or LPC43xx user manual for more details about
+the reset signals and the connected block/peripheral.
+
+Reset provider example:
+rgu: reset-controller@40053000 {
+	compatible = "nxp,lpc1850-rgu";
+	reg = <0x40053000 0x1000>;
+	clocks = <&cgu BASE_SAFE_CLK>, <&ccu1 CLK_CPU_BUS>;
+	clock-names = "delay", "reg";
+	#reset-cells = <1>;
+};
+
+Reset consumer example:
+mac: ethernet@40010000 {
+	compatible = "nxp,lpc1850-dwmac", "snps,dwmac-3.611", "snps,dwmac";
+	reg = <0x40010000 0x2000>;
+	interrupts = <5>;
+	interrupt-names = "macirq";
+	clocks = <&ccu1 CLK_CPU_ETHERNET>;
+	clock-names = "stmmaceth";
+	resets = <&rgu 22>;
+	reset-names = "stmmaceth";
+	status = "disabled";
+};
-- 
cgit v1.2.3


From a73622a70a31c846d9d2a971c78fd1ffab88afcd Mon Sep 17 00:00:00 2001
From: Suman Anna <s-anna@ti.com>
Date: Mon, 20 Jul 2015 17:33:23 -0500
Subject: Documentation: dt: Add #iommu-cells info to OMAP iommu bindings

The OMAP IOMMU bindings is updated to reflect the required #iommu-cells
property.

Signed-off-by: Suman Anna <s-anna@ti.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 Documentation/devicetree/bindings/iommu/ti,omap-iommu.txt | 6 ++++++
 1 file changed, 6 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/iommu/ti,omap-iommu.txt b/Documentation/devicetree/bindings/iommu/ti,omap-iommu.txt
index 42531dc387aa..869699925fd5 100644
--- a/Documentation/devicetree/bindings/iommu/ti,omap-iommu.txt
+++ b/Documentation/devicetree/bindings/iommu/ti,omap-iommu.txt
@@ -8,6 +8,11 @@ Required properties:
 - ti,hwmods  : Name of the hwmod associated with the IOMMU instance
 - reg        : Address space for the configuration registers
 - interrupts : Interrupt specifier for the IOMMU instance
+- #iommu-cells : Should be 0. OMAP IOMMUs are all "single-master" devices,
+                 and needs no additional data in the pargs specifier. Please
+                 also refer to the generic bindings document for more info
+                 on this property,
+                     Documentation/devicetree/bindings/iommu/iommu.txt
 
 Optional properties:
 - ti,#tlb-entries : Number of entries in the translation look-aside buffer.
@@ -18,6 +23,7 @@ Optional properties:
 Example:
 	/* OMAP3 ISP MMU */
 	mmu_isp: mmu@480bd400 {
+		#iommu-cells = <0>;
 		compatible = "ti,omap2-iommu";
 		reg = <0x480bd400 0x80>;
 		interrupts = <24>;
-- 
cgit v1.2.3


From 25a0a5ce16ecd7e60c4cf1436892433873e9d99d Mon Sep 17 00:00:00 2001
From: Ni Wade <wni@nvidia.com>
Date: Tue, 14 Jul 2015 15:40:56 +0800
Subject: thermal: add available policies sysfs attribute

The Linux thermal framework support to change thermal governor
policy in userspace, but it can't show what available policies
supported.

This patch adds available_policies attribute to the thermal
framework, it can list the thermal governors which can be
used for a particular zone. This attribute is read only.

Signed-off-by: Wei Ni <wni@nvidia.com>
Reviewed-by: Javi Merino <javi.merino@arm.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
---
 Documentation/thermal/sysfs-api.txt |  6 ++++++
 drivers/thermal/thermal_core.c      | 28 ++++++++++++++++++++++++++++
 2 files changed, 34 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/thermal/sysfs-api.txt b/Documentation/thermal/sysfs-api.txt
index c1f6864a8c5d..10f062ea6bc2 100644
--- a/Documentation/thermal/sysfs-api.txt
+++ b/Documentation/thermal/sysfs-api.txt
@@ -180,6 +180,7 @@ Thermal zone device sys I/F, created once it's registered:
     |---temp:			Current temperature
     |---mode:			Working mode of the thermal zone
     |---policy:			Thermal governor used for this zone
+    |---available_policies:	Available thermal governors for this zone
     |---trip_point_[0-*]_temp:	Trip point temperature
     |---trip_point_[0-*]_type:	Trip point type
     |---trip_point_[0-*]_hyst:	Hysteresis value for this trip point
@@ -256,6 +257,10 @@ policy
 	One of the various thermal governors used for a particular zone.
 	RW, Required
 
+available_policies
+	Available thermal governors which can be used for a particular zone.
+	RO, Required
+
 trip_point_[0-*]_temp
 	The temperature above which trip point will be fired.
 	Unit: millidegree Celsius
@@ -417,6 +422,7 @@ method, the sys I/F structure will be built like this:
     |---temp:			37000
     |---mode:			enabled
     |---policy:			step_wise
+    |---available_policies:	step_wise fair_share
     |---trip_point_0_temp:	100000
     |---trip_point_0_type:	critical
     |---trip_point_1_temp:	80000
diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index 04659bfb888b..c4700950e42e 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -847,6 +847,27 @@ policy_show(struct device *dev, struct device_attribute *devattr, char *buf)
 	return sprintf(buf, "%s\n", tz->governor->name);
 }
 
+static ssize_t
+available_policies_show(struct device *dev, struct device_attribute *devattr,
+			char *buf)
+{
+	struct thermal_governor *pos;
+	ssize_t count = 0;
+	ssize_t size = PAGE_SIZE;
+
+	mutex_lock(&thermal_governor_lock);
+
+	list_for_each_entry(pos, &thermal_governor_list, governor_list) {
+		size = PAGE_SIZE - count;
+		count += scnprintf(buf + count, size, "%s ", pos->name);
+	}
+	count += scnprintf(buf + count, size, "\n");
+
+	mutex_unlock(&thermal_governor_lock);
+
+	return count;
+}
+
 #ifdef CONFIG_THERMAL_EMULATION
 static ssize_t
 emul_temp_store(struct device *dev, struct device_attribute *attr,
@@ -1032,6 +1053,7 @@ static DEVICE_ATTR(temp, 0444, temp_show, NULL);
 static DEVICE_ATTR(mode, 0644, mode_show, mode_store);
 static DEVICE_ATTR(passive, S_IRUGO | S_IWUSR, passive_show, passive_store);
 static DEVICE_ATTR(policy, S_IRUGO | S_IWUSR, policy_show, policy_store);
+static DEVICE_ATTR(available_policies, S_IRUGO, available_policies_show, NULL);
 
 /* sys I/F for cooling device */
 #define to_cooling_device(_dev)	\
@@ -1817,6 +1839,11 @@ struct thermal_zone_device *thermal_zone_device_register(const char *type,
 	if (result)
 		goto unregister;
 
+	/* Create available_policies attribute */
+	result = device_create_file(&tz->device, &dev_attr_available_policies);
+	if (result)
+		goto unregister;
+
 	/* Update 'this' zone's governor information */
 	mutex_lock(&thermal_governor_lock);
 
@@ -1917,6 +1944,7 @@ void thermal_zone_device_unregister(struct thermal_zone_device *tz)
 	if (tz->ops->get_mode)
 		device_remove_file(&tz->device, &dev_attr_mode);
 	device_remove_file(&tz->device, &dev_attr_policy);
+	device_remove_file(&tz->device, &dev_attr_available_policies);
 	remove_trip_attrs(tz);
 	thermal_set_governor(tz, NULL);
 
-- 
cgit v1.2.3


From 62f46669688bf13643b91bbc30f19da1095a5ed6 Mon Sep 17 00:00:00 2001
From: Dirk Behme <dirk.behme@de.bosch.com>
Date: Mon, 3 Aug 2015 13:03:09 -0700
Subject: Input: zforce - make the interrupt GPIO optional

Add support for hardware which uses an I2C Serializer / Deserializer
(SerDes) to communicate with the zFroce touch driver. In this case the
SerDes will be configured as an interrupt controller and the zForce driver
will have no access to poll the GPIO line.

To support this, we add two dedicated new GPIOs in the device tree:
reset-gpios and irq-gpios, with the irq-gpios being optional.

To not break the existing device trees, the index based 'gpios' entries
are still supported, but marked as deprecated.

With this, if the interrupt GPIO is available, either via the old or new
device tree style, the while loop will read and handle the packets as long
as the GPIO indicates that the interrupt is asserted (existing, unchanged
driver behavior).

If the interrupt GPIO isn't available, i.e. not configured via the new
device tree style, we are falling back to one read per ISR invocation
(new behavior to support the SerDes).

Note that the gpiod functions help to handle the optional GPIO:
devm_gpiod_get_index_optional() will return NULL in case the interrupt
GPIO isn't available. And gpiod_get_value_cansleep() does cover this, too,
by returning 0 in this case.

Signed-off-by: Dirk Behme <dirk.behme@de.bosch.com>
Reviewed-by: Heiko Stuebner <heiko.stuebner@bq.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
---
 .../bindings/input/touchscreen/zforce_ts.txt       |  8 +--
 drivers/input/touchscreen/zforce_ts.c              | 63 +++++++++++++++++-----
 2 files changed, 53 insertions(+), 18 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/input/touchscreen/zforce_ts.txt b/Documentation/devicetree/bindings/input/touchscreen/zforce_ts.txt
index 80c37df940a7..e3c27c4fd9c8 100644
--- a/Documentation/devicetree/bindings/input/touchscreen/zforce_ts.txt
+++ b/Documentation/devicetree/bindings/input/touchscreen/zforce_ts.txt
@@ -4,12 +4,12 @@ Required properties:
 - compatible: must be "neonode,zforce"
 - reg: I2C address of the chip
 - interrupts: interrupt to which the chip is connected
-- gpios: gpios the chip is connected to
-  first one is the interrupt gpio and second one the reset gpio
+- reset-gpios: reset gpio the chip is connected to
 - x-size: horizontal resolution of touchscreen
 - y-size: vertical resolution of touchscreen
 
 Optional properties:
+- irq-gpios : interrupt gpio the chip is connected to
 - vdd-supply: Regulator controlling the controller supply
 
 Example:
@@ -23,8 +23,8 @@ Example:
 			interrupts = <2 0>;
 			vdd-supply = <&reg_zforce_vdd>;
 
-			gpios = <&gpio5 6 0>, /* INT */
-				<&gpio5 9 0>; /* RST */
+			reset-gpios = <&gpio5 9 0>; /* RST */
+			irq-gpios = <&gpio5 6 0>; /* IRQ, optional */
 
 			x-size = <800>;
 			y-size = <600>;
diff --git a/drivers/input/touchscreen/zforce_ts.c b/drivers/input/touchscreen/zforce_ts.c
index 0aa934c3540e..781d0f83050a 100644
--- a/drivers/input/touchscreen/zforce_ts.c
+++ b/drivers/input/touchscreen/zforce_ts.c
@@ -510,7 +510,16 @@ static irqreturn_t zforce_irq_thread(int irq, void *dev_id)
 	if (!ts->suspending && device_may_wakeup(&client->dev))
 		pm_stay_awake(&client->dev);
 
-	while (gpiod_get_value_cansleep(ts->gpio_int)) {
+	/*
+	 * Run at least once and exit the loop if
+	 * - the optional interrupt GPIO isn't specified
+	 *   (there is only one packet read per ISR invocation, then)
+	 * or
+	 * - the GPIO isn't active any more
+	 *   (packet read until the level GPIO indicates that there is
+	 *    no IRQ any more)
+	 */
+	do {
 		ret = zforce_read_packet(ts, payload_buffer);
 		if (ret < 0) {
 			dev_err(&client->dev,
@@ -577,7 +586,7 @@ static irqreturn_t zforce_irq_thread(int irq, void *dev_id)
 				payload[RESPONSE_ID]);
 			break;
 		}
-	}
+	} while (gpiod_get_value_cansleep(ts->gpio_int));
 
 	if (!ts->suspending && device_may_wakeup(&client->dev))
 		pm_relax(&client->dev);
@@ -754,18 +763,8 @@ static int zforce_probe(struct i2c_client *client,
 	if (!ts)
 		return -ENOMEM;
 
-	/* INT GPIO */
-	ts->gpio_int = devm_gpiod_get_index(&client->dev, NULL, 0, GPIOD_IN);
-	if (IS_ERR(ts->gpio_int)) {
-		ret = PTR_ERR(ts->gpio_int);
-		dev_err(&client->dev,
-			"failed to request interrupt GPIO: %d\n", ret);
-		return ret;
-	}
-
-	/* RST GPIO */
-	ts->gpio_rst = devm_gpiod_get_index(&client->dev, NULL, 1,
-					    GPIOD_OUT_HIGH);
+	ts->gpio_rst = devm_gpiod_get_optional(&client->dev, "reset",
+					       GPIOD_OUT_HIGH);
 	if (IS_ERR(ts->gpio_rst)) {
 		ret = PTR_ERR(ts->gpio_rst);
 		dev_err(&client->dev,
@@ -773,6 +772,42 @@ static int zforce_probe(struct i2c_client *client,
 		return ret;
 	}
 
+	if (ts->gpio_rst) {
+		ts->gpio_int = devm_gpiod_get_optional(&client->dev, "irq",
+						       GPIOD_IN);
+		if (IS_ERR(ts->gpio_int)) {
+			ret = PTR_ERR(ts->gpio_int);
+			dev_err(&client->dev,
+				"failed to request interrupt GPIO: %d\n", ret);
+			return ret;
+		}
+	} else {
+		/*
+		 * Deprecated GPIO handling for compatibility
+		 * with legacy binding.
+		 */
+
+		/* INT GPIO */
+		ts->gpio_int = devm_gpiod_get_index(&client->dev, NULL, 0,
+						    GPIOD_IN);
+		if (IS_ERR(ts->gpio_int)) {
+			ret = PTR_ERR(ts->gpio_int);
+			dev_err(&client->dev,
+				"failed to request interrupt GPIO: %d\n", ret);
+			return ret;
+		}
+
+		/* RST GPIO */
+		ts->gpio_rst = devm_gpiod_get_index(&client->dev, NULL, 1,
+					    GPIOD_OUT_HIGH);
+		if (IS_ERR(ts->gpio_rst)) {
+			ret = PTR_ERR(ts->gpio_rst);
+			dev_err(&client->dev,
+				"failed to request reset GPIO: %d\n", ret);
+			return ret;
+		}
+	}
+
 	ts->reg_vdd = devm_regulator_get_optional(&client->dev, "vdd");
 	if (IS_ERR(ts->reg_vdd)) {
 		ret = PTR_ERR(ts->reg_vdd);
-- 
cgit v1.2.3


From a2e66ad34c10b27c6e408db50e6472c2aae2b5e8 Mon Sep 17 00:00:00 2001
From: Valentin Rothberg <valentinrothberg@gmail.com>
Date: Wed, 29 Jul 2015 09:08:53 +0200
Subject: Documentation: sysfs-bus-usb: s/CONFIG_PM_RUNTIME/CONFIG_PM/

PM_RUNTIME has been replaced with PM by commit 464ed18ebdb6 ("PM:
Eliminate CONFIG_PM_RUNTIME") hence substitute the reference on
PM_RUNTIME with PM and re-arrange the text to 80 column break.

Signed-off-by: Valentin Rothberg <valentinrothberg@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 Documentation/ABI/testing/sysfs-bus-usb | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/ABI/testing/sysfs-bus-usb b/Documentation/ABI/testing/sysfs-bus-usb
index e935625f5995..864637f25bee 100644
--- a/Documentation/ABI/testing/sysfs-bus-usb
+++ b/Documentation/ABI/testing/sysfs-bus-usb
@@ -118,12 +118,12 @@ What:		/sys/bus/usb/devices/.../power/usb3_hardware_lpm
 Date:		June 2015
 Contact:	Kevin Strasser <kevin.strasser@linux.intel.com>
 Description:
-		If CONFIG_PM_RUNTIME is set and a USB 3.0 lpm-capable device is
-		plugged in to a xHCI host which supports link PM, it will check
-		if U1 and U2 exit latencies have been set in the BOS
-		descriptor; if the check is is passed and the host supports
-		USB3 hardware LPM, USB3 hardware LPM will be enabled for the
-		device and the USB device directory will contain a file named
+		If CONFIG_PM is set and a USB 3.0 lpm-capable device is plugged
+		in to a xHCI host which supports link PM, it will check if U1
+		and U2 exit latencies have been set in the BOS descriptor; if
+		the check is is passed and the host supports USB3 hardware LPM,
+		USB3 hardware LPM will be enabled for the device and the USB
+		device directory will contain a file named
 		power/usb3_hardware_lpm. The file holds a string value (enable
 		or disable) indicating whether or not USB3 hardware LPM is
 		enabled for the device.
-- 
cgit v1.2.3


From 3c7c8468e5d993dfe377a67e41cbb23cda93af9e Mon Sep 17 00:00:00 2001
From: Tomas Winkler <tomas.winkler@intel.com>
Date: Sun, 26 Jul 2015 09:54:20 +0300
Subject: mei: add async event notification ioctls

Add ioctl IOCTL_MEI_NOTIFY_SET for enabling and disabling
async event notification.
Add ioctl IOCTL_MEI_NOTIFY_GET for receiving and acking
an event notification.

Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 Documentation/ioctl/ioctl-number.txt   |  2 +
 Documentation/misc-devices/mei/mei.txt | 45 ++++++++++++++++++++++-
 drivers/misc/mei/main.c                | 67 ++++++++++++++++++++++++++++++++++
 include/uapi/linux/mei.h               | 19 ++++++++++
 4 files changed, 132 insertions(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/ioctl/ioctl-number.txt b/Documentation/ioctl/ioctl-number.txt
index 611c52267d24..141f847c7648 100644
--- a/Documentation/ioctl/ioctl-number.txt
+++ b/Documentation/ioctl/ioctl-number.txt
@@ -124,6 +124,8 @@ Code  Seq#(hex)	Include File		Comments
 'H'	00-7F	linux/hiddev.h		conflict!
 'H'	00-0F	linux/hidraw.h		conflict!
 'H'	01	linux/mei.h		conflict!
+'H'	02	linux/mei.h		conflict!
+'H'	03	linux/mei.h		conflict!
 'H'	00-0F	sound/asound.h		conflict!
 'H'	20-40	sound/asound_fm.h	conflict!
 'H'	80-8F	sound/sfnt_info.h	conflict!
diff --git a/Documentation/misc-devices/mei/mei.txt b/Documentation/misc-devices/mei/mei.txt
index 8d47501bba0a..91c1fa34f48b 100644
--- a/Documentation/misc-devices/mei/mei.txt
+++ b/Documentation/misc-devices/mei/mei.txt
@@ -96,7 +96,7 @@ A code snippet for an application communicating with Intel AMTHI client:
 IOCTL
 =====
 
-The Intel MEI Driver supports the following IOCTL command:
+The Intel MEI Driver supports the following IOCTL commands:
 	IOCTL_MEI_CONNECT_CLIENT	Connect to firmware Feature (client).
 
 	usage:
@@ -125,6 +125,49 @@ The Intel MEI Driver supports the following IOCTL command:
         data that can be sent or received. (e.g. if MTU=2K, can send
         requests up to bytes 2k and received responses up to 2k bytes).
 
+	IOCTL_MEI_NOTIFY_SET: enable or disable event notifications
+
+	Usage:
+		uint32_t enable;
+		ioctl(fd, IOCTL_MEI_NOTIFY_SET, &enable);
+
+	Inputs:
+		uint32_t enable = 1;
+		or
+		uint32_t enable[disable] = 0;
+
+	Error returns:
+		EINVAL	Wrong IOCTL Number
+		ENODEV	Device  is not initialized or the client not connected
+		ENOMEM	Unable to allocate memory to client internal data.
+		EFAULT	Fatal Error (e.g. Unable to access user input data)
+		EOPNOTSUPP if the device doesn't support the feature
+
+	Notes:
+	The client must be connected in order to enable notification events
+
+
+	IOCTL_MEI_NOTIFY_GET : retrieve event
+
+	Usage:
+		uint32_t event;
+		ioctl(fd, IOCTL_MEI_NOTIFY_GET, &event);
+
+	Outputs:
+		1 - if an event is pending
+		0 - if there is no even pending
+
+	Error returns:
+		EINVAL	Wrong IOCTL Number
+		ENODEV	Device is not initialized or the client not connected
+		ENOMEM	Unable to allocate memory to client internal data.
+		EFAULT	Fatal Error (e.g. Unable to access user input data)
+		EOPNOTSUPP if the device doesn't support the feature
+
+	Notes:
+	The client must be connected and event notification has to be enabled
+	in order to receive an event
+
 
 Intel ME Applications
 =====================
diff --git a/drivers/misc/mei/main.c b/drivers/misc/mei/main.c
index e9513d651cd3..ffa70035af29 100644
--- a/drivers/misc/mei/main.c
+++ b/drivers/misc/mei/main.c
@@ -445,6 +445,45 @@ end:
 	return rets;
 }
 
+/**
+ * mei_ioctl_client_notify_request -
+ *     propagate event notification request to client
+ *
+ * @file: pointer to file structure
+ * @request: 0 - disable, 1 - enable
+ *
+ * Return: 0 on success , <0 on error
+ */
+static int mei_ioctl_client_notify_request(struct file *file, u32 request)
+{
+	struct mei_cl *cl = file->private_data;
+
+	return mei_cl_notify_request(cl, file, request);
+}
+
+/**
+ * mei_ioctl_client_notify_get -  wait for notification request
+ *
+ * @file: pointer to file structure
+ * @notify_get: 0 - disable, 1 - enable
+ *
+ * Return: 0 on success , <0 on error
+ */
+static int mei_ioctl_client_notify_get(struct file *file, u32 *notify_get)
+{
+	struct mei_cl *cl = file->private_data;
+	bool notify_ev;
+	bool block = (file->f_flags & O_NONBLOCK) == 0;
+	int rets;
+
+	rets = mei_cl_notify_get(cl, block, &notify_ev);
+	if (rets)
+		return rets;
+
+	*notify_get = notify_ev ? 1 : 0;
+	return 0;
+}
+
 /**
  * mei_ioctl - the IOCTL function
  *
@@ -459,6 +498,7 @@ static long mei_ioctl(struct file *file, unsigned int cmd, unsigned long data)
 	struct mei_device *dev;
 	struct mei_cl *cl = file->private_data;
 	struct mei_connect_client_data connect_data;
+	u32 notify_get, notify_req;
 	int rets;
 
 
@@ -499,6 +539,33 @@ static long mei_ioctl(struct file *file, unsigned int cmd, unsigned long data)
 
 		break;
 
+	case IOCTL_MEI_NOTIFY_SET:
+		dev_dbg(dev->dev, ": IOCTL_MEI_NOTIFY_SET.\n");
+		if (copy_from_user(&notify_req,
+				   (char __user *)data, sizeof(notify_req))) {
+			dev_dbg(dev->dev, "failed to copy data from userland\n");
+			rets = -EFAULT;
+			goto out;
+		}
+		rets = mei_ioctl_client_notify_request(file, notify_req);
+		break;
+
+	case IOCTL_MEI_NOTIFY_GET:
+		dev_dbg(dev->dev, ": IOCTL_MEI_NOTIFY_GET.\n");
+		rets = mei_ioctl_client_notify_get(file, &notify_get);
+		if (rets)
+			goto out;
+
+		dev_dbg(dev->dev, "copy connect data to user\n");
+		if (copy_to_user((char __user *)data,
+				&notify_get, sizeof(notify_get))) {
+			dev_dbg(dev->dev, "failed to copy data to userland\n");
+			rets = -EFAULT;
+			goto out;
+
+		}
+		break;
+
 	default:
 		dev_err(dev->dev, ": unsupported ioctl %d.\n", cmd);
 		rets = -ENOIOCTLCMD;
diff --git a/include/uapi/linux/mei.h b/include/uapi/linux/mei.h
index bc0d8b69c49e..7c3b64f6a215 100644
--- a/include/uapi/linux/mei.h
+++ b/include/uapi/linux/mei.h
@@ -107,4 +107,23 @@ struct mei_connect_client_data {
 	};
 };
 
+/**
+ * DOC: set and unset event notification for a connected client
+ *
+ * The IOCTL argument is 1 for enabling event notification and 0 for
+ * disabling the service
+ * Return:  -EOPNOTSUPP if the devices doesn't support the feature
+ */
+#define IOCTL_MEI_NOTIFY_SET _IOW('H', 0x02, __u32)
+
+/**
+ * DOC: retrieve notification
+ *
+ * The IOCTL output argument is 1 if an event was is pending and 0 otherwise
+ * the ioctl has to be called in order to acknowledge pending event
+ *
+ * Return:  -EOPNOTSUPP if the devices doesn't support the feature
+ */
+#define IOCTL_MEI_NOTIFY_GET _IOR('H', 0x03, __u32)
+
 #endif /* _LINUX_MEI_H  */
-- 
cgit v1.2.3


From f25b5edac7be7d405965fb68c6b6f65d8852d860 Mon Sep 17 00:00:00 2001
From: Alban Bedel <albeu@free.fr>
Date: Mon, 3 Aug 2015 19:23:51 +0200
Subject: devicetree: Add bindings for the ATH79 reset controller

Signed-off-by: Alban Bedel <albeu@free.fr>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
---
 .../devicetree/bindings/reset/ath79-reset.txt        | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/reset/ath79-reset.txt

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/reset/ath79-reset.txt b/Documentation/devicetree/bindings/reset/ath79-reset.txt
new file mode 100644
index 000000000000..4c56330bf398
--- /dev/null
+++ b/Documentation/devicetree/bindings/reset/ath79-reset.txt
@@ -0,0 +1,20 @@
+Binding for Qualcomm Atheros AR7xxx/AR9XXX reset controller
+
+Please also refer to reset.txt in this directory for common reset
+controller binding usage.
+
+Required Properties:
+- compatible: has to be "qca,<soctype>-reset", "qca,ar7100-reset"
+              as fallback
+- reg: Base address and size of the controllers memory area
+- #reset-cells : Specifies the number of cells needed to encode reset
+                 line, should be 1
+
+Example:
+
+	reset-controller@1806001c {
+		compatible = "qca,ar9132-reset", "qca,ar7100-reset";
+		reg = <0x1806001c 0x4>;
+
+		#reset-cells = <1>;
+	};
-- 
cgit v1.2.3


From f6e45c24f401f3d0e648bfba304c83b64d763559 Mon Sep 17 00:00:00 2001
From: Stephan Mueller <smueller@chronox.de>
Date: Mon, 3 Aug 2015 09:08:05 +0200
Subject: crypto: doc - AEAD API conversion

The AEAD API changes are now reflected in the crypto API doc book.

Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---
 Documentation/DocBook/crypto-API.tmpl |  4 ++--
 include/crypto/aead.h                 | 24 ++++++++++++++++++++++++
 2 files changed, 26 insertions(+), 2 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/DocBook/crypto-API.tmpl b/Documentation/DocBook/crypto-API.tmpl
index 0992531ffefb..8e17a41df4c3 100644
--- a/Documentation/DocBook/crypto-API.tmpl
+++ b/Documentation/DocBook/crypto-API.tmpl
@@ -585,7 +585,7 @@ kernel crypto API                                |   IPSEC Layer
 +-----------+                                    |
 |           |            (1)
 |   aead    | <-----------------------------------  esp_output
-| (seqniv)  | ---+
+|  (seqiv)  | ---+
 +-----------+    |
                  | (2)
 +-----------+    |
@@ -1687,7 +1687,7 @@ read(opfd, out, outlen);
 !Pinclude/linux/crypto.h Block Cipher Algorithm Definitions
 !Finclude/linux/crypto.h crypto_alg
 !Finclude/linux/crypto.h ablkcipher_alg
-!Finclude/linux/crypto.h aead_alg
+!Finclude/crypto/aead.h aead_alg
 !Finclude/linux/crypto.h blkcipher_alg
 !Finclude/linux/crypto.h cipher_alg
 !Finclude/crypto/rng.h rng_alg
diff --git a/include/crypto/aead.h b/include/crypto/aead.h
index 7169ad04acc0..14e35364cdfa 100644
--- a/include/crypto/aead.h
+++ b/include/crypto/aead.h
@@ -45,6 +45,30 @@
  * a breach in the integrity of the message. In essence, that -EBADMSG error
  * code is the key bonus an AEAD cipher has over "standard" block chaining
  * modes.
+ *
+ * Memory Structure:
+ *
+ * To support the needs of the most prominent user of AEAD ciphers, namely
+ * IPSEC, the AEAD ciphers have a special memory layout the caller must adhere
+ * to.
+ *
+ * The scatter list pointing to the input data must contain:
+ *
+ * * for RFC4106 ciphers, the concatenation of
+ * associated authentication data || IV || plaintext or ciphertext. Note, the
+ * same IV (buffer) is also set with the aead_request_set_crypt call. Note,
+ * the API call of aead_request_set_ad must provide the length of the AAD and
+ * the IV. The API call of aead_request_set_crypt only points to the size of
+ * the input plaintext or ciphertext.
+ *
+ * * for "normal" AEAD ciphers, the concatenation of
+ * associated authentication data || plaintext or ciphertext.
+ *
+ * It is important to note that if multiple scatter gather list entries form
+ * the input data mentioned above, the first entry must not point to a NULL
+ * buffer. If there is any potential where the AAD buffer can be NULL, the
+ * calling code must contain a precaution to ensure that this does not result
+ * in the first scatter gather list entry pointing to a NULL buffer.
  */
 
 /**
-- 
cgit v1.2.3


From 575c215c00140e83b32ca6c01443b8a458e55c73 Mon Sep 17 00:00:00 2001
From: Moritz Fischer <moritz.fischer@ettus.com>
Date: Thu, 30 Jul 2015 18:13:54 -0700
Subject: docs: dts: Added documentation for Xilinx Zynq Reset Controller
 bindings.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Signed-off-by: Moritz Fischer <moritz.fischer@ettus.com>
Reviewed-by: Michal Simek <michal.simek@xilinx.com>
Acked-by: Sören Brinkmann <soren.brinkmann@xilinx.com>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
---
 .../devicetree/bindings/reset/zynq-reset.txt       | 68 ++++++++++++++++++++++
 1 file changed, 68 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/reset/zynq-reset.txt

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/reset/zynq-reset.txt b/Documentation/devicetree/bindings/reset/zynq-reset.txt
new file mode 100644
index 000000000000..5860120e3064
--- /dev/null
+++ b/Documentation/devicetree/bindings/reset/zynq-reset.txt
@@ -0,0 +1,68 @@
+Xilinx Zynq Reset Manager
+
+The Zynq AP-SoC has several different resets.
+
+See Chapter 26 of the Zynq TRM (UG585) for more information about Zynq resets.
+
+Required properties:
+- compatible: "xlnx,zynq-reset"
+- reg: SLCR offset and size taken via syscon <0x200 0x48>
+- syscon: <&slcr>
+  This should be a phandle to the Zynq's SLCR registers.
+- #reset-cells: Must be 1
+
+The Zynq Reset Manager needs to be a childnode of the SLCR.
+
+Example:
+	rstc: rstc@200 {
+		compatible = "xlnx,zynq-reset";
+		reg = <0x200 0x48>;
+		#reset-cells = <1>;
+		syscon = <&slcr>;
+	};
+
+Reset outputs:
+ 0  : soft reset
+ 32 : ddr reset
+ 64 : topsw reset
+ 96 : dmac reset
+ 128: usb0 reset
+ 129: usb1 reset
+ 160: gem0 reset
+ 161: gem1 reset
+ 164: gem0 rx reset
+ 165: gem1 rx reset
+ 166: gem0 ref reset
+ 167: gem1 ref reset
+ 192: sdio0 reset
+ 193: sdio1 reset
+ 196: sdio0 ref reset
+ 197: sdio1 ref reset
+ 224: spi0 reset
+ 225: spi1 reset
+ 226: spi0 ref reset
+ 227: spi1 ref reset
+ 256: can0 reset
+ 257: can1 reset
+ 258: can0 ref reset
+ 259: can1 ref reset
+ 288: i2c0 reset
+ 289: i2c1 reset
+ 320: uart0 reset
+ 321: uart1 reset
+ 322: uart0 ref reset
+ 323: uart1 ref reset
+ 352: gpio reset
+ 384: lqspi reset
+ 385: qspi ref reset
+ 416: smc reset
+ 417: smc ref reset
+ 448: ocm reset
+ 512: fpga0 out reset
+ 513: fpga1 out reset
+ 514: fpga2 out reset
+ 515: fpga3 out reset
+ 544: a9 reset 0
+ 545: a9 reset 1
+ 552: peri reset
+
-- 
cgit v1.2.3


From 12d560f4ea87030667438a169912380be00cea4b Mon Sep 17 00:00:00 2001
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Date: Tue, 14 Jul 2015 18:35:23 -0700
Subject: rcu,locking: Privatize smp_mb__after_unlock_lock()

RCU is the only thing that uses smp_mb__after_unlock_lock(), and is
likely the only thing that ever will use it, so this commit makes this
macro private to RCU.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org>
---
 Documentation/memory-barriers.txt   | 71 +++----------------------------------
 arch/powerpc/include/asm/spinlock.h |  2 --
 include/linux/spinlock.h            | 10 ------
 kernel/rcu/tree.h                   | 12 +++++++
 4 files changed, 16 insertions(+), 79 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 318523872db5..eafa6a53f72c 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -1854,16 +1854,10 @@ RELEASE are to the same lock variable, but only from the perspective of
 another CPU not holding that lock.  In short, a ACQUIRE followed by an
 RELEASE may -not- be assumed to be a full memory barrier.
 
-Similarly, the reverse case of a RELEASE followed by an ACQUIRE does not
-imply a full memory barrier.  If it is necessary for a RELEASE-ACQUIRE
-pair to produce a full barrier, the ACQUIRE can be followed by an
-smp_mb__after_unlock_lock() invocation.  This will produce a full barrier
-(including transitivity) if either (a) the RELEASE and the ACQUIRE are
-executed by the same CPU or task, or (b) the RELEASE and ACQUIRE act on
-the same variable.  The smp_mb__after_unlock_lock() primitive is free
-on many architectures.  Without smp_mb__after_unlock_lock(), the CPU's
-execution of the critical sections corresponding to the RELEASE and the
-ACQUIRE can cross, so that:
+Similarly, the reverse case of a RELEASE followed by an ACQUIRE does
+not imply a full memory barrier.  Therefore, the CPU's execution of the
+critical sections corresponding to the RELEASE and the ACQUIRE can cross,
+so that:
 
 	*A = a;
 	RELEASE M
@@ -1901,29 +1895,6 @@ the RELEASE would simply complete, thereby avoiding the deadlock.
 	a sleep-unlock race, but the locking primitive needs to resolve
 	such races properly in any case.
 
-With smp_mb__after_unlock_lock(), the two critical sections cannot overlap.
-For example, with the following code, the store to *A will always be
-seen by other CPUs before the store to *B:
-
-	*A = a;
-	RELEASE M
-	ACQUIRE N
-	smp_mb__after_unlock_lock();
-	*B = b;
-
-The operations will always occur in one of the following orders:
-
-	STORE *A, RELEASE, ACQUIRE, smp_mb__after_unlock_lock(), STORE *B
-	STORE *A, ACQUIRE, RELEASE, smp_mb__after_unlock_lock(), STORE *B
-	ACQUIRE, STORE *A, RELEASE, smp_mb__after_unlock_lock(), STORE *B
-
-If the RELEASE and ACQUIRE were instead both operating on the same lock
-variable, only the first of these alternatives can occur.  In addition,
-the more strongly ordered systems may rule out some of the above orders.
-But in any case, as noted earlier, the smp_mb__after_unlock_lock()
-ensures that the store to *A will always be seen as happening before
-the store to *B.
-
 Locks and semaphores may not provide any guarantee of ordering on UP compiled
 systems, and so cannot be counted on in such a situation to actually achieve
 anything at all - especially with respect to I/O accesses - unless combined
@@ -2154,40 +2125,6 @@ But it won't see any of:
 	*E, *F or *G following RELEASE Q
 
 
-However, if the following occurs:
-
-	CPU 1				CPU 2
-	===============================	===============================
-	WRITE_ONCE(*A, a);
-	ACQUIRE M		     [1]
-	WRITE_ONCE(*B, b);
-	WRITE_ONCE(*C, c);
-	RELEASE M	     [1]
-	WRITE_ONCE(*D, d);		WRITE_ONCE(*E, e);
-					ACQUIRE M		     [2]
-					smp_mb__after_unlock_lock();
-					WRITE_ONCE(*F, f);
-					WRITE_ONCE(*G, g);
-					RELEASE M	     [2]
-					WRITE_ONCE(*H, h);
-
-CPU 3 might see:
-
-	*E, ACQUIRE M [1], *C, *B, *A, RELEASE M [1],
-		ACQUIRE M [2], *H, *F, *G, RELEASE M [2], *D
-
-But assuming CPU 1 gets the lock first, CPU 3 won't see any of:
-
-	*B, *C, *D, *F, *G or *H preceding ACQUIRE M [1]
-	*A, *B or *C following RELEASE M [1]
-	*F, *G or *H preceding ACQUIRE M [2]
-	*A, *B, *C, *E, *F or *G following RELEASE M [2]
-
-Note that the smp_mb__after_unlock_lock() is critically important
-here: Without it CPU 3 might see some of the above orderings.
-Without smp_mb__after_unlock_lock(), the accesses are not guaranteed
-to be seen in order unless CPU 3 holds lock M.
-
 
 ACQUIRES VS I/O ACCESSES
 ------------------------
diff --git a/arch/powerpc/include/asm/spinlock.h b/arch/powerpc/include/asm/spinlock.h
index 4dbe072eecbe..523673d7583c 100644
--- a/arch/powerpc/include/asm/spinlock.h
+++ b/arch/powerpc/include/asm/spinlock.h
@@ -28,8 +28,6 @@
 #include <asm/synch.h>
 #include <asm/ppc-opcode.h>
 
-#define smp_mb__after_unlock_lock()	smp_mb()  /* Full ordering for lock. */
-
 #ifdef CONFIG_PPC64
 /* use 0x800000yy when locked, where yy == CPU number */
 #ifdef __BIG_ENDIAN__
diff --git a/include/linux/spinlock.h b/include/linux/spinlock.h
index 0063b24b4f36..16c5ed5a627c 100644
--- a/include/linux/spinlock.h
+++ b/include/linux/spinlock.h
@@ -130,16 +130,6 @@ do {								\
 #define smp_mb__before_spinlock()	smp_wmb()
 #endif
 
-/*
- * Place this after a lock-acquisition primitive to guarantee that
- * an UNLOCK+LOCK pair act as a full barrier.  This guarantee applies
- * if the UNLOCK and LOCK are executed by the same CPU or if the
- * UNLOCK and LOCK operate on the same lock variable.
- */
-#ifndef smp_mb__after_unlock_lock
-#define smp_mb__after_unlock_lock()	do { } while (0)
-#endif
-
 /**
  * raw_spin_unlock_wait - wait until the spinlock gets unlocked
  * @lock: the spinlock in question.
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 0412030ca882..2e991f8361e4 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -653,3 +653,15 @@ static inline void rcu_nocb_q_lengths(struct rcu_data *rdp, long *ql, long *qll)
 #endif /* #else #ifdef CONFIG_RCU_NOCB_CPU */
 }
 #endif /* #ifdef CONFIG_RCU_TRACE */
+
+/*
+ * Place this after a lock-acquisition primitive to guarantee that
+ * an UNLOCK+LOCK pair act as a full barrier.  This guarantee applies
+ * if the UNLOCK and LOCK are executed by the same CPU or if the
+ * UNLOCK and LOCK operate on the same lock variable.
+ */
+#ifdef CONFIG_PPC
+#define smp_mb__after_unlock_lock()	smp_mb()  /* Full ordering for lock. */
+#else /* #ifdef CONFIG_PPC */
+#define smp_mb__after_unlock_lock()	do { } while (0)
+#endif /* #else #ifdef CONFIG_PPC */
-- 
cgit v1.2.3


From 6abc8ca19df0078de17dc38340db3002ed489ce7 Mon Sep 17 00:00:00 2001
From: Tejun Heo <tj@kernel.org>
Date: Tue, 4 Aug 2015 15:20:55 -0400
Subject: cgroup: define controller file conventions

Traditionally, each cgroup controller implemented whatever interface
it wanted leading to interfaces which are widely inconsistent.
Examining the requirements of the controllers readily yield that there
are only a few control schemes shared among all.

Two major controllers already had to implement new interface for the
unified hierarchy due to significant structural changes.  Let's take
the chance to establish common conventions throughout all controllers.

This patch defines CGROUP_WEIGHT_MIN/DFL/MAX to be used on all weight
based control knobs and documents the conventions that controllers
should follow on the unified hierarchy.  Except for io.weight knob,
all existing unified hierarchy knobs are already compliant.  A
follow-up patch will update io.weight.

v2: Added descriptions of min, low and high knobs.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
---
 Documentation/cgroups/unified-hierarchy.txt | 80 ++++++++++++++++++++++++++---
 include/linux/cgroup.h                      |  9 ++++
 2 files changed, 81 insertions(+), 8 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/cgroups/unified-hierarchy.txt b/Documentation/cgroups/unified-hierarchy.txt
index 86847a7647ab..1ee9caf29e57 100644
--- a/Documentation/cgroups/unified-hierarchy.txt
+++ b/Documentation/cgroups/unified-hierarchy.txt
@@ -23,10 +23,13 @@ CONTENTS
 5. Other Changes
   5-1. [Un]populated Notification
   5-2. Other Core Changes
-  5-3. Per-Controller Changes
-    5-3-1. blkio
-    5-3-2. cpuset
-    5-3-3. memory
+  5-3. Controller File Conventions
+    5-3-1. Format
+    5-3-2. Control Knobs
+  5-4. Per-Controller Changes
+    5-4-1. blkio
+    5-4-2. cpuset
+    5-4-3. memory
 6. Planned Changes
   6-1. CAP for resource control
 
@@ -372,14 +375,75 @@ supported and the interface files "release_agent" and
 - The "cgroup.clone_children" file is removed.
 
 
-5-3. Per-Controller Changes
+5-3. Controller File Conventions
 
-5-3-1. blkio
+5-3-1. Format
+
+In general, all controller files should be in one of the following
+formats whenever possible.
+
+- Values only files
+
+  VAL0 VAL1...\n
+
+- Flat keyed files
+
+  KEY0 VAL0\n
+  KEY1 VAL1\n
+  ...
+
+- Nested keyed files
+
+  KEY0 SUB_KEY0=VAL00 SUB_KEY1=VAL01...
+  KEY1 SUB_KEY0=VAL10 SUB_KEY1=VAL11...
+  ...
+
+For a writeable file, the format for writing should generally match
+reading; however, controllers may allow omitting later fields or
+implement restricted shortcuts for most common use cases.
+
+For both flat and nested keyed files, only the values for a single key
+can be written at a time.  For nested keyed files, the sub key pairs
+may be specified in any order and not all pairs have to be specified.
+
+
+5-3-2. Control Knobs
+
+- Settings for a single feature should generally be implemented in a
+  single file.
+
+- In general, the root cgroup should be exempt from resource control
+  and thus shouldn't have resource control knobs.
+
+- If a controller implements ratio based resource distribution, the
+  control knob should be named "weight" and have the range [1, 10000]
+  and 100 should be the default value.  The values are chosen to allow
+  enough and symmetric bias in both directions while keeping it
+  intuitive (the default is 100%).
+
+- If a controller implements an absolute resource guarantee and/or
+  limit, the control knobs should be named "min" and "max"
+  respectively.  If a controller implements best effort resource
+  gurantee and/or limit, the control knobs should be named "low" and
+  "high" respectively.
+
+  In the above four control files, the special token "max" should be
+  used to represent upward infinity for both reading and writing.
+
+- If a setting has configurable default value and specific overrides,
+  the default settings should be keyed with "default" and appear as
+  the first entry in the file.  Specific entries can use "default" as
+  its value to indicate inheritance of the default value.
+
+
+5-4. Per-Controller Changes
+
+5-4-1. blkio
 
 - blk-throttle becomes properly hierarchical.
 
 
-5-3-2. cpuset
+5-4-2. cpuset
 
 - Tasks are kept in empty cpusets after hotplug and take on the masks
   of the nearest non-empty ancestor, instead of being moved to it.
@@ -388,7 +452,7 @@ supported and the interface files "release_agent" and
   masks of the nearest non-empty ancestor.
 
 
-5-3-3. memory
+5-4-3. memory
 
 - use_hierarchy is on by default and the cgroup file for the flag is
   not created.
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index a593e299162e..c6bf9d30c270 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -22,6 +22,15 @@
 
 #ifdef CONFIG_CGROUPS
 
+/*
+ * All weight knobs on the default hierarhcy should use the following min,
+ * default and max values.  The default value is the logarithmic center of
+ * MIN and MAX and allows 100x to be expressed in both directions.
+ */
+#define CGROUP_WEIGHT_MIN		1
+#define CGROUP_WEIGHT_DFL		100
+#define CGROUP_WEIGHT_MAX		10000
+
 /* a css_task_iter should be treated as an opaque object */
 struct css_task_iter {
 	struct cgroup_subsys		*ss;
-- 
cgit v1.2.3


From 7daaea256de42da112805703e3c77f08973156b3 Mon Sep 17 00:00:00 2001
From: Jaegeuk Kim <jaegeuk@kernel.org>
Date: Thu, 25 Jun 2015 17:43:04 -0700
Subject: f2fs: add noextent_cache mount option

This patch adds noextent_cache mount option.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
---
 Documentation/filesystems/f2fs.txt | 4 +++-
 fs/f2fs/super.c                    | 7 +++++++
 2 files changed, 10 insertions(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/filesystems/f2fs.txt b/Documentation/filesystems/f2fs.txt
index e9e750e59efc..e2d5105b7214 100644
--- a/Documentation/filesystems/f2fs.txt
+++ b/Documentation/filesystems/f2fs.txt
@@ -143,7 +143,9 @@ fastboot               This option is used when a system wants to reduce mount
 extent_cache           Enable an extent cache based on rb-tree, it can cache
                        as many as extent which map between contiguous logical
                        address and physical address per inode, resulting in
-                       increasing the cache hit ratio.
+                       increasing the cache hit ratio. Set by default.
+noextent_cache         Diable an extent cache based on rb-tree explicitly, see
+                       the above extent_cache mount option.
 noinline_data          Disable the inline data feature, inline data feature is
                        enabled by default.
 
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index bc7684b6d57a..92520228ce71 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -65,6 +65,7 @@ enum {
 	Opt_nobarrier,
 	Opt_fastboot,
 	Opt_extent_cache,
+	Opt_noextent_cache,
 	Opt_noinline_data,
 	Opt_err,
 };
@@ -88,6 +89,7 @@ static match_table_t f2fs_tokens = {
 	{Opt_nobarrier, "nobarrier"},
 	{Opt_fastboot, "fastboot"},
 	{Opt_extent_cache, "extent_cache"},
+	{Opt_noextent_cache, "noextent_cache"},
 	{Opt_noinline_data, "noinline_data"},
 	{Opt_err, NULL},
 };
@@ -389,6 +391,9 @@ static int parse_options(struct super_block *sb, char *options)
 		case Opt_extent_cache:
 			set_opt(sbi, EXTENT_CACHE);
 			break;
+		case Opt_noextent_cache:
+			clear_opt(sbi, EXTENT_CACHE);
+			break;
 		case Opt_noinline_data:
 			clear_opt(sbi, INLINE_DATA);
 			break;
@@ -662,6 +667,8 @@ static int f2fs_show_options(struct seq_file *seq, struct dentry *root)
 		seq_puts(seq, ",fastboot");
 	if (test_opt(sbi, EXTENT_CACHE))
 		seq_puts(seq, ",extent_cache");
+	else
+		seq_puts(seq, ",noextent_cache");
 	seq_printf(seq, ",active_logs=%u", sbi->active_logs);
 
 	return 0;
-- 
cgit v1.2.3


From aca3c3546396b305fff79f09883cff48a908aac0 Mon Sep 17 00:00:00 2001
From: NeilBrown <neil@brown.name>
Date: Thu, 30 Jul 2015 10:11:24 +1000
Subject: twl4030_charger: allow max_current to be managed via sysfs.

'max_current' sysfs attributes are created which allow the
max to be set.
Whenever a current source changes, the default is restored.
This will be followed by a uevent, so user-space can decide to
update again.

Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: NeilBrown <neil@brown.name>
Signed-off-by: Sebastian Reichel <sre@kernel.org>
---
 .../ABI/testing/sysfs-class-power-twl4030          | 15 +++++
 drivers/power/twl4030_charger.c                    | 72 ++++++++++++++++++++++
 2 files changed, 87 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-class-power-twl4030

(limited to 'Documentation')

diff --git a/Documentation/ABI/testing/sysfs-class-power-twl4030 b/Documentation/ABI/testing/sysfs-class-power-twl4030
new file mode 100644
index 000000000000..0331bba4605d
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-class-power-twl4030
@@ -0,0 +1,15 @@
+What: /sys/class/power_supply/twl4030_ac/max_current
+      /sys/class/power_supply/twl4030_usb/max_current
+Description:
+	Read/Write limit on current which may
+	be drawn from the ac (Accessory Charger) or
+	USB port.
+
+	Value is in micro-Amps.
+
+	Value is set automatically to an appropriate
+	value when a cable is plugged or unplugged.
+
+	Value can the set by writing to the attribute.
+	The change will only persist until the next
+	plug event.  These event are reported via udev.
diff --git a/drivers/power/twl4030_charger.c b/drivers/power/twl4030_charger.c
index 982675df21b7..b0a50adebfda 100644
--- a/drivers/power/twl4030_charger.c
+++ b/drivers/power/twl4030_charger.c
@@ -482,6 +482,8 @@ static irqreturn_t twl4030_charger_interrupt(int irq, void *arg)
 	struct twl4030_bci *bci = arg;
 
 	dev_dbg(bci->dev, "CHG_PRES irq\n");
+	/* reset current on each 'plug' event */
+	bci->ac_cur = 500000;
 	twl4030_charger_update_current(bci);
 	power_supply_changed(bci->ac);
 	power_supply_changed(bci->usb);
@@ -536,6 +538,63 @@ static irqreturn_t twl4030_bci_interrupt(int irq, void *arg)
 	return IRQ_HANDLED;
 }
 
+/*
+ * Provide "max_current" attribute in sysfs.
+ */
+static ssize_t
+twl4030_bci_max_current_store(struct device *dev, struct device_attribute *attr,
+	const char *buf, size_t n)
+{
+	struct twl4030_bci *bci = dev_get_drvdata(dev->parent);
+	int cur = 0;
+	int status = 0;
+	status = kstrtoint(buf, 10, &cur);
+	if (status)
+		return status;
+	if (cur < 0)
+		return -EINVAL;
+	if (dev == &bci->ac->dev)
+		bci->ac_cur = cur;
+	else
+		bci->usb_cur = cur;
+
+	twl4030_charger_update_current(bci);
+	return n;
+}
+
+/*
+ * sysfs max_current show
+ */
+static ssize_t twl4030_bci_max_current_show(struct device *dev,
+	struct device_attribute *attr, char *buf)
+{
+	int status = 0;
+	int cur = -1;
+	u8 bcictl1;
+	struct twl4030_bci *bci = dev_get_drvdata(dev->parent);
+
+	if (dev == &bci->ac->dev) {
+		if (!bci->ac_is_active)
+			cur = bci->ac_cur;
+	} else {
+		if (bci->ac_is_active)
+			cur = bci->usb_cur;
+	}
+	if (cur < 0) {
+		cur = twl4030bci_read_adc_val(TWL4030_BCIIREF1);
+		if (cur < 0)
+			return cur;
+		status = twl4030_bci_read(TWL4030_BCICTL1, &bcictl1);
+		if (status < 0)
+			return status;
+		cur = regval2ua(cur, bcictl1 & TWL4030_CGAIN);
+	}
+	return scnprintf(buf, PAGE_SIZE, "%u\n", cur);
+}
+
+static DEVICE_ATTR(max_current, 0644, twl4030_bci_max_current_show,
+			twl4030_bci_max_current_store);
+
 static void twl4030_bci_usb_work(struct work_struct *data)
 {
 	struct twl4030_bci *bci = container_of(data, struct twl4030_bci, work);
@@ -558,6 +617,12 @@ static int twl4030_bci_usb_ncb(struct notifier_block *nb, unsigned long val,
 
 	dev_dbg(bci->dev, "OTG notify %lu\n", val);
 
+	/* reset current on each 'plug' event */
+	if (allow_usb)
+		bci->usb_cur = 500000;
+	else
+		bci->usb_cur = 100000;
+
 	bci->event = val;
 	schedule_work(&bci->work);
 
@@ -831,6 +896,11 @@ static int twl4030_bci_probe(struct platform_device *pdev)
 		dev_warn(&pdev->dev, "failed to unmask interrupts: %d\n", ret);
 
 	twl4030_charger_update_current(bci);
+	if (device_create_file(&bci->usb->dev, &dev_attr_max_current))
+		dev_warn(&pdev->dev, "could not create sysfs file\n");
+	if (device_create_file(&bci->ac->dev, &dev_attr_max_current))
+		dev_warn(&pdev->dev, "could not create sysfs file\n");
+
 	twl4030_charger_enable_ac(true);
 	if (!IS_ERR_OR_NULL(bci->transceiver))
 		twl4030_bci_usb_ncb(&bci->usb_nb,
@@ -855,6 +925,8 @@ static int __exit twl4030_bci_remove(struct platform_device *pdev)
 	twl4030_charger_enable_usb(bci, false);
 	twl4030_charger_enable_backup(0, 0);
 
+	device_remove_file(&bci->usb->dev, &dev_attr_max_current);
+	device_remove_file(&bci->ac->dev, &dev_attr_max_current);
 	/* mask interrupts */
 	twl_i2c_write_u8(TWL4030_MODULE_INTERRUPTS, 0xff,
 			 TWL4030_INTERRUPTS_BCIIMR1A);
-- 
cgit v1.2.3


From 22d4c33f7335ddf6deda26630c78650e133bfe72 Mon Sep 17 00:00:00 2001
From: NeilBrown <neil@brown.name>
Date: Thu, 30 Jul 2015 10:11:24 +1000
Subject: twl4030_charger: enable manual enable/disable of usb charging.

'off' or 'auto' to

 /sys/class/power/twl4030_usb/mode

will now enable or disable charging from USB port.  Normally this is
enabled on 'plug' and disabled on 'unplug'.
Unplug will still disable charging.  'plug' will only enable it if
'auto' if selected.

Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: NeilBrown <neil@brown.name>
Signed-off-by: Sebastian Reichel <sre@kernel.org>
---
 .../ABI/testing/sysfs-class-power-twl4030          | 11 ++++
 drivers/power/twl4030_charger.c                    | 59 ++++++++++++++++++++++
 2 files changed, 70 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/ABI/testing/sysfs-class-power-twl4030 b/Documentation/ABI/testing/sysfs-class-power-twl4030
index 0331bba4605d..40e0f919cbde 100644
--- a/Documentation/ABI/testing/sysfs-class-power-twl4030
+++ b/Documentation/ABI/testing/sysfs-class-power-twl4030
@@ -13,3 +13,14 @@ Description:
 	Value can the set by writing to the attribute.
 	The change will only persist until the next
 	plug event.  These event are reported via udev.
+
+
+What: /sys/class/power_supply/twl4030_usb/mode
+Description:
+	Changing mode for USB port.
+	Writing to this can disable charging.
+
+	Possible values are:
+		"auto" - draw power as appropriate for detected
+			 power source and battery status.
+		"off"  - do not draw any power.
diff --git a/drivers/power/twl4030_charger.c b/drivers/power/twl4030_charger.c
index b0a50adebfda..6fa928ed3128 100644
--- a/drivers/power/twl4030_charger.c
+++ b/drivers/power/twl4030_charger.c
@@ -109,10 +109,16 @@ struct twl4030_bci {
 	unsigned int		ichg_eoc, ichg_lo, ichg_hi;
 	unsigned int		usb_cur, ac_cur;
 	bool			ac_is_active;
+	int			usb_mode; /* charging mode requested */
+#define	CHARGE_OFF	0
+#define	CHARGE_AUTO	1
 
 	unsigned long		event;
 };
 
+/* strings for 'usb_mode' values */
+static char *modes[] = { "off", "auto" };
+
 /*
  * clear and set bits on an given register on a given module
  */
@@ -386,6 +392,8 @@ static int twl4030_charger_enable_usb(struct twl4030_bci *bci, bool enable)
 {
 	int ret;
 
+	if (bci->usb_mode == CHARGE_OFF)
+		enable = false;
 	if (enable && !IS_ERR_OR_NULL(bci->transceiver)) {
 
 		twl4030_charger_update_current(bci);
@@ -629,6 +637,53 @@ static int twl4030_bci_usb_ncb(struct notifier_block *nb, unsigned long val,
 	return NOTIFY_OK;
 }
 
+/*
+ * sysfs charger enabled store
+ */
+static ssize_t
+twl4030_bci_mode_store(struct device *dev, struct device_attribute *attr,
+			  const char *buf, size_t n)
+{
+	struct twl4030_bci *bci = dev_get_drvdata(dev->parent);
+	int mode;
+	int status;
+
+	if (sysfs_streq(buf, modes[0]))
+		mode = 0;
+	else if (sysfs_streq(buf, modes[1]))
+		mode = 1;
+	else
+		return -EINVAL;
+	twl4030_charger_enable_usb(bci, false);
+	bci->usb_mode = mode;
+	status = twl4030_charger_enable_usb(bci, true);
+	return (status == 0) ? n : status;
+}
+
+/*
+ * sysfs charger enabled show
+ */
+static ssize_t
+twl4030_bci_mode_show(struct device *dev,
+			     struct device_attribute *attr, char *buf)
+{
+	struct twl4030_bci *bci = dev_get_drvdata(dev->parent);
+	int len = 0;
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(modes); i++)
+		if (bci->usb_mode == i)
+			len += snprintf(buf+len, PAGE_SIZE-len,
+					"[%s] ", modes[i]);
+		else
+			len += snprintf(buf+len, PAGE_SIZE-len,
+					"%s ", modes[i]);
+	buf[len-1] = '\n';
+	return len;
+}
+static DEVICE_ATTR(mode, 0644, twl4030_bci_mode_show,
+		   twl4030_bci_mode_store);
+
 static int twl4030_charger_get_current(void)
 {
 	int curr;
@@ -815,6 +870,7 @@ static int twl4030_bci_probe(struct platform_device *pdev)
 		bci->usb_cur = 500000;  /* 500mA */
 	else
 		bci->usb_cur = 100000;  /* 100mA */
+	bci->usb_mode = CHARGE_AUTO;
 
 	bci->dev = &pdev->dev;
 	bci->irq_chg = platform_get_irq(pdev, 0);
@@ -898,6 +954,8 @@ static int twl4030_bci_probe(struct platform_device *pdev)
 	twl4030_charger_update_current(bci);
 	if (device_create_file(&bci->usb->dev, &dev_attr_max_current))
 		dev_warn(&pdev->dev, "could not create sysfs file\n");
+	if (device_create_file(&bci->usb->dev, &dev_attr_mode))
+		dev_warn(&pdev->dev, "could not create sysfs file\n");
 	if (device_create_file(&bci->ac->dev, &dev_attr_max_current))
 		dev_warn(&pdev->dev, "could not create sysfs file\n");
 
@@ -926,6 +984,7 @@ static int __exit twl4030_bci_remove(struct platform_device *pdev)
 	twl4030_charger_enable_backup(0, 0);
 
 	device_remove_file(&bci->usb->dev, &dev_attr_max_current);
+	device_remove_file(&bci->usb->dev, &dev_attr_mode);
 	device_remove_file(&bci->ac->dev, &dev_attr_max_current);
 	/* mask interrupts */
 	twl_i2c_write_u8(TWL4030_MODULE_INTERRUPTS, 0xff,
-- 
cgit v1.2.3


From 7f4a633d21331155ee06c5ee44749ed35a3a3cc5 Mon Sep 17 00:00:00 2001
From: NeilBrown <neil@brown.name>
Date: Thu, 30 Jul 2015 10:11:24 +1000
Subject: twl4030_charger: add software controlled linear charging mode.

Add a 'continuous' option for usb charging which enables
the "linear" charging mode of the twl4030.

Linear charging does a good job with not-so-reliable power sources.
Auto mode does not work well as it switches off when voltage drops
momentarily.  Care must be taken not to over-charge.

It was used with a bike hub dynamo since a year or so. In that case
there are automatically charging stops when the cyclist needs a break.

Original-by: Andreas Kemnade <andreas@kemnade.info>
Signed-off-by: NeilBrown <neil@brown.name>
Signed-off-by: Sebastian Reichel <sre@kernel.org>
---
 .../ABI/testing/sysfs-class-power-twl4030          |  9 ++++
 drivers/power/twl4030_charger.c                    | 55 ++++++++++++++++++++--
 2 files changed, 59 insertions(+), 5 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/ABI/testing/sysfs-class-power-twl4030 b/Documentation/ABI/testing/sysfs-class-power-twl4030
index 40e0f919cbde..e8baa36aaa2b 100644
--- a/Documentation/ABI/testing/sysfs-class-power-twl4030
+++ b/Documentation/ABI/testing/sysfs-class-power-twl4030
@@ -24,3 +24,12 @@ Description:
 		"auto" - draw power as appropriate for detected
 			 power source and battery status.
 		"off"  - do not draw any power.
+		"continuous"
+		       - activate mode described as "linear" in
+		         TWL data sheets.  This uses whatever
+			 current is available and doesn't switch off
+			 when voltage drops.
+
+			 This is useful for unstable power sources
+			 such as bicycle dynamo, but care should
+			 be taken that battery is not over-charged.
diff --git a/drivers/power/twl4030_charger.c b/drivers/power/twl4030_charger.c
index 6fa928ed3128..de5430deaf23 100644
--- a/drivers/power/twl4030_charger.c
+++ b/drivers/power/twl4030_charger.c
@@ -24,6 +24,8 @@
 #include <linux/usb/otg.h>
 #include <linux/i2c/twl4030-madc.h>
 
+#define TWL4030_BCIMDEN		0x00
+#define TWL4030_BCIMDKEY	0x01
 #define TWL4030_BCIMSTATEC	0x02
 #define TWL4030_BCIICHG		0x08
 #define TWL4030_BCIVAC		0x0a
@@ -35,13 +37,16 @@
 #define TWL4030_BCIIREF1	0x27
 #define TWL4030_BCIIREF2	0x28
 #define TWL4030_BCIMFKEY	0x11
+#define TWL4030_BCIMFEN3	0x14
 #define TWL4030_BCIMFTH8	0x1d
 #define TWL4030_BCIMFTH9	0x1e
+#define TWL4030_BCIWDKEY	0x21
 
 #define TWL4030_BCIMFSTS1	0x01
 
 #define TWL4030_BCIAUTOWEN	BIT(5)
 #define TWL4030_CONFIG_DONE	BIT(4)
+#define TWL4030_CVENAC		BIT(2)
 #define TWL4030_BCIAUTOUSB	BIT(1)
 #define TWL4030_BCIAUTOAC	BIT(0)
 #define TWL4030_CGAIN		BIT(5)
@@ -112,12 +117,13 @@ struct twl4030_bci {
 	int			usb_mode; /* charging mode requested */
 #define	CHARGE_OFF	0
 #define	CHARGE_AUTO	1
+#define	CHARGE_LINEAR	2
 
 	unsigned long		event;
 };
 
 /* strings for 'usb_mode' values */
-static char *modes[] = { "off", "auto" };
+static char *modes[] = { "off", "auto", "continuous" };
 
 /*
  * clear and set bits on an given register on a given module
@@ -404,16 +410,42 @@ static int twl4030_charger_enable_usb(struct twl4030_bci *bci, bool enable)
 			bci->usb_enabled = 1;
 		}
 
-		/* forcing the field BCIAUTOUSB (BOOT_BCI[1]) to 1 */
-		ret = twl4030_clear_set_boot_bci(0, TWL4030_BCIAUTOUSB);
-		if (ret < 0)
-			return ret;
+		if (bci->usb_mode == CHARGE_AUTO)
+			/* forcing the field BCIAUTOUSB (BOOT_BCI[1]) to 1 */
+			ret = twl4030_clear_set_boot_bci(0, TWL4030_BCIAUTOUSB);
 
 		/* forcing USBFASTMCHG(BCIMFSTS4[2]) to 1 */
 		ret = twl4030_clear_set(TWL_MODULE_MAIN_CHARGE, 0,
 			TWL4030_USBFASTMCHG, TWL4030_BCIMFSTS4);
+		if (bci->usb_mode == CHARGE_LINEAR) {
+			twl4030_clear_set_boot_bci(TWL4030_BCIAUTOAC|TWL4030_CVENAC, 0);
+			/* Watch dog key: WOVF acknowledge */
+			ret = twl_i2c_write_u8(TWL_MODULE_MAIN_CHARGE, 0x33,
+					       TWL4030_BCIWDKEY);
+			/* 0x24 + EKEY6: off mode */
+			ret = twl_i2c_write_u8(TWL_MODULE_MAIN_CHARGE, 0x2a,
+					       TWL4030_BCIMDKEY);
+			/* EKEY2: Linear charge: USB path */
+			ret = twl_i2c_write_u8(TWL_MODULE_MAIN_CHARGE, 0x26,
+					       TWL4030_BCIMDKEY);
+			/* WDKEY5: stop watchdog count */
+			ret = twl_i2c_write_u8(TWL_MODULE_MAIN_CHARGE, 0xf3,
+					       TWL4030_BCIWDKEY);
+			/* enable MFEN3 access */
+			ret = twl_i2c_write_u8(TWL_MODULE_MAIN_CHARGE, 0x9c,
+					       TWL4030_BCIMFKEY);
+			 /* ICHGEOCEN - end-of-charge monitor (current < 80mA)
+			  *                      (charging continues)
+			  * ICHGLOWEN - current level monitor (charge continues)
+			  * don't monitor over-current or heat save
+			  */
+			ret = twl_i2c_write_u8(TWL_MODULE_MAIN_CHARGE, 0xf0,
+					       TWL4030_BCIMFEN3);
+		}
 	} else {
 		ret = twl4030_clear_set_boot_bci(TWL4030_BCIAUTOUSB, 0);
+		ret |= twl_i2c_write_u8(TWL_MODULE_MAIN_CHARGE, 0x2a,
+					TWL4030_BCIMDKEY);
 		if (bci->usb_enabled) {
 			pm_runtime_mark_last_busy(bci->transceiver->dev);
 			pm_runtime_put_autosuspend(bci->transceiver->dev);
@@ -652,6 +684,8 @@ twl4030_bci_mode_store(struct device *dev, struct device_attribute *attr,
 		mode = 0;
 	else if (sysfs_streq(buf, modes[1]))
 		mode = 1;
+	else if (sysfs_streq(buf, modes[2]))
+		mode = 2;
 	else
 		return -EINVAL;
 	twl4030_charger_enable_usb(bci, false);
@@ -750,6 +784,17 @@ static int twl4030_bci_get_property(struct power_supply *psy,
 		is_charging = state & TWL4030_MSTATEC_USB;
 	else
 		is_charging = state & TWL4030_MSTATEC_AC;
+	if (!is_charging) {
+		u8 s;
+		twl4030_bci_read(TWL4030_BCIMDEN, &s);
+		if (psy->desc->type == POWER_SUPPLY_TYPE_USB)
+			is_charging = s & 1;
+		else
+			is_charging = s & 2;
+		if (is_charging)
+			/* A little white lie */
+			state = TWL4030_MSTATEC_QUICK1;
+	}
 
 	switch (psp) {
 	case POWER_SUPPLY_PROP_STATUS:
-- 
cgit v1.2.3


From b04b908d8a2901c2cc59db87defd9c08bd4293cc Mon Sep 17 00:00:00 2001
From: NeilBrown <neil@brown.name>
Date: Thu, 30 Jul 2015 10:11:24 +1000
Subject: twl4030_charger: add ac/mode to match usb/mode

This allows AC charging to be turned off, much like usb charging.
"continuous" mode is not available though.

Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: NeilBrown <neil@brown.name>
Signed-off-by: Sebastian Reichel <sre@kernel.org>
---
 .../ABI/testing/sysfs-class-power-twl4030          | 10 +++++++
 drivers/power/twl4030_charger.c                    | 35 +++++++++++++++++-----
 2 files changed, 37 insertions(+), 8 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/ABI/testing/sysfs-class-power-twl4030 b/Documentation/ABI/testing/sysfs-class-power-twl4030
index e8baa36aaa2b..be26af0f1895 100644
--- a/Documentation/ABI/testing/sysfs-class-power-twl4030
+++ b/Documentation/ABI/testing/sysfs-class-power-twl4030
@@ -33,3 +33,13 @@ Description:
 			 This is useful for unstable power sources
 			 such as bicycle dynamo, but care should
 			 be taken that battery is not over-charged.
+
+What: /sys/class/power_supply/twl4030_ac/mode
+Description:
+	Changing mode for 'ac' port.
+	Writing to this can disable charging.
+
+	Possible values are:
+		"auto" - draw power as appropriate for detected
+			 power source and battery status.
+		"off"  - do not draw any power.
diff --git a/drivers/power/twl4030_charger.c b/drivers/power/twl4030_charger.c
index de5430deaf23..68117ad23564 100644
--- a/drivers/power/twl4030_charger.c
+++ b/drivers/power/twl4030_charger.c
@@ -114,7 +114,7 @@ struct twl4030_bci {
 	unsigned int		ichg_eoc, ichg_lo, ichg_hi;
 	unsigned int		usb_cur, ac_cur;
 	bool			ac_is_active;
-	int			usb_mode; /* charging mode requested */
+	int			usb_mode, ac_mode; /* charging mode requested */
 #define	CHARGE_OFF	0
 #define	CHARGE_AUTO	1
 #define	CHARGE_LINEAR	2
@@ -459,10 +459,13 @@ static int twl4030_charger_enable_usb(struct twl4030_bci *bci, bool enable)
 /*
  * Enable/Disable AC Charge funtionality.
  */
-static int twl4030_charger_enable_ac(bool enable)
+static int twl4030_charger_enable_ac(struct twl4030_bci *bci, bool enable)
 {
 	int ret;
 
+	if (bci->ac_mode == CHARGE_OFF)
+		enable = false;
+
 	if (enable)
 		ret = twl4030_clear_set_boot_bci(0, TWL4030_BCIAUTOAC);
 	else
@@ -688,9 +691,17 @@ twl4030_bci_mode_store(struct device *dev, struct device_attribute *attr,
 		mode = 2;
 	else
 		return -EINVAL;
-	twl4030_charger_enable_usb(bci, false);
-	bci->usb_mode = mode;
-	status = twl4030_charger_enable_usb(bci, true);
+	if (dev == &bci->ac->dev) {
+		if (mode == 2)
+			return -EINVAL;
+		twl4030_charger_enable_ac(bci, false);
+		bci->ac_mode = mode;
+		status = twl4030_charger_enable_ac(bci, true);
+	} else {
+		twl4030_charger_enable_usb(bci, false);
+		bci->usb_mode = mode;
+		status = twl4030_charger_enable_usb(bci, true);
+	}
 	return (status == 0) ? n : status;
 }
 
@@ -704,9 +715,13 @@ twl4030_bci_mode_show(struct device *dev,
 	struct twl4030_bci *bci = dev_get_drvdata(dev->parent);
 	int len = 0;
 	int i;
+	int mode = bci->usb_mode;
+
+	if (dev == &bci->ac->dev)
+		mode = bci->ac_mode;
 
 	for (i = 0; i < ARRAY_SIZE(modes); i++)
-		if (bci->usb_mode == i)
+		if (mode == i)
 			len += snprintf(buf+len, PAGE_SIZE-len,
 					"[%s] ", modes[i]);
 		else
@@ -916,6 +931,7 @@ static int twl4030_bci_probe(struct platform_device *pdev)
 	else
 		bci->usb_cur = 100000;  /* 100mA */
 	bci->usb_mode = CHARGE_AUTO;
+	bci->ac_mode = CHARGE_AUTO;
 
 	bci->dev = &pdev->dev;
 	bci->irq_chg = platform_get_irq(pdev, 0);
@@ -1001,10 +1017,12 @@ static int twl4030_bci_probe(struct platform_device *pdev)
 		dev_warn(&pdev->dev, "could not create sysfs file\n");
 	if (device_create_file(&bci->usb->dev, &dev_attr_mode))
 		dev_warn(&pdev->dev, "could not create sysfs file\n");
+	if (device_create_file(&bci->ac->dev, &dev_attr_mode))
+		dev_warn(&pdev->dev, "could not create sysfs file\n");
 	if (device_create_file(&bci->ac->dev, &dev_attr_max_current))
 		dev_warn(&pdev->dev, "could not create sysfs file\n");
 
-	twl4030_charger_enable_ac(true);
+	twl4030_charger_enable_ac(bci, true);
 	if (!IS_ERR_OR_NULL(bci->transceiver))
 		twl4030_bci_usb_ncb(&bci->usb_nb,
 				    bci->transceiver->last_event,
@@ -1024,13 +1042,14 @@ static int __exit twl4030_bci_remove(struct platform_device *pdev)
 {
 	struct twl4030_bci *bci = platform_get_drvdata(pdev);
 
-	twl4030_charger_enable_ac(false);
+	twl4030_charger_enable_ac(bci, false);
 	twl4030_charger_enable_usb(bci, false);
 	twl4030_charger_enable_backup(0, 0);
 
 	device_remove_file(&bci->usb->dev, &dev_attr_max_current);
 	device_remove_file(&bci->usb->dev, &dev_attr_mode);
 	device_remove_file(&bci->ac->dev, &dev_attr_max_current);
+	device_remove_file(&bci->ac->dev, &dev_attr_mode);
 	/* mask interrupts */
 	twl_i2c_write_u8(TWL4030_MODULE_INTERRUPTS, 0xff,
 			 TWL4030_INTERRUPTS_BCIIMR1A);
-- 
cgit v1.2.3


From 7b9c5162c1829686baea1962177842c3fa1ea81c Mon Sep 17 00:00:00 2001
From: Niklas Cassel <nks@flawful.org>
Date: Sat, 25 Jul 2015 02:02:46 +0200
Subject: serial: etraxfs-uart: use mctrl_gpio helpers for handling modem
 signals

In order to use the mctrl_gpio helpers, we change the DT bindings:
ri-gpios renamed to rng-gpios. cd-gpios renamed to dcd-gpios.
However, no in-tree dts/dtsi specifies these, so no worries.

Signed-off-by: Niklas Cassel <nks@flawful.org>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 .../bindings/serial/axis,etraxfs-uart.txt          |  6 +++-
 drivers/tty/serial/Kconfig                         |  1 +
 drivers/tty/serial/etraxfs-uart.c                  | 40 ++++++----------------
 3 files changed, 16 insertions(+), 31 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/serial/axis,etraxfs-uart.txt b/Documentation/devicetree/bindings/serial/axis,etraxfs-uart.txt
index ebcbb62c0a76..51b3c9e80ad9 100644
--- a/Documentation/devicetree/bindings/serial/axis,etraxfs-uart.txt
+++ b/Documentation/devicetree/bindings/serial/axis,etraxfs-uart.txt
@@ -6,7 +6,7 @@ Required properties:
 - interrupts: device interrupt
 
 Optional properties:
-- {dtr,dsr,ri,cd}-gpios: specify a GPIO for DTR/DSR/RI/CD
+- {dtr,dsr,rng,dcd}-gpios: specify a GPIO for DTR/DSR/RI/DCD
   line respectively.
 
 Example:
@@ -16,4 +16,8 @@ serial@b00260000 {
 	reg = <0xb0026000 0x1000>;
 	interrupts = <68>;
 	status = "disabled";
+	dtr-gpios = <&sysgpio 0 GPIO_ACTIVE_LOW>;
+	dsr-gpios = <&sysgpio 1 GPIO_ACTIVE_LOW>;
+	rng-gpios = <&sysgpio 2 GPIO_ACTIVE_LOW>;
+	dcd-gpios = <&sysgpio 3 GPIO_ACTIVE_LOW>;
 };
diff --git a/drivers/tty/serial/Kconfig b/drivers/tty/serial/Kconfig
index 4c8662ea7bb0..cbde6b77ef77 100644
--- a/drivers/tty/serial/Kconfig
+++ b/drivers/tty/serial/Kconfig
@@ -1067,6 +1067,7 @@ config SERIAL_ETRAXFS
 	bool "ETRAX FS serial port support"
 	depends on ETRAX_ARCH_V32 && OF
 	select SERIAL_CORE
+	select SERIAL_MCTRL_GPIO if GPIOLIB
 
 config SERIAL_ETRAXFS_CONSOLE
 	bool "ETRAX FS serial console support"
diff --git a/drivers/tty/serial/etraxfs-uart.c b/drivers/tty/serial/etraxfs-uart.c
index 8f242dc84cea..6813e316e9ff 100644
--- a/drivers/tty/serial/etraxfs-uart.c
+++ b/drivers/tty/serial/etraxfs-uart.c
@@ -10,6 +10,8 @@
 #include <linux/of_address.h>
 #include <hwregs/ser_defs.h>
 
+#include "serial_mctrl_gpio.h"
+
 #define DRV_NAME "etraxfs-uart"
 #define UART_NR CONFIG_ETRAX_SERIAL_PORTS
 
@@ -28,10 +30,7 @@ struct uart_cris_port {
 
 	void __iomem *regi_ser;
 
-	struct gpio_desc *dtr_pin;
-	struct gpio_desc *dsr_pin;
-	struct gpio_desc *ri_pin;
-	struct gpio_desc *cd_pin;
+	struct mctrl_gpios *gpios;
 
 	int write_ongoing;
 };
@@ -389,21 +388,9 @@ static unsigned int etraxfs_uart_get_mctrl(struct uart_port *port)
 	ret = 0;
 	if (crisv32_serial_get_rts(up))
 		ret |= TIOCM_RTS;
-	/* DTR is active low */
-	if (up->dtr_pin && !gpiod_get_raw_value(up->dtr_pin))
-		ret |= TIOCM_DTR;
-	/* CD is active low */
-	if (up->cd_pin && !gpiod_get_raw_value(up->cd_pin))
-		ret |= TIOCM_CD;
-	/* RI is active low */
-	if (up->ri_pin && !gpiod_get_raw_value(up->ri_pin))
-		ret |= TIOCM_RI;
-	/* DSR is active low */
-	if (up->dsr_pin && !gpiod_get_raw_value(up->dsr_pin))
-		ret |= TIOCM_DSR;
 	if (crisv32_serial_get_cts(up))
 		ret |= TIOCM_CTS;
-	return ret;
+	return mctrl_gpio_get(up->gpios, &ret);
 }
 
 static void etraxfs_uart_set_mctrl(struct uart_port *port, unsigned int mctrl)
@@ -411,15 +398,7 @@ static void etraxfs_uart_set_mctrl(struct uart_port *port, unsigned int mctrl)
 	struct uart_cris_port *up = (struct uart_cris_port *)port;
 
 	crisv32_serial_set_rts(up, mctrl & TIOCM_RTS ? 1 : 0, 0);
-	/* DTR is active low */
-	if (up->dtr_pin)
-		gpiod_set_raw_value(up->dtr_pin, mctrl & TIOCM_DTR ? 0 : 1);
-	/* RI is active low */
-	if (up->ri_pin)
-		gpiod_set_raw_value(up->ri_pin, mctrl & TIOCM_RNG ? 0 : 1);
-	/* CD is active low */
-	if (up->cd_pin)
-		gpiod_set_raw_value(up->cd_pin, mctrl & TIOCM_CD ? 0 : 1);
+	mctrl_gpio_set(up->gpios, mctrl);
 }
 
 static void etraxfs_uart_break_ctl(struct uart_port *port, int break_state)
@@ -913,11 +892,12 @@ static int etraxfs_uart_probe(struct platform_device *pdev)
 
 	up->irq = irq_of_parse_and_map(np, 0);
 	up->regi_ser = of_iomap(np, 0);
-	up->dtr_pin = devm_gpiod_get_optional(&pdev->dev, "dtr");
-	up->dsr_pin = devm_gpiod_get_optional(&pdev->dev, "dsr");
-	up->ri_pin = devm_gpiod_get_optional(&pdev->dev, "ri");
-	up->cd_pin = devm_gpiod_get_optional(&pdev->dev, "cd");
 	up->port.dev = &pdev->dev;
+
+	up->gpios = mctrl_gpio_init(&pdev->dev, 0);
+	if (IS_ERR(up->gpios))
+		return PTR_ERR(up->gpios);
+
 	cris_serial_port_init(&up->port, dev_id);
 
 	etraxfs_uart_ports[dev_id] = up;
-- 
cgit v1.2.3


From c268a743103aebba8d81d3365107f7170653099e Mon Sep 17 00:00:00 2001
From: Nicolas Ferre <nicolas.ferre@atmel.com>
Date: Thu, 30 Jul 2015 19:12:12 +0200
Subject: ARM: at91/soc: add basic support for new sama5d2 SoC

Add Kconfig entries, header file changes and addition to the documentation.
The early debug infrastructure is also added for easy development.

Signed-off-by: Ludovic Desroches <ludovic.desroches@atmel.com>
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Acked-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Signed-off-by: Olof Johansson <olof@lixom.net>
---
 Documentation/arm/Atmel/README                       |  5 +++++
 Documentation/devicetree/bindings/arm/atmel-at91.txt |  2 ++
 arch/arm/Kconfig.debug                               |  6 ++++++
 arch/arm/include/debug/at91.S                        |  5 ++++-
 arch/arm/mach-at91/Kconfig                           | 12 ++++++++++++
 arch/arm/mach-at91/sama5.c                           |  3 +++
 arch/arm/mach-at91/soc.h                             |  3 +++
 7 files changed, 35 insertions(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/arm/Atmel/README b/Documentation/arm/Atmel/README
index c53a19b4aab2..0931cf7e2e56 100644
--- a/Documentation/arm/Atmel/README
+++ b/Documentation/arm/Atmel/README
@@ -90,6 +90,11 @@ the Atmel website: http://www.atmel.com.
         + Datasheet
           http://www.atmel.com/Images/Atmel-11238-32-bit-Cortex-A5-Microcontroller-SAMA5D4_Datasheet.pdf
 
+      - sama5d2 family
+        - sama5d27
+        + Datasheet
+          Coming soon
+
 
 Linux kernel information
 ------------------------
diff --git a/Documentation/devicetree/bindings/arm/atmel-at91.txt b/Documentation/devicetree/bindings/arm/atmel-at91.txt
index 424ac8cbfa08..7f04b8ae4ca9 100644
--- a/Documentation/devicetree/bindings/arm/atmel-at91.txt
+++ b/Documentation/devicetree/bindings/arm/atmel-at91.txt
@@ -27,6 +27,8 @@ compatible: must be one of:
     o "atmel,at91sam9xe"
  * "atmel,sama5" for SoCs using a Cortex-A5, shall be extended with the specific
    SoC family:
+    o "atmel,sama5d2" shall be extended with the specific SoC compatible:
+       - "atmel,sama5d27"
     o "atmel,sama5d3" shall be extended with the specific SoC compatible:
        - "atmel,sama5d31"
        - "atmel,sama5d33"
diff --git a/arch/arm/Kconfig.debug b/arch/arm/Kconfig.debug
index a2e16f940394..946c8c0fa1fb 100644
--- a/arch/arm/Kconfig.debug
+++ b/arch/arm/Kconfig.debug
@@ -141,6 +141,12 @@ choice
 		depends on ARCH_AT91
 		depends on SOC_SAMA5
 
+	config AT91_DEBUG_LL_DBGU3
+		bool "Kernel low-level debugging on sama5d2"
+		select DEBUG_AT91_UART
+		depends on ARCH_AT91
+		depends on SOC_SAMA5
+
 	config DEBUG_BCM2835
 		bool "Kernel low-level debugging on BCM2835 PL011 UART"
 		depends on ARCH_BCM2835
diff --git a/arch/arm/include/debug/at91.S b/arch/arm/include/debug/at91.S
index c3c45e628e33..2556a8801c8c 100644
--- a/arch/arm/include/debug/at91.S
+++ b/arch/arm/include/debug/at91.S
@@ -13,9 +13,12 @@
 #define AT91_DBGU 0xfffff200 /* AT91_BASE_DBGU0 */
 #elif defined(CONFIG_AT91_DEBUG_LL_DBGU1)
 #define AT91_DBGU 0xffffee00 /* AT91_BASE_DBGU1 */
-#else
+#elif defined(CONFIG_AT91_DEBUG_LL_DBGU2)
 /* On sama5d4, use USART3 as low level serial console */
 #define AT91_DBGU 0xfc00c000 /* SAMA5D4_BASE_USART3 */
+#else
+/* On sama5d2, use UART1 as low level serial console */
+#define AT91_DBGU 0xf8020000
 #endif
 
 #ifdef CONFIG_MMU
diff --git a/arch/arm/mach-at91/Kconfig b/arch/arm/mach-at91/Kconfig
index fd95f34945f4..89a755b90db2 100644
--- a/arch/arm/mach-at91/Kconfig
+++ b/arch/arm/mach-at91/Kconfig
@@ -8,6 +8,18 @@ menuconfig ARCH_AT91
 	select SOC_BUS
 
 if ARCH_AT91
+config SOC_SAMA5D2
+	bool "SAMA5D2 family" if ARCH_MULTI_V7
+	select SOC_SAMA5
+	select CACHE_L2X0
+	select HAVE_FB_ATMEL
+	select HAVE_AT91_UTMI
+	select HAVE_AT91_USB_CLK
+	select HAVE_AT91_H32MX
+	select HAVE_AT91_GENERATED_CLK
+	help
+	  Select this if ou are using one of Atmel's SAMA5D2 family SoC.
+
 config SOC_SAMA5D3
 	bool "SAMA5D3 family" if ARCH_MULTI_V7
 	select SOC_SAMA5
diff --git a/arch/arm/mach-at91/sama5.c b/arch/arm/mach-at91/sama5.c
index 41d829d8e7d5..90c3c3051ae7 100644
--- a/arch/arm/mach-at91/sama5.c
+++ b/arch/arm/mach-at91/sama5.c
@@ -18,6 +18,8 @@
 #include "soc.h"
 
 static const struct at91_soc sama5_socs[] = {
+	AT91_SOC(SAMA5D2_CIDR_MATCH, SAMA5D27_EXID_MATCH,
+		 "sama5d27", "sama5d2"),
 	AT91_SOC(SAMA5D3_CIDR_MATCH, SAMA5D31_EXID_MATCH,
 		 "sama5d31", "sama5d3"),
 	AT91_SOC(SAMA5D3_CIDR_MATCH, SAMA5D33_EXID_MATCH,
@@ -64,6 +66,7 @@ DT_MACHINE_START(sama5_dt, "Atmel SAMA5")
 MACHINE_END
 
 static const char *sama5_alt_dt_board_compat[] __initconst = {
+	"atmel,sama5d2",
 	"atmel,sama5d4",
 	NULL
 };
diff --git a/arch/arm/mach-at91/soc.h b/arch/arm/mach-at91/soc.h
index be23c400596b..8ede0ef86172 100644
--- a/arch/arm/mach-at91/soc.h
+++ b/arch/arm/mach-at91/soc.h
@@ -62,6 +62,9 @@ at91_soc_init(const struct at91_soc *socs);
 #define AT91SAM9XE256_CIDR_MATCH	0x329a93a0
 #define AT91SAM9XE512_CIDR_MATCH	0x329aa3a0
 
+#define SAMA5D2_CIDR_MATCH		0x0a5c08c0
+#define SAMA5D27_EXID_MATCH		0x00000021
+
 #define SAMA5D3_CIDR_MATCH		0x0a5c07c0
 #define SAMA5D31_EXID_MATCH		0x00444300
 #define SAMA5D33_EXID_MATCH		0x00414300
-- 
cgit v1.2.3


From 61186371570e3ca62585e29e307f7a9c42ab789a Mon Sep 17 00:00:00 2001
From: Philippe Reynes <tremyfr@gmail.com>
Date: Sun, 26 Jul 2015 23:37:51 +0200
Subject: dt-binding: document the binding for mxc rtc

This adds documentation of device tree bindings for the
mxc rtc.

Cc: Rob Herring <robh+dt@kernel.org>
Cc: Pawel Moll <pawel.moll@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Ian Campbell <ijc+devicetree@hellion.org.uk>
Cc: Kumar Gala <galak@codeaurora.org>
Cc: devicetree@vger.kernel.org
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
---
 Documentation/devicetree/bindings/rtc/rtc-mxc.txt | 26 +++++++++++++++++++++++
 1 file changed, 26 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/rtc/rtc-mxc.txt

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/rtc/rtc-mxc.txt b/Documentation/devicetree/bindings/rtc/rtc-mxc.txt
new file mode 100644
index 000000000000..5bcd31d995b0
--- /dev/null
+++ b/Documentation/devicetree/bindings/rtc/rtc-mxc.txt
@@ -0,0 +1,26 @@
+* Real Time Clock of the i.MX SoCs
+
+RTC controller for the i.MX SoCs
+
+Required properties:
+- compatible: Should be "fsl,imx1-rtc" or "fsl,imx21-rtc".
+- reg: physical base address of the controller and length of memory mapped
+  region.
+- interrupts: IRQ line for the RTC.
+- clocks: should contain two entries:
+  * one for the input reference
+  * one for the the SoC RTC
+- clock-names: should contain:
+  * "ref" for the input reference clock
+  * "ipg" for the SoC RTC clock
+
+Example:
+
+rtc@10007000 {
+	compatible = "fsl,imx21-rtc";
+	reg = <0x10007000 0x1000>;
+	interrupts = <22>;
+	clocks = <&clks IMX27_CLK_CKIL>,
+		 <&clks IMX27_CLK_RTC_IPG_GATE>;
+	clock-names = "ref", "ipg";
+};
-- 
cgit v1.2.3


From 54bf725e5d21022b25fe0d3c103fe82ed13b870b Mon Sep 17 00:00:00 2001
From: Dexuan Cui <decui@microsoft.com>
Date: Wed, 5 Aug 2015 00:52:45 -0700
Subject: Drivers: hv: vmbus: document the VMBus sysfs files

The 4 sysfs files should be stable ABIs to the user space.

Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 Documentation/ABI/stable/sysfs-bus-vmbus | 29 +++++++++++++++++++++++++++++
 MAINTAINERS                              |  1 +
 2 files changed, 30 insertions(+)
 create mode 100644 Documentation/ABI/stable/sysfs-bus-vmbus

(limited to 'Documentation')

diff --git a/Documentation/ABI/stable/sysfs-bus-vmbus b/Documentation/ABI/stable/sysfs-bus-vmbus
new file mode 100644
index 000000000000..636e938d5e33
--- /dev/null
+++ b/Documentation/ABI/stable/sysfs-bus-vmbus
@@ -0,0 +1,29 @@
+What:		/sys/bus/vmbus/devices/vmbus_*/id
+Date:		Jul 2009
+KernelVersion:	2.6.31
+Contact:	K. Y. Srinivasan <kys@microsoft.com>
+Description:	The VMBus child_relid of the device's primary channel
+Users:		tools/hv/lsvmbus
+
+What:		/sys/bus/vmbus/devices/vmbus_*/class_id
+Date:		Jul 2009
+KernelVersion:	2.6.31
+Contact:	K. Y. Srinivasan <kys@microsoft.com>
+Description:	The VMBus interface type GUID of the device
+Users:		tools/hv/lsvmbus
+
+What:		/sys/bus/vmbus/devices/vmbus_*/device_id
+Date:		Jul 2009
+KernelVersion:	2.6.31
+Contact:	K. Y. Srinivasan <kys@microsoft.com>
+Description:	The VMBus interface instance GUID of the device
+Users:		tools/hv/lsvmbus
+
+What:		/sys/bus/vmbus/devices/vmbus_*/channel_vp_mapping
+Date:		Jul 2015
+KernelVersion:	4.2.0
+Contact:	K. Y. Srinivasan <kys@microsoft.com>
+Description:	The mapping of which primary/sub channels are bound to which
+		Virtual Processors.
+		Format: <channel's child_relid:the bound cpu's number>
+Users:		tools/hv/lsvmbus
diff --git a/MAINTAINERS b/MAINTAINERS
index 9289ecb57b68..cde3eceea4e8 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -4957,6 +4957,7 @@ F:	drivers/scsi/storvsc_drv.c
 F:	drivers/video/fbdev/hyperv_fb.c
 F:	include/linux/hyperv.h
 F:	tools/hv/
+F:	Documentation/ABI/stable/sysfs-bus-vmbus
 
 I2C OVER PARALLEL PORT
 M:	Jean Delvare <jdelvare@suse.com>
-- 
cgit v1.2.3


From 89af7ba5740b5937d10c8de93e79e71e5a933041 Mon Sep 17 00:00:00 2001
From: Vitaly Kuznetsov <vkuznets@redhat.com>
Date: Wed, 5 Aug 2015 00:52:46 -0700
Subject: cpu-hotplug: convert cpu_hotplug_disabled to a counter

As a prerequisite to exporting cpu_hotplug_enable/cpu_hotplug_disable
functions to modules we need to convert cpu_hotplug_disabled to a counter
to properly support disable -> disable -> enable call sequences. E.g.
after Hyper-V vmbus module (which is supposed to be the first user of
exported cpu_hotplug_enable/cpu_hotplug_disable) did cpu_hotplug_disable()
hibernate path calls disable_nonboot_cpus() and if we hit an error in
_cpu_down() enable_nonboot_cpus() will be called on the failure path (thus
making cpu_hotplug_disabled = 0 and leaving cpu hotplug in 'enabled'
state). Same problem is possible if more than 1 module use
cpu_hotplug_disable/cpu_hotplug_enable on their load/unload paths. When
one of these modules is been unloaded it is logical to leave cpu hotplug
in 'disabled' state.

To support the change we need to increse cpu_hotplug_disabled counter
in disable_nonboot_cpus() unconditionally as all users of
disable_nonboot_cpus() are supposed to do enable_nonboot_cpus() in case
an error was returned.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 Documentation/power/suspend-and-cpuhotplug.txt |  6 +++---
 kernel/cpu.c                                   | 21 +++++++++++++--------
 2 files changed, 16 insertions(+), 11 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/power/suspend-and-cpuhotplug.txt b/Documentation/power/suspend-and-cpuhotplug.txt
index 2850df3bf957..2fc909502db5 100644
--- a/Documentation/power/suspend-and-cpuhotplug.txt
+++ b/Documentation/power/suspend-and-cpuhotplug.txt
@@ -72,7 +72,7 @@ More details follow:
                                         |
                                         v
                            Disable regular cpu hotplug
-                        by setting cpu_hotplug_disabled=1
+                        by increasing cpu_hotplug_disabled
                                         |
                                         v
                             Release cpu_add_remove_lock
@@ -89,7 +89,7 @@ Resuming back is likewise, with the counterparts being (in the order of
 execution during resume):
 * enable_nonboot_cpus() which involves:
    |  Acquire cpu_add_remove_lock
-   |  Reset cpu_hotplug_disabled to 0, thereby enabling regular cpu hotplug
+   |  Decrease cpu_hotplug_disabled, thereby enabling regular cpu hotplug
    |  Call _cpu_up() [for all those cpus in the frozen_cpus mask, in a loop]
    |  Release cpu_add_remove_lock
    v
@@ -120,7 +120,7 @@ after the entire cycle is complete (i.e., suspend + resume).
                            Acquire cpu_add_remove_lock
                                         |
                                         v
-                          If cpu_hotplug_disabled is 1
+                          If cpu_hotplug_disabled > 0
                                 return gracefully
                                         |
                                         |
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 5644ec5582b9..0fca2ba96138 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -191,14 +191,14 @@ void cpu_hotplug_done(void)
 void cpu_hotplug_disable(void)
 {
 	cpu_maps_update_begin();
-	cpu_hotplug_disabled = 1;
+	cpu_hotplug_disabled++;
 	cpu_maps_update_done();
 }
 
 void cpu_hotplug_enable(void)
 {
 	cpu_maps_update_begin();
-	cpu_hotplug_disabled = 0;
+	WARN_ON(--cpu_hotplug_disabled < 0);
 	cpu_maps_update_done();
 }
 
@@ -608,13 +608,18 @@ int disable_nonboot_cpus(void)
 		}
 	}
 
-	if (!error) {
+	if (!error)
 		BUG_ON(num_online_cpus() > 1);
-		/* Make sure the CPUs won't be enabled by someone else */
-		cpu_hotplug_disabled = 1;
-	} else {
+	else
 		pr_err("Non-boot CPUs are not disabled\n");
-	}
+
+	/*
+	 * Make sure the CPUs won't be enabled by someone else. We need to do
+	 * this even in case of failure as all disable_nonboot_cpus() users are
+	 * supposed to do enable_nonboot_cpus() on the failure path.
+	 */
+	cpu_hotplug_disabled++;
+
 	cpu_maps_update_done();
 	return error;
 }
@@ -633,7 +638,7 @@ void __ref enable_nonboot_cpus(void)
 
 	/* Allow everyone to use the CPU hotplug again */
 	cpu_maps_update_begin();
-	cpu_hotplug_disabled = 0;
+	WARN_ON(--cpu_hotplug_disabled < 0);
 	if (cpumask_empty(frozen_cpus))
 		goto out;
 
-- 
cgit v1.2.3


From e5742ef1279e4cfe52fee7056b15a848ffe5c723 Mon Sep 17 00:00:00 2001
From: Mathieu Poirier <mathieu.poirier@linaro.org>
Date: Fri, 31 Jul 2015 09:37:21 -0600
Subject: coresight: binding for ETMv4 coresight drivers

Adding compatible string for new coresight ETMv4 tracer.

Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 Documentation/devicetree/bindings/arm/coresight.txt | 1 +
 1 file changed, 1 insertion(+)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/arm/coresight.txt b/Documentation/devicetree/bindings/arm/coresight.txt
index 65a6db2271a2..62938eb9697f 100644
--- a/Documentation/devicetree/bindings/arm/coresight.txt
+++ b/Documentation/devicetree/bindings/arm/coresight.txt
@@ -17,6 +17,7 @@ its hardware characteristcs.
 		- "arm,coresight-tmc", "arm,primecell";
 		- "arm,coresight-funnel", "arm,primecell";
 		- "arm,coresight-etm3x", "arm,primecell";
+		- "arm,coresight-etm4x", "arm,primecell";
 		- "qcom,coresight-replicator1x", "arm,primecell";
 
 	* reg: physical base address and length of the register
-- 
cgit v1.2.3


From 414a1417d7b35e0e72edb16e45840e242cb6b52e Mon Sep 17 00:00:00 2001
From: Chunyan Zhang <zhang.chunyan@linaro.org>
Date: Fri, 31 Jul 2015 09:37:24 -0600
Subject: coresight-etm3x: Change the name of the ctxid_val to ctxid_pid

'ctxid_val' array was used to store the value of ETM context ID comparator
which actually stores the process ID to be traced, so using 'ctxid_pid' as
its name instead make it easier to understand.

This patch also changes the ABI, it is normally not allowed, but
fortunately it is a testing ABI and very new for now. Nevertheless,
if you don't think it should be changed, we could always add an alias
for userspace.

Signed-off-by: Chunyan Zhang <zhang.chunyan@linaro.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 .../ABI/testing/sysfs-bus-coresight-devices-etm3x        |  2 +-
 drivers/hwtracing/coresight/coresight-etm.h              |  4 ++--
 drivers/hwtracing/coresight/coresight-etm3x.c            | 16 ++++++++--------
 3 files changed, 11 insertions(+), 11 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/ABI/testing/sysfs-bus-coresight-devices-etm3x b/Documentation/ABI/testing/sysfs-bus-coresight-devices-etm3x
index b4d0b99afffb..d72ca1736ba4 100644
--- a/Documentation/ABI/testing/sysfs-bus-coresight-devices-etm3x
+++ b/Documentation/ABI/testing/sysfs-bus-coresight-devices-etm3x
@@ -112,7 +112,7 @@ KernelVersion:	3.19
 Contact:	Mathieu Poirier <mathieu.poirier@linaro.org>
 Description: 	(RW) Mask to apply to all the context ID comparator.
 
-What:		/sys/bus/coresight/devices/<memory_map>.[etm|ptm]/ctxid_val
+What:		/sys/bus/coresight/devices/<memory_map>.[etm|ptm]/ctxid_pid
 Date:		November 2014
 KernelVersion:	3.19
 Contact:	Mathieu Poirier <mathieu.poirier@linaro.org>
diff --git a/drivers/hwtracing/coresight/coresight-etm.h b/drivers/hwtracing/coresight/coresight-etm.h
index 098ffbec0a44..52af5f0adbed 100644
--- a/drivers/hwtracing/coresight/coresight-etm.h
+++ b/drivers/hwtracing/coresight/coresight-etm.h
@@ -183,7 +183,7 @@
  * @seq_13_event: event causing the transition from 1 to 3.
  * @seq_curr_state: current value of the sequencer register.
  * @ctxid_idx: index for the context ID registers.
- * @ctxid_val: value for the context ID to trigger on.
+ * @ctxid_pid: value for the context ID to trigger on.
  * @ctxid_mask: mask applicable to all the context IDs.
  * @sync_freq:	Synchronisation frequency.
  * @timestamp_event: Defines an event that requests the insertion
@@ -235,7 +235,7 @@ struct etm_drvdata {
 	u32				seq_13_event;
 	u32				seq_curr_state;
 	u8				ctxid_idx;
-	u32				ctxid_val[ETM_MAX_CTXID_CMP];
+	u32				ctxid_pid[ETM_MAX_CTXID_CMP];
 	u32				ctxid_mask;
 	u32				sync_freq;
 	u32				timestamp_event;
diff --git a/drivers/hwtracing/coresight/coresight-etm3x.c b/drivers/hwtracing/coresight/coresight-etm3x.c
index cca98a8b2527..361b82068dda 100644
--- a/drivers/hwtracing/coresight/coresight-etm3x.c
+++ b/drivers/hwtracing/coresight/coresight-etm3x.c
@@ -238,7 +238,7 @@ static void etm_set_default(struct etm_drvdata *drvdata)
 	drvdata->seq_curr_state = 0x0;
 	drvdata->ctxid_idx = 0x0;
 	for (i = 0; i < drvdata->nr_ctxid_cmp; i++)
-		drvdata->ctxid_val[i] = 0x0;
+		drvdata->ctxid_pid[i] = 0x0;
 	drvdata->ctxid_mask = 0x0;
 }
 
@@ -289,7 +289,7 @@ static void etm_enable_hw(void *info)
 	for (i = 0; i < drvdata->nr_ext_out; i++)
 		etm_writel(drvdata, ETM_DEFAULT_EVENT_VAL, ETMEXTOUTEVRn(i));
 	for (i = 0; i < drvdata->nr_ctxid_cmp; i++)
-		etm_writel(drvdata, drvdata->ctxid_val[i], ETMCIDCVRn(i));
+		etm_writel(drvdata, drvdata->ctxid_pid[i], ETMCIDCVRn(i));
 	etm_writel(drvdata, drvdata->ctxid_mask, ETMCIDCMR);
 	etm_writel(drvdata, drvdata->sync_freq, ETMSYNCFR);
 	/* No external input selected */
@@ -1386,20 +1386,20 @@ static ssize_t ctxid_idx_store(struct device *dev,
 }
 static DEVICE_ATTR_RW(ctxid_idx);
 
-static ssize_t ctxid_val_show(struct device *dev,
+static ssize_t ctxid_pid_show(struct device *dev,
 			      struct device_attribute *attr, char *buf)
 {
 	unsigned long val;
 	struct etm_drvdata *drvdata = dev_get_drvdata(dev->parent);
 
 	spin_lock(&drvdata->spinlock);
-	val = drvdata->ctxid_val[drvdata->ctxid_idx];
+	val = drvdata->ctxid_pid[drvdata->ctxid_idx];
 	spin_unlock(&drvdata->spinlock);
 
 	return sprintf(buf, "%#lx\n", val);
 }
 
-static ssize_t ctxid_val_store(struct device *dev,
+static ssize_t ctxid_pid_store(struct device *dev,
 			       struct device_attribute *attr,
 			       const char *buf, size_t size)
 {
@@ -1412,12 +1412,12 @@ static ssize_t ctxid_val_store(struct device *dev,
 		return ret;
 
 	spin_lock(&drvdata->spinlock);
-	drvdata->ctxid_val[drvdata->ctxid_idx] = val;
+	drvdata->ctxid_pid[drvdata->ctxid_idx] = val;
 	spin_unlock(&drvdata->spinlock);
 
 	return size;
 }
-static DEVICE_ATTR_RW(ctxid_val);
+static DEVICE_ATTR_RW(ctxid_pid);
 
 static ssize_t ctxid_mask_show(struct device *dev,
 			       struct device_attribute *attr, char *buf)
@@ -1609,7 +1609,7 @@ static struct attribute *coresight_etm_attrs[] = {
 	&dev_attr_seq_13_event.attr,
 	&dev_attr_seq_curr_state.attr,
 	&dev_attr_ctxid_idx.attr,
-	&dev_attr_ctxid_val.attr,
+	&dev_attr_ctxid_pid.attr,
 	&dev_attr_ctxid_mask.attr,
 	&dev_attr_sync_freq.attr,
 	&dev_attr_timestamp_event.attr,
-- 
cgit v1.2.3


From cd196ac3fa5a574f3ddf37b66fbe8c58225c3355 Mon Sep 17 00:00:00 2001
From: Chunyan Zhang <zhang.chunyan@linaro.org>
Date: Fri, 31 Jul 2015 09:37:25 -0600
Subject: coresight-etm4x: Change the name of the ctxid_val to ctxid_pid

'ctxid_val' array was used to store the value of ETM context ID comparator
which actually stores the process ID to be traced, so using 'ctxid_pid' as
its name instead make it easier to understand.

This patch also changes the ABI, it is normally not allowed, but
fortunately it is a testing ABI and very new for now. Nevertheless,
if you don't think it should be changed, we could always add an alias
for userspace.

Signed-off-by: Chunyan Zhang <zhang.chunyan@linaro.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 .../ABI/testing/sysfs-bus-coresight-devices-etm4x    |  2 +-
 drivers/hwtracing/coresight/coresight-etm4x.c        | 20 ++++++++++----------
 drivers/hwtracing/coresight/coresight-etm4x.h        |  4 ++--
 3 files changed, 13 insertions(+), 13 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/ABI/testing/sysfs-bus-coresight-devices-etm4x b/Documentation/ABI/testing/sysfs-bus-coresight-devices-etm4x
index 2fe2e3dae487..2355ed8ae31f 100644
--- a/Documentation/ABI/testing/sysfs-bus-coresight-devices-etm4x
+++ b/Documentation/ABI/testing/sysfs-bus-coresight-devices-etm4x
@@ -249,7 +249,7 @@ KernelVersion:	4.01
 Contact:	Mathieu Poirier <mathieu.poirier@linaro.org>
 Description:	(RW) Select which context ID comparator to work with.
 
-What:		/sys/bus/coresight/devices/<memory_map>.etm/ctxid_val
+What:		/sys/bus/coresight/devices/<memory_map>.etm/ctxid_pid
 Date:		April 2015
 KernelVersion:	4.01
 Contact:	Mathieu Poirier <mathieu.poirier@linaro.org>
diff --git a/drivers/hwtracing/coresight/coresight-etm4x.c b/drivers/hwtracing/coresight/coresight-etm4x.c
index 1312e993c501..9afbda54b013 100644
--- a/drivers/hwtracing/coresight/coresight-etm4x.c
+++ b/drivers/hwtracing/coresight/coresight-etm4x.c
@@ -155,7 +155,7 @@ static void etm4_enable_hw(void *info)
 			       drvdata->base + TRCACATRn(i));
 	}
 	for (i = 0; i < drvdata->numcidc; i++)
-		writeq_relaxed(drvdata->ctxid_val[i],
+		writeq_relaxed(drvdata->ctxid_pid[i],
 			       drvdata->base + TRCCIDCVRn(i));
 	writel_relaxed(drvdata->ctxid_mask0, drvdata->base + TRCCIDCCTLR0);
 	writel_relaxed(drvdata->ctxid_mask1, drvdata->base + TRCCIDCCTLR1);
@@ -507,7 +507,7 @@ static ssize_t reset_store(struct device *dev,
 
 	drvdata->ctxid_idx = 0x0;
 	for (i = 0; i < drvdata->numcidc; i++)
-		drvdata->ctxid_val[i] = 0x0;
+		drvdata->ctxid_pid[i] = 0x0;
 	drvdata->ctxid_mask0 = 0x0;
 	drvdata->ctxid_mask1 = 0x0;
 
@@ -1815,7 +1815,7 @@ static ssize_t ctxid_idx_store(struct device *dev,
 }
 static DEVICE_ATTR_RW(ctxid_idx);
 
-static ssize_t ctxid_val_show(struct device *dev,
+static ssize_t ctxid_pid_show(struct device *dev,
 			      struct device_attribute *attr,
 			      char *buf)
 {
@@ -1825,12 +1825,12 @@ static ssize_t ctxid_val_show(struct device *dev,
 
 	spin_lock(&drvdata->spinlock);
 	idx = drvdata->ctxid_idx;
-	val = (unsigned long)drvdata->ctxid_val[idx];
+	val = (unsigned long)drvdata->ctxid_pid[idx];
 	spin_unlock(&drvdata->spinlock);
 	return scnprintf(buf, PAGE_SIZE, "%#lx\n", val);
 }
 
-static ssize_t ctxid_val_store(struct device *dev,
+static ssize_t ctxid_pid_store(struct device *dev,
 			       struct device_attribute *attr,
 			       const char *buf, size_t size)
 {
@@ -1850,11 +1850,11 @@ static ssize_t ctxid_val_store(struct device *dev,
 
 	spin_lock(&drvdata->spinlock);
 	idx = drvdata->ctxid_idx;
-	drvdata->ctxid_val[idx] = (u64)val;
+	drvdata->ctxid_pid[idx] = (u64)val;
 	spin_unlock(&drvdata->spinlock);
 	return size;
 }
-static DEVICE_ATTR_RW(ctxid_val);
+static DEVICE_ATTR_RW(ctxid_pid);
 
 static ssize_t ctxid_masks_show(struct device *dev,
 				struct device_attribute *attr,
@@ -1949,7 +1949,7 @@ static ssize_t ctxid_masks_store(struct device *dev,
 		 */
 		for (j = 0; j < 8; j++) {
 			if (maskbyte & 1)
-				drvdata->ctxid_val[i] &= ~(0xFF << (j * 8));
+				drvdata->ctxid_pid[i] &= ~(0xFF << (j * 8));
 			maskbyte >>= 1;
 		}
 		/* Select the next ctxid comparator mask value */
@@ -2193,7 +2193,7 @@ static struct attribute *coresight_etmv4_attrs[] = {
 	&dev_attr_res_idx.attr,
 	&dev_attr_res_ctrl.attr,
 	&dev_attr_ctxid_idx.attr,
-	&dev_attr_ctxid_val.attr,
+	&dev_attr_ctxid_pid.attr,
 	&dev_attr_ctxid_masks.attr,
 	&dev_attr_vmid_idx.attr,
 	&dev_attr_vmid_val.attr,
@@ -2514,7 +2514,7 @@ static void etm4_init_default_data(struct etmv4_drvdata *drvdata)
 	}
 
 	for (i = 0; i < drvdata->numcidc; i++)
-		drvdata->ctxid_val[i] = 0x0;
+		drvdata->ctxid_pid[i] = 0x0;
 	drvdata->ctxid_mask0 = 0x0;
 	drvdata->ctxid_mask1 = 0x0;
 
diff --git a/drivers/hwtracing/coresight/coresight-etm4x.h b/drivers/hwtracing/coresight/coresight-etm4x.h
index e08e983dd2d9..1e8fb369daaf 100644
--- a/drivers/hwtracing/coresight/coresight-etm4x.h
+++ b/drivers/hwtracing/coresight/coresight-etm4x.h
@@ -265,7 +265,7 @@
  * @addr_type:	Current status of the comparator register.
  * @ctxid_idx:	Context ID index selector.
  * @ctxid_size:	Size of the context ID field to consider.
- * @ctxid_val:	Value of the context ID comparator.
+ * @ctxid_pid:	Value of the context ID comparator.
  * @ctxid_mask0:Context ID comparator mask for comparator 0-3.
  * @ctxid_mask1:Context ID comparator mask for comparator 4-7.
  * @vmid_idx:	VM ID index selector.
@@ -352,7 +352,7 @@ struct etmv4_drvdata {
 	u8				addr_type[ETM_MAX_SINGLE_ADDR_CMP];
 	u8				ctxid_idx;
 	u8				ctxid_size;
-	u64				ctxid_val[ETMv4_MAX_CTXID_CMP];
+	u64				ctxid_pid[ETMv4_MAX_CTXID_CMP];
 	u32				ctxid_mask0;
 	u32				ctxid_mask1;
 	u8				vmid_idx;
-- 
cgit v1.2.3


From f8b66fe52d3ae97df9f2eef6410bd5aef095914c Mon Sep 17 00:00:00 2001
From: Masanari Iida <standby24x7@gmail.com>
Date: Fri, 31 Jul 2015 09:37:29 -0600
Subject: Doc: trace: Fix typo in coresight.txt

This patch fix spelling typos found in coresight.txt

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 Documentation/trace/coresight.txt | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/trace/coresight.txt b/Documentation/trace/coresight.txt
index 77d14d51a670..0a5c3290e732 100644
--- a/Documentation/trace/coresight.txt
+++ b/Documentation/trace/coresight.txt
@@ -15,7 +15,7 @@ HW assisted tracing is becoming increasingly useful when dealing with systems
 that have many SoCs and other components like GPU and DMA engines.  ARM has
 developed a HW assisted tracing solution by means of different components, each
 being added to a design at synthesis time to cater to specific tracing needs.
-Compoments are generally categorised as source, link and sinks and are
+Components are generally categorised as source, link and sinks and are
 (usually) discovered using the AMBA bus.
 
 "Sources" generate a compressed stream representing the processor instruction
@@ -138,7 +138,7 @@ void coresight_unregister(struct coresight_device *csdev);
 
 The registering function is taking a "struct coresight_device *csdev" and
 register the device with the core framework.  The unregister function takes
-a reference to a "strut coresight_device", obtained at registration time.
+a reference to a "struct coresight_device", obtained at registration time.
 
 If everything goes well during the registration process the new devices will
 show up under /sys/bus/coresight/devices, as showns here for a TC2 platform:
-- 
cgit v1.2.3


From 2af38ab572b031a4111f01153cc020b1038b427b Mon Sep 17 00:00:00 2001
From: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Date: Mon, 27 Jul 2015 12:13:58 +0100
Subject: nvmem: Add bindings for simple nvmem framework

This patch adds bindings for simple nvmem framework which allows nvmem
consumers to talk to nvmem providers to get access to nvmem cell data.

Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
[Maxime Ripard: intial version of eeprom framework]
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Tested-by: Philipp Zabel <p.zabel@pengutronix.de>
Tested-by: Rajendra Nayak <rnayak@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 Documentation/devicetree/bindings/nvmem/nvmem.txt | 80 +++++++++++++++++++++++
 1 file changed, 80 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/nvmem/nvmem.txt

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/nvmem/nvmem.txt b/Documentation/devicetree/bindings/nvmem/nvmem.txt
new file mode 100644
index 000000000000..b52bc11e9597
--- /dev/null
+++ b/Documentation/devicetree/bindings/nvmem/nvmem.txt
@@ -0,0 +1,80 @@
+= NVMEM(Non Volatile Memory) Data Device Tree Bindings =
+
+This binding is intended to represent the location of hardware
+configuration data stored in NVMEMs like eeprom, efuses and so on.
+
+On a significant proportion of boards, the manufacturer has stored
+some data on NVMEM, for the OS to be able to retrieve these information
+and act upon it. Obviously, the OS has to know about where to retrieve
+these data from, and where they are stored on the storage device.
+
+This document is here to document this.
+
+= Data providers =
+Contains bindings specific to provider drivers and data cells as children
+of this node.
+
+Optional properties:
+ read-only: Mark the provider as read only.
+
+= Data cells =
+These are the child nodes of the provider which contain data cell
+information like offset and size in nvmem provider.
+
+Required properties:
+reg:	specifies the offset in byte within the storage device.
+
+Optional properties:
+
+bits:	Is pair of bit location and number of bits, which specifies offset
+	in bit and number of bits within the address range specified by reg property.
+	Offset takes values from 0-7.
+
+For example:
+
+	/* Provider */
+	qfprom: qfprom@00700000 {
+		...
+
+		/* Data cells */
+		tsens_calibration: calib@404 {
+			reg = <0x404 0x10>;
+		};
+
+		tsens_calibration_bckp: calib_bckp@504 {
+			reg = <0x504 0x11>;
+			bits = <6 128>
+		};
+
+		pvs_version: pvs-version@6 {
+			reg = <0x6 0x2>
+			bits = <7 2>
+		};
+
+		speed_bin: speed-bin@c{
+			reg = <0xc 0x1>;
+			bits = <2 3>;
+
+		};
+		...
+	};
+
+= Data consumers =
+Are device nodes which consume nvmem data cells/providers.
+
+Required-properties:
+nvmem-cells: list of phandle to the nvmem data cells.
+nvmem-cell-names: names for the each nvmem-cells specified. Required if
+	nvmem-cells is used.
+
+Optional-properties:
+nvmem	: list of phandles to nvmem providers.
+nvmem-names: names for the each nvmem provider. required if nvmem is used.
+
+For example:
+
+	tsens {
+		...
+		nvmem-cells = <&tsens_calibration>;
+		nvmem-cell-names = "calibration";
+	};
-- 
cgit v1.2.3


From 354ebb541dfa37a83395e5a9b7d68c34f80fffc0 Mon Sep 17 00:00:00 2001
From: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Date: Mon, 27 Jul 2015 12:14:14 +0100
Subject: Documentation: nvmem: add nvmem api level and how-to doc

This patch add basic how-to and api summary documentation for simple
NVMEM framework.

Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Tested-by: Philipp Zabel <p.zabel@pengutronix.de>
Tested-by: Rajendra Nayak <rnayak@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 Documentation/nvmem/nvmem.txt | 152 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 152 insertions(+)
 create mode 100644 Documentation/nvmem/nvmem.txt

(limited to 'Documentation')

diff --git a/Documentation/nvmem/nvmem.txt b/Documentation/nvmem/nvmem.txt
new file mode 100644
index 000000000000..dbd40d879239
--- /dev/null
+++ b/Documentation/nvmem/nvmem.txt
@@ -0,0 +1,152 @@
+			    NVMEM SUBSYSTEM
+	  Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
+
+This document explains the NVMEM Framework along with the APIs provided,
+and how to use it.
+
+1. Introduction
+===============
+*NVMEM* is the abbreviation for Non Volatile Memory layer. It is used to
+retrieve configuration of SOC or Device specific data from non volatile
+memories like eeprom, efuses and so on.
+
+Before this framework existed, NVMEM drivers like eeprom were stored in
+drivers/misc, where they all had to duplicate pretty much the same code to
+register a sysfs file, allow in-kernel users to access the content of the
+devices they were driving, etc.
+
+This was also a problem as far as other in-kernel users were involved, since
+the solutions used were pretty much different from one driver to another, there
+was a rather big abstraction leak.
+
+This framework aims at solve these problems. It also introduces DT
+representation for consumer devices to go get the data they require (MAC
+Addresses, SoC/Revision ID, part numbers, and so on) from the NVMEMs. This
+framework is based on regmap, so that most of the abstraction available in
+regmap can be reused, across multiple types of buses.
+
+NVMEM Providers
++++++++++++++++
+
+NVMEM provider refers to an entity that implements methods to initialize, read
+and write the non-volatile memory.
+
+2. Registering/Unregistering the NVMEM provider
+===============================================
+
+A NVMEM provider can register with NVMEM core by supplying relevant
+nvmem configuration to nvmem_register(), on success core would return a valid
+nvmem_device pointer.
+
+nvmem_unregister(nvmem) is used to unregister a previously registered provider.
+
+For example, a simple qfprom case:
+
+static struct nvmem_config econfig = {
+	.name = "qfprom",
+	.owner = THIS_MODULE,
+};
+
+static int qfprom_probe(struct platform_device *pdev)
+{
+	...
+	econfig.dev = &pdev->dev;
+	nvmem = nvmem_register(&econfig);
+	...
+}
+
+It is mandatory that the NVMEM provider has a regmap associated with its
+struct device. Failure to do would return error code from nvmem_register().
+
+NVMEM Consumers
++++++++++++++++
+
+NVMEM consumers are the entities which make use of the NVMEM provider to
+read from and to NVMEM.
+
+3. NVMEM cell based consumer APIs
+=================================
+
+NVMEM cells are the data entries/fields in the NVMEM.
+The NVMEM framework provides 3 APIs to read/write NVMEM cells.
+
+struct nvmem_cell *nvmem_cell_get(struct device *dev, const char *name);
+struct nvmem_cell *devm_nvmem_cell_get(struct device *dev, const char *name);
+
+void nvmem_cell_put(struct nvmem_cell *cell);
+void devm_nvmem_cell_put(struct device *dev, struct nvmem_cell *cell);
+
+void *nvmem_cell_read(struct nvmem_cell *cell, ssize_t *len);
+int nvmem_cell_write(struct nvmem_cell *cell, void *buf, ssize_t len);
+
+*nvmem_cell_get() apis will get a reference to nvmem cell for a given id,
+and nvmem_cell_read/write() can then read or write to the cell.
+Once the usage of the cell is finished the consumer should call *nvmem_cell_put()
+to free all the allocation memory for the cell.
+
+4. Direct NVMEM device based consumer APIs
+==========================================
+
+In some instances it is necessary to directly read/write the NVMEM.
+To facilitate such consumers NVMEM framework provides below apis.
+
+struct nvmem_device *nvmem_device_get(struct device *dev, const char *name);
+struct nvmem_device *devm_nvmem_device_get(struct device *dev,
+					   const char *name);
+void nvmem_device_put(struct nvmem_device *nvmem);
+int nvmem_device_read(struct nvmem_device *nvmem, unsigned int offset,
+		      size_t bytes, void *buf);
+int nvmem_device_write(struct nvmem_device *nvmem, unsigned int offset,
+		       size_t bytes, void *buf);
+int nvmem_device_cell_read(struct nvmem_device *nvmem,
+			   struct nvmem_cell_info *info, void *buf);
+int nvmem_device_cell_write(struct nvmem_device *nvmem,
+			    struct nvmem_cell_info *info, void *buf);
+
+Before the consumers can read/write NVMEM directly, it should get hold
+of nvmem_controller from one of the *nvmem_device_get() api.
+
+The difference between these apis and cell based apis is that these apis always
+take nvmem_device as parameter.
+
+5. Releasing a reference to the NVMEM
+=====================================
+
+When a consumers no longer needs the NVMEM, it has to release the reference
+to the NVMEM it has obtained using the APIs mentioned in the above section.
+The NVMEM framework provides 2 APIs to release a reference to the NVMEM.
+
+void nvmem_cell_put(struct nvmem_cell *cell);
+void devm_nvmem_cell_put(struct device *dev, struct nvmem_cell *cell);
+void nvmem_device_put(struct nvmem_device *nvmem);
+void devm_nvmem_device_put(struct device *dev, struct nvmem_device *nvmem);
+
+Both these APIs are used to release a reference to the NVMEM and
+devm_nvmem_cell_put and devm_nvmem_device_put destroys the devres associated
+with this NVMEM.
+
+Userspace
++++++++++
+
+6. Userspace binary interface
+==============================
+
+Userspace can read/write the raw NVMEM file located at
+/sys/bus/nvmem/devices/*/nvmem
+
+ex:
+
+hexdump /sys/bus/nvmem/devices/qfprom0/nvmem
+
+0000000 0000 0000 0000 0000 0000 0000 0000 0000
+*
+00000a0 db10 2240 0000 e000 0c00 0c00 0000 0c00
+0000000 0000 0000 0000 0000 0000 0000 0000 0000
+...
+*
+0001000
+
+7. DeviceTree Binding
+=====================
+
+See Documentation/devicetree/bindings/nvmem/nvmem.txt
-- 
cgit v1.2.3


From b470d6d7a5dfe41112d55c39eac67ddc5afac80d Mon Sep 17 00:00:00 2001
From: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Date: Mon, 27 Jul 2015 12:16:59 +0100
Subject: nvmem: qfprom: Add bindings for qfprom

This patch adds bindings for qfprom found in QCOM SOCs. QFPROM driver
is based on simple nvmem framework.

Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Tested-by: Philipp Zabel <p.zabel@pengutronix.de>
Tested-by: Rajendra Nayak <rnayak@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 Documentation/devicetree/bindings/nvmem/qfprom.txt | 35 ++++++++++++++++++++++
 1 file changed, 35 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/nvmem/qfprom.txt

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/nvmem/qfprom.txt b/Documentation/devicetree/bindings/nvmem/qfprom.txt
new file mode 100644
index 000000000000..4ad68b7f5c18
--- /dev/null
+++ b/Documentation/devicetree/bindings/nvmem/qfprom.txt
@@ -0,0 +1,35 @@
+= Qualcomm QFPROM device tree bindings =
+
+This binding is intended to represent QFPROM which is found in most QCOM SOCs.
+
+Required properties:
+- compatible: should be "qcom,qfprom"
+- reg: Should contain registers location and length
+
+= Data cells =
+Are child nodes of qfprom, bindings of which as described in
+bindings/nvmem/nvmem.txt
+
+Example:
+
+	qfprom: qfprom@00700000 {
+		compatible 	= "qcom,qfprom";
+		reg		= <0x00700000 0x8000>;
+		...
+		/* Data cells */
+		tsens_calibration: calib@404 {
+			reg = <0x4404 0x10>;
+		};
+	};
+
+
+= Data consumers =
+Are device nodes which consume nvmem data cells.
+
+For example:
+
+	tsens {
+		...
+		nvmem-cells = <&tsens_calibration>;
+		nvmem-cell-names = "calibration";
+	};
-- 
cgit v1.2.3


From 3d0b16a66c8a9d10294572c6f79df4f15a27825d Mon Sep 17 00:00:00 2001
From: Maxime Ripard <maxime.ripard@free-electrons.com>
Date: Mon, 27 Jul 2015 12:17:09 +0100
Subject: nvmem: sunxi: Move the SID driver to the nvmem framework

Now that we have the nvmem framework, we can consolidate the common
driver code. Move the driver to the framework, and hopefully, it will
fix the sysfs file creation race.

Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
[srinivas.kandagatla: Moved to regmap based EEPROM framework]
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Tested-by: Philipp Zabel <p.zabel@pengutronix.de>
Tested-by: Rajendra Nayak <rnayak@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 Documentation/ABI/testing/sysfs-driver-sunxi-sid   |  22 ---
 .../bindings/misc/allwinner,sunxi-sid.txt          |  17 --
 .../bindings/nvmem/allwinner,sunxi-sid.txt         |  21 +++
 drivers/misc/eeprom/Kconfig                        |  13 --
 drivers/misc/eeprom/Makefile                       |   1 -
 drivers/misc/eeprom/sunxi_sid.c                    | 156 -------------------
 drivers/nvmem/Kconfig                              |  11 ++
 drivers/nvmem/Makefile                             |   2 +
 drivers/nvmem/sunxi_sid.c                          | 171 +++++++++++++++++++++
 9 files changed, 205 insertions(+), 209 deletions(-)
 delete mode 100644 Documentation/ABI/testing/sysfs-driver-sunxi-sid
 delete mode 100644 Documentation/devicetree/bindings/misc/allwinner,sunxi-sid.txt
 create mode 100644 Documentation/devicetree/bindings/nvmem/allwinner,sunxi-sid.txt
 delete mode 100644 drivers/misc/eeprom/sunxi_sid.c
 create mode 100644 drivers/nvmem/sunxi_sid.c

(limited to 'Documentation')

diff --git a/Documentation/ABI/testing/sysfs-driver-sunxi-sid b/Documentation/ABI/testing/sysfs-driver-sunxi-sid
deleted file mode 100644
index ffb9536f6ecc..000000000000
--- a/Documentation/ABI/testing/sysfs-driver-sunxi-sid
+++ /dev/null
@@ -1,22 +0,0 @@
-What:		/sys/devices/*/<our-device>/eeprom
-Date:		August 2013
-Contact:	Oliver Schinagl <oliver@schinagl.nl>
-Description:	read-only access to the SID (Security-ID) on current
-		A-series SoC's from Allwinner. Currently supports A10, A10s, A13
-		and A20 CPU's. The earlier A1x series of SoCs exports 16 bytes,
-		whereas the newer A20 SoC exposes 512 bytes split into sections.
-		Besides the 16 bytes of SID, there's also an SJTAG area,
-		HDMI-HDCP key and some custom keys. Below a quick overview, for
-		details see the user manual:
-		0x000  128 bit root-key (sun[457]i)
-		0x010  128 bit boot-key (sun7i)
-		0x020   64 bit security-jtag-key (sun7i)
-		0x028   16 bit key configuration (sun7i)
-		0x02b   16 bit custom-vendor-key (sun7i)
-		0x02c  320 bit low general key (sun7i)
-		0x040   32 bit read-control access (sun7i)
-		0x064  224 bit low general key (sun7i)
-		0x080 2304 bit HDCP-key (sun7i)
-		0x1a0  768 bit high general key (sun7i)
-Users:		any user space application which wants to read the SID on
-		Allwinner's A-series of CPU's.
diff --git a/Documentation/devicetree/bindings/misc/allwinner,sunxi-sid.txt b/Documentation/devicetree/bindings/misc/allwinner,sunxi-sid.txt
deleted file mode 100644
index fabdf64a5737..000000000000
--- a/Documentation/devicetree/bindings/misc/allwinner,sunxi-sid.txt
+++ /dev/null
@@ -1,17 +0,0 @@
-Allwinner sunxi-sid
-
-Required properties:
-- compatible: "allwinner,sun4i-a10-sid" or "allwinner,sun7i-a20-sid"
-- reg: Should contain registers location and length
-
-Example for sun4i:
-	sid@01c23800 {
-		compatible = "allwinner,sun4i-a10-sid";
-		reg = <0x01c23800 0x10>
-	};
-
-Example for sun7i:
-	sid@01c23800 {
-		compatible = "allwinner,sun7i-a20-sid";
-		reg = <0x01c23800 0x200>
-	};
diff --git a/Documentation/devicetree/bindings/nvmem/allwinner,sunxi-sid.txt b/Documentation/devicetree/bindings/nvmem/allwinner,sunxi-sid.txt
new file mode 100644
index 000000000000..d543ed3f5363
--- /dev/null
+++ b/Documentation/devicetree/bindings/nvmem/allwinner,sunxi-sid.txt
@@ -0,0 +1,21 @@
+Allwinner sunxi-sid
+
+Required properties:
+- compatible: "allwinner,sun4i-a10-sid" or "allwinner,sun7i-a20-sid"
+- reg: Should contain registers location and length
+
+= Data cells =
+Are child nodes of qfprom, bindings of which as described in
+bindings/nvmem/nvmem.txt
+
+Example for sun4i:
+	sid@01c23800 {
+		compatible = "allwinner,sun4i-a10-sid";
+		reg = <0x01c23800 0x10>
+	};
+
+Example for sun7i:
+	sid@01c23800 {
+		compatible = "allwinner,sun7i-a20-sid";
+		reg = <0x01c23800 0x200>
+	};
diff --git a/drivers/misc/eeprom/Kconfig b/drivers/misc/eeprom/Kconfig
index 9536852fd4c6..04f2e1fa9dd1 100644
--- a/drivers/misc/eeprom/Kconfig
+++ b/drivers/misc/eeprom/Kconfig
@@ -96,17 +96,4 @@ config EEPROM_DIGSY_MTC_CFG
 
 	  If unsure, say N.
 
-config EEPROM_SUNXI_SID
-	tristate "Allwinner sunxi security ID support"
-	depends on ARCH_SUNXI && SYSFS
-	help
-	  This is a driver for the 'security ID' available on various Allwinner
-	  devices.
-
-	  Due to the potential risks involved with changing e-fuses,
-	  this driver is read-only.
-
-	  This driver can also be built as a module. If so, the module
-	  will be called sunxi_sid.
-
 endmenu
diff --git a/drivers/misc/eeprom/Makefile b/drivers/misc/eeprom/Makefile
index 9507aec95e94..fc1e81d29267 100644
--- a/drivers/misc/eeprom/Makefile
+++ b/drivers/misc/eeprom/Makefile
@@ -4,5 +4,4 @@ obj-$(CONFIG_EEPROM_LEGACY)	+= eeprom.o
 obj-$(CONFIG_EEPROM_MAX6875)	+= max6875.o
 obj-$(CONFIG_EEPROM_93CX6)	+= eeprom_93cx6.o
 obj-$(CONFIG_EEPROM_93XX46)	+= eeprom_93xx46.o
-obj-$(CONFIG_EEPROM_SUNXI_SID)	+= sunxi_sid.o
 obj-$(CONFIG_EEPROM_DIGSY_MTC_CFG) += digsy_mtc_eeprom.o
diff --git a/drivers/misc/eeprom/sunxi_sid.c b/drivers/misc/eeprom/sunxi_sid.c
deleted file mode 100644
index 8385177ff32b..000000000000
--- a/drivers/misc/eeprom/sunxi_sid.c
+++ /dev/null
@@ -1,156 +0,0 @@
-/*
- * Copyright (c) 2013 Oliver Schinagl <oliver@schinagl.nl>
- * http://www.linux-sunxi.org
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- * This driver exposes the Allwinner security ID, efuses exported in byte-
- * sized chunks.
- */
-
-#include <linux/compiler.h>
-#include <linux/device.h>
-#include <linux/err.h>
-#include <linux/export.h>
-#include <linux/fs.h>
-#include <linux/io.h>
-#include <linux/kernel.h>
-#include <linux/kobject.h>
-#include <linux/module.h>
-#include <linux/of_device.h>
-#include <linux/platform_device.h>
-#include <linux/random.h>
-#include <linux/slab.h>
-#include <linux/stat.h>
-#include <linux/sysfs.h>
-#include <linux/types.h>
-
-#define DRV_NAME "sunxi-sid"
-
-struct sunxi_sid_data {
-	void __iomem *reg_base;
-	unsigned int keysize;
-};
-
-/* We read the entire key, due to a 32 bit read alignment requirement. Since we
- * want to return the requested byte, this results in somewhat slower code and
- * uses 4 times more reads as needed but keeps code simpler. Since the SID is
- * only very rarely probed, this is not really an issue.
- */
-static u8 sunxi_sid_read_byte(const struct sunxi_sid_data *sid_data,
-			      const unsigned int offset)
-{
-	u32 sid_key;
-
-	if (offset >= sid_data->keysize)
-		return 0;
-
-	sid_key = ioread32be(sid_data->reg_base + round_down(offset, 4));
-	sid_key >>= (offset % 4) * 8;
-
-	return sid_key; /* Only return the last byte */
-}
-
-static ssize_t sid_read(struct file *fd, struct kobject *kobj,
-			struct bin_attribute *attr, char *buf,
-			loff_t pos, size_t size)
-{
-	struct platform_device *pdev;
-	struct sunxi_sid_data *sid_data;
-	int i;
-
-	pdev = to_platform_device(kobj_to_dev(kobj));
-	sid_data = platform_get_drvdata(pdev);
-
-	if (pos < 0 || pos >= sid_data->keysize)
-		return 0;
-	if (size > sid_data->keysize - pos)
-		size = sid_data->keysize - pos;
-
-	for (i = 0; i < size; i++)
-		buf[i] = sunxi_sid_read_byte(sid_data, pos + i);
-
-	return i;
-}
-
-static struct bin_attribute sid_bin_attr = {
-	.attr = { .name = "eeprom", .mode = S_IRUGO, },
-	.read = sid_read,
-};
-
-static int sunxi_sid_remove(struct platform_device *pdev)
-{
-	device_remove_bin_file(&pdev->dev, &sid_bin_attr);
-	dev_dbg(&pdev->dev, "driver unloaded\n");
-
-	return 0;
-}
-
-static const struct of_device_id sunxi_sid_of_match[] = {
-	{ .compatible = "allwinner,sun4i-a10-sid", .data = (void *)16},
-	{ .compatible = "allwinner,sun7i-a20-sid", .data = (void *)512},
-	{/* sentinel */},
-};
-MODULE_DEVICE_TABLE(of, sunxi_sid_of_match);
-
-static int sunxi_sid_probe(struct platform_device *pdev)
-{
-	struct sunxi_sid_data *sid_data;
-	struct resource *res;
-	const struct of_device_id *of_dev_id;
-	u8 *entropy;
-	unsigned int i;
-
-	sid_data = devm_kzalloc(&pdev->dev, sizeof(struct sunxi_sid_data),
-				GFP_KERNEL);
-	if (!sid_data)
-		return -ENOMEM;
-
-	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
-	sid_data->reg_base = devm_ioremap_resource(&pdev->dev, res);
-	if (IS_ERR(sid_data->reg_base))
-		return PTR_ERR(sid_data->reg_base);
-
-	of_dev_id = of_match_device(sunxi_sid_of_match, &pdev->dev);
-	if (!of_dev_id)
-		return -ENODEV;
-	sid_data->keysize = (int)of_dev_id->data;
-
-	platform_set_drvdata(pdev, sid_data);
-
-	sid_bin_attr.size = sid_data->keysize;
-	if (device_create_bin_file(&pdev->dev, &sid_bin_attr))
-		return -ENODEV;
-
-	entropy = kzalloc(sizeof(u8) * sid_data->keysize, GFP_KERNEL);
-	for (i = 0; i < sid_data->keysize; i++)
-		entropy[i] = sunxi_sid_read_byte(sid_data, i);
-	add_device_randomness(entropy, sid_data->keysize);
-	kfree(entropy);
-
-	dev_dbg(&pdev->dev, "loaded\n");
-
-	return 0;
-}
-
-static struct platform_driver sunxi_sid_driver = {
-	.probe = sunxi_sid_probe,
-	.remove = sunxi_sid_remove,
-	.driver = {
-		.name = DRV_NAME,
-		.of_match_table = sunxi_sid_of_match,
-	},
-};
-module_platform_driver(sunxi_sid_driver);
-
-MODULE_AUTHOR("Oliver Schinagl <oliver@schinagl.nl>");
-MODULE_DESCRIPTION("Allwinner sunxi security id driver");
-MODULE_LICENSE("GPL");
diff --git a/drivers/nvmem/Kconfig b/drivers/nvmem/Kconfig
index fa85805bbdb7..8db297821f78 100644
--- a/drivers/nvmem/Kconfig
+++ b/drivers/nvmem/Kconfig
@@ -25,4 +25,15 @@ config QCOM_QFPROM
 	  This driver can also be built as a module. If so, the module
 	  will be called nvmem_qfprom.
 
+config NVMEM_SUNXI_SID
+	tristate "Allwinner SoCs SID support"
+	depends on ARCH_SUNXI
+	select REGMAP_MMIO
+	help
+	  This is a driver for the 'security ID' available on various Allwinner
+	  devices.
+
+	  This driver can also be built as a module. If so, the module
+	  will be called nvmem_sunxi_sid.
+
 endif
diff --git a/drivers/nvmem/Makefile b/drivers/nvmem/Makefile
index ff44fe99c7d2..4328b930ad9a 100644
--- a/drivers/nvmem/Makefile
+++ b/drivers/nvmem/Makefile
@@ -8,3 +8,5 @@ nvmem_core-y			:= core.o
 # Devices
 obj-$(CONFIG_QCOM_QFPROM)	+= nvmem_qfprom.o
 nvmem_qfprom-y			:= qfprom.o
+obj-$(CONFIG_NVMEM_SUNXI_SID)	+= nvmem_sunxi_sid.o
+nvmem_sunxi_sid-y		:= sunxi_sid.o
diff --git a/drivers/nvmem/sunxi_sid.c b/drivers/nvmem/sunxi_sid.c
new file mode 100644
index 000000000000..14777dd5212d
--- /dev/null
+++ b/drivers/nvmem/sunxi_sid.c
@@ -0,0 +1,171 @@
+/*
+ * Allwinner sunXi SoCs Security ID support.
+ *
+ * Copyright (c) 2013 Oliver Schinagl <oliver@schinagl.nl>
+ * Copyright (C) 2014 Maxime Ripard <maxime.ripard@free-electrons.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+
+#include <linux/device.h>
+#include <linux/io.h>
+#include <linux/module.h>
+#include <linux/nvmem-provider.h>
+#include <linux/of.h>
+#include <linux/platform_device.h>
+#include <linux/regmap.h>
+#include <linux/slab.h>
+#include <linux/random.h>
+
+
+static struct nvmem_config econfig = {
+	.name = "sunxi-sid",
+	.read_only = true,
+	.owner = THIS_MODULE,
+};
+
+struct sunxi_sid {
+	void __iomem		*base;
+};
+
+/* We read the entire key, due to a 32 bit read alignment requirement. Since we
+ * want to return the requested byte, this results in somewhat slower code and
+ * uses 4 times more reads as needed but keeps code simpler. Since the SID is
+ * only very rarely probed, this is not really an issue.
+ */
+static u8 sunxi_sid_read_byte(const struct sunxi_sid *sid,
+			      const unsigned int offset)
+{
+	u32 sid_key;
+
+	sid_key = ioread32be(sid->base + round_down(offset, 4));
+	sid_key >>= (offset % 4) * 8;
+
+	return sid_key; /* Only return the last byte */
+}
+
+static int sunxi_sid_read(void *context,
+			    const void *reg, size_t reg_size,
+			    void *val, size_t val_size)
+{
+	struct sunxi_sid *sid = context;
+	unsigned int offset = *(u32 *)reg;
+	u8 *buf = val;
+
+	while (val_size) {
+		*buf++ = sunxi_sid_read_byte(sid, offset);
+		val_size--;
+		offset++;
+	}
+
+	return 0;
+}
+
+static int sunxi_sid_write(void *context, const void *data, size_t count)
+{
+	/* Unimplemented, dummy to keep regmap core happy */
+	return 0;
+}
+
+static struct regmap_bus sunxi_sid_bus = {
+	.read = sunxi_sid_read,
+	.write = sunxi_sid_write,
+	.reg_format_endian_default = REGMAP_ENDIAN_NATIVE,
+	.val_format_endian_default = REGMAP_ENDIAN_NATIVE,
+};
+
+static bool sunxi_sid_writeable_reg(struct device *dev, unsigned int reg)
+{
+	return false;
+}
+
+static struct regmap_config sunxi_sid_regmap_config = {
+	.reg_bits = 32,
+	.val_bits = 8,
+	.reg_stride = 1,
+	.writeable_reg = sunxi_sid_writeable_reg,
+};
+
+static int sunxi_sid_probe(struct platform_device *pdev)
+{
+	struct device *dev = &pdev->dev;
+	struct resource *res;
+	struct nvmem_device *nvmem;
+	struct regmap *regmap;
+	struct sunxi_sid *sid;
+	int i, size;
+	char *randomness;
+
+	sid = devm_kzalloc(dev, sizeof(*sid), GFP_KERNEL);
+	if (!sid)
+		return -ENOMEM;
+
+	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	sid->base = devm_ioremap_resource(dev, res);
+	if (IS_ERR(sid->base))
+		return PTR_ERR(sid->base);
+
+	size = resource_size(res) - 1;
+	sunxi_sid_regmap_config.max_register = size;
+
+	regmap = devm_regmap_init(dev, &sunxi_sid_bus, sid,
+				  &sunxi_sid_regmap_config);
+	if (IS_ERR(regmap)) {
+		dev_err(dev, "regmap init failed\n");
+		return PTR_ERR(regmap);
+	}
+
+	econfig.dev = dev;
+	nvmem = nvmem_register(&econfig);
+	if (IS_ERR(nvmem))
+		return PTR_ERR(nvmem);
+
+	randomness = kzalloc(sizeof(u8) * size, GFP_KERNEL);
+	for (i = 0; i < size; i++)
+		randomness[i] = sunxi_sid_read_byte(sid, i);
+
+	add_device_randomness(randomness, size);
+	kfree(randomness);
+
+	platform_set_drvdata(pdev, nvmem);
+
+	return 0;
+}
+
+static int sunxi_sid_remove(struct platform_device *pdev)
+{
+	struct nvmem_device *nvmem = platform_get_drvdata(pdev);
+
+	return nvmem_unregister(nvmem);
+}
+
+static const struct of_device_id sunxi_sid_of_match[] = {
+	{ .compatible = "allwinner,sun4i-a10-sid" },
+	{ .compatible = "allwinner,sun7i-a20-sid" },
+	{/* sentinel */},
+};
+MODULE_DEVICE_TABLE(of, sunxi_sid_of_match);
+
+static struct platform_driver sunxi_sid_driver = {
+	.probe = sunxi_sid_probe,
+	.remove = sunxi_sid_remove,
+	.driver = {
+		.name = "eeprom-sunxi-sid",
+		.of_match_table = sunxi_sid_of_match,
+	},
+};
+module_platform_driver(sunxi_sid_driver);
+
+MODULE_AUTHOR("Oliver Schinagl <oliver@schinagl.nl>");
+MODULE_DESCRIPTION("Allwinner sunxi security id driver");
+MODULE_LICENSE("GPL");
-- 
cgit v1.2.3


From 223e8f01b072152cc4336e7dcb1cf8c21ab34432 Mon Sep 17 00:00:00 2001
From: "Seymour, Shane M" <shane.seymour@hp.com>
Date: Thu, 25 Jun 2015 02:33:08 +0000
Subject: sysfs.txt: update show method notes about sprintf/snprintf/scnprintf
 usage

Changed the documentation to allow sprintf() when the buffer
provided by sysfs cannot be overflowed. Explicitly say
snprintf() must never be used in a show function to format
data to be returned to user space.

Change based on a discussion about the patch
st: convert DRIVER_ATTR macros to DRIVER_ATTR_RO

Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Shane Seymour <shane.seymour@hp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 Documentation/filesystems/sysfs.txt | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/filesystems/sysfs.txt b/Documentation/filesystems/sysfs.txt
index b35a64b82f9e..9494afb9476a 100644
--- a/Documentation/filesystems/sysfs.txt
+++ b/Documentation/filesystems/sysfs.txt
@@ -212,7 +212,10 @@ Other notes:
 - show() methods should return the number of bytes printed into the
   buffer. This is the return value of scnprintf().
 
-- show() should always use scnprintf().
+- show() must not use snprintf() when formatting the value to be
+  returned to user space. If you can guarantee that an overflow
+  will never happen you can use sprintf() otherwise you must use
+  scnprintf().
 
 - store() should return the number of bytes used from the buffer. If the
   entire buffer has been used, just return the count argument.
-- 
cgit v1.2.3


From 197165d44925bd0fa892990851dee4d312a44b39 Mon Sep 17 00:00:00 2001
From: "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com>
Date: Fri, 24 Apr 2015 14:24:44 +0530
Subject: powerpc/ftrace: add powerpc timebase as a trace clock source

Add a new powerpc-specific trace clock using the timebase register,
similar to x86-tsc. This gives us
- a fast, monotonic, hardware clock source for trace entries, and
- a clock that can be used to correlate events across cpus as well as across
  hypervisor and guests.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
---
 Documentation/trace/ftrace.txt         |  5 +++++
 arch/powerpc/include/asm/Kbuild        |  1 -
 arch/powerpc/include/asm/trace_clock.h | 19 +++++++++++++++++++
 arch/powerpc/kernel/Makefile           |  1 +
 arch/powerpc/kernel/trace_clock.c      | 15 +++++++++++++++
 5 files changed, 40 insertions(+), 1 deletion(-)
 create mode 100644 arch/powerpc/include/asm/trace_clock.h
 create mode 100644 arch/powerpc/kernel/trace_clock.c

(limited to 'Documentation')

diff --git a/Documentation/trace/ftrace.txt b/Documentation/trace/ftrace.txt
index 7ddb1e319f84..87bb4aa6a6b9 100644
--- a/Documentation/trace/ftrace.txt
+++ b/Documentation/trace/ftrace.txt
@@ -346,6 +346,11 @@ of ftrace. Here is a list of some of the key files:
 	  x86-tsc: Architectures may define their own clocks. For
 	  	   example, x86 uses its own TSC cycle clock here.
 
+	  ppc-tb: This uses the powerpc timebase register value.
+		  This is in sync across CPUs and can also be used
+		  to correlate events across hypervisor/guest if
+		  tb_offset is known.
+
 	To set a clock, simply echo the clock name into this file.
 
 	  echo global > trace_clock
diff --git a/arch/powerpc/include/asm/Kbuild b/arch/powerpc/include/asm/Kbuild
index 050712e1ce41..ab9f4e0ed4cf 100644
--- a/arch/powerpc/include/asm/Kbuild
+++ b/arch/powerpc/include/asm/Kbuild
@@ -6,5 +6,4 @@ generic-y += local64.h
 generic-y += mcs_spinlock.h
 generic-y += preempt.h
 generic-y += rwsem.h
-generic-y += trace_clock.h
 generic-y += vtime.h
diff --git a/arch/powerpc/include/asm/trace_clock.h b/arch/powerpc/include/asm/trace_clock.h
new file mode 100644
index 000000000000..cf1ee75ca069
--- /dev/null
+++ b/arch/powerpc/include/asm/trace_clock.h
@@ -0,0 +1,19 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * Copyright (C) 2015 Naveen N. Rao, IBM Corporation
+ */
+
+#ifndef _ASM_PPC_TRACE_CLOCK_H
+#define _ASM_PPC_TRACE_CLOCK_H
+
+#include <linux/compiler.h>
+#include <linux/types.h>
+
+extern u64 notrace trace_clock_ppc_tb(void);
+
+#define ARCH_TRACE_CLOCKS { trace_clock_ppc_tb, "ppc-tb", 0 },
+
+#endif  /* _ASM_PPC_TRACE_CLOCK_H */
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 12868b1c4e05..ba336930d448 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -118,6 +118,7 @@ obj-$(CONFIG_PPC_IO_WORKAROUNDS)	+= io-workarounds.o
 obj-$(CONFIG_DYNAMIC_FTRACE)	+= ftrace.o
 obj-$(CONFIG_FUNCTION_GRAPH_TRACER)	+= ftrace.o
 obj-$(CONFIG_FTRACE_SYSCALLS)	+= ftrace.o
+obj-$(CONFIG_TRACING)		+= trace_clock.o
 
 ifneq ($(CONFIG_PPC_INDIRECT_PIO),y)
 obj-y				+= iomap.o
diff --git a/arch/powerpc/kernel/trace_clock.c b/arch/powerpc/kernel/trace_clock.c
new file mode 100644
index 000000000000..49170690946d
--- /dev/null
+++ b/arch/powerpc/kernel/trace_clock.c
@@ -0,0 +1,15 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * Copyright (C) 2015 Naveen N. Rao, IBM Corporation
+ */
+
+#include <asm/trace_clock.h>
+#include <asm/time.h>
+
+u64 notrace trace_clock_ppc_tb(void)
+{
+	return get_tb();
+}
-- 
cgit v1.2.3


From 151b49e19119da28dfcd29561147ec56913cbe61 Mon Sep 17 00:00:00 2001
From: Frank Li <Frank.Li@freescale.com>
Date: Tue, 4 Aug 2015 10:25:41 -0500
Subject: Documentation: fsl-quadspi: Add fsl, imx7d-qspi compatible string

new compatible string: "fsl,imx7d-qspi"

Signed-off-by: Frank Li <Frank.Li@freescale.com>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
---
 Documentation/devicetree/bindings/mtd/fsl-quadspi.txt | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/mtd/fsl-quadspi.txt b/Documentation/devicetree/bindings/mtd/fsl-quadspi.txt
index 4461dc71cb10..ca72696d70ae 100644
--- a/Documentation/devicetree/bindings/mtd/fsl-quadspi.txt
+++ b/Documentation/devicetree/bindings/mtd/fsl-quadspi.txt
@@ -1,7 +1,8 @@
 * Freescale Quad Serial Peripheral Interface(QuadSPI)
 
 Required properties:
-  - compatible : Should be "fsl,vf610-qspi" or "fsl,imx6sx-qspi"
+  - compatible : Should be "fsl,vf610-qspi", "fsl,imx6sx-qspi",
+		 "fsl,imx7d-qspi"
   - reg : the first contains the register location and length,
           the second contains the memory mapping address and length
   - reg-names: Should contain the reg names "QuadSPI" and "QuadSPI-memory"
-- 
cgit v1.2.3


From cb94d0bb41e43b4fdd219ad930686397814bcd8d Mon Sep 17 00:00:00 2001
From: Frank Li <Frank.Li@freescale.com>
Date: Tue, 4 Aug 2015 10:25:52 -0500
Subject: Documentation: fsl-quadspi: Add fsl, imx6ul-qspi compatible string

new compatible string: "fsl,imx6ul-qspi".

Signed-off-by: Frank Li <Frank.Li@freescale.com>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
---
 Documentation/devicetree/bindings/mtd/fsl-quadspi.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/mtd/fsl-quadspi.txt b/Documentation/devicetree/bindings/mtd/fsl-quadspi.txt
index ca72696d70ae..862aa2f8837a 100644
--- a/Documentation/devicetree/bindings/mtd/fsl-quadspi.txt
+++ b/Documentation/devicetree/bindings/mtd/fsl-quadspi.txt
@@ -2,7 +2,7 @@
 
 Required properties:
   - compatible : Should be "fsl,vf610-qspi", "fsl,imx6sx-qspi",
-		 "fsl,imx7d-qspi"
+		 "fsl,imx7d-qspi", "fsl,imx6ul-qspi"
   - reg : the first contains the register location and length,
           the second contains the memory mapping address and length
   - reg-names: Should contain the reg names "QuadSPI" and "QuadSPI-memory"
-- 
cgit v1.2.3


From e9c9963b439db734e9705f96909c5a3388dd81bb Mon Sep 17 00:00:00 2001
From: Jonathan Corbet <corbet@lwn.net>
Date: Thu, 6 Aug 2015 12:44:23 -0600
Subject: Revert "DocBook: Avoid building man pages repeatedly and
 inconsistently"

This reverts commit b44158b17099ed5c7c8f4bfb7029942adbfbc318.  This commit
introduced warnings and possibly inconsistent results into the doc build
process.  The goal is good but it will need to be achieved another way.

Reported-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/DocBook/Makefile            | 10 +---------
 Documentation/DocBook/device-drivers.tmpl |  6 ------
 Documentation/DocBook/gadget.tmpl         |  3 ---
 Documentation/DocBook/kernel-api.tmpl     |  6 ------
 4 files changed, 1 insertion(+), 24 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/DocBook/Makefile b/Documentation/DocBook/Makefile
index 83fcb6c2a00f..11a41456b943 100644
--- a/Documentation/DocBook/Makefile
+++ b/Documentation/DocBook/Makefile
@@ -56,13 +56,6 @@ htmldocs: $(HTML)
 
 MAN := $(patsubst %.xml, %.9, $(BOOKS))
 mandocs: $(MAN)
-	@dups=$$(sed -n 's/.*<refname>\([^<]*\)<\/refname>.*/\1/p'	\
-		 $(obj)/*.xml.noextra | sort | uniq -d);		\
-	if [ -n "$$dups" ]; then					\
-		echo >&2 "The following manual pages are generated more than once:"; \
-		printf >&2 '%s\n' "$$dups";				\
-		exit 1;							\
-	fi
 	find $(obj)/man -name '*.9' | xargs gzip -nf
 
 installmandocs: mandocs
@@ -157,7 +150,7 @@ quiet_cmd_db2html = HTML    $@
             cp $(PNG-$(basename $(notdir $@))) $(patsubst %.html,%,$@); fi
 
 quiet_cmd_db2man = MAN     $@
-      cmd_db2man = if grep -q refentry $<; then xmlif excludeextra=1 <$< >$<.noextra && xmlto man $(XMLTOFLAGS) -o $(obj)/man $<.noextra ; fi
+      cmd_db2man = if grep -q refentry $<; then xmlto man $(XMLTOFLAGS) -o $(obj)/man $< ; fi
 %.9 : %.xml
 	@(which xmlto > /dev/null 2>&1) || \
 	 (echo "*** You need to install xmlto ***"; \
@@ -224,7 +217,6 @@ clean-files := $(DOCBOOKS) \
 	$(patsubst %.xml, %.ps,   $(DOCBOOKS)) \
 	$(patsubst %.xml, %.pdf,  $(DOCBOOKS)) \
 	$(patsubst %.xml, %.html, $(DOCBOOKS)) \
-	$(patsubst %, %.noextra,  $(DOCBOOKS)) \
 	$(patsubst %.xml, %.9,    $(DOCBOOKS)) \
 	$(index)
 
diff --git a/Documentation/DocBook/device-drivers.tmpl b/Documentation/DocBook/device-drivers.tmpl
index 87853ea1f70b..faf09d4a0ea8 100644
--- a/Documentation/DocBook/device-drivers.tmpl
+++ b/Documentation/DocBook/device-drivers.tmpl
@@ -194,13 +194,8 @@ X!Edrivers/pnp/system.c
 
   <chapter id="snddev">
      <title>Sound Devices</title>
-<?xmlif if excludeextra='1'?>
-<?xmlif else?>
 !Iinclude/sound/core.h
-<?xmlif fi?>
 !Esound/sound_core.c
-<?xmlif if excludeextra='1'?>
-<?xmlif else?>
 !Iinclude/sound/pcm.h
 !Esound/core/pcm.c
 !Esound/core/device.c
@@ -216,7 +211,6 @@ X!Edrivers/pnp/system.c
 !Esound/core/hwdep.c
 !Esound/core/pcm_native.c
 !Esound/core/memalloc.c
-<?xmlif fi?>
 <!-- FIXME: Removed for now since no structured comments in source
 X!Isound/sound_firmware.c
 -->
diff --git a/Documentation/DocBook/gadget.tmpl b/Documentation/DocBook/gadget.tmpl
index e1b87bd63f82..641629221176 100644
--- a/Documentation/DocBook/gadget.tmpl
+++ b/Documentation/DocBook/gadget.tmpl
@@ -488,10 +488,7 @@ These are the same types and constants used by host
 side drivers (and usbcore).
 </para>
 
-<?xmlif if excludeextra='1'?>
-<?xmlif else?>
 !Iinclude/linux/usb/ch9.h
-<?xmlif fi?>
 </sect1>
 
 <sect1 id="core"><title>Core Objects and Methods</title>
diff --git a/Documentation/DocBook/kernel-api.tmpl b/Documentation/DocBook/kernel-api.tmpl
index 722249a0dbfe..ecfd0ea40661 100644
--- a/Documentation/DocBook/kernel-api.tmpl
+++ b/Documentation/DocBook/kernel-api.tmpl
@@ -58,11 +58,8 @@
 
      <sect1><title>String Conversions</title>
 !Elib/vsprintf.c
-<?xmlif if excludeextra='1'?>
-<?xmlif else?>
 !Finclude/linux/kernel.h kstrtol
 !Finclude/linux/kernel.h kstrtoul
-<?xmlif fi?>
 !Elib/kstrtox.c
      </sect1>
      <sect1><title>String Manipulation</title>
@@ -181,10 +178,7 @@ X!Ekernel/module.c
   <chapter id="hardware">
      <title>Hardware Interfaces</title>
      <sect1><title>Interrupt Handling</title>
-<?xmlif if excludeextra='1'?>
-<?xmlif else?>
 !Ekernel/irq/manage.c
-<?xmlif fi?>
      </sect1>
 
      <sect1><title>DMA Channels</title>
-- 
cgit v1.2.3


From 64e32895f9d87aaa0842c07d0b17f4895955db20 Mon Sep 17 00:00:00 2001
From: Jakub Wilk <jwilk@jwilk.net>
Date: Mon, 27 Jul 2015 10:15:18 +0200
Subject: SubmittingPatches: remove stray quote character

Signed-off-by: Jakub Wilk <jwilk@jwilk.net>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/SubmittingPatches | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches
index 80b9f54f87ef..fa596d8a3ab0 100644
--- a/Documentation/SubmittingPatches
+++ b/Documentation/SubmittingPatches
@@ -739,7 +739,7 @@ interest on a single line; it should look something like:
 
       git://jdelvare.pck.nerim.net/jdelvare-2.6 i2c-for-linus
 
-  to get these changes:"
+  to get these changes:
 
 A pull request should also include an overall message saying what will be
 included in the request, a "git shortlog" listing of the patches
-- 
cgit v1.2.3


From b325e832e9478b920c1b8366623fe456a3527513 Mon Sep 17 00:00:00 2001
From: Murali Karicheri <m-karicheri2@ti.com>
Date: Tue, 4 Aug 2015 12:36:43 -0400
Subject: ARM: keystone: add documentation for SoCs and EVMs

Currently there is no general documentation on Keystone SoCs in the
Linux Documentation folder of the source tree. This patch adds some
essential documentation with links to help users of Keystone Linux and
also provide links to existing documents where necessary.

Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/arm/keystone/Overview.txt | 73 +++++++++++++++++++++++++++++++++
 1 file changed, 73 insertions(+)
 create mode 100644 Documentation/arm/keystone/Overview.txt

(limited to 'Documentation')

diff --git a/Documentation/arm/keystone/Overview.txt b/Documentation/arm/keystone/Overview.txt
new file mode 100644
index 000000000000..f17bc4c9dff9
--- /dev/null
+++ b/Documentation/arm/keystone/Overview.txt
@@ -0,0 +1,73 @@
+		TI Keystone Linux Overview
+		--------------------------
+
+Introduction
+------------
+Keystone range of SoCs are based on ARM Cortex-A15 MPCore Processors
+and c66x DSP cores. This document describes essential information required
+for users to run Linux on Keystone based EVMs from Texas Instruments.
+
+Following SoCs  & EVMs are currently supported:-
+
+------------ K2HK SoC and EVM --------------------------------------------------
+
+a.k.a Keystone 2 Hawking/Kepler SoC
+TCI6636K2H & TCI6636K2K: See documentation at
+	http://www.ti.com/product/tci6638k2k
+	http://www.ti.com/product/tci6638k2h
+
+EVM:
+http://www.advantech.com/Support/TI-EVM/EVMK2HX_sd.aspx
+
+------------ K2E SoC and EVM ---------------------------------------------------
+
+a.k.a Keystone 2 Edison SoC
+K2E  -  66AK2E05: See documentation at
+	http://www.ti.com/product/66AK2E05/technicaldocuments
+
+EVM:
+https://www.einfochips.com/index.php/partnerships/texas-instruments/k2e-evm.html
+
+------------ K2L SoC and EVM ---------------------------------------------------
+
+a.k.a Keystone 2 Lamarr SoC
+K2L  -  TCI6630K2L: See documentation at
+	http://www.ti.com/product/TCI6630K2L/technicaldocuments
+EVM:
+https://www.einfochips.com/index.php/partnerships/texas-instruments/k2l-evm.html
+
+Configuration
+-------------
+
+All of the K2 SoCs/EVMs share a common defconfig, keystone_defconfig and same
+image is used to boot on individual EVMs. The platform configuration is
+specified through DTS. Following are the DTS used:-
+	K2HK EVM : k2hk-evm.dts
+	K2E EVM  : k2e-evm.dts
+	K2L EVM  : k2l-evm.dts
+
+The device tree documentation for the keystone machines are located at
+        Documentation/devicetree/bindings/arm/keystone/keystone.txt
+
+Known issues & workaround
+-------------------------
+
+Some of the device drivers used on keystone are re-used from that from
+DaVinci and other TI SoCs. These device drivers may use clock APIs directly.
+Some of the keystone specific drivers such as netcp uses run time power
+management API instead to enable clock. As this API has limitations on
+keystone, following workaround is needed to boot Linux.
+
+   Add 'clk_ignore_unused' to the bootargs env variable in u-boot. Otherwise
+   clock frameworks will try to disable clocks that are unused and disable
+   the hardware. This is because netcp related power domain and clock
+   domains are enabled in u-boot as run time power management API currently
+   doesn't enable clocks for netcp due to a limitation. This workaround is
+   expected to be removed in the future when proper API support becomes
+   available. Until then, this work around is needed.
+
+
+Document Author
+---------------
+Murali Karicheri <m-karicheri2@ti.com>
+Copyright 2015 Texas Instruments
-- 
cgit v1.2.3


From cb4ff766c42f0511ead5488b8de77fd392677464 Mon Sep 17 00:00:00 2001
From: Lv Zheng <lv.zheng@intel.com>
Date: Wed, 5 Aug 2015 16:24:04 +0800
Subject: ACPI / Documentation: Update method tracing documentation.

This patch updates method tracing documentation.

Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 Documentation/acpi/method-tracing.txt | 204 ++++++++++++++++++++++++++++++----
 1 file changed, 185 insertions(+), 19 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/acpi/method-tracing.txt b/Documentation/acpi/method-tracing.txt
index f6efb1ea559a..c2505eefc878 100644
--- a/Documentation/acpi/method-tracing.txt
+++ b/Documentation/acpi/method-tracing.txt
@@ -1,26 +1,192 @@
-/sys/module/acpi/parameters/:
+ACPICA Trace Facility
 
-trace_method_name
-	The AML method name that the user wants to trace
+Copyright (C) 2015, Intel Corporation
+Author: Lv Zheng <lv.zheng@intel.com>
 
-trace_debug_layer
-	The temporary debug_layer used when tracing the method.
-	Using 0xffffffff by default if it is 0.
 
-trace_debug_level
-	The temporary debug_level used when tracing the method.
-	Using 0x00ffffff by default if it is 0.
+Abstract:
 
-trace_state
-	The status of the tracing feature.
+This document describes the functions and the interfaces of the method
+tracing facility.
+
+1. Functionalities and usage examples:
+
+   ACPICA provides method tracing capability. And two functions are
+   currently implemented using this capability.
+
+   A. Log reducer
+   ACPICA subsystem provides debugging outputs when CONFIG_ACPI_DEBUG is
+   enabled. The debugging messages which are deployed via
+   ACPI_DEBUG_PRINT() macro can be reduced at 2 levels - per-component
+   level (known as debug layer, configured via
+   /sys/module/acpi/parameters/debug_layer) and per-type level (known as
+   debug level, configured via /sys/module/acpi/parameters/debug_level).
+
+   But when the particular layer/level is applied to the control method
+   evaluations, the quantity of the debugging outputs may still be too
+   large to be put into the kernel log buffer. The idea thus is worked out
+   to only enable the particular debug layer/level (normally more detailed)
+   logs when the control method evaluation is started, and disable the
+   detailed logging when the control method evaluation is stopped.
+
+   The following command examples illustrate the usage of the "log reducer"
+   functionality:
+   a. Filter out the debug layer/level matched logs when control methods
+      are being evaluated:
+      # cd /sys/module/acpi/parameters
+      # echo "0xXXXXXXXX" > trace_debug_layer
+      # echo "0xYYYYYYYY" > trace_debug_level
+      # echo "enable" > trace_state
+   b. Filter out the debug layer/level matched logs when the specified
+      control method is being evaluated:
+      # cd /sys/module/acpi/parameters
+      # echo "0xXXXXXXXX" > trace_debug_layer
+      # echo "0xYYYYYYYY" > trace_debug_level
+      # echo "\PPPP.AAAA.TTTT.HHHH" > trace_method_name
+      # echo "method" > /sys/module/acpi/parameters/trace_state
+   c. Filter out the debug layer/level matched logs when the specified
+      control method is being evaluated for the first time:
+      # cd /sys/module/acpi/parameters
+      # echo "0xXXXXXXXX" > trace_debug_layer
+      # echo "0xYYYYYYYY" > trace_debug_level
+      # echo "\PPPP.AAAA.TTTT.HHHH" > trace_method_name
+      # echo "method-once" > /sys/module/acpi/parameters/trace_state
+   Where:
+      0xXXXXXXXX/0xYYYYYYYY: Refer to Documentation/acpi/debug.txt for
+			     possible debug layer/level masking values.
+      \PPPP.AAAA.TTTT.HHHH: Full path of a control method that can be found
+			    in the ACPI namespace. It needn't be an entry
+			    of a control method evaluation.
+
+   B. AML tracer
+
+   There are special log entries added by the method tracing facility at
+   the "trace points" the AML interpreter starts/stops to execute a control
+   method, or an AML opcode. Note that the format of the log entries are
+   subject to change:
+     [    0.186427]   exdebug-0398 ex_trace_point        : Method Begin [0xf58394d8:\_SB.PCI0.LPCB.ECOK] execution.
+     [    0.186630]   exdebug-0398 ex_trace_point        : Opcode Begin [0xf5905c88:If] execution.
+     [    0.186820]   exdebug-0398 ex_trace_point        : Opcode Begin [0xf5905cc0:LEqual] execution.
+     [    0.187010]   exdebug-0398 ex_trace_point        : Opcode Begin [0xf5905a20:-NamePath-] execution.
+     [    0.187214]   exdebug-0398 ex_trace_point        : Opcode End [0xf5905a20:-NamePath-] execution.
+     [    0.187407]   exdebug-0398 ex_trace_point        : Opcode Begin [0xf5905f60:One] execution.
+     [    0.187594]   exdebug-0398 ex_trace_point        : Opcode End [0xf5905f60:One] execution.
+     [    0.187789]   exdebug-0398 ex_trace_point        : Opcode End [0xf5905cc0:LEqual] execution.
+     [    0.187980]   exdebug-0398 ex_trace_point        : Opcode Begin [0xf5905cc0:Return] execution.
+     [    0.188146]   exdebug-0398 ex_trace_point        : Opcode Begin [0xf5905f60:One] execution.
+     [    0.188334]   exdebug-0398 ex_trace_point        : Opcode End [0xf5905f60:One] execution.
+     [    0.188524]   exdebug-0398 ex_trace_point        : Opcode End [0xf5905cc0:Return] execution.
+     [    0.188712]   exdebug-0398 ex_trace_point        : Opcode End [0xf5905c88:If] execution.
+     [    0.188903]   exdebug-0398 ex_trace_point        : Method End [0xf58394d8:\_SB.PCI0.LPCB.ECOK] execution.
 
-	"enabled" means this feature is enabled
-	and the AML method is traced every time it's executed.
+   Developers can utilize these special log entries to track the AML
+   interpretion, thus can aid issue debugging and performance tuning. Note
+   that, as the "AML tracer" logs are implemented via ACPI_DEBUG_PRINT()
+   macro, CONFIG_ACPI_DEBUG is also required to be enabled for enabling
+   "AML tracer" logs.
 
-	"1" means this feature is enabled and the AML method
-	will only be traced during the next execution.
+   The following command examples illustrate the usage of the "AML tracer"
+   functionality:
+   a. Filter out the method start/stop "AML tracer" logs when control
+      methods are being evaluated:
+      # cd /sys/module/acpi/parameters
+      # echo "0x80" > trace_debug_layer
+      # echo "0x10" > trace_debug_level
+      # echo "enable" > trace_state
+   b. Filter out the method start/stop "AML tracer" when the specified
+      control method is being evaluated:
+      # cd /sys/module/acpi/parameters
+      # echo "0x80" > trace_debug_layer
+      # echo "0x10" > trace_debug_level
+      # echo "\PPPP.AAAA.TTTT.HHHH" > trace_method_name
+      # echo "method" > trace_state
+   c. Filter out the method start/stop "AML tracer" logs when the specified
+      control method is being evaluated for the first time:
+      # cd /sys/module/acpi/parameters
+      # echo "0x80" > trace_debug_layer
+      # echo "0x10" > trace_debug_level
+      # echo "\PPPP.AAAA.TTTT.HHHH" > trace_method_name
+      # echo "method-once" > trace_state
+   d. Filter out the method/opcode start/stop "AML tracer" when the
+      specified control method is being evaluated:
+      # cd /sys/module/acpi/parameters
+      # echo "0x80" > trace_debug_layer
+      # echo "0x10" > trace_debug_level
+      # echo "\PPPP.AAAA.TTTT.HHHH" > trace_method_name
+      # echo "opcode" > trace_state
+   e. Filter out the method/opcode start/stop "AML tracer" when the
+      specified control method is being evaluated for the first time:
+      # cd /sys/module/acpi/parameters
+      # echo "0x80" > trace_debug_layer
+      # echo "0x10" > trace_debug_level
+      # echo "\PPPP.AAAA.TTTT.HHHH" > trace_method_name
+      # echo "opcode-opcode" > trace_state
 
-	"disabled" means this feature is disabled.
-	Users can enable/disable this debug tracing feature by
-	"echo string > /sys/module/acpi/parameters/trace_state".
-	"string" should be one of "enable", "disable" and "1".
+  Note that all above method tracing facility related module parameters can
+  be used as the boot parameters, for example:
+      acpi.trace_debug_layer=0x80 acpi.trace_debug_level=0x10 \
+      acpi.trace_method_name=\_SB.LID0._LID acpi.trace_state=opcode-once
+
+2. Interface descriptions:
+
+   All method tracing functions can be configured via ACPI module
+   parameters that are accessible at /sys/module/acpi/parameters/:
+
+   trace_method_name
+	The full path of the AML method that the user wants to trace.
+	Note that the full path shouldn't contain the trailing "_"s in its
+	name segments but may contain "\" to form an absolute path.
+
+   trace_debug_layer
+	The temporary debug_layer used when the tracing feature is enabled.
+	Using ACPI_EXECUTER (0x80) by default, which is the debug_layer
+	used to match all "AML tracer" logs.
+
+   trace_debug_level
+	The temporary debug_level used when the tracing feature is enabled.
+	Using ACPI_LV_TRACE_POINT (0x10) by default, which is the
+	debug_level used to match all "AML tracer" logs.
+
+   trace_state
+	The status of the tracing feature.
+	Users can enable/disable this debug tracing feature by executing
+	the following command:
+	    # echo string > /sys/module/acpi/parameters/trace_state
+	Where "string" should be one of the followings:
+	"disable"
+	    Disable the method tracing feature.
+	"enable"
+	    Enable the method tracing feature.
+	    ACPICA debugging messages matching
+	    "trace_debug_layer/trace_debug_level" during any method
+	    execution will be logged.
+	"method"
+	    Enable the method tracing feature.
+	    ACPICA debugging messages matching
+	    "trace_debug_layer/trace_debug_level" during method execution
+	    of "trace_method_name" will be logged.
+	"method-once"
+	    Enable the method tracing feature.
+	    ACPICA debugging messages matching
+	    "trace_debug_layer/trace_debug_level" during method execution
+	    of "trace_method_name" will be logged only once.
+	"opcode"
+	    Enable the method tracing feature.
+	    ACPICA debugging messages matching
+	    "trace_debug_layer/trace_debug_level" during method/opcode
+	    execution of "trace_method_name" will be logged.
+	"opcode-once"
+	    Enable the method tracing feature.
+	    ACPICA debugging messages matching
+	    "trace_debug_layer/trace_debug_level" during method/opcode
+	    execution of "trace_method_name" will be logged only once.
+	Note that, the difference between the "enable" and other feature
+        enabling options are:
+	1. When "enable" is specified, since
+	   "trace_debug_layer/trace_debug_level" shall apply to all control
+	   method evaluations, after configuring "trace_state" to "enable",
+	   "trace_method_name" will be reset to NULL.
+	2. When "method/opcode" is specified, if
+	   "trace_method_name" is NULL when "trace_state" is configured to
+	   these options, the "trace_debug_layer/trace_debug_level" will
+	   apply to all control method evaluations.
-- 
cgit v1.2.3


From ba4539ffc19b090841cd32ffe2c57d937e311857 Mon Sep 17 00:00:00 2001
From: Viresh Kumar <viresh.kumar@linaro.org>
Date: Wed, 29 Jul 2015 16:22:56 +0530
Subject: PM / OPP: Update bindings to make opp-hz a 64 bit value

With a 32 bit value, the maximum frequency that the bindings can
support is ~ 4 GHz. And that might fall short of what newer systems may
have.

Allow opp-hz to be a 64 bit big-endian value.

Suggested-by: Stephen Boyd <sboyd@codeaurora.org>
Suggested-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 Documentation/devicetree/bindings/power/opp.txt | 40 ++++++++++++-------------
 1 file changed, 20 insertions(+), 20 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/power/opp.txt b/Documentation/devicetree/bindings/power/opp.txt
index 0d5e7c978121..0cb44dc21f97 100644
--- a/Documentation/devicetree/bindings/power/opp.txt
+++ b/Documentation/devicetree/bindings/power/opp.txt
@@ -88,7 +88,7 @@ This defines voltage-current-frequency combinations along with other related
 properties.
 
 Required properties:
-- opp-hz: Frequency in Hz
+- opp-hz: Frequency in Hz, expressed as a 64-bit big-endian integer.
 
 Optional properties:
 - opp-microvolt: voltage in micro Volts.
@@ -158,20 +158,20 @@ Example 1: Single cluster Dual-core ARM cortex A9, switch DVFS states together.
 		opp-shared;
 
 		opp00 {
-			opp-hz = <1000000000>;
+			opp-hz = /bits/ 64 <1000000000>;
 			opp-microvolt = <970000 975000 985000>;
 			opp-microamp = <70000>;
 			clock-latency-ns = <300000>;
 			opp-suspend;
 		};
 		opp01 {
-			opp-hz = <1100000000>;
+			opp-hz = /bits/ 64 <1100000000>;
 			opp-microvolt = <980000 1000000 1010000>;
 			opp-microamp = <80000>;
 			clock-latency-ns = <310000>;
 		};
 		opp02 {
-			opp-hz = <1200000000>;
+			opp-hz = /bits/ 64 <1200000000>;
 			opp-microvolt = <1025000>;
 			clock-latency-ns = <290000>;
 			turbo-mode;
@@ -237,20 +237,20 @@ independently.
 		 */
 
 		opp00 {
-			opp-hz = <1000000000>;
+			opp-hz = /bits/ 64 <1000000000>;
 			opp-microvolt = <970000 975000 985000>;
 			opp-microamp = <70000>;
 			clock-latency-ns = <300000>;
 			opp-suspend;
 		};
 		opp01 {
-			opp-hz = <1100000000>;
+			opp-hz = /bits/ 64 <1100000000>;
 			opp-microvolt = <980000 1000000 1010000>;
 			opp-microamp = <80000>;
 			clock-latency-ns = <310000>;
 		};
 		opp02 {
-			opp-hz = <1200000000>;
+			opp-hz = /bits/ 64 <1200000000>;
 			opp-microvolt = <1025000>;
 			opp-microamp = <90000;
 			lock-latency-ns = <290000>;
@@ -313,20 +313,20 @@ DVFS state together.
 		opp-shared;
 
 		opp00 {
-			opp-hz = <1000000000>;
+			opp-hz = /bits/ 64 <1000000000>;
 			opp-microvolt = <970000 975000 985000>;
 			opp-microamp = <70000>;
 			clock-latency-ns = <300000>;
 			opp-suspend;
 		};
 		opp01 {
-			opp-hz = <1100000000>;
+			opp-hz = /bits/ 64 <1100000000>;
 			opp-microvolt = <980000 1000000 1010000>;
 			opp-microamp = <80000>;
 			clock-latency-ns = <310000>;
 		};
 		opp02 {
-			opp-hz = <1200000000>;
+			opp-hz = /bits/ 64 <1200000000>;
 			opp-microvolt = <1025000>;
 			opp-microamp = <90000>;
 			clock-latency-ns = <290000>;
@@ -339,20 +339,20 @@ DVFS state together.
 		opp-shared;
 
 		opp10 {
-			opp-hz = <1300000000>;
+			opp-hz = /bits/ 64 <1300000000>;
 			opp-microvolt = <1045000 1050000 1055000>;
 			opp-microamp = <95000>;
 			clock-latency-ns = <400000>;
 			opp-suspend;
 		};
 		opp11 {
-			opp-hz = <1400000000>;
+			opp-hz = /bits/ 64 <1400000000>;
 			opp-microvolt = <1075000>;
 			opp-microamp = <100000>;
 			clock-latency-ns = <400000>;
 		};
 		opp12 {
-			opp-hz = <1500000000>;
+			opp-hz = /bits/ 64 <1500000000>;
 			opp-microvolt = <1010000 1100000 1110000>;
 			opp-microamp = <95000>;
 			clock-latency-ns = <400000>;
@@ -379,7 +379,7 @@ Example 4: Handling multiple regulators
 		opp-shared;
 
 		opp00 {
-			opp-hz = <1000000000>;
+			opp-hz = /bits/ 64 <1000000000>;
 			opp-microvolt = <970000>, /* Supply 0 */
 					<960000>, /* Supply 1 */
 					<960000>; /* Supply 2 */
@@ -392,7 +392,7 @@ Example 4: Handling multiple regulators
 		/* OR */
 
 		opp00 {
-			opp-hz = <1000000000>;
+			opp-hz = /bits/ 64 <1000000000>;
 			opp-microvolt = <970000 975000 985000>, /* Supply 0 */
 					<960000 965000 975000>, /* Supply 1 */
 					<960000 965000 975000>; /* Supply 2 */
@@ -405,7 +405,7 @@ Example 4: Handling multiple regulators
 		/* OR */
 
 		opp00 {
-			opp-hz = <1000000000>;
+			opp-hz = /bits/ 64 <1000000000>;
 			opp-microvolt = <970000 975000 985000>, /* Supply 0 */
 					<960000 965000 975000>, /* Supply 1 */
 					<960000 965000 975000>; /* Supply 2 */
@@ -437,12 +437,12 @@ Example 5: Multiple OPP tables
 		opp-shared;
 
 		opp00 {
-			opp-hz = <600000000>;
+			opp-hz = /bits/ 64 <600000000>;
 			...
 		};
 
 		opp01 {
-			opp-hz = <800000000>;
+			opp-hz = /bits/ 64 <800000000>;
 			...
 		};
 	};
@@ -453,12 +453,12 @@ Example 5: Multiple OPP tables
 		opp-shared;
 
 		opp10 {
-			opp-hz = <1000000000>;
+			opp-hz = /bits/ 64 <1000000000>;
 			...
 		};
 
 		opp11 {
-			opp-hz = <1100000000>;
+			opp-hz = /bits/ 64 <1100000000>;
 			...
 		};
 	};
-- 
cgit v1.2.3


From 3566c5b277a41e912d15cd583c4604c95bb6b3f8 Mon Sep 17 00:00:00 2001
From: Viresh Kumar <viresh.kumar@linaro.org>
Date: Thu, 30 Jul 2015 22:27:19 +0530
Subject: PM / OPP: Create a directory for opp bindings

More platform specific extended opp bindings will follow and it would be
easy to manage them with a directory for opp. Lets create that and move
the existing opp bindings into it.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 Documentation/devicetree/bindings/opp/opp.txt   | 465 ++++++++++++++++++++++++
 Documentation/devicetree/bindings/power/opp.txt | 465 ------------------------
 2 files changed, 465 insertions(+), 465 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/opp/opp.txt
 delete mode 100644 Documentation/devicetree/bindings/power/opp.txt

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/opp/opp.txt b/Documentation/devicetree/bindings/opp/opp.txt
new file mode 100644
index 000000000000..0cb44dc21f97
--- /dev/null
+++ b/Documentation/devicetree/bindings/opp/opp.txt
@@ -0,0 +1,465 @@
+Generic OPP (Operating Performance Points) Bindings
+----------------------------------------------------
+
+Devices work at voltage-current-frequency combinations and some implementations
+have the liberty of choosing these. These combinations are called Operating
+Performance Points aka OPPs. This document defines bindings for these OPPs
+applicable across wide range of devices. For illustration purpose, this document
+uses CPU as a device.
+
+This document contain multiple versions of OPP binding and only one of them
+should be used per device.
+
+Binding 1: operating-points
+============================
+
+This binding only supports voltage-frequency pairs.
+
+Properties:
+- operating-points: An array of 2-tuples items, and each item consists
+  of frequency and voltage like <freq-kHz vol-uV>.
+	freq: clock frequency in kHz
+	vol: voltage in microvolt
+
+Examples:
+
+cpu@0 {
+	compatible = "arm,cortex-a9";
+	reg = <0>;
+	next-level-cache = <&L2>;
+	operating-points = <
+		/* kHz    uV */
+		792000  1100000
+		396000  950000
+		198000  850000
+	>;
+};
+
+
+Binding 2: operating-points-v2
+============================
+
+* Property: operating-points-v2
+
+Devices supporting OPPs must set their "operating-points-v2" property with
+phandle to a OPP table in their DT node. The OPP core will use this phandle to
+find the operating points for the device.
+
+Devices may want to choose OPP tables at runtime and so can provide a list of
+phandles here. But only *one* of them should be chosen at runtime. This must be
+accompanied by a corresponding "operating-points-names" property, to uniquely
+identify the OPP tables.
+
+If required, this can be extended for SoC vendor specfic bindings. Such bindings
+should be documented as Documentation/devicetree/bindings/power/<vendor>-opp.txt
+and should have a compatible description like: "operating-points-v2-<vendor>".
+
+Optional properties:
+- operating-points-names: Names of OPP tables (required if multiple OPP
+  tables are present), to uniquely identify them. The same list must be present
+  for all the CPUs which are sharing clock/voltage rails and hence the OPP
+  tables.
+
+* OPP Table Node
+
+This describes the OPPs belonging to a device. This node can have following
+properties:
+
+Required properties:
+- compatible: Allow OPPs to express their compatibility. It should be:
+  "operating-points-v2".
+
+- OPP nodes: One or more OPP nodes describing voltage-current-frequency
+  combinations. Their name isn't significant but their phandle can be used to
+  reference an OPP.
+
+Optional properties:
+- opp-shared: Indicates that device nodes using this OPP Table Node's phandle
+  switch their DVFS state together, i.e. they share clock/voltage/current lines.
+  Missing property means devices have independent clock/voltage/current lines,
+  but they share OPP tables.
+
+- status: Marks the OPP table enabled/disabled.
+
+
+* OPP Node
+
+This defines voltage-current-frequency combinations along with other related
+properties.
+
+Required properties:
+- opp-hz: Frequency in Hz, expressed as a 64-bit big-endian integer.
+
+Optional properties:
+- opp-microvolt: voltage in micro Volts.
+
+  A single regulator's voltage is specified with an array of size one or three.
+  Single entry is for target voltage and three entries are for <target min max>
+  voltages.
+
+  Entries for multiple regulators must be present in the same order as
+  regulators are specified in device's DT node.
+
+- opp-microamp: The maximum current drawn by the device in microamperes
+  considering system specific parameters (such as transients, process, aging,
+  maximum operating temperature range etc.) as necessary. This may be used to
+  set the most efficient regulator operating mode.
+
+  Should only be set if opp-microvolt is set for the OPP.
+
+  Entries for multiple regulators must be present in the same order as
+  regulators are specified in device's DT node. If this property isn't required
+  for few regulators, then this should be marked as zero for them. If it isn't
+  required for any regulator, then this property need not be present.
+
+- clock-latency-ns: Specifies the maximum possible transition latency (in
+  nanoseconds) for switching to this OPP from any other OPP.
+
+- turbo-mode: Marks the OPP to be used only for turbo modes. Turbo mode is
+  available on some platforms, where the device can run over its operating
+  frequency for a short duration of time limited by the device's power, current
+  and thermal limits.
+
+- opp-suspend: Marks the OPP to be used during device suspend. Only one OPP in
+  the table should have this.
+
+- status: Marks the node enabled/disabled.
+
+Example 1: Single cluster Dual-core ARM cortex A9, switch DVFS states together.
+
+/ {
+	cpus {
+		#address-cells = <1>;
+		#size-cells = <0>;
+
+		cpu@0 {
+			compatible = "arm,cortex-a9";
+			reg = <0>;
+			next-level-cache = <&L2>;
+			clocks = <&clk_controller 0>;
+			clock-names = "cpu";
+			cpu-supply = <&cpu_supply0>;
+			operating-points-v2 = <&cpu0_opp_table>;
+		};
+
+		cpu@1 {
+			compatible = "arm,cortex-a9";
+			reg = <1>;
+			next-level-cache = <&L2>;
+			clocks = <&clk_controller 0>;
+			clock-names = "cpu";
+			cpu-supply = <&cpu_supply0>;
+			operating-points-v2 = <&cpu0_opp_table>;
+		};
+	};
+
+	cpu0_opp_table: opp_table0 {
+		compatible = "operating-points-v2";
+		opp-shared;
+
+		opp00 {
+			opp-hz = /bits/ 64 <1000000000>;
+			opp-microvolt = <970000 975000 985000>;
+			opp-microamp = <70000>;
+			clock-latency-ns = <300000>;
+			opp-suspend;
+		};
+		opp01 {
+			opp-hz = /bits/ 64 <1100000000>;
+			opp-microvolt = <980000 1000000 1010000>;
+			opp-microamp = <80000>;
+			clock-latency-ns = <310000>;
+		};
+		opp02 {
+			opp-hz = /bits/ 64 <1200000000>;
+			opp-microvolt = <1025000>;
+			clock-latency-ns = <290000>;
+			turbo-mode;
+		};
+	};
+};
+
+Example 2: Single cluster, Quad-core Qualcom-krait, switches DVFS states
+independently.
+
+/ {
+	cpus {
+		#address-cells = <1>;
+		#size-cells = <0>;
+
+		cpu@0 {
+			compatible = "qcom,krait";
+			reg = <0>;
+			next-level-cache = <&L2>;
+			clocks = <&clk_controller 0>;
+			clock-names = "cpu";
+			cpu-supply = <&cpu_supply0>;
+			operating-points-v2 = <&cpu_opp_table>;
+		};
+
+		cpu@1 {
+			compatible = "qcom,krait";
+			reg = <1>;
+			next-level-cache = <&L2>;
+			clocks = <&clk_controller 1>;
+			clock-names = "cpu";
+			cpu-supply = <&cpu_supply1>;
+			operating-points-v2 = <&cpu_opp_table>;
+		};
+
+		cpu@2 {
+			compatible = "qcom,krait";
+			reg = <2>;
+			next-level-cache = <&L2>;
+			clocks = <&clk_controller 2>;
+			clock-names = "cpu";
+			cpu-supply = <&cpu_supply2>;
+			operating-points-v2 = <&cpu_opp_table>;
+		};
+
+		cpu@3 {
+			compatible = "qcom,krait";
+			reg = <3>;
+			next-level-cache = <&L2>;
+			clocks = <&clk_controller 3>;
+			clock-names = "cpu";
+			cpu-supply = <&cpu_supply3>;
+			operating-points-v2 = <&cpu_opp_table>;
+		};
+	};
+
+	cpu_opp_table: opp_table {
+		compatible = "operating-points-v2";
+
+		/*
+		 * Missing opp-shared property means CPUs switch DVFS states
+		 * independently.
+		 */
+
+		opp00 {
+			opp-hz = /bits/ 64 <1000000000>;
+			opp-microvolt = <970000 975000 985000>;
+			opp-microamp = <70000>;
+			clock-latency-ns = <300000>;
+			opp-suspend;
+		};
+		opp01 {
+			opp-hz = /bits/ 64 <1100000000>;
+			opp-microvolt = <980000 1000000 1010000>;
+			opp-microamp = <80000>;
+			clock-latency-ns = <310000>;
+		};
+		opp02 {
+			opp-hz = /bits/ 64 <1200000000>;
+			opp-microvolt = <1025000>;
+			opp-microamp = <90000;
+			lock-latency-ns = <290000>;
+			turbo-mode;
+		};
+	};
+};
+
+Example 3: Dual-cluster, Dual-core per cluster. CPUs within a cluster switch
+DVFS state together.
+
+/ {
+	cpus {
+		#address-cells = <1>;
+		#size-cells = <0>;
+
+		cpu@0 {
+			compatible = "arm,cortex-a7";
+			reg = <0>;
+			next-level-cache = <&L2>;
+			clocks = <&clk_controller 0>;
+			clock-names = "cpu";
+			cpu-supply = <&cpu_supply0>;
+			operating-points-v2 = <&cluster0_opp>;
+		};
+
+		cpu@1 {
+			compatible = "arm,cortex-a7";
+			reg = <1>;
+			next-level-cache = <&L2>;
+			clocks = <&clk_controller 0>;
+			clock-names = "cpu";
+			cpu-supply = <&cpu_supply0>;
+			operating-points-v2 = <&cluster0_opp>;
+		};
+
+		cpu@100 {
+			compatible = "arm,cortex-a15";
+			reg = <100>;
+			next-level-cache = <&L2>;
+			clocks = <&clk_controller 1>;
+			clock-names = "cpu";
+			cpu-supply = <&cpu_supply1>;
+			operating-points-v2 = <&cluster1_opp>;
+		};
+
+		cpu@101 {
+			compatible = "arm,cortex-a15";
+			reg = <101>;
+			next-level-cache = <&L2>;
+			clocks = <&clk_controller 1>;
+			clock-names = "cpu";
+			cpu-supply = <&cpu_supply1>;
+			operating-points-v2 = <&cluster1_opp>;
+		};
+	};
+
+	cluster0_opp: opp_table0 {
+		compatible = "operating-points-v2";
+		opp-shared;
+
+		opp00 {
+			opp-hz = /bits/ 64 <1000000000>;
+			opp-microvolt = <970000 975000 985000>;
+			opp-microamp = <70000>;
+			clock-latency-ns = <300000>;
+			opp-suspend;
+		};
+		opp01 {
+			opp-hz = /bits/ 64 <1100000000>;
+			opp-microvolt = <980000 1000000 1010000>;
+			opp-microamp = <80000>;
+			clock-latency-ns = <310000>;
+		};
+		opp02 {
+			opp-hz = /bits/ 64 <1200000000>;
+			opp-microvolt = <1025000>;
+			opp-microamp = <90000>;
+			clock-latency-ns = <290000>;
+			turbo-mode;
+		};
+	};
+
+	cluster1_opp: opp_table1 {
+		compatible = "operating-points-v2";
+		opp-shared;
+
+		opp10 {
+			opp-hz = /bits/ 64 <1300000000>;
+			opp-microvolt = <1045000 1050000 1055000>;
+			opp-microamp = <95000>;
+			clock-latency-ns = <400000>;
+			opp-suspend;
+		};
+		opp11 {
+			opp-hz = /bits/ 64 <1400000000>;
+			opp-microvolt = <1075000>;
+			opp-microamp = <100000>;
+			clock-latency-ns = <400000>;
+		};
+		opp12 {
+			opp-hz = /bits/ 64 <1500000000>;
+			opp-microvolt = <1010000 1100000 1110000>;
+			opp-microamp = <95000>;
+			clock-latency-ns = <400000>;
+			turbo-mode;
+		};
+	};
+};
+
+Example 4: Handling multiple regulators
+
+/ {
+	cpus {
+		cpu@0 {
+			compatible = "arm,cortex-a7";
+			...
+
+			cpu-supply = <&cpu_supply0>, <&cpu_supply1>, <&cpu_supply2>;
+			operating-points-v2 = <&cpu0_opp_table>;
+		};
+	};
+
+	cpu0_opp_table: opp_table0 {
+		compatible = "operating-points-v2";
+		opp-shared;
+
+		opp00 {
+			opp-hz = /bits/ 64 <1000000000>;
+			opp-microvolt = <970000>, /* Supply 0 */
+					<960000>, /* Supply 1 */
+					<960000>; /* Supply 2 */
+			opp-microamp =  <70000>,  /* Supply 0 */
+					<70000>,  /* Supply 1 */
+					<70000>;  /* Supply 2 */
+			clock-latency-ns = <300000>;
+		};
+
+		/* OR */
+
+		opp00 {
+			opp-hz = /bits/ 64 <1000000000>;
+			opp-microvolt = <970000 975000 985000>, /* Supply 0 */
+					<960000 965000 975000>, /* Supply 1 */
+					<960000 965000 975000>; /* Supply 2 */
+			opp-microamp =  <70000>,		/* Supply 0 */
+					<70000>,		/* Supply 1 */
+					<70000>;		/* Supply 2 */
+			clock-latency-ns = <300000>;
+		};
+
+		/* OR */
+
+		opp00 {
+			opp-hz = /bits/ 64 <1000000000>;
+			opp-microvolt = <970000 975000 985000>, /* Supply 0 */
+					<960000 965000 975000>, /* Supply 1 */
+					<960000 965000 975000>; /* Supply 2 */
+			opp-microamp =  <70000>,		/* Supply 0 */
+					<0>,			/* Supply 1 doesn't need this */
+					<70000>;		/* Supply 2 */
+			clock-latency-ns = <300000>;
+		};
+	};
+};
+
+Example 5: Multiple OPP tables
+
+/ {
+	cpus {
+		cpu@0 {
+			compatible = "arm,cortex-a7";
+			...
+
+			cpu-supply = <&cpu_supply>
+			operating-points-v2 = <&cpu0_opp_table_slow>, <&cpu0_opp_table_fast>;
+			operating-points-names = "slow", "fast";
+		};
+	};
+
+	cpu0_opp_table_slow: opp_table_slow {
+		compatible = "operating-points-v2";
+		status = "okay";
+		opp-shared;
+
+		opp00 {
+			opp-hz = /bits/ 64 <600000000>;
+			...
+		};
+
+		opp01 {
+			opp-hz = /bits/ 64 <800000000>;
+			...
+		};
+	};
+
+	cpu0_opp_table_fast: opp_table_fast {
+		compatible = "operating-points-v2";
+		status = "okay";
+		opp-shared;
+
+		opp10 {
+			opp-hz = /bits/ 64 <1000000000>;
+			...
+		};
+
+		opp11 {
+			opp-hz = /bits/ 64 <1100000000>;
+			...
+		};
+	};
+};
diff --git a/Documentation/devicetree/bindings/power/opp.txt b/Documentation/devicetree/bindings/power/opp.txt
deleted file mode 100644
index 0cb44dc21f97..000000000000
--- a/Documentation/devicetree/bindings/power/opp.txt
+++ /dev/null
@@ -1,465 +0,0 @@
-Generic OPP (Operating Performance Points) Bindings
-----------------------------------------------------
-
-Devices work at voltage-current-frequency combinations and some implementations
-have the liberty of choosing these. These combinations are called Operating
-Performance Points aka OPPs. This document defines bindings for these OPPs
-applicable across wide range of devices. For illustration purpose, this document
-uses CPU as a device.
-
-This document contain multiple versions of OPP binding and only one of them
-should be used per device.
-
-Binding 1: operating-points
-============================
-
-This binding only supports voltage-frequency pairs.
-
-Properties:
-- operating-points: An array of 2-tuples items, and each item consists
-  of frequency and voltage like <freq-kHz vol-uV>.
-	freq: clock frequency in kHz
-	vol: voltage in microvolt
-
-Examples:
-
-cpu@0 {
-	compatible = "arm,cortex-a9";
-	reg = <0>;
-	next-level-cache = <&L2>;
-	operating-points = <
-		/* kHz    uV */
-		792000  1100000
-		396000  950000
-		198000  850000
-	>;
-};
-
-
-Binding 2: operating-points-v2
-============================
-
-* Property: operating-points-v2
-
-Devices supporting OPPs must set their "operating-points-v2" property with
-phandle to a OPP table in their DT node. The OPP core will use this phandle to
-find the operating points for the device.
-
-Devices may want to choose OPP tables at runtime and so can provide a list of
-phandles here. But only *one* of them should be chosen at runtime. This must be
-accompanied by a corresponding "operating-points-names" property, to uniquely
-identify the OPP tables.
-
-If required, this can be extended for SoC vendor specfic bindings. Such bindings
-should be documented as Documentation/devicetree/bindings/power/<vendor>-opp.txt
-and should have a compatible description like: "operating-points-v2-<vendor>".
-
-Optional properties:
-- operating-points-names: Names of OPP tables (required if multiple OPP
-  tables are present), to uniquely identify them. The same list must be present
-  for all the CPUs which are sharing clock/voltage rails and hence the OPP
-  tables.
-
-* OPP Table Node
-
-This describes the OPPs belonging to a device. This node can have following
-properties:
-
-Required properties:
-- compatible: Allow OPPs to express their compatibility. It should be:
-  "operating-points-v2".
-
-- OPP nodes: One or more OPP nodes describing voltage-current-frequency
-  combinations. Their name isn't significant but their phandle can be used to
-  reference an OPP.
-
-Optional properties:
-- opp-shared: Indicates that device nodes using this OPP Table Node's phandle
-  switch their DVFS state together, i.e. they share clock/voltage/current lines.
-  Missing property means devices have independent clock/voltage/current lines,
-  but they share OPP tables.
-
-- status: Marks the OPP table enabled/disabled.
-
-
-* OPP Node
-
-This defines voltage-current-frequency combinations along with other related
-properties.
-
-Required properties:
-- opp-hz: Frequency in Hz, expressed as a 64-bit big-endian integer.
-
-Optional properties:
-- opp-microvolt: voltage in micro Volts.
-
-  A single regulator's voltage is specified with an array of size one or three.
-  Single entry is for target voltage and three entries are for <target min max>
-  voltages.
-
-  Entries for multiple regulators must be present in the same order as
-  regulators are specified in device's DT node.
-
-- opp-microamp: The maximum current drawn by the device in microamperes
-  considering system specific parameters (such as transients, process, aging,
-  maximum operating temperature range etc.) as necessary. This may be used to
-  set the most efficient regulator operating mode.
-
-  Should only be set if opp-microvolt is set for the OPP.
-
-  Entries for multiple regulators must be present in the same order as
-  regulators are specified in device's DT node. If this property isn't required
-  for few regulators, then this should be marked as zero for them. If it isn't
-  required for any regulator, then this property need not be present.
-
-- clock-latency-ns: Specifies the maximum possible transition latency (in
-  nanoseconds) for switching to this OPP from any other OPP.
-
-- turbo-mode: Marks the OPP to be used only for turbo modes. Turbo mode is
-  available on some platforms, where the device can run over its operating
-  frequency for a short duration of time limited by the device's power, current
-  and thermal limits.
-
-- opp-suspend: Marks the OPP to be used during device suspend. Only one OPP in
-  the table should have this.
-
-- status: Marks the node enabled/disabled.
-
-Example 1: Single cluster Dual-core ARM cortex A9, switch DVFS states together.
-
-/ {
-	cpus {
-		#address-cells = <1>;
-		#size-cells = <0>;
-
-		cpu@0 {
-			compatible = "arm,cortex-a9";
-			reg = <0>;
-			next-level-cache = <&L2>;
-			clocks = <&clk_controller 0>;
-			clock-names = "cpu";
-			cpu-supply = <&cpu_supply0>;
-			operating-points-v2 = <&cpu0_opp_table>;
-		};
-
-		cpu@1 {
-			compatible = "arm,cortex-a9";
-			reg = <1>;
-			next-level-cache = <&L2>;
-			clocks = <&clk_controller 0>;
-			clock-names = "cpu";
-			cpu-supply = <&cpu_supply0>;
-			operating-points-v2 = <&cpu0_opp_table>;
-		};
-	};
-
-	cpu0_opp_table: opp_table0 {
-		compatible = "operating-points-v2";
-		opp-shared;
-
-		opp00 {
-			opp-hz = /bits/ 64 <1000000000>;
-			opp-microvolt = <970000 975000 985000>;
-			opp-microamp = <70000>;
-			clock-latency-ns = <300000>;
-			opp-suspend;
-		};
-		opp01 {
-			opp-hz = /bits/ 64 <1100000000>;
-			opp-microvolt = <980000 1000000 1010000>;
-			opp-microamp = <80000>;
-			clock-latency-ns = <310000>;
-		};
-		opp02 {
-			opp-hz = /bits/ 64 <1200000000>;
-			opp-microvolt = <1025000>;
-			clock-latency-ns = <290000>;
-			turbo-mode;
-		};
-	};
-};
-
-Example 2: Single cluster, Quad-core Qualcom-krait, switches DVFS states
-independently.
-
-/ {
-	cpus {
-		#address-cells = <1>;
-		#size-cells = <0>;
-
-		cpu@0 {
-			compatible = "qcom,krait";
-			reg = <0>;
-			next-level-cache = <&L2>;
-			clocks = <&clk_controller 0>;
-			clock-names = "cpu";
-			cpu-supply = <&cpu_supply0>;
-			operating-points-v2 = <&cpu_opp_table>;
-		};
-
-		cpu@1 {
-			compatible = "qcom,krait";
-			reg = <1>;
-			next-level-cache = <&L2>;
-			clocks = <&clk_controller 1>;
-			clock-names = "cpu";
-			cpu-supply = <&cpu_supply1>;
-			operating-points-v2 = <&cpu_opp_table>;
-		};
-
-		cpu@2 {
-			compatible = "qcom,krait";
-			reg = <2>;
-			next-level-cache = <&L2>;
-			clocks = <&clk_controller 2>;
-			clock-names = "cpu";
-			cpu-supply = <&cpu_supply2>;
-			operating-points-v2 = <&cpu_opp_table>;
-		};
-
-		cpu@3 {
-			compatible = "qcom,krait";
-			reg = <3>;
-			next-level-cache = <&L2>;
-			clocks = <&clk_controller 3>;
-			clock-names = "cpu";
-			cpu-supply = <&cpu_supply3>;
-			operating-points-v2 = <&cpu_opp_table>;
-		};
-	};
-
-	cpu_opp_table: opp_table {
-		compatible = "operating-points-v2";
-
-		/*
-		 * Missing opp-shared property means CPUs switch DVFS states
-		 * independently.
-		 */
-
-		opp00 {
-			opp-hz = /bits/ 64 <1000000000>;
-			opp-microvolt = <970000 975000 985000>;
-			opp-microamp = <70000>;
-			clock-latency-ns = <300000>;
-			opp-suspend;
-		};
-		opp01 {
-			opp-hz = /bits/ 64 <1100000000>;
-			opp-microvolt = <980000 1000000 1010000>;
-			opp-microamp = <80000>;
-			clock-latency-ns = <310000>;
-		};
-		opp02 {
-			opp-hz = /bits/ 64 <1200000000>;
-			opp-microvolt = <1025000>;
-			opp-microamp = <90000;
-			lock-latency-ns = <290000>;
-			turbo-mode;
-		};
-	};
-};
-
-Example 3: Dual-cluster, Dual-core per cluster. CPUs within a cluster switch
-DVFS state together.
-
-/ {
-	cpus {
-		#address-cells = <1>;
-		#size-cells = <0>;
-
-		cpu@0 {
-			compatible = "arm,cortex-a7";
-			reg = <0>;
-			next-level-cache = <&L2>;
-			clocks = <&clk_controller 0>;
-			clock-names = "cpu";
-			cpu-supply = <&cpu_supply0>;
-			operating-points-v2 = <&cluster0_opp>;
-		};
-
-		cpu@1 {
-			compatible = "arm,cortex-a7";
-			reg = <1>;
-			next-level-cache = <&L2>;
-			clocks = <&clk_controller 0>;
-			clock-names = "cpu";
-			cpu-supply = <&cpu_supply0>;
-			operating-points-v2 = <&cluster0_opp>;
-		};
-
-		cpu@100 {
-			compatible = "arm,cortex-a15";
-			reg = <100>;
-			next-level-cache = <&L2>;
-			clocks = <&clk_controller 1>;
-			clock-names = "cpu";
-			cpu-supply = <&cpu_supply1>;
-			operating-points-v2 = <&cluster1_opp>;
-		};
-
-		cpu@101 {
-			compatible = "arm,cortex-a15";
-			reg = <101>;
-			next-level-cache = <&L2>;
-			clocks = <&clk_controller 1>;
-			clock-names = "cpu";
-			cpu-supply = <&cpu_supply1>;
-			operating-points-v2 = <&cluster1_opp>;
-		};
-	};
-
-	cluster0_opp: opp_table0 {
-		compatible = "operating-points-v2";
-		opp-shared;
-
-		opp00 {
-			opp-hz = /bits/ 64 <1000000000>;
-			opp-microvolt = <970000 975000 985000>;
-			opp-microamp = <70000>;
-			clock-latency-ns = <300000>;
-			opp-suspend;
-		};
-		opp01 {
-			opp-hz = /bits/ 64 <1100000000>;
-			opp-microvolt = <980000 1000000 1010000>;
-			opp-microamp = <80000>;
-			clock-latency-ns = <310000>;
-		};
-		opp02 {
-			opp-hz = /bits/ 64 <1200000000>;
-			opp-microvolt = <1025000>;
-			opp-microamp = <90000>;
-			clock-latency-ns = <290000>;
-			turbo-mode;
-		};
-	};
-
-	cluster1_opp: opp_table1 {
-		compatible = "operating-points-v2";
-		opp-shared;
-
-		opp10 {
-			opp-hz = /bits/ 64 <1300000000>;
-			opp-microvolt = <1045000 1050000 1055000>;
-			opp-microamp = <95000>;
-			clock-latency-ns = <400000>;
-			opp-suspend;
-		};
-		opp11 {
-			opp-hz = /bits/ 64 <1400000000>;
-			opp-microvolt = <1075000>;
-			opp-microamp = <100000>;
-			clock-latency-ns = <400000>;
-		};
-		opp12 {
-			opp-hz = /bits/ 64 <1500000000>;
-			opp-microvolt = <1010000 1100000 1110000>;
-			opp-microamp = <95000>;
-			clock-latency-ns = <400000>;
-			turbo-mode;
-		};
-	};
-};
-
-Example 4: Handling multiple regulators
-
-/ {
-	cpus {
-		cpu@0 {
-			compatible = "arm,cortex-a7";
-			...
-
-			cpu-supply = <&cpu_supply0>, <&cpu_supply1>, <&cpu_supply2>;
-			operating-points-v2 = <&cpu0_opp_table>;
-		};
-	};
-
-	cpu0_opp_table: opp_table0 {
-		compatible = "operating-points-v2";
-		opp-shared;
-
-		opp00 {
-			opp-hz = /bits/ 64 <1000000000>;
-			opp-microvolt = <970000>, /* Supply 0 */
-					<960000>, /* Supply 1 */
-					<960000>; /* Supply 2 */
-			opp-microamp =  <70000>,  /* Supply 0 */
-					<70000>,  /* Supply 1 */
-					<70000>;  /* Supply 2 */
-			clock-latency-ns = <300000>;
-		};
-
-		/* OR */
-
-		opp00 {
-			opp-hz = /bits/ 64 <1000000000>;
-			opp-microvolt = <970000 975000 985000>, /* Supply 0 */
-					<960000 965000 975000>, /* Supply 1 */
-					<960000 965000 975000>; /* Supply 2 */
-			opp-microamp =  <70000>,		/* Supply 0 */
-					<70000>,		/* Supply 1 */
-					<70000>;		/* Supply 2 */
-			clock-latency-ns = <300000>;
-		};
-
-		/* OR */
-
-		opp00 {
-			opp-hz = /bits/ 64 <1000000000>;
-			opp-microvolt = <970000 975000 985000>, /* Supply 0 */
-					<960000 965000 975000>, /* Supply 1 */
-					<960000 965000 975000>; /* Supply 2 */
-			opp-microamp =  <70000>,		/* Supply 0 */
-					<0>,			/* Supply 1 doesn't need this */
-					<70000>;		/* Supply 2 */
-			clock-latency-ns = <300000>;
-		};
-	};
-};
-
-Example 5: Multiple OPP tables
-
-/ {
-	cpus {
-		cpu@0 {
-			compatible = "arm,cortex-a7";
-			...
-
-			cpu-supply = <&cpu_supply>
-			operating-points-v2 = <&cpu0_opp_table_slow>, <&cpu0_opp_table_fast>;
-			operating-points-names = "slow", "fast";
-		};
-	};
-
-	cpu0_opp_table_slow: opp_table_slow {
-		compatible = "operating-points-v2";
-		status = "okay";
-		opp-shared;
-
-		opp00 {
-			opp-hz = /bits/ 64 <600000000>;
-			...
-		};
-
-		opp01 {
-			opp-hz = /bits/ 64 <800000000>;
-			...
-		};
-	};
-
-	cpu0_opp_table_fast: opp_table_fast {
-		compatible = "operating-points-v2";
-		status = "okay";
-		opp-shared;
-
-		opp10 {
-			opp-hz = /bits/ 64 <1000000000>;
-			...
-		};
-
-		opp11 {
-			opp-hz = /bits/ 64 <1100000000>;
-			...
-		};
-	};
-};
-- 
cgit v1.2.3


From fe2edd9cbb7763c68350e267f8060113ce9848fe Mon Sep 17 00:00:00 2001
From: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Date: Thu, 30 Jul 2015 01:02:36 +0200
Subject: Documentation: dt: atmel-at91: add clocks to system timer, rstc and
 shdwc

The system timer (at91rm9200), the reset controller and the shutdown
controller need an input clock. This is the slow clock and they will not
function without it.

Also fix the shutdown controller example.

Acked-By: Sebastian Reichel <sre@kernel.org>
Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
---
 Documentation/devicetree/bindings/arm/atmel-at91.txt | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/arm/atmel-at91.txt b/Documentation/devicetree/bindings/arm/atmel-at91.txt
index 424ac8cbfa08..209710cdf3aa 100644
--- a/Documentation/devicetree/bindings/arm/atmel-at91.txt
+++ b/Documentation/devicetree/bindings/arm/atmel-at91.txt
@@ -50,6 +50,7 @@ System Timer (ST) required properties:
 - reg: Should contain registers location and length
 - interrupts: Should contain interrupt for the ST which is the IRQ line
   shared across all System Controller members.
+- clocks: phandle to input clock.
 Its subnodes can be:
 - watchdog: compatible should be "atmel,at91rm9200-wdt"
 
@@ -89,12 +90,14 @@ RSTC Reset Controller required properties:
 - compatible: Should be "atmel,<chip>-rstc".
   <chip> can be "at91sam9260" or "at91sam9g45"
 - reg: Should contain registers location and length
+- clocks: phandle to input clock.
 
 Example:
 
 	rstc@fffffd00 {
 		compatible = "atmel,at91sam9260-rstc";
 		reg = <0xfffffd00 0x10>;
+		clocks = <&clk32k>;
 	};
 
 RAMC SDRAM/DDR Controller required properties:
@@ -117,6 +120,7 @@ required properties:
 - compatible: Should be "atmel,<chip>-shdwc".
   <chip> can be "at91sam9260", "at91sam9rl" or "at91sam9x5".
 - reg: Should contain registers location and length
+- clocks: phandle to input clock.
 
 optional properties:
 - atmel,wakeup-mode: String, operation mode of the wakeup mode.
@@ -135,9 +139,10 @@ optional at91sam9x5 properties:
 
 Example:
 
-	rstc@fffffd00 {
-		compatible = "atmel,at91sam9260-rstc";
-		reg = <0xfffffd00 0x10>;
+	shdwc@fffffd10 {
+		compatible = "atmel,at91sam9260-shdwc";
+		reg = <0xfffffd10 0x10>;
+		clocks = <&clk32k>;
 	};
 
 Special Function Registers (SFR)
-- 
cgit v1.2.3


From 04600560a01270ef3a9bdeaeeebcfa23df81a327 Mon Sep 17 00:00:00 2001
From: Boris Brezillon <boris.brezillon@free-electrons.com>
Date: Fri, 31 Jul 2015 02:11:14 +0200
Subject: Documentation: dt: atmel-at91: add slow clock to tcb

The timer counters need the slow clock. It is required as they
will not function without it.

Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
---
 Documentation/devicetree/bindings/arm/atmel-at91.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/arm/atmel-at91.txt b/Documentation/devicetree/bindings/arm/atmel-at91.txt
index 209710cdf3aa..dc2d0f06d058 100644
--- a/Documentation/devicetree/bindings/arm/atmel-at91.txt
+++ b/Documentation/devicetree/bindings/arm/atmel-at91.txt
@@ -62,7 +62,7 @@ TC/TCLIB Timer required properties:
   Note that you can specify several interrupt cells if the TC
   block has one interrupt per channel.
 - clock-names: tuple listing input clock names.
-	Required elements: "t0_clk"
+	Required elements: "t0_clk", "slow_clk"
 	Optional elements: "t1_clk", "t2_clk"
 - clocks: phandles to input clocks.
 
-- 
cgit v1.2.3


From d79e327aa58505fe693ef4a5157cc61eb968be02 Mon Sep 17 00:00:00 2001
From: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Date: Thu, 30 Jul 2015 00:43:07 +0200
Subject: Documentation: watchdog: at91sam9_wdt: add clocks property

The watchdog has an input clock, the slow clock. It is required as it will
not function without it.

Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
---
 Documentation/devicetree/bindings/watchdog/atmel-wdt.txt | 2 ++
 1 file changed, 2 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/watchdog/atmel-wdt.txt b/Documentation/devicetree/bindings/watchdog/atmel-wdt.txt
index a4d869744f59..86fa6de1019b 100644
--- a/Documentation/devicetree/bindings/watchdog/atmel-wdt.txt
+++ b/Documentation/devicetree/bindings/watchdog/atmel-wdt.txt
@@ -6,6 +6,7 @@ Required properties:
 - compatible: must be "atmel,at91sam9260-wdt".
 - reg: physical base address of the controller and length of memory mapped
   region.
+- clocks: phandle to input clock.
 
 Optional properties:
 - timeout-sec: contains the watchdog timeout in seconds.
@@ -39,6 +40,7 @@ Example:
 		compatible = "atmel,at91sam9260-wdt";
 		reg = <0xfffffd40 0x10>;
 		interrupts = <1 IRQ_TYPE_LEVEL_HIGH 7>;
+		clocks = <&clk32k>;
 		timeout-sec = <15>;
 		atmel,watchdog-type = "hardware";
 		atmel,reset-type = "all";
-- 
cgit v1.2.3


From e1162cba4e44bd97d7e690bf664afc71a85b1413 Mon Sep 17 00:00:00 2001
From: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Date: Thu, 30 Jul 2015 00:49:44 +0200
Subject: Documentation: dt: rtc: at91rm9200: add clocks property

The RTC needs an input clock, it is the slow clock. It is required as it
will not function without it.

Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
---
 Documentation/devicetree/bindings/rtc/atmel,at91rm9200-rtc.txt | 2 ++
 1 file changed, 2 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/rtc/atmel,at91rm9200-rtc.txt b/Documentation/devicetree/bindings/rtc/atmel,at91rm9200-rtc.txt
index 34c1505774bf..5d3791e789c6 100644
--- a/Documentation/devicetree/bindings/rtc/atmel,at91rm9200-rtc.txt
+++ b/Documentation/devicetree/bindings/rtc/atmel,at91rm9200-rtc.txt
@@ -5,6 +5,7 @@ Required properties:
 - reg: physical base address of the controller and length of memory mapped
   region.
 - interrupts: rtc alarm/event interrupt
+- clocks: phandle to input clock.
 
 Example:
 
@@ -12,4 +13,5 @@ rtc@fffffe00 {
 	compatible = "atmel,at91rm9200-rtc";
 	reg = <0xfffffe00 0x100>;
 	interrupts = <1 4 7>;
+	clocks = <&clk32k>;
 };
-- 
cgit v1.2.3


From 253508cadddbb676b05f03ada93d0f986df107f5 Mon Sep 17 00:00:00 2001
From: Nik Nyby <nikolas@gnu.org>
Date: Fri, 26 Jun 2015 12:05:39 -0400
Subject: fix typo in Documentation/SubmittingPatches

This adds a missing letter in Documentation/SubmittingPatches.

Signed-off-by: Nik Nyby <nikolas@gnu.org>
Signed-off-by: Jiri Kosina <jkosina@suse.com>
---
 Documentation/SubmittingPatches | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches
index 27e7e5edeca8..e2d29a749ff4 100644
--- a/Documentation/SubmittingPatches
+++ b/Documentation/SubmittingPatches
@@ -291,7 +291,7 @@ sending him e-mail.
 
 If you have a patch that fixes an exploitable security bug, send that patch
 to security@kernel.org.  For severe bugs, a short embargo may be considered
-to allow distrbutors to get the patch out to users; in such cases,
+to allow distributors to get the patch out to users; in such cases,
 obviously, the patch should not be sent to any public lists.
 
 Patches that fix a severe bug in a released kernel should be directed
-- 
cgit v1.2.3


From 831527b7304d06ea24afd2d01956dad06cd34043 Mon Sep 17 00:00:00 2001
From: Nik Nyby <nikolas@gnu.org>
Date: Mon, 6 Jul 2015 10:28:41 -0400
Subject: pcmcia: Fix typo in locking documentation

This fixes a typo in the docs: "devie" -> "device".

Signed-off-by: Nik Nyby <nikolas@gnu.org>
Signed-off-by: Jiri Kosina <jkosina@suse.com>
---
 Documentation/pcmcia/locking.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/pcmcia/locking.txt b/Documentation/pcmcia/locking.txt
index 68f622bc4064..b2c9b478906b 100644
--- a/Documentation/pcmcia/locking.txt
+++ b/Documentation/pcmcia/locking.txt
@@ -96,7 +96,7 @@ or single-use fields not mentioned):
 3. Per PCMCIA-device Data:
 --------------------------
 
-The "main" struct pcmcia_devie is protected as follows (read-only fields
+The "main" struct pcmcia_device is protected as follows (read-only fields
 or single-use fields not mentioned):
 
 
-- 
cgit v1.2.3


From 6ade97724fb571338a0f82ecaac5bced2f43f042 Mon Sep 17 00:00:00 2001
From: Benjamin Herr <ben@0x539.de>
Date: Sat, 18 Jul 2015 14:31:40 +0200
Subject: Doc: fix trivial typo in SubmittingPatches

This patch changes the tense of a verb in SubmittingPatches to ensure
grammatical validity of the containing sentence.

Signed-off-by: Benjamin Herr <ben@0x539.de>
Signed-off-by: Jiri Kosina <jkosina@suse.com>
---
 Documentation/SubmittingPatches | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches
index e2d29a749ff4..d3f56137b498 100644
--- a/Documentation/SubmittingPatches
+++ b/Documentation/SubmittingPatches
@@ -90,7 +90,7 @@ patch.
 
 Make sure your patch does not include any extra files which do not
 belong in a patch submission.  Make sure to review your patch -after-
-generated it with diff(1), to ensure accuracy.
+generating it with diff(1), to ensure accuracy.
 
 If your changes produce a lot of deltas, you need to split them into
 individual patches which modify things in logical stages; see section
-- 
cgit v1.2.3


From 0d850e7cdc69962e85abd7f7dcd2359f293f835a Mon Sep 17 00:00:00 2001
From: Leilk Liu <leilk.liu@mediatek.com>
Date: Fri, 7 Aug 2015 15:19:49 +0800
Subject: spi: Mediatek: Document devicetree bindings for spi bus

Signed-off-by: Leilk Liu <leilk.liu@mediatek.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
---
 .../devicetree/bindings/spi/spi-mt65xx.txt         | 51 ++++++++++++++++++++++
 1 file changed, 51 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/spi/spi-mt65xx.txt

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/spi/spi-mt65xx.txt b/Documentation/devicetree/bindings/spi/spi-mt65xx.txt
new file mode 100644
index 000000000000..dcefc438272f
--- /dev/null
+++ b/Documentation/devicetree/bindings/spi/spi-mt65xx.txt
@@ -0,0 +1,51 @@
+Binding for MTK SPI controller
+
+Required properties:
+- compatible: should be one of the following.
+    - mediatek,mt8173-spi: for mt8173 platforms
+    - mediatek,mt8135-spi: for mt8135 platforms
+    - mediatek,mt6589-spi: for mt6589 platforms
+
+- #address-cells: should be 1.
+
+- #size-cells: should be 0.
+
+- reg: Address and length of the register set for the device
+
+- interrupts: Should contain spi interrupt
+
+- clocks: phandles to input clocks.
+  The first should be <&topckgen CLK_TOP_SPI_SEL>.
+  The second should be one of the following.
+   -  <&clk26m>: specify parent clock 26MHZ.
+   -  <&topckgen CLK_TOP_SYSPLL3_D2>: specify parent clock 109MHZ.
+				      It's the default one.
+   -  <&topckgen CLK_TOP_SYSPLL4_D2>: specify parent clock 78MHZ.
+   -  <&topckgen CLK_TOP_UNIVPLL2_D4>: specify parent clock 104MHZ.
+   -  <&topckgen CLK_TOP_UNIVPLL1_D8>: specify parent clock 78MHZ.
+
+- clock-names: shall be "spi-clk" for the controller clock, and
+  "parent-clk" for the parent clock.
+
+Optional properties:
+- mediatek,pad-select: specify which pins group(ck/mi/mo/cs) spi
+  controller used, this value should be 0~3, only required for MT8173.
+    0: specify GPIO69,70,71,72 for spi pins.
+    1: specify GPIO102,103,104,105 for spi pins.
+    2: specify GPIO128,129,130,131 for spi pins.
+    3: specify GPIO5,6,7,8 for spi pins.
+
+Example:
+
+- SoC Specific Portion:
+spi: spi@1100a000 {
+	compatible = "mediatek,mt8173-spi";
+	#address-cells = <1>;
+	#size-cells = <0>;
+	reg = <0 0x1100a000 0 0x1000>;
+	interrupts = <GIC_SPI 110 IRQ_TYPE_LEVEL_LOW>;
+	clocks = <&topckgen CLK_TOP_SPI_SEL>, <&topckgen CLK_TOP_SYSPLL3_D2>;
+	clock-names = "spi-clk", "parent-clk";
+	mediatek,pad-select = <0>;
+	status = "disabled";
+};
-- 
cgit v1.2.3


From af1eb2913275c3ab1598b0c24c893499092df08a Mon Sep 17 00:00:00 2001
From: David Woodhouse <David.Woodhouse@intel.com>
Date: Mon, 20 Jul 2015 21:16:28 +0100
Subject: modsign: Allow password to be specified for signing key

We don't want this in the Kconfig since it might then get exposed in
/proc/config.gz. So make it a parameter to Kbuild instead. This also
means we don't have to jump through hoops to strip quotes from it, as
we would if it was a config option.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
---
 Documentation/kbuild/kbuild.txt  |  5 +++++
 Documentation/module-signing.txt |  3 +++
 scripts/sign-file.c              | 27 ++++++++++++++++++++++++++-
 3 files changed, 34 insertions(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/kbuild/kbuild.txt b/Documentation/kbuild/kbuild.txt
index 6466704d47b5..0ff6a466a05b 100644
--- a/Documentation/kbuild/kbuild.txt
+++ b/Documentation/kbuild/kbuild.txt
@@ -174,6 +174,11 @@ The output directory is often set using "O=..." on the commandline.
 
 The value can be overridden in which case the default value is ignored.
 
+KBUILD_SIGN_PIN
+--------------------------------------------------
+This variable allows a passphrase or PIN to be passed to the sign-file
+utility when signing kernel modules, if the private key requires such.
+
 KBUILD_MODPOST_WARN
 --------------------------------------------------
 KBUILD_MODPOST_WARN can be set to avoid errors in case of undefined
diff --git a/Documentation/module-signing.txt b/Documentation/module-signing.txt
index c72702ec1ded..faaa6ea002f7 100644
--- a/Documentation/module-signing.txt
+++ b/Documentation/module-signing.txt
@@ -194,6 +194,9 @@ The hash algorithm used does not have to match the one configured, but if it
 doesn't, you should make sure that hash algorithm is either built into the
 kernel or can be loaded without requiring itself.
 
+If the private key requires a passphrase or PIN, it can be provided in the
+$KBUILD_SIGN_PIN environment variable.
+
 
 ============================
 SIGNED MODULES AND STRIPPING
diff --git a/scripts/sign-file.c b/scripts/sign-file.c
index 39aaabe89388..720b9bc933ae 100755
--- a/scripts/sign-file.c
+++ b/scripts/sign-file.c
@@ -80,6 +80,27 @@ static void drain_openssl_errors(void)
 		}					\
 	} while(0)
 
+static const char *key_pass;
+
+static int pem_pw_cb(char *buf, int len, int w, void *v)
+{
+	int pwlen;
+
+	if (!key_pass)
+		return -1;
+
+	pwlen = strlen(key_pass);
+	if (pwlen >= len)
+		return -1;
+
+	strcpy(buf, key_pass);
+
+	/* If it's wrong, don't keep trying it. */
+	key_pass = NULL;
+
+	return pwlen;
+}
+
 int main(int argc, char **argv)
 {
 	struct module_signature sig_info = { .id_type = PKEY_ID_PKCS7 };
@@ -96,9 +117,12 @@ int main(int argc, char **argv)
 	BIO *b, *bd = NULL, *bm;
 	int opt, n;
 
+	OpenSSL_add_all_algorithms();
 	ERR_load_crypto_strings();
 	ERR_clear_error();
 
+	key_pass = getenv("KBUILD_SIGN_PIN");
+
 	do {
 		opt = getopt(argc, argv, "dp");
 		switch (opt) {
@@ -132,7 +156,8 @@ int main(int argc, char **argv)
 	 */
 	b = BIO_new_file(private_key_name, "rb");
 	ERR(!b, "%s", private_key_name);
-        private_key = PEM_read_bio_PrivateKey(b, NULL, NULL, NULL);
+	private_key = PEM_read_bio_PrivateKey(b, NULL, pem_pw_cb, NULL);
+	ERR(!private_key, "%s", private_key_name);
 	BIO_free(b);
 
 	b = BIO_new_file(x509_name, "rb");
-- 
cgit v1.2.3


From 19e91b69d77bab16405cc284b451378e89a4110c Mon Sep 17 00:00:00 2001
From: David Woodhouse <David.Woodhouse@intel.com>
Date: Mon, 20 Jul 2015 21:16:29 +0100
Subject: modsign: Allow external signing key to be specified

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: David Howells <dhowells@redhat.com>
---
 Documentation/module-signing.txt | 31 ++++++++++++++++++++++++++-----
 Makefile                         |  2 +-
 init/Kconfig                     | 14 ++++++++++++++
 kernel/Makefile                  |  5 +++++
 4 files changed, 46 insertions(+), 6 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/module-signing.txt b/Documentation/module-signing.txt
index faaa6ea002f7..84597c7ea175 100644
--- a/Documentation/module-signing.txt
+++ b/Documentation/module-signing.txt
@@ -88,6 +88,22 @@ This has a number of options available:
      than being a module) so that modules signed with that algorithm can have
      their signatures checked without causing a dependency loop.
 
+ (4) "File name or PKCS#11 URI of module signing key" (CONFIG_MODULE_SIG_KEY)
+
+     Setting this option to something other than its default of
+     "signing_key.priv" will disable the autogeneration of signing keys and
+     allow the kernel modules to be signed with a key of your choosing.
+     The string provided should identify a file containing a private key
+     in PEM form, or — on systems where the OpenSSL ENGINE_pkcs11 is
+     appropriately installed — a PKCS#11 URI as defined by RFC7512.
+
+     If the PEM file containing the private key is encrypted, or if the
+     PKCS#11 token requries a PIN, this can be provided at build time by
+     means of the KBUILD_SIGN_PIN variable.
+
+     The corresponding X.509 certificate in DER form should still be placed
+     in a file named signing_key.x509 in the top-level build directory.
+
 
 =======================
 GENERATING SIGNING KEYS
@@ -100,8 +116,9 @@ it can be deleted or stored securely.  The public key gets built into the
 kernel so that it can be used to check the signatures as the modules are
 loaded.
 
-Under normal conditions, the kernel build will automatically generate a new
-keypair using openssl if one does not exist in the files:
+Under normal conditions, when CONFIG_MODULE_SIG_KEY is unchanged from its
+default of "signing_key.priv", the kernel build will automatically generate
+a new keypair using openssl if one does not exist in the files:
 
 	signing_key.priv
 	signing_key.x509
@@ -135,8 +152,12 @@ kernel sources tree and the openssl command.  The following is an example to
 generate the public/private key files:
 
 	openssl req -new -nodes -utf8 -sha256 -days 36500 -batch -x509 \
-	   -config x509.genkey -outform DER -out signing_key.x509 \
-	   -keyout signing_key.priv
+	   -config x509.genkey -outform PEM -out kernel_key.pem \
+	   -keyout kernel_key.pem
+
+The full pathname for the resulting kernel_key.pem file can then be specified
+in the CONFIG_MODULE_SIG_KEY option, and the certificate and key therein will
+be used instead of an autogenerated keypair.
 
 
 =========================
@@ -181,7 +202,7 @@ To manually sign a module, use the scripts/sign-file tool available in
 the Linux kernel source tree.  The script requires 4 arguments:
 
 	1.  The hash algorithm (e.g., sha256)
-	2.  The private key filename
+	2.  The private key filename or PKCS#11 URI
 	3.  The public key filename
 	4.  The kernel module to be signed
 
diff --git a/Makefile b/Makefile
index dc87ec280fbc..531dd16c9751 100644
--- a/Makefile
+++ b/Makefile
@@ -870,7 +870,7 @@ INITRD_COMPRESS-$(CONFIG_RD_LZ4)   := lz4
 # export INITRD_COMPRESS := $(INITRD_COMPRESS-y)
 
 ifdef CONFIG_MODULE_SIG_ALL
-MODSECKEY = ./signing_key.priv
+MODSECKEY = $(CONFIG_MODULE_SIG_KEY)
 MODPUBKEY = ./signing_key.x509
 export MODPUBKEY
 mod_sign_cmd = scripts/sign-file $(CONFIG_MODULE_SIG_HASH) $(MODSECKEY) $(MODPUBKEY)
diff --git a/init/Kconfig b/init/Kconfig
index 14b3d8422502..1b1148e9181b 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1948,6 +1948,20 @@ config MODULE_SIG_HASH
 	default "sha384" if MODULE_SIG_SHA384
 	default "sha512" if MODULE_SIG_SHA512
 
+config MODULE_SIG_KEY
+	string "File name or PKCS#11 URI of module signing key"
+	default "signing_key.priv"
+	depends on MODULE_SIG
+	help
+         Provide the file name of a private key in PKCS#8 PEM format, or
+         a PKCS#11 URI according to RFC7512. The corresponding X.509
+         certificate in DER form should be present in signing_key.x509
+         in the top-level build directory.
+
+         If this option is unchanged from its default "signing_key.priv",
+         then the kernel will automatically generate the private key and
+         certificate as described in Documentation/module-signing.txt
+
 config MODULE_COMPRESS
 	bool "Compress modules on installation"
 	depends on MODULES
diff --git a/kernel/Makefile b/kernel/Makefile
index 43c4c920f30a..2c937ace292e 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -170,6 +170,10 @@ ifndef CONFIG_MODULE_SIG_HASH
 $(error Could not determine digest type to use from kernel config)
 endif
 
+# We do it this way rather than having a boolean option for enabling an
+# external private key, because 'make randconfig' might enable such a
+# boolean option and we unfortunately can't make it depend on !RANDCONFIG.
+ifeq ($(CONFIG_MODULE_SIG_KEY),"signing_key.priv")
 signing_key.priv signing_key.x509: x509.genkey
 	@echo "###"
 	@echo "### Now generating an X.509 key pair to be used for signing modules."
@@ -207,3 +211,4 @@ x509.genkey:
 	@echo >>x509.genkey "subjectKeyIdentifier=hash"
 	@echo >>x509.genkey "authorityKeyIdentifier=keyid"
 endif
+endif
-- 
cgit v1.2.3


From 1329e8cc69b93a0b1bc6d197b30dcff628c18dbf Mon Sep 17 00:00:00 2001
From: David Woodhouse <David.Woodhouse@intel.com>
Date: Mon, 20 Jul 2015 21:16:30 +0100
Subject: modsign: Extract signing cert from CONFIG_MODULE_SIG_KEY if needed

Where an external PEM file or PKCS#11 URI is given, we can get the cert
from it for ourselves instead of making the user drop signing_key.x509
in place for us.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: David Howells <dhowells@redhat.com>
---
 Documentation/module-signing.txt |  11 ++--
 init/Kconfig                     |   8 +--
 kernel/Makefile                  |  38 +++++++++++
 scripts/Makefile                 |   3 +-
 scripts/extract-cert.c           | 132 +++++++++++++++++++++++++++++++++++++++
 5 files changed, 181 insertions(+), 11 deletions(-)
 create mode 100644 scripts/extract-cert.c

(limited to 'Documentation')

diff --git a/Documentation/module-signing.txt b/Documentation/module-signing.txt
index 84597c7ea175..693001920890 100644
--- a/Documentation/module-signing.txt
+++ b/Documentation/module-signing.txt
@@ -93,17 +93,16 @@ This has a number of options available:
      Setting this option to something other than its default of
      "signing_key.priv" will disable the autogeneration of signing keys and
      allow the kernel modules to be signed with a key of your choosing.
-     The string provided should identify a file containing a private key
-     in PEM form, or — on systems where the OpenSSL ENGINE_pkcs11 is
-     appropriately installed — a PKCS#11 URI as defined by RFC7512.
+     The string provided should identify a file containing both a private
+     key and its corresponding X.509 certificate in PEM form, or — on
+     systems where the OpenSSL ENGINE_pkcs11 is functional — a PKCS#11 URI
+     as defined by RFC7512. In the latter case, the PKCS#11 URI should
+     reference both a certificate and a private key.
 
      If the PEM file containing the private key is encrypted, or if the
      PKCS#11 token requries a PIN, this can be provided at build time by
      means of the KBUILD_SIGN_PIN variable.
 
-     The corresponding X.509 certificate in DER form should still be placed
-     in a file named signing_key.x509 in the top-level build directory.
-
 
 =======================
 GENERATING SIGNING KEYS
diff --git a/init/Kconfig b/init/Kconfig
index 1b1148e9181b..e2e0a1d27886 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1953,10 +1953,10 @@ config MODULE_SIG_KEY
 	default "signing_key.priv"
 	depends on MODULE_SIG
 	help
-         Provide the file name of a private key in PKCS#8 PEM format, or
-         a PKCS#11 URI according to RFC7512. The corresponding X.509
-         certificate in DER form should be present in signing_key.x509
-         in the top-level build directory.
+         Provide the file name of a private key/certificate in PEM format,
+         or a PKCS#11 URI according to RFC7512. The file should contain, or
+         the URI should identify, both the certificate and its corresponding
+         private key.
 
          If this option is unchanged from its default "signing_key.priv",
          then the kernel will automatically generate the private key and
diff --git a/kernel/Makefile b/kernel/Makefile
index 2c937ace292e..fa2f8b84b18a 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -210,5 +210,43 @@ x509.genkey:
 	@echo >>x509.genkey "keyUsage=digitalSignature"
 	@echo >>x509.genkey "subjectKeyIdentifier=hash"
 	@echo >>x509.genkey "authorityKeyIdentifier=keyid"
+else
+# For external (PKCS#11 or PEM) key, we need to obtain the certificate from
+# CONFIG_MODULE_SIG_KEY automatically.
+quiet_cmd_extract_der = CERT_DER $(2)
+      cmd_extract_der = scripts/extract-cert "$(2)" signing_key.x509
+
+# CONFIG_MODULE_SIG_KEY is either a PKCS#11 URI or a filename. It is
+# surrounded by quotes, and may contain spaces. To strip the quotes
+# with $(patsubst) we need to turn the spaces into something else.
+# And if it's a filename, those spaces need to be escaped as '\ ' in
+# order to use it in dependencies or $(wildcard).
+space :=
+space +=
+space_escape := %%%SPACE%%%
+X509_SOURCE_temp := $(subst $(space),$(space_escape),$(CONFIG_MODULE_SIG_KEY))
+# We need this to check for absolute paths or PKCS#11 URIs.
+X509_SOURCE_ONEWORD := $(patsubst "%",%,$(X509_SOURCE_temp))
+# This is the actual source filename/URI without the quotes
+X509_SOURCE := $(subst $(space_escape),$(space),$(X509_SOURCE_ONEWORD))
+# This\ version\ with\ spaces\ escaped\ for\ $(wildcard)\ and\ dependencies
+X509_SOURCE_ESCAPED := $(subst $(space_escape),\$(space),$(X509_SOURCE_ONEWORD))
+
+ifeq ($(patsubst pkcs11:%,%,$(X509_SOURCE_ONEWORD)),$(X509_SOURCE_ONEWORD))
+# If it's a filename, depend on it.
+X509_DEP := $(X509_SOURCE_ESCAPED)
+ifeq ($(patsubst /%,%,$(X509_SOURCE_ONEWORD)),$(X509_SOURCE_ONEWORD))
+ifeq ($(wildcard $(X509_SOURCE_ESCAPED)),)
+ifneq ($(wildcard $(srctree)/$(X509_SOURCE_ESCAPED)),)
+# Non-absolute filename, found in source tree and not build tree
+X509_SOURCE := $(srctree)/$(X509_SOURCE)
+X509_DEP := $(srctree)/$(X509_SOURCE_ESCAPED)
+endif
+endif
+endif
+endif
+
+signing_key.x509: scripts/extract-cert include/config/module/sig/key.h $(X509_DEP)
+	$(call cmd,extract_der,$(X509_SOURCE))
 endif
 endif
diff --git a/scripts/Makefile b/scripts/Makefile
index b12fe020664d..236f683510bd 100644
--- a/scripts/Makefile
+++ b/scripts/Makefile
@@ -16,11 +16,12 @@ hostprogs-$(CONFIG_VT)           += conmakehash
 hostprogs-$(BUILD_C_RECORDMCOUNT) += recordmcount
 hostprogs-$(CONFIG_BUILDTIME_EXTABLE_SORT) += sortextable
 hostprogs-$(CONFIG_ASN1)	 += asn1_compiler
-hostprogs-$(CONFIG_MODULE_SIG)	 += sign-file
+hostprogs-$(CONFIG_MODULE_SIG)	 += sign-file extract-cert
 
 HOSTCFLAGS_sortextable.o = -I$(srctree)/tools/include
 HOSTCFLAGS_asn1_compiler.o = -I$(srctree)/include
 HOSTLOADLIBES_sign-file = -lcrypto
+HOSTLOADLIBES_extract-cert = -lcrypto
 
 always		:= $(hostprogs-y) $(hostprogs-m)
 
diff --git a/scripts/extract-cert.c b/scripts/extract-cert.c
new file mode 100644
index 000000000000..4fd5b2f07b45
--- /dev/null
+++ b/scripts/extract-cert.c
@@ -0,0 +1,132 @@
+/* Extract X.509 certificate in DER form from PKCS#11 or PEM.
+ *
+ * Copyright © 2014 Red Hat, Inc. All Rights Reserved.
+ * Copyright © 2015 Intel Corporation.
+ *
+ * Authors: David Howells <dhowells@redhat.com>
+ *          David Woodhouse <dwmw2@infradead.org>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public Licence
+ * as published by the Free Software Foundation; either version
+ * 2 of the Licence, or (at your option) any later version.
+ */
+#define _GNU_SOURCE
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdint.h>
+#include <stdbool.h>
+#include <string.h>
+#include <getopt.h>
+#include <err.h>
+#include <arpa/inet.h>
+#include <openssl/bio.h>
+#include <openssl/evp.h>
+#include <openssl/pem.h>
+#include <openssl/pkcs7.h>
+#include <openssl/err.h>
+#include <openssl/engine.h>
+
+#define PKEY_ID_PKCS7 2
+
+static __attribute__((noreturn))
+void format(void)
+{
+	fprintf(stderr,
+		"Usage: scripts/extract-cert <source> <dest>\n");
+	exit(2);
+}
+
+static void display_openssl_errors(int l)
+{
+	const char *file;
+	char buf[120];
+	int e, line;
+
+	if (ERR_peek_error() == 0)
+		return;
+	fprintf(stderr, "At main.c:%d:\n", l);
+
+	while ((e = ERR_get_error_line(&file, &line))) {
+		ERR_error_string(e, buf);
+		fprintf(stderr, "- SSL %s: %s:%d\n", buf, file, line);
+	}
+}
+
+static void drain_openssl_errors(void)
+{
+	const char *file;
+	int line;
+
+	if (ERR_peek_error() == 0)
+		return;
+	while (ERR_get_error_line(&file, &line)) {}
+}
+
+#define ERR(cond, fmt, ...)				\
+	do {						\
+		bool __cond = (cond);			\
+		display_openssl_errors(__LINE__);	\
+		if (__cond) {				\
+			err(1, fmt, ## __VA_ARGS__);	\
+		}					\
+	} while(0)
+
+static const char *key_pass;
+
+int main(int argc, char **argv)
+{
+	char *cert_src, *cert_dst;
+	X509 *x509;
+	BIO *b;
+
+	OpenSSL_add_all_algorithms();
+	ERR_load_crypto_strings();
+	ERR_clear_error();
+
+        key_pass = getenv("KBUILD_SIGN_PIN");
+
+	if (argc != 3)
+		format();
+
+	cert_src = argv[1];
+	cert_dst = argv[2];
+
+	if (!strncmp(cert_src, "pkcs11:", 7)) {
+		ENGINE *e;
+		struct {
+			const char *cert_id;
+			X509 *cert;
+		} parms;
+
+		parms.cert_id = cert_src;
+		parms.cert = NULL;
+
+		ENGINE_load_builtin_engines();
+		drain_openssl_errors();
+		e = ENGINE_by_id("pkcs11");
+		ERR(!e, "Load PKCS#11 ENGINE");
+		if (ENGINE_init(e))
+			drain_openssl_errors();
+		else
+			ERR(1, "ENGINE_init");
+		if (key_pass)
+			ERR(!ENGINE_ctrl_cmd_string(e, "PIN", key_pass, 0), "Set PKCS#11 PIN");
+		ENGINE_ctrl_cmd(e, "LOAD_CERT_CTRL", 0, &parms, NULL, 1);
+		ERR(!parms.cert, "Get X.509 from PKCS#11");
+		x509 = parms.cert;
+	} else {
+		b = BIO_new_file(cert_src, "rb");
+		ERR(!b, "%s", cert_src);
+		x509 = PEM_read_bio_X509(b, NULL, NULL, NULL);
+		ERR(!x509, "%s", cert_src);
+		BIO_free(b);
+	}
+
+	b = BIO_new_file(cert_dst, "wb");
+	ERR(!b, "%s", cert_dst);
+	ERR(!i2d_X509_bio(b, x509), cert_dst);
+	BIO_free(b);
+
+	return 0;
+}
-- 
cgit v1.2.3


From fb1179499134bc718dc7557c7a6a95dc72f224cb Mon Sep 17 00:00:00 2001
From: David Woodhouse <David.Woodhouse@intel.com>
Date: Mon, 20 Jul 2015 21:16:30 +0100
Subject: modsign: Use single PEM file for autogenerated key

The current rule for generating signing_key.priv and signing_key.x509 is
a classic example of a bad rule which has a tendency to break parallel
make. When invoked to create *either* target, it generates the other
target as a side-effect that make didn't predict.

So let's switch to using a single file signing_key.pem which contains
both key and certificate. That matches what we do in the case of an
external key specified by CONFIG_MODULE_SIG_KEY anyway, so it's also
slightly cleaner.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: David Howells <dhowells@redhat.com>
---
 .gitignore                       |  1 +
 Documentation/module-signing.txt |  9 ++++-----
 Makefile                         |  4 ++--
 init/Kconfig                     |  4 ++--
 kernel/Makefile                  | 15 +++++++--------
 5 files changed, 16 insertions(+), 17 deletions(-)

(limited to 'Documentation')

diff --git a/.gitignore b/.gitignore
index 4ad4a98b884b..17fa24dd7e46 100644
--- a/.gitignore
+++ b/.gitignore
@@ -97,6 +97,7 @@ GTAGS
 # Leavings from module signing
 #
 extra_certificates
+signing_key.pem
 signing_key.priv
 signing_key.x509
 x509.genkey
diff --git a/Documentation/module-signing.txt b/Documentation/module-signing.txt
index 693001920890..5d5e4e32dc26 100644
--- a/Documentation/module-signing.txt
+++ b/Documentation/module-signing.txt
@@ -91,7 +91,7 @@ This has a number of options available:
  (4) "File name or PKCS#11 URI of module signing key" (CONFIG_MODULE_SIG_KEY)
 
      Setting this option to something other than its default of
-     "signing_key.priv" will disable the autogeneration of signing keys and
+     "signing_key.pem" will disable the autogeneration of signing keys and
      allow the kernel modules to be signed with a key of your choosing.
      The string provided should identify a file containing both a private
      key and its corresponding X.509 certificate in PEM form, or — on
@@ -116,11 +116,10 @@ kernel so that it can be used to check the signatures as the modules are
 loaded.
 
 Under normal conditions, when CONFIG_MODULE_SIG_KEY is unchanged from its
-default of "signing_key.priv", the kernel build will automatically generate
-a new keypair using openssl if one does not exist in the files:
+default, the kernel build will automatically generate a new keypair using
+openssl if one does not exist in the file:
 
-	signing_key.priv
-	signing_key.x509
+	signing_key.pem
 
 during the building of vmlinux (the public part of the key needs to be built
 into vmlinux) using parameters in the:
diff --git a/Makefile b/Makefile
index 531dd16c9751..6ab99d8cc23c 100644
--- a/Makefile
+++ b/Makefile
@@ -1173,8 +1173,8 @@ MRPROPER_DIRS  += include/config usr/include include/generated          \
 		  arch/*/include/generated .tmp_objdiff
 MRPROPER_FILES += .config .config.old .version .old_version \
 		  Module.symvers tags TAGS cscope* GPATH GTAGS GRTAGS GSYMS \
-		  signing_key.priv signing_key.x509 x509.genkey		\
-		  extra_certificates signing_key.x509.keyid		\
+		  signing_key.pem signing_key.priv signing_key.x509	\
+		  x509.genkey extra_certificates signing_key.x509.keyid	\
 		  signing_key.x509.signer vmlinux-gdb.py
 
 # clean - Delete most, but leave enough to build external modules
diff --git a/init/Kconfig b/init/Kconfig
index e2e0a1d27886..2b119850784b 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1950,7 +1950,7 @@ config MODULE_SIG_HASH
 
 config MODULE_SIG_KEY
 	string "File name or PKCS#11 URI of module signing key"
-	default "signing_key.priv"
+	default "signing_key.pem"
 	depends on MODULE_SIG
 	help
          Provide the file name of a private key/certificate in PEM format,
@@ -1958,7 +1958,7 @@ config MODULE_SIG_KEY
          the URI should identify, both the certificate and its corresponding
          private key.
 
-         If this option is unchanged from its default "signing_key.priv",
+         If this option is unchanged from its default "signing_key.pem",
          then the kernel will automatically generate the private key and
          certificate as described in Documentation/module-signing.txt
 
diff --git a/kernel/Makefile b/kernel/Makefile
index fa2f8b84b18a..7453283981ca 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -173,8 +173,8 @@ endif
 # We do it this way rather than having a boolean option for enabling an
 # external private key, because 'make randconfig' might enable such a
 # boolean option and we unfortunately can't make it depend on !RANDCONFIG.
-ifeq ($(CONFIG_MODULE_SIG_KEY),"signing_key.priv")
-signing_key.priv signing_key.x509: x509.genkey
+ifeq ($(CONFIG_MODULE_SIG_KEY),"signing_key.pem")
+signing_key.pem: x509.genkey
 	@echo "###"
 	@echo "### Now generating an X.509 key pair to be used for signing modules."
 	@echo "###"
@@ -185,8 +185,8 @@ signing_key.priv signing_key.x509: x509.genkey
 	@echo "###"
 	openssl req -new -nodes -utf8 -$(CONFIG_MODULE_SIG_HASH) -days 36500 \
 		-batch -x509 -config x509.genkey \
-		-outform DER -out signing_key.x509 \
-		-keyout signing_key.priv 2>&1
+		-outform PEM -out signing_key.pem \
+		-keyout signing_key.pem 2>&1
 	@echo "###"
 	@echo "### Key pair generated."
 	@echo "###"
@@ -210,9 +210,9 @@ x509.genkey:
 	@echo >>x509.genkey "keyUsage=digitalSignature"
 	@echo >>x509.genkey "subjectKeyIdentifier=hash"
 	@echo >>x509.genkey "authorityKeyIdentifier=keyid"
-else
-# For external (PKCS#11 or PEM) key, we need to obtain the certificate from
-# CONFIG_MODULE_SIG_KEY automatically.
+endif
+
+# We need to obtain the certificate from CONFIG_MODULE_SIG_KEY.
 quiet_cmd_extract_der = CERT_DER $(2)
       cmd_extract_der = scripts/extract-cert "$(2)" signing_key.x509
 
@@ -249,4 +249,3 @@ endif
 signing_key.x509: scripts/extract-cert include/config/module/sig/key.h $(X509_DEP)
 	$(call cmd,extract_der,$(X509_SOURCE))
 endif
-endif
-- 
cgit v1.2.3


From 99d27b1b52bd5cdf9bd9f7661ca8641e9a1b55e6 Mon Sep 17 00:00:00 2001
From: David Woodhouse <David.Woodhouse@intel.com>
Date: Mon, 20 Jul 2015 21:16:31 +0100
Subject: modsign: Add explicit CONFIG_SYSTEM_TRUSTED_KEYS option

Let the user explicitly provide a file containing trusted keys, instead of
just automatically finding files matching *.x509 in the build tree and
trusting whatever we find. This really ought to be an *explicit*
configuration, and the build rules for dealing with the files were
fairly painful too.

Fix applied from James Morris that removes an '=' from a macro definition
in kernel/Makefile as this is a feature that only exists from GNU make 3.82
onwards.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: David Howells <dhowells@redhat.com>
---
 Documentation/module-signing.txt |  15 +++--
 init/Kconfig                     |  13 ++++
 kernel/Makefile                  | 125 ++++++++++++++++++++-------------------
 3 files changed, 89 insertions(+), 64 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/module-signing.txt b/Documentation/module-signing.txt
index 5d5e4e32dc26..4e62bc29666e 100644
--- a/Documentation/module-signing.txt
+++ b/Documentation/module-signing.txt
@@ -88,6 +88,7 @@ This has a number of options available:
      than being a module) so that modules signed with that algorithm can have
      their signatures checked without causing a dependency loop.
 
+
  (4) "File name or PKCS#11 URI of module signing key" (CONFIG_MODULE_SIG_KEY)
 
      Setting this option to something other than its default of
@@ -104,6 +105,13 @@ This has a number of options available:
      means of the KBUILD_SIGN_PIN variable.
 
 
+ (5) "Additional X.509 keys for default system keyring" (CONFIG_SYSTEM_TRUSTED_KEYS)
+
+     This option can be set to the filename of a PEM-encoded file containing
+     additional certificates which will be included in the system keyring by
+     default.
+
+
 =======================
 GENERATING SIGNING KEYS
 =======================
@@ -171,10 +179,9 @@ in a keyring called ".system_keyring" that can be seen by:
 	302d2d52 I------     1 perm 1f010000     0     0 asymmetri Fedora kernel signing key: d69a84e6bce3d216b979e9505b3e3ef9a7118079: X509.RSA a7118079 []
 	...
 
-Beyond the public key generated specifically for module signing, any file
-placed in the kernel source root directory or the kernel build root directory
-whose name is suffixed with ".x509" will be assumed to be an X.509 public key
-and will be added to the keyring.
+Beyond the public key generated specifically for module signing, additional
+trusted certificates can be provided in a PEM-encoded file referenced by the
+CONFIG_SYSTEM_TRUSTED_KEYS configuration option.
 
 Further, the architecture code may take public keys from a hardware store and
 add those in also (e.g. from the UEFI key database).
diff --git a/init/Kconfig b/init/Kconfig
index 2b119850784b..62b725653c36 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1752,6 +1752,19 @@ config SYSTEM_TRUSTED_KEYRING
 
 	  Keys in this keyring are used by module signature checking.
 
+config SYSTEM_TRUSTED_KEYS
+	string "Additional X.509 keys for default system keyring"
+	depends on SYSTEM_TRUSTED_KEYRING
+	help
+	  If set, this option should be the filename of a PEM-formatted file
+	  containing trusted X.509 certificates to be included in the default
+	  system keyring. Any certificate used for module signing is implicitly
+	  also trusted.
+
+	  NOTE: If you previously provided keys for the system keyring in the
+	  form of DER-encoded *.x509 files in the top-level build directory,
+	  those are no longer used. You will need to set this option instead.
+
 config SYSTEM_DATA_VERIFICATION
 	def_bool n
 	select SYSTEM_TRUSTED_KEYRING
diff --git a/kernel/Makefile b/kernel/Makefile
index 7453283981ca..575329777d9e 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -114,46 +114,75 @@ $(obj)/config_data.h: $(obj)/config_data.gz FORCE
 
 ###############################################################################
 #
-# Roll all the X.509 certificates that we can find together and pull them into
-# the kernel so that they get loaded into the system trusted keyring during
-# boot.
+# When a Kconfig string contains a filename, it is suitable for
+# passing to shell commands. It is surrounded by double-quotes, and
+# any double-quotes or backslashes within it are escaped by
+# backslashes.
 #
-# We look in the source root and the build root for all files whose name ends
-# in ".x509".  Unfortunately, this will generate duplicate filenames, so we
-# have make canonicalise the pathnames and then sort them to discard the
-# duplicates.
+# This is no use for dependencies or $(wildcard). We need to strip the
+# surrounding quotes and the escaping from quotes and backslashes, and
+# we *do* need to escape any spaces in the string. So, for example:
+#
+# Usage: $(eval $(call config_filename,FOO))
+#
+# Defines FOO_FILENAME based on the contents of the CONFIG_FOO option,
+# transformed as described above to be suitable for use within the
+# makefile.
+#
+# Also, if the filename is a relative filename and exists in the source
+# tree but not the build tree, define FOO_SRCPREFIX as $(srctree)/ to
+# be prefixed to *both* command invocation and dependencies.
+#
+# Note: We also print the filenames in the quiet_cmd_foo text, and
+# perhaps ought to have a version specially escaped for that purpose.
+# But it's only cosmetic, and $(patsubst "%",%,$(CONFIG_FOO)) is good
+# enough.  It'll strip the quotes in the common case where there's no
+# space and it's a simple filename, and it'll retain the quotes when
+# there's a space. There are some esoteric cases in which it'll print
+# the wrong thing, but we don't really care. The actual dependencies
+# and commands *do* get it right, with various combinations of single
+# and double quotes, backslashes and spaces in the filenames.
 #
 ###############################################################################
-ifeq ($(CONFIG_SYSTEM_TRUSTED_KEYRING),y)
-X509_CERTIFICATES-y := $(wildcard *.x509) $(wildcard $(srctree)/*.x509)
-X509_CERTIFICATES-$(CONFIG_MODULE_SIG) += $(objtree)/signing_key.x509
-X509_CERTIFICATES-raw := $(sort $(foreach CERT,$(X509_CERTIFICATES-y), \
-				$(or $(realpath $(CERT)),$(CERT))))
-X509_CERTIFICATES := $(subst $(realpath $(objtree))/,,$(X509_CERTIFICATES-raw))
-
-ifeq ($(X509_CERTIFICATES),)
-$(warning *** No X.509 certificates found ***)
+#
+quote := $(firstword " ")
+space :=
+space +=
+space_escape := %%%SPACE%%%
+#
+define config_filename
+ifneq ($$(CONFIG_$(1)),"")
+$(1)_FILENAME := $$(subst \\,\,$$(subst \$$(quote),$$(quote),$$(subst $$(space_escape),\$$(space),$$(patsubst "%",%,$$(subst $$(space),$$(space_escape),$$(CONFIG_$(1)))))))
+ifneq ($$(patsubst /%,%,$$(firstword $$($(1)_FILENAME))),$$(firstword $$($(1)_FILENAME)))
+else
+ifeq ($$(wildcard $$($(1)_FILENAME)),)
+ifneq ($$(wildcard $$(srctree)/$$($(1)_FILENAME)),)
+$(1)_SRCPREFIX := $(srctree)/
+endif
 endif
-
-ifneq ($(wildcard $(obj)/.x509.list),)
-ifneq ($(shell cat $(obj)/.x509.list),$(X509_CERTIFICATES))
-$(warning X.509 certificate list changed to "$(X509_CERTIFICATES)" from "$(shell cat $(obj)/.x509.list)")
-$(shell rm $(obj)/.x509.list)
 endif
 endif
+endef
+#
+###############################################################################
+
+
+ifeq ($(CONFIG_SYSTEM_TRUSTED_KEYRING),y)
+
+$(eval $(call config_filename,SYSTEM_TRUSTED_KEYS))
+
+SIGNING_X509-$(CONFIG_MODULE_SIG) += signing_key.x509
 
 kernel/system_certificates.o: $(obj)/x509_certificate_list
 
-quiet_cmd_x509certs  = CERTS   $@
-      cmd_x509certs  = cat $(X509_CERTIFICATES) /dev/null >$@ $(foreach X509,$(X509_CERTIFICATES),; $(kecho) "  - Including cert $(X509)")
+quiet_cmd_x509certs  = CERTS   $(SIGNING_X509-y) $(patsubst "%",%,$(2))
+      cmd_x509certs  = ( cat $(SIGNING_X509-y) /dev/null; \
+			 awk '/-----BEGIN CERTIFICATE-----/{flag=1;next}/-----END CERTIFICATE-----/{flag=0}flag' $(2) /dev/null | base64 -d ) > $@ || ( rm $@; exit 1)
 
 targets += $(obj)/x509_certificate_list
-$(obj)/x509_certificate_list: $(X509_CERTIFICATES) $(obj)/.x509.list
-	$(call if_changed,x509certs)
+$(obj)/x509_certificate_list: $(SIGNING_X509-y) include/config/system/trusted/keys.h $(wildcard include/config/module/sig.h) $(SYSTEM_TRUSTED_KEYS_SRCPREFIX)$(SYSTEM_TRUSTED_KEYS_FILENAME)
+	$(call if_changed,x509certs,$(SYSTEM_TRUSTED_KEYS_SRCPREFIX)$(CONFIG_SYSTEM_TRUSTED_KEYS))
 
-targets += $(obj)/.x509.list
-$(obj)/.x509.list:
-	@echo $(X509_CERTIFICATES) >$@
 endif
 
 clean-files := x509_certificate_list .x509.list
@@ -212,40 +241,16 @@ x509.genkey:
 	@echo >>x509.genkey "authorityKeyIdentifier=keyid"
 endif
 
-# We need to obtain the certificate from CONFIG_MODULE_SIG_KEY.
-quiet_cmd_extract_der = CERT_DER $(2)
-      cmd_extract_der = scripts/extract-cert "$(2)" signing_key.x509
+$(eval $(call config_filename,MODULE_SIG_KEY))
 
-# CONFIG_MODULE_SIG_KEY is either a PKCS#11 URI or a filename. It is
-# surrounded by quotes, and may contain spaces. To strip the quotes
-# with $(patsubst) we need to turn the spaces into something else.
-# And if it's a filename, those spaces need to be escaped as '\ ' in
-# order to use it in dependencies or $(wildcard).
-space :=
-space +=
-space_escape := %%%SPACE%%%
-X509_SOURCE_temp := $(subst $(space),$(space_escape),$(CONFIG_MODULE_SIG_KEY))
-# We need this to check for absolute paths or PKCS#11 URIs.
-X509_SOURCE_ONEWORD := $(patsubst "%",%,$(X509_SOURCE_temp))
-# This is the actual source filename/URI without the quotes
-X509_SOURCE := $(subst $(space_escape),$(space),$(X509_SOURCE_ONEWORD))
-# This\ version\ with\ spaces\ escaped\ for\ $(wildcard)\ and\ dependencies
-X509_SOURCE_ESCAPED := $(subst $(space_escape),\$(space),$(X509_SOURCE_ONEWORD))
-
-ifeq ($(patsubst pkcs11:%,%,$(X509_SOURCE_ONEWORD)),$(X509_SOURCE_ONEWORD))
-# If it's a filename, depend on it.
-X509_DEP := $(X509_SOURCE_ESCAPED)
-ifeq ($(patsubst /%,%,$(X509_SOURCE_ONEWORD)),$(X509_SOURCE_ONEWORD))
-ifeq ($(wildcard $(X509_SOURCE_ESCAPED)),)
-ifneq ($(wildcard $(srctree)/$(X509_SOURCE_ESCAPED)),)
-# Non-absolute filename, found in source tree and not build tree
-X509_SOURCE := $(srctree)/$(X509_SOURCE)
-X509_DEP := $(srctree)/$(X509_SOURCE_ESCAPED)
-endif
-endif
-endif
+# If CONFIG_MODULE_SIG_KEY isn't a PKCS#11 URI, depend on it
+ifeq ($(patsubst pkcs11:%,%,$(firstword $(MODULE_SIG_KEY_FILENAME))),$(firstword $(MODULE_SIG_KEY_FILENAME)))
+X509_DEP := $(MODULE_SIG_KEY_SRCPREFIX)$(MODULE_SIG_KEY_FILENAME)
 endif
 
+quiet_cmd_extract_der = SIGNING_CERT $(patsubst "%",%,$(2))
+      cmd_extract_der = scripts/extract-cert $(2) signing_key.x509
+
 signing_key.x509: scripts/extract-cert include/config/module/sig/key.h $(X509_DEP)
-	$(call cmd,extract_der,$(X509_SOURCE))
+	$(call cmd,extract_der,$(MODULE_SIG_KEY_SRCPREFIX)$(CONFIG_MODULE_SIG_KEY))
 endif
-- 
cgit v1.2.3


From afe10358e47a3ea1f69005cb1be78170d23f8afa Mon Sep 17 00:00:00 2001
From: Dmitry Torokhov <dtor@chromium.org>
Date: Thu, 16 Apr 2015 18:14:55 -0700
Subject: Input: elants_i2c - wire up regulator support

Elan touchscreen controllers use two power supplies, vcc33 and vccio,
and we need to enable them before trying to access the device. On X86
firmware usually does this, but on ARM it is usually left to the kernel.

Signed-off-by: Dmitry Torokhov <dtor@chromium.org>
Reviewed-by: Benson Leung <bleung@chromium.org>
Reviewed-by: Scott Liu <scott.liu@emc.com.tw>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
---
 .../devicetree/bindings/input/elants_i2c.txt       |   3 +
 drivers/input/touchscreen/elants_i2c.c             | 184 ++++++++++++++++++---
 2 files changed, 165 insertions(+), 22 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/input/elants_i2c.txt b/Documentation/devicetree/bindings/input/elants_i2c.txt
index a765232e6446..8a71038f3489 100644
--- a/Documentation/devicetree/bindings/input/elants_i2c.txt
+++ b/Documentation/devicetree/bindings/input/elants_i2c.txt
@@ -13,6 +13,9 @@ Optional properties:
 - pinctrl-names: should be "default" (see pinctrl binding [1]).
 - pinctrl-0: a phandle pointing to the pin settings for the device (see
   pinctrl binding [1]).
+- reset-gpios: reset gpio the chip is connected to.
+- vcc33-supply: a phandle for the regulator supplying 3.3V power.
+- vccio-supply: a phandle for the regulator supplying IO power.
 
 [0]: Documentation/devicetree/bindings/interrupt-controller/interrupts.txt
 [1]: Documentation/devicetree/bindings/pinctrl/pinctrl-bindings.txt
diff --git a/drivers/input/touchscreen/elants_i2c.c b/drivers/input/touchscreen/elants_i2c.c
index 42e43f14602e..19122652e71b 100644
--- a/drivers/input/touchscreen/elants_i2c.c
+++ b/drivers/input/touchscreen/elants_i2c.c
@@ -38,6 +38,8 @@
 #include <linux/input/mt.h>
 #include <linux/acpi.h>
 #include <linux/of.h>
+#include <linux/gpio/consumer.h>
+#include <linux/regulator/consumer.h>
 #include <asm/unaligned.h>
 
 /* Device, Driver information */
@@ -102,6 +104,9 @@
 /* calibration timeout definition */
 #define ELAN_CALI_TIMEOUT_MSEC	10000
 
+#define ELAN_POWERON_DELAY_USEC	500
+#define ELAN_RESET_DELAY_MSEC	20
+
 enum elants_state {
 	ELAN_STATE_NORMAL,
 	ELAN_WAIT_QUEUE_HEADER,
@@ -118,6 +123,10 @@ struct elants_data {
 	struct i2c_client *client;
 	struct input_dev *input;
 
+	struct regulator *vcc33;
+	struct regulator *vccio;
+	struct gpio_desc *reset_gpio;
+
 	u16 fw_version;
 	u8 test_version;
 	u8 solution_version;
@@ -141,6 +150,7 @@ struct elants_data {
 	u8 buf[MAX_PACKET_SIZE];
 
 	bool wake_irq_enabled;
+	bool keep_power_in_suspend;
 };
 
 static int elants_i2c_send(struct i2c_client *client,
@@ -1058,6 +1068,67 @@ static void elants_i2c_remove_sysfs_group(void *_data)
 	sysfs_remove_group(&ts->client->dev.kobj, &elants_attribute_group);
 }
 
+static int elants_i2c_power_on(struct elants_data *ts)
+{
+	int error;
+
+	/*
+	 * If we do not have reset gpio assume platform firmware
+	 * controls regulators and does power them on for us.
+	 */
+	if (IS_ERR_OR_NULL(ts->reset_gpio))
+		return 0;
+
+	gpiod_set_value_cansleep(ts->reset_gpio, 1);
+
+	error = regulator_enable(ts->vcc33);
+	if (error) {
+		dev_err(&ts->client->dev,
+			"failed to enable vcc33 regulator: %d\n",
+			error);
+		goto release_reset_gpio;
+	}
+
+	error = regulator_enable(ts->vccio);
+	if (error) {
+		dev_err(&ts->client->dev,
+			"failed to enable vccio regulator: %d\n",
+			error);
+		regulator_disable(ts->vcc33);
+		goto release_reset_gpio;
+	}
+
+	/*
+	 * We need to wait a bit after powering on controller before
+	 * we are allowed to release reset GPIO.
+	 */
+	udelay(ELAN_POWERON_DELAY_USEC);
+
+release_reset_gpio:
+	gpiod_set_value_cansleep(ts->reset_gpio, 0);
+	if (error)
+		return error;
+
+	msleep(ELAN_RESET_DELAY_MSEC);
+
+	return 0;
+}
+
+static void elants_i2c_power_off(void *_data)
+{
+	struct elants_data *ts = _data;
+
+	if (!IS_ERR_OR_NULL(ts->reset_gpio)) {
+		/*
+		 * Activate reset gpio to prevent leakage through the
+		 * pin once we shut off power to the controller.
+		 */
+		gpiod_set_value_cansleep(ts->reset_gpio, 1);
+		regulator_disable(ts->vccio);
+		regulator_disable(ts->vcc33);
+	}
+}
+
 static int elants_i2c_probe(struct i2c_client *client,
 			    const struct i2c_device_id *id)
 {
@@ -1072,13 +1143,6 @@ static int elants_i2c_probe(struct i2c_client *client,
 		return -ENXIO;
 	}
 
-	/* Make sure there is something at this address */
-	if (i2c_smbus_xfer(client->adapter, client->addr, 0,
-			I2C_SMBUS_READ, 0, I2C_SMBUS_BYTE, &dummy) < 0) {
-		dev_err(&client->dev, "nothing at this address\n");
-		return -ENXIO;
-	}
-
 	ts = devm_kzalloc(&client->dev, sizeof(struct elants_data), GFP_KERNEL);
 	if (!ts)
 		return -ENOMEM;
@@ -1089,6 +1153,70 @@ static int elants_i2c_probe(struct i2c_client *client,
 	ts->client = client;
 	i2c_set_clientdata(client, ts);
 
+	ts->vcc33 = devm_regulator_get(&client->dev, "vcc33");
+	if (IS_ERR(ts->vcc33)) {
+		error = PTR_ERR(ts->vcc33);
+		if (error != -EPROBE_DEFER)
+			dev_err(&client->dev,
+				"Failed to get 'vcc33' regulator: %d\n",
+				error);
+		return error;
+	}
+
+	ts->vccio = devm_regulator_get(&client->dev, "vccio");
+	if (IS_ERR(ts->vccio)) {
+		error = PTR_ERR(ts->vccio);
+		if (error != -EPROBE_DEFER)
+			dev_err(&client->dev,
+				"Failed to get 'vccio' regulator: %d\n",
+				error);
+		return error;
+	}
+
+	ts->reset_gpio = devm_gpiod_get(&client->dev, "reset");
+	if (IS_ERR(ts->reset_gpio)) {
+		error = PTR_ERR(ts->reset_gpio);
+
+		if (error == -EPROBE_DEFER)
+			return error;
+
+		if (error != -ENOENT && error != -ENOSYS) {
+			dev_err(&client->dev,
+				"failed to get reset gpio: %d\n",
+				error);
+			return error;
+		}
+
+		ts->keep_power_in_suspend = true;
+	} else {
+		error = gpiod_direction_output(ts->reset_gpio, 0);
+		if (error) {
+			dev_err(&client->dev,
+				"failed to configure reset gpio as output: %d\n",
+				error);
+			return error;
+		}
+	}
+
+	error = elants_i2c_power_on(ts);
+	if (error)
+		return error;
+
+	error = devm_add_action(&client->dev, elants_i2c_power_off, ts);
+	if (error) {
+		dev_err(&client->dev,
+			"failed to install power off action: %d\n", error);
+		elants_i2c_power_off(ts);
+		return error;
+	}
+
+	/* Make sure there is something at this address */
+	if (i2c_smbus_xfer(client->adapter, client->addr, 0,
+			   I2C_SMBUS_READ, 0, I2C_SMBUS_BYTE, &dummy) < 0) {
+		dev_err(&client->dev, "nothing at this address\n");
+		return -ENXIO;
+	}
+
 	error = elants_i2c_initialize(ts);
 	if (error) {
 		dev_err(&client->dev, "failed to initialize: %d\n", error);
@@ -1196,17 +1324,23 @@ static int __maybe_unused elants_i2c_suspend(struct device *dev)
 
 	disable_irq(client->irq);
 
-	for (retry_cnt = 0; retry_cnt < MAX_RETRIES; retry_cnt++) {
-		error = elants_i2c_send(client, set_sleep_cmd,
-					sizeof(set_sleep_cmd));
-		if (!error)
-			break;
+	if (device_may_wakeup(dev) || ts->keep_power_in_suspend) {
+		for (retry_cnt = 0; retry_cnt < MAX_RETRIES; retry_cnt++) {
+			error = elants_i2c_send(client, set_sleep_cmd,
+						sizeof(set_sleep_cmd));
+			if (!error)
+				break;
 
-		dev_err(&client->dev, "suspend command failed: %d\n", error);
-	}
+			dev_err(&client->dev,
+				"suspend command failed: %d\n", error);
+		}
 
-	if (device_may_wakeup(dev))
-		ts->wake_irq_enabled = (enable_irq_wake(client->irq) == 0);
+		if (device_may_wakeup(dev))
+			ts->wake_irq_enabled =
+					(enable_irq_wake(client->irq) == 0);
+	} else {
+		elants_i2c_power_off(ts);
+	}
 
 	return 0;
 }
@@ -1222,13 +1356,19 @@ static int __maybe_unused elants_i2c_resume(struct device *dev)
 	if (device_may_wakeup(dev) && ts->wake_irq_enabled)
 		disable_irq_wake(client->irq);
 
-	for (retry_cnt = 0; retry_cnt < MAX_RETRIES; retry_cnt++) {
-		error = elants_i2c_send(client, set_active_cmd,
-					sizeof(set_active_cmd));
-		if (!error)
-			break;
+	if (ts->keep_power_in_suspend) {
+		for (retry_cnt = 0; retry_cnt < MAX_RETRIES; retry_cnt++) {
+			error = elants_i2c_send(client, set_active_cmd,
+						sizeof(set_active_cmd));
+			if (!error)
+				break;
 
-		dev_err(&client->dev, "resume command failed: %d\n", error);
+			dev_err(&client->dev,
+				"resume command failed: %d\n", error);
+		}
+	} else {
+		elants_i2c_power_on(ts);
+		elants_i2c_initialize(ts);
 	}
 
 	ts->state = ELAN_STATE_NORMAL;
-- 
cgit v1.2.3


From 5b55f52d13d3643e7d62ad2f37f55d8e7f5f0c8e Mon Sep 17 00:00:00 2001
From: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Date: Fri, 7 Aug 2015 18:31:14 +0530
Subject: Documentation/fb: add documentation for sm712fb

Create the documentation for SM712. Mention all the supported modes and
how to use.

Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 Documentation/fb/sm712fb.txt | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)
 create mode 100644 Documentation/fb/sm712fb.txt

(limited to 'Documentation')

diff --git a/Documentation/fb/sm712fb.txt b/Documentation/fb/sm712fb.txt
new file mode 100644
index 000000000000..c388442edf51
--- /dev/null
+++ b/Documentation/fb/sm712fb.txt
@@ -0,0 +1,31 @@
+What is sm712fb?
+=================
+
+This is a graphics framebuffer driver for Silicon Motion SM712 based processors.
+
+How to use it?
+==============
+
+Switching modes is done using the video=sm712fb:... boot parameter.
+
+If you want, for example, enable a resolution of 1280x1024x24bpp you should
+pass to the kernel this command line: "video=sm712fb:0x31B".
+
+You should not compile-in vesafb.
+
+Currently supported video modes are:
+
+[Graphic modes]
+
+bpp | 640x480  800x600  1024x768  1280x1024
+----+--------------------------------------------
+  8 | 0x301    0x303    0x305    0x307
+ 16 | 0x311    0x314    0x317    0x31A
+ 24 | 0x312    0x315    0x318    0x31B
+
+Missing Features
+================
+(alias TODO list)
+
+	* 2D acceleratrion
+	* dual-head support
-- 
cgit v1.2.3


From 3fc147e9156f6c176e5543c59d31182252f14933 Mon Sep 17 00:00:00 2001
From: Heiko Stuebner <heiko@sntech.de>
Date: Tue, 4 Aug 2015 21:37:01 +0200
Subject: PM / AVS: rockchip-io: add io selectors and supplies for rk3368

This adds the necessary data for handling io voltage domains on the rk3368.
As interesting tidbit, the rk3368 contains two separate iodomain areas.
One in the regular General Register Files (GRF) and one in PMUGRF in the
pmu power domain.

Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Acked-by: Kevin Hilman <khilman@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 .../bindings/power/rockchip-io-domain.txt          | 14 +++++
 drivers/power/avs/rockchip-io-domain.c             | 59 ++++++++++++++++++++++
 2 files changed, 73 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/power/rockchip-io-domain.txt b/Documentation/devicetree/bindings/power/rockchip-io-domain.txt
index 8b70db103ca7..b8627e763dba 100644
--- a/Documentation/devicetree/bindings/power/rockchip-io-domain.txt
+++ b/Documentation/devicetree/bindings/power/rockchip-io-domain.txt
@@ -33,6 +33,8 @@ Required properties:
 - compatible: should be one of:
   - "rockchip,rk3188-io-voltage-domain" for rk3188
   - "rockchip,rk3288-io-voltage-domain" for rk3288
+  - "rockchip,rk3368-io-voltage-domain" for rk3368
+  - "rockchip,rk3368-pmu-io-voltage-domain" for rk3368 pmu-domains
 - rockchip,grf: phandle to the syscon managing the "general register files"
 
 
@@ -64,6 +66,18 @@ Possible supplies for rk3288:
 - sdcard-supply: The supply connected to SDMMC0_VDD.
 - wifi-supply:   The supply connected to APIO3_VDD.  Also known as SDIO0.
 
+Possible supplies for rk3368:
+- audio-supply:  The supply connected to APIO3_VDD.
+- dvp-supply:    The supply connected to DVPIO_VDD.
+- flash0-supply: The supply connected to FLASH0_VDD.  Typically for eMMC
+- gpio30-supply: The supply connected to APIO1_VDD.
+- gpio1830       The supply connected to APIO4_VDD.
+- sdcard-supply: The supply connected to SDMMC0_VDD.
+- wifi-supply:   The supply connected to APIO2_VDD.  Also known as SDIO0.
+
+Possible supplies for rk3368 pmu-domains:
+- pmu-supply:    The supply connected to PMUIO_VDD.
+- vop-supply:    The supply connected to LCDC_VDD.
 
 Example:
 
diff --git a/drivers/power/avs/rockchip-io-domain.c b/drivers/power/avs/rockchip-io-domain.c
index 3ae35d0590d2..2e300028f0f7 100644
--- a/drivers/power/avs/rockchip-io-domain.c
+++ b/drivers/power/avs/rockchip-io-domain.c
@@ -43,6 +43,10 @@
 #define RK3288_SOC_CON2_FLASH0		BIT(7)
 #define RK3288_SOC_FLASH_SUPPLY_NUM	2
 
+#define RK3368_SOC_CON15		0x43c
+#define RK3368_SOC_CON15_FLASH0		BIT(14)
+#define RK3368_SOC_FLASH_SUPPLY_NUM	2
+
 struct rockchip_iodomain;
 
 /**
@@ -158,6 +162,25 @@ static void rk3288_iodomain_init(struct rockchip_iodomain *iod)
 		dev_warn(iod->dev, "couldn't update flash0 ctrl\n");
 }
 
+static void rk3368_iodomain_init(struct rockchip_iodomain *iod)
+{
+	int ret;
+	u32 val;
+
+	/* if no flash supply we should leave things alone */
+	if (!iod->supplies[RK3368_SOC_FLASH_SUPPLY_NUM].reg)
+		return;
+
+	/*
+	 * set flash0 iodomain to also use this framework
+	 * instead of a special gpio.
+	 */
+	val = RK3368_SOC_CON15_FLASH0 | (RK3368_SOC_CON15_FLASH0 << 16);
+	ret = regmap_write(iod->grf, RK3368_SOC_CON15, val);
+	if (ret < 0)
+		dev_warn(iod->dev, "couldn't update flash0 ctrl\n");
+}
+
 /*
  * On the rk3188 the io-domains are handled by a shared register with the
  * lower 8 bits being still being continuing drive-strength settings.
@@ -201,6 +224,34 @@ static const struct rockchip_iodomain_soc_data soc_data_rk3288 = {
 	.init = rk3288_iodomain_init,
 };
 
+static const struct rockchip_iodomain_soc_data soc_data_rk3368 = {
+	.grf_offset = 0x900,
+	.supply_names = {
+		NULL,		/* reserved */
+		"dvp",		/* DVPIO_VDD */
+		"flash0",	/* FLASH0_VDD (emmc) */
+		"wifi",		/* APIO2_VDD (sdio0) */
+		NULL,
+		"audio",	/* APIO3_VDD */
+		"sdcard",	/* SDMMC0_VDD (sdmmc) */
+		"gpio30",	/* APIO1_VDD */
+		"gpio1830",	/* APIO4_VDD (gpujtag) */
+	},
+	.init = rk3368_iodomain_init,
+};
+
+static const struct rockchip_iodomain_soc_data soc_data_rk3368_pmu = {
+	.grf_offset = 0x100,
+	.supply_names = {
+		NULL,
+		NULL,
+		NULL,
+		NULL,
+		"pmu",	        /*PMU IO domain*/
+		"vop",	        /*LCDC IO domain*/
+	},
+};
+
 static const struct of_device_id rockchip_iodomain_match[] = {
 	{
 		.compatible = "rockchip,rk3188-io-voltage-domain",
@@ -210,6 +261,14 @@ static const struct of_device_id rockchip_iodomain_match[] = {
 		.compatible = "rockchip,rk3288-io-voltage-domain",
 		.data = (void *)&soc_data_rk3288
 	},
+	{
+		.compatible = "rockchip,rk3368-io-voltage-domain",
+		.data = (void *)&soc_data_rk3368
+	},
+	{
+		.compatible = "rockchip,rk3368-pmu-io-voltage-domain",
+		.data = (void *)&soc_data_rk3368_pmu
+	},
 	{ /* sentinel */ },
 };
 
-- 
cgit v1.2.3


From cf184dc2dd33847f4b211b01d8c7ec0526e6c5e4 Mon Sep 17 00:00:00 2001
From: Jaiprakash Singh <b44839@freescale.com>
Date: Wed, 20 May 2015 21:17:11 -0500
Subject: fsl_ifc: Change IO accessor based on endianness

IFC IO accressor are set at run time based
on IFC IP registers endianness.IFC node in
DTS file contains information about
endianness.

Signed-off-by: Jaiprakash Singh <b44839@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Acked-by: Brian Norris <computersforpeace@gmail.com>
---
 .../bindings/memory-controllers/fsl/ifc.txt        |   3 +
 drivers/memory/fsl_ifc.c                           |  43 ++--
 drivers/mtd/nand/fsl_ifc_nand.c                    | 258 +++++++++++----------
 include/linux/fsl_ifc.h                            |  50 ++++
 4 files changed, 213 insertions(+), 141 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/memory-controllers/fsl/ifc.txt b/Documentation/devicetree/bindings/memory-controllers/fsl/ifc.txt
index d5e370450ac0..89427b018ba7 100644
--- a/Documentation/devicetree/bindings/memory-controllers/fsl/ifc.txt
+++ b/Documentation/devicetree/bindings/memory-controllers/fsl/ifc.txt
@@ -18,6 +18,8 @@ Properties:
               interrupt (NAND_EVTER_STAT).  If there is only one,
               that interrupt reports both types of event.
 
+- little-endian : If this property is absent, the big-endian mode will
+                  be in use as default for registers.
 
 - ranges : Each range corresponds to a single chipselect, and covers
            the entire access window as configured.
@@ -34,6 +36,7 @@ Example:
 		#size-cells = <1>;
 		reg = <0x0 0xffe1e000 0 0x2000>;
 		interrupts = <16 2 19 2>;
+		little-endian;
 
 		/* NOR, NAND Flashes and CPLD on board */
 		ranges = <0x0 0x0 0x0 0xee000000 0x02000000
diff --git a/drivers/memory/fsl_ifc.c b/drivers/memory/fsl_ifc.c
index 410c39749872..e87459f6d686 100644
--- a/drivers/memory/fsl_ifc.c
+++ b/drivers/memory/fsl_ifc.c
@@ -62,7 +62,7 @@ int fsl_ifc_find(phys_addr_t addr_base)
 		return -ENODEV;
 
 	for (i = 0; i < fsl_ifc_ctrl_dev->banks; i++) {
-		u32 cspr = in_be32(&fsl_ifc_ctrl_dev->regs->cspr_cs[i].cspr);
+		u32 cspr = ifc_in32(&fsl_ifc_ctrl_dev->regs->cspr_cs[i].cspr);
 		if (cspr & CSPR_V && (cspr & CSPR_BA) ==
 				convert_ifc_address(addr_base))
 			return i;
@@ -79,16 +79,16 @@ static int fsl_ifc_ctrl_init(struct fsl_ifc_ctrl *ctrl)
 	/*
 	 * Clear all the common status and event registers
 	 */
-	if (in_be32(&ifc->cm_evter_stat) & IFC_CM_EVTER_STAT_CSER)
-		out_be32(&ifc->cm_evter_stat, IFC_CM_EVTER_STAT_CSER);
+	if (ifc_in32(&ifc->cm_evter_stat) & IFC_CM_EVTER_STAT_CSER)
+		ifc_out32(IFC_CM_EVTER_STAT_CSER, &ifc->cm_evter_stat);
 
 	/* enable all error and events */
-	out_be32(&ifc->cm_evter_en, IFC_CM_EVTER_EN_CSEREN);
+	ifc_out32(IFC_CM_EVTER_EN_CSEREN, &ifc->cm_evter_en);
 
 	/* enable all error and event interrupts */
-	out_be32(&ifc->cm_evter_intr_en, IFC_CM_EVTER_INTR_EN_CSERIREN);
-	out_be32(&ifc->cm_erattr0, 0x0);
-	out_be32(&ifc->cm_erattr1, 0x0);
+	ifc_out32(IFC_CM_EVTER_INTR_EN_CSERIREN, &ifc->cm_evter_intr_en);
+	ifc_out32(0x0, &ifc->cm_erattr0);
+	ifc_out32(0x0, &ifc->cm_erattr1);
 
 	return 0;
 }
@@ -127,9 +127,9 @@ static u32 check_nand_stat(struct fsl_ifc_ctrl *ctrl)
 
 	spin_lock_irqsave(&nand_irq_lock, flags);
 
-	stat = in_be32(&ifc->ifc_nand.nand_evter_stat);
+	stat = ifc_in32(&ifc->ifc_nand.nand_evter_stat);
 	if (stat) {
-		out_be32(&ifc->ifc_nand.nand_evter_stat, stat);
+		ifc_out32(stat, &ifc->ifc_nand.nand_evter_stat);
 		ctrl->nand_stat = stat;
 		wake_up(&ctrl->nand_wait);
 	}
@@ -161,16 +161,16 @@ static irqreturn_t fsl_ifc_ctrl_irq(int irqno, void *data)
 	irqreturn_t ret = IRQ_NONE;
 
 	/* read for chip select error */
-	cs_err = in_be32(&ifc->cm_evter_stat);
+	cs_err = ifc_in32(&ifc->cm_evter_stat);
 	if (cs_err) {
 		dev_err(ctrl->dev, "transaction sent to IFC is not mapped to"
 				"any memory bank 0x%08X\n", cs_err);
 		/* clear the chip select error */
-		out_be32(&ifc->cm_evter_stat, IFC_CM_EVTER_STAT_CSER);
+		ifc_out32(IFC_CM_EVTER_STAT_CSER, &ifc->cm_evter_stat);
 
 		/* read error attribute registers print the error information */
-		status = in_be32(&ifc->cm_erattr0);
-		err_addr = in_be32(&ifc->cm_erattr1);
+		status = ifc_in32(&ifc->cm_erattr0);
+		err_addr = ifc_in32(&ifc->cm_erattr1);
 
 		if (status & IFC_CM_ERATTR0_ERTYP_READ)
 			dev_err(ctrl->dev, "Read transaction error"
@@ -231,6 +231,23 @@ static int fsl_ifc_ctrl_probe(struct platform_device *dev)
 		goto err;
 	}
 
+	version = ifc_in32(&fsl_ifc_ctrl_dev->regs->ifc_rev) &
+			FSL_IFC_VERSION_MASK;
+	banks = (version == FSL_IFC_VERSION_1_0_0) ? 4 : 8;
+	dev_info(&dev->dev, "IFC version %d.%d, %d banks\n",
+		version >> 24, (version >> 16) & 0xf, banks);
+
+	fsl_ifc_ctrl_dev->version = version;
+	fsl_ifc_ctrl_dev->banks = banks;
+
+	if (of_property_read_bool(dev->dev.of_node, "little-endian")) {
+		fsl_ifc_ctrl_dev->little_endian = true;
+		dev_dbg(&dev->dev, "IFC REGISTERS are LITTLE endian\n");
+	} else {
+		fsl_ifc_ctrl_dev->little_endian = false;
+		dev_dbg(&dev->dev, "IFC REGISTERS are BIG endian\n");
+	}
+
 	version = ioread32be(&fsl_ifc_ctrl_dev->regs->ifc_rev) &
 			FSL_IFC_VERSION_MASK;
 	banks = (version == FSL_IFC_VERSION_1_0_0) ? 4 : 8;
diff --git a/drivers/mtd/nand/fsl_ifc_nand.c b/drivers/mtd/nand/fsl_ifc_nand.c
index 51394e59901b..a4e27e891153 100644
--- a/drivers/mtd/nand/fsl_ifc_nand.c
+++ b/drivers/mtd/nand/fsl_ifc_nand.c
@@ -238,8 +238,8 @@ static void set_addr(struct mtd_info *mtd, int column, int page_addr, int oob)
 
 	ifc_nand_ctrl->page = page_addr;
 	/* Program ROW0/COL0 */
-	iowrite32be(page_addr, &ifc->ifc_nand.row0);
-	iowrite32be((oob ? IFC_NAND_COL_MS : 0) | column, &ifc->ifc_nand.col0);
+	ifc_out32(page_addr, &ifc->ifc_nand.row0);
+	ifc_out32((oob ? IFC_NAND_COL_MS : 0) | column, &ifc->ifc_nand.col0);
 
 	buf_num = page_addr & priv->bufnum_mask;
 
@@ -301,19 +301,19 @@ static void fsl_ifc_run_command(struct mtd_info *mtd)
 	int i;
 
 	/* set the chip select for NAND Transaction */
-	iowrite32be(priv->bank << IFC_NAND_CSEL_SHIFT,
-		    &ifc->ifc_nand.nand_csel);
+	ifc_out32(priv->bank << IFC_NAND_CSEL_SHIFT,
+		  &ifc->ifc_nand.nand_csel);
 
 	dev_vdbg(priv->dev,
 			"%s: fir0=%08x fcr0=%08x\n",
 			__func__,
-			ioread32be(&ifc->ifc_nand.nand_fir0),
-			ioread32be(&ifc->ifc_nand.nand_fcr0));
+			ifc_in32(&ifc->ifc_nand.nand_fir0),
+			ifc_in32(&ifc->ifc_nand.nand_fcr0));
 
 	ctrl->nand_stat = 0;
 
 	/* start read/write seq */
-	iowrite32be(IFC_NAND_SEQ_STRT_FIR_STRT, &ifc->ifc_nand.nandseq_strt);
+	ifc_out32(IFC_NAND_SEQ_STRT_FIR_STRT, &ifc->ifc_nand.nandseq_strt);
 
 	/* wait for command complete flag or timeout */
 	wait_event_timeout(ctrl->nand_wait, ctrl->nand_stat,
@@ -336,7 +336,7 @@ static void fsl_ifc_run_command(struct mtd_info *mtd)
 		int sector_end = sector + chip->ecc.steps - 1;
 
 		for (i = sector / 4; i <= sector_end / 4; i++)
-			eccstat[i] = ioread32be(&ifc->ifc_nand.nand_eccstat[i]);
+			eccstat[i] = ifc_in32(&ifc->ifc_nand.nand_eccstat[i]);
 
 		for (i = sector; i <= sector_end; i++) {
 			errors = check_read_ecc(mtd, ctrl, eccstat, i);
@@ -376,33 +376,33 @@ static void fsl_ifc_do_read(struct nand_chip *chip,
 
 	/* Program FIR/IFC_NAND_FCR0 for Small/Large page */
 	if (mtd->writesize > 512) {
-		iowrite32be((IFC_FIR_OP_CW0 << IFC_NAND_FIR0_OP0_SHIFT) |
-			    (IFC_FIR_OP_CA0 << IFC_NAND_FIR0_OP1_SHIFT) |
-			    (IFC_FIR_OP_RA0 << IFC_NAND_FIR0_OP2_SHIFT) |
-			    (IFC_FIR_OP_CMD1 << IFC_NAND_FIR0_OP3_SHIFT) |
-			    (IFC_FIR_OP_RBCD << IFC_NAND_FIR0_OP4_SHIFT),
-			    &ifc->ifc_nand.nand_fir0);
-		iowrite32be(0x0, &ifc->ifc_nand.nand_fir1);
-
-		iowrite32be((NAND_CMD_READ0 << IFC_NAND_FCR0_CMD0_SHIFT) |
-			    (NAND_CMD_READSTART << IFC_NAND_FCR0_CMD1_SHIFT),
-			    &ifc->ifc_nand.nand_fcr0);
+		ifc_out32((IFC_FIR_OP_CW0 << IFC_NAND_FIR0_OP0_SHIFT) |
+			  (IFC_FIR_OP_CA0 << IFC_NAND_FIR0_OP1_SHIFT) |
+			  (IFC_FIR_OP_RA0 << IFC_NAND_FIR0_OP2_SHIFT) |
+			  (IFC_FIR_OP_CMD1 << IFC_NAND_FIR0_OP3_SHIFT) |
+			  (IFC_FIR_OP_RBCD << IFC_NAND_FIR0_OP4_SHIFT),
+			  &ifc->ifc_nand.nand_fir0);
+		ifc_out32(0x0, &ifc->ifc_nand.nand_fir1);
+
+		ifc_out32((NAND_CMD_READ0 << IFC_NAND_FCR0_CMD0_SHIFT) |
+			  (NAND_CMD_READSTART << IFC_NAND_FCR0_CMD1_SHIFT),
+			  &ifc->ifc_nand.nand_fcr0);
 	} else {
-		iowrite32be((IFC_FIR_OP_CW0 << IFC_NAND_FIR0_OP0_SHIFT) |
-			    (IFC_FIR_OP_CA0 << IFC_NAND_FIR0_OP1_SHIFT) |
-			    (IFC_FIR_OP_RA0  << IFC_NAND_FIR0_OP2_SHIFT) |
-			    (IFC_FIR_OP_RBCD << IFC_NAND_FIR0_OP3_SHIFT),
-			    &ifc->ifc_nand.nand_fir0);
-		iowrite32be(0x0, &ifc->ifc_nand.nand_fir1);
+		ifc_out32((IFC_FIR_OP_CW0 << IFC_NAND_FIR0_OP0_SHIFT) |
+			  (IFC_FIR_OP_CA0 << IFC_NAND_FIR0_OP1_SHIFT) |
+			  (IFC_FIR_OP_RA0  << IFC_NAND_FIR0_OP2_SHIFT) |
+			  (IFC_FIR_OP_RBCD << IFC_NAND_FIR0_OP3_SHIFT),
+			  &ifc->ifc_nand.nand_fir0);
+		ifc_out32(0x0, &ifc->ifc_nand.nand_fir1);
 
 		if (oob)
-			iowrite32be(NAND_CMD_READOOB <<
-				    IFC_NAND_FCR0_CMD0_SHIFT,
-				    &ifc->ifc_nand.nand_fcr0);
+			ifc_out32(NAND_CMD_READOOB <<
+				  IFC_NAND_FCR0_CMD0_SHIFT,
+				  &ifc->ifc_nand.nand_fcr0);
 		else
-			iowrite32be(NAND_CMD_READ0 <<
-				    IFC_NAND_FCR0_CMD0_SHIFT,
-				    &ifc->ifc_nand.nand_fcr0);
+			ifc_out32(NAND_CMD_READ0 <<
+				  IFC_NAND_FCR0_CMD0_SHIFT,
+				  &ifc->ifc_nand.nand_fcr0);
 	}
 }
 
@@ -422,7 +422,7 @@ static void fsl_ifc_cmdfunc(struct mtd_info *mtd, unsigned int command,
 	switch (command) {
 	/* READ0 read the entire buffer to use hardware ECC. */
 	case NAND_CMD_READ0:
-		iowrite32be(0, &ifc->ifc_nand.nand_fbcr);
+		ifc_out32(0, &ifc->ifc_nand.nand_fbcr);
 		set_addr(mtd, 0, page_addr, 0);
 
 		ifc_nand_ctrl->read_bytes = mtd->writesize + mtd->oobsize;
@@ -437,7 +437,7 @@ static void fsl_ifc_cmdfunc(struct mtd_info *mtd, unsigned int command,
 
 	/* READOOB reads only the OOB because no ECC is performed. */
 	case NAND_CMD_READOOB:
-		iowrite32be(mtd->oobsize - column, &ifc->ifc_nand.nand_fbcr);
+		ifc_out32(mtd->oobsize - column, &ifc->ifc_nand.nand_fbcr);
 		set_addr(mtd, column, page_addr, 1);
 
 		ifc_nand_ctrl->read_bytes = mtd->writesize + mtd->oobsize;
@@ -453,19 +453,19 @@ static void fsl_ifc_cmdfunc(struct mtd_info *mtd, unsigned int command,
 		if (command == NAND_CMD_PARAM)
 			timing = IFC_FIR_OP_RBCD;
 
-		iowrite32be((IFC_FIR_OP_CW0 << IFC_NAND_FIR0_OP0_SHIFT) |
-			    (IFC_FIR_OP_UA  << IFC_NAND_FIR0_OP1_SHIFT) |
-			    (timing << IFC_NAND_FIR0_OP2_SHIFT),
-			    &ifc->ifc_nand.nand_fir0);
-		iowrite32be(command << IFC_NAND_FCR0_CMD0_SHIFT,
-			    &ifc->ifc_nand.nand_fcr0);
-		iowrite32be(column, &ifc->ifc_nand.row3);
+		ifc_out32((IFC_FIR_OP_CW0 << IFC_NAND_FIR0_OP0_SHIFT) |
+			  (IFC_FIR_OP_UA  << IFC_NAND_FIR0_OP1_SHIFT) |
+			  (timing << IFC_NAND_FIR0_OP2_SHIFT),
+			  &ifc->ifc_nand.nand_fir0);
+		ifc_out32(command << IFC_NAND_FCR0_CMD0_SHIFT,
+			  &ifc->ifc_nand.nand_fcr0);
+		ifc_out32(column, &ifc->ifc_nand.row3);
 
 		/*
 		 * although currently it's 8 bytes for READID, we always read
 		 * the maximum 256 bytes(for PARAM)
 		 */
-		iowrite32be(256, &ifc->ifc_nand.nand_fbcr);
+		ifc_out32(256, &ifc->ifc_nand.nand_fbcr);
 		ifc_nand_ctrl->read_bytes = 256;
 
 		set_addr(mtd, 0, 0, 0);
@@ -480,16 +480,16 @@ static void fsl_ifc_cmdfunc(struct mtd_info *mtd, unsigned int command,
 
 	/* ERASE2 uses the block and page address from ERASE1 */
 	case NAND_CMD_ERASE2:
-		iowrite32be((IFC_FIR_OP_CW0 << IFC_NAND_FIR0_OP0_SHIFT) |
-			    (IFC_FIR_OP_RA0 << IFC_NAND_FIR0_OP1_SHIFT) |
-			    (IFC_FIR_OP_CMD1 << IFC_NAND_FIR0_OP2_SHIFT),
-			    &ifc->ifc_nand.nand_fir0);
+		ifc_out32((IFC_FIR_OP_CW0 << IFC_NAND_FIR0_OP0_SHIFT) |
+			  (IFC_FIR_OP_RA0 << IFC_NAND_FIR0_OP1_SHIFT) |
+			  (IFC_FIR_OP_CMD1 << IFC_NAND_FIR0_OP2_SHIFT),
+			  &ifc->ifc_nand.nand_fir0);
 
-		iowrite32be((NAND_CMD_ERASE1 << IFC_NAND_FCR0_CMD0_SHIFT) |
-			    (NAND_CMD_ERASE2 << IFC_NAND_FCR0_CMD1_SHIFT),
-			    &ifc->ifc_nand.nand_fcr0);
+		ifc_out32((NAND_CMD_ERASE1 << IFC_NAND_FCR0_CMD0_SHIFT) |
+			  (NAND_CMD_ERASE2 << IFC_NAND_FCR0_CMD1_SHIFT),
+			  &ifc->ifc_nand.nand_fcr0);
 
-		iowrite32be(0, &ifc->ifc_nand.nand_fbcr);
+		ifc_out32(0, &ifc->ifc_nand.nand_fbcr);
 		ifc_nand_ctrl->read_bytes = 0;
 		fsl_ifc_run_command(mtd);
 		return;
@@ -506,19 +506,18 @@ static void fsl_ifc_cmdfunc(struct mtd_info *mtd, unsigned int command,
 				(NAND_CMD_STATUS << IFC_NAND_FCR0_CMD1_SHIFT) |
 				(NAND_CMD_PAGEPROG << IFC_NAND_FCR0_CMD2_SHIFT);
 
-			iowrite32be(
-				 (IFC_FIR_OP_CW0 << IFC_NAND_FIR0_OP0_SHIFT) |
-				 (IFC_FIR_OP_CA0 << IFC_NAND_FIR0_OP1_SHIFT) |
-				 (IFC_FIR_OP_RA0 << IFC_NAND_FIR0_OP2_SHIFT) |
-				 (IFC_FIR_OP_WBCD  << IFC_NAND_FIR0_OP3_SHIFT) |
-				 (IFC_FIR_OP_CMD2 << IFC_NAND_FIR0_OP4_SHIFT),
-				 &ifc->ifc_nand.nand_fir0);
-			iowrite32be(
-				 (IFC_FIR_OP_CW1 << IFC_NAND_FIR1_OP5_SHIFT) |
-				 (IFC_FIR_OP_RDSTAT <<
-					IFC_NAND_FIR1_OP6_SHIFT) |
-				 (IFC_FIR_OP_NOP << IFC_NAND_FIR1_OP7_SHIFT),
-				 &ifc->ifc_nand.nand_fir1);
+			ifc_out32(
+				(IFC_FIR_OP_CW0 << IFC_NAND_FIR0_OP0_SHIFT) |
+				(IFC_FIR_OP_CA0 << IFC_NAND_FIR0_OP1_SHIFT) |
+				(IFC_FIR_OP_RA0 << IFC_NAND_FIR0_OP2_SHIFT) |
+				(IFC_FIR_OP_WBCD << IFC_NAND_FIR0_OP3_SHIFT) |
+				(IFC_FIR_OP_CMD2 << IFC_NAND_FIR0_OP4_SHIFT),
+				&ifc->ifc_nand.nand_fir0);
+			ifc_out32(
+				(IFC_FIR_OP_CW1 << IFC_NAND_FIR1_OP5_SHIFT) |
+				(IFC_FIR_OP_RDSTAT << IFC_NAND_FIR1_OP6_SHIFT) |
+				(IFC_FIR_OP_NOP << IFC_NAND_FIR1_OP7_SHIFT),
+				&ifc->ifc_nand.nand_fir1);
 		} else {
 			nand_fcr0 = ((NAND_CMD_PAGEPROG <<
 					IFC_NAND_FCR0_CMD1_SHIFT) |
@@ -527,20 +526,19 @@ static void fsl_ifc_cmdfunc(struct mtd_info *mtd, unsigned int command,
 				    (NAND_CMD_STATUS <<
 					IFC_NAND_FCR0_CMD3_SHIFT));
 
-			iowrite32be(
+			ifc_out32(
 				(IFC_FIR_OP_CW0 << IFC_NAND_FIR0_OP0_SHIFT) |
 				(IFC_FIR_OP_CMD2 << IFC_NAND_FIR0_OP1_SHIFT) |
 				(IFC_FIR_OP_CA0 << IFC_NAND_FIR0_OP2_SHIFT) |
 				(IFC_FIR_OP_RA0 << IFC_NAND_FIR0_OP3_SHIFT) |
 				(IFC_FIR_OP_WBCD << IFC_NAND_FIR0_OP4_SHIFT),
 				&ifc->ifc_nand.nand_fir0);
-			iowrite32be(
-				 (IFC_FIR_OP_CMD1 << IFC_NAND_FIR1_OP5_SHIFT) |
-				 (IFC_FIR_OP_CW3 << IFC_NAND_FIR1_OP6_SHIFT) |
-				 (IFC_FIR_OP_RDSTAT <<
-					IFC_NAND_FIR1_OP7_SHIFT) |
-				 (IFC_FIR_OP_NOP << IFC_NAND_FIR1_OP8_SHIFT),
-				  &ifc->ifc_nand.nand_fir1);
+			ifc_out32(
+				(IFC_FIR_OP_CMD1 << IFC_NAND_FIR1_OP5_SHIFT) |
+				(IFC_FIR_OP_CW3 << IFC_NAND_FIR1_OP6_SHIFT) |
+				(IFC_FIR_OP_RDSTAT << IFC_NAND_FIR1_OP7_SHIFT) |
+				(IFC_FIR_OP_NOP << IFC_NAND_FIR1_OP8_SHIFT),
+				&ifc->ifc_nand.nand_fir1);
 
 			if (column >= mtd->writesize)
 				nand_fcr0 |=
@@ -555,7 +553,7 @@ static void fsl_ifc_cmdfunc(struct mtd_info *mtd, unsigned int command,
 			column -= mtd->writesize;
 			ifc_nand_ctrl->oob = 1;
 		}
-		iowrite32be(nand_fcr0, &ifc->ifc_nand.nand_fcr0);
+		ifc_out32(nand_fcr0, &ifc->ifc_nand.nand_fcr0);
 		set_addr(mtd, column, page_addr, ifc_nand_ctrl->oob);
 		return;
 	}
@@ -563,24 +561,26 @@ static void fsl_ifc_cmdfunc(struct mtd_info *mtd, unsigned int command,
 	/* PAGEPROG reuses all of the setup from SEQIN and adds the length */
 	case NAND_CMD_PAGEPROG: {
 		if (ifc_nand_ctrl->oob) {
-			iowrite32be(ifc_nand_ctrl->index -
-				    ifc_nand_ctrl->column,
-				    &ifc->ifc_nand.nand_fbcr);
+			ifc_out32(ifc_nand_ctrl->index -
+				  ifc_nand_ctrl->column,
+				  &ifc->ifc_nand.nand_fbcr);
 		} else {
-			iowrite32be(0, &ifc->ifc_nand.nand_fbcr);
+			ifc_out32(0, &ifc->ifc_nand.nand_fbcr);
 		}
 
 		fsl_ifc_run_command(mtd);
 		return;
 	}
 
-	case NAND_CMD_STATUS:
-		iowrite32be((IFC_FIR_OP_CW0 << IFC_NAND_FIR0_OP0_SHIFT) |
-			    (IFC_FIR_OP_RB << IFC_NAND_FIR0_OP1_SHIFT),
-			    &ifc->ifc_nand.nand_fir0);
-		iowrite32be(NAND_CMD_STATUS << IFC_NAND_FCR0_CMD0_SHIFT,
-			    &ifc->ifc_nand.nand_fcr0);
-		iowrite32be(1, &ifc->ifc_nand.nand_fbcr);
+	case NAND_CMD_STATUS: {
+		void __iomem *addr;
+
+		ifc_out32((IFC_FIR_OP_CW0 << IFC_NAND_FIR0_OP0_SHIFT) |
+			  (IFC_FIR_OP_RB << IFC_NAND_FIR0_OP1_SHIFT),
+			  &ifc->ifc_nand.nand_fir0);
+		ifc_out32(NAND_CMD_STATUS << IFC_NAND_FCR0_CMD0_SHIFT,
+			  &ifc->ifc_nand.nand_fcr0);
+		ifc_out32(1, &ifc->ifc_nand.nand_fbcr);
 		set_addr(mtd, 0, 0, 0);
 		ifc_nand_ctrl->read_bytes = 1;
 
@@ -590,17 +590,19 @@ static void fsl_ifc_cmdfunc(struct mtd_info *mtd, unsigned int command,
 		 * The chip always seems to report that it is
 		 * write-protected, even when it is not.
 		 */
+		addr = ifc_nand_ctrl->addr;
 		if (chip->options & NAND_BUSWIDTH_16)
-			setbits16(ifc_nand_ctrl->addr, NAND_STATUS_WP);
+			ifc_out16(ifc_in16(addr) | (NAND_STATUS_WP), addr);
 		else
-			setbits8(ifc_nand_ctrl->addr, NAND_STATUS_WP);
+			ifc_out8(ifc_in8(addr) | (NAND_STATUS_WP), addr);
 		return;
+	}
 
 	case NAND_CMD_RESET:
-		iowrite32be(IFC_FIR_OP_CW0 << IFC_NAND_FIR0_OP0_SHIFT,
-			    &ifc->ifc_nand.nand_fir0);
-		iowrite32be(NAND_CMD_RESET << IFC_NAND_FCR0_CMD0_SHIFT,
-			    &ifc->ifc_nand.nand_fcr0);
+		ifc_out32(IFC_FIR_OP_CW0 << IFC_NAND_FIR0_OP0_SHIFT,
+			  &ifc->ifc_nand.nand_fir0);
+		ifc_out32(NAND_CMD_RESET << IFC_NAND_FCR0_CMD0_SHIFT,
+			  &ifc->ifc_nand.nand_fcr0);
 		fsl_ifc_run_command(mtd);
 		return;
 
@@ -658,7 +660,7 @@ static uint8_t fsl_ifc_read_byte(struct mtd_info *mtd)
 	 */
 	if (ifc_nand_ctrl->index < ifc_nand_ctrl->read_bytes) {
 		offset = ifc_nand_ctrl->index++;
-		return in_8(ifc_nand_ctrl->addr + offset);
+		return ifc_in8(ifc_nand_ctrl->addr + offset);
 	}
 
 	dev_err(priv->dev, "%s: beyond end of buffer\n", __func__);
@@ -680,7 +682,7 @@ static uint8_t fsl_ifc_read_byte16(struct mtd_info *mtd)
 	 * next byte.
 	 */
 	if (ifc_nand_ctrl->index < ifc_nand_ctrl->read_bytes) {
-		data = in_be16(ifc_nand_ctrl->addr + ifc_nand_ctrl->index);
+		data = ifc_in16(ifc_nand_ctrl->addr + ifc_nand_ctrl->index);
 		ifc_nand_ctrl->index += 2;
 		return (uint8_t) data;
 	}
@@ -726,18 +728,18 @@ static int fsl_ifc_wait(struct mtd_info *mtd, struct nand_chip *chip)
 	u32 nand_fsr;
 
 	/* Use READ_STATUS command, but wait for the device to be ready */
-	iowrite32be((IFC_FIR_OP_CW0 << IFC_NAND_FIR0_OP0_SHIFT) |
-		    (IFC_FIR_OP_RDSTAT << IFC_NAND_FIR0_OP1_SHIFT),
-		    &ifc->ifc_nand.nand_fir0);
-	iowrite32be(NAND_CMD_STATUS << IFC_NAND_FCR0_CMD0_SHIFT,
-		    &ifc->ifc_nand.nand_fcr0);
-	iowrite32be(1, &ifc->ifc_nand.nand_fbcr);
+	ifc_out32((IFC_FIR_OP_CW0 << IFC_NAND_FIR0_OP0_SHIFT) |
+		  (IFC_FIR_OP_RDSTAT << IFC_NAND_FIR0_OP1_SHIFT),
+		  &ifc->ifc_nand.nand_fir0);
+	ifc_out32(NAND_CMD_STATUS << IFC_NAND_FCR0_CMD0_SHIFT,
+		  &ifc->ifc_nand.nand_fcr0);
+	ifc_out32(1, &ifc->ifc_nand.nand_fbcr);
 	set_addr(mtd, 0, 0, 0);
 	ifc_nand_ctrl->read_bytes = 1;
 
 	fsl_ifc_run_command(mtd);
 
-	nand_fsr = ioread32be(&ifc->ifc_nand.nand_fsr);
+	nand_fsr = ifc_in32(&ifc->ifc_nand.nand_fsr);
 
 	/*
 	 * The chip always seems to report that it is
@@ -829,34 +831,34 @@ static void fsl_ifc_sram_init(struct fsl_ifc_mtd *priv)
 	uint32_t cs = priv->bank;
 
 	/* Save CSOR and CSOR_ext */
-	csor = ioread32be(&ifc->csor_cs[cs].csor);
-	csor_ext = ioread32be(&ifc->csor_cs[cs].csor_ext);
+	csor = ifc_in32(&ifc->csor_cs[cs].csor);
+	csor_ext = ifc_in32(&ifc->csor_cs[cs].csor_ext);
 
 	/* chage PageSize 8K and SpareSize 1K*/
 	csor_8k = (csor & ~(CSOR_NAND_PGS_MASK)) | 0x0018C000;
-	iowrite32be(csor_8k, &ifc->csor_cs[cs].csor);
-	iowrite32be(0x0000400, &ifc->csor_cs[cs].csor_ext);
+	ifc_out32(csor_8k, &ifc->csor_cs[cs].csor);
+	ifc_out32(0x0000400, &ifc->csor_cs[cs].csor_ext);
 
 	/* READID */
-	iowrite32be((IFC_FIR_OP_CW0 << IFC_NAND_FIR0_OP0_SHIFT) |
-		    (IFC_FIR_OP_UA  << IFC_NAND_FIR0_OP1_SHIFT) |
-		    (IFC_FIR_OP_RB << IFC_NAND_FIR0_OP2_SHIFT),
-		    &ifc->ifc_nand.nand_fir0);
-	iowrite32be(NAND_CMD_READID << IFC_NAND_FCR0_CMD0_SHIFT,
-		    &ifc->ifc_nand.nand_fcr0);
-	iowrite32be(0x0, &ifc->ifc_nand.row3);
+	ifc_out32((IFC_FIR_OP_CW0 << IFC_NAND_FIR0_OP0_SHIFT) |
+		  (IFC_FIR_OP_UA  << IFC_NAND_FIR0_OP1_SHIFT) |
+		  (IFC_FIR_OP_RB << IFC_NAND_FIR0_OP2_SHIFT),
+		  &ifc->ifc_nand.nand_fir0);
+	ifc_out32(NAND_CMD_READID << IFC_NAND_FCR0_CMD0_SHIFT,
+		  &ifc->ifc_nand.nand_fcr0);
+	ifc_out32(0x0, &ifc->ifc_nand.row3);
 
-	iowrite32be(0x0, &ifc->ifc_nand.nand_fbcr);
+	ifc_out32(0x0, &ifc->ifc_nand.nand_fbcr);
 
 	/* Program ROW0/COL0 */
-	iowrite32be(0x0, &ifc->ifc_nand.row0);
-	iowrite32be(0x0, &ifc->ifc_nand.col0);
+	ifc_out32(0x0, &ifc->ifc_nand.row0);
+	ifc_out32(0x0, &ifc->ifc_nand.col0);
 
 	/* set the chip select for NAND Transaction */
-	iowrite32be(cs << IFC_NAND_CSEL_SHIFT, &ifc->ifc_nand.nand_csel);
+	ifc_out32(cs << IFC_NAND_CSEL_SHIFT, &ifc->ifc_nand.nand_csel);
 
 	/* start read seq */
-	iowrite32be(IFC_NAND_SEQ_STRT_FIR_STRT, &ifc->ifc_nand.nandseq_strt);
+	ifc_out32(IFC_NAND_SEQ_STRT_FIR_STRT, &ifc->ifc_nand.nandseq_strt);
 
 	/* wait for command complete flag or timeout */
 	wait_event_timeout(ctrl->nand_wait, ctrl->nand_stat,
@@ -866,8 +868,8 @@ static void fsl_ifc_sram_init(struct fsl_ifc_mtd *priv)
 		printk(KERN_ERR "fsl-ifc: Failed to Initialise SRAM\n");
 
 	/* Restore CSOR and CSOR_ext */
-	iowrite32be(csor, &ifc->csor_cs[cs].csor);
-	iowrite32be(csor_ext, &ifc->csor_cs[cs].csor_ext);
+	ifc_out32(csor, &ifc->csor_cs[cs].csor);
+	ifc_out32(csor_ext, &ifc->csor_cs[cs].csor_ext);
 }
 
 static int fsl_ifc_chip_init(struct fsl_ifc_mtd *priv)
@@ -884,7 +886,7 @@ static int fsl_ifc_chip_init(struct fsl_ifc_mtd *priv)
 
 	/* fill in nand_chip structure */
 	/* set up function call table */
-	if ((ioread32be(&ifc->cspr_cs[priv->bank].cspr)) & CSPR_PORT_SIZE_16)
+	if ((ifc_in32(&ifc->cspr_cs[priv->bank].cspr)) & CSPR_PORT_SIZE_16)
 		chip->read_byte = fsl_ifc_read_byte16;
 	else
 		chip->read_byte = fsl_ifc_read_byte;
@@ -898,13 +900,13 @@ static int fsl_ifc_chip_init(struct fsl_ifc_mtd *priv)
 	chip->bbt_td = &bbt_main_descr;
 	chip->bbt_md = &bbt_mirror_descr;
 
-	iowrite32be(0x0, &ifc->ifc_nand.ncfgr);
+	ifc_out32(0x0, &ifc->ifc_nand.ncfgr);
 
 	/* set up nand options */
 	chip->bbt_options = NAND_BBT_USE_FLASH;
 	chip->options = NAND_NO_SUBPAGE_WRITE;
 
-	if (ioread32be(&ifc->cspr_cs[priv->bank].cspr) & CSPR_PORT_SIZE_16) {
+	if (ifc_in32(&ifc->cspr_cs[priv->bank].cspr) & CSPR_PORT_SIZE_16) {
 		chip->read_byte = fsl_ifc_read_byte16;
 		chip->options |= NAND_BUSWIDTH_16;
 	} else {
@@ -917,7 +919,7 @@ static int fsl_ifc_chip_init(struct fsl_ifc_mtd *priv)
 	chip->ecc.read_page = fsl_ifc_read_page;
 	chip->ecc.write_page = fsl_ifc_write_page;
 
-	csor = ioread32be(&ifc->csor_cs[priv->bank].csor);
+	csor = ifc_in32(&ifc->csor_cs[priv->bank].csor);
 
 	/* Hardware generates ECC per 512 Bytes */
 	chip->ecc.size = 512;
@@ -1006,7 +1008,7 @@ static int fsl_ifc_chip_remove(struct fsl_ifc_mtd *priv)
 static int match_bank(struct fsl_ifc_regs __iomem *ifc, int bank,
 		      phys_addr_t addr)
 {
-	u32 cspr = ioread32be(&ifc->cspr_cs[bank].cspr);
+	u32 cspr = ifc_in32(&ifc->cspr_cs[bank].cspr);
 
 	if (!(cspr & CSPR_V))
 		return 0;
@@ -1092,16 +1094,16 @@ static int fsl_ifc_nand_probe(struct platform_device *dev)
 
 	dev_set_drvdata(priv->dev, priv);
 
-	iowrite32be(IFC_NAND_EVTER_EN_OPC_EN |
-		    IFC_NAND_EVTER_EN_FTOER_EN |
-		    IFC_NAND_EVTER_EN_WPER_EN,
-		    &ifc->ifc_nand.nand_evter_en);
+	ifc_out32(IFC_NAND_EVTER_EN_OPC_EN |
+		  IFC_NAND_EVTER_EN_FTOER_EN |
+		  IFC_NAND_EVTER_EN_WPER_EN,
+		  &ifc->ifc_nand.nand_evter_en);
 
 	/* enable NAND Machine Interrupts */
-	iowrite32be(IFC_NAND_EVTER_INTR_OPCIR_EN |
-		    IFC_NAND_EVTER_INTR_FTOERIR_EN |
-		    IFC_NAND_EVTER_INTR_WPERIR_EN,
-		    &ifc->ifc_nand.nand_evter_intr_en);
+	ifc_out32(IFC_NAND_EVTER_INTR_OPCIR_EN |
+		  IFC_NAND_EVTER_INTR_FTOERIR_EN |
+		  IFC_NAND_EVTER_INTR_WPERIR_EN,
+		  &ifc->ifc_nand.nand_evter_intr_en);
 	priv->mtd.name = kasprintf(GFP_KERNEL, "%llx.flash", (u64)res.start);
 	if (!priv->mtd.name) {
 		ret = -ENOMEM;
diff --git a/include/linux/fsl_ifc.h b/include/linux/fsl_ifc.h
index bf0321eabbda..0023088b253b 100644
--- a/include/linux/fsl_ifc.h
+++ b/include/linux/fsl_ifc.h
@@ -841,9 +841,59 @@ struct fsl_ifc_ctrl {
 
 	u32 nand_stat;
 	wait_queue_head_t nand_wait;
+	bool little_endian;
 };
 
 extern struct fsl_ifc_ctrl *fsl_ifc_ctrl_dev;
 
+static inline u32 ifc_in32(void __iomem *addr)
+{
+	u32 val;
+
+	if (fsl_ifc_ctrl_dev->little_endian)
+		val = ioread32(addr);
+	else
+		val = ioread32be(addr);
+
+	return val;
+}
+
+static inline u16 ifc_in16(void __iomem *addr)
+{
+	u16 val;
+
+	if (fsl_ifc_ctrl_dev->little_endian)
+		val = ioread16(addr);
+	else
+		val = ioread16be(addr);
+
+	return val;
+}
+
+static inline u8 ifc_in8(void __iomem *addr)
+{
+	return ioread8(addr);
+}
+
+static inline void ifc_out32(u32 val, void __iomem *addr)
+{
+	if (fsl_ifc_ctrl_dev->little_endian)
+		iowrite32(val, addr);
+	else
+		iowrite32be(val, addr);
+}
+
+static inline void ifc_out16(u16 val, void __iomem *addr)
+{
+	if (fsl_ifc_ctrl_dev->little_endian)
+		iowrite16(val, addr);
+	else
+		iowrite16be(val, addr);
+}
+
+static inline void ifc_out8(u8 val, void __iomem *addr)
+{
+	iowrite8(val, addr);
+}
 
 #endif /* __ASM_FSL_IFC_H */
-- 
cgit v1.2.3


From 378abcdf3297613f6712343ce3a79b7d6abdf955 Mon Sep 17 00:00:00 2001
From: Alexandru M Stan <amstan@chromium.org>
Date: Wed, 29 Jul 2015 20:57:20 +0200
Subject: ARM: dts: rockchip: add veyron-minnie board

Also known as the Asus Chromebook Flip.

Signed-off-by: Alexandru M Stan <amstan@chromium.org>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
---
 Documentation/devicetree/bindings/arm/rockchip.txt |   7 +
 arch/arm/boot/dts/Makefile                         |   1 +
 arch/arm/boot/dts/rk3288-veyron-minnie.dts         | 226 +++++++++++++++++++++
 3 files changed, 234 insertions(+)
 create mode 100644 arch/arm/boot/dts/rk3288-veyron-minnie.dts

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/arm/rockchip.txt b/Documentation/devicetree/bindings/arm/rockchip.txt
index c7411cc53533..af58cd74aeff 100644
--- a/Documentation/devicetree/bindings/arm/rockchip.txt
+++ b/Documentation/devicetree/bindings/arm/rockchip.txt
@@ -38,6 +38,13 @@ Rockchip platforms device tree bindings
 		     "google,veyron-jerry-rev3", "google,veyron-jerry",
 		     "google,veyron", "rockchip,rk3288";
 
+- Google Minnie (Asus Chromebook Flip C100P):
+    Required root node properties:
+      - compatible = "google,veyron-minnie-rev4", "google,veyron-minnie-rev3",
+		     "google,veyron-minnie-rev2", "google,veyron-minnie-rev1",
+		     "google,veyron-minnie-rev0", "google,veyron-minnie",
+		     "google,veyron", "rockchip,rk3288";
+
 - Google Pinky (dev-board):
     Required root node properties:
       - compatible = "google,veyron-pinky-rev2", "google,veyron-pinky",
diff --git a/arch/arm/boot/dts/Makefile b/arch/arm/boot/dts/Makefile
index 7017d8e3f4f5..7805a6541a38 100644
--- a/arch/arm/boot/dts/Makefile
+++ b/arch/arm/boot/dts/Makefile
@@ -491,6 +491,7 @@ dtb-$(CONFIG_ARCH_ROCKCHIP) += \
 	rk3288-firefly.dtb \
 	rk3288-r89.dtb \
 	rk3288-veyron-jerry.dtb \
+	rk3288-veyron-minnie.dtb \
 	rk3288-veyron-pinky.dtb \
 	rk3288-veyron-speedy.dtb
 dtb-$(CONFIG_ARCH_S3C24XX) += \
diff --git a/arch/arm/boot/dts/rk3288-veyron-minnie.dts b/arch/arm/boot/dts/rk3288-veyron-minnie.dts
new file mode 100644
index 000000000000..0e30bd6bf92b
--- /dev/null
+++ b/arch/arm/boot/dts/rk3288-veyron-minnie.dts
@@ -0,0 +1,226 @@
+/*
+ * Google Veyron Minnie Rev 0+ board device tree source
+ *
+ * Copyright 2015 Google, Inc
+ *
+ * This file is dual-licensed: you can use it either under the terms
+ * of the GPL or the X11 license, at your option. Note that this dual
+ * licensing only applies to this file, and not this project as a
+ * whole.
+ *
+ *  a) This file is free software; you can redistribute it and/or
+ *     modify it under the terms of the GNU General Public License as
+ *     published by the Free Software Foundation; either version 2 of the
+ *     License, or (at your option) any later version.
+ *
+ *     This file is distributed in the hope that it will be useful,
+ *     but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *     MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *     GNU General Public License for more details.
+ *
+ *  Or, alternatively,
+ *
+ *  b) Permission is hereby granted, free of charge, to any person
+ *     obtaining a copy of this software and associated documentation
+ *     files (the "Software"), to deal in the Software without
+ *     restriction, including without limitation the rights to use,
+ *     copy, modify, merge, publish, distribute, sublicense, and/or
+ *     sell copies of the Software, and to permit persons to whom the
+ *     Software is furnished to do so, subject to the following
+ *     conditions:
+ *
+ *     The above copyright notice and this permission notice shall be
+ *     included in all copies or substantial portions of the Software.
+ *
+ *     THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ *     EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
+ *     OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ *     NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
+ *     HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
+ *     WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ *     FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ *     OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+/dts-v1/;
+#include "rk3288-veyron-chromebook.dtsi"
+
+/ {
+	model = "Google Minnie";
+	compatible = "google,veyron-minnie-rev4", "google,veyron-minnie-rev3",
+		     "google,veyron-minnie-rev2", "google,veyron-minnie-rev1",
+		     "google,veyron-minnie-rev0", "google,veyron-minnie",
+		     "google,veyron", "rockchip,rk3288";
+
+	backlight_regulator: backlight-regulator {
+		compatible = "regulator-fixed";
+		enable-active-high;
+		gpio = <&gpio2 12 GPIO_ACTIVE_HIGH>;
+		pinctrl-names = "default";
+		pinctrl-0 = <&bl_pwr_en>;
+		regulator-name = "backlight_regulator";
+		vin-supply = <&vcc33_sys>;
+		startup-delay-us = <15000>;
+	};
+
+	panel_regulator: panel-regulator {
+		compatible = "regulator-fixed";
+		enable-active-high;
+		gpio = <&gpio7 14 GPIO_ACTIVE_HIGH>;
+		pinctrl-names = "default";
+		pinctrl-0 = <&lcd_enable_h>;
+		regulator-name = "panel_regulator";
+		vin-supply = <&vcc33_sys>;
+	};
+
+	vcc18_lcd: vcc18-lcd {
+		compatible = "regulator-fixed";
+		enable-active-high;
+		gpio = <&gpio2 13 GPIO_ACTIVE_HIGH>;
+		pinctrl-names = "default";
+		pinctrl-0 = <&avdd_1v8_disp_en>;
+		regulator-name = "vcc18_lcd";
+		regulator-always-on;
+		regulator-boot-on;
+		vin-supply = <&vcc18_wl>;
+	};
+};
+
+&gpio_keys {
+	pinctrl-0 = <&pwr_key_l &ap_lid_int_l &volum_down_l &volum_up_l>;
+
+	volum_down {
+		label = "Volum_down";
+		gpios = <&gpio5 11 GPIO_ACTIVE_LOW>;
+		linux,code = <KEY_VOLUMEDOWN>;
+		debounce-interval = <100>;
+	};
+
+	volum_up {
+		label = "Volum_up";
+		gpios = <&gpio5 10 GPIO_ACTIVE_LOW>;
+		linux,code = <KEY_VOLUMEUP>;
+		debounce-interval = <100>;
+	};
+};
+
+&i2c_tunnel {
+	battery: bq27500@55 {
+		compatible = "ti,bq27500";
+		reg = <0x55>;
+	};
+};
+
+&i2c3 {
+	status = "okay";
+
+	clock-frequency = <400000>;
+	i2c-scl-falling-time-ns = <50>;
+	i2c-scl-rising-time-ns = <300>;
+};
+
+&rk808 {
+	pinctrl-names = "default";
+	pinctrl-0 = <&pmic_int_l &dvs_1 &dvs_2>;
+
+	regulators {
+		vcc33_touch: LDO_REG2 {
+			regulator-min-microvolt = <3300000>;
+			regulator-max-microvolt = <3300000>;
+			regulator-name = "vcc33_touch";
+			regulator-suspend-mem-disabled;
+		};
+
+		vcc5v_touch: SWITCH_REG2 {
+			regulator-name = "vcc5v_touch";
+			regulator-suspend-mem-disabled;
+		};
+	};
+};
+
+&sdmmc {
+	disable-wp;
+	pinctrl-names = "default";
+	pinctrl-0 = <&sdmmc_clk &sdmmc_cmd &sdmmc_cd_disabled &sdmmc_cd_gpio
+			&sdmmc_bus4>;
+};
+
+&vcc_5v {
+	enable-active-high;
+	gpio = <&gpio7 21 GPIO_ACTIVE_HIGH>;
+	pinctrl-names = "default";
+	pinctrl-0 = <&drv_5v>;
+};
+
+&vcc50_hdmi {
+	enable-active-high;
+	gpio = <&gpio5 19 GPIO_ACTIVE_HIGH>;
+	pinctrl-names = "default";
+	pinctrl-0 = <&vcc50_hdmi_en>;
+};
+
+&pinctrl {
+	backlight {
+		bl_pwr_en: bl_pwr_en {
+			rockchip,pins = <2 12 RK_FUNC_GPIO &pcfg_pull_none>;
+		};
+	};
+
+	buck-5v {
+		drv_5v: drv-5v {
+			rockchip,pins = <7 21 RK_FUNC_GPIO &pcfg_pull_none>;
+		};
+	};
+
+	buttons {
+		volum_down_l: volum-down-l {
+			rockchip,pins = <5 11 RK_FUNC_GPIO &pcfg_pull_up>;
+		};
+
+		volum_up_l: volum-up-l {
+			rockchip,pins = <5 10 RK_FUNC_GPIO &pcfg_pull_up>;
+		};
+	};
+
+	hdmi {
+		vcc50_hdmi_en: vcc50-hdmi-en {
+			rockchip,pins = <5 19 RK_FUNC_GPIO &pcfg_pull_none>;
+		};
+	};
+
+	lcd {
+		lcd_enable_h: lcd-en {
+			rockchip,pins = <7 14 RK_FUNC_GPIO &pcfg_pull_none>;
+		};
+
+		avdd_1v8_disp_en: avdd-1v8-disp-en {
+			rockchip,pins = <2 13 RK_FUNC_GPIO &pcfg_pull_none>;
+		};
+	};
+
+	pmic {
+		dvs_1: dvs-1 {
+			rockchip,pins = <7 12 RK_FUNC_GPIO &pcfg_pull_down>;
+		};
+
+		dvs_2: dvs-2 {
+			rockchip,pins = <7 15 RK_FUNC_GPIO &pcfg_pull_down>;
+		};
+	};
+
+	prochot {
+		gpio_prochot: gpio-prochot {
+			rockchip,pins = <2 8 RK_FUNC_GPIO &pcfg_pull_none>;
+		};
+	};
+
+	touchscreen {
+		touch_int: touch-int {
+			rockchip,pins = <2 14 RK_FUNC_GPIO &pcfg_pull_none>;
+		};
+
+		touch_rst: touch-rst {
+			rockchip,pins = <2 15 RK_FUNC_GPIO &pcfg_pull_none>;
+		};
+	};
+};
-- 
cgit v1.2.3


From aab978772a4fbe71d8231cb58d0989737db27a36 Mon Sep 17 00:00:00 2001
From: Daniel Baluta <daniel.baluta@intel.com>
Date: Tue, 4 Aug 2015 17:20:08 +0300
Subject: DocBook: Add initial documentation for IIO

This is intended to help developers faster find their way
inside the Industrial I/O core and reduce time spent on IIO
drivers development.

Signed-off-by: Daniel Baluta <daniel.baluta@intel.com>
Acked-by: Crt Mori <cmo@melexis.com>
Reviewed-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
---
 Documentation/DocBook/Makefile |   2 +-
 Documentation/DocBook/iio.tmpl | 697 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 698 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/DocBook/iio.tmpl

(limited to 'Documentation')

diff --git a/Documentation/DocBook/Makefile b/Documentation/DocBook/Makefile
index b6a6a2e0dd3b..9e086067b4ae 100644
--- a/Documentation/DocBook/Makefile
+++ b/Documentation/DocBook/Makefile
@@ -15,7 +15,7 @@ DOCBOOKS := z8530book.xml device-drivers.xml \
 	    80211.xml debugobjects.xml sh.xml regulator.xml \
 	    alsa-driver-api.xml writing-an-alsa-driver.xml \
 	    tracepoint.xml drm.xml media_api.xml w1.xml \
-	    writing_musb_glue_layer.xml crypto-API.xml
+	    writing_musb_glue_layer.xml crypto-API.xml iio.xml
 
 include Documentation/DocBook/media/Makefile
 
diff --git a/Documentation/DocBook/iio.tmpl b/Documentation/DocBook/iio.tmpl
new file mode 100644
index 000000000000..06bb53de5a47
--- /dev/null
+++ b/Documentation/DocBook/iio.tmpl
@@ -0,0 +1,697 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
+	"http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []>
+
+<book id="iioid">
+  <bookinfo>
+    <title>Industrial I/O driver developer's guide </title>
+
+    <authorgroup>
+      <author>
+        <firstname>Daniel</firstname>
+        <surname>Baluta</surname>
+        <affiliation>
+          <address>
+            <email>daniel.baluta@intel.com</email>
+          </address>
+        </affiliation>
+      </author>
+    </authorgroup>
+
+    <copyright>
+      <year>2015</year>
+      <holder>Intel Corporation</holder>
+    </copyright>
+
+    <legalnotice>
+      <para>
+        This documentation is free software; you can redistribute
+        it and/or modify it under the terms of the GNU General Public
+        License version 2.
+      </para>
+    </legalnotice>
+  </bookinfo>
+
+  <toc></toc>
+
+  <chapter id="intro">
+    <title>Introduction</title>
+    <para>
+      The main purpose of the Industrial I/O subsystem (IIO) is to provide
+      support for devices that in some sense perform either analog-to-digital
+      conversion (ADC) or digital-to-analog conversion (DAC) or both. The aim
+      is to fill the gap between the somewhat similar hwmon and input
+      subsystems.
+      Hwmon is directed at low sample rate sensors used to monitor and
+      control the system itself, like fan speed control or temperature
+      measurement. Input is, as its name suggests, focused on human interaction
+      input devices (keyboard, mouse, touchscreen). In some cases there is
+      considerable overlap between these and IIO.
+  </para>
+  <para>
+    Devices that fall into this category include:
+    <itemizedlist>
+      <listitem>
+        analog to digital converters (ADCs)
+      </listitem>
+      <listitem>
+        accelerometers
+      </listitem>
+      <listitem>
+        capacitance to digital converters (CDCs)
+      </listitem>
+      <listitem>
+        digital to analog converters (DACs)
+      </listitem>
+      <listitem>
+        gyroscopes
+      </listitem>
+      <listitem>
+        inertial measurement units (IMUs)
+      </listitem>
+      <listitem>
+        color and light sensors
+      </listitem>
+      <listitem>
+        magnetometers
+      </listitem>
+      <listitem>
+        pressure sensors
+      </listitem>
+      <listitem>
+        proximity sensors
+      </listitem>
+      <listitem>
+        temperature sensors
+      </listitem>
+    </itemizedlist>
+    Usually these sensors are connected via SPI or I2C. A common use case of the
+    sensors devices is to have combined functionality (e.g. light plus proximity
+    sensor).
+  </para>
+  </chapter>
+  <chapter id='iiosubsys'>
+    <title>Industrial I/O core</title>
+    <para>
+      The Industrial I/O core offers:
+      <itemizedlist>
+        <listitem>
+         a unified framework for writing drivers for many different types of
+         embedded sensors.
+        </listitem>
+        <listitem>
+         a standard interface to user space applications manipulating sensors.
+        </listitem>
+      </itemizedlist>
+      The implementation can be found under <filename>
+      drivers/iio/industrialio-*</filename>
+  </para>
+  <sect1 id="iiodevice">
+    <title> Industrial I/O devices </title>
+
+!Finclude/linux/iio/iio.h iio_dev
+!Fdrivers/iio/industrialio-core.c iio_device_alloc
+!Fdrivers/iio/industrialio-core.c iio_device_free
+!Fdrivers/iio/industrialio-core.c iio_device_register
+!Fdrivers/iio/industrialio-core.c iio_device_unregister
+
+    <para>
+      An IIO device usually corresponds to a single hardware sensor and it
+      provides all the information needed by a driver handling a device.
+      Let's first have a look at the functionality embedded in an IIO
+      device then we will show how a device driver makes use of an IIO
+      device.
+    </para>
+    <para>
+        There are two ways for a user space application to interact
+        with an IIO driver.
+      <itemizedlist>
+        <listitem>
+          <filename>/sys/bus/iio/iio:deviceX/</filename>, this
+          represents a hardware sensor and groups together the data
+          channels of the same chip.
+        </listitem>
+        <listitem>
+          <filename>/dev/iio:deviceX</filename>, character device node
+          interface used for buffered data transfer and for events information
+          retrieval.
+        </listitem>
+      </itemizedlist>
+    </para>
+    A typical IIO driver will register itself as an I2C or SPI driver and will
+    create two routines, <function> probe </function> and <function> remove
+    </function>. At <function>probe</function>:
+    <itemizedlist>
+    <listitem>call <function>iio_device_alloc</function>, which allocates memory
+      for an IIO device.
+    </listitem>
+    <listitem> initialize IIO device fields with driver specific information
+              (e.g. device name, device channels).
+    </listitem>
+    <listitem>call <function> iio_device_register</function>, this registers the
+      device with the IIO core. After this call the device is ready to accept
+      requests from user space applications.
+    </listitem>
+    </itemizedlist>
+      At <function>remove</function>, we free the resources allocated in
+      <function>probe</function> in reverse order:
+    <itemizedlist>
+    <listitem><function>iio_device_unregister</function>, unregister the device
+      from the IIO core.
+    </listitem>
+    <listitem><function>iio_device_free</function>, free the memory allocated
+      for the IIO device.
+    </listitem>
+    </itemizedlist>
+
+    <sect2 id="iioattr"> <title> IIO device sysfs interface </title>
+      <para>
+        Attributes are sysfs files used to expose chip info and also allowing
+        applications to set various configuration parameters. For device
+        with index X, attributes can be found under
+        <filename>/sys/bus/iio/iio:deviceX/ </filename> directory.
+        Common attributes are:
+        <itemizedlist>
+          <listitem><filename>name</filename>, description of the physical
+            chip.
+          </listitem>
+          <listitem><filename>dev</filename>, shows the major:minor pair
+            associated with <filename>/dev/iio:deviceX</filename> node.
+          </listitem>
+          <listitem><filename>sampling_frequency_available</filename>,
+            available discrete set of sampling frequency values for
+            device.
+          </listitem>
+      </itemizedlist>
+      Available standard attributes for IIO devices are described in the
+      <filename>Documentation/ABI/testing/sysfs-bus-iio </filename> file
+      in the Linux kernel sources.
+      </para>
+    </sect2>
+    <sect2 id="iiochannel"> <title> IIO device channels </title>
+!Finclude/linux/iio/iio.h iio_chan_spec structure.
+      <para>
+        An IIO device channel is a representation of a data channel. An
+        IIO device can have one or multiple channels. For example:
+        <itemizedlist>
+          <listitem>
+          a thermometer sensor has one channel representing the
+          temperature measurement.
+          </listitem>
+          <listitem>
+          a light sensor with two channels indicating the measurements in
+          the visible and infrared spectrum.
+          </listitem>
+          <listitem>
+          an accelerometer can have up to 3 channels representing
+          acceleration on X, Y and Z axes.
+          </listitem>
+        </itemizedlist>
+      An IIO channel is described by the <type> struct iio_chan_spec
+      </type>. A thermometer driver for the temperature sensor in the
+      example above would have to describe its channel as follows:
+      <programlisting>
+      static const struct iio_chan_spec temp_channel[] = {
+          {
+              .type = IIO_TEMP,
+              .info_mask_separate = BIT(IIO_CHAN_INFO_PROCESSED),
+          },
+      };
+
+      </programlisting>
+      Channel sysfs attributes exposed to userspace are specified in
+      the form of <emphasis>bitmasks</emphasis>. Depending on their
+      shared info, attributes can be set in one of the following masks:
+      <itemizedlist>
+      <listitem><emphasis>info_mask_separate</emphasis>, attributes will
+        be specific to this channel</listitem>
+      <listitem><emphasis>info_mask_shared_by_type</emphasis>,
+        attributes are shared by all channels of the same type</listitem>
+      <listitem><emphasis>info_mask_shared_by_dir</emphasis>, attributes
+        are shared by all channels of the same direction </listitem>
+      <listitem><emphasis>info_mask_shared_by_all</emphasis>,
+        attributes are shared by all channels</listitem>
+      </itemizedlist>
+      When there are multiple data channels per channel type we have two
+      ways to distinguish between them:
+      <itemizedlist>
+      <listitem> set <emphasis> .modified</emphasis> field of <type>
+        iio_chan_spec</type> to 1. Modifiers are specified using
+        <emphasis>.channel2</emphasis> field of the same
+        <type>iio_chan_spec</type> structure and are used to indicate a
+        physically unique characteristic of the channel such as its direction
+        or spectral response. For example, a light sensor can have two channels,
+        one for infrared light and one for both infrared and visible light.
+      </listitem>
+      <listitem> set <emphasis>.indexed </emphasis> field of
+        <type>iio_chan_spec</type> to 1. In this case the channel is
+        simply another instance with an index specified by the
+        <emphasis>.channel</emphasis> field.
+      </listitem>
+      </itemizedlist>
+      Here is how we can make use of the channel's modifiers:
+      <programlisting>
+      static const struct iio_chan_spec light_channels[] = {
+          {
+              .type = IIO_INTENSITY,
+              .modified = 1,
+              .channel2 = IIO_MOD_LIGHT_IR,
+              .info_mask_separate = BIT(IIO_CHAN_INFO_RAW),
+              .info_mask_shared = BIT(IIO_CHAN_INFO_SAMP_FREQ),
+          },
+          {
+              .type = IIO_INTENSITY,
+              .modified = 1,
+              .channel2 = IIO_MOD_LIGHT_BOTH,
+              .info_mask_separate = BIT(IIO_CHAN_INFO_RAW),
+              .info_mask_shared = BIT(IIO_CHAN_INFO_SAMP_FREQ),
+          },
+          {
+              .type = IIO_LIGHT,
+              .info_mask_separate = BIT(IIO_CHAN_INFO_PROCESSED),
+              .info_mask_shared = BIT(IIO_CHAN_INFO_SAMP_FREQ),
+          },
+
+      }
+      </programlisting>
+      This channel's definition will generate two separate sysfs files
+      for raw data retrieval:
+      <itemizedlist>
+      <listitem>
+      <filename>/sys/bus/iio/iio:deviceX/in_intensity_ir_raw</filename>
+      </listitem>
+      <listitem>
+      <filename>/sys/bus/iio/iio:deviceX/in_intensity_both_raw</filename>
+      </listitem>
+      </itemizedlist>
+      one file for processed data:
+      <itemizedlist>
+      <listitem>
+      <filename>/sys/bus/iio/iio:deviceX/in_illuminance_input
+      </filename>
+      </listitem>
+      </itemizedlist>
+      and one shared sysfs file for sampling frequency:
+      <itemizedlist>
+      <listitem>
+      <filename>/sys/bus/iio/iio:deviceX/sampling_frequency.
+      </filename>
+      </listitem>
+      </itemizedlist>
+      </para>
+      <para>
+      Here is how we can make use of the channel's indexing:
+      <programlisting>
+      static const struct iio_chan_spec light_channels[] = {
+          {
+              .type = IIO_VOLTAGE,
+              .indexed = 1,
+              .channel = 0,
+              .info_mask_separate = BIT(IIO_CHAN_INFO_RAW),
+          },
+          {
+              .type = IIO_VOLTAGE,
+              .indexed = 1,
+              .channel = 1,
+              .info_mask_separate = BIT(IIO_CHAN_INFO_RAW),
+          },
+      }
+      </programlisting>
+      This will generate two separate attributes files for raw data
+      retrieval:
+      <itemizedlist>
+      <listitem>
+        <filename>/sys/bus/iio/devices/iio:deviceX/in_voltage0_raw</filename>,
+          representing voltage measurement for channel 0.
+      </listitem>
+      <listitem>
+        <filename>/sys/bus/iio/devices/iio:deviceX/in_voltage1_raw</filename>,
+          representing voltage measurement for channel 1.
+      </listitem>
+      </itemizedlist>
+      </para>
+    </sect2>
+  </sect1>
+
+  <sect1 id="iiobuffer"> <title> Industrial I/O buffers </title>
+!Finclude/linux/iio/buffer.h iio_buffer
+!Edrivers/iio/industrialio-buffer.c
+
+    <para>
+    The Industrial I/O core offers a way for continuous data capture
+    based on a trigger source. Multiple data channels can be read at once
+    from <filename>/dev/iio:deviceX</filename> character device node,
+    thus reducing the CPU load.
+    </para>
+
+    <sect2 id="iiobuffersysfs">
+    <title>IIO buffer sysfs interface </title>
+    <para>
+      An IIO buffer has an associated attributes directory under <filename>
+      /sys/bus/iio/iio:deviceX/buffer/</filename>. Here are the existing
+      attributes:
+      <itemizedlist>
+      <listitem>
+      <emphasis>length</emphasis>, the total number of data samples
+      (capacity) that can be stored by the buffer.
+      </listitem>
+      <listitem>
+        <emphasis>enable</emphasis>, activate buffer capture.
+      </listitem>
+      </itemizedlist>
+
+    </para>
+    </sect2>
+    <sect2 id="iiobuffersetup"> <title> IIO buffer setup </title>
+      <para>The meta information associated with a channel reading
+        placed in a buffer is called a <emphasis> scan element </emphasis>.
+        The important bits configuring scan elements are exposed to
+        userspace applications via the <filename>
+        /sys/bus/iio/iio:deviceX/scan_elements/</filename> directory. This
+        file contains attributes of the following form:
+      <itemizedlist>
+      <listitem><emphasis>enable</emphasis>, used for enabling a channel.
+        If and only if its attribute is non zero, then a triggered capture
+        will contain data samples for this channel.
+      </listitem>
+      <listitem><emphasis>type</emphasis>, description of the scan element
+        data storage within the buffer and hence the form in which it is
+        read from user space. Format is <emphasis>
+        [be|le]:[s|u]bits/storagebitsXrepeat[>>shift] </emphasis>.
+        <itemizedlist>
+        <listitem> <emphasis>be</emphasis> or <emphasis>le</emphasis>, specifies
+          big or little endian.
+        </listitem>
+        <listitem>
+        <emphasis>s </emphasis>or <emphasis>u</emphasis>, specifies if
+          signed (2's complement) or unsigned.
+        </listitem>
+        <listitem><emphasis>bits</emphasis>, is the number of valid data
+          bits.
+        </listitem>
+        <listitem><emphasis>storagebits</emphasis>, is the number of bits
+          (after padding) that it occupies in the buffer.
+        </listitem>
+        <listitem>
+        <emphasis>shift</emphasis>, if specified, is the shift that needs
+          to be applied prior to masking out unused bits.
+        </listitem>
+        <listitem>
+        <emphasis>repeat</emphasis>, specifies the number of bits/storagebits
+        repetitions. When the repeat element is 0 or 1, then the repeat
+        value is omitted.
+        </listitem>
+        </itemizedlist>
+      </listitem>
+      </itemizedlist>
+      For example, a driver for a 3-axis accelerometer with 12 bit
+      resolution where data is stored in two 8-bits registers as
+      follows:
+      <programlisting>
+        7   6   5   4   3   2   1   0
+      +---+---+---+---+---+---+---+---+
+      |D3 |D2 |D1 |D0 | X | X | X | X | (LOW byte, address 0x06)
+      +---+---+---+---+---+---+---+---+
+
+        7   6   5   4   3   2   1   0
+      +---+---+---+---+---+---+---+---+
+      |D11|D10|D9 |D8 |D7 |D6 |D5 |D4 | (HIGH byte, address 0x07)
+      +---+---+---+---+---+---+---+---+
+      </programlisting>
+
+      will have the following scan element type for each axis:
+      <programlisting>
+      $ cat /sys/bus/iio/devices/iio:device0/scan_elements/in_accel_y_type
+      le:s12/16>>4
+      </programlisting>
+      A user space application will interpret data samples read from the
+      buffer as two byte little endian signed data, that needs a 4 bits
+      right shift before masking out the 12 valid bits of data.
+    </para>
+    <para>
+      For implementing buffer support a driver should initialize the following
+      fields in <type>iio_chan_spec</type> definition:
+      <programlisting>
+          struct iio_chan_spec {
+              /* other members */
+              int scan_index
+              struct {
+                  char sign;
+                  u8 realbits;
+                  u8 storagebits;
+                  u8 shift;
+                  u8 repeat;
+                  enum iio_endian endianness;
+              } scan_type;
+          };
+      </programlisting>
+      The driver implementing the accelerometer described above will
+      have the following channel definition:
+      <programlisting>
+      struct struct iio_chan_spec accel_channels[] = {
+          {
+            .type = IIO_ACCEL,
+            .modified = 1,
+            .channel2 = IIO_MOD_X,
+            /* other stuff here */
+            .scan_index = 0,
+            .scan_type = {
+              .sign = 's',
+              .realbits = 12,
+              .storgebits = 16,
+              .shift = 4,
+              .endianness = IIO_LE,
+            },
+        }
+        /* similar for Y (with channel2 = IIO_MOD_Y, scan_index = 1)
+         * and Z (with channel2 = IIO_MOD_Z, scan_index = 2) axis
+         */
+    }
+    </programlisting>
+    </para>
+    <para>
+    Here <emphasis> scan_index </emphasis> defines the order in which
+    the enabled channels are placed inside the buffer. Channels with a lower
+    scan_index will be placed before channels with a higher index. Each
+    channel needs to have a unique scan_index.
+    </para>
+    <para>
+    Setting scan_index to -1 can be used to indicate that the specific
+    channel does not support buffered capture. In this case no entries will
+    be created for the channel in the scan_elements directory.
+    </para>
+    </sect2>
+  </sect1>
+
+  <sect1 id="iiotrigger"> <title> Industrial I/O triggers  </title>
+!Finclude/linux/iio/trigger.h iio_trigger
+!Edrivers/iio/industrialio-trigger.c
+    <para>
+      In many situations it is useful for a driver to be able to
+      capture data based on some external event (trigger) as opposed
+      to periodically polling for data. An IIO trigger can be provided
+      by a device driver that also has an IIO device based on hardware
+      generated events (e.g. data ready or threshold exceeded) or
+      provided by a separate driver from an independent interrupt
+      source (e.g. GPIO line connected to some external system, timer
+      interrupt or user space writing a specific file in sysfs). A
+      trigger may initiate data capture for a number of sensors and
+      also it may be completely unrelated to the sensor itself.
+    </para>
+
+    <sect2 id="iiotrigsysfs"> <title> IIO trigger sysfs interface </title>
+      There are two locations in sysfs related to triggers:
+      <itemizedlist>
+        <listitem><filename>/sys/bus/iio/devices/triggerY</filename>,
+          this file is created once an IIO trigger is registered with
+          the IIO core and corresponds to trigger with index Y. Because
+          triggers can be very different depending on type there are few
+          standard attributes that we can describe here:
+          <itemizedlist>
+            <listitem>
+              <emphasis>name</emphasis>, trigger name that can be later
+                used for association with a device.
+            </listitem>
+            <listitem>
+            <emphasis>sampling_frequency</emphasis>, some timer based
+              triggers use this attribute to specify the frequency for
+              trigger calls.
+            </listitem>
+          </itemizedlist>
+        </listitem>
+        <listitem>
+          <filename>/sys/bus/iio/devices/iio:deviceX/trigger/</filename>, this
+          directory is created once the device supports a triggered
+          buffer. We can associate a trigger with our device by writing
+          the trigger's name in the <filename>current_trigger</filename> file.
+        </listitem>
+      </itemizedlist>
+    </sect2>
+
+    <sect2 id="iiotrigattr"> <title> IIO trigger setup</title>
+
+    <para>
+      Let's see a simple example of how to setup a trigger to be used
+      by a driver.
+
+      <programlisting>
+      struct iio_trigger_ops trigger_ops = {
+          .set_trigger_state = sample_trigger_state,
+          .validate_device = sample_validate_device,
+      }
+
+      struct iio_trigger *trig;
+
+      /* first, allocate memory for our trigger */
+      trig = iio_trigger_alloc(dev, "trig-%s-%d", name, idx);
+
+      /* setup trigger operations field */
+      trig->ops = &amp;trigger_ops;
+
+      /* now register the trigger with the IIO core */
+      iio_trigger_register(trig);
+      </programlisting>
+    </para>
+    </sect2>
+
+    <sect2 id="iiotrigsetup"> <title> IIO trigger ops</title>
+!Finclude/linux/iio/trigger.h iio_trigger_ops
+     <para>
+        Notice that a trigger has a set of operations attached:
+        <itemizedlist>
+        <listitem>
+          <function>set_trigger_state</function>, switch the trigger on/off
+          on demand.
+        </listitem>
+        <listitem>
+          <function>validate_device</function>, function to validate the
+          device when the current trigger gets changed.
+        </listitem>
+        </itemizedlist>
+      </para>
+    </sect2>
+  </sect1>
+  <sect1 id="iiotriggered_buffer">
+    <title> Industrial I/O triggered buffers </title>
+    <para>
+    Now that we know what buffers and triggers are let's see how they
+    work together.
+    </para>
+    <sect2 id="iiotrigbufsetup"> <title> IIO triggered buffer setup</title>
+!Edrivers/iio/industrialio-triggered-buffer.c
+!Finclude/linux/iio/iio.h iio_buffer_setup_ops
+
+
+    <para>
+    A typical triggered buffer setup looks like this:
+    <programlisting>
+    const struct iio_buffer_setup_ops sensor_buffer_setup_ops = {
+      .preenable    = sensor_buffer_preenable,
+      .postenable   = sensor_buffer_postenable,
+      .postdisable  = sensor_buffer_postdisable,
+      .predisable   = sensor_buffer_predisable,
+    };
+
+    irqreturn_t sensor_iio_pollfunc(int irq, void *p)
+    {
+        pf->timestamp = iio_get_time_ns();
+        return IRQ_WAKE_THREAD;
+    }
+
+    irqreturn_t sensor_trigger_handler(int irq, void *p)
+    {
+        u16 buf[8];
+        int i = 0;
+
+        /* read data for each active channel */
+        for_each_set_bit(bit, active_scan_mask, masklength)
+            buf[i++] = sensor_get_data(bit)
+
+        iio_push_to_buffers_with_timestamp(indio_dev, buf, timestamp);
+
+        iio_trigger_notify_done(trigger);
+        return IRQ_HANDLED;
+    }
+
+    /* setup triggered buffer, usually in probe function */
+    iio_triggered_buffer_setup(indio_dev, sensor_iio_polfunc,
+                               sensor_trigger_handler,
+                               sensor_buffer_setup_ops);
+    </programlisting>
+    </para>
+    The important things to notice here are:
+    <itemizedlist>
+    <listitem><function> iio_buffer_setup_ops</function>, the buffer setup
+    functions to be called at predefined points in the buffer configuration
+    sequence (e.g. before enable, after disable). If not specified, the
+    IIO core uses the default <type>iio_triggered_buffer_setup_ops</type>.
+    </listitem>
+    <listitem><function>sensor_iio_pollfunc</function>, the function that
+    will be used as top half of poll function. It should do as little
+    processing as possible, because it runs in interrupt context. The most
+    common operation is recording of the current timestamp and for this reason
+    one can use the IIO core defined <function>iio_pollfunc_store_time
+    </function> function.
+    </listitem>
+    <listitem><function>sensor_trigger_handler</function>, the function that
+    will be used as bottom half of the poll function. This runs in the
+    context of a kernel thread and all the processing takes place here.
+    It usually reads data from the device and stores it in the internal
+    buffer together with the timestamp recorded in the top half.
+    </listitem>
+    </itemizedlist>
+    </sect2>
+  </sect1>
+  </chapter>
+  <chapter id='iioresources'>
+    <title> Resources </title>
+      IIO core may change during time so the best documentation to read is the
+      source code. There are several locations where you should look:
+      <itemizedlist>
+        <listitem>
+          <filename>drivers/iio/</filename>, contains the IIO core plus
+          and directories for each sensor type (e.g. accel, magnetometer,
+          etc.)
+        </listitem>
+        <listitem>
+          <filename>include/linux/iio/</filename>, contains the header
+          files, nice to read for the internal kernel interfaces.
+        </listitem>
+        <listitem>
+        <filename>include/uapi/linux/iio/</filename>, contains files to be
+          used by user space applications.
+        </listitem>
+        <listitem>
+         <filename>tools/iio/</filename>, contains tools for rapidly
+          testing buffers, events and device creation.
+        </listitem>
+        <listitem>
+          <filename>drivers/staging/iio/</filename>, contains code for some
+          drivers or experimental features that are not yet mature enough
+          to be moved out.
+        </listitem>
+      </itemizedlist>
+    <para>
+    Besides the code, there are some good online documentation sources:
+    <itemizedlist>
+    <listitem>
+      <ulink url="http://marc.info/?l=linux-iio"> Industrial I/O mailing
+      list </ulink>
+    </listitem>
+    <listitem>
+      <ulink url="http://wiki.analog.com/software/linux/docs/iio/iio">
+      Analog Device IIO wiki page </ulink>
+    </listitem>
+    <listitem>
+      <ulink url="https://fosdem.org/2015/schedule/event/iiosdr/">
+      Using the Linux IIO framework for SDR, Lars-Peter Clausen's
+      presentation at FOSDEM </ulink>
+    </listitem>
+    </itemizedlist>
+    </para>
+  </chapter>
+</book>
+
+<!--
+vim: softtabstop=2:shiftwidth=2:expandtab:textwidth=72
+-->
-- 
cgit v1.2.3


From 655665d67c8465501a46aa8ffde4e6e29201ba99 Mon Sep 17 00:00:00 2001
From: Cristina Opriceana <cristina.opriceana@gmail.com>
Date: Thu, 6 Aug 2015 14:17:53 +0300
Subject: iio: Documentation: Add trigger name attribute ABI documentation

This patch adds an entry in ABI Documentation for the name attribute
issued when a trigger is created.

Signed-off-by: Cristina Opriceana <cristina.opriceana@gmail.com>
Acked-by: Daniel Baluta <daniel.baluta@intel.com>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
---
 Documentation/ABI/testing/sysfs-bus-iio-trigger-sysfs | 9 +++++++++
 1 file changed, 9 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/ABI/testing/sysfs-bus-iio-trigger-sysfs b/Documentation/ABI/testing/sysfs-bus-iio-trigger-sysfs
index 5235e6c749ab..bbb039237a25 100644
--- a/Documentation/ABI/testing/sysfs-bus-iio-trigger-sysfs
+++ b/Documentation/ABI/testing/sysfs-bus-iio-trigger-sysfs
@@ -9,3 +9,12 @@ Description:
 		automated testing or in situations, where other trigger methods
 		are not applicable. For example no RTC or spare GPIOs.
 		X is the IIO index of the trigger.
+
+What:		/sys/bus/iio/devices/triggerX/name
+KernelVersion:	2.6.39
+Contact:	linux-iio@vger.kernel.org
+Description:
+		The name attribute holds a description string for the current
+		trigger. In order to associate the trigger with an IIO device
+		one should write this name string to
+		/sys/bus/iio/devices/iio:deviceY/trigger/current_trigger.
-- 
cgit v1.2.3


From bdb25b0af8bd225ddf82083d767fada56fb2c1ae Mon Sep 17 00:00:00 2001
From: Vladimir Barinov <vladimir.barinov@cogentembedded.com>
Date: Wed, 29 Jul 2015 15:57:41 +0300
Subject: iio: Fix typos in ABI documentation

Fix typos in ABI documentation

Signed-off-by: Vladimir Barinov <vladimir.barinov@cogentembedded.com>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
---
 Documentation/ABI/testing/sysfs-bus-iio | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/ABI/testing/sysfs-bus-iio b/Documentation/ABI/testing/sysfs-bus-iio
index 400d23441a61..abe228dc0dac 100644
--- a/Documentation/ABI/testing/sysfs-bus-iio
+++ b/Documentation/ABI/testing/sysfs-bus-iio
@@ -493,7 +493,7 @@ Contact:	linux-iio@vger.kernel.org
 Description:
 		Specifies the output powerdown mode.
 		DAC output stage is disconnected from the amplifier and
-		1kohm_to_gnd: connected	to ground via an 1kOhm resistor,
+		1kohm_to_gnd: connected to ground via an 1kOhm resistor,
 		6kohm_to_gnd: connected to ground via a 6kOhm resistor,
 		20kohm_to_gnd: connected to ground via a 20kOhm resistor,
 		100kohm_to_gnd: connected to ground via an 100kOhm resistor,
@@ -503,9 +503,9 @@ Description:
 		outX_powerdown_mode_available. If Y is not present the
 		mode is shared across all outputs.
 
-What:		/sys/.../iio:deviceX/out_votlageY_powerdown_mode_available
+What:		/sys/.../iio:deviceX/out_voltageY_powerdown_mode_available
 What:		/sys/.../iio:deviceX/out_voltage_powerdown_mode_available
-What:		/sys/.../iio:deviceX/out_altvotlageY_powerdown_mode_available
+What:		/sys/.../iio:deviceX/out_altvoltageY_powerdown_mode_available
 What:		/sys/.../iio:deviceX/out_altvoltage_powerdown_mode_available
 KernelVersion:	2.6.38
 Contact:	linux-iio@vger.kernel.org
-- 
cgit v1.2.3


From 68a403823600fc9d8277f930981d3a013353b2fe Mon Sep 17 00:00:00 2001
From: Guenter Roeck <linux@roeck-us.net>
Date: Tue, 17 Mar 2015 13:19:51 -0700
Subject: hwmon: (adm1275) Add support for ADM1293 and ADM1294

ADM1293 and ADM1294 are mostly compatible with other chips of the same
series, but have more configuration options. There are also some
differences in register details.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
---
 Documentation/hwmon/adm1275   |  40 +++++++-----
 drivers/hwmon/pmbus/Kconfig   |   4 +-
 drivers/hwmon/pmbus/adm1275.c | 141 +++++++++++++++++++++++++++++++++++++++---
 3 files changed, 158 insertions(+), 27 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/hwmon/adm1275 b/Documentation/hwmon/adm1275
index 15b4a20d5062..d697229e3c18 100644
--- a/Documentation/hwmon/adm1275
+++ b/Documentation/hwmon/adm1275
@@ -14,6 +14,10 @@ Supported chips:
     Prefix: 'adm1276'
     Addresses scanned: -
     Datasheet: www.analog.com/static/imported-files/data_sheets/ADM1276.pdf
+  * Analog Devices ADM1293/ADM1294
+    Prefix: 'adm1293', 'adm1294'
+    Addresses scanned: -
+    Datasheet: http://www.analog.com/media/en/technical-documentation/data-sheets/ADM1293_1294.pdf
 
 Author: Guenter Roeck <linux@roeck-us.net>
 
@@ -22,12 +26,12 @@ Description
 -----------
 
 This driver supports hardware montoring for Analog Devices ADM1075, ADM1275,
-and ADM1276 Hot-Swap Controller and Digital Power Monitor.
+ADM1276, ADM1293, and ADM1294 Hot-Swap Controller and Digital Power Monitors.
 
-ADM1075, ADM1275, and ADM1276 are hot-swap controllers that allow a circuit
-board to be removed from or inserted into a live backplane. They also feature
-current and voltage readback via an integrated 12-bit analog-to-digital
-converter (ADC), accessed using a PMBus interface.
+ADM1075, ADM1275, ADM1276, ADM1293, and ADM1294 are hot-swap controllers that
+allow a circuit board to be removed from or inserted into a live backplane.
+They also feature current and voltage readback via an integrated 12
+bit analog-to-digital converter (ADC), accessed using a PMBus interface.
 
 The driver is a client driver to the core PMBus driver. Please see
 Documentation/hwmon/pmbus for details on PMBus client drivers.
@@ -58,16 +62,16 @@ Sysfs entries
 The following attributes are supported. Limits are read-write, history reset
 attributes are write-only, all other attributes are read-only.
 
-in1_label		"vin1" or "vout1" depending on chip variant and
-			configuration. On ADM1075, vout1 reports the voltage on
-			the VAUX pin.
-in1_input		Measured voltage.
-in1_min			Minimum Voltage.
-in1_max			Maximum voltage.
-in1_min_alarm		Voltage low alarm.
-in1_max_alarm		Voltage high alarm.
-in1_highest		Historical maximum voltage.
-in1_reset_history	Write any value to reset history.
+inX_label		"vin1" or "vout1" depending on chip variant and
+			configuration. On ADM1075, ADM1293, and ADM1294,
+			vout1 reports the voltage on the VAUX pin.
+inX_input		Measured voltage.
+inX_min			Minimum Voltage.
+inX_max			Maximum voltage.
+inX_min_alarm		Voltage low alarm.
+inX_max_alarm		Voltage high alarm.
+inX_highest		Historical maximum voltage.
+inX_reset_history	Write any value to reset history.
 
 curr1_label		"iout1"
 curr1_input		Measured current.
@@ -86,7 +90,9 @@ curr1_reset_history	Write any value to reset history.
 
 power1_label		"pin1"
 power1_input		Input power.
+power1_input_lowest	Lowest observed input power. ADM1293 and ADM1294 only.
+power1_input_highest	Highest observed input power.
 power1_reset_history	Write any value to reset history.
 
-			Power attributes are supported on ADM1075 and ADM1276
-			only.
+			Power attributes are supported on ADM1075, ADM1276,
+			ADM1293, and ADM1294.
diff --git a/drivers/hwmon/pmbus/Kconfig b/drivers/hwmon/pmbus/Kconfig
index 9f7dbd189c97..67901cb08f77 100644
--- a/drivers/hwmon/pmbus/Kconfig
+++ b/drivers/hwmon/pmbus/Kconfig
@@ -30,8 +30,8 @@ config SENSORS_ADM1275
 	default n
 	help
 	  If you say yes here you get hardware monitoring support for Analog
-	  Devices ADM1075, ADM1275, and ADM1276 Hot-Swap Controller and Digital
-	  Power Monitors.
+	  Devices ADM1075, ADM1275, ADM1276, ADM1293, and ADM1294 Hot-Swap
+	  Controller and Digital Power Monitors.
 
 	  This driver can also be built as a module. If so, the module will
 	  be called adm1275.
diff --git a/drivers/hwmon/pmbus/adm1275.c b/drivers/hwmon/pmbus/adm1275.c
index 1c19a2ee415c..188af4c89f40 100644
--- a/drivers/hwmon/pmbus/adm1275.c
+++ b/drivers/hwmon/pmbus/adm1275.c
@@ -24,7 +24,11 @@
 #include <linux/bitops.h>
 #include "pmbus.h"
 
-enum chips { adm1075, adm1275, adm1276 };
+enum chips { adm1075, adm1275, adm1276, adm1293, adm1294 };
+
+#define ADM1275_MFR_STATUS_IOUT_WARN2	BIT(0)
+#define ADM1293_MFR_STATUS_VAUX_UV_WARN	BIT(5)
+#define ADM1293_MFR_STATUS_VAUX_OV_WARN	BIT(6)
 
 #define ADM1275_PEAK_IOUT		0xd0
 #define ADM1275_PEAK_VIN		0xd1
@@ -37,18 +41,30 @@ enum chips { adm1075, adm1275, adm1276 };
 #define ADM1075_IRANGE_25		BIT(3)
 #define ADM1075_IRANGE_MASK		(BIT(3) | BIT(4))
 
+#define ADM1293_IRANGE_25		0
+#define ADM1293_IRANGE_50		BIT(6)
+#define ADM1293_IRANGE_100		BIT(7)
+#define ADM1293_IRANGE_200		(BIT(6) | BIT(7))
+#define ADM1293_IRANGE_MASK		(BIT(6) | BIT(7))
+
+#define ADM1293_VIN_SEL_012		BIT(2)
+#define ADM1293_VIN_SEL_074		BIT(3)
+#define ADM1293_VIN_SEL_210		(BIT(2) | BIT(3))
+#define ADM1293_VIN_SEL_MASK		(BIT(2) | BIT(3))
+
+#define ADM1293_VAUX_EN			BIT(1)
+
 #define ADM1275_IOUT_WARN2_LIMIT	0xd7
 #define ADM1275_DEVICE_CONFIG		0xd8
 
 #define ADM1275_IOUT_WARN2_SELECT	BIT(4)
 
 #define ADM1276_PEAK_PIN		0xda
-
-#define ADM1275_MFR_STATUS_IOUT_WARN2	BIT(0)
-
 #define ADM1075_READ_VAUX		0xdd
 #define ADM1075_VAUX_OV_WARN_LIMIT	0xde
 #define ADM1075_VAUX_UV_WARN_LIMIT	0xdf
+#define ADM1293_IOUT_MIN		0xe3
+#define ADM1293_PIN_MIN			0xe4
 #define ADM1075_VAUX_STATUS		0xf6
 
 #define ADM1075_VAUX_OV_WARN		BIT(7)
@@ -60,6 +76,9 @@ struct adm1275_data {
 	bool have_uc_fault;
 	bool have_vout;
 	bool have_vaux_status;
+	bool have_mfr_vaux_status;
+	bool have_iout_min;
+	bool have_pin_min;
 	bool have_pin_max;
 	struct pmbus_driver_info info;
 };
@@ -94,6 +113,28 @@ static const struct coefficients adm1276_coefficients[] = {
 	[4] = { 2115, 0, -1 },		/* power, vrange not set */
 };
 
+static const struct coefficients adm1293_coefficients[] = {
+	[0] = { 3333, -1, 0 },		/* voltage, vrange 1.2V */
+	[1] = { 5552, -5, -1 },		/* voltage, vrange 7.4V */
+	[2] = { 19604, -50, -2 },	/* voltage, vrange 21V */
+	[3] = { 8000, -100, -2 },	/* current, irange25 */
+	[4] = { 4000, -100, -2 },	/* current, irange50 */
+	[5] = { 20000, -1000, -3 },	/* current, irange100 */
+	[6] = { 10000, -1000, -3 },	/* current, irange200 */
+	[7] = { 10417, 0, -1 },		/* power, 1.2V, irange25 */
+	[8] = { 5208, 0, -1 },		/* power, 1.2V, irange50 */
+	[9] = { 26042, 0, -2 },		/* power, 1.2V, irange100 */
+	[10] = { 13021, 0, -2 },	/* power, 1.2V, irange200 */
+	[11] = { 17351, 0, -2 },	/* power, 7.4V, irange25 */
+	[12] = { 8676, 0, -2 },		/* power, 7.4V, irange50 */
+	[13] = { 4338, 0, -2 },		/* power, 7.4V, irange100 */
+	[14] = { 21689, 0, -3 },	/* power, 7.4V, irange200 */
+	[15] = { 6126, 0, -2 },		/* power, 21V, irange25 */
+	[16] = { 30631, 0, -3 },	/* power, 21V, irange50 */
+	[17] = { 15316, 0, -3 },	/* power, 21V, irange100 */
+	[18] = { 7658, 0, -3 },		/* power, 21V, irange200 */
+};
+
 static int adm1275_read_word_data(struct i2c_client *client, int page, int reg)
 {
 	const struct pmbus_driver_info *info = pmbus_get_driver_info(client);
@@ -131,6 +172,11 @@ static int adm1275_read_word_data(struct i2c_client *client, int page, int reg)
 			return -ENODATA;
 		ret = pmbus_read_word_data(client, 0, ADM1075_READ_VAUX);
 		break;
+	case PMBUS_VIRT_READ_IOUT_MIN:
+		if (!data->have_iout_min)
+			return -ENXIO;
+		ret = pmbus_read_word_data(client, 0, ADM1293_IOUT_MIN);
+		break;
 	case PMBUS_VIRT_READ_IOUT_MAX:
 		ret = pmbus_read_word_data(client, 0, ADM1275_PEAK_IOUT);
 		break;
@@ -140,6 +186,11 @@ static int adm1275_read_word_data(struct i2c_client *client, int page, int reg)
 	case PMBUS_VIRT_READ_VIN_MAX:
 		ret = pmbus_read_word_data(client, 0, ADM1275_PEAK_VIN);
 		break;
+	case PMBUS_VIRT_READ_PIN_MIN:
+		if (!data->have_pin_min)
+			return -ENXIO;
+		ret = pmbus_read_word_data(client, 0, ADM1293_PIN_MIN);
+		break;
 	case PMBUS_VIRT_READ_PIN_MAX:
 		if (!data->have_pin_max)
 			return -ENXIO;
@@ -163,6 +214,8 @@ static int adm1275_read_word_data(struct i2c_client *client, int page, int reg)
 static int adm1275_write_word_data(struct i2c_client *client, int page, int reg,
 				   u16 word)
 {
+	const struct pmbus_driver_info *info = pmbus_get_driver_info(client);
+	const struct adm1275_data *data = to_adm1275_data(info);
 	int ret;
 
 	if (page)
@@ -176,6 +229,9 @@ static int adm1275_write_word_data(struct i2c_client *client, int page, int reg,
 		break;
 	case PMBUS_VIRT_RESET_IOUT_HISTORY:
 		ret = pmbus_write_word_data(client, 0, ADM1275_PEAK_IOUT, 0);
+		if (!ret && data->have_iout_min)
+			ret = pmbus_write_word_data(client, 0,
+						    ADM1293_IOUT_MIN, 0);
 		break;
 	case PMBUS_VIRT_RESET_VOUT_HISTORY:
 		ret = pmbus_write_word_data(client, 0, ADM1275_PEAK_VOUT, 0);
@@ -185,6 +241,9 @@ static int adm1275_write_word_data(struct i2c_client *client, int page, int reg,
 		break;
 	case PMBUS_VIRT_RESET_PIN_HISTORY:
 		ret = pmbus_write_word_data(client, 0, ADM1276_PEAK_PIN, 0);
+		if (!ret && data->have_pin_min)
+			ret = pmbus_write_word_data(client, 0,
+						    ADM1293_PIN_MIN, 0);
 		break;
 	default:
 		ret = -ENODATA;
@@ -231,6 +290,15 @@ static int adm1275_read_byte_data(struct i2c_client *client, int page, int reg)
 				ret |= PB_VOLTAGE_OV_WARNING;
 			if (mfr_status & ADM1075_VAUX_UV_WARN)
 				ret |= PB_VOLTAGE_UV_WARNING;
+		} else if (data->have_mfr_vaux_status) {
+			mfr_status = pmbus_read_byte_data(client, page,
+						PMBUS_STATUS_MFR_SPECIFIC);
+			if (mfr_status < 0)
+				return mfr_status;
+			if (mfr_status & ADM1293_MFR_STATUS_VAUX_OV_WARN)
+				ret |= PB_VOLTAGE_OV_WARNING;
+			if (mfr_status & ADM1293_MFR_STATUS_VAUX_UV_WARN)
+				ret |= PB_VOLTAGE_UV_WARNING;
 		}
 		break;
 	default:
@@ -244,6 +312,8 @@ static const struct i2c_device_id adm1275_id[] = {
 	{ "adm1075", adm1075 },
 	{ "adm1275", adm1275 },
 	{ "adm1276", adm1276 },
+	{ "adm1293", adm1293 },
+	{ "adm1294", adm1294 },
 	{ }
 };
 MODULE_DEVICE_TABLE(i2c, adm1275_id);
@@ -258,7 +328,7 @@ static int adm1275_probe(struct i2c_client *client,
 	struct adm1275_data *data;
 	const struct i2c_device_id *mid;
 	const struct coefficients *coefficients;
-	int vindex = -1, cindex = -1, pindex = -1;
+	int vindex = -1, voindex = -1, cindex = -1, pindex = -1;
 
 	if (!i2c_check_functionality(client->adapter,
 				     I2C_FUNC_SMBUS_READ_BYTE_DATA
@@ -390,17 +460,72 @@ static int adm1275_probe(struct i2c_client *client,
 			info->func[0] |=
 			  PMBUS_HAVE_VOUT | PMBUS_HAVE_STATUS_VOUT;
 		break;
+	case adm1293:
+	case adm1294:
+		data->have_iout_min = true;
+		data->have_pin_min = true;
+		data->have_pin_max = true;
+		data->have_mfr_vaux_status = true;
+
+		coefficients = adm1293_coefficients;
+
+		voindex = 0;
+		switch (config & ADM1293_VIN_SEL_MASK) {
+		case ADM1293_VIN_SEL_012:	/* 1.2V */
+			vindex = 0;
+			break;
+		case ADM1293_VIN_SEL_074:	/* 7.4V */
+			vindex = 1;
+			break;
+		case ADM1293_VIN_SEL_210:	/* 21V */
+			vindex = 2;
+			break;
+		default:			/* disabled */
+			break;
+		}
+
+		switch (config & ADM1293_IRANGE_MASK) {
+		case ADM1293_IRANGE_25:
+			cindex = 3;
+			break;
+		case ADM1293_IRANGE_50:
+			cindex = 4;
+			break;
+		case ADM1293_IRANGE_100:
+			cindex = 5;
+			break;
+		case ADM1293_IRANGE_200:
+			cindex = 6;
+			break;
+		}
+
+		if (vindex >= 0)
+			pindex = 7 + vindex * 4 + (cindex - 3);
+
+		if (config & ADM1293_VAUX_EN)
+			info->func[0] |=
+				PMBUS_HAVE_VOUT | PMBUS_HAVE_STATUS_VOUT;
+
+		info->func[0] |= PMBUS_HAVE_PIN |
+			PMBUS_HAVE_VIN | PMBUS_HAVE_STATUS_INPUT;
+
+		break;
 	default:
 		dev_err(&client->dev, "Unsupported device\n");
 		return -ENODEV;
 	}
+
+	if (voindex < 0)
+		voindex = vindex;
 	if (vindex >= 0) {
 		info->m[PSC_VOLTAGE_IN] = coefficients[vindex].m;
 		info->b[PSC_VOLTAGE_IN] = coefficients[vindex].b;
 		info->R[PSC_VOLTAGE_IN] = coefficients[vindex].R;
-		info->m[PSC_VOLTAGE_OUT] = coefficients[vindex].m;
-		info->b[PSC_VOLTAGE_OUT] = coefficients[vindex].b;
-		info->R[PSC_VOLTAGE_OUT] = coefficients[vindex].R;
+	}
+	if (voindex >= 0) {
+		info->m[PSC_VOLTAGE_OUT] = coefficients[voindex].m;
+		info->b[PSC_VOLTAGE_OUT] = coefficients[voindex].b;
+		info->R[PSC_VOLTAGE_OUT] = coefficients[voindex].R;
 	}
 	if (cindex >= 0) {
 		info->m[PSC_CURRENT_OUT] = coefficients[cindex].m;
-- 
cgit v1.2.3


From a1dc86ebd2f2eb6652c8be9311c16acccd1708f8 Mon Sep 17 00:00:00 2001
From: Rabin Vincent <rabin@rab.in>
Date: Sun, 19 Jul 2015 00:29:14 +0200
Subject: hwmon: (lm70) add device tree support

Allow the lm70 to be probed from a device tree.

Signed-off-by: Rabin Vincent <rabin@rab.in>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
---
 Documentation/devicetree/bindings/hwmon/lm70.txt | 21 +++++++++++++++
 drivers/hwmon/lm70.c                             | 34 +++++++++++++++++++++++-
 2 files changed, 54 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/devicetree/bindings/hwmon/lm70.txt

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/hwmon/lm70.txt b/Documentation/devicetree/bindings/hwmon/lm70.txt
new file mode 100644
index 000000000000..e7fd921aa4f1
--- /dev/null
+++ b/Documentation/devicetree/bindings/hwmon/lm70.txt
@@ -0,0 +1,21 @@
+* LM70/TMP121/LM71/LM74 thermometer.
+
+Required properties:
+- compatible: one of
+		"ti,lm70"
+		"ti,tmp121"
+		"ti,lm71"
+		"ti,lm74"
+
+See Documentation/devicetree/bindings/spi/spi-bus.txt for more required and
+optional properties.
+
+Example:
+
+spi_master {
+	temperature-sensor@0 {
+		compatible = "ti,lm70";
+		reg = <0>;
+		spi-max-frequency = <1000000>;
+	};
+};
diff --git a/drivers/hwmon/lm70.c b/drivers/hwmon/lm70.c
index 97204dce162d..9296e9daf774 100644
--- a/drivers/hwmon/lm70.c
+++ b/drivers/hwmon/lm70.c
@@ -37,6 +37,7 @@
 #include <linux/mod_devicetable.h>
 #include <linux/spi/spi.h>
 #include <linux/slab.h>
+#include <linux/of_device.h>
 
 
 #define DRVNAME		"lm70"
@@ -130,11 +131,41 @@ ATTRIBUTE_GROUPS(lm70);
 
 /*----------------------------------------------------------------------*/
 
+#ifdef CONFIG_OF
+static const struct of_device_id lm70_of_ids[] = {
+	{
+		.compatible = "ti,lm70",
+		.data = (void *) LM70_CHIP_LM70,
+	},
+	{
+		.compatible = "ti,tmp121",
+		.data = (void *) LM70_CHIP_TMP121,
+	},
+	{
+		.compatible = "ti,lm71",
+		.data = (void *) LM70_CHIP_LM71,
+	},
+	{
+		.compatible = "ti,lm74",
+		.data = (void *) LM70_CHIP_LM74,
+	},
+	{},
+};
+MODULE_DEVICE_TABLE(of, lm70_of_ids);
+#endif
+
 static int lm70_probe(struct spi_device *spi)
 {
-	int chip = spi_get_device_id(spi)->driver_data;
+	const struct of_device_id *match;
 	struct device *hwmon_dev;
 	struct lm70 *p_lm70;
+	int chip;
+
+	match = of_match_device(lm70_of_ids, &spi->dev);
+	if (match)
+		chip = (int)(uintptr_t)match->data;
+	else
+		chip = spi_get_device_id(spi)->driver_data;
 
 	/* signaling is SPI_MODE_0 */
 	if (spi->mode & (SPI_CPOL | SPI_CPHA))
@@ -169,6 +200,7 @@ static struct spi_driver lm70_driver = {
 	.driver = {
 		.name	= "lm70",
 		.owner	= THIS_MODULE,
+		.of_match_table	= of_match_ptr(lm70_of_ids),
 	},
 	.id_table = lm70_ids,
 	.probe	= lm70_probe,
-- 
cgit v1.2.3


From 1c6e8f6ba89c59270db223284d47e3c928c6fbfc Mon Sep 17 00:00:00 2001
From: Constantine Shulyupin <const@MakeLinux.com>
Date: Tue, 28 Jul 2015 01:01:07 +0300
Subject: hwmon: (nct7802) Add auto_point attributes

Introduced REG_PWM, pwm[1..3]_auto_point[1..5]_temp,
pwm[1..3]_auto_point[1..5]_pwm, nct7802_auto_point_attrs,
nct7802_auto_point_group, updated nct7802_regmap_is_volatile

Signed-off-by: Constantine Shulyupin <const@MakeLinux.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
---
 Documentation/hwmon/nct7802 |   3 +-
 drivers/hwmon/nct7802.c     | 130 ++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 127 insertions(+), 6 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/hwmon/nct7802 b/Documentation/hwmon/nct7802
index 2e00f5e344bc..5438deb6be02 100644
--- a/Documentation/hwmon/nct7802
+++ b/Documentation/hwmon/nct7802
@@ -17,8 +17,7 @@ This driver implements support for the Nuvoton NCT7802Y hardware monitoring
 chip. NCT7802Y supports 6 temperature sensors, 5 voltage sensors, and 3 fan
 speed sensors.
 
-The chip also supports intelligent fan speed control. This functionality is
-not currently supported by the driver.
+Smart Fan™ speed control is available via pwmX_auto_point attributes.
 
 Tested Boards and BIOS Versions
 -------------------------------
diff --git a/drivers/hwmon/nct7802.c b/drivers/hwmon/nct7802.c
index f4908bb228b8..3ce33d244cc0 100644
--- a/drivers/hwmon/nct7802.c
+++ b/drivers/hwmon/nct7802.c
@@ -53,6 +53,7 @@ static const u8 REG_VOLTAGE_LIMIT_MSB_SHIFT[2][5] = {
 #define REG_PECI_ENABLE		0x23
 #define REG_FAN_ENABLE		0x24
 #define REG_VMON_ENABLE		0x25
+#define REG_PWM(x)		(0x60 + (x))
 #define REG_SMARTFAN_EN(x)      (0x64 + (x) / 2)
 #define SMARTFAN_EN_SHIFT(x)    ((x) % 2 * 4)
 #define REG_VENDOR_ID		0xfd
@@ -130,6 +131,9 @@ static ssize_t show_pwm(struct device *dev, struct device_attribute *devattr,
 	unsigned int val;
 	int ret;
 
+	if (!attr->index)
+		return sprintf(buf, "255\n");
+
 	ret = regmap_read(data->regmap, attr->index, &val);
 	if (ret < 0)
 		return ret;
@@ -826,9 +830,12 @@ static SENSOR_DEVICE_ATTR(pwm2_mode, S_IRUGO, show_pwm_mode, NULL, 1);
 static SENSOR_DEVICE_ATTR(pwm3_mode, S_IRUGO, show_pwm_mode, NULL, 2);
 
 /* 7.2.91... Fan Control Output Value */
-static SENSOR_DEVICE_ATTR(pwm1, S_IRUGO | S_IWUSR, show_pwm, store_pwm, 0x60);
-static SENSOR_DEVICE_ATTR(pwm2, S_IRUGO | S_IWUSR, show_pwm, store_pwm, 0x61);
-static SENSOR_DEVICE_ATTR(pwm3, S_IRUGO | S_IWUSR, show_pwm, store_pwm, 0x62);
+static SENSOR_DEVICE_ATTR(pwm1, S_IRUGO | S_IWUSR, show_pwm, store_pwm,
+			  REG_PWM(0));
+static SENSOR_DEVICE_ATTR(pwm2, S_IRUGO | S_IWUSR, show_pwm, store_pwm,
+			  REG_PWM(1));
+static SENSOR_DEVICE_ATTR(pwm3, S_IRUGO | S_IWUSR, show_pwm, store_pwm,
+			  REG_PWM(2));
 
 /* 7.2.95... Temperature to Fan mapping Relationships Register */
 static SENSOR_DEVICE_ATTR(pwm1_enable, S_IRUGO | S_IWUSR, show_pwm_enable,
@@ -893,11 +900,125 @@ static struct attribute_group nct7802_pwm_group = {
 	.attrs = nct7802_pwm_attrs,
 };
 
+/* 7.2.115... 0x80-0x83, 0x84 Temperature (X-axis) transition */
+static SENSOR_DEVICE_ATTR_2(pwm1_auto_point1_temp, S_IRUGO | S_IWUSR,
+			    show_temp, store_temp, 0x80, 0);
+static SENSOR_DEVICE_ATTR_2(pwm1_auto_point2_temp, S_IRUGO | S_IWUSR,
+			    show_temp, store_temp, 0x81, 0);
+static SENSOR_DEVICE_ATTR_2(pwm1_auto_point3_temp, S_IRUGO | S_IWUSR,
+			    show_temp, store_temp, 0x82, 0);
+static SENSOR_DEVICE_ATTR_2(pwm1_auto_point4_temp, S_IRUGO | S_IWUSR,
+			    show_temp, store_temp, 0x83, 0);
+static SENSOR_DEVICE_ATTR_2(pwm1_auto_point5_temp, S_IRUGO | S_IWUSR,
+			    show_temp, store_temp, 0x84, 0);
+
+/* 7.2.120... 0x85-0x88 PWM (Y-axis) transition */
+static SENSOR_DEVICE_ATTR(pwm1_auto_point1_pwm, S_IRUGO | S_IWUSR,
+			  show_pwm, store_pwm, 0x85);
+static SENSOR_DEVICE_ATTR(pwm1_auto_point2_pwm, S_IRUGO | S_IWUSR,
+			  show_pwm, store_pwm, 0x86);
+static SENSOR_DEVICE_ATTR(pwm1_auto_point3_pwm, S_IRUGO | S_IWUSR,
+			  show_pwm, store_pwm, 0x87);
+static SENSOR_DEVICE_ATTR(pwm1_auto_point4_pwm, S_IRUGO | S_IWUSR,
+			  show_pwm, store_pwm, 0x88);
+static SENSOR_DEVICE_ATTR(pwm1_auto_point5_pwm, S_IRUGO, show_pwm, NULL, 0);
+
+/* 7.2.124 Table 2 X-axis Transition Point 1 Register */
+static SENSOR_DEVICE_ATTR_2(pwm2_auto_point1_temp, S_IRUGO | S_IWUSR,
+			    show_temp, store_temp, 0x90, 0);
+static SENSOR_DEVICE_ATTR_2(pwm2_auto_point2_temp, S_IRUGO | S_IWUSR,
+			    show_temp, store_temp, 0x91, 0);
+static SENSOR_DEVICE_ATTR_2(pwm2_auto_point3_temp, S_IRUGO | S_IWUSR,
+			    show_temp, store_temp, 0x92, 0);
+static SENSOR_DEVICE_ATTR_2(pwm2_auto_point4_temp, S_IRUGO | S_IWUSR,
+			    show_temp, store_temp, 0x93, 0);
+static SENSOR_DEVICE_ATTR_2(pwm2_auto_point5_temp, S_IRUGO | S_IWUSR,
+			    show_temp, store_temp, 0x94, 0);
+
+/* 7.2.129 Table 2 Y-axis Transition Point 1 Register */
+static SENSOR_DEVICE_ATTR(pwm2_auto_point1_pwm, S_IRUGO | S_IWUSR,
+			  show_pwm, store_pwm, 0x95);
+static SENSOR_DEVICE_ATTR(pwm2_auto_point2_pwm, S_IRUGO | S_IWUSR,
+			  show_pwm, store_pwm, 0x96);
+static SENSOR_DEVICE_ATTR(pwm2_auto_point3_pwm, S_IRUGO | S_IWUSR,
+			  show_pwm, store_pwm, 0x97);
+static SENSOR_DEVICE_ATTR(pwm2_auto_point4_pwm, S_IRUGO | S_IWUSR,
+			  show_pwm, store_pwm, 0x98);
+static SENSOR_DEVICE_ATTR(pwm2_auto_point5_pwm, S_IRUGO, show_pwm, NULL, 0);
+
+/* 7.2.133 Table 3 X-axis Transition Point 1 Register */
+static SENSOR_DEVICE_ATTR_2(pwm3_auto_point1_temp, S_IRUGO | S_IWUSR,
+			    show_temp, store_temp, 0xA0, 0);
+static SENSOR_DEVICE_ATTR_2(pwm3_auto_point2_temp, S_IRUGO | S_IWUSR,
+			    show_temp, store_temp, 0xA1, 0);
+static SENSOR_DEVICE_ATTR_2(pwm3_auto_point3_temp, S_IRUGO | S_IWUSR,
+			    show_temp, store_temp, 0xA2, 0);
+static SENSOR_DEVICE_ATTR_2(pwm3_auto_point4_temp, S_IRUGO | S_IWUSR,
+			    show_temp, store_temp, 0xA3, 0);
+static SENSOR_DEVICE_ATTR_2(pwm3_auto_point5_temp, S_IRUGO | S_IWUSR,
+			    show_temp, store_temp, 0xA4, 0);
+
+/* 7.2.138 Table 3 Y-axis Transition Point 1 Register */
+static SENSOR_DEVICE_ATTR(pwm3_auto_point1_pwm, S_IRUGO | S_IWUSR,
+			  show_pwm, store_pwm, 0xA5);
+static SENSOR_DEVICE_ATTR(pwm3_auto_point2_pwm, S_IRUGO | S_IWUSR,
+			  show_pwm, store_pwm, 0xA6);
+static SENSOR_DEVICE_ATTR(pwm3_auto_point3_pwm, S_IRUGO | S_IWUSR,
+			  show_pwm, store_pwm, 0xA7);
+static SENSOR_DEVICE_ATTR(pwm3_auto_point4_pwm, S_IRUGO | S_IWUSR,
+			  show_pwm, store_pwm, 0xA8);
+static SENSOR_DEVICE_ATTR(pwm3_auto_point5_pwm, S_IRUGO, show_pwm, NULL, 0);
+
+static struct attribute *nct7802_auto_point_attrs[] = {
+	&sensor_dev_attr_pwm1_auto_point1_temp.dev_attr.attr,
+	&sensor_dev_attr_pwm1_auto_point2_temp.dev_attr.attr,
+	&sensor_dev_attr_pwm1_auto_point3_temp.dev_attr.attr,
+	&sensor_dev_attr_pwm1_auto_point4_temp.dev_attr.attr,
+	&sensor_dev_attr_pwm1_auto_point5_temp.dev_attr.attr,
+
+	&sensor_dev_attr_pwm1_auto_point1_pwm.dev_attr.attr,
+	&sensor_dev_attr_pwm1_auto_point2_pwm.dev_attr.attr,
+	&sensor_dev_attr_pwm1_auto_point3_pwm.dev_attr.attr,
+	&sensor_dev_attr_pwm1_auto_point4_pwm.dev_attr.attr,
+	&sensor_dev_attr_pwm1_auto_point5_pwm.dev_attr.attr,
+
+	&sensor_dev_attr_pwm2_auto_point1_temp.dev_attr.attr,
+	&sensor_dev_attr_pwm2_auto_point2_temp.dev_attr.attr,
+	&sensor_dev_attr_pwm2_auto_point3_temp.dev_attr.attr,
+	&sensor_dev_attr_pwm2_auto_point4_temp.dev_attr.attr,
+	&sensor_dev_attr_pwm2_auto_point5_temp.dev_attr.attr,
+
+	&sensor_dev_attr_pwm2_auto_point1_pwm.dev_attr.attr,
+	&sensor_dev_attr_pwm2_auto_point2_pwm.dev_attr.attr,
+	&sensor_dev_attr_pwm2_auto_point3_pwm.dev_attr.attr,
+	&sensor_dev_attr_pwm2_auto_point4_pwm.dev_attr.attr,
+	&sensor_dev_attr_pwm2_auto_point5_pwm.dev_attr.attr,
+
+	&sensor_dev_attr_pwm3_auto_point1_temp.dev_attr.attr,
+	&sensor_dev_attr_pwm3_auto_point2_temp.dev_attr.attr,
+	&sensor_dev_attr_pwm3_auto_point3_temp.dev_attr.attr,
+	&sensor_dev_attr_pwm3_auto_point4_temp.dev_attr.attr,
+	&sensor_dev_attr_pwm3_auto_point5_temp.dev_attr.attr,
+
+	&sensor_dev_attr_pwm3_auto_point1_pwm.dev_attr.attr,
+	&sensor_dev_attr_pwm3_auto_point2_pwm.dev_attr.attr,
+	&sensor_dev_attr_pwm3_auto_point3_pwm.dev_attr.attr,
+	&sensor_dev_attr_pwm3_auto_point4_pwm.dev_attr.attr,
+	&sensor_dev_attr_pwm3_auto_point5_pwm.dev_attr.attr,
+
+	NULL
+};
+
+static struct attribute_group nct7802_auto_point_group = {
+	.attrs = nct7802_auto_point_attrs,
+};
+
 static const struct attribute_group *nct7802_groups[] = {
 	&nct7802_temp_group,
 	&nct7802_in_group,
 	&nct7802_fan_group,
 	&nct7802_pwm_group,
+	&nct7802_auto_point_group,
 	NULL
 };
 
@@ -945,7 +1066,8 @@ static int nct7802_detect(struct i2c_client *client,
 
 static bool nct7802_regmap_is_volatile(struct device *dev, unsigned int reg)
 {
-	return reg != REG_BANK && reg <= 0x20;
+	return (reg != REG_BANK && reg <= 0x20) ||
+		(reg >= REG_PWM(0) && reg <= REG_PWM(2));
 }
 
 static const struct regmap_config nct7802_regmap_config = {
-- 
cgit v1.2.3


From ead8080351c988b9c4bb5ec261d7b4c97ebb2773 Mon Sep 17 00:00:00 2001
From: Justin Maggard <jmaggard10@gmail.com>
Date: Wed, 5 Aug 2015 12:53:08 -0700
Subject: hwmon: (it87) Add support for IT8732F

Add support for the IT8732F.  This chip is pretty similar to IT8721F,
with the main difference being that the ADC LSB is 10.9 mV instead of
12 mV.

Signed-off-by: Justin Maggard <jmaggard@netgear.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
---
 Documentation/hwmon/it87 | 35 ++++++++++++++++++++++-------------
 drivers/hwmon/Kconfig    |  4 ++--
 drivers/hwmon/it87.c     | 43 ++++++++++++++++++++++++++++++++++++-------
 3 files changed, 60 insertions(+), 22 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/hwmon/it87 b/Documentation/hwmon/it87
index e87294878334..733296d65449 100644
--- a/Documentation/hwmon/it87
+++ b/Documentation/hwmon/it87
@@ -38,6 +38,10 @@ Supported chips:
     Prefix: 'it8728'
     Addresses scanned: from Super I/O config space (8 I/O ports)
     Datasheet: Not publicly available
+  * IT8732F
+    Prefix: 'it8732'
+    Addresses scanned: from Super I/O config space (8 I/O ports)
+    Datasheet: Not publicly available
   * IT8771E
     Prefix: 'it8771'
     Addresses scanned: from Super I/O config space (8 I/O ports)
@@ -111,9 +115,9 @@ Description
 -----------
 
 This driver implements support for the IT8603E, IT8620E, IT8623E, IT8705F,
-IT8712F, IT8716F, IT8718F, IT8720F, IT8721F, IT8726F, IT8728F, IT8758E,
-IT8771E, IT8772E, IT8781F, IT8782F, IT8783E/F, IT8786E, IT8790E, and SiS950
-chips.
+IT8712F, IT8716F, IT8718F, IT8720F, IT8721F, IT8726F, IT8728F, IT8732F,
+IT8758E, IT8771E, IT8772E, IT8781F, IT8782F, IT8783E/F, IT8786E, IT8790E, and
+SiS950 chips.
 
 These chips are 'Super I/O chips', supporting floppy disks, infrared ports,
 joysticks and other miscellaneous stuff. For hardware monitoring, they
@@ -137,10 +141,10 @@ The IT8716F, IT8718F, IT8720F, IT8721F/IT8758E and later IT8712F revisions
 have support for 2 additional fans. The additional fans are supported by the
 driver.
 
-The IT8716F, IT8718F, IT8720F, IT8721F/IT8758E, IT8781F, IT8782F, IT8783E/F,
-and late IT8712F and IT8705F also have optional 16-bit tachometer counters
-for fans 1 to 3. This is better (no more fan clock divider mess) but not
-compatible with the older chips and revisions. The 16-bit tachometer mode
+The IT8716F, IT8718F, IT8720F, IT8721F/IT8758E, IT8732F, IT8781F, IT8782F,
+IT8783E/F, and late IT8712F and IT8705F also have optional 16-bit tachometer
+counters for fans 1 to 3. This is better (no more fan clock divider mess) but
+not compatible with the older chips and revisions. The 16-bit tachometer mode
 is enabled by the driver when one of the above chips is detected.
 
 The IT8726F is just bit enhanced IT8716F with additional hardware
@@ -159,6 +163,9 @@ IT8728F. It only supports 16-bit fan mode.
 
 The IT8790E supports up to 3 fans. 16-bit fan mode is always enabled.
 
+The IT8732F supports a closed-loop mode for fan control, but this is not
+currently implemented by the driver.
+
 Temperatures are measured in degrees Celsius. An alarm is triggered once
 when the Overtemperature Shutdown limit is crossed.
 
@@ -173,12 +180,14 @@ is done.
 Voltage sensors (also known as IN sensors) report their values in volts. An
 alarm is triggered if the voltage has crossed a programmable minimum or
 maximum limit. Note that minimum in this case always means 'closest to
-zero'; this is important for negative voltage measurements. All voltage
-inputs can measure voltages between 0 and 4.08 volts, with a resolution of
-0.016 volt (except IT8603E, IT8721F/IT8758E and IT8728F: 0.012 volt.) The
-battery voltage in8 does not have limit registers.
-
-On the IT8603E, IT8721F/IT8758E, IT8781F, IT8782F, and IT8783E/F, some
+zero'; this is important for negative voltage measurements. On most chips, all
+voltage inputs can measure voltages between 0 and 4.08 volts, with a resolution
+of 0.016 volt.  IT8603E, IT8721F/IT8758E and IT8728F can measure between 0 and
+3.06 volts, with a resolution of 0.012 volt.  IT8732F can measure between 0 and
+2.8 volts with a resolution of 0.0109 volt.  The battery voltage in8 does not
+have limit registers.
+
+On the IT8603E, IT8721F/IT8758E, IT8732F, IT8781F, IT8782F, and IT8783E/F, some
 voltage inputs are internal and scaled inside the chip:
 * in3 (optional)
 * in7 (optional for IT8781F, IT8782F, and IT8783E/F)
diff --git a/drivers/hwmon/Kconfig b/drivers/hwmon/Kconfig
index 7c65b7334738..500b262b89bb 100644
--- a/drivers/hwmon/Kconfig
+++ b/drivers/hwmon/Kconfig
@@ -609,8 +609,8 @@ config SENSORS_IT87
 	depends on !PPC
 	select HWMON_VID
 	help
-	  If you say yes here you get support for ITE IT8705F, IT8712F,
-	  IT8716F, IT8718F, IT8720F, IT8721F, IT8726F, IT8728F, IT8758E,
+	  If you say yes here you get support for ITE IT8705F, IT8712F, IT8716F,
+	  IT8718F, IT8720F, IT8721F, IT8726F, IT8728F, IT8732F, IT8758E,
 	  IT8771E, IT8772E, IT8781F, IT8782F, IT8783E/F, IT8786E, IT8790E,
 	  IT8603E, IT8620E, and IT8623E sensor chips, and the SiS950 clone.
 
diff --git a/drivers/hwmon/it87.c b/drivers/hwmon/it87.c
index d0ee556e8ce0..1896e26df634 100644
--- a/drivers/hwmon/it87.c
+++ b/drivers/hwmon/it87.c
@@ -21,6 +21,7 @@
  *            IT8721F  Super I/O chip w/LPC interface
  *            IT8726F  Super I/O chip w/LPC interface
  *            IT8728F  Super I/O chip w/LPC interface
+ *            IT8732F  Super I/O chip w/LPC interface
  *            IT8758E  Super I/O chip w/LPC interface
  *            IT8771E  Super I/O chip w/LPC interface
  *            IT8772E  Super I/O chip w/LPC interface
@@ -69,8 +70,9 @@
 
 #define DRVNAME "it87"
 
-enum chips { it87, it8712, it8716, it8718, it8720, it8721, it8728, it8771,
-	     it8772, it8781, it8782, it8783, it8786, it8790, it8603, it8620 };
+enum chips { it87, it8712, it8716, it8718, it8720, it8721, it8728, it8732,
+	     it8771, it8772, it8781, it8782, it8783, it8786, it8790, it8603,
+	     it8620 };
 
 static unsigned short force_id;
 module_param(force_id, ushort, 0);
@@ -148,6 +150,7 @@ static inline void superio_exit(void)
 #define IT8721F_DEVID 0x8721
 #define IT8726F_DEVID 0x8726
 #define IT8728F_DEVID 0x8728
+#define IT8732F_DEVID 0x8732
 #define IT8771E_DEVID 0x8771
 #define IT8772E_DEVID 0x8772
 #define IT8781F_DEVID 0x8781
@@ -265,6 +268,7 @@ struct it87_devices {
 #define FEAT_VID		(1 << 9)	/* Set if chip supports VID */
 #define FEAT_IN7_INTERNAL	(1 << 10)	/* Set if in7 is internal */
 #define FEAT_SIX_FANS		(1 << 11)	/* Supports six fans */
+#define FEAT_10_9MV_ADC		(1 << 12)
 
 static const struct it87_devices it87_devices[] = {
 	[it87] = {
@@ -315,6 +319,15 @@ static const struct it87_devices it87_devices[] = {
 		  | FEAT_IN7_INTERNAL,
 		.peci_mask = 0x07,
 	},
+	[it8732] = {
+		.name = "it8732",
+		.suffix = "F",
+		.features = FEAT_NEWER_AUTOPWM | FEAT_16BIT_FANS
+		  | FEAT_TEMP_OFFSET | FEAT_TEMP_OLD_PECI | FEAT_TEMP_PECI
+		  | FEAT_10_9MV_ADC | FEAT_IN7_INTERNAL,
+		.peci_mask = 0x07,
+		.old_peci_mask = 0x02,	/* Actually reports PCH */
+	},
 	[it8771] = {
 		.name = "it8771",
 		.suffix = "E",
@@ -391,6 +404,7 @@ static const struct it87_devices it87_devices[] = {
 
 #define has_16bit_fans(data)	((data)->features & FEAT_16BIT_FANS)
 #define has_12mv_adc(data)	((data)->features & FEAT_12MV_ADC)
+#define has_10_9mv_adc(data)	((data)->features & FEAT_10_9MV_ADC)
 #define has_newer_autopwm(data)	((data)->features & FEAT_NEWER_AUTOPWM)
 #define has_old_autopwm(data)	((data)->features & FEAT_OLD_AUTOPWM)
 #define has_temp_offset(data)	((data)->features & FEAT_TEMP_OFFSET)
@@ -475,7 +489,14 @@ struct it87_data {
 
 static int adc_lsb(const struct it87_data *data, int nr)
 {
-	int lsb = has_12mv_adc(data) ? 12 : 16;
+	int lsb;
+
+	if (has_12mv_adc(data))
+		lsb = 120;
+	else if (has_10_9mv_adc(data))
+		lsb = 109;
+	else
+		lsb = 160;
 	if (data->in_scaled & (1 << nr))
 		lsb <<= 1;
 	return lsb;
@@ -483,13 +504,13 @@ static int adc_lsb(const struct it87_data *data, int nr)
 
 static u8 in_to_reg(const struct it87_data *data, int nr, long val)
 {
-	val = DIV_ROUND_CLOSEST(val, adc_lsb(data, nr));
+	val = DIV_ROUND_CLOSEST(val * 10, adc_lsb(data, nr));
 	return clamp_val(val, 0, 255);
 }
 
 static int in_from_reg(const struct it87_data *data, int nr, int val)
 {
-	return val * adc_lsb(data, nr);
+	return DIV_ROUND_CLOSEST(val * adc_lsb(data, nr), 10);
 }
 
 static inline u8 FAN_TO_REG(long rpm, int div)
@@ -1515,9 +1536,14 @@ static ssize_t show_label(struct device *dev, struct device_attribute *attr,
 	};
 	struct it87_data *data = dev_get_drvdata(dev);
 	int nr = to_sensor_dev_attr(attr)->index;
+	const char *label;
 
-	return sprintf(buf, "%s\n", has_12mv_adc(data) ? labels_it8721[nr]
-						       : labels[nr]);
+	if (has_12mv_adc(data) || has_10_9mv_adc(data))
+		label = labels_it8721[nr];
+	else
+		label = labels[nr];
+
+	return sprintf(buf, "%s\n", label);
 }
 static SENSOR_DEVICE_ATTR(in3_label, S_IRUGO, show_label, NULL, 0);
 static SENSOR_DEVICE_ATTR(in7_label, S_IRUGO, show_label, NULL, 1);
@@ -1853,6 +1879,9 @@ static int __init it87_find(unsigned short *address,
 	case IT8728F_DEVID:
 		sio_data->type = it8728;
 		break;
+	case IT8732F_DEVID:
+		sio_data->type = it8732;
+		break;
 	case IT8771E_DEVID:
 		sio_data->type = it8771;
 		break;
-- 
cgit v1.2.3


From 1f61cab8a729e00af77b51b44c3a8dc8ef3b3eb9 Mon Sep 17 00:00:00 2001
From: Guenter Roeck <linux@roeck-us.net>
Date: Mon, 8 Jun 2015 11:15:23 -0700
Subject: hwmon: (pmbus) Add support for MAX20751

MAX20751 is a multiphase power controller with internal buck converter.
It uses VR12.0 to report the output voltage. This requires an explicit
driver, since the VR version can not be auto-detected.

The chip supports a manufacturer specific command to fine-tune the output
voltage.  This command is not currently supported.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
---
 Documentation/hwmon/max20751   | 77 ++++++++++++++++++++++++++++++++++++++++++
 MAINTAINERS                    |  7 ++++
 drivers/hwmon/pmbus/Kconfig    | 10 ++++++
 drivers/hwmon/pmbus/Makefile   |  1 +
 drivers/hwmon/pmbus/max20751.c | 64 +++++++++++++++++++++++++++++++++++
 5 files changed, 159 insertions(+)
 create mode 100644 Documentation/hwmon/max20751
 create mode 100644 drivers/hwmon/pmbus/max20751.c

(limited to 'Documentation')

diff --git a/Documentation/hwmon/max20751 b/Documentation/hwmon/max20751
new file mode 100644
index 000000000000..f9fa25ebb521
--- /dev/null
+++ b/Documentation/hwmon/max20751
@@ -0,0 +1,77 @@
+Kernel driver max20751
+======================
+
+Supported chips:
+  * maxim MAX20751
+    Prefix: 'max20751'
+    Addresses scanned: -
+    Datasheet: http://datasheets.maximintegrated.com/en/ds/MAX20751.pdf
+    Application note: http://pdfserv.maximintegrated.com/en/an/AN5941.pdf
+
+Author: Guenter Roeck <linux@roeck-us.net>
+
+
+Description
+-----------
+
+This driver supports MAX20751 Multiphase Master with PMBus Interface
+and Internal Buck Converter.
+
+The driver is a client driver to the core PMBus driver.
+Please see Documentation/hwmon/pmbus for details on PMBus client drivers.
+
+
+Usage Notes
+-----------
+
+This driver does not auto-detect devices. You will have to instantiate the
+devices explicitly. Please see Documentation/i2c/instantiating-devices for
+details.
+
+
+Platform data support
+---------------------
+
+The driver supports standard PMBus driver platform data.
+
+
+Sysfs entries
+-------------
+
+The following attributes are supported.
+
+in1_label		"vin1"
+in1_input		Measured voltage.
+in1_min			Minimum input voltage.
+in1_max			Maximum input voltage.
+in1_lcrit		Critical minimum input voltage.
+in1_crit		Critical maximum input voltage.
+in1_min_alarm		Input voltage low alarm.
+in1_lcrit_alarm		Input voltage critical low alarm.
+in1_min_alarm		Input voltage low alarm.
+in1_max_alarm		Input voltage high alarm.
+
+in2_label		"vout1"
+in2_input		Measured voltage.
+in2_min			Minimum output voltage.
+in2_max			Maximum output voltage.
+in2_lcrit		Critical minimum output voltage.
+in2_crit		Critical maximum output voltage.
+in2_min_alarm		Output voltage low alarm.
+in2_lcrit_alarm		Output voltage critical low alarm.
+in2_min_alarm		Output voltage low alarm.
+in2_max_alarm		Output voltage high alarm.
+
+curr1_input		Measured output current.
+curr1_label		"iout1"
+curr1_max		Maximum output current.
+curr1_alarm		Current high alarm.
+
+temp1_input		Measured temperature.
+temp1_max		Maximum temperature.
+temp1_crit		Critical high temperature.
+temp1_max_alarm		Chip temperature high alarm.
+temp1_crit_alarm	Chip temperature critical high alarm.
+
+power1_input		Output power.
+power1_label		"pout1"
diff --git a/MAINTAINERS b/MAINTAINERS
index a9ae6c105520..6a406b607eeb 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6539,6 +6539,13 @@ S:	Maintained
 F:	Documentation/hwmon/max16065
 F:	drivers/hwmon/max16065.c
 
+MAX20751 HARDWARE MONITOR DRIVER
+M:	Guenter Roeck <linux@roeck-us.net>
+L:	lm-sensors@lm-sensors.org
+S:	Maintained
+F:	Documentation/hwmon/max20751
+F:	drivers/hwmon/max20751.c
+
 MAX6650 HARDWARE MONITOR AND FAN CONTROLLER DRIVER
 M:	"Hans J. Koch" <hjk@hansjkoch.de>
 L:	lm-sensors@lm-sensors.org
diff --git a/drivers/hwmon/pmbus/Kconfig b/drivers/hwmon/pmbus/Kconfig
index 67901cb08f77..6e1e6fe434f7 100644
--- a/drivers/hwmon/pmbus/Kconfig
+++ b/drivers/hwmon/pmbus/Kconfig
@@ -73,6 +73,16 @@ config SENSORS_MAX16064
 	  This driver can also be built as a module. If so, the module will
 	  be called max16064.
 
+config SENSORS_MAX20751
+	tristate "Maxim MAX20751"
+	default n
+	help
+	  If you say yes here you get hardware monitoring support for Maxim
+	  MAX20751.
+
+	  This driver can also be built as a module. If so, the module will
+	  be called max20751.
+
 config SENSORS_MAX34440
 	tristate "Maxim MAX34440 and compatibles"
 	default n
diff --git a/drivers/hwmon/pmbus/Makefile b/drivers/hwmon/pmbus/Makefile
index 1454293e985c..bce046d37f02 100644
--- a/drivers/hwmon/pmbus/Makefile
+++ b/drivers/hwmon/pmbus/Makefile
@@ -8,6 +8,7 @@ obj-$(CONFIG_SENSORS_ADM1275)	+= adm1275.o
 obj-$(CONFIG_SENSORS_LM25066)	+= lm25066.o
 obj-$(CONFIG_SENSORS_LTC2978)	+= ltc2978.o
 obj-$(CONFIG_SENSORS_MAX16064)	+= max16064.o
+obj-$(CONFIG_SENSORS_MAX20751)	+= max20751.o
 obj-$(CONFIG_SENSORS_MAX34440)	+= max34440.o
 obj-$(CONFIG_SENSORS_MAX8688)	+= max8688.o
 obj-$(CONFIG_SENSORS_TPS40422)	+= tps40422.o
diff --git a/drivers/hwmon/pmbus/max20751.c b/drivers/hwmon/pmbus/max20751.c
new file mode 100644
index 000000000000..ab74aeae8cf2
--- /dev/null
+++ b/drivers/hwmon/pmbus/max20751.c
@@ -0,0 +1,64 @@
+/*
+ * Hardware monitoring driver for Maxim MAX20751
+ *
+ * Copyright (c) 2015 Guenter Roeck
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/err.h>
+#include <linux/i2c.h>
+#include "pmbus.h"
+
+static struct pmbus_driver_info max20751_info = {
+	.pages = 1,
+	.format[PSC_VOLTAGE_IN] = linear,
+	.format[PSC_VOLTAGE_OUT] = vid,
+	.vrm_version = vr12,
+	.format[PSC_TEMPERATURE] = linear,
+	.format[PSC_CURRENT_OUT] = linear,
+	.format[PSC_POWER] = linear,
+	.func[0] = PMBUS_HAVE_VIN | PMBUS_HAVE_VOUT | PMBUS_HAVE_STATUS_VOUT |
+		PMBUS_HAVE_IOUT | PMBUS_HAVE_STATUS_IOUT |
+		PMBUS_HAVE_TEMP | PMBUS_HAVE_STATUS_TEMP |
+		PMBUS_HAVE_POUT,
+};
+
+static int max20751_probe(struct i2c_client *client,
+			  const struct i2c_device_id *id)
+{
+	return pmbus_do_probe(client, id, &max20751_info);
+}
+
+static const struct i2c_device_id max20751_id[] = {
+	{"max20751", 0},
+	{}
+};
+
+MODULE_DEVICE_TABLE(i2c, max20751_id);
+
+static struct i2c_driver max20751_driver = {
+	.driver = {
+		   .name = "max20751",
+		   },
+	.probe = max20751_probe,
+	.remove = pmbus_do_remove,
+	.id_table = max20751_id,
+};
+
+module_i2c_driver(max20751_driver);
+
+MODULE_AUTHOR("Guenter Roeck <linux@roeck-us.net>");
+MODULE_DESCRIPTION("PMBus driver for Maxim MAX20751");
+MODULE_LICENSE("GPL");
-- 
cgit v1.2.3


From 1a94acf858000836d28b13942eebf5cbffc9ac34 Mon Sep 17 00:00:00 2001
From: Dinh Nguyen <dinguyen@opensource.altera.com>
Date: Fri, 24 Jul 2015 14:05:06 -0500
Subject: ARM: socfpga: dts: add "altr,modrst-offset" property

The "altr,modrst-offset" property represents the offset into the reset manager
that is the first register to be used by the driver to bring peripherals out
of reset.

Signed-off-by: Dinh Nguyen <dinguyen@opensource.altera.com>
---
 Documentation/devicetree/bindings/reset/socfpga-reset.txt | 2 ++
 arch/arm/boot/dts/socfpga.dtsi                            | 1 +
 arch/arm/boot/dts/socfpga_arria10.dtsi                    | 1 +
 3 files changed, 4 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/reset/socfpga-reset.txt b/Documentation/devicetree/bindings/reset/socfpga-reset.txt
index 32c1c8bfd5dc..98c9f560e5c5 100644
--- a/Documentation/devicetree/bindings/reset/socfpga-reset.txt
+++ b/Documentation/devicetree/bindings/reset/socfpga-reset.txt
@@ -3,6 +3,7 @@ Altera SOCFPGA Reset Manager
 Required properties:
 - compatible : "altr,rst-mgr"
 - reg : Should contain 1 register ranges(address and length)
+- altr,modrst-offset : Should contain the offset of the first modrst register.
 - #reset-cells: 1
 
 Example:
@@ -10,4 +11,5 @@ Example:
 		#reset-cells = <1>;
 		compatible = "altr,rst-mgr";
 		reg = <0xffd05000 0x1000>;
+		altr,modrst-offset = <0x10>;
 	};
diff --git a/arch/arm/boot/dts/socfpga.dtsi b/arch/arm/boot/dts/socfpga.dtsi
index 80f924deed37..ed185b92bcbe 100644
--- a/arch/arm/boot/dts/socfpga.dtsi
+++ b/arch/arm/boot/dts/socfpga.dtsi
@@ -752,6 +752,7 @@
 			#reset-cells = <1>;
 			compatible = "altr,rst-mgr";
 			reg = <0xffd05000 0x1000>;
+			altr,modrst-offset = <0x10>;
 		};
 
 		usbphy0: usbphy@0 {
diff --git a/arch/arm/boot/dts/socfpga_arria10.dtsi b/arch/arm/boot/dts/socfpga_arria10.dtsi
index 4779b07310df..1d341dc3e598 100644
--- a/arch/arm/boot/dts/socfpga_arria10.dtsi
+++ b/arch/arm/boot/dts/socfpga_arria10.dtsi
@@ -588,6 +588,7 @@
 			#reset-cells = <1>;
 			compatible = "altr,rst-mgr";
 			reg = <0xffd05000 0x100>;
+			altr,modrst-offset = <0x20>;
 		};
 
 		scu: snoop-control-unit@ffffc000 {
-- 
cgit v1.2.3


From 5faf6e1f58b4488a8b24a722ccf317ed67a8e8d8 Mon Sep 17 00:00:00 2001
From: Wolfram Sang <wsa+renesas@sang-engineering.com>
Date: Sat, 11 Jul 2015 09:46:23 +0200
Subject: i2c: emev2: add driver

Add a basic driver for the Renesas EMEV2 SoC. Based on the driver from
the BSP which was first worked on by Ian, and made ready for upstream by
me.

Signed-off-by: Ian Molton <ian.molton@codethink.co.uk>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
---
 .../devicetree/bindings/i2c/i2c-emev2.txt          |  22 ++
 drivers/i2c/busses/Kconfig                         |   7 +
 drivers/i2c/busses/Makefile                        |   1 +
 drivers/i2c/busses/i2c-emev2.c                     | 332 +++++++++++++++++++++
 4 files changed, 362 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/i2c/i2c-emev2.txt
 create mode 100644 drivers/i2c/busses/i2c-emev2.c

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/i2c/i2c-emev2.txt b/Documentation/devicetree/bindings/i2c/i2c-emev2.txt
new file mode 100644
index 000000000000..5ed1ea1c7e14
--- /dev/null
+++ b/Documentation/devicetree/bindings/i2c/i2c-emev2.txt
@@ -0,0 +1,22 @@
+Device tree configuration for Renesas EMEV2 IIC controller
+
+Required properties:
+- compatible      : "renesas,iic-emev2"
+- reg             : address start and address range size of device
+- interrupts      : specifier for the IIC controller interrupt
+- clocks          : phandle to the IP core SCLK
+- clock-names     : must be "sclk"
+- #address-cells  : should be <1>
+- #size-cells     : should be <0>
+
+Example:
+
+	iic0: i2c@e0070000 {
+		#address-cells = <1>;
+		#size-cells = <0>;
+		compatible = "renesas,iic-emev2";
+		reg = <0xe0070000 0x28>;
+		interrupts = <0 32 IRQ_TYPE_EDGE_RISING>;
+		clocks = <&iic0_sclk>;
+		clock-names = "sclk";
+	};
diff --git a/drivers/i2c/busses/Kconfig b/drivers/i2c/busses/Kconfig
index 577d58d1f1a1..0b798ae708fe 100644
--- a/drivers/i2c/busses/Kconfig
+++ b/drivers/i2c/busses/Kconfig
@@ -526,6 +526,13 @@ config I2C_EG20T
 	  ML7213/ML7223/ML7831 is companion chip for Intel Atom E6xx series.
 	  ML7213/ML7223/ML7831 is completely compatible for Intel EG20T PCH.
 
+config I2C_EMEV2
+	tristate "EMMA Mobile series I2C adapter"
+	depends on HAVE_CLK
+	help
+	  If you say yes to this option, support will be included for the
+	  I2C interface on the Renesas Electronics EM/EV family of processors.
+
 config I2C_EXYNOS5
 	tristate "Exynos5 high-speed I2C driver"
 	depends on ARCH_EXYNOS && OF
diff --git a/drivers/i2c/busses/Makefile b/drivers/i2c/busses/Makefile
index e5f537c80da0..50e8bbb65f1c 100644
--- a/drivers/i2c/busses/Makefile
+++ b/drivers/i2c/busses/Makefile
@@ -48,6 +48,7 @@ i2c-designware-pci-objs := i2c-designware-pcidrv.o
 obj-$(CONFIG_I2C_DIGICOLOR)	+= i2c-digicolor.o
 obj-$(CONFIG_I2C_EFM32)		+= i2c-efm32.o
 obj-$(CONFIG_I2C_EG20T)		+= i2c-eg20t.o
+obj-$(CONFIG_I2C_EMEV2)		+= i2c-emev2.o
 obj-$(CONFIG_I2C_EXYNOS5)	+= i2c-exynos5.o
 obj-$(CONFIG_I2C_GPIO)		+= i2c-gpio.o
 obj-$(CONFIG_I2C_HIGHLANDER)	+= i2c-highlander.o
diff --git a/drivers/i2c/busses/i2c-emev2.c b/drivers/i2c/busses/i2c-emev2.c
new file mode 100644
index 000000000000..192ef6b50c79
--- /dev/null
+++ b/drivers/i2c/busses/i2c-emev2.c
@@ -0,0 +1,332 @@
+/*
+ * I2C driver for the Renesas EMEV2 SoC
+ *
+ * Copyright (C) 2015 Wolfram Sang <wsa@sang-engineering.com>
+ * Copyright 2013 Codethink Ltd.
+ * Copyright 2010-2015 Renesas Electronics Corporation
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2
+ * as published by the Free Software Foundation.
+ */
+
+#include <linux/clk.h>
+#include <linux/completion.h>
+#include <linux/device.h>
+#include <linux/i2c.h>
+#include <linux/init.h>
+#include <linux/interrupt.h>
+#include <linux/io.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/of_device.h>
+#include <linux/platform_device.h>
+#include <linux/sched.h>
+
+/* I2C Registers */
+#define I2C_OFS_IICACT0		0x00	/* start */
+#define I2C_OFS_IIC0		0x04	/* shift */
+#define I2C_OFS_IICC0		0x08	/* control */
+#define I2C_OFS_SVA0		0x0c	/* slave address */
+#define I2C_OFS_IICCL0		0x10	/* clock select */
+#define I2C_OFS_IICX0		0x14	/* extension */
+#define I2C_OFS_IICS0		0x18	/* status */
+#define I2C_OFS_IICSE0		0x1c	/* status For emulation */
+#define I2C_OFS_IICF0		0x20	/* IIC flag */
+
+/* I2C IICACT0 Masks */
+#define I2C_BIT_IICE0		0x0001
+
+/* I2C IICC0 Masks */
+#define I2C_BIT_LREL0		0x0040
+#define I2C_BIT_WREL0		0x0020
+#define I2C_BIT_SPIE0		0x0010
+#define I2C_BIT_WTIM0		0x0008
+#define I2C_BIT_ACKE0		0x0004
+#define I2C_BIT_STT0		0x0002
+#define I2C_BIT_SPT0		0x0001
+
+/* I2C IICCL0 Masks */
+#define I2C_BIT_SMC0		0x0008
+#define I2C_BIT_DFC0		0x0004
+
+/* I2C IICSE0 Masks */
+#define I2C_BIT_MSTS0		0x0080
+#define I2C_BIT_ALD0		0x0040
+#define I2C_BIT_EXC0		0x0020
+#define I2C_BIT_COI0		0x0010
+#define I2C_BIT_TRC0		0x0008
+#define I2C_BIT_ACKD0		0x0004
+#define I2C_BIT_STD0		0x0002
+#define I2C_BIT_SPD0		0x0001
+
+/* I2C IICF0 Masks */
+#define I2C_BIT_STCF		0x0080
+#define I2C_BIT_IICBSY		0x0040
+#define I2C_BIT_STCEN		0x0002
+#define I2C_BIT_IICRSV		0x0001
+
+struct em_i2c_device {
+	void __iomem *base;
+	struct i2c_adapter adap;
+	struct completion msg_done;
+	struct clk *sclk;
+};
+
+static inline void em_clear_set_bit(struct em_i2c_device *priv, u8 clear, u8 set, u8 reg)
+{
+	writeb((readb(priv->base + reg) & ~clear) | set, priv->base + reg);
+}
+
+static int em_i2c_wait_for_event(struct em_i2c_device *priv)
+{
+	unsigned long time_left;
+	int status;
+
+	reinit_completion(&priv->msg_done);
+
+	time_left = wait_for_completion_timeout(&priv->msg_done, priv->adap.timeout);
+
+	if (!time_left)
+		return -ETIMEDOUT;
+
+	status = readb(priv->base + I2C_OFS_IICSE0);
+	return status & I2C_BIT_ALD0 ? -EAGAIN : status;
+}
+
+static void em_i2c_stop(struct em_i2c_device *priv)
+{
+	/* Send Stop condition */
+	em_clear_set_bit(priv, 0, I2C_BIT_SPT0 | I2C_BIT_SPIE0, I2C_OFS_IICC0);
+
+	/* Wait for stop condition */
+	em_i2c_wait_for_event(priv);
+}
+
+static void em_i2c_reset(struct i2c_adapter *adap)
+{
+	struct em_i2c_device *priv = i2c_get_adapdata(adap);
+	int retr;
+
+	/* If I2C active */
+	if (readb(priv->base + I2C_OFS_IICACT0) & I2C_BIT_IICE0) {
+		/* Disable I2C operation */
+		writeb(0, priv->base + I2C_OFS_IICACT0);
+
+		retr = 1000;
+		while (readb(priv->base + I2C_OFS_IICACT0) == 1 && retr)
+			retr--;
+		WARN_ON(retr == 0);
+	}
+
+	/* Transfer mode set */
+	writeb(I2C_BIT_DFC0, priv->base + I2C_OFS_IICCL0);
+
+	/* Can Issue start without detecting a stop, Reservation disabled. */
+	writeb(I2C_BIT_STCEN | I2C_BIT_IICRSV, priv->base + I2C_OFS_IICF0);
+
+	/* I2C enable, 9 bit interrupt mode */
+	writeb(I2C_BIT_WTIM0, priv->base + I2C_OFS_IICC0);
+
+	/* Enable I2C operation */
+	writeb(I2C_BIT_IICE0, priv->base + I2C_OFS_IICACT0);
+
+	retr = 1000;
+	while (readb(priv->base + I2C_OFS_IICACT0) == 0 && retr)
+		retr--;
+	WARN_ON(retr == 0);
+}
+
+static int __em_i2c_xfer(struct i2c_adapter *adap, struct i2c_msg *msg,
+				int stop)
+{
+	struct em_i2c_device *priv = i2c_get_adapdata(adap);
+	int count, status, read = !!(msg->flags & I2C_M_RD);
+
+	/* Send start condition */
+	em_clear_set_bit(priv, 0, I2C_BIT_ACKE0 | I2C_BIT_WTIM0, I2C_OFS_IICC0);
+	em_clear_set_bit(priv, 0, I2C_BIT_STT0, I2C_OFS_IICC0);
+
+	/* Send slave address and R/W type */
+	writeb((msg->addr << 1) | read, priv->base + I2C_OFS_IIC0);
+
+	/* Wait for transaction */
+	status = em_i2c_wait_for_event(priv);
+	if (status < 0)
+		goto out_reset;
+
+	/* Received NACK (result of setting slave address and R/W) */
+	if (!(status & I2C_BIT_ACKD0)) {
+		em_i2c_stop(priv);
+		goto out;
+	}
+
+	/* Extra setup for read transactions */
+	if (read) {
+		/* 8 bit interrupt mode */
+		em_clear_set_bit(priv, I2C_BIT_WTIM0, I2C_BIT_ACKE0, I2C_OFS_IICC0);
+		em_clear_set_bit(priv, I2C_BIT_WTIM0, I2C_BIT_WREL0, I2C_OFS_IICC0);
+
+		/* Wait for transaction */
+		status = em_i2c_wait_for_event(priv);
+		if (status < 0)
+			goto out_reset;
+	}
+
+	/* Send / receive data */
+	for (count = 0; count < msg->len; count++) {
+		if (read) { /* Read transaction */
+			msg->buf[count] = readb(priv->base + I2C_OFS_IIC0);
+			em_clear_set_bit(priv, 0, I2C_BIT_WREL0, I2C_OFS_IICC0);
+
+		} else { /* Write transaction */
+			/* Received NACK */
+			if (!(status & I2C_BIT_ACKD0)) {
+				em_i2c_stop(priv);
+				goto out;
+			}
+
+			/* Write data */
+			writeb(msg->buf[count], priv->base + I2C_OFS_IIC0);
+		}
+
+		/* Wait for R/W transaction */
+		status = em_i2c_wait_for_event(priv);
+		if (status < 0)
+			goto out_reset;
+	}
+
+	if (stop)
+		em_i2c_stop(priv);
+
+	return count;
+
+out_reset:
+	em_i2c_reset(adap);
+out:
+	return status < 0 ? status : -ENXIO;
+}
+
+static int em_i2c_xfer(struct i2c_adapter *adap, struct i2c_msg *msgs,
+	int num)
+{
+	struct em_i2c_device *priv = i2c_get_adapdata(adap);
+	int ret, i;
+
+	if (readb(priv->base + I2C_OFS_IICF0) & I2C_BIT_IICBSY)
+		return -EAGAIN;
+
+	for (i = 0; i < num; i++) {
+		ret = __em_i2c_xfer(adap, &msgs[i], (i == (num - 1)));
+		if (ret < 0)
+			return ret;
+	}
+
+	/* I2C transfer completed */
+	return num;
+}
+
+static irqreturn_t em_i2c_irq_handler(int this_irq, void *dev_id)
+{
+	struct em_i2c_device *priv = dev_id;
+
+	complete(&priv->msg_done);
+	return IRQ_HANDLED;
+}
+
+static u32 em_i2c_func(struct i2c_adapter *adap)
+{
+	return I2C_FUNC_I2C | I2C_FUNC_SMBUS_EMUL;
+}
+
+static struct i2c_algorithm em_i2c_algo = {
+	.master_xfer = em_i2c_xfer,
+	.functionality = em_i2c_func,
+};
+
+static int em_i2c_probe(struct platform_device *pdev)
+{
+	struct em_i2c_device *priv;
+	struct resource *r;
+	int irq, ret;
+
+	priv = devm_kzalloc(&pdev->dev, sizeof(*priv), GFP_KERNEL);
+	if (!priv)
+		return -ENOMEM;
+
+	r = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	priv->base = devm_ioremap_resource(&pdev->dev, r);
+	if (IS_ERR(priv->base))
+		return PTR_ERR(priv->base);
+
+	strlcpy(priv->adap.name, "EMEV2 I2C", sizeof(priv->adap.name));
+
+	priv->sclk = devm_clk_get(&pdev->dev, "sclk");
+	if (IS_ERR(priv->sclk))
+		return PTR_ERR(priv->sclk);
+
+	clk_prepare_enable(priv->sclk);
+
+	priv->adap.timeout = msecs_to_jiffies(100);
+	priv->adap.retries = 5;
+	priv->adap.dev.parent = &pdev->dev;
+	priv->adap.algo = &em_i2c_algo;
+	priv->adap.owner = THIS_MODULE;
+	priv->adap.dev.of_node = pdev->dev.of_node;
+
+	init_completion(&priv->msg_done);
+
+	platform_set_drvdata(pdev, priv);
+	i2c_set_adapdata(&priv->adap, priv);
+
+	em_i2c_reset(&priv->adap);
+
+	irq = platform_get_irq(pdev, 0);
+	ret = devm_request_irq(&pdev->dev, irq, em_i2c_irq_handler, 0,
+				"em_i2c", priv);
+	if (ret)
+		goto err_clk;
+
+	ret = i2c_add_adapter(&priv->adap);
+
+	if (ret)
+		goto err_clk;
+
+	dev_info(&pdev->dev, "Added i2c controller %d, irq %d\n", priv->adap.nr, irq);
+
+	return 0;
+
+err_clk:
+	clk_disable_unprepare(priv->sclk);
+	return ret;
+}
+
+static int em_i2c_remove(struct platform_device *dev)
+{
+	struct em_i2c_device *priv = platform_get_drvdata(dev);
+
+	i2c_del_adapter(&priv->adap);
+	clk_disable_unprepare(priv->sclk);
+
+	return 0;
+}
+
+static const struct of_device_id em_i2c_ids[] = {
+	{ .compatible = "renesas,iic-emev2", },
+	{ }
+};
+
+static struct platform_driver em_i2c_driver = {
+	.probe = em_i2c_probe,
+	.remove = em_i2c_remove,
+	.driver = {
+		.name = "em-i2c",
+		.of_match_table = em_i2c_ids,
+	}
+};
+module_platform_driver(em_i2c_driver);
+
+MODULE_DESCRIPTION("EMEV2 I2C bus driver");
+MODULE_AUTHOR("Ian Molton and Wolfram Sang <wsa@sang-engineering.com>");
+MODULE_LICENSE("GPL v2");
+MODULE_DEVICE_TABLE(of, em_i2c_ids);
-- 
cgit v1.2.3


From 63cab195bf498676619951e81ad5791e9d47c420 Mon Sep 17 00:00:00 2001
From: Anurag Kumar Vulisha <anurag.kumar.vulisha@xilinx.com>
Date: Fri, 10 Jul 2015 20:10:14 +0530
Subject: i2c: removed work arounds in i2c driver for Zynq Ultrascale+ MPSoC

Cadence 1.0 version has bugs which have been fixed in the cadence 1.4 version.
This patch removes the quirks present in the driver for cadence 1.4 version.

Signed-off-by: Anurag Kumar Vulisha <anuragku@xilinx.com>
[wsa: fixed indentation issues in r1p10_i2c_def]
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
---
 .../devicetree/bindings/i2c/i2c-cadence.txt        |  6 +-
 drivers/i2c/busses/i2c-cadence.c                   | 68 ++++++++++++++++++----
 2 files changed, 62 insertions(+), 12 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/i2c/i2c-cadence.txt b/Documentation/devicetree/bindings/i2c/i2c-cadence.txt
index 7cb0b5608f49..ebaa90c58c8e 100644
--- a/Documentation/devicetree/bindings/i2c/i2c-cadence.txt
+++ b/Documentation/devicetree/bindings/i2c/i2c-cadence.txt
@@ -2,7 +2,11 @@ Binding for the Cadence I2C controller
 
 Required properties:
   - reg: Physical base address and size of the controller's register area.
-  - compatible: Compatibility string. Must be 'cdns,i2c-r1p10'.
+  - compatible: Should contain one of:
+		* "cdns,i2c-r1p10"
+		Note:	Use this when cadence i2c controller version 1.0 is used.
+		* "cdns,i2c-r1p14"
+		Note:	Use this when cadence i2c controller version 1.4 is used.
   - clocks: Input clock specifier. Refer to common clock bindings.
   - interrupts: Interrupt specifier. Refer to interrupt bindings.
   - #address-cells: Should be 1.
diff --git a/drivers/i2c/busses/i2c-cadence.c b/drivers/i2c/busses/i2c-cadence.c
index 2ee78e099d30..18e48ae4a60d 100644
--- a/drivers/i2c/busses/i2c-cadence.c
+++ b/drivers/i2c/busses/i2c-cadence.c
@@ -17,6 +17,7 @@
 #include <linux/io.h>
 #include <linux/module.h>
 #include <linux/platform_device.h>
+#include <linux/of.h>
 
 /* Register offsets for the I2C device. */
 #define CDNS_I2C_CR_OFFSET		0x00 /* Control Register, RW */
@@ -113,6 +114,8 @@
 
 #define CDNS_I2C_TIMEOUT_MAX	0xFF
 
+#define CDNS_I2C_BROKEN_HOLD_BIT	BIT(0)
+
 #define cdns_i2c_readreg(offset)       readl_relaxed(id->membase + offset)
 #define cdns_i2c_writereg(val, offset) writel_relaxed(val, id->membase + offset)
 
@@ -135,6 +138,7 @@
  * @bus_hold_flag:	Flag used in repeated start for clearing HOLD bit
  * @clk:		Pointer to struct clk
  * @clk_rate_change_nb:	Notifier block for clock rate changes
+ * @quirks:		flag for broken hold bit usage in r1p10
  */
 struct cdns_i2c {
 	void __iomem *membase;
@@ -154,6 +158,11 @@ struct cdns_i2c {
 	unsigned int bus_hold_flag;
 	struct clk *clk;
 	struct notifier_block clk_rate_change_nb;
+	u32 quirks;
+};
+
+struct cdns_platform_data {
+	u32 quirks;
 };
 
 #define to_cdns_i2c(_nb)	container_of(_nb, struct cdns_i2c, \
@@ -172,6 +181,12 @@ static void cdns_i2c_clear_bus_hold(struct cdns_i2c *id)
 		cdns_i2c_writereg(reg & ~CDNS_I2C_CR_HOLD, CDNS_I2C_CR_OFFSET);
 }
 
+static inline bool cdns_is_holdquirk(struct cdns_i2c *id, bool hold_wrkaround)
+{
+	return (hold_wrkaround &&
+		(id->curr_recv_count == CDNS_I2C_FIFO_DEPTH + 1));
+}
+
 /**
  * cdns_i2c_isr - Interrupt handler for the I2C device
  * @irq:	irq number for the I2C device
@@ -186,6 +201,7 @@ static irqreturn_t cdns_i2c_isr(int irq, void *ptr)
 {
 	unsigned int isr_status, avail_bytes, updatetx;
 	unsigned int bytes_to_send;
+	bool hold_quirk;
 	struct cdns_i2c *id = ptr;
 	/* Signal completion only after everything is updated */
 	int done_flag = 0;
@@ -208,6 +224,8 @@ static irqreturn_t cdns_i2c_isr(int irq, void *ptr)
 	if (id->recv_count > id->curr_recv_count)
 		updatetx = 1;
 
+	hold_quirk = (id->quirks & CDNS_I2C_BROKEN_HOLD_BIT) && updatetx;
+
 	/* When receiving, handle data interrupt and completion interrupt */
 	if (id->p_recv_buf &&
 	    ((isr_status & CDNS_I2C_IXR_COMP) ||
@@ -229,8 +247,7 @@ static irqreturn_t cdns_i2c_isr(int irq, void *ptr)
 			id->recv_count--;
 			id->curr_recv_count--;
 
-			if (updatetx &&
-			    (id->curr_recv_count == CDNS_I2C_FIFO_DEPTH + 1))
+			if (cdns_is_holdquirk(id, hold_quirk))
 				break;
 		}
 
@@ -241,8 +258,7 @@ static irqreturn_t cdns_i2c_isr(int irq, void *ptr)
 		 * maintain transfer size non-zero while performing a large
 		 * receive operation.
 		 */
-		if (updatetx &&
-		    (id->curr_recv_count == CDNS_I2C_FIFO_DEPTH + 1)) {
+		if (cdns_is_holdquirk(id, hold_quirk)) {
 			/* wait while fifo is full */
 			while (cdns_i2c_readreg(CDNS_I2C_XFER_SIZE_OFFSET) !=
 			       (id->curr_recv_count - CDNS_I2C_FIFO_DEPTH))
@@ -264,6 +280,22 @@ static irqreturn_t cdns_i2c_isr(int irq, void *ptr)
 						  CDNS_I2C_XFER_SIZE_OFFSET);
 				id->curr_recv_count = id->recv_count;
 			}
+		} else if (id->recv_count && !hold_quirk &&
+						!id->curr_recv_count) {
+
+			/* Set the slave address in address register*/
+			cdns_i2c_writereg(id->p_msg->addr & CDNS_I2C_ADDR_MASK,
+						CDNS_I2C_ADDR_OFFSET);
+
+			if (id->recv_count > CDNS_I2C_TRANSFER_SIZE) {
+				cdns_i2c_writereg(CDNS_I2C_TRANSFER_SIZE,
+						CDNS_I2C_XFER_SIZE_OFFSET);
+				id->curr_recv_count = CDNS_I2C_TRANSFER_SIZE;
+			} else {
+				cdns_i2c_writereg(id->recv_count,
+						CDNS_I2C_XFER_SIZE_OFFSET);
+				id->curr_recv_count = id->recv_count;
+			}
 		}
 
 		/* Clear hold (if not repeated start) and signal completion */
@@ -535,11 +567,13 @@ static int cdns_i2c_master_xfer(struct i2c_adapter *adap, struct i2c_msg *msgs,
 	int ret, count;
 	u32 reg;
 	struct cdns_i2c *id = adap->algo_data;
+	bool hold_quirk;
 
 	/* Check if the bus is free */
 	if (cdns_i2c_readreg(CDNS_I2C_SR_OFFSET) & CDNS_I2C_SR_BA)
 		return -EAGAIN;
 
+	hold_quirk = !!(id->quirks & CDNS_I2C_BROKEN_HOLD_BIT);
 	/*
 	 * Set the flag to one when multiple messages are to be
 	 * processed with a repeated start.
@@ -552,7 +586,7 @@ static int cdns_i2c_master_xfer(struct i2c_adapter *adap, struct i2c_msg *msgs,
 		 * followed by any other message, an error is returned
 		 * indicating that this sequence is not supported.
 		 */
-		for (count = 0; count < num - 1; count++) {
+		for (count = 0; (count < num - 1 && hold_quirk); count++) {
 			if (msgs[count].flags & I2C_M_RD) {
 				dev_warn(adap->dev.parent,
 					 "Can't do repeated start after a receive message\n");
@@ -815,6 +849,17 @@ static int __maybe_unused cdns_i2c_resume(struct device *_dev)
 static SIMPLE_DEV_PM_OPS(cdns_i2c_dev_pm_ops, cdns_i2c_suspend,
 			 cdns_i2c_resume);
 
+static const struct cdns_platform_data r1p10_i2c_def = {
+	.quirks = CDNS_I2C_BROKEN_HOLD_BIT,
+};
+
+static const struct of_device_id cdns_i2c_of_match[] = {
+	{ .compatible = "cdns,i2c-r1p10", .data = &r1p10_i2c_def },
+	{ .compatible = "cdns,i2c-r1p14",},
+	{ /* end of table */ }
+};
+MODULE_DEVICE_TABLE(of, cdns_i2c_of_match);
+
 /**
  * cdns_i2c_probe - Platform registration call
  * @pdev:	Handle to the platform device structure
@@ -830,6 +875,7 @@ static int cdns_i2c_probe(struct platform_device *pdev)
 	struct resource *r_mem;
 	struct cdns_i2c *id;
 	int ret;
+	const struct of_device_id *match;
 
 	id = devm_kzalloc(&pdev->dev, sizeof(*id), GFP_KERNEL);
 	if (!id)
@@ -837,6 +883,12 @@ static int cdns_i2c_probe(struct platform_device *pdev)
 
 	platform_set_drvdata(pdev, id);
 
+	match = of_match_node(cdns_i2c_of_match, pdev->dev.of_node);
+	if (match && match->data) {
+		const struct cdns_platform_data *data = match->data;
+		id->quirks = data->quirks;
+	}
+
 	r_mem = platform_get_resource(pdev, IORESOURCE_MEM, 0);
 	id->membase = devm_ioremap_resource(&pdev->dev, r_mem);
 	if (IS_ERR(id->membase))
@@ -935,12 +987,6 @@ static int cdns_i2c_remove(struct platform_device *pdev)
 	return 0;
 }
 
-static const struct of_device_id cdns_i2c_of_match[] = {
-	{ .compatible = "cdns,i2c-r1p10", },
-	{ /* end of table */ }
-};
-MODULE_DEVICE_TABLE(of, cdns_i2c_of_match);
-
 static struct platform_driver cdns_i2c_drv = {
 	.driver = {
 		.name  = DRIVER_NAME,
-- 
cgit v1.2.3


From 82cd5d041c8c2cdceaa6833d2f73905279d1c94f Mon Sep 17 00:00:00 2001
From: Ondrej Zary <linux@rainbow-software.org>
Date: Mon, 13 Jul 2015 19:31:12 +0200
Subject: i2c: parport: Add VCT-jig adapter

Add support for VCT-jig parallel port I2C adapter to i2c-parport.

The adapter schematic can be found here (in the RAR file):
http://remont-aud.net/shop/22/desc/vct-jig-komplekt-dlja-samostojatelnoj-sborki

Signed-off-by: Ondrej Zary <linux@rainbow-software.org>
Reviewed-by: Jean Delvare <jdelvare@suse.de>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
---
 Documentation/i2c/busses/i2c-parport | 1 +
 drivers/i2c/busses/i2c-parport.h     | 8 ++++++++
 2 files changed, 9 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/i2c/busses/i2c-parport b/Documentation/i2c/busses/i2c-parport
index 0e2d17b460fd..c3dbb3bfd814 100644
--- a/Documentation/i2c/busses/i2c-parport
+++ b/Documentation/i2c/busses/i2c-parport
@@ -20,6 +20,7 @@ It currently supports the following devices:
  * (type=5) Analog Devices evaluation boards: ADM1025, ADM1030, ADM1031
  * (type=6) Barco LPT->DVI (K5800236) adapter
  * (type=7) One For All JP1 parallel port adapter
+ * (type=8) VCT-jig
 
 These devices use different pinout configurations, so you have to tell
 the driver what you have, using the type module parameter. There is no
diff --git a/drivers/i2c/busses/i2c-parport.h b/drivers/i2c/busses/i2c-parport.h
index 4e1294536805..84a6616b072f 100644
--- a/drivers/i2c/busses/i2c-parport.h
+++ b/drivers/i2c/busses/i2c-parport.h
@@ -89,6 +89,13 @@ static const struct adapter_parm adapter_parm[] = {
 		.getsda	= { 0x80, PORT_STAT, 1 },
 		.init	= { 0x04, PORT_DATA, 1 },
 	},
+	/* type 8: VCT-jig */
+	{
+		.setsda	= { 0x04, PORT_DATA, 1 },
+		.setscl	= { 0x01, PORT_DATA, 1 },
+		.getsda	= { 0x40, PORT_STAT, 0 },
+		.getscl	= { 0x80, PORT_STAT, 1 },
+	},
 };
 
 static int type = -1;
@@ -103,4 +110,5 @@ MODULE_PARM_DESC(type,
 	" 5 = ADM1025, ADM1030 and ADM1031 evaluation boards\n"
 	" 6 = Barco LPT->DVI (K5800236) adapter\n"
 	" 7 = One For All JP1 parallel port adapter\n"
+	" 8 = VCT-jig\n"
 );
-- 
cgit v1.2.3


From 8ab7f089ec003f817f74c45a2563ed40a50de208 Mon Sep 17 00:00:00 2001
From: Denis Carikli <denis@eukrea.com>
Date: Thu, 23 Jul 2015 12:03:47 +0200
Subject: DT: i2c: Add ADS7828 and ADS7830 to list of trivial devices

This adds devicetree documentation for the bindings of the
ads7828 driver.

Signed-off-by: Denis Carikli <denis@eukrea.com>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
---
 Documentation/devicetree/bindings/i2c/trivial-devices.txt | 2 ++
 1 file changed, 2 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/i2c/trivial-devices.txt b/Documentation/devicetree/bindings/i2c/trivial-devices.txt
index 00f8652e193a..d77d412cbc68 100644
--- a/Documentation/devicetree/bindings/i2c/trivial-devices.txt
+++ b/Documentation/devicetree/bindings/i2c/trivial-devices.txt
@@ -95,6 +95,8 @@ stm,m41t00		Serial Access TIMEKEEPER
 stm,m41t62		Serial real-time clock (RTC) with alarm
 stm,m41t80		M41T80 - SERIAL ACCESS RTC WITH ALARMS
 taos,tsl2550		Ambient Light Sensor with SMBUS/Two Wire Serial Interface
+ti,ads7828		8-Channels, 12-bit ADC
+ti,ads7830		8-Channels, 8-bit ADC
 ti,tsc2003		I2C Touch-Screen Controller
 ti,tmp102		Low Power Digital Temperature Sensor with SMBUS/Two Wire Serial Interface
 ti,tmp103		Low Power Digital Temperature Sensor with SMBUS/Two Wire Serial Interface
-- 
cgit v1.2.3


From 301b06f80fb120b36ad6981038ffe8a0b5039d44 Mon Sep 17 00:00:00 2001
From: Wolfram Sang <wsa@the-dreams.de>
Date: Sat, 8 Aug 2015 20:30:33 +0200
Subject: rtc: bq32k: move binding docs to proper place

The I2C dir is not for I2C client devices! Move it to the proper folder.

Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Acked-by: Pavel Machek <pavel@denx.de>
---
 Documentation/devicetree/bindings/i2c/ti,bq32k.txt | 18 ------------------
 Documentation/devicetree/bindings/rtc/ti,bq32k.txt | 18 ++++++++++++++++++
 2 files changed, 18 insertions(+), 18 deletions(-)
 delete mode 100644 Documentation/devicetree/bindings/i2c/ti,bq32k.txt
 create mode 100644 Documentation/devicetree/bindings/rtc/ti,bq32k.txt

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/i2c/ti,bq32k.txt b/Documentation/devicetree/bindings/i2c/ti,bq32k.txt
deleted file mode 100644
index e204906b9ad3..000000000000
--- a/Documentation/devicetree/bindings/i2c/ti,bq32k.txt
+++ /dev/null
@@ -1,18 +0,0 @@
-* TI BQ32000                I2C Serial Real-Time Clock
-
-Required properties:
-- compatible: Should contain "ti,bq32000".
-- reg: I2C address for chip
-
-Optional properties:
-- trickle-resistor-ohms : Selected resistor for trickle charger
-       Values usable are 1120 and 20180
-       Should be given if trickle charger should be enabled
-- trickle-diode-disable : Do not use internal trickle charger diode
-       Should be given if internal trickle charger diode should be disabled
-Example:
-       bq32000: rtc@68 {
-               compatible = "ti,bq32000";
-               trickle-resistor-ohms = <1120>;
-               reg = <0x68>;
-       };
diff --git a/Documentation/devicetree/bindings/rtc/ti,bq32k.txt b/Documentation/devicetree/bindings/rtc/ti,bq32k.txt
new file mode 100644
index 000000000000..e204906b9ad3
--- /dev/null
+++ b/Documentation/devicetree/bindings/rtc/ti,bq32k.txt
@@ -0,0 +1,18 @@
+* TI BQ32000                I2C Serial Real-Time Clock
+
+Required properties:
+- compatible: Should contain "ti,bq32000".
+- reg: I2C address for chip
+
+Optional properties:
+- trickle-resistor-ohms : Selected resistor for trickle charger
+       Values usable are 1120 and 20180
+       Should be given if trickle charger should be enabled
+- trickle-diode-disable : Do not use internal trickle charger diode
+       Should be given if internal trickle charger diode should be disabled
+Example:
+       bq32000: rtc@68 {
+               compatible = "ti,bq32000";
+               trickle-resistor-ohms = <1120>;
+               reg = <0x68>;
+       };
-- 
cgit v1.2.3


From 220c04f834f7bd76e1a33711add61735796dc7f2 Mon Sep 17 00:00:00 2001
From: Wolfram Sang <wsa@the-dreams.de>
Date: Sat, 8 Aug 2015 20:30:34 +0200
Subject: hwmon: max6697: move binding docs to proper place

The I2C dir is not for I2C client devices! Move it to the proper folder.

Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Acked-by: Guenter Roeck <linux@roeck-us.net>
---
 .../devicetree/bindings/hwmon/max6697.txt          | 64 ++++++++++++++++++++++
 Documentation/devicetree/bindings/i2c/max6697.txt  | 64 ----------------------
 2 files changed, 64 insertions(+), 64 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/hwmon/max6697.txt
 delete mode 100644 Documentation/devicetree/bindings/i2c/max6697.txt

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/hwmon/max6697.txt b/Documentation/devicetree/bindings/hwmon/max6697.txt
new file mode 100644
index 000000000000..5f793998e4a4
--- /dev/null
+++ b/Documentation/devicetree/bindings/hwmon/max6697.txt
@@ -0,0 +1,64 @@
+max6697 properties
+
+Required properties:
+- compatible:
+	Should be one of
+		maxim,max6581
+		maxim,max6602
+		maxim,max6622
+		maxim,max6636
+		maxim,max6689
+		maxim,max6693
+		maxim,max6694
+		maxim,max6697
+		maxim,max6698
+		maxim,max6699
+- reg: I2C address
+
+Optional properties:
+
+- smbus-timeout-disable
+	Set to disable SMBus timeout. If not specified, SMBus timeout will be
+	enabled.
+- extended-range-enable
+	Only valid for MAX6581. Set to enable extended temperature range.
+	Extended temperature will be disabled if not specified.
+- beta-compensation-enable
+	Only valid for MAX6693 and MX6694. Set to enable beta compensation on
+	remote temperature channel 1.
+	Beta compensation will be disabled if not specified.
+- alert-mask
+	Alert bit mask. Alert disabled for bits set.
+	Select bit 0 for local temperature, bit 1..7 for remote temperatures.
+	If not specified, alert will be enabled for all channels.
+- over-temperature-mask
+	Over-temperature bit mask. Over-temperature reporting disabled for
+	bits set.
+	Select bit 0 for local temperature, bit 1..7 for remote temperatures.
+	If not specified, over-temperature reporting will be enabled for all
+	channels.
+- resistance-cancellation
+	Boolean for all chips other than MAX6581. Set to enable resistance
+	cancellation on remote temperature channel 1.
+	For MAX6581, resistance cancellation enabled for all channels if
+	specified as boolean, otherwise as per bit mask specified.
+	Only supported for remote temperatures (bit 1..7).
+	If not specified, resistance cancellation will be disabled for all
+	channels.
+- transistor-ideality
+	For MAX6581 only. Two values; first is bit mask, second is ideality
+	select value as per MAX6581 data sheet. Select bit 1..7 for remote
+	channels.
+	Transistor ideality will be initialized to default (1.008) if not
+	specified.
+
+Example:
+
+temp-sensor@1a {
+	compatible = "maxim,max6697";
+	reg = <0x1a>;
+	smbus-timeout-disable;
+	resistance-cancellation;
+	alert-mask = <0x72>;
+	over-temperature-mask = <0x7f>;
+};
diff --git a/Documentation/devicetree/bindings/i2c/max6697.txt b/Documentation/devicetree/bindings/i2c/max6697.txt
deleted file mode 100644
index 5f793998e4a4..000000000000
--- a/Documentation/devicetree/bindings/i2c/max6697.txt
+++ /dev/null
@@ -1,64 +0,0 @@
-max6697 properties
-
-Required properties:
-- compatible:
-	Should be one of
-		maxim,max6581
-		maxim,max6602
-		maxim,max6622
-		maxim,max6636
-		maxim,max6689
-		maxim,max6693
-		maxim,max6694
-		maxim,max6697
-		maxim,max6698
-		maxim,max6699
-- reg: I2C address
-
-Optional properties:
-
-- smbus-timeout-disable
-	Set to disable SMBus timeout. If not specified, SMBus timeout will be
-	enabled.
-- extended-range-enable
-	Only valid for MAX6581. Set to enable extended temperature range.
-	Extended temperature will be disabled if not specified.
-- beta-compensation-enable
-	Only valid for MAX6693 and MX6694. Set to enable beta compensation on
-	remote temperature channel 1.
-	Beta compensation will be disabled if not specified.
-- alert-mask
-	Alert bit mask. Alert disabled for bits set.
-	Select bit 0 for local temperature, bit 1..7 for remote temperatures.
-	If not specified, alert will be enabled for all channels.
-- over-temperature-mask
-	Over-temperature bit mask. Over-temperature reporting disabled for
-	bits set.
-	Select bit 0 for local temperature, bit 1..7 for remote temperatures.
-	If not specified, over-temperature reporting will be enabled for all
-	channels.
-- resistance-cancellation
-	Boolean for all chips other than MAX6581. Set to enable resistance
-	cancellation on remote temperature channel 1.
-	For MAX6581, resistance cancellation enabled for all channels if
-	specified as boolean, otherwise as per bit mask specified.
-	Only supported for remote temperatures (bit 1..7).
-	If not specified, resistance cancellation will be disabled for all
-	channels.
-- transistor-ideality
-	For MAX6581 only. Two values; first is bit mask, second is ideality
-	select value as per MAX6581 data sheet. Select bit 1..7 for remote
-	channels.
-	Transistor ideality will be initialized to default (1.008) if not
-	specified.
-
-Example:
-
-temp-sensor@1a {
-	compatible = "maxim,max6697";
-	reg = <0x1a>;
-	smbus-timeout-disable;
-	resistance-cancellation;
-	alert-mask = <0x72>;
-	over-temperature-mask = <0x7f>;
-};
-- 
cgit v1.2.3


From 8113627c3f777473262192dfb2c693f0e1f78ef5 Mon Sep 17 00:00:00 2001
From: Wolfram Sang <wsa@the-dreams.de>
Date: Sat, 8 Aug 2015 20:30:35 +0200
Subject: hwmon: ina2xx: move binding docs to proper place

The I2C dir is not for I2C client devices! Move it to the proper folder.

Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Acked-by: Guenter Roeck <linux@roeck-us.net>
---
 Documentation/devicetree/bindings/hwmon/ina2xx.txt | 22 ++++++++++++++++++++++
 Documentation/devicetree/bindings/i2c/ina2xx.txt   | 22 ----------------------
 2 files changed, 22 insertions(+), 22 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/hwmon/ina2xx.txt
 delete mode 100644 Documentation/devicetree/bindings/i2c/ina2xx.txt

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/hwmon/ina2xx.txt b/Documentation/devicetree/bindings/hwmon/ina2xx.txt
new file mode 100644
index 000000000000..a2ad85d7e747
--- /dev/null
+++ b/Documentation/devicetree/bindings/hwmon/ina2xx.txt
@@ -0,0 +1,22 @@
+ina2xx properties
+
+Required properties:
+- compatible: Must be one of the following:
+	- "ti,ina219" for ina219
+	- "ti,ina220" for ina220
+	- "ti,ina226" for ina226
+	- "ti,ina230" for ina230
+- reg: I2C address
+
+Optional properties:
+
+- shunt-resistor
+	Shunt resistor value in micro-Ohm
+
+Example:
+
+ina220@44 {
+	compatible = "ti,ina220";
+	reg = <0x44>;
+	shunt-resistor = <1000>;
+};
diff --git a/Documentation/devicetree/bindings/i2c/ina2xx.txt b/Documentation/devicetree/bindings/i2c/ina2xx.txt
deleted file mode 100644
index a2ad85d7e747..000000000000
--- a/Documentation/devicetree/bindings/i2c/ina2xx.txt
+++ /dev/null
@@ -1,22 +0,0 @@
-ina2xx properties
-
-Required properties:
-- compatible: Must be one of the following:
-	- "ti,ina219" for ina219
-	- "ti,ina220" for ina220
-	- "ti,ina226" for ina226
-	- "ti,ina230" for ina230
-- reg: I2C address
-
-Optional properties:
-
-- shunt-resistor
-	Shunt resistor value in micro-Ohm
-
-Example:
-
-ina220@44 {
-	compatible = "ti,ina220";
-	reg = <0x44>;
-	shunt-resistor = <1000>;
-};
-- 
cgit v1.2.3


From 6e24d205a8aa78227c6f2573ed725b4517b5b1b3 Mon Sep 17 00:00:00 2001
From: Wolfram Sang <wsa@the-dreams.de>
Date: Sat, 8 Aug 2015 20:30:36 +0200
Subject: hwmon: ina209: move binding docs to proper place

The I2C dir is not for I2C client devices! Move it to the proper folder.

Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Acked-by: Guenter Roeck <linux@roeck-us.net>
---
 Documentation/devicetree/bindings/hwmon/ina209.txt | 18 ++++++++++++++++++
 Documentation/devicetree/bindings/i2c/ina209.txt   | 18 ------------------
 2 files changed, 18 insertions(+), 18 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/hwmon/ina209.txt
 delete mode 100644 Documentation/devicetree/bindings/i2c/ina209.txt

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/hwmon/ina209.txt b/Documentation/devicetree/bindings/hwmon/ina209.txt
new file mode 100644
index 000000000000..9dd2bee80840
--- /dev/null
+++ b/Documentation/devicetree/bindings/hwmon/ina209.txt
@@ -0,0 +1,18 @@
+ina209 properties
+
+Required properties:
+- compatible: Must be "ti,ina209"
+- reg: I2C address
+
+Optional properties:
+
+- shunt-resistor
+	Shunt resistor value in micro-Ohm
+
+Example:
+
+temp-sensor@4c {
+	compatible = "ti,ina209";
+	reg = <0x4c>;
+	shunt-resistor = <5000>;
+};
diff --git a/Documentation/devicetree/bindings/i2c/ina209.txt b/Documentation/devicetree/bindings/i2c/ina209.txt
deleted file mode 100644
index 9dd2bee80840..000000000000
--- a/Documentation/devicetree/bindings/i2c/ina209.txt
+++ /dev/null
@@ -1,18 +0,0 @@
-ina209 properties
-
-Required properties:
-- compatible: Must be "ti,ina209"
-- reg: I2C address
-
-Optional properties:
-
-- shunt-resistor
-	Shunt resistor value in micro-Ohm
-
-Example:
-
-temp-sensor@4c {
-	compatible = "ti,ina209";
-	reg = <0x4c>;
-	shunt-resistor = <5000>;
-};
-- 
cgit v1.2.3


From 92b7cb5dc885b38b21093eefed8028b615952965 Mon Sep 17 00:00:00 2001
From: Roger Quadros <rogerq@ti.com>
Date: Mon, 3 Aug 2015 17:40:51 +0300
Subject: extcon: palmas: Support GPIO based USB ID detection

Some palmas based chip variants do not have OTG based ID logic.
For these variants we rely on GPIO based USB ID detection.

These chips do have VBUS comparator for VBUS detection so we
continue to use the old way of detecting VBUS.

Signed-off-by: Roger Quadros <rogerq@ti.com>
Acked-by: Chanwoo Choi <cw00.choi@samsung.com>
Acked-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
---
 .../devicetree/bindings/extcon/extcon-palmas.txt   |   5 +-
 drivers/extcon/extcon-palmas.c                     | 129 ++++++++++++++++++---
 drivers/extcon/extcon-usb-gpio.c                   |   1 +
 include/linux/mfd/palmas.h                         |   7 ++
 4 files changed, 126 insertions(+), 16 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/extcon/extcon-palmas.txt b/Documentation/devicetree/bindings/extcon/extcon-palmas.txt
index 45414bbcd945..f61d5af44a27 100644
--- a/Documentation/devicetree/bindings/extcon/extcon-palmas.txt
+++ b/Documentation/devicetree/bindings/extcon/extcon-palmas.txt
@@ -10,8 +10,11 @@ Required Properties:
 
 Optional Properties:
  - ti,wakeup : To enable the wakeup comparator in probe
- - ti,enable-id-detection: Perform ID detection.
+ - ti,enable-id-detection: Perform ID detection. If id-gpio is specified
+		it performs id-detection using GPIO else using OTG core.
  - ti,enable-vbus-detection: Perform VBUS detection.
+ - id-gpio: gpio for GPIO ID detection. See gpio binding.
+ - debounce-delay-ms: debounce delay for GPIO ID pin in milliseconds.
 
 palmas-usb {
        compatible = "ti,twl6035-usb", "ti,palmas-usb";
diff --git a/drivers/extcon/extcon-palmas.c b/drivers/extcon/extcon-palmas.c
index 8933e7e0d9da..662e91778cb0 100644
--- a/drivers/extcon/extcon-palmas.c
+++ b/drivers/extcon/extcon-palmas.c
@@ -28,6 +28,10 @@
 #include <linux/mfd/palmas.h>
 #include <linux/of.h>
 #include <linux/of_platform.h>
+#include <linux/of_gpio.h>
+#include <linux/workqueue.h>
+
+#define USB_GPIO_DEBOUNCE_MS	20	/* ms */
 
 static const unsigned int palmas_extcon_cable[] = {
 	EXTCON_USB,
@@ -118,19 +122,54 @@ static irqreturn_t palmas_id_irq_handler(int irq, void *_palmas_usb)
 	return IRQ_HANDLED;
 }
 
+static void palmas_gpio_id_detect(struct work_struct *work)
+{
+	int id;
+	struct palmas_usb *palmas_usb = container_of(to_delayed_work(work),
+						     struct palmas_usb,
+						     wq_detectid);
+	struct extcon_dev *edev = palmas_usb->edev;
+
+	if (!palmas_usb->id_gpiod)
+		return;
+
+	id = gpiod_get_value_cansleep(palmas_usb->id_gpiod);
+
+	if (id) {
+		extcon_set_cable_state_(edev, EXTCON_USB_HOST, false);
+		dev_info(palmas_usb->dev, "USB-HOST cable is detached\n");
+	} else {
+		extcon_set_cable_state_(edev, EXTCON_USB_HOST, true);
+		dev_info(palmas_usb->dev, "USB-HOST cable is attached\n");
+	}
+}
+
+static irqreturn_t palmas_gpio_id_irq_handler(int irq, void *_palmas_usb)
+{
+	struct palmas_usb *palmas_usb = _palmas_usb;
+
+	queue_delayed_work(system_power_efficient_wq, &palmas_usb->wq_detectid,
+			   palmas_usb->sw_debounce_jiffies);
+
+	return IRQ_HANDLED;
+}
+
 static void palmas_enable_irq(struct palmas_usb *palmas_usb)
 {
 	palmas_write(palmas_usb->palmas, PALMAS_USB_OTG_BASE,
 		PALMAS_USB_VBUS_CTRL_SET,
 		PALMAS_USB_VBUS_CTRL_SET_VBUS_ACT_COMP);
 
-	palmas_write(palmas_usb->palmas, PALMAS_USB_OTG_BASE,
-		PALMAS_USB_ID_CTRL_SET, PALMAS_USB_ID_CTRL_SET_ID_ACT_COMP);
+	if (palmas_usb->enable_id_detection) {
+		palmas_write(palmas_usb->palmas, PALMAS_USB_OTG_BASE,
+			     PALMAS_USB_ID_CTRL_SET,
+			     PALMAS_USB_ID_CTRL_SET_ID_ACT_COMP);
 
-	palmas_write(palmas_usb->palmas, PALMAS_USB_OTG_BASE,
-		PALMAS_USB_ID_INT_EN_HI_SET,
-		PALMAS_USB_ID_INT_EN_HI_SET_ID_GND |
-		PALMAS_USB_ID_INT_EN_HI_SET_ID_FLOAT);
+		palmas_write(palmas_usb->palmas, PALMAS_USB_OTG_BASE,
+			     PALMAS_USB_ID_INT_EN_HI_SET,
+			     PALMAS_USB_ID_INT_EN_HI_SET_ID_GND |
+			     PALMAS_USB_ID_INT_EN_HI_SET_ID_FLOAT);
+	}
 
 	if (palmas_usb->enable_vbus_detection)
 		palmas_vbus_irq_handler(palmas_usb->vbus_irq, palmas_usb);
@@ -169,20 +208,36 @@ static int palmas_usb_probe(struct platform_device *pdev)
 			palmas_usb->wakeup = pdata->wakeup;
 	}
 
+	palmas_usb->id_gpiod = devm_gpiod_get_optional(&pdev->dev, "id");
+	if (IS_ERR(palmas_usb->id_gpiod)) {
+		dev_err(&pdev->dev, "failed to get id gpio\n");
+		return PTR_ERR(palmas_usb->id_gpiod);
+	}
+
+	if (palmas_usb->enable_id_detection && palmas_usb->id_gpiod) {
+		palmas_usb->enable_id_detection = false;
+		palmas_usb->enable_gpio_id_detection = true;
+	}
+
+	if (palmas_usb->enable_gpio_id_detection) {
+		u32 debounce;
+
+		if (of_property_read_u32(node, "debounce-delay-ms", &debounce))
+			debounce = USB_GPIO_DEBOUNCE_MS;
+
+		status = gpiod_set_debounce(palmas_usb->id_gpiod,
+					    debounce * 1000);
+		if (status < 0)
+			palmas_usb->sw_debounce_jiffies = msecs_to_jiffies(debounce);
+	}
+
+	INIT_DELAYED_WORK(&palmas_usb->wq_detectid, palmas_gpio_id_detect);
+
 	palmas->usb = palmas_usb;
 	palmas_usb->palmas = palmas;
 
 	palmas_usb->dev	 = &pdev->dev;
 
-	palmas_usb->id_otg_irq = regmap_irq_get_virq(palmas->irq_data,
-						PALMAS_ID_OTG_IRQ);
-	palmas_usb->id_irq = regmap_irq_get_virq(palmas->irq_data,
-						PALMAS_ID_IRQ);
-	palmas_usb->vbus_otg_irq = regmap_irq_get_virq(palmas->irq_data,
-						PALMAS_VBUS_OTG_IRQ);
-	palmas_usb->vbus_irq = regmap_irq_get_virq(palmas->irq_data,
-						PALMAS_VBUS_IRQ);
-
 	palmas_usb_wakeup(palmas, palmas_usb->wakeup);
 
 	platform_set_drvdata(pdev, palmas_usb);
@@ -201,6 +256,10 @@ static int palmas_usb_probe(struct platform_device *pdev)
 	}
 
 	if (palmas_usb->enable_id_detection) {
+		palmas_usb->id_otg_irq = regmap_irq_get_virq(palmas->irq_data,
+							     PALMAS_ID_OTG_IRQ);
+		palmas_usb->id_irq = regmap_irq_get_virq(palmas->irq_data,
+							 PALMAS_ID_IRQ);
 		status = devm_request_threaded_irq(palmas_usb->dev,
 				palmas_usb->id_irq,
 				NULL, palmas_id_irq_handler,
@@ -212,9 +271,33 @@ static int palmas_usb_probe(struct platform_device *pdev)
 					palmas_usb->id_irq, status);
 			return status;
 		}
+	} else if (palmas_usb->enable_gpio_id_detection) {
+		palmas_usb->gpio_id_irq = gpiod_to_irq(palmas_usb->id_gpiod);
+		if (palmas_usb->gpio_id_irq < 0) {
+			dev_err(&pdev->dev, "failed to get id irq\n");
+			return palmas_usb->gpio_id_irq;
+		}
+		status = devm_request_threaded_irq(&pdev->dev,
+						   palmas_usb->gpio_id_irq,
+						   NULL,
+						   palmas_gpio_id_irq_handler,
+						   IRQF_TRIGGER_RISING |
+						   IRQF_TRIGGER_FALLING |
+						   IRQF_ONESHOT,
+						   "palmas_usb_id",
+						   palmas_usb);
+		if (status < 0) {
+			dev_err(&pdev->dev,
+				"failed to request handler for id irq\n");
+			return status;
+		}
 	}
 
 	if (palmas_usb->enable_vbus_detection) {
+		palmas_usb->vbus_otg_irq = regmap_irq_get_virq(palmas->irq_data,
+						       PALMAS_VBUS_OTG_IRQ);
+		palmas_usb->vbus_irq = regmap_irq_get_virq(palmas->irq_data,
+							   PALMAS_VBUS_IRQ);
 		status = devm_request_threaded_irq(palmas_usb->dev,
 				palmas_usb->vbus_irq, NULL,
 				palmas_vbus_irq_handler,
@@ -229,10 +312,21 @@ static int palmas_usb_probe(struct platform_device *pdev)
 	}
 
 	palmas_enable_irq(palmas_usb);
+	/* perform initial detection */
+	palmas_gpio_id_detect(&palmas_usb->wq_detectid.work);
 	device_set_wakeup_capable(&pdev->dev, true);
 	return 0;
 }
 
+static int palmas_usb_remove(struct platform_device *pdev)
+{
+	struct palmas_usb *palmas_usb = platform_get_drvdata(pdev);
+
+	cancel_delayed_work_sync(&palmas_usb->wq_detectid);
+
+	return 0;
+}
+
 #ifdef CONFIG_PM_SLEEP
 static int palmas_usb_suspend(struct device *dev)
 {
@@ -243,6 +337,8 @@ static int palmas_usb_suspend(struct device *dev)
 			enable_irq_wake(palmas_usb->vbus_irq);
 		if (palmas_usb->enable_id_detection)
 			enable_irq_wake(palmas_usb->id_irq);
+		if (palmas_usb->enable_gpio_id_detection)
+			enable_irq_wake(palmas_usb->gpio_id_irq);
 	}
 	return 0;
 }
@@ -256,6 +352,8 @@ static int palmas_usb_resume(struct device *dev)
 			disable_irq_wake(palmas_usb->vbus_irq);
 		if (palmas_usb->enable_id_detection)
 			disable_irq_wake(palmas_usb->id_irq);
+		if (palmas_usb->enable_gpio_id_detection)
+			disable_irq_wake(palmas_usb->gpio_id_irq);
 	}
 	return 0;
 };
@@ -273,6 +371,7 @@ static const struct of_device_id of_palmas_match_tbl[] = {
 
 static struct platform_driver palmas_usb_driver = {
 	.probe = palmas_usb_probe,
+	.remove = palmas_usb_remove,
 	.driver = {
 		.name = "palmas-usb",
 		.of_match_table = of_palmas_match_tbl,
diff --git a/drivers/extcon/extcon-usb-gpio.c b/drivers/extcon/extcon-usb-gpio.c
index a2a44536a608..2b2fecffb1ad 100644
--- a/drivers/extcon/extcon-usb-gpio.c
+++ b/drivers/extcon/extcon-usb-gpio.c
@@ -15,6 +15,7 @@
  */
 
 #include <linux/extcon.h>
+#include <linux/gpio.h>
 #include <linux/gpio/consumer.h>
 #include <linux/init.h>
 #include <linux/interrupt.h>
diff --git a/include/linux/mfd/palmas.h b/include/linux/mfd/palmas.h
index bb270bd03eed..13e1d96935ed 100644
--- a/include/linux/mfd/palmas.h
+++ b/include/linux/mfd/palmas.h
@@ -21,6 +21,7 @@
 #include <linux/regmap.h>
 #include <linux/regulator/driver.h>
 #include <linux/extcon.h>
+#include <linux/of_gpio.h>
 #include <linux/usb/phy_companion.h>
 
 #define PALMAS_NUM_CLIENTS		3
@@ -551,10 +552,16 @@ struct palmas_usb {
 	int vbus_otg_irq;
 	int vbus_irq;
 
+	int gpio_id_irq;
+	struct gpio_desc *id_gpiod;
+	unsigned long sw_debounce_jiffies;
+	struct delayed_work wq_detectid;
+
 	enum palmas_usb_state linkstat;
 	int wakeup;
 	bool enable_vbus_detection;
 	bool enable_id_detection;
+	bool enable_gpio_id_detection;
 };
 
 #define comparator_to_palmas(x) container_of((x), struct palmas_usb, comparator)
-- 
cgit v1.2.3


From ff1a15d9e79886e253dd4a0fe5c03d130879f58e Mon Sep 17 00:00:00 2001
From: Victoria Milhoan <vicki.milhoan@freescale.com>
Date: Wed, 5 Aug 2015 11:28:42 -0700
Subject: crypto: caam - Added clocks and clock-names properties to SEC4.0
 device tree binding
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The clocks and clock-names properties describe input clocks that may be
required for enablement of CAAM.

Signed-off-by: Victoria Milhoan <vicki.milhoan@freescale.com>
Tested-by: Horia Geantă <horia.geanta@freescale.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---
 Documentation/devicetree/bindings/crypto/fsl-sec4.txt | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/crypto/fsl-sec4.txt b/Documentation/devicetree/bindings/crypto/fsl-sec4.txt
index e4022776ac6e..100307304766 100644
--- a/Documentation/devicetree/bindings/crypto/fsl-sec4.txt
+++ b/Documentation/devicetree/bindings/crypto/fsl-sec4.txt
@@ -106,6 +106,18 @@ PROPERTIES
           to the interrupt parent to which the child domain
           is being mapped.
 
+   - clocks
+      Usage: required if SEC 4.0 requires explicit enablement of clocks
+      Value type: <prop_encoded-array>
+      Definition:  A list of phandle and clock specifier pairs describing
+          the clocks required for enabling and disabling SEC 4.0.
+
+   - clock-names
+      Usage: required if SEC 4.0 requires explicit enablement of clocks
+      Value type: <string>
+      Definition: A list of clock name strings in the same order as the
+          clocks property.
+
    Note: All other standard properties (see the ePAPR) are allowed
    but are optional.
 
@@ -120,6 +132,11 @@ EXAMPLE
 		ranges = <0 0x300000 0x10000>;
 		interrupt-parent = <&mpic>;
 		interrupts = <92 2>;
+		clocks = <&clks IMX6QDL_CLK_CAAM_MEM>,
+			 <&clks IMX6QDL_CLK_CAAM_ACLK>,
+			 <&clks IMX6QDL_CLK_CAAM_IPG>,
+			 <&clks IMX6QDL_CLK_EIM_SLOW>;
+		clock-names = "mem", "aclk", "ipg", "emi_slow";
 	};
 
 =====================================================================
-- 
cgit v1.2.3


From 0e91657e331d55d4fd4d298cf9e9eecf23a3dd45 Mon Sep 17 00:00:00 2001
From: Brian Norris <computersforpeace@gmail.com>
Date: Thu, 6 Aug 2015 14:49:00 -0700
Subject: crypto: doc - make URL into hyperlink

The HTML output works a little nicer that way.

Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---
 Documentation/DocBook/crypto-API.tmpl | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/DocBook/crypto-API.tmpl b/Documentation/DocBook/crypto-API.tmpl
index 8e17a41df4c3..07df23ea06e4 100644
--- a/Documentation/DocBook/crypto-API.tmpl
+++ b/Documentation/DocBook/crypto-API.tmpl
@@ -1101,7 +1101,7 @@ kernel crypto API            |       Caller
     </para>
 
     <para>
-     [1] http://www.chronox.de/libkcapi.html
+     [1] <ulink url="http://www.chronox.de/libkcapi.html">http://www.chronox.de/libkcapi.html</ulink>
     </para>
 
    </sect1>
@@ -1661,7 +1661,7 @@ read(opfd, out, outlen);
     </para>
 
     <para>
-     [1] http://www.chronox.de/libkcapi.html
+     [1] <ulink url="http://www.chronox.de/libkcapi.html">http://www.chronox.de/libkcapi.html</ulink>
     </para>
 
    </sect1>
-- 
cgit v1.2.3


From 6bc6d0a88179b732b9a5e40e05099dc219d1b3cb Mon Sep 17 00:00:00 2001
From: Andrew Lunn <andrew@lunn.ch>
Date: Sat, 8 Aug 2015 17:09:14 +0200
Subject: dsa: Support multiple MDIO busses

When using a cluster of switches, some topologies will have an MDIO
bus per switch, not one for the whole cluster. Allow this to be
represented in the device tree, by adding an optional mii-bus property
at the switch level. The old platform_device method of instantiation
supports this already, so only the device tree binding needs extending
with an additional optional phandle.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 Documentation/devicetree/bindings/net/dsa/dsa.txt |  5 +++++
 net/dsa/dsa.c                                     | 12 +++++++++++-
 2 files changed, 16 insertions(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/net/dsa/dsa.txt b/Documentation/devicetree/bindings/net/dsa/dsa.txt
index f0b4cd72411d..9cf9a0ec333c 100644
--- a/Documentation/devicetree/bindings/net/dsa/dsa.txt
+++ b/Documentation/devicetree/bindings/net/dsa/dsa.txt
@@ -58,6 +58,10 @@ Optionnal property:
 			  Documentation/devicetree/bindings/net/ethernet.txt
 			  for details.
 
+- mii-bus		: Should be a phandle to a valid MDIO bus device node.
+			  This mii-bus will be used in preference to the
+			  global dsa,mii-bus defined above, for this switch.
+
 Optional subnodes:
 - fixed-link		: Fixed-link subnode describing a link to a non-MDIO
 			  managed entity. See
@@ -107,6 +111,7 @@ Example:
 			#address-cells = <1>;
 			#size-cells = <0>;
 			reg = <17 1>;	/* MDIO address 17, switch 1 in tree */
+			mii-bus = <&mii_bus1>;
 
 			switch1uplink: port@0 {
 				reg = <0>;
diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index b445d492c115..78d4ac97aae3 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -574,7 +574,7 @@ static int dsa_of_probe(struct device *dev)
 {
 	struct device_node *np = dev->of_node;
 	struct device_node *child, *mdio, *ethernet, *port, *link;
-	struct mii_bus *mdio_bus;
+	struct mii_bus *mdio_bus, *mdio_bus_switch;
 	struct net_device *ethernet_dev;
 	struct dsa_platform_data *pd;
 	struct dsa_chip_data *cd;
@@ -636,6 +636,16 @@ static int dsa_of_probe(struct device *dev)
 		if (!of_property_read_u32(child, "eeprom-length", &eeprom_len))
 			cd->eeprom_len = eeprom_len;
 
+		mdio = of_parse_phandle(child, "mii-bus", 0);
+		if (mdio) {
+			mdio_bus_switch = of_mdio_find_bus(mdio);
+			if (!mdio_bus_switch) {
+				ret = -EPROBE_DEFER;
+				goto out_free_chip;
+			}
+			cd->host_dev = &mdio_bus_switch->dev;
+		}
+
 		for_each_available_child_of_node(child, port) {
 			port_reg = of_get_property(port, "reg", NULL);
 			if (!port_reg)
-- 
cgit v1.2.3


From cfca3789e0678c57e09dfb1a09fdfce427b7c92e Mon Sep 17 00:00:00 2001
From: Guenter Roeck <linux@roeck-us.net>
Date: Fri, 7 Aug 2015 20:55:59 -0700
Subject: hwmon: (pmbus) Add device IDs for TPS544{B,C}2{0,5}

Add device IDs and references for Texas Instruments TPS544B20, TPS544B25,
TPS544C20, and TPS544C25 to the generic PMBus driver.

Reviewed-by: Jean Delvare <jdelvare@suse.de>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
---
 Documentation/hwmon/pmbus   | 8 ++++++--
 drivers/hwmon/pmbus/Kconfig | 3 ++-
 drivers/hwmon/pmbus/pmbus.c | 4 ++++
 3 files changed, 12 insertions(+), 3 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/hwmon/pmbus b/Documentation/hwmon/pmbus
index a3557da8f5b4..b397675e876d 100644
--- a/Documentation/hwmon/pmbus
+++ b/Documentation/hwmon/pmbus
@@ -23,11 +23,15 @@ Supported chips:
 	http://www.lineagepower.com/oem/pdf/PDT012A0X.pdf
 	http://www.lineagepower.com/oem/pdf/UDT020A0X.pdf
 	http://www.lineagepower.com/oem/pdf/MDT040A0X.pdf
-  * Texas Instruments TPS40400
-    Prefixes: 'tps40400'
+  * Texas Instruments TPS40400, TPS544B20, TPS544B25, TPS544C20, TPS544C25
+    Prefixes: 'tps40400', 'tps544b20', 'tps544b25', 'tps544c20', 'tps544c25'
     Addresses scanned: -
     Datasheets:
 	http://www.ti.com/lit/gpn/tps40400
+	http://www.ti.com/lit/gpn/tps544b20
+	http://www.ti.com/lit/gpn/tps544b25
+	http://www.ti.com/lit/gpn/tps544c20
+	http://www.ti.com/lit/gpn/tps544c25
   * Generic PMBus devices
     Prefix: 'pmbus'
     Addresses scanned: -
diff --git a/drivers/hwmon/pmbus/Kconfig b/drivers/hwmon/pmbus/Kconfig
index 6e1e6fe434f7..2b3242cc7779 100644
--- a/drivers/hwmon/pmbus/Kconfig
+++ b/drivers/hwmon/pmbus/Kconfig
@@ -20,7 +20,8 @@ config SENSORS_PMBUS
 	help
 	  If you say yes here you get hardware monitoring support for generic
 	  PMBus devices, including but not limited to ADP4000, BMR453, BMR454,
-	  MDT040, NCP4200, NCP4208, PDT003, PDT006, PDT012, UDT020, and TPS40400.
+	  MDT040, NCP4200, NCP4208, PDT003, PDT006, PDT012, TPS40400, TPS544B20,
+	  TPS544B25, TPS544C20, TPS544C25, and UDT020.
 
 	  This driver can also be built as a module. If so, the module will
 	  be called pmbus.
diff --git a/drivers/hwmon/pmbus/pmbus.c b/drivers/hwmon/pmbus/pmbus.c
index bbfa35d82e42..0a74991a60f0 100644
--- a/drivers/hwmon/pmbus/pmbus.c
+++ b/drivers/hwmon/pmbus/pmbus.c
@@ -194,6 +194,10 @@ static const struct i2c_device_id pmbus_id[] = {
 	{"pdt012", 1},
 	{"pmbus", 0},
 	{"tps40400", 1},
+	{"tps544b20", 1},
+	{"tps544b25", 1},
+	{"tps544c20", 1},
+	{"tps544c25", 1},
 	{"udt020", 1},
 	{}
 };
-- 
cgit v1.2.3


From 6951813e66e07f18c9a425c3fbd966947ea401ab Mon Sep 17 00:00:00 2001
From: Mikhail Ulyanov <mikhail.ulyanov@cogentembedded.com>
Date: Wed, 22 Jul 2015 08:23:04 -0300
Subject: [media] devicetree: bindings: Document Renesas R-Car JPEG Processing
 Unit

Add Renesas R-Car JPEG processing unit driver device tree bindings
documentation.

Signed-off-by: Mikhail Ulyanov <mikhail.ulyanov@cogentembedded.com>
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
---
 .../devicetree/bindings/media/renesas,jpu.txt      | 24 ++++++++++++++++++++++
 1 file changed, 24 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/media/renesas,jpu.txt

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/media/renesas,jpu.txt b/Documentation/devicetree/bindings/media/renesas,jpu.txt
new file mode 100644
index 000000000000..0cb94201bf92
--- /dev/null
+++ b/Documentation/devicetree/bindings/media/renesas,jpu.txt
@@ -0,0 +1,24 @@
+* Renesas JPEG Processing Unit
+
+The JPEG processing unit (JPU) incorporates the JPEG codec with an encoding
+and decoding function conforming to the JPEG baseline process, so that the JPU
+can encode image data and decode JPEG data quickly.
+
+Required properties:
+  - compatible: should containg one of the following:
+			- "renesas,jpu-r8a7790" for R-Car H2
+			- "renesas,jpu-r8a7791" for R-Car M2-W
+			- "renesas,jpu-r8a7792" for R-Car V2H
+			- "renesas,jpu-r8a7793" for R-Car M2-N
+
+  - reg: Base address and length of the registers block for the JPU.
+  - interrupts: JPU interrupt specifier.
+  - clocks: A phandle + clock-specifier pair for the JPU functional clock.
+
+Example: R8A7790 (R-Car H2) JPU node
+	jpeg-codec@fe980000 {
+		compatible = "renesas,jpu-r8a7790";
+		reg = <0 0xfe980000 0 0x10300>;
+		interrupts = <0 272 IRQ_TYPE_LEVEL_HIGH>;
+		clocks = <&mstp1_clks R8A7790_CLK_JPU>;
+	};
-- 
cgit v1.2.3


From ed099f66dadd8bac93571fb28b05bdae066b39a2 Mon Sep 17 00:00:00 2001
From: Masanari Iida <standby24x7@gmail.com>
Date: Mon, 13 Jul 2015 20:36:50 -0300
Subject: [media] DocBook: Fix typo in intro.xml

This patch fix spelling typos in intro.xml.
This xml file is not created from comments within source,
I fix the xml file.

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
[hans.verkuil@cisco.com: removed mention of obsolete devfs]
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>

Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
---
 Documentation/DocBook/media/dvb/intro.xml | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/DocBook/media/dvb/intro.xml b/Documentation/DocBook/media/dvb/intro.xml
index bcc72c216402..d30751338493 100644
--- a/Documentation/DocBook/media/dvb/intro.xml
+++ b/Documentation/DocBook/media/dvb/intro.xml
@@ -163,9 +163,8 @@ are called:</para>
 <para>where N enumerates the DVB PCI cards in a system starting
 from&#x00A0;0, and M enumerates the devices of each type within each
 adapter, starting from&#x00A0;0, too. We will omit the &#8220;
-<constant>/dev/dvb/adapterN/</constant>&#8221; in the further dicussion
-of these devices. The naming scheme for the devices is the same wheter
-devfs is used or not.</para>
+<constant>/dev/dvb/adapterN/</constant>&#8221; in the further discussion
+of these devices.
 
 <para>More details about the data structures and function calls of all
 the devices are described in the following chapters.</para>
-- 
cgit v1.2.3


From 922e3c220c6a5ab4aee066a9fc9aa17fcb0e4740 Mon Sep 17 00:00:00 2001
From: S Twiss <stwiss.opensource@diasemi.com>
Date: Wed, 1 Jul 2015 14:11:33 +0100
Subject: mfd: da9062: dt: Add bindings for DA9062 driver

Add device Tree Bindings for the DA9062 driver

Signed-off-by: Steve Twiss <stwiss.opensource@diasemi.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
---
 Documentation/devicetree/bindings/mfd/da9062.txt | 79 ++++++++++++++++++++++++
 1 file changed, 79 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/mfd/da9062.txt

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/mfd/da9062.txt b/Documentation/devicetree/bindings/mfd/da9062.txt
new file mode 100644
index 000000000000..5765ed991537
--- /dev/null
+++ b/Documentation/devicetree/bindings/mfd/da9062.txt
@@ -0,0 +1,79 @@
+* Dialog DA9062 Power Management Integrated Circuit (PMIC)
+
+DA9062 consists of a large and varied group of sub-devices:
+
+Device                   Supply Names    Description
+------                   ------------    -----------
+da9062-regulator        :               : LDOs & BUCKs
+da9062-watchdog         :               : Watchdog Timer
+
+======
+
+Required properties:
+
+- compatible : Should be "dlg,da9062".
+- reg : Specifies the I2C slave address (this defaults to 0x58 but it can be
+  modified to match the chip's OTP settings).
+- interrupt-parent : Specifies the reference to the interrupt controller for
+  the DA9062.
+- interrupts : IRQ line information.
+- interrupt-controller
+
+See Documentation/devicetree/bindings/interrupt-controller/interrupts.txt for
+further information on IRQ bindings.
+
+Sub-nodes:
+
+- regulators : This node defines the settings for the LDOs and BUCKs. The
+  DA9062 regulators are bound using their names listed below:
+
+    buck1    : BUCK_1
+    buck2    : BUCK_2
+    buck3    : BUCK_3
+    buck4    : BUCK_4
+    ldo1     : LDO_1
+    ldo2     : LDO_2
+    ldo3     : LDO_3
+    ldo4     : LDO_4
+
+  The component follows the standard regulator framework and the bindings
+  details of individual regulator device can be found in:
+  Documentation/devicetree/bindings/regulator/regulator.txt
+
+
+- watchdog: This node defines the settings for the watchdog driver associated
+  with the DA9062 PMIC. The compatible = "dlg,da9062-watchdog" should be added
+  if a node is created.
+
+
+Example:
+
+	pmic0: da9062@58 {
+		compatible = "dlg,da9062";
+		reg = <0x58>;
+		interrupt-parent = <&gpio6>;
+		interrupts = <11 IRQ_TYPE_LEVEL_LOW>;
+		interrupt-controller;
+
+		watchdog {
+			compatible = "dlg,da9062-watchdog";
+		};
+
+		regulators {
+			DA9062_BUCK1: buck1 {
+				regulator-name = "BUCK1";
+				regulator-min-microvolt = <300000>;
+				regulator-max-microvolt = <1570000>;
+				regulator-min-microamp = <500000>;
+				regulator-max-microamp = <2000000>;
+				regulator-boot-on;
+			};
+			DA9062_LDO1: ldo1 {
+				regulator-name = "LDO_1";
+				regulator-min-microvolt = <900000>;
+				regulator-max-microvolt = <3600000>;
+				regulator-boot-on;
+			};
+		};
+	};
+
-- 
cgit v1.2.3


From 13e72a49376efb9c743578b707033f1cb8865a55 Mon Sep 17 00:00:00 2001
From: Hans de Goede <hdegoede@redhat.com>
Date: Sat, 11 Jul 2015 14:59:55 +0200
Subject: mfd: axp20x: Add binding documentation for AXP152 PMIC

Add devicetree binding documentation for the AXP152 PMIC, this is a
stripped down version of the AXP202 PMIC with the battery charging
function removed.

Signed-off-by: Michal Suchanek <hramrach@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
---
 Documentation/devicetree/bindings/mfd/axp20x.txt | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/mfd/axp20x.txt b/Documentation/devicetree/bindings/mfd/axp20x.txt
index 753f14f46e85..41811223e5be 100644
--- a/Documentation/devicetree/bindings/mfd/axp20x.txt
+++ b/Documentation/devicetree/bindings/mfd/axp20x.txt
@@ -1,12 +1,14 @@
 AXP family PMIC device tree bindings
 
 The axp20x family current members :
+axp152 (X-Powers)
 axp202 (X-Powers)
 axp209 (X-Powers)
 axp221 (X-Powers)
 
 Required properties:
-- compatible: "x-powers,axp202", "x-powers,axp209", "x-powers,axp221"
+- compatible: "x-powers,axp152", "x-powers,axp202", "x-powers,axp209",
+	      "x-powers,axp221"
 - reg: The I2C slave address for the AXP chip
 - interrupt-parent: The parent interrupt controller
 - interrupts: SoC NMI / GPIO interrupt connected to the PMIC's IRQ pin
-- 
cgit v1.2.3


From 34885237bf7d744433d3c4558a852e20fb2c2837 Mon Sep 17 00:00:00 2001
From: Chris Zhong <zyw@rock-chips.com>
Date: Mon, 20 Jul 2015 17:23:52 +0800
Subject: mfd: rk808: dt: Add the description about dvs gpio for rk808

add the description about dvs1, dvs2, and add the example.

Signed-off-by: Chris Zhong <zyw@rock-chips.com>
Reviewed-by: Doug Anderson <dianders@chromium.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
---
 Documentation/devicetree/bindings/mfd/rk808.txt | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/mfd/rk808.txt b/Documentation/devicetree/bindings/mfd/rk808.txt
index 9e6e2592e5c8..4ca6aab4273a 100644
--- a/Documentation/devicetree/bindings/mfd/rk808.txt
+++ b/Documentation/devicetree/bindings/mfd/rk808.txt
@@ -24,6 +24,10 @@ Optional properties:
 - vcc10-supply: The input supply for LDO_REG6
 - vcc11-supply: The input supply for LDO_REG8
 - vcc12-supply: The input supply for SWITCH_REG2
+- dvs-gpios:  buck1/2 can be controlled by gpio dvs, this is GPIO specifiers
+  for 2 host gpio's used for dvs. The format of the gpio specifier depends in
+  the gpio controller. If DVS GPIOs aren't present, voltage changes will happen
+  very quickly with no slow ramp time.
 
 Regulators: All the regulators of RK808 to be instantiated shall be
 listed in a child node named 'regulators'. Each regulator is represented
@@ -55,7 +59,9 @@ Example:
 		interrupt-parent = <&gpio0>;
 		interrupts = <4 IRQ_TYPE_LEVEL_LOW>;
 		pinctrl-names = "default";
-		pinctrl-0 = <&pmic_int>;
+		pinctrl-0 = <&pmic_int &dvs_1 &dvs_2>;
+		dvs-gpios = <&gpio7 11 GPIO_ACTIVE_HIGH>,
+			    <&gpio7 15 GPIO_ACTIVE_HIGH>;
 		reg = <0x1b>;
 		rockchip,system-power-controller;
 		wakeup-source;
-- 
cgit v1.2.3


From 2f5532a231de3bf9d6e87623f0b307362946e52d Mon Sep 17 00:00:00 2001
From: S Twiss <stwiss.opensource@diasemi.com>
Date: Tue, 21 Jul 2015 11:29:06 +0100
Subject: devicetree: da9062: Add device tree bindings for DA9062 RTC

Add device tree bindings for the DA9062 RTC driver component

Signed-off-by: Steve Twiss <stwiss.opensource@diasemi.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
---
 Documentation/devicetree/bindings/mfd/da9062.txt | 9 +++++++++
 1 file changed, 9 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/mfd/da9062.txt b/Documentation/devicetree/bindings/mfd/da9062.txt
index 5765ed991537..38802b54d48a 100644
--- a/Documentation/devicetree/bindings/mfd/da9062.txt
+++ b/Documentation/devicetree/bindings/mfd/da9062.txt
@@ -5,6 +5,7 @@ DA9062 consists of a large and varied group of sub-devices:
 Device                   Supply Names    Description
 ------                   ------------    -----------
 da9062-regulator        :               : LDOs & BUCKs
+da9062-rtc              :               : Real-Time Clock
 da9062-watchdog         :               : Watchdog Timer
 
 ======
@@ -41,6 +42,10 @@ Sub-nodes:
   Documentation/devicetree/bindings/regulator/regulator.txt
 
 
+- rtc : This node defines settings required for the Real-Time Clock associated
+  with the DA9062. There are currently no entries in this binding, however
+  compatible = "dlg,da9062-rtc" should be added if a node is created.
+
 - watchdog: This node defines the settings for the watchdog driver associated
   with the DA9062 PMIC. The compatible = "dlg,da9062-watchdog" should be added
   if a node is created.
@@ -55,6 +60,10 @@ Example:
 		interrupts = <11 IRQ_TYPE_LEVEL_LOW>;
 		interrupt-controller;
 
+		rtc {
+			compatible = "dlg,da9062-rtc";
+		};
+
 		watchdog {
 			compatible = "dlg,da9062-watchdog";
 		};
-- 
cgit v1.2.3


From d9c93f5de8c84ccbb58fd88a530b730643fbb129 Mon Sep 17 00:00:00 2001
From: Boris Brezillon <boris.brezillon@free-electrons.com>
Date: Fri, 31 Jul 2015 18:45:54 +0200
Subject: mfd: atmel-hlcdc: Add support for new SoCs

Add the compatible strings for the at91sam9x5, at91sam9n12, sama5d4 and
sama5d2 SoCs.
Update the documentation accordingly.

Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
---
 Documentation/devicetree/bindings/mfd/atmel-hlcdc.txt | 4 ++++
 drivers/mfd/atmel-hlcdc.c                             | 4 ++++
 2 files changed, 8 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/mfd/atmel-hlcdc.txt b/Documentation/devicetree/bindings/mfd/atmel-hlcdc.txt
index f64de95a8e8b..ad5d90482a0e 100644
--- a/Documentation/devicetree/bindings/mfd/atmel-hlcdc.txt
+++ b/Documentation/devicetree/bindings/mfd/atmel-hlcdc.txt
@@ -2,7 +2,11 @@ Device-Tree bindings for Atmel's HLCDC (High LCD Controller) MFD driver
 
 Required properties:
  - compatible: value should be one of the following:
+   "atmel,at91sam9n12-hlcdc"
+   "atmel,at91sam9x5-hlcdc"
+   "atmel,sama5d2-hlcdc"
    "atmel,sama5d3-hlcdc"
+   "atmel,sama5d4-hlcdc"
  - reg: base address and size of the HLCDC device registers.
  - clock-names: the name of the 3 clocks requested by the HLCDC device.
    Should contain "periph_clk", "sys_clk" and "slow_clk".
diff --git a/drivers/mfd/atmel-hlcdc.c b/drivers/mfd/atmel-hlcdc.c
index d1d6e8f47993..3fff6b5d0426 100644
--- a/drivers/mfd/atmel-hlcdc.c
+++ b/drivers/mfd/atmel-hlcdc.c
@@ -141,7 +141,11 @@ static int atmel_hlcdc_remove(struct platform_device *pdev)
 }
 
 static const struct of_device_id atmel_hlcdc_match[] = {
+	{ .compatible = "atmel,at91sam9n12-hlcdc" },
+	{ .compatible = "atmel,at91sam9x5-hlcdc" },
+	{ .compatible = "atmel,sama5d2-hlcdc" },
 	{ .compatible = "atmel,sama5d3-hlcdc" },
+	{ .compatible = "atmel,sama5d4-hlcdc" },
 	{ /* sentinel */ },
 };
 
-- 
cgit v1.2.3


From ff066f731a71409a3c29d8bd1d702867167c6994 Mon Sep 17 00:00:00 2001
From: Constantine Shulyupin <const@MakeLinux.com>
Date: Thu, 6 Aug 2015 01:20:56 +0300
Subject: of: Add vendor prefix for Nuvoton

Signed-off-by: Constantine Shulyupin <const@MakeLinux.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
---
 Documentation/devicetree/bindings/vendor-prefixes.txt | 1 +
 1 file changed, 1 insertion(+)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/vendor-prefixes.txt b/Documentation/devicetree/bindings/vendor-prefixes.txt
index d444757c4d9e..217ece8e088b 100644
--- a/Documentation/devicetree/bindings/vendor-prefixes.txt
+++ b/Documentation/devicetree/bindings/vendor-prefixes.txt
@@ -146,6 +146,7 @@ netlogic	Broadcom Corporation (formerly NetLogic Microsystems)
 newhaven	Newhaven Display International
 nintendo	Nintendo
 nokia	Nokia
+nuvoton	Nuvoton Technology Corporation
 nvidia	NVIDIA
 nxp	NXP Semiconductors
 onnn	ON Semiconductor Corp.
-- 
cgit v1.2.3


From f3ff96e90733c24dc2f7e510822eb177d22683ed Mon Sep 17 00:00:00 2001
From: Frank Li <Frank.Li@freescale.com>
Date: Fri, 10 Jul 2015 02:09:43 +0800
Subject: Document: dt: binding: imx: update document for imx6ul support

This part just add necessary change to boot imx6ul.
Update clock and pinctrl for imx6ul

Signed-off-by: Frank Li <Frank.Li@freescale.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
---
 .../devicetree/bindings/clock/imx6ul-clock.txt     | 13 ++++++++
 .../bindings/pinctrl/fsl,imx6ul-pinctrl.txt        | 36 ++++++++++++++++++++++
 2 files changed, 49 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/clock/imx6ul-clock.txt
 create mode 100644 Documentation/devicetree/bindings/pinctrl/fsl,imx6ul-pinctrl.txt

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/clock/imx6ul-clock.txt b/Documentation/devicetree/bindings/clock/imx6ul-clock.txt
new file mode 100644
index 000000000000..571d5039f663
--- /dev/null
+++ b/Documentation/devicetree/bindings/clock/imx6ul-clock.txt
@@ -0,0 +1,13 @@
+* Clock bindings for Freescale i.MX6 UltraLite
+
+Required properties:
+- compatible: Should be "fsl,imx6ul-ccm"
+- reg: Address and length of the register set
+- #clock-cells: Should be <1>
+- clocks: list of clock specifiers, must contain an entry for each required
+  entry in clock-names
+- clock-names: should include entries "ckil", "osc", "ipp_di0" and "ipp_di1"
+
+The clock consumer should specify the desired clock by having the clock
+ID in its "clocks" phandle cell.  See include/dt-bindings/clock/imx6ul-clock.h
+for the full list of i.MX6 UltraLite clock IDs.
diff --git a/Documentation/devicetree/bindings/pinctrl/fsl,imx6ul-pinctrl.txt b/Documentation/devicetree/bindings/pinctrl/fsl,imx6ul-pinctrl.txt
new file mode 100644
index 000000000000..a81bbf37ed66
--- /dev/null
+++ b/Documentation/devicetree/bindings/pinctrl/fsl,imx6ul-pinctrl.txt
@@ -0,0 +1,36 @@
+* Freescale i.MX6 UltraLite IOMUX Controller
+
+Please refer to fsl,imx-pinctrl.txt in this directory for common binding part
+and usage.
+
+Required properties:
+- compatible: "fsl,imx6ul-iomuxc"
+- fsl,pins: each entry consists of 6 integers and represents the mux and config
+  setting for one pin.  The first 5 integers <mux_reg conf_reg input_reg mux_val
+  input_val> are specified using a PIN_FUNC_ID macro, which can be found in
+  imx6ul-pinfunc.h under device tree source folder.  The last integer CONFIG is
+  the pad setting value like pull-up on this pin.  Please refer to i.MX6 UltraLite
+  Reference Manual for detailed CONFIG settings.
+
+CONFIG bits definition:
+PAD_CTL_HYS                     (1 << 16)
+PAD_CTL_PUS_100K_DOWN           (0 << 14)
+PAD_CTL_PUS_47K_UP              (1 << 14)
+PAD_CTL_PUS_100K_UP             (2 << 14)
+PAD_CTL_PUS_22K_UP              (3 << 14)
+PAD_CTL_PUE                     (1 << 13)
+PAD_CTL_PKE                     (1 << 12)
+PAD_CTL_ODE                     (1 << 11)
+PAD_CTL_SPEED_LOW               (0 << 6)
+PAD_CTL_SPEED_MED               (1 << 6)
+PAD_CTL_SPEED_HIGH              (3 << 6)
+PAD_CTL_DSE_DISABLE             (0 << 3)
+PAD_CTL_DSE_260ohm              (1 << 3)
+PAD_CTL_DSE_130ohm              (2 << 3)
+PAD_CTL_DSE_87ohm               (3 << 3)
+PAD_CTL_DSE_65ohm               (4 << 3)
+PAD_CTL_DSE_52ohm               (5 << 3)
+PAD_CTL_DSE_43ohm               (6 << 3)
+PAD_CTL_DSE_37ohm               (7 << 3)
+PAD_CTL_SRE_FAST                (1 << 0)
+PAD_CTL_SRE_SLOW                (0 << 0)
-- 
cgit v1.2.3


From fd26f8830979de48eb3f1c253eb9d2ee2e468eb6 Mon Sep 17 00:00:00 2001
From: Eric Anholt <eric@anholt.net>
Date: Thu, 4 Jun 2015 13:11:45 -0700
Subject: dt/bindings: Add binding for the Raspberry Pi firmware driver

This driver will provide support for calls into the firmware that will
be used by other drivers like cpufreq and vc4.

Signed-off-by: Eric Anholt <eric@anholt.net>
Acked-by: Stephen Warren <swarren@wwwdotorg.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
---
 .../bindings/arm/bcm/raspberrypi,bcm2835-firmware.txt      | 14 ++++++++++++++
 1 file changed, 14 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/arm/bcm/raspberrypi,bcm2835-firmware.txt

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/arm/bcm/raspberrypi,bcm2835-firmware.txt b/Documentation/devicetree/bindings/arm/bcm/raspberrypi,bcm2835-firmware.txt
new file mode 100644
index 000000000000..6824b3180ffb
--- /dev/null
+++ b/Documentation/devicetree/bindings/arm/bcm/raspberrypi,bcm2835-firmware.txt
@@ -0,0 +1,14 @@
+Raspberry Pi VideoCore firmware driver
+
+Required properties:
+
+- compatible:		Should be "raspberrypi,bcm2835-firmware"
+- mboxes:		Phandle to the firmware device's Mailbox.
+			  (See: ../mailbox/mailbox.txt for more information)
+
+Example:
+
+firmware {
+	compatible = "raspberrypi,bcm2835-firmware";
+	mboxes = <&mailbox>;
+};
-- 
cgit v1.2.3


From 179dd8c0348af75b02c7d72eaaf1cb179f1721ef Mon Sep 17 00:00:00 2001
From: Peter Griffin <peter.griffin@linaro.org>
Date: Thu, 30 Jul 2015 14:08:54 -0300
Subject: [media] c8sectpfe: Add DT bindings documentation for c8sectpfe driver

This patch adds the DT bindings documentation for the c8sectpfe LinuxDVB
demux driver whose IP is in the STiH407 family silicon SoC's.

Signed-off-by: Peter Griffin <peter.griffin@linaro.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
---
 .../bindings/media/stih407-c8sectpfe.txt           | 89 ++++++++++++++++++++++
 include/dt-bindings/media/c8sectpfe.h              | 12 +++
 2 files changed, 101 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/media/stih407-c8sectpfe.txt
 create mode 100644 include/dt-bindings/media/c8sectpfe.h

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/media/stih407-c8sectpfe.txt b/Documentation/devicetree/bindings/media/stih407-c8sectpfe.txt
new file mode 100644
index 000000000000..d4def767bdfe
--- /dev/null
+++ b/Documentation/devicetree/bindings/media/stih407-c8sectpfe.txt
@@ -0,0 +1,89 @@
+STMicroelectronics STi c8sectpfe binding
+============================================
+
+This document describes the c8sectpfe device bindings that is used to get transport
+stream data into the SoC on the TS pins, and into DDR for further processing.
+
+It is typically used in conjunction with one or more demodulator and tuner devices
+which converts from the RF to digital domain. Demodulators and tuners are usually
+located on an external DVB frontend card connected to SoC TS input pins.
+
+Currently 7 TS input (tsin) channels are supported on the stih407 family SoC.
+
+Required properties (controller (parent) node):
+- compatible	: Should be "stih407-c8sectpfe"
+
+- reg		: Address and length of register sets for each device in
+		  "reg-names"
+
+- reg-names	: The names of the register addresses corresponding to the
+		  registers filled in "reg":
+			- c8sectpfe: c8sectpfe registers
+			- c8sectpfe-ram: c8sectpfe internal sram
+
+- clocks	: phandle list of c8sectpfe clocks
+- clock-names	: should be "c8sectpfe"
+See: Documentation/devicetree/bindings/clock/clock-bindings.txt
+
+- pinctrl-names	: a pinctrl state named tsin%d-serial or tsin%d-parallel (where %d is tsin-num)
+		   must be defined for each tsin child node.
+- pinctrl-0	: phandle referencing pin configuration for this tsin configuration
+See: Documentation/devicetree/bindings/pinctrl/pinctrl-binding.txt
+
+
+Required properties (tsin (child) node):
+
+- tsin-num	: tsin id of the InputBlock (must be between 0 to 6)
+- i2c-bus	: phandle to the I2C bus DT node which the demodulators & tuners on this tsin channel are connected.
+- rst-gpio	: reset gpio for this tsin channel.
+
+Optional properties (tsin (child) node):
+
+- invert-ts-clk		: Bool property to control sense of ts input clock (data stored on falling edge of clk).
+- serial-not-parallel	: Bool property to configure input bus width (serial on ts_data<7>).
+- async-not-sync	: Bool property to control if data is received in asynchronous mode
+			   (all bits/bytes with ts_valid or ts_packet asserted are valid).
+
+- dvb-card		: Describes the NIM card connected to this tsin channel.
+
+Example:
+
+/* stih410 SoC b2120 + b2004a + stv0367-pll(NIMB) + stv0367-tda18212 (NIMA) DT example) */
+
+	c8sectpfe@08a20000 {
+		compatible = "st,stih407-c8sectpfe";
+		status = "okay";
+		reg = <0x08a20000 0x10000>, <0x08a00000 0x4000>;
+		reg-names = "stfe", "stfe-ram";
+		interrupts = <0 34 0>, <0 35 0>;
+		interrupt-names = "stfe-error-irq", "stfe-idle-irq";
+
+		pinctrl-names	= "tsin0-serial", "tsin0-parallel", "tsin3-serial",
+				"tsin4-serial", "tsin5-serial";
+
+		pinctrl-0	= <&pinctrl_tsin0_serial>;
+		pinctrl-1	= <&pinctrl_tsin0_parallel>;
+		pinctrl-2	= <&pinctrl_tsin3_serial>;
+		pinctrl-3	= <&pinctrl_tsin4_serial_alt3>;
+		pinctrl-4	= <&pinctrl_tsin5_serial_alt1>;
+
+		clocks = <&clk_s_c0_flexgen CLK_PROC_STFE>;
+		clock-names = "stfe";
+
+		/* tsin0 is TSA on NIMA */
+		tsin0: port@0 {
+			tsin-num		= <0>;
+			serial-not-parallel;
+			i2c-bus			= <&ssc2>;
+			rst-gpio		= <&pio15 4 0>;
+			dvb-card		= <STV0367_TDA18212_NIMA_1>;
+		};
+
+		tsin3: port@3 {
+			tsin-num		= <3>;
+			serial-not-parallel;
+			i2c-bus			= <&ssc3>;
+			rst-gpio		= <&pio15 7 0>;
+			dvb-card		= <STV0367_TDA18212_NIMB_1>;
+		};
+	};
diff --git a/include/dt-bindings/media/c8sectpfe.h b/include/dt-bindings/media/c8sectpfe.h
new file mode 100644
index 000000000000..a0b5c7be683c
--- /dev/null
+++ b/include/dt-bindings/media/c8sectpfe.h
@@ -0,0 +1,12 @@
+#ifndef __DT_C8SECTPFE_H
+#define __DT_C8SECTPFE_H
+
+#define STV0367_TDA18212_NIMA_1	0
+#define STV0367_TDA18212_NIMA_2	1
+#define STV0367_TDA18212_NIMB_1	2
+#define STV0367_TDA18212_NIMB_2	3
+
+#define STV0903_6110_LNB24_NIMA	4
+#define STV0903_6110_LNB24_NIMB	5
+
+#endif /* __DT_C8SECTPFE_H */
-- 
cgit v1.2.3


From ea9eb698b2f59e16fbf9f480a9b35ddfb3c3a789 Mon Sep 17 00:00:00 2001
From: Alexander Aring <alex.aring@gmail.com>
Date: Tue, 11 Aug 2015 21:44:10 +0200
Subject: documentation: networking: add 6lowpan documentation

This patch adds a 6lowpan.txt into the networking documentation
directory. Currently this documentation describes how the lowpan
private data of net devices will be handled.

Cc: Jonathan Corbet <corbet@lwn.net>
Cc: linux-doc@vger.kernel.org
Suggested-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
---
 Documentation/networking/6lowpan.txt | 50 ++++++++++++++++++++++++++++++++++++
 MAINTAINERS                          |  1 +
 2 files changed, 51 insertions(+)
 create mode 100644 Documentation/networking/6lowpan.txt

(limited to 'Documentation')

diff --git a/Documentation/networking/6lowpan.txt b/Documentation/networking/6lowpan.txt
new file mode 100644
index 000000000000..a7dc7e939c7a
--- /dev/null
+++ b/Documentation/networking/6lowpan.txt
@@ -0,0 +1,50 @@
+
+Netdev private dataroom for 6lowpan interfaces:
+
+All 6lowpan able net devices, means all interfaces with ARPHRD_6LOWPAN,
+must have "struct lowpan_priv" placed at beginning of netdev_priv.
+
+The priv_size of each interface should be calculate by:
+
+ dev->priv_size = LOWPAN_PRIV_SIZE(LL_6LOWPAN_PRIV_DATA);
+
+Where LL_PRIV_6LOWPAN_DATA is sizeof linklayer 6lowpan private data struct.
+To access the LL_PRIV_6LOWPAN_DATA structure you can cast:
+
+ lowpan_priv(dev)-priv;
+
+to your LL_6LOWPAN_PRIV_DATA structure.
+
+Before registering the lowpan netdev interface you must run:
+
+ lowpan_netdev_setup(dev, LOWPAN_LLTYPE_FOOBAR);
+
+wheres LOWPAN_LLTYPE_FOOBAR is a define for your 6LoWPAN linklayer type of
+enum lowpan_lltypes.
+
+Example to evaluate the private usually you can do:
+
+static inline sturct lowpan_priv_foobar *
+lowpan_foobar_priv(struct net_device *dev)
+{
+	return (sturct lowpan_priv_foobar *)lowpan_priv(dev)->priv;
+}
+
+switch (dev->type) {
+case ARPHRD_6LOWPAN:
+	lowpan_priv = lowpan_priv(dev);
+	/* do great stuff which is ARPHRD_6LOWPAN related */
+	switch (lowpan_priv->lltype) {
+	case LOWPAN_LLTYPE_FOOBAR:
+		/* do 802.15.4 6LoWPAN handling here */
+		lowpan_foobar_priv(dev)->bar = foo;
+		break;
+	...
+	}
+	break;
+...
+}
+
+In case of generic 6lowpan branch ("net/6lowpan") you can remove the check
+on ARPHRD_6LOWPAN, because you can be sure that these function are called
+by ARPHRD_6LOWPAN interfaces.
diff --git a/MAINTAINERS b/MAINTAINERS
index 98ede02a96f2..5baa91c89b3b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -158,6 +158,7 @@ L:	linux-wpan@vger.kernel.org
 S:	Maintained
 F:	net/6lowpan/
 F:	include/net/6lowpan.h
+F:	Documentation/networking/6lowpan.txt
 
 6PACK NETWORK DRIVER FOR AX.25
 M:	Andreas Koensgen <ajk@comnets.uni-bremen.de>
-- 
cgit v1.2.3


From 78bdcad05ea17fa1fe4644324b877162b1df4e16 Mon Sep 17 00:00:00 2001
From: Kishon Vijay Abraham I <kishon@ti.com>
Date: Tue, 28 Jul 2015 19:09:09 +0530
Subject: PCI: dra7xx: Add support to make GPIO drive PERST# line

The PERST# line in am57x-evm is connected to a GPIO line and PERST# should
be driven high to indicate the clocks are stable (As per Figure 2-10: Power
Up of the PCIe CEM spec 3.0).

Add support to make GPIO drive PERST# line.

Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
---
 Documentation/devicetree/bindings/pci/ti-pci.txt |  3 +++
 drivers/pci/host/pci-dra7xx.c                    | 24 ++++++++++++++++++++++--
 2 files changed, 25 insertions(+), 2 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/pci/ti-pci.txt b/Documentation/devicetree/bindings/pci/ti-pci.txt
index 3d217911b313..60e25161f351 100644
--- a/Documentation/devicetree/bindings/pci/ti-pci.txt
+++ b/Documentation/devicetree/bindings/pci/ti-pci.txt
@@ -23,6 +23,9 @@ PCIe Designware Controller
    interrupt-map-mask,
    interrupt-map : as specified in ../designware-pcie.txt
 
+Optional Property:
+ - gpios : Should be added if a gpio line is required to drive PERST# line
+
 Example:
 axi {
 	compatible = "simple-bus";
diff --git a/drivers/pci/host/pci-dra7xx.c b/drivers/pci/host/pci-dra7xx.c
index 3772aff7df4f..abb85a157ce0 100644
--- a/drivers/pci/host/pci-dra7xx.c
+++ b/drivers/pci/host/pci-dra7xx.c
@@ -17,6 +17,7 @@
 #include <linux/irqdomain.h>
 #include <linux/kernel.h>
 #include <linux/module.h>
+#include <linux/of_gpio.h>
 #include <linux/pci.h>
 #include <linux/phy/phy.h>
 #include <linux/platform_device.h>
@@ -336,6 +337,9 @@ static int __init dra7xx_pcie_probe(struct platform_device *pdev)
 	struct device *dev = &pdev->dev;
 	struct device_node *np = dev->of_node;
 	char name[10];
+	int gpio_sel;
+	enum of_gpio_flags flags;
+	unsigned long gpio_flags;
 
 	dra7xx = devm_kzalloc(dev, sizeof(*dra7xx), GFP_KERNEL);
 	if (!dra7xx)
@@ -398,6 +402,22 @@ static int __init dra7xx_pcie_probe(struct platform_device *pdev)
 		goto err_get_sync;
 	}
 
+	gpio_sel = of_get_gpio_flags(dev->of_node, 0, &flags);
+	if (gpio_is_valid(gpio_sel)) {
+		gpio_flags = (flags & OF_GPIO_ACTIVE_LOW) ?
+				GPIOF_OUT_INIT_LOW : GPIOF_OUT_INIT_HIGH;
+		ret = devm_gpio_request_one(dev, gpio_sel, gpio_flags,
+					    "pcie_reset");
+		if (ret) {
+			dev_err(&pdev->dev, "gpio%d request failed, ret %d\n",
+				gpio_sel, ret);
+			goto err_gpio;
+		}
+	} else if (gpio_sel == -EPROBE_DEFER) {
+		ret = -EPROBE_DEFER;
+		goto err_gpio;
+	}
+
 	reg = dra7xx_pcie_readl(dra7xx, PCIECTRL_DRA7XX_CONF_DEVICE_CMD);
 	reg &= ~LTSSM_EN;
 	dra7xx_pcie_writel(dra7xx, PCIECTRL_DRA7XX_CONF_DEVICE_CMD, reg);
@@ -406,11 +426,11 @@ static int __init dra7xx_pcie_probe(struct platform_device *pdev)
 
 	ret = dra7xx_add_pcie_port(dra7xx, pdev);
 	if (ret < 0)
-		goto err_add_port;
+		goto err_gpio;
 
 	return 0;
 
-err_add_port:
+err_gpio:
 	pm_runtime_put(dev);
 
 err_get_sync:
-- 
cgit v1.2.3


From 8bc964aa25e56b7445ffebffccd455f959370a16 Mon Sep 17 00:00:00 2001
From: Geert Uytterhoeven <geert+renesas@glider.be>
Date: Tue, 4 Aug 2015 14:28:03 +0200
Subject: clk: shmobile: r8a7778: Add CPG/MSTP Clock Domain support

Add Clock Domain support to the R-Car M1A Clock Pulse Generator (CPG)
driver using the generic PM Domain.  This allows to power-manage the
module clocks of SoC devices that are part of the CPG/MSTP Clock Domain
using Runtime PM, or for system suspend/resume.

SoC devices that are part of the CPG/MSTP Clock Domain and can be
power-managed through an MSTP clock should be tagged in DT with a proper
"power-domains" property.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Stephen Boyd <sboyd@codeaurora.org>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
---
 .../bindings/clock/renesas,r8a7778-cpg-clocks.txt  | 29 +++++++++++++++++++---
 arch/arm/mach-shmobile/Kconfig                     |  1 +
 drivers/clk/shmobile/clk-r8a7778.c                 |  2 ++
 3 files changed, 29 insertions(+), 3 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/clock/renesas,r8a7778-cpg-clocks.txt b/Documentation/devicetree/bindings/clock/renesas,r8a7778-cpg-clocks.txt
index 2f3747fdcf1c..e4cdaf1cb333 100644
--- a/Documentation/devicetree/bindings/clock/renesas,r8a7778-cpg-clocks.txt
+++ b/Documentation/devicetree/bindings/clock/renesas,r8a7778-cpg-clocks.txt
@@ -1,7 +1,9 @@
 * Renesas R8A7778 Clock Pulse Generator (CPG)
 
 The CPG generates core clocks for the R8A7778. It includes two PLLs and
-several fixed ratio dividers
+several fixed ratio dividers.
+The CPG also provides a Clock Domain for SoC devices, in combination with the
+CPG Module Stop (MSTP) Clocks.
 
 Required Properties:
 
@@ -10,10 +12,18 @@ Required Properties:
   - #clock-cells: Must be 1
   - clock-output-names: The names of the clocks. Supported clocks are
     "plla", "pllb", "b", "out", "p", "s", and "s1".
+  - #power-domain-cells: Must be 0
 
+SoC devices that are part of the CPG/MSTP Clock Domain and can be power-managed
+through an MSTP clock should refer to the CPG device node in their
+"power-domains" property, as documented by the generic PM domain bindings in
+Documentation/devicetree/bindings/power/power_domain.txt.
 
-Example
--------
+
+Examples
+--------
+
+  - CPG device node:
 
 	cpg_clocks: cpg_clocks@ffc80000 {
 		compatible = "renesas,r8a7778-cpg-clocks";
@@ -22,4 +32,17 @@ Example
 		clocks = <&extal_clk>;
 		clock-output-names = "plla", "pllb", "b",
 				     "out", "p", "s", "s1";
+		#power-domain-cells = <0>;
+	};
+
+
+  - CPG/MSTP Clock Domain member device node:
+
+	sdhi0: sd@ffe4c000 {
+		compatible = "renesas,sdhi-r8a7778";
+		reg = <0xffe4c000 0x100>;
+		interrupts = <0 87 IRQ_TYPE_LEVEL_HIGH>;
+		clocks = <&mstp3_clks R8A7778_CLK_SDHI0>;
+		power-domains = <&cpg_clocks>;
+		status = "disabled";
 	};
diff --git a/arch/arm/mach-shmobile/Kconfig b/arch/arm/mach-shmobile/Kconfig
index 45006479d461..e14fa5e87475 100644
--- a/arch/arm/mach-shmobile/Kconfig
+++ b/arch/arm/mach-shmobile/Kconfig
@@ -4,6 +4,7 @@ config ARCH_SHMOBILE
 
 config PM_RCAR
 	bool
+	select PM_GENERIC_DOMAINS if PM
 
 config PM_RMOBILE
 	bool
diff --git a/drivers/clk/shmobile/clk-r8a7778.c b/drivers/clk/shmobile/clk-r8a7778.c
index cb33b57274bf..fa45684e220c 100644
--- a/drivers/clk/shmobile/clk-r8a7778.c
+++ b/drivers/clk/shmobile/clk-r8a7778.c
@@ -124,6 +124,8 @@ static void __init r8a7778_cpg_clocks_init(struct device_node *np)
 	}
 
 	of_clk_add_provider(np, of_clk_src_onecell_get, &cpg->data);
+
+	cpg_mstp_add_clk_domain(np);
 }
 
 CLK_OF_DECLARE(r8a7778_cpg_clks, "renesas,r8a7778-cpg-clocks",
-- 
cgit v1.2.3


From b31fc90c14d9e584ac19983686cecab3e0764289 Mon Sep 17 00:00:00 2001
From: Geert Uytterhoeven <geert+renesas@glider.be>
Date: Tue, 4 Aug 2015 14:28:04 +0200
Subject: clk: shmobile: r8a7779: Add CPG/MSTP Clock Domain support

Add Clock Domain support to the R-Car H1 Clock Pulse Generator (CPG)
driver using the generic PM Domain.  This allows to power-manage the
module clocks of SoC devices that are part of the CPG/MSTP Clock Domain
using Runtime PM, or for system suspend/resume.

SoC devices that are part of the CPG/MSTP Clock Domain and can be
power-managed through an MSTP clock should be tagged in DT with a proper
"power-domains" property.

Also update the reg property in the DT binding doc example to match the
actual dtsi, which uses #address-cells and #size-cells == 1, not 2.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Stephen Boyd <sboyd@codeaurora.org>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
---
 .../bindings/clock/renesas,r8a7779-cpg-clocks.txt  | 30 +++++++++++++++++++---
 drivers/clk/shmobile/clk-r8a7779.c                 |  2 ++
 2 files changed, 28 insertions(+), 4 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/clock/renesas,r8a7779-cpg-clocks.txt b/Documentation/devicetree/bindings/clock/renesas,r8a7779-cpg-clocks.txt
index ed3c8cb12f4e..8c81547c29f5 100644
--- a/Documentation/devicetree/bindings/clock/renesas,r8a7779-cpg-clocks.txt
+++ b/Documentation/devicetree/bindings/clock/renesas,r8a7779-cpg-clocks.txt
@@ -1,7 +1,9 @@
 * Renesas R8A7779 Clock Pulse Generator (CPG)
 
 The CPG generates core clocks for the R8A7779. It includes one PLL and
-several fixed ratio dividers
+several fixed ratio dividers.
+The CPG also provides a Clock Domain for SoC devices, in combination with the
+CPG Module Stop (MSTP) Clocks.
 
 Required Properties:
 
@@ -12,16 +14,36 @@ Required Properties:
   - #clock-cells: Must be 1
   - clock-output-names: The names of the clocks. Supported clocks are "plla",
     "z", "zs", "s", "s1", "p", "b", "out".
+  - #power-domain-cells: Must be 0
 
+SoC devices that are part of the CPG/MSTP Clock Domain and can be power-managed
+through an MSTP clock should refer to the CPG device node in their
+"power-domains" property, as documented by the generic PM domain bindings in
+Documentation/devicetree/bindings/power/power_domain.txt.
 
-Example
--------
+
+Examples
+--------
+
+  - CPG device node:
 
 	cpg_clocks: cpg_clocks@ffc80000 {
 		compatible = "renesas,r8a7779-cpg-clocks";
-		reg = <0 0xffc80000 0 0x30>;
+		reg = <0xffc80000 0x30>;
 		clocks = <&extal_clk>;
 		#clock-cells = <1>;
 		clock-output-names = "plla", "z", "zs", "s", "s1", "p",
 		                     "b", "out";
+		#power-domain-cells = <0>;
+	};
+
+
+  - CPG/MSTP Clock Domain member device node:
+
+	sata: sata@fc600000 {
+		compatible = "renesas,sata-r8a7779", "renesas,rcar-sata";
+		reg = <0xfc600000 0x2000>;
+		interrupts = <0 100 IRQ_TYPE_LEVEL_HIGH>;
+		clocks = <&mstp1_clks R8A7779_CLK_SATA>;
+		power-domains = <&cpg_clocks>;
 	};
diff --git a/drivers/clk/shmobile/clk-r8a7779.c b/drivers/clk/shmobile/clk-r8a7779.c
index 652ecacb6daf..e42a63a2ad25 100644
--- a/drivers/clk/shmobile/clk-r8a7779.c
+++ b/drivers/clk/shmobile/clk-r8a7779.c
@@ -168,6 +168,8 @@ static void __init r8a7779_cpg_clocks_init(struct device_node *np)
 	}
 
 	of_clk_add_provider(np, of_clk_src_onecell_get, &cpg->data);
+
+	cpg_mstp_add_clk_domain(np);
 }
 CLK_OF_DECLARE(r8a7779_cpg_clks, "renesas,r8a7779-cpg-clocks",
 	       r8a7779_cpg_clocks_init);
-- 
cgit v1.2.3


From 63e05d9365dc25ae71bdde436b27c49daedf1977 Mon Sep 17 00:00:00 2001
From: Geert Uytterhoeven <geert+renesas@glider.be>
Date: Tue, 4 Aug 2015 14:28:05 +0200
Subject: clk: shmobile: rcar-gen2: Add CPG/MSTP Clock Domain support

Add Clock Domain support to the R-Car Gen2 Clock Pulse Generator (CPG)
driver using the generic PM Domain.  This allows to power-manage the
module clocks of SoC devices that are part of the CPG/MSTP Clock Domain
using Runtime PM, or for system suspend/resume.

SoC devices that are part of the CPG/MSTP Clock Domain and can be
power-managed through an MSTP clock should be tagged in DT with a proper
"power-domains" property.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Stephen Boyd <sboyd@codeaurora.org>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
---
 .../clock/renesas,rcar-gen2-cpg-clocks.txt         | 26 ++++++++++++++++++++--
 drivers/clk/shmobile/clk-rcar-gen2.c               |  2 ++
 2 files changed, 26 insertions(+), 2 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/clock/renesas,rcar-gen2-cpg-clocks.txt b/Documentation/devicetree/bindings/clock/renesas,rcar-gen2-cpg-clocks.txt
index 56f111bd3e45..2a9a8edc8f35 100644
--- a/Documentation/devicetree/bindings/clock/renesas,rcar-gen2-cpg-clocks.txt
+++ b/Documentation/devicetree/bindings/clock/renesas,rcar-gen2-cpg-clocks.txt
@@ -2,6 +2,8 @@
 
 The CPG generates core clocks for the R-Car Gen2 SoCs. It includes three PLLs
 and several fixed ratio dividers.
+The CPG also provides a Clock Domain for SoC devices, in combination with the
+CPG Module Stop (MSTP) Clocks.
 
 Required Properties:
 
@@ -20,10 +22,18 @@ Required Properties:
   - clock-output-names: The names of the clocks. Supported clocks are "main",
     "pll0", "pll1", "pll3", "lb", "qspi", "sdh", "sd0", "sd1", "z", "rcan", and
     "adsp"
+  - #power-domain-cells: Must be 0
 
+SoC devices that are part of the CPG/MSTP Clock Domain and can be power-managed
+through an MSTP clock should refer to the CPG device node in their
+"power-domains" property, as documented by the generic PM domain bindings in
+Documentation/devicetree/bindings/power/power_domain.txt.
 
-Example
--------
+
+Examples
+--------
+
+  - CPG device node:
 
 	cpg_clocks: cpg_clocks@e6150000 {
 		compatible = "renesas,r8a7790-cpg-clocks",
@@ -34,4 +44,16 @@ Example
 		clock-output-names = "main", "pll0, "pll1", "pll3",
 				     "lb", "qspi", "sdh", "sd0", "sd1", "z",
 				     "rcan", "adsp";
+		#power-domain-cells = <0>;
+	};
+
+
+  - CPG/MSTP Clock Domain member device node:
+
+	thermal@e61f0000 {
+		compatible = "renesas,thermal-r8a7790", "renesas,rcar-thermal";
+		reg = <0 0xe61f0000 0 0x14>, <0 0xe61f0100 0 0x38>;
+		interrupts = <0 69 IRQ_TYPE_LEVEL_HIGH>;
+		clocks = <&mstp5_clks R8A7790_CLK_THERMAL>;
+		power-domains = <&cpg_clocks>;
 	};
diff --git a/drivers/clk/shmobile/clk-rcar-gen2.c b/drivers/clk/shmobile/clk-rcar-gen2.c
index acfb6d7dbd6b..f2c457f494eb 100644
--- a/drivers/clk/shmobile/clk-rcar-gen2.c
+++ b/drivers/clk/shmobile/clk-rcar-gen2.c
@@ -415,6 +415,8 @@ static void __init rcar_gen2_cpg_clocks_init(struct device_node *np)
 	}
 
 	of_clk_add_provider(np, of_clk_src_onecell_get, &cpg->data);
+
+	cpg_mstp_add_clk_domain(np);
 }
 CLK_OF_DECLARE(rcar_gen2_cpg_clks, "renesas,rcar-gen2-cpg-clocks",
 	       rcar_gen2_cpg_clocks_init);
-- 
cgit v1.2.3


From f04b486d34ac6bab2aaa3988ee098b2bad3950de Mon Sep 17 00:00:00 2001
From: Geert Uytterhoeven <geert+renesas@glider.be>
Date: Tue, 4 Aug 2015 14:28:06 +0200
Subject: clk: shmobile: rz: Add CPG/MSTP Clock Domain support

Add Clock Domain support to the RZ Clock Pulse Generator (CPG) driver
using the generic PM Domain.  This allows to power-manage the module
clocks of SoC devices that are part of the CPG/MSTP Clock Domain using
Runtime PM, or for system suspend/resume.

SoC devices that are part of the CPG/MSTP Clock Domain and can be
power-managed through an MSTP clock should be tagged in DT with a proper
"power-domains" property.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Stephen Boyd <sboyd@codeaurora.org>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
---
 .../bindings/clock/renesas,rz-cpg-clocks.txt       | 29 ++++++++++++++++++++--
 arch/arm/mach-shmobile/Kconfig                     |  1 +
 drivers/clk/shmobile/clk-rz.c                      |  3 +++
 3 files changed, 31 insertions(+), 2 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/clock/renesas,rz-cpg-clocks.txt b/Documentation/devicetree/bindings/clock/renesas,rz-cpg-clocks.txt
index b0f7ddb8cdb1..bb51a33a1fbf 100644
--- a/Documentation/devicetree/bindings/clock/renesas,rz-cpg-clocks.txt
+++ b/Documentation/devicetree/bindings/clock/renesas,rz-cpg-clocks.txt
@@ -2,6 +2,8 @@
 
 The CPG generates core clocks for the RZ SoCs. It includes the PLL, variable
 CPU and GPU clocks, and several fixed ratio dividers.
+The CPG also provides a Clock Domain for SoC devices, in combination with the
+CPG Module Stop (MSTP) Clocks.
 
 Required Properties:
 
@@ -14,10 +16,18 @@ Required Properties:
   - #clock-cells: Must be 1
   - clock-output-names: The names of the clocks. Supported clocks are "pll",
     "i", and "g"
+  - #power-domain-cells: Must be 0
 
+SoC devices that are part of the CPG/MSTP Clock Domain and can be power-managed
+through an MSTP clock should refer to the CPG device node in their
+"power-domains" property, as documented by the generic PM domain bindings in
+Documentation/devicetree/bindings/power/power_domain.txt.
 
-Example
--------
+
+Examples
+--------
+
+  - CPG device node:
 
 	cpg_clocks: cpg_clocks@fcfe0000 {
 		#clock-cells = <1>;
@@ -26,4 +36,19 @@ Example
 		reg = <0xfcfe0000 0x18>;
 		clocks = <&extal_clk>, <&usb_x1_clk>;
 		clock-output-names = "pll", "i", "g";
+		#power-domain-cells = <0>;
+	};
+
+
+  - CPG/MSTP Clock Domain member device node:
+
+	mtu2: timer@fcff0000 {
+		compatible = "renesas,mtu2-r7s72100", "renesas,mtu2";
+		reg = <0xfcff0000 0x400>;
+		interrupts = <0 107 IRQ_TYPE_LEVEL_HIGH>;
+		interrupt-names = "tgi0a";
+		clocks = <&mstp3_clks R7S72100_CLK_MTU2>;
+		clock-names = "fck";
+		power-domains = <&cpg_clocks>;
+		status = "disabled";
 	};
diff --git a/arch/arm/mach-shmobile/Kconfig b/arch/arm/mach-shmobile/Kconfig
index e14fa5e87475..34eac88a9889 100644
--- a/arch/arm/mach-shmobile/Kconfig
+++ b/arch/arm/mach-shmobile/Kconfig
@@ -51,6 +51,7 @@ config ARCH_EMEV2
 
 config ARCH_R7S72100
 	bool "RZ/A1H (R7S72100)"
+	select PM_GENERIC_DOMAINS if PM
 	select SYS_SUPPORTS_SH_MTU2
 
 config ARCH_R8A73A4
diff --git a/drivers/clk/shmobile/clk-rz.c b/drivers/clk/shmobile/clk-rz.c
index 7e68e8630962..9766e3cb595f 100644
--- a/drivers/clk/shmobile/clk-rz.c
+++ b/drivers/clk/shmobile/clk-rz.c
@@ -10,6 +10,7 @@
  */
 
 #include <linux/clk-provider.h>
+#include <linux/clk/shmobile.h>
 #include <linux/init.h>
 #include <linux/kernel.h>
 #include <linux/of.h>
@@ -99,5 +100,7 @@ static void __init rz_cpg_clocks_init(struct device_node *np)
 	}
 
 	of_clk_add_provider(np, of_clk_src_onecell_get, &cpg->data);
+
+	cpg_mstp_add_clk_domain(np);
 }
 CLK_OF_DECLARE(rz_cpg_clks, "renesas,rz-cpg-clocks", rz_cpg_clocks_init);
-- 
cgit v1.2.3


From e69948a0a5309f3ef5715cb4ca7a9bd77d64e2cf Mon Sep 17 00:00:00 2001
From: Alexander Duyck <alexander.h.duyck@redhat.com>
Date: Tue, 11 Aug 2015 13:35:01 -0700
Subject: net: Document xfrm4_gc_thresh and xfrm6_gc_thresh

This change adds documentation for xfrm4_gc_thresh and xfrm6_gc_thresh
based on the comments in commit eeb1b73378b56 ("xfrm: Increase the garbage
collector threshold").

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 Documentation/networking/ip-sysctl.txt | 10 ++++++++++
 1 file changed, 10 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 56db1efd7189..46e88ed7f41d 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -1181,6 +1181,11 @@ tag - INTEGER
 	Allows you to write a number, which can be used as required.
 	Default value is 0.
 
+xfrm4_gc_thresh - INTEGER
+	The threshold at which we will start garbage collecting for IPv4
+	destination cache entries.  At twice this value the system will
+	refuse new allocations.
+
 Alexey Kuznetsov.
 kuznet@ms2.inr.ac.ru
 
@@ -1617,6 +1622,11 @@ ratelimit - INTEGER
 	otherwise the minimal space between responses in milliseconds.
 	Default: 1000
 
+xfrm6_gc_thresh - INTEGER
+	The threshold at which we will start garbage collecting for IPv6
+	destination cache entries.  At twice this value the system will
+	refuse new allocations.
+
 
 IPv6 Update by:
 Pekka Savola <pekkas@netcore.fi>
-- 
cgit v1.2.3


From 05743b3a09e905b4b09681639fa386feb6a96b5a Mon Sep 17 00:00:00 2001
From: Keerthy <j-keerthy@ti.com>
Date: Fri, 7 Aug 2015 10:37:19 +0530
Subject: ARM: dts: AM4372: Add the am4372-rtc compatible string

am4372-rtc string was already part of dts, introduced to identify
the rtc specific to am4372 family of SoCs. It was removed in one of the
previous patches. Adding back the same with appropriate documentation.

Signed-off-by: Keerthy <j-keerthy@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
---
 Documentation/devicetree/bindings/rtc/rtc-omap.txt | 1 +
 arch/arm/boot/dts/am4372.dtsi                      | 3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/rtc/rtc-omap.txt b/Documentation/devicetree/bindings/rtc/rtc-omap.txt
index 4ba4dbd34289..43a83668673a 100644
--- a/Documentation/devicetree/bindings/rtc/rtc-omap.txt
+++ b/Documentation/devicetree/bindings/rtc/rtc-omap.txt
@@ -8,6 +8,7 @@ Required properties:
 			    Wakeup generation for event Alarm. It can also be
 			    used to control an external PMIC via the
 			    pmic_power_en pin.
+	- "ti,am4372-rtc" - for RTC IP used similar to that on AM437X SoC family.
 - reg: Address range of rtc register set
 - interrupts: rtc timer, alarm interrupts in order
 - interrupt-parent: phandle for the interrupt controller
diff --git a/arch/arm/boot/dts/am4372.dtsi b/arch/arm/boot/dts/am4372.dtsi
index 15fbc56596cc..6bc8b49bfaaf 100644
--- a/arch/arm/boot/dts/am4372.dtsi
+++ b/arch/arm/boot/dts/am4372.dtsi
@@ -330,7 +330,8 @@
 		};
 
 		rtc: rtc@44e3e000 {
-			compatible = "ti,am3352-rtc", "ti,da830-rtc";
+			compatible = "ti,am4372-rtc", "ti,am3352-rtc",
+				     "ti,da830-rtc";
 			reg = <0x44e3e000 0x1000>;
 			interrupts = <GIC_SPI 75 IRQ_TYPE_LEVEL_HIGH
 				      GIC_SPI 76 IRQ_TYPE_LEVEL_HIGH>;
-- 
cgit v1.2.3


From 228c37ff980f5643401a1667f5ab7c6f38602cf8 Mon Sep 17 00:00:00 2001
From: David Howells <dhowells@redhat.com>
Date: Tue, 11 Aug 2015 12:38:54 +0100
Subject: sign-file: Document dependency on OpenSSL devel libraries

The revised sign-file program is no longer a script that wraps the openssl
program, but now rather a program that makes use of OpenSSL's crypto
library.  This means that to build the sign-file program, the kernel build
process now has a dependency on the OpenSSL development packages in
addition to OpenSSL itself.

Document this in Kconfig and in module-signing.txt.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: David Woodhouse <David.Woodhouse@intel.com>
---
 Documentation/module-signing.txt | 3 +++
 init/Kconfig                     | 4 ++++
 2 files changed, 7 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/module-signing.txt b/Documentation/module-signing.txt
index 4e62bc29666e..02a9baf1c72f 100644
--- a/Documentation/module-signing.txt
+++ b/Documentation/module-signing.txt
@@ -111,6 +111,9 @@ This has a number of options available:
      additional certificates which will be included in the system keyring by
      default.
 
+Note that enabling module signing adds a dependency on the OpenSSL devel
+packages to the kernel build processes for the tool that does the signing.
+
 
 =======================
 GENERATING SIGNING KEYS
diff --git a/init/Kconfig b/init/Kconfig
index 62b725653c36..5d1a703663ad 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1897,6 +1897,10 @@ config MODULE_SIG
 	  is simply appended to the module. For more information see
 	  Documentation/module-signing.txt.
 
+	  Note that this option adds the OpenSSL development packages as a
+	  kernel build dependency so that the signing tool can use its crypto
+	  library.
+
 	  !!!WARNING!!!  If you enable this option, you MUST make sure that the
 	  module DOES NOT get stripped after being signed.  This includes the
 	  debuginfo strip done by some packagers (such as rpmbuild) and
-- 
cgit v1.2.3


From bf89386f166b18c21b2b7a2c5b6e496726c4f25f Mon Sep 17 00:00:00 2001
From: Guenter Roeck <linux@roeck-us.net>
Date: Mon, 8 Jun 2015 10:29:45 -0700
Subject: hwmon: (ltc2978) Add support for LTC3882

LTC3882 is mostly compatible with LTC3880. Major differences are that it
does not measure the input current, and it no longer supports LTC's legacy
mechanism to identify the chip.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
---
 .../devicetree/bindings/hwmon/ltc2978.txt          |  3 +-
 Documentation/hwmon/ltc2978                        | 26 ++++++-----
 drivers/hwmon/pmbus/ltc2978.c                      | 50 +++++++++++++++++++---
 3 files changed, 62 insertions(+), 17 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/hwmon/ltc2978.txt b/Documentation/devicetree/bindings/hwmon/ltc2978.txt
index ed2f09dc2483..230389f6c984 100644
--- a/Documentation/devicetree/bindings/hwmon/ltc2978.txt
+++ b/Documentation/devicetree/bindings/hwmon/ltc2978.txt
@@ -6,6 +6,7 @@ Required properties:
   * "lltc,ltc2977"
   * "lltc,ltc2978"
   * "lltc,ltc3880"
+  * "lltc,ltc3882"
   * "lltc,ltc3883"
   * "lltc,ltm4676"
 - reg: I2C slave address
@@ -20,7 +21,7 @@ Valid names of regulators depend on number of supplies supported per device:
   * ltc2974 : vout0 - vout3
   * ltc2977 : vout0 - vout7
   * ltc2978 : vout0 - vout7
-  * ltc3880 : vout0 - vout1
+  * ltc3880, ltc3882 : vout0 - vout1
   * ltc3883 : vout0
   * ltm4676 : vout0 - vout1
 
diff --git a/Documentation/hwmon/ltc2978 b/Documentation/hwmon/ltc2978
index 686c078bb0e0..521ee8a1135c 100644
--- a/Documentation/hwmon/ltc2978
+++ b/Documentation/hwmon/ltc2978
@@ -19,6 +19,10 @@ Supported chips:
     Prefix: 'ltc3880'
     Addresses scanned: -
     Datasheet: http://www.linear.com/product/ltc3880
+  * Linear Technology LTC3882
+    Prefix: 'ltc3882'
+    Addresses scanned: -
+    Datasheet: http://www.linear.com/product/ltc3882
   * Linear Technology LTC3883
     Prefix: 'ltc3883'
     Addresses scanned: -
@@ -34,11 +38,12 @@ Author: Guenter Roeck <linux@roeck-us.net>
 Description
 -----------
 
-LTC2974 is a quad digital power supply manager. LTC2978 is an octal power supply
-monitor. LTC2977 is a pin compatible replacement for LTC2978. LTC3880 is a dual
-output poly-phase step-down DC/DC controller. LTC3883 is a single phase
-step-down DC/DC controller. LTM4676 is a dual 13A or single 26A uModule
-regulator.
+LTC2974 is a quad digital power supply managers.
+LTC2978 is an octal power supply monitor.
+LTC2977 is a pin compatible replacement for LTC2978.
+LTC3880 and LTC3882 are dual output poly-phase step-down DC/DC controllers.
+LTC3883 is a single phase step-down DC/DC controller.
+LTM4676 is a dual 13A or single 26A uModule regulator.
 
 
 Usage Notes
@@ -80,7 +85,7 @@ in[N]_label		"vout[1-8]".
 			LTC2974: N=2-5
 			LTC2977: N=2-9
 			LTC2978: N=2-9
-			LTC3880, LTM4676: N=2-3
+			LTC3880, LTC3882, LTM4676: N=2-3
 			LTC3883: N=2
 in[N]_input		Measured output voltage.
 in[N]_min		Minimum output voltage.
@@ -100,8 +105,9 @@ temp[N]_input		Measured temperature.
 			and temp5 reports the chip temperature.
 			On LTC2977 and LTC2978, only one temperature measurement
 			is supported and reports the chip temperature.
-			On LTC3880 and LTM4676, temp1 and temp2 report external
-			temperatures, and temp3 reports the chip temperature.
+			On LTC3880, LTC3882, and LTM4676, temp1 and temp2
+			report external temperatures, and temp3 reports
+			the chip temperature.
 			On LTC3883, temp1 reports an external temperature,
 			and temp2 reports the chip temperature.
 temp[N]_min		Mimimum temperature. LTC2974, LCT2977, and LTC2978 only.
@@ -128,7 +134,7 @@ power[N]_label		"pout[1-4]".
 			LTC2974: N=1-4
 			LTC2977: Not supported
 			LTC2978: Not supported
-			LTC3880, LTM4676: N=1-2
+			LTC3880, LTC3882, LTM4676: N=1-2
 			LTC3883: N=2
 power[N]_input		Measured output power.
 
@@ -143,7 +149,7 @@ curr[N]_label		"iout[1-4]".
 			LTC2974: N=1-4
 			LTC2977: not supported
 			LTC2978: not supported
-			LTC3880, LTM4676: N=2-3
+			LTC3880, LTC3882, LTM4676: N=2-3
 			LTC3883: N=2
 curr[N]_input		Measured output current.
 curr[N]_max		Maximum output current.
diff --git a/drivers/hwmon/pmbus/ltc2978.c b/drivers/hwmon/pmbus/ltc2978.c
index 7e10fbf8e133..28e1e735b5c2 100644
--- a/drivers/hwmon/pmbus/ltc2978.c
+++ b/drivers/hwmon/pmbus/ltc2978.c
@@ -25,13 +25,13 @@
 #include <linux/regulator/driver.h>
 #include "pmbus.h"
 
-enum chips { ltc2974, ltc2977, ltc2978, ltc3880, ltc3883, ltm4676 };
+enum chips { ltc2974, ltc2977, ltc2978, ltc3880, ltc3882, ltc3883, ltm4676 };
 
 /* Common for all chips */
 #define LTC2978_MFR_VOUT_PEAK		0xdd
 #define LTC2978_MFR_VIN_PEAK		0xde
 #define LTC2978_MFR_TEMPERATURE_PEAK	0xdf
-#define LTC2978_MFR_SPECIAL_ID		0xe7
+#define LTC2978_MFR_SPECIAL_ID		0xe7	/* Not on LTC3882 */
 
 /* LTC2974, LCT2977, and LTC2978 */
 #define LTC2978_MFR_VOUT_MIN		0xfb
@@ -42,7 +42,7 @@ enum chips { ltc2974, ltc2977, ltc2978, ltc3880, ltc3883, ltm4676 };
 #define LTC2974_MFR_IOUT_PEAK		0xd7
 #define LTC2974_MFR_IOUT_MIN		0xd8
 
-/* LTC3880, LTC3883, and LTM4676 */
+/* LTC3880, LTC3882, LTC3883, and LTM4676 */
 #define LTC3880_MFR_IOUT_PEAK		0xd7
 #define LTC3880_MFR_CLEAR_PEAKS		0xe3
 #define LTC3880_MFR_TEMPERATURE2_PEAK	0xf4
@@ -313,7 +313,7 @@ static int ltc2978_clear_peaks(struct i2c_client *client, int page,
 {
 	int ret;
 
-	if (id == ltc3880 || id == ltc3883 || id == ltm4676)
+	if (id == ltc3880 || id == ltc3882 || id == ltc3883 || id == ltm4676)
 		ret = pmbus_write_byte(client, 0, LTC3880_MFR_CLEAR_PEAKS);
 	else
 		ret = pmbus_write_byte(client, page, PMBUS_CLEAR_FAULTS);
@@ -369,6 +369,7 @@ static const struct i2c_device_id ltc2978_id[] = {
 	{"ltc2977", ltc2977},
 	{"ltc2978", ltc2978},
 	{"ltc3880", ltc3880},
+	{"ltc3882", ltc3882},
 	{"ltc3883", ltc3883},
 	{"ltm4676", ltm4676},
 	{}
@@ -393,8 +394,30 @@ static int ltc2978_get_id(struct i2c_client *client)
 	int chip_id;
 
 	chip_id = i2c_smbus_read_word_data(client, LTC2978_MFR_SPECIAL_ID);
-	if (chip_id < 0)
-		return chip_id;
+	if (chip_id < 0) {
+		const struct i2c_device_id *id;
+		u8 buf[I2C_SMBUS_BLOCK_MAX];
+		int ret;
+
+		if (!i2c_check_functionality(client->adapter,
+					     I2C_FUNC_SMBUS_READ_BLOCK_DATA))
+			return -ENODEV;
+
+		ret = i2c_smbus_read_block_data(client, PMBUS_MFR_ID, buf);
+		if (ret < 0)
+			return ret;
+		if (ret < 3 || strncmp(buf, "LTC", 3))
+			return -ENODEV;
+
+		ret = i2c_smbus_read_block_data(client, PMBUS_MFR_MODEL, buf);
+		if (ret < 0)
+			return ret;
+		for (id = &ltc2978_id[0]; strlen(id->name); id++) {
+			if (!strncasecmp(id->name, buf, strlen(id->name)))
+				return (int)id->driver_data;
+		}
+		return -ENODEV;
+	}
 
 	if (chip_id == LTC2974_ID_REV1 || chip_id == LTC2974_ID_REV2)
 		return ltc2974;
@@ -498,6 +521,20 @@ static int ltc2978_probe(struct i2c_client *client,
 		  | PMBUS_HAVE_POUT
 		  | PMBUS_HAVE_TEMP | PMBUS_HAVE_STATUS_TEMP;
 		break;
+	case ltc3882:
+		info->read_word_data = ltc3880_read_word_data;
+		info->pages = LTC3880_NUM_PAGES;
+		info->func[0] = PMBUS_HAVE_VIN
+		  | PMBUS_HAVE_STATUS_INPUT
+		  | PMBUS_HAVE_VOUT | PMBUS_HAVE_STATUS_VOUT
+		  | PMBUS_HAVE_IOUT | PMBUS_HAVE_STATUS_IOUT
+		  | PMBUS_HAVE_POUT | PMBUS_HAVE_TEMP
+		  | PMBUS_HAVE_TEMP2 | PMBUS_HAVE_STATUS_TEMP;
+		info->func[1] = PMBUS_HAVE_VOUT | PMBUS_HAVE_STATUS_VOUT
+		  | PMBUS_HAVE_IOUT | PMBUS_HAVE_STATUS_IOUT
+		  | PMBUS_HAVE_POUT
+		  | PMBUS_HAVE_TEMP | PMBUS_HAVE_STATUS_TEMP;
+		break;
 	case ltc3883:
 		info->read_word_data = ltc3883_read_word_data;
 		info->pages = LTC3883_NUM_PAGES;
@@ -530,6 +567,7 @@ static const struct of_device_id ltc2978_of_match[] = {
 	{ .compatible = "lltc,ltc2977" },
 	{ .compatible = "lltc,ltc2978" },
 	{ .compatible = "lltc,ltc3880" },
+	{ .compatible = "lltc,ltc3882" },
 	{ .compatible = "lltc,ltc3883" },
 	{ .compatible = "lltc,ltm4676" },
 	{ }
-- 
cgit v1.2.3


From 15398566f0ea95c66d202b8705dba4f59b9ba01c Mon Sep 17 00:00:00 2001
From: Guenter Roeck <linux@roeck-us.net>
Date: Fri, 7 Aug 2015 09:06:37 -0700
Subject: hwmon: (ltc2978) Add support for LTC3887

LTC3887 is an enhanced version of LTC3880 and supports the same commands.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
---
 Documentation/devicetree/bindings/hwmon/ltc2978.txt |  1 +
 Documentation/hwmon/ltc2978                         | 19 ++++++++++++-------
 drivers/hwmon/pmbus/Kconfig                         |  3 ++-
 drivers/hwmon/pmbus/ltc2978.c                       | 19 ++++++++++++++-----
 4 files changed, 29 insertions(+), 13 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/hwmon/ltc2978.txt b/Documentation/devicetree/bindings/hwmon/ltc2978.txt
index 230389f6c984..c1d23c14ddd9 100644
--- a/Documentation/devicetree/bindings/hwmon/ltc2978.txt
+++ b/Documentation/devicetree/bindings/hwmon/ltc2978.txt
@@ -8,6 +8,7 @@ Required properties:
   * "lltc,ltc3880"
   * "lltc,ltc3882"
   * "lltc,ltc3883"
+  * "lltc,ltc3887"
   * "lltc,ltm4676"
 - reg: I2C slave address
 
diff --git a/Documentation/hwmon/ltc2978 b/Documentation/hwmon/ltc2978
index 521ee8a1135c..ff130997f22e 100644
--- a/Documentation/hwmon/ltc2978
+++ b/Documentation/hwmon/ltc2978
@@ -27,6 +27,10 @@ Supported chips:
     Prefix: 'ltc3883'
     Addresses scanned: -
     Datasheet: http://www.linear.com/product/ltc3883
+  * Linear Technology LTC3887
+    Prefix: 'ltc3887'
+    Addresses scanned: -
+    Datasheet: http://www.linear.com/product/ltc3887
   * Linear Technology LTM4676
     Prefix: 'ltm4676'
     Addresses scanned: -
@@ -41,7 +45,8 @@ Description
 LTC2974 is a quad digital power supply managers.
 LTC2978 is an octal power supply monitor.
 LTC2977 is a pin compatible replacement for LTC2978.
-LTC3880 and LTC3882 are dual output poly-phase step-down DC/DC controllers.
+LTC3880, LTC3882, and LTC3887 are dual output poly-phase step-down DC/DC
+controllers.
 LTC3883 is a single phase step-down DC/DC controller.
 LTM4676 is a dual 13A or single 26A uModule regulator.
 
@@ -85,7 +90,7 @@ in[N]_label		"vout[1-8]".
 			LTC2974: N=2-5
 			LTC2977: N=2-9
 			LTC2978: N=2-9
-			LTC3880, LTC3882, LTM4676: N=2-3
+			LTC3880, LTC3882, LTC3887, LTM4676: N=2-3
 			LTC3883: N=2
 in[N]_input		Measured output voltage.
 in[N]_min		Minimum output voltage.
@@ -105,8 +110,8 @@ temp[N]_input		Measured temperature.
 			and temp5 reports the chip temperature.
 			On LTC2977 and LTC2978, only one temperature measurement
 			is supported and reports the chip temperature.
-			On LTC3880, LTC3882, and LTM4676, temp1 and temp2
-			report external temperatures, and temp3 reports
+			On LTC3880, LTC3882, LTC3887, and LTM4676, temp1 and
+			temp2 report external temperatures, and temp3 reports
 			the chip temperature.
 			On LTC3883, temp1 reports an external temperature,
 			and temp2 reports the chip temperature.
@@ -134,11 +139,11 @@ power[N]_label		"pout[1-4]".
 			LTC2974: N=1-4
 			LTC2977: Not supported
 			LTC2978: Not supported
-			LTC3880, LTC3882, LTM4676: N=1-2
+			LTC3880, LTC3882, LTC3887, LTM4676: N=1-2
 			LTC3883: N=2
 power[N]_input		Measured output power.
 
-curr1_label		"iin". LTC3880, LTC3883, and LTM4676 only.
+curr1_label		"iin". LTC3880, LTC3883, LTC3887, and LTM4676 only.
 curr1_input		Measured input current.
 curr1_max		Maximum input current.
 curr1_max_alarm		Input current high alarm.
@@ -149,7 +154,7 @@ curr[N]_label		"iout[1-4]".
 			LTC2974: N=1-4
 			LTC2977: not supported
 			LTC2978: not supported
-			LTC3880, LTC3882, LTM4676: N=2-3
+			LTC3880, LTC3882, LTC3887, LTM4676: N=2-3
 			LTC3883: N=2
 curr[N]_input		Measured output current.
 curr[N]_max		Maximum output current.
diff --git a/drivers/hwmon/pmbus/Kconfig b/drivers/hwmon/pmbus/Kconfig
index 2b3242cc7779..43b6de900ce5 100644
--- a/drivers/hwmon/pmbus/Kconfig
+++ b/drivers/hwmon/pmbus/Kconfig
@@ -52,7 +52,8 @@ config SENSORS_LTC2978
 	default n
 	help
 	  If you say yes here you get hardware monitoring support for Linear
-	  Technology LTC2974, LTC2977, LTC2978, LTC3880, LTC3883, and LTM4676.
+	  Technology LTC2974, LTC2977, LTC2978, LTC3880, LTC3883, LTC3887,
+	  and LTM4676.
 
 	  This driver can also be built as a module. If so, the module will
 	  be called ltc2978.
diff --git a/drivers/hwmon/pmbus/ltc2978.c b/drivers/hwmon/pmbus/ltc2978.c
index acbfe3ec2ffd..0756d8ae9dad 100644
--- a/drivers/hwmon/pmbus/ltc2978.c
+++ b/drivers/hwmon/pmbus/ltc2978.c
@@ -1,6 +1,6 @@
 /*
  * Hardware monitoring driver for LTC2974, LTC2977, LTC2978, LTC3880,
- * LTC3883, and LTM4676
+ * LTC3883, LTC3887. and LTM4676
  *
  * Copyright (c) 2011 Ericsson AB.
  * Copyright (c) 2013, 2014 Guenter Roeck
@@ -25,7 +25,8 @@
 #include <linux/regulator/driver.h>
 #include "pmbus.h"
 
-enum chips { ltc2974, ltc2977, ltc2978, ltc3880, ltc3882, ltc3883, ltm4676 };
+enum chips { ltc2974, ltc2977, ltc2978, ltc3880, ltc3882, ltc3883, ltc3887,
+	     ltm4676 };
 
 /* Common for all chips */
 #define LTC2978_MFR_VOUT_PEAK		0xdd
@@ -42,7 +43,7 @@ enum chips { ltc2974, ltc2977, ltc2978, ltc3880, ltc3882, ltc3883, ltm4676 };
 #define LTC2974_MFR_IOUT_PEAK		0xd7
 #define LTC2974_MFR_IOUT_MIN		0xd8
 
-/* LTC3880, LTC3882, LTC3883, and LTM4676 */
+/* LTC3880, LTC3882, LTC3883, LTC3887, and LTM4676 */
 #define LTC3880_MFR_IOUT_PEAK		0xd7
 #define LTC3880_MFR_CLEAR_PEAKS		0xe3
 #define LTC3880_MFR_TEMPERATURE2_PEAK	0xf4
@@ -60,9 +61,11 @@ enum chips { ltc2974, ltc2977, ltc2978, ltc3880, ltc3882, ltc3883, ltm4676 };
 #define LTC3880_ID_MASK			0xff00
 #define LTC3883_ID			0x4300
 #define LTC3883_ID_MASK			0xff00
+#define LTC3887_ID			0x4700
+#define LTC3887_ID_MASK			0xff00
 #define LTM4676_ID			0x4400
 #define LTM4676_ID_2			0x4480
-#define LTM4676A_ID			0x47E0
+#define LTM4676A_ID			0x47e0
 #define LTM4676_ID_MASK			0xfff0
 
 #define LTC2974_NUM_PAGES		4
@@ -315,7 +318,8 @@ static int ltc2978_clear_peaks(struct i2c_client *client, int page,
 {
 	int ret;
 
-	if (id == ltc3880 || id == ltc3882 || id == ltc3883 || id == ltm4676)
+	if (id == ltc3880 || id == ltc3882 || id == ltc3883 || id == ltc3887 ||
+	    id == ltm4676)
 		ret = pmbus_write_byte(client, 0, LTC3880_MFR_CLEAR_PEAKS);
 	else
 		ret = pmbus_write_byte(client, page, PMBUS_CLEAR_FAULTS);
@@ -373,6 +377,7 @@ static const struct i2c_device_id ltc2978_id[] = {
 	{"ltc3880", ltc3880},
 	{"ltc3882", ltc3882},
 	{"ltc3883", ltc3883},
+	{"ltc3887", ltc3887},
 	{"ltm4676", ltm4676},
 	{}
 };
@@ -432,6 +437,8 @@ static int ltc2978_get_id(struct i2c_client *client)
 		return ltc3880;
 	else if ((chip_id & LTC3883_ID_MASK) == LTC3883_ID)
 		return ltc3883;
+	else if ((chip_id & LTC3887_ID_MASK) == LTC3887_ID)
+		return ltc3887;
 	else if ((chip_id & LTM4676_ID_MASK) == LTM4676_ID ||
 		 (chip_id & LTM4676_ID_MASK) == LTM4676_ID_2 ||
 		 (chip_id & LTM4676_ID_MASK) == LTM4676A_ID)
@@ -511,6 +518,7 @@ static int ltc2978_probe(struct i2c_client *client,
 		}
 		break;
 	case ltc3880:
+	case ltc3887:
 	case ltm4676:
 		info->read_word_data = ltc3880_read_word_data;
 		info->pages = LTC3880_NUM_PAGES;
@@ -573,6 +581,7 @@ static const struct of_device_id ltc2978_of_match[] = {
 	{ .compatible = "lltc,ltc3880" },
 	{ .compatible = "lltc,ltc3882" },
 	{ .compatible = "lltc,ltc3883" },
+	{ .compatible = "lltc,ltc3887" },
 	{ .compatible = "lltc,ltm4676" },
 	{ }
 };
-- 
cgit v1.2.3


From e8fed985d7bd6cda695e196028b54a5f3d2d91bb Mon Sep 17 00:00:00 2001
From: Rick Jones <rick.jones2@hp.com>
Date: Wed, 12 Aug 2015 10:23:14 -0700
Subject: documentation: bring vxlan documentation more up-to-date

A few things have changed since the previous version of the vxlan
documentation was written, so update it and correct some grammar and
such while we are at it.

Signed-off-by: Rick Jones <rick.jones2@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 Documentation/networking/vxlan.txt | 52 ++++++++++++++++++++------------------
 1 file changed, 28 insertions(+), 24 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/networking/vxlan.txt b/Documentation/networking/vxlan.txt
index 6d993510f091..c28f4989c3f0 100644
--- a/Documentation/networking/vxlan.txt
+++ b/Documentation/networking/vxlan.txt
@@ -1,32 +1,36 @@
 Virtual eXtensible Local Area Networking documentation
 ======================================================
 
-The VXLAN protocol is a tunnelling protocol that is designed to
-solve the problem of limited number of available VLAN's (4096).
-With VXLAN identifier is expanded to 24 bits.
-
-It is a draft RFC standard, that is implemented by Cisco Nexus,
-Vmware and Brocade. The protocol runs over UDP using a single
-destination port (still not standardized by IANA).
-This document describes the Linux kernel tunnel device,
-there is also an implantation of VXLAN for Openvswitch.
-
-Unlike most tunnels, a VXLAN is a 1 to N network, not just point
-to point. A VXLAN device can either dynamically learn the IP address
-of the other end, in a manner similar to a learning bridge, or the
-forwarding entries can be configured statically.
-
-The management of vxlan is done in a similar fashion to it's
-too closest neighbors GRE and VLAN. Configuring VXLAN requires
-the version of iproute2 that matches the kernel release
-where VXLAN was first merged upstream.
+The VXLAN protocol is a tunnelling protocol designed to solve the
+problem of limited VLAN IDs (4096) in IEEE 802.1q.  With VXLAN the
+size of the identifier is expanded to 24 bits (16777216).
+
+VXLAN is described by IETF RFC 7348, and has been implemented by a
+number of vendors.  The protocol runs over UDP using a single
+destination port.  This document describes the Linux kernel tunnel
+device, there is also a separate implementation of VXLAN for
+Openvswitch.
+
+Unlike most tunnels, a VXLAN is a 1 to N network, not just point to
+point. A VXLAN device can learn the IP address of the other endpoint
+either dynamically in a manner similar to a learning bridge, or make
+use of statically-configured forwarding entries.
+
+The management of vxlan is done in a manner similar to its two closest
+neighbors GRE and VLAN. Configuring VXLAN requires the version of
+iproute2 that matches the kernel release where VXLAN was first merged
+upstream.
 
 1. Create vxlan device
-  # ip li add vxlan0 type vxlan id 42 group 239.1.1.1 dev eth1
-
-This creates a new device (vxlan0). The device uses the
-the multicast group 239.1.1.1 over eth1 to handle packets where
-no entry is in the forwarding table.
+ # ip link add vxlan0 type vxlan id 42 group 239.1.1.1 dev eth1 dstport 4789
+
+This creates a new device named vxlan0.  The device uses the multicast
+group 239.1.1.1 over eth1 to handle traffic for which there is no
+entry in the forwarding table.  The destination port number is set to
+the IANA-assigned value of 4789.  The Linux implementation of VXLAN
+pre-dates the IANA's selection of a standard destination port number
+and uses the Linux-selected value by default to maintain backwards
+compatibility.
 
 2. Delete vxlan device
   # ip link delete vxlan0
-- 
cgit v1.2.3


From ca6bc691b1b7cf45862410952806f73c1aa62fe6 Mon Sep 17 00:00:00 2001
From: Chen-Yu Tsai <wens@csie.org>
Date: Tue, 11 Aug 2015 13:32:55 +0800
Subject: crypto: sunxi-ss - Document optional reset control bindings

Later Allwinner SoCs split out the reset controls for individual modules
out of the clock gate controls. The "Security System" crypto engine is
no different.

Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---
 Documentation/devicetree/bindings/crypto/sun4i-ss.txt | 4 ++++
 1 file changed, 4 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/crypto/sun4i-ss.txt b/Documentation/devicetree/bindings/crypto/sun4i-ss.txt
index 1e02d177fea6..5d38e9b7033f 100644
--- a/Documentation/devicetree/bindings/crypto/sun4i-ss.txt
+++ b/Documentation/devicetree/bindings/crypto/sun4i-ss.txt
@@ -9,6 +9,10 @@ Required properties:
 	* "ahb" : AHB gating clock
 	* "mod" : SS controller clock
 
+Optional properties:
+ - resets : phandle + reset specifier pair
+ - reset-names : must contain "ahb"
+
 Example:
 	crypto: crypto-engine@01c15000 {
 		compatible = "allwinner,sun4i-a10-crypto";
-- 
cgit v1.2.3


From 77a775b7ccaf86b0bb67ceaaf3b6d2720e12b506 Mon Sep 17 00:00:00 2001
From: Linus Walleij <linus.walleij@linaro.org>
Date: Mon, 10 Aug 2015 11:51:46 +0200
Subject: gpio/ABI: document what is already the case
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

commit 926b663ce8215ba448960e1ff6e58b67a2c3b99b
"gpiolib: allow GPIOs to be named" added the ability to
name GPIO lines by an array of names stored in the GPIO
chip. This was in 2009 and has been an ABI since. Let's
document it properly.

Cc: Daniel Silverstone <dsilvers@digital-scurf.org>
Cc: Markus Pargmann <mpa@pengutronix.de>
Cc: Johan Hovold <johan@kernel.org>
Cc: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
---
 Documentation/ABI/testing/sysfs-gpio | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/ABI/testing/sysfs-gpio b/Documentation/ABI/testing/sysfs-gpio
index 80f4c94c7bef..55ffa2df1c10 100644
--- a/Documentation/ABI/testing/sysfs-gpio
+++ b/Documentation/ABI/testing/sysfs-gpio
@@ -16,7 +16,8 @@ Description:
     /sys/class/gpio
 	/export ... asks the kernel to export a GPIO to userspace
 	/unexport ... to return a GPIO to the kernel
-	/gpioN ... for each exported GPIO #N
+	/gpioN ... for each exported GPIO #N OR
+	/<LINE-NAME> ... for a properly named GPIO line
 	    /value ... always readable, writes fail for input GPIOs
 	    /direction ... r/w as: in, out (default low); write: high, low
 	    /edge ... r/w as: none, falling, rising, both
-- 
cgit v1.2.3


From 2ec3182f9c20a9eef0dacc0512cf2ca2df7be5ad Mon Sep 17 00:00:00 2001
From: Dongsu Park <dpark@posteo.net>
Date: Fri, 19 Dec 2014 14:53:03 +0100
Subject: Documentation: update notes in biovecs about arbitrarily sized bios

Update block/biovecs.txt so that it includes a note on what kind of
effects arbitrarily sized bios would bring to the block layer.
Also fix a trivial typo, bio_iter_iovec.

Cc: Christoph Hellwig <hch@infradead.org>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: linux-doc@vger.kernel.org
Signed-off-by: Dongsu Park <dpark@posteo.net>
Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
---
 Documentation/block/biovecs.txt | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/block/biovecs.txt b/Documentation/block/biovecs.txt
index 74a32ad52f53..25689584e6e0 100644
--- a/Documentation/block/biovecs.txt
+++ b/Documentation/block/biovecs.txt
@@ -24,7 +24,7 @@ particular, presenting the illusion of partially completed biovecs so that
 normal code doesn't have to deal with bi_bvec_done.
 
  * Driver code should no longer refer to biovecs directly; we now have
-   bio_iovec() and bio_iovec_iter() macros that return literal struct biovecs,
+   bio_iovec() and bio_iter_iovec() macros that return literal struct biovecs,
    constructed from the raw biovecs but taking into account bi_bvec_done and
    bi_size.
 
@@ -109,3 +109,11 @@ Other implications:
    over all the biovecs in the new bio - which is silly as it's not needed.
 
    So, don't use bi_vcnt anymore.
+
+ * The current interface allows the block layer to split bios as needed, so we
+   could eliminate a lot of complexity particularly in stacked drivers. Code
+   that creates bios can then create whatever size bios are convenient, and
+   more importantly stacked drivers don't have to deal with both their own bio
+   size limitations and the limitations of the underlying devices. Thus
+   there's no need to define ->merge_bvec_fn() callbacks for individual block
+   drivers.
-- 
cgit v1.2.3


From 81db32a3c4627605a0e950d27a403ea02447ab60 Mon Sep 17 00:00:00 2001
From: Tim Bird <tim.bird@sonymobile.com>
Date: Mon, 10 Aug 2015 15:16:16 -0700
Subject: doc: Add more workqueue functions to the documentation

There are some workqueue functions declared in workqueue.h, so include
that in the workqueue section of the DocBook docs.

Signed-off-by: Tim Bird <tim.bird@sonymobile.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/DocBook/device-drivers.tmpl | 1 +
 1 file changed, 1 insertion(+)

(limited to 'Documentation')

diff --git a/Documentation/DocBook/device-drivers.tmpl b/Documentation/DocBook/device-drivers.tmpl
index faf09d4a0ea8..bbc1d7ee9c76 100644
--- a/Documentation/DocBook/device-drivers.tmpl
+++ b/Documentation/DocBook/device-drivers.tmpl
@@ -66,6 +66,7 @@
 !Ekernel/time/hrtimer.c
      </sect1>
      <sect1><title>Workqueues and Kevents</title>
+!Iinclude/linux/workqueue.h
 !Ekernel/workqueue.c
      </sect1>
      <sect1><title>Internal Functions</title>
-- 
cgit v1.2.3


From 4983953dbd038fd720e7e16eb9bad5d8ea059ccf Mon Sep 17 00:00:00 2001
From: David Drysdale <drysdale@google.com>
Date: Mon, 10 Aug 2015 09:00:44 +0100
Subject: Documentation: describe how to add a system call

Add a document describing the process of adding a new system call,
including the need for a flags argument for future compatibility, and
covering 32-bit/64-bit concerns (albeit in an x86-centric way).

Signed-off-by: David Drysdale <drysdale@google.com>
Reviewed-by: Michael Kerrisk <mtk.manpages@gmail.com>
Reviewed-by: Eric B Munson <emunson@akamai.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/adding-syscalls.txt | 527 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 527 insertions(+)
 create mode 100644 Documentation/adding-syscalls.txt

(limited to 'Documentation')

diff --git a/Documentation/adding-syscalls.txt b/Documentation/adding-syscalls.txt
new file mode 100644
index 000000000000..cc2d4ac4f404
--- /dev/null
+++ b/Documentation/adding-syscalls.txt
@@ -0,0 +1,527 @@
+Adding a New System Call
+========================
+
+This document describes what's involved in adding a new system call to the
+Linux kernel, over and above the normal submission advice in
+Documentation/SubmittingPatches.
+
+
+System Call Alternatives
+------------------------
+
+The first thing to consider when adding a new system call is whether one of
+the alternatives might be suitable instead.  Although system calls are the
+most traditional and most obvious interaction points between userspace and the
+kernel, there are other possibilities -- choose what fits best for your
+interface.
+
+ - If the operations involved can be made to look like a filesystem-like
+   object, it may make more sense to create a new filesystem or device.  This
+   also makes it easier to encapsulate the new functionality in a kernel module
+   rather than requiring it to be built into the main kernel.
+     - If the new functionality involves operations where the kernel notifies
+       userspace that something has happened, then returning a new file
+       descriptor for the relevant object allows userspace to use
+       poll/select/epoll to receive that notification.
+     - However, operations that don't map to read(2)/write(2)-like operations
+       have to be implemented as ioctl(2) requests, which can lead to a
+       somewhat opaque API.
+ - If you're just exposing runtime system information, a new node in sysfs
+   (see Documentation/filesystems/sysfs.txt) or the /proc filesystem may be
+   more appropriate.  However, access to these mechanisms requires that the
+   relevant filesystem is mounted, which might not always be the case (e.g.
+   in a namespaced/sandboxed/chrooted environment).  Avoid adding any API to
+   debugfs, as this is not considered a 'production' interface to userspace.
+ - If the operation is specific to a particular file or file descriptor, then
+   an additional fcntl(2) command option may be more appropriate.  However,
+   fcntl(2) is a multiplexing system call that hides a lot of complexity, so
+   this option is best for when the new function is closely analogous to
+   existing fcntl(2) functionality, or the new functionality is very simple
+   (for example, getting/setting a simple flag related to a file descriptor).
+ - If the operation is specific to a particular task or process, then an
+   additional prctl(2) command option may be more appropriate.  As with
+   fcntl(2), this system call is a complicated multiplexor so is best reserved
+   for near-analogs of existing prctl() commands or getting/setting a simple
+   flag related to a process.
+
+
+Designing the API: Planning for Extension
+-----------------------------------------
+
+A new system call forms part of the API of the kernel, and has to be supported
+indefinitely.  As such, it's a very good idea to explicitly discuss the
+interface on the kernel mailing list, and it's important to plan for future
+extensions of the interface.
+
+(The syscall table is littered with historical examples where this wasn't done,
+together with the corresponding follow-up system calls -- eventfd/eventfd2,
+dup2/dup3, inotify_init/inotify_init1,  pipe/pipe2, renameat/renameat2 -- so
+learn from the history of the kernel and plan for extensions from the start.)
+
+For simpler system calls that only take a couple of arguments, the preferred
+way to allow for future extensibility is to include a flags argument to the
+system call.  To make sure that userspace programs can safely use flags
+between kernel versions, check whether the flags value holds any unknown
+flags, and reject the system call (with EINVAL) if it does:
+
+    if (flags & ~(THING_FLAG1 | THING_FLAG2 | THING_FLAG3))
+        return -EINVAL;
+
+(If no flags values are used yet, check that the flags argument is zero.)
+
+For more sophisticated system calls that involve a larger number of arguments,
+it's preferred to encapsulate the majority of the arguments into a structure
+that is passed in by pointer.  Such a structure can cope with future extension
+by including a size argument in the structure:
+
+    struct xyzzy_params {
+        u32 size; /* userspace sets p->size = sizeof(struct xyzzy_params) */
+        u32 param_1;
+        u64 param_2;
+        u64 param_3;
+    };
+
+As long as any subsequently added field, say param_4, is designed so that a
+zero value gives the previous behaviour, then this allows both directions of
+version mismatch:
+
+ - To cope with a later userspace program calling an older kernel, the kernel
+   code should check that any memory beyond the size of the structure that it
+   expects is zero (effectively checking that param_4 == 0).
+ - To cope with an older userspace program calling a newer kernel, the kernel
+   code can zero-extend a smaller instance of the structure (effectively
+   setting param_4 = 0).
+
+See perf_event_open(2) and the perf_copy_attr() function (in
+kernel/events/core.c) for an example of this approach.
+
+
+Designing the API: Other Considerations
+---------------------------------------
+
+If your new system call allows userspace to refer to a kernel object, it
+should use a file descriptor as the handle for that object -- don't invent a
+new type of userspace object handle when the kernel already has mechanisms and
+well-defined semantics for using file descriptors.
+
+If your new xyzzy(2) system call does return a new file descriptor, then the
+flags argument should include a value that is equivalent to setting O_CLOEXEC
+on the new FD.  This makes it possible for userspace to close the timing
+window between xyzzy() and calling fcntl(fd, F_SETFD, FD_CLOEXEC), where an
+unexpected fork() and execve() in another thread could leak a descriptor to
+the exec'ed program. (However, resist the temptation to re-use the actual value
+of the O_CLOEXEC constant, as it is architecture-specific and is part of a
+numbering space of O_* flags that is fairly full.)
+
+If your system call returns a new file descriptor, you should also consider
+what it means to use the poll(2) family of system calls on that file
+descriptor. Making a file descriptor ready for reading or writing is the
+normal way for the kernel to indicate to userspace that an event has
+occurred on the corresponding kernel object.
+
+If your new xyzzy(2) system call involves a filename argument:
+
+    int sys_xyzzy(const char __user *path, ..., unsigned int flags);
+
+you should also consider whether an xyzzyat(2) version is more appropriate:
+
+    int sys_xyzzyat(int dfd, const char __user *path, ..., unsigned int flags);
+
+This allows more flexibility for how userspace specifies the file in question;
+in particular it allows userspace to request the functionality for an
+already-opened file descriptor using the AT_EMPTY_PATH flag, effectively giving
+an fxyzzy(3) operation for free:
+
+ - xyzzyat(AT_FDCWD, path, ..., 0) is equivalent to xyzzy(path,...)
+ - xyzzyat(fd, "", ..., AT_EMPTY_PATH) is equivalent to fxyzzy(fd, ...)
+
+(For more details on the rationale of the *at() calls, see the openat(2) man
+page; for an example of AT_EMPTY_PATH, see the statat(2) man page.)
+
+If your new xyzzy(2) system call involves a parameter describing an offset
+within a file, make its type loff_t so that 64-bit offsets can be supported
+even on 32-bit architectures.
+
+If your new xyzzy(2) system call involves privileged functionality, it needs
+to be governed by the appropriate Linux capability bit (checked with a call to
+capable()), as described in the capabilities(7) man page.  Choose an existing
+capability bit that governs related functionality, but try to avoid combining
+lots of only vaguely related functions together under the same bit, as this
+goes against capabilities' purpose of splitting the power of root.  In
+particular, avoid adding new uses of the already overly-general CAP_SYS_ADMIN
+capability.
+
+If your new xyzzy(2) system call manipulates a process other than the calling
+process, it should be restricted (using a call to ptrace_may_access()) so that
+only a calling process with the same permissions as the target process, or
+with the necessary capabilities, can manipulate the target process.
+
+Finally, be aware that some non-x86 architectures have an easier time if
+system call parameters that are explicitly 64-bit fall on odd-numbered
+arguments (i.e. parameter 1, 3, 5), to allow use of contiguous pairs of 32-bit
+registers.  (This concern does not apply if the arguments are part of a
+structure that's passed in by pointer.)
+
+
+Proposing the API
+-----------------
+
+To make new system calls easy to review, it's best to divide up the patchset
+into separate chunks.  These should include at least the following items as
+distinct commits (each of which is described further below):
+
+ - The core implementation of the system call, together with prototypes,
+   generic numbering, Kconfig changes and fallback stub implementation.
+ - Wiring up of the new system call for one particular architecture, usually
+   x86 (including all of x86_64, x86_32 and x32).
+ - A demonstration of the use of the new system call in userspace via a
+   selftest in tools/testing/selftests/.
+ - A draft man-page for the new system call, either as plain text in the
+   cover letter, or as a patch to the (separate) man-pages repository.
+
+New system call proposals, like any change to the kernel's API, should always
+be cc'ed to linux-api@vger.kernel.org.
+
+
+Generic System Call Implementation
+----------------------------------
+
+The main entry point for your new xyzzy(2) system call will be called
+sys_xyzzy(), but you add this entry point with the appropriate
+SYSCALL_DEFINEn() macro rather than explicitly.  The 'n' indicates the number
+of arguments to the system call, and the macro takes the system call name
+followed by the (type, name) pairs for the parameters as arguments.  Using
+this macro allows metadata about the new system call to be made available for
+other tools.
+
+The new entry point also needs a corresponding function prototype, in
+include/linux/syscalls.h, marked as asmlinkage to match the way that system
+calls are invoked:
+
+    asmlinkage long sys_xyzzy(...);
+
+Some architectures (e.g. x86) have their own architecture-specific syscall
+tables, but several other architectures share a generic syscall table. Add your
+new system call to the generic list by adding an entry to the list in
+include/uapi/asm-generic/unistd.h:
+
+    #define __NR_xyzzy 292
+    __SYSCALL(__NR_xyzzy, sys_xyzzy)
+
+Also update the __NR_syscalls count to reflect the additional system call, and
+note that if multiple new system calls are added in the same merge window,
+your new syscall number may get adjusted to resolve conflicts.
+
+The file kernel/sys_ni.c provides a fallback stub implementation of each system
+call, returning -ENOSYS.  Add your new system call here too:
+
+    cond_syscall(sys_xyzzy);
+
+Your new kernel functionality, and the system call that controls it, should
+normally be optional, so add a CONFIG option (typically to init/Kconfig) for
+it. As usual for new CONFIG options:
+
+ - Include a description of the new functionality and system call controlled
+   by the option.
+ - Make the option depend on EXPERT if it should be hidden from normal users.
+ - Make any new source files implementing the function dependent on the CONFIG
+   option in the Makefile (e.g. "obj-$(CONFIG_XYZZY_SYSCALL) += xyzzy.c").
+ - Double check that the kernel still builds with the new CONFIG option turned
+   off.
+
+To summarize, you need a commit that includes:
+
+ - CONFIG option for the new function, normally in init/Kconfig
+ - SYSCALL_DEFINEn(xyzzy, ...) for the entry point
+ - corresponding prototype in include/linux/syscalls.h
+ - generic table entry in include/uapi/asm-generic/unistd.h
+ - fallback stub in kernel/sys_ni.c
+
+
+x86 System Call Implementation
+------------------------------
+
+To wire up your new system call for x86 platforms, you need to update the
+master syscall tables.  Assuming your new system call isn't special in some
+way (see below), this involves a "common" entry (for x86_64 and x32) in
+arch/x86/entry/syscalls/syscall_64.tbl:
+
+    333   common   xyzzy     sys_xyzzy
+
+and an "i386" entry in arch/x86/entry/syscalls/syscall_32.tbl:
+
+    380   i386     xyzzy     sys_xyzzy
+
+Again, these numbers are liable to be changed if there are conflicts in the
+relevant merge window.
+
+
+Compatibility System Calls (Generic)
+------------------------------------
+
+For most system calls the same 64-bit implementation can be invoked even when
+the userspace program is itself 32-bit; even if the system call's parameters
+include an explicit pointer, this is handled transparently.
+
+However, there are a couple of situations where a compatibility layer is
+needed to cope with size differences between 32-bit and 64-bit.
+
+The first is if the 64-bit kernel also supports 32-bit userspace programs, and
+so needs to parse areas of (__user) memory that could hold either 32-bit or
+64-bit values.  In particular, this is needed whenever a system call argument
+is:
+
+ - a pointer to a pointer
+ - a pointer to a struct containing a pointer (e.g. struct iovec __user *)
+ - a pointer to a varying sized integral type (time_t, off_t, long, ...)
+ - a pointer to a struct containing a varying sized integral type.
+
+The second situation that requires a compatibility layer is if one of the
+system call's arguments has a type that is explicitly 64-bit even on a 32-bit
+architecture, for example loff_t or __u64.  In this case, a value that arrives
+at a 64-bit kernel from a 32-bit application will be split into two 32-bit
+values, which then need to be re-assembled in the compatibility layer.
+
+(Note that a system call argument that's a pointer to an explicit 64-bit type
+does *not* need a compatibility layer; for example, splice(2)'s arguments of
+type loff_t __user * do not trigger the need for a compat_ system call.)
+
+The compatibility version of the system call is called compat_sys_xyzzy(), and
+is added with the COMPAT_SYSCALL_DEFINEn() macro, analogously to
+SYSCALL_DEFINEn.  This version of the implementation runs as part of a 64-bit
+kernel, but expects to receive 32-bit parameter values and does whatever is
+needed to deal with them.  (Typically, the compat_sys_ version converts the
+values to 64-bit versions and either calls on to the sys_ version, or both of
+them call a common inner implementation function.)
+
+The compat entry point also needs a corresponding function prototype, in
+include/linux/compat.h, marked as asmlinkage to match the way that system
+calls are invoked:
+
+    asmlinkage long compat_sys_xyzzy(...);
+
+If the system call involves a structure that is laid out differently on 32-bit
+and 64-bit systems, say struct xyzzy_args, then the include/linux/compat.h
+header file should also include a compat version of the structure (struct
+compat_xyzzy_args) where each variable-size field has the appropriate compat_
+type that corresponds to the type in struct xyzzy_args.  The
+compat_sys_xyzzy() routine can then use this compat_ structure to parse the
+arguments from a 32-bit invocation.
+
+For example, if there are fields:
+
+    struct xyzzy_args {
+        const char __user *ptr;
+        __kernel_long_t varying_val;
+        u64 fixed_val;
+        /* ... */
+    };
+
+in struct xyzzy_args, then struct compat_xyzzy_args would have:
+
+    struct compat_xyzzy_args {
+        compat_uptr_t ptr;
+        compat_long_t varying_val;
+        u64 fixed_val;
+        /* ... */
+    };
+
+The generic system call list also needs adjusting to allow for the compat
+version; the entry in include/uapi/asm-generic/unistd.h should use
+__SC_COMP rather than __SYSCALL:
+
+    #define __NR_xyzzy 292
+    __SC_COMP(__NR_xyzzy, sys_xyzzy, compat_sys_xyzzy)
+
+To summarize, you need:
+
+ - a COMPAT_SYSCALL_DEFINEn(xyzzy, ...) for the compat entry point
+ - corresponding prototype in include/linux/compat.h
+ - (if needed) 32-bit mapping struct in include/linux/compat.h
+ - instance of __SC_COMP not __SYSCALL in include/uapi/asm-generic/unistd.h
+
+
+Compatibility System Calls (x86)
+--------------------------------
+
+To wire up the x86 architecture of a system call with a compatibility version,
+the entries in the syscall tables need to be adjusted.
+
+First, the entry in arch/x86/entry/syscalls/syscall_32.tbl gets an extra
+column to indicate that a 32-bit userspace program running on a 64-bit kernel
+should hit the compat entry point:
+
+    380   i386     xyzzy     sys_xyzzy    compat_sys_xyzzy
+
+Second, you need to figure out what should happen for the x32 ABI version of
+the new system call.  There's a choice here: the layout of the arguments
+should either match the 64-bit version or the 32-bit version.
+
+If there's a pointer-to-a-pointer involved, the decision is easy: x32 is
+ILP32, so the layout should match the 32-bit version, and the entry in
+arch/x86/entry/syscalls/syscall_64.tbl is split so that x32 programs hit the
+compatibility wrapper:
+
+    333   64       xyzzy     sys_xyzzy
+    ...
+    555   x32      xyzzy     compat_sys_xyzzy
+
+If no pointers are involved, then it is preferable to re-use the 64-bit system
+call for the x32 ABI (and consequently the entry in
+arch/x86/entry/syscalls/syscall_64.tbl is unchanged).
+
+In either case, you should check that the types involved in your argument
+layout do indeed map exactly from x32 (-mx32) to either the 32-bit (-m32) or
+64-bit (-m64) equivalents.
+
+
+System Calls Returning Elsewhere
+--------------------------------
+
+For most system calls, once the system call is complete the user program
+continues exactly where it left off -- at the next instruction, with the
+stack the same and most of the registers the same as before the system call,
+and with the same virtual memory space.
+
+However, a few system calls do things differently.  They might return to a
+different location (rt_sigreturn) or change the memory space (fork/vfork/clone)
+or even architecture (execve/execveat) of the program.
+
+To allow for this, the kernel implementation of the system call may need to
+save and restore additional registers to the kernel stack, allowing complete
+control of where and how execution continues after the system call.
+
+This is arch-specific, but typically involves defining assembly entry points
+that save/restore additional registers and invoke the real system call entry
+point.
+
+For x86_64, this is implemented as a stub_xyzzy entry point in
+arch/x86/entry/entry_64.S, and the entry in the syscall table
+(arch/x86/entry/syscalls/syscall_64.tbl) is adjusted to match:
+
+    333   common   xyzzy     stub_xyzzy
+
+The equivalent for 32-bit programs running on a 64-bit kernel is normally
+called stub32_xyzzy and implemented in arch/x86/entry/entry_64_compat.S,
+with the corresponding syscall table adjustment in
+arch/x86/entry/syscalls/syscall_32.tbl:
+
+    380   i386     xyzzy     sys_xyzzy    stub32_xyzzy
+
+If the system call needs a compatibility layer (as in the previous section)
+then the stub32_ version needs to call on to the compat_sys_ version of the
+system call rather than the native 64-bit version.  Also, if the x32 ABI
+implementation is not common with the x86_64 version, then its syscall
+table will also need to invoke a stub that calls on to the compat_sys_
+version.
+
+For completeness, it's also nice to set up a mapping so that user-mode Linux
+still works -- its syscall table will reference stub_xyzzy, but the UML build
+doesn't include arch/x86/entry/entry_64.S implementation (because UML
+simulates registers etc).  Fixing this is as simple as adding a #define to
+arch/x86/um/sys_call_table_64.c:
+
+    #define stub_xyzzy sys_xyzzy
+
+
+Other Details
+-------------
+
+Most of the kernel treats system calls in a generic way, but there is the
+occasional exception that may need updating for your particular system call.
+
+The audit subsystem is one such special case; it includes (arch-specific)
+functions that classify some special types of system call -- specifically
+file open (open/openat), program execution (execve/exeveat) or socket
+multiplexor (socketcall) operations. If your new system call is analogous to
+one of these, then the audit system should be updated.
+
+More generally, if there is an existing system call that is analogous to your
+new system call, it's worth doing a kernel-wide grep for the existing system
+call to check there are no other special cases.
+
+
+Testing
+-------
+
+A new system call should obviously be tested; it is also useful to provide
+reviewers with a demonstration of how user space programs will use the system
+call.  A good way to combine these aims is to include a simple self-test
+program in a new directory under tools/testing/selftests/.
+
+For a new system call, there will obviously be no libc wrapper function and so
+the test will need to invoke it using syscall(); also, if the system call
+involves a new userspace-visible structure, the corresponding header will need
+to be installed to compile the test.
+
+Make sure the selftest runs successfully on all supported architectures.  For
+example, check that it works when compiled as an x86_64 (-m64), x86_32 (-m32)
+and x32 (-mx32) ABI program.
+
+For more extensive and thorough testing of new functionality, you should also
+consider adding tests to the Linux Test Project, or to the xfstests project
+for filesystem-related changes.
+ - https://linux-test-project.github.io/
+ - git://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git
+
+
+Man Page
+--------
+
+All new system calls should come with a complete man page, ideally using groff
+markup, but plain text will do.  If groff is used, it's helpful to include a
+pre-rendered ASCII version of the man page in the cover email for the
+patchset, for the convenience of reviewers.
+
+The man page should be cc'ed to linux-man@vger.kernel.org
+For more details, see https://www.kernel.org/doc/man-pages/patches.html
+
+References and Sources
+----------------------
+
+ - LWN article from Michael Kerrisk on use of flags argument in system calls:
+   https://lwn.net/Articles/585415/
+ - LWN article from Michael Kerrisk on how to handle unknown flags in a system
+   call: https://lwn.net/Articles/588444/
+ - LWN article from Jake Edge describing constraints on 64-bit system call
+   arguments: https://lwn.net/Articles/311630/
+ - Pair of LWN articles from David Drysdale that describe the system call
+   implementation paths in detail for v3.14:
+    - https://lwn.net/Articles/604287/
+    - https://lwn.net/Articles/604515/
+ - Architecture-specific requirements for system calls are discussed in the
+   syscall(2) man-page:
+   http://man7.org/linux/man-pages/man2/syscall.2.html#NOTES
+ - Collated emails from Linus Torvalds discussing the problems with ioctl():
+   http://yarchive.net/comp/linux/ioctl.html
+ - "How to not invent kernel interfaces", Arnd Bergmann,
+   http://www.ukuug.org/events/linux2007/2007/papers/Bergmann.pdf
+ - LWN article from Michael Kerrisk on avoiding new uses of CAP_SYS_ADMIN:
+   https://lwn.net/Articles/486306/
+ - Recommendation from Andrew Morton that all related information for a new
+   system call should come in the same email thread:
+   https://lkml.org/lkml/2014/7/24/641
+ - Recommendation from Michael Kerrisk that a new system call should come with
+   a man page: https://lkml.org/lkml/2014/6/13/309
+ - Suggestion from Thomas Gleixner that x86 wire-up should be in a separate
+   commit: https://lkml.org/lkml/2014/11/19/254
+ - Suggestion from Greg Kroah-Hartman that it's good for new system calls to
+   come with a man-page & selftest: https://lkml.org/lkml/2014/3/19/710
+ - Discussion from Michael Kerrisk of new system call vs. prctl(2) extension:
+   https://lkml.org/lkml/2014/6/3/411
+ - Suggestion from Ingo Molnar that system calls that involve multiple
+   arguments should encapsulate those arguments in a struct, which includes a
+   size field for future extensibility: https://lkml.org/lkml/2015/7/30/117
+ - Numbering oddities arising from (re-)use of O_* numbering space flags:
+    - commit 75069f2b5bfb ("vfs: renumber FMODE_NONOTIFY and add to uniqueness
+      check")
+    - commit 12ed2e36c98a ("fanotify: FMODE_NONOTIFY and __O_SYNC in sparc
+      conflict")
+    - commit bb458c644a59 ("Safer ABI for O_TMPFILE")
+ - Discussion from Matthew Wilcox about restrictions on 64-bit arguments:
+   https://lkml.org/lkml/2008/12/12/187
+ - Recommendation from Greg Kroah-Hartman that unknown flags should be
+   policed: https://lkml.org/lkml/2014/7/17/577
+ - Recommendation from Linus Torvalds that x32 system calls should prefer
+   compatibility with 64-bit versions rather than 32-bit versions:
+   https://lkml.org/lkml/2011/8/31/244
-- 
cgit v1.2.3


From 7bbac697e4a6e64cc4bae26d57310c916aa0da7a Mon Sep 17 00:00:00 2001
From: Leo Yan <leo.yan@linaro.org>
Date: Mon, 3 Aug 2015 09:30:25 +0800
Subject: Documentation: minor typo fix in mailbox.txt

Fix minor typo so that can pass correct pointer variable for
container_of().

Signed-off-by: Leo Yan <leo.yan@linaro.org>
[jc: tweaked formatting]
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/mailbox.txt | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/mailbox.txt b/Documentation/mailbox.txt
index 1092ad9578da..7ed371c85204 100644
--- a/Documentation/mailbox.txt
+++ b/Documentation/mailbox.txt
@@ -51,8 +51,7 @@ struct demo_client {
  */
 static void message_from_remote(struct mbox_client *cl, void *mssg)
 {
-	struct demo_client *dc = container_of(mbox_client,
-						struct demo_client, cl);
+	struct demo_client *dc = container_of(cl, struct demo_client, cl);
 	if (dc->async) {
 		if (is_an_ack(mssg)) {
 			/* An ACK to our last sample sent */
@@ -68,8 +67,7 @@ static void message_from_remote(struct mbox_client *cl, void *mssg)
 
 static void sample_sent(struct mbox_client *cl, void *mssg, int r)
 {
-	struct demo_client *dc = container_of(mbox_client,
-						struct demo_client, cl);
+	struct demo_client *dc = container_of(cl, struct demo_client, cl);
 	complete(&dc->c);
 }
 
-- 
cgit v1.2.3


From dd19f83d6cd90e4b7a601da2ed40d2a9d70aaf10 Mon Sep 17 00:00:00 2001
From: Scott Feldman <sfeldma@gmail.com>
Date: Wed, 12 Aug 2015 18:45:25 -0700
Subject: rocker: hook ndo_neigh_destroy to cleanup neigh refs in driver

Rocker driver tracks arp_tbl neighs to resolve IPv4 route nexthops.  The
driver uses NETEVENT_NEIGH_UPDATE for neigh adds and updates, but there is
no event when the neigh is removed from the device (such as when the device
goes admin down).  This patches hooks ndo_neigh_destroy so the driver can
know when a neigh is removed from the device.  In response, the driver will
purge the neigh entry from its internal tbl.

I didn't find an in-tree users of ndo_neigh_destroy, so I'm not sure if
this ndo is vestigial or if there are out-of-tree users.  In any case, it
does what I need here.  An alternative design would be to generate
NETEVENT_NEIGH_UPDATE event when neigh is being destroyed, setting state to
NUD_NONE so driver knows neigh entry is dead.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 Documentation/networking/switchdev.txt |  3 ++-
 drivers/net/ethernet/rocker/rocker.c   | 11 +++++++++++
 2 files changed, 13 insertions(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/networking/switchdev.txt b/Documentation/networking/switchdev.txt
index 9825f32a8634..476df0496686 100644
--- a/Documentation/networking/switchdev.txt
+++ b/Documentation/networking/switchdev.txt
@@ -367,4 +367,5 @@ driver's rocker_port_ipv4_resolve() for an example.
 
 The driver can monitor for updates to arp_tbl using the netevent notifier
 NETEVENT_NEIGH_UPDATE.  The device can be programmed with resolved nexthops
-for the routes as arp_tbl updates.
+for the routes as arp_tbl updates.  The driver implements ndo_neigh_destroy
+to know when arp_tbl neighbor entries are purged from the port.
diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c
index 27f3e2b2ae00..a7cb74ac4758 100644
--- a/drivers/net/ethernet/rocker/rocker.c
+++ b/drivers/net/ethernet/rocker/rocker.c
@@ -4264,6 +4264,16 @@ static int rocker_port_change_proto_down(struct net_device *dev,
 	return 0;
 }
 
+static void rocker_port_neigh_destroy(struct neighbour *n)
+{
+	struct rocker_port *rocker_port = netdev_priv(n->dev);
+	int flags = ROCKER_OP_FLAG_REMOVE | ROCKER_OP_FLAG_NOWAIT;
+	__be32 ip_addr = *(__be32 *)n->primary_key;
+
+	rocker_port_ipv4_neigh(rocker_port, SWITCHDEV_TRANS_NONE,
+			       flags, ip_addr, n->ha);
+}
+
 static const struct net_device_ops rocker_port_netdev_ops = {
 	.ndo_open			= rocker_port_open,
 	.ndo_stop			= rocker_port_stop,
@@ -4278,6 +4288,7 @@ static const struct net_device_ops rocker_port_netdev_ops = {
 	.ndo_fdb_dump			= switchdev_port_fdb_dump,
 	.ndo_get_phys_port_name		= rocker_port_get_phys_port_name,
 	.ndo_change_proto_down		= rocker_port_change_proto_down,
+	.ndo_neigh_destroy		= rocker_port_neigh_destroy,
 };
 
 /********************
-- 
cgit v1.2.3


From 1bd57127d4aaff518cf93f4809ec2f11b2baf865 Mon Sep 17 00:00:00 2001
From: Peter Chen <peter.chen@freescale.com>
Date: Wed, 15 Jul 2015 13:53:05 +0800
Subject: Doc: usb: ci-hdrc-usb2: add itc-setting at binding doc

It is used to configure the ITC (in register USBCMD) value
for kinds of applications.

Signed-off-by: Peter Chen <peter.chen@freescale.com>
---
 Documentation/devicetree/bindings/usb/ci-hdrc-usb2.txt | 3 +++
 1 file changed, 3 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/usb/ci-hdrc-usb2.txt b/Documentation/devicetree/bindings/usb/ci-hdrc-usb2.txt
index 553e2fae3a76..4159c8cf78e3 100644
--- a/Documentation/devicetree/bindings/usb/ci-hdrc-usb2.txt
+++ b/Documentation/devicetree/bindings/usb/ci-hdrc-usb2.txt
@@ -30,6 +30,8 @@ Optional properties:
   argument that indicate usb controller index
 - disable-over-current: (FSL only) disable over current detect
 - external-vbus-divider: (FSL only) enables off-chip resistor divider for Vbus
+- itc-setting: interrupt threshold control register control, the setting
+  should be aligned with ITC bits at register USBCMD.
 
 Example:
 
@@ -41,4 +43,5 @@ Example:
 		phys = <&usb_phy0>;
 		phy-names = "usb-phy";
 		vbus-supply = <&reg_usb0_vbus>;
+		gadget-itc-setting = <0x4>; /* 4 micro-frames */
 	};
-- 
cgit v1.2.3


From c6fba7b750efe64a69dd83e0ff7249abd0e783aa Mon Sep 17 00:00:00 2001
From: Peter Chen <peter.chen@freescale.com>
Date: Tue, 28 Jul 2015 13:33:15 +0800
Subject: Doc: usb: ci-hdrc-usb2: add ahb-burst-config for binding doc

It is used to change ahb burst configuration for platforms, it is
vendor specific.

Signed-off-by: Peter Chen <peter.chen@freescale.com>
---
 Documentation/devicetree/bindings/usb/ci-hdrc-usb2.txt | 7 +++++++
 1 file changed, 7 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/usb/ci-hdrc-usb2.txt b/Documentation/devicetree/bindings/usb/ci-hdrc-usb2.txt
index 4159c8cf78e3..d52a747fb859 100644
--- a/Documentation/devicetree/bindings/usb/ci-hdrc-usb2.txt
+++ b/Documentation/devicetree/bindings/usb/ci-hdrc-usb2.txt
@@ -32,6 +32,11 @@ Optional properties:
 - external-vbus-divider: (FSL only) enables off-chip resistor divider for Vbus
 - itc-setting: interrupt threshold control register control, the setting
   should be aligned with ITC bits at register USBCMD.
+- ahb-burst-config: it is vendor dependent, the required value should be
+  aligned with AHBBRST at SBUSCFG, the range is from 0x0 to 0x7. This
+  property is used to change AHB burst configuration, check the chipidea
+  spec for meaning of each value. If this property is not existed, it
+  will use the reset value.
 
 Example:
 
@@ -44,4 +49,6 @@ Example:
 		phy-names = "usb-phy";
 		vbus-supply = <&reg_usb0_vbus>;
 		gadget-itc-setting = <0x4>; /* 4 micro-frames */
+		 /* Incremental burst of unspecified length */
+		ahb-burst-config = <0x0>;
 	};
-- 
cgit v1.2.3


From bd6e9d115df7fc6f11f3021c6fc05e3eaeb04a32 Mon Sep 17 00:00:00 2001
From: Peter Chen <peter.chen@freescale.com>
Date: Fri, 7 Aug 2015 15:08:02 +0800
Subject: Doc: usb: ci-hdrc-usb2: add tx(rx)-burst-config-dword for binding doc

It is used to override the default setting for burst size, changing
burst size takes effect only when the SBUSCFG.AHBBRST = 0.

Signed-off-by: Peter Chen <peter.chen@freescale.com>
---
 Documentation/devicetree/bindings/usb/ci-hdrc-usb2.txt | 10 ++++++++++
 1 file changed, 10 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/usb/ci-hdrc-usb2.txt b/Documentation/devicetree/bindings/usb/ci-hdrc-usb2.txt
index d52a747fb859..d71ef07bca5d 100644
--- a/Documentation/devicetree/bindings/usb/ci-hdrc-usb2.txt
+++ b/Documentation/devicetree/bindings/usb/ci-hdrc-usb2.txt
@@ -37,6 +37,14 @@ Optional properties:
   property is used to change AHB burst configuration, check the chipidea
   spec for meaning of each value. If this property is not existed, it
   will use the reset value.
+- tx-burst-size-dword: it is vendor dependent, the tx burst size in dword
+  (4 bytes), This register represents the maximum length of a the burst
+  in 32-bit words while moving data from system memory to the USB
+  bus, changing this value takes effect only the SBUSCFG.AHBBRST is 0.
+- rx-burst-size-dword: it is vendor dependent, the rx burst size in dword
+  (4 bytes), This register represents the maximum length of a the burst
+  in 32-bit words while moving data from the USB bus to system memory,
+  changing this value takes effect only the SBUSCFG.AHBBRST is 0.
 
 Example:
 
@@ -51,4 +59,6 @@ Example:
 		gadget-itc-setting = <0x4>; /* 4 micro-frames */
 		 /* Incremental burst of unspecified length */
 		ahb-burst-config = <0x0>;
+		tx-burst-size-dword = <0x10>; /* 64 bytes */
+		rx-burst-size-dword = <0x10>;
 	};
-- 
cgit v1.2.3


From 13e68d8bd05c998cae452a4f3400af1e8edd852e Mon Sep 17 00:00:00 2001
From: Daniel Axtens <dja@axtens.net>
Date: Fri, 14 Aug 2015 17:41:25 +1000
Subject: cxl: Allow the kernel to trust that an image won't change on PERST.

Provide a kernel API and a sysfs entry which allow a user to specify
that when a card is PERSTed, it's image will stay the same, allowing
it to participate in EEH.

cxl_reset is used to reflash the card. In that case, we cannot safely
assert that the image will not change. Therefore, disallow cxl_reset
if the flag is set.

Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
---
 Documentation/ABI/testing/sysfs-class-cxl | 10 ++++++++++
 drivers/misc/cxl/api.c                    |  7 +++++++
 drivers/misc/cxl/cxl.h                    |  1 +
 drivers/misc/cxl/pci.c                    |  7 +++++++
 drivers/misc/cxl/sysfs.c                  | 26 ++++++++++++++++++++++++++
 include/misc/cxl.h                        | 10 ++++++++++
 6 files changed, 61 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/ABI/testing/sysfs-class-cxl b/Documentation/ABI/testing/sysfs-class-cxl
index acfe9df83139..b07e86d4597f 100644
--- a/Documentation/ABI/testing/sysfs-class-cxl
+++ b/Documentation/ABI/testing/sysfs-class-cxl
@@ -223,3 +223,13 @@ Description:    write only
                 Writing 1 will issue a PERST to card which may cause the card
                 to reload the FPGA depending on load_image_on_perst.
 Users:		https://github.com/ibm-capi/libcxl
+
+What:		/sys/class/cxl/<card>/perst_reloads_same_image
+Date:		July 2015
+Contact:	linuxppc-dev@lists.ozlabs.org
+Description:	read/write
+		Trust that when an image is reloaded via PERST, it will not
+		have changed.
+		0 = don't trust, the image may be different (default)
+		1 = trust that the image will not change.
+Users:		https://github.com/ibm-capi/libcxl
diff --git a/drivers/misc/cxl/api.c b/drivers/misc/cxl/api.c
index 729e0851167d..6a768a9ad22f 100644
--- a/drivers/misc/cxl/api.c
+++ b/drivers/misc/cxl/api.c
@@ -327,3 +327,10 @@ int cxl_afu_reset(struct cxl_context *ctx)
 	return cxl_afu_check_and_enable(afu);
 }
 EXPORT_SYMBOL_GPL(cxl_afu_reset);
+
+void cxl_perst_reloads_same_image(struct cxl_afu *afu,
+				  bool perst_reloads_same_image)
+{
+	afu->adapter->perst_same_image = perst_reloads_same_image;
+}
+EXPORT_SYMBOL_GPL(cxl_perst_reloads_same_image);
diff --git a/drivers/misc/cxl/cxl.h b/drivers/misc/cxl/cxl.h
index d540542f9931..cda02412b01e 100644
--- a/drivers/misc/cxl/cxl.h
+++ b/drivers/misc/cxl/cxl.h
@@ -493,6 +493,7 @@ struct cxl {
 	bool user_image_loaded;
 	bool perst_loads_image;
 	bool perst_select_user;
+	bool perst_same_image;
 };
 
 int cxl_alloc_one_irq(struct cxl *adapter);
diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c
index 2b61cb1ee62c..bfbd6478c0c5 100644
--- a/drivers/misc/cxl/pci.c
+++ b/drivers/misc/cxl/pci.c
@@ -878,6 +878,12 @@ int cxl_reset(struct cxl *adapter)
 	int i;
 	u32 val;
 
+	if (adapter->perst_same_image) {
+		dev_warn(&dev->dev,
+			 "cxl: refusing to reset/reflash when perst_reloads_same_image is set.\n");
+		return -EINVAL;
+	}
+
 	dev_info(&dev->dev, "CXL reset\n");
 
 	/* pcie_warm_reset requests a fundamental pci reset which includes a
@@ -1151,6 +1157,7 @@ static struct cxl *cxl_init_adapter(struct pci_dev *dev)
 	 * configure/reconfigure
 	 */
 	adapter->perst_loads_image = true;
+	adapter->perst_same_image = false;
 
 	rc = cxl_configure_adapter(adapter, dev);
 	if (rc) {
diff --git a/drivers/misc/cxl/sysfs.c b/drivers/misc/cxl/sysfs.c
index 31f38bc71a3d..6619cf1f6e1f 100644
--- a/drivers/misc/cxl/sysfs.c
+++ b/drivers/misc/cxl/sysfs.c
@@ -112,12 +112,38 @@ static ssize_t load_image_on_perst_store(struct device *device,
 	return count;
 }
 
+static ssize_t perst_reloads_same_image_show(struct device *device,
+				 struct device_attribute *attr,
+				 char *buf)
+{
+	struct cxl *adapter = to_cxl_adapter(device);
+
+	return scnprintf(buf, PAGE_SIZE, "%i\n", adapter->perst_same_image);
+}
+
+static ssize_t perst_reloads_same_image_store(struct device *device,
+				 struct device_attribute *attr,
+				 const char *buf, size_t count)
+{
+	struct cxl *adapter = to_cxl_adapter(device);
+	int rc;
+	int val;
+
+	rc = sscanf(buf, "%i", &val);
+	if ((rc != 1) || !(val == 1 || val == 0))
+		return -EINVAL;
+
+	adapter->perst_same_image = (val == 1 ? true : false);
+	return count;
+}
+
 static struct device_attribute adapter_attrs[] = {
 	__ATTR_RO(caia_version),
 	__ATTR_RO(psl_revision),
 	__ATTR_RO(base_image),
 	__ATTR_RO(image_loaded),
 	__ATTR_RW(load_image_on_perst),
+	__ATTR_RW(perst_reloads_same_image),
 	__ATTR(reset, S_IWUSR, NULL, reset_adapter_store),
 };
 
diff --git a/include/misc/cxl.h b/include/misc/cxl.h
index 7a6c1d6cc173..f2ffe5bd720d 100644
--- a/include/misc/cxl.h
+++ b/include/misc/cxl.h
@@ -200,4 +200,14 @@ unsigned int cxl_fd_poll(struct file *file, struct poll_table_struct *poll);
 ssize_t cxl_fd_read(struct file *file, char __user *buf, size_t count,
 			   loff_t *off);
 
+/*
+ * For EEH, a driver may want to assert a PERST will reload the same image
+ * from flash into the FPGA.
+ *
+ * This is a property of the entire adapter, not a single AFU, so drivers
+ * should set this property with care!
+ */
+void cxl_perst_reloads_same_image(struct cxl_afu *afu,
+				  bool perst_reloads_same_image);
+
 #endif /* _MISC_CXL_H */
-- 
cgit v1.2.3


From cfc411e7fff3e15cd6354ff69773907e2c9d1c0c Mon Sep 17 00:00:00 2001
From: David Howells <dhowells@redhat.com>
Date: Fri, 14 Aug 2015 15:20:41 +0100
Subject: Move certificate handling to its own directory

Move certificate handling out of the kernel/ directory and into a certs/
directory to get all the weird stuff in one place and move the generated
signing keys into this directory.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: David Woodhouse <David.Woodhouse@intel.com>
---
 Documentation/module-signing.txt |  18 ++---
 MAINTAINERS                      |   9 +++
 Makefile                         |   4 +-
 certs/Kconfig                    |  42 +++++++++++
 certs/Makefile                   | 147 ++++++++++++++++++++++++++++++++++++
 certs/system_certificates.S      |  23 ++++++
 certs/system_keyring.c           | 157 +++++++++++++++++++++++++++++++++++++++
 crypto/Kconfig                   |   1 +
 init/Kconfig                     |  39 ----------
 kernel/Makefile                  | 143 -----------------------------------
 kernel/system_certificates.S     |  23 ------
 kernel/system_keyring.c          | 157 ---------------------------------------
 12 files changed, 390 insertions(+), 373 deletions(-)
 create mode 100644 certs/Kconfig
 create mode 100644 certs/Makefile
 create mode 100644 certs/system_certificates.S
 create mode 100644 certs/system_keyring.c
 delete mode 100644 kernel/system_certificates.S
 delete mode 100644 kernel/system_keyring.c

(limited to 'Documentation')

diff --git a/Documentation/module-signing.txt b/Documentation/module-signing.txt
index 02a9baf1c72f..a78bf1ffa68c 100644
--- a/Documentation/module-signing.txt
+++ b/Documentation/module-signing.txt
@@ -92,13 +92,13 @@ This has a number of options available:
  (4) "File name or PKCS#11 URI of module signing key" (CONFIG_MODULE_SIG_KEY)
 
      Setting this option to something other than its default of
-     "signing_key.pem" will disable the autogeneration of signing keys and
-     allow the kernel modules to be signed with a key of your choosing.
-     The string provided should identify a file containing both a private
-     key and its corresponding X.509 certificate in PEM form, or — on
-     systems where the OpenSSL ENGINE_pkcs11 is functional — a PKCS#11 URI
-     as defined by RFC7512. In the latter case, the PKCS#11 URI should
-     reference both a certificate and a private key.
+     "certs/signing_key.pem" will disable the autogeneration of signing keys
+     and allow the kernel modules to be signed with a key of your choosing.
+     The string provided should identify a file containing both a private key
+     and its corresponding X.509 certificate in PEM form, or — on systems where
+     the OpenSSL ENGINE_pkcs11 is functional — a PKCS#11 URI as defined by
+     RFC7512. In the latter case, the PKCS#11 URI should reference both a
+     certificate and a private key.
 
      If the PEM file containing the private key is encrypted, or if the
      PKCS#11 token requries a PIN, this can be provided at build time by
@@ -130,12 +130,12 @@ Under normal conditions, when CONFIG_MODULE_SIG_KEY is unchanged from its
 default, the kernel build will automatically generate a new keypair using
 openssl if one does not exist in the file:
 
-	signing_key.pem
+	certs/signing_key.pem
 
 during the building of vmlinux (the public part of the key needs to be built
 into vmlinux) using parameters in the:
 
-	x509.genkey
+	certs/x509.genkey
 
 file (which is also generated if it does not already exist).
 
diff --git a/MAINTAINERS b/MAINTAINERS
index bde2e3f5a10b..294dc59ed5e1 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2589,6 +2589,15 @@ S:	Supported
 F:	Documentation/filesystems/ceph.txt
 F:	fs/ceph/
 
+CERTIFICATE HANDLING:
+M:	David Howells <dhowells@redhat.com>
+M:	David Woodhouse <dwmw2@infradead.org>
+L:	keyrings@linux-nfs.org
+S:	Maintained
+F:	Documentation/module-signing.txt
+F:	certs/
+F:	scripts/extract-cert.c
+
 CERTIFIED WIRELESS USB (WUSB) SUBSYSTEM:
 L:	linux-usb@vger.kernel.org
 S:	Orphan
diff --git a/Makefile b/Makefile
index 6ab99d8cc23c..2341942feb85 100644
--- a/Makefile
+++ b/Makefile
@@ -871,7 +871,7 @@ INITRD_COMPRESS-$(CONFIG_RD_LZ4)   := lz4
 
 ifdef CONFIG_MODULE_SIG_ALL
 MODSECKEY = $(CONFIG_MODULE_SIG_KEY)
-MODPUBKEY = ./signing_key.x509
+MODPUBKEY = certs/signing_key.x509
 export MODPUBKEY
 mod_sign_cmd = scripts/sign-file $(CONFIG_MODULE_SIG_HASH) $(MODSECKEY) $(MODPUBKEY)
 else
@@ -881,7 +881,7 @@ export mod_sign_cmd
 
 
 ifeq ($(KBUILD_EXTMOD),)
-core-y		+= kernel/ mm/ fs/ ipc/ security/ crypto/ block/
+core-y		+= kernel/ certs/ mm/ fs/ ipc/ security/ crypto/ block/
 
 vmlinux-dirs	:= $(patsubst %/,%,$(filter %/, $(init-y) $(init-m) \
 		     $(core-y) $(core-m) $(drivers-y) $(drivers-m) \
diff --git a/certs/Kconfig b/certs/Kconfig
new file mode 100644
index 000000000000..b030b9c7ed34
--- /dev/null
+++ b/certs/Kconfig
@@ -0,0 +1,42 @@
+menu "Certificates for signature checking"
+
+config MODULE_SIG_KEY
+	string "File name or PKCS#11 URI of module signing key"
+	default "certs/signing_key.pem"
+	depends on MODULE_SIG
+	help
+         Provide the file name of a private key/certificate in PEM format,
+         or a PKCS#11 URI according to RFC7512. The file should contain, or
+         the URI should identify, both the certificate and its corresponding
+         private key.
+
+         If this option is unchanged from its default "certs/signing_key.pem",
+         then the kernel will automatically generate the private key and
+         certificate as described in Documentation/module-signing.txt
+
+config SYSTEM_TRUSTED_KEYRING
+	bool "Provide system-wide ring of trusted keys"
+	depends on KEYS
+	help
+	  Provide a system keyring to which trusted keys can be added.  Keys in
+	  the keyring are considered to be trusted.  Keys may be added at will
+	  by the kernel from compiled-in data and from hardware key stores, but
+	  userspace may only add extra keys if those keys can be verified by
+	  keys already in the keyring.
+
+	  Keys in this keyring are used by module signature checking.
+
+config SYSTEM_TRUSTED_KEYS
+	string "Additional X.509 keys for default system keyring"
+	depends on SYSTEM_TRUSTED_KEYRING
+	help
+	  If set, this option should be the filename of a PEM-formatted file
+	  containing trusted X.509 certificates to be included in the default
+	  system keyring. Any certificate used for module signing is implicitly
+	  also trusted.
+
+	  NOTE: If you previously provided keys for the system keyring in the
+	  form of DER-encoded *.x509 files in the top-level build directory,
+	  those are no longer used. You will need to set this option instead.
+
+endmenu
diff --git a/certs/Makefile b/certs/Makefile
new file mode 100644
index 000000000000..5d33486d3b20
--- /dev/null
+++ b/certs/Makefile
@@ -0,0 +1,147 @@
+#
+# Makefile for the linux kernel signature checking certificates.
+#
+
+obj-$(CONFIG_SYSTEM_TRUSTED_KEYRING) += system_keyring.o system_certificates.o
+
+###############################################################################
+#
+# When a Kconfig string contains a filename, it is suitable for
+# passing to shell commands. It is surrounded by double-quotes, and
+# any double-quotes or backslashes within it are escaped by
+# backslashes.
+#
+# This is no use for dependencies or $(wildcard). We need to strip the
+# surrounding quotes and the escaping from quotes and backslashes, and
+# we *do* need to escape any spaces in the string. So, for example:
+#
+# Usage: $(eval $(call config_filename,FOO))
+#
+# Defines FOO_FILENAME based on the contents of the CONFIG_FOO option,
+# transformed as described above to be suitable for use within the
+# makefile.
+#
+# Also, if the filename is a relative filename and exists in the source
+# tree but not the build tree, define FOO_SRCPREFIX as $(srctree)/ to
+# be prefixed to *both* command invocation and dependencies.
+#
+# Note: We also print the filenames in the quiet_cmd_foo text, and
+# perhaps ought to have a version specially escaped for that purpose.
+# But it's only cosmetic, and $(patsubst "%",%,$(CONFIG_FOO)) is good
+# enough.  It'll strip the quotes in the common case where there's no
+# space and it's a simple filename, and it'll retain the quotes when
+# there's a space. There are some esoteric cases in which it'll print
+# the wrong thing, but we don't really care. The actual dependencies
+# and commands *do* get it right, with various combinations of single
+# and double quotes, backslashes and spaces in the filenames.
+#
+###############################################################################
+#
+quote := $(firstword " ")
+space :=
+space +=
+space_escape := %%%SPACE%%%
+#
+define config_filename
+ifneq ($$(CONFIG_$(1)),"")
+$(1)_FILENAME := $$(subst \\,\,$$(subst \$$(quote),$$(quote),$$(subst $$(space_escape),\$$(space),$$(patsubst "%",%,$$(subst $$(space),$$(space_escape),$$(CONFIG_$(1)))))))
+ifneq ($$(patsubst /%,%,$$(firstword $$($(1)_FILENAME))),$$(firstword $$($(1)_FILENAME)))
+else
+ifeq ($$(wildcard $$($(1)_FILENAME)),)
+ifneq ($$(wildcard $$(srctree)/$$($(1)_FILENAME)),)
+$(1)_SRCPREFIX := $(srctree)/
+endif
+endif
+endif
+endif
+endef
+#
+###############################################################################
+
+ifeq ($(CONFIG_SYSTEM_TRUSTED_KEYRING),y)
+
+$(eval $(call config_filename,SYSTEM_TRUSTED_KEYS))
+
+# GCC doesn't include .incbin files in -MD generated dependencies (PR#66871)
+$(obj)/system_certificates.o: $(obj)/x509_certificate_list
+
+# Cope with signing_key.x509 existing in $(srctree) not $(objtree)
+AFLAGS_system_certificates.o := -I$(srctree)
+
+quiet_cmd_extract_certs  = EXTRACT_CERTS   $(patsubst "%",%,$(2))
+      cmd_extract_certs  = scripts/extract-cert $(2) $@ || ( rm $@; exit 1)
+
+targets += x509_certificate_list
+$(obj)/x509_certificate_list: scripts/extract-cert $(SYSTEM_TRUSTED_KEYS_SRCPREFIX)$(SYSTEM_TRUSTED_KEYS_FILENAME) FORCE
+	$(call if_changed,extract_certs,$(SYSTEM_TRUSTED_KEYS_SRCPREFIX)$(CONFIG_SYSTEM_TRUSTED_KEYS))
+endif
+
+clean-files := x509_certificate_list .x509.list
+
+ifeq ($(CONFIG_MODULE_SIG),y)
+###############################################################################
+#
+# If module signing is requested, say by allyesconfig, but a key has not been
+# supplied, then one will need to be generated to make sure the build does not
+# fail and that the kernel may be used afterwards.
+#
+###############################################################################
+ifndef CONFIG_MODULE_SIG_HASH
+$(error Could not determine digest type to use from kernel config)
+endif
+
+# We do it this way rather than having a boolean option for enabling an
+# external private key, because 'make randconfig' might enable such a
+# boolean option and we unfortunately can't make it depend on !RANDCONFIG.
+ifeq ($(CONFIG_MODULE_SIG_KEY),"certs/signing_key.pem")
+$(obj)/signing_key.pem: $(obj)/x509.genkey
+	@echo "###"
+	@echo "### Now generating an X.509 key pair to be used for signing modules."
+	@echo "###"
+	@echo "### If this takes a long time, you might wish to run rngd in the"
+	@echo "### background to keep the supply of entropy topped up.  It"
+	@echo "### needs to be run as root, and uses a hardware random"
+	@echo "### number generator if one is available."
+	@echo "###"
+	openssl req -new -nodes -utf8 -$(CONFIG_MODULE_SIG_HASH) -days 36500 \
+		-batch -x509 -config $(obj)/x509.genkey \
+		-outform PEM -out $(obj)/signing_key.pem \
+		-keyout $(obj)/signing_key.pem 2>&1
+	@echo "###"
+	@echo "### Key pair generated."
+	@echo "###"
+
+$(obj)/x509.genkey:
+	@echo Generating X.509 key generation config
+	@echo  >$@ "[ req ]"
+	@echo >>$@ "default_bits = 4096"
+	@echo >>$@ "distinguished_name = req_distinguished_name"
+	@echo >>$@ "prompt = no"
+	@echo >>$@ "string_mask = utf8only"
+	@echo >>$@ "x509_extensions = myexts"
+	@echo >>$@
+	@echo >>$@ "[ req_distinguished_name ]"
+	@echo >>$@ "#O = Unspecified company"
+	@echo >>$@ "CN = Build time autogenerated kernel key"
+	@echo >>$@ "#emailAddress = unspecified.user@unspecified.company"
+	@echo >>$@
+	@echo >>$@ "[ myexts ]"
+	@echo >>$@ "basicConstraints=critical,CA:FALSE"
+	@echo >>$@ "keyUsage=digitalSignature"
+	@echo >>$@ "subjectKeyIdentifier=hash"
+	@echo >>$@ "authorityKeyIdentifier=keyid"
+endif
+
+$(eval $(call config_filename,MODULE_SIG_KEY))
+
+# If CONFIG_MODULE_SIG_KEY isn't a PKCS#11 URI, depend on it
+ifeq ($(patsubst pkcs11:%,%,$(firstword $(MODULE_SIG_KEY_FILENAME))),$(firstword $(MODULE_SIG_KEY_FILENAME)))
+X509_DEP := $(MODULE_SIG_KEY_SRCPREFIX)$(MODULE_SIG_KEY_FILENAME)
+endif
+
+# GCC PR#66871 again.
+$(obj)/system_certificates.o: $(obj)/signing_key.x509
+
+$(obj)/signing_key.x509: scripts/extract-cert include/config/module/sig/key.h $(X509_DEP)
+	$(call cmd,extract_certs,$(MODULE_SIG_KEY_SRCPREFIX)$(CONFIG_MODULE_SIG_KEY))
+endif
diff --git a/certs/system_certificates.S b/certs/system_certificates.S
new file mode 100644
index 000000000000..9216e8c81764
--- /dev/null
+++ b/certs/system_certificates.S
@@ -0,0 +1,23 @@
+#include <linux/export.h>
+#include <linux/init.h>
+
+	__INITRODATA
+
+	.align 8
+	.globl VMLINUX_SYMBOL(system_certificate_list)
+VMLINUX_SYMBOL(system_certificate_list):
+__cert_list_start:
+#ifdef CONFIG_MODULE_SIG
+	.incbin "certs/signing_key.x509"
+#endif
+	.incbin "certs/x509_certificate_list"
+__cert_list_end:
+
+	.align 8
+	.globl VMLINUX_SYMBOL(system_certificate_list_size)
+VMLINUX_SYMBOL(system_certificate_list_size):
+#ifdef CONFIG_64BIT
+	.quad __cert_list_end - __cert_list_start
+#else
+	.long __cert_list_end - __cert_list_start
+#endif
diff --git a/certs/system_keyring.c b/certs/system_keyring.c
new file mode 100644
index 000000000000..2570598b784d
--- /dev/null
+++ b/certs/system_keyring.c
@@ -0,0 +1,157 @@
+/* System trusted keyring for trusted public keys
+ *
+ * Copyright (C) 2012 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public Licence
+ * as published by the Free Software Foundation; either version
+ * 2 of the Licence, or (at your option) any later version.
+ */
+
+#include <linux/export.h>
+#include <linux/kernel.h>
+#include <linux/sched.h>
+#include <linux/cred.h>
+#include <linux/err.h>
+#include <keys/asymmetric-type.h>
+#include <keys/system_keyring.h>
+#include <crypto/pkcs7.h>
+
+struct key *system_trusted_keyring;
+EXPORT_SYMBOL_GPL(system_trusted_keyring);
+
+extern __initconst const u8 system_certificate_list[];
+extern __initconst const unsigned long system_certificate_list_size;
+
+/*
+ * Load the compiled-in keys
+ */
+static __init int system_trusted_keyring_init(void)
+{
+	pr_notice("Initialise system trusted keyring\n");
+
+	system_trusted_keyring =
+		keyring_alloc(".system_keyring",
+			      KUIDT_INIT(0), KGIDT_INIT(0), current_cred(),
+			      ((KEY_POS_ALL & ~KEY_POS_SETATTR) |
+			      KEY_USR_VIEW | KEY_USR_READ | KEY_USR_SEARCH),
+			      KEY_ALLOC_NOT_IN_QUOTA, NULL);
+	if (IS_ERR(system_trusted_keyring))
+		panic("Can't allocate system trusted keyring\n");
+
+	set_bit(KEY_FLAG_TRUSTED_ONLY, &system_trusted_keyring->flags);
+	return 0;
+}
+
+/*
+ * Must be initialised before we try and load the keys into the keyring.
+ */
+device_initcall(system_trusted_keyring_init);
+
+/*
+ * Load the compiled-in list of X.509 certificates.
+ */
+static __init int load_system_certificate_list(void)
+{
+	key_ref_t key;
+	const u8 *p, *end;
+	size_t plen;
+
+	pr_notice("Loading compiled-in X.509 certificates\n");
+
+	p = system_certificate_list;
+	end = p + system_certificate_list_size;
+	while (p < end) {
+		/* Each cert begins with an ASN.1 SEQUENCE tag and must be more
+		 * than 256 bytes in size.
+		 */
+		if (end - p < 4)
+			goto dodgy_cert;
+		if (p[0] != 0x30 &&
+		    p[1] != 0x82)
+			goto dodgy_cert;
+		plen = (p[2] << 8) | p[3];
+		plen += 4;
+		if (plen > end - p)
+			goto dodgy_cert;
+
+		key = key_create_or_update(make_key_ref(system_trusted_keyring, 1),
+					   "asymmetric",
+					   NULL,
+					   p,
+					   plen,
+					   ((KEY_POS_ALL & ~KEY_POS_SETATTR) |
+					   KEY_USR_VIEW | KEY_USR_READ),
+					   KEY_ALLOC_NOT_IN_QUOTA |
+					   KEY_ALLOC_TRUSTED);
+		if (IS_ERR(key)) {
+			pr_err("Problem loading in-kernel X.509 certificate (%ld)\n",
+			       PTR_ERR(key));
+		} else {
+			set_bit(KEY_FLAG_BUILTIN, &key_ref_to_ptr(key)->flags);
+			pr_notice("Loaded X.509 cert '%s'\n",
+				  key_ref_to_ptr(key)->description);
+			key_ref_put(key);
+		}
+		p += plen;
+	}
+
+	return 0;
+
+dodgy_cert:
+	pr_err("Problem parsing in-kernel X.509 certificate list\n");
+	return 0;
+}
+late_initcall(load_system_certificate_list);
+
+#ifdef CONFIG_SYSTEM_DATA_VERIFICATION
+
+/**
+ * Verify a PKCS#7-based signature on system data.
+ * @data: The data to be verified.
+ * @len: Size of @data.
+ * @raw_pkcs7: The PKCS#7 message that is the signature.
+ * @pkcs7_len: The size of @raw_pkcs7.
+ * @usage: The use to which the key is being put.
+ */
+int system_verify_data(const void *data, unsigned long len,
+		       const void *raw_pkcs7, size_t pkcs7_len,
+		       enum key_being_used_for usage)
+{
+	struct pkcs7_message *pkcs7;
+	bool trusted;
+	int ret;
+
+	pkcs7 = pkcs7_parse_message(raw_pkcs7, pkcs7_len);
+	if (IS_ERR(pkcs7))
+		return PTR_ERR(pkcs7);
+
+	/* The data should be detached - so we need to supply it. */
+	if (pkcs7_supply_detached_data(pkcs7, data, len) < 0) {
+		pr_err("PKCS#7 signature with non-detached data\n");
+		ret = -EBADMSG;
+		goto error;
+	}
+
+	ret = pkcs7_verify(pkcs7, usage);
+	if (ret < 0)
+		goto error;
+
+	ret = pkcs7_validate_trust(pkcs7, system_trusted_keyring, &trusted);
+	if (ret < 0)
+		goto error;
+
+	if (!trusted) {
+		pr_err("PKCS#7 signature not signed with a trusted key\n");
+		ret = -ENOKEY;
+	}
+
+error:
+	pkcs7_free_message(pkcs7);
+	pr_devel("<==%s() = %d\n", __func__, ret);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(system_verify_data);
+
+#endif /* CONFIG_SYSTEM_DATA_VERIFICATION */
diff --git a/crypto/Kconfig b/crypto/Kconfig
index b4cfc5754033..51b01de7c0ae 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -1601,5 +1601,6 @@ config CRYPTO_HASH_INFO
 
 source "drivers/crypto/Kconfig"
 source crypto/asymmetric_keys/Kconfig
+source certs/Kconfig
 
 endif	# if CRYPTO
diff --git a/init/Kconfig b/init/Kconfig
index 5d1a703663ad..5526dfaac628 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1740,31 +1740,6 @@ config MMAP_ALLOW_UNINITIALIZED
 
 	  See Documentation/nommu-mmap.txt for more information.
 
-config SYSTEM_TRUSTED_KEYRING
-	bool "Provide system-wide ring of trusted keys"
-	depends on KEYS
-	help
-	  Provide a system keyring to which trusted keys can be added.  Keys in
-	  the keyring are considered to be trusted.  Keys may be added at will
-	  by the kernel from compiled-in data and from hardware key stores, but
-	  userspace may only add extra keys if those keys can be verified by
-	  keys already in the keyring.
-
-	  Keys in this keyring are used by module signature checking.
-
-config SYSTEM_TRUSTED_KEYS
-	string "Additional X.509 keys for default system keyring"
-	depends on SYSTEM_TRUSTED_KEYRING
-	help
-	  If set, this option should be the filename of a PEM-formatted file
-	  containing trusted X.509 certificates to be included in the default
-	  system keyring. Any certificate used for module signing is implicitly
-	  also trusted.
-
-	  NOTE: If you previously provided keys for the system keyring in the
-	  form of DER-encoded *.x509 files in the top-level build directory,
-	  those are no longer used. You will need to set this option instead.
-
 config SYSTEM_DATA_VERIFICATION
 	def_bool n
 	select SYSTEM_TRUSTED_KEYRING
@@ -1965,20 +1940,6 @@ config MODULE_SIG_HASH
 	default "sha384" if MODULE_SIG_SHA384
 	default "sha512" if MODULE_SIG_SHA512
 
-config MODULE_SIG_KEY
-	string "File name or PKCS#11 URI of module signing key"
-	default "signing_key.pem"
-	depends on MODULE_SIG
-	help
-         Provide the file name of a private key/certificate in PEM format,
-         or a PKCS#11 URI according to RFC7512. The file should contain, or
-         the URI should identify, both the certificate and its corresponding
-         private key.
-
-         If this option is unchanged from its default "signing_key.pem",
-         then the kernel will automatically generate the private key and
-         certificate as described in Documentation/module-signing.txt
-
 config MODULE_COMPRESS
 	bool "Compress modules on installation"
 	depends on MODULES
diff --git a/kernel/Makefile b/kernel/Makefile
index 65ef3846fbe8..1aa153a1be21 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -45,7 +45,6 @@ ifneq ($(CONFIG_SMP),y)
 obj-y += up.o
 endif
 obj-$(CONFIG_UID16) += uid16.o
-obj-$(CONFIG_SYSTEM_TRUSTED_KEYRING) += system_keyring.o system_certificates.o
 obj-$(CONFIG_MODULES) += module.o
 obj-$(CONFIG_MODULE_SIG) += module_signing.o
 obj-$(CONFIG_KALLSYMS) += kallsyms.o
@@ -111,145 +110,3 @@ $(obj)/config_data.gz: $(KCONFIG_CONFIG) FORCE
 targets += config_data.h
 $(obj)/config_data.h: $(obj)/config_data.gz FORCE
 	$(call filechk,ikconfiggz)
-
-###############################################################################
-#
-# When a Kconfig string contains a filename, it is suitable for
-# passing to shell commands. It is surrounded by double-quotes, and
-# any double-quotes or backslashes within it are escaped by
-# backslashes.
-#
-# This is no use for dependencies or $(wildcard). We need to strip the
-# surrounding quotes and the escaping from quotes and backslashes, and
-# we *do* need to escape any spaces in the string. So, for example:
-#
-# Usage: $(eval $(call config_filename,FOO))
-#
-# Defines FOO_FILENAME based on the contents of the CONFIG_FOO option,
-# transformed as described above to be suitable for use within the
-# makefile.
-#
-# Also, if the filename is a relative filename and exists in the source
-# tree but not the build tree, define FOO_SRCPREFIX as $(srctree)/ to
-# be prefixed to *both* command invocation and dependencies.
-#
-# Note: We also print the filenames in the quiet_cmd_foo text, and
-# perhaps ought to have a version specially escaped for that purpose.
-# But it's only cosmetic, and $(patsubst "%",%,$(CONFIG_FOO)) is good
-# enough.  It'll strip the quotes in the common case where there's no
-# space and it's a simple filename, and it'll retain the quotes when
-# there's a space. There are some esoteric cases in which it'll print
-# the wrong thing, but we don't really care. The actual dependencies
-# and commands *do* get it right, with various combinations of single
-# and double quotes, backslashes and spaces in the filenames.
-#
-###############################################################################
-#
-quote := $(firstword " ")
-space :=
-space +=
-space_escape := %%%SPACE%%%
-#
-define config_filename
-ifneq ($$(CONFIG_$(1)),"")
-$(1)_FILENAME := $$(subst \\,\,$$(subst \$$(quote),$$(quote),$$(subst $$(space_escape),\$$(space),$$(patsubst "%",%,$$(subst $$(space),$$(space_escape),$$(CONFIG_$(1)))))))
-ifneq ($$(patsubst /%,%,$$(firstword $$($(1)_FILENAME))),$$(firstword $$($(1)_FILENAME)))
-else
-ifeq ($$(wildcard $$($(1)_FILENAME)),)
-ifneq ($$(wildcard $$(srctree)/$$($(1)_FILENAME)),)
-$(1)_SRCPREFIX := $(srctree)/
-endif
-endif
-endif
-endif
-endef
-#
-###############################################################################
-
-ifeq ($(CONFIG_SYSTEM_TRUSTED_KEYRING),y)
-
-$(eval $(call config_filename,SYSTEM_TRUSTED_KEYS))
-
-# GCC doesn't include .incbin files in -MD generated dependencies (PR#66871)
-$(obj)/system_certificates.o: $(obj)/x509_certificate_list
-
-# Cope with signing_key.x509 existing in $(srctree) not $(objtree)
-AFLAGS_system_certificates.o := -I$(srctree)
-
-quiet_cmd_extract_certs  = EXTRACT_CERTS   $(patsubst "%",%,$(2))
-      cmd_extract_certs  = scripts/extract-cert $(2) $@ || ( rm $@; exit 1)
-
-targets += x509_certificate_list
-$(obj)/x509_certificate_list: scripts/extract-cert $(SYSTEM_TRUSTED_KEYS_SRCPREFIX)$(SYSTEM_TRUSTED_KEYS_FILENAME) FORCE
-	$(call if_changed,extract_certs,$(SYSTEM_TRUSTED_KEYS_SRCPREFIX)$(CONFIG_SYSTEM_TRUSTED_KEYS))
-endif
-
-clean-files := x509_certificate_list .x509.list
-
-ifeq ($(CONFIG_MODULE_SIG),y)
-###############################################################################
-#
-# If module signing is requested, say by allyesconfig, but a key has not been
-# supplied, then one will need to be generated to make sure the build does not
-# fail and that the kernel may be used afterwards.
-#
-###############################################################################
-ifndef CONFIG_MODULE_SIG_HASH
-$(error Could not determine digest type to use from kernel config)
-endif
-
-# We do it this way rather than having a boolean option for enabling an
-# external private key, because 'make randconfig' might enable such a
-# boolean option and we unfortunately can't make it depend on !RANDCONFIG.
-ifeq ($(CONFIG_MODULE_SIG_KEY),"signing_key.pem")
-signing_key.pem: x509.genkey
-	@echo "###"
-	@echo "### Now generating an X.509 key pair to be used for signing modules."
-	@echo "###"
-	@echo "### If this takes a long time, you might wish to run rngd in the"
-	@echo "### background to keep the supply of entropy topped up.  It"
-	@echo "### needs to be run as root, and uses a hardware random"
-	@echo "### number generator if one is available."
-	@echo "###"
-	openssl req -new -nodes -utf8 -$(CONFIG_MODULE_SIG_HASH) -days 36500 \
-		-batch -x509 -config x509.genkey \
-		-outform PEM -out signing_key.pem \
-		-keyout signing_key.pem 2>&1
-	@echo "###"
-	@echo "### Key pair generated."
-	@echo "###"
-
-x509.genkey:
-	@echo Generating X.509 key generation config
-	@echo  >x509.genkey "[ req ]"
-	@echo >>x509.genkey "default_bits = 4096"
-	@echo >>x509.genkey "distinguished_name = req_distinguished_name"
-	@echo >>x509.genkey "prompt = no"
-	@echo >>x509.genkey "string_mask = utf8only"
-	@echo >>x509.genkey "x509_extensions = myexts"
-	@echo >>x509.genkey
-	@echo >>x509.genkey "[ req_distinguished_name ]"
-	@echo >>x509.genkey "#O = Unspecified company"
-	@echo >>x509.genkey "CN = Build time autogenerated kernel key"
-	@echo >>x509.genkey "#emailAddress = unspecified.user@unspecified.company"
-	@echo >>x509.genkey
-	@echo >>x509.genkey "[ myexts ]"
-	@echo >>x509.genkey "basicConstraints=critical,CA:FALSE"
-	@echo >>x509.genkey "keyUsage=digitalSignature"
-	@echo >>x509.genkey "subjectKeyIdentifier=hash"
-	@echo >>x509.genkey "authorityKeyIdentifier=keyid"
-endif
-
-$(eval $(call config_filename,MODULE_SIG_KEY))
-
-# If CONFIG_MODULE_SIG_KEY isn't a PKCS#11 URI, depend on it
-ifeq ($(patsubst pkcs11:%,%,$(firstword $(MODULE_SIG_KEY_FILENAME))),$(firstword $(MODULE_SIG_KEY_FILENAME)))
-X509_DEP := $(MODULE_SIG_KEY_SRCPREFIX)$(MODULE_SIG_KEY_FILENAME)
-endif
-
-# GCC PR#66871 again.
-$(obj)/system_certificates.o: signing_key.x509
-
-signing_key.x509: scripts/extract-cert include/config/module/sig/key.h $(X509_DEP)
-	$(call cmd,extract_certs,$(MODULE_SIG_KEY_SRCPREFIX)$(CONFIG_MODULE_SIG_KEY))
-endif
diff --git a/kernel/system_certificates.S b/kernel/system_certificates.S
deleted file mode 100644
index 6ba2f75e7ba5..000000000000
--- a/kernel/system_certificates.S
+++ /dev/null
@@ -1,23 +0,0 @@
-#include <linux/export.h>
-#include <linux/init.h>
-
-	__INITRODATA
-
-	.align 8
-	.globl VMLINUX_SYMBOL(system_certificate_list)
-VMLINUX_SYMBOL(system_certificate_list):
-__cert_list_start:
-#ifdef CONFIG_MODULE_SIG
-	.incbin "signing_key.x509"
-#endif
-	.incbin "kernel/x509_certificate_list"
-__cert_list_end:
-
-	.align 8
-	.globl VMLINUX_SYMBOL(system_certificate_list_size)
-VMLINUX_SYMBOL(system_certificate_list_size):
-#ifdef CONFIG_64BIT
-	.quad __cert_list_end - __cert_list_start
-#else
-	.long __cert_list_end - __cert_list_start
-#endif
diff --git a/kernel/system_keyring.c b/kernel/system_keyring.c
deleted file mode 100644
index 2570598b784d..000000000000
--- a/kernel/system_keyring.c
+++ /dev/null
@@ -1,157 +0,0 @@
-/* System trusted keyring for trusted public keys
- *
- * Copyright (C) 2012 Red Hat, Inc. All Rights Reserved.
- * Written by David Howells (dhowells@redhat.com)
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public Licence
- * as published by the Free Software Foundation; either version
- * 2 of the Licence, or (at your option) any later version.
- */
-
-#include <linux/export.h>
-#include <linux/kernel.h>
-#include <linux/sched.h>
-#include <linux/cred.h>
-#include <linux/err.h>
-#include <keys/asymmetric-type.h>
-#include <keys/system_keyring.h>
-#include <crypto/pkcs7.h>
-
-struct key *system_trusted_keyring;
-EXPORT_SYMBOL_GPL(system_trusted_keyring);
-
-extern __initconst const u8 system_certificate_list[];
-extern __initconst const unsigned long system_certificate_list_size;
-
-/*
- * Load the compiled-in keys
- */
-static __init int system_trusted_keyring_init(void)
-{
-	pr_notice("Initialise system trusted keyring\n");
-
-	system_trusted_keyring =
-		keyring_alloc(".system_keyring",
-			      KUIDT_INIT(0), KGIDT_INIT(0), current_cred(),
-			      ((KEY_POS_ALL & ~KEY_POS_SETATTR) |
-			      KEY_USR_VIEW | KEY_USR_READ | KEY_USR_SEARCH),
-			      KEY_ALLOC_NOT_IN_QUOTA, NULL);
-	if (IS_ERR(system_trusted_keyring))
-		panic("Can't allocate system trusted keyring\n");
-
-	set_bit(KEY_FLAG_TRUSTED_ONLY, &system_trusted_keyring->flags);
-	return 0;
-}
-
-/*
- * Must be initialised before we try and load the keys into the keyring.
- */
-device_initcall(system_trusted_keyring_init);
-
-/*
- * Load the compiled-in list of X.509 certificates.
- */
-static __init int load_system_certificate_list(void)
-{
-	key_ref_t key;
-	const u8 *p, *end;
-	size_t plen;
-
-	pr_notice("Loading compiled-in X.509 certificates\n");
-
-	p = system_certificate_list;
-	end = p + system_certificate_list_size;
-	while (p < end) {
-		/* Each cert begins with an ASN.1 SEQUENCE tag and must be more
-		 * than 256 bytes in size.
-		 */
-		if (end - p < 4)
-			goto dodgy_cert;
-		if (p[0] != 0x30 &&
-		    p[1] != 0x82)
-			goto dodgy_cert;
-		plen = (p[2] << 8) | p[3];
-		plen += 4;
-		if (plen > end - p)
-			goto dodgy_cert;
-
-		key = key_create_or_update(make_key_ref(system_trusted_keyring, 1),
-					   "asymmetric",
-					   NULL,
-					   p,
-					   plen,
-					   ((KEY_POS_ALL & ~KEY_POS_SETATTR) |
-					   KEY_USR_VIEW | KEY_USR_READ),
-					   KEY_ALLOC_NOT_IN_QUOTA |
-					   KEY_ALLOC_TRUSTED);
-		if (IS_ERR(key)) {
-			pr_err("Problem loading in-kernel X.509 certificate (%ld)\n",
-			       PTR_ERR(key));
-		} else {
-			set_bit(KEY_FLAG_BUILTIN, &key_ref_to_ptr(key)->flags);
-			pr_notice("Loaded X.509 cert '%s'\n",
-				  key_ref_to_ptr(key)->description);
-			key_ref_put(key);
-		}
-		p += plen;
-	}
-
-	return 0;
-
-dodgy_cert:
-	pr_err("Problem parsing in-kernel X.509 certificate list\n");
-	return 0;
-}
-late_initcall(load_system_certificate_list);
-
-#ifdef CONFIG_SYSTEM_DATA_VERIFICATION
-
-/**
- * Verify a PKCS#7-based signature on system data.
- * @data: The data to be verified.
- * @len: Size of @data.
- * @raw_pkcs7: The PKCS#7 message that is the signature.
- * @pkcs7_len: The size of @raw_pkcs7.
- * @usage: The use to which the key is being put.
- */
-int system_verify_data(const void *data, unsigned long len,
-		       const void *raw_pkcs7, size_t pkcs7_len,
-		       enum key_being_used_for usage)
-{
-	struct pkcs7_message *pkcs7;
-	bool trusted;
-	int ret;
-
-	pkcs7 = pkcs7_parse_message(raw_pkcs7, pkcs7_len);
-	if (IS_ERR(pkcs7))
-		return PTR_ERR(pkcs7);
-
-	/* The data should be detached - so we need to supply it. */
-	if (pkcs7_supply_detached_data(pkcs7, data, len) < 0) {
-		pr_err("PKCS#7 signature with non-detached data\n");
-		ret = -EBADMSG;
-		goto error;
-	}
-
-	ret = pkcs7_verify(pkcs7, usage);
-	if (ret < 0)
-		goto error;
-
-	ret = pkcs7_validate_trust(pkcs7, system_trusted_keyring, &trusted);
-	if (ret < 0)
-		goto error;
-
-	if (!trusted) {
-		pr_err("PKCS#7 signature not signed with a trusted key\n");
-		ret = -ENOKEY;
-	}
-
-error:
-	pkcs7_free_message(pkcs7);
-	pr_devel("<==%s() = %d\n", __func__, ret);
-	return ret;
-}
-EXPORT_SYMBOL_GPL(system_verify_data);
-
-#endif /* CONFIG_SYSTEM_DATA_VERIFICATION */
-- 
cgit v1.2.3


From 9ed71e7ad95eb9885420e8d33f33bd4d49e1b775 Mon Sep 17 00:00:00 2001
From: Ben Hutchings <ben@decadent.org.uk>
Date: Thu, 6 Aug 2015 23:36:52 +0100
Subject: DocBook: Fix non-determinstic installation of duplicate man pages

Some kernel-doc sections are included in multiple DocBook files.  This
means the mandocs target will generate the same manual page multiple
times with different metadata (author name/address and manual title,
taken from the including DocBook file).  If it's invoked in a parallel
build, the output is non-determinstic.

Build the manual pages in a separate subdirectory per DocBook file,
then sort and de-duplicate when installing them (which is serialised).

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[jc: fixed conflicts with the docs tree]
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/DocBook/Makefile | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/DocBook/Makefile b/Documentation/DocBook/Makefile
index b7d2110298de..5e9702194dfe 100644
--- a/Documentation/DocBook/Makefile
+++ b/Documentation/DocBook/Makefile
@@ -60,7 +60,9 @@ mandocs: $(MAN)
 
 installmandocs: mandocs
 	mkdir -p /usr/local/man/man9/
-	install -m 644 $(obj)/man/*.9.gz /usr/local/man/man9/
+	find $(obj)/man -name '*.9.gz' -printf '%h %f\n' | \
+		sort -k 2 -k 1 | uniq -f 1 | sed -e 's: :/:' | \
+		xargs install -m 644 -t /usr/local/man/man9/
 
 ###
 #External programs used
@@ -150,12 +152,12 @@ quiet_cmd_db2html = HTML    $@
             cp $(PNG-$(basename $(notdir $@))) $(patsubst %.html,%,$@); fi
 
 quiet_cmd_db2man = MAN     $@
-      cmd_db2man = if grep -q refentry $<; then xmlto man $(XMLTOFLAGS) -o $(obj)/man $< ; fi
+      cmd_db2man = if grep -q refentry $<; then xmlto man $(XMLTOFLAGS) -o $(obj)/man/$(*F) $< ; fi
 %.9 : %.xml
 	@(which xmlto > /dev/null 2>&1) || \
 	 (echo "*** You need to install xmlto ***"; \
 	  exit 1)
-	$(Q)mkdir -p $(obj)/man
+	$(Q)mkdir -p $(obj)/man/$(*F)
 	$(call cmd,db2man)
 	@touch $@
 
-- 
cgit v1.2.3


From 89f271c4d052063a72af2622f285f70caac91845 Mon Sep 17 00:00:00 2001
From: Joachim Eastwood <manabian@gmail.com>
Date: Thu, 9 Jul 2015 22:19:07 +0200
Subject: doc: dt: add documentation for nxp,lpc1773-spifi

Add device tree binding documentation for the SPI Flash Interface
(SPIFI) found on NXP LPC18xx and LPC43xx devies.

Signed-off-by: Joachim Eastwood <manabian@gmail.com>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
---
 .../devicetree/bindings/mtd/nxp-spifi.txt          | 58 ++++++++++++++++++++++
 1 file changed, 58 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/mtd/nxp-spifi.txt

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/mtd/nxp-spifi.txt b/Documentation/devicetree/bindings/mtd/nxp-spifi.txt
new file mode 100644
index 000000000000..f8b6b250654e
--- /dev/null
+++ b/Documentation/devicetree/bindings/mtd/nxp-spifi.txt
@@ -0,0 +1,58 @@
+* NXP SPI Flash Interface (SPIFI)
+
+NXP SPIFI is a specialized SPI interface for serial Flash devices.
+It supports one Flash device with 1-, 2- and 4-bits width in SPI
+mode 0 or 3. The controller operates in either command or memory
+mode. In memory mode the Flash is accessible from the CPU as
+normal memory.
+
+Required properties:
+  - compatible : Should be "nxp,lpc1773-spifi"
+  - reg : the first contains the register location and length,
+          the second contains the memory mapping address and length
+  - reg-names: Should contain the reg names "spifi" and "flash"
+  - interrupts : Should contain the interrupt for the device
+  - clocks : The clocks needed by the SPIFI controller
+  - clock-names : Should contain the clock names "spifi" and "reg"
+
+Optional properties:
+ - resets : phandle + reset specifier
+
+The SPI Flash must be a child of the SPIFI node and must have a
+compatible property as specified in bindings/mtd/jedec,spi-nor.txt
+
+Optionally it can also contain the following properties.
+ - spi-cpol : Controller only supports mode 0 and 3 so either
+              both spi-cpol and spi-cpha should be present or
+              none of them
+ - spi-cpha : See above
+ - spi-rx-bus-width : Used to select how many pins that are used
+                      for input on the controller
+
+See bindings/spi/spi-bus.txt for more information.
+
+Example:
+spifi: spifi@40003000 {
+	compatible = "nxp,lpc1773-spifi";
+	reg = <0x40003000 0x1000>, <0x14000000 0x4000000>;
+	reg-names = "spifi", "flash";
+	interrupts = <30>;
+	clocks = <&ccu1 CLK_SPIFI>, <&ccu1 CLK_CPU_SPIFI>;
+	clock-names = "spifi", "reg";
+	resets = <&rgu 53>;
+
+	flash@0 {
+		compatible = "jedec,spi-nor";
+		spi-cpol;
+		spi-cpha;
+		spi-rx-bus-width = <4>;
+		#address-cells = <1>;
+		#size-cells = <1>;
+
+		partition@0 {
+			label = "data";
+			reg = <0 0x200000>;
+		};
+	};
+};
+
-- 
cgit v1.2.3


From 6ef2bf71764708f7c58ee9300acd8df05dbaa06f Mon Sep 17 00:00:00 2001
From: Stefan Koch <stefan.koch10@gmail.com>
Date: Sat, 8 Aug 2015 11:32:55 +0200
Subject: usb: interface authorization: Documentation part

This part adds the documentation for the interface authorization.

Signed-off-by: Stefan Koch <skoch@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 Documentation/ABI/testing/sysfs-bus-usb | 22 +++++++++++++++++++++
 Documentation/usb/authorization.txt     | 34 +++++++++++++++++++++++++++++++++
 2 files changed, 56 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/ABI/testing/sysfs-bus-usb b/Documentation/ABI/testing/sysfs-bus-usb
index 864637f25bee..5c6c26446f2a 100644
--- a/Documentation/ABI/testing/sysfs-bus-usb
+++ b/Documentation/ABI/testing/sysfs-bus-usb
@@ -1,3 +1,25 @@
+What:		/sys/bus/usb/devices/INTERFACE/authorized
+Date:		June 2015
+KernelVersion:	4.2
+Description:
+		This allows to authorize (1) or deauthorize (0)
+		individual interfaces instead a whole device
+		in contrast to the device authorization.
+		If a deauthorized interface will be authorized
+		so the driver probing must be triggered manually
+		by writing INTERFACE to /sys/bus/usb/drivers_probe
+		This allows to avoid side-effects with drivers
+		that need multiple interfaces.
+		A deauthorized interface cannot be probed or claimed.
+
+What:		/sys/bus/usb/devices/usbX/interface_authorized_default
+Date:		June 2015
+KernelVersion:	4.2
+Description:
+		This is used as default value that determines
+		if interfaces would authorized per default.
+		The value can be 1 or 0. It is per default 1.
+
 What:		/sys/bus/usb/device/.../authorized
 Date:		July 2008
 KernelVersion:	2.6.26
diff --git a/Documentation/usb/authorization.txt b/Documentation/usb/authorization.txt
index c069b6884c77..020cec5585ce 100644
--- a/Documentation/usb/authorization.txt
+++ b/Documentation/usb/authorization.txt
@@ -3,6 +3,9 @@ Authorizing (or not) your USB devices to connect to the system
 
 (C) 2007 Inaky Perez-Gonzalez <inaky@linux.intel.com> Intel Corporation
 
+Interface authorization part:
+	(C) 2015 Stefan Koch <skoch@suse.de> SUSE LLC
+
 This feature allows you to control if a USB device can be used (or
 not) in a system. This feature will allow you to implement a lock-down
 of USB devices, fully controlled by user space.
@@ -90,3 +93,34 @@ etc, but you get the idea. Anybody with access to a device gadget kit
 can fake descriptors and device info. Don't trust that. You are
 welcome.
 
+
+Interface authorization
+-----------------------
+There is a similar approach to allow or deny specific USB interfaces.
+That allows to block only a subset of an USB device.
+
+Authorize an interface:
+$ echo 1 > /sys/bus/usb/devices/INTERFACE/authorized
+
+Deauthorize an interface:
+$ echo 0 > /sys/bus/usb/devices/INTERFACE/authorized
+
+The default value for new interfaces
+on a particular USB bus can be changed, too.
+
+Allow interfaces per default:
+$ echo 1 > /sys/bus/usb/devices/usbX/interface_authorized_default
+
+Deny interfaces per default:
+$ echo 0 > /sys/bus/usb/devices/usbX/interface_authorized_default
+
+Per default the interface_authorized_default bit is 1.
+So all interfaces would authorized per default.
+
+Note:
+If a deauthorized interface will be authorized so the driver probing must
+be triggered manually by writing INTERFACE to /sys/bus/usb/drivers_probe
+
+For drivers that need multiple interfaces all needed interfaces should be
+authroized first. After that the drivers should be probed.
+This avoids side effects.
-- 
cgit v1.2.3


From a6be357e9720a7f8e11e2997e60626179151d190 Mon Sep 17 00:00:00 2001
From: Christophe Ricard <christophe.ricard@gmail.com>
Date: Fri, 14 Aug 2015 22:33:38 +0200
Subject: nfc: st-nci: Add device tree documentation for spi phy

Add st-nci-spi phy devicetree documentation

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
---
 .../devicetree/bindings/net/nfc/st-nci-i2c.txt     | 33 ++++++++++++++++++++++
 .../devicetree/bindings/net/nfc/st-nci-spi.txt     | 31 ++++++++++++++++++++
 .../devicetree/bindings/net/nfc/st-nci.txt         | 33 ----------------------
 3 files changed, 64 insertions(+), 33 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/net/nfc/st-nci-i2c.txt
 create mode 100644 Documentation/devicetree/bindings/net/nfc/st-nci-spi.txt
 delete mode 100644 Documentation/devicetree/bindings/net/nfc/st-nci.txt

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/net/nfc/st-nci-i2c.txt b/Documentation/devicetree/bindings/net/nfc/st-nci-i2c.txt
new file mode 100644
index 000000000000..d707588ed734
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/nfc/st-nci-i2c.txt
@@ -0,0 +1,33 @@
+* STMicroelectronics SAS. ST NCI NFC Controller
+
+Required properties:
+- compatible: Should be "st,st21nfcb-i2c" or "st,st21nfcc-i2c".
+- clock-frequency: I²C work frequency.
+- reg: address on the bus
+- interrupt-parent: phandle for the interrupt gpio controller
+- interrupts: GPIO interrupt to which the chip is connected
+- reset-gpios: Output GPIO pin used to reset the ST21NFCB
+
+Optional SoC Specific Properties:
+- pinctrl-names: Contains only one value - "default".
+- pintctrl-0: Specifies the pin control groups used for this controller.
+
+Example (for ARM-based BeagleBoard xM with ST21NFCB on I2C2):
+
+&i2c2 {
+
+	status = "okay";
+
+	st21nfcb: st21nfcb@8 {
+
+		compatible = "st,st21nfcb-i2c";
+
+		reg = <0x08>;
+		clock-frequency = <400000>;
+
+		interrupt-parent = <&gpio5>;
+		interrupts = <2 IRQ_TYPE_LEVEL_HIGH>;
+
+		reset-gpios = <&gpio5 29 GPIO_ACTIVE_HIGH>;
+	};
+};
diff --git a/Documentation/devicetree/bindings/net/nfc/st-nci-spi.txt b/Documentation/devicetree/bindings/net/nfc/st-nci-spi.txt
new file mode 100644
index 000000000000..525681b6dc39
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/nfc/st-nci-spi.txt
@@ -0,0 +1,31 @@
+* STMicroelectronics SAS. ST NCI NFC Controller
+
+Required properties:
+- compatible: Should be "st,st21nfcb-spi"
+- spi-max-frequency: Maximum SPI frequency (<= 10000000).
+- interrupt-parent: phandle for the interrupt gpio controller
+- interrupts: GPIO interrupt to which the chip is connected
+- reset-gpios: Output GPIO pin used to reset the ST21NFCB
+
+Optional SoC Specific Properties:
+- pinctrl-names: Contains only one value - "default".
+- pintctrl-0: Specifies the pin control groups used for this controller.
+
+Example (for ARM-based BeagleBoard xM with ST21NFCB on SPI4):
+
+&mcspi4 {
+
+	status = "okay";
+
+	st21nfcb: st21nfcb@0 {
+
+		compatible = "st,st21nfcb-spi";
+
+		clock-frequency = <4000000>;
+
+		interrupt-parent = <&gpio5>;
+		interrupts = <2 IRQ_TYPE_EDGE_RISING>;
+
+		reset-gpios = <&gpio5 29 GPIO_ACTIVE_HIGH>;
+	};
+};
diff --git a/Documentation/devicetree/bindings/net/nfc/st-nci.txt b/Documentation/devicetree/bindings/net/nfc/st-nci.txt
deleted file mode 100644
index d707588ed734..000000000000
--- a/Documentation/devicetree/bindings/net/nfc/st-nci.txt
+++ /dev/null
@@ -1,33 +0,0 @@
-* STMicroelectronics SAS. ST NCI NFC Controller
-
-Required properties:
-- compatible: Should be "st,st21nfcb-i2c" or "st,st21nfcc-i2c".
-- clock-frequency: I²C work frequency.
-- reg: address on the bus
-- interrupt-parent: phandle for the interrupt gpio controller
-- interrupts: GPIO interrupt to which the chip is connected
-- reset-gpios: Output GPIO pin used to reset the ST21NFCB
-
-Optional SoC Specific Properties:
-- pinctrl-names: Contains only one value - "default".
-- pintctrl-0: Specifies the pin control groups used for this controller.
-
-Example (for ARM-based BeagleBoard xM with ST21NFCB on I2C2):
-
-&i2c2 {
-
-	status = "okay";
-
-	st21nfcb: st21nfcb@8 {
-
-		compatible = "st,st21nfcb-i2c";
-
-		reg = <0x08>;
-		clock-frequency = <400000>;
-
-		interrupt-parent = <&gpio5>;
-		interrupts = <2 IRQ_TYPE_LEVEL_HIGH>;
-
-		reset-gpios = <&gpio5 29 GPIO_ACTIVE_HIGH>;
-	};
-};
-- 
cgit v1.2.3


From 5699f871d2d51ce40012501378670613d4d49214 Mon Sep 17 00:00:00 2001
From: Danilo Cesar Lemes de Paula <danilo.cesar@collabora.co.uk>
Date: Tue, 28 Jul 2015 16:45:15 -0300
Subject: scripts/kernel-doc: Adding cross-reference links to html
 documentation.

Functions, Structs and Parameters definitions on kernel documentation
are pure cosmetic, it only highlights the element.

To ease the navigation in the documentation we should use <links> inside
those tags so readers can easily jump between methods directly.

This was discussed in 2014[1] and is implemented by getting a list
of <refentries> from the DocBook XML to generate a database. Then it looks
for <function>,<structnames> and <paramdef> tags that matches the ones in
the database. As it only links existent references, no broken links are
added.

[1] - lists.freedesktop.org/archives/dri-devel/2014-August/065404.html

Signed-off-by: Danilo Cesar Lemes de Paula <danilo.cesar@collabora.co.uk>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Stephan Mueller <smueller@chronox.de>
Cc: Michal Marek <mmarek@suse.cz>
Cc: intel-gfx <intel-gfx@lists.freedesktop.org>
Cc: dri-devel <dri-devel@lists.freedesktop.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/DocBook/Makefile |  43 ++++++---
 scripts/kernel-doc-xml-ref     | 198 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 228 insertions(+), 13 deletions(-)
 create mode 100755 scripts/kernel-doc-xml-ref

(limited to 'Documentation')

diff --git a/Documentation/DocBook/Makefile b/Documentation/DocBook/Makefile
index 5e9702194dfe..b1d6c951ea29 100644
--- a/Documentation/DocBook/Makefile
+++ b/Documentation/DocBook/Makefile
@@ -66,8 +66,9 @@ installmandocs: mandocs
 
 ###
 #External programs used
-KERNELDOC = $(srctree)/scripts/kernel-doc
-DOCPROC   = $(objtree)/scripts/docproc
+KERNELDOCXMLREF = $(srctree)/scripts/kernel-doc-xml-ref
+KERNELDOC       = $(srctree)/scripts/kernel-doc
+DOCPROC         = $(objtree)/scripts/docproc
 
 XMLTOFLAGS = -m $(srctree)/$(src)/stylesheet.xsl
 XMLTOFLAGS += --skip-validation
@@ -91,7 +92,7 @@ define rule_docproc
         ) > $(dir $@).$(notdir $@).cmd
 endef
 
-%.xml: %.tmpl $(KERNELDOC) $(DOCPROC) FORCE
+%.xml: %.tmpl $(KERNELDOC) $(DOCPROC) $(KERNELDOCXMLREF) FORCE
 	$(call if_changed_rule,docproc)
 
 # Tell kbuild to always build the programs
@@ -142,7 +143,20 @@ quiet_cmd_db2html = HTML    $@
 		echo '<a HREF="$(patsubst %.html,%,$(notdir $@))/index.html"> \
 		$(patsubst %.html,%,$(notdir $@))</a><p>' > $@
 
-%.html:	%.xml
+###
+# Rules to create an aux XML and .db, and use them to re-process the DocBook XML
+# to fill internal hyperlinks
+       gen_aux_xml = :
+ quiet_gen_aux_xml = echo '  XMLREF  $@'
+silent_gen_aux_xml = :
+%.aux.xml: %.xml
+	@$($(quiet)gen_aux_xml)
+	@rm -rf $@
+	@(cat $< | egrep "^<refentry id" | egrep -o "\".*\"" | cut -f 2 -d \" > $<.db)
+	@$(KERNELDOCXMLREF) -db $<.db $< > $@
+.PRECIOUS: %.aux.xml
+
+%.html:	%.aux.xml
 	@(which xmlto > /dev/null 2>&1) || \
 	 (echo "*** You need to install xmlto ***"; \
 	  exit 1)
@@ -211,15 +225,18 @@ dochelp:
 ###
 # Temporary files left by various tools
 clean-files := $(DOCBOOKS) \
-	$(patsubst %.xml, %.dvi,  $(DOCBOOKS)) \
-	$(patsubst %.xml, %.aux,  $(DOCBOOKS)) \
-	$(patsubst %.xml, %.tex,  $(DOCBOOKS)) \
-	$(patsubst %.xml, %.log,  $(DOCBOOKS)) \
-	$(patsubst %.xml, %.out,  $(DOCBOOKS)) \
-	$(patsubst %.xml, %.ps,   $(DOCBOOKS)) \
-	$(patsubst %.xml, %.pdf,  $(DOCBOOKS)) \
-	$(patsubst %.xml, %.html, $(DOCBOOKS)) \
-	$(patsubst %.xml, %.9,    $(DOCBOOKS)) \
+	$(patsubst %.xml, %.dvi,     $(DOCBOOKS)) \
+	$(patsubst %.xml, %.aux,     $(DOCBOOKS)) \
+	$(patsubst %.xml, %.tex,     $(DOCBOOKS)) \
+	$(patsubst %.xml, %.log,     $(DOCBOOKS)) \
+	$(patsubst %.xml, %.out,     $(DOCBOOKS)) \
+	$(patsubst %.xml, %.ps,      $(DOCBOOKS)) \
+	$(patsubst %.xml, %.pdf,     $(DOCBOOKS)) \
+	$(patsubst %.xml, %.html,    $(DOCBOOKS)) \
+	$(patsubst %.xml, %.9,       $(DOCBOOKS)) \
+	$(patsubst %.xml, %.aux.xml, $(DOCBOOKS)) \
+	$(patsubst %.xml, %.xml.db,  $(DOCBOOKS)) \
+	$(patsubst %.xml, %.xml,     $(DOCBOOKS)) \
 	$(index)
 
 clean-dirs := $(patsubst %.xml,%,$(DOCBOOKS)) man
diff --git a/scripts/kernel-doc-xml-ref b/scripts/kernel-doc-xml-ref
new file mode 100755
index 000000000000..104a5a5ba2c8
--- /dev/null
+++ b/scripts/kernel-doc-xml-ref
@@ -0,0 +1,198 @@
+#!/usr/bin/perl -w
+
+use strict;
+
+## Copyright (C) 2015  Intel Corporation                         ##
+#                                                                ##
+## This software falls under the GNU General Public License.     ##
+## Please read the COPYING file for more information             ##
+#
+#
+# This software reads a XML file and a list of valid interal
+# references to replace Docbook tags with links.
+#
+# The list of "valid internal references" must be one-per-line in the following format:
+#      API-struct-foo
+#      API-enum-bar
+#      API-my-function
+#
+# The software walks over the XML file looking for xml tags representing possible references
+# to the Document. Each reference will be cross checked against the "Valid Internal Reference" list. If
+# the referece is found it replaces its content by a <link> tag.
+#
+# usage:
+# kernel-doc-xml-ref -db filename
+#		     xml filename > outputfile
+
+# read arguments
+if ($#ARGV != 2) {
+	usage();
+}
+
+#Holds the database filename
+my $databasefile;
+my @database;
+
+#holds the inputfile
+my $inputfile;
+my $errors = 0;
+
+my %highlights = (
+	"<function>(.*?)</function>",
+	    "\"<function>\" . convert_function(\$1, \$line) . \"</function>\"",
+	"<structname>(.*?)</structname>",
+	    "\"<structname>\" . convert_struct(\$1) . \"</structname>\"",
+	"<funcdef>(.*?)<function>(.*?)</function></funcdef>",
+	    "\"<funcdef>\" . convert_param(\$1) . \"<function>\$2</function></funcdef>\"",
+	"<paramdef>(.*?)<parameter>(.*?)</parameter></paramdef>",
+	    "\"<paramdef>\" . convert_param(\$1) . \"<parameter>\$2</parameter></paramdef>\"");
+
+while($ARGV[0] =~ m/^-(.*)/) {
+	my $cmd = shift @ARGV;
+	if ($cmd eq "-db") {
+		$databasefile = shift @ARGV
+	} else {
+		usage();
+	}
+}
+$inputfile = shift @ARGV;
+
+sub open_database {
+	open (my $handle, '<', $databasefile) or die "Cannot open $databasefile";
+	chomp(my @lines = <$handle>);
+	close $handle;
+
+	@database = @lines;
+}
+
+sub process_file {
+	open_database();
+
+	my $dohighlight;
+	foreach my $pattern (keys %highlights) {
+		$dohighlight .=  "\$line =~ s:$pattern:$highlights{$pattern}:eg;\n";
+	}
+
+	open(FILE, $inputfile) or die("Could not open $inputfile") or die ("Cannot open $inputfile");
+	foreach my $line (<FILE>)  {
+		eval $dohighlight;
+		print $line;
+	}
+}
+
+sub trim($_)
+{
+	my $str = $_[0];
+	$str =~ s/^\s+|\s+$//g;
+	return $str
+}
+
+sub has_key_defined($_)
+{
+	if ( grep( /^$_[0]$/, @database)) {
+		return 1;
+	}
+	return 0;
+}
+
+# Gets a <function> content and add it a hyperlink if possible.
+sub convert_function($_)
+{
+	my $arg = $_[0];
+	my $key = $_[0];
+
+	my $line = $_[1];
+
+	$key = trim($key);
+
+	$key =~ s/[^A-Za-z0-9]/-/g;
+	$key = "API-" . $key;
+
+	# We shouldn't add links to <funcdef> prototype
+	if (!has_key_defined($key) || $line =~ m/\s+<funcdef/i) {
+		return $arg;
+	}
+
+	my $head = $arg;
+	my $tail = "";
+	if ($arg =~ /(.*?)( ?)$/) {
+		$head = $1;
+		$tail = $2;
+	}
+	return "<link linkend=\"$key\">$head</link>$tail";
+}
+
+# Converting a struct text to link
+sub convert_struct($_)
+{
+	my $arg = $_[0];
+	my $key = $_[0];
+	$key =~ s/(struct )?(\w)/$2/g;
+	$key =~ s/[^A-Za-z0-9]/-/g;
+	$key = "API-struct-" . $key;
+
+	if (!has_key_defined($key)) {
+		return $arg;
+	}
+
+	my ($head, $tail) = split_pointer($arg);
+	return "<link linkend=\"$key\">$head</link>$tail";
+}
+
+# Identify "object *" elements
+sub split_pointer($_)
+{
+	my $arg = $_[0];
+	if ($arg =~ /(.*?)( ?\* ?)/) {
+		return ($1, $2);
+	}
+	return ($arg, "");
+}
+
+sub convert_param($_)
+{
+	my $type = $_[0];
+	my $keyname = convert_key_name($type);
+
+	if (!has_key_defined($keyname)) {
+		return $type;
+	}
+
+	my ($head, $tail) = split_pointer($type);
+	return "<link linkend=\"$keyname\">$head</link>$tail";
+
+}
+
+# DocBook links are in the API-<TYPE>-<STRUCT-NAME> format
+# This method gets an element and returns a valid DocBook reference for it.
+sub convert_key_name($_)
+{
+	#Pattern $2 is optional and might be uninitialized
+	no warnings 'uninitialized';
+
+	my $str = $_[0];
+	$str =~ s/(const|static)? ?(struct)? ?([a-zA-Z0-9_]+) ?(\*|&)?/$2 $3/g ;
+
+	# trim
+	$str =~ s/^\s+|\s+$//g;
+
+	# spaces and _ to -
+	$str =~ s/[^A-Za-z0-9]/-/g;
+
+	return "API-" . $str;
+}
+
+sub usage {
+	print "Usage: $0 -db database filename\n";
+	print "         xml source file(s) > outputfile\n";
+	exit 1;
+}
+
+# starting point
+process_file();
+
+if ($errors) {
+	print STDERR "$errors errors\n";
+}
+
+exit($errors);
-- 
cgit v1.2.3


From efe3b616f88caa95dbe8636f4d0b3dcefca962bb Mon Sep 17 00:00:00 2001
From: Matt Ranostay <mranostay@gmail.com>
Date: Sun, 16 Aug 2015 23:04:53 -0700
Subject: Input: cap11xx - add LED support

Several cap11xx variants have LEDs that be can be controlled; let's plug
the driver into the LED subsystem.

Signed-off-by: Matt Ranostay <mranostay@gmail.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
---
 .../devicetree/bindings/input/cap11xx.txt          |  19 +++
 drivers/input/keyboard/cap11xx.c                   | 144 ++++++++++++++++++++-
 2 files changed, 160 insertions(+), 3 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/input/cap11xx.txt b/Documentation/devicetree/bindings/input/cap11xx.txt
index 7d0a3009771b..8c67a0b5058d 100644
--- a/Documentation/devicetree/bindings/input/cap11xx.txt
+++ b/Documentation/devicetree/bindings/input/cap11xx.txt
@@ -55,5 +55,24 @@ i2c_controller {
 				 <105>,		/* KEY_LEFT */
 				 <109>,		/* KEY_PAGEDOWN */
 				 <104>;		/* KEY_PAGEUP */
+
+		#address-cells = <1>;
+		#size-cells = <0>;
+
+		usr@0 {
+			label = "cap11xx:green:usr0";
+			reg = <0>;
+		};
+
+		usr@1 {
+			label = "cap11xx:green:usr1";
+			reg = <1>;
+		};
+
+		alive@2 {
+			label = "cap11xx:green:alive";
+			reg = <2>;
+			linux,default_trigger = "heartbeat";
+		};
 	};
 }
diff --git a/drivers/input/keyboard/cap11xx.c b/drivers/input/keyboard/cap11xx.c
index b58fd9d72d92..378db10001df 100644
--- a/drivers/input/keyboard/cap11xx.c
+++ b/drivers/input/keyboard/cap11xx.c
@@ -12,6 +12,7 @@
 #include <linux/module.h>
 #include <linux/interrupt.h>
 #include <linux/input.h>
+#include <linux/leds.h>
 #include <linux/of_irq.h>
 #include <linux/regmap.h>
 #include <linux/i2c.h>
@@ -47,6 +48,20 @@
 #define CAP11XX_REG_CONFIG2		0x44
 #define CAP11XX_REG_CONFIG2_ALT_POL	BIT(6)
 #define CAP11XX_REG_SENSOR_BASE_CNT(X)	(0x50 + (X))
+#define CAP11XX_REG_LED_POLARITY	0x73
+#define CAP11XX_REG_LED_OUTPUT_CONTROL	0x74
+
+#define CAP11XX_REG_LED_DUTY_CYCLE_1	0x90
+#define CAP11XX_REG_LED_DUTY_CYCLE_2	0x91
+#define CAP11XX_REG_LED_DUTY_CYCLE_3	0x92
+#define CAP11XX_REG_LED_DUTY_CYCLE_4	0x93
+
+#define CAP11XX_REG_LED_DUTY_MIN_MASK	(0x0f)
+#define CAP11XX_REG_LED_DUTY_MIN_MASK_SHIFT	(0)
+#define CAP11XX_REG_LED_DUTY_MAX_MASK	(0xf0)
+#define CAP11XX_REG_LED_DUTY_MAX_MASK_SHIFT	(4)
+#define CAP11XX_REG_LED_DUTY_MAX_VALUE	(15)
+
 #define CAP11XX_REG_SENSOR_CALIB	(0xb1 + (X))
 #define CAP11XX_REG_SENSOR_CALIB_LSB1	0xb9
 #define CAP11XX_REG_SENSOR_CALIB_LSB2	0xba
@@ -56,10 +71,23 @@
 
 #define CAP11XX_MANUFACTURER_ID	0x5d
 
+#ifdef CONFIG_LEDS_CLASS
+struct cap11xx_led {
+	struct cap11xx_priv *priv;
+	struct led_classdev cdev;
+	struct work_struct work;
+	u32 reg;
+	enum led_brightness new_brightness;
+};
+#endif
+
 struct cap11xx_priv {
 	struct regmap *regmap;
 	struct input_dev *idev;
 
+	struct cap11xx_led *leds;
+	int num_leds;
+
 	/* config */
 	u32 keycodes[];
 };
@@ -67,6 +95,7 @@ struct cap11xx_priv {
 struct cap11xx_hw_model {
 	u8 product_id;
 	unsigned int num_channels;
+	unsigned int num_leds;
 };
 
 enum {
@@ -76,9 +105,9 @@ enum {
 };
 
 static const struct cap11xx_hw_model cap11xx_devices[] = {
-	[CAP1106] = { .product_id = 0x55, .num_channels = 6 },
-	[CAP1126] = { .product_id = 0x53, .num_channels = 6 },
-	[CAP1188] = { .product_id = 0x50, .num_channels = 8 },
+	[CAP1106] = { .product_id = 0x55, .num_channels = 6, .num_leds = 0 },
+	[CAP1126] = { .product_id = 0x53, .num_channels = 6, .num_leds = 2 },
+	[CAP1188] = { .product_id = 0x50, .num_channels = 8, .num_leds = 8 },
 };
 
 static const struct reg_default cap11xx_reg_defaults[] = {
@@ -111,6 +140,7 @@ static const struct reg_default cap11xx_reg_defaults[] = {
 	{ CAP11XX_REG_STANDBY_SENSITIVITY,	0x02 },
 	{ CAP11XX_REG_STANDBY_THRESH,		0x40 },
 	{ CAP11XX_REG_CONFIG2,			0x40 },
+	{ CAP11XX_REG_LED_POLARITY,		0x00 },
 	{ CAP11XX_REG_SENSOR_CALIB_LSB1,	0x00 },
 	{ CAP11XX_REG_SENSOR_CALIB_LSB2,	0x00 },
 };
@@ -177,6 +207,12 @@ out:
 
 static int cap11xx_set_sleep(struct cap11xx_priv *priv, bool sleep)
 {
+	/*
+	 * DLSEEP mode will turn off all LEDS, prevent this
+	 */
+	if (IS_ENABLED(CONFIG_LEDS_CLASS) && priv->num_leds)
+		return 0;
+
 	return regmap_update_bits(priv->regmap, CAP11XX_REG_MAIN_CONTROL,
 				  CAP11XX_REG_MAIN_CONTROL_DLSEEP,
 				  sleep ? CAP11XX_REG_MAIN_CONTROL_DLSEEP : 0);
@@ -196,6 +232,104 @@ static void cap11xx_input_close(struct input_dev *idev)
 	cap11xx_set_sleep(priv, true);
 }
 
+#ifdef CONFIG_LEDS_CLASS
+static void cap11xx_led_work(struct work_struct *work)
+{
+	struct cap11xx_led *led = container_of(work, struct cap11xx_led, work);
+	struct cap11xx_priv *priv = led->priv;
+	int value = led->new_brightness;
+
+	/*
+	 * All LEDs share the same duty cycle as this is a HW limitation.
+	 * Brightness levels per LED are either 0 (OFF) and 1 (ON).
+	 */
+	regmap_update_bits(priv->regmap, CAP11XX_REG_LED_OUTPUT_CONTROL,
+				BIT(led->reg), value ? BIT(led->reg) : 0);
+}
+
+static void cap11xx_led_set(struct led_classdev *cdev,
+			   enum led_brightness value)
+{
+	struct cap11xx_led *led = container_of(cdev, struct cap11xx_led, cdev);
+
+	if (led->new_brightness == value)
+		return;
+
+	led->new_brightness = value;
+	schedule_work(&led->work);
+}
+
+static int cap11xx_init_leds(struct device *dev,
+			     struct cap11xx_priv *priv, int num_leds)
+{
+	struct device_node *node = dev->of_node, *child;
+	struct cap11xx_led *led;
+	int cnt = of_get_child_count(node);
+	int error;
+
+	if (!num_leds || !cnt)
+		return 0;
+
+	if (cnt > num_leds)
+		return -EINVAL;
+
+	led = devm_kcalloc(dev, cnt, sizeof(struct cap11xx_led), GFP_KERNEL);
+	if (!led)
+		return -ENOMEM;
+
+	priv->leds = led;
+
+	error = regmap_update_bits(priv->regmap,
+				CAP11XX_REG_LED_OUTPUT_CONTROL, 0xff, 0);
+	if (error)
+		return error;
+
+	error = regmap_update_bits(priv->regmap, CAP11XX_REG_LED_DUTY_CYCLE_4,
+				CAP11XX_REG_LED_DUTY_MAX_MASK,
+				CAP11XX_REG_LED_DUTY_MAX_VALUE <<
+				CAP11XX_REG_LED_DUTY_MAX_MASK_SHIFT);
+	if (error)
+		return error;
+
+	for_each_child_of_node(node, child) {
+		u32 reg;
+
+		led->cdev.name =
+			of_get_property(child, "label", NULL) ? : child->name;
+		led->cdev.default_trigger =
+			of_get_property(child, "linux,default-trigger", NULL);
+		led->cdev.flags = 0;
+		led->cdev.brightness_set = cap11xx_led_set;
+		led->cdev.max_brightness = 1;
+		led->cdev.brightness = LED_OFF;
+
+		error = of_property_read_u32(child, "reg", &reg);
+		if (error != 0 || reg >= num_leds)
+			return -EINVAL;
+
+		led->reg = reg;
+		led->priv = priv;
+
+		INIT_WORK(&led->work, cap11xx_led_work);
+
+		error = devm_led_classdev_register(dev, &led->cdev);
+		if (error)
+			return error;
+
+		priv->num_leds++;
+		led++;
+	}
+
+	return 0;
+}
+#else
+static int cap11xx_init_leds(struct device *dev,
+			     struct cap11xx_priv *priv, int num_leds)
+{
+	return 0;
+}
+#endif
+
 static int cap11xx_i2c_probe(struct i2c_client *i2c_client,
 			     const struct i2c_device_id *id)
 {
@@ -316,6 +450,10 @@ static int cap11xx_i2c_probe(struct i2c_client *i2c_client,
 	priv->idev->open = cap11xx_input_open;
 	priv->idev->close = cap11xx_input_close;
 
+	error = cap11xx_init_leds(dev, priv, cap->num_leds);
+	if (error)
+		return error;
+
 	input_set_drvdata(priv->idev, priv);
 
 	/*
-- 
cgit v1.2.3


From 06f10e2f936424496d44e5541c220845c8c55345 Mon Sep 17 00:00:00 2001
From: Vinod Koul <vinod.koul@intel.com>
Date: Wed, 5 Aug 2015 08:42:04 +0530
Subject: Documentation: dmaengine: fix the DMA_CTRL_ACK documentation

As discussed recently the meaning of DMA_CTRL_ACK is that a desc cannot be
reused by provider until the client acknowledges receipt, i.e. has has a
chance to establish any dependency chains. So update documentation

Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Acked-by:Robert Jarzmik <robert.jarzmik@free.fr>
---
 Documentation/dmaengine/provider.txt | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/dmaengine/provider.txt b/Documentation/dmaengine/provider.txt
index ca67b0f04c6e..243889ec5c5a 100644
--- a/Documentation/dmaengine/provider.txt
+++ b/Documentation/dmaengine/provider.txt
@@ -345,12 +345,12 @@ where to put them)
       that abstracts it away.
 
   * DMA_CTRL_ACK
-    - If set, the transfer can be reused after being completed.
-    - There is a guarantee the transfer won't be freed until it is acked
-      by async_tx_ack().
-    - As a consequence, if a device driver wants to skip the dma_map_sg() and
-      dma_unmap_sg() in between 2 transfers, because the DMA'd data wasn't used,
-      it can resubmit the transfer right after its completion.
+    - If clear, the descriptor cannot be reused by provider until the
+      client acknowledges receipt, i.e. has has a chance to establish any
+      dependency chains
+    - This can be acked by invoking async_tx_ack()
+    - If set, does not mean descriptor can be reused
+
 
 General Design Notes
 --------------------
-- 
cgit v1.2.3


From 60884ddecdc2f45fd2862b7e95c9ad858584f82b Mon Sep 17 00:00:00 2001
From: Vinod Koul <vinod.koul@intel.com>
Date: Wed, 5 Aug 2015 08:42:06 +0530
Subject: Documentation: dmaengine: Add DMA_CTRL_REUSE documentation

Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Acked-by:Robert Jarzmik <robert.jarzmik@free.fr>
---
 Documentation/dmaengine/provider.txt | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/dmaengine/provider.txt b/Documentation/dmaengine/provider.txt
index 243889ec5c5a..67d4ce4df109 100644
--- a/Documentation/dmaengine/provider.txt
+++ b/Documentation/dmaengine/provider.txt
@@ -351,6 +351,23 @@ where to put them)
     - This can be acked by invoking async_tx_ack()
     - If set, does not mean descriptor can be reused
 
+  * DMA_CTRL_REUSE
+    - If set, the descriptor can be reused after being completed. It should
+      not be freed by provider if this flag is set.
+    - The descriptor should be prepared for reuse by invoking
+      dmaengine_desc_set_reuse() which will set DMA_CTRL_REUSE.
+    - dmaengine_desc_set_reuse() will succeed only when channel support
+      reusable descriptor as exhibited by capablities
+    - As a consequence, if a device driver wants to skip the dma_map_sg() and
+      dma_unmap_sg() in between 2 transfers, because the DMA'd data wasn't used,
+      it can resubmit the transfer right after its completion.
+    - Descriptor can be freed in few ways
+	- Clearing DMA_CTRL_REUSE by invoking dmaengine_desc_clear_reuse()
+	  and submitting for last txn
+	- Explicitly invoking dmaengine_desc_free(), this can succeed only
+	  when DMA_CTRL_REUSE is already set
+	- Terminating the channel
+
 
 General Design Notes
 --------------------
-- 
cgit v1.2.3


From ac49fbd1f9d437cca7234473850aef4165779383 Mon Sep 17 00:00:00 2001
From: Dirk Behme <dirk.behme@gmail.com>
Date: Sat, 18 Jul 2015 08:02:07 +0200
Subject: Documentation: gpio: consumer: describe active low property

I've been searching for any documentation of 'the active-low property of a GPIO'
already mentioned in this documenation. But couldn't find any. Add it.

Sigend-off-by: Dirk Behme <dirk.behme@gmail.com>
Acked-by: Alexandre Courbot <acourbot@nvidia.com>
[Spelling, grammar fixes]
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
---
 Documentation/gpio/consumer.txt | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/gpio/consumer.txt b/Documentation/gpio/consumer.txt
index 75542b91b766..a206639454ab 100644
--- a/Documentation/gpio/consumer.txt
+++ b/Documentation/gpio/consumer.txt
@@ -237,6 +237,39 @@ Note that these functions should only be used with great moderation ; a driver
 should not have to care about the physical line level.
 
 
+The active-low property
+-----------------------
+
+As a driver should not have to care about the physical line level, all of the
+gpiod_set_value_xxx() or gpiod_set_array_value_xxx() functions operate with
+the *logical* value. With this they take the active-low property into account.
+This means that they check whether the GPIO is configured to be active-low,
+and if so, they manipulate the passed value before the physical line level is
+driven.
+
+With this, all the gpiod_set_(array)_value_xxx() functions interpret the
+parameter "value" as "active" ("1") or "inactive" ("0"). The physical line
+level will be driven accordingly.
+
+As an example, if the active-low property for a dedicated GPIO is set, and the
+gpiod_set_(array)_value_xxx() passes "active" ("1"), the physical line level
+will be driven low.
+
+To summarize:
+
+Function (example)               active-low proporty  physical line
+gpiod_set_raw_value(desc, 0);        don't care           low
+gpiod_set_raw_value(desc, 1);        don't care           high
+gpiod_set_value(desc, 0);       default (active-high)     low
+gpiod_set_value(desc, 1);       default (active-high)     high
+gpiod_set_value(desc, 0);             active-low          high
+gpiod_set_value(desc, 1);             active-low          low
+
+Please note again that the set_raw/get_raw functions should be avoided as much
+as possible, especially by drivers which should not care about the actual
+physical line level and worry about the logical value instead.
+
+
 Set multiple GPIO outputs with a single function call
 -----------------------------------------------------
 The following functions set the output values of an array of GPIOs:
-- 
cgit v1.2.3


From c21e678b256baec428662704138d85cfc593abf4 Mon Sep 17 00:00:00 2001
From: Andreas Fenkart <afenkart@gmail.com>
Date: Tue, 7 Jul 2015 19:53:10 +0200
Subject: Documentation: dt: update ti,am33xx-hsmmc swakeup workaround

Before 5b83b2234be6733cf the driver was hard coding the wakeup irq to
be active low. The generic pm wakeirq does not override the active
high/low parameter, hence it must be specified correctly in the
device tree.
Mind that SDIO IRQ is active low as defined in the SDIO specification

Signed-off-by: Andreas Fenkart <afenkart@gmail.com>
Acked-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
---
 Documentation/devicetree/bindings/mmc/ti-omap-hsmmc.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/mmc/ti-omap-hsmmc.txt b/Documentation/devicetree/bindings/mmc/ti-omap-hsmmc.txt
index 76bf087bc889..74166a0d460d 100644
--- a/Documentation/devicetree/bindings/mmc/ti-omap-hsmmc.txt
+++ b/Documentation/devicetree/bindings/mmc/ti-omap-hsmmc.txt
@@ -102,7 +102,7 @@ not every application needs SDIO irq, e.g. MMC cards.
 		pinctrl-1 = <&mmc1_idle>;
 		pinctrl-2 = <&mmc1_sleep>;
 		...
-		interrupts-extended = <&intc 64 &gpio2 28 0>;
+		interrupts-extended = <&intc 64 &gpio2 28 GPIO_ACTIVE_LOW>;
 	};
 
 	mmc1_idle : pinmux_cirq_pin {
-- 
cgit v1.2.3


From 6e146f5c41da7e9601fe92fb4d06b45431dbf95b Mon Sep 17 00:00:00 2001
From: Thierry Reding <thierry.reding@gmail.com>
Date: Mon, 27 Jul 2015 12:00:23 +0200
Subject: pwm: Add to device-drivers documentation

Add a short introductory text along with API documentation generated
from kerneldoc comments for the PWM framework.

Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
---
 Documentation/DocBook/device-drivers.tmpl | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/DocBook/device-drivers.tmpl b/Documentation/DocBook/device-drivers.tmpl
index faf09d4a0ea8..172bff8cda79 100644
--- a/Documentation/DocBook/device-drivers.tmpl
+++ b/Documentation/DocBook/device-drivers.tmpl
@@ -455,4 +455,31 @@ X!Ilib/fonts/fonts.c
 !Edrivers/hsi/hsi.c
   </chapter>
 
+  <chapter id="pwm">
+    <title>Pulse-Width Modulation (PWM)</title>
+    <para>
+      Pulse-width modulation is a modulation technique primarily used to
+      control power supplied to electrical devices.
+    </para>
+    <para>
+      The PWM framework provides an abstraction for providers and consumers
+      of PWM signals. A controller that provides one or more PWM signals is
+      registered as <structname>struct pwm_chip</structname>. Providers are
+      expected to embed this structure in a driver-specific structure. This
+      structure contains fields that describe a particular chip.
+    </para>
+    <para>
+      A chip exposes one or more PWM signal sources, each of which exposed
+      as a <structname>struct pwm_device</structname>. Operations can be
+      performed on PWM devices to control the period, duty cycle, polarity
+      and active state of the signal.
+    </para>
+    <para>
+      Note that PWM devices are exclusive resources: they can always only be
+      used by one consumer at a time.
+    </para>
+!Iinclude/linux/pwm.h
+!Edrivers/pwm/core.c
+  </chapter>
+
 </book>
-- 
cgit v1.2.3


From 649ca820dab3d76e12408b74af3e8e97abb07ae0 Mon Sep 17 00:00:00 2001
From: Guenter Roeck <linux@roeck-us.net>
Date: Mon, 8 Jun 2015 09:56:20 -0700
Subject: hwmon: (ltc2978) Add support for LTC2975

LTC2975 is mostly compatible to LTC2974, but supports input current
and power measurement.

Tested-by: Michael Jones <mike@proclivis.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
---
 .../devicetree/bindings/hwmon/ltc2978.txt          |   3 +-
 Documentation/hwmon/ltc2978                        |  47 ++++++----
 drivers/hwmon/pmbus/Kconfig                        |   4 +-
 drivers/hwmon/pmbus/ltc2978.c                      | 104 +++++++++++++++++++--
 4 files changed, 125 insertions(+), 33 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/hwmon/ltc2978.txt b/Documentation/devicetree/bindings/hwmon/ltc2978.txt
index c1d23c14ddd9..e434d5fba2ae 100644
--- a/Documentation/devicetree/bindings/hwmon/ltc2978.txt
+++ b/Documentation/devicetree/bindings/hwmon/ltc2978.txt
@@ -3,6 +3,7 @@ ltc2978
 Required properties:
 - compatible: should contain one of:
   * "lltc,ltc2974"
+  * "lltc,ltc2975"
   * "lltc,ltc2977"
   * "lltc,ltc2978"
   * "lltc,ltc3880"
@@ -19,7 +20,7 @@ Optional properties:
   standard binding for regulators; see regulator.txt.
 
 Valid names of regulators depend on number of supplies supported per device:
-  * ltc2974 : vout0 - vout3
+  * ltc2974, ltc2975 : vout0 - vout3
   * ltc2977 : vout0 - vout7
   * ltc2978 : vout0 - vout7
   * ltc3880, ltc3882 : vout0 - vout1
diff --git a/Documentation/hwmon/ltc2978 b/Documentation/hwmon/ltc2978
index ff130997f22e..7e982884e359 100644
--- a/Documentation/hwmon/ltc2978
+++ b/Documentation/hwmon/ltc2978
@@ -6,6 +6,10 @@ Supported chips:
     Prefix: 'ltc2974'
     Addresses scanned: -
     Datasheet: http://www.linear.com/product/ltc2974
+  * Linear Technology LTC2975
+    Prefix: 'ltc2975'
+    Addresses scanned: -
+    Datasheet: http://www.linear.com/product/ltc2975
   * Linear Technology LTC2977
     Prefix: 'ltc2977'
     Addresses scanned: -
@@ -42,7 +46,7 @@ Author: Guenter Roeck <linux@roeck-us.net>
 Description
 -----------
 
-LTC2974 is a quad digital power supply managers.
+LTC2974 and LTC2975 are quad digital power supply managers.
 LTC2978 is an octal power supply monitor.
 LTC2977 is a pin compatible replacement for LTC2978.
 LTC3880, LTC3882, and LTC3887 are dual output poly-phase step-down DC/DC
@@ -71,23 +75,23 @@ in1_label		"vin"
 in1_input		Measured input voltage.
 in1_min			Minimum input voltage.
 in1_max			Maximum input voltage.
-			LTC2974, LTC2977, and LTC2978 only.
+			LTC2974, LTC2975, LTC2977, and LTC2978 only.
 in1_lcrit		Critical minimum input voltage.
-			LTC2974, LTC2977, and LTC2978 only.
+			LTC2974, LTC2975, LTC2977, and LTC2978 only.
 in1_crit		Critical maximum input voltage.
 in1_min_alarm		Input voltage low alarm.
 in1_max_alarm		Input voltage high alarm.
-			LTC2974, LTC2977, and LTC2978 only.
+			LTC2974, LTC2975, LTC2977, and LTC2978 only.
 in1_lcrit_alarm		Input voltage critical low alarm.
-			LTC2974, LTC2977, and LTC2978 only.
+			LTC2974, LTC2975, LTC2977, and LTC2978 only.
 in1_crit_alarm		Input voltage critical high alarm.
 in1_lowest		Lowest input voltage.
-			LTC2974, LTC2977, and LTC2978 only.
+			LTC2974, LTC2975, LTC2977, and LTC2978 only.
 in1_highest		Highest input voltage.
 in1_reset_history	Reset input voltage history.
 
 in[N]_label		"vout[1-8]".
-			LTC2974: N=2-5
+			LTC2974, LTC2975: N=2-5
 			LTC2977: N=2-9
 			LTC2978: N=2-9
 			LTC3880, LTC3882, LTC3887, LTM4676: N=2-3
@@ -101,13 +105,14 @@ in[N]_min_alarm		Output voltage low alarm.
 in[N]_max_alarm		Output voltage high alarm.
 in[N]_lcrit_alarm	Output voltage critical low alarm.
 in[N]_crit_alarm	Output voltage critical high alarm.
-in[N]_lowest		Lowest output voltage. LTC2974 and LTC2978 only.
+in[N]_lowest		Lowest output voltage. LTC2974, LTC2975,
+			and LTC2978 only.
 in[N]_highest		Highest output voltage.
 in[N]_reset_history	Reset output voltage history.
 
 temp[N]_input		Measured temperature.
-			On LTC2974, temp[1-4] report external temperatures,
-			and temp5 reports the chip temperature.
+			On LTC2974 and LTC2975, temp[1-4] report external
+			temperatures, and temp5 reports the chip temperature.
 			On LTC2977 and LTC2978, only one temperature measurement
 			is supported and reports the chip temperature.
 			On LTC3880, LTC3882, LTC3887, and LTM4676, temp1 and
@@ -120,23 +125,24 @@ temp[N]_max		Maximum temperature.
 temp[N]_lcrit		Critical low temperature.
 temp[N]_crit		Critical high temperature.
 temp[N]_min_alarm	Temperature low alarm.
-			LTC2974, LTC2977, and LTC2978 only.
+			LTC2974, LTC2975, LTC2977, and LTC2978 only.
 temp[N]_max_alarm	Temperature high alarm.
 temp[N]_lcrit_alarm	Temperature critical low alarm.
 temp[N]_crit_alarm	Temperature critical high alarm.
 temp[N]_lowest		Lowest measured temperature.
-			LTC2974, LTC2977, and LTC2978 only.
-			Not supported for chip temperature sensor on LTC2974.
+			LTC2974, LTC2975, LTC2977, and LTC2978 only.
+			Not supported for chip temperature sensor on LTC2974 and
+			LTC2975.
 temp[N]_highest		Highest measured temperature. Not supported for chip
-			temperature sensor on LTC2974.
+			temperature sensor on LTC2974 and LTC2975.
 temp[N]_reset_history	Reset temperature history. Not supported for chip
-			temperature sensor on LTC2974.
+			temperature sensor on LTC2974 and LTC2975.
 
 power1_label		"pin". LTC3883 only.
 power1_input		Measured input power.
 
 power[N]_label		"pout[1-4]".
-			LTC2974: N=1-4
+			LTC2974, LTC2975: N=1-4
 			LTC2977: Not supported
 			LTC2978: Not supported
 			LTC3880, LTC3882, LTC3887, LTM4676: N=1-2
@@ -151,7 +157,7 @@ curr1_highest		Highest input current. LTC3883 only.
 curr1_reset_history	Reset input current history. LTC3883 only.
 
 curr[N]_label		"iout[1-4]".
-			LTC2974: N=1-4
+			LTC2974, LTC2975: N=1-4
 			LTC2977: not supported
 			LTC2978: not supported
 			LTC3880, LTC3882, LTC3887, LTM4676: N=2-3
@@ -159,10 +165,11 @@ curr[N]_label		"iout[1-4]".
 curr[N]_input		Measured output current.
 curr[N]_max		Maximum output current.
 curr[N]_crit		Critical high output current.
-curr[N]_lcrit		Critical low output current. LTC2974 only.
+curr[N]_lcrit		Critical low output current. LTC2974 and LTC2975 only.
 curr[N]_max_alarm	Output current high alarm.
 curr[N]_crit_alarm	Output current critical high alarm.
-curr[N]_lcrit_alarm	Output current critical low alarm. LTC2974 only.
-curr[N]_lowest		Lowest output current. LTC2974 only.
+curr[N]_lcrit_alarm	Output current critical low alarm.
+			LTC2974 and LTC2975 only.
+curr[N]_lowest		Lowest output current. LTC2974 and LTC2975 only.
 curr[N]_highest		Highest output current.
 curr[N]_reset_history	Reset output current history.
diff --git a/drivers/hwmon/pmbus/Kconfig b/drivers/hwmon/pmbus/Kconfig
index 43b6de900ce5..8279727987cb 100644
--- a/drivers/hwmon/pmbus/Kconfig
+++ b/drivers/hwmon/pmbus/Kconfig
@@ -52,8 +52,8 @@ config SENSORS_LTC2978
 	default n
 	help
 	  If you say yes here you get hardware monitoring support for Linear
-	  Technology LTC2974, LTC2977, LTC2978, LTC3880, LTC3883, LTC3887,
-	  and LTM4676.
+	  Technology LTC2974, LTC2975, LTC2977, LTC2978, LTC3880, LTC3883,
+	  LTC3887, and LTM4676.
 
 	  This driver can also be built as a module. If so, the module will
 	  be called ltc2978.
diff --git a/drivers/hwmon/pmbus/ltc2978.c b/drivers/hwmon/pmbus/ltc2978.c
index 0756d8ae9dad..e153ffd59ab0 100644
--- a/drivers/hwmon/pmbus/ltc2978.c
+++ b/drivers/hwmon/pmbus/ltc2978.c
@@ -1,9 +1,8 @@
 /*
- * Hardware monitoring driver for LTC2974, LTC2977, LTC2978, LTC3880,
- * LTC3883, LTC3887. and LTM4676
+ * Hardware monitoring driver for LTC2978 and compatible chips.
  *
  * Copyright (c) 2011 Ericsson AB.
- * Copyright (c) 2013, 2014 Guenter Roeck
+ * Copyright (c) 2013, 2014, 2015 Guenter Roeck
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
@@ -25,8 +24,8 @@
 #include <linux/regulator/driver.h>
 #include "pmbus.h"
 
-enum chips { ltc2974, ltc2977, ltc2978, ltc3880, ltc3882, ltc3883, ltc3887,
-	     ltm4676 };
+enum chips { ltc2974, ltc2975, ltc2977, ltc2978, ltc3880, ltc3882, ltc3883,
+	     ltc3887, ltm4676 };
 
 /* Common for all chips */
 #define LTC2978_MFR_VOUT_PEAK		0xdd
@@ -34,12 +33,12 @@ enum chips { ltc2974, ltc2977, ltc2978, ltc3880, ltc3882, ltc3883, ltc3887,
 #define LTC2978_MFR_TEMPERATURE_PEAK	0xdf
 #define LTC2978_MFR_SPECIAL_ID		0xe7	/* Not on LTC3882 */
 
-/* LTC2974, LCT2977, and LTC2978 */
+/* LTC2974, LTC2975, LCT2977, and LTC2978 */
 #define LTC2978_MFR_VOUT_MIN		0xfb
 #define LTC2978_MFR_VIN_MIN		0xfc
 #define LTC2978_MFR_TEMPERATURE_MIN	0xfd
 
-/* LTC2974 only */
+/* LTC2974, LTC2975 */
 #define LTC2974_MFR_IOUT_PEAK		0xd7
 #define LTC2974_MFR_IOUT_MIN		0xd8
 
@@ -51,8 +50,15 @@ enum chips { ltc2974, ltc2977, ltc2978, ltc3880, ltc3882, ltc3883, ltc3887,
 /* LTC3883 only */
 #define LTC3883_MFR_IIN_PEAK		0xe1
 
+/* LTC2975 only */
+#define LTC2975_MFR_IIN_PEAK		0xc4
+#define LTC2975_MFR_IIN_MIN		0xc5
+#define LTC2975_MFR_PIN_PEAK		0xc6
+#define LTC2975_MFR_PIN_MIN		0xc7
+
 #define LTC2974_ID_REV1			0x0212
 #define LTC2974_ID_REV2			0x0213
+#define LTC2975_ID			0x0223
 #define LTC2977_ID			0x0130
 #define LTC2978_ID_REV1			0x0121
 #define LTC2978_ID_REV2			0x0122
@@ -87,7 +93,8 @@ struct ltc2978_data {
 	u16 temp_min[LTC2974_NUM_PAGES], temp_max[LTC2974_NUM_PAGES];
 	u16 vout_min[LTC2978_NUM_PAGES], vout_max[LTC2978_NUM_PAGES];
 	u16 iout_min[LTC2974_NUM_PAGES], iout_max[LTC2974_NUM_PAGES];
-	u16 iin_max;
+	u16 iin_min, iin_max;
+	u16 pin_min, pin_max;
 	u16 temp2_max;
 	struct pmbus_driver_info info;
 };
@@ -246,6 +253,60 @@ static int ltc2974_read_word_data(struct i2c_client *client, int page, int reg)
 	return ret;
 }
 
+static int ltc2975_read_word_data(struct i2c_client *client, int page, int reg)
+{
+	const struct pmbus_driver_info *info = pmbus_get_driver_info(client);
+	struct ltc2978_data *data = to_ltc2978_data(info);
+	int ret;
+
+	switch (reg) {
+	case PMBUS_VIRT_READ_IIN_MAX:
+		ret = pmbus_read_word_data(client, page, LTC2975_MFR_IIN_PEAK);
+		if (ret >= 0) {
+			if (lin11_to_val(ret)
+			    > lin11_to_val(data->iin_max))
+				data->iin_max = ret;
+			ret = data->iin_max;
+		}
+		break;
+	case PMBUS_VIRT_READ_IIN_MIN:
+		ret = pmbus_read_word_data(client, page, LTC2975_MFR_IIN_MIN);
+		if (ret >= 0) {
+			if (lin11_to_val(ret)
+			    < lin11_to_val(data->iin_min))
+				data->iin_min = ret;
+			ret = data->iin_min;
+		}
+		break;
+	case PMBUS_VIRT_READ_PIN_MAX:
+		ret = pmbus_read_word_data(client, page, LTC2975_MFR_PIN_PEAK);
+		if (ret >= 0) {
+			if (lin11_to_val(ret)
+			    > lin11_to_val(data->pin_max))
+				data->pin_max = ret;
+			ret = data->pin_max;
+		}
+		break;
+	case PMBUS_VIRT_READ_PIN_MIN:
+		ret = pmbus_read_word_data(client, page, LTC2975_MFR_PIN_MIN);
+		if (ret >= 0) {
+			if (lin11_to_val(ret)
+			    < lin11_to_val(data->pin_min))
+				data->pin_min = ret;
+			ret = data->pin_min;
+		}
+		break;
+	case PMBUS_VIRT_RESET_IIN_HISTORY:
+	case PMBUS_VIRT_RESET_PIN_HISTORY:
+		ret = 0;
+		break;
+	default:
+		ret = ltc2978_read_word_data(client, page, reg);
+		break;
+	}
+	return ret;
+}
+
 static int ltc3880_read_word_data(struct i2c_client *client, int page, int reg)
 {
 	const struct pmbus_driver_info *info = pmbus_get_driver_info(client);
@@ -337,7 +398,13 @@ static int ltc2978_write_word_data(struct i2c_client *client, int page,
 	switch (reg) {
 	case PMBUS_VIRT_RESET_IIN_HISTORY:
 		data->iin_max = 0x7c00;
-		ret = ltc2978_clear_peaks(client, page, data->id);
+		data->iin_min = 0x7bff;
+		ret = ltc2978_clear_peaks(client, 0, data->id);
+		break;
+	case PMBUS_VIRT_RESET_PIN_HISTORY:
+		data->pin_max = 0x7c00;
+		data->pin_min = 0x7bff;
+		ret = ltc2978_clear_peaks(client, 0, data->id);
 		break;
 	case PMBUS_VIRT_RESET_IOUT_HISTORY:
 		data->iout_max[page] = 0x7c00;
@@ -372,6 +439,7 @@ static int ltc2978_write_word_data(struct i2c_client *client, int page,
 
 static const struct i2c_device_id ltc2978_id[] = {
 	{"ltc2974", ltc2974},
+	{"ltc2975", ltc2975},
 	{"ltc2977", ltc2977},
 	{"ltc2978", ltc2978},
 	{"ltc3880", ltc3880},
@@ -428,6 +496,8 @@ static int ltc2978_get_id(struct i2c_client *client)
 
 	if (chip_id == LTC2974_ID_REV1 || chip_id == LTC2974_ID_REV2)
 		return ltc2974;
+	else if (chip_id == LTC2975_ID)
+		return ltc2975;
 	else if (chip_id == LTC2977_ID)
 		return ltc2977;
 	else if (chip_id == LTC2978_ID_REV1 || chip_id == LTC2978_ID_REV2 ||
@@ -505,6 +575,19 @@ static int ltc2978_probe(struct i2c_client *client,
 			  | PMBUS_HAVE_IOUT | PMBUS_HAVE_STATUS_IOUT;
 		}
 		break;
+	case ltc2975:
+		info->read_word_data = ltc2975_read_word_data;
+		info->pages = LTC2974_NUM_PAGES;
+		info->func[0] = PMBUS_HAVE_IIN | PMBUS_HAVE_PIN
+		  | PMBUS_HAVE_VIN | PMBUS_HAVE_STATUS_INPUT
+		  | PMBUS_HAVE_TEMP2;
+		for (i = 0; i < info->pages; i++) {
+			info->func[i] |= PMBUS_HAVE_VOUT
+			  | PMBUS_HAVE_STATUS_VOUT | PMBUS_HAVE_POUT
+			  | PMBUS_HAVE_TEMP | PMBUS_HAVE_STATUS_TEMP
+			  | PMBUS_HAVE_IOUT | PMBUS_HAVE_STATUS_IOUT;
+		}
+		break;
 	case ltc2977:
 	case ltc2978:
 		info->read_word_data = ltc2978_read_word_data;
@@ -576,6 +659,7 @@ static int ltc2978_probe(struct i2c_client *client,
 #ifdef CONFIG_OF
 static const struct of_device_id ltc2978_of_match[] = {
 	{ .compatible = "lltc,ltc2974" },
+	{ .compatible = "lltc,ltc2975" },
 	{ .compatible = "lltc,ltc2977" },
 	{ .compatible = "lltc,ltc2978" },
 	{ .compatible = "lltc,ltc3880" },
@@ -601,5 +685,5 @@ static struct i2c_driver ltc2978_driver = {
 module_i2c_driver(ltc2978_driver);
 
 MODULE_AUTHOR("Guenter Roeck");
-MODULE_DESCRIPTION("PMBus driver for LTC2974, LTC2978, LTC3880, LTC3883, and LTM4676");
+MODULE_DESCRIPTION("PMBus driver for LTC2978 and comppatible chips");
 MODULE_LICENSE("GPL");
-- 
cgit v1.2.3


From 52aae6af71e0e78e25c64e13266917bb323984d5 Mon Sep 17 00:00:00 2001
From: Guenter Roeck <linux@roeck-us.net>
Date: Mon, 17 Aug 2015 06:38:01 -0700
Subject: hwmon: (ltc2978) Add support for LTC2980 and LTM2987

LTC2980 and LTM2987 are command compatible to LTC2977. They consist of
two LTC2977 on a single die, and are instantiated as two separate chips,
each supporting eight channels.

Suggested-by: Michael Jones <mike@proclivis.com>
Tested-by: Michael Jones <mike@proclivis.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
---
 .../devicetree/bindings/hwmon/ltc2978.txt          |  4 +-
 Documentation/hwmon/ltc2978                        | 49 ++++++++++++++++------
 drivers/hwmon/pmbus/Kconfig                        |  4 +-
 drivers/hwmon/pmbus/ltc2978.c                      | 20 +++++++--
 4 files changed, 58 insertions(+), 19 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/hwmon/ltc2978.txt b/Documentation/devicetree/bindings/hwmon/ltc2978.txt
index e434d5fba2ae..5dc2691ccf08 100644
--- a/Documentation/devicetree/bindings/hwmon/ltc2978.txt
+++ b/Documentation/devicetree/bindings/hwmon/ltc2978.txt
@@ -6,10 +6,12 @@ Required properties:
   * "lltc,ltc2975"
   * "lltc,ltc2977"
   * "lltc,ltc2978"
+  * "lltc,ltc2980"
   * "lltc,ltc3880"
   * "lltc,ltc3882"
   * "lltc,ltc3883"
   * "lltc,ltc3887"
+  * "lltc,ltm2987"
   * "lltc,ltm4676"
 - reg: I2C slave address
 
@@ -21,7 +23,7 @@ Optional properties:
 
 Valid names of regulators depend on number of supplies supported per device:
   * ltc2974, ltc2975 : vout0 - vout3
-  * ltc2977 : vout0 - vout7
+  * ltc2977, ltc2980, ltm2987 : vout0 - vout7
   * ltc2978 : vout0 - vout7
   * ltc3880, ltc3882 : vout0 - vout1
   * ltc3883 : vout0
diff --git a/Documentation/hwmon/ltc2978 b/Documentation/hwmon/ltc2978
index 7e982884e359..158355398798 100644
--- a/Documentation/hwmon/ltc2978
+++ b/Documentation/hwmon/ltc2978
@@ -19,6 +19,10 @@ Supported chips:
     Addresses scanned: -
     Datasheet: http://www.linear.com/product/ltc2978
     	       http://www.linear.com/product/ltc2978a
+  * Linear Technology LTC2980
+    Prefix: 'ltc2980'
+    Addresses scanned: -
+    Datasheet: http://www.linear.com/product/ltc2980
   * Linear Technology LTC3880
     Prefix: 'ltc3880'
     Addresses scanned: -
@@ -35,6 +39,10 @@ Supported chips:
     Prefix: 'ltc3887'
     Addresses scanned: -
     Datasheet: http://www.linear.com/product/ltc3887
+  * Linear Technology LTM2987
+    Prefix: 'ltm2987'
+    Addresses scanned: -
+    Datasheet: http://www.linear.com/product/ltm2987
   * Linear Technology LTM4676
     Prefix: 'ltm4676'
     Addresses scanned: -
@@ -49,9 +57,15 @@ Description
 LTC2974 and LTC2975 are quad digital power supply managers.
 LTC2978 is an octal power supply monitor.
 LTC2977 is a pin compatible replacement for LTC2978.
+LTC2980 is a 16-channel Power System Manager, consisting of two LTC2977
+in a single die. The chip is instantiated and reported as two separate chips
+on two different I2C bus addresses.
 LTC3880, LTC3882, and LTC3887 are dual output poly-phase step-down DC/DC
 controllers.
 LTC3883 is a single phase step-down DC/DC controller.
+LTM2987 is a 16-channel Power System Manager with two LTC2977 plus
+additional components on a single die. The chip is instantiated and reported
+as two separate chips on two different I2C bus addresses.
 LTM4676 is a dual 13A or single 26A uModule regulator.
 
 
@@ -75,24 +89,29 @@ in1_label		"vin"
 in1_input		Measured input voltage.
 in1_min			Minimum input voltage.
 in1_max			Maximum input voltage.
-			LTC2974, LTC2975, LTC2977, and LTC2978 only.
+			LTC2974, LTC2975, LTC2977, LTC2980, LTC2978, and
+			LTM2987 only.
 in1_lcrit		Critical minimum input voltage.
-			LTC2974, LTC2975, LTC2977, and LTC2978 only.
+			LTC2974, LTC2975, LTC2977, LTC2980, LTC2978, and
+			LTM2987 only.
 in1_crit		Critical maximum input voltage.
 in1_min_alarm		Input voltage low alarm.
 in1_max_alarm		Input voltage high alarm.
-			LTC2974, LTC2975, LTC2977, and LTC2978 only.
+			LTC2974, LTC2975, LTC2977, LTC2980, LTC2978, and
+			LTM2987 only.
 in1_lcrit_alarm		Input voltage critical low alarm.
-			LTC2974, LTC2975, LTC2977, and LTC2978 only.
+			LTC2974, LTC2975, LTC2977, LTC2980, LTC2978, and
+			LTM2987 only.
 in1_crit_alarm		Input voltage critical high alarm.
 in1_lowest		Lowest input voltage.
-			LTC2974, LTC2975, LTC2977, and LTC2978 only.
+			LTC2974, LTC2975, LTC2977, LTC2980, LTC2978, and
+			LTM2987 only.
 in1_highest		Highest input voltage.
 in1_reset_history	Reset input voltage history.
 
 in[N]_label		"vout[1-8]".
 			LTC2974, LTC2975: N=2-5
-			LTC2977: N=2-9
+			LTC2977, LTC2980, LTM2987: N=2-9
 			LTC2978: N=2-9
 			LTC3880, LTC3882, LTC3887, LTM4676: N=2-3
 			LTC3883: N=2
@@ -113,24 +132,28 @@ in[N]_reset_history	Reset output voltage history.
 temp[N]_input		Measured temperature.
 			On LTC2974 and LTC2975, temp[1-4] report external
 			temperatures, and temp5 reports the chip temperature.
-			On LTC2977 and LTC2978, only one temperature measurement
-			is supported and reports the chip temperature.
+			On LTC2977, LTC2980, LTC2978, and LTM2987, only one
+			temperature measurement is supported and reports
+			the chip temperature.
 			On LTC3880, LTC3882, LTC3887, and LTM4676, temp1 and
 			temp2 report external temperatures, and temp3 reports
 			the chip temperature.
 			On LTC3883, temp1 reports an external temperature,
 			and temp2 reports the chip temperature.
-temp[N]_min		Mimimum temperature. LTC2974, LCT2977, and LTC2978 only.
+temp[N]_min		Mimimum temperature. LTC2974, LCT2977, LTM2980, LTC2978,
+			and LTM2987 only.
 temp[N]_max		Maximum temperature.
 temp[N]_lcrit		Critical low temperature.
 temp[N]_crit		Critical high temperature.
 temp[N]_min_alarm	Temperature low alarm.
-			LTC2974, LTC2975, LTC2977, and LTC2978 only.
+			LTC2974, LTC2975, LTC2977, LTM2980, LTC2978, and
+			LTM2987 only.
 temp[N]_max_alarm	Temperature high alarm.
 temp[N]_lcrit_alarm	Temperature critical low alarm.
 temp[N]_crit_alarm	Temperature critical high alarm.
 temp[N]_lowest		Lowest measured temperature.
-			LTC2974, LTC2975, LTC2977, and LTC2978 only.
+			LTC2974, LTC2975, LTC2977, LTM2980, LTC2978, and
+			LTM2987 only.
 			Not supported for chip temperature sensor on LTC2974 and
 			LTC2975.
 temp[N]_highest		Highest measured temperature. Not supported for chip
@@ -143,7 +166,7 @@ power1_input		Measured input power.
 
 power[N]_label		"pout[1-4]".
 			LTC2974, LTC2975: N=1-4
-			LTC2977: Not supported
+			LTC2977, LTC2980, LTM2987: Not supported
 			LTC2978: Not supported
 			LTC3880, LTC3882, LTC3887, LTM4676: N=1-2
 			LTC3883: N=2
@@ -158,7 +181,7 @@ curr1_reset_history	Reset input current history. LTC3883 only.
 
 curr[N]_label		"iout[1-4]".
 			LTC2974, LTC2975: N=1-4
-			LTC2977: not supported
+			LTC2977, LTC2980, LTM2987: not supported
 			LTC2978: not supported
 			LTC3880, LTC3882, LTC3887, LTM4676: N=2-3
 			LTC3883: N=2
diff --git a/drivers/hwmon/pmbus/Kconfig b/drivers/hwmon/pmbus/Kconfig
index 8279727987cb..af778aed4033 100644
--- a/drivers/hwmon/pmbus/Kconfig
+++ b/drivers/hwmon/pmbus/Kconfig
@@ -52,8 +52,8 @@ config SENSORS_LTC2978
 	default n
 	help
 	  If you say yes here you get hardware monitoring support for Linear
-	  Technology LTC2974, LTC2975, LTC2977, LTC2978, LTC3880, LTC3883,
-	  LTC3887, and LTM4676.
+	  Technology LTC2974, LTC2975, LTC2977, LTC2978, LTC2980, LTC3880,
+	  LTC3883, LTC3887, LTCM2987, and LTM4676.
 
 	  This driver can also be built as a module. If so, the module will
 	  be called ltc2978.
diff --git a/drivers/hwmon/pmbus/ltc2978.c b/drivers/hwmon/pmbus/ltc2978.c
index e9d3f828fe46..48dcde0bc740 100644
--- a/drivers/hwmon/pmbus/ltc2978.c
+++ b/drivers/hwmon/pmbus/ltc2978.c
@@ -24,8 +24,8 @@
 #include <linux/regulator/driver.h>
 #include "pmbus.h"
 
-enum chips { ltc2974, ltc2975, ltc2977, ltc2978, ltc3880, ltc3882, ltc3883,
-	     ltc3887, ltm4676 };
+enum chips { ltc2974, ltc2975, ltc2977, ltc2978, ltc2980, ltc3880, ltc3882,
+	ltc3883, ltc3887, ltm2987, ltm4676 };
 
 /* Common for all chips */
 #define LTC2978_MFR_VOUT_PEAK		0xdd
@@ -33,7 +33,7 @@ enum chips { ltc2974, ltc2975, ltc2977, ltc2978, ltc3880, ltc3882, ltc3883,
 #define LTC2978_MFR_TEMPERATURE_PEAK	0xdf
 #define LTC2978_MFR_SPECIAL_ID		0xe7	/* Undocumented on LTC3882 */
 
-/* LTC2974, LTC2975, LCT2977, and LTC2978 */
+/* LTC2974, LTC2975, LCT2977, LTC2980, LTC2978, and LTM2987 */
 #define LTC2978_MFR_VOUT_MIN		0xfb
 #define LTC2978_MFR_VIN_MIN		0xfc
 #define LTC2978_MFR_TEMPERATURE_MIN	0xfd
@@ -63,11 +63,15 @@ enum chips { ltc2974, ltc2975, ltc2977, ltc2978, ltc3880, ltc3882, ltc3883,
 #define LTC2977_ID			0x0130
 #define LTC2978_ID_REV1			0x0110	/* Early revision */
 #define LTC2978_ID_REV2			0x0120
+#define LTC2980_ID_A			0x8030	/* A/B for two die IDs */
+#define LTC2980_ID_B			0x8040
 #define LTC3880_ID			0x4020
 #define LTC3882_ID			0x4200
 #define LTC3882_ID_D1			0x4240	/* Dash 1 */
 #define LTC3883_ID			0x4300
 #define LTC3887_ID			0x4700
+#define LTM2987_ID_A			0x8010	/* A/B for two die IDs */
+#define LTM2987_ID_B			0x8020
 #define LTM4676_ID_REV1			0x4400
 #define LTM4676_ID_REV2			0x4480
 #define LTM4676A_ID			0x47e0
@@ -409,10 +413,12 @@ static const struct i2c_device_id ltc2978_id[] = {
 	{"ltc2975", ltc2975},
 	{"ltc2977", ltc2977},
 	{"ltc2978", ltc2978},
+	{"ltc2980", ltc2980},
 	{"ltc3880", ltc3880},
 	{"ltc3882", ltc3882},
 	{"ltc3883", ltc3883},
 	{"ltc3887", ltc3887},
+	{"ltm2987", ltm2987},
 	{"ltm4676", ltm4676},
 	{}
 };
@@ -471,6 +477,8 @@ static int ltc2978_get_id(struct i2c_client *client)
 		return ltc2977;
 	else if (chip_id == LTC2978_ID_REV1 || chip_id == LTC2978_ID_REV2)
 		return ltc2978;
+	else if (chip_id == LTC2980_ID_A || chip_id == LTC2980_ID_B)
+		return ltc2980;
 	else if (chip_id == LTC3880_ID)
 		return ltc3880;
 	else if (chip_id == LTC3882_ID || chip_id == LTC3882_ID_D1)
@@ -479,6 +487,8 @@ static int ltc2978_get_id(struct i2c_client *client)
 		return ltc3883;
 	else if (chip_id == LTC3887_ID)
 		return ltc3887;
+	else if (chip_id == LTM2987_ID_A || chip_id == LTM2987_ID_B)
+		return ltm2987;
 	else if (chip_id == LTM4676_ID_REV1 || chip_id == LTM4676_ID_REV2 ||
 		 chip_id == LTM4676A_ID)
 		return ltm4676;
@@ -559,6 +569,8 @@ static int ltc2978_probe(struct i2c_client *client,
 		break;
 	case ltc2977:
 	case ltc2978:
+	case ltc2980:
+	case ltm2987:
 		info->read_word_data = ltc2978_read_word_data;
 		info->pages = LTC2978_NUM_PAGES;
 		info->func[0] = PMBUS_HAVE_VIN | PMBUS_HAVE_STATUS_INPUT
@@ -634,10 +646,12 @@ static const struct of_device_id ltc2978_of_match[] = {
 	{ .compatible = "lltc,ltc2975" },
 	{ .compatible = "lltc,ltc2977" },
 	{ .compatible = "lltc,ltc2978" },
+	{ .compatible = "lltc,ltc2980" },
 	{ .compatible = "lltc,ltc3880" },
 	{ .compatible = "lltc,ltc3882" },
 	{ .compatible = "lltc,ltc3883" },
 	{ .compatible = "lltc,ltc3887" },
+	{ .compatible = "lltc,ltm2987" },
 	{ .compatible = "lltc,ltm4676" },
 	{ }
 };
-- 
cgit v1.2.3


From 228b687d9e20f367e4d5ea8723e3abbf0892db61 Mon Sep 17 00:00:00 2001
From: Guenter Roeck <linux@roeck-us.net>
Date: Mon, 17 Aug 2015 07:21:43 -0700
Subject: hwmon: (ltc2978) Add support for LTC3886

LTC3886 is a is a dual PolyPhase DC/DC synchronous step-down switching
regulator controller. It is mostly command compatible to LTC3883,
but supports two phases instead of one.

Suggested-by: Michael Jones <mike@proclivis.com>
Tested-by: Michael Jones <mike@proclivis.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
---
 .../devicetree/bindings/hwmon/ltc2978.txt          |  3 ++-
 Documentation/hwmon/ltc2978                        | 23 +++++++++++++--------
 drivers/hwmon/pmbus/Kconfig                        |  2 +-
 drivers/hwmon/pmbus/ltc2978.c                      | 24 ++++++++++++++++++++--
 4 files changed, 39 insertions(+), 13 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/hwmon/ltc2978.txt b/Documentation/devicetree/bindings/hwmon/ltc2978.txt
index 5dc2691ccf08..a7afbf60bb9c 100644
--- a/Documentation/devicetree/bindings/hwmon/ltc2978.txt
+++ b/Documentation/devicetree/bindings/hwmon/ltc2978.txt
@@ -10,6 +10,7 @@ Required properties:
   * "lltc,ltc3880"
   * "lltc,ltc3882"
   * "lltc,ltc3883"
+  * "lltc,ltc3886"
   * "lltc,ltc3887"
   * "lltc,ltm2987"
   * "lltc,ltm4676"
@@ -25,7 +26,7 @@ Valid names of regulators depend on number of supplies supported per device:
   * ltc2974, ltc2975 : vout0 - vout3
   * ltc2977, ltc2980, ltm2987 : vout0 - vout7
   * ltc2978 : vout0 - vout7
-  * ltc3880, ltc3882 : vout0 - vout1
+  * ltc3880, ltc3882, ltc3886 : vout0 - vout1
   * ltc3883 : vout0
   * ltm4676 : vout0 - vout1
 
diff --git a/Documentation/hwmon/ltc2978 b/Documentation/hwmon/ltc2978
index 158355398798..f997aa53eb95 100644
--- a/Documentation/hwmon/ltc2978
+++ b/Documentation/hwmon/ltc2978
@@ -35,6 +35,10 @@ Supported chips:
     Prefix: 'ltc3883'
     Addresses scanned: -
     Datasheet: http://www.linear.com/product/ltc3883
+  * Linear Technology LTC3886
+    Prefix: 'ltc3886'
+    Addresses scanned: -
+    Datasheet: http://www.linear.com/product/ltc3886
   * Linear Technology LTC3887
     Prefix: 'ltc3887'
     Addresses scanned: -
@@ -60,8 +64,8 @@ LTC2977 is a pin compatible replacement for LTC2978.
 LTC2980 is a 16-channel Power System Manager, consisting of two LTC2977
 in a single die. The chip is instantiated and reported as two separate chips
 on two different I2C bus addresses.
-LTC3880, LTC3882, and LTC3887 are dual output poly-phase step-down DC/DC
-controllers.
+LTC3880, LTC3882, LTC3886, and LTC3887 are dual output poly-phase step-down
+DC/DC controllers.
 LTC3883 is a single phase step-down DC/DC controller.
 LTM2987 is a 16-channel Power System Manager with two LTC2977 plus
 additional components on a single die. The chip is instantiated and reported
@@ -113,7 +117,7 @@ in[N]_label		"vout[1-8]".
 			LTC2974, LTC2975: N=2-5
 			LTC2977, LTC2980, LTM2987: N=2-9
 			LTC2978: N=2-9
-			LTC3880, LTC3882, LTC3887, LTM4676: N=2-3
+			LTC3880, LTC3882, LTC23886 LTC3887, LTM4676: N=2-3
 			LTC3883: N=2
 in[N]_input		Measured output voltage.
 in[N]_min		Minimum output voltage.
@@ -161,29 +165,30 @@ temp[N]_highest		Highest measured temperature. Not supported for chip
 temp[N]_reset_history	Reset temperature history. Not supported for chip
 			temperature sensor on LTC2974 and LTC2975.
 
-power1_label		"pin". LTC3883 only.
+power1_label		"pin". LTC3883 and LTC3886 only.
 power1_input		Measured input power.
 
 power[N]_label		"pout[1-4]".
 			LTC2974, LTC2975: N=1-4
 			LTC2977, LTC2980, LTM2987: Not supported
 			LTC2978: Not supported
-			LTC3880, LTC3882, LTC3887, LTM4676: N=1-2
+			LTC3880, LTC3882, LTC3886, LTC3887, LTM4676: N=1-2
 			LTC3883: N=2
 power[N]_input		Measured output power.
 
-curr1_label		"iin". LTC3880, LTC3883, LTC3887, and LTM4676 only.
+curr1_label		"iin". LTC3880, LTC3883, LTC3886, LTC3887, and LTM4676
+			only.
 curr1_input		Measured input current.
 curr1_max		Maximum input current.
 curr1_max_alarm		Input current high alarm.
-curr1_highest		Highest input current. LTC3883 only.
-curr1_reset_history	Reset input current history. LTC3883 only.
+curr1_highest		Highest input current. LTC3883 and LTC3886 only.
+curr1_reset_history	Reset input current history. LTC3883 and LTC3886 only.
 
 curr[N]_label		"iout[1-4]".
 			LTC2974, LTC2975: N=1-4
 			LTC2977, LTC2980, LTM2987: not supported
 			LTC2978: not supported
-			LTC3880, LTC3882, LTC3887, LTM4676: N=2-3
+			LTC3880, LTC3882, LTC3886, LTC3887, LTM4676: N=2-3
 			LTC3883: N=2
 curr[N]_input		Measured output current.
 curr[N]_max		Maximum output current.
diff --git a/drivers/hwmon/pmbus/Kconfig b/drivers/hwmon/pmbus/Kconfig
index af778aed4033..5ebc36255c88 100644
--- a/drivers/hwmon/pmbus/Kconfig
+++ b/drivers/hwmon/pmbus/Kconfig
@@ -53,7 +53,7 @@ config SENSORS_LTC2978
 	help
 	  If you say yes here you get hardware monitoring support for Linear
 	  Technology LTC2974, LTC2975, LTC2977, LTC2978, LTC2980, LTC3880,
-	  LTC3883, LTC3887, LTCM2987, and LTM4676.
+	  LTC3883, LTC3886, LTC3887, LTCM2987, and LTM4676.
 
 	  This driver can also be built as a module. If so, the module will
 	  be called ltc2978.
diff --git a/drivers/hwmon/pmbus/ltc2978.c b/drivers/hwmon/pmbus/ltc2978.c
index 48dcde0bc740..60fe8f9839d3 100644
--- a/drivers/hwmon/pmbus/ltc2978.c
+++ b/drivers/hwmon/pmbus/ltc2978.c
@@ -25,7 +25,7 @@
 #include "pmbus.h"
 
 enum chips { ltc2974, ltc2975, ltc2977, ltc2978, ltc2980, ltc3880, ltc3882,
-	ltc3883, ltc3887, ltm2987, ltm4676 };
+	ltc3883, ltc3886, ltc3887, ltm2987, ltm4676 };
 
 /* Common for all chips */
 #define LTC2978_MFR_VOUT_PEAK		0xdd
@@ -47,7 +47,7 @@ enum chips { ltc2974, ltc2975, ltc2977, ltc2978, ltc2980, ltc3880, ltc3882,
 #define LTC3880_MFR_CLEAR_PEAKS		0xe3
 #define LTC3880_MFR_TEMPERATURE2_PEAK	0xf4
 
-/* LTC3883 only */
+/* LTC3883 and LTC3886 only */
 #define LTC3883_MFR_IIN_PEAK		0xe1
 
 /* LTC2975 only */
@@ -69,6 +69,7 @@ enum chips { ltc2974, ltc2975, ltc2977, ltc2978, ltc2980, ltc3880, ltc3882,
 #define LTC3882_ID			0x4200
 #define LTC3882_ID_D1			0x4240	/* Dash 1 */
 #define LTC3883_ID			0x4300
+#define LTC3886_ID			0x4600
 #define LTC3887_ID			0x4700
 #define LTM2987_ID_A			0x8010	/* A/B for two die IDs */
 #define LTM2987_ID_B			0x8020
@@ -417,6 +418,7 @@ static const struct i2c_device_id ltc2978_id[] = {
 	{"ltc3880", ltc3880},
 	{"ltc3882", ltc3882},
 	{"ltc3883", ltc3883},
+	{"ltc3886", ltc3886},
 	{"ltc3887", ltc3887},
 	{"ltm2987", ltm2987},
 	{"ltm4676", ltm4676},
@@ -485,6 +487,8 @@ static int ltc2978_get_id(struct i2c_client *client)
 		return ltc3882;
 	else if (chip_id == LTC3883_ID)
 		return ltc3883;
+	else if (chip_id == LTC3886_ID)
+		return ltc3886;
 	else if (chip_id == LTC3887_ID)
 		return ltc3887;
 	else if (chip_id == LTM2987_ID_A || chip_id == LTM2987_ID_B)
@@ -624,6 +628,21 @@ static int ltc2978_probe(struct i2c_client *client,
 		  | PMBUS_HAVE_PIN | PMBUS_HAVE_POUT | PMBUS_HAVE_TEMP
 		  | PMBUS_HAVE_TEMP2 | PMBUS_HAVE_STATUS_TEMP;
 		break;
+	case ltc3886:
+		data->features |= FEAT_CLEAR_PEAKS;
+		info->read_word_data = ltc3883_read_word_data;
+		info->pages = LTC3880_NUM_PAGES;
+		info->func[0] = PMBUS_HAVE_VIN | PMBUS_HAVE_IIN
+		  | PMBUS_HAVE_STATUS_INPUT
+		  | PMBUS_HAVE_VOUT | PMBUS_HAVE_STATUS_VOUT
+		  | PMBUS_HAVE_IOUT | PMBUS_HAVE_STATUS_IOUT
+		  | PMBUS_HAVE_PIN | PMBUS_HAVE_POUT | PMBUS_HAVE_TEMP
+		  | PMBUS_HAVE_TEMP2 | PMBUS_HAVE_STATUS_TEMP;
+		info->func[1] = PMBUS_HAVE_VOUT | PMBUS_HAVE_STATUS_VOUT
+		  | PMBUS_HAVE_IOUT | PMBUS_HAVE_STATUS_IOUT
+		  | PMBUS_HAVE_POUT
+		  | PMBUS_HAVE_TEMP | PMBUS_HAVE_STATUS_TEMP;
+		break;
 	default:
 		return -ENODEV;
 	}
@@ -650,6 +669,7 @@ static const struct of_device_id ltc2978_of_match[] = {
 	{ .compatible = "lltc,ltc3880" },
 	{ .compatible = "lltc,ltc3882" },
 	{ .compatible = "lltc,ltc3883" },
+	{ .compatible = "lltc,ltc3886" },
 	{ .compatible = "lltc,ltc3887" },
 	{ .compatible = "lltc,ltm2987" },
 	{ .compatible = "lltc,ltm4676" },
-- 
cgit v1.2.3


From c8254e27fd34cdd21febd5b1caca409117914332 Mon Sep 17 00:00:00 2001
From: Wang Dongsheng <dongsheng.wang@freescale.com>
Date: Fri, 14 Aug 2015 11:12:09 +0800
Subject: powerpc/85xx: Add binding for SCFG

SCFG provides SoC specific configuration and status registers for
the chip. Add this for powerpc platform.

Signed-off-by: Wang Dongsheng <dongsheng.wang@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
---
 Documentation/devicetree/bindings/powerpc/fsl/scfg.txt | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/powerpc/fsl/scfg.txt

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/powerpc/fsl/scfg.txt b/Documentation/devicetree/bindings/powerpc/fsl/scfg.txt
new file mode 100644
index 000000000000..0532c46b3372
--- /dev/null
+++ b/Documentation/devicetree/bindings/powerpc/fsl/scfg.txt
@@ -0,0 +1,18 @@
+Freescale Supplement configuration unit (SCFG)
+
+SCFG is the supplemental configuration unit, that provides SoC specific
+configuration and status registers for the chip. Such as getting PEX port
+status.
+
+Required properties:
+
+- compatible: should be "fsl,<chip>-scfg"
+- reg: should contain base address and length of SCFG memory-mapped
+registers
+
+Example:
+
+	scfg: global-utilities@fc000 {
+		compatible = "fsl,t1040-scfg";
+		reg = <0xfc000 0x1000>;
+	};
-- 
cgit v1.2.3


From fbd7e41005a4daa676e3f516d07aeff0a81dca2b Mon Sep 17 00:00:00 2001
From: Linus Walleij <linus.walleij@linaro.org>
Date: Sat, 11 Jul 2015 14:12:05 +0200
Subject: doc: dt: dma: add binding doc for pl08x

This introduces device tree bindings for the PL08x DMA controllers
when used with fixed signal assignment per channel, i.e. if each
channel on the PL08x is assigned precisely one burst/single signal
set.

[je: remove channel sub-node parsing, use cell value to assign AHB]

Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Joachim Eastwood <manabian@gmail.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
---
 .../devicetree/bindings/dma/arm-pl08x.txt          | 54 ++++++++++++++++++++++
 1 file changed, 54 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/dma/arm-pl08x.txt

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/dma/arm-pl08x.txt b/Documentation/devicetree/bindings/dma/arm-pl08x.txt
new file mode 100644
index 000000000000..8a0097a029d3
--- /dev/null
+++ b/Documentation/devicetree/bindings/dma/arm-pl08x.txt
@@ -0,0 +1,54 @@
+* ARM PrimeCells PL080 and PL081 and derivatives DMA controller
+
+Required properties:
+- compatible: "arm,pl080", "arm,primecell";
+	      "arm,pl081", "arm,primecell";
+- reg: Address range of the PL08x registers
+- interrupt: The PL08x interrupt number
+- clocks: The clock running the IP core clock
+- clock-names: Must contain "apb_pclk"
+- lli-bus-interface-ahb1: if AHB master 1 is eligible for fetching LLIs
+- lli-bus-interface-ahb2: if AHB master 2 is eligible for fetching LLIs
+- mem-bus-interface-ahb1: if AHB master 1 is eligible for fetching memory contents
+- mem-bus-interface-ahb2: if AHB master 2 is eligible for fetching memory contents
+- #dma-cells: must be <2>. First cell should contain the DMA request,
+              second cell should contain either 1 or 2 depending on
+              which AHB master that is used.
+
+Optional properties:
+- dma-channels: contains the total number of DMA channels supported by the DMAC
+- dma-requests: contains the total number of DMA requests supported by the DMAC
+- memcpy-burst-size: the size of the bursts for memcpy: 1, 4, 8, 16, 32
+  64, 128 or 256 bytes are legal values
+- memcpy-bus-width: the bus width used for memcpy: 8, 16 or 32 are legal
+  values
+
+Clients
+Required properties:
+- dmas: List of DMA controller phandle, request channel and AHB master id
+- dma-names: Names of the aforementioned requested channels
+
+Example:
+
+dmac0: dma-controller@10130000 {
+	compatible = "arm,pl080", "arm,primecell";
+	reg = <0x10130000 0x1000>;
+	interrupt-parent = <&vica>;
+	interrupts = <15>;
+	clocks = <&hclkdma0>;
+	clock-names = "apb_pclk";
+	lli-bus-interface-ahb1;
+	lli-bus-interface-ahb2;
+	mem-bus-interface-ahb2;
+	memcpy-burst-size = <256>;
+	memcpy-bus-width = <32>;
+	#dma-cells = <2>;
+};
+
+device@40008000 {
+	...
+	dmas = <&dmac0 0 2
+		&dmac0 1 2>;
+	dma-names = "tx", "rx";
+	...
+};
-- 
cgit v1.2.3


From df48f3ff95c432cc3469e0df67da76ffcf2237af Mon Sep 17 00:00:00 2001
From: Joachim Eastwood <manabian@gmail.com>
Date: Sat, 11 Jul 2015 14:12:07 +0200
Subject: doc: dt: dma: add bindings for lpc1850-dmamux

Add device tree bindings documentation for the
lpc1850-dmamux DMA router.

Signed-off-by: Joachim Eastwood <manabian@gmail.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
---
 .../devicetree/bindings/dma/lpc1850-dmamux.txt     | 54 ++++++++++++++++++++++
 1 file changed, 54 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/dma/lpc1850-dmamux.txt

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/dma/lpc1850-dmamux.txt b/Documentation/devicetree/bindings/dma/lpc1850-dmamux.txt
new file mode 100644
index 000000000000..87740adb2995
--- /dev/null
+++ b/Documentation/devicetree/bindings/dma/lpc1850-dmamux.txt
@@ -0,0 +1,54 @@
+NXP LPC18xx/43xx DMA MUX (DMA request router)
+
+Required properties:
+- compatible:	"nxp,lpc1850-dmamux"
+- reg:		Memory map for accessing module
+- #dma-cells:	Should be set to <3>.
+		* 1st cell contain the master dma request signal
+		* 2nd cell contain the mux value (0-3) for the peripheral
+		* 3rd cell contain either 1 or 2 depending on the AHB
+		  master used.
+- dma-requests:	Number of DMA requests for the mux
+- dma-masters:	phandle pointing to the DMA controller
+
+The DMA controller node need to have the following poroperties:
+- dma-requests:	Number of DMA requests the controller can handle
+
+Example:
+
+dmac: dma@40002000 {
+	compatible = "nxp,lpc1850-gpdma", "arm,pl080", "arm,primecell";
+	arm,primecell-periphid = <0x00041080>;
+	reg = <0x40002000 0x1000>;
+	interrupts = <2>;
+	clocks = <&ccu1 CLK_CPU_DMA>;
+	clock-names = "apb_pclk";
+	#dma-cells = <2>;
+	dma-channels = <8>;
+	dma-requests = <16>;
+	lli-bus-interface-ahb1;
+	lli-bus-interface-ahb2;
+	mem-bus-interface-ahb1;
+	mem-bus-interface-ahb2;
+	memcpy-burst-size = <256>;
+	memcpy-bus-width = <32>;
+};
+
+dmamux: dma-mux {
+	compatible = "nxp,lpc1850-dmamux";
+	#dma-cells = <3>;
+	dma-requests = <64>;
+	dma-masters = <&dmac>;
+};
+
+uart0: serial@40081000 {
+	compatible = "nxp,lpc1850-uart", "ns16550a";
+	reg = <0x40081000 0x1000>;
+	reg-shift = <2>;
+	interrupts = <24>;
+	clocks = <&ccu2 CLK_APB0_UART0>, <&ccu1 CLK_CPU_UART0>;
+	clock-names = "uartclk", "reg";
+	dmas = <&dmamux 1 1 2
+		&dmamux 2 1 2>;
+	dma-names = "tx", "rx";
+};
-- 
cgit v1.2.3


From dba3398381dd1175c74721c97d1daf8fc5939276 Mon Sep 17 00:00:00 2001
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Date: Tue, 18 Aug 2015 09:57:15 -0700
Subject: Revert "usb: interface authorization: Documentation part"

This reverts commit 6ef2bf71764708f7c58ee9300acd8df05dbaa06f as the
signed-off-by address is invalid.

Cc: Stefan Koch <stefan.koch10@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 Documentation/ABI/testing/sysfs-bus-usb | 22 ---------------------
 Documentation/usb/authorization.txt     | 34 ---------------------------------
 2 files changed, 56 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/ABI/testing/sysfs-bus-usb b/Documentation/ABI/testing/sysfs-bus-usb
index 5c6c26446f2a..864637f25bee 100644
--- a/Documentation/ABI/testing/sysfs-bus-usb
+++ b/Documentation/ABI/testing/sysfs-bus-usb
@@ -1,25 +1,3 @@
-What:		/sys/bus/usb/devices/INTERFACE/authorized
-Date:		June 2015
-KernelVersion:	4.2
-Description:
-		This allows to authorize (1) or deauthorize (0)
-		individual interfaces instead a whole device
-		in contrast to the device authorization.
-		If a deauthorized interface will be authorized
-		so the driver probing must be triggered manually
-		by writing INTERFACE to /sys/bus/usb/drivers_probe
-		This allows to avoid side-effects with drivers
-		that need multiple interfaces.
-		A deauthorized interface cannot be probed or claimed.
-
-What:		/sys/bus/usb/devices/usbX/interface_authorized_default
-Date:		June 2015
-KernelVersion:	4.2
-Description:
-		This is used as default value that determines
-		if interfaces would authorized per default.
-		The value can be 1 or 0. It is per default 1.
-
 What:		/sys/bus/usb/device/.../authorized
 Date:		July 2008
 KernelVersion:	2.6.26
diff --git a/Documentation/usb/authorization.txt b/Documentation/usb/authorization.txt
index 020cec5585ce..c069b6884c77 100644
--- a/Documentation/usb/authorization.txt
+++ b/Documentation/usb/authorization.txt
@@ -3,9 +3,6 @@ Authorizing (or not) your USB devices to connect to the system
 
 (C) 2007 Inaky Perez-Gonzalez <inaky@linux.intel.com> Intel Corporation
 
-Interface authorization part:
-	(C) 2015 Stefan Koch <skoch@suse.de> SUSE LLC
-
 This feature allows you to control if a USB device can be used (or
 not) in a system. This feature will allow you to implement a lock-down
 of USB devices, fully controlled by user space.
@@ -93,34 +90,3 @@ etc, but you get the idea. Anybody with access to a device gadget kit
 can fake descriptors and device info. Don't trust that. You are
 welcome.
 
-
-Interface authorization
------------------------
-There is a similar approach to allow or deny specific USB interfaces.
-That allows to block only a subset of an USB device.
-
-Authorize an interface:
-$ echo 1 > /sys/bus/usb/devices/INTERFACE/authorized
-
-Deauthorize an interface:
-$ echo 0 > /sys/bus/usb/devices/INTERFACE/authorized
-
-The default value for new interfaces
-on a particular USB bus can be changed, too.
-
-Allow interfaces per default:
-$ echo 1 > /sys/bus/usb/devices/usbX/interface_authorized_default
-
-Deny interfaces per default:
-$ echo 0 > /sys/bus/usb/devices/usbX/interface_authorized_default
-
-Per default the interface_authorized_default bit is 1.
-So all interfaces would authorized per default.
-
-Note:
-If a deauthorized interface will be authorized so the driver probing must
-be triggered manually by writing INTERFACE to /sys/bus/usb/drivers_probe
-
-For drivers that need multiple interfaces all needed interfaces should be
-authroized first. After that the drivers should be probed.
-This avoids side effects.
-- 
cgit v1.2.3


From 1e72e6f8859a598bfc22cf268c2dafe8ddb9f1b4 Mon Sep 17 00:00:00 2001
From: Andrew Lunn <andrew@lunn.ch>
Date: Mon, 17 Aug 2015 23:52:50 +0200
Subject: net: dsa: Allow multi hop routes to be expressed

With more than two switches in a hierarchy, it becomes necessary to
describe multi-hop routes between switches. The current binding does
not allow this, although the older platform_data did. Extend the link
property to be a list rather than a single phandle to a remote switch.
It is then possible to express that a port should be used to reach
more than one switch and the switch maybe more than one hop away.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 Documentation/devicetree/bindings/net/dsa/dsa.txt | 33 +++++++++++++++----
 net/dsa/dsa.c                                     | 40 +++++++++++++++++------
 2 files changed, 57 insertions(+), 16 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/net/dsa/dsa.txt b/Documentation/devicetree/bindings/net/dsa/dsa.txt
index 9cf9a0ec333c..04e6bef3ac3f 100644
--- a/Documentation/devicetree/bindings/net/dsa/dsa.txt
+++ b/Documentation/devicetree/bindings/net/dsa/dsa.txt
@@ -44,9 +44,10 @@ Note that a port labelled "dsa" will imply checking for the uplink phandle
 described below.
 
 Optionnal property:
-- link			: Should be a phandle to another switch's DSA port.
+- link			: Should be a list of phandles to another switch's DSA port.
 			  This property is only used when switches are being
-			  chained/cascaded together.
+			  chained/cascaded together. This port is used as outgoing port
+			  towards the phandle port, which can be more than one hop away.
 
 - phy-handle		: Phandle to a PHY on an external MDIO bus, not the
 			  switch internal one. See
@@ -100,10 +101,11 @@ Example:
 				label = "cpu";
 			};
 
-			switch0uplink: port@6 {
+			switch0port6: port@6 {
 				reg = <6>;
 				label = "dsa";
-				link = <&switch1uplink>;
+				link = <&switch1port0
+				        &switch2port0>;
 			};
 		};
 
@@ -113,10 +115,29 @@ Example:
 			reg = <17 1>;	/* MDIO address 17, switch 1 in tree */
 			mii-bus = <&mii_bus1>;
 
-			switch1uplink: port@0 {
+			switch1port0: port@0 {
 				reg = <0>;
 				label = "dsa";
-				link = <&switch0uplink>;
+				link = <&switch0port6>;
+			};
+			switch1port1: port@1 {
+				reg = <1>;
+				label = "dsa";
+				link = <&switch2port1>;
+			};
+		};
+
+		switch@2 {
+			#address-cells = <1>;
+			#size-cells = <0>;
+			reg = <18 2>;	/* MDIO address 18, switch 2 in tree */
+			mii-bus = <&mii_bus1>;
+
+			switch2port0: port@0 {
+				reg = <0>;
+				label = "dsa";
+				link = <&switch1port1
+				        &switch0port6>;
 			};
 		};
 	};
diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index 78d4ac97aae3..053eb2b8e682 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -554,6 +554,31 @@ static int dsa_of_setup_routing_table(struct dsa_platform_data *pd,
 	return 0;
 }
 
+static int dsa_of_probe_links(struct dsa_platform_data *pd,
+			      struct dsa_chip_data *cd,
+			      int chip_index, int port_index,
+			      struct device_node *port,
+			      const char *port_name)
+{
+	struct device_node *link;
+	int link_index;
+	int ret;
+
+	for (link_index = 0;; link_index++) {
+		link = of_parse_phandle(port, "link", link_index);
+		if (!link)
+			break;
+
+		if (!strcmp(port_name, "dsa") && pd->nr_chips > 1) {
+			ret = dsa_of_setup_routing_table(pd, cd, chip_index,
+							 port_index, link);
+			if (ret)
+				return ret;
+		}
+	}
+	return 0;
+}
+
 static void dsa_of_free_platform_data(struct dsa_platform_data *pd)
 {
 	int i;
@@ -573,7 +598,7 @@ static void dsa_of_free_platform_data(struct dsa_platform_data *pd)
 static int dsa_of_probe(struct device *dev)
 {
 	struct device_node *np = dev->of_node;
-	struct device_node *child, *mdio, *ethernet, *port, *link;
+	struct device_node *child, *mdio, *ethernet, *port;
 	struct mii_bus *mdio_bus, *mdio_bus_switch;
 	struct net_device *ethernet_dev;
 	struct dsa_platform_data *pd;
@@ -668,15 +693,10 @@ static int dsa_of_probe(struct device *dev)
 				goto out_free_chip;
 			}
 
-			link = of_parse_phandle(port, "link", 0);
-
-			if (!strcmp(port_name, "dsa") && link &&
-					pd->nr_chips > 1) {
-				ret = dsa_of_setup_routing_table(pd, cd,
-						chip_index, port_index, link);
-				if (ret)
-					goto out_free_chip;
-			}
+			ret = dsa_of_probe_links(pd, cd, chip_index,
+						 port_index, port, port_name);
+			if (ret)
+				goto out_free_chip;
 
 		}
 	}
-- 
cgit v1.2.3


From bd49784fd1e8f42c7600fbfa206361324857f373 Mon Sep 17 00:00:00 2001
From: Mikulas Patocka <mpatocka@redhat.com>
Date: Tue, 18 Aug 2015 16:26:16 -0400
Subject: dm stats: report precise_timestamps and histogram in @stats_list
 output

If the user selected the precise_timestamps or histogram options, report
it in the @stats_list message output.

If the user didn't select these options, no extra tokens are reported,
thus it is backward compatible with old software that doesn't know about
precise timestamps and histogram.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org # 4.2
---
 Documentation/device-mapper/statistics.txt |  4 ++++
 drivers/md/dm-stats.c                      | 14 +++++++++++++-
 include/uapi/linux/dm-ioctl.h              |  4 ++--
 3 files changed, 19 insertions(+), 3 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/device-mapper/statistics.txt b/Documentation/device-mapper/statistics.txt
index 4919b2dfd1b3..6f5ef944ca4c 100644
--- a/Documentation/device-mapper/statistics.txt
+++ b/Documentation/device-mapper/statistics.txt
@@ -121,6 +121,10 @@ Messages
 
 	Output format:
 	  <region_id>: <start_sector>+<length> <step> <program_id> <aux_data>
+	        precise_timestamps histogram:n1,n2,n3,...
+
+	The strings "precise_timestamps" and "histogram" are printed only
+	if they were specified when creating the region.
 
     @stats_print <region_id> [<starting_line> <number_of_lines>]
 
diff --git a/drivers/md/dm-stats.c b/drivers/md/dm-stats.c
index 8a8b48fa901a..8289804ccd99 100644
--- a/drivers/md/dm-stats.c
+++ b/drivers/md/dm-stats.c
@@ -457,12 +457,24 @@ static int dm_stats_list(struct dm_stats *stats, const char *program,
 	list_for_each_entry(s, &stats->list, list_entry) {
 		if (!program || !strcmp(program, s->program_id)) {
 			len = s->end - s->start;
-			DMEMIT("%d: %llu+%llu %llu %s %s\n", s->id,
+			DMEMIT("%d: %llu+%llu %llu %s %s", s->id,
 				(unsigned long long)s->start,
 				(unsigned long long)len,
 				(unsigned long long)s->step,
 				s->program_id,
 				s->aux_data);
+			if (s->stat_flags & STAT_PRECISE_TIMESTAMPS)
+				DMEMIT(" precise_timestamps");
+			if (s->n_histogram_entries) {
+				unsigned i;
+				DMEMIT(" histogram:");
+				for (i = 0; i < s->n_histogram_entries; i++) {
+					if (i)
+						DMEMIT(",");
+					DMEMIT("%llu", s->histogram_boundaries[i]);
+				}
+			}
+			DMEMIT("\n");
 		}
 	}
 	mutex_unlock(&stats->mutex);
diff --git a/include/uapi/linux/dm-ioctl.h b/include/uapi/linux/dm-ioctl.h
index 061aca3a962d..d34611e35a30 100644
--- a/include/uapi/linux/dm-ioctl.h
+++ b/include/uapi/linux/dm-ioctl.h
@@ -267,9 +267,9 @@ enum {
 #define DM_DEV_SET_GEOMETRY	_IOWR(DM_IOCTL, DM_DEV_SET_GEOMETRY_CMD, struct dm_ioctl)
 
 #define DM_VERSION_MAJOR	4
-#define DM_VERSION_MINOR	32
+#define DM_VERSION_MINOR	33
 #define DM_VERSION_PATCHLEVEL	0
-#define DM_VERSION_EXTRA	"-ioctl (2015-6-26)"
+#define DM_VERSION_EXTRA	"-ioctl (2015-8-18)"
 
 /* Status bits */
 #define DM_READONLY_FLAG	(1 << 0) /* In/Out */
-- 
cgit v1.2.3


From 77ea733884eb5520f22c36def1309fe2ab61633e Mon Sep 17 00:00:00 2001
From: Tejun Heo <tj@kernel.org>
Date: Tue, 18 Aug 2015 14:55:24 -0700
Subject: blkcg: move io_service_bytes and io_serviced stats into blkcg_gq

Currently, both cfq-iosched and blk-throttle keep track of
io_service_bytes and io_serviced stats.  While keeping track of them
separately may be useful during development, it doesn't make much
sense otherwise.  Also, blk-throttle was counting bio's as IOs while
cfq-iosched request's, which is more confusing than informative.

This patch adds ->stat_bytes and ->stat_ios to blkg (blkcg_gq),
removes the counterparts from cfq-iosched and blk-throttle and let
them print from the common blkg counters.  The common counters are
incremented during bio issue in blkcg_bio_issue_check().

The outputs are still filtered by whether the policy has
blkg_policy_data on a given blkg, so cfq's output won't show up if it
has never been used for a given blkg.  The only times when the outputs
would differ significantly are when policies are attached on the fly
or elevators are switched back and forth.  Those are quite exceptional
operations and I don't think they warrant keeping separate counters.

v3: Update blkio-controller.txt accordingly.

v2: Account IOs during bio issues instead of request completions so
    that bio-based drivers can be handled the same way.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
---
 Documentation/cgroups/blkio-controller.txt | 24 ++------
 block/blk-cgroup.c                         | 98 ++++++++++++++++++++++++++++++
 block/blk-throttle.c                       | 73 ++--------------------
 block/cfq-iosched.c                        | 32 +++-------
 include/linux/blk-cgroup.h                 | 14 +++++
 5 files changed, 133 insertions(+), 108 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/cgroups/blkio-controller.txt b/Documentation/cgroups/blkio-controller.txt
index 68b6a6a470b0..12686bec37b9 100644
--- a/Documentation/cgroups/blkio-controller.txt
+++ b/Documentation/cgroups/blkio-controller.txt
@@ -201,7 +201,7 @@ Proportional weight policy files
 	  specifies the number of bytes.
 
 - blkio.io_serviced
-	- Number of IOs completed to/from the disk by the group. These
+	- Number of IOs (bio) issued to the disk by the group. These
 	  are further divided by the type of operation - read or write, sync
 	  or async. First two fields specify the major and minor number of the
 	  device, third field specifies the operation type and the fourth field
@@ -327,18 +327,11 @@ Note: If both BW and IOPS rules are specified for a device, then IO is
       subjected to both the constraints.
 
 - blkio.throttle.io_serviced
-	- Number of IOs (bio) completed to/from the disk by the group (as
-	  seen by throttling policy). These are further divided by the type
-	  of operation - read or write, sync or async. First two fields specify
-	  the major and minor number of the device, third field specifies the
-	  operation type and the fourth field specifies the number of IOs.
-
-	  blkio.io_serviced does accounting as seen by CFQ and counts are in
-	  number of requests (struct request). On the other hand,
-	  blkio.throttle.io_serviced counts number of IO in terms of number
-	  of bios as seen by throttling policy.  These bios can later be
-	  merged by elevator and total number of requests completed can be
-	  lesser.
+	- Number of IOs (bio) issued to the disk by the group. These
+	  are further divided by the type of operation - read or write, sync
+	  or async. First two fields specify the major and minor number of the
+	  device, third field specifies the operation type and the fourth field
+	  specifies the number of IOs.
 
 - blkio.throttle.io_service_bytes
 	- Number of bytes transferred to/from the disk by the group. These
@@ -347,11 +340,6 @@ Note: If both BW and IOPS rules are specified for a device, then IO is
 	  device, third field specifies the operation type and the fourth field
 	  specifies the number of bytes.
 
-	  These numbers should roughly be same as blkio.io_service_bytes as
-	  updated by CFQ. The difference between two is that
-	  blkio.io_service_bytes will not be updated if CFQ is not operating
-	  on request queue.
-
 Common files among various policies
 -----------------------------------
 - blkio.reset_stats
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index b26320720a3c..a25263ca39ca 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -73,6 +73,9 @@ static void blkg_free(struct blkcg_gq *blkg)
 
 	if (blkg->blkcg != &blkcg_root)
 		blk_exit_rl(&blkg->rl);
+
+	blkg_rwstat_exit(&blkg->stat_ios);
+	blkg_rwstat_exit(&blkg->stat_bytes);
 	kfree(blkg);
 }
 
@@ -95,6 +98,10 @@ static struct blkcg_gq *blkg_alloc(struct blkcg *blkcg, struct request_queue *q,
 	if (!blkg)
 		return NULL;
 
+	if (blkg_rwstat_init(&blkg->stat_bytes, gfp_mask) ||
+	    blkg_rwstat_init(&blkg->stat_ios, gfp_mask))
+		goto err_free;
+
 	blkg->q = q;
 	INIT_LIST_HEAD(&blkg->q_node);
 	blkg->blkcg = blkcg;
@@ -300,6 +307,7 @@ struct blkcg_gq *blkg_lookup_create(struct blkcg *blkcg,
 static void blkg_destroy(struct blkcg_gq *blkg)
 {
 	struct blkcg *blkcg = blkg->blkcg;
+	struct blkcg_gq *parent = blkg->parent;
 	int i;
 
 	lockdep_assert_held(blkg->q->queue_lock);
@@ -315,6 +323,12 @@ static void blkg_destroy(struct blkcg_gq *blkg)
 		if (blkg->pd[i] && pol->pd_offline_fn)
 			pol->pd_offline_fn(blkg->pd[i]);
 	}
+
+	if (parent) {
+		blkg_rwstat_add_aux(&parent->stat_bytes, &blkg->stat_bytes);
+		blkg_rwstat_add_aux(&parent->stat_ios, &blkg->stat_ios);
+	}
+
 	blkg->online = false;
 
 	radix_tree_delete(&blkcg->blkg_tree, blkg->q->id);
@@ -431,6 +445,9 @@ static int blkcg_reset_stats(struct cgroup_subsys_state *css,
 	 * anyway.  If you get hit by a race, retry.
 	 */
 	hlist_for_each_entry(blkg, &blkcg->blkg_list, blkcg_node) {
+		blkg_rwstat_reset(&blkg->stat_bytes);
+		blkg_rwstat_reset(&blkg->stat_ios);
+
 		for (i = 0; i < BLKCG_MAX_POLS; i++) {
 			struct blkcg_policy *pol = blkcg_policy[i];
 
@@ -579,6 +596,87 @@ u64 blkg_prfill_rwstat(struct seq_file *sf, struct blkg_policy_data *pd,
 }
 EXPORT_SYMBOL_GPL(blkg_prfill_rwstat);
 
+static u64 blkg_prfill_rwstat_field(struct seq_file *sf,
+				    struct blkg_policy_data *pd, int off)
+{
+	struct blkg_rwstat rwstat = blkg_rwstat_read((void *)pd->blkg + off);
+
+	return __blkg_prfill_rwstat(sf, pd, &rwstat);
+}
+
+/**
+ * blkg_print_stat_bytes - seq_show callback for blkg->stat_bytes
+ * @sf: seq_file to print to
+ * @v: unused
+ *
+ * To be used as cftype->seq_show to print blkg->stat_bytes.
+ * cftype->private must be set to the blkcg_policy.
+ */
+int blkg_print_stat_bytes(struct seq_file *sf, void *v)
+{
+	blkcg_print_blkgs(sf, css_to_blkcg(seq_css(sf)),
+			  blkg_prfill_rwstat_field, (void *)seq_cft(sf)->private,
+			  offsetof(struct blkcg_gq, stat_bytes), true);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(blkg_print_stat_bytes);
+
+/**
+ * blkg_print_stat_bytes - seq_show callback for blkg->stat_ios
+ * @sf: seq_file to print to
+ * @v: unused
+ *
+ * To be used as cftype->seq_show to print blkg->stat_ios.  cftype->private
+ * must be set to the blkcg_policy.
+ */
+int blkg_print_stat_ios(struct seq_file *sf, void *v)
+{
+	blkcg_print_blkgs(sf, css_to_blkcg(seq_css(sf)),
+			  blkg_prfill_rwstat_field, (void *)seq_cft(sf)->private,
+			  offsetof(struct blkcg_gq, stat_ios), true);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(blkg_print_stat_ios);
+
+static u64 blkg_prfill_rwstat_field_recursive(struct seq_file *sf,
+					      struct blkg_policy_data *pd,
+					      int off)
+{
+	struct blkg_rwstat rwstat = blkg_rwstat_recursive_sum(pd->blkg,
+							      NULL, off);
+	return __blkg_prfill_rwstat(sf, pd, &rwstat);
+}
+
+/**
+ * blkg_print_stat_bytes_recursive - recursive version of blkg_print_stat_bytes
+ * @sf: seq_file to print to
+ * @v: unused
+ */
+int blkg_print_stat_bytes_recursive(struct seq_file *sf, void *v)
+{
+	blkcg_print_blkgs(sf, css_to_blkcg(seq_css(sf)),
+			  blkg_prfill_rwstat_field_recursive,
+			  (void *)seq_cft(sf)->private,
+			  offsetof(struct blkcg_gq, stat_bytes), true);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(blkg_print_stat_bytes_recursive);
+
+/**
+ * blkg_print_stat_ios_recursive - recursive version of blkg_print_stat_ios
+ * @sf: seq_file to print to
+ * @v: unused
+ */
+int blkg_print_stat_ios_recursive(struct seq_file *sf, void *v)
+{
+	blkcg_print_blkgs(sf, css_to_blkcg(seq_css(sf)),
+			  blkg_prfill_rwstat_field_recursive,
+			  (void *)seq_cft(sf)->private,
+			  offsetof(struct blkcg_gq, stat_ios), true);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(blkg_print_stat_ios_recursive);
+
 /**
  * blkg_stat_recursive_sum - collect hierarchical blkg_stat
  * @blkg: blkg of interest
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index c0b2263a222a..bd3e4b2c7a9d 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -133,11 +133,6 @@ struct throtl_grp {
 	/* When did we start a new slice */
 	unsigned long slice_start[2];
 	unsigned long slice_end[2];
-
-	/* total bytes transferred */
-	struct blkg_rwstat		service_bytes;
-	/* total IOs serviced, post merge */
-	struct blkg_rwstat		serviced;
 };
 
 struct throtl_data
@@ -335,11 +330,7 @@ static struct blkg_policy_data *throtl_pd_alloc(gfp_t gfp, int node)
 
 	tg = kzalloc_node(sizeof(*tg), gfp, node);
 	if (!tg)
-		goto err;
-
-	if (blkg_rwstat_init(&tg->service_bytes, gfp) ||
-	    blkg_rwstat_init(&tg->serviced, gfp))
-		goto err_free_tg;
+		return NULL;
 
 	throtl_service_queue_init(&tg->service_queue);
 
@@ -355,13 +346,6 @@ static struct blkg_policy_data *throtl_pd_alloc(gfp_t gfp, int node)
 	tg->iops[WRITE] = -1;
 
 	return &tg->pd;
-
-err_free_tg:
-	blkg_rwstat_exit(&tg->serviced);
-	blkg_rwstat_exit(&tg->service_bytes);
-	kfree(tg);
-err:
-	return NULL;
 }
 
 static void throtl_pd_init(struct blkg_policy_data *pd)
@@ -419,19 +403,9 @@ static void throtl_pd_free(struct blkg_policy_data *pd)
 	struct throtl_grp *tg = pd_to_tg(pd);
 
 	del_timer_sync(&tg->service_queue.pending_timer);
-	blkg_rwstat_exit(&tg->serviced);
-	blkg_rwstat_exit(&tg->service_bytes);
 	kfree(tg);
 }
 
-static void throtl_pd_reset_stats(struct blkg_policy_data *pd)
-{
-	struct throtl_grp *tg = pd_to_tg(pd);
-
-	blkg_rwstat_reset(&tg->service_bytes);
-	blkg_rwstat_reset(&tg->serviced);
-}
-
 static struct throtl_grp *
 throtl_rb_first(struct throtl_service_queue *parent_sq)
 {
@@ -839,25 +813,6 @@ static bool tg_may_dispatch(struct throtl_grp *tg, struct bio *bio,
 	return 0;
 }
 
-static void throtl_update_dispatch_stats(struct blkcg_gq *blkg, u64 bytes,
-					 int rw)
-{
-	struct throtl_grp *tg = blkg_to_tg(blkg);
-	unsigned long flags;
-
-	/*
-	 * Disabling interrupts to provide mutual exclusion between two
-	 * writes on same cpu. It probably is not needed for 64bit. Not
-	 * optimizing that case yet.
-	 */
-	local_irq_save(flags);
-
-	blkg_rwstat_add(&tg->serviced, rw, 1);
-	blkg_rwstat_add(&tg->service_bytes, rw, bytes);
-
-	local_irq_restore(flags);
-}
-
 static void throtl_charge_bio(struct throtl_grp *tg, struct bio *bio)
 {
 	bool rw = bio_data_dir(bio);
@@ -871,17 +826,9 @@ static void throtl_charge_bio(struct throtl_grp *tg, struct bio *bio)
 	 * more than once as a throttled bio will go through blk-throtl the
 	 * second time when it eventually gets issued.  Set it when a bio
 	 * is being charged to a tg.
-	 *
-	 * Dispatch stats aren't recursive and each @bio should only be
-	 * accounted by the @tg it was originally associated with.  Let's
-	 * update the stats when setting REQ_THROTTLED for the first time
-	 * which is guaranteed to be for the @bio's original tg.
 	 */
-	if (!(bio->bi_rw & REQ_THROTTLED)) {
+	if (!(bio->bi_rw & REQ_THROTTLED))
 		bio->bi_rw |= REQ_THROTTLED;
-		throtl_update_dispatch_stats(tg_to_blkg(tg),
-					     bio->bi_iter.bi_size, bio->bi_rw);
-	}
 }
 
 /**
@@ -1161,13 +1108,6 @@ static void blk_throtl_dispatch_work_fn(struct work_struct *work)
 	}
 }
 
-static int tg_print_rwstat(struct seq_file *sf, void *v)
-{
-	blkcg_print_blkgs(sf, css_to_blkcg(seq_css(sf)), blkg_prfill_rwstat,
-			  &blkcg_policy_throtl, seq_cft(sf)->private, true);
-	return 0;
-}
-
 static u64 tg_prfill_conf_u64(struct seq_file *sf, struct blkg_policy_data *pd,
 			      int off)
 {
@@ -1304,13 +1244,13 @@ static struct cftype throtl_files[] = {
 	},
 	{
 		.name = "throttle.io_service_bytes",
-		.private = offsetof(struct throtl_grp, service_bytes),
-		.seq_show = tg_print_rwstat,
+		.private = (unsigned long)&blkcg_policy_throtl,
+		.seq_show = blkg_print_stat_bytes,
 	},
 	{
 		.name = "throttle.io_serviced",
-		.private = offsetof(struct throtl_grp, serviced),
-		.seq_show = tg_print_rwstat,
+		.private = (unsigned long)&blkcg_policy_throtl,
+		.seq_show = blkg_print_stat_ios,
 	},
 	{ }	/* terminate */
 };
@@ -1329,7 +1269,6 @@ static struct blkcg_policy blkcg_policy_throtl = {
 	.pd_init_fn		= throtl_pd_init,
 	.pd_online_fn		= throtl_pd_online,
 	.pd_free_fn		= throtl_pd_free,
-	.pd_reset_stats_fn	= throtl_pd_reset_stats,
 };
 
 bool blk_throtl_bio(struct request_queue *q, struct blkcg_gq *blkg,
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index e861cc1ebd62..a948d4df3fc3 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -177,10 +177,6 @@ enum wl_type_t {
 
 struct cfqg_stats {
 #ifdef CONFIG_CFQ_GROUP_IOSCHED
-	/* total bytes transferred */
-	struct blkg_rwstat		service_bytes;
-	/* total IOs serviced, post merge */
-	struct blkg_rwstat		serviced;
 	/* number of ios merged */
 	struct blkg_rwstat		merged;
 	/* total time spent on device in ns, may not be accurate w/ queueing */
@@ -696,8 +692,6 @@ static inline void cfqg_stats_update_dispatch(struct cfq_group *cfqg,
 					      uint64_t bytes, int rw)
 {
 	blkg_stat_add(&cfqg->stats.sectors, bytes >> 9);
-	blkg_rwstat_add(&cfqg->stats.serviced, rw, 1);
-	blkg_rwstat_add(&cfqg->stats.service_bytes, rw, bytes);
 }
 
 static inline void cfqg_stats_update_completion(struct cfq_group *cfqg,
@@ -717,8 +711,6 @@ static inline void cfqg_stats_update_completion(struct cfq_group *cfqg,
 static void cfqg_stats_reset(struct cfqg_stats *stats)
 {
 	/* queued stats shouldn't be cleared */
-	blkg_rwstat_reset(&stats->service_bytes);
-	blkg_rwstat_reset(&stats->serviced);
 	blkg_rwstat_reset(&stats->merged);
 	blkg_rwstat_reset(&stats->service_time);
 	blkg_rwstat_reset(&stats->wait_time);
@@ -738,8 +730,6 @@ static void cfqg_stats_reset(struct cfqg_stats *stats)
 static void cfqg_stats_add_aux(struct cfqg_stats *to, struct cfqg_stats *from)
 {
 	/* queued stats shouldn't be cleared */
-	blkg_rwstat_add_aux(&to->service_bytes, &from->service_bytes);
-	blkg_rwstat_add_aux(&to->serviced, &from->serviced);
 	blkg_rwstat_add_aux(&to->merged, &from->merged);
 	blkg_rwstat_add_aux(&to->service_time, &from->service_time);
 	blkg_rwstat_add_aux(&to->wait_time, &from->wait_time);
@@ -1544,8 +1534,6 @@ static void cfq_init_cfqg_base(struct cfq_group *cfqg)
 #ifdef CONFIG_CFQ_GROUP_IOSCHED
 static void cfqg_stats_exit(struct cfqg_stats *stats)
 {
-	blkg_rwstat_exit(&stats->service_bytes);
-	blkg_rwstat_exit(&stats->serviced);
 	blkg_rwstat_exit(&stats->merged);
 	blkg_rwstat_exit(&stats->service_time);
 	blkg_rwstat_exit(&stats->wait_time);
@@ -1566,9 +1554,7 @@ static void cfqg_stats_exit(struct cfqg_stats *stats)
 
 static int cfqg_stats_init(struct cfqg_stats *stats, gfp_t gfp)
 {
-	if (blkg_rwstat_init(&stats->service_bytes, gfp) ||
-	    blkg_rwstat_init(&stats->serviced, gfp) ||
-	    blkg_rwstat_init(&stats->merged, gfp) ||
+	if (blkg_rwstat_init(&stats->merged, gfp) ||
 	    blkg_rwstat_init(&stats->service_time, gfp) ||
 	    blkg_rwstat_init(&stats->wait_time, gfp) ||
 	    blkg_rwstat_init(&stats->queued, gfp) ||
@@ -1994,13 +1980,13 @@ static struct cftype cfq_blkcg_files[] = {
 	},
 	{
 		.name = "io_service_bytes",
-		.private = offsetof(struct cfq_group, stats.service_bytes),
-		.seq_show = cfqg_print_rwstat,
+		.private = (unsigned long)&blkcg_policy_cfq,
+		.seq_show = blkg_print_stat_bytes,
 	},
 	{
 		.name = "io_serviced",
-		.private = offsetof(struct cfq_group, stats.serviced),
-		.seq_show = cfqg_print_rwstat,
+		.private = (unsigned long)&blkcg_policy_cfq,
+		.seq_show = blkg_print_stat_ios,
 	},
 	{
 		.name = "io_service_time",
@@ -2036,13 +2022,13 @@ static struct cftype cfq_blkcg_files[] = {
 	},
 	{
 		.name = "io_service_bytes_recursive",
-		.private = offsetof(struct cfq_group, stats.service_bytes),
-		.seq_show = cfqg_print_rwstat_recursive,
+		.private = (unsigned long)&blkcg_policy_cfq,
+		.seq_show = blkg_print_stat_bytes_recursive,
 	},
 	{
 		.name = "io_serviced_recursive",
-		.private = offsetof(struct cfq_group, stats.serviced),
-		.seq_show = cfqg_print_rwstat_recursive,
+		.private = (unsigned long)&blkcg_policy_cfq,
+		.seq_show = blkg_print_stat_ios_recursive,
 	},
 	{
 		.name = "io_service_time_recursive",
diff --git a/include/linux/blk-cgroup.h b/include/linux/blk-cgroup.h
index 4630ce8f9425..286e1bde249f 100644
--- a/include/linux/blk-cgroup.h
+++ b/include/linux/blk-cgroup.h
@@ -127,6 +127,9 @@ struct blkcg_gq {
 	/* is this blkg online? protected by both blkcg and q locks */
 	bool				online;
 
+	struct blkg_rwstat		stat_bytes;
+	struct blkg_rwstat		stat_ios;
+
 	struct blkg_policy_data		*pd[BLKCG_MAX_POLS];
 
 	struct rcu_head			rcu_head;
@@ -190,6 +193,10 @@ u64 __blkg_prfill_rwstat(struct seq_file *sf, struct blkg_policy_data *pd,
 u64 blkg_prfill_stat(struct seq_file *sf, struct blkg_policy_data *pd, int off);
 u64 blkg_prfill_rwstat(struct seq_file *sf, struct blkg_policy_data *pd,
 		       int off);
+int blkg_print_stat_bytes(struct seq_file *sf, void *v);
+int blkg_print_stat_ios(struct seq_file *sf, void *v);
+int blkg_print_stat_bytes_recursive(struct seq_file *sf, void *v);
+int blkg_print_stat_ios_recursive(struct seq_file *sf, void *v);
 
 u64 blkg_stat_recursive_sum(struct blkcg_gq *blkg,
 			    struct blkcg_policy *pol, int off);
@@ -700,6 +707,13 @@ static inline bool blkcg_bio_issue_check(struct request_queue *q,
 
 	throtl = blk_throtl_bio(q, blkg, bio);
 
+	if (!throtl) {
+		blkg = blkg ?: q->root_blkg;
+		blkg_rwstat_add(&blkg->stat_bytes, bio->bi_flags,
+				bio->bi_iter.bi_size);
+		blkg_rwstat_add(&blkg->stat_ios, bio->bi_flags, 1);
+	}
+
 	rcu_read_unlock();
 	return !throtl;
 }
-- 
cgit v1.2.3


From 2ee867dcfa2eaef1063b686da55c35878b2da4a2 Mon Sep 17 00:00:00 2001
From: Tejun Heo <tj@kernel.org>
Date: Tue, 18 Aug 2015 14:55:34 -0700
Subject: blkcg: implement interface for the unified hierarchy

blkcg interface grew to be the biggest of all controllers and
unfortunately most inconsistent too.  The interface files are
inconsistent with a number of cloes duplicates.  Some files have
recursive variants while others don't.  There's distinction between
normal and leaf weights which isn't intuitive and there are a lot of
stat knobs which don't make much sense outside of debugging and expose
too much implementation details to userland.

In the unified hierarchy, everything is always hierarchical and
internal nodes can't have tasks rendering the two structural issues
twisting the current interface.  The interface has to be updated in a
significant anyway and this is a good chance to revamp it as a whole.
This patch implements blkcg interface for the unified hierarchy.

* (from a previous patch) blkcg is identified by "io" instead of
  "blkio" on the unified hierarchy.  Given that the whole interface is
  updated anyway, the rename shouldn't carry noticeable conversion
  overhead.

* The original interface consisted of 27 files is replaced with the
  following three files.

  blkio.stat	: per-blkcg stats
  blkio.weight	: per-cgroup and per-cgroup-queue weight settings
  blkio.max	: per-cgroup-queue bps and iops max limits

Documentation/cgroups/unified-hierarchy.txt updated accordingly.

v2: blkcg_policy->dfl_cftypes wasn't removed on
    blkcg_policy_unregister() corrupting the cftypes list.  Fixed.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
---
 Documentation/cgroups/unified-hierarchy.txt |  61 ++++++++++++++-
 block/blk-cgroup.c                          |  53 +++++++++++++
 block/blk-throttle.c                        | 112 ++++++++++++++++++++++++++++
 block/cfq-iosched.c                         |  61 +++++++++++++--
 include/linux/blk-cgroup.h                  |   1 +
 5 files changed, 279 insertions(+), 9 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/cgroups/unified-hierarchy.txt b/Documentation/cgroups/unified-hierarchy.txt
index 1ee9caf29e57..bd1ce15d5178 100644
--- a/Documentation/cgroups/unified-hierarchy.txt
+++ b/Documentation/cgroups/unified-hierarchy.txt
@@ -27,7 +27,7 @@ CONTENTS
     5-3-1. Format
     5-3-2. Control Knobs
   5-4. Per-Controller Changes
-    5-4-1. blkio
+    5-4-1. io
     5-4-2. cpuset
     5-4-3. memory
 6. Planned Changes
@@ -203,7 +203,7 @@ other issues.  The mapping from nice level to weight isn't obvious or
 universal, and there are various other knobs which simply aren't
 available for tasks.
 
-The blkio controller implicitly creates a hidden leaf node for each
+The io controller implicitly creates a hidden leaf node for each
 cgroup to host the tasks.  The hidden leaf has its own copies of all
 the knobs with "leaf_" prefixed.  While this allows equivalent control
 over internal tasks, it's with serious drawbacks.  It always adds an
@@ -438,9 +438,62 @@ may be specified in any order and not all pairs have to be specified.
 
 5-4. Per-Controller Changes
 
-5-4-1. blkio
+5-4-1. io
 
-- blk-throttle becomes properly hierarchical.
+- blkio is renamed to io.  The interface is overhauled anyway.  The
+  new name is more in line with the other two major controllers, cpu
+  and memory, and better suited given that it may be used for cgroup
+  writeback without involving block layer.
+
+- Everything including stat is always hierarchical making separate
+  recursive stat files pointless and, as no internal node can have
+  tasks, leaf weights are meaningless.  The operation model is
+  simplified and the interface is overhauled accordingly.
+
+  io.stat
+
+	The stat file.  The reported stats are from the point where
+	bio's are issued to request_queue.  The stats are counted
+	independent of which policies are enabled.  Each line in the
+	file follows the following format.  More fields may later be
+	added at the end.
+
+	  $MAJ:$MIN rbytes=$RBYTES wbytes=$WBYTES rios=$RIOS wrios=$WIOS
+
+  io.weight
+
+	The weight setting, currently only available and effective if
+	cfq-iosched is in use for the target device.  The weight is
+	between 10 and 1000 and defaults to 500.  The first line
+	always contains the default weight in the following format to
+	use when per-device setting is missing.
+
+	  default $WEIGHT
+
+	Subsequent lines list per-device weights of the following
+	format.
+
+	  $MAJ:$MIN $WEIGHT
+
+	Writing "$WEIGHT" or "default $WEIGHT" changes the default
+	setting.  Writing "$MAJ:$MIN $WEIGHT" sets per-device weight
+	while "$MAJ:$MIN default" clears it.
+
+	This file is available only on non-root cgroups.
+
+  io.max
+
+	The maximum bandwidth and/or iops setting, only available if
+	blk-throttle is enabled.  The file is of the following format.
+
+	  $MAJ:$MIN rbps=$RBPS wbps=$WBPS riops=$RIOPS wiops=$WIOPS
+
+	${R|W}BPS are read/write bytes per second and ${R|W}IOPS are
+	read/write IOs per second.  "max" indicates no limit.  Writing
+	to the file follows the same format but the individual
+	settings may be ommitted or specified in any order.
+
+	This file is available only on non-root cgroups.
 
 
 5-4-2. cpuset
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index b5e72d756be1..88bdb73bd5e0 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -854,6 +854,53 @@ void blkg_conf_finish(struct blkg_conf_ctx *ctx)
 }
 EXPORT_SYMBOL_GPL(blkg_conf_finish);
 
+static int blkcg_print_stat(struct seq_file *sf, void *v)
+{
+	struct blkcg *blkcg = css_to_blkcg(seq_css(sf));
+	struct blkcg_gq *blkg;
+
+	rcu_read_lock();
+
+	hlist_for_each_entry_rcu(blkg, &blkcg->blkg_list, blkcg_node) {
+		const char *dname;
+		struct blkg_rwstat rwstat;
+		u64 rbytes, wbytes, rios, wios;
+
+		dname = blkg_dev_name(blkg);
+		if (!dname)
+			continue;
+
+		spin_lock_irq(blkg->q->queue_lock);
+
+		rwstat = blkg_rwstat_recursive_sum(blkg, NULL,
+					offsetof(struct blkcg_gq, stat_bytes));
+		rbytes = atomic64_read(&rwstat.aux_cnt[BLKG_RWSTAT_READ]);
+		wbytes = atomic64_read(&rwstat.aux_cnt[BLKG_RWSTAT_WRITE]);
+
+		rwstat = blkg_rwstat_recursive_sum(blkg, NULL,
+					offsetof(struct blkcg_gq, stat_ios));
+		rios = atomic64_read(&rwstat.aux_cnt[BLKG_RWSTAT_READ]);
+		wios = atomic64_read(&rwstat.aux_cnt[BLKG_RWSTAT_WRITE]);
+
+		spin_unlock_irq(blkg->q->queue_lock);
+
+		if (rbytes || wbytes || rios || wios)
+			seq_printf(sf, "%s rbytes=%llu wbytes=%llu rios=%llu wios=%llu\n",
+				   dname, rbytes, wbytes, rios, wios);
+	}
+
+	rcu_read_unlock();
+	return 0;
+}
+
+struct cftype blkcg_files[] = {
+	{
+		.name = "stat",
+		.seq_show = blkcg_print_stat,
+	},
+	{ }	/* terminate */
+};
+
 struct cftype blkcg_legacy_files[] = {
 	{
 		.name = "reset_stats",
@@ -1101,6 +1148,7 @@ struct cgroup_subsys io_cgrp_subsys = {
 	.css_offline = blkcg_css_offline,
 	.css_free = blkcg_css_free,
 	.can_attach = blkcg_can_attach,
+	.dfl_cftypes = blkcg_files,
 	.legacy_cftypes = blkcg_legacy_files,
 	.legacy_name = "blkio",
 #ifdef CONFIG_MEMCG
@@ -1273,6 +1321,9 @@ int blkcg_policy_register(struct blkcg_policy *pol)
 	mutex_unlock(&blkcg_pol_mutex);
 
 	/* everything is in place, add intf files for the new policy */
+	if (pol->dfl_cftypes)
+		WARN_ON(cgroup_add_dfl_cftypes(&io_cgrp_subsys,
+					       pol->dfl_cftypes));
 	if (pol->legacy_cftypes)
 		WARN_ON(cgroup_add_legacy_cftypes(&io_cgrp_subsys,
 						  pol->legacy_cftypes));
@@ -1312,6 +1363,8 @@ void blkcg_policy_unregister(struct blkcg_policy *pol)
 		goto out_unlock;
 
 	/* kill the intf files first */
+	if (pol->dfl_cftypes)
+		cgroup_rm_cftypes(pol->dfl_cftypes);
 	if (pol->legacy_cftypes)
 		cgroup_rm_cftypes(pol->legacy_cftypes);
 
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index a8bb2fd8f523..c75a2636dd40 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -1265,6 +1265,117 @@ static struct cftype throtl_legacy_files[] = {
 	{ }	/* terminate */
 };
 
+static u64 tg_prfill_max(struct seq_file *sf, struct blkg_policy_data *pd,
+			 int off)
+{
+	struct throtl_grp *tg = pd_to_tg(pd);
+	const char *dname = blkg_dev_name(pd->blkg);
+	char bufs[4][21] = { "max", "max", "max", "max" };
+
+	if (!dname)
+		return 0;
+	if (tg->bps[READ] == -1 && tg->bps[WRITE] == -1 &&
+	    tg->iops[READ] == -1 && tg->iops[WRITE] == -1)
+		return 0;
+
+	if (tg->bps[READ] != -1)
+		snprintf(bufs[0], sizeof(bufs[0]), "%llu", tg->bps[READ]);
+	if (tg->bps[WRITE] != -1)
+		snprintf(bufs[1], sizeof(bufs[1]), "%llu", tg->bps[WRITE]);
+	if (tg->iops[READ] != -1)
+		snprintf(bufs[2], sizeof(bufs[2]), "%u", tg->iops[READ]);
+	if (tg->iops[WRITE] != -1)
+		snprintf(bufs[3], sizeof(bufs[3]), "%u", tg->iops[WRITE]);
+
+	seq_printf(sf, "%s rbps=%s wbps=%s riops=%s wiops=%s\n",
+		   dname, bufs[0], bufs[1], bufs[2], bufs[3]);
+	return 0;
+}
+
+static int tg_print_max(struct seq_file *sf, void *v)
+{
+	blkcg_print_blkgs(sf, css_to_blkcg(seq_css(sf)), tg_prfill_max,
+			  &blkcg_policy_throtl, seq_cft(sf)->private, false);
+	return 0;
+}
+
+static ssize_t tg_set_max(struct kernfs_open_file *of,
+			  char *buf, size_t nbytes, loff_t off)
+{
+	struct blkcg *blkcg = css_to_blkcg(of_css(of));
+	struct blkg_conf_ctx ctx;
+	struct throtl_grp *tg;
+	u64 v[4];
+	int ret;
+
+	ret = blkg_conf_prep(blkcg, &blkcg_policy_throtl, buf, &ctx);
+	if (ret)
+		return ret;
+
+	tg = blkg_to_tg(ctx.blkg);
+
+	v[0] = tg->bps[READ];
+	v[1] = tg->bps[WRITE];
+	v[2] = tg->iops[READ];
+	v[3] = tg->iops[WRITE];
+
+	while (true) {
+		char tok[27];	/* wiops=18446744073709551616 */
+		char *p;
+		u64 val = -1;
+		int len;
+
+		if (sscanf(ctx.body, "%26s%n", tok, &len) != 1)
+			break;
+		if (tok[0] == '\0')
+			break;
+		ctx.body += len;
+
+		ret = -EINVAL;
+		p = tok;
+		strsep(&p, "=");
+		if (!p || (sscanf(p, "%llu", &val) != 1 && strcmp(p, "max")))
+			goto out_finish;
+
+		ret = -ERANGE;
+		if (!val)
+			goto out_finish;
+
+		ret = -EINVAL;
+		if (!strcmp(tok, "rbps"))
+			v[0] = val;
+		else if (!strcmp(tok, "wbps"))
+			v[1] = val;
+		else if (!strcmp(tok, "riops"))
+			v[2] = min_t(u64, val, UINT_MAX);
+		else if (!strcmp(tok, "wiops"))
+			v[3] = min_t(u64, val, UINT_MAX);
+		else
+			goto out_finish;
+	}
+
+	tg->bps[READ] = v[0];
+	tg->bps[WRITE] = v[1];
+	tg->iops[READ] = v[2];
+	tg->iops[WRITE] = v[3];
+
+	tg_conf_updated(tg);
+	ret = 0;
+out_finish:
+	blkg_conf_finish(&ctx);
+	return ret ?: nbytes;
+}
+
+static struct cftype throtl_files[] = {
+	{
+		.name = "max",
+		.flags = CFTYPE_NOT_ON_ROOT,
+		.seq_show = tg_print_max,
+		.write = tg_set_max,
+	},
+	{ }	/* terminate */
+};
+
 static void throtl_shutdown_wq(struct request_queue *q)
 {
 	struct throtl_data *td = q->td;
@@ -1273,6 +1384,7 @@ static void throtl_shutdown_wq(struct request_queue *q)
 }
 
 static struct blkcg_policy blkcg_policy_throtl = {
+	.dfl_cftypes		= throtl_files,
 	.legacy_cftypes		= throtl_legacy_files,
 
 	.pd_alloc_fn		= throtl_pd_alloc,
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 7a7230136e43..97da5719c87a 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -1740,7 +1740,7 @@ static int cfq_print_leaf_weight(struct seq_file *sf, void *v)
 
 static ssize_t __cfqg_set_weight_device(struct kernfs_open_file *of,
 					char *buf, size_t nbytes, loff_t off,
-					bool is_leaf_weight)
+					bool on_dfl, bool is_leaf_weight)
 {
 	struct blkcg *blkcg = css_to_blkcg(of_css(of));
 	struct blkg_conf_ctx ctx;
@@ -1753,9 +1753,17 @@ static ssize_t __cfqg_set_weight_device(struct kernfs_open_file *of,
 	if (ret)
 		return ret;
 
-	ret = -EINVAL;
-	if (sscanf(ctx.body, "%llu", &v) != 1)
+	if (sscanf(ctx.body, "%llu", &v) == 1) {
+		/* require "default" on dfl */
+		ret = -ERANGE;
+		if (!v && on_dfl)
+			goto out_finish;
+	} else if (!strcmp(strim(ctx.body), "default")) {
+		v = 0;
+	} else {
+		ret = -EINVAL;
 		goto out_finish;
+	}
 
 	cfqg = blkg_to_cfqg(ctx.blkg);
 	cfqgd = blkcg_to_cfqgd(blkcg);
@@ -1779,13 +1787,13 @@ out_finish:
 static ssize_t cfqg_set_weight_device(struct kernfs_open_file *of,
 				      char *buf, size_t nbytes, loff_t off)
 {
-	return __cfqg_set_weight_device(of, buf, nbytes, off, false);
+	return __cfqg_set_weight_device(of, buf, nbytes, off, false, false);
 }
 
 static ssize_t cfqg_set_leaf_weight_device(struct kernfs_open_file *of,
 					   char *buf, size_t nbytes, loff_t off)
 {
-	return __cfqg_set_weight_device(of, buf, nbytes, off, true);
+	return __cfqg_set_weight_device(of, buf, nbytes, off, false, true);
 }
 
 static int __cfq_set_weight(struct cgroup_subsys_state *css, u64 val,
@@ -2103,6 +2111,48 @@ static struct cftype cfq_blkcg_legacy_files[] = {
 #endif	/* CONFIG_DEBUG_BLK_CGROUP */
 	{ }	/* terminate */
 };
+
+static int cfq_print_weight_on_dfl(struct seq_file *sf, void *v)
+{
+	struct blkcg *blkcg = css_to_blkcg(seq_css(sf));
+	struct cfq_group_data *cgd = blkcg_to_cfqgd(blkcg);
+
+	seq_printf(sf, "default %u\n", cgd->weight);
+	blkcg_print_blkgs(sf, blkcg, cfqg_prfill_weight_device,
+			  &blkcg_policy_cfq, 0, false);
+	return 0;
+}
+
+static ssize_t cfq_set_weight_on_dfl(struct kernfs_open_file *of,
+				     char *buf, size_t nbytes, loff_t off)
+{
+	char *endp;
+	int ret;
+	u64 v;
+
+	buf = strim(buf);
+
+	/* "WEIGHT" or "default WEIGHT" sets the default weight */
+	v = simple_strtoull(buf, &endp, 0);
+	if (*endp == '\0' || sscanf(buf, "default %llu", &v) == 1) {
+		ret = __cfq_set_weight(of_css(of), v, false);
+		return ret ?: nbytes;
+	}
+
+	/* "MAJ:MIN WEIGHT" */
+	return __cfqg_set_weight_device(of, buf, nbytes, off, true, false);
+}
+
+static struct cftype cfq_blkcg_files[] = {
+	{
+		.name = "weight",
+		.flags = CFTYPE_NOT_ON_ROOT,
+		.seq_show = cfq_print_weight_on_dfl,
+		.write = cfq_set_weight_on_dfl,
+	},
+	{ }	/* terminate */
+};
+
 #else /* GROUP_IOSCHED */
 static struct cfq_group *cfq_lookup_cfqg(struct cfq_data *cfqd,
 					 struct blkcg *blkcg)
@@ -4659,6 +4709,7 @@ static struct elevator_type iosched_cfq = {
 
 #ifdef CONFIG_CFQ_GROUP_IOSCHED
 static struct blkcg_policy blkcg_policy_cfq = {
+	.dfl_cftypes		= cfq_blkcg_files,
 	.legacy_cftypes		= cfq_blkcg_legacy_files,
 
 	.cpd_alloc_fn		= cfq_cpd_alloc,
diff --git a/include/linux/blk-cgroup.h b/include/linux/blk-cgroup.h
index b270aef519c6..9a7c4bd45fff 100644
--- a/include/linux/blk-cgroup.h
+++ b/include/linux/blk-cgroup.h
@@ -148,6 +148,7 @@ typedef void (blkcg_pol_reset_pd_stats_fn)(struct blkg_policy_data *pd);
 struct blkcg_policy {
 	int				plid;
 	/* cgroup files for the policy */
+	struct cftype			*dfl_cftypes;
 	struct cftype			*legacy_cftypes;
 
 	/* operations */
-- 
cgit v1.2.3


From 69d7fde5909b614114343974cfc52cb8ff30b544 Mon Sep 17 00:00:00 2001
From: Tejun Heo <tj@kernel.org>
Date: Tue, 18 Aug 2015 14:55:36 -0700
Subject: blkcg: use CGROUP_WEIGHT_* scale for io.weight on the unified
 hierarchy

cgroup is trying to make interface consistent across different
controllers.  For weight based resource control, the knob should have
the range [1, 10000] and default to 100.  This patch updates
cfq-iosched so that the weight range conforms.  The internal
calculations have enough range and the widening of the weight range
shouldn't cause any problem.

* blkcg_policy->cpd_bind_fn() is added.  If present, this is invoked
  when blkcg is attached to a hierarchy.

* cfq_cpd_init() is updated to use the new default value on the
  unified hierarchy.

* cfq_cpd_bind() callback is implemented to clear per-blkg configs and
  apply the default config matching the hierarchy type.

* cfqd->root_group->[leaf_]weight initialization in cfq_init_queue()
  is moved into !CONFIG_CFQ_GROUP_IOSCHED block.  cfq_cpd_bind() is
  now responsible for initializing the initial weights when blkcg is
  enabled.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Arianna Avanzini <avanzini.arianna@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
---
 Documentation/cgroups/unified-hierarchy.txt |  2 +-
 block/blk-cgroup.c                          | 21 +++++++++++
 block/cfq-iosched.c                         | 55 +++++++++++++++++++++--------
 include/linux/blk-cgroup.h                  |  2 ++
 4 files changed, 64 insertions(+), 16 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/cgroups/unified-hierarchy.txt b/Documentation/cgroups/unified-hierarchy.txt
index bd1ce15d5178..e0975c2cf03d 100644
--- a/Documentation/cgroups/unified-hierarchy.txt
+++ b/Documentation/cgroups/unified-hierarchy.txt
@@ -464,7 +464,7 @@ may be specified in any order and not all pairs have to be specified.
 
 	The weight setting, currently only available and effective if
 	cfq-iosched is in use for the target device.  The weight is
-	between 10 and 1000 and defaults to 500.  The first line
+	between 1 and 10000 and defaults to 100.  The first line
 	always contains the default weight in the following format to
 	use when per-device setting is missing.
 
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 88bdb73bd5e0..ac8370cb2515 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1143,11 +1143,32 @@ static int blkcg_can_attach(struct cgroup_subsys_state *css,
 	return ret;
 }
 
+static void blkcg_bind(struct cgroup_subsys_state *root_css)
+{
+	int i;
+
+	mutex_lock(&blkcg_pol_mutex);
+
+	for (i = 0; i < BLKCG_MAX_POLS; i++) {
+		struct blkcg_policy *pol = blkcg_policy[i];
+		struct blkcg *blkcg;
+
+		if (!pol || !pol->cpd_bind_fn)
+			continue;
+
+		list_for_each_entry(blkcg, &all_blkcgs, all_blkcgs_node)
+			if (blkcg->cpd[pol->plid])
+				pol->cpd_bind_fn(blkcg->cpd[pol->plid]);
+	}
+	mutex_unlock(&blkcg_pol_mutex);
+}
+
 struct cgroup_subsys io_cgrp_subsys = {
 	.css_alloc = blkcg_css_alloc,
 	.css_offline = blkcg_css_offline,
 	.css_free = blkcg_css_free,
 	.can_attach = blkcg_can_attach,
+	.bind = blkcg_bind,
 	.dfl_cftypes = blkcg_files,
 	.legacy_cftypes = blkcg_legacy_files,
 	.legacy_name = "blkio",
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 0fe721eaff28..04de88463a98 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -1522,6 +1522,9 @@ static void cfq_init_cfqg_base(struct cfq_group *cfqg)
 }
 
 #ifdef CONFIG_CFQ_GROUP_IOSCHED
+static int __cfq_set_weight(struct cgroup_subsys_state *css, u64 val,
+			    bool on_dfl, bool reset_dev, bool is_leaf_weight);
+
 static void cfqg_stats_exit(struct cfqg_stats *stats)
 {
 	blkg_rwstat_exit(&stats->merged);
@@ -1578,14 +1581,14 @@ static struct blkcg_policy_data *cfq_cpd_alloc(gfp_t gfp)
 static void cfq_cpd_init(struct blkcg_policy_data *cpd)
 {
 	struct cfq_group_data *cgd = cpd_to_cfqgd(cpd);
+	unsigned int weight = cgroup_on_dfl(blkcg_root.css.cgroup) ?
+			      CGROUP_WEIGHT_DFL : CFQ_WEIGHT_LEGACY_DFL;
 
-	if (cpd_to_blkcg(cpd) == &blkcg_root) {
-		cgd->weight = 2 * CFQ_WEIGHT_LEGACY_DFL;
-		cgd->leaf_weight = 2 * CFQ_WEIGHT_LEGACY_DFL;
-	} else {
-		cgd->weight = CFQ_WEIGHT_LEGACY_DFL;
-		cgd->leaf_weight = CFQ_WEIGHT_LEGACY_DFL;
-	}
+	if (cpd_to_blkcg(cpd) == &blkcg_root)
+		weight *= 2;
+
+	cgd->weight = weight;
+	cgd->leaf_weight = weight;
 }
 
 static void cfq_cpd_free(struct blkcg_policy_data *cpd)
@@ -1593,6 +1596,19 @@ static void cfq_cpd_free(struct blkcg_policy_data *cpd)
 	kfree(cpd_to_cfqgd(cpd));
 }
 
+static void cfq_cpd_bind(struct blkcg_policy_data *cpd)
+{
+	struct blkcg *blkcg = cpd_to_blkcg(cpd);
+	bool on_dfl = cgroup_on_dfl(blkcg_root.css.cgroup);
+	unsigned int weight = on_dfl ? CGROUP_WEIGHT_DFL : CFQ_WEIGHT_LEGACY_DFL;
+
+	if (blkcg == &blkcg_root)
+		weight *= 2;
+
+	WARN_ON_ONCE(__cfq_set_weight(&blkcg->css, weight, on_dfl, true, false));
+	WARN_ON_ONCE(__cfq_set_weight(&blkcg->css, weight, on_dfl, true, true));
+}
+
 static struct blkg_policy_data *cfq_pd_alloc(gfp_t gfp, int node)
 {
 	struct cfq_group *cfqg;
@@ -1742,6 +1758,8 @@ static ssize_t __cfqg_set_weight_device(struct kernfs_open_file *of,
 					char *buf, size_t nbytes, loff_t off,
 					bool on_dfl, bool is_leaf_weight)
 {
+	unsigned int min = on_dfl ? CGROUP_WEIGHT_MIN : CFQ_WEIGHT_LEGACY_MIN;
+	unsigned int max = on_dfl ? CGROUP_WEIGHT_MAX : CFQ_WEIGHT_LEGACY_MAX;
 	struct blkcg *blkcg = css_to_blkcg(of_css(of));
 	struct blkg_conf_ctx ctx;
 	struct cfq_group *cfqg;
@@ -1769,7 +1787,7 @@ static ssize_t __cfqg_set_weight_device(struct kernfs_open_file *of,
 	cfqgd = blkcg_to_cfqgd(blkcg);
 
 	ret = -ERANGE;
-	if (!v || (v >= CFQ_WEIGHT_LEGACY_MIN && v <= CFQ_WEIGHT_LEGACY_MAX)) {
+	if (!v || (v >= min && v <= max)) {
 		if (!is_leaf_weight) {
 			cfqg->dev_weight = v;
 			cfqg->new_weight = v ?: cfqgd->weight;
@@ -1797,15 +1815,17 @@ static ssize_t cfqg_set_leaf_weight_device(struct kernfs_open_file *of,
 }
 
 static int __cfq_set_weight(struct cgroup_subsys_state *css, u64 val,
-			    bool is_leaf_weight)
+			    bool on_dfl, bool reset_dev, bool is_leaf_weight)
 {
+	unsigned int min = on_dfl ? CGROUP_WEIGHT_MIN : CFQ_WEIGHT_LEGACY_MIN;
+	unsigned int max = on_dfl ? CGROUP_WEIGHT_MAX : CFQ_WEIGHT_LEGACY_MAX;
 	struct blkcg *blkcg = css_to_blkcg(css);
 	struct blkcg_gq *blkg;
 	struct cfq_group_data *cfqgd;
 	int ret = 0;
 
-	if (val < CFQ_WEIGHT_LEGACY_MIN || val > CFQ_WEIGHT_LEGACY_MAX)
-		return -EINVAL;
+	if (val < min || val > max)
+		return -ERANGE;
 
 	spin_lock_irq(&blkcg->lock);
 	cfqgd = blkcg_to_cfqgd(blkcg);
@@ -1826,9 +1846,13 @@ static int __cfq_set_weight(struct cgroup_subsys_state *css, u64 val,
 			continue;
 
 		if (!is_leaf_weight) {
+			if (reset_dev)
+				cfqg->dev_weight = 0;
 			if (!cfqg->dev_weight)
 				cfqg->new_weight = cfqgd->weight;
 		} else {
+			if (reset_dev)
+				cfqg->dev_leaf_weight = 0;
 			if (!cfqg->dev_leaf_weight)
 				cfqg->new_leaf_weight = cfqgd->leaf_weight;
 		}
@@ -1842,13 +1866,13 @@ out:
 static int cfq_set_weight(struct cgroup_subsys_state *css, struct cftype *cft,
 			  u64 val)
 {
-	return __cfq_set_weight(css, val, false);
+	return __cfq_set_weight(css, val, false, false, false);
 }
 
 static int cfq_set_leaf_weight(struct cgroup_subsys_state *css,
 			       struct cftype *cft, u64 val)
 {
-	return __cfq_set_weight(css, val, true);
+	return __cfq_set_weight(css, val, false, false, true);
 }
 
 static int cfqg_print_stat(struct seq_file *sf, void *v)
@@ -2135,7 +2159,7 @@ static ssize_t cfq_set_weight_on_dfl(struct kernfs_open_file *of,
 	/* "WEIGHT" or "default WEIGHT" sets the default weight */
 	v = simple_strtoull(buf, &endp, 0);
 	if (*endp == '\0' || sscanf(buf, "default %llu", &v) == 1) {
-		ret = __cfq_set_weight(of_css(of), v, false);
+		ret = __cfq_set_weight(of_css(of), v, true, false, false);
 		return ret ?: nbytes;
 	}
 
@@ -4512,9 +4536,9 @@ static int cfq_init_queue(struct request_queue *q, struct elevator_type *e)
 		goto out_free;
 
 	cfq_init_cfqg_base(cfqd->root_group);
-#endif
 	cfqd->root_group->weight = 2 * CFQ_WEIGHT_LEGACY_DFL;
 	cfqd->root_group->leaf_weight = 2 * CFQ_WEIGHT_LEGACY_DFL;
+#endif
 
 	/*
 	 * Not strictly needed (since RB_ROOT just clears the node and we
@@ -4715,6 +4739,7 @@ static struct blkcg_policy blkcg_policy_cfq = {
 	.cpd_alloc_fn		= cfq_cpd_alloc,
 	.cpd_init_fn		= cfq_cpd_init,
 	.cpd_free_fn		= cfq_cpd_free,
+	.cpd_bind_fn		= cfq_cpd_bind,
 
 	.pd_alloc_fn		= cfq_pd_alloc,
 	.pd_init_fn		= cfq_pd_init,
diff --git a/include/linux/blk-cgroup.h b/include/linux/blk-cgroup.h
index 9a7c4bd45fff..0a5cc7a1109b 100644
--- a/include/linux/blk-cgroup.h
+++ b/include/linux/blk-cgroup.h
@@ -138,6 +138,7 @@ struct blkcg_gq {
 typedef struct blkcg_policy_data *(blkcg_pol_alloc_cpd_fn)(gfp_t gfp);
 typedef void (blkcg_pol_init_cpd_fn)(struct blkcg_policy_data *cpd);
 typedef void (blkcg_pol_free_cpd_fn)(struct blkcg_policy_data *cpd);
+typedef void (blkcg_pol_bind_cpd_fn)(struct blkcg_policy_data *cpd);
 typedef struct blkg_policy_data *(blkcg_pol_alloc_pd_fn)(gfp_t gfp, int node);
 typedef void (blkcg_pol_init_pd_fn)(struct blkg_policy_data *pd);
 typedef void (blkcg_pol_online_pd_fn)(struct blkg_policy_data *pd);
@@ -155,6 +156,7 @@ struct blkcg_policy {
 	blkcg_pol_alloc_cpd_fn		*cpd_alloc_fn;
 	blkcg_pol_init_cpd_fn		*cpd_init_fn;
 	blkcg_pol_free_cpd_fn		*cpd_free_fn;
+	blkcg_pol_bind_cpd_fn		*cpd_bind_fn;
 
 	blkcg_pol_alloc_pd_fn		*pd_alloc_fn;
 	blkcg_pol_init_pd_fn		*pd_init_fn;
-- 
cgit v1.2.3


From ccf2dc51bc4a715670641aa5af0d4636acd8e0cd Mon Sep 17 00:00:00 2001
From: Guenter Roeck <linux@roeck-us.net>
Date: Mon, 17 Aug 2015 20:08:09 -0700
Subject: hwmon: (ltc2978) Add support for LTM4675
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

LTM2975 is a dual 9A or single 18A μModule regulator.
It is register compatible with LTM4676.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
---
 Documentation/hwmon/ltc2978   | 24 ++++++++++++++++--------
 drivers/hwmon/pmbus/Kconfig   |  2 +-
 drivers/hwmon/pmbus/ltc2978.c | 10 ++++++++--
 3 files changed, 25 insertions(+), 11 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/hwmon/ltc2978 b/Documentation/hwmon/ltc2978
index f997aa53eb95..9a49d3c90cd1 100644
--- a/Documentation/hwmon/ltc2978
+++ b/Documentation/hwmon/ltc2978
@@ -47,6 +47,10 @@ Supported chips:
     Prefix: 'ltm2987'
     Addresses scanned: -
     Datasheet: http://www.linear.com/product/ltm2987
+  * Linear Technology LTM4675
+    Prefix: 'ltm4675'
+    Addresses scanned: -
+    Datasheet: http://www.linear.com/product/ltm4675
   * Linear Technology LTM4676
     Prefix: 'ltm4676'
     Addresses scanned: -
@@ -70,6 +74,7 @@ LTC3883 is a single phase step-down DC/DC controller.
 LTM2987 is a 16-channel Power System Manager with two LTC2977 plus
 additional components on a single die. The chip is instantiated and reported
 as two separate chips on two different I2C bus addresses.
+LTM4675 is a dual 9A or single 18A μModule regulator
 LTM4676 is a dual 13A or single 26A uModule regulator.
 
 
@@ -117,7 +122,8 @@ in[N]_label		"vout[1-8]".
 			LTC2974, LTC2975: N=2-5
 			LTC2977, LTC2980, LTM2987: N=2-9
 			LTC2978: N=2-9
-			LTC3880, LTC3882, LTC23886 LTC3887, LTM4676: N=2-3
+			LTC3880, LTC3882, LTC23886 LTC3887, LTM4675, LTM4676:
+				N=2-3
 			LTC3883: N=2
 in[N]_input		Measured output voltage.
 in[N]_min		Minimum output voltage.
@@ -139,9 +145,9 @@ temp[N]_input		Measured temperature.
 			On LTC2977, LTC2980, LTC2978, and LTM2987, only one
 			temperature measurement is supported and reports
 			the chip temperature.
-			On LTC3880, LTC3882, LTC3887, and LTM4676, temp1 and
-			temp2 report external temperatures, and temp3 reports
-			the chip temperature.
+			On LTC3880, LTC3882, LTC3887, LTM4675, and LTM4676,
+			temp1 and temp2 report external temperatures, and temp3
+			reports the chip temperature.
 			On LTC3883, temp1 reports an external temperature,
 			and temp2 reports the chip temperature.
 temp[N]_min		Mimimum temperature. LTC2974, LCT2977, LTM2980, LTC2978,
@@ -172,12 +178,13 @@ power[N]_label		"pout[1-4]".
 			LTC2974, LTC2975: N=1-4
 			LTC2977, LTC2980, LTM2987: Not supported
 			LTC2978: Not supported
-			LTC3880, LTC3882, LTC3886, LTC3887, LTM4676: N=1-2
+			LTC3880, LTC3882, LTC3886, LTC3887, LTM4675, LTM4676:
+				N=1-2
 			LTC3883: N=2
 power[N]_input		Measured output power.
 
-curr1_label		"iin". LTC3880, LTC3883, LTC3886, LTC3887, and LTM4676
-			only.
+curr1_label		"iin". LTC3880, LTC3883, LTC3886, LTC3887, LTM4675,
+			and LTM4676 only.
 curr1_input		Measured input current.
 curr1_max		Maximum input current.
 curr1_max_alarm		Input current high alarm.
@@ -188,7 +195,8 @@ curr[N]_label		"iout[1-4]".
 			LTC2974, LTC2975: N=1-4
 			LTC2977, LTC2980, LTM2987: not supported
 			LTC2978: not supported
-			LTC3880, LTC3882, LTC3886, LTC3887, LTM4676: N=2-3
+			LTC3880, LTC3882, LTC3886, LTC3887, LTM4675, LTM4676:
+				N=2-3
 			LTC3883: N=2
 curr[N]_input		Measured output current.
 curr[N]_max		Maximum output current.
diff --git a/drivers/hwmon/pmbus/Kconfig b/drivers/hwmon/pmbus/Kconfig
index 5ebc36255c88..df6ebb2b8f0f 100644
--- a/drivers/hwmon/pmbus/Kconfig
+++ b/drivers/hwmon/pmbus/Kconfig
@@ -53,7 +53,7 @@ config SENSORS_LTC2978
 	help
 	  If you say yes here you get hardware monitoring support for Linear
 	  Technology LTC2974, LTC2975, LTC2977, LTC2978, LTC2980, LTC3880,
-	  LTC3883, LTC3886, LTC3887, LTCM2987, and LTM4676.
+	  LTC3883, LTC3886, LTC3887, LTCM2987, LTM4675, and LTM4676.
 
 	  This driver can also be built as a module. If so, the module will
 	  be called ltc2978.
diff --git a/drivers/hwmon/pmbus/ltc2978.c b/drivers/hwmon/pmbus/ltc2978.c
index f1c69c9de849..58b789c28b48 100644
--- a/drivers/hwmon/pmbus/ltc2978.c
+++ b/drivers/hwmon/pmbus/ltc2978.c
@@ -28,7 +28,7 @@
 #include "pmbus.h"
 
 enum chips { ltc2974, ltc2975, ltc2977, ltc2978, ltc2980, ltc3880, ltc3882,
-	ltc3883, ltc3886, ltc3887, ltm2987, ltm4676 };
+	ltc3883, ltc3886, ltc3887, ltm2987, ltm4675, ltm4676 };
 
 /* Common for all chips */
 #define LTC2978_MFR_VOUT_PEAK		0xdd
@@ -46,7 +46,7 @@ enum chips { ltc2974, ltc2975, ltc2977, ltc2978, ltc2980, ltc3880, ltc3882,
 #define LTC2974_MFR_IOUT_PEAK		0xd7
 #define LTC2974_MFR_IOUT_MIN		0xd8
 
-/* LTC3880, LTC3882, LTC3883, LTC3887, and LTM4676 */
+/* LTC3880, LTC3882, LTC3883, LTC3887, LTM4675, and LTM4676 */
 #define LTC3880_MFR_IOUT_PEAK		0xd7
 #define LTC3880_MFR_CLEAR_PEAKS		0xe3
 #define LTC3880_MFR_TEMPERATURE2_PEAK	0xf4
@@ -77,6 +77,7 @@ enum chips { ltc2974, ltc2975, ltc2977, ltc2978, ltc2980, ltc3880, ltc3882,
 #define LTC3887_ID			0x4700
 #define LTM2987_ID_A			0x8010	/* A/B for two die IDs */
 #define LTM2987_ID_B			0x8020
+#define LTM4675_ID			0x47a0
 #define LTM4676_ID_REV1			0x4400
 #define LTM4676_ID_REV2			0x4480
 #define LTM4676A_ID			0x47e0
@@ -509,6 +510,7 @@ static const struct i2c_device_id ltc2978_id[] = {
 	{"ltc3886", ltc3886},
 	{"ltc3887", ltc3887},
 	{"ltm2987", ltm2987},
+	{"ltm4675", ltm4675},
 	{"ltm4676", ltm4676},
 	{}
 };
@@ -581,6 +583,8 @@ static int ltc2978_get_id(struct i2c_client *client)
 		return ltc3887;
 	else if (chip_id == LTM2987_ID_A || chip_id == LTM2987_ID_B)
 		return ltm2987;
+	else if (chip_id == LTM4675_ID)
+		return ltm4675;
 	else if (chip_id == LTM4676_ID_REV1 || chip_id == LTM4676_ID_REV2 ||
 		 chip_id == LTM4676A_ID)
 		return ltm4676;
@@ -678,6 +682,7 @@ static int ltc2978_probe(struct i2c_client *client,
 		break;
 	case ltc3880:
 	case ltc3887:
+	case ltm4675:
 	case ltm4676:
 		data->features |= FEAT_CLEAR_PEAKS | FEAT_NEEDS_POLLING;
 		info->read_word_data = ltc3880_read_word_data;
@@ -763,6 +768,7 @@ static const struct of_device_id ltc2978_of_match[] = {
 	{ .compatible = "lltc,ltc3886" },
 	{ .compatible = "lltc,ltc3887" },
 	{ .compatible = "lltc,ltm2987" },
+	{ .compatible = "lltc,ltm4675" },
 	{ .compatible = "lltc,ltm4676" },
 	{ }
 };
-- 
cgit v1.2.3


From 6d8f7abd235c1a38629cdada49cc53992f4ad42e Mon Sep 17 00:00:00 2001
From: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Date: Wed, 8 Jul 2015 16:28:16 +0200
Subject: dmaengine: mv_xor: remove support for dmacap,* DT properties

The only reason why we had dmacap,* properties is because back when
DMA_MEMSET was supported, only one out of the two channels per engine
could do a memset operation. But this is something that the driver
already knows anyway, and since then, the DMA_MEMSET support has been
removed.

The driver is already well aware of what each channel supports and the
one to one mapping between Linux specific implementation details (such
as dmacap,interrupt enabling DMA_INTERRUPT) and DT properties is a
good indication that these DT properties are wrong.

Therefore, this commit simply gets rid of these dmacap,* properties,
they are now ignored, and the driver is responsible for knowing the
capabilities of the hardware with regard to the dmaengine subsystem
expectations.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Reviewed-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
---
 Documentation/devicetree/bindings/dma/mv-xor.txt | 10 ++++------
 drivers/dma/mv_xor.c                             |  9 +++------
 2 files changed, 7 insertions(+), 12 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/dma/mv-xor.txt b/Documentation/devicetree/bindings/dma/mv-xor.txt
index cc29c35266e2..276ef815ef32 100644
--- a/Documentation/devicetree/bindings/dma/mv-xor.txt
+++ b/Documentation/devicetree/bindings/dma/mv-xor.txt
@@ -12,10 +12,13 @@ XOR engine has. Those sub-nodes have the following required
 properties:
 - interrupts: interrupt of the XOR channel
 
-And the following optional properties:
+The sub-nodes used to contain one or several of the following
+properties, but they are now deprecated:
 - dmacap,memcpy to indicate that the XOR channel is capable of memcpy operations
 - dmacap,memset to indicate that the XOR channel is capable of memset operations
 - dmacap,xor to indicate that the XOR channel is capable of xor operations
+- dmacap,interrupt to indicate that the XOR channel is capable of
+  generating interrupts
 
 Example:
 
@@ -28,13 +31,8 @@ xor@d0060900 {
 
 	xor00 {
 	      interrupts = <51>;
-	      dmacap,memcpy;
-	      dmacap,xor;
 	};
 	xor01 {
 	      interrupts = <52>;
-	      dmacap,memcpy;
-	      dmacap,xor;
-	      dmacap,memset;
 	};
 };
diff --git a/drivers/dma/mv_xor.c b/drivers/dma/mv_xor.c
index fbaf1ead2597..b165e0b31a48 100644
--- a/drivers/dma/mv_xor.c
+++ b/drivers/dma/mv_xor.c
@@ -1190,12 +1190,9 @@ static int mv_xor_probe(struct platform_device *pdev)
 			op_in_desc = (int)of_id->data;
 
 			dma_cap_zero(cap_mask);
-			if (of_property_read_bool(np, "dmacap,memcpy"))
-				dma_cap_set(DMA_MEMCPY, cap_mask);
-			if (of_property_read_bool(np, "dmacap,xor"))
-				dma_cap_set(DMA_XOR, cap_mask);
-			if (of_property_read_bool(np, "dmacap,interrupt"))
-				dma_cap_set(DMA_INTERRUPT, cap_mask);
+			dma_cap_set(DMA_MEMCPY, cap_mask);
+			dma_cap_set(DMA_XOR, cap_mask);
+			dma_cap_set(DMA_INTERRUPT, cap_mask);
 
 			irq = irq_of_parse_and_map(np, 0);
 			if (!irq) {
-- 
cgit v1.2.3


From b096c1377d1e50cea91d1db13bca8e7802199a67 Mon Sep 17 00:00:00 2001
From: Emilio López <emilio@elopez.com.ar>
Date: Sun, 26 Jul 2015 22:50:55 +0200
Subject: dmaengine: sun4i: Add support for the DMA engine on sun[457]i SoCs
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This patch adds support for the DMA engine present on Allwinner A10,
A13, A10S and A20 SoCs. This engine has two kinds of channels: normal
and dedicated. The main difference is in the mode of operation;
while a single normal channel may be operating at any given time,
dedicated channels may operate simultaneously provided there is no
overlap of source or destination.

Hardware documentation can be found on A10 User Manual (section 12), A13
User Manual (section 14) and A20 User Manual (section 1.12)

Signed-off-by: Emilio López <emilio@elopez.com.ar>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
---
 .../devicetree/bindings/dma/sun4i-dma.txt          |   46 +
 drivers/dma/Kconfig                                |   11 +
 drivers/dma/Makefile                               |    1 +
 drivers/dma/sun4i-dma.c                            | 1288 ++++++++++++++++++++
 4 files changed, 1346 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/dma/sun4i-dma.txt
 create mode 100644 drivers/dma/sun4i-dma.c

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/dma/sun4i-dma.txt b/Documentation/devicetree/bindings/dma/sun4i-dma.txt
new file mode 100644
index 000000000000..f1634a27a830
--- /dev/null
+++ b/Documentation/devicetree/bindings/dma/sun4i-dma.txt
@@ -0,0 +1,46 @@
+Allwinner A10 DMA Controller
+
+This driver follows the generic DMA bindings defined in dma.txt.
+
+Required properties:
+
+- compatible:	Must be "allwinner,sun4i-a10-dma"
+- reg:		Should contain the registers base address and length
+- interrupts:	Should contain a reference to the interrupt used by this device
+- clocks:	Should contain a reference to the parent AHB clock
+- #dma-cells :	Should be 2, first cell denoting normal or dedicated dma,
+		second cell holding the request line number.
+
+Example:
+	dma: dma-controller@01c02000 {
+		compatible = "allwinner,sun4i-a10-dma";
+		reg = <0x01c02000 0x1000>;
+		interrupts = <27>;
+		clocks = <&ahb_gates 6>;
+		#dma-cells = <2>;
+	};
+
+Clients:
+
+DMA clients connected to the Allwinner A10 DMA controller must use the
+format described in the dma.txt file, using a three-cell specifier for
+each channel: a phandle plus two integer cells.
+The three cells in order are:
+
+1. A phandle pointing to the DMA controller.
+2. Whether it is using normal (0) or dedicated (1) channels
+3. The port ID as specified in the datasheet
+
+Example:
+	spi2: spi@01c17000 {
+		compatible = "allwinner,sun4i-a10-spi";
+		reg = <0x01c17000 0x1000>;
+		interrupts = <0 12 4>;
+		clocks = <&ahb_gates 22>, <&spi2_clk>;
+		clock-names = "ahb", "mod";
+		dmas = <&dma 1 29>, <&dma 1 28>;
+		dma-names = "rx", "tx";
+		status = "disabled";
+		#address-cells = <1>;
+		#size-cells = <0>;
+	};
diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
index 9d5a77cb9715..5244b44f5e67 100644
--- a/drivers/dma/Kconfig
+++ b/drivers/dma/Kconfig
@@ -434,6 +434,17 @@ config XILINX_VDMA
 	  channels, Memory Mapped to Stream (MM2S) and Stream to
 	  Memory Mapped (S2MM) for the data transfers.
 
+config DMA_SUN4I
+	tristate "Allwinner A10 DMA SoCs support"
+	depends on MACH_SUN4I || MACH_SUN5I || MACH_SUN7I || COMPILE_TEST
+	default (MACH_SUN4I || MACH_SUN5I || MACH_SUN7I)
+	select DMA_ENGINE
+	select DMA_OF
+	select DMA_VIRTUAL_CHANNELS
+	help
+	  Enable support for the DMA controller present in the sun4i,
+	  sun5i and sun7i Allwinner ARM SoCs.
+
 config DMA_SUN6I
 	tristate "Allwinner A31 SoCs DMA support"
 	depends on MACH_SUN6I || MACH_SUN8I || COMPILE_TEST
diff --git a/drivers/dma/Makefile b/drivers/dma/Makefile
index 800cb2d20be8..e707ff956164 100644
--- a/drivers/dma/Makefile
+++ b/drivers/dma/Makefile
@@ -55,5 +55,6 @@ obj-y += xilinx/
 obj-$(CONFIG_INTEL_MIC_X100_DMA) += mic_x100_dma.o
 obj-$(CONFIG_NBPFAXI_DMA) += nbpfaxi.o
 obj-$(CONFIG_DMA_SUN6I) += sun6i-dma.o
+obj-$(CONFIG_DMA_SUN4I) += sun4i-dma.o
 obj-$(CONFIG_IMG_MDC_DMA) += img-mdc-dma.o
 obj-$(CONFIG_XGENE_DMA) += xgene-dma.o
diff --git a/drivers/dma/sun4i-dma.c b/drivers/dma/sun4i-dma.c
new file mode 100644
index 000000000000..a1a500d96ff2
--- /dev/null
+++ b/drivers/dma/sun4i-dma.c
@@ -0,0 +1,1288 @@
+/*
+ * Copyright (C) 2014 Emilio López
+ * Emilio López <emilio@elopez.com.ar>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/bitmap.h>
+#include <linux/bitops.h>
+#include <linux/clk.h>
+#include <linux/dmaengine.h>
+#include <linux/dmapool.h>
+#include <linux/interrupt.h>
+#include <linux/module.h>
+#include <linux/of_dma.h>
+#include <linux/platform_device.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+
+#include "virt-dma.h"
+
+/** Common macros to normal and dedicated DMA registers **/
+
+#define SUN4I_DMA_CFG_LOADING			BIT(31)
+#define SUN4I_DMA_CFG_DST_DATA_WIDTH(width)	((width) << 25)
+#define SUN4I_DMA_CFG_DST_BURST_LENGTH(len)	((len) << 23)
+#define SUN4I_DMA_CFG_DST_ADDR_MODE(mode)	((mode) << 21)
+#define SUN4I_DMA_CFG_DST_DRQ_TYPE(type)	((type) << 16)
+#define SUN4I_DMA_CFG_SRC_DATA_WIDTH(width)	((width) << 9)
+#define SUN4I_DMA_CFG_SRC_BURST_LENGTH(len)	((len) << 7)
+#define SUN4I_DMA_CFG_SRC_ADDR_MODE(mode)	((mode) << 5)
+#define SUN4I_DMA_CFG_SRC_DRQ_TYPE(type)	(type)
+
+/** Normal DMA register values **/
+
+/* Normal DMA source/destination data request type values */
+#define SUN4I_NDMA_DRQ_TYPE_SDRAM		0x16
+#define SUN4I_NDMA_DRQ_TYPE_LIMIT		(0x1F + 1)
+
+/** Normal DMA register layout **/
+
+/* Dedicated DMA source/destination address mode values */
+#define SUN4I_NDMA_ADDR_MODE_LINEAR		0
+#define SUN4I_NDMA_ADDR_MODE_IO			1
+
+/* Normal DMA configuration register layout */
+#define SUN4I_NDMA_CFG_CONT_MODE		BIT(30)
+#define SUN4I_NDMA_CFG_WAIT_STATE(n)		((n) << 27)
+#define SUN4I_NDMA_CFG_DST_NON_SECURE		BIT(22)
+#define SUN4I_NDMA_CFG_BYTE_COUNT_MODE_REMAIN	BIT(15)
+#define SUN4I_NDMA_CFG_SRC_NON_SECURE		BIT(6)
+
+/** Dedicated DMA register values **/
+
+/* Dedicated DMA source/destination address mode values */
+#define SUN4I_DDMA_ADDR_MODE_LINEAR		0
+#define SUN4I_DDMA_ADDR_MODE_IO			1
+#define SUN4I_DDMA_ADDR_MODE_HORIZONTAL_PAGE	2
+#define SUN4I_DDMA_ADDR_MODE_VERTICAL_PAGE	3
+
+/* Dedicated DMA source/destination data request type values */
+#define SUN4I_DDMA_DRQ_TYPE_SDRAM		0x1
+#define SUN4I_DDMA_DRQ_TYPE_LIMIT		(0x1F + 1)
+
+/** Dedicated DMA register layout **/
+
+/* Dedicated DMA configuration register layout */
+#define SUN4I_DDMA_CFG_BUSY			BIT(30)
+#define SUN4I_DDMA_CFG_CONT_MODE		BIT(29)
+#define SUN4I_DDMA_CFG_DST_NON_SECURE		BIT(28)
+#define SUN4I_DDMA_CFG_BYTE_COUNT_MODE_REMAIN	BIT(15)
+#define SUN4I_DDMA_CFG_SRC_NON_SECURE		BIT(12)
+
+/* Dedicated DMA parameter register layout */
+#define SUN4I_DDMA_PARA_DST_DATA_BLK_SIZE(n)	(((n) - 1) << 24)
+#define SUN4I_DDMA_PARA_DST_WAIT_CYCLES(n)	(((n) - 1) << 16)
+#define SUN4I_DDMA_PARA_SRC_DATA_BLK_SIZE(n)	(((n) - 1) << 8)
+#define SUN4I_DDMA_PARA_SRC_WAIT_CYCLES(n)	(((n) - 1) << 0)
+
+/** DMA register offsets **/
+
+/* General register offsets */
+#define SUN4I_DMA_IRQ_ENABLE_REG		0x0
+#define SUN4I_DMA_IRQ_PENDING_STATUS_REG	0x4
+
+/* Normal DMA register offsets */
+#define SUN4I_NDMA_CHANNEL_REG_BASE(n)		(0x100 + (n) * 0x20)
+#define SUN4I_NDMA_CFG_REG			0x0
+#define SUN4I_NDMA_SRC_ADDR_REG			0x4
+#define SUN4I_NDMA_DST_ADDR_REG		0x8
+#define SUN4I_NDMA_BYTE_COUNT_REG		0xC
+
+/* Dedicated DMA register offsets */
+#define SUN4I_DDMA_CHANNEL_REG_BASE(n)		(0x300 + (n) * 0x20)
+#define SUN4I_DDMA_CFG_REG			0x0
+#define SUN4I_DDMA_SRC_ADDR_REG			0x4
+#define SUN4I_DDMA_DST_ADDR_REG		0x8
+#define SUN4I_DDMA_BYTE_COUNT_REG		0xC
+#define SUN4I_DDMA_PARA_REG			0x18
+
+/** DMA Driver **/
+
+/*
+ * Normal DMA has 8 channels, and Dedicated DMA has another 8, so
+ * that's 16 channels. As for endpoints, there's 29 and 21
+ * respectively. Given that the Normal DMA endpoints (other than
+ * SDRAM) can be used as tx/rx, we need 78 vchans in total
+ */
+#define SUN4I_NDMA_NR_MAX_CHANNELS	8
+#define SUN4I_DDMA_NR_MAX_CHANNELS	8
+#define SUN4I_DMA_NR_MAX_CHANNELS					\
+	(SUN4I_NDMA_NR_MAX_CHANNELS + SUN4I_DDMA_NR_MAX_CHANNELS)
+#define SUN4I_NDMA_NR_MAX_VCHANS	(29 * 2 - 1)
+#define SUN4I_DDMA_NR_MAX_VCHANS	21
+#define SUN4I_DMA_NR_MAX_VCHANS						\
+	(SUN4I_NDMA_NR_MAX_VCHANS + SUN4I_DDMA_NR_MAX_VCHANS)
+
+/* This set of SUN4I_DDMA timing parameters were found experimentally while
+ * working with the SPI driver and seem to make it behave correctly */
+#define SUN4I_DDMA_MAGIC_SPI_PARAMETERS \
+	(SUN4I_DDMA_PARA_DST_DATA_BLK_SIZE(1) |			\
+	 SUN4I_DDMA_PARA_SRC_DATA_BLK_SIZE(1) |				\
+	 SUN4I_DDMA_PARA_DST_WAIT_CYCLES(2) |				\
+	 SUN4I_DDMA_PARA_SRC_WAIT_CYCLES(2))
+
+struct sun4i_dma_pchan {
+	/* Register base of channel */
+	void __iomem			*base;
+	/* vchan currently being serviced */
+	struct sun4i_dma_vchan		*vchan;
+	/* Is this a dedicated pchan? */
+	int				is_dedicated;
+};
+
+struct sun4i_dma_vchan {
+	struct virt_dma_chan		vc;
+	struct dma_slave_config		cfg;
+	struct sun4i_dma_pchan		*pchan;
+	struct sun4i_dma_promise	*processing;
+	struct sun4i_dma_contract	*contract;
+	u8				endpoint;
+	int				is_dedicated;
+};
+
+struct sun4i_dma_promise {
+	u32				cfg;
+	u32				para;
+	dma_addr_t			src;
+	dma_addr_t			dst;
+	size_t				len;
+	struct list_head		list;
+};
+
+/* A contract is a set of promises */
+struct sun4i_dma_contract {
+	struct virt_dma_desc		vd;
+	struct list_head		demands;
+	struct list_head		completed_demands;
+	int				is_cyclic;
+};
+
+struct sun4i_dma_dev {
+	DECLARE_BITMAP(pchans_used, SUN4I_DMA_NR_MAX_CHANNELS);
+	struct dma_device		slave;
+	struct sun4i_dma_pchan		*pchans;
+	struct sun4i_dma_vchan		*vchans;
+	void __iomem			*base;
+	struct clk			*clk;
+	int				irq;
+	spinlock_t			lock;
+};
+
+static struct sun4i_dma_dev *to_sun4i_dma_dev(struct dma_device *dev)
+{
+	return container_of(dev, struct sun4i_dma_dev, slave);
+}
+
+static struct sun4i_dma_vchan *to_sun4i_dma_vchan(struct dma_chan *chan)
+{
+	return container_of(chan, struct sun4i_dma_vchan, vc.chan);
+}
+
+static struct sun4i_dma_contract *to_sun4i_dma_contract(struct virt_dma_desc *vd)
+{
+	return container_of(vd, struct sun4i_dma_contract, vd);
+}
+
+static struct device *chan2dev(struct dma_chan *chan)
+{
+	return &chan->dev->device;
+}
+
+static int convert_burst(u32 maxburst)
+{
+	if (maxburst > 8)
+		return -EINVAL;
+
+	/* 1 -> 0, 4 -> 1, 8 -> 2 */
+	return (maxburst >> 2);
+}
+
+static int convert_buswidth(enum dma_slave_buswidth addr_width)
+{
+	if (addr_width > DMA_SLAVE_BUSWIDTH_4_BYTES)
+		return -EINVAL;
+
+	/* 8 (1 byte) -> 0, 16 (2 bytes) -> 1, 32 (4 bytes) -> 2 */
+	return (addr_width >> 1);
+}
+
+static void sun4i_dma_free_chan_resources(struct dma_chan *chan)
+{
+	struct sun4i_dma_vchan *vchan = to_sun4i_dma_vchan(chan);
+
+	vchan_free_chan_resources(&vchan->vc);
+}
+
+static struct sun4i_dma_pchan *find_and_use_pchan(struct sun4i_dma_dev *priv,
+						  struct sun4i_dma_vchan *vchan)
+{
+	struct sun4i_dma_pchan *pchan = NULL, *pchans = priv->pchans;
+	unsigned long flags;
+	int i, max;
+
+	/*
+	 * pchans 0-SUN4I_NDMA_NR_MAX_CHANNELS are normal, and
+	 * SUN4I_NDMA_NR_MAX_CHANNELS+ are dedicated ones
+	 */
+	if (vchan->is_dedicated) {
+		i = SUN4I_NDMA_NR_MAX_CHANNELS;
+		max = SUN4I_DMA_NR_MAX_CHANNELS;
+	} else {
+		i = 0;
+		max = SUN4I_NDMA_NR_MAX_CHANNELS;
+	}
+
+	spin_lock_irqsave(&priv->lock, flags);
+	for_each_clear_bit_from(i, &priv->pchans_used, max) {
+		pchan = &pchans[i];
+		pchan->vchan = vchan;
+		set_bit(i, priv->pchans_used);
+		break;
+	}
+	spin_unlock_irqrestore(&priv->lock, flags);
+
+	return pchan;
+}
+
+static void release_pchan(struct sun4i_dma_dev *priv,
+			  struct sun4i_dma_pchan *pchan)
+{
+	unsigned long flags;
+	int nr = pchan - priv->pchans;
+
+	spin_lock_irqsave(&priv->lock, flags);
+
+	pchan->vchan = NULL;
+	clear_bit(nr, priv->pchans_used);
+
+	spin_unlock_irqrestore(&priv->lock, flags);
+}
+
+static void configure_pchan(struct sun4i_dma_pchan *pchan,
+			    struct sun4i_dma_promise *d)
+{
+	/*
+	 * Configure addresses and misc parameters depending on type
+	 * SUN4I_DDMA has an extra field with timing parameters
+	 */
+	if (pchan->is_dedicated) {
+		writel_relaxed(d->src, pchan->base + SUN4I_DDMA_SRC_ADDR_REG);
+		writel_relaxed(d->dst, pchan->base + SUN4I_DDMA_DST_ADDR_REG);
+		writel_relaxed(d->len, pchan->base + SUN4I_DDMA_BYTE_COUNT_REG);
+		writel_relaxed(d->para, pchan->base + SUN4I_DDMA_PARA_REG);
+		writel_relaxed(d->cfg, pchan->base + SUN4I_DDMA_CFG_REG);
+	} else {
+		writel_relaxed(d->src, pchan->base + SUN4I_NDMA_SRC_ADDR_REG);
+		writel_relaxed(d->dst, pchan->base + SUN4I_NDMA_DST_ADDR_REG);
+		writel_relaxed(d->len, pchan->base + SUN4I_NDMA_BYTE_COUNT_REG);
+		writel_relaxed(d->cfg, pchan->base + SUN4I_NDMA_CFG_REG);
+	}
+}
+
+static void set_pchan_interrupt(struct sun4i_dma_dev *priv,
+				struct sun4i_dma_pchan *pchan,
+				int half, int end)
+{
+	u32 reg;
+	int pchan_number = pchan - priv->pchans;
+	unsigned long flags;
+
+	spin_lock_irqsave(&priv->lock, flags);
+
+	reg = readl_relaxed(priv->base + SUN4I_DMA_IRQ_ENABLE_REG);
+
+	if (half)
+		reg |= BIT(pchan_number * 2);
+	else
+		reg &= ~BIT(pchan_number * 2);
+
+	if (end)
+		reg |= BIT(pchan_number * 2 + 1);
+	else
+		reg &= ~BIT(pchan_number * 2 + 1);
+
+	writel_relaxed(reg, priv->base + SUN4I_DMA_IRQ_ENABLE_REG);
+
+	spin_unlock_irqrestore(&priv->lock, flags);
+}
+
+/**
+ * Execute pending operations on a vchan
+ *
+ * When given a vchan, this function will try to acquire a suitable
+ * pchan and, if successful, will configure it to fulfill a promise
+ * from the next pending contract.
+ *
+ * This function must be called with &vchan->vc.lock held.
+ */
+static int __execute_vchan_pending(struct sun4i_dma_dev *priv,
+				   struct sun4i_dma_vchan *vchan)
+{
+	struct sun4i_dma_promise *promise = NULL;
+	struct sun4i_dma_contract *contract = NULL;
+	struct sun4i_dma_pchan *pchan;
+	struct virt_dma_desc *vd;
+	int ret;
+
+	lockdep_assert_held(&vchan->vc.lock);
+
+	/* We need a pchan to do anything, so secure one if available */
+	pchan = find_and_use_pchan(priv, vchan);
+	if (!pchan)
+		return -EBUSY;
+
+	/*
+	 * Channel endpoints must not be repeated, so if this vchan
+	 * has already submitted some work, we can't do anything else
+	 */
+	if (vchan->processing) {
+		dev_dbg(chan2dev(&vchan->vc.chan),
+			"processing something to this endpoint already\n");
+		ret = -EBUSY;
+		goto release_pchan;
+	}
+
+	do {
+		/* Figure out which contract we're working with today */
+		vd = vchan_next_desc(&vchan->vc);
+		if (!vd) {
+			dev_dbg(chan2dev(&vchan->vc.chan),
+				"No pending contract found");
+			ret = 0;
+			goto release_pchan;
+		}
+
+		contract = to_sun4i_dma_contract(vd);
+		if (list_empty(&contract->demands)) {
+			/* The contract has been completed so mark it as such */
+			list_del(&contract->vd.node);
+			vchan_cookie_complete(&contract->vd);
+			dev_dbg(chan2dev(&vchan->vc.chan),
+				"Empty contract found and marked complete");
+		}
+	} while (list_empty(&contract->demands));
+
+	/* Now find out what we need to do */
+	promise = list_first_entry(&contract->demands,
+				   struct sun4i_dma_promise, list);
+	vchan->processing = promise;
+
+	/* ... and make it reality */
+	if (promise) {
+		vchan->contract = contract;
+		vchan->pchan = pchan;
+		set_pchan_interrupt(priv, pchan, contract->is_cyclic, 1);
+		configure_pchan(pchan, promise);
+	}
+
+	return 0;
+
+release_pchan:
+	release_pchan(priv, pchan);
+	return ret;
+}
+
+static int sanitize_config(struct dma_slave_config *sconfig,
+			   enum dma_transfer_direction direction)
+{
+	switch (direction) {
+	case DMA_MEM_TO_DEV:
+		if ((sconfig->dst_addr_width == DMA_SLAVE_BUSWIDTH_UNDEFINED) ||
+		    !sconfig->dst_maxburst)
+			return -EINVAL;
+
+		if (sconfig->src_addr_width == DMA_SLAVE_BUSWIDTH_UNDEFINED)
+			sconfig->src_addr_width = sconfig->dst_addr_width;
+
+		if (!sconfig->src_maxburst)
+			sconfig->src_maxburst = sconfig->dst_maxburst;
+
+		break;
+
+	case DMA_DEV_TO_MEM:
+		if ((sconfig->src_addr_width == DMA_SLAVE_BUSWIDTH_UNDEFINED) ||
+		    !sconfig->src_maxburst)
+			return -EINVAL;
+
+		if (sconfig->dst_addr_width == DMA_SLAVE_BUSWIDTH_UNDEFINED)
+			sconfig->dst_addr_width = sconfig->src_addr_width;
+
+		if (!sconfig->dst_maxburst)
+			sconfig->dst_maxburst = sconfig->src_maxburst;
+
+		break;
+	default:
+		return 0;
+	}
+
+	return 0;
+}
+
+/**
+ * Generate a promise, to be used in a normal DMA contract.
+ *
+ * A NDMA promise contains all the information required to program the
+ * normal part of the DMA Engine and get data copied. A non-executed
+ * promise will live in the demands list on a contract. Once it has been
+ * completed, it will be moved to the completed demands list for later freeing.
+ * All linked promises will be freed when the corresponding contract is freed
+ */
+static struct sun4i_dma_promise *
+generate_ndma_promise(struct dma_chan *chan, dma_addr_t src, dma_addr_t dest,
+		      size_t len, struct dma_slave_config *sconfig,
+		      enum dma_transfer_direction direction)
+{
+	struct sun4i_dma_promise *promise;
+	int ret;
+
+	ret = sanitize_config(sconfig, direction);
+	if (ret)
+		return NULL;
+
+	promise = kzalloc(sizeof(*promise), GFP_NOWAIT);
+	if (!promise)
+		return NULL;
+
+	promise->src = src;
+	promise->dst = dest;
+	promise->len = len;
+	promise->cfg = SUN4I_DMA_CFG_LOADING |
+		SUN4I_NDMA_CFG_BYTE_COUNT_MODE_REMAIN;
+
+	dev_dbg(chan2dev(chan),
+		"src burst %d, dst burst %d, src buswidth %d, dst buswidth %d",
+		sconfig->src_maxburst, sconfig->dst_maxburst,
+		sconfig->src_addr_width, sconfig->dst_addr_width);
+
+	/* Source burst */
+	ret = convert_burst(sconfig->src_maxburst);
+	if (IS_ERR_VALUE(ret))
+		goto fail;
+	promise->cfg |= SUN4I_DMA_CFG_SRC_BURST_LENGTH(ret);
+
+	/* Destination burst */
+	ret = convert_burst(sconfig->dst_maxburst);
+	if (IS_ERR_VALUE(ret))
+		goto fail;
+	promise->cfg |= SUN4I_DMA_CFG_DST_BURST_LENGTH(ret);
+
+	/* Source bus width */
+	ret = convert_buswidth(sconfig->src_addr_width);
+	if (IS_ERR_VALUE(ret))
+		goto fail;
+	promise->cfg |= SUN4I_DMA_CFG_SRC_DATA_WIDTH(ret);
+
+	/* Destination bus width */
+	ret = convert_buswidth(sconfig->dst_addr_width);
+	if (IS_ERR_VALUE(ret))
+		goto fail;
+	promise->cfg |= SUN4I_DMA_CFG_DST_DATA_WIDTH(ret);
+
+	return promise;
+
+fail:
+	kfree(promise);
+	return NULL;
+}
+
+/**
+ * Generate a promise, to be used in a dedicated DMA contract.
+ *
+ * A DDMA promise contains all the information required to program the
+ * Dedicated part of the DMA Engine and get data copied. A non-executed
+ * promise will live in the demands list on a contract. Once it has been
+ * completed, it will be moved to the completed demands list for later freeing.
+ * All linked promises will be freed when the corresponding contract is freed
+ */
+static struct sun4i_dma_promise *
+generate_ddma_promise(struct dma_chan *chan, dma_addr_t src, dma_addr_t dest,
+		      size_t len, struct dma_slave_config *sconfig)
+{
+	struct sun4i_dma_promise *promise;
+	int ret;
+
+	promise = kzalloc(sizeof(*promise), GFP_NOWAIT);
+	if (!promise)
+		return NULL;
+
+	promise->src = src;
+	promise->dst = dest;
+	promise->len = len;
+	promise->cfg = SUN4I_DMA_CFG_LOADING |
+		SUN4I_DDMA_CFG_BYTE_COUNT_MODE_REMAIN;
+
+	/* Source burst */
+	ret = convert_burst(sconfig->src_maxburst);
+	if (IS_ERR_VALUE(ret))
+		goto fail;
+	promise->cfg |= SUN4I_DMA_CFG_SRC_BURST_LENGTH(ret);
+
+	/* Destination burst */
+	ret = convert_burst(sconfig->dst_maxburst);
+	if (IS_ERR_VALUE(ret))
+		goto fail;
+	promise->cfg |= SUN4I_DMA_CFG_DST_BURST_LENGTH(ret);
+
+	/* Source bus width */
+	ret = convert_buswidth(sconfig->src_addr_width);
+	if (IS_ERR_VALUE(ret))
+		goto fail;
+	promise->cfg |= SUN4I_DMA_CFG_SRC_DATA_WIDTH(ret);
+
+	/* Destination bus width */
+	ret = convert_buswidth(sconfig->dst_addr_width);
+	if (IS_ERR_VALUE(ret))
+		goto fail;
+	promise->cfg |= SUN4I_DMA_CFG_DST_DATA_WIDTH(ret);
+
+	return promise;
+
+fail:
+	kfree(promise);
+	return NULL;
+}
+
+/**
+ * Generate a contract
+ *
+ * Contracts function as DMA descriptors. As our hardware does not support
+ * linked lists, we need to implement SG via software. We use a contract
+ * to hold all the pieces of the request and process them serially one
+ * after another. Each piece is represented as a promise.
+ */
+static struct sun4i_dma_contract *generate_dma_contract(void)
+{
+	struct sun4i_dma_contract *contract;
+
+	contract = kzalloc(sizeof(*contract), GFP_NOWAIT);
+	if (!contract)
+		return NULL;
+
+	INIT_LIST_HEAD(&contract->demands);
+	INIT_LIST_HEAD(&contract->completed_demands);
+
+	return contract;
+}
+
+/**
+ * Get next promise on a cyclic transfer
+ *
+ * Cyclic contracts contain a series of promises which are executed on a
+ * loop. This function returns the next promise from a cyclic contract,
+ * so it can be programmed into the hardware.
+ */
+static struct sun4i_dma_promise *
+get_next_cyclic_promise(struct sun4i_dma_contract *contract)
+{
+	struct sun4i_dma_promise *promise;
+
+	promise = list_first_entry_or_null(&contract->demands,
+					   struct sun4i_dma_promise, list);
+	if (!promise) {
+		list_splice_init(&contract->completed_demands,
+				 &contract->demands);
+		promise = list_first_entry(&contract->demands,
+					   struct sun4i_dma_promise, list);
+	}
+
+	return promise;
+}
+
+/**
+ * Free a contract and all its associated promises
+ */
+static void sun4i_dma_free_contract(struct virt_dma_desc *vd)
+{
+	struct sun4i_dma_contract *contract = to_sun4i_dma_contract(vd);
+	struct sun4i_dma_promise *promise;
+
+	/* Free all the demands and completed demands */
+	list_for_each_entry(promise, &contract->demands, list)
+		kfree(promise);
+
+	list_for_each_entry(promise, &contract->completed_demands, list)
+		kfree(promise);
+
+	kfree(contract);
+}
+
+static struct dma_async_tx_descriptor *
+sun4i_dma_prep_dma_memcpy(struct dma_chan *chan, dma_addr_t dest,
+			  dma_addr_t src, size_t len, unsigned long flags)
+{
+	struct sun4i_dma_vchan *vchan = to_sun4i_dma_vchan(chan);
+	struct dma_slave_config *sconfig = &vchan->cfg;
+	struct sun4i_dma_promise *promise;
+	struct sun4i_dma_contract *contract;
+
+	contract = generate_dma_contract();
+	if (!contract)
+		return NULL;
+
+	/*
+	 * We can only do the copy to bus aligned addresses, so
+	 * choose the best one so we get decent performance. We also
+	 * maximize the burst size for this same reason.
+	 */
+	sconfig->src_addr_width = DMA_SLAVE_BUSWIDTH_4_BYTES;
+	sconfig->dst_addr_width = DMA_SLAVE_BUSWIDTH_4_BYTES;
+	sconfig->src_maxburst = 8;
+	sconfig->dst_maxburst = 8;
+
+	if (vchan->is_dedicated)
+		promise = generate_ddma_promise(chan, src, dest, len, sconfig);
+	else
+		promise = generate_ndma_promise(chan, src, dest, len, sconfig,
+						DMA_MEM_TO_MEM);
+
+	if (!promise) {
+		kfree(contract);
+		return NULL;
+	}
+
+	/* Configure memcpy mode */
+	if (vchan->is_dedicated) {
+		promise->cfg |= SUN4I_DMA_CFG_SRC_DRQ_TYPE(SUN4I_DDMA_DRQ_TYPE_SDRAM) |
+				SUN4I_DMA_CFG_DST_DRQ_TYPE(SUN4I_DDMA_DRQ_TYPE_SDRAM);
+	} else {
+		promise->cfg |= SUN4I_DMA_CFG_SRC_DRQ_TYPE(SUN4I_NDMA_DRQ_TYPE_SDRAM) |
+				SUN4I_DMA_CFG_DST_DRQ_TYPE(SUN4I_NDMA_DRQ_TYPE_SDRAM);
+	}
+
+	/* Fill the contract with our only promise */
+	list_add_tail(&promise->list, &contract->demands);
+
+	/* And add it to the vchan */
+	return vchan_tx_prep(&vchan->vc, &contract->vd, flags);
+}
+
+static struct dma_async_tx_descriptor *
+sun4i_dma_prep_dma_cyclic(struct dma_chan *chan, dma_addr_t buf, size_t len,
+			  size_t period_len, enum dma_transfer_direction dir,
+			  unsigned long flags)
+{
+	struct sun4i_dma_vchan *vchan = to_sun4i_dma_vchan(chan);
+	struct dma_slave_config *sconfig = &vchan->cfg;
+	struct sun4i_dma_promise *promise;
+	struct sun4i_dma_contract *contract;
+	dma_addr_t src, dest;
+	u32 endpoints;
+	int nr_periods, offset, plength, i;
+
+	if (!is_slave_direction(dir)) {
+		dev_err(chan2dev(chan), "Invalid DMA direction\n");
+		return NULL;
+	}
+
+	if (vchan->is_dedicated) {
+		/*
+		 * As we are using this just for audio data, we need to use
+		 * normal DMA. There is nothing stopping us from supporting
+		 * dedicated DMA here as well, so if a client comes up and
+		 * requires it, it will be simple to implement it.
+		 */
+		dev_err(chan2dev(chan),
+			"Cyclic transfers are only supported on Normal DMA\n");
+		return NULL;
+	}
+
+	contract = generate_dma_contract();
+	if (!contract)
+		return NULL;
+
+	contract->is_cyclic = 1;
+
+	/* Figure out the endpoints and the address we need */
+	if (dir == DMA_MEM_TO_DEV) {
+		src = buf;
+		dest = sconfig->dst_addr;
+		endpoints = SUN4I_DMA_CFG_SRC_DRQ_TYPE(SUN4I_NDMA_DRQ_TYPE_SDRAM) |
+			    SUN4I_DMA_CFG_DST_DRQ_TYPE(vchan->endpoint) |
+			    SUN4I_DMA_CFG_DST_ADDR_MODE(SUN4I_NDMA_ADDR_MODE_IO);
+	} else {
+		src = sconfig->src_addr;
+		dest = buf;
+		endpoints = SUN4I_DMA_CFG_SRC_DRQ_TYPE(vchan->endpoint) |
+			    SUN4I_DMA_CFG_SRC_ADDR_MODE(SUN4I_NDMA_ADDR_MODE_IO) |
+			    SUN4I_DMA_CFG_DST_DRQ_TYPE(SUN4I_NDMA_DRQ_TYPE_SDRAM);
+	}
+
+	/*
+	 * We will be using half done interrupts to make two periods
+	 * out of a promise, so we need to program the DMA engine less
+	 * often
+	 */
+
+	/*
+	 * The engine can interrupt on half-transfer, so we can use
+	 * this feature to program the engine half as often as if we
+	 * didn't use it (keep in mind the hardware doesn't support
+	 * linked lists).
+	 *
+	 * Say you have a set of periods (| marks the start/end, I for
+	 * interrupt, P for programming the engine to do a new
+	 * transfer), the easy but slow way would be to do
+	 *
+	 *  |---|---|---|---| (periods / promises)
+	 *  P  I,P I,P I,P  I
+	 *
+	 * Using half transfer interrupts you can do
+	 *
+	 *  |-------|-------| (promises as configured on hw)
+	 *  |---|---|---|---| (periods)
+	 *  P   I  I,P  I   I
+	 *
+	 * Which requires half the engine programming for the same
+	 * functionality.
+	 */
+	nr_periods = DIV_ROUND_UP(len / period_len, 2);
+	for (i = 0; i < nr_periods; i++) {
+		/* Calculate the offset in the buffer and the length needed */
+		offset = i * period_len * 2;
+		plength = min((len - offset), (period_len * 2));
+		if (dir == DMA_MEM_TO_DEV)
+			src = buf + offset;
+		else
+			dest = buf + offset;
+
+		/* Make the promise */
+		promise = generate_ndma_promise(chan, src, dest,
+						plength, sconfig, dir);
+		if (!promise) {
+			/* TODO: should we free everything? */
+			return NULL;
+		}
+		promise->cfg |= endpoints;
+
+		/* Then add it to the contract */
+		list_add_tail(&promise->list, &contract->demands);
+	}
+
+	/* And add it to the vchan */
+	return vchan_tx_prep(&vchan->vc, &contract->vd, flags);
+}
+
+static struct dma_async_tx_descriptor *
+sun4i_dma_prep_slave_sg(struct dma_chan *chan, struct scatterlist *sgl,
+			unsigned int sg_len, enum dma_transfer_direction dir,
+			unsigned long flags, void *context)
+{
+	struct sun4i_dma_vchan *vchan = to_sun4i_dma_vchan(chan);
+	struct dma_slave_config *sconfig = &vchan->cfg;
+	struct sun4i_dma_promise *promise;
+	struct sun4i_dma_contract *contract;
+	u8 ram_type, io_mode, linear_mode;
+	struct scatterlist *sg;
+	dma_addr_t srcaddr, dstaddr;
+	u32 endpoints, para;
+	int i;
+
+	if (!sgl)
+		return NULL;
+
+	if (!is_slave_direction(dir)) {
+		dev_err(chan2dev(chan), "Invalid DMA direction\n");
+		return NULL;
+	}
+
+	contract = generate_dma_contract();
+	if (!contract)
+		return NULL;
+
+	if (vchan->is_dedicated) {
+		io_mode = SUN4I_DDMA_ADDR_MODE_IO;
+		linear_mode = SUN4I_DDMA_ADDR_MODE_LINEAR;
+		ram_type = SUN4I_DDMA_DRQ_TYPE_SDRAM;
+	} else {
+		io_mode = SUN4I_NDMA_ADDR_MODE_IO;
+		linear_mode = SUN4I_NDMA_ADDR_MODE_LINEAR;
+		ram_type = SUN4I_NDMA_DRQ_TYPE_SDRAM;
+	}
+
+	if (dir == DMA_MEM_TO_DEV)
+		endpoints = SUN4I_DMA_CFG_DST_DRQ_TYPE(vchan->endpoint) |
+			    SUN4I_DMA_CFG_DST_ADDR_MODE(io_mode) |
+			    SUN4I_DMA_CFG_SRC_DRQ_TYPE(ram_type) |
+			    SUN4I_DMA_CFG_SRC_ADDR_MODE(linear_mode);
+	else
+		endpoints = SUN4I_DMA_CFG_DST_DRQ_TYPE(ram_type) |
+			    SUN4I_DMA_CFG_DST_ADDR_MODE(linear_mode) |
+			    SUN4I_DMA_CFG_SRC_DRQ_TYPE(vchan->endpoint) |
+			    SUN4I_DMA_CFG_SRC_ADDR_MODE(io_mode);
+
+	for_each_sg(sgl, sg, sg_len, i) {
+		/* Figure out addresses */
+		if (dir == DMA_MEM_TO_DEV) {
+			srcaddr = sg_dma_address(sg);
+			dstaddr = sconfig->dst_addr;
+		} else {
+			srcaddr = sconfig->src_addr;
+			dstaddr = sg_dma_address(sg);
+		}
+
+		/*
+		 * These are the magic DMA engine timings that keep SPI going.
+		 * I haven't seen any interface on DMAEngine to configure
+		 * timings, and so far they seem to work for everything we
+		 * support, so I've kept them here. I don't know if other
+		 * devices need different timings because, as usual, we only
+		 * have the "para" bitfield meanings, but no comment on what
+		 * the values should be when doing a certain operation :|
+		 */
+		para = SUN4I_DDMA_MAGIC_SPI_PARAMETERS;
+
+		/* And make a suitable promise */
+		if (vchan->is_dedicated)
+			promise = generate_ddma_promise(chan, srcaddr, dstaddr,
+							sg_dma_len(sg),
+							sconfig);
+		else
+			promise = generate_ndma_promise(chan, srcaddr, dstaddr,
+							sg_dma_len(sg),
+							sconfig, dir);
+
+		if (!promise)
+			return NULL; /* TODO: should we free everything? */
+
+		promise->cfg |= endpoints;
+		promise->para = para;
+
+		/* Then add it to the contract */
+		list_add_tail(&promise->list, &contract->demands);
+	}
+
+	/*
+	 * Once we've got all the promises ready, add the contract
+	 * to the pending list on the vchan
+	 */
+	return vchan_tx_prep(&vchan->vc, &contract->vd, flags);
+}
+
+static int sun4i_dma_terminate_all(struct dma_chan *chan)
+{
+	struct sun4i_dma_dev *priv = to_sun4i_dma_dev(chan->device);
+	struct sun4i_dma_vchan *vchan = to_sun4i_dma_vchan(chan);
+	struct sun4i_dma_pchan *pchan = vchan->pchan;
+	LIST_HEAD(head);
+	unsigned long flags;
+
+	spin_lock_irqsave(&vchan->vc.lock, flags);
+	vchan_get_all_descriptors(&vchan->vc, &head);
+	spin_unlock_irqrestore(&vchan->vc.lock, flags);
+
+	/*
+	 * Clearing the configuration register will halt the pchan. Interrupts
+	 * may still trigger, so don't forget to disable them.
+	 */
+	if (pchan) {
+		if (pchan->is_dedicated)
+			writel(0, pchan->base + SUN4I_DDMA_CFG_REG);
+		else
+			writel(0, pchan->base + SUN4I_NDMA_CFG_REG);
+		set_pchan_interrupt(priv, pchan, 0, 0);
+		release_pchan(priv, pchan);
+	}
+
+	spin_lock_irqsave(&vchan->vc.lock, flags);
+	vchan_dma_desc_free_list(&vchan->vc, &head);
+	/* Clear these so the vchan is usable again */
+	vchan->processing = NULL;
+	vchan->pchan = NULL;
+	spin_unlock_irqrestore(&vchan->vc.lock, flags);
+
+	return 0;
+}
+
+static int sun4i_dma_config(struct dma_chan *chan,
+			    struct dma_slave_config *config)
+{
+	struct sun4i_dma_vchan *vchan = to_sun4i_dma_vchan(chan);
+
+	memcpy(&vchan->cfg, config, sizeof(*config));
+
+	return 0;
+}
+
+static struct dma_chan *sun4i_dma_of_xlate(struct of_phandle_args *dma_spec,
+					   struct of_dma *ofdma)
+{
+	struct sun4i_dma_dev *priv = ofdma->of_dma_data;
+	struct sun4i_dma_vchan *vchan;
+	struct dma_chan *chan;
+	u8 is_dedicated = dma_spec->args[0];
+	u8 endpoint = dma_spec->args[1];
+
+	/* Check if type is Normal or Dedicated */
+	if (is_dedicated != 0 && is_dedicated != 1)
+		return NULL;
+
+	/* Make sure the endpoint looks sane */
+	if ((is_dedicated && endpoint >= SUN4I_DDMA_DRQ_TYPE_LIMIT) ||
+	    (!is_dedicated && endpoint >= SUN4I_NDMA_DRQ_TYPE_LIMIT))
+		return NULL;
+
+	chan = dma_get_any_slave_channel(&priv->slave);
+	if (!chan)
+		return NULL;
+
+	/* Assign the endpoint to the vchan */
+	vchan = to_sun4i_dma_vchan(chan);
+	vchan->is_dedicated = is_dedicated;
+	vchan->endpoint = endpoint;
+
+	return chan;
+}
+
+static enum dma_status sun4i_dma_tx_status(struct dma_chan *chan,
+					   dma_cookie_t cookie,
+					   struct dma_tx_state *state)
+{
+	struct sun4i_dma_vchan *vchan = to_sun4i_dma_vchan(chan);
+	struct sun4i_dma_pchan *pchan = vchan->pchan;
+	struct sun4i_dma_contract *contract;
+	struct sun4i_dma_promise *promise;
+	struct virt_dma_desc *vd;
+	unsigned long flags;
+	enum dma_status ret;
+	size_t bytes = 0;
+
+	ret = dma_cookie_status(chan, cookie, state);
+	if (!state || (ret == DMA_COMPLETE))
+		return ret;
+
+	spin_lock_irqsave(&vchan->vc.lock, flags);
+	vd = vchan_find_desc(&vchan->vc, cookie);
+	if (!vd)
+		goto exit;
+	contract = to_sun4i_dma_contract(vd);
+
+	list_for_each_entry(promise, &contract->demands, list)
+		bytes += promise->len;
+
+	/*
+	 * The hardware is configured to return the remaining byte
+	 * quantity. If possible, replace the first listed element's
+	 * full size with the actual remaining amount
+	 */
+	promise = list_first_entry_or_null(&contract->demands,
+					   struct sun4i_dma_promise, list);
+	if (promise && pchan) {
+		bytes -= promise->len;
+		if (pchan->is_dedicated)
+			bytes += readl(pchan->base + SUN4I_DDMA_BYTE_COUNT_REG);
+		else
+			bytes += readl(pchan->base + SUN4I_NDMA_BYTE_COUNT_REG);
+	}
+
+exit:
+
+	dma_set_residue(state, bytes);
+	spin_unlock_irqrestore(&vchan->vc.lock, flags);
+
+	return ret;
+}
+
+static void sun4i_dma_issue_pending(struct dma_chan *chan)
+{
+	struct sun4i_dma_dev *priv = to_sun4i_dma_dev(chan->device);
+	struct sun4i_dma_vchan *vchan = to_sun4i_dma_vchan(chan);
+	unsigned long flags;
+
+	spin_lock_irqsave(&vchan->vc.lock, flags);
+
+	/*
+	 * If there are pending transactions for this vchan, push one of
+	 * them into the engine to get the ball rolling.
+	 */
+	if (vchan_issue_pending(&vchan->vc))
+		__execute_vchan_pending(priv, vchan);
+
+	spin_unlock_irqrestore(&vchan->vc.lock, flags);
+}
+
+static irqreturn_t sun4i_dma_interrupt(int irq, void *dev_id)
+{
+	struct sun4i_dma_dev *priv = dev_id;
+	struct sun4i_dma_pchan *pchans = priv->pchans, *pchan;
+	struct sun4i_dma_vchan *vchan;
+	struct sun4i_dma_contract *contract;
+	struct sun4i_dma_promise *promise;
+	unsigned long pendirq, irqs, disableirqs;
+	int bit, i, free_room, allow_mitigation = 1;
+
+	pendirq = readl_relaxed(priv->base + SUN4I_DMA_IRQ_PENDING_STATUS_REG);
+
+handle_pending:
+
+	disableirqs = 0;
+	free_room = 0;
+
+	for_each_set_bit(bit, &pendirq, 32) {
+		pchan = &pchans[bit >> 1];
+		vchan = pchan->vchan;
+		if (!vchan) /* a terminated channel may still interrupt */
+			continue;
+		contract = vchan->contract;
+
+		/*
+		 * Disable the IRQ and free the pchan if it's an end
+		 * interrupt (odd bit)
+		 */
+		if (bit & 1) {
+			spin_lock(&vchan->vc.lock);
+
+			/*
+			 * Move the promise into the completed list now that
+			 * we're done with it
+			 */
+			list_del(&vchan->processing->list);
+			list_add_tail(&vchan->processing->list,
+				      &contract->completed_demands);
+
+			/*
+			 * Cyclic DMA transfers are special:
+			 * - There's always something we can dispatch
+			 * - We need to run the callback
+			 * - Latency is very important, as this is used by audio
+			 * We therefore just cycle through the list and dispatch
+			 * whatever we have here, reusing the pchan. There's
+			 * no need to run the thread after this.
+			 *
+			 * For non-cyclic transfers we need to look around,
+			 * so we can program some more work, or notify the
+			 * client that their transfers have been completed.
+			 */
+			if (contract->is_cyclic) {
+				promise = get_next_cyclic_promise(contract);
+				vchan->processing = promise;
+				configure_pchan(pchan, promise);
+				vchan_cyclic_callback(&contract->vd);
+			} else {
+				vchan->processing = NULL;
+				vchan->pchan = NULL;
+
+				free_room = 1;
+				disableirqs |= BIT(bit);
+				release_pchan(priv, pchan);
+			}
+
+			spin_unlock(&vchan->vc.lock);
+		} else {
+			/* Half done interrupt */
+			if (contract->is_cyclic)
+				vchan_cyclic_callback(&contract->vd);
+			else
+				disableirqs |= BIT(bit);
+		}
+	}
+
+	/* Disable the IRQs for events we handled */
+	spin_lock(&priv->lock);
+	irqs = readl_relaxed(priv->base + SUN4I_DMA_IRQ_ENABLE_REG);
+	writel_relaxed(irqs & ~disableirqs,
+		       priv->base + SUN4I_DMA_IRQ_ENABLE_REG);
+	spin_unlock(&priv->lock);
+
+	/* Writing 1 to the pending field will clear the pending interrupt */
+	writel_relaxed(pendirq, priv->base + SUN4I_DMA_IRQ_PENDING_STATUS_REG);
+
+	/*
+	 * If a pchan was freed, we may be able to schedule something else,
+	 * so have a look around
+	 */
+	if (free_room) {
+		for (i = 0; i < SUN4I_DMA_NR_MAX_VCHANS; i++) {
+			vchan = &priv->vchans[i];
+			spin_lock(&vchan->vc.lock);
+			__execute_vchan_pending(priv, vchan);
+			spin_unlock(&vchan->vc.lock);
+		}
+	}
+
+	/*
+	 * Handle newer interrupts if some showed up, but only do it once
+	 * to avoid a too long a loop
+	 */
+	if (allow_mitigation) {
+		pendirq = readl_relaxed(priv->base +
+					SUN4I_DMA_IRQ_PENDING_STATUS_REG);
+		if (pendirq) {
+			allow_mitigation = 0;
+			goto handle_pending;
+		}
+	}
+
+	return IRQ_HANDLED;
+}
+
+static int sun4i_dma_probe(struct platform_device *pdev)
+{
+	struct sun4i_dma_dev *priv;
+	struct resource *res;
+	int i, j, ret;
+
+	priv = devm_kzalloc(&pdev->dev, sizeof(*priv), GFP_KERNEL);
+	if (!priv)
+		return -ENOMEM;
+
+	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	priv->base = devm_ioremap_resource(&pdev->dev, res);
+	if (IS_ERR(priv->base))
+		return PTR_ERR(priv->base);
+
+	priv->irq = platform_get_irq(pdev, 0);
+	if (priv->irq < 0) {
+		dev_err(&pdev->dev, "Cannot claim IRQ\n");
+		return priv->irq;
+	}
+
+	priv->clk = devm_clk_get(&pdev->dev, NULL);
+	if (IS_ERR(priv->clk)) {
+		dev_err(&pdev->dev, "No clock specified\n");
+		return PTR_ERR(priv->clk);
+	}
+
+	platform_set_drvdata(pdev, priv);
+	spin_lock_init(&priv->lock);
+
+	dma_cap_zero(priv->slave.cap_mask);
+	dma_cap_set(DMA_PRIVATE, priv->slave.cap_mask);
+	dma_cap_set(DMA_MEMCPY, priv->slave.cap_mask);
+	dma_cap_set(DMA_CYCLIC, priv->slave.cap_mask);
+	dma_cap_set(DMA_SLAVE, priv->slave.cap_mask);
+
+	INIT_LIST_HEAD(&priv->slave.channels);
+	priv->slave.device_free_chan_resources	= sun4i_dma_free_chan_resources;
+	priv->slave.device_tx_status		= sun4i_dma_tx_status;
+	priv->slave.device_issue_pending	= sun4i_dma_issue_pending;
+	priv->slave.device_prep_slave_sg	= sun4i_dma_prep_slave_sg;
+	priv->slave.device_prep_dma_memcpy	= sun4i_dma_prep_dma_memcpy;
+	priv->slave.device_prep_dma_cyclic	= sun4i_dma_prep_dma_cyclic;
+	priv->slave.device_config		= sun4i_dma_config;
+	priv->slave.device_terminate_all	= sun4i_dma_terminate_all;
+	priv->slave.copy_align			= 2;
+	priv->slave.src_addr_widths		= BIT(DMA_SLAVE_BUSWIDTH_1_BYTE) |
+						  BIT(DMA_SLAVE_BUSWIDTH_2_BYTES) |
+						  BIT(DMA_SLAVE_BUSWIDTH_4_BYTES);
+	priv->slave.dst_addr_widths		= BIT(DMA_SLAVE_BUSWIDTH_1_BYTE) |
+						  BIT(DMA_SLAVE_BUSWIDTH_2_BYTES) |
+						  BIT(DMA_SLAVE_BUSWIDTH_4_BYTES);
+	priv->slave.directions			= BIT(DMA_DEV_TO_MEM) |
+						  BIT(DMA_MEM_TO_DEV);
+	priv->slave.residue_granularity		= DMA_RESIDUE_GRANULARITY_BURST;
+
+	priv->slave.dev = &pdev->dev;
+
+	priv->pchans = devm_kcalloc(&pdev->dev, SUN4I_DMA_NR_MAX_CHANNELS,
+				    sizeof(struct sun4i_dma_pchan), GFP_KERNEL);
+	priv->vchans = devm_kcalloc(&pdev->dev, SUN4I_DMA_NR_MAX_VCHANS,
+				    sizeof(struct sun4i_dma_vchan), GFP_KERNEL);
+	if (!priv->vchans || !priv->pchans)
+		return -ENOMEM;
+
+	/*
+	 * [0..SUN4I_NDMA_NR_MAX_CHANNELS) are normal pchans, and
+	 * [SUN4I_NDMA_NR_MAX_CHANNELS..SUN4I_DMA_NR_MAX_CHANNELS) are
+	 * dedicated ones
+	 */
+	for (i = 0; i < SUN4I_NDMA_NR_MAX_CHANNELS; i++)
+		priv->pchans[i].base = priv->base +
+			SUN4I_NDMA_CHANNEL_REG_BASE(i);
+
+	for (j = 0; i < SUN4I_DMA_NR_MAX_CHANNELS; i++, j++) {
+		priv->pchans[i].base = priv->base +
+			SUN4I_DDMA_CHANNEL_REG_BASE(j);
+		priv->pchans[i].is_dedicated = 1;
+	}
+
+	for (i = 0; i < SUN4I_DMA_NR_MAX_VCHANS; i++) {
+		struct sun4i_dma_vchan *vchan = &priv->vchans[i];
+
+		spin_lock_init(&vchan->vc.lock);
+		vchan->vc.desc_free = sun4i_dma_free_contract;
+		vchan_init(&vchan->vc, &priv->slave);
+	}
+
+	ret = clk_prepare_enable(priv->clk);
+	if (ret) {
+		dev_err(&pdev->dev, "Couldn't enable the clock\n");
+		return ret;
+	}
+
+	/*
+	 * Make sure the IRQs are all disabled and accounted for. The bootloader
+	 * likes to leave these dirty
+	 */
+	writel(0, priv->base + SUN4I_DMA_IRQ_ENABLE_REG);
+	writel(0xFFFFFFFF, priv->base + SUN4I_DMA_IRQ_PENDING_STATUS_REG);
+
+	ret = devm_request_irq(&pdev->dev, priv->irq, sun4i_dma_interrupt,
+			       0, dev_name(&pdev->dev), priv);
+	if (ret) {
+		dev_err(&pdev->dev, "Cannot request IRQ\n");
+		goto err_clk_disable;
+	}
+
+	ret = dma_async_device_register(&priv->slave);
+	if (ret) {
+		dev_warn(&pdev->dev, "Failed to register DMA engine device\n");
+		goto err_clk_disable;
+	}
+
+	ret = of_dma_controller_register(pdev->dev.of_node, sun4i_dma_of_xlate,
+					 priv);
+	if (ret) {
+		dev_err(&pdev->dev, "of_dma_controller_register failed\n");
+		goto err_dma_unregister;
+	}
+
+	dev_dbg(&pdev->dev, "Successfully probed SUN4I_DMA\n");
+
+	return 0;
+
+err_dma_unregister:
+	dma_async_device_unregister(&priv->slave);
+err_clk_disable:
+	clk_disable_unprepare(priv->clk);
+	return ret;
+}
+
+static int sun4i_dma_remove(struct platform_device *pdev)
+{
+	struct sun4i_dma_dev *priv = platform_get_drvdata(pdev);
+
+	/* Disable IRQ so no more work is scheduled */
+	disable_irq(priv->irq);
+
+	of_dma_controller_free(pdev->dev.of_node);
+	dma_async_device_unregister(&priv->slave);
+
+	clk_disable_unprepare(priv->clk);
+
+	return 0;
+}
+
+static const struct of_device_id sun4i_dma_match[] = {
+	{ .compatible = "allwinner,sun4i-a10-dma" },
+	{ /* sentinel */ },
+};
+
+static struct platform_driver sun4i_dma_driver = {
+	.probe	= sun4i_dma_probe,
+	.remove	= sun4i_dma_remove,
+	.driver	= {
+		.name		= "sun4i-dma",
+		.of_match_table	= sun4i_dma_match,
+	},
+};
+
+module_platform_driver(sun4i_dma_driver);
+
+MODULE_DESCRIPTION("Allwinner A10 Dedicated DMA Controller Driver");
+MODULE_AUTHOR("Emilio López <emilio@elopez.com.ar>");
+MODULE_LICENSE("GPL");
-- 
cgit v1.2.3


From 84ad6e5cd3e8b365c893f31787864cae5500610b Mon Sep 17 00:00:00 2001
From: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Date: Wed, 19 Aug 2015 22:19:54 +0530
Subject: leds/powernv: Add driver for PowerNV platform

This patch implements LED driver for PowerNV platform using the existing
generic LED class framework.

PowerNV platform has below type of LEDs:
  - System attention
      Indicates there is a problem with the system that needs attention.
  - Identify
      Helps the user locate/identify a particular FRU or resource in the
      system.
  - Fault
      Indicates there is a problem with the FRU or resource at the
      location with which the indicator is associated.

We register classdev structures for all individual LEDs detected on the
system through LED specific device tree nodes. Device tree nodes specify
what all kind of LEDs present on the same location code. It registers
LED classdev structure for each of them.

All the system LEDs can be found in the same regular path /sys/class/leds/.
We don't use LED colors. We use LED node and led-types property to form
LED classdev. Our LEDs have names in this format.

        <location_code>:<attention|identify|fault>

Any positive brightness value would turn on the LED and a zero value would
turn off the LED. The driver will return LED_FULL (255) for any turned on
LED and LED_OFF (0) for any turned off LED.

The platform level implementation of LED get and set state has been
achieved through OPAL calls. These calls are made available for the
driver by exporting from architecture specific codes.

Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Acked-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Tested-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Acked-by: Jacek Anaszewski <j.anaszewski@samsung.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
---
 .../devicetree/bindings/leds/leds-powernv.txt      |  26 ++
 drivers/leds/Kconfig                               |  11 +
 drivers/leds/Makefile                              |   1 +
 drivers/leds/leds-powernv.c                        | 345 +++++++++++++++++++++
 4 files changed, 383 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/leds/leds-powernv.txt
 create mode 100644 drivers/leds/leds-powernv.c

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/leds/leds-powernv.txt b/Documentation/devicetree/bindings/leds/leds-powernv.txt
new file mode 100644
index 000000000000..66655690f749
--- /dev/null
+++ b/Documentation/devicetree/bindings/leds/leds-powernv.txt
@@ -0,0 +1,26 @@
+Device Tree binding for LEDs on IBM Power Systems
+-------------------------------------------------
+
+Required properties:
+- compatible : Should be "ibm,opal-v3-led".
+- led-mode   : Should be "lightpath" or "guidinglight".
+
+Each location code of FRU/Enclosure must be expressed in the
+form of a sub-node.
+
+Required properties for the sub nodes:
+- led-types : Supported LED types (attention/identify/fault) provided
+              in the form of string array.
+
+Example:
+
+leds {
+	compatible = "ibm,opal-v3-led";
+	led-mode = "lightpath";
+
+	U78C9.001.RST0027-P1-C1 {
+		led-types = "identify", "fault";
+	};
+	...
+	...
+};
diff --git a/drivers/leds/Kconfig b/drivers/leds/Kconfig
index 9ad35f72ab4c..f218cc3acc10 100644
--- a/drivers/leds/Kconfig
+++ b/drivers/leds/Kconfig
@@ -560,6 +560,17 @@ config LEDS_BLINKM
 	  This option enables support for the BlinkM RGB LED connected
 	  through I2C. Say Y to enable support for the BlinkM LED.
 
+config LEDS_POWERNV
+	tristate "LED support for PowerNV Platform"
+	depends on LEDS_CLASS
+	depends on PPC_POWERNV
+	depends on OF
+	help
+	  This option enables support for the system LEDs present on
+	  PowerNV platforms. Say 'y' to enable this support in kernel.
+	  To compile this driver as a module, choose 'm' here: the module
+	  will be called leds-powernv.
+
 config LEDS_SYSCON
 	bool "LED support for LEDs on system controllers"
 	depends on LEDS_CLASS=y
diff --git a/drivers/leds/Makefile b/drivers/leds/Makefile
index 8d6a24a2f513..6a943d16ecab 100644
--- a/drivers/leds/Makefile
+++ b/drivers/leds/Makefile
@@ -65,6 +65,7 @@ obj-$(CONFIG_LEDS_VERSATILE)		+= leds-versatile.o
 obj-$(CONFIG_LEDS_MENF21BMC)		+= leds-menf21bmc.o
 obj-$(CONFIG_LEDS_PM8941_WLED)		+= leds-pm8941-wled.o
 obj-$(CONFIG_LEDS_KTD2692)		+= leds-ktd2692.o
+obj-$(CONFIG_LEDS_POWERNV)		+= leds-powernv.o
 
 # LED SPI Drivers
 obj-$(CONFIG_LEDS_DAC124S085)		+= leds-dac124s085.o
diff --git a/drivers/leds/leds-powernv.c b/drivers/leds/leds-powernv.c
new file mode 100644
index 000000000000..a2fea192573b
--- /dev/null
+++ b/drivers/leds/leds-powernv.c
@@ -0,0 +1,345 @@
+/*
+ * PowerNV LED Driver
+ *
+ * Copyright IBM Corp. 2015
+ *
+ * Author: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
+ * Author: Anshuman Khandual <khandual@linux.vnet.ibm.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <linux/leds.h>
+#include <linux/module.h>
+#include <linux/of.h>
+#include <linux/platform_device.h>
+#include <linux/slab.h>
+#include <linux/types.h>
+
+#include <asm/opal.h>
+
+/* Map LED type to description. */
+struct led_type_map {
+	const int	type;
+	const char	*desc;
+};
+static const struct led_type_map led_type_map[] = {
+	{OPAL_SLOT_LED_TYPE_ID,		POWERNV_LED_TYPE_IDENTIFY},
+	{OPAL_SLOT_LED_TYPE_FAULT,	POWERNV_LED_TYPE_FAULT},
+	{OPAL_SLOT_LED_TYPE_ATTN,	POWERNV_LED_TYPE_ATTENTION},
+	{-1,				NULL},
+};
+
+struct powernv_led_common {
+	/*
+	 * By default unload path resets all the LEDs. But on PowerNV
+	 * platform we want to retain LED state across reboot as these
+	 * are controlled by firmware. Also service processor can modify
+	 * the LEDs independent of OS. Hence avoid resetting LEDs in
+	 * unload path.
+	 */
+	bool		led_disabled;
+
+	/* Max supported LED type */
+	__be64		max_led_type;
+
+	/* glabal lock */
+	struct mutex	lock;
+};
+
+/* PowerNV LED data */
+struct powernv_led_data {
+	struct led_classdev	cdev;
+	char			*loc_code;	/* LED location code */
+	int			led_type;	/* OPAL_SLOT_LED_TYPE_* */
+
+	struct powernv_led_common *common;
+};
+
+
+/* Returns OPAL_SLOT_LED_TYPE_* for given led type string */
+static int powernv_get_led_type(const char *led_type_desc)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(led_type_map); i++)
+		if (!strcmp(led_type_map[i].desc, led_type_desc))
+			return led_type_map[i].type;
+
+	return -1;
+}
+
+/*
+ * This commits the state change of the requested LED through an OPAL call.
+ * This function is called from work queue task context when ever it gets
+ * scheduled. This function can sleep at opal_async_wait_response call.
+ */
+static void powernv_led_set(struct powernv_led_data *powernv_led,
+			    enum led_brightness value)
+{
+	int rc, token;
+	u64 led_mask, led_value = 0;
+	__be64 max_type;
+	struct opal_msg msg;
+	struct device *dev = powernv_led->cdev.dev;
+	struct powernv_led_common *powernv_led_common = powernv_led->common;
+
+	/* Prepare for the OPAL call */
+	max_type = powernv_led_common->max_led_type;
+	led_mask = OPAL_SLOT_LED_STATE_ON << powernv_led->led_type;
+	if (value)
+		led_value = led_mask;
+
+	/* OPAL async call */
+	token = opal_async_get_token_interruptible();
+	if (token < 0) {
+		if (token != -ERESTARTSYS)
+			dev_err(dev, "%s: Couldn't get OPAL async token\n",
+				__func__);
+		return;
+	}
+
+	rc = opal_leds_set_ind(token, powernv_led->loc_code,
+			       led_mask, led_value, &max_type);
+	if (rc != OPAL_ASYNC_COMPLETION) {
+		dev_err(dev, "%s: OPAL set LED call failed for %s [rc=%d]\n",
+			__func__, powernv_led->loc_code, rc);
+		goto out_token;
+	}
+
+	rc = opal_async_wait_response(token, &msg);
+	if (rc) {
+		dev_err(dev,
+			"%s: Failed to wait for the async response [rc=%d]\n",
+			__func__, rc);
+		goto out_token;
+	}
+
+	rc = be64_to_cpu(msg.params[1]);
+	if (rc != OPAL_SUCCESS)
+		dev_err(dev, "%s : OAPL async call returned failed [rc=%d]\n",
+			__func__, rc);
+
+out_token:
+	opal_async_release_token(token);
+}
+
+/*
+ * This function fetches the LED state for a given LED type for
+ * mentioned LED classdev structure.
+ */
+static enum led_brightness powernv_led_get(struct powernv_led_data *powernv_led)
+{
+	int rc;
+	__be64 mask, value, max_type;
+	u64 led_mask, led_value;
+	struct device *dev = powernv_led->cdev.dev;
+	struct powernv_led_common *powernv_led_common = powernv_led->common;
+
+	/* Fetch all LED status */
+	mask = cpu_to_be64(0);
+	value = cpu_to_be64(0);
+	max_type = powernv_led_common->max_led_type;
+
+	rc = opal_leds_get_ind(powernv_led->loc_code,
+			       &mask, &value, &max_type);
+	if (rc != OPAL_SUCCESS && rc != OPAL_PARTIAL) {
+		dev_err(dev, "%s: OPAL get led call failed [rc=%d]\n",
+			__func__, rc);
+		return LED_OFF;
+	}
+
+	led_mask = be64_to_cpu(mask);
+	led_value = be64_to_cpu(value);
+
+	/* LED status available */
+	if (!((led_mask >> powernv_led->led_type) & OPAL_SLOT_LED_STATE_ON)) {
+		dev_err(dev, "%s: LED status not available for %s\n",
+			__func__, powernv_led->cdev.name);
+		return LED_OFF;
+	}
+
+	/* LED status value */
+	if ((led_value >> powernv_led->led_type) & OPAL_SLOT_LED_STATE_ON)
+		return LED_FULL;
+
+	return LED_OFF;
+}
+
+/*
+ * LED classdev 'brightness_get' function. This schedules work
+ * to update LED state.
+ */
+static void powernv_brightness_set(struct led_classdev *led_cdev,
+				   enum led_brightness value)
+{
+	struct powernv_led_data *powernv_led =
+		container_of(led_cdev, struct powernv_led_data, cdev);
+	struct powernv_led_common *powernv_led_common = powernv_led->common;
+
+	/* Do not modify LED in unload path */
+	if (powernv_led_common->led_disabled)
+		return;
+
+	mutex_lock(&powernv_led_common->lock);
+	powernv_led_set(powernv_led, value);
+	mutex_unlock(&powernv_led_common->lock);
+}
+
+/* LED classdev 'brightness_get' function */
+static enum led_brightness powernv_brightness_get(struct led_classdev *led_cdev)
+{
+	struct powernv_led_data *powernv_led =
+		container_of(led_cdev, struct powernv_led_data, cdev);
+
+	return powernv_led_get(powernv_led);
+}
+
+/*
+ * This function registers classdev structure for any given type of LED on
+ * a given child LED device node.
+ */
+static int powernv_led_create(struct device *dev,
+			      struct powernv_led_data *powernv_led,
+			      const char *led_type_desc)
+{
+	int rc;
+
+	/* Make sure LED type is supported */
+	powernv_led->led_type = powernv_get_led_type(led_type_desc);
+	if (powernv_led->led_type == -1) {
+		dev_warn(dev, "%s: No support for led type : %s\n",
+			 __func__, led_type_desc);
+		return -EINVAL;
+	}
+
+	/* Create the name for classdev */
+	powernv_led->cdev.name = devm_kasprintf(dev, GFP_KERNEL, "%s:%s",
+						powernv_led->loc_code,
+						led_type_desc);
+	if (!powernv_led->cdev.name) {
+		dev_err(dev,
+			"%s: Memory allocation failed for classdev name\n",
+			__func__);
+		return -ENOMEM;
+	}
+
+	powernv_led->cdev.brightness_set = powernv_brightness_set;
+	powernv_led->cdev.brightness_get = powernv_brightness_get;
+	powernv_led->cdev.brightness = LED_OFF;
+	powernv_led->cdev.max_brightness = LED_FULL;
+
+	/* Register the classdev */
+	rc = devm_led_classdev_register(dev, &powernv_led->cdev);
+	if (rc) {
+		dev_err(dev, "%s: Classdev registration failed for %s\n",
+			__func__, powernv_led->cdev.name);
+	}
+
+	return rc;
+}
+
+/* Go through LED device tree node and register LED classdev structure */
+static int powernv_led_classdev(struct platform_device *pdev,
+				struct device_node *led_node,
+				struct powernv_led_common *powernv_led_common)
+{
+	const char *cur = NULL;
+	int rc = -1;
+	struct property *p;
+	struct device_node *np;
+	struct powernv_led_data *powernv_led;
+	struct device *dev = &pdev->dev;
+
+	for_each_child_of_node(led_node, np) {
+		p = of_find_property(np, "led-types", NULL);
+		if (!p)
+			continue;
+
+		while ((cur = of_prop_next_string(p, cur)) != NULL) {
+			powernv_led = devm_kzalloc(dev, sizeof(*powernv_led),
+						   GFP_KERNEL);
+			if (!powernv_led)
+				return -ENOMEM;
+
+			powernv_led->common = powernv_led_common;
+			powernv_led->loc_code = (char *)np->name;
+
+			rc = powernv_led_create(dev, powernv_led, cur);
+			if (rc)
+				return rc;
+		} /* while end */
+	}
+
+	return rc;
+}
+
+/* Platform driver probe */
+static int powernv_led_probe(struct platform_device *pdev)
+{
+	struct device_node *led_node;
+	struct powernv_led_common *powernv_led_common;
+	struct device *dev = &pdev->dev;
+
+	led_node = of_find_node_by_path("/ibm,opal/leds");
+	if (!led_node) {
+		dev_err(dev, "%s: LED parent device node not found\n",
+			__func__);
+		return -EINVAL;
+	}
+
+	powernv_led_common = devm_kzalloc(dev, sizeof(*powernv_led_common),
+					  GFP_KERNEL);
+	if (!powernv_led_common)
+		return -ENOMEM;
+
+	mutex_init(&powernv_led_common->lock);
+	powernv_led_common->max_led_type = cpu_to_be64(OPAL_SLOT_LED_TYPE_MAX);
+
+	platform_set_drvdata(pdev, powernv_led_common);
+
+	return powernv_led_classdev(pdev, led_node, powernv_led_common);
+}
+
+/* Platform driver remove */
+static int powernv_led_remove(struct platform_device *pdev)
+{
+	struct powernv_led_common *powernv_led_common;
+
+	/* Disable LED operation */
+	powernv_led_common = platform_get_drvdata(pdev);
+	powernv_led_common->led_disabled = true;
+
+	/* Destroy lock */
+	mutex_destroy(&powernv_led_common->lock);
+
+	dev_info(&pdev->dev, "PowerNV led module unregistered\n");
+	return 0;
+}
+
+/* Platform driver property match */
+static const struct of_device_id powernv_led_match[] = {
+	{
+		.compatible	= "ibm,opal-v3-led",
+	},
+	{},
+};
+MODULE_DEVICE_TABLE(of, powernv_led_match);
+
+static struct platform_driver powernv_led_driver = {
+	.probe	= powernv_led_probe,
+	.remove = powernv_led_remove,
+	.driver = {
+		.name = "powernv-led-driver",
+		.of_match_table = powernv_led_match,
+	},
+};
+
+module_platform_driver(powernv_led_driver);
+
+MODULE_LICENSE("GPL v2");
+MODULE_DESCRIPTION("PowerNV LED driver");
+MODULE_AUTHOR("Vasant Hegde <hegdevasant@linux.vnet.ibm.com>");
-- 
cgit v1.2.3


From c70727a5bc18a5a233fddc6056d1de9144d7a293 Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@suse.com>
Date: Fri, 17 Jul 2015 06:51:36 +0200
Subject: xen: allow more than 512 GB of RAM for 64 bit pv-domains

64 bit pv-domains under Xen are limited to 512 GB of RAM today. The
main reason has been the 3 level p2m tree, which was replaced by the
virtual mapped linear p2m list. Parallel to the p2m list which is
being used by the kernel itself there is a 3 level mfn tree for usage
by the Xen tools and eventually for crash dump analysis. For this tree
the linear p2m list can serve as a replacement, too. As the kernel
can't know whether the tools are capable of dealing with the p2m list
instead of the mfn tree, the limit of 512 GB can't be dropped in all
cases.

This patch replaces the hard limit by a kernel parameter which tells
the kernel to obey the 512 GB limit or not. The default is selected by
a configuration parameter which specifies whether the 512 GB limit
should be active per default for domUs (domain save/restore/migration
and crash dump analysis are affected).

Memory above the domain limit is returned to the hypervisor instead of
being identity mapped, which was wrong anyway.

The kernel configuration parameter to specify the maximum size of a
domain can be deleted, as it is not relevant any more.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 Documentation/kernel-parameters.txt |  7 +++++
 arch/x86/include/asm/xen/page.h     |  4 ---
 arch/x86/xen/Kconfig                | 20 ++++++++-----
 arch/x86/xen/p2m.c                  | 10 +++----
 arch/x86/xen/setup.c                | 59 +++++++++++++++++++++++++++++++------
 5 files changed, 73 insertions(+), 27 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 1d6f0459cd7b..47f7b985f833 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -4078,6 +4078,13 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			plus one apbt timer for broadcast timer.
 			x86_intel_mid_timer=apbt_only | lapic_and_apbt
 
+	xen_512gb_limit		[KNL,X86-64,XEN]
+			Restricts the kernel running paravirtualized under Xen
+			to use only up to 512 GB of RAM. The reason to do so is
+			crash analysis tools and Xen tools for doing domain
+			save/restore/migration must be enabled to handle larger
+			domains.
+
 	xen_emul_unplug=		[HW,X86,XEN]
 			Unplug Xen emulated devices
 			Format: [unplug0,][unplug1]
diff --git a/arch/x86/include/asm/xen/page.h b/arch/x86/include/asm/xen/page.h
index c44a5d53e464..10ef14b2616d 100644
--- a/arch/x86/include/asm/xen/page.h
+++ b/arch/x86/include/asm/xen/page.h
@@ -35,10 +35,6 @@ typedef struct xpaddr {
 #define FOREIGN_FRAME(m)	((m) | FOREIGN_FRAME_BIT)
 #define IDENTITY_FRAME(m)	((m) | IDENTITY_FRAME_BIT)
 
-/* Maximum amount of memory we can handle in a domain in pages */
-#define MAX_DOMAIN_PAGES						\
-    ((unsigned long)((u64)CONFIG_XEN_MAX_DOMAIN_MEMORY * 1024 * 1024 * 1024 / PAGE_SIZE))
-
 extern unsigned long *machine_to_phys_mapping;
 extern unsigned long  machine_to_phys_nr;
 extern unsigned long *xen_p2m_addr;
diff --git a/arch/x86/xen/Kconfig b/arch/x86/xen/Kconfig
index e88fda867a33..7bcf21b865d1 100644
--- a/arch/x86/xen/Kconfig
+++ b/arch/x86/xen/Kconfig
@@ -23,14 +23,18 @@ config XEN_PVHVM
 	def_bool y
 	depends on XEN && PCI && X86_LOCAL_APIC
 
-config XEN_MAX_DOMAIN_MEMORY
-       int
-       default 500 if X86_64
-       default 64 if X86_32
-       depends on XEN
-       help
-         This only affects the sizing of some bss arrays, the unused
-         portions of which are freed.
+config XEN_512GB
+	bool "Limit Xen pv-domain memory to 512GB"
+	depends on XEN && X86_64
+	default y
+	help
+	  Limit paravirtualized user domains to 512GB of RAM.
+
+	  The Xen tools and crash dump analysis tools might not support
+	  pv-domains with more than 512 GB of RAM. This option controls the
+	  default setting of the kernel to use only up to 512 GB or more.
+	  It is always possible to change the default via specifying the
+	  boot parameter "xen_512gb_limit".
 
 config XEN_SAVE_RESTORE
        bool
diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
index c719f7c36cb8..c059ca1c8293 100644
--- a/arch/x86/xen/p2m.c
+++ b/arch/x86/xen/p2m.c
@@ -517,7 +517,7 @@ static pte_t *alloc_p2m_pmd(unsigned long addr, pte_t *pte_pg)
  */
 static bool alloc_p2m(unsigned long pfn)
 {
-	unsigned topidx, mididx;
+	unsigned topidx;
 	unsigned long *top_mfn_p, *mid_mfn;
 	pte_t *ptep, *pte_pg;
 	unsigned int level;
@@ -525,9 +525,6 @@ static bool alloc_p2m(unsigned long pfn)
 	unsigned long addr = (unsigned long)(xen_p2m_addr + pfn);
 	unsigned long p2m_pfn;
 
-	topidx = p2m_top_index(pfn);
-	mididx = p2m_mid_index(pfn);
-
 	ptep = lookup_address(addr, &level);
 	BUG_ON(!ptep || level != PG_LEVEL_4K);
 	pte_pg = (pte_t *)((unsigned long)ptep & ~(PAGE_SIZE - 1));
@@ -539,7 +536,8 @@ static bool alloc_p2m(unsigned long pfn)
 			return false;
 	}
 
-	if (p2m_top_mfn) {
+	if (p2m_top_mfn && pfn < MAX_P2M_PFN) {
+		topidx = p2m_top_index(pfn);
 		top_mfn_p = &p2m_top_mfn[topidx];
 		mid_mfn = ACCESS_ONCE(p2m_top_mfn_p[topidx]);
 
@@ -596,7 +594,7 @@ static bool alloc_p2m(unsigned long pfn)
 			wmb(); /* Tools are synchronizing via p2m_generation. */
 			HYPERVISOR_shared_info->arch.p2m_generation++;
 			if (mid_mfn)
-				mid_mfn[mididx] = virt_to_mfn(p2m);
+				mid_mfn[p2m_mid_index(pfn)] = virt_to_mfn(p2m);
 			p2m = NULL;
 		}
 
diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index b096d02d34ac..f96002107691 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -33,6 +33,8 @@
 #include "p2m.h"
 #include "mmu.h"
 
+#define GB(x) ((uint64_t)(x) * 1024 * 1024 * 1024)
+
 /* Amount of extra memory space we add to the e820 ranges */
 struct xen_memory_region xen_extra_mem[XEN_EXTRA_MEM_MAX_REGIONS] __initdata;
 
@@ -69,6 +71,26 @@ static unsigned long xen_remap_mfn __initdata = INVALID_P2M_ENTRY;
  */
 #define EXTRA_MEM_RATIO		(10)
 
+static bool xen_512gb_limit __initdata = IS_ENABLED(CONFIG_XEN_512GB);
+
+static void __init xen_parse_512gb(void)
+{
+	bool val = false;
+	char *arg;
+
+	arg = strstr(xen_start_info->cmd_line, "xen_512gb_limit");
+	if (!arg)
+		return;
+
+	arg = strstr(xen_start_info->cmd_line, "xen_512gb_limit=");
+	if (!arg)
+		val = true;
+	else if (strtobool(arg + strlen("xen_512gb_limit="), &val))
+		return;
+
+	xen_512gb_limit = val;
+}
+
 static void __init xen_add_extra_mem(phys_addr_t start, phys_addr_t size)
 {
 	int i;
@@ -503,12 +525,29 @@ void __init xen_remap_memory(void)
 	pr_info("Remapped %ld page(s)\n", remapped);
 }
 
+static unsigned long __init xen_get_pages_limit(void)
+{
+	unsigned long limit;
+
+#ifdef CONFIG_X86_32
+	limit = GB(64) / PAGE_SIZE;
+#else
+	limit = ~0ul;
+	if (!xen_initial_domain() && xen_512gb_limit)
+		limit = GB(512) / PAGE_SIZE;
+#endif
+	return limit;
+}
+
 static unsigned long __init xen_get_max_pages(void)
 {
-	unsigned long max_pages = MAX_DOMAIN_PAGES;
+	unsigned long max_pages, limit;
 	domid_t domid = DOMID_SELF;
 	int ret;
 
+	limit = xen_get_pages_limit();
+	max_pages = limit;
+
 	/*
 	 * For the initial domain we use the maximum reservation as
 	 * the maximum page.
@@ -524,7 +563,7 @@ static unsigned long __init xen_get_max_pages(void)
 			max_pages = ret;
 	}
 
-	return min(max_pages, MAX_DOMAIN_PAGES);
+	return min(max_pages, limit);
 }
 
 static void __init xen_align_and_add_e820_region(phys_addr_t start,
@@ -699,7 +738,7 @@ static void __init xen_reserve_xen_mfnlist(void)
  **/
 char * __init xen_memory_setup(void)
 {
-	unsigned long max_pfn = xen_start_info->nr_pages;
+	unsigned long max_pfn;
 	phys_addr_t mem_end, addr, size, chunk_size;
 	u32 type;
 	int rc;
@@ -709,7 +748,9 @@ char * __init xen_memory_setup(void)
 	int i;
 	int op;
 
-	max_pfn = min(MAX_DOMAIN_PAGES, max_pfn);
+	xen_parse_512gb();
+	max_pfn = xen_get_pages_limit();
+	max_pfn = min(max_pfn, xen_start_info->nr_pages);
 	mem_end = PFN_PHYS(max_pfn);
 
 	memmap.nr_entries = E820MAX;
@@ -762,12 +803,15 @@ char * __init xen_memory_setup(void)
 	 * is limited to the max size of lowmem, so that it doesn't
 	 * get completely filled.
 	 *
+	 * Make sure we have no memory above max_pages, as this area
+	 * isn't handled by the p2m management.
+	 *
 	 * In principle there could be a problem in lowmem systems if
 	 * the initial memory is also very large with respect to
 	 * lowmem, but we won't try to deal with that here.
 	 */
-	extra_pages = min(EXTRA_MEM_RATIO * min(max_pfn, PFN_DOWN(MAXMEM)),
-			  extra_pages);
+	extra_pages = min3(EXTRA_MEM_RATIO * min(max_pfn, PFN_DOWN(MAXMEM)),
+			   extra_pages, max_pages - max_pfn);
 	i = 0;
 	addr = xen_e820_map[0].addr;
 	size = xen_e820_map[0].size;
@@ -803,9 +847,6 @@ char * __init xen_memory_setup(void)
 	/*
 	 * Set the rest as identity mapped, in case PCI BARs are
 	 * located here.
-	 *
-	 * PFNs above MAX_P2M_PFN are considered identity mapped as
-	 * well.
 	 */
 	set_phys_range_identity(addr / PAGE_SIZE, ~0ul);
 
-- 
cgit v1.2.3


From 5f141548824cebbff2e838ff401c34e667797467 Mon Sep 17 00:00:00 2001
From: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Date: Mon, 10 Aug 2015 16:34:33 -0400
Subject: xen/PMU: Sysfs interface for setting Xen PMU mode

Set Xen's PMU mode via /sys/hypervisor/pmu/pmu_mode. Add XENPMU hypercall.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 Documentation/ABI/testing/sysfs-hypervisor-pmu |  23 +++++
 arch/x86/include/asm/xen/hypercall.h           |   6 ++
 arch/x86/xen/Kconfig                           |   1 +
 drivers/xen/Kconfig                            |   3 +
 drivers/xen/sys-hypervisor.c                   | 136 ++++++++++++++++++++++++-
 include/xen/interface/xen.h                    |   1 +
 include/xen/interface/xenpmu.h                 |  59 +++++++++++
 7 files changed, 228 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/ABI/testing/sysfs-hypervisor-pmu
 create mode 100644 include/xen/interface/xenpmu.h

(limited to 'Documentation')

diff --git a/Documentation/ABI/testing/sysfs-hypervisor-pmu b/Documentation/ABI/testing/sysfs-hypervisor-pmu
new file mode 100644
index 000000000000..224faa105e18
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-hypervisor-pmu
@@ -0,0 +1,23 @@
+What:		/sys/hypervisor/pmu/pmu_mode
+Date:		August 2015
+KernelVersion:	4.3
+Contact:	Boris Ostrovsky <boris.ostrovsky@oracle.com>
+Description:
+		Describes mode that Xen's performance-monitoring unit (PMU)
+		uses. Accepted values are
+			"off"  -- PMU is disabled
+			"self" -- The guest can profile itself
+			"hv"   -- The guest can profile itself and, if it is
+				  privileged (e.g. dom0), the hypervisor
+			"all" --  The guest can profile itself, the hypervisor
+				  and all other guests. Only available to
+				  privileged guests.
+
+What:           /sys/hypervisor/pmu/pmu_features
+Date:           August 2015
+KernelVersion:  4.3
+Contact:        Boris Ostrovsky <boris.ostrovsky@oracle.com>
+Description:
+		Describes Xen PMU features (as an integer). A set bit indicates
+		that the corresponding feature is enabled. See
+		include/xen/interface/xenpmu.h for available features
diff --git a/arch/x86/include/asm/xen/hypercall.h b/arch/x86/include/asm/xen/hypercall.h
index ca08a27b90b3..83aea8055119 100644
--- a/arch/x86/include/asm/xen/hypercall.h
+++ b/arch/x86/include/asm/xen/hypercall.h
@@ -465,6 +465,12 @@ HYPERVISOR_tmem_op(
 	return _hypercall1(int, tmem_op, op);
 }
 
+static inline int
+HYPERVISOR_xenpmu_op(unsigned int op, void *arg)
+{
+	return _hypercall2(int, xenpmu_op, op, arg);
+}
+
 static inline void
 MULTI_fpu_taskswitch(struct multicall_entry *mcl, int set)
 {
diff --git a/arch/x86/xen/Kconfig b/arch/x86/xen/Kconfig
index 7bcf21b865d1..a8ffdb85656f 100644
--- a/arch/x86/xen/Kconfig
+++ b/arch/x86/xen/Kconfig
@@ -7,6 +7,7 @@ config XEN
 	depends on PARAVIRT
 	select PARAVIRT_CLOCK
 	select XEN_HAVE_PVMMU
+	select XEN_HAVE_VPMU
 	depends on X86_64 || (X86_32 && X86_PAE)
 	depends on X86_TSC
 	help
diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
index 936760455dd5..73708acce3ca 100644
--- a/drivers/xen/Kconfig
+++ b/drivers/xen/Kconfig
@@ -288,4 +288,7 @@ config XEN_SYMS
           Exports hypervisor symbols (along with their types and addresses) via
           /proc/xen/xensyms file, similar to /proc/kallsyms
 
+config XEN_HAVE_VPMU
+       bool
+
 endmenu
diff --git a/drivers/xen/sys-hypervisor.c b/drivers/xen/sys-hypervisor.c
index 96453f8a85c5..b5a7342e0ba5 100644
--- a/drivers/xen/sys-hypervisor.c
+++ b/drivers/xen/sys-hypervisor.c
@@ -20,6 +20,9 @@
 #include <xen/xenbus.h>
 #include <xen/interface/xen.h>
 #include <xen/interface/version.h>
+#ifdef CONFIG_XEN_HAVE_VPMU
+#include <xen/interface/xenpmu.h>
+#endif
 
 #define HYPERVISOR_ATTR_RO(_name) \
 static struct hyp_sysfs_attr  _name##_attr = __ATTR_RO(_name)
@@ -368,6 +371,126 @@ static void xen_properties_destroy(void)
 	sysfs_remove_group(hypervisor_kobj, &xen_properties_group);
 }
 
+#ifdef CONFIG_XEN_HAVE_VPMU
+struct pmu_mode {
+	const char *name;
+	uint32_t mode;
+};
+
+static struct pmu_mode pmu_modes[] = {
+	{"off", XENPMU_MODE_OFF},
+	{"self", XENPMU_MODE_SELF},
+	{"hv", XENPMU_MODE_HV},
+	{"all", XENPMU_MODE_ALL}
+};
+
+static ssize_t pmu_mode_store(struct hyp_sysfs_attr *attr,
+			      const char *buffer, size_t len)
+{
+	int ret;
+	struct xen_pmu_params xp;
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(pmu_modes); i++) {
+		if (strncmp(buffer, pmu_modes[i].name, len - 1) == 0) {
+			xp.val = pmu_modes[i].mode;
+			break;
+		}
+	}
+
+	if (i == ARRAY_SIZE(pmu_modes))
+		return -EINVAL;
+
+	xp.version.maj = XENPMU_VER_MAJ;
+	xp.version.min = XENPMU_VER_MIN;
+	ret = HYPERVISOR_xenpmu_op(XENPMU_mode_set, &xp);
+	if (ret)
+		return ret;
+
+	return len;
+}
+
+static ssize_t pmu_mode_show(struct hyp_sysfs_attr *attr, char *buffer)
+{
+	int ret;
+	struct xen_pmu_params xp;
+	int i;
+	uint32_t mode;
+
+	xp.version.maj = XENPMU_VER_MAJ;
+	xp.version.min = XENPMU_VER_MIN;
+	ret = HYPERVISOR_xenpmu_op(XENPMU_mode_get, &xp);
+	if (ret)
+		return ret;
+
+	mode = (uint32_t)xp.val;
+	for (i = 0; i < ARRAY_SIZE(pmu_modes); i++) {
+		if (mode == pmu_modes[i].mode)
+			return sprintf(buffer, "%s\n", pmu_modes[i].name);
+	}
+
+	return -EINVAL;
+}
+HYPERVISOR_ATTR_RW(pmu_mode);
+
+static ssize_t pmu_features_store(struct hyp_sysfs_attr *attr,
+				  const char *buffer, size_t len)
+{
+	int ret;
+	uint32_t features;
+	struct xen_pmu_params xp;
+
+	ret = kstrtou32(buffer, 0, &features);
+	if (ret)
+		return ret;
+
+	xp.val = features;
+	xp.version.maj = XENPMU_VER_MAJ;
+	xp.version.min = XENPMU_VER_MIN;
+	ret = HYPERVISOR_xenpmu_op(XENPMU_feature_set, &xp);
+	if (ret)
+		return ret;
+
+	return len;
+}
+
+static ssize_t pmu_features_show(struct hyp_sysfs_attr *attr, char *buffer)
+{
+	int ret;
+	struct xen_pmu_params xp;
+
+	xp.version.maj = XENPMU_VER_MAJ;
+	xp.version.min = XENPMU_VER_MIN;
+	ret = HYPERVISOR_xenpmu_op(XENPMU_feature_get, &xp);
+	if (ret)
+		return ret;
+
+	return sprintf(buffer, "0x%x\n", (uint32_t)xp.val);
+}
+HYPERVISOR_ATTR_RW(pmu_features);
+
+static struct attribute *xen_pmu_attrs[] = {
+	&pmu_mode_attr.attr,
+	&pmu_features_attr.attr,
+	NULL
+};
+
+static const struct attribute_group xen_pmu_group = {
+	.name = "pmu",
+	.attrs = xen_pmu_attrs,
+};
+
+static int __init xen_pmu_init(void)
+{
+	return sysfs_create_group(hypervisor_kobj, &xen_pmu_group);
+}
+
+static void xen_pmu_destroy(void)
+{
+	sysfs_remove_group(hypervisor_kobj, &xen_pmu_group);
+}
+#endif
+
 static int __init hyper_sysfs_init(void)
 {
 	int ret;
@@ -390,7 +513,15 @@ static int __init hyper_sysfs_init(void)
 	ret = xen_properties_init();
 	if (ret)
 		goto prop_out;
-
+#ifdef CONFIG_XEN_HAVE_VPMU
+	if (xen_initial_domain()) {
+		ret = xen_pmu_init();
+		if (ret) {
+			xen_properties_destroy();
+			goto prop_out;
+		}
+	}
+#endif
 	goto out;
 
 prop_out:
@@ -407,6 +538,9 @@ out:
 
 static void __exit hyper_sysfs_exit(void)
 {
+#ifdef CONFIG_XEN_HAVE_VPMU
+	xen_pmu_destroy();
+#endif
 	xen_properties_destroy();
 	xen_compilation_destroy();
 	xen_sysfs_uuid_destroy();
diff --git a/include/xen/interface/xen.h b/include/xen/interface/xen.h
index 8194270edcf0..e9d4501d1f5e 100644
--- a/include/xen/interface/xen.h
+++ b/include/xen/interface/xen.h
@@ -80,6 +80,7 @@
 #define __HYPERVISOR_kexec_op             37
 #define __HYPERVISOR_tmem_op              38
 #define __HYPERVISOR_xc_reserved_op       39 /* reserved for XenClient */
+#define __HYPERVISOR_xenpmu_op            40
 
 /* Architecture-specific hypercall definitions. */
 #define __HYPERVISOR_arch_0               48
diff --git a/include/xen/interface/xenpmu.h b/include/xen/interface/xenpmu.h
new file mode 100644
index 000000000000..eac1b498b89f
--- /dev/null
+++ b/include/xen/interface/xenpmu.h
@@ -0,0 +1,59 @@
+#ifndef __XEN_PUBLIC_XENPMU_H__
+#define __XEN_PUBLIC_XENPMU_H__
+
+#include "xen.h"
+
+#define XENPMU_VER_MAJ    0
+#define XENPMU_VER_MIN    1
+
+/*
+ * ` enum neg_errnoval
+ * ` HYPERVISOR_xenpmu_op(enum xenpmu_op cmd, struct xenpmu_params *args);
+ *
+ * @cmd  == XENPMU_* (PMU operation)
+ * @args == struct xenpmu_params
+ */
+/* ` enum xenpmu_op { */
+#define XENPMU_mode_get        0 /* Also used for getting PMU version */
+#define XENPMU_mode_set        1
+#define XENPMU_feature_get     2
+#define XENPMU_feature_set     3
+#define XENPMU_init            4
+#define XENPMU_finish          5
+
+/* ` } */
+
+/* Parameters structure for HYPERVISOR_xenpmu_op call */
+struct xen_pmu_params {
+	/* IN/OUT parameters */
+	struct {
+		uint32_t maj;
+		uint32_t min;
+	} version;
+	uint64_t val;
+
+	/* IN parameters */
+	uint32_t vcpu;
+	uint32_t pad;
+};
+
+/* PMU modes:
+ * - XENPMU_MODE_OFF:   No PMU virtualization
+ * - XENPMU_MODE_SELF:  Guests can profile themselves
+ * - XENPMU_MODE_HV:    Guests can profile themselves, dom0 profiles
+ *                      itself and Xen
+ * - XENPMU_MODE_ALL:   Only dom0 has access to VPMU and it profiles
+ *                      everyone: itself, the hypervisor and the guests.
+ */
+#define XENPMU_MODE_OFF           0
+#define XENPMU_MODE_SELF          (1<<0)
+#define XENPMU_MODE_HV            (1<<1)
+#define XENPMU_MODE_ALL           (1<<2)
+
+/*
+ * PMU features:
+ * - XENPMU_FEATURE_INTEL_BTS: Intel BTS support (ignored on AMD)
+ */
+#define XENPMU_FEATURE_INTEL_BTS  1
+
+#endif /* __XEN_PUBLIC_XENPMU_H__ */
-- 
cgit v1.2.3


From e2e05394e4a3420dab96f728df4531893494e15d Mon Sep 17 00:00:00 2001
From: Ross Zwisler <ross.zwisler@linux.intel.com>
Date: Tue, 18 Aug 2015 13:55:41 -0600
Subject: pmem, dax: have direct_access use __pmem annotation

Update the annotation for the kaddr pointer returned by direct_access()
so that it is a __pmem pointer.  This is consistent with the PMEM driver
and with how this direct_access() pointer is used in the DAX code.

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 Documentation/filesystems/Locking |  3 ++-
 arch/powerpc/sysdev/axonram.c     |  7 ++++---
 drivers/block/brd.c               |  4 ++--
 drivers/nvdimm/pmem.c             |  4 ++--
 drivers/s390/block/dcssblk.c      | 10 ++++++----
 fs/block_dev.c                    |  2 +-
 fs/dax.c                          | 37 ++++++++++++++++++++-----------------
 include/linux/blkdev.h            |  8 ++++----
 8 files changed, 41 insertions(+), 34 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index 6a34a0f4d37c..06d443450f21 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -397,7 +397,8 @@ prototypes:
 	int (*release) (struct gendisk *, fmode_t);
 	int (*ioctl) (struct block_device *, fmode_t, unsigned, unsigned long);
 	int (*compat_ioctl) (struct block_device *, fmode_t, unsigned, unsigned long);
-	int (*direct_access) (struct block_device *, sector_t, void **, unsigned long *);
+	int (*direct_access) (struct block_device *, sector_t, void __pmem **,
+				unsigned long *);
 	int (*media_changed) (struct gendisk *);
 	void (*unlock_native_capacity) (struct gendisk *);
 	int (*revalidate_disk) (struct gendisk *);
diff --git a/arch/powerpc/sysdev/axonram.c b/arch/powerpc/sysdev/axonram.c
index ee90db17b097..a2be2a66dab6 100644
--- a/arch/powerpc/sysdev/axonram.c
+++ b/arch/powerpc/sysdev/axonram.c
@@ -141,13 +141,14 @@ axon_ram_make_request(struct request_queue *queue, struct bio *bio)
  */
 static long
 axon_ram_direct_access(struct block_device *device, sector_t sector,
-		       void **kaddr, unsigned long *pfn, long size)
+		       void __pmem **kaddr, unsigned long *pfn, long size)
 {
 	struct axon_ram_bank *bank = device->bd_disk->private_data;
 	loff_t offset = (loff_t)sector << AXON_RAM_SECTOR_SHIFT;
+	void *addr = (void *)(bank->ph_addr + offset);
 
-	*kaddr = (void *)(bank->ph_addr + offset);
-	*pfn = virt_to_phys(*kaddr) >> PAGE_SHIFT;
+	*kaddr = (void __pmem *)addr;
+	*pfn = virt_to_phys(addr) >> PAGE_SHIFT;
 
 	return bank->size - offset;
 }
diff --git a/drivers/block/brd.c b/drivers/block/brd.c
index 64ab4951e9d6..c96402fd1560 100644
--- a/drivers/block/brd.c
+++ b/drivers/block/brd.c
@@ -371,7 +371,7 @@ static int brd_rw_page(struct block_device *bdev, sector_t sector,
 
 #ifdef CONFIG_BLK_DEV_RAM_DAX
 static long brd_direct_access(struct block_device *bdev, sector_t sector,
-			void **kaddr, unsigned long *pfn, long size)
+			void __pmem **kaddr, unsigned long *pfn, long size)
 {
 	struct brd_device *brd = bdev->bd_disk->private_data;
 	struct page *page;
@@ -381,7 +381,7 @@ static long brd_direct_access(struct block_device *bdev, sector_t sector,
 	page = brd_insert_page(brd, sector);
 	if (!page)
 		return -ENOSPC;
-	*kaddr = page_address(page);
+	*kaddr = (void __pmem *)page_address(page);
 	*pfn = page_to_pfn(page);
 
 	/*
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index eb7552d939e1..f3b629779266 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -92,7 +92,7 @@ static int pmem_rw_page(struct block_device *bdev, sector_t sector,
 }
 
 static long pmem_direct_access(struct block_device *bdev, sector_t sector,
-			      void **kaddr, unsigned long *pfn, long size)
+		      void __pmem **kaddr, unsigned long *pfn, long size)
 {
 	struct pmem_device *pmem = bdev->bd_disk->private_data;
 	size_t offset = sector << 9;
@@ -101,7 +101,7 @@ static long pmem_direct_access(struct block_device *bdev, sector_t sector,
 		return -ENODEV;
 
 	/* FIXME convert DAX to comprehend that this mapping has a lifetime */
-	*kaddr = (void __force *) pmem->virt_addr + offset;
+	*kaddr = pmem->virt_addr + offset;
 	*pfn = (pmem->phys_addr + offset) >> PAGE_SHIFT;
 
 	return pmem->size - offset;
diff --git a/drivers/s390/block/dcssblk.c b/drivers/s390/block/dcssblk.c
index da212813f2d5..2c5a397b9f3e 100644
--- a/drivers/s390/block/dcssblk.c
+++ b/drivers/s390/block/dcssblk.c
@@ -29,7 +29,7 @@ static int dcssblk_open(struct block_device *bdev, fmode_t mode);
 static void dcssblk_release(struct gendisk *disk, fmode_t mode);
 static void dcssblk_make_request(struct request_queue *q, struct bio *bio);
 static long dcssblk_direct_access(struct block_device *bdev, sector_t secnum,
-				 void **kaddr, unsigned long *pfn, long size);
+			 void __pmem **kaddr, unsigned long *pfn, long size);
 
 static char dcssblk_segments[DCSSBLK_PARM_LEN] = "\0";
 
@@ -879,18 +879,20 @@ fail:
 
 static long
 dcssblk_direct_access (struct block_device *bdev, sector_t secnum,
-			void **kaddr, unsigned long *pfn, long size)
+			void __pmem **kaddr, unsigned long *pfn, long size)
 {
 	struct dcssblk_dev_info *dev_info;
 	unsigned long offset, dev_sz;
+	void *addr;
 
 	dev_info = bdev->bd_disk->private_data;
 	if (!dev_info)
 		return -ENODEV;
 	dev_sz = dev_info->end - dev_info->start;
 	offset = secnum * 512;
-	*kaddr = (void *) (dev_info->start + offset);
-	*pfn = virt_to_phys(*kaddr) >> PAGE_SHIFT;
+	addr = (void *) (dev_info->start + offset);
+	*pfn = virt_to_phys(addr) >> PAGE_SHIFT;
+	*kaddr = (void __pmem *) addr;
 
 	return dev_sz - offset;
 }
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 198243717da5..2345a9870e2c 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -441,7 +441,7 @@ EXPORT_SYMBOL_GPL(bdev_write_page);
  * accessible at this address.
  */
 long bdev_direct_access(struct block_device *bdev, sector_t sector,
-			void **addr, unsigned long *pfn, long size)
+			void __pmem **addr, unsigned long *pfn, long size)
 {
 	long avail;
 	const struct block_device_operations *ops = bdev->bd_disk->fops;
diff --git a/fs/dax.c b/fs/dax.c
index e07fecc93f80..7c634ac797b1 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -35,7 +35,7 @@ int dax_clear_blocks(struct inode *inode, sector_t block, long size)
 
 	might_sleep();
 	do {
-		void *addr;
+		void __pmem *addr;
 		unsigned long pfn;
 		long count;
 
@@ -47,7 +47,7 @@ int dax_clear_blocks(struct inode *inode, sector_t block, long size)
 			unsigned pgsz = PAGE_SIZE - offset_in_page(addr);
 			if (pgsz > count)
 				pgsz = count;
-			clear_pmem((void __pmem *)addr, pgsz);
+			clear_pmem(addr, pgsz);
 			addr += pgsz;
 			size -= pgsz;
 			count -= pgsz;
@@ -62,7 +62,8 @@ int dax_clear_blocks(struct inode *inode, sector_t block, long size)
 }
 EXPORT_SYMBOL_GPL(dax_clear_blocks);
 
-static long dax_get_addr(struct buffer_head *bh, void **addr, unsigned blkbits)
+static long dax_get_addr(struct buffer_head *bh, void __pmem **addr,
+		unsigned blkbits)
 {
 	unsigned long pfn;
 	sector_t sector = bh->b_blocknr << (blkbits - 9);
@@ -70,15 +71,15 @@ static long dax_get_addr(struct buffer_head *bh, void **addr, unsigned blkbits)
 }
 
 /* the clear_pmem() calls are ordered by a wmb_pmem() in the caller */
-static void dax_new_buf(void *addr, unsigned size, unsigned first, loff_t pos,
-			loff_t end)
+static void dax_new_buf(void __pmem *addr, unsigned size, unsigned first,
+		loff_t pos, loff_t end)
 {
 	loff_t final = end - pos + first; /* The final byte of the buffer */
 
 	if (first > 0)
-		clear_pmem((void __pmem *)addr, first);
+		clear_pmem(addr, first);
 	if (final < size)
-		clear_pmem((void __pmem *)addr + final, size - final);
+		clear_pmem(addr + final, size - final);
 }
 
 static bool buffer_written(struct buffer_head *bh)
@@ -106,7 +107,7 @@ static ssize_t dax_io(struct inode *inode, struct iov_iter *iter,
 	loff_t pos = start;
 	loff_t max = start;
 	loff_t bh_max = start;
-	void *addr;
+	void __pmem *addr;
 	bool hole = false;
 	bool need_wmb = false;
 
@@ -158,11 +159,11 @@ static ssize_t dax_io(struct inode *inode, struct iov_iter *iter,
 		}
 
 		if (iov_iter_rw(iter) == WRITE) {
-			len = copy_from_iter_pmem((void __pmem *)addr,
-					max - pos, iter);
+			len = copy_from_iter_pmem(addr, max - pos, iter);
 			need_wmb = true;
 		} else if (!hole)
-			len = copy_to_iter(addr, max - pos, iter);
+			len = copy_to_iter((void __force *)addr, max - pos,
+					iter);
 		else
 			len = iov_iter_zero(max - pos, iter);
 
@@ -268,11 +269,13 @@ static int dax_load_hole(struct address_space *mapping, struct page *page,
 static int copy_user_bh(struct page *to, struct buffer_head *bh,
 			unsigned blkbits, unsigned long vaddr)
 {
-	void *vfrom, *vto;
+	void __pmem *vfrom;
+	void *vto;
+
 	if (dax_get_addr(bh, &vfrom, blkbits) < 0)
 		return -EIO;
 	vto = kmap_atomic(to);
-	copy_user_page(vto, vfrom, vaddr, to);
+	copy_user_page(vto, (void __force *)vfrom, vaddr, to);
 	kunmap_atomic(vto);
 	return 0;
 }
@@ -283,7 +286,7 @@ static int dax_insert_mapping(struct inode *inode, struct buffer_head *bh,
 	struct address_space *mapping = inode->i_mapping;
 	sector_t sector = bh->b_blocknr << (inode->i_blkbits - 9);
 	unsigned long vaddr = (unsigned long)vmf->virtual_address;
-	void *addr;
+	void __pmem *addr;
 	unsigned long pfn;
 	pgoff_t size;
 	int error;
@@ -312,7 +315,7 @@ static int dax_insert_mapping(struct inode *inode, struct buffer_head *bh,
 	}
 
 	if (buffer_unwritten(bh) || buffer_new(bh)) {
-		clear_pmem((void __pmem *)addr, PAGE_SIZE);
+		clear_pmem(addr, PAGE_SIZE);
 		wmb_pmem();
 	}
 
@@ -548,11 +551,11 @@ int dax_zero_page_range(struct inode *inode, loff_t from, unsigned length,
 	if (err < 0)
 		return err;
 	if (buffer_written(&bh)) {
-		void *addr;
+		void __pmem *addr;
 		err = dax_get_addr(&bh, &addr, inode->i_blkbits);
 		if (err < 0)
 			return err;
-		clear_pmem((void __pmem *)addr + offset, length);
+		clear_pmem(addr + offset, length);
 		wmb_pmem();
 	}
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index d4068c17d0df..c401ecdff9cb 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1555,8 +1555,8 @@ struct block_device_operations {
 	int (*rw_page)(struct block_device *, sector_t, struct page *, int rw);
 	int (*ioctl) (struct block_device *, fmode_t, unsigned, unsigned long);
 	int (*compat_ioctl) (struct block_device *, fmode_t, unsigned, unsigned long);
-	long (*direct_access)(struct block_device *, sector_t,
-					void **, unsigned long *pfn, long size);
+	long (*direct_access)(struct block_device *, sector_t, void __pmem **,
+			unsigned long *pfn, long size);
 	unsigned int (*check_events) (struct gendisk *disk,
 				      unsigned int clearing);
 	/* ->media_changed() is DEPRECATED, use ->check_events() instead */
@@ -1574,8 +1574,8 @@ extern int __blkdev_driver_ioctl(struct block_device *, fmode_t, unsigned int,
 extern int bdev_read_page(struct block_device *, sector_t, struct page *);
 extern int bdev_write_page(struct block_device *, sector_t, struct page *,
 						struct writeback_control *);
-extern long bdev_direct_access(struct block_device *, sector_t, void **addr,
-						unsigned long *pfn, long size);
+extern long bdev_direct_access(struct block_device *, sector_t,
+		void __pmem **addr, unsigned long *pfn, long size);
 #else /* CONFIG_BLOCK */
 
 struct block_device;
-- 
cgit v1.2.3


From c04c674fadeb4a8e6522fc838d4620f7cfd4c621 Mon Sep 17 00:00:00 2001
From: Robert Baldyga <r.baldyga@samsung.com>
Date: Thu, 20 Aug 2015 17:26:02 +0200
Subject: nfc: s3fwrn5: Add driver for Samsung S3FWRN5 NFC Chip

Add driver for Samsung S3FWRN5 NFC controller.
S3FWRN5 is using NCI protocol and I2C communication interface.

Signed-off-by: Robert Baldyga <r.baldyga@samsung.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
---
 .../devicetree/bindings/net/nfc/s3fwrn5.txt        |  27 ++
 MAINTAINERS                                        |   6 +
 drivers/nfc/Kconfig                                |   1 +
 drivers/nfc/Makefile                               |   1 +
 drivers/nfc/s3fwrn5/Kconfig                        |  19 +
 drivers/nfc/s3fwrn5/Makefile                       |  11 +
 drivers/nfc/s3fwrn5/core.c                         | 219 +++++++++
 drivers/nfc/s3fwrn5/firmware.c                     | 511 +++++++++++++++++++++
 drivers/nfc/s3fwrn5/firmware.h                     | 111 +++++
 drivers/nfc/s3fwrn5/i2c.c                          | 306 ++++++++++++
 drivers/nfc/s3fwrn5/nci.c                          | 165 +++++++
 drivers/nfc/s3fwrn5/nci.h                          |  89 ++++
 drivers/nfc/s3fwrn5/s3fwrn5.h                      |  99 ++++
 13 files changed, 1565 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/net/nfc/s3fwrn5.txt
 create mode 100644 drivers/nfc/s3fwrn5/Kconfig
 create mode 100644 drivers/nfc/s3fwrn5/Makefile
 create mode 100644 drivers/nfc/s3fwrn5/core.c
 create mode 100644 drivers/nfc/s3fwrn5/firmware.c
 create mode 100644 drivers/nfc/s3fwrn5/firmware.h
 create mode 100644 drivers/nfc/s3fwrn5/i2c.c
 create mode 100644 drivers/nfc/s3fwrn5/nci.c
 create mode 100644 drivers/nfc/s3fwrn5/nci.h
 create mode 100644 drivers/nfc/s3fwrn5/s3fwrn5.h

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/net/nfc/s3fwrn5.txt b/Documentation/devicetree/bindings/net/nfc/s3fwrn5.txt
new file mode 100644
index 000000000000..fb1e75facf1b
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/nfc/s3fwrn5.txt
@@ -0,0 +1,27 @@
+* Samsung S3FWRN5 NCI NFC Controller
+
+Required properties:
+- compatible: Should be "samsung,s3fwrn5-i2c".
+- reg: address on the bus
+- interrupt-parent: phandle for the interrupt gpio controller
+- interrupts: GPIO interrupt to which the chip is connected
+- s3fwrn5,en-gpios: Output GPIO pin used for enabling/disabling the chip
+- s3fwrn5,fw-gpios: Output GPIO pin used to enter firmware mode and
+  sleep/wakeup control
+
+Example:
+
+&hsi2c_4 {
+	status = "okay";
+	s3fwrn5@27 {
+		compatible = "samsung,s3fwrn5-i2c";
+
+		reg = <0x27>;
+
+		interrupt-parent = <&gpa1>;
+		interrupts = <3 0 0>;
+
+		s3fwrn5,en-gpios = <&gpf1 4 0>;
+		s3fwrn5,fw-gpios = <&gpj0 2 0>;
+	};
+};
diff --git a/MAINTAINERS b/MAINTAINERS
index ca51eba9fe5d..7a3b1b901d22 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8871,6 +8871,12 @@ L:	linux-media@vger.kernel.org
 S:	Supported
 F:	drivers/media/i2c/s5k5baf.c
 
+SAMSUNG S3FWRN5 NFC DRIVER
+M:	Robert Baldyga <r.baldyga@samsung.com>
+L:	linux-nfc@lists.01.org (moderated for non-subscribers)
+S:	Supported
+F:	drivers/nfc/s3fwrn5
+
 SAMSUNG SOC CLOCK DRIVERS
 M:	Sylwester Nawrocki <s.nawrocki@samsung.com>
 M:	Tomasz Figa <tomasz.figa@gmail.com>
diff --git a/drivers/nfc/Kconfig b/drivers/nfc/Kconfig
index 722673cb785b..6639cd1cae36 100644
--- a/drivers/nfc/Kconfig
+++ b/drivers/nfc/Kconfig
@@ -74,4 +74,5 @@ source "drivers/nfc/nfcmrvl/Kconfig"
 source "drivers/nfc/st21nfca/Kconfig"
 source "drivers/nfc/st-nci/Kconfig"
 source "drivers/nfc/nxp-nci/Kconfig"
+source "drivers/nfc/s3fwrn5/Kconfig"
 endmenu
diff --git a/drivers/nfc/Makefile b/drivers/nfc/Makefile
index 368b6dfe71b3..2757fe1b8aa5 100644
--- a/drivers/nfc/Makefile
+++ b/drivers/nfc/Makefile
@@ -14,3 +14,4 @@ obj-$(CONFIG_NFC_TRF7970A)	+= trf7970a.o
 obj-$(CONFIG_NFC_ST21NFCA)  	+= st21nfca/
 obj-$(CONFIG_NFC_ST_NCI)	+= st-nci/
 obj-$(CONFIG_NFC_NXP_NCI)	+= nxp-nci/
+obj-$(CONFIG_NFC_S3FWRN5)	+= s3fwrn5/
diff --git a/drivers/nfc/s3fwrn5/Kconfig b/drivers/nfc/s3fwrn5/Kconfig
new file mode 100644
index 000000000000..7e3b255b3f99
--- /dev/null
+++ b/drivers/nfc/s3fwrn5/Kconfig
@@ -0,0 +1,19 @@
+config NFC_S3FWRN5
+	tristate
+	---help---
+	  Core driver for Samsung S3FWRN5 NFC chip. Contains core utilities
+	  of chip. It's intended to be used by PHYs to avoid duplicating lots
+	  of common code.
+
+config NFC_S3FWRN5_I2C
+	tristate "Samsung S3FWRN5 I2C support"
+	depends on NFC_NCI && I2C
+	select NFC_S3FWRN5
+	default n
+	---help---
+	  This module adds support for an I2C interface to the S3FWRN5 chip.
+	  Select this if your platform is using the I2C bus.
+
+	  To compile this driver as a module, choose m here. The module will
+	  be called s3fwrn5_i2c.ko.
+	  Say N if unsure.
diff --git a/drivers/nfc/s3fwrn5/Makefile b/drivers/nfc/s3fwrn5/Makefile
new file mode 100644
index 000000000000..3381c34faf62
--- /dev/null
+++ b/drivers/nfc/s3fwrn5/Makefile
@@ -0,0 +1,11 @@
+#
+# Makefile for Samsung S3FWRN5 NFC driver
+#
+
+s3fwrn5-objs = core.o firmware.o nci.o
+s3fwrn5_i2c-objs = i2c.o
+
+obj-$(CONFIG_NFC_S3FWRN5) += s3fwrn5.o
+obj-$(CONFIG_NFC_S3FWRN5_I2C) += s3fwrn5_i2c.o
+
+ccflags-$(CONFIG_NFC_DEBUG) := -DDEBUG
diff --git a/drivers/nfc/s3fwrn5/core.c b/drivers/nfc/s3fwrn5/core.c
new file mode 100644
index 000000000000..0d866ca295e3
--- /dev/null
+++ b/drivers/nfc/s3fwrn5/core.c
@@ -0,0 +1,219 @@
+/*
+ * NCI based driver for Samsung S3FWRN5 NFC chip
+ *
+ * Copyright (C) 2015 Samsung Electrnoics
+ * Robert Baldyga <r.baldyga@samsung.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/module.h>
+#include <net/nfc/nci_core.h>
+
+#include "s3fwrn5.h"
+#include "firmware.h"
+#include "nci.h"
+
+#define S3FWRN5_NFC_PROTOCOLS  (NFC_PROTO_JEWEL_MASK | \
+				NFC_PROTO_MIFARE_MASK | \
+				NFC_PROTO_FELICA_MASK | \
+				NFC_PROTO_ISO14443_MASK | \
+				NFC_PROTO_ISO14443_B_MASK | \
+				NFC_PROTO_ISO15693_MASK)
+
+static int s3fwrn5_firmware_update(struct s3fwrn5_info *info)
+{
+	bool need_update;
+	int ret;
+
+	s3fwrn5_fw_init(&info->fw_info, "sec_s3fwrn5_firmware.bin");
+
+	/* Update firmware */
+
+	s3fwrn5_set_wake(info, false);
+	s3fwrn5_set_mode(info, S3FWRN5_MODE_FW);
+
+	ret = s3fwrn5_fw_setup(&info->fw_info);
+	if (ret < 0)
+		return ret;
+
+	need_update = s3fwrn5_fw_check_version(&info->fw_info,
+		info->ndev->manufact_specific_info);
+	if (!need_update)
+		goto out;
+
+	dev_info(&info->ndev->nfc_dev->dev, "Detected new firmware version\n");
+
+	ret = s3fwrn5_fw_download(&info->fw_info);
+	if (ret < 0)
+		goto out;
+
+	/* Update RF configuration */
+
+	s3fwrn5_set_mode(info, S3FWRN5_MODE_NCI);
+
+	s3fwrn5_set_wake(info, true);
+	ret = s3fwrn5_nci_rf_configure(info, "sec_s3fwrn5_rfreg.bin");
+	s3fwrn5_set_wake(info, false);
+
+out:
+	s3fwrn5_set_mode(info, S3FWRN5_MODE_COLD);
+	s3fwrn5_fw_cleanup(&info->fw_info);
+	return ret;
+}
+
+static int s3fwrn5_nci_open(struct nci_dev *ndev)
+{
+	struct s3fwrn5_info *info = nci_get_drvdata(ndev);
+
+	if (s3fwrn5_get_mode(info) != S3FWRN5_MODE_COLD)
+		return  -EBUSY;
+
+	s3fwrn5_set_mode(info, S3FWRN5_MODE_NCI);
+	s3fwrn5_set_wake(info, true);
+
+	return 0;
+}
+
+static int s3fwrn5_nci_close(struct nci_dev *ndev)
+{
+	struct s3fwrn5_info *info = nci_get_drvdata(ndev);
+
+	s3fwrn5_set_wake(info, false);
+	s3fwrn5_set_mode(info, S3FWRN5_MODE_COLD);
+
+	return 0;
+}
+
+static int s3fwrn5_nci_send(struct nci_dev *ndev, struct sk_buff *skb)
+{
+	struct s3fwrn5_info *info = nci_get_drvdata(ndev);
+	int ret;
+
+	mutex_lock(&info->mutex);
+
+	if (s3fwrn5_get_mode(info) != S3FWRN5_MODE_NCI) {
+		mutex_unlock(&info->mutex);
+		return -EINVAL;
+	}
+
+	ret = s3fwrn5_write(info, skb);
+	if (ret < 0)
+		kfree_skb(skb);
+
+	mutex_unlock(&info->mutex);
+	return ret;
+}
+
+static int s3fwrn5_nci_post_setup(struct nci_dev *ndev)
+{
+	struct s3fwrn5_info *info = nci_get_drvdata(ndev);
+	int ret;
+
+	ret = s3fwrn5_firmware_update(info);
+	if (ret < 0)
+		goto out;
+
+	/* NCI core reset */
+
+	s3fwrn5_set_mode(info, S3FWRN5_MODE_NCI);
+	s3fwrn5_set_wake(info, true);
+
+	ret = nci_core_reset(info->ndev);
+	if (ret < 0)
+		goto out;
+
+	ret = nci_core_init(info->ndev);
+
+out:
+	return ret;
+}
+
+static struct nci_ops s3fwrn5_nci_ops = {
+	.open = s3fwrn5_nci_open,
+	.close = s3fwrn5_nci_close,
+	.send = s3fwrn5_nci_send,
+	.post_setup = s3fwrn5_nci_post_setup,
+};
+
+int s3fwrn5_probe(struct nci_dev **ndev, void *phy_id, struct device *pdev,
+	struct s3fwrn5_phy_ops *phy_ops, unsigned int max_payload)
+{
+	struct s3fwrn5_info *info;
+	int ret;
+
+	info = devm_kzalloc(pdev, sizeof(*info), GFP_KERNEL);
+	if (!info)
+		return -ENOMEM;
+
+	info->phy_id = phy_id;
+	info->pdev = pdev;
+	info->phy_ops = phy_ops;
+	info->max_payload = max_payload;
+	mutex_init(&info->mutex);
+
+	s3fwrn5_set_mode(info, S3FWRN5_MODE_COLD);
+
+	s3fwrn5_nci_get_prop_ops(&s3fwrn5_nci_ops.prop_ops,
+		&s3fwrn5_nci_ops.n_prop_ops);
+
+	info->ndev = nci_allocate_device(&s3fwrn5_nci_ops,
+		S3FWRN5_NFC_PROTOCOLS, 0, 0);
+	if (!info->ndev)
+		return -ENOMEM;
+
+	nci_set_parent_dev(info->ndev, pdev);
+	nci_set_drvdata(info->ndev, info);
+
+	ret = nci_register_device(info->ndev);
+	if (ret < 0) {
+		nci_free_device(info->ndev);
+		return ret;
+	}
+
+	info->fw_info.ndev = info->ndev;
+
+	*ndev = info->ndev;
+
+	return ret;
+}
+EXPORT_SYMBOL(s3fwrn5_probe);
+
+void s3fwrn5_remove(struct nci_dev *ndev)
+{
+	struct s3fwrn5_info *info = nci_get_drvdata(ndev);
+
+	s3fwrn5_set_mode(info, S3FWRN5_MODE_COLD);
+
+	nci_unregister_device(ndev);
+	nci_free_device(ndev);
+}
+EXPORT_SYMBOL(s3fwrn5_remove);
+
+int s3fwrn5_recv_frame(struct nci_dev *ndev, struct sk_buff *skb,
+	enum s3fwrn5_mode mode)
+{
+	switch (mode) {
+	case S3FWRN5_MODE_NCI:
+		return nci_recv_frame(ndev, skb);
+	case S3FWRN5_MODE_FW:
+		return s3fwrn5_fw_recv_frame(ndev, skb);
+	default:
+		return -ENODEV;
+	}
+}
+EXPORT_SYMBOL(s3fwrn5_recv_frame);
+
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("Samsung S3FWRN5 NFC driver");
+MODULE_AUTHOR("Robert Baldyga <r.baldyga@samsung.com>");
diff --git a/drivers/nfc/s3fwrn5/firmware.c b/drivers/nfc/s3fwrn5/firmware.c
new file mode 100644
index 000000000000..64a90252c57f
--- /dev/null
+++ b/drivers/nfc/s3fwrn5/firmware.c
@@ -0,0 +1,511 @@
+/*
+ * NCI based driver for Samsung S3FWRN5 NFC chip
+ *
+ * Copyright (C) 2015 Samsung Electrnoics
+ * Robert Baldyga <r.baldyga@samsung.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/completion.h>
+#include <linux/firmware.h>
+#include <linux/crypto.h>
+#include <crypto/sha.h>
+
+#include "s3fwrn5.h"
+#include "firmware.h"
+
+struct s3fwrn5_fw_version {
+	__u8 major;
+	__u8 build1;
+	__u8 build2;
+	__u8 target;
+};
+
+static int s3fwrn5_fw_send_msg(struct s3fwrn5_fw_info *fw_info,
+	struct sk_buff *msg, struct sk_buff **rsp)
+{
+	struct s3fwrn5_info *info =
+		container_of(fw_info, struct s3fwrn5_info, fw_info);
+	long ret;
+
+	reinit_completion(&fw_info->completion);
+
+	ret = s3fwrn5_write(info, msg);
+	if (ret < 0)
+		return ret;
+
+	ret = wait_for_completion_interruptible_timeout(
+		&fw_info->completion, msecs_to_jiffies(1000));
+	if (ret < 0)
+		return ret;
+	else if (ret == 0)
+		return -ENXIO;
+
+	if (!fw_info->rsp)
+		return -EINVAL;
+
+	*rsp = fw_info->rsp;
+	fw_info->rsp = NULL;
+
+	return 0;
+}
+
+static int s3fwrn5_fw_prep_msg(struct s3fwrn5_fw_info *fw_info,
+	struct sk_buff **msg, u8 type, u8 code, const void *data, u16 len)
+{
+	struct s3fwrn5_fw_header hdr;
+	struct sk_buff *skb;
+
+	hdr.type = type | fw_info->parity;
+	fw_info->parity ^= 0x80;
+	hdr.code = code;
+	hdr.len = len;
+
+	skb = alloc_skb(S3FWRN5_FW_HDR_SIZE + len, GFP_KERNEL);
+	if (!skb)
+		return -ENOMEM;
+
+	memcpy(skb_put(skb, S3FWRN5_FW_HDR_SIZE), &hdr, S3FWRN5_FW_HDR_SIZE);
+	if (len)
+		memcpy(skb_put(skb, len), data, len);
+
+	*msg = skb;
+
+	return 0;
+}
+
+static int s3fwrn5_fw_get_bootinfo(struct s3fwrn5_fw_info *fw_info,
+	struct s3fwrn5_fw_cmd_get_bootinfo_rsp *bootinfo)
+{
+	struct sk_buff *msg, *rsp = NULL;
+	struct s3fwrn5_fw_header *hdr;
+	int ret;
+
+	/* Send GET_BOOTINFO command */
+
+	ret = s3fwrn5_fw_prep_msg(fw_info, &msg, S3FWRN5_FW_MSG_CMD,
+		S3FWRN5_FW_CMD_GET_BOOTINFO, NULL, 0);
+	if (ret < 0)
+		return ret;
+
+	ret = s3fwrn5_fw_send_msg(fw_info, msg, &rsp);
+	kfree_skb(msg);
+	if (ret < 0)
+		return ret;
+
+	hdr = (struct s3fwrn5_fw_header *) rsp->data;
+	if (hdr->code != S3FWRN5_FW_RET_SUCCESS) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	memcpy(bootinfo, rsp->data + S3FWRN5_FW_HDR_SIZE, 10);
+
+out:
+	kfree_skb(rsp);
+	return ret;
+}
+
+static int s3fwrn5_fw_enter_update_mode(struct s3fwrn5_fw_info *fw_info,
+	const void *hash_data, u16 hash_size,
+	const void *sig_data, u16 sig_size)
+{
+	struct s3fwrn5_fw_cmd_enter_updatemode args;
+	struct sk_buff *msg, *rsp = NULL;
+	struct s3fwrn5_fw_header *hdr;
+	int ret;
+
+	/* Send ENTER_UPDATE_MODE command */
+
+	args.hashcode_size = hash_size;
+	args.signature_size = sig_size;
+
+	ret = s3fwrn5_fw_prep_msg(fw_info, &msg, S3FWRN5_FW_MSG_CMD,
+		S3FWRN5_FW_CMD_ENTER_UPDATE_MODE, &args, sizeof(args));
+	if (ret < 0)
+		return ret;
+
+	ret = s3fwrn5_fw_send_msg(fw_info, msg, &rsp);
+	kfree_skb(msg);
+	if (ret < 0)
+		return ret;
+
+	hdr = (struct s3fwrn5_fw_header *) rsp->data;
+	if (hdr->code != S3FWRN5_FW_RET_SUCCESS) {
+		ret = -EPROTO;
+		goto out;
+	}
+
+	kfree_skb(rsp);
+
+	/* Send hashcode data */
+
+	ret = s3fwrn5_fw_prep_msg(fw_info, &msg, S3FWRN5_FW_MSG_DATA, 0,
+		hash_data, hash_size);
+	if (ret < 0)
+		return ret;
+
+	ret = s3fwrn5_fw_send_msg(fw_info, msg, &rsp);
+	kfree_skb(msg);
+	if (ret < 0)
+		return ret;
+
+	hdr = (struct s3fwrn5_fw_header *) rsp->data;
+	if (hdr->code != S3FWRN5_FW_RET_SUCCESS) {
+		ret = -EPROTO;
+		goto out;
+	}
+
+	kfree_skb(rsp);
+
+	/* Send signature data */
+
+	ret = s3fwrn5_fw_prep_msg(fw_info, &msg, S3FWRN5_FW_MSG_DATA, 0,
+		sig_data, sig_size);
+	if (ret < 0)
+		return ret;
+
+	ret = s3fwrn5_fw_send_msg(fw_info, msg, &rsp);
+	kfree_skb(msg);
+	if (ret < 0)
+		return ret;
+
+	hdr = (struct s3fwrn5_fw_header *) rsp->data;
+	if (hdr->code != S3FWRN5_FW_RET_SUCCESS)
+		ret = -EPROTO;
+
+out:
+	kfree_skb(rsp);
+	return ret;
+}
+
+static int s3fwrn5_fw_update_sector(struct s3fwrn5_fw_info *fw_info,
+	u32 base_addr, const void *data)
+{
+	struct s3fwrn5_fw_cmd_update_sector args;
+	struct sk_buff *msg, *rsp = NULL;
+	struct s3fwrn5_fw_header *hdr;
+	int ret, i;
+
+	/* Send UPDATE_SECTOR command */
+
+	args.base_address = base_addr;
+
+	ret = s3fwrn5_fw_prep_msg(fw_info, &msg, S3FWRN5_FW_MSG_CMD,
+		S3FWRN5_FW_CMD_UPDATE_SECTOR, &args, sizeof(args));
+	if (ret < 0)
+		return ret;
+
+	ret = s3fwrn5_fw_send_msg(fw_info, msg, &rsp);
+	kfree_skb(msg);
+	if (ret < 0)
+		return ret;
+
+	hdr = (struct s3fwrn5_fw_header *) rsp->data;
+	if (hdr->code != S3FWRN5_FW_RET_SUCCESS) {
+		ret = -EPROTO;
+		goto err;
+	}
+
+	kfree_skb(rsp);
+
+	/* Send data split into 256-byte packets */
+
+	for (i = 0; i < 16; ++i) {
+		ret = s3fwrn5_fw_prep_msg(fw_info, &msg,
+			S3FWRN5_FW_MSG_DATA, 0, data+256*i, 256);
+		if (ret < 0)
+			break;
+
+		ret = s3fwrn5_fw_send_msg(fw_info, msg, &rsp);
+		kfree_skb(msg);
+		if (ret < 0)
+			break;
+
+		hdr = (struct s3fwrn5_fw_header *) rsp->data;
+		if (hdr->code != S3FWRN5_FW_RET_SUCCESS) {
+			ret = -EPROTO;
+			goto err;
+		}
+
+		kfree_skb(rsp);
+	}
+
+	return ret;
+
+err:
+	kfree_skb(rsp);
+	return ret;
+}
+
+static int s3fwrn5_fw_complete_update_mode(struct s3fwrn5_fw_info *fw_info)
+{
+	struct sk_buff *msg, *rsp = NULL;
+	struct s3fwrn5_fw_header *hdr;
+	int ret;
+
+	/* Send COMPLETE_UPDATE_MODE command */
+
+	ret = s3fwrn5_fw_prep_msg(fw_info, &msg, S3FWRN5_FW_MSG_CMD,
+		S3FWRN5_FW_CMD_COMPLETE_UPDATE_MODE, NULL, 0);
+	if (ret < 0)
+		return ret;
+
+	ret = s3fwrn5_fw_send_msg(fw_info, msg, &rsp);
+	kfree_skb(msg);
+	if (ret < 0)
+		return ret;
+
+	hdr = (struct s3fwrn5_fw_header *) rsp->data;
+	if (hdr->code != S3FWRN5_FW_RET_SUCCESS)
+		ret = -EPROTO;
+
+	kfree_skb(rsp);
+
+	return ret;
+}
+
+/*
+ * Firmware header stucture:
+ *
+ * 0x00 - 0x0B : Date and time string (w/o NUL termination)
+ * 0x10 - 0x13 : Firmware version
+ * 0x14 - 0x17 : Signature address
+ * 0x18 - 0x1B : Signature size
+ * 0x1C - 0x1F : Firmware image address
+ * 0x20 - 0x23 : Firmware sectors count
+ * 0x24 - 0x27 : Custom signature address
+ * 0x28 - 0x2B : Custom signature size
+ */
+
+#define S3FWRN5_FW_IMAGE_HEADER_SIZE 44
+
+static int s3fwrn5_fw_request_firmware(struct s3fwrn5_fw_info *fw_info)
+{
+	struct s3fwrn5_fw_image *fw = &fw_info->fw;
+	u32 sig_off;
+	u32 image_off;
+	u32 custom_sig_off;
+	int ret;
+
+	ret = request_firmware(&fw->fw, fw_info->fw_name,
+		&fw_info->ndev->nfc_dev->dev);
+	if (ret < 0)
+		return ret;
+
+	if (fw->fw->size < S3FWRN5_FW_IMAGE_HEADER_SIZE)
+		return -EINVAL;
+
+	memcpy(fw->date, fw->fw->data + 0x00, 12);
+	fw->date[12] = '\0';
+
+	memcpy(&fw->version, fw->fw->data + 0x10, 4);
+
+	memcpy(&sig_off, fw->fw->data + 0x14, 4);
+	fw->sig = fw->fw->data + sig_off;
+	memcpy(&fw->sig_size, fw->fw->data + 0x18, 4);
+
+	memcpy(&image_off, fw->fw->data + 0x1C, 4);
+	fw->image = fw->fw->data + image_off;
+	memcpy(&fw->image_sectors, fw->fw->data + 0x20, 4);
+
+	memcpy(&custom_sig_off, fw->fw->data + 0x24, 4);
+	fw->custom_sig = fw->fw->data + custom_sig_off;
+	memcpy(&fw->custom_sig_size, fw->fw->data + 0x28, 4);
+
+	return 0;
+}
+
+static void s3fwrn5_fw_release_firmware(struct s3fwrn5_fw_info *fw_info)
+{
+	release_firmware(fw_info->fw.fw);
+}
+
+static int s3fwrn5_fw_get_base_addr(
+	struct s3fwrn5_fw_cmd_get_bootinfo_rsp *bootinfo, u32 *base_addr)
+{
+	int i;
+	struct {
+		u8 version[4];
+		u32 base_addr;
+	} match[] = {
+		{{0x05, 0x00, 0x00, 0x00}, 0x00005000},
+		{{0x05, 0x00, 0x00, 0x01}, 0x00003000},
+		{{0x05, 0x00, 0x00, 0x02}, 0x00003000},
+		{{0x05, 0x00, 0x00, 0x03}, 0x00003000},
+		{{0x05, 0x00, 0x00, 0x05}, 0x00003000}
+	};
+
+	for (i = 0; i < ARRAY_SIZE(match); ++i)
+		if (bootinfo->hw_version[0] == match[i].version[0] &&
+			bootinfo->hw_version[1] == match[i].version[1] &&
+			bootinfo->hw_version[3] == match[i].version[3]) {
+			*base_addr = match[i].base_addr;
+			return 0;
+		}
+
+	return -EINVAL;
+}
+
+static inline bool
+s3fwrn5_fw_is_custom(struct s3fwrn5_fw_cmd_get_bootinfo_rsp *bootinfo)
+{
+	return !!bootinfo->hw_version[2];
+}
+
+int s3fwrn5_fw_setup(struct s3fwrn5_fw_info *fw_info)
+{
+	struct s3fwrn5_fw_cmd_get_bootinfo_rsp bootinfo;
+	int ret;
+
+	/* Get firmware data */
+
+	ret = s3fwrn5_fw_request_firmware(fw_info);
+	if (ret < 0) {
+		dev_err(&fw_info->ndev->nfc_dev->dev,
+			"Failed to get fw file, ret=%02x\n", ret);
+		return ret;
+	}
+
+	/* Get bootloader info */
+
+	ret = s3fwrn5_fw_get_bootinfo(fw_info, &bootinfo);
+	if (ret < 0) {
+		dev_err(&fw_info->ndev->nfc_dev->dev,
+			"Failed to get bootinfo, ret=%02x\n", ret);
+		goto err;
+	}
+
+	/* Match hardware version to obtain firmware base address */
+
+	ret = s3fwrn5_fw_get_base_addr(&bootinfo, &fw_info->base_addr);
+	if (ret < 0) {
+		dev_err(&fw_info->ndev->nfc_dev->dev,
+			"Unknown hardware version\n");
+		goto err;
+	}
+
+	fw_info->sector_size = bootinfo.sector_size;
+
+	fw_info->sig_size = s3fwrn5_fw_is_custom(&bootinfo) ?
+		fw_info->fw.custom_sig_size : fw_info->fw.sig_size;
+	fw_info->sig = s3fwrn5_fw_is_custom(&bootinfo) ?
+		fw_info->fw.custom_sig : fw_info->fw.sig;
+
+	return 0;
+
+err:
+	s3fwrn5_fw_release_firmware(fw_info);
+	return ret;
+}
+
+bool s3fwrn5_fw_check_version(struct s3fwrn5_fw_info *fw_info, u32 version)
+{
+	struct s3fwrn5_fw_version *new = (void *) &fw_info->fw.version;
+	struct s3fwrn5_fw_version *old = (void *) &version;
+
+	if (new->major > old->major)
+		return true;
+	if (new->build1 > old->build1)
+		return true;
+	if (new->build2 > old->build2)
+		return true;
+
+	return false;
+}
+
+int s3fwrn5_fw_download(struct s3fwrn5_fw_info *fw_info)
+{
+	struct s3fwrn5_fw_image *fw = &fw_info->fw;
+	u8 hash_data[SHA1_DIGEST_SIZE];
+	struct scatterlist sg;
+	struct hash_desc desc;
+	u32 image_size, off;
+	int ret;
+
+	image_size = fw_info->sector_size * fw->image_sectors;
+
+	/* Compute SHA of firmware data */
+
+	sg_init_one(&sg, fw->image, image_size);
+	desc.tfm = crypto_alloc_hash("sha1", 0, CRYPTO_ALG_ASYNC);
+	crypto_hash_init(&desc);
+	crypto_hash_update(&desc, &sg, image_size);
+	crypto_hash_final(&desc, hash_data);
+	crypto_free_hash(desc.tfm);
+
+	/* Firmware update process */
+
+	dev_info(&fw_info->ndev->nfc_dev->dev,
+		"Firmware update: %s\n", fw_info->fw_name);
+
+	ret = s3fwrn5_fw_enter_update_mode(fw_info, hash_data,
+		SHA1_DIGEST_SIZE, fw_info->sig, fw_info->sig_size);
+	if (ret < 0) {
+		dev_err(&fw_info->ndev->nfc_dev->dev,
+			"Unable to enter update mode\n");
+		goto out;
+	}
+
+	for (off = 0; off < image_size; off += fw_info->sector_size) {
+		ret = s3fwrn5_fw_update_sector(fw_info,
+			fw_info->base_addr + off, fw->image + off);
+		if (ret < 0) {
+			dev_err(&fw_info->ndev->nfc_dev->dev,
+				"Firmware update error (code=%d)\n", ret);
+			goto out;
+		}
+	}
+
+	ret = s3fwrn5_fw_complete_update_mode(fw_info);
+	if (ret < 0) {
+		dev_err(&fw_info->ndev->nfc_dev->dev,
+			"Unable to complete update mode\n");
+		goto out;
+	}
+
+	dev_info(&fw_info->ndev->nfc_dev->dev,
+		"Firmware update: success\n");
+
+out:
+	return ret;
+}
+
+void s3fwrn5_fw_init(struct s3fwrn5_fw_info *fw_info, const char *fw_name)
+{
+	fw_info->parity = 0x00;
+	fw_info->rsp = NULL;
+	fw_info->fw.fw = NULL;
+	strcpy(fw_info->fw_name, fw_name);
+	init_completion(&fw_info->completion);
+}
+
+void s3fwrn5_fw_cleanup(struct s3fwrn5_fw_info *fw_info)
+{
+	s3fwrn5_fw_release_firmware(fw_info);
+}
+
+int s3fwrn5_fw_recv_frame(struct nci_dev *ndev, struct sk_buff *skb)
+{
+	struct s3fwrn5_info *info = nci_get_drvdata(ndev);
+	struct s3fwrn5_fw_info *fw_info = &info->fw_info;
+
+	BUG_ON(fw_info->rsp);
+
+	fw_info->rsp = skb;
+
+	complete(&fw_info->completion);
+
+	return 0;
+}
diff --git a/drivers/nfc/s3fwrn5/firmware.h b/drivers/nfc/s3fwrn5/firmware.h
new file mode 100644
index 000000000000..1ec0647ab917
--- /dev/null
+++ b/drivers/nfc/s3fwrn5/firmware.h
@@ -0,0 +1,111 @@
+/*
+ * NCI based driver for Samsung S3FWRN5 NFC chip
+ *
+ * Copyright (C) 2015 Samsung Electrnoics
+ * Robert Baldyga <r.baldyga@samsung.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef __LOCAL_S3FWRN5_FIRMWARE_H_
+#define __LOCAL_S3FWRN5_FIRMWARE_H_
+
+/* FW Message Types */
+#define S3FWRN5_FW_MSG_CMD			0x00
+#define S3FWRN5_FW_MSG_RSP			0x01
+#define S3FWRN5_FW_MSG_DATA			0x02
+
+/* FW Return Codes */
+#define S3FWRN5_FW_RET_SUCCESS			0x00
+#define S3FWRN5_FW_RET_MESSAGE_TYPE_INVALID	0x01
+#define S3FWRN5_FW_RET_COMMAND_INVALID		0x02
+#define S3FWRN5_FW_RET_PAGE_DATA_OVERFLOW	0x03
+#define S3FWRN5_FW_RET_SECT_DATA_OVERFLOW	0x04
+#define S3FWRN5_FW_RET_AUTHENTICATION_FAIL	0x05
+#define S3FWRN5_FW_RET_FLASH_OPERATION_FAIL	0x06
+#define S3FWRN5_FW_RET_ADDRESS_OUT_OF_RANGE	0x07
+#define S3FWRN5_FW_RET_PARAMETER_INVALID	0x08
+
+/* ---- FW Packet structures ---- */
+#define S3FWRN5_FW_HDR_SIZE 4
+
+struct s3fwrn5_fw_header {
+	__u8 type;
+	__u8 code;
+	__u16 len;
+};
+
+#define S3FWRN5_FW_CMD_RESET			0x00
+
+#define S3FWRN5_FW_CMD_GET_BOOTINFO		0x01
+
+struct s3fwrn5_fw_cmd_get_bootinfo_rsp {
+	__u8 hw_version[4];
+	__u16 sector_size;
+	__u16 page_size;
+	__u16 frame_max_size;
+	__u16 hw_buffer_size;
+};
+
+#define S3FWRN5_FW_CMD_ENTER_UPDATE_MODE	0x02
+
+struct s3fwrn5_fw_cmd_enter_updatemode {
+	__u16 hashcode_size;
+	__u16 signature_size;
+};
+
+#define S3FWRN5_FW_CMD_UPDATE_SECTOR		0x04
+
+struct s3fwrn5_fw_cmd_update_sector {
+	__u32 base_address;
+};
+
+#define S3FWRN5_FW_CMD_COMPLETE_UPDATE_MODE	0x05
+
+struct s3fwrn5_fw_image {
+	const struct firmware *fw;
+
+	char date[13];
+	u32 version;
+	const void *sig;
+	u32 sig_size;
+	const void *image;
+	u32 image_sectors;
+	const void *custom_sig;
+	u32 custom_sig_size;
+};
+
+struct s3fwrn5_fw_info {
+	struct nci_dev *ndev;
+	struct s3fwrn5_fw_image fw;
+	char fw_name[NFC_FIRMWARE_NAME_MAXSIZE + 1];
+
+	const void *sig;
+	u32 sig_size;
+	u32 sector_size;
+	u32 base_addr;
+
+	struct completion completion;
+	struct sk_buff *rsp;
+	char parity;
+};
+
+void s3fwrn5_fw_init(struct s3fwrn5_fw_info *fw_info, const char *fw_name);
+int s3fwrn5_fw_setup(struct s3fwrn5_fw_info *fw_info);
+bool s3fwrn5_fw_check_version(struct s3fwrn5_fw_info *fw_info, u32 version);
+int s3fwrn5_fw_download(struct s3fwrn5_fw_info *fw_info);
+void s3fwrn5_fw_cleanup(struct s3fwrn5_fw_info *fw_info);
+
+int s3fwrn5_fw_recv_frame(struct nci_dev *ndev, struct sk_buff *skb);
+
+#endif /* __LOCAL_S3FWRN5_FIRMWARE_H_ */
diff --git a/drivers/nfc/s3fwrn5/i2c.c b/drivers/nfc/s3fwrn5/i2c.c
new file mode 100644
index 000000000000..b4dd7dd47473
--- /dev/null
+++ b/drivers/nfc/s3fwrn5/i2c.c
@@ -0,0 +1,306 @@
+/*
+ * I2C Link Layer for Samsung S3FWRN5 NCI based Driver
+ *
+ * Copyright (C) 2015 Samsung Electrnoics
+ * Robert Baldyga <r.baldyga@samsung.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/i2c.h>
+#include <linux/gpio.h>
+#include <linux/delay.h>
+#include <linux/of_gpio.h>
+#include <linux/of_irq.h>
+#include <linux/module.h>
+
+#include <net/nfc/nfc.h>
+
+#include "s3fwrn5.h"
+
+#define S3FWRN5_I2C_DRIVER_NAME "s3fwrn5_i2c"
+
+#define S3FWRN5_I2C_MAX_PAYLOAD 32
+#define S3FWRN5_EN_WAIT_TIME 150
+
+struct s3fwrn5_i2c_phy {
+	struct i2c_client *i2c_dev;
+	struct nci_dev *ndev;
+
+	unsigned int gpio_en;
+	unsigned int gpio_fw_wake;
+
+	struct mutex mutex;
+
+	enum s3fwrn5_mode mode;
+	unsigned int irq_skip:1;
+};
+
+static void s3fwrn5_i2c_set_wake(void *phy_id, bool wake)
+{
+	struct s3fwrn5_i2c_phy *phy = phy_id;
+
+	mutex_lock(&phy->mutex);
+	gpio_set_value(phy->gpio_fw_wake, wake);
+	msleep(S3FWRN5_EN_WAIT_TIME/2);
+	mutex_unlock(&phy->mutex);
+}
+
+static void s3fwrn5_i2c_set_mode(void *phy_id, enum s3fwrn5_mode mode)
+{
+	struct s3fwrn5_i2c_phy *phy = phy_id;
+
+	mutex_lock(&phy->mutex);
+
+	if (phy->mode == mode)
+		goto out;
+
+	phy->mode = mode;
+
+	gpio_set_value(phy->gpio_en, 1);
+	gpio_set_value(phy->gpio_fw_wake, 0);
+	if (mode == S3FWRN5_MODE_FW)
+		gpio_set_value(phy->gpio_fw_wake, 1);
+
+	if (mode != S3FWRN5_MODE_COLD) {
+		msleep(S3FWRN5_EN_WAIT_TIME);
+		gpio_set_value(phy->gpio_en, 0);
+		msleep(S3FWRN5_EN_WAIT_TIME/2);
+	}
+
+	phy->irq_skip = true;
+
+out:
+	mutex_unlock(&phy->mutex);
+}
+
+static enum s3fwrn5_mode s3fwrn5_i2c_get_mode(void *phy_id)
+{
+	struct s3fwrn5_i2c_phy *phy = phy_id;
+	enum s3fwrn5_mode mode;
+
+	mutex_lock(&phy->mutex);
+
+	mode = phy->mode;
+
+	mutex_unlock(&phy->mutex);
+
+	return mode;
+}
+
+static int s3fwrn5_i2c_write(void *phy_id, struct sk_buff *skb)
+{
+	struct s3fwrn5_i2c_phy *phy = phy_id;
+	int ret;
+
+	mutex_lock(&phy->mutex);
+
+	phy->irq_skip = false;
+
+	ret = i2c_master_send(phy->i2c_dev, skb->data, skb->len);
+	if (ret == -EREMOTEIO) {
+		/* Retry, chip was in standby */
+		usleep_range(110000, 120000);
+		ret  = i2c_master_send(phy->i2c_dev, skb->data, skb->len);
+	}
+
+	mutex_unlock(&phy->mutex);
+
+	if (ret < 0)
+		return ret;
+
+	if (ret != skb->len)
+		return -EREMOTEIO;
+
+	return 0;
+}
+
+static struct s3fwrn5_phy_ops i2c_phy_ops = {
+	.set_wake = s3fwrn5_i2c_set_wake,
+	.set_mode = s3fwrn5_i2c_set_mode,
+	.get_mode = s3fwrn5_i2c_get_mode,
+	.write = s3fwrn5_i2c_write,
+};
+
+static int s3fwrn5_i2c_read(struct s3fwrn5_i2c_phy *phy)
+{
+	struct sk_buff *skb;
+	size_t hdr_size;
+	size_t data_len;
+	char hdr[4];
+	int ret;
+
+	hdr_size = (phy->mode == S3FWRN5_MODE_NCI) ?
+		NCI_CTRL_HDR_SIZE : S3FWRN5_FW_HDR_SIZE;
+	ret = i2c_master_recv(phy->i2c_dev, hdr, hdr_size);
+	if (ret < 0)
+		return ret;
+
+	if (ret < hdr_size)
+		return -EBADMSG;
+
+	data_len = (phy->mode == S3FWRN5_MODE_NCI) ?
+		((struct nci_ctrl_hdr *)hdr)->plen :
+		((struct s3fwrn5_fw_header *)hdr)->len;
+
+	skb = alloc_skb(hdr_size + data_len, GFP_KERNEL);
+	if (!skb)
+		return -ENOMEM;
+
+	memcpy(skb_put(skb, hdr_size), hdr, hdr_size);
+
+	if (data_len == 0)
+		goto out;
+
+	ret = i2c_master_recv(phy->i2c_dev, skb_put(skb, data_len), data_len);
+	if (ret != data_len) {
+		kfree_skb(skb);
+		return -EBADMSG;
+	}
+
+out:
+	return s3fwrn5_recv_frame(phy->ndev, skb, phy->mode);
+}
+
+static irqreturn_t s3fwrn5_i2c_irq_thread_fn(int irq, void *phy_id)
+{
+	struct s3fwrn5_i2c_phy *phy = phy_id;
+	int ret = 0;
+
+	if (!phy || !phy->ndev) {
+		WARN_ON_ONCE(1);
+		return IRQ_NONE;
+	}
+
+	mutex_lock(&phy->mutex);
+
+	if (phy->irq_skip)
+		goto out;
+
+	switch (phy->mode) {
+	case S3FWRN5_MODE_NCI:
+	case S3FWRN5_MODE_FW:
+		ret = s3fwrn5_i2c_read(phy);
+		break;
+	case S3FWRN5_MODE_COLD:
+		ret = -EREMOTEIO;
+		break;
+	}
+
+out:
+	mutex_unlock(&phy->mutex);
+
+	return IRQ_HANDLED;
+}
+
+static int s3fwrn5_i2c_parse_dt(struct i2c_client *client)
+{
+	struct s3fwrn5_i2c_phy *phy = i2c_get_clientdata(client);
+	struct device_node *np = client->dev.of_node;
+
+	if (!np)
+		return -ENODEV;
+
+	phy->gpio_en = of_get_named_gpio(np, "s3fwrn5,en-gpios", 0);
+	if (!gpio_is_valid(phy->gpio_en))
+		return -ENODEV;
+
+	phy->gpio_fw_wake = of_get_named_gpio(np, "s3fwrn5,fw-gpios", 0);
+	if (!gpio_is_valid(phy->gpio_fw_wake))
+		return -ENODEV;
+
+	return 0;
+}
+
+static int s3fwrn5_i2c_probe(struct i2c_client *client,
+				  const struct i2c_device_id *id)
+{
+	struct s3fwrn5_i2c_phy *phy;
+	int ret;
+
+	phy = devm_kzalloc(&client->dev, sizeof(*phy), GFP_KERNEL);
+	if (!phy)
+		return -ENOMEM;
+
+	mutex_init(&phy->mutex);
+	phy->mode = S3FWRN5_MODE_COLD;
+	phy->irq_skip = true;
+
+	phy->i2c_dev = client;
+	i2c_set_clientdata(client, phy);
+
+	ret = s3fwrn5_i2c_parse_dt(client);
+	if (ret < 0)
+		return ret;
+
+	ret = devm_gpio_request_one(&phy->i2c_dev->dev, phy->gpio_en,
+		GPIOF_OUT_INIT_HIGH, "s3fwrn5_en");
+	if (ret < 0)
+		return ret;
+
+	ret = devm_gpio_request_one(&phy->i2c_dev->dev, phy->gpio_fw_wake,
+		GPIOF_OUT_INIT_LOW, "s3fwrn5_fw_wake");
+	if (ret < 0)
+		return ret;
+
+	ret = s3fwrn5_probe(&phy->ndev, phy, &phy->i2c_dev->dev, &i2c_phy_ops,
+		S3FWRN5_I2C_MAX_PAYLOAD);
+	if (ret < 0)
+		return ret;
+
+	ret = request_threaded_irq(phy->i2c_dev->irq, NULL,
+		s3fwrn5_i2c_irq_thread_fn, IRQF_TRIGGER_HIGH | IRQF_ONESHOT,
+		S3FWRN5_I2C_DRIVER_NAME, phy);
+	if (ret)
+		s3fwrn5_remove(phy->ndev);
+
+	return ret;
+}
+
+static int s3fwrn5_i2c_remove(struct i2c_client *client)
+{
+	struct s3fwrn5_i2c_phy *phy = i2c_get_clientdata(client);
+
+	s3fwrn5_remove(phy->ndev);
+
+	return 0;
+}
+
+static struct i2c_device_id s3fwrn5_i2c_id_table[] = {
+	{S3FWRN5_I2C_DRIVER_NAME, 0},
+	{}
+};
+MODULE_DEVICE_TABLE(i2c, s3fwrn5_i2c_id_table);
+
+static const struct of_device_id of_s3fwrn5_i2c_match[] = {
+	{ .compatible = "samsung,s3fwrn5-i2c", },
+	{}
+};
+MODULE_DEVICE_TABLE(of, of_s3fwrn5_i2c_match);
+
+static struct i2c_driver s3fwrn5_i2c_driver = {
+	.driver = {
+		.owner = THIS_MODULE,
+		.name = S3FWRN5_I2C_DRIVER_NAME,
+		.of_match_table = of_match_ptr(of_s3fwrn5_i2c_match),
+	},
+	.probe = s3fwrn5_i2c_probe,
+	.remove = s3fwrn5_i2c_remove,
+	.id_table = s3fwrn5_i2c_id_table,
+};
+
+module_i2c_driver(s3fwrn5_i2c_driver);
+
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("I2C driver for Samsung S3FWRN5");
+MODULE_AUTHOR("Robert Baldyga <r.baldyga@samsung.com>");
diff --git a/drivers/nfc/s3fwrn5/nci.c b/drivers/nfc/s3fwrn5/nci.c
new file mode 100644
index 000000000000..ace0071c5339
--- /dev/null
+++ b/drivers/nfc/s3fwrn5/nci.c
@@ -0,0 +1,165 @@
+/*
+ * NCI based driver for Samsung S3FWRN5 NFC chip
+ *
+ * Copyright (C) 2015 Samsung Electrnoics
+ * Robert Baldyga <r.baldyga@samsung.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/completion.h>
+#include <linux/firmware.h>
+
+#include "s3fwrn5.h"
+#include "nci.h"
+
+static int s3fwrn5_nci_prop_rsp(struct nci_dev *ndev, struct sk_buff *skb)
+{
+	__u8 status = skb->data[0];
+
+	nci_req_complete(ndev, status);
+	return 0;
+}
+
+static struct nci_prop_ops s3fwrn5_nci_prop_ops[] = {
+	{
+		.opcode = nci_opcode_pack(NCI_GID_PROPRIETARY,
+				NCI_PROP_AGAIN),
+		.rsp = s3fwrn5_nci_prop_rsp,
+	},
+	{
+		.opcode = nci_opcode_pack(NCI_GID_PROPRIETARY,
+				NCI_PROP_GET_RFREG),
+		.rsp = s3fwrn5_nci_prop_rsp,
+	},
+	{
+		.opcode = nci_opcode_pack(NCI_GID_PROPRIETARY,
+				NCI_PROP_SET_RFREG),
+		.rsp = s3fwrn5_nci_prop_rsp,
+	},
+	{
+		.opcode = nci_opcode_pack(NCI_GID_PROPRIETARY,
+				NCI_PROP_GET_RFREG_VER),
+		.rsp = s3fwrn5_nci_prop_rsp,
+	},
+	{
+		.opcode = nci_opcode_pack(NCI_GID_PROPRIETARY,
+				NCI_PROP_SET_RFREG_VER),
+		.rsp = s3fwrn5_nci_prop_rsp,
+	},
+	{
+		.opcode = nci_opcode_pack(NCI_GID_PROPRIETARY,
+				NCI_PROP_START_RFREG),
+		.rsp = s3fwrn5_nci_prop_rsp,
+	},
+	{
+		.opcode = nci_opcode_pack(NCI_GID_PROPRIETARY,
+				NCI_PROP_STOP_RFREG),
+		.rsp = s3fwrn5_nci_prop_rsp,
+	},
+	{
+		.opcode = nci_opcode_pack(NCI_GID_PROPRIETARY,
+				NCI_PROP_FW_CFG),
+		.rsp = s3fwrn5_nci_prop_rsp,
+	},
+	{
+		.opcode = nci_opcode_pack(NCI_GID_PROPRIETARY,
+				NCI_PROP_WR_RESET),
+		.rsp = s3fwrn5_nci_prop_rsp,
+	},
+};
+
+void s3fwrn5_nci_get_prop_ops(struct nci_prop_ops **ops, size_t *n)
+{
+	*ops = s3fwrn5_nci_prop_ops;
+	*n = ARRAY_SIZE(s3fwrn5_nci_prop_ops);
+}
+
+#define S3FWRN5_RFREG_SECTION_SIZE 252
+
+int s3fwrn5_nci_rf_configure(struct s3fwrn5_info *info, const char *fw_name)
+{
+	const struct firmware *fw;
+	struct nci_prop_fw_cfg_cmd fw_cfg;
+	struct nci_prop_set_rfreg_cmd set_rfreg;
+	struct nci_prop_stop_rfreg_cmd stop_rfreg;
+	u32 checksum;
+	int i, len;
+	int ret;
+
+	ret = request_firmware(&fw, fw_name, &info->ndev->nfc_dev->dev);
+	if (ret < 0)
+		return ret;
+
+	/* Compute rfreg checksum */
+
+	checksum = 0;
+	for (i = 0; i < fw->size; i += 4)
+		checksum += *((u32 *)(fw->data+i));
+
+	/* Set default clock configuration for external crystal */
+
+	fw_cfg.clk_type = 0x01;
+	fw_cfg.clk_speed = 0xff;
+	fw_cfg.clk_req = 0xff;
+	ret = nci_prop_cmd(info->ndev, NCI_PROP_FW_CFG,
+		sizeof(fw_cfg), (__u8 *)&fw_cfg);
+	if (ret < 0)
+		goto out;
+
+	/* Start rfreg configuration */
+
+	dev_info(&info->ndev->nfc_dev->dev,
+		"rfreg configuration update: %s\n", fw_name);
+
+	ret = nci_prop_cmd(info->ndev, NCI_PROP_START_RFREG, 0, NULL);
+	if (ret < 0) {
+		dev_err(&info->ndev->nfc_dev->dev,
+			"Unable to start rfreg update\n");
+		goto out;
+	}
+
+	/* Update rfreg */
+
+	set_rfreg.index = 0;
+	for (i = 0; i < fw->size; i += S3FWRN5_RFREG_SECTION_SIZE) {
+		len = (fw->size - i < S3FWRN5_RFREG_SECTION_SIZE) ?
+			(fw->size - i) : S3FWRN5_RFREG_SECTION_SIZE;
+		memcpy(set_rfreg.data, fw->data+i, len);
+		ret = nci_prop_cmd(info->ndev, NCI_PROP_SET_RFREG,
+			len+1, (__u8 *)&set_rfreg);
+		if (ret < 0) {
+			dev_err(&info->ndev->nfc_dev->dev,
+				"rfreg update error (code=%d)\n", ret);
+			goto out;
+		}
+		set_rfreg.index++;
+	}
+
+	/* Finish rfreg configuration */
+
+	stop_rfreg.checksum = checksum & 0xffff;
+	ret = nci_prop_cmd(info->ndev, NCI_PROP_STOP_RFREG,
+		sizeof(stop_rfreg), (__u8 *)&stop_rfreg);
+	if (ret < 0) {
+		dev_err(&info->ndev->nfc_dev->dev,
+			"Unable to stop rfreg update\n");
+		goto out;
+	}
+
+	dev_info(&info->ndev->nfc_dev->dev,
+		"rfreg configuration update: success\n");
+out:
+	release_firmware(fw);
+	return ret;
+}
diff --git a/drivers/nfc/s3fwrn5/nci.h b/drivers/nfc/s3fwrn5/nci.h
new file mode 100644
index 000000000000..0e68d439dde6
--- /dev/null
+++ b/drivers/nfc/s3fwrn5/nci.h
@@ -0,0 +1,89 @@
+/*
+ * NCI based driver for Samsung S3FWRN5 NFC chip
+ *
+ * Copyright (C) 2015 Samsung Electrnoics
+ * Robert Baldyga <r.baldyga@samsung.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef __LOCAL_S3FWRN5_NCI_H_
+#define __LOCAL_S3FWRN5_NCI_H_
+
+#include "s3fwrn5.h"
+
+#define NCI_PROP_AGAIN		0x01
+
+#define NCI_PROP_GET_RFREG	0x21
+#define NCI_PROP_SET_RFREG	0x22
+
+struct nci_prop_set_rfreg_cmd {
+	__u8 index;
+	__u8 data[252];
+};
+
+struct nci_prop_set_rfreg_rsp {
+	__u8 status;
+};
+
+#define NCI_PROP_GET_RFREG_VER	0x24
+
+struct nci_prop_get_rfreg_ver_rsp {
+	__u8 status;
+	__u8 data[8];
+};
+
+#define NCI_PROP_SET_RFREG_VER	0x25
+
+struct nci_prop_set_rfreg_ver_cmd {
+	__u8 data[8];
+};
+
+struct nci_prop_set_rfreg_ver_rsp {
+	__u8 status;
+};
+
+#define NCI_PROP_START_RFREG	0x26
+
+struct nci_prop_start_rfreg_rsp {
+	__u8 status;
+};
+
+#define NCI_PROP_STOP_RFREG	0x27
+
+struct nci_prop_stop_rfreg_cmd {
+	__u16 checksum;
+};
+
+struct nci_prop_stop_rfreg_rsp {
+	__u8 status;
+};
+
+#define NCI_PROP_FW_CFG		0x28
+
+struct nci_prop_fw_cfg_cmd {
+	__u8 clk_type;
+	__u8 clk_speed;
+	__u8 clk_req;
+};
+
+struct nci_prop_fw_cfg_rsp {
+	__u8 status;
+};
+
+#define NCI_PROP_WR_RESET	0x2f
+
+void s3fwrn5_nci_get_prop_ops(struct nci_prop_ops **ops, size_t *n);
+int s3fwrn5_nci_rf_configure(struct s3fwrn5_info *info, const char *fw_name);
+
+#endif /* __LOCAL_S3FWRN5_NCI_H_ */
diff --git a/drivers/nfc/s3fwrn5/s3fwrn5.h b/drivers/nfc/s3fwrn5/s3fwrn5.h
new file mode 100644
index 000000000000..89210d4828b8
--- /dev/null
+++ b/drivers/nfc/s3fwrn5/s3fwrn5.h
@@ -0,0 +1,99 @@
+/*
+ * NCI based driver for Samsung S3FWRN5 NFC chip
+ *
+ * Copyright (C) 2015 Samsung Electrnoics
+ * Robert Baldyga <r.baldyga@samsung.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef __LOCAL_S3FWRN5_H_
+#define __LOCAL_S3FWRN5_H_
+
+#include <linux/nfc.h>
+
+#include <net/nfc/nci_core.h>
+
+#include "firmware.h"
+
+enum s3fwrn5_mode {
+	S3FWRN5_MODE_COLD,
+	S3FWRN5_MODE_NCI,
+	S3FWRN5_MODE_FW,
+};
+
+struct s3fwrn5_phy_ops {
+	void (*set_wake)(void *id, bool sleep);
+	void (*set_mode)(void *id, enum s3fwrn5_mode);
+	enum s3fwrn5_mode (*get_mode)(void *id);
+	int (*write)(void *id, struct sk_buff *skb);
+};
+
+struct s3fwrn5_info {
+	struct nci_dev *ndev;
+	void *phy_id;
+	struct device *pdev;
+
+	struct s3fwrn5_phy_ops *phy_ops;
+	unsigned int max_payload;
+
+	struct s3fwrn5_fw_info fw_info;
+
+	struct mutex mutex;
+};
+
+static inline int s3fwrn5_set_mode(struct s3fwrn5_info *info,
+	enum s3fwrn5_mode mode)
+{
+	if (!info->phy_ops->set_mode)
+		return -ENOTSUPP;
+
+	info->phy_ops->set_mode(info->phy_id, mode);
+
+	return 0;
+}
+
+static inline enum s3fwrn5_mode s3fwrn5_get_mode(struct s3fwrn5_info *info)
+{
+	if (!info->phy_ops->get_mode)
+		return -ENOTSUPP;
+
+	return info->phy_ops->get_mode(info->phy_id);
+}
+
+static inline int s3fwrn5_set_wake(struct s3fwrn5_info *info, bool wake)
+{
+	if (!info->phy_ops->set_wake)
+		return -ENOTSUPP;
+
+	info->phy_ops->set_wake(info->phy_id, wake);
+
+	return 0;
+}
+
+static inline int s3fwrn5_write(struct s3fwrn5_info *info, struct sk_buff *skb)
+{
+	if (!info->phy_ops->write)
+		return -ENOTSUPP;
+
+	return info->phy_ops->write(info->phy_id, skb);
+}
+
+int s3fwrn5_probe(struct nci_dev **ndev, void *phy_id, struct device *pdev,
+	struct s3fwrn5_phy_ops *phy_ops, unsigned int max_payload);
+void s3fwrn5_remove(struct nci_dev *ndev);
+
+int s3fwrn5_recv_frame(struct nci_dev *ndev, struct sk_buff *skb,
+	enum s3fwrn5_mode mode);
+
+#endif /* __LOCAL_S3FWRN5_H_ */
-- 
cgit v1.2.3


From a493f339a88ddd20693460c1dcf8230aa3732b8b Mon Sep 17 00:00:00 2001
From: Eric Anholt <eric@anholt.net>
Date: Thu, 6 Aug 2015 16:00:31 -0700
Subject: irqchip/bcm2835: Add support for being used as a second level
 controller

The BCM2836 (Raspberry Pi 2) uses two levels of interrupt handling
with the CPU-local interrupts being the root, so we need to register
ours as chained off of the CPU's local interrupt.

Signed-off-by: Eric Anholt <eric@anholt.net>
Acked-by: Stephen Warren <swarren@wwwdotorg.org>
Cc: linux-rpi-kernel@lists.infradead.org
Cc: Lee Jones <lee@kernel.org>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/1438902033-31477-3-git-send-email-eric@anholt.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 .../brcm,bcm2835-armctrl-ic.txt                    | 25 ++++++++++++-
 drivers/irqchip/irq-bcm2835.c                      | 43 ++++++++++++++++++++--
 2 files changed, 64 insertions(+), 4 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/interrupt-controller/brcm,bcm2835-armctrl-ic.txt b/Documentation/devicetree/bindings/interrupt-controller/brcm,bcm2835-armctrl-ic.txt
index 7da578d72123..2d6c8bb4d827 100644
--- a/Documentation/devicetree/bindings/interrupt-controller/brcm,bcm2835-armctrl-ic.txt
+++ b/Documentation/devicetree/bindings/interrupt-controller/brcm,bcm2835-armctrl-ic.txt
@@ -5,9 +5,14 @@ The BCM2835 contains a custom top-level interrupt controller, which supports
 controller, or the HW block containing it, is referred to occasionally
 as "armctrl" in the SoC documentation, hence naming of this binding.
 
+The BCM2836 contains the same interrupt controller with the same
+interrupts, but the per-CPU interrupt controller is the root, and an
+interrupt there indicates that the ARMCTRL has an interrupt to handle.
+
 Required properties:
 
-- compatible : should be "brcm,bcm2835-armctrl-ic"
+- compatible : should be "brcm,bcm2835-armctrl-ic" or
+                 "brcm,bcm2836-armctrl-ic"
 - reg : Specifies base physical address and size of the registers.
 - interrupt-controller : Identifies the node as an interrupt controller
 - #interrupt-cells : Specifies the number of cells needed to encode an
@@ -20,6 +25,12 @@ Required properties:
   The 2nd cell contains the interrupt number within the bank. Valid values
   are 0..7 for bank 0, and 0..31 for bank 1.
 
+Additional required properties for brcm,bcm2836-armctrl-ic:
+- interrupt-parent : Specifies the parent interrupt controller when this
+  controller is the second level.
+- interrupts : Specifies the interrupt on the parent for this interrupt
+  controller to handle.
+
 The interrupt sources are as follows:
 
 Bank 0:
@@ -102,9 +113,21 @@ Bank 2:
 
 Example:
 
+/* BCM2835, first level */
 intc: interrupt-controller {
 	compatible = "brcm,bcm2835-armctrl-ic";
 	reg = <0x7e00b200 0x200>;
 	interrupt-controller;
 	#interrupt-cells = <2>;
 };
+
+/* BCM2836, second level */
+intc: interrupt-controller {
+	compatible = "brcm,bcm2836-armctrl-ic";
+	reg = <0x7e00b200 0x200>;
+	interrupt-controller;
+	#interrupt-cells = <2>;
+
+	interrupt-parent = <&local_intc>;
+	interrupts = <8>;
+};
diff --git a/drivers/irqchip/irq-bcm2835.c b/drivers/irqchip/irq-bcm2835.c
index a40d97268b8a..ed4ca9deca70 100644
--- a/drivers/irqchip/irq-bcm2835.c
+++ b/drivers/irqchip/irq-bcm2835.c
@@ -96,6 +96,7 @@ struct armctrl_ic {
 static struct armctrl_ic intc __read_mostly;
 static void __exception_irq_entry bcm2835_handle_irq(
 	struct pt_regs *regs);
+static void bcm2836_chained_handle_irq(unsigned int irq, struct irq_desc *desc);
 
 static void armctrl_mask_irq(struct irq_data *d)
 {
@@ -139,7 +140,8 @@ static const struct irq_domain_ops armctrl_ops = {
 };
 
 static int __init armctrl_of_init(struct device_node *node,
-	struct device_node *parent)
+				  struct device_node *parent,
+				  bool is_2836)
 {
 	void __iomem *base;
 	int irq, b, i;
@@ -168,10 +170,34 @@ static int __init armctrl_of_init(struct device_node *node,
 		}
 	}
 
-	set_handle_irq(bcm2835_handle_irq);
+	if (is_2836) {
+		int parent_irq = irq_of_parse_and_map(node, 0);
+
+		if (!parent_irq) {
+			panic("%s: unable to get parent interrupt.\n",
+			      node->full_name);
+		}
+		irq_set_chained_handler(parent_irq, bcm2836_chained_handle_irq);
+	} else {
+		set_handle_irq(bcm2835_handle_irq);
+	}
+
 	return 0;
 }
 
+static int __init bcm2835_armctrl_of_init(struct device_node *node,
+					  struct device_node *parent)
+{
+	return armctrl_of_init(node, parent, false);
+}
+
+static int __init bcm2836_armctrl_of_init(struct device_node *node,
+					  struct device_node *parent)
+{
+	return armctrl_of_init(node, parent, true);
+}
+
+
 /*
  * Handle each interrupt across the entire interrupt controller.  This reads the
  * status register before handling each interrupt, which is necessary given that
@@ -219,4 +245,15 @@ static void __exception_irq_entry bcm2835_handle_irq(
 		handle_IRQ(irq_linear_revmap(intc.domain, hwirq), regs);
 }
 
-IRQCHIP_DECLARE(bcm2835_armctrl_ic, "brcm,bcm2835-armctrl-ic", armctrl_of_init);
+static void bcm2836_chained_handle_irq(unsigned int irq, struct irq_desc *desc)
+{
+	u32 hwirq;
+
+	while ((hwirq = get_next_armctrl_hwirq()) != ~0)
+		generic_handle_irq(irq_linear_revmap(intc.domain, hwirq));
+}
+
+IRQCHIP_DECLARE(bcm2835_armctrl_ic, "brcm,bcm2835-armctrl-ic",
+		bcm2835_armctrl_of_init);
+IRQCHIP_DECLARE(bcm2836_armctrl_ic, "brcm,bcm2836-armctrl-ic",
+		bcm2836_armctrl_of_init);
-- 
cgit v1.2.3


From 815e7a31c3f7929f371da9c7e9ee91cfd52ef453 Mon Sep 17 00:00:00 2001
From: Eric Anholt <eric@anholt.net>
Date: Thu, 6 Aug 2015 16:00:32 -0700
Subject: irqchip: Add documentation for the bcm2836 interrupt controller

This is a new per-cpu root interrupt controller on the Raspberry Pi 2,
which will chain to the bcm2835 interrupt controller for peripheral
interrupts.

Signed-off-by: Eric Anholt <eric@anholt.net>
Acked-by: Stephen Warren <swarren@wwwdotorg.org>
Cc: linux-rpi-kernel@lists.infradead.org
Cc: Lee Jones <lee@kernel.org>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/1438902033-31477-4-git-send-email-eric@anholt.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 .../interrupt-controller/brcm,bcm2836-l1-intc.txt  | 37 ++++++++++++++++++++++
 1 file changed, 37 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/interrupt-controller/brcm,bcm2836-l1-intc.txt

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/interrupt-controller/brcm,bcm2836-l1-intc.txt b/Documentation/devicetree/bindings/interrupt-controller/brcm,bcm2836-l1-intc.txt
new file mode 100644
index 000000000000..f320dcd6e69b
--- /dev/null
+++ b/Documentation/devicetree/bindings/interrupt-controller/brcm,bcm2836-l1-intc.txt
@@ -0,0 +1,37 @@
+BCM2836 per-CPU interrupt controller
+
+The BCM2836 has a per-cpu interrupt controller for the timer, PMU
+events, and SMP IPIs.  One of the CPUs may receive interrupts for the
+peripheral (GPU) events, which chain to the BCM2835-style interrupt
+controller.
+
+Required properties:
+
+- compatible:	 	Should be "brcm,bcm2836-l1-intc"
+- reg:			Specifies base physical address and size of the
+			  registers
+- interrupt-controller:	Identifies the node as an interrupt controller
+- #interrupt-cells:	Specifies the number of cells needed to encode an
+			  interrupt source. The value shall be 1
+
+Please refer to interrupts.txt in this directory for details of the common
+Interrupt Controllers bindings used by client devices.
+
+The interrupt sources are as follows:
+
+0: CNTPSIRQ
+1: CNTPNSIRQ
+2: CNTHPIRQ
+3: CNTVIRQ
+8: GPU_FAST
+9: PMU_FAST
+
+Example:
+
+local_intc: local_intc {
+	compatible = "brcm,bcm2836-l1-intc";
+	reg = <0x40000000 0x100>;
+	interrupt-controller;
+	#interrupt-cells = <1>;
+	interrupt-parent = <&local_intc>;
+};
-- 
cgit v1.2.3


From 4b226fbde68b8bfb66452067523a677b8e6492fa Mon Sep 17 00:00:00 2001
From: Michael van der Westhuizen <michael@smart-africa.com>
Date: Tue, 18 Aug 2015 22:21:52 +0200
Subject: dt: snps,dw-apb-ssi: Document new I/O data register width property

This change documents a new property for the snps,dw-apb-ssi device,
allowing an implementer to specify either four byte or two bytes
access to the SPI controller data register.

This supports a change that unbreaks this driver on picoXcell
platforms.

Signed-off-by: Michael van der Westhuizen <michael@smart-africa.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
---
 Documentation/devicetree/bindings/spi/snps,dw-apb-ssi.txt | 2 ++
 1 file changed, 2 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/spi/snps,dw-apb-ssi.txt b/Documentation/devicetree/bindings/spi/snps,dw-apb-ssi.txt
index bd99193e87b9..204b311e0400 100644
--- a/Documentation/devicetree/bindings/spi/snps,dw-apb-ssi.txt
+++ b/Documentation/devicetree/bindings/spi/snps,dw-apb-ssi.txt
@@ -10,6 +10,8 @@ Required properties:
 Optional properties:
 - cs-gpios : Specifies the gpio pis to be used for chipselects.
 - num-cs : The number of chipselects. If omitted, this will default to 4.
+- reg-io-width : The I/O register width (in bytes) implemented by this
+  device.  Supported values are 2 or 4 (the default).
 
 Child nodes as per the generic SPI binding.
 
-- 
cgit v1.2.3


From 88c25dcb1185f2c7041550976430e27fa9d8dcca Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Date: Fri, 21 Aug 2015 13:37:54 -0300
Subject: [media] DocBook: fix an unbalanced <para> tag

The </para> got lost on some change at the DVB docbook, making
the tag unbalanced and causing a warning. Fix it.

Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Acked-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/DocBook/media/dvb/intro.xml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/DocBook/media/dvb/intro.xml b/Documentation/DocBook/media/dvb/intro.xml
index d30751338493..51db15648099 100644
--- a/Documentation/DocBook/media/dvb/intro.xml
+++ b/Documentation/DocBook/media/dvb/intro.xml
@@ -164,7 +164,7 @@ are called:</para>
 from&#x00A0;0, and M enumerates the devices of each type within each
 adapter, starting from&#x00A0;0, too. We will omit the &#8220;
 <constant>/dev/dvb/adapterN/</constant>&#8221; in the further discussion
-of these devices.
+of these devices.</para>
 
 <para>More details about the data structures and function calls of all
 the devices are described in the following chapters.</para>
-- 
cgit v1.2.3


From dc2c8bd3c9a44ed38d9af6c7243fdddc42ec391a Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Date: Fri, 21 Aug 2015 14:17:13 -0300
Subject: [media] DocBook/device-drivers: Add drivers/media core stuff

There are lots of docbook marks at the media subsystem, but
those aren't used.

Add the core headers/code in order to start generating docs.

Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Acked-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/DocBook/device-drivers.tmpl | 30 ++++++++++++++++++++++++++++++
 drivers/media/dvb-core/dvb_math.h         | 10 ++++++----
 2 files changed, 36 insertions(+), 4 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/DocBook/device-drivers.tmpl b/Documentation/DocBook/device-drivers.tmpl
index faf09d4a0ea8..e3e0f4880770 100644
--- a/Documentation/DocBook/device-drivers.tmpl
+++ b/Documentation/DocBook/device-drivers.tmpl
@@ -216,6 +216,36 @@ X!Isound/sound_firmware.c
 -->
   </chapter>
 
+  <chapter id="mediadev">
+     <title>Media Devices</title>
+!Iinclude/media/media-device.h
+!Iinclude/media/media-devnode.h
+!Iinclude/media/media-entity.h
+!Iinclude/media/v4l2-async.h
+!Iinclude/media/v4l2-flash-led-class.h
+!Iinclude/media/v4l2-mem2mem.h
+!Iinclude/media/v4l2-of.h
+!Iinclude/media/v4l2-subdev.h
+!Iinclude/media/rc-core.h
+<!-- FIXME: Removed for now due to document generation inconsistency
+X!Iinclude/media/v4l2-ctrls.h
+X!Iinclude/media/v4l2-dv-timings.h
+X!Iinclude/media/v4l2-event.h
+X!Iinclude/media/v4l2-mediabus.h
+X!Iinclude/media/videobuf2-memops.h
+X!Iinclude/media/videobuf2-core.h
+X!Iinclude/media/lirc.h
+X!Edrivers/media/dvb-core/dvb_demux.c
+X!Idrivers/media/dvb-core/dvb_frontend.h
+X!Idrivers/media/dvb-core/dvbdev.h
+X!Edrivers/media/dvb-core/dvb_net.c
+X!Idrivers/media/dvb-core/dvb_ringbuffer.h
+X!Idrivers/media/dvb-core/dvb_ca_en50221.h
+X!Idrivers/media/dvb-core/dvb_math.h
+-->
+
+  </chapter>
+
   <chapter id="uart16x50">
      <title>16x50 UART Driver</title>
 !Edrivers/tty/serial/serial_core.c
diff --git a/drivers/media/dvb-core/dvb_math.h b/drivers/media/dvb-core/dvb_math.h
index aecc867e9404..f586aa001ede 100644
--- a/drivers/media/dvb-core/dvb_math.h
+++ b/drivers/media/dvb-core/dvb_math.h
@@ -30,9 +30,10 @@
  * to use rational values you can use the following method:
  *   intlog2(value) = intlog2(value * 2^x) - x * 2^24
  *
- * example: intlog2(8) will give 3 << 24 = 3 * 2^24
- * example: intlog2(9) will give 3 << 24 + ... = 3.16... * 2^24
- * example: intlog2(1.5) = intlog2(3) - 2^24 = 0.584... * 2^24
+ * Some usecase examples:
+ *	intlog2(8) will give 3 << 24 = 3 * 2^24
+ *	intlog2(9) will give 3 << 24 + ... = 3.16... * 2^24
+ *	intlog2(1.5) = intlog2(3) - 2^24 = 0.584... * 2^24
  *
  * @param value The value (must be != 0)
  * @return log2(value) * 2^24
@@ -45,7 +46,8 @@ extern unsigned int intlog2(u32 value);
  * to use rational values you can use the following method:
  *   intlog10(value) = intlog10(value * 10^x) - x * 2^24
  *
- * example: intlog10(1000) will give 3 << 24 = 3 * 2^24
+ * An usecase example:
+ *	intlog10(1000) will give 3 << 24 = 3 * 2^24
  *   due to the implementation intlog10(1000) might be not exactly 3 * 2^24
  *
  * look at intlog2 for similar examples
-- 
cgit v1.2.3


From 5240f4e68d4209588194ea1db60f7c822e2a1c6f Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Date: Sat, 22 Aug 2015 05:02:21 -0300
Subject: [media] DocBook/media/Makefile: Avoid make htmldocs to fail

If make is called twice like that:
	make V=1 DOCBOOKS=device-drivers.xml htmldocs

Make will fail with:

	make -f ./scripts/Makefile.build obj=scripts/basic
	rm -f .tmp_quiet_recordmcount
	make -f ./scripts/Makefile.build obj=scripts build_docproc
	make -f ./scripts/Makefile.build obj=Documentation/DocBook htmldocs
	rm -rf Documentation/DocBook/index.html; echo '<h1>Linux Kernel HTML Documentation</h1>' >> Documentation/DocBook/index.html && echo '<h2>Kernel Version: 4.2.0-rc2</h2>' >> Documentation/DocBook/index.html && cat Documentation/DocBook/device-drivers.html >> Documentation/DocBook/index.html
	cp ./Documentation/DocBook//bayer.png ./Documentation/DocBook//constraints.png ./Documentation/DocBook//crop.gif ./Documentation/DocBook//dvbstb.png ./Documentation/DocBook//fieldseq_bt.gif ./Documentation/DocBook//fieldseq_tb.gif ./Documentation/DocBook//nv12mt.gif ./Documentation/DocBook//nv12mt_example.gif ./Documentation/DocBook//pipeline.png ./Documentation/DocBook//selection.png ./Documentation/DocBook//vbi_525.gif ./Documentation/DocBook//vbi_625.gif ./Documentation/DocBook//vbi_hsync.gif ./Documentation/DocBook/media/*.svg ./Documentation/DocBook/media/v4l/*.svg ./Documentation/DocBook//media_api
	cp: target './Documentation/DocBook//media_api' is not a directory
	Documentation/DocBook/Makefile:53: recipe for target 'htmldocs' failed

Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Acked-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/DocBook/media/Makefile | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/DocBook/media/Makefile b/Documentation/DocBook/media/Makefile
index 23996f88cd58..08527e7ea4d0 100644
--- a/Documentation/DocBook/media/Makefile
+++ b/Documentation/DocBook/media/Makefile
@@ -199,7 +199,8 @@ DVB_DOCUMENTED = \
 #
 
 install_media_images = \
-	$(Q)-cp $(OBJIMGFILES) $(MEDIA_SRC_DIR)/*.svg $(MEDIA_SRC_DIR)/v4l/*.svg $(MEDIA_OBJ_DIR)/media_api
+	$(Q)-mkdir $(MEDIA_OBJ_DIR)/media_api; \
+	cp $(OBJIMGFILES) $(MEDIA_SRC_DIR)/*.svg $(MEDIA_SRC_DIR)/v4l/*.svg $(MEDIA_OBJ_DIR)/media_api
 
 $(MEDIA_OBJ_DIR)/%: $(MEDIA_SRC_DIR)/%.b64
 	$(Q)base64 -d $< >$@
-- 
cgit v1.2.3


From fbefb1a87c7e6e24df6ca5b42b42985e2680c2ea Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Date: Sat, 22 Aug 2015 07:09:29 -0300
Subject: [media] DocBook: add dvb_ca_en50221.h to documentation

There are already some tags at dvb_ca_en50221.h, but using a
different format. Convert them, fix a few entries and add
to the device-drivers DocBook.

Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Acked-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/DocBook/device-drivers.tmpl |   2 +-
 drivers/media/dvb-core/dvb_ca_en50221.c   | 167 +++++++++++++++---------------
 drivers/media/dvb-core/dvb_ca_en50221.h   |  34 +++---
 3 files changed, 102 insertions(+), 101 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/DocBook/device-drivers.tmpl b/Documentation/DocBook/device-drivers.tmpl
index e3e0f4880770..e0c358760344 100644
--- a/Documentation/DocBook/device-drivers.tmpl
+++ b/Documentation/DocBook/device-drivers.tmpl
@@ -227,6 +227,7 @@ X!Isound/sound_firmware.c
 !Iinclude/media/v4l2-of.h
 !Iinclude/media/v4l2-subdev.h
 !Iinclude/media/rc-core.h
+!Idrivers/media/dvb-core/dvb_ca_en50221.h
 <!-- FIXME: Removed for now due to document generation inconsistency
 X!Iinclude/media/v4l2-ctrls.h
 X!Iinclude/media/v4l2-dv-timings.h
@@ -240,7 +241,6 @@ X!Idrivers/media/dvb-core/dvb_frontend.h
 X!Idrivers/media/dvb-core/dvbdev.h
 X!Edrivers/media/dvb-core/dvb_net.c
 X!Idrivers/media/dvb-core/dvb_ringbuffer.h
-X!Idrivers/media/dvb-core/dvb_ca_en50221.h
 X!Idrivers/media/dvb-core/dvb_math.h
 -->
 
diff --git a/drivers/media/dvb-core/dvb_ca_en50221.c b/drivers/media/dvb-core/dvb_ca_en50221.c
index 72937756f60c..fb66184dc9b6 100644
--- a/drivers/media/dvb-core/dvb_ca_en50221.c
+++ b/drivers/media/dvb-core/dvb_ca_en50221.c
@@ -169,10 +169,10 @@ static int dvb_ca_en50221_write_data(struct dvb_ca_private *ca, int slot, u8 * e
 /**
  * Safely find needle in haystack.
  *
- * @param haystack Buffer to look in.
- * @param hlen Number of bytes in haystack.
- * @param needle Buffer to find.
- * @param nlen Number of bytes in needle.
+ * @haystack: Buffer to look in.
+ * @hlen: Number of bytes in haystack.
+ * @needle: Buffer to find.
+ * @nlen: Number of bytes in needle.
  * @return Pointer into haystack needle was found at, or NULL if not found.
  */
 static char *findstr(char * haystack, int hlen, char * needle, int nlen)
@@ -197,7 +197,7 @@ static char *findstr(char * haystack, int hlen, char * needle, int nlen)
 
 
 /**
- * Check CAM status.
+ * dvb_ca_en50221_check_camstatus - Check CAM status.
  */
 static int dvb_ca_en50221_check_camstatus(struct dvb_ca_private *ca, int slot)
 {
@@ -240,13 +240,13 @@ static int dvb_ca_en50221_check_camstatus(struct dvb_ca_private *ca, int slot)
 
 
 /**
- * Wait for flags to become set on the STATUS register on a CAM interface,
- * checking for errors and timeout.
+ * dvb_ca_en50221_wait_if_status - Wait for flags to become set on the STATUS
+ *	 register on a CAM interface, checking for errors and timeout.
  *
- * @param ca CA instance.
- * @param slot Slot on interface.
- * @param waitfor Flags to wait for.
- * @param timeout_ms Timeout in milliseconds.
+ * @ca: CA instance.
+ * @slot: Slot on interface.
+ * @waitfor: Flags to wait for.
+ * @timeout_ms: Timeout in milliseconds.
  *
  * @return 0 on success, nonzero on error.
  */
@@ -290,10 +290,10 @@ static int dvb_ca_en50221_wait_if_status(struct dvb_ca_private *ca, int slot,
 
 
 /**
- * Initialise the link layer connection to a CAM.
+ * dvb_ca_en50221_link_init - Initialise the link layer connection to a CAM.
  *
- * @param ca CA instance.
- * @param slot Slot id.
+ * @ca: CA instance.
+ * @slot: Slot id.
  *
  * @return 0 on success, nonzero on failure.
  */
@@ -346,14 +346,14 @@ static int dvb_ca_en50221_link_init(struct dvb_ca_private *ca, int slot)
 }
 
 /**
- * Read a tuple from attribute memory.
+ * dvb_ca_en50221_read_tuple - Read a tuple from attribute memory.
  *
- * @param ca CA instance.
- * @param slot Slot id.
- * @param address Address to read from. Updated.
- * @param tupleType Tuple id byte. Updated.
- * @param tupleLength Tuple length. Updated.
- * @param tuple Dest buffer for tuple (must be 256 bytes). Updated.
+ * @ca: CA instance.
+ * @slot: Slot id.
+ * @address: Address to read from. Updated.
+ * @tupleType: Tuple id byte. Updated.
+ * @tupleLength: Tuple length. Updated.
+ * @tuple: Dest buffer for tuple (must be 256 bytes). Updated.
  *
  * @return 0 on success, nonzero on error.
  */
@@ -399,11 +399,11 @@ static int dvb_ca_en50221_read_tuple(struct dvb_ca_private *ca, int slot,
 
 
 /**
- * Parse attribute memory of a CAM module, extracting Config register, and checking
- * it is a DVB CAM module.
+ * dvb_ca_en50221_parse_attributes - Parse attribute memory of a CAM module,
+ *	extracting Config register, and checking it is a DVB CAM module.
  *
- * @param ca CA instance.
- * @param slot Slot id.
+ * @ca: CA instance.
+ * @slot: Slot id.
  *
  * @return 0 on success, <0 on failure.
  */
@@ -546,10 +546,10 @@ static int dvb_ca_en50221_parse_attributes(struct dvb_ca_private *ca, int slot)
 
 
 /**
- * Set CAM's configoption correctly.
+ * dvb_ca_en50221_set_configoption - Set CAM's configoption correctly.
  *
- * @param ca CA instance.
- * @param slot Slot containing the CAM.
+ * @ca: CA instance.
+ * @slot: Slot containing the CAM.
  */
 static int dvb_ca_en50221_set_configoption(struct dvb_ca_private *ca, int slot)
 {
@@ -574,15 +574,16 @@ static int dvb_ca_en50221_set_configoption(struct dvb_ca_private *ca, int slot)
 
 
 /**
- * This function talks to an EN50221 CAM control interface. It reads a buffer of
- * data from the CAM. The data can either be stored in a supplied buffer, or
- * automatically be added to the slot's rx_buffer.
+ * dvb_ca_en50221_read_data - This function talks to an EN50221 CAM control
+ *	interface. It reads a buffer of data from the CAM. The data can either
+ *	be stored in a supplied buffer, or automatically be added to the slot's
+ *	rx_buffer.
  *
- * @param ca CA instance.
- * @param slot Slot to read from.
- * @param ebuf If non-NULL, the data will be written to this buffer. If NULL,
+ * @ca: CA instance.
+ * @slot: Slot to read from.
+ * @ebuf: If non-NULL, the data will be written to this buffer. If NULL,
  * the data will be added into the buffering system as a normal fragment.
- * @param ecount Size of ebuf. Ignored if ebuf is NULL.
+ * @ecount: Size of ebuf. Ignored if ebuf is NULL.
  *
  * @return Number of bytes read, or < 0 on error
  */
@@ -698,14 +699,14 @@ exit:
 
 
 /**
- * This function talks to an EN50221 CAM control interface. It writes a buffer of data
- * to a CAM.
+ * dvb_ca_en50221_write_data - This function talks to an EN50221 CAM control
+ *				interface. It writes a buffer of data to a CAM.
  *
- * @param ca CA instance.
- * @param slot Slot to write to.
- * @param ebuf The data in this buffer is treated as a complete link-level packet to
+ * @ca: CA instance.
+ * @slot: Slot to write to.
+ * @ebuf: The data in this buffer is treated as a complete link-level packet to
  * be written.
- * @param count Size of ebuf.
+ * @count: Size of ebuf.
  *
  * @return Number of bytes written, or < 0 on error.
  */
@@ -790,10 +791,10 @@ EXPORT_SYMBOL(dvb_ca_en50221_camchange_irq);
 
 
 /**
- * A CAM has been removed => shut it down.
+ * dvb_ca_en50221_camready_irq - A CAM has been removed => shut it down.
  *
- * @param ca CA instance.
- * @param slot Slot to shut down.
+ * @ca: CA instance.
+ * @slot: Slot to shut down.
  */
 static int dvb_ca_en50221_slot_shutdown(struct dvb_ca_private *ca, int slot)
 {
@@ -815,11 +816,11 @@ EXPORT_SYMBOL(dvb_ca_en50221_camready_irq);
 
 
 /**
- * A CAMCHANGE IRQ has occurred.
+ * dvb_ca_en50221_camready_irq - A CAMCHANGE IRQ has occurred.
  *
- * @param ca CA instance.
- * @param slot Slot concerned.
- * @param change_type One of the DVB_CA_CAMCHANGE_* values.
+ * @ca: CA instance.
+ * @slot: Slot concerned.
+ * @change_type: One of the DVB_CA_CAMCHANGE_* values.
  */
 void dvb_ca_en50221_camchange_irq(struct dvb_ca_en50221 *pubca, int slot, int change_type)
 {
@@ -844,10 +845,10 @@ EXPORT_SYMBOL(dvb_ca_en50221_frda_irq);
 
 
 /**
- * A CAMREADY IRQ has occurred.
+ * dvb_ca_en50221_camready_irq - A CAMREADY IRQ has occurred.
  *
- * @param ca CA instance.
- * @param slot Slot concerned.
+ * @ca: CA instance.
+ * @slot: Slot concerned.
  */
 void dvb_ca_en50221_camready_irq(struct dvb_ca_en50221 *pubca, int slot)
 {
@@ -865,8 +866,8 @@ void dvb_ca_en50221_camready_irq(struct dvb_ca_en50221 *pubca, int slot)
 /**
  * An FR or DA IRQ has occurred.
  *
- * @param ca CA instance.
- * @param slot Slot concerned.
+ * @ca: CA instance.
+ * @slot: Slot concerned.
  */
 void dvb_ca_en50221_frda_irq(struct dvb_ca_en50221 *pubca, int slot)
 {
@@ -899,7 +900,7 @@ void dvb_ca_en50221_frda_irq(struct dvb_ca_en50221 *pubca, int slot)
 /**
  * Wake up the DVB CA thread
  *
- * @param ca CA instance.
+ * @ca: CA instance.
  */
 static void dvb_ca_en50221_thread_wakeup(struct dvb_ca_private *ca)
 {
@@ -914,7 +915,7 @@ static void dvb_ca_en50221_thread_wakeup(struct dvb_ca_private *ca)
 /**
  * Update the delay used by the thread.
  *
- * @param ca CA instance.
+ * @ca: CA instance.
  */
 static void dvb_ca_en50221_thread_update_delay(struct dvb_ca_private *ca)
 {
@@ -1177,10 +1178,10 @@ static int dvb_ca_en50221_thread(void *data)
  * Real ioctl implementation.
  * NOTE: CA_SEND_MSG/CA_GET_MSG ioctls have userspace buffers passed to them.
  *
- * @param inode Inode concerned.
- * @param file File concerned.
- * @param cmd IOCTL command.
- * @param arg Associated argument.
+ * @inode: Inode concerned.
+ * @file: File concerned.
+ * @cmd: IOCTL command.
+ * @arg: Associated argument.
  *
  * @return 0 on success, <0 on error.
  */
@@ -1258,10 +1259,10 @@ out_unlock:
 /**
  * Wrapper for ioctl implementation.
  *
- * @param inode Inode concerned.
- * @param file File concerned.
- * @param cmd IOCTL command.
- * @param arg Associated argument.
+ * @inode: Inode concerned.
+ * @file: File concerned.
+ * @cmd: IOCTL command.
+ * @arg: Associated argument.
  *
  * @return 0 on success, <0 on error.
  */
@@ -1275,10 +1276,10 @@ static long dvb_ca_en50221_io_ioctl(struct file *file,
 /**
  * Implementation of write() syscall.
  *
- * @param file File structure.
- * @param buf Source buffer.
- * @param count Size of source buffer.
- * @param ppos Position in file (ignored).
+ * @file: File structure.
+ * @buf: Source buffer.
+ * @count: Size of source buffer.
+ * @ppos: Position in file (ignored).
  *
  * @return Number of bytes read, or <0 on error.
  */
@@ -1416,10 +1417,10 @@ nextslot:
 /**
  * Implementation of read() syscall.
  *
- * @param file File structure.
- * @param buf Destination buffer.
- * @param count Size of destination buffer.
- * @param ppos Position in file (ignored).
+ * @file: File structure.
+ * @buf: Destination buffer.
+ * @count: Size of destination buffer.
+ * @ppos: Position in file (ignored).
  *
  * @return Number of bytes read, or <0 on error.
  */
@@ -1519,8 +1520,8 @@ exit:
 /**
  * Implementation of file open syscall.
  *
- * @param inode Inode concerned.
- * @param file File concerned.
+ * @inode: Inode concerned.
+ * @file: File concerned.
  *
  * @return 0 on success, <0 on failure.
  */
@@ -1564,8 +1565,8 @@ static int dvb_ca_en50221_io_open(struct inode *inode, struct file *file)
 /**
  * Implementation of file close syscall.
  *
- * @param inode Inode concerned.
- * @param file File concerned.
+ * @inode: Inode concerned.
+ * @file: File concerned.
  *
  * @return 0 on success, <0 on failure.
  */
@@ -1592,8 +1593,8 @@ static int dvb_ca_en50221_io_release(struct inode *inode, struct file *file)
 /**
  * Implementation of poll() syscall.
  *
- * @param file File concerned.
- * @param wait poll wait table.
+ * @file: File concerned.
+ * @wait: poll wait table.
  *
  * @return Standard poll mask.
  */
@@ -1656,10 +1657,10 @@ static const struct dvb_device dvbdev_ca = {
 /**
  * Initialise a new DVB CA EN50221 interface device.
  *
- * @param dvb_adapter DVB adapter to attach the new CA device to.
- * @param ca The dvb_ca instance.
- * @param flags Flags describing the CA device (DVB_CA_FLAG_*).
- * @param slot_count Number of slots supported.
+ * @dvb_adapter: DVB adapter to attach the new CA device to.
+ * @ca: The dvb_ca instance.
+ * @flags: Flags describing the CA device (DVB_CA_FLAG_*).
+ * @slot_count: Number of slots supported.
  *
  * @return 0 on success, nonzero on failure
  */
@@ -1743,8 +1744,8 @@ EXPORT_SYMBOL(dvb_ca_en50221_release);
 /**
  * Release a DVB CA EN50221 interface device.
  *
- * @param ca_dev The dvb_device_t instance for the CA device.
- * @param ca The associated dvb_ca instance.
+ * @ca_dev: The dvb_device_t instance for the CA device.
+ * @ca: The associated dvb_ca instance.
  */
 void dvb_ca_en50221_release(struct dvb_ca_en50221 *pubca)
 {
diff --git a/drivers/media/dvb-core/dvb_ca_en50221.h b/drivers/media/dvb-core/dvb_ca_en50221.h
index 7df2e141187a..aba3b4fbd704 100644
--- a/drivers/media/dvb-core/dvb_ca_en50221.h
+++ b/drivers/media/dvb-core/dvb_ca_en50221.h
@@ -83,27 +83,27 @@ struct dvb_ca_en50221 {
 /* Functions for reporting IRQ events */
 
 /**
- * A CAMCHANGE IRQ has occurred.
+ * dvb_ca_en50221_camchange_irq - A CAMCHANGE IRQ has occurred.
  *
- * @param ca CA instance.
- * @param slot Slot concerned.
- * @param change_type One of the DVB_CA_CAMCHANGE_* values
+ * @pubca: CA instance.
+ * @slot: Slot concerned.
+ * @change_type: One of the DVB_CA_CAMCHANGE_* values
  */
 void dvb_ca_en50221_camchange_irq(struct dvb_ca_en50221* pubca, int slot, int change_type);
 
 /**
- * A CAMREADY IRQ has occurred.
+ * dvb_ca_en50221_camready_irq - A CAMREADY IRQ has occurred.
  *
- * @param ca CA instance.
- * @param slot Slot concerned.
+ * @pubca: CA instance.
+ * @slot: Slot concerned.
  */
 void dvb_ca_en50221_camready_irq(struct dvb_ca_en50221* pubca, int slot);
 
 /**
- * An FR or a DA IRQ has occurred.
+ * dvb_ca_en50221_frda_irq - An FR or a DA IRQ has occurred.
  *
- * @param ca CA instance.
- * @param slot Slot concerned.
+ * @ca: CA instance.
+ * @slot: Slot concerned.
  */
 void dvb_ca_en50221_frda_irq(struct dvb_ca_en50221* ca, int slot);
 
@@ -113,21 +113,21 @@ void dvb_ca_en50221_frda_irq(struct dvb_ca_en50221* ca, int slot);
 /* Initialisation/shutdown functions */
 
 /**
- * Initialise a new DVB CA device.
+ * dvb_ca_en50221_init - Initialise a new DVB CA device.
  *
- * @param dvb_adapter DVB adapter to attach the new CA device to.
- * @param ca The dvb_ca instance.
- * @param flags Flags describing the CA device (DVB_CA_EN50221_FLAG_*).
- * @param slot_count Number of slots supported.
+ * @dvb_adapter: DVB adapter to attach the new CA device to.
+ * @ca: The dvb_ca instance.
+ * @flags: Flags describing the CA device (DVB_CA_EN50221_FLAG_*).
+ * @slot_count: Number of slots supported.
  *
  * @return 0 on success, nonzero on failure
  */
 extern int dvb_ca_en50221_init(struct dvb_adapter *dvb_adapter, struct dvb_ca_en50221* ca, int flags, int slot_count);
 
 /**
- * Release a DVB CA device.
+ * dvb_ca_en50221_release - Release a DVB CA device.
  *
- * @param ca The associated dvb_ca instance.
+ * @ca: The associated dvb_ca instance.
  */
 extern void dvb_ca_en50221_release(struct dvb_ca_en50221* ca);
 
-- 
cgit v1.2.3


From 4f1c1868e8d4e493d66b8e11eebbc95abdbaf444 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Date: Sat, 22 Aug 2015 07:19:20 -0300
Subject: [media] DocBook: add dvb_frontend.h to documentation

There are already some comments at dvb_frontend.h that are ready
for DocBook, although not properly formatted.

Convert them, and add this file to the device-drivers DocBook.

Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Acked-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/DocBook/device-drivers.tmpl |  2 +-
 drivers/media/dvb-core/dvb_frontend.h     | 68 +++++++++++++++----------------
 2 files changed, 34 insertions(+), 36 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/DocBook/device-drivers.tmpl b/Documentation/DocBook/device-drivers.tmpl
index e0c358760344..fb5c16a24e4b 100644
--- a/Documentation/DocBook/device-drivers.tmpl
+++ b/Documentation/DocBook/device-drivers.tmpl
@@ -228,6 +228,7 @@ X!Isound/sound_firmware.c
 !Iinclude/media/v4l2-subdev.h
 !Iinclude/media/rc-core.h
 !Idrivers/media/dvb-core/dvb_ca_en50221.h
+!Idrivers/media/dvb-core/dvb_frontend.h
 <!-- FIXME: Removed for now due to document generation inconsistency
 X!Iinclude/media/v4l2-ctrls.h
 X!Iinclude/media/v4l2-dv-timings.h
@@ -237,7 +238,6 @@ X!Iinclude/media/videobuf2-memops.h
 X!Iinclude/media/videobuf2-core.h
 X!Iinclude/media/lirc.h
 X!Edrivers/media/dvb-core/dvb_demux.c
-X!Idrivers/media/dvb-core/dvb_frontend.h
 X!Idrivers/media/dvb-core/dvbdev.h
 X!Edrivers/media/dvb-core/dvb_net.c
 X!Idrivers/media/dvb-core/dvb_ringbuffer.h
diff --git a/drivers/media/dvb-core/dvb_frontend.h b/drivers/media/dvb-core/dvb_frontend.h
index 4816947294fe..d20d3da97201 100644
--- a/drivers/media/dvb-core/dvb_frontend.h
+++ b/drivers/media/dvb-core/dvb_frontend.h
@@ -121,30 +121,28 @@ enum tuner_param {
 	DVBFE_TUNER_DUMMY		= (1 << 31)
 };
 
-/*
- * ALGO_HW: (Hardware Algorithm)
- * ----------------------------------------------------------------
- * Devices that support this algorithm do everything in hardware
- * and no software support is needed to handle them.
- * Requesting these devices to LOCK is the only thing required,
- * device is supposed to do everything in the hardware.
+/**
+ * enum dvbfe_algo - defines the algorithm used to tune into a channel
+ *
+ * @DVBFE_ALGO_HW: (Hardware Algorithm)
+ *	Devices that support this algorithm do everything in hardware
+ *	and no software support is needed to handle them.
+ *	Requesting these devices to LOCK is the only thing required,
+ *	device is supposed to do everything in the hardware.
  *
- * ALGO_SW: (Software Algorithm)
- * ----------------------------------------------------------------
+ * @DVBFE_ALGO_SW: (Software Algorithm)
  * These are dumb devices, that require software to do everything
  *
- * ALGO_CUSTOM: (Customizable Agorithm)
- * ----------------------------------------------------------------
- * Devices having this algorithm can be customized to have specific
- * algorithms in the frontend driver, rather than simply doing a
- * software zig-zag. In this case the zigzag maybe hardware assisted
- * or it maybe completely done in hardware. In all cases, usage of
- * this algorithm, in conjunction with the search and track
- * callbacks, utilizes the driver specific algorithm.
+ * @DVBFE_ALGO_CUSTOM: (Customizable Agorithm)
+ *	Devices having this algorithm can be customized to have specific
+ *	algorithms in the frontend driver, rather than simply doing a
+ *	software zig-zag. In this case the zigzag maybe hardware assisted
+ *	or it maybe completely done in hardware. In all cases, usage of
+ *	this algorithm, in conjunction with the search and track
+ *	callbacks, utilizes the driver specific algorithm.
  *
- * ALGO_RECOVERY: (Recovery Algorithm)
- * ----------------------------------------------------------------
- * These devices have AUTO recovery capabilities from LOCK failure
+ * @DVBFE_ALGO_RECOVERY: (Recovery Algorithm)
+ *	These devices have AUTO recovery capabilities from LOCK failure
  */
 enum dvbfe_algo {
 	DVBFE_ALGO_HW			= (1 <<  0),
@@ -162,27 +160,27 @@ struct tuner_state {
 	u32 refclock;
 };
 
-/*
- * search callback possible return status
+/**
+ * enum dvbfe_search - search callback possible return status
  *
- * DVBFE_ALGO_SEARCH_SUCCESS
- * The frontend search algorithm completed and returned successfully
+ * @DVBFE_ALGO_SEARCH_SUCCESS:
+ *	The frontend search algorithm completed and returned successfully
  *
- * DVBFE_ALGO_SEARCH_ASLEEP
- * The frontend search algorithm is sleeping
+ * @DVBFE_ALGO_SEARCH_ASLEEP:
+ *	The frontend search algorithm is sleeping
  *
- * DVBFE_ALGO_SEARCH_FAILED
- * The frontend search for a signal failed
+ * @DVBFE_ALGO_SEARCH_FAILED:
+ *	The frontend search for a signal failed
  *
- * DVBFE_ALGO_SEARCH_INVALID
- * The frontend search algorith was probably supplied with invalid
- * parameters and the search is an invalid one
+ * @DVBFE_ALGO_SEARCH_INVALID:
+ *	The frontend search algorith was probably supplied with invalid
+ *	parameters and the search is an invalid one
  *
- * DVBFE_ALGO_SEARCH_ERROR
- * The frontend search algorithm failed due to some error
+ * @DVBFE_ALGO_SEARCH_ERROR:
+ *	The frontend search algorithm failed due to some error
  *
- * DVBFE_ALGO_SEARCH_AGAIN
- * The frontend search algorithm was requested to search again
+ * @DVBFE_ALGO_SEARCH_AGAIN:
+ *	The frontend search algorithm was requested to search again
  */
 enum dvbfe_search {
 	DVBFE_ALGO_SEARCH_SUCCESS	= (1 <<  0),
-- 
cgit v1.2.3


From e08bb6f79fa0b9bf94ea44a306d1528172d788af Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Date: Sat, 22 Aug 2015 07:37:28 -0300
Subject: [media] DocBook: add dvb_math.h to documentation

There are already some comments at dvb_math.h that are ready
for DocBook, although not properly formatted.

Convert them, fix some issues and add this file to
the device-drivers DocBook.

Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Acked-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/DocBook/device-drivers.tmpl |  2 +-
 drivers/media/dvb-core/dvb_math.h         | 15 +++++++++------
 2 files changed, 10 insertions(+), 7 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/DocBook/device-drivers.tmpl b/Documentation/DocBook/device-drivers.tmpl
index fb5c16a24e4b..21fc7684d706 100644
--- a/Documentation/DocBook/device-drivers.tmpl
+++ b/Documentation/DocBook/device-drivers.tmpl
@@ -229,6 +229,7 @@ X!Isound/sound_firmware.c
 !Iinclude/media/rc-core.h
 !Idrivers/media/dvb-core/dvb_ca_en50221.h
 !Idrivers/media/dvb-core/dvb_frontend.h
+!Idrivers/media/dvb-core/dvb_math.h
 <!-- FIXME: Removed for now due to document generation inconsistency
 X!Iinclude/media/v4l2-ctrls.h
 X!Iinclude/media/v4l2-dv-timings.h
@@ -241,7 +242,6 @@ X!Edrivers/media/dvb-core/dvb_demux.c
 X!Idrivers/media/dvb-core/dvbdev.h
 X!Edrivers/media/dvb-core/dvb_net.c
 X!Idrivers/media/dvb-core/dvb_ringbuffer.h
-X!Idrivers/media/dvb-core/dvb_math.h
 -->
 
   </chapter>
diff --git a/drivers/media/dvb-core/dvb_math.h b/drivers/media/dvb-core/dvb_math.h
index f586aa001ede..34dc1df03cab 100644
--- a/drivers/media/dvb-core/dvb_math.h
+++ b/drivers/media/dvb-core/dvb_math.h
@@ -25,7 +25,9 @@
 #include <linux/types.h>
 
 /**
- * computes log2 of a value; the result is shifted left by 24 bits
+ * cintlog2 - computes log2 of a value; the result is shifted left by 24 bits
+ *
+ * @value: The value (must be != 0)
  *
  * to use rational values you can use the following method:
  *   intlog2(value) = intlog2(value * 2^x) - x * 2^24
@@ -35,13 +37,15 @@
  *	intlog2(9) will give 3 << 24 + ... = 3.16... * 2^24
  *	intlog2(1.5) = intlog2(3) - 2^24 = 0.584... * 2^24
  *
- * @param value The value (must be != 0)
- * @return log2(value) * 2^24
+ *
+ * return: log2(value) * 2^24
  */
 extern unsigned int intlog2(u32 value);
 
 /**
- * computes log10 of a value; the result is shifted left by 24 bits
+ * intlog10 - computes log10 of a value; the result is shifted left by 24 bits
+ *
+ * @value: The value (must be != 0)
  *
  * to use rational values you can use the following method:
  *   intlog10(value) = intlog10(value * 10^x) - x * 2^24
@@ -52,8 +56,7 @@ extern unsigned int intlog2(u32 value);
  *
  * look at intlog2 for similar examples
  *
- * @param value The value (must be != 0)
- * @return log10(value) * 2^24
+ * return: log10(value) * 2^24
  */
 extern unsigned int intlog10(u32 value);
 
-- 
cgit v1.2.3


From 2a86e373e0cf926f5cb41725f1b0ce3874e7b1cf Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Date: Sat, 22 Aug 2015 07:38:51 -0300
Subject: [media] DocBook: add dvb_ringbuffer.h to documentation

There are already some comments at dvb_ringbuffer.h that are ready
for DocBook, although not properly formatted.

Convert them, fix some issues and add this file to
the device-drivers DocBook.

While here, put multi-line comments on the right format.

Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Acked-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/DocBook/device-drivers.tmpl |   2 +-
 drivers/media/dvb-core/dvb_ringbuffer.h   | 135 +++++++++++++++++-------------
 2 files changed, 76 insertions(+), 61 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/DocBook/device-drivers.tmpl b/Documentation/DocBook/device-drivers.tmpl
index 21fc7684d706..030b3803cc68 100644
--- a/Documentation/DocBook/device-drivers.tmpl
+++ b/Documentation/DocBook/device-drivers.tmpl
@@ -230,6 +230,7 @@ X!Isound/sound_firmware.c
 !Idrivers/media/dvb-core/dvb_ca_en50221.h
 !Idrivers/media/dvb-core/dvb_frontend.h
 !Idrivers/media/dvb-core/dvb_math.h
+!Idrivers/media/dvb-core/dvb_ringbuffer.h
 <!-- FIXME: Removed for now due to document generation inconsistency
 X!Iinclude/media/v4l2-ctrls.h
 X!Iinclude/media/v4l2-dv-timings.h
@@ -241,7 +242,6 @@ X!Iinclude/media/lirc.h
 X!Edrivers/media/dvb-core/dvb_demux.c
 X!Idrivers/media/dvb-core/dvbdev.h
 X!Edrivers/media/dvb-core/dvb_net.c
-X!Idrivers/media/dvb-core/dvb_ringbuffer.h
 -->
 
   </chapter>
diff --git a/drivers/media/dvb-core/dvb_ringbuffer.h b/drivers/media/dvb-core/dvb_ringbuffer.h
index 9e1e11b7c39c..3ebc2d34b4a2 100644
--- a/drivers/media/dvb-core/dvb_ringbuffer.h
+++ b/drivers/media/dvb-core/dvb_ringbuffer.h
@@ -45,33 +45,33 @@ struct dvb_ringbuffer {
 
 
 /*
-** Notes:
-** ------
-** (1) For performance reasons read and write routines don't check buffer sizes
-**     and/or number of bytes free/available. This has to be done before these
-**     routines are called. For example:
-**
-**     *** write <buflen> bytes ***
-**     free = dvb_ringbuffer_free(rbuf);
-**     if (free >= buflen)
-**         count = dvb_ringbuffer_write(rbuf, buffer, buflen);
-**     else
-**         ...
-**
-**     *** read min. 1000, max. <bufsize> bytes ***
-**     avail = dvb_ringbuffer_avail(rbuf);
-**     if (avail >= 1000)
-**         count = dvb_ringbuffer_read(rbuf, buffer, min(avail, bufsize));
-**     else
-**         ...
-**
-** (2) If there is exactly one reader and one writer, there is no need
-**     to lock read or write operations.
-**     Two or more readers must be locked against each other.
-**     Flushing the buffer counts as a read operation.
-**     Resetting the buffer counts as a read and write operation.
-**     Two or more writers must be locked against each other.
-*/
+ * Notes:
+ * ------
+ * (1) For performance reasons read and write routines don't check buffer sizes
+ *     and/or number of bytes free/available. This has to be done before these
+ *     routines are called. For example:
+ *
+ *     *** write @buflen: bytes ***
+ *     free = dvb_ringbuffer_free(rbuf);
+ *     if (free >= buflen)
+ *         count = dvb_ringbuffer_write(rbuf, buffer, buflen);
+ *     else
+ *         ...
+ *
+ *     *** read min. 1000, max. @bufsize: bytes ***
+ *     avail = dvb_ringbuffer_avail(rbuf);
+ *     if (avail >= 1000)
+ *         count = dvb_ringbuffer_read(rbuf, buffer, min(avail, bufsize));
+ *     else
+ *         ...
+ *
+ * (2) If there is exactly one reader and one writer, there is no need
+ *     to lock read or write operations.
+ *     Two or more readers must be locked against each other.
+ *     Flushing the buffer counts as a read operation.
+ *     Resetting the buffer counts as a read and write operation.
+ *     Two or more writers must be locked against each other.
+ */
 
 /* initialize ring buffer, lock and queue */
 extern void dvb_ringbuffer_init(struct dvb_ringbuffer *rbuf, void *data, size_t len);
@@ -87,9 +87,9 @@ extern ssize_t dvb_ringbuffer_avail(struct dvb_ringbuffer *rbuf);
 
 
 /*
-** Reset the read and write pointers to zero and flush the buffer
-** This counts as a read and write operation
-*/
+ * Reset the read and write pointers to zero and flush the buffer
+ * This counts as a read and write operation
+ */
 extern void dvb_ringbuffer_reset(struct dvb_ringbuffer *rbuf);
 
 
@@ -101,19 +101,19 @@ extern void dvb_ringbuffer_flush(struct dvb_ringbuffer *rbuf);
 /* flush buffer protected by spinlock and wake-up waiting task(s) */
 extern void dvb_ringbuffer_flush_spinlock_wakeup(struct dvb_ringbuffer *rbuf);
 
-/* peek at byte <offs> in the buffer */
+/* peek at byte @offs: in the buffer */
 #define DVB_RINGBUFFER_PEEK(rbuf,offs)	\
 			(rbuf)->data[((rbuf)->pread+(offs))%(rbuf)->size]
 
-/* advance read ptr by <num> bytes */
+/* advance read ptr by @num: bytes */
 #define DVB_RINGBUFFER_SKIP(rbuf,num)	\
 			(rbuf)->pread=((rbuf)->pread+(num))%(rbuf)->size
 
 /*
-** read <len> bytes from ring buffer into <buf>
-** <usermem> specifies whether <buf> resides in user space
-** returns number of bytes transferred or -EFAULT
-*/
+ * read @len: bytes from ring buffer into @buf:
+ * @usermem: specifies whether @buf: resides in user space
+ * returns number of bytes transferred or -EFAULT
+ */
 extern ssize_t dvb_ringbuffer_read_user(struct dvb_ringbuffer *rbuf,
 				   u8 __user *buf, size_t len);
 extern void dvb_ringbuffer_read(struct dvb_ringbuffer *rbuf,
@@ -127,9 +127,9 @@ extern void dvb_ringbuffer_read(struct dvb_ringbuffer *rbuf,
 			{ (rbuf)->data[(rbuf)->pwrite]=(byte); \
 			(rbuf)->pwrite=((rbuf)->pwrite+1)%(rbuf)->size; }
 /*
-** write <len> bytes to ring buffer
-** <usermem> specifies whether <buf> resides in user space
-** returns number of bytes transferred or -EFAULT
+ * write @len: bytes to ring buffer
+ * @usermem: specifies whether @buf: resides in user space
+ * returns number of bytes transferred or -EFAULT
 */
 extern ssize_t dvb_ringbuffer_write(struct dvb_ringbuffer *rbuf, const u8 *buf,
 				    size_t len);
@@ -138,48 +138,63 @@ extern ssize_t dvb_ringbuffer_write_user(struct dvb_ringbuffer *rbuf,
 
 
 /**
- * Write a packet into the ringbuffer.
+ * dvb_ringbuffer_pkt_write - Write a packet into the ringbuffer.
  *
- * <rbuf> Ringbuffer to write to.
- * <buf> Buffer to write.
- * <len> Length of buffer (currently limited to 65535 bytes max).
+ * @rbuf: Ringbuffer to write to.
+ * @buf: Buffer to write.
+ * @len: Length of buffer (currently limited to 65535 bytes max).
  * returns Number of bytes written, or -EFAULT, -ENOMEM, -EVINAL.
  */
 extern ssize_t dvb_ringbuffer_pkt_write(struct dvb_ringbuffer *rbuf, u8* buf,
 					size_t len);
 
 /**
- * Read from a packet in the ringbuffer. Note: unlike dvb_ringbuffer_read(), this
- * does NOT update the read pointer in the ringbuffer. You must use
- * dvb_ringbuffer_pkt_dispose() to mark a packet as no longer required.
- *
- * <rbuf> Ringbuffer concerned.
- * <idx> Packet index as returned by dvb_ringbuffer_pkt_next().
- * <offset> Offset into packet to read from.
- * <buf> Destination buffer for data.
- * <len> Size of destination buffer.
- * <usermem> Set to 1 if <buf> is in userspace.
+ * dvb_ringbuffer_pkt_read_user - Read from a packet in the ringbuffer.
+ * Note: unlike dvb_ringbuffer_read(), this does NOT update the read pointer
+ * in the ringbuffer. You must use dvb_ringbuffer_pkt_dispose() to mark a
+ * packet as no longer required.
+ *
+ * @rbuf: Ringbuffer concerned.
+ * @idx: Packet index as returned by dvb_ringbuffer_pkt_next().
+ * @offset: Offset into packet to read from.
+ * @buf: Destination buffer for data.
+ * @len: Size of destination buffer.
+ *
  * returns Number of bytes read, or -EFAULT.
  */
 extern ssize_t dvb_ringbuffer_pkt_read_user(struct dvb_ringbuffer *rbuf, size_t idx,
 				       int offset, u8 __user *buf, size_t len);
+
+/**
+ * dvb_ringbuffer_pkt_read - Read from a packet in the ringbuffer.
+ * Note: unlike dvb_ringbuffer_read_user(), this DOES update the read pointer
+ * in the ringbuffer.
+ *
+ * @rbuf: Ringbuffer concerned.
+ * @idx: Packet index as returned by dvb_ringbuffer_pkt_next().
+ * @offset: Offset into packet to read from.
+ * @buf: Destination buffer for data.
+ * @len: Size of destination buffer.
+ *
+ * returns Number of bytes read, or -EFAULT.
+ */
 extern ssize_t dvb_ringbuffer_pkt_read(struct dvb_ringbuffer *rbuf, size_t idx,
 				       int offset, u8 *buf, size_t len);
 
 /**
- * Dispose of a packet in the ring buffer.
+ * dvb_ringbuffer_pkt_dispose - Dispose of a packet in the ring buffer.
  *
- * <rbuf> Ring buffer concerned.
- * <idx> Packet index as returned by dvb_ringbuffer_pkt_next().
+ * @rbuf: Ring buffer concerned.
+ * @idx: Packet index as returned by dvb_ringbuffer_pkt_next().
  */
 extern void dvb_ringbuffer_pkt_dispose(struct dvb_ringbuffer *rbuf, size_t idx);
 
 /**
- * Get the index of the next packet in a ringbuffer.
+ * dvb_ringbuffer_pkt_next - Get the index of the next packet in a ringbuffer.
  *
- * <rbuf> Ringbuffer concerned.
- * <idx> Previous packet index, or -1 to return the first packet index.
- * <pktlen> On success, will be updated to contain the length of the packet in bytes.
+ * @rbuf: Ringbuffer concerned.
+ * @idx: Previous packet index, or -1 to return the first packet index.
+ * @pktlen: On success, will be updated to contain the length of the packet in bytes.
  * returns Packet index (if >=0), or -1 if no packets available.
  */
 extern ssize_t dvb_ringbuffer_pkt_next(struct dvb_ringbuffer *rbuf, size_t idx, size_t* pktlen);
-- 
cgit v1.2.3


From 8c2721d57a4bac055ae7bb1874a13a928277d5ff Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Date: Sat, 22 Aug 2015 08:03:49 -0300
Subject: [media] v4l2-ctrls.h: add to device-drivers DocBook

The comments there are using a wrong format. Due to that,
DocBook were unable to parse it.

Fix the tags format, and add it to device-drivers.xml.

Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Acked-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/DocBook/device-drivers.tmpl |    6 +-
 include/media/v4l2-ctrls.h                | 1011 ++++++++++++++++-------------
 2 files changed, 552 insertions(+), 465 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/DocBook/device-drivers.tmpl b/Documentation/DocBook/device-drivers.tmpl
index 030b3803cc68..5f01f7ad15dc 100644
--- a/Documentation/DocBook/device-drivers.tmpl
+++ b/Documentation/DocBook/device-drivers.tmpl
@@ -231,17 +231,13 @@ X!Isound/sound_firmware.c
 !Idrivers/media/dvb-core/dvb_frontend.h
 !Idrivers/media/dvb-core/dvb_math.h
 !Idrivers/media/dvb-core/dvb_ringbuffer.h
+!Iinclude/media/v4l2-ctrls.h
 <!-- FIXME: Removed for now due to document generation inconsistency
-X!Iinclude/media/v4l2-ctrls.h
 X!Iinclude/media/v4l2-dv-timings.h
 X!Iinclude/media/v4l2-event.h
 X!Iinclude/media/v4l2-mediabus.h
 X!Iinclude/media/videobuf2-memops.h
 X!Iinclude/media/videobuf2-core.h
-X!Iinclude/media/lirc.h
-X!Edrivers/media/dvb-core/dvb_demux.c
-X!Idrivers/media/dvb-core/dvbdev.h
-X!Edrivers/media/dvb-core/dvb_net.c
 -->
 
   </chapter>
diff --git a/include/media/v4l2-ctrls.h b/include/media/v4l2-ctrls.h
index 911f3e542834..88f736661c06 100644
--- a/include/media/v4l2-ctrls.h
+++ b/include/media/v4l2-ctrls.h
@@ -36,7 +36,8 @@ struct v4l2_subscribed_event;
 struct v4l2_fh;
 struct poll_table_struct;
 
-/** union v4l2_ctrl_ptr - A pointer to a control value.
+/**
+ * union v4l2_ctrl_ptr - A pointer to a control value.
  * @p_s32:	Pointer to a 32-bit signed value.
  * @p_s64:	Pointer to a 64-bit signed value.
  * @p_u8:	Pointer to a 8-bit unsigned value.
@@ -55,30 +56,34 @@ union v4l2_ctrl_ptr {
 	void *p;
 };
 
-/** struct v4l2_ctrl_ops - The control operations that the driver has to provide.
-  * @g_volatile_ctrl: Get a new value for this control. Generally only relevant
-  *		for volatile (and usually read-only) controls such as a control
-  *		that returns the current signal strength which changes
-  *		continuously.
-  *		If not set, then the currently cached value will be returned.
-  * @try_ctrl:	Test whether the control's value is valid. Only relevant when
-  *		the usual min/max/step checks are not sufficient.
-  * @s_ctrl:	Actually set the new control value. s_ctrl is compulsory. The
-  *		ctrl->handler->lock is held when these ops are called, so no
-  *		one else can access controls owned by that handler.
-  */
+/**
+ * struct v4l2_ctrl_ops - The control operations that the driver has to provide.
+ * @g_volatile_ctrl: Get a new value for this control. Generally only relevant
+ *		for volatile (and usually read-only) controls such as a control
+ *		that returns the current signal strength which changes
+ *		continuously.
+ *		If not set, then the currently cached value will be returned.
+ * @try_ctrl:	Test whether the control's value is valid. Only relevant when
+ *		the usual min/max/step checks are not sufficient.
+ * @s_ctrl:	Actually set the new control value. s_ctrl is compulsory. The
+ *		ctrl->handler->lock is held when these ops are called, so no
+ *		one else can access controls owned by that handler.
+ */
 struct v4l2_ctrl_ops {
 	int (*g_volatile_ctrl)(struct v4l2_ctrl *ctrl);
 	int (*try_ctrl)(struct v4l2_ctrl *ctrl);
 	int (*s_ctrl)(struct v4l2_ctrl *ctrl);
 };
 
-/** struct v4l2_ctrl_type_ops - The control type operations that the driver has to provide.
-  * @equal: return true if both values are equal.
-  * @init: initialize the value.
-  * @log: log the value.
-  * @validate: validate the value. Return 0 on success and a negative value otherwise.
-  */
+/**
+ * struct v4l2_ctrl_type_ops - The control type operations that the driver
+ * 			       has to provide.
+ *
+ * @equal: return true if both values are equal.
+ * @init: initialize the value.
+ * @log: log the value.
+ * @validate: validate the value. Return 0 on success and a negative value otherwise.
+ */
 struct v4l2_ctrl_type_ops {
 	bool (*equal)(const struct v4l2_ctrl *ctrl, u32 idx,
 		      union v4l2_ctrl_ptr ptr1,
@@ -92,74 +97,75 @@ struct v4l2_ctrl_type_ops {
 
 typedef void (*v4l2_ctrl_notify_fnc)(struct v4l2_ctrl *ctrl, void *priv);
 
-/** struct v4l2_ctrl - The control structure.
-  * @node:	The list node.
-  * @ev_subs:	The list of control event subscriptions.
-  * @handler:	The handler that owns the control.
-  * @cluster:	Point to start of cluster array.
-  * @ncontrols:	Number of controls in cluster array.
-  * @done:	Internal flag: set for each processed control.
-  * @is_new:	Set when the user specified a new value for this control. It
-  *		is also set when called from v4l2_ctrl_handler_setup. Drivers
-  *		should never set this flag.
-  * @has_changed: Set when the current value differs from the new value. Drivers
-  *		should never use this flag.
-  * @is_private: If set, then this control is private to its handler and it
-  *		will not be added to any other handlers. Drivers can set
-  *		this flag.
-  * @is_auto:   If set, then this control selects whether the other cluster
-  *		members are in 'automatic' mode or 'manual' mode. This is
-  *		used for autogain/gain type clusters. Drivers should never
-  *		set this flag directly.
-  * @is_int:    If set, then this control has a simple integer value (i.e. it
-  *		uses ctrl->val).
-  * @is_string: If set, then this control has type V4L2_CTRL_TYPE_STRING.
-  * @is_ptr:	If set, then this control is an array and/or has type >= V4L2_CTRL_COMPOUND_TYPES
-  *		and/or has type V4L2_CTRL_TYPE_STRING. In other words, struct
-  *		v4l2_ext_control uses field p to point to the data.
-  * @is_array: If set, then this control contains an N-dimensional array.
-  * @has_volatiles: If set, then one or more members of the cluster are volatile.
-  *		Drivers should never touch this flag.
-  * @call_notify: If set, then call the handler's notify function whenever the
-  *		control's value changes.
-  * @manual_mode_value: If the is_auto flag is set, then this is the value
-  *		of the auto control that determines if that control is in
-  *		manual mode. So if the value of the auto control equals this
-  *		value, then the whole cluster is in manual mode. Drivers should
-  *		never set this flag directly.
-  * @ops:	The control ops.
-  * @type_ops:	The control type ops.
-  * @id:	The control ID.
-  * @name:	The control name.
-  * @type:	The control type.
-  * @minimum:	The control's minimum value.
-  * @maximum:	The control's maximum value.
-  * @default_value: The control's default value.
-  * @step:	The control's step value for non-menu controls.
-  * @elems:	The number of elements in the N-dimensional array.
-  * @elem_size:	The size in bytes of the control.
-  * @dims:	The size of each dimension.
-  * @nr_of_dims:The number of dimensions in @dims.
-  * @menu_skip_mask: The control's skip mask for menu controls. This makes it
-  *		easy to skip menu items that are not valid. If bit X is set,
-  *		then menu item X is skipped. Of course, this only works for
-  *		menus with <= 32 menu items. There are no menus that come
-  *		close to that number, so this is OK. Should we ever need more,
-  *		then this will have to be extended to a u64 or a bit array.
-  * @qmenu:	A const char * array for all menu items. Array entries that are
-  *		empty strings ("") correspond to non-existing menu items (this
-  *		is in addition to the menu_skip_mask above). The last entry
-  *		must be NULL.
-  * @flags:	The control's flags.
-  * @cur:	The control's current value.
-  * @val:	The control's new s32 value.
-  * @val64:	The control's new s64 value.
-  * @priv:	The control's private pointer. For use by the driver. It is
-  *		untouched by the control framework. Note that this pointer is
-  *		not freed when the control is deleted. Should this be needed
-  *		then a new internal bitfield can be added to tell the framework
-  *		to free this pointer.
-  */
+/**
+ * struct v4l2_ctrl - The control structure.
+ * @node:	The list node.
+ * @ev_subs:	The list of control event subscriptions.
+ * @handler:	The handler that owns the control.
+ * @cluster:	Point to start of cluster array.
+ * @ncontrols:	Number of controls in cluster array.
+ * @done:	Internal flag: set for each processed control.
+ * @is_new:	Set when the user specified a new value for this control. It
+ *		is also set when called from v4l2_ctrl_handler_setup. Drivers
+ *		should never set this flag.
+ * @has_changed: Set when the current value differs from the new value. Drivers
+ *		should never use this flag.
+ * @is_private: If set, then this control is private to its handler and it
+ *		will not be added to any other handlers. Drivers can set
+ *		this flag.
+ * @is_auto:   If set, then this control selects whether the other cluster
+ *		members are in 'automatic' mode or 'manual' mode. This is
+ *		used for autogain/gain type clusters. Drivers should never
+ *		set this flag directly.
+ * @is_int:    If set, then this control has a simple integer value (i.e. it
+ *		uses ctrl->val).
+ * @is_string: If set, then this control has type V4L2_CTRL_TYPE_STRING.
+ * @is_ptr:	If set, then this control is an array and/or has type >= V4L2_CTRL_COMPOUND_TYPES
+ *		and/or has type V4L2_CTRL_TYPE_STRING. In other words, struct
+ *		v4l2_ext_control uses field p to point to the data.
+ * @is_array: If set, then this control contains an N-dimensional array.
+ * @has_volatiles: If set, then one or more members of the cluster are volatile.
+ *		Drivers should never touch this flag.
+ * @call_notify: If set, then call the handler's notify function whenever the
+ *		control's value changes.
+ * @manual_mode_value: If the is_auto flag is set, then this is the value
+ *		of the auto control that determines if that control is in
+ *		manual mode. So if the value of the auto control equals this
+ *		value, then the whole cluster is in manual mode. Drivers should
+ *		never set this flag directly.
+ * @ops:	The control ops.
+ * @type_ops:	The control type ops.
+ * @id:	The control ID.
+ * @name:	The control name.
+ * @type:	The control type.
+ * @minimum:	The control's minimum value.
+ * @maximum:	The control's maximum value.
+ * @default_value: The control's default value.
+ * @step:	The control's step value for non-menu controls.
+ * @elems:	The number of elements in the N-dimensional array.
+ * @elem_size:	The size in bytes of the control.
+ * @dims:	The size of each dimension.
+ * @nr_of_dims:The number of dimensions in @dims.
+ * @menu_skip_mask: The control's skip mask for menu controls. This makes it
+ *		easy to skip menu items that are not valid. If bit X is set,
+ *		then menu item X is skipped. Of course, this only works for
+ *		menus with <= 32 menu items. There are no menus that come
+ *		close to that number, so this is OK. Should we ever need more,
+ *		then this will have to be extended to a u64 or a bit array.
+ * @qmenu:	A const char * array for all menu items. Array entries that are
+ *		empty strings ("") correspond to non-existing menu items (this
+ *		is in addition to the menu_skip_mask above). The last entry
+ *		must be NULL.
+ * @flags:	The control's flags.
+ * @cur:	The control's current value.
+ * @val:	The control's new s32 value.
+ * @val64:	The control's new s64 value.
+ * @priv:	The control's private pointer. For use by the driver. It is
+ *		untouched by the control framework. Note that this pointer is
+ *		not freed when the control is deleted. Should this be needed
+ *		then a new internal bitfield can be added to tell the framework
+ *		to free this pointer.
+ */
 struct v4l2_ctrl {
 	/* Administrative fields */
 	struct list_head node;
@@ -210,16 +216,17 @@ struct v4l2_ctrl {
 	union v4l2_ctrl_ptr p_cur;
 };
 
-/** struct v4l2_ctrl_ref - The control reference.
-  * @node:	List node for the sorted list.
-  * @next:	Single-link list node for the hash.
-  * @ctrl:	The actual control information.
-  * @helper:	Pointer to helper struct. Used internally in prepare_ext_ctrls().
-  *
-  * Each control handler has a list of these refs. The list_head is used to
-  * keep a sorted-by-control-ID list of all controls, while the next pointer
-  * is used to link the control in the hash's bucket.
-  */
+/**
+ * struct v4l2_ctrl_ref - The control reference.
+ * @node:	List node for the sorted list.
+ * @next:	Single-link list node for the hash.
+ * @ctrl:	The actual control information.
+ * @helper:	Pointer to helper struct. Used internally in prepare_ext_ctrls().
+ *
+ * Each control handler has a list of these refs. The list_head is used to
+ * keep a sorted-by-control-ID list of all controls, while the next pointer
+ * is used to link the control in the hash's bucket.
+ */
 struct v4l2_ctrl_ref {
 	struct list_head node;
 	struct v4l2_ctrl_ref *next;
@@ -227,25 +234,26 @@ struct v4l2_ctrl_ref {
 	struct v4l2_ctrl_helper *helper;
 };
 
-/** struct v4l2_ctrl_handler - The control handler keeps track of all the
-  * controls: both the controls owned by the handler and those inherited
-  * from other handlers.
-  * @_lock:	Default for "lock".
-  * @lock:	Lock to control access to this handler and its controls.
-  *		May be replaced by the user right after init.
-  * @ctrls:	The list of controls owned by this handler.
-  * @ctrl_refs:	The list of control references.
-  * @cached:	The last found control reference. It is common that the same
-  *		control is needed multiple times, so this is a simple
-  *		optimization.
-  * @buckets:	Buckets for the hashing. Allows for quick control lookup.
-  * @notify:	A notify callback that is called whenever the control changes value.
-  *		Note that the handler's lock is held when the notify function
-  *		is called!
-  * @notify_priv: Passed as argument to the v4l2_ctrl notify callback.
-  * @nr_of_buckets: Total number of buckets in the array.
-  * @error:	The error code of the first failed control addition.
-  */
+/**
+ * struct v4l2_ctrl_handler - The control handler keeps track of all the
+ * controls: both the controls owned by the handler and those inherited
+ * from other handlers.
+ * @_lock:	Default for "lock".
+ * @lock:	Lock to control access to this handler and its controls.
+ *		May be replaced by the user right after init.
+ * @ctrls:	The list of controls owned by this handler.
+ * @ctrl_refs:	The list of control references.
+ * @cached:	The last found control reference. It is common that the same
+ *		control is needed multiple times, so this is a simple
+ *		optimization.
+ * @buckets:	Buckets for the hashing. Allows for quick control lookup.
+ * @notify:	A notify callback that is called whenever the control changes value.
+ *		Note that the handler's lock is held when the notify function
+ *		is called!
+ * @notify_priv: Passed as argument to the v4l2_ctrl notify callback.
+ * @nr_of_buckets: Total number of buckets in the array.
+ * @error:	The error code of the first failed control addition.
+ */
 struct v4l2_ctrl_handler {
 	struct mutex _lock;
 	struct mutex *lock;
@@ -259,32 +267,33 @@ struct v4l2_ctrl_handler {
 	int error;
 };
 
-/** struct v4l2_ctrl_config - Control configuration structure.
-  * @ops:	The control ops.
-  * @type_ops:	The control type ops. Only needed for compound controls.
-  * @id:	The control ID.
-  * @name:	The control name.
-  * @type:	The control type.
-  * @min:	The control's minimum value.
-  * @max:	The control's maximum value.
-  * @step:	The control's step value for non-menu controls.
-  * @def: 	The control's default value.
-  * @dims:	The size of each dimension.
-  * @elem_size:	The size in bytes of the control.
-  * @flags:	The control's flags.
-  * @menu_skip_mask: The control's skip mask for menu controls. This makes it
-  *		easy to skip menu items that are not valid. If bit X is set,
-  *		then menu item X is skipped. Of course, this only works for
-  *		menus with <= 64 menu items. There are no menus that come
-  *		close to that number, so this is OK. Should we ever need more,
-  *		then this will have to be extended to a bit array.
-  * @qmenu:	A const char * array for all menu items. Array entries that are
-  *		empty strings ("") correspond to non-existing menu items (this
-  *		is in addition to the menu_skip_mask above). The last entry
-  *		must be NULL.
-  * @is_private: If set, then this control is private to its handler and it
-  *		will not be added to any other handlers.
-  */
+/**
+ * struct v4l2_ctrl_config - Control configuration structure.
+ * @ops:	The control ops.
+ * @type_ops:	The control type ops. Only needed for compound controls.
+ * @id:	The control ID.
+ * @name:	The control name.
+ * @type:	The control type.
+ * @min:	The control's minimum value.
+ * @max:	The control's maximum value.
+ * @step:	The control's step value for non-menu controls.
+ * @def: 	The control's default value.
+ * @dims:	The size of each dimension.
+ * @elem_size:	The size in bytes of the control.
+ * @flags:	The control's flags.
+ * @menu_skip_mask: The control's skip mask for menu controls. This makes it
+ *		easy to skip menu items that are not valid. If bit X is set,
+ *		then menu item X is skipped. Of course, this only works for
+ *		menus with <= 64 menu items. There are no menus that come
+ *		close to that number, so this is OK. Should we ever need more,
+ *		then this will have to be extended to a bit array.
+ * @qmenu:	A const char * array for all menu items. Array entries that are
+ *		empty strings ("") correspond to non-existing menu items (this
+ *		is in addition to the menu_skip_mask above). The last entry
+ *		must be NULL.
+ * @is_private: If set, then this control is private to its handler and it
+ *		will not be added to any other handlers.
+ */
 struct v4l2_ctrl_config {
 	const struct v4l2_ctrl_ops *ops;
 	const struct v4l2_ctrl_type_ops *type_ops;
@@ -304,42 +313,44 @@ struct v4l2_ctrl_config {
 	unsigned int is_private:1;
 };
 
-/** v4l2_ctrl_fill() - Fill in the control fields based on the control ID.
-  *
-  * This works for all standard V4L2 controls.
-  * For non-standard controls it will only fill in the given arguments
-  * and @name will be NULL.
-  *
-  * This function will overwrite the contents of @name, @type and @flags.
-  * The contents of @min, @max, @step and @def may be modified depending on
-  * the type.
-  *
-  * Do not use in drivers! It is used internally for backwards compatibility
-  * control handling only. Once all drivers are converted to use the new
-  * control framework this function will no longer be exported.
-  */
+/**
+ * v4l2_ctrl_fill() - Fill in the control fields based on the control ID.
+ *
+ * This works for all standard V4L2 controls.
+ * For non-standard controls it will only fill in the given arguments
+ * and @name will be NULL.
+ *
+ * This function will overwrite the contents of @name, @type and @flags.
+ * The contents of @min, @max, @step and @def may be modified depending on
+ * the type.
+ *
+ * Do not use in drivers! It is used internally for backwards compatibility
+ * control handling only. Once all drivers are converted to use the new
+ * control framework this function will no longer be exported.
+ */
 void v4l2_ctrl_fill(u32 id, const char **name, enum v4l2_ctrl_type *type,
 		    s64 *min, s64 *max, u64 *step, s64 *def, u32 *flags);
 
 
-/** v4l2_ctrl_handler_init_class() - Initialize the control handler.
-  * @hdl:	The control handler.
-  * @nr_of_controls_hint: A hint of how many controls this handler is
-  *		expected to refer to. This is the total number, so including
-  *		any inherited controls. It doesn't have to be precise, but if
-  *		it is way off, then you either waste memory (too many buckets
-  *		are allocated) or the control lookup becomes slower (not enough
-  *		buckets are allocated, so there are more slow list lookups).
-  *		It will always work, though.
-  * @key:	Used by the lock validator if CONFIG_LOCKDEP is set.
-  * @name:	Used by the lock validator if CONFIG_LOCKDEP is set.
-  *
-  * Returns an error if the buckets could not be allocated. This error will
-  * also be stored in @hdl->error.
-  *
-  * Never use this call directly, always use the v4l2_ctrl_handler_init
-  * macro that hides the @key and @name arguments.
-  */
+/**
+ * v4l2_ctrl_handler_init_class() - Initialize the control handler.
+ * @hdl:	The control handler.
+ * @nr_of_controls_hint: A hint of how many controls this handler is
+ *		expected to refer to. This is the total number, so including
+ *		any inherited controls. It doesn't have to be precise, but if
+ *		it is way off, then you either waste memory (too many buckets
+ *		are allocated) or the control lookup becomes slower (not enough
+ *		buckets are allocated, so there are more slow list lookups).
+ *		It will always work, though.
+ * @key:	Used by the lock validator if CONFIG_LOCKDEP is set.
+ * @name:	Used by the lock validator if CONFIG_LOCKDEP is set.
+ *
+ * Returns an error if the buckets could not be allocated. This error will
+ * also be stored in @hdl->error.
+ *
+ * Never use this call directly, always use the v4l2_ctrl_handler_init
+ * macro that hides the @key and @name arguments.
+ */
 int v4l2_ctrl_handler_init_class(struct v4l2_ctrl_handler *hdl,
 				 unsigned nr_of_controls_hint,
 				 struct lock_class_key *key, const char *name);
@@ -361,289 +372,326 @@ int v4l2_ctrl_handler_init_class(struct v4l2_ctrl_handler *hdl,
 	v4l2_ctrl_handler_init_class(hdl, nr_of_controls_hint, NULL, NULL)
 #endif
 
-/** v4l2_ctrl_handler_free() - Free all controls owned by the handler and free
-  * the control list.
-  * @hdl:	The control handler.
-  *
-  * Does nothing if @hdl == NULL.
-  */
+/**
+ * v4l2_ctrl_handler_free() - Free all controls owned by the handler and free
+ * the control list.
+ * @hdl:	The control handler.
+ *
+ * Does nothing if @hdl == NULL.
+ */
 void v4l2_ctrl_handler_free(struct v4l2_ctrl_handler *hdl);
 
-/** v4l2_ctrl_lock() - Helper function to lock the handler
-  * associated with the control.
-  * @ctrl:	The control to lock.
-  */
+/**
+ * v4l2_ctrl_lock() - Helper function to lock the handler
+ * associated with the control.
+ * @ctrl:	The control to lock.
+ */
 static inline void v4l2_ctrl_lock(struct v4l2_ctrl *ctrl)
 {
 	mutex_lock(ctrl->handler->lock);
 }
 
-/** v4l2_ctrl_unlock() - Helper function to unlock the handler
-  * associated with the control.
-  * @ctrl:	The control to unlock.
-  */
+/**
+ * v4l2_ctrl_unlock() - Helper function to unlock the handler
+ * associated with the control.
+ * @ctrl:	The control to unlock.
+ */
 static inline void v4l2_ctrl_unlock(struct v4l2_ctrl *ctrl)
 {
 	mutex_unlock(ctrl->handler->lock);
 }
 
-/** v4l2_ctrl_handler_setup() - Call the s_ctrl op for all controls belonging
-  * to the handler to initialize the hardware to the current control values.
-  * @hdl:	The control handler.
-  *
-  * Button controls will be skipped, as are read-only controls.
-  *
-  * If @hdl == NULL, then this just returns 0.
-  */
+/**
+ * v4l2_ctrl_handler_setup() - Call the s_ctrl op for all controls belonging
+ * to the handler to initialize the hardware to the current control values.
+ * @hdl:	The control handler.
+ *
+ * Button controls will be skipped, as are read-only controls.
+ *
+ * If @hdl == NULL, then this just returns 0.
+ */
 int v4l2_ctrl_handler_setup(struct v4l2_ctrl_handler *hdl);
 
-/** v4l2_ctrl_handler_log_status() - Log all controls owned by the handler.
-  * @hdl:	The control handler.
-  * @prefix:	The prefix to use when logging the control values. If the
-  *		prefix does not end with a space, then ": " will be added
-  *		after the prefix. If @prefix == NULL, then no prefix will be
-  *		used.
-  *
-  * For use with VIDIOC_LOG_STATUS.
-  *
-  * Does nothing if @hdl == NULL.
-  */
+/**
+ * v4l2_ctrl_handler_log_status() - Log all controls owned by the handler.
+ * @hdl:	The control handler.
+ * @prefix:	The prefix to use when logging the control values. If the
+ *		prefix does not end with a space, then ": " will be added
+ *		after the prefix. If @prefix == NULL, then no prefix will be
+ *		used.
+ *
+ * For use with VIDIOC_LOG_STATUS.
+ *
+ * Does nothing if @hdl == NULL.
+ */
 void v4l2_ctrl_handler_log_status(struct v4l2_ctrl_handler *hdl,
 				  const char *prefix);
 
-/** v4l2_ctrl_new_custom() - Allocate and initialize a new custom V4L2
-  * control.
-  * @hdl:	The control handler.
-  * @cfg:	The control's configuration data.
-  * @priv:	The control's driver-specific private data.
-  *
-  * If the &v4l2_ctrl struct could not be allocated then NULL is returned
-  * and @hdl->error is set to the error code (if it wasn't set already).
-  */
+/**
+ * v4l2_ctrl_new_custom() - Allocate and initialize a new custom V4L2
+ * control.
+ * @hdl:	The control handler.
+ * @cfg:	The control's configuration data.
+ * @priv:	The control's driver-specific private data.
+ *
+ * If the &v4l2_ctrl struct could not be allocated then NULL is returned
+ * and @hdl->error is set to the error code (if it wasn't set already).
+ */
 struct v4l2_ctrl *v4l2_ctrl_new_custom(struct v4l2_ctrl_handler *hdl,
 			const struct v4l2_ctrl_config *cfg, void *priv);
 
-/** v4l2_ctrl_new_std() - Allocate and initialize a new standard V4L2 non-menu control.
-  * @hdl:	The control handler.
-  * @ops:	The control ops.
-  * @id:	The control ID.
-  * @min:	The control's minimum value.
-  * @max:	The control's maximum value.
-  * @step:	The control's step value
-  * @def: 	The control's default value.
-  *
-  * If the &v4l2_ctrl struct could not be allocated, or the control
-  * ID is not known, then NULL is returned and @hdl->error is set to the
-  * appropriate error code (if it wasn't set already).
-  *
-  * If @id refers to a menu control, then this function will return NULL.
-  *
-  * Use v4l2_ctrl_new_std_menu() when adding menu controls.
-  */
+/**
+ * v4l2_ctrl_new_std() - Allocate and initialize a new standard V4L2 non-menu control.
+ * @hdl:	The control handler.
+ * @ops:	The control ops.
+ * @id:	The control ID.
+ * @min:	The control's minimum value.
+ * @max:	The control's maximum value.
+ * @step:	The control's step value
+ * @def: 	The control's default value.
+ *
+ * If the &v4l2_ctrl struct could not be allocated, or the control
+ * ID is not known, then NULL is returned and @hdl->error is set to the
+ * appropriate error code (if it wasn't set already).
+ *
+ * If @id refers to a menu control, then this function will return NULL.
+ *
+ * Use v4l2_ctrl_new_std_menu() when adding menu controls.
+ */
 struct v4l2_ctrl *v4l2_ctrl_new_std(struct v4l2_ctrl_handler *hdl,
 			const struct v4l2_ctrl_ops *ops,
 			u32 id, s64 min, s64 max, u64 step, s64 def);
 
-/** v4l2_ctrl_new_std_menu() - Allocate and initialize a new standard V4L2 menu control.
-  * @hdl:	The control handler.
-  * @ops:	The control ops.
-  * @id:	The control ID.
-  * @max:	The control's maximum value.
-  * @mask: 	The control's skip mask for menu controls. This makes it
-  *		easy to skip menu items that are not valid. If bit X is set,
-  *		then menu item X is skipped. Of course, this only works for
-  *		menus with <= 64 menu items. There are no menus that come
-  *		close to that number, so this is OK. Should we ever need more,
-  *		then this will have to be extended to a bit array.
-  * @def: 	The control's default value.
-  *
-  * Same as v4l2_ctrl_new_std(), but @min is set to 0 and the @mask value
-  * determines which menu items are to be skipped.
-  *
-  * If @id refers to a non-menu control, then this function will return NULL.
-  */
+/**
+ * v4l2_ctrl_new_std_menu() - Allocate and initialize a new standard V4L2 menu control.
+ * @hdl:	The control handler.
+ * @ops:	The control ops.
+ * @id:	The control ID.
+ * @max:	The control's maximum value.
+ * @mask: 	The control's skip mask for menu controls. This makes it
+ *		easy to skip menu items that are not valid. If bit X is set,
+ *		then menu item X is skipped. Of course, this only works for
+ *		menus with <= 64 menu items. There are no menus that come
+ *		close to that number, so this is OK. Should we ever need more,
+ *		then this will have to be extended to a bit array.
+ * @def: 	The control's default value.
+ *
+ * Same as v4l2_ctrl_new_std(), but @min is set to 0 and the @mask value
+ * determines which menu items are to be skipped.
+ *
+ * If @id refers to a non-menu control, then this function will return NULL.
+ */
 struct v4l2_ctrl *v4l2_ctrl_new_std_menu(struct v4l2_ctrl_handler *hdl,
 			const struct v4l2_ctrl_ops *ops,
 			u32 id, u8 max, u64 mask, u8 def);
 
-/** v4l2_ctrl_new_std_menu_items() - Create a new standard V4L2 menu control
-  * with driver specific menu.
-  * @hdl:	The control handler.
-  * @ops:	The control ops.
-  * @id:	The control ID.
-  * @max:	The control's maximum value.
-  * @mask:	The control's skip mask for menu controls. This makes it
-  *		easy to skip menu items that are not valid. If bit X is set,
-  *		then menu item X is skipped. Of course, this only works for
-  *		menus with <= 64 menu items. There are no menus that come
-  *		close to that number, so this is OK. Should we ever need more,
-  *		then this will have to be extended to a bit array.
-  * @def:	The control's default value.
-  * @qmenu:	The new menu.
-  *
-  * Same as v4l2_ctrl_new_std_menu(), but @qmenu will be the driver specific
-  * menu of this control.
-  *
-  */
+/**
+ * v4l2_ctrl_new_std_menu_items() - Create a new standard V4L2 menu control
+ * with driver specific menu.
+ * @hdl:	The control handler.
+ * @ops:	The control ops.
+ * @id:	The control ID.
+ * @max:	The control's maximum value.
+ * @mask:	The control's skip mask for menu controls. This makes it
+ *		easy to skip menu items that are not valid. If bit X is set,
+ *		then menu item X is skipped. Of course, this only works for
+ *		menus with <= 64 menu items. There are no menus that come
+ *		close to that number, so this is OK. Should we ever need more,
+ *		then this will have to be extended to a bit array.
+ * @def:	The control's default value.
+ * @qmenu:	The new menu.
+ *
+ * Same as v4l2_ctrl_new_std_menu(), but @qmenu will be the driver specific
+ * menu of this control.
+ *
+ */
 struct v4l2_ctrl *v4l2_ctrl_new_std_menu_items(struct v4l2_ctrl_handler *hdl,
 			const struct v4l2_ctrl_ops *ops, u32 id, u8 max,
 			u64 mask, u8 def, const char * const *qmenu);
 
-/** v4l2_ctrl_new_int_menu() - Create a new standard V4L2 integer menu control.
-  * @hdl:	The control handler.
-  * @ops:	The control ops.
-  * @id:	The control ID.
-  * @max:	The control's maximum value.
-  * @def:	The control's default value.
-  * @qmenu_int:	The control's menu entries.
-  *
-  * Same as v4l2_ctrl_new_std_menu(), but @mask is set to 0 and it additionaly
-  * takes as an argument an array of integers determining the menu items.
-  *
-  * If @id refers to a non-integer-menu control, then this function will return NULL.
-  */
+/**
+ * v4l2_ctrl_new_int_menu() - Create a new standard V4L2 integer menu control.
+ * @hdl:	The control handler.
+ * @ops:	The control ops.
+ * @id:	The control ID.
+ * @max:	The control's maximum value.
+ * @def:	The control's default value.
+ * @qmenu_int:	The control's menu entries.
+ *
+ * Same as v4l2_ctrl_new_std_menu(), but @mask is set to 0 and it additionaly
+ * takes as an argument an array of integers determining the menu items.
+ *
+ * If @id refers to a non-integer-menu control, then this function will return NULL.
+ */
 struct v4l2_ctrl *v4l2_ctrl_new_int_menu(struct v4l2_ctrl_handler *hdl,
 			const struct v4l2_ctrl_ops *ops,
 			u32 id, u8 max, u8 def, const s64 *qmenu_int);
 
-/** v4l2_ctrl_add_ctrl() - Add a control from another handler to this handler.
-  * @hdl:	The control handler.
-  * @ctrl:	The control to add.
-  *
-  * It will return NULL if it was unable to add the control reference.
-  * If the control already belonged to the handler, then it will do
-  * nothing and just return @ctrl.
-  */
+/**
+ * v4l2_ctrl_add_ctrl() - Add a control from another handler to this handler.
+ * @hdl:	The control handler.
+ * @ctrl:	The control to add.
+ *
+ * It will return NULL if it was unable to add the control reference.
+ * If the control already belonged to the handler, then it will do
+ * nothing and just return @ctrl.
+ */
 struct v4l2_ctrl *v4l2_ctrl_add_ctrl(struct v4l2_ctrl_handler *hdl,
 					  struct v4l2_ctrl *ctrl);
 
-/** v4l2_ctrl_add_handler() - Add all controls from handler @add to
-  * handler @hdl.
-  * @hdl:	The control handler.
-  * @add:	The control handler whose controls you want to add to
-  *		the @hdl control handler.
-  * @filter:	This function will filter which controls should be added.
-  *
-  * Does nothing if either of the two handlers is a NULL pointer.
-  * If @filter is NULL, then all controls are added. Otherwise only those
-  * controls for which @filter returns true will be added.
-  * In case of an error @hdl->error will be set to the error code (if it
-  * wasn't set already).
-  */
+/**
+ * v4l2_ctrl_add_handler() - Add all controls from handler @add to
+ * handler @hdl.
+ * @hdl:	The control handler.
+ * @add:	The control handler whose controls you want to add to
+ *		the @hdl control handler.
+ * @filter:	This function will filter which controls should be added.
+ *
+ * Does nothing if either of the two handlers is a NULL pointer.
+ * If @filter is NULL, then all controls are added. Otherwise only those
+ * controls for which @filter returns true will be added.
+ * In case of an error @hdl->error will be set to the error code (if it
+ * wasn't set already).
+ */
 int v4l2_ctrl_add_handler(struct v4l2_ctrl_handler *hdl,
 			  struct v4l2_ctrl_handler *add,
 			  bool (*filter)(const struct v4l2_ctrl *ctrl));
 
-/** v4l2_ctrl_radio_filter() - Standard filter for radio controls.
-  * @ctrl:	The control that is filtered.
-  *
-  * This will return true for any controls that are valid for radio device
-  * nodes. Those are all of the V4L2_CID_AUDIO_* user controls and all FM
-  * transmitter class controls.
-  *
-  * This function is to be used with v4l2_ctrl_add_handler().
-  */
+/**
+ * v4l2_ctrl_radio_filter() - Standard filter for radio controls.
+ * @ctrl:	The control that is filtered.
+ *
+ * This will return true for any controls that are valid for radio device
+ * nodes. Those are all of the V4L2_CID_AUDIO_* user controls and all FM
+ * transmitter class controls.
+ *
+ * This function is to be used with v4l2_ctrl_add_handler().
+ */
 bool v4l2_ctrl_radio_filter(const struct v4l2_ctrl *ctrl);
 
-/** v4l2_ctrl_cluster() - Mark all controls in the cluster as belonging to that cluster.
-  * @ncontrols:	The number of controls in this cluster.
-  * @controls: 	The cluster control array of size @ncontrols.
-  */
+/**
+ * v4l2_ctrl_cluster() - Mark all controls in the cluster as belonging to that cluster.
+ * @ncontrols:	The number of controls in this cluster.
+ * @controls: 	The cluster control array of size @ncontrols.
+ */
 void v4l2_ctrl_cluster(unsigned ncontrols, struct v4l2_ctrl **controls);
 
 
-/** v4l2_ctrl_auto_cluster() - Mark all controls in the cluster as belonging to
-  * that cluster and set it up for autofoo/foo-type handling.
-  * @ncontrols:	The number of controls in this cluster.
-  * @controls:	The cluster control array of size @ncontrols. The first control
-  *		must be the 'auto' control (e.g. autogain, autoexposure, etc.)
-  * @manual_val: The value for the first control in the cluster that equals the
-  *		manual setting.
-  * @set_volatile: If true, then all controls except the first auto control will
-  *		be volatile.
-  *
-  * Use for control groups where one control selects some automatic feature and
-  * the other controls are only active whenever the automatic feature is turned
-  * off (manual mode). Typical examples: autogain vs gain, auto-whitebalance vs
-  * red and blue balance, etc.
-  *
-  * The behavior of such controls is as follows:
-  *
-  * When the autofoo control is set to automatic, then any manual controls
-  * are set to inactive and any reads will call g_volatile_ctrl (if the control
-  * was marked volatile).
-  *
-  * When the autofoo control is set to manual, then any manual controls will
-  * be marked active, and any reads will just return the current value without
-  * going through g_volatile_ctrl.
-  *
-  * In addition, this function will set the V4L2_CTRL_FLAG_UPDATE flag
-  * on the autofoo control and V4L2_CTRL_FLAG_INACTIVE on the foo control(s)
-  * if autofoo is in auto mode.
-  */
+/**
+ * v4l2_ctrl_auto_cluster() - Mark all controls in the cluster as belonging to
+ * that cluster and set it up for autofoo/foo-type handling.
+ * @ncontrols:	The number of controls in this cluster.
+ * @controls:	The cluster control array of size @ncontrols. The first control
+ *		must be the 'auto' control (e.g. autogain, autoexposure, etc.)
+ * @manual_val: The value for the first control in the cluster that equals the
+ *		manual setting.
+ * @set_volatile: If true, then all controls except the first auto control will
+ *		be volatile.
+ *
+ * Use for control groups where one control selects some automatic feature and
+ * the other controls are only active whenever the automatic feature is turned
+ * off (manual mode). Typical examples: autogain vs gain, auto-whitebalance vs
+ * red and blue balance, etc.
+ *
+ * The behavior of such controls is as follows:
+ *
+ * When the autofoo control is set to automatic, then any manual controls
+ * are set to inactive and any reads will call g_volatile_ctrl (if the control
+ * was marked volatile).
+ *
+ * When the autofoo control is set to manual, then any manual controls will
+ * be marked active, and any reads will just return the current value without
+ * going through g_volatile_ctrl.
+ *
+ * In addition, this function will set the V4L2_CTRL_FLAG_UPDATE flag
+ * on the autofoo control and V4L2_CTRL_FLAG_INACTIVE on the foo control(s)
+ * if autofoo is in auto mode.
+ */
 void v4l2_ctrl_auto_cluster(unsigned ncontrols, struct v4l2_ctrl **controls,
 			u8 manual_val, bool set_volatile);
 
 
-/** v4l2_ctrl_find() - Find a control with the given ID.
-  * @hdl:	The control handler.
-  * @id:	The control ID to find.
-  *
-  * If @hdl == NULL this will return NULL as well. Will lock the handler so
-  * do not use from inside &v4l2_ctrl_ops.
-  */
+/**
+ * v4l2_ctrl_find() - Find a control with the given ID.
+ * @hdl:	The control handler.
+ * @id:	The control ID to find.
+ *
+ * If @hdl == NULL this will return NULL as well. Will lock the handler so
+ * do not use from inside &v4l2_ctrl_ops.
+ */
 struct v4l2_ctrl *v4l2_ctrl_find(struct v4l2_ctrl_handler *hdl, u32 id);
 
-/** v4l2_ctrl_activate() - Make the control active or inactive.
-  * @ctrl:	The control to (de)activate.
-  * @active:	True if the control should become active.
-  *
-  * This sets or clears the V4L2_CTRL_FLAG_INACTIVE flag atomically.
-  * Does nothing if @ctrl == NULL.
-  * This will usually be called from within the s_ctrl op.
-  * The V4L2_EVENT_CTRL event will be generated afterwards.
-  *
-  * This function assumes that the control handler is locked.
-  */
+/**
+ * v4l2_ctrl_activate() - Make the control active or inactive.
+ * @ctrl:	The control to (de)activate.
+ * @active:	True if the control should become active.
+ *
+ * This sets or clears the V4L2_CTRL_FLAG_INACTIVE flag atomically.
+ * Does nothing if @ctrl == NULL.
+ * This will usually be called from within the s_ctrl op.
+ * The V4L2_EVENT_CTRL event will be generated afterwards.
+ *
+ * This function assumes that the control handler is locked.
+ */
 void v4l2_ctrl_activate(struct v4l2_ctrl *ctrl, bool active);
 
-/** v4l2_ctrl_grab() - Mark the control as grabbed or not grabbed.
-  * @ctrl:	The control to (de)activate.
-  * @grabbed:	True if the control should become grabbed.
-  *
-  * This sets or clears the V4L2_CTRL_FLAG_GRABBED flag atomically.
-  * Does nothing if @ctrl == NULL.
-  * The V4L2_EVENT_CTRL event will be generated afterwards.
-  * This will usually be called when starting or stopping streaming in the
-  * driver.
-  *
-  * This function assumes that the control handler is not locked and will
-  * take the lock itself.
-  */
+/**
+ * v4l2_ctrl_grab() - Mark the control as grabbed or not grabbed.
+ * @ctrl:	The control to (de)activate.
+ * @grabbed:	True if the control should become grabbed.
+ *
+ * This sets or clears the V4L2_CTRL_FLAG_GRABBED flag atomically.
+ * Does nothing if @ctrl == NULL.
+ * The V4L2_EVENT_CTRL event will be generated afterwards.
+ * This will usually be called when starting or stopping streaming in the
+ * driver.
+ *
+ * This function assumes that the control handler is not locked and will
+ * take the lock itself.
+ */
 void v4l2_ctrl_grab(struct v4l2_ctrl *ctrl, bool grabbed);
 
 
-/** __v4l2_ctrl_modify_range() - Unlocked variant of v4l2_ctrl_modify_range() */
+/**
+ *__v4l2_ctrl_modify_range() - Unlocked variant of v4l2_ctrl_modify_range()
+ *
+ * @ctrl:	The control to update.
+ * @min:	The control's minimum value.
+ * @max:	The control's maximum value.
+ * @step:	The control's step value
+ * @def:	The control's default value.
+ *
+ * Update the range of a control on the fly. This works for control types
+ * INTEGER, BOOLEAN, MENU, INTEGER MENU and BITMASK. For menu controls the
+ * @step value is interpreted as a menu_skip_mask.
+ *
+ * An error is returned if one of the range arguments is invalid for this
+ * control type.
+ *
+ * This function assumes that the control handler is not locked and will
+ * take the lock itself.
+ */
 int __v4l2_ctrl_modify_range(struct v4l2_ctrl *ctrl,
 			     s64 min, s64 max, u64 step, s64 def);
 
-/** v4l2_ctrl_modify_range() - Update the range of a control.
-  * @ctrl:	The control to update.
-  * @min:	The control's minimum value.
-  * @max:	The control's maximum value.
-  * @step:	The control's step value
-  * @def:	The control's default value.
-  *
-  * Update the range of a control on the fly. This works for control types
-  * INTEGER, BOOLEAN, MENU, INTEGER MENU and BITMASK. For menu controls the
-  * @step value is interpreted as a menu_skip_mask.
-  *
-  * An error is returned if one of the range arguments is invalid for this
-  * control type.
-  *
-  * This function assumes that the control handler is not locked and will
-  * take the lock itself.
-  */
+/**
+ * v4l2_ctrl_modify_range() - Update the range of a control.
+ * @ctrl:	The control to update.
+ * @min:	The control's minimum value.
+ * @max:	The control's maximum value.
+ * @step:	The control's step value
+ * @def:	The control's default value.
+ *
+ * Update the range of a control on the fly. This works for control types
+ * INTEGER, BOOLEAN, MENU, INTEGER MENU and BITMASK. For menu controls the
+ * @step value is interpreted as a menu_skip_mask.
+ *
+ * An error is returned if one of the range arguments is invalid for this
+ * control type.
+ *
+ * This function assumes that the control handler is not locked and will
+ * take the lock itself.
+ */
 static inline int v4l2_ctrl_modify_range(struct v4l2_ctrl *ctrl,
 					 s64 min, s64 max, u64 step, s64 def)
 {
@@ -656,21 +704,23 @@ static inline int v4l2_ctrl_modify_range(struct v4l2_ctrl *ctrl,
 	return rval;
 }
 
-/** v4l2_ctrl_notify() - Function to set a notify callback for a control.
-  * @ctrl:	The control.
-  * @notify:	The callback function.
-  * @priv:	The callback private handle, passed as argument to the callback.
-  *
-  * This function sets a callback function for the control. If @ctrl is NULL,
-  * then it will do nothing. If @notify is NULL, then the notify callback will
-  * be removed.
-  *
-  * There can be only one notify. If another already exists, then a WARN_ON
-  * will be issued and the function will do nothing.
-  */
+/**
+ * v4l2_ctrl_notify() - Function to set a notify callback for a control.
+ * @ctrl:	The control.
+ * @notify:	The callback function.
+ * @priv:	The callback private handle, passed as argument to the callback.
+ *
+ * This function sets a callback function for the control. If @ctrl is NULL,
+ * then it will do nothing. If @notify is NULL, then the notify callback will
+ * be removed.
+ *
+ * There can be only one notify. If another already exists, then a WARN_ON
+ * will be issued and the function will do nothing.
+ */
 void v4l2_ctrl_notify(struct v4l2_ctrl *ctrl, v4l2_ctrl_notify_fnc notify, void *priv);
 
-/** v4l2_ctrl_get_name() - Get the name of the control
+/**
+ * v4l2_ctrl_get_name() - Get the name of the control
  * @id:		The control ID.
  *
  * This function returns the name of the given control ID or NULL if it isn't
@@ -678,7 +728,8 @@ void v4l2_ctrl_notify(struct v4l2_ctrl *ctrl, v4l2_ctrl_notify_fnc notify, void
  */
 const char *v4l2_ctrl_get_name(u32 id);
 
-/** v4l2_ctrl_get_menu() - Get the menu string array of the control
+/**
+ * v4l2_ctrl_get_menu() - Get the menu string array of the control
  * @id:		The control ID.
  *
  * This function returns the NULL-terminated menu string array name of the
@@ -686,7 +737,8 @@ const char *v4l2_ctrl_get_name(u32 id);
  */
 const char * const *v4l2_ctrl_get_menu(u32 id);
 
-/** v4l2_ctrl_get_int_menu() - Get the integer menu array of the control
+/**
+ * v4l2_ctrl_get_int_menu() - Get the integer menu array of the control
  * @id:		The control ID.
  * @len:	The size of the integer array.
  *
@@ -695,29 +747,41 @@ const char * const *v4l2_ctrl_get_menu(u32 id);
  */
 const s64 *v4l2_ctrl_get_int_menu(u32 id, u32 *len);
 
-/** v4l2_ctrl_g_ctrl() - Helper function to get the control's value from within a driver.
-  * @ctrl:	The control.
-  *
-  * This returns the control's value safely by going through the control
-  * framework. This function will lock the control's handler, so it cannot be
-  * used from within the &v4l2_ctrl_ops functions.
-  *
-  * This function is for integer type controls only.
-  */
+/**
+ * v4l2_ctrl_g_ctrl() - Helper function to get the control's value from within a driver.
+ * @ctrl:	The control.
+ *
+ * This returns the control's value safely by going through the control
+ * framework. This function will lock the control's handler, so it cannot be
+ * used from within the &v4l2_ctrl_ops functions.
+ *
+ * This function is for integer type controls only.
+ */
 s32 v4l2_ctrl_g_ctrl(struct v4l2_ctrl *ctrl);
 
-/** __v4l2_ctrl_s_ctrl() - Unlocked variant of v4l2_ctrl_s_ctrl(). */
+/**
+ * __v4l2_ctrl_s_ctrl() - Unlocked variant of v4l2_ctrl_s_ctrl().
+ * @ctrl:	The control.
+ * @val:	The new value.
+ *
+ * This set the control's new value safely by going through the control
+ * framework. This function will lock the control's handler, so it cannot be
+ * used from within the &v4l2_ctrl_ops functions.
+ *
+ * This function is for integer type controls only.
+ */
 int __v4l2_ctrl_s_ctrl(struct v4l2_ctrl *ctrl, s32 val);
+
 /** v4l2_ctrl_s_ctrl() - Helper function to set the control's value from within a driver.
-  * @ctrl:	The control.
-  * @val:	The new value.
-  *
-  * This set the control's new value safely by going through the control
-  * framework. This function will lock the control's handler, so it cannot be
-  * used from within the &v4l2_ctrl_ops functions.
-  *
-  * This function is for integer type controls only.
-  */
+ * @ctrl:	The control.
+ * @val:	The new value.
+ *
+ * This set the control's new value safely by going through the control
+ * framework. This function will lock the control's handler, so it cannot be
+ * used from within the &v4l2_ctrl_ops functions.
+ *
+ * This function is for integer type controls only.
+ */
 static inline int v4l2_ctrl_s_ctrl(struct v4l2_ctrl *ctrl, s32 val)
 {
 	int rval;
@@ -729,30 +793,45 @@ static inline int v4l2_ctrl_s_ctrl(struct v4l2_ctrl *ctrl, s32 val)
 	return rval;
 }
 
-/** v4l2_ctrl_g_ctrl_int64() - Helper function to get a 64-bit control's value from within a driver.
-  * @ctrl:	The control.
-  *
-  * This returns the control's value safely by going through the control
-  * framework. This function will lock the control's handler, so it cannot be
-  * used from within the &v4l2_ctrl_ops functions.
-  *
-  * This function is for 64-bit integer type controls only.
-  */
+/**
+ * v4l2_ctrl_g_ctrl_int64() - Helper function to get a 64-bit control's value
+ *	from within a driver.
+ * @ctrl:	The control.
+ *
+ * This returns the control's value safely by going through the control
+ * framework. This function will lock the control's handler, so it cannot be
+ * used from within the &v4l2_ctrl_ops functions.
+ *
+ * This function is for 64-bit integer type controls only.
+ */
 s64 v4l2_ctrl_g_ctrl_int64(struct v4l2_ctrl *ctrl);
 
-/** __v4l2_ctrl_s_ctrl_int64() - Unlocked variant of v4l2_ctrl_s_ctrl_int64(). */
+/**
+ * __v4l2_ctrl_s_ctrl_int64() - Unlocked variant of v4l2_ctrl_s_ctrl_int64().
+ *
+ * @ctrl:	The control.
+ * @val:	The new value.
+ *
+ * This set the control's new value safely by going through the control
+ * framework. This function will lock the control's handler, so it cannot be
+ * used from within the &v4l2_ctrl_ops functions.
+ *
+ * This function is for 64-bit integer type controls only.
+ */
 int __v4l2_ctrl_s_ctrl_int64(struct v4l2_ctrl *ctrl, s64 val);
 
-/** v4l2_ctrl_s_ctrl_int64() - Helper function to set a 64-bit control's value from within a driver.
-  * @ctrl:	The control.
-  * @val:	The new value.
-  *
-  * This set the control's new value safely by going through the control
-  * framework. This function will lock the control's handler, so it cannot be
-  * used from within the &v4l2_ctrl_ops functions.
-  *
-  * This function is for 64-bit integer type controls only.
-  */
+/** v4l2_ctrl_s_ctrl_int64() - Helper function to set a 64-bit control's value
+ *	from within a driver.
+ *
+ * @ctrl:	The control.
+ * @val:	The new value.
+ *
+ * This set the control's new value safely by going through the control
+ * framework. This function will lock the control's handler, so it cannot be
+ * used from within the &v4l2_ctrl_ops functions.
+ *
+ * This function is for 64-bit integer type controls only.
+ */
 static inline int v4l2_ctrl_s_ctrl_int64(struct v4l2_ctrl *ctrl, s64 val)
 {
 	int rval;
@@ -764,19 +843,31 @@ static inline int v4l2_ctrl_s_ctrl_int64(struct v4l2_ctrl *ctrl, s64 val)
 	return rval;
 }
 
-/** __v4l2_ctrl_s_ctrl_string() - Unlocked variant of v4l2_ctrl_s_ctrl_string(). */
+/** __v4l2_ctrl_s_ctrl_string() - Unlocked variant of v4l2_ctrl_s_ctrl_string().
+ *
+ * @ctrl:	The control.
+ * @s:		The new string.
+ *
+ * This set the control's new string safely by going through the control
+ * framework. This function will lock the control's handler, so it cannot be
+ * used from within the &v4l2_ctrl_ops functions.
+ *
+ * This function is for string type controls only.
+ */
 int __v4l2_ctrl_s_ctrl_string(struct v4l2_ctrl *ctrl, const char *s);
 
-/** v4l2_ctrl_s_ctrl_string() - Helper function to set a control's string value from within a driver.
-  * @ctrl:	The control.
-  * @s:		The new string.
-  *
-  * This set the control's new string safely by going through the control
-  * framework. This function will lock the control's handler, so it cannot be
-  * used from within the &v4l2_ctrl_ops functions.
-  *
-  * This function is for string type controls only.
-  */
+/** v4l2_ctrl_s_ctrl_string() - Helper function to set a control's string value
+ *	 from within a driver.
+ *
+ * @ctrl:	The control.
+ * @s:		The new string.
+ *
+ * This set the control's new string safely by going through the control
+ * framework. This function will lock the control's handler, so it cannot be
+ * used from within the &v4l2_ctrl_ops functions.
+ *
+ * This function is for string type controls only.
+ */
 static inline int v4l2_ctrl_s_ctrl_string(struct v4l2_ctrl *ctrl, const char *s)
 {
 	int rval;
-- 
cgit v1.2.3


From b6fce850d9fa46eefdaa7ef28ac1f3fce7c803d2 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Date: Sat, 22 Aug 2015 08:28:39 -0300
Subject: [media] v4l2-event.h: fix comments and add to DocBook

The comments there are good enough for DocBook, however they're
using a wrong format. Fix and add to device-drivers.xml.

Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Acked-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/DocBook/device-drivers.tmpl |  2 +-
 include/media/v4l2-event.h                | 47 ++++++++++++++++---------------
 2 files changed, 26 insertions(+), 23 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/DocBook/device-drivers.tmpl b/Documentation/DocBook/device-drivers.tmpl
index 5f01f7ad15dc..46e6818b95ce 100644
--- a/Documentation/DocBook/device-drivers.tmpl
+++ b/Documentation/DocBook/device-drivers.tmpl
@@ -232,9 +232,9 @@ X!Isound/sound_firmware.c
 !Idrivers/media/dvb-core/dvb_math.h
 !Idrivers/media/dvb-core/dvb_ringbuffer.h
 !Iinclude/media/v4l2-ctrls.h
+!Iinclude/media/v4l2-event.h
 <!-- FIXME: Removed for now due to document generation inconsistency
 X!Iinclude/media/v4l2-dv-timings.h
-X!Iinclude/media/v4l2-event.h
 X!Iinclude/media/v4l2-mediabus.h
 X!Iinclude/media/videobuf2-memops.h
 X!Iinclude/media/videobuf2-core.h
diff --git a/include/media/v4l2-event.h b/include/media/v4l2-event.h
index 1ab9045e52e3..9792f906423b 100644
--- a/include/media/v4l2-event.h
+++ b/include/media/v4l2-event.h
@@ -68,10 +68,11 @@ struct v4l2_subdev;
 struct v4l2_subscribed_event;
 struct video_device;
 
-/** struct v4l2_kevent - Internal kernel event struct.
-  * @list:	List node for the v4l2_fh->available list.
-  * @sev:	Pointer to parent v4l2_subscribed_event.
-  * @event:	The event itself.
+/**
+ * struct v4l2_kevent - Internal kernel event struct.
+ * @list:	List node for the v4l2_fh->available list.
+ * @sev:	Pointer to parent v4l2_subscribed_event.
+ * @event:	The event itself.
   */
 struct v4l2_kevent {
 	struct list_head	list;
@@ -80,11 +81,12 @@ struct v4l2_kevent {
 };
 
 /** struct v4l2_subscribed_event_ops - Subscribed event operations.
-  * @add:	Optional callback, called when a new listener is added
-  * @del:	Optional callback, called when a listener stops listening
-  * @replace:	Optional callback that can replace event 'old' with event 'new'.
-  * @merge:	Optional callback that can merge event 'old' into event 'new'.
-  */
+ *
+ * @add:	Optional callback, called when a new listener is added
+ * @del:	Optional callback, called when a listener stops listening
+ * @replace:	Optional callback that can replace event 'old' with event 'new'.
+ * @merge:	Optional callback that can merge event 'old' into event 'new'.
+ */
 struct v4l2_subscribed_event_ops {
 	int  (*add)(struct v4l2_subscribed_event *sev, unsigned elems);
 	void (*del)(struct v4l2_subscribed_event *sev);
@@ -92,19 +94,20 @@ struct v4l2_subscribed_event_ops {
 	void (*merge)(const struct v4l2_event *old, struct v4l2_event *new);
 };
 
-/** struct v4l2_subscribed_event - Internal struct representing a subscribed event.
-  * @list:	List node for the v4l2_fh->subscribed list.
-  * @type:	Event type.
-  * @id:	Associated object ID (e.g. control ID). 0 if there isn't any.
-  * @flags:	Copy of v4l2_event_subscription->flags.
-  * @fh:	Filehandle that subscribed to this event.
-  * @node:	List node that hooks into the object's event list (if there is one).
-  * @ops:	v4l2_subscribed_event_ops
-  * @elems:	The number of elements in the events array.
-  * @first:	The index of the events containing the oldest available event.
-  * @in_use:	The number of queued events.
-  * @events:	An array of @elems events.
-  */
+/**
+ * struct v4l2_subscribed_event - Internal struct representing a subscribed event.
+ * @list:	List node for the v4l2_fh->subscribed list.
+ * @type:	Event type.
+ * @id:	Associated object ID (e.g. control ID). 0 if there isn't any.
+ * @flags:	Copy of v4l2_event_subscription->flags.
+ * @fh:	Filehandle that subscribed to this event.
+ * @node:	List node that hooks into the object's event list (if there is one).
+ * @ops:	v4l2_subscribed_event_ops
+ * @elems:	The number of elements in the events array.
+ * @first:	The index of the events containing the oldest available event.
+ * @in_use:	The number of queued events.
+ * @events:	An array of @elems events.
+ */
 struct v4l2_subscribed_event {
 	struct list_head	list;
 	u32			type;
-- 
cgit v1.2.3


From 506bb54bdb0898e046e62751d761a5592966a637 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Date: Sat, 22 Aug 2015 08:36:13 -0300
Subject: [media] v4l-dv-timings.h: Add to device-drivers DocBook

There are already markups for documentation at v4l-dv-timings.h,
however, they're not properly formatted.

Convert them to the right format and add this file to
the device-drivers DocBook.

Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Acked-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/DocBook/device-drivers.tmpl |   2 +-
 include/media/v4l2-dv-timings.h           | 135 +++++++++++++++++-------------
 2 files changed, 78 insertions(+), 59 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/DocBook/device-drivers.tmpl b/Documentation/DocBook/device-drivers.tmpl
index 46e6818b95ce..30ba2cf2735a 100644
--- a/Documentation/DocBook/device-drivers.tmpl
+++ b/Documentation/DocBook/device-drivers.tmpl
@@ -233,8 +233,8 @@ X!Isound/sound_firmware.c
 !Idrivers/media/dvb-core/dvb_ringbuffer.h
 !Iinclude/media/v4l2-ctrls.h
 !Iinclude/media/v4l2-event.h
+!Iinclude/media/v4l2-dv-timings.h
 <!-- FIXME: Removed for now due to document generation inconsistency
-X!Iinclude/media/v4l2-dv-timings.h
 X!Iinclude/media/v4l2-mediabus.h
 X!Iinclude/media/videobuf2-memops.h
 X!Iinclude/media/videobuf2-core.h
diff --git a/include/media/v4l2-dv-timings.h b/include/media/v4l2-dv-timings.h
index e18a653549cd..b6130b50a0f1 100644
--- a/include/media/v4l2-dv-timings.h
+++ b/include/media/v4l2-dv-timings.h
@@ -23,11 +23,14 @@
 
 #include <linux/videodev2.h>
 
-/** v4l2_dv_timings_presets: list of all dv_timings presets.
+/**
+ * v4l2_dv_timings_presets: list of all dv_timings presets.
  */
 extern const struct v4l2_dv_timings v4l2_dv_timings_presets[];
 
-/** v4l2_check_dv_timings_fnc - timings check callback
+/**
+ * v4l2_check_dv_timings_fnc - timings check callback
+ *
  * @t: the v4l2_dv_timings struct.
  * @handle: a handle from the driver.
  *
@@ -35,83 +38,95 @@ extern const struct v4l2_dv_timings v4l2_dv_timings_presets[];
  */
 typedef bool v4l2_check_dv_timings_fnc(const struct v4l2_dv_timings *t, void *handle);
 
-/** v4l2_valid_dv_timings() - are these timings valid?
-  * @t:	  the v4l2_dv_timings struct.
-  * @cap: the v4l2_dv_timings_cap capabilities.
-  * @fnc: callback to check if this timing is OK. May be NULL.
-  * @fnc_handle: a handle that is passed on to @fnc.
-  *
-  * Returns true if the given dv_timings struct is supported by the
-  * hardware capabilities and the callback function (if non-NULL), returns
-  * false otherwise.
-  */
+/**
+ * v4l2_valid_dv_timings() - are these timings valid?
+ *
+ * @t:	  the v4l2_dv_timings struct.
+ * @cap: the v4l2_dv_timings_cap capabilities.
+ * @fnc: callback to check if this timing is OK. May be NULL.
+ * @fnc_handle: a handle that is passed on to @fnc.
+ *
+ * Returns true if the given dv_timings struct is supported by the
+ * hardware capabilities and the callback function (if non-NULL), returns
+ * false otherwise.
+ */
 bool v4l2_valid_dv_timings(const struct v4l2_dv_timings *t,
 			   const struct v4l2_dv_timings_cap *cap,
 			   v4l2_check_dv_timings_fnc fnc,
 			   void *fnc_handle);
 
-/** v4l2_enum_dv_timings_cap() - Helper function to enumerate possible DV timings based on capabilities
-  * @t:	  the v4l2_enum_dv_timings struct.
-  * @cap: the v4l2_dv_timings_cap capabilities.
-  * @fnc: callback to check if this timing is OK. May be NULL.
-  * @fnc_handle: a handle that is passed on to @fnc.
-  *
-  * This enumerates dv_timings using the full list of possible CEA-861 and DMT
-  * timings, filtering out any timings that are not supported based on the
-  * hardware capabilities and the callback function (if non-NULL).
-  *
-  * If a valid timing for the given index is found, it will fill in @t and
-  * return 0, otherwise it returns -EINVAL.
-  */
+/**
+ * v4l2_enum_dv_timings_cap() - Helper function to enumerate possible DV
+ *	 timings based on capabilities
+ *
+ * @t:	  the v4l2_enum_dv_timings struct.
+ * @cap: the v4l2_dv_timings_cap capabilities.
+ * @fnc: callback to check if this timing is OK. May be NULL.
+ * @fnc_handle: a handle that is passed on to @fnc.
+ *
+ * This enumerates dv_timings using the full list of possible CEA-861 and DMT
+ * timings, filtering out any timings that are not supported based on the
+ * hardware capabilities and the callback function (if non-NULL).
+ *
+ * If a valid timing for the given index is found, it will fill in @t and
+ * return 0, otherwise it returns -EINVAL.
+ */
 int v4l2_enum_dv_timings_cap(struct v4l2_enum_dv_timings *t,
 			     const struct v4l2_dv_timings_cap *cap,
 			     v4l2_check_dv_timings_fnc fnc,
 			     void *fnc_handle);
 
-/** v4l2_find_dv_timings_cap() - Find the closest timings struct
-  * @t:	  the v4l2_enum_dv_timings struct.
-  * @cap: the v4l2_dv_timings_cap capabilities.
-  * @pclock_delta: maximum delta between t->pixelclock and the timing struct
-  *		under consideration.
-  * @fnc: callback to check if a given timings struct is OK. May be NULL.
-  * @fnc_handle: a handle that is passed on to @fnc.
-  *
-  * This function tries to map the given timings to an entry in the
-  * full list of possible CEA-861 and DMT timings, filtering out any timings
-  * that are not supported based on the hardware capabilities and the callback
-  * function (if non-NULL).
-  *
-  * On success it will fill in @t with the found timings and it returns true.
-  * On failure it will return false.
-  */
+/**
+ * v4l2_find_dv_timings_cap() - Find the closest timings struct
+ *
+ * @t:	  the v4l2_enum_dv_timings struct.
+ * @cap: the v4l2_dv_timings_cap capabilities.
+ * @pclock_delta: maximum delta between t->pixelclock and the timing struct
+ *		under consideration.
+ * @fnc: callback to check if a given timings struct is OK. May be NULL.
+ * @fnc_handle: a handle that is passed on to @fnc.
+ *
+ * This function tries to map the given timings to an entry in the
+ * full list of possible CEA-861 and DMT timings, filtering out any timings
+ * that are not supported based on the hardware capabilities and the callback
+ * function (if non-NULL).
+ *
+ * On success it will fill in @t with the found timings and it returns true.
+ * On failure it will return false.
+ */
 bool v4l2_find_dv_timings_cap(struct v4l2_dv_timings *t,
 			      const struct v4l2_dv_timings_cap *cap,
 			      unsigned pclock_delta,
 			      v4l2_check_dv_timings_fnc fnc,
 			      void *fnc_handle);
 
-/** v4l2_match_dv_timings() - do two timings match?
-  * @measured:	  the measured timings data.
-  * @standard:	  the timings according to the standard.
-  * @pclock_delta: maximum delta in Hz between standard->pixelclock and
-  * 		the measured timings.
-  *
-  * Returns true if the two timings match, returns false otherwise.
-  */
+/**
+ * v4l2_match_dv_timings() - do two timings match?
+ *
+ * @measured:	  the measured timings data.
+ * @standard:	  the timings according to the standard.
+ * @pclock_delta: maximum delta in Hz between standard->pixelclock and
+ * 		the measured timings.
+ *
+ * Returns true if the two timings match, returns false otherwise.
+ */
 bool v4l2_match_dv_timings(const struct v4l2_dv_timings *measured,
 			   const struct v4l2_dv_timings *standard,
 			   unsigned pclock_delta);
 
-/** v4l2_print_dv_timings() - log the contents of a dv_timings struct
-  * @dev_prefix:device prefix for each log line.
-  * @prefix:	additional prefix for each log line, may be NULL.
-  * @t:		the timings data.
-  * @detailed:	if true, give a detailed log.
-  */
+/**
+ * v4l2_print_dv_timings() - log the contents of a dv_timings struct
+ * @dev_prefix:device prefix for each log line.
+ * @prefix:	additional prefix for each log line, may be NULL.
+ * @t:		the timings data.
+ * @detailed:	if true, give a detailed log.
+ */
 void v4l2_print_dv_timings(const char *dev_prefix, const char *prefix,
 			   const struct v4l2_dv_timings *t, bool detailed);
 
-/** v4l2_detect_cvt - detect if the given timings follow the CVT standard
+/**
+ * v4l2_detect_cvt - detect if the given timings follow the CVT standard
+ *
  * @frame_height - the total height of the frame (including blanking) in lines.
  * @hfreq - the horizontal frequency in Hz.
  * @vsync - the height of the vertical sync in lines.
@@ -131,7 +146,9 @@ bool v4l2_detect_cvt(unsigned frame_height, unsigned hfreq, unsigned vsync,
 		unsigned active_width, u32 polarities, bool interlaced,
 		struct v4l2_dv_timings *fmt);
 
-/** v4l2_detect_gtf - detect if the given timings follow the GTF standard
+/**
+ * v4l2_detect_gtf - detect if the given timings follow the GTF standard
+ *
  * @frame_height - the total height of the frame (including blanking) in lines.
  * @hfreq - the horizontal frequency in Hz.
  * @vsync - the height of the vertical sync in lines.
@@ -153,8 +170,10 @@ bool v4l2_detect_gtf(unsigned frame_height, unsigned hfreq, unsigned vsync,
 		u32 polarities, bool interlaced, struct v4l2_fract aspect,
 		struct v4l2_dv_timings *fmt);
 
-/** v4l2_calc_aspect_ratio - calculate the aspect ratio based on bytes
+/**
+ * v4l2_calc_aspect_ratio - calculate the aspect ratio based on bytes
  *	0x15 and 0x16 from the EDID.
+ *
  * @hor_landscape - byte 0x15 from the EDID.
  * @vert_portrait - byte 0x16 from the EDID.
  *
-- 
cgit v1.2.3


From d78757e780f1845dd5c9615169f7401c7d0d1a66 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Date: Sat, 22 Aug 2015 08:57:02 -0300
Subject: [media] videobuf2-core: Add it to device-drivers DocBook

Most of the stuff at videobuf2-core are ok for adding it to
DocBook.

Two notes here:
1) As videobuf2-core will be soon be changed, better to
   not spend too much efforts right now, as things will
   change soon;

2) struct vb2_queue has a number of private elements that are
   documented. As Kernel nano documentation format handles
   "private:" arguments, we need to put them on a separate
   comment block or to remove. Keeping the comments is
   obviously better ;)

Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Acked-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/DocBook/device-drivers.tmpl |  2 +-
 include/media/videobuf2-core.h            | 10 ++++++----
 2 files changed, 7 insertions(+), 5 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/DocBook/device-drivers.tmpl b/Documentation/DocBook/device-drivers.tmpl
index 30ba2cf2735a..e636bbd41933 100644
--- a/Documentation/DocBook/device-drivers.tmpl
+++ b/Documentation/DocBook/device-drivers.tmpl
@@ -234,10 +234,10 @@ X!Isound/sound_firmware.c
 !Iinclude/media/v4l2-ctrls.h
 !Iinclude/media/v4l2-event.h
 !Iinclude/media/v4l2-dv-timings.h
+!Iinclude/media/videobuf2-core.h
 <!-- FIXME: Removed for now due to document generation inconsistency
 X!Iinclude/media/v4l2-mediabus.h
 X!Iinclude/media/videobuf2-memops.h
-X!Iinclude/media/videobuf2-core.h
 -->
 
   </chapter>
diff --git a/include/media/videobuf2-core.h b/include/media/videobuf2-core.h
index 22a44c2f5963..4f7f7aed9157 100644
--- a/include/media/videobuf2-core.h
+++ b/include/media/videobuf2-core.h
@@ -362,7 +362,9 @@ struct v4l2_fh;
  *		start_streaming() can be called. Used when a DMA engine
  *		cannot be started unless at least this number of buffers
  *		have been queued into the driver.
- *
+ */
+/*
+ * Private elements (won't appear at the DocBook):
  * @mmap_lock:	private mutex used when buffers are allocated/freed/mmapped
  * @memory:	current memory type used
  * @bufs:	videobuf buffer structures
@@ -405,7 +407,7 @@ struct vb2_queue {
 	gfp_t				gfp_flags;
 	u32				min_buffers_needed;
 
-/* private: internal use only */
+	/* private: internal use only */
 	struct mutex			mmap_lock;
 	enum v4l2_memory		memory;
 	struct vb2_buffer		*bufs[VIDEO_MAX_FRAME];
@@ -482,7 +484,8 @@ size_t vb2_read(struct vb2_queue *q, char __user *data, size_t count,
 		loff_t *ppos, int nonblock);
 size_t vb2_write(struct vb2_queue *q, const char __user *data, size_t count,
 		loff_t *ppos, int nonblock);
-/**
+
+/*
  * vb2_thread_fnc - callback function for use with vb2_thread
  *
  * This is called whenever a buffer is dequeued in the thread.
@@ -575,7 +578,6 @@ static inline void vb2_set_plane_payload(struct vb2_buffer *vb,
  * vb2_get_plane_payload() - get bytesused for the plane plane_no
  * @vb:		buffer for which plane payload should be set
  * @plane_no:	plane number for which payload should be set
- * @size:	payload in bytes
  */
 static inline unsigned long vb2_get_plane_payload(struct vb2_buffer *vb,
 				 unsigned int plane_no)
-- 
cgit v1.2.3


From b6836a6fbc4b7e35679bf7bb8b54bad294ae0b10 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Date: Sat, 22 Aug 2015 09:01:58 -0300
Subject: [media] videobuf2-memops.h: add to device-drivers DocBook

The comment metadata was wrong:

Warning(.//include/media/videobuf2-memops.h:25): cannot understand function prototype: 'struct vb2_vmarea_handler '
Warning(.//include/media/videobuf2-memops.h): no structured comments found

Fix and add to DocBook.

Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Acked-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/DocBook/device-drivers.tmpl | 2 +-
 include/media/videobuf2-memops.h          | 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/DocBook/device-drivers.tmpl b/Documentation/DocBook/device-drivers.tmpl
index e636bbd41933..8d1e04c94dce 100644
--- a/Documentation/DocBook/device-drivers.tmpl
+++ b/Documentation/DocBook/device-drivers.tmpl
@@ -235,9 +235,9 @@ X!Isound/sound_firmware.c
 !Iinclude/media/v4l2-event.h
 !Iinclude/media/v4l2-dv-timings.h
 !Iinclude/media/videobuf2-core.h
+!Iinclude/media/videobuf2-memops.h
 <!-- FIXME: Removed for now due to document generation inconsistency
 X!Iinclude/media/v4l2-mediabus.h
-X!Iinclude/media/videobuf2-memops.h
 -->
 
   </chapter>
diff --git a/include/media/videobuf2-memops.h b/include/media/videobuf2-memops.h
index f05444ca8c0c..9f36641a6781 100644
--- a/include/media/videobuf2-memops.h
+++ b/include/media/videobuf2-memops.h
@@ -17,7 +17,8 @@
 #include <media/videobuf2-core.h>
 
 /**
- * vb2_vmarea_handler - common vma refcount tracking handler
+ * struct vb2_vmarea_handler - common vma refcount tracking handler
+ *
  * @refcount:	pointer to refcount entry in the buffer
  * @put:	callback to function that decreases buffer refcount
  * @arg:	argument for @put callback
-- 
cgit v1.2.3


From 98d00bd7a2477ffce8010458938425095345f470 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Date: Sat, 22 Aug 2015 09:04:46 -0300
Subject: [media] v4l2-mediabus: Add to DocBook

Fix the format of the comments and add to DocBook.

Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Acked-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/DocBook/device-drivers.tmpl | 4 +---
 include/media/v4l2-mediabus.h             | 4 ++--
 2 files changed, 3 insertions(+), 5 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/DocBook/device-drivers.tmpl b/Documentation/DocBook/device-drivers.tmpl
index 8d1e04c94dce..53a7c5d8b996 100644
--- a/Documentation/DocBook/device-drivers.tmpl
+++ b/Documentation/DocBook/device-drivers.tmpl
@@ -236,9 +236,7 @@ X!Isound/sound_firmware.c
 !Iinclude/media/v4l2-dv-timings.h
 !Iinclude/media/videobuf2-core.h
 !Iinclude/media/videobuf2-memops.h
-<!-- FIXME: Removed for now due to document generation inconsistency
-X!Iinclude/media/v4l2-mediabus.h
--->
+!Iinclude/media/v4l2-mediabus.h
 
   </chapter>
 
diff --git a/include/media/v4l2-mediabus.h b/include/media/v4l2-mediabus.h
index 73069e4c2796..34cc99e093ef 100644
--- a/include/media/v4l2-mediabus.h
+++ b/include/media/v4l2-mediabus.h
@@ -65,7 +65,7 @@
 					 V4L2_MBUS_CSI2_CHANNEL_2 | V4L2_MBUS_CSI2_CHANNEL_3)
 
 /**
- * v4l2_mbus_type - media bus type
+ * enum v4l2_mbus_type - media bus type
  * @V4L2_MBUS_PARALLEL:	parallel interface with hsync and vsync
  * @V4L2_MBUS_BT656:	parallel interface with embedded synchronisation, can
  *			also be used for BT.1120
@@ -78,7 +78,7 @@ enum v4l2_mbus_type {
 };
 
 /**
- * v4l2_mbus_config - media bus configuration
+ * struct v4l2_mbus_config - media bus configuration
  * @type:	in: interface type
  * @flags:	in / out: configuration flags, depending on @type
  */
-- 
cgit v1.2.3


From 04ffb9c126d6d887b08b68bbe0c0842ec9def78d Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Date: Sat, 22 Aug 2015 09:22:25 -0300
Subject: [media] DocBook: Better organize media devices

Instead of putting all media devices on a flat structure,
split them on 4 types, just like we do with the userspace API:
	Video2Linux devices
	Digital TV (DVB) devices
	Remote Controller devices
	Media Controller devices

Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Acked-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/DocBook/device-drivers.tmpl | 29 +++++++++++++++++++----------
 1 file changed, 19 insertions(+), 10 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/DocBook/device-drivers.tmpl b/Documentation/DocBook/device-drivers.tmpl
index 53a7c5d8b996..5c9375b98303 100644
--- a/Documentation/DocBook/device-drivers.tmpl
+++ b/Documentation/DocBook/device-drivers.tmpl
@@ -218,25 +218,34 @@ X!Isound/sound_firmware.c
 
   <chapter id="mediadev">
      <title>Media Devices</title>
-!Iinclude/media/media-device.h
-!Iinclude/media/media-devnode.h
-!Iinclude/media/media-entity.h
+
+     <sect1><title>Video2Linux devices</title>
 !Iinclude/media/v4l2-async.h
+!Iinclude/media/v4l2-ctrls.h
+!Iinclude/media/v4l2-dv-timings.h
+!Iinclude/media/v4l2-event.h
 !Iinclude/media/v4l2-flash-led-class.h
+!Iinclude/media/v4l2-mediabus.h
 !Iinclude/media/v4l2-mem2mem.h
 !Iinclude/media/v4l2-of.h
 !Iinclude/media/v4l2-subdev.h
-!Iinclude/media/rc-core.h
+!Iinclude/media/videobuf2-core.h
+!Iinclude/media/videobuf2-memops.h
+     </sect1>
+     <sect1><title>Digital TV (DVB) devices</title>
 !Idrivers/media/dvb-core/dvb_ca_en50221.h
 !Idrivers/media/dvb-core/dvb_frontend.h
 !Idrivers/media/dvb-core/dvb_math.h
 !Idrivers/media/dvb-core/dvb_ringbuffer.h
-!Iinclude/media/v4l2-ctrls.h
-!Iinclude/media/v4l2-event.h
-!Iinclude/media/v4l2-dv-timings.h
-!Iinclude/media/videobuf2-core.h
-!Iinclude/media/videobuf2-memops.h
-!Iinclude/media/v4l2-mediabus.h
+     </sect1>
+     <sect1><title>Remote Controller devices</title>
+!Iinclude/media/rc-core.h
+     </sect1>
+     <sect1><title>Media Controller devices</title>
+!Iinclude/media/media-device.h
+!Iinclude/media/media-devnode.h
+!Iinclude/media/media-entity.h
+     </sect1>
 
   </chapter>
 
-- 
cgit v1.2.3


From d071c833a0d30e7aae0ea565d92ef83c79106d6f Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Date: Sat, 22 Aug 2015 19:39:38 -0300
Subject: [media] dvbdev: document most of the functions/data structs

Document the most relevant functions and data structs for
developers that are working with the DVB subsystem.

Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Acked-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/DocBook/device-drivers.tmpl |   1 +
 drivers/media/dvb-core/dvbdev.h           | 116 +++++++++++++++++++++++++-----
 2 files changed, 100 insertions(+), 17 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/DocBook/device-drivers.tmpl b/Documentation/DocBook/device-drivers.tmpl
index 5c9375b98303..93b74884ae24 100644
--- a/Documentation/DocBook/device-drivers.tmpl
+++ b/Documentation/DocBook/device-drivers.tmpl
@@ -237,6 +237,7 @@ X!Isound/sound_firmware.c
 !Idrivers/media/dvb-core/dvb_frontend.h
 !Idrivers/media/dvb-core/dvb_math.h
 !Idrivers/media/dvb-core/dvb_ringbuffer.h
+!Idrivers/media/dvb-core/dvbdev.h
      </sect1>
      <sect1><title>Remote Controller devices</title>
 !Iinclude/media/rc-core.h
diff --git a/drivers/media/dvb-core/dvbdev.h b/drivers/media/dvb-core/dvbdev.h
index 12629b8ecb0c..c61a4f03a66f 100644
--- a/drivers/media/dvb-core/dvbdev.h
+++ b/drivers/media/dvb-core/dvbdev.h
@@ -57,6 +57,25 @@
 
 struct dvb_frontend;
 
+/**
+ * struct dvb_adapter - represents a Digital TV adapter using Linux DVB API
+ *
+ * @num:		Number of the adapter
+ * @list_head:		List with the DVB adapters
+ * @device_list:	List with the DVB devices
+ * @name:		Name of the adapter
+ * @proposed_mac:	proposed MAC address for the adapter
+ * @priv:		private data
+ * @device:		pointer to struct device
+ * @module:		pointer to struct module
+ * @mfe_shared:		mfe shared: indicates mutually exclusive frontends
+ *			Thie usage of this flag is currently deprecated
+ * @mfe_dvbdev:		Frontend device in use, in the case of MFE
+ * @mfe_lock:		Lock to prevent using the other frontends when MFE is
+ *			used.
+ * @mdev:		pointer to struct media_device, used when the media
+ *			controller is used.
+ */
 struct dvb_adapter {
 	int num;
 	struct list_head list_head;
@@ -78,7 +97,34 @@ struct dvb_adapter {
 #endif
 };
 
-
+/**
+ * struct dvb_device - represents a DVB device node
+ *
+ * @list_head:	List head with all DVB devices
+ * @fops:	pointer to struct file_operations
+ * @adapter:	pointer to the adapter that holds this device node
+ * @type:	type of the device: DVB_DEVICE_SEC, DVB_DEVICE_FRONTEND,
+ *		DVB_DEVICE_DEMUX, DVB_DEVICE_DVR, DVB_DEVICE_CA, DVB_DEVICE_NET
+ * @minor:	devnode minor number. Major number is always DVB_MAJOR.
+ * @id:		device ID number, inside the adapter
+ * @readers:	Initialized by the caller. Each call to open() in Read Only mode
+ *		decreases this counter by one.
+ * @writers:	Initialized by the caller. Each call to open() in Read/Write
+ *		mode decreases this counter by one.
+ * @users:	Initialized by the caller. Each call to open() in any mode
+ *		decreases this counter by one.
+ * @wait_queue:	wait queue, used to wait for certain events inside one of
+ *		the DVB API callers
+ * @kernel_ioctl: callback function used to handle ioctl calls from userspace.
+ * @name:	Name to be used for the device at the Media Controller
+ * @entity:	pointer to struct media_entity associated with the device node
+ * @pads:	pointer to struct media_pad associated with @entity;
+ * @priv:	private data
+ *
+ * This structure is used by the DVB core (frontend, CA, net, demux) in
+ * order to create the device nodes. Usually, driver should not initialize
+ * this struct diretly.
+ */
 struct dvb_device {
 	struct list_head list_head;
 	const struct file_operations *fops;
@@ -109,19 +155,55 @@ struct dvb_device {
 	void *priv;
 };
 
+/**
+ * dvb_register_adapter - Registers a new DVB adapter
+ *
+ * @adap:	pointer to struct dvb_adapter
+ * @name:	Adapter's name
+ * @module:	initialized with THIS_MODULE at the caller
+ * @device:	pointer to struct device that corresponds to the device driver
+ * @adapter_nums: Array with a list of the numbers for @dvb_register_adapter;
+ * 		to select among them. Typically, initialized with:
+ *		DVB_DEFINE_MOD_OPT_ADAPTER_NR(adapter_nums)
+ */
+int dvb_register_adapter(struct dvb_adapter *adap, const char *name,
+			 struct module *module, struct device *device,
+			 short *adapter_nums);
 
-extern int dvb_register_adapter(struct dvb_adapter *adap, const char *name,
-				struct module *module, struct device *device,
-				short *adapter_nums);
-extern int dvb_unregister_adapter (struct dvb_adapter *adap);
-
-extern int dvb_register_device (struct dvb_adapter *adap,
-				struct dvb_device **pdvbdev,
-				const struct dvb_device *template,
-				void *priv,
-				int type);
+/**
+ * dvb_unregister_adapter - Unregisters a DVB adapter
+ *
+ * @adap:	pointer to struct dvb_adapter
+ */
+int dvb_unregister_adapter(struct dvb_adapter *adap);
 
-extern void dvb_unregister_device (struct dvb_device *dvbdev);
+/**
+ * dvb_register_device - Registers a new DVB device
+ *
+ * @adap:	pointer to struct dvb_adapter
+ * @pdvbdev:	pointer to the place where the new struct dvb_device will be
+ *		stored
+ * @template:	Template used to create &pdvbdev;
+ * @device:	pointer to struct device that corresponds to the device driver
+ * @adapter_nums: Array with a list of the numbers for @dvb_register_adapter;
+ * 		to select among them. Typically, initialized with:
+ *		DVB_DEFINE_MOD_OPT_ADAPTER_NR(adapter_nums)
+ * @priv:	private data
+ * @type:	type of the device: DVB_DEVICE_SEC, DVB_DEVICE_FRONTEND,
+ *		DVB_DEVICE_DEMUX, DVB_DEVICE_DVR, DVB_DEVICE_CA, DVB_DEVICE_NET
+ */
+int dvb_register_device(struct dvb_adapter *adap,
+			struct dvb_device **pdvbdev,
+			const struct dvb_device *template,
+			void *priv,
+			int type);
+
+/**
+ * dvb_unregister_device - Unregisters a DVB device
+ *
+ * @dvbdev:	pointer to struct dvb_device
+ */
+void dvb_unregister_device(struct dvb_device *dvbdev);
 
 #ifdef CONFIG_MEDIA_CONTROLLER_DVB
 void dvb_create_media_graph(struct dvb_adapter *adap);
@@ -136,17 +218,17 @@ static inline void dvb_create_media_graph(struct dvb_adapter *adap) {}
 #define dvb_register_media_controller(a, b) {}
 #endif
 
-extern int dvb_generic_open (struct inode *inode, struct file *file);
-extern int dvb_generic_release (struct inode *inode, struct file *file);
-extern long dvb_generic_ioctl (struct file *file,
+int dvb_generic_open (struct inode *inode, struct file *file);
+int dvb_generic_release (struct inode *inode, struct file *file);
+long dvb_generic_ioctl (struct file *file,
 			      unsigned int cmd, unsigned long arg);
 
 /* we don't mess with video_usercopy() any more,
 we simply define out own dvb_usercopy(), which will hopefully become
 generic_usercopy()  someday... */
 
-extern int dvb_usercopy(struct file *file, unsigned int cmd, unsigned long arg,
-			    int (*func)(struct file *file, unsigned int cmd, void *arg));
+int dvb_usercopy(struct file *file, unsigned int cmd, unsigned long arg,
+		 int (*func)(struct file *file, unsigned int cmd, void *arg));
 
 /** generic DVB attach function. */
 #ifdef CONFIG_MEDIA_ATTACH
-- 
cgit v1.2.3


From 43bcad2bb485f053661e5cfe306c34178c6651c7 Mon Sep 17 00:00:00 2001
From: Lars-Peter Clausen <lars@metafoo.de>
Date: Thu, 20 Aug 2015 17:39:12 +0200
Subject: devicetree: Add bindings documentation for Analog Devices AXI-DMAC

Add the devicetree descriptor for the Analog Devices AXI-DMAC DMA
controller. This is a soft peripheral used in FPGAs and the bindings
describe how it is connected to the system (clock, interrupt, memory map)
as well as the configuration options that were used when the peripheral was
instantiated.

Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
---
 .../devicetree/bindings/dma/adi,axi-dmac.txt       | 61 ++++++++++++++++++++++
 include/dt-bindings/dma/axi-dmac.h                 | 48 +++++++++++++++++
 2 files changed, 109 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/dma/adi,axi-dmac.txt
 create mode 100644 include/dt-bindings/dma/axi-dmac.h

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/dma/adi,axi-dmac.txt b/Documentation/devicetree/bindings/dma/adi,axi-dmac.txt
new file mode 100644
index 000000000000..47cb1d14b690
--- /dev/null
+++ b/Documentation/devicetree/bindings/dma/adi,axi-dmac.txt
@@ -0,0 +1,61 @@
+Analog Device AXI-DMAC DMA controller
+
+Required properties:
+ - compatible: Must be "adi,axi-dmac-1.00.a".
+ - reg: Specification for the controllers memory mapped register map.
+ - interrupts: Specification for the controllers interrupt.
+ - clocks: Phandle and specifier to the controllers AXI interface clock
+ - #dma-cells: Must be 1.
+
+Required sub-nodes:
+ - adi,channels: This sub-node must contain a sub-node for each DMA channel. For
+   the channel sub-nodes the following bindings apply. They must match the
+   configuration options of the peripheral as it was instantiated.
+
+Required properties for adi,channels sub-node:
+ - #size-cells: Must be 0
+ - #address-cells: Must be 1
+
+Required channel sub-node properties:
+ - reg: Which channel this node refers to.
+ - adi,length-width: Width of the DMA transfer length register.
+ - adi,source-bus-width,
+   adi,destination-bus-width: Width of the source or destination bus in bits.
+ - adi,source-bus-type,
+   adi,destination-bus-type: Type of the source or destination bus. Must be one
+   of the following:
+	0 (AXI_DMAC_TYPE_AXI_MM): Memory mapped AXI interface
+	1 (AXI_DMAC_TYPE_AXI_STREAM): Streaming AXI interface
+	2 (AXI_DMAC_TYPE_AXI_FIFO): FIFO interface
+
+Optional channel properties:
+ - adi,cyclic: Must be set if the channel supports hardware cyclic DMA
+   transfers.
+ - adi,2d: Must be set if the channel supports hardware 2D DMA transfers.
+
+DMA clients connected to the AXI-DMAC DMA controller must use the format
+described in the dma.txt file using a one-cell specifier. The value of the
+specifier refers to the DMA channel index.
+
+Example:
+
+dma: dma@7c420000 {
+	compatible = "adi,axi-dmac-1.00.a";
+	reg = <0x7c420000 0x10000>;
+	interrupts = <0 57 0>;
+	clocks = <&clkc 16>;
+	#dma-cells = <1>;
+
+	adi,channels {
+		#size-cells = <0>;
+		#address-cells = <1>;
+
+		dma-channel@0 {
+			reg = <0>;
+			adi,source-bus-width = <32>;
+			adi,source-bus-type = <ADI_AXI_DMAC_TYPE_MM_AXI>;
+			adi,destination-bus-width = <64>;
+			adi,destination-bus-type = <ADI_AXI_DMAC_TYPE_FIFO>;
+		};
+	};
+};
diff --git a/include/dt-bindings/dma/axi-dmac.h b/include/dt-bindings/dma/axi-dmac.h
new file mode 100644
index 000000000000..ad9e6ecb9c2f
--- /dev/null
+++ b/include/dt-bindings/dma/axi-dmac.h
@@ -0,0 +1,48 @@
+/*
+ * This file is dual-licensed: you can use it either under the terms
+ * of the GPL or the X11 license, at your option. Note that this dual
+ * licensing only applies to this file, and not this project as a
+ * whole.
+ *
+ *  a) This file is free software; you can redistribute it and/or
+ *     modify it under the terms of the GNU General Public License as
+ *     published by the Free Software Foundation; either version 2 of the
+ *     License, or (at your option) any later version.
+ *
+ *     This file is distributed in the hope that it will be useful,
+ *     but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *     MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *     GNU General Public License for more details.
+ *
+ * Or, alternatively,
+ *
+ *  b) Permission is hereby granted, free of charge, to any person
+ *     obtaining a copy of this software and associated documentation
+ *     files (the "Software"), to deal in the Software without
+ *     restriction, including without limitation the rights to use,
+ *     copy, modify, merge, publish, distribute, sublicense, and/or
+ *     sell copies of the Software, and to permit persons to whom the
+ *     Software is furnished to do so, subject to the following
+ *     conditions:
+ *
+ *     The above copyright notice and this permission notice shall be
+ *     included in all copies or substantial portions of the Software.
+ *
+ *     THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ *     EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
+ *     OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ *     NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
+ *     HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
+ *     WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ *     FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ *     OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#ifndef __DT_BINDINGS_DMA_AXI_DMAC_H__
+#define __DT_BINDINGS_DMA_AXI_DMAC_H__
+
+#define AXI_DMAC_BUS_TYPE_AXI_MM		0
+#define AXI_DMAC_BUS_TYPE_AXI_STREAM	1
+#define AXI_DMAC_BUS_TYPE_FIFO			2
+
+#endif
-- 
cgit v1.2.3


From 5b9eaa5659b32cf6c85a492d2e3bfa7a3a413144 Mon Sep 17 00:00:00 2001
From: Ben Hutchings <ben.hutchings@codethink.co.uk>
Date: Tue, 30 Jun 2015 17:53:59 +0100
Subject: pinctrl: sh-pfc: Implement pinconf power-source param for voltage
 switching

The pfc in the R8A7790 (and probably others in the R-Car gen 2 family)
supports switching SDHI signals between 3.3V and 1.8V nominal voltage,
and the SD driver should do that when switching to and from UHS modes.

Add a flag for pins that have configurable I/O voltage and SoC
operations to get and set the nominal voltage.  Implement the pinconf
power-source parameter using these operations.

Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
---
 .../bindings/pinctrl/renesas,pfc-pinctrl.txt       |  4 +-
 drivers/pinctrl/sh-pfc/pinctrl.c                   | 44 +++++++++++++++++++++-
 drivers/pinctrl/sh-pfc/sh_pfc.h                    |  5 +++
 3 files changed, 50 insertions(+), 3 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/pinctrl/renesas,pfc-pinctrl.txt b/Documentation/devicetree/bindings/pinctrl/renesas,pfc-pinctrl.txt
index e089142cfb14..9496934528bd 100644
--- a/Documentation/devicetree/bindings/pinctrl/renesas,pfc-pinctrl.txt
+++ b/Documentation/devicetree/bindings/pinctrl/renesas,pfc-pinctrl.txt
@@ -71,7 +71,9 @@ Pin Configuration Node Properties:
 
 The pin configuration parameters use the generic pinconf bindings defined in
 pinctrl-bindings.txt in this directory. The supported parameters are
-bias-disable, bias-pull-up and bias-pull-down.
+bias-disable, bias-pull-up, bias-pull-down and power-source. For pins that
+have a configurable I/O voltage, the power-source value should be the
+nominal I/O voltage in millivolts.
 
 
 GPIO
diff --git a/drivers/pinctrl/sh-pfc/pinctrl.c b/drivers/pinctrl/sh-pfc/pinctrl.c
index 6fe7459f0ccb..863c3e30ce05 100644
--- a/drivers/pinctrl/sh-pfc/pinctrl.c
+++ b/drivers/pinctrl/sh-pfc/pinctrl.c
@@ -491,6 +491,9 @@ static bool sh_pfc_pinconf_validate(struct sh_pfc *pfc, unsigned int _pin,
 	case PIN_CONFIG_BIAS_PULL_DOWN:
 		return pin->configs & SH_PFC_PIN_CFG_PULL_DOWN;
 
+	case PIN_CONFIG_POWER_SOURCE:
+		return pin->configs & SH_PFC_PIN_CFG_IO_VOLTAGE;
+
 	default:
 		return false;
 	}
@@ -503,7 +506,6 @@ static int sh_pfc_pinconf_get(struct pinctrl_dev *pctldev, unsigned _pin,
 	struct sh_pfc *pfc = pmx->pfc;
 	enum pin_config_param param = pinconf_to_config_param(*config);
 	unsigned long flags;
-	unsigned int bias;
 
 	if (!sh_pfc_pinconf_validate(pfc, _pin, param))
 		return -ENOTSUPP;
@@ -511,7 +513,9 @@ static int sh_pfc_pinconf_get(struct pinctrl_dev *pctldev, unsigned _pin,
 	switch (param) {
 	case PIN_CONFIG_BIAS_DISABLE:
 	case PIN_CONFIG_BIAS_PULL_UP:
-	case PIN_CONFIG_BIAS_PULL_DOWN:
+	case PIN_CONFIG_BIAS_PULL_DOWN: {
+		unsigned int bias;
+
 		if (!pfc->info->ops || !pfc->info->ops->get_bias)
 			return -ENOTSUPP;
 
@@ -524,6 +528,24 @@ static int sh_pfc_pinconf_get(struct pinctrl_dev *pctldev, unsigned _pin,
 
 		*config = 0;
 		break;
+	}
+
+	case PIN_CONFIG_POWER_SOURCE: {
+		int ret;
+
+		if (!pfc->info->ops || !pfc->info->ops->get_io_voltage)
+			return -ENOTSUPP;
+
+		spin_lock_irqsave(&pfc->lock, flags);
+		ret = pfc->info->ops->get_io_voltage(pfc, _pin);
+		spin_unlock_irqrestore(&pfc->lock, flags);
+
+		if (ret < 0)
+			return ret;
+
+		*config = ret;
+		break;
+	}
 
 	default:
 		return -ENOTSUPP;
@@ -560,6 +582,24 @@ static int sh_pfc_pinconf_set(struct pinctrl_dev *pctldev, unsigned _pin,
 
 			break;
 
+		case PIN_CONFIG_POWER_SOURCE: {
+			unsigned int arg =
+				pinconf_to_config_argument(configs[i]);
+			int ret;
+
+			if (!pfc->info->ops || !pfc->info->ops->set_io_voltage)
+				return -ENOTSUPP;
+
+			spin_lock_irqsave(&pfc->lock, flags);
+			ret = pfc->info->ops->set_io_voltage(pfc, _pin, arg);
+			spin_unlock_irqrestore(&pfc->lock, flags);
+
+			if (ret)
+				return ret;
+
+			break;
+		}
+
 		default:
 			return -ENOTSUPP;
 		}
diff --git a/drivers/pinctrl/sh-pfc/sh_pfc.h b/drivers/pinctrl/sh-pfc/sh_pfc.h
index c7508d5f6886..734f7a92229c 100644
--- a/drivers/pinctrl/sh-pfc/sh_pfc.h
+++ b/drivers/pinctrl/sh-pfc/sh_pfc.h
@@ -12,6 +12,7 @@
 #define __SH_PFC_H
 
 #include <linux/bug.h>
+#include <linux/pinctrl/pinconf-generic.h>
 #include <linux/stringify.h>
 
 enum {
@@ -26,6 +27,7 @@ enum {
 #define SH_PFC_PIN_CFG_OUTPUT		(1 << 1)
 #define SH_PFC_PIN_CFG_PULL_UP		(1 << 2)
 #define SH_PFC_PIN_CFG_PULL_DOWN	(1 << 3)
+#define SH_PFC_PIN_CFG_IO_VOLTAGE	(1 << 4)
 #define SH_PFC_PIN_CFG_NO_GPIO		(1 << 31)
 
 struct sh_pfc_pin {
@@ -121,6 +123,9 @@ struct sh_pfc_soc_operations {
 	unsigned int (*get_bias)(struct sh_pfc *pfc, unsigned int pin);
 	void (*set_bias)(struct sh_pfc *pfc, unsigned int pin,
 			 unsigned int bias);
+	int (*get_io_voltage)(struct sh_pfc *pfc, unsigned int pin);
+	int (*set_io_voltage)(struct sh_pfc *pfc, unsigned int pin,
+			      u16 voltage_mV);
 };
 
 struct sh_pfc_soc_info {
-- 
cgit v1.2.3


From bb5f8ea4d5149f3dec6f7cd24c040c52bfc0cdbd Mon Sep 17 00:00:00 2001
From: "ludovic.desroches@atmel.com" <ludovic.desroches@atmel.com>
Date: Wed, 29 Jul 2015 16:22:47 +0200
Subject: mmc: sdhci-of-at91: introduce driver for the Atmel SDMMC

Introduce driver for he Atmel SDMMC available on sama5d2. It is a sdhci
compliant controller.

Signed-off-by: Ludovic Desroches <ludovic.desroches@atmel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
---
 .../devicetree/bindings/mmc/sdhci-atmel.txt        |  21 +++
 drivers/mmc/host/Kconfig                           |   8 +
 drivers/mmc/host/Makefile                          |   1 +
 drivers/mmc/host/sdhci-of-at91.c                   | 192 +++++++++++++++++++++
 4 files changed, 222 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/mmc/sdhci-atmel.txt
 create mode 100644 drivers/mmc/host/sdhci-of-at91.c

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/mmc/sdhci-atmel.txt b/Documentation/devicetree/bindings/mmc/sdhci-atmel.txt
new file mode 100644
index 000000000000..1b662d7171a0
--- /dev/null
+++ b/Documentation/devicetree/bindings/mmc/sdhci-atmel.txt
@@ -0,0 +1,21 @@
+* Atmel SDHCI controller
+
+This file documents the differences between the core properties in
+Documentation/devicetree/bindings/mmc/mmc.txt and the properties used by the
+sdhci-of-at91 driver.
+
+Required properties:
+- compatible:		Must be "atmel,sama5d2-sdhci".
+- clocks:		Phandlers to the clocks.
+- clock-names:		Must be "hclock", "multclk", "baseclk";
+
+
+Example:
+
+sdmmc0: sdio-host@a0000000 {
+	compatible = "atmel,sama5d2-sdhci";
+	reg = <0xa0000000 0x300>;
+	interrupts = <31 IRQ_TYPE_LEVEL_HIGH 0>;
+	clocks = <&sdmmc0_hclk>, <&sdmmc0_gclk>, <&main>;
+	clock-names = "hclock", "multclk", "baseclk";
+};
diff --git a/drivers/mmc/host/Kconfig b/drivers/mmc/host/Kconfig
index 6a0f9c79be26..8a1e3498261e 100644
--- a/drivers/mmc/host/Kconfig
+++ b/drivers/mmc/host/Kconfig
@@ -129,6 +129,14 @@ config MMC_SDHCI_OF_ARASAN
 
 	  If unsure, say N.
 
+config MMC_SDHCI_OF_AT91
+	tristate "SDHCI OF support for the Atmel SDMMC controller"
+	depends on MMC_SDHCI_PLTFM
+	depends on OF
+	select MMC_SDHCI_IO_ACCESSORS
+	help
+	  This selects the Atmel SDMMC driver
+
 config MMC_SDHCI_OF_ESDHC
 	tristate "SDHCI OF support for the Freescale eSDHC controller"
 	depends on MMC_SDHCI_PLTFM
diff --git a/drivers/mmc/host/Makefile b/drivers/mmc/host/Makefile
index e928d61c5f4b..4f3452afa6ca 100644
--- a/drivers/mmc/host/Makefile
+++ b/drivers/mmc/host/Makefile
@@ -67,6 +67,7 @@ obj-$(CONFIG_MMC_SDHCI_ESDHC_IMX)	+= sdhci-esdhc-imx.o
 obj-$(CONFIG_MMC_SDHCI_DOVE)		+= sdhci-dove.o
 obj-$(CONFIG_MMC_SDHCI_TEGRA)		+= sdhci-tegra.o
 obj-$(CONFIG_MMC_SDHCI_OF_ARASAN)	+= sdhci-of-arasan.o
+obj-$(CONFIG_MMC_SDHCI_OF_AT91)		+= sdhci-of-at91.o
 obj-$(CONFIG_MMC_SDHCI_OF_ESDHC)	+= sdhci-of-esdhc.o
 obj-$(CONFIG_MMC_SDHCI_OF_HLWD)		+= sdhci-of-hlwd.o
 obj-$(CONFIG_MMC_SDHCI_BCM_KONA)	+= sdhci-bcm-kona.o
diff --git a/drivers/mmc/host/sdhci-of-at91.c b/drivers/mmc/host/sdhci-of-at91.c
new file mode 100644
index 000000000000..7a9f4b19f989
--- /dev/null
+++ b/drivers/mmc/host/sdhci-of-at91.c
@@ -0,0 +1,192 @@
+/*
+ * Atmel SDMMC controller driver.
+ *
+ * Copyright (C) 2015 Atmel,
+ *		 2015 Ludovic Desroches <ludovic.desroches@atmel.com>
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include <linux/clk.h>
+#include <linux/err.h>
+#include <linux/io.h>
+#include <linux/mmc/host.h>
+#include <linux/module.h>
+#include <linux/of.h>
+#include <linux/of_device.h>
+
+#include "sdhci-pltfm.h"
+
+#define SDMMC_CACR	0x230
+#define		SDMMC_CACR_CAPWREN	BIT(0)
+#define		SDMMC_CACR_KEY		(0x46 << 8)
+
+struct sdhci_at91_priv {
+	struct clk *hclock;
+	struct clk *gck;
+	struct clk *mainck;
+};
+
+static const struct sdhci_ops sdhci_at91_sama5d2_ops = {
+	.set_clock		= sdhci_set_clock,
+	.set_bus_width		= sdhci_set_bus_width,
+	.reset			= sdhci_reset,
+	.set_uhs_signaling	= sdhci_set_uhs_signaling,
+};
+
+static const struct sdhci_pltfm_data soc_data_sama5d2 = {
+	.ops = &sdhci_at91_sama5d2_ops,
+};
+
+static const struct of_device_id sdhci_at91_dt_match[] = {
+	{ .compatible = "atmel,sama5d2-sdhci", .data = &soc_data_sama5d2 },
+	{}
+};
+
+static int sdhci_at91_probe(struct platform_device *pdev)
+{
+	const struct of_device_id	*match;
+	const struct sdhci_pltfm_data	*soc_data;
+	struct sdhci_host		*host;
+	struct sdhci_pltfm_host		*pltfm_host;
+	struct sdhci_at91_priv		*priv;
+	unsigned int			caps0, caps1;
+	unsigned int			clk_base, clk_mul;
+	unsigned int			gck_rate, real_gck_rate;
+	int				ret;
+
+	match = of_match_device(sdhci_at91_dt_match, &pdev->dev);
+	if (!match)
+		return -EINVAL;
+	soc_data = match->data;
+
+	priv = devm_kzalloc(&pdev->dev, sizeof(*priv), GFP_KERNEL);
+	if (!priv) {
+		dev_err(&pdev->dev, "unable to allocate private data\n");
+		return -ENOMEM;
+	}
+
+	priv->mainck = devm_clk_get(&pdev->dev, "baseclk");
+	if (IS_ERR(priv->mainck)) {
+		dev_err(&pdev->dev, "failed to get baseclk\n");
+		return PTR_ERR(priv->mainck);
+	}
+
+	priv->hclock = devm_clk_get(&pdev->dev, "hclock");
+	if (IS_ERR(priv->hclock)) {
+		dev_err(&pdev->dev, "failed to get hclock\n");
+		return PTR_ERR(priv->hclock);
+	}
+
+	priv->gck = devm_clk_get(&pdev->dev, "multclk");
+	if (IS_ERR(priv->gck)) {
+		dev_err(&pdev->dev, "failed to get multclk\n");
+		return PTR_ERR(priv->gck);
+	}
+
+	host = sdhci_pltfm_init(pdev, soc_data, 0);
+	if (IS_ERR(host))
+		return PTR_ERR(host);
+
+	/*
+	 * The mult clock is provided by as a generated clock by the PMC
+	 * controller. In order to set the rate of gck, we have to get the
+	 * base clock rate and the clock mult from capabilities.
+	 */
+	clk_prepare_enable(priv->hclock);
+	caps0 = readl(host->ioaddr + SDHCI_CAPABILITIES);
+	caps1 = readl(host->ioaddr + SDHCI_CAPABILITIES_1);
+	clk_base = (caps0 & SDHCI_CLOCK_V3_BASE_MASK) >> SDHCI_CLOCK_BASE_SHIFT;
+	clk_mul = (caps1 & SDHCI_CLOCK_MUL_MASK) >> SDHCI_CLOCK_MUL_SHIFT;
+	gck_rate = clk_base * 1000000 * (clk_mul + 1);
+	ret = clk_set_rate(priv->gck, gck_rate);
+	if (ret < 0) {
+		dev_err(&pdev->dev, "failed to set gck");
+		goto hclock_disable_unprepare;
+		return -EINVAL;
+	}
+	/*
+	 * We need to check if we have the requested rate for gck because in
+	 * some cases this rate could be not supported. If it happens, the rate
+	 * is the closest one gck can provide. We have to update the value
+	 * of clk mul.
+	 */
+	real_gck_rate = clk_get_rate(priv->gck);
+	if (real_gck_rate != gck_rate) {
+		clk_mul = real_gck_rate / (clk_base * 1000000) - 1;
+		caps1 &= (~SDHCI_CLOCK_MUL_MASK);
+		caps1 |= ((clk_mul << SDHCI_CLOCK_MUL_SHIFT) & SDHCI_CLOCK_MUL_MASK);
+		/* Set capabilities in r/w mode. */
+		writel(SDMMC_CACR_KEY | SDMMC_CACR_CAPWREN, host->ioaddr + SDMMC_CACR);
+		writel(caps1, host->ioaddr + SDHCI_CAPABILITIES_1);
+		/* Set capabilities in ro mode. */
+		writel(0, host->ioaddr + SDMMC_CACR);
+		dev_info(&pdev->dev, "update clk mul to %u as gck rate is %u Hz\n",
+			 clk_mul, real_gck_rate);
+	}
+
+	clk_prepare_enable(priv->mainck);
+	clk_prepare_enable(priv->gck);
+
+	pltfm_host = sdhci_priv(host);
+	pltfm_host->priv = priv;
+
+	ret = mmc_of_parse(host->mmc);
+	if (ret)
+		goto clocks_disable_unprepare;
+
+	sdhci_get_of_property(pdev);
+
+	ret = sdhci_add_host(host);
+	if (ret)
+		goto clocks_disable_unprepare;
+
+	return 0;
+
+clocks_disable_unprepare:
+	clk_disable_unprepare(priv->gck);
+	clk_disable_unprepare(priv->mainck);
+hclock_disable_unprepare:
+	clk_disable_unprepare(priv->hclock);
+	sdhci_pltfm_free(pdev);
+	return ret;
+}
+
+static int sdhci_at91_remove(struct platform_device *pdev)
+{
+	struct sdhci_host	*host = platform_get_drvdata(pdev);
+	struct sdhci_pltfm_host	*pltfm_host = sdhci_priv(host);
+	struct sdhci_at91_priv	*priv = pltfm_host->priv;
+
+	sdhci_pltfm_unregister(pdev);
+
+	clk_disable_unprepare(priv->gck);
+	clk_disable_unprepare(priv->hclock);
+	clk_disable_unprepare(priv->mainck);
+
+	return 0;
+}
+
+static struct platform_driver sdhci_at91_driver = {
+	.driver		= {
+		.name	= "sdhci-at91",
+		.owner	= THIS_MODULE,
+		.of_match_table = sdhci_at91_dt_match,
+		.pm	= SDHCI_PLTFM_PMOPS,
+	},
+	.probe		= sdhci_at91_probe,
+	.remove		= sdhci_at91_remove,
+};
+
+module_platform_driver(sdhci_at91_driver);
+
+MODULE_DESCRIPTION("SDHCI driver for at91");
+MODULE_AUTHOR("Ludovic Desroches <ludovic.desroches@atmel.com>");
+MODULE_LICENSE("GPL v2");
-- 
cgit v1.2.3


From cfa0327b0d03091e0c47249c080e50e287be762d Mon Sep 17 00:00:00 2001
From: Wolfram Sang <wsa+renesas@sang-engineering.com>
Date: Mon, 27 Jul 2015 14:03:38 +0200
Subject: i2c: support 10 bit and slave addresses in sysfs 'new_device'

We now have seperate address spaces for 10 bit and we-are-slave clients.
Update the sysfs device instantiation method to support these types by
accepting the address offsets that are assigned to the extra address
spaces. Update the documentation, too.

Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
---
 Documentation/i2c/slave-interface   |  9 ++++++---
 Documentation/i2c/ten-bit-addresses |  4 ++++
 drivers/i2c/i2c-core.c              | 12 +++++++++++-
 3 files changed, 21 insertions(+), 4 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/i2c/slave-interface b/Documentation/i2c/slave-interface
index 2dee4e2d62df..61ed05cd9531 100644
--- a/Documentation/i2c/slave-interface
+++ b/Documentation/i2c/slave-interface
@@ -31,10 +31,13 @@ User manual
 ===========
 
 I2C slave backends behave like standard I2C clients. So, you can instantiate
-them as described in the document 'instantiating-devices'. A quick example for
-instantiating the slave-eeprom driver from userspace at address 0x64 on bus 1:
+them as described in the document 'instantiating-devices'. The only difference
+is that i2c slave backends have their own address space. So, you have to add
+0x1000 to the address you would originally request. An example for
+instantiating the slave-eeprom driver from userspace at the 7 bit address 0x64
+on bus 1:
 
-  # echo slave-24c02 0x64 > /sys/bus/i2c/devices/i2c-1/new_device
+  # echo slave-24c02 0x1064 > /sys/bus/i2c/devices/i2c-1/new_device
 
 Each backend should come with separate documentation to describe its specific
 behaviour and setup.
diff --git a/Documentation/i2c/ten-bit-addresses b/Documentation/i2c/ten-bit-addresses
index cdfe13901b99..7b2d11e53a49 100644
--- a/Documentation/i2c/ten-bit-addresses
+++ b/Documentation/i2c/ten-bit-addresses
@@ -2,6 +2,10 @@ The I2C protocol knows about two kinds of device addresses: normal 7 bit
 addresses, and an extended set of 10 bit addresses. The sets of addresses
 do not intersect: the 7 bit address 0x10 is not the same as the 10 bit
 address 0x10 (though a single device could respond to both of them).
+To avoid ambiguity, the user sees 10 bit addresses mapped to a different
+address space, namely 0xa000-0xa3ff. The leading 0xa (= 10) represents the
+10 bit mode. This is used for creating device names in sysfs. It is also
+needed when instantiating 10 bit devices via the new_device file in sysfs.
 
 I2C messages to and from 10-bit address devices have a different format.
 See the I2C specification for the details.
diff --git a/drivers/i2c/i2c-core.c b/drivers/i2c/i2c-core.c
index fc6d89316144..039817eaecb5 100644
--- a/drivers/i2c/i2c-core.c
+++ b/drivers/i2c/i2c-core.c
@@ -1158,6 +1158,16 @@ i2c_sysfs_new_device(struct device *dev, struct device_attribute *attr,
 		return -EINVAL;
 	}
 
+	if ((info.addr & I2C_ADDR_OFFSET_TEN_BIT) == I2C_ADDR_OFFSET_TEN_BIT) {
+		info.addr &= ~I2C_ADDR_OFFSET_TEN_BIT;
+		info.flags |= I2C_CLIENT_TEN;
+	}
+
+	if (info.addr & I2C_ADDR_OFFSET_SLAVE) {
+		info.addr &= ~I2C_ADDR_OFFSET_SLAVE;
+		info.flags |= I2C_CLIENT_SLAVE;
+	}
+
 	client = i2c_new_device(adap, &info);
 	if (!client)
 		return -EINVAL;
@@ -1209,7 +1219,7 @@ i2c_sysfs_delete_device(struct device *dev, struct device_attribute *attr,
 			  i2c_adapter_depth(adap));
 	list_for_each_entry_safe(client, next, &adap->userspace_clients,
 				 detected) {
-		if (client->addr == addr) {
+		if (i2c_encode_flags_to_addr(client) == addr) {
 			dev_info(dev, "%s: Deleting device %s at 0x%02hx\n",
 				 "delete_device", client->name, client->addr);
 
-- 
cgit v1.2.3


From 7a59b00a0906945f7fe25a10332ac0820491a0c3 Mon Sep 17 00:00:00 2001
From: Wolfram Sang <wsa+renesas@sang-engineering.com>
Date: Sat, 8 Aug 2015 13:35:18 +0200
Subject: i2c: dt: describe generic bindings

Start a new file which describes the generic bindings used for I2C with
device tree. So we have a central place to look for them, increase
visibility of them, and hopefully reduce the amount of custom properties
introduced.

Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Vaibhav Hiremath <vaibhav.hiremath@linaro.org>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
---
 Documentation/devicetree/bindings/i2c/i2c.txt | 33 +++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/i2c/i2c.txt

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/i2c/i2c.txt b/Documentation/devicetree/bindings/i2c/i2c.txt
new file mode 100644
index 000000000000..1175efed4a41
--- /dev/null
+++ b/Documentation/devicetree/bindings/i2c/i2c.txt
@@ -0,0 +1,33 @@
+Generic device tree bindings for I2C busses
+===========================================
+
+This document describes generic bindings which can be used to describe I2C
+busses in a device tree.
+
+Required properties
+-------------------
+
+- #address-cells  - should be <1>. Read more about addresses below.
+- #size-cells     - should be <0>.
+- compatible      - name of I2C bus controller following generic names
+		    recommended practice.
+
+For other required properties e.g. to describe register sets, interrupts,
+clocks, etc. check the binding documentation of the specific driver.
+
+The cells properties above define that an address of children of an I2C bus
+are described by a single value. This is usually a 7 bit address. However,
+flags can be attached to the address. I2C_TEN_BIT_ADDRESS is used to mark a 10
+bit address. It is needed to avoid the ambiguity between e.g. a 7 bit address
+of 0x50 and a 10 bit address of 0x050 which, in theory, can be on the same bus.
+Another flag is I2C_OWN_SLAVE_ADDRESS to mark addresses on which we listen to
+be devices ourselves.
+
+Optional properties
+-------------------
+
+These properties may not be supported by all drivers. However, if a driver
+wants to support one of the below features, it should adapt the bindings below.
+
+- clock-frequency	- frequency of bus clock in Hz
+- wakeup-source		- device can be used as a wakeup source.
-- 
cgit v1.2.3


From b3fdd32799d834e2626fae087906e886037350c6 Mon Sep 17 00:00:00 2001
From: York Sun <yorksun@freescale.com>
Date: Mon, 17 Aug 2015 11:53:48 -0700
Subject: i2c: mux: Add register-based mux i2c-mux-reg

Based on i2c-mux-gpio driver, similarly the register-based mux
switch from one bus to another by setting a single register.
The register can be on PCIe bus, local bus, or any memory-mapped
address. The endianness of such register can be specified in device
tree if used, or in platform data.

Signed-off-by: York Sun <yorksun@freescale.com>
Acked-by: Alexander Sverdlin <alexander.sverdlin@nokia.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
---
 .../devicetree/bindings/i2c/i2c-mux-reg.txt        |  74 ++++++
 drivers/i2c/muxes/Kconfig                          |  11 +
 drivers/i2c/muxes/Makefile                         |   1 +
 drivers/i2c/muxes/i2c-mux-reg.c                    | 294 +++++++++++++++++++++
 include/linux/platform_data/i2c-mux-reg.h          |  44 +++
 5 files changed, 424 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/i2c/i2c-mux-reg.txt
 create mode 100644 drivers/i2c/muxes/i2c-mux-reg.c
 create mode 100644 include/linux/platform_data/i2c-mux-reg.h

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/i2c/i2c-mux-reg.txt b/Documentation/devicetree/bindings/i2c/i2c-mux-reg.txt
new file mode 100644
index 000000000000..688783fbe696
--- /dev/null
+++ b/Documentation/devicetree/bindings/i2c/i2c-mux-reg.txt
@@ -0,0 +1,74 @@
+Register-based I2C Bus Mux
+
+This binding describes an I2C bus multiplexer that uses a single register
+to route the I2C signals.
+
+Required properties:
+- compatible: i2c-mux-reg
+- i2c-parent: The phandle of the I2C bus that this multiplexer's master-side
+  port is connected to.
+* Standard I2C mux properties. See mux.txt in this directory.
+* I2C child bus nodes. See mux.txt in this directory.
+
+Optional properties:
+- reg: this pair of <offset size> specifies the register to control the mux.
+  The <offset size> depends on its parent node. It can be any memory-mapped
+  address. The size must be either 1, 2, or 4 bytes. If reg is omitted, the
+  resource of this device will be used.
+- little-endian: The existence indicates the register is in little endian.
+- big-endian: The existence indicates the register is in big endian.
+  If both little-endian and big-endian are omitted, the endianness of the
+  CPU will be used.
+- write-only: The existence indicates the register is write-only.
+- idle-state: value to set the muxer to when idle. When no value is
+  given, it defaults to the last value used.
+
+Whenever an access is made to a device on a child bus, the value set
+in the revelant node's reg property will be output to the register.
+
+If an idle state is defined, using the idle-state (optional) property,
+whenever an access is not being made to a device on a child bus, the
+register will be set according to the idle value.
+
+If an idle state is not defined, the most recently used value will be
+left programmed into the register.
+
+Example of a mux on PCIe card, the host is a powerpc SoC (big endian):
+
+	i2c-mux {
+		/* the <offset size> depends on the address translation
+		 * of the parent device. If omitted, device resource
+		 * will be used instead. The size is to determine
+		 * whether iowrite32, iowrite16, or iowrite8 will be used.
+		 */
+		reg = <0x6028 0x4>;
+		little-endian;		/* little endian register on PCIe */
+		compatible = "i2c-mux-reg";
+		#address-cells = <1>;
+		#size-cells = <0>;
+		i2c-parent = <&i2c1>;
+		i2c@0 {
+			reg = <0>;
+			#address-cells = <1>;
+			#size-cells = <0>;
+
+			si5338: clock-generator@70 {
+				compatible = "silabs,si5338";
+				reg = <0x70>;
+				/* other stuff */
+			};
+		};
+
+		i2c@1 {
+			/* data is written using iowrite32 */
+			reg = <1>;
+			#address-cells = <1>;
+			#size-cells = <0>;
+
+			si5338: clock-generator@70 {
+				compatible = "silabs,si5338";
+				reg = <0x70>;
+				/* other stuff */
+			};
+		};
+	};
diff --git a/drivers/i2c/muxes/Kconfig b/drivers/i2c/muxes/Kconfig
index fdd0769c84a3..f06b0e24673b 100644
--- a/drivers/i2c/muxes/Kconfig
+++ b/drivers/i2c/muxes/Kconfig
@@ -61,4 +61,15 @@ config I2C_MUX_PINCTRL
 	  This driver can also be built as a module. If so, the module will be
 	  called pinctrl-i2cmux.
 
+config I2C_MUX_REG
+	tristate "Register-based I2C multiplexer"
+	help
+	  If you say yes to this option, support will be included for a
+	  register based I2C multiplexer. This driver provides access to
+	  I2C busses connected through a MUX, which is controlled
+	  by a single register.
+
+	  This driver can also be built as a module.  If so, the module
+	  will be called i2c-mux-reg.
+
 endmenu
diff --git a/drivers/i2c/muxes/Makefile b/drivers/i2c/muxes/Makefile
index 465778b5d5dc..e89799b76a92 100644
--- a/drivers/i2c/muxes/Makefile
+++ b/drivers/i2c/muxes/Makefile
@@ -7,5 +7,6 @@ obj-$(CONFIG_I2C_MUX_GPIO)	+= i2c-mux-gpio.o
 obj-$(CONFIG_I2C_MUX_PCA9541)	+= i2c-mux-pca9541.o
 obj-$(CONFIG_I2C_MUX_PCA954x)	+= i2c-mux-pca954x.o
 obj-$(CONFIG_I2C_MUX_PINCTRL)	+= i2c-mux-pinctrl.o
+obj-$(CONFIG_I2C_MUX_REG)	+= i2c-mux-reg.o
 
 ccflags-$(CONFIG_I2C_DEBUG_BUS) := -DDEBUG
diff --git a/drivers/i2c/muxes/i2c-mux-reg.c b/drivers/i2c/muxes/i2c-mux-reg.c
new file mode 100644
index 000000000000..86d41d36a783
--- /dev/null
+++ b/drivers/i2c/muxes/i2c-mux-reg.c
@@ -0,0 +1,294 @@
+/*
+ * I2C multiplexer using a single register
+ *
+ * Copyright 2015 Freescale Semiconductor
+ * York Sun  <yorksun@freescale.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2 of the License, or (at your
+ * option) any later version.
+ */
+
+#include <linux/i2c.h>
+#include <linux/i2c-mux.h>
+#include <linux/init.h>
+#include <linux/io.h>
+#include <linux/module.h>
+#include <linux/of_address.h>
+#include <linux/platform_data/i2c-mux-reg.h>
+#include <linux/platform_device.h>
+#include <linux/slab.h>
+
+struct regmux {
+	struct i2c_adapter *parent;
+	struct i2c_adapter **adap; /* child busses */
+	struct i2c_mux_reg_platform_data data;
+};
+
+static int i2c_mux_reg_set(const struct regmux *mux, unsigned int chan_id)
+{
+	if (!mux->data.reg)
+		return -EINVAL;
+
+	switch (mux->data.reg_size) {
+	case 4:
+		if (mux->data.little_endian) {
+			iowrite32(chan_id, mux->data.reg);
+			if (!mux->data.write_only)
+				ioread32(mux->data.reg);
+		} else {
+			iowrite32be(chan_id, mux->data.reg);
+			if (!mux->data.write_only)
+				ioread32(mux->data.reg);
+		}
+		break;
+	case 2:
+		if (mux->data.little_endian) {
+			iowrite16(chan_id, mux->data.reg);
+			if (!mux->data.write_only)
+				ioread16(mux->data.reg);
+		} else {
+			iowrite16be(chan_id, mux->data.reg);
+			if (!mux->data.write_only)
+				ioread16be(mux->data.reg);
+		}
+		break;
+	case 1:
+		iowrite8(chan_id, mux->data.reg);
+		if (!mux->data.write_only)
+			ioread8(mux->data.reg);
+		break;
+	default:
+		pr_err("Invalid register size\n");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int i2c_mux_reg_select(struct i2c_adapter *adap, void *data,
+			      unsigned int chan)
+{
+	struct regmux *mux = data;
+
+	return i2c_mux_reg_set(mux, chan);
+}
+
+static int i2c_mux_reg_deselect(struct i2c_adapter *adap, void *data,
+				unsigned int chan)
+{
+	struct regmux *mux = data;
+
+	if (mux->data.idle_in_use)
+		return i2c_mux_reg_set(mux, mux->data.idle);
+
+	return 0;
+}
+
+#ifdef CONFIG_OF
+static int i2c_mux_reg_probe_dt(struct regmux *mux,
+					struct platform_device *pdev)
+{
+	struct device_node *np = pdev->dev.of_node;
+	struct device_node *adapter_np, *child;
+	struct i2c_adapter *adapter;
+	struct resource res;
+	unsigned *values;
+	int i = 0;
+
+	if (!np)
+		return -ENODEV;
+
+	adapter_np = of_parse_phandle(np, "i2c-parent", 0);
+	if (!adapter_np) {
+		dev_err(&pdev->dev, "Cannot parse i2c-parent\n");
+		return -ENODEV;
+	}
+	adapter = of_find_i2c_adapter_by_node(adapter_np);
+	if (!adapter)
+		return -EPROBE_DEFER;
+
+	mux->parent = adapter;
+	mux->data.parent = i2c_adapter_id(adapter);
+	put_device(&adapter->dev);
+
+	mux->data.n_values = of_get_child_count(np);
+	if (of_find_property(np, "little-endian", NULL)) {
+		mux->data.little_endian = true;
+	} else if (of_find_property(np, "big-endian", NULL)) {
+		mux->data.little_endian = false;
+	} else {
+#if defined(__BYTE_ORDER) ? __BYTE_ORDER == __LITTLE_ENDIAN : \
+	defined(__LITTLE_ENDIAN)
+		mux->data.little_endian = true;
+#elif defined(__BYTE_ORDER) ? __BYTE_ORDER == __BIG_ENDIAN : \
+	defined(__BIG_ENDIAN)
+		mux->data.little_endian = false;
+#else
+#error Endianness not defined?
+#endif
+	}
+	if (of_find_property(np, "write-only", NULL))
+		mux->data.write_only = true;
+	else
+		mux->data.write_only = false;
+
+	values = devm_kzalloc(&pdev->dev,
+			      sizeof(*mux->data.values) * mux->data.n_values,
+			      GFP_KERNEL);
+	if (!values) {
+		dev_err(&pdev->dev, "Cannot allocate values array");
+		return -ENOMEM;
+	}
+
+	for_each_child_of_node(np, child) {
+		of_property_read_u32(child, "reg", values + i);
+		i++;
+	}
+	mux->data.values = values;
+
+	if (!of_property_read_u32(np, "idle-state", &mux->data.idle))
+		mux->data.idle_in_use = true;
+
+	/* map address from "reg" if exists */
+	if (of_address_to_resource(np, 0, &res)) {
+		mux->data.reg_size = resource_size(&res);
+		if (mux->data.reg_size > 4) {
+			dev_err(&pdev->dev, "Invalid address size\n");
+			return -EINVAL;
+		}
+		mux->data.reg = devm_ioremap_resource(&pdev->dev, &res);
+		if (IS_ERR(mux->data.reg))
+			return PTR_ERR(mux->data.reg);
+	}
+
+	return 0;
+}
+#else
+static int i2c_mux_reg_probe_dt(struct gpiomux *mux,
+					struct platform_device *pdev)
+{
+	return 0;
+}
+#endif
+
+static int i2c_mux_reg_probe(struct platform_device *pdev)
+{
+	struct regmux *mux;
+	struct i2c_adapter *parent;
+	struct resource *res;
+	int (*deselect)(struct i2c_adapter *, void *, u32);
+	unsigned int class;
+	int i, ret, nr;
+
+	mux = devm_kzalloc(&pdev->dev, sizeof(*mux), GFP_KERNEL);
+	if (!mux)
+		return -ENOMEM;
+
+	platform_set_drvdata(pdev, mux);
+
+	if (dev_get_platdata(&pdev->dev)) {
+		memcpy(&mux->data, dev_get_platdata(&pdev->dev),
+			sizeof(mux->data));
+
+		parent = i2c_get_adapter(mux->data.parent);
+		if (!parent)
+			return -EPROBE_DEFER;
+
+		mux->parent = parent;
+	} else {
+		ret = i2c_mux_reg_probe_dt(mux, pdev);
+		if (ret < 0) {
+			dev_err(&pdev->dev, "Error parsing device tree");
+			return ret;
+		}
+	}
+
+	if (!mux->data.reg) {
+		dev_info(&pdev->dev,
+			"Register not set, using platform resource\n");
+		res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+		mux->data.reg_size = resource_size(res);
+		if (mux->data.reg_size > 4) {
+			dev_err(&pdev->dev, "Invalid resource size\n");
+			return -EINVAL;
+		}
+		mux->data.reg = devm_ioremap_resource(&pdev->dev, res);
+		if (IS_ERR(mux->data.reg))
+			return PTR_ERR(mux->data.reg);
+	}
+
+	mux->adap = devm_kzalloc(&pdev->dev,
+				 sizeof(*mux->adap) * mux->data.n_values,
+				 GFP_KERNEL);
+	if (!mux->adap) {
+		dev_err(&pdev->dev, "Cannot allocate i2c_adapter structure");
+		return -ENOMEM;
+	}
+
+	if (mux->data.idle_in_use)
+		deselect = i2c_mux_reg_deselect;
+	else
+		deselect = NULL;
+
+	for (i = 0; i < mux->data.n_values; i++) {
+		nr = mux->data.base_nr ? (mux->data.base_nr + i) : 0;
+		class = mux->data.classes ? mux->data.classes[i] : 0;
+
+		mux->adap[i] = i2c_add_mux_adapter(mux->parent, &pdev->dev, mux,
+						   nr, mux->data.values[i],
+						   class, i2c_mux_reg_select,
+						   deselect);
+		if (!mux->adap[i]) {
+			ret = -ENODEV;
+			dev_err(&pdev->dev, "Failed to add adapter %d\n", i);
+			goto add_adapter_failed;
+		}
+	}
+
+	dev_dbg(&pdev->dev, "%d port mux on %s adapter\n",
+		 mux->data.n_values, mux->parent->name);
+
+	return 0;
+
+add_adapter_failed:
+	for (; i > 0; i--)
+		i2c_del_mux_adapter(mux->adap[i - 1]);
+
+	return ret;
+}
+
+static int i2c_mux_reg_remove(struct platform_device *pdev)
+{
+	struct regmux *mux = platform_get_drvdata(pdev);
+	int i;
+
+	for (i = 0; i < mux->data.n_values; i++)
+		i2c_del_mux_adapter(mux->adap[i]);
+
+	i2c_put_adapter(mux->parent);
+
+	return 0;
+}
+
+static const struct of_device_id i2c_mux_reg_of_match[] = {
+	{ .compatible = "i2c-mux-reg", },
+	{},
+};
+MODULE_DEVICE_TABLE(of, i2c_mux_reg_of_match);
+
+static struct platform_driver i2c_mux_reg_driver = {
+	.probe	= i2c_mux_reg_probe,
+	.remove	= i2c_mux_reg_remove,
+	.driver	= {
+		.name	= "i2c-mux-reg",
+	},
+};
+
+module_platform_driver(i2c_mux_reg_driver);
+
+MODULE_DESCRIPTION("Register-based I2C multiplexer driver");
+MODULE_AUTHOR("York Sun <yorksun@freescale.com>");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS("platform:i2c-mux-reg");
diff --git a/include/linux/platform_data/i2c-mux-reg.h b/include/linux/platform_data/i2c-mux-reg.h
new file mode 100644
index 000000000000..c68712aadf43
--- /dev/null
+++ b/include/linux/platform_data/i2c-mux-reg.h
@@ -0,0 +1,44 @@
+/*
+ * I2C multiplexer using a single register
+ *
+ * Copyright 2015 Freescale Semiconductor
+ * York Sun <yorksun@freescale.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2 of the License, or (at your
+ * option) any later version.
+ */
+
+#ifndef __LINUX_PLATFORM_DATA_I2C_MUX_REG_H
+#define __LINUX_PLATFORM_DATA_I2C_MUX_REG_H
+
+/**
+ * struct i2c_mux_reg_platform_data - Platform-dependent data for i2c-mux-reg
+ * @parent: Parent I2C bus adapter number
+ * @base_nr: Base I2C bus number to number adapters from or zero for dynamic
+ * @values: Array of value for each channel
+ * @n_values: Number of multiplexer channels
+ * @little_endian: Indicating if the register is in little endian
+ * @write_only: Reading the register is not allowed by hardware
+ * @classes: Optional I2C auto-detection classes
+ * @idle: Value to write to mux when idle
+ * @idle_in_use: indicate if idle value is in use
+ * @reg: Virtual address of the register to switch channel
+ * @reg_size: register size in bytes
+ */
+struct i2c_mux_reg_platform_data {
+	int parent;
+	int base_nr;
+	const unsigned int *values;
+	int n_values;
+	bool little_endian;
+	bool write_only;
+	const unsigned int *classes;
+	u32 idle;
+	bool idle_in_use;
+	void __iomem *reg;
+	resource_size_t reg_size;
+};
+
+#endif	/* __LINUX_PLATFORM_DATA_I2C_MUX_REG_H */
-- 
cgit v1.2.3


From 3f9c37a0c9a59db97ca5712eca7838b842949047 Mon Sep 17 00:00:00 2001
From: Joachim Eastwood <manabian@gmail.com>
Date: Sun, 16 Aug 2015 20:10:16 +0200
Subject: i2c: lpc2k: add driver

Add support for the I2C controller found on several NXP devices
including LPC2xxx, LPC178x/7x and LPC18xx/43xx. The controller
is implemented as a state machine and the driver act upon the
state changes when the bus is accessed.

The I2C controller supports master/slave operation, bus
arbitration, programmable clock rate, and speeds up to 1 Mbit/s.

Signed-off-by: Joachim Eastwood <manabian@gmail.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
---
 .../devicetree/bindings/i2c/i2c-lpc2k.txt          |  33 ++
 drivers/i2c/busses/Kconfig                         |  10 +
 drivers/i2c/busses/Makefile                        |   1 +
 drivers/i2c/busses/i2c-lpc2k.c                     | 513 +++++++++++++++++++++
 4 files changed, 557 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/i2c/i2c-lpc2k.txt
 create mode 100644 drivers/i2c/busses/i2c-lpc2k.c

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/i2c/i2c-lpc2k.txt b/Documentation/devicetree/bindings/i2c/i2c-lpc2k.txt
new file mode 100644
index 000000000000..4101aa621ad4
--- /dev/null
+++ b/Documentation/devicetree/bindings/i2c/i2c-lpc2k.txt
@@ -0,0 +1,33 @@
+NXP I2C controller for LPC2xxx/178x/18xx/43xx
+
+Required properties:
+ - compatible: must be "nxp,lpc1788-i2c"
+ - reg: physical address and length of the device registers
+ - interrupts: a single interrupt specifier
+ - clocks: clock for the device
+ - #address-cells: should be <1>
+ - #size-cells: should be <0>
+
+Optional properties:
+- clock-frequency: the desired I2C bus clock frequency in Hz; in
+  absence of this property the default value is used (100 kHz).
+
+Example:
+i2c0: i2c@400a1000 {
+	compatible = "nxp,lpc1788-i2c";
+	reg = <0x400a1000 0x1000>;
+	interrupts = <18>;
+	clocks = <&ccu1 CLK_APB1_I2C0>;
+	#address-cells = <1>;
+	#size-cells = <0>;
+};
+
+&i2c0 {
+	clock-frequency = <400000>;
+
+	lm75@48 {
+		compatible = "nxp,lm75";
+		reg = <0x48>;
+	};
+};
+
diff --git a/drivers/i2c/busses/Kconfig b/drivers/i2c/busses/Kconfig
index 0b798ae708fe..48f4b796003c 100644
--- a/drivers/i2c/busses/Kconfig
+++ b/drivers/i2c/busses/Kconfig
@@ -619,6 +619,16 @@ config I2C_KEMPLD
 	  This driver can also be built as a module. If so, the module
 	  will be called i2c-kempld.
 
+config I2C_LPC2K
+	tristate "I2C bus support for NXP LPC2K/LPC178x/18xx/43xx"
+	depends on OF && (ARCH_LPC18XX || COMPILE_TEST)
+	help
+	  This driver supports the I2C interface found several NXP
+	  devices including LPC2xxx, LPC178x/7x and LPC18xx/43xx.
+
+	  This driver can also be built as a module.  If so, the module
+	  will be called i2c-lpc2k.
+
 config I2C_MESON
 	tristate "Amlogic Meson I2C controller"
 	depends on ARCH_MESON
diff --git a/drivers/i2c/busses/Makefile b/drivers/i2c/busses/Makefile
index 50e8bbb65f1c..6df3b303bd09 100644
--- a/drivers/i2c/busses/Makefile
+++ b/drivers/i2c/busses/Makefile
@@ -59,6 +59,7 @@ obj-$(CONFIG_I2C_IMX)		+= i2c-imx.o
 obj-$(CONFIG_I2C_IOP3XX)	+= i2c-iop3xx.o
 obj-$(CONFIG_I2C_JZ4780)	+= i2c-jz4780.o
 obj-$(CONFIG_I2C_KEMPLD)	+= i2c-kempld.o
+obj-$(CONFIG_I2C_LPC2K)		+= i2c-lpc2k.o
 obj-$(CONFIG_I2C_MESON)		+= i2c-meson.o
 obj-$(CONFIG_I2C_MPC)		+= i2c-mpc.o
 obj-$(CONFIG_I2C_MT65XX)	+= i2c-mt65xx.o
diff --git a/drivers/i2c/busses/i2c-lpc2k.c b/drivers/i2c/busses/i2c-lpc2k.c
new file mode 100644
index 000000000000..8560a13bf1b3
--- /dev/null
+++ b/drivers/i2c/busses/i2c-lpc2k.c
@@ -0,0 +1,513 @@
+/*
+ * Copyright (C) 2011 NXP Semiconductors
+ *
+ * Code portions referenced from the i2x-pxa and i2c-pnx drivers
+ *
+ * Make SMBus byte and word transactions work on LPC178x/7x
+ * Copyright (c) 2012
+ * Alexander Potashev, Emcraft Systems, aspotashev@emcraft.com
+ * Anton Protopopov, Emcraft Systems, antonp@emcraft.com
+ *
+ * Copyright (C) 2015 Joachim Eastwood <manabian@gmail.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ */
+
+#include <linux/clk.h>
+#include <linux/errno.h>
+#include <linux/i2c.h>
+#include <linux/interrupt.h>
+#include <linux/io.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/of.h>
+#include <linux/of_device.h>
+#include <linux/platform_device.h>
+#include <linux/sched.h>
+#include <linux/time.h>
+
+/* LPC24xx register offsets and bits */
+#define LPC24XX_I2CONSET	0x00
+#define LPC24XX_I2STAT		0x04
+#define LPC24XX_I2DAT		0x08
+#define LPC24XX_I2ADDR		0x0c
+#define LPC24XX_I2SCLH		0x10
+#define LPC24XX_I2SCLL		0x14
+#define LPC24XX_I2CONCLR	0x18
+
+#define LPC24XX_AA		BIT(2)
+#define LPC24XX_SI		BIT(3)
+#define LPC24XX_STO		BIT(4)
+#define LPC24XX_STA		BIT(5)
+#define LPC24XX_I2EN		BIT(6)
+
+#define LPC24XX_STO_AA		(LPC24XX_STO | LPC24XX_AA)
+#define LPC24XX_CLEAR_ALL	(LPC24XX_AA | LPC24XX_SI | LPC24XX_STO | \
+				 LPC24XX_STA | LPC24XX_I2EN)
+
+/* I2C SCL clock has different duty cycle depending on mode */
+#define I2C_STD_MODE_DUTY		46
+#define I2C_FAST_MODE_DUTY		36
+#define I2C_FAST_MODE_PLUS_DUTY		38
+
+/*
+ * 26 possible I2C status codes, but codes applicable only
+ * to master are listed here and used in this driver
+ */
+enum {
+	M_BUS_ERROR		= 0x00,
+	M_START			= 0x08,
+	M_REPSTART		= 0x10,
+	MX_ADDR_W_ACK		= 0x18,
+	MX_ADDR_W_NACK		= 0x20,
+	MX_DATA_W_ACK		= 0x28,
+	MX_DATA_W_NACK		= 0x30,
+	M_DATA_ARB_LOST		= 0x38,
+	MR_ADDR_R_ACK		= 0x40,
+	MR_ADDR_R_NACK		= 0x48,
+	MR_DATA_R_ACK		= 0x50,
+	MR_DATA_R_NACK		= 0x58,
+	M_I2C_IDLE		= 0xf8,
+};
+
+struct lpc2k_i2c {
+	void __iomem		*base;
+	struct clk		*clk;
+	int			irq;
+	wait_queue_head_t	wait;
+	struct i2c_adapter	adap;
+	struct i2c_msg		*msg;
+	int			msg_idx;
+	int			msg_status;
+	int			is_last;
+};
+
+static void i2c_lpc2k_reset(struct lpc2k_i2c *i2c)
+{
+	/* Will force clear all statuses */
+	writel(LPC24XX_CLEAR_ALL, i2c->base + LPC24XX_I2CONCLR);
+	writel(0, i2c->base + LPC24XX_I2ADDR);
+	writel(LPC24XX_I2EN, i2c->base + LPC24XX_I2CONSET);
+}
+
+static int i2c_lpc2k_clear_arb(struct lpc2k_i2c *i2c)
+{
+	unsigned long timeout = jiffies + msecs_to_jiffies(1000);
+
+	/*
+	 * If the transfer needs to abort for some reason, we'll try to
+	 * force a stop condition to clear any pending bus conditions
+	 */
+	writel(LPC24XX_STO, i2c->base + LPC24XX_I2CONSET);
+
+	/* Wait for status change */
+	while (readl(i2c->base + LPC24XX_I2STAT) != M_I2C_IDLE) {
+		if (time_after(jiffies, timeout)) {
+			/* Bus was not idle, try to reset adapter */
+			i2c_lpc2k_reset(i2c);
+			return -EBUSY;
+		}
+
+		cpu_relax();
+	}
+
+	return 0;
+}
+
+static void i2c_lpc2k_pump_msg(struct lpc2k_i2c *i2c)
+{
+	unsigned char data;
+	u32 status;
+
+	/*
+	 * I2C in the LPC2xxx series is basically a state machine.
+	 * Just run through the steps based on the current status.
+	 */
+	status = readl(i2c->base + LPC24XX_I2STAT);
+
+	switch (status) {
+	case M_START:
+	case M_REPSTART:
+		/* Start bit was just sent out, send out addr and dir */
+		data = i2c->msg->addr << 1;
+		if (i2c->msg->flags & I2C_M_RD)
+			data |= 1;
+
+		writel(data, i2c->base + LPC24XX_I2DAT);
+		writel(LPC24XX_STA, i2c->base + LPC24XX_I2CONCLR);
+		break;
+
+	case MX_ADDR_W_ACK:
+	case MX_DATA_W_ACK:
+		/*
+		 * Address or data was sent out with an ACK. If there is more
+		 * data to send, send it now
+		 */
+		if (i2c->msg_idx < i2c->msg->len) {
+			writel(i2c->msg->buf[i2c->msg_idx],
+			       i2c->base + LPC24XX_I2DAT);
+		} else if (i2c->is_last) {
+			/* Last message, send stop */
+			writel(LPC24XX_STO_AA, i2c->base + LPC24XX_I2CONSET);
+			writel(LPC24XX_SI, i2c->base + LPC24XX_I2CONCLR);
+			i2c->msg_status = 0;
+			disable_irq_nosync(i2c->irq);
+		} else {
+			i2c->msg_status = 0;
+			disable_irq_nosync(i2c->irq);
+		}
+
+		i2c->msg_idx++;
+		break;
+
+	case MR_ADDR_R_ACK:
+		/* Receive first byte from slave */
+		if (i2c->msg->len == 1) {
+			/* Last byte, return NACK */
+			writel(LPC24XX_AA, i2c->base + LPC24XX_I2CONCLR);
+		} else {
+			/* Not last byte, return ACK */
+			writel(LPC24XX_AA, i2c->base + LPC24XX_I2CONSET);
+		}
+
+		writel(LPC24XX_STA, i2c->base + LPC24XX_I2CONCLR);
+		break;
+
+	case MR_DATA_R_NACK:
+		/*
+		 * The I2C shows NACK status on reads, so we need to accept
+		 * the NACK as an ACK here. This should be ok, as the real
+		 * BACK would of been caught on the address write.
+		 */
+	case MR_DATA_R_ACK:
+		/* Data was received */
+		if (i2c->msg_idx < i2c->msg->len) {
+			i2c->msg->buf[i2c->msg_idx] =
+					readl(i2c->base + LPC24XX_I2DAT);
+		}
+
+		/* If transfer is done, send STOP */
+		if (i2c->msg_idx >= i2c->msg->len - 1 && i2c->is_last) {
+			writel(LPC24XX_STO_AA, i2c->base + LPC24XX_I2CONSET);
+			writel(LPC24XX_SI, i2c->base + LPC24XX_I2CONCLR);
+			i2c->msg_status = 0;
+		}
+
+		/* Message is done */
+		if (i2c->msg_idx >= i2c->msg->len - 1) {
+			i2c->msg_status = 0;
+			disable_irq_nosync(i2c->irq);
+		}
+
+		/*
+		 * One pre-last data input, send NACK to tell the slave that
+		 * this is going to be the last data byte to be transferred.
+		 */
+		if (i2c->msg_idx >= i2c->msg->len - 2) {
+			/* One byte left to receive - NACK */
+			writel(LPC24XX_AA, i2c->base + LPC24XX_I2CONCLR);
+		} else {
+			/* More than one byte left to receive - ACK */
+			writel(LPC24XX_AA, i2c->base + LPC24XX_I2CONSET);
+		}
+
+		writel(LPC24XX_STA, i2c->base + LPC24XX_I2CONCLR);
+		i2c->msg_idx++;
+		break;
+
+	case MX_ADDR_W_NACK:
+	case MX_DATA_W_NACK:
+	case MR_ADDR_R_NACK:
+		/* NACK processing is done */
+		writel(LPC24XX_STO_AA, i2c->base + LPC24XX_I2CONSET);
+		i2c->msg_status = -ENXIO;
+		disable_irq_nosync(i2c->irq);
+		break;
+
+	case M_DATA_ARB_LOST:
+		/* Arbitration lost */
+		i2c->msg_status = -EAGAIN;
+
+		/* Release the I2C bus */
+		writel(LPC24XX_STA | LPC24XX_STO, i2c->base + LPC24XX_I2CONCLR);
+		disable_irq_nosync(i2c->irq);
+		break;
+
+	default:
+		/* Unexpected statuses */
+		i2c->msg_status = -EIO;
+		disable_irq_nosync(i2c->irq);
+		break;
+	}
+
+	/* Exit on failure or all bytes transferred */
+	if (i2c->msg_status != -EBUSY)
+		wake_up(&i2c->wait);
+
+	/*
+	 * If `msg_status` is zero, then `lpc2k_process_msg()`
+	 * is responsible for clearing the SI flag.
+	 */
+	if (i2c->msg_status != 0)
+		writel(LPC24XX_SI, i2c->base + LPC24XX_I2CONCLR);
+}
+
+static int lpc2k_process_msg(struct lpc2k_i2c *i2c, int msgidx)
+{
+	/* A new transfer is kicked off by initiating a start condition */
+	if (!msgidx) {
+		writel(LPC24XX_STA, i2c->base + LPC24XX_I2CONSET);
+	} else {
+		/*
+		 * A multi-message I2C transfer continues where the
+		 * previous I2C transfer left off and uses the
+		 * current condition of the I2C adapter.
+		 */
+		if (unlikely(i2c->msg->flags & I2C_M_NOSTART)) {
+			WARN_ON(i2c->msg->len == 0);
+
+			if (!(i2c->msg->flags & I2C_M_RD)) {
+				/* Start transmit of data */
+				writel(i2c->msg->buf[0],
+				       i2c->base + LPC24XX_I2DAT);
+				i2c->msg_idx++;
+			}
+		} else {
+			/* Start or repeated start */
+			writel(LPC24XX_STA, i2c->base + LPC24XX_I2CONSET);
+		}
+
+		writel(LPC24XX_SI, i2c->base + LPC24XX_I2CONCLR);
+	}
+
+	enable_irq(i2c->irq);
+
+	/* Wait for transfer completion */
+	if (wait_event_timeout(i2c->wait, i2c->msg_status != -EBUSY,
+			       msecs_to_jiffies(1000)) == 0) {
+		disable_irq_nosync(i2c->irq);
+
+		return -ETIMEDOUT;
+	}
+
+	return i2c->msg_status;
+}
+
+static int i2c_lpc2k_xfer(struct i2c_adapter *adap, struct i2c_msg *msgs,
+			  int msg_num)
+{
+	struct lpc2k_i2c *i2c = i2c_get_adapdata(adap);
+	int ret, i;
+	u32 stat;
+
+	/* Check for bus idle condition */
+	stat = readl(i2c->base + LPC24XX_I2STAT);
+	if (stat != M_I2C_IDLE) {
+		/* Something is holding the bus, try to clear it */
+		return i2c_lpc2k_clear_arb(i2c);
+	}
+
+	/* Process a single message at a time */
+	for (i = 0; i < msg_num; i++) {
+		/* Save message pointer and current message data index */
+		i2c->msg = &msgs[i];
+		i2c->msg_idx = 0;
+		i2c->msg_status = -EBUSY;
+		i2c->is_last = (i == (msg_num - 1));
+
+		ret = lpc2k_process_msg(i2c, i);
+		if (ret)
+			return ret;
+	}
+
+	return msg_num;
+}
+
+static irqreturn_t i2c_lpc2k_handler(int irq, void *dev_id)
+{
+	struct lpc2k_i2c *i2c = dev_id;
+
+	if (readl(i2c->base + LPC24XX_I2CONSET) & LPC24XX_SI) {
+		i2c_lpc2k_pump_msg(i2c);
+		return IRQ_HANDLED;
+	}
+
+	return IRQ_NONE;
+}
+
+static u32 i2c_lpc2k_functionality(struct i2c_adapter *adap)
+{
+	/* Only emulated SMBus for now */
+	return I2C_FUNC_I2C | I2C_FUNC_SMBUS_EMUL;
+}
+
+static const struct i2c_algorithm i2c_lpc2k_algorithm = {
+	.master_xfer	= i2c_lpc2k_xfer,
+	.functionality	= i2c_lpc2k_functionality,
+};
+
+static int i2c_lpc2k_probe(struct platform_device *pdev)
+{
+	struct lpc2k_i2c *i2c;
+	struct resource *res;
+	u32 bus_clk_rate;
+	u32 scl_high;
+	u32 clkrate;
+	int ret;
+
+	i2c = devm_kzalloc(&pdev->dev, sizeof(*i2c), GFP_KERNEL);
+	if (!i2c)
+		return -ENOMEM;
+
+	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	i2c->base = devm_ioremap_resource(&pdev->dev, res);
+	if (IS_ERR(i2c->base))
+		return PTR_ERR(i2c->base);
+
+	i2c->irq = platform_get_irq(pdev, 0);
+	if (i2c->irq < 0) {
+		dev_err(&pdev->dev, "can't get interrupt resource\n");
+		return i2c->irq;
+	}
+
+	init_waitqueue_head(&i2c->wait);
+
+	i2c->clk = devm_clk_get(&pdev->dev, NULL);
+	if (IS_ERR(i2c->clk)) {
+		dev_err(&pdev->dev, "error getting clock\n");
+		return PTR_ERR(i2c->clk);
+	}
+
+	ret = clk_prepare_enable(i2c->clk);
+	if (ret) {
+		dev_err(&pdev->dev, "unable to enable clock.\n");
+		return ret;
+	}
+
+	ret = devm_request_irq(&pdev->dev, i2c->irq, i2c_lpc2k_handler, 0,
+			       dev_name(&pdev->dev), i2c);
+	if (ret < 0) {
+		dev_err(&pdev->dev, "can't request interrupt.\n");
+		goto fail_clk;
+	}
+
+	disable_irq_nosync(i2c->irq);
+
+	/* Place controller is a known state */
+	i2c_lpc2k_reset(i2c);
+
+	ret = of_property_read_u32(pdev->dev.of_node, "clock-frequency",
+				   &bus_clk_rate);
+	if (ret)
+		bus_clk_rate = 100000; /* 100 kHz default clock rate */
+
+	clkrate = clk_get_rate(i2c->clk);
+	if (clkrate == 0) {
+		dev_err(&pdev->dev, "can't get I2C base clock\n");
+		ret = -EINVAL;
+		goto fail_clk;
+	}
+
+	/* Setup I2C dividers to generate clock with proper duty cycle */
+	clkrate = clkrate / bus_clk_rate;
+	if (bus_clk_rate <= 100000)
+		scl_high = (clkrate * I2C_STD_MODE_DUTY) / 100;
+	else if (bus_clk_rate <= 400000)
+		scl_high = (clkrate * I2C_FAST_MODE_DUTY) / 100;
+	else
+		scl_high = (clkrate * I2C_FAST_MODE_PLUS_DUTY) / 100;
+
+	writel(scl_high, i2c->base + LPC24XX_I2SCLH);
+	writel(clkrate - scl_high, i2c->base + LPC24XX_I2SCLL);
+
+	platform_set_drvdata(pdev, i2c);
+
+	i2c_set_adapdata(&i2c->adap, i2c);
+	i2c->adap.owner = THIS_MODULE;
+	strlcpy(i2c->adap.name, "LPC2K I2C adapter", sizeof(i2c->adap.name));
+	i2c->adap.algo = &i2c_lpc2k_algorithm;
+	i2c->adap.dev.parent = &pdev->dev;
+	i2c->adap.dev.of_node = pdev->dev.of_node;
+
+	ret = i2c_add_adapter(&i2c->adap);
+	if (ret < 0) {
+		dev_err(&pdev->dev, "failed to add adapter!\n");
+		goto fail_clk;
+	}
+
+	dev_info(&pdev->dev, "LPC2K I2C adapter\n");
+
+	return 0;
+
+fail_clk:
+	clk_disable_unprepare(i2c->clk);
+	return ret;
+}
+
+static int i2c_lpc2k_remove(struct platform_device *dev)
+{
+	struct lpc2k_i2c *i2c = platform_get_drvdata(dev);
+
+	i2c_del_adapter(&i2c->adap);
+	clk_disable_unprepare(i2c->clk);
+
+	return 0;
+}
+
+#ifdef CONFIG_PM
+static int i2c_lpc2k_suspend(struct device *dev)
+{
+	struct platform_device *pdev = to_platform_device(dev);
+	struct lpc2k_i2c *i2c = platform_get_drvdata(pdev);
+
+	clk_disable(i2c->clk);
+
+	return 0;
+}
+
+static int i2c_lpc2k_resume(struct device *dev)
+{
+	struct platform_device *pdev = to_platform_device(dev);
+	struct lpc2k_i2c *i2c = platform_get_drvdata(pdev);
+
+	clk_enable(i2c->clk);
+	i2c_lpc2k_reset(i2c);
+
+	return 0;
+}
+
+static const struct dev_pm_ops i2c_lpc2k_dev_pm_ops = {
+	.suspend_noirq = i2c_lpc2k_suspend,
+	.resume_noirq = i2c_lpc2k_resume,
+};
+
+#define I2C_LPC2K_DEV_PM_OPS (&i2c_lpc2k_dev_pm_ops)
+#else
+#define I2C_LPC2K_DEV_PM_OPS NULL
+#endif
+
+static const struct of_device_id lpc2k_i2c_match[] = {
+	{ .compatible = "nxp,lpc1788-i2c" },
+	{},
+};
+MODULE_DEVICE_TABLE(of, lpc2k_i2c_match);
+
+static struct platform_driver i2c_lpc2k_driver = {
+	.probe	= i2c_lpc2k_probe,
+	.remove	= i2c_lpc2k_remove,
+	.driver	= {
+		.name		= "lpc2k-i2c",
+		.pm		= I2C_LPC2K_DEV_PM_OPS,
+		.of_match_table	= lpc2k_i2c_match,
+	},
+};
+module_platform_driver(i2c_lpc2k_driver);
+
+MODULE_AUTHOR("Kevin Wells <kevin.wells@nxp.com>");
+MODULE_DESCRIPTION("I2C driver for LPC2xxx devices");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS("platform:lpc2k-i2c");
-- 
cgit v1.2.3


From 89bd794cf60760ef053d3694c6749f651c034e02 Mon Sep 17 00:00:00 2001
From: Javier Martinez Canillas <javier@osg.samsung.com>
Date: Mon, 24 Aug 2015 10:47:18 +0200
Subject: mfd: max77686: Don't suggest in binding to use a deprecated property

The regulator-compatible property from the regulator DT binding was
deprecated. But the max77686 DT binding doc still suggest to use it
instead of the regulator node name's which is the correct approach.

Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com>
Reviewed-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
---
 Documentation/devicetree/bindings/mfd/max77686.txt | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/mfd/max77686.txt b/Documentation/devicetree/bindings/mfd/max77686.txt
index 163bd81a4607..8221102d3fc2 100644
--- a/Documentation/devicetree/bindings/mfd/max77686.txt
+++ b/Documentation/devicetree/bindings/mfd/max77686.txt
@@ -26,7 +26,7 @@ Optional node:
 	};
 	refer Documentation/devicetree/bindings/regulator/regulator.txt
 
-  The regulator-compatible property of regulator should initialized with string
+  The regulator node's name should be initialized with a string
 to get matched with their hardware counterparts as follow:
 
 	-LDOn 	:	for LDOs, where n can lie in range 1 to 26.
@@ -55,16 +55,14 @@ Example:
 		reg = <0x09>;
 
 		voltage-regulators {
-			ldo11_reg {
-				regulator-compatible = "LDO11";
+			ldo11_reg: LDO11 {
 				regulator-name = "vdd_ldo11";
 				regulator-min-microvolt = <1900000>;
 				regulator-max-microvolt = <1900000>;
 				regulator-always-on;
 			};
 
-			buck1_reg {
-				regulator-compatible = "BUCK1";
+			buck1_reg: BUCK1 {
 				regulator-name = "vdd_mif";
 				regulator-min-microvolt = <950000>;
 				regulator-max-microvolt = <1300000>;
@@ -72,8 +70,7 @@ Example:
 				regulator-boot-on;
 			};
 
-			buck9_reg {
-				regulator-compatible = "BUCK9";
+			buck9_reg: BUCK9 {
 				regulator-name = "CAM_ISP_CORE_1.2V";
 				regulator-min-microvolt = <1000000>;
 				regulator-max-microvolt = <1200000>;
-- 
cgit v1.2.3


From 00d689169dd088257de7ed218ff2c638bc5ad3b5 Mon Sep 17 00:00:00 2001
From: Javier Martinez Canillas <javier@osg.samsung.com>
Date: Mon, 24 Aug 2015 10:47:19 +0200
Subject: mfd: max77686: Use a generic name for the PMIC node in the example

The ePAPR standard says that: "the name of a node should be somewhat
generic, reflecting the function of the device and not its precise
programming model."

So, change the max77686 binding document example to use a generic
node name instead of using the chip's name.

Suggested-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
---
 Documentation/devicetree/bindings/mfd/max77686.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/mfd/max77686.txt b/Documentation/devicetree/bindings/mfd/max77686.txt
index 8221102d3fc2..d2ed3c20a5c3 100644
--- a/Documentation/devicetree/bindings/mfd/max77686.txt
+++ b/Documentation/devicetree/bindings/mfd/max77686.txt
@@ -48,7 +48,7 @@ to get matched with their hardware counterparts as follow:
 
 Example:
 
-	max77686@09 {
+	max77686: pmic@09 {
 		compatible = "maxim,max77686";
 		interrupt-parent = <&wakeup_eint>;
 		interrupts = <26 0>;
-- 
cgit v1.2.3


From 00f1493e9e0c875f16623bdb088108ada6e7647e Mon Sep 17 00:00:00 2001
From: Javier Martinez Canillas <javier@osg.samsung.com>
Date: Mon, 24 Aug 2015 10:47:20 +0200
Subject: mfd: Add DT binding for Maxim MAX77802 IC

The MAX77802 is a chip that contains regulators, 2 32kHz clocks,
a RTC and an I2C interface to program the individual components.

The are already DT bindings for the regulators and clocks and
these reference to a bindings/mfd/max77802.txt file, that didn't
exist, for the details about the PMIC.

Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com>
Reviewed-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
---
 Documentation/devicetree/bindings/mfd/max77802.txt | 26 ++++++++++++++++++++++
 1 file changed, 26 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/mfd/max77802.txt

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/mfd/max77802.txt b/Documentation/devicetree/bindings/mfd/max77802.txt
new file mode 100644
index 000000000000..51fc1a60caa5
--- /dev/null
+++ b/Documentation/devicetree/bindings/mfd/max77802.txt
@@ -0,0 +1,26 @@
+Maxim MAX77802 multi-function device
+
+The Maxim MAX77802 is a Power Management IC (PMIC) that contains 10 high
+efficiency Buck regulators, 32 Low-DropOut (LDO) regulators used to power
+up application processors and peripherals, a 2-channel 32kHz clock outputs,
+a Real-Time-Clock (RTC) and a I2C interface to program the individual
+regulators, clocks outputs and the RTC.
+
+Bindings for the built-in 32k clock generator block and
+regulators are defined in ../clk/maxim,max77802.txt and
+../regulator/max77802.txt respectively.
+
+Required properties:
+- compatible		: Must be "maxim,max77802"
+- reg			: Specifies the I2C slave address of PMIC block.
+- interrupts		: I2C device IRQ line connected to the main SoC.
+- interrupt-parent	: The parent interrupt controller.
+
+Example:
+
+	max77802: pmic@09 {
+		compatible = "maxim,max77802";
+		interrupt-parent = <&intc>;
+		interrupts = <26 IRQ_TYPE_NONE>;
+		reg = <0x09>;
+	};
-- 
cgit v1.2.3


From aa60a8391337aa0f14ed890b237f5b8e6cadfbbf Mon Sep 17 00:00:00 2001
From: Javier Martinez Canillas <javier@osg.samsung.com>
Date: Mon, 24 Aug 2015 10:47:21 +0200
Subject: mfd: max77686: Split out regulator part from the DT binding

The Maxim MAX77686 PMIC is a multi-function device with regulators,
clocks and a RTC. The DT bindings for the clocks are in a separate
file but the bindings for the regulators are inside the mfd part.

To make it consistent with the clocks portion of the binding and
because is more natural to look for regulator bindings under the
bindings/regulator sub-directory, split the regulator portion of
the DT binding and add it as a separate file.

Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com>
Reviewed-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
---
 Documentation/devicetree/bindings/mfd/max77686.txt | 60 ++----------------
 .../devicetree/bindings/regulator/max77686.txt     | 71 ++++++++++++++++++++++
 2 files changed, 75 insertions(+), 56 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/regulator/max77686.txt

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/mfd/max77686.txt b/Documentation/devicetree/bindings/mfd/max77686.txt
index d2ed3c20a5c3..741e76688cf2 100644
--- a/Documentation/devicetree/bindings/mfd/max77686.txt
+++ b/Documentation/devicetree/bindings/mfd/max77686.txt
@@ -7,8 +7,9 @@ different i2c slave address,presently for which we are statically creating i2c
 client while probing.This document describes the binding for mfd device and
 PMIC submodule.
 
-Binding for the built-in 32k clock generator block is defined separately
-in bindings/clk/maxim,max77686.txt file.
+Bindings for the built-in 32k clock generator block and
+regulators are defined in ../clk/maxim,max77686.txt and
+../regulator/max77686.txt respectively.
 
 Required properties:
 - compatible : Must be "maxim,max77686";
@@ -16,36 +17,6 @@ Required properties:
 - interrupts : This i2c device has an IRQ line connected to the main SoC.
 - interrupt-parent : The parent interrupt controller.
 
-Optional node:
-- voltage-regulators : The regulators of max77686 have to be instantiated
-  under subnode named "voltage-regulators" using the following format.
-
-	regulator_name {
-		regulator-compatible = LDOn/BUCKn
-		standard regulator constraints....
-	};
-	refer Documentation/devicetree/bindings/regulator/regulator.txt
-
-  The regulator node's name should be initialized with a string
-to get matched with their hardware counterparts as follow:
-
-	-LDOn 	:	for LDOs, where n can lie in range 1 to 26.
-		 	example: LDO1, LDO2, LDO26.
-	-BUCKn 	:	for BUCKs, where n can lie in range 1 to 9.
-			example: BUCK1, BUCK5, BUCK9.
-
-  Regulators which can be turned off during system suspend:
-	-LDOn	:	2, 6-8, 10-12, 14-16,
-	-BUCKn	:	1-4.
-  Use standard regulator bindings for it ('regulator-off-in-suspend').
-
-  LDO20, LDO21, LDO22, BUCK8 and BUCK9 can be configured to GPIO enable
-  control. To turn this feature on this property must be added to the regulator
-  sub-node:
-	- maxim,ena-gpios :	one GPIO specifier enable control (the gpio
-				flags are actually ignored and always
-				ACTIVE_HIGH is used)
-
 Example:
 
 	max77686: pmic@09 {
@@ -53,27 +24,4 @@ Example:
 		interrupt-parent = <&wakeup_eint>;
 		interrupts = <26 0>;
 		reg = <0x09>;
-
-		voltage-regulators {
-			ldo11_reg: LDO11 {
-				regulator-name = "vdd_ldo11";
-				regulator-min-microvolt = <1900000>;
-				regulator-max-microvolt = <1900000>;
-				regulator-always-on;
-			};
-
-			buck1_reg: BUCK1 {
-				regulator-name = "vdd_mif";
-				regulator-min-microvolt = <950000>;
-				regulator-max-microvolt = <1300000>;
-				regulator-always-on;
-				regulator-boot-on;
-			};
-
-			buck9_reg: BUCK9 {
-				regulator-name = "CAM_ISP_CORE_1.2V";
-				regulator-min-microvolt = <1000000>;
-				regulator-max-microvolt = <1200000>;
-				maxim,ena-gpios = <&gpm0 3 GPIO_ACTIVE_HIGH>;
-			};
-	}
+	};
diff --git a/Documentation/devicetree/bindings/regulator/max77686.txt b/Documentation/devicetree/bindings/regulator/max77686.txt
new file mode 100644
index 000000000000..0dded64d89d3
--- /dev/null
+++ b/Documentation/devicetree/bindings/regulator/max77686.txt
@@ -0,0 +1,71 @@
+Binding for Maxim MAX77686 regulators
+
+This is a part of the device tree bindings of MAX77686 multi-function device.
+More information can be found in ../mfd/max77686.txt file.
+
+The MAX77686 PMIC has 9 high-efficiency Buck and 26 Low-DropOut (LDO)
+regulators that can be controlled over I2C.
+
+Following properties should be present in main device node of the MFD chip.
+
+Optional node:
+- voltage-regulators : The regulators of max77686 have to be instantiated
+  under subnode named "voltage-regulators" using the following format.
+
+	regulator_name {
+		regulator-compatible = LDOn/BUCKn
+		standard regulator constraints....
+	};
+	refer Documentation/devicetree/bindings/regulator/regulator.txt
+
+  The regulator node's name should be initialized with a string
+to get matched with their hardware counterparts as follow:
+
+	-LDOn 	:	for LDOs, where n can lie in range 1 to 26.
+			example: LDO1, LDO2, LDO26.
+	-BUCKn 	:	for BUCKs, where n can lie in range 1 to 9.
+			example: BUCK1, BUCK5, BUCK9.
+
+  Regulators which can be turned off during system suspend:
+	-LDOn	:	2, 6-8, 10-12, 14-16,
+	-BUCKn	:	1-4.
+  Use standard regulator bindings for it ('regulator-off-in-suspend').
+
+  LDO20, LDO21, LDO22, BUCK8 and BUCK9 can be configured to GPIO enable
+  control. To turn this feature on this property must be added to the regulator
+  sub-node:
+	- maxim,ena-gpios :	one GPIO specifier enable control (the gpio
+				flags are actually ignored and always
+				ACTIVE_HIGH is used)
+
+Example:
+
+	max77686: pmic@09 {
+		compatible = "maxim,max77686";
+		interrupt-parent = <&wakeup_eint>;
+		interrupts = <26 IRQ_TYPE_NONE>;
+		reg = <0x09>;
+
+		voltage-regulators {
+			ldo11_reg: LDO11 {
+				regulator-name = "vdd_ldo11";
+				regulator-min-microvolt = <1900000>;
+				regulator-max-microvolt = <1900000>;
+				regulator-always-on;
+			};
+
+			buck1_reg: BUCK1 {
+				regulator-name = "vdd_mif";
+				regulator-min-microvolt = <950000>;
+				regulator-max-microvolt = <1300000>;
+				regulator-always-on;
+				regulator-boot-on;
+			};
+
+			buck9_reg: BUCK9 {
+				regulator-name = "CAM_ISP_CORE_1.2V";
+				regulator-min-microvolt = <1000000>;
+				regulator-max-microvolt = <1200000>;
+				maxim,ena-gpios = <&gpm0 3 GPIO_ACTIVE_HIGH>;
+			};
+	};
-- 
cgit v1.2.3


From cebb053bd144d88a4c24b14dd8ac775d8e5972d5 Mon Sep 17 00:00:00 2001
From: Thierry Reding <treding@nvidia.com>
Date: Fri, 1 Aug 2014 15:50:44 +0200
Subject: of: Add vendor prefix for Sharp Corporation

Use "sharp" as the vendor prefix for Sharp Corporation in device
tree compatible strings.

Signed-off-by: Thierry Reding <treding@nvidia.com>
[robh: fix name to Sharp Corporation]
Signed-off-by: Rob Herring <robh@kernel.org>
---
 Documentation/devicetree/bindings/vendor-prefixes.txt | 1 +
 1 file changed, 1 insertion(+)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/vendor-prefixes.txt b/Documentation/devicetree/bindings/vendor-prefixes.txt
index d444757c4d9e..9a4ac79038e9 100644
--- a/Documentation/devicetree/bindings/vendor-prefixes.txt
+++ b/Documentation/devicetree/bindings/vendor-prefixes.txt
@@ -181,6 +181,7 @@ sbs	Smart Battery System
 schindler	Schindler
 seagate	Seagate Technology PLC
 semtech	Semtech Corporation
+sharp	Sharp Corporation
 sil	Silicon Image
 silabs	Silicon Laboratories
 siliconmitus	Silicon Mitus, Inc.
-- 
cgit v1.2.3


From e4144fe5d47c91c92d36cdbd5f31ed8d6e3a57ab Mon Sep 17 00:00:00 2001
From: Mario Carrillo <mario.alfredo.c.arevalo@intel.com>
Date: Mon, 24 Aug 2015 09:33:09 -0500
Subject: docs: update HOWTO for 3.x -> 4.x versioning

The HOWTO document needed updating for the new kernel versioning.

Signed-off-by: Mario Carrillo <mario.alfredo.c.arevalo@intel.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/HOWTO | 28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/HOWTO b/Documentation/HOWTO
index 93aa8604630e..21152d397b88 100644
--- a/Documentation/HOWTO
+++ b/Documentation/HOWTO
@@ -218,16 +218,16 @@ The development process
 Linux kernel development process currently consists of a few different
 main kernel "branches" and lots of different subsystem-specific kernel
 branches.  These different branches are:
-  - main 3.x kernel tree
-  - 3.x.y -stable kernel tree
-  - 3.x -git kernel patches
+  - main 4.x kernel tree
+  - 4.x.y -stable kernel tree
+  - 4.x -git kernel patches
   - subsystem specific kernel trees and patches
-  - the 3.x -next kernel tree for integration tests
+  - the 4.x -next kernel tree for integration tests
 
-3.x kernel tree
+4.x kernel tree
 -----------------
-3.x kernels are maintained by Linus Torvalds, and can be found on
-kernel.org in the pub/linux/kernel/v3.x/ directory.  Its development
+4.x kernels are maintained by Linus Torvalds, and can be found on
+kernel.org in the pub/linux/kernel/v4.x/ directory.  Its development
 process is as follows:
   - As soon as a new kernel is released a two weeks window is open,
     during this period of time maintainers can submit big diffs to
@@ -262,20 +262,20 @@ mailing list about kernel releases:
 	released according to perceived bug status, not according to a
 	preconceived timeline."
 
-3.x.y -stable kernel tree
+4.x.y -stable kernel tree
 ---------------------------
 Kernels with 3-part versions are -stable kernels. They contain
 relatively small and critical fixes for security problems or significant
-regressions discovered in a given 3.x kernel.
+regressions discovered in a given 4.x kernel.
 
 This is the recommended branch for users who want the most recent stable
 kernel and are not interested in helping test development/experimental
 versions.
 
-If no 3.x.y kernel is available, then the highest numbered 3.x
+If no 4.x.y kernel is available, then the highest numbered 4.x
 kernel is the current stable kernel.
 
-3.x.y are maintained by the "stable" team <stable@vger.kernel.org>, and
+4.x.y are maintained by the "stable" team <stable@vger.kernel.org>, and
 are released as needs dictate.  The normal release period is approximately
 two weeks, but it can be longer if there are no pressing problems.  A
 security-related problem, instead, can cause a release to happen almost
@@ -285,7 +285,7 @@ The file Documentation/stable_kernel_rules.txt in the kernel tree
 documents what kinds of changes are acceptable for the -stable tree, and
 how the release process works.
 
-3.x -git patches
+4.x -git patches
 ------------------
 These are daily snapshots of Linus' kernel tree which are managed in a
 git repository (hence the name.) These patches are usually released
@@ -317,9 +317,9 @@ revisions to it, and maintainers can mark patches as under review,
 accepted, or rejected.  Most of these patchwork sites are listed at
 http://patchwork.kernel.org/.
 
-3.x -next kernel tree for integration tests
+4.x -next kernel tree for integration tests
 ---------------------------------------------
-Before updates from subsystem trees are merged into the mainline 3.x
+Before updates from subsystem trees are merged into the mainline 4.x
 tree, they need to be integration-tested.  For this purpose, a special
 testing repository exists into which virtually all subsystem trees are
 pulled on an almost daily basis:
-- 
cgit v1.2.3


From 61a88a76b43ce35254415821047aeefde3f656a9 Mon Sep 17 00:00:00 2001
From: Nan Xiao <xiaonan830818@gmail.com>
Date: Thu, 20 Aug 2015 18:17:10 +0800
Subject: Documentation/Intel-IOMMU.txt: Modify definition of DRHD

Hi all,

According to "Intel Virtualization Technology for Directed I/O" specification,
DRHD stands for "DMA Remapping Hardware Unit Definition" , not "DMA
Engine Reporting Structure".

Signed-off-by: Nan Xiao <nan@chinadtrace.org>
---
 Documentation/Intel-IOMMU.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/Intel-IOMMU.txt b/Documentation/Intel-IOMMU.txt
index cf9431db8731..7b57fc087088 100644
--- a/Documentation/Intel-IOMMU.txt
+++ b/Documentation/Intel-IOMMU.txt
@@ -10,7 +10,7 @@ This guide gives a quick cheat sheet for some basic understanding.
 Some Keywords
 
 DMAR - DMA remapping
-DRHD - DMA Engine Reporting Structure
+DRHD - DMA Remapping Hardware Unit Definition
 RMRR - Reserved memory Region Reporting Structure
 ZLR  - Zero length reads from PCI devices
 IOVA - IO Virtual address.
-- 
cgit v1.2.3


From 0fe0965e63c80dbc4bb005253ecff3494cbefc04 Mon Sep 17 00:00:00 2001
From: Alexander Kuleshov <kuleshovmail@gmail.com>
Date: Fri, 21 Aug 2015 15:19:06 +0600
Subject: Documentation/x86: Rename IRQSTACKSIZE to IRQ_STACK_SIZE

The IRQSTACKSIZE was renamed to the IRQ_STACK_SIZE in the
(26f80bd6a9 x86-64: Convert irqstacks to per-cpu) commit,
but it still named IRQSTACKSIZE in the documentation. This
patch fixes this.

Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/x86/kernel-stacks | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/x86/kernel-stacks b/Documentation/x86/kernel-stacks
index 0f3a6c201943..9a0aa4d3a866 100644
--- a/Documentation/x86/kernel-stacks
+++ b/Documentation/x86/kernel-stacks
@@ -16,7 +16,7 @@ associated with each CPU.  These stacks are only used while the kernel
 is in control on that CPU; when a CPU returns to user space the
 specialized stacks contain no useful data.  The main CPU stacks are:
 
-* Interrupt stack.  IRQSTACKSIZE
+* Interrupt stack.  IRQ_STACK_SIZE
 
   Used for external hardware interrupts.  If this is the first external
   hardware interrupt (i.e. not a nested hardware interrupt) then the
-- 
cgit v1.2.3


From 32ecaf89bdfd842c9669b3f38ff82907e3cd60d0 Mon Sep 17 00:00:00 2001
From: Pawel Moll <pawel.moll@arm.com>
Date: Thu, 6 Aug 2015 16:05:13 +0100
Subject: clk: versatile: Add SP810 device tree bindings document

Signed-off-by: Pawel Moll <pawel.moll@arm.com>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
---
 Documentation/devicetree/bindings/arm/sp810.txt | 35 +++++++++++++++++++++++++
 1 file changed, 35 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/arm/sp810.txt

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/arm/sp810.txt b/Documentation/devicetree/bindings/arm/sp810.txt
new file mode 100644
index 000000000000..440ee0892a4d
--- /dev/null
+++ b/Documentation/devicetree/bindings/arm/sp810.txt
@@ -0,0 +1,35 @@
+SP810 System Controller
+-----------------------
+
+Required properties:
+
+- compatible:	standard compatible string for a Primecell peripheral,
+		see Documentation/devicetree/bindings/arm/primecell.txt
+		for more details
+		should be: "arm,sp810", "arm,primecell"
+
+- reg:		standard registers property, physical address and size
+		of the control registers
+
+- clock-names:	from the common clock bindings, for more details see
+		Documentation/devicetree/bindings/clock/clock-bindings.txt;
+		should be: "refclk", "timclk", "apb_pclk"
+
+- clocks:	from the common clock bindings, phandle and clock
+		specifier pairs for the entries of clock-names property
+
+- #clock-cells: from the common clock bindings;
+		should be: <1>
+
+- clock-output-names: from the common clock bindings;
+		should be: "timerclken0", "timerclken1", "timerclken2", "timerclken3"
+
+Example:
+	sysctl@020000 {
+		compatible = "arm,sp810", "arm,primecell";
+		reg = <0x020000 0x1000>;
+		clocks = <&v2m_refclk32khz>, <&v2m_refclk1mhz>, <&smbclk>;
+		clock-names = "refclk", "timclk", "apb_pclk";
+		#clock-cells = <1>;
+		clock-output-names = "timerclken0", "timerclken1", "timerclken2", "timerclken3";
+	};
-- 
cgit v1.2.3


From 62f477119834d912a8471e775d2aeaca0166ab29 Mon Sep 17 00:00:00 2001
From: Stephen Boyd <sboyd@codeaurora.org>
Date: Thu, 30 Jul 2015 17:20:57 -0700
Subject: clk: versatile: Switch to assigned clock parents

We're removing struct clk from the clk provider API. This code is
calling the consumer APIs to change the parent to a 1 MHz fixed
rate clock for each of the clocks that the driver provides. Move
to using the assigned-clock-parents DT property for this instead.
Because this is an ABI break, detect if the property is missing
and fall back to setting the parent explicitly before the clocks
are registered.

Acked-by: Pawel Moll <pawel.moll@arm.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Tested-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
---
 Documentation/devicetree/bindings/arm/sp810.txt | 13 ++++-
 drivers/clk/versatile/clk-sp810.c               | 76 +++++--------------------
 2 files changed, 27 insertions(+), 62 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/arm/sp810.txt b/Documentation/devicetree/bindings/arm/sp810.txt
index 440ee0892a4d..6808fb5dee40 100644
--- a/Documentation/devicetree/bindings/arm/sp810.txt
+++ b/Documentation/devicetree/bindings/arm/sp810.txt
@@ -24,12 +24,23 @@ Required properties:
 - clock-output-names: from the common clock bindings;
 		should be: "timerclken0", "timerclken1", "timerclken2", "timerclken3"
 
+- assigned-clocks: from the common clock binding;
+		should be: clock specifier for each output clock of this
+		provider node
+
+- assigned-clock-parents: from the common clock binding;
+		should be: phandle of input clock listed in clocks
+		property with the highest frequency
+
 Example:
-	sysctl@020000 {
+	v2m_sysctl: sysctl@020000 {
 		compatible = "arm,sp810", "arm,primecell";
 		reg = <0x020000 0x1000>;
 		clocks = <&v2m_refclk32khz>, <&v2m_refclk1mhz>, <&smbclk>;
 		clock-names = "refclk", "timclk", "apb_pclk";
 		#clock-cells = <1>;
 		clock-output-names = "timerclken0", "timerclken1", "timerclken2", "timerclken3";
+		assigned-clocks = <&v2m_sysctl 0>, <&v2m_sysctl 1>, <&v2m_sysctl 3>, <&v2m_sysctl 3>;
+		assigned-clock-parents = <&v2m_refclk1mhz>, <&v2m_refclk1mhz>, <&v2m_refclk1mhz>, <&v2m_refclk1mhz>;
+
 	};
diff --git a/drivers/clk/versatile/clk-sp810.c b/drivers/clk/versatile/clk-sp810.c
index 7fbe4d4bf35e..a1cdef6b0f90 100644
--- a/drivers/clk/versatile/clk-sp810.c
+++ b/drivers/clk/versatile/clk-sp810.c
@@ -33,12 +33,9 @@ struct clk_sp810_timerclken {
 
 struct clk_sp810 {
 	struct device_node *node;
-	int refclk_index, timclk_index;
 	void __iomem *base;
 	spinlock_t lock;
 	struct clk_sp810_timerclken timerclken[4];
-	struct clk *refclk;
-	struct clk *timclk;
 };
 
 static u8 clk_sp810_timerclken_get_parent(struct clk_hw *hw)
@@ -71,55 +68,7 @@ static int clk_sp810_timerclken_set_parent(struct clk_hw *hw, u8 index)
 	return 0;
 }
 
-/*
- * FIXME - setting the parent every time .prepare is invoked is inefficient.
- * This is better handled by a dedicated clock tree configuration mechanism at
- * init-time.  Revisit this later when such a mechanism exists
- */
-static int clk_sp810_timerclken_prepare(struct clk_hw *hw)
-{
-	struct clk_sp810_timerclken *timerclken = to_clk_sp810_timerclken(hw);
-	struct clk_sp810 *sp810 = timerclken->sp810;
-	struct clk *old_parent = __clk_get_parent(hw->clk);
-	struct clk *new_parent;
-
-	if (!sp810->refclk)
-		sp810->refclk = of_clk_get(sp810->node, sp810->refclk_index);
-
-	if (!sp810->timclk)
-		sp810->timclk = of_clk_get(sp810->node, sp810->timclk_index);
-
-	if (WARN_ON(IS_ERR(sp810->refclk) || IS_ERR(sp810->timclk)))
-		return -ENOENT;
-
-	/* Select fastest parent */
-	if (clk_get_rate(sp810->refclk) > clk_get_rate(sp810->timclk))
-		new_parent = sp810->refclk;
-	else
-		new_parent = sp810->timclk;
-
-	/* Switch the parent if necessary */
-	if (old_parent != new_parent) {
-		clk_prepare(new_parent);
-		clk_set_parent(hw->clk, new_parent);
-		clk_unprepare(old_parent);
-	}
-
-	return 0;
-}
-
-static void clk_sp810_timerclken_unprepare(struct clk_hw *hw)
-{
-	struct clk_sp810_timerclken *timerclken = to_clk_sp810_timerclken(hw);
-	struct clk_sp810 *sp810 = timerclken->sp810;
-
-	clk_put(sp810->timclk);
-	clk_put(sp810->refclk);
-}
-
 static const struct clk_ops clk_sp810_timerclken_ops = {
-	.prepare = clk_sp810_timerclken_prepare,
-	.unprepare = clk_sp810_timerclken_unprepare,
 	.get_parent = clk_sp810_timerclken_get_parent,
 	.set_parent = clk_sp810_timerclken_set_parent,
 };
@@ -140,24 +89,18 @@ static void __init clk_sp810_of_setup(struct device_node *node)
 {
 	struct clk_sp810 *sp810 = kzalloc(sizeof(*sp810), GFP_KERNEL);
 	const char *parent_names[2];
+	int num = ARRAY_SIZE(parent_names);
 	char name[12];
 	struct clk_init_data init;
 	int i;
+	bool deprecated;
 
 	if (!sp810) {
 		pr_err("Failed to allocate memory for SP810!\n");
 		return;
 	}
 
-	sp810->refclk_index = of_property_match_string(node, "clock-names",
-			"refclk");
-	parent_names[0] = of_clk_get_parent_name(node, sp810->refclk_index);
-
-	sp810->timclk_index = of_property_match_string(node, "clock-names",
-			"timclk");
-	parent_names[1] = of_clk_get_parent_name(node, sp810->timclk_index);
-
-	if (!parent_names[0] || !parent_names[1]) {
+	if (of_clk_parent_fill(node, parent_names, num) != num) {
 		pr_warn("Failed to obtain parent clocks for SP810!\n");
 		return;
 	}
@@ -170,7 +113,9 @@ static void __init clk_sp810_of_setup(struct device_node *node)
 	init.ops = &clk_sp810_timerclken_ops;
 	init.flags = CLK_IS_BASIC;
 	init.parent_names = parent_names;
-	init.num_parents = ARRAY_SIZE(parent_names);
+	init.num_parents = num;
+
+	deprecated = !of_find_property(node, "assigned-clock-parents", NULL);
 
 	for (i = 0; i < ARRAY_SIZE(sp810->timerclken); i++) {
 		snprintf(name, ARRAY_SIZE(name), "timerclken%d", i);
@@ -179,6 +124,15 @@ static void __init clk_sp810_of_setup(struct device_node *node)
 		sp810->timerclken[i].channel = i;
 		sp810->timerclken[i].hw.init = &init;
 
+		/*
+		 * If DT isn't setting the parent, force it to be
+		 * the 1 MHz clock without going through the framework.
+		 * We do this before clk_register() so that it can determine
+		 * the parent and setup the tree properly.
+		 */
+		if (deprecated)
+			init.ops->set_parent(&sp810->timerclken[i].hw, 1);
+
 		sp810->timerclken[i].clk = clk_register(NULL,
 				&sp810->timerclken[i].hw);
 		WARN_ON(IS_ERR(sp810->timerclken[i].clk));
-- 
cgit v1.2.3


From 7e2a51e0cf75028888e5c670b3a3963e1716bdff Mon Sep 17 00:00:00 2001
From: Leo Yan <leo.yan@linaro.org>
Date: Tue, 4 Aug 2015 15:27:26 +0800
Subject: dt-bindings: arm: Hi6220: add doc for SRAM controller

Document "hisilicon,hi6220-sramctrl" for SRAM controller.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
---
 .../devicetree/bindings/arm/hisilicon/hisilicon.txt    | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/arm/hisilicon/hisilicon.txt b/Documentation/devicetree/bindings/arm/hisilicon/hisilicon.txt
index c431c67524d6..c733e28e18e5 100644
--- a/Documentation/devicetree/bindings/arm/hisilicon/hisilicon.txt
+++ b/Documentation/devicetree/bindings/arm/hisilicon/hisilicon.txt
@@ -127,6 +127,24 @@ Example:
 		#clock-cells = <1>;
 	};
 
+
+Hisilicon Hi6220 SRAM controller
+
+Required properties:
+- compatible : "hisilicon,hi6220-sramctrl", "syscon"
+- reg : Register address and size
+
+Hisilicon's SoCs use sram for multiple purpose; on Hi6220 there have several
+SRAM banks for power management, modem, security, etc. Further, use "syscon"
+managing the common sram which can be shared by multiple modules.
+
+Example:
+	/*for Hi6220*/
+	sram: sram@fff80000 {
+		compatible = "hisilicon,hi6220-sramctrl", "syscon";
+		reg = <0x0 0xfff80000 0x0 0x12000>;
+	};
+
 -----------------------------------------------------------------------
 Hisilicon HiP01 system controller
 
-- 
cgit v1.2.3


From 832446e8aaaeaf9365da18f95f01a42e6da27279 Mon Sep 17 00:00:00 2001
From: Leo Yan <leo.yan@linaro.org>
Date: Tue, 4 Aug 2015 15:27:27 +0800
Subject: dt-bindings: clk: Hi6220: Document stub clock driver

Document the new compatible for stub clock driver which is used for CPU
and DDR's dynamic frequency scaling.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
---
 .../devicetree/bindings/clock/hi6220-clock.txt        | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/clock/hi6220-clock.txt b/Documentation/devicetree/bindings/clock/hi6220-clock.txt
index 259e30af9597..e4d5feaebc29 100644
--- a/Documentation/devicetree/bindings/clock/hi6220-clock.txt
+++ b/Documentation/devicetree/bindings/clock/hi6220-clock.txt
@@ -15,19 +15,36 @@ Required Properties:
 	- "hisilicon,hi6220-sysctrl"
 	- "hisilicon,hi6220-mediactrl"
 	- "hisilicon,hi6220-pmctrl"
+	- "hisilicon,hi6220-stub-clk"
 
 - reg: physical base address of the controller and length of memory mapped
   region.
 
 - #clock-cells: should be 1.
 
-For example:
+Optional Properties:
+
+- hisilicon,hi6220-clk-sram: phandle to the syscon managing the SoC internal sram;
+  the driver need use the sram to pass parameters for frequency change.
+
+- mboxes: use the label reference for the mailbox as the first parameter, the
+  second parameter is the channel number.
+
+Example 1:
 	sys_ctrl: sys_ctrl@f7030000 {
 		compatible = "hisilicon,hi6220-sysctrl", "syscon";
 		reg = <0x0 0xf7030000 0x0 0x2000>;
 		#clock-cells = <1>;
 	};
 
+Example 2:
+	stub_clock: stub_clock {
+		compatible = "hisilicon,hi6220-stub-clk";
+		hisilicon,hi6220-clk-sram = <&sram>;
+		#clock-cells = <1>;
+		mboxes = <&mailbox 1>;
+	};
+
 Each clock is assigned an identifier and client nodes use this identifier
 to specify the clock which they consume.
 
-- 
cgit v1.2.3


From 67c9a1b5dadf05e22d7e2d32604fb2b21bf3f666 Mon Sep 17 00:00:00 2001
From: Linus Walleij <linus.walleij@linaro.org>
Date: Thu, 30 Jul 2015 15:20:00 +0200
Subject: clk: add bindings for the Ux500 clocks

These Ux500 clocks have been around for years and were never
properly documented. Add the proper binding documentation.

Cc: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Michael Turquette <mturquette@baylibre.com>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
---
 Documentation/devicetree/bindings/clock/ux500.txt | 64 +++++++++++++++++++++++
 1 file changed, 64 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/clock/ux500.txt

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/clock/ux500.txt b/Documentation/devicetree/bindings/clock/ux500.txt
new file mode 100644
index 000000000000..e52bd4b72348
--- /dev/null
+++ b/Documentation/devicetree/bindings/clock/ux500.txt
@@ -0,0 +1,64 @@
+Clock bindings for ST-Ericsson Ux500 clocks
+
+Required properties :
+- compatible : shall contain only one of the following:
+  "stericsson,u8500-clks"
+  "stericsson,u8540-clks"
+  "stericsson,u9540-clks"
+- reg : shall contain base register location and length for
+  CLKRST1, 2, 3, 5, and 6 in an array. Note the absence of
+  CLKRST4, which does not exist.
+
+Required subnodes:
+- prcmu-clock: a subnode with one clock cell for PRCMU (power,
+  reset, control unit) clocks. The cell indicates which PRCMU
+  clock in the prcmu-clock node the consumer wants to use.
+- prcc-periph-clock: a subnode with two clock cells for
+  PRCC (programmable reset- and clock controller) peripheral clocks.
+  The first cell indicates which PRCC block the consumer
+  wants to use, possible values are 1, 2, 3, 5, 6. The second
+  cell indicates which clock inside the PRCC block it wants,
+  possible values are 0 thru 31.
+- prcc-kernel-clock: a subnode with two clock cells for
+  PRCC (programmable reset- and clock controller) kernel clocks
+  The first cell indicates which PRCC block the consumer
+  wants to use, possible values are 1, 2, 3, 5, 6. The second
+  cell indicates which clock inside the PRCC block it wants,
+  possible values are 0 thru 31.
+- rtc32k-clock: a subnode with zero clock cells for the 32kHz
+  RTC clock.
+- smp-twd-clock: a subnode for the ARM SMP Timer Watchdog cluster
+  with zero clock cells.
+
+Example:
+
+clocks {
+	compatible = "stericsson,u8500-clks";
+	/*
+	 * Registers for the CLKRST block on peripheral
+	 * groups 1, 2, 3, 5, 6,
+	 */
+	reg = <0x8012f000 0x1000>, <0x8011f000 0x1000>,
+	    <0x8000f000 0x1000>, <0xa03ff000 0x1000>,
+	    <0xa03cf000 0x1000>;
+
+	prcmu_clk: prcmu-clock {
+		#clock-cells = <1>;
+	};
+
+	prcc_pclk: prcc-periph-clock {
+		#clock-cells = <2>;
+	};
+
+	prcc_kclk: prcc-kernel-clock {
+		#clock-cells = <2>;
+	};
+
+	rtc_clk: rtc32k-clock {
+		#clock-cells = <0>;
+	};
+
+	smp_twd_clk: smp-twd-clock {
+		#clock-cells = <0>;
+	};
+};
-- 
cgit v1.2.3


From 7ddbc2423c3301280b883bbb04b998203f30312c Mon Sep 17 00:00:00 2001
From: Bjorn Andersson <bjorn.andersson@sonymobile.com>
Date: Tue, 21 Jul 2015 17:44:49 -0700
Subject: backlight: pm8941-wled: Move PM8941 WLED driver to backlight

The Qualcomm PM8941 WLED block is used for backlight and should therefor
be in the backlight framework and not in the LED framework. This moves
the driver and adapts to the backlight api instead.

Signed-off-by: Bjorn Andersson <bjorn.andersson@sonymobile.com>
Tested-by: Rob Clark <robdclark@gmail.com>
Acked-by: Jacek Anaszewski <j.anaszewski@samsung.com>
Acked-by: Jingoo Han <jingoohan1@gmail.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
---
 .../devicetree/bindings/leds/leds-pm8941-wled.txt  |  43 --
 .../bindings/video/backlight/pm8941-wled.txt       |  40 ++
 drivers/leds/Kconfig                               |   8 -
 drivers/leds/Makefile                              |   1 -
 drivers/leds/leds-pm8941-wled.c                    | 435 ---------------------
 drivers/video/backlight/Kconfig                    |   7 +
 drivers/video/backlight/Makefile                   |   1 +
 drivers/video/backlight/pm8941-wled.c              | 427 ++++++++++++++++++++
 8 files changed, 475 insertions(+), 487 deletions(-)
 delete mode 100644 Documentation/devicetree/bindings/leds/leds-pm8941-wled.txt
 create mode 100644 Documentation/devicetree/bindings/video/backlight/pm8941-wled.txt
 delete mode 100644 drivers/leds/leds-pm8941-wled.c
 create mode 100644 drivers/video/backlight/pm8941-wled.c

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/leds/leds-pm8941-wled.txt b/Documentation/devicetree/bindings/leds/leds-pm8941-wled.txt
deleted file mode 100644
index a85a964d61f5..000000000000
--- a/Documentation/devicetree/bindings/leds/leds-pm8941-wled.txt
+++ /dev/null
@@ -1,43 +0,0 @@
-Binding for Qualcomm PM8941 WLED driver
-
-Required properties:
-- compatible: should be "qcom,pm8941-wled"
-- reg: slave address
-
-Optional properties:
-- label: The label for this led
-  See Documentation/devicetree/bindings/leds/common.txt
-- linux,default-trigger: Default trigger assigned to the LED
-  See Documentation/devicetree/bindings/leds/common.txt
-- qcom,cs-out: bool; enable current sink output
-- qcom,cabc: bool; enable content adaptive backlight control
-- qcom,ext-gen: bool; use externally generated modulator signal to dim
-- qcom,current-limit: mA; per-string current limit; value from 0 to 25
-	default: 20mA
-- qcom,current-boost-limit: mA; boost current limit; one of:
-	105, 385, 525, 805, 980, 1260, 1400, 1680
-	default: 805mA
-- qcom,switching-freq: kHz; switching frequency; one of:
-	600, 640, 685, 738, 800, 872, 960, 1066, 1200, 1371,
-	1600, 1920, 2400, 3200, 4800, 9600,
-	default: 1600kHz
-- qcom,ovp: V; Over-voltage protection limit; one of:
-	27, 29, 32, 35
-	default: 29V
-- qcom,num-strings: #; number of led strings attached; value from 1 to 3
-	default: 2
-
-Example:
-
-pm8941-wled@d800 {
-	compatible = "qcom,pm8941-wled";
-	reg = <0xd800>;
-	label = "backlight";
-
-	qcom,cs-out;
-	qcom,current-limit = <20>;
-	qcom,current-boost-limit = <805>;
-	qcom,switching-freq = <1600>;
-	qcom,ovp = <29>;
-	qcom,num-strings = <2>;
-};
diff --git a/Documentation/devicetree/bindings/video/backlight/pm8941-wled.txt b/Documentation/devicetree/bindings/video/backlight/pm8941-wled.txt
new file mode 100644
index 000000000000..424f8444a6cd
--- /dev/null
+++ b/Documentation/devicetree/bindings/video/backlight/pm8941-wled.txt
@@ -0,0 +1,40 @@
+Binding for Qualcomm PM8941 WLED driver
+
+Required properties:
+- compatible: should be "qcom,pm8941-wled"
+- reg: slave address
+
+Optional properties:
+- label: The name of the backlight device
+- qcom,cs-out: bool; enable current sink output
+- qcom,cabc: bool; enable content adaptive backlight control
+- qcom,ext-gen: bool; use externally generated modulator signal to dim
+- qcom,current-limit: mA; per-string current limit; value from 0 to 25
+	default: 20mA
+- qcom,current-boost-limit: mA; boost current limit; one of:
+	105, 385, 525, 805, 980, 1260, 1400, 1680
+	default: 805mA
+- qcom,switching-freq: kHz; switching frequency; one of:
+	600, 640, 685, 738, 800, 872, 960, 1066, 1200, 1371,
+	1600, 1920, 2400, 3200, 4800, 9600,
+	default: 1600kHz
+- qcom,ovp: V; Over-voltage protection limit; one of:
+	27, 29, 32, 35
+	default: 29V
+- qcom,num-strings: #; number of led strings attached; value from 1 to 3
+	default: 2
+
+Example:
+
+pm8941-wled@d800 {
+	compatible = "qcom,pm8941-wled";
+	reg = <0xd800>;
+	label = "backlight";
+
+	qcom,cs-out;
+	qcom,current-limit = <20>;
+	qcom,current-boost-limit = <805>;
+	qcom,switching-freq = <1600>;
+	qcom,ovp = <29>;
+	qcom,num-strings = <2>;
+};
diff --git a/drivers/leds/Kconfig b/drivers/leds/Kconfig
index 9ad35f72ab4c..b8d4b965ca2a 100644
--- a/drivers/leds/Kconfig
+++ b/drivers/leds/Kconfig
@@ -578,14 +578,6 @@ config LEDS_VERSATILE
 	  This option enabled support for the LEDs on the ARM Versatile
 	  and RealView boards. Say Y to enabled these.
 
-config LEDS_PM8941_WLED
-	tristate "LED support for the Qualcomm PM8941 WLED block"
-	depends on LEDS_CLASS
-	select REGMAP
-	help
-	  This option enables support for the 'White' LED block
-	  on Qualcomm PM8941 PMICs.
-
 comment "LED Triggers"
 source "drivers/leds/trigger/Kconfig"
 
diff --git a/drivers/leds/Makefile b/drivers/leds/Makefile
index 8d6a24a2f513..abe96d960ebe 100644
--- a/drivers/leds/Makefile
+++ b/drivers/leds/Makefile
@@ -63,7 +63,6 @@ obj-$(CONFIG_LEDS_BLINKM)		+= leds-blinkm.o
 obj-$(CONFIG_LEDS_SYSCON)		+= leds-syscon.o
 obj-$(CONFIG_LEDS_VERSATILE)		+= leds-versatile.o
 obj-$(CONFIG_LEDS_MENF21BMC)		+= leds-menf21bmc.o
-obj-$(CONFIG_LEDS_PM8941_WLED)		+= leds-pm8941-wled.o
 obj-$(CONFIG_LEDS_KTD2692)		+= leds-ktd2692.o
 
 # LED SPI Drivers
diff --git a/drivers/leds/leds-pm8941-wled.c b/drivers/leds/leds-pm8941-wled.c
deleted file mode 100644
index bf64a593fbf1..000000000000
--- a/drivers/leds/leds-pm8941-wled.c
+++ /dev/null
@@ -1,435 +0,0 @@
-/* Copyright (c) 2015, Sony Mobile Communications, AB.
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 and
- * only version 2 as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- */
-
-#include <linux/kernel.h>
-#include <linux/leds.h>
-#include <linux/module.h>
-#include <linux/of.h>
-#include <linux/of_device.h>
-#include <linux/regmap.h>
-
-#define PM8941_WLED_REG_VAL_BASE		0x40
-#define  PM8941_WLED_REG_VAL_MAX		0xFFF
-
-#define PM8941_WLED_REG_MOD_EN			0x46
-#define  PM8941_WLED_REG_MOD_EN_BIT		BIT(7)
-#define  PM8941_WLED_REG_MOD_EN_MASK		BIT(7)
-
-#define PM8941_WLED_REG_SYNC			0x47
-#define  PM8941_WLED_REG_SYNC_MASK		0x07
-#define  PM8941_WLED_REG_SYNC_LED1		BIT(0)
-#define  PM8941_WLED_REG_SYNC_LED2		BIT(1)
-#define  PM8941_WLED_REG_SYNC_LED3		BIT(2)
-#define  PM8941_WLED_REG_SYNC_ALL		0x07
-#define  PM8941_WLED_REG_SYNC_CLEAR		0x00
-
-#define PM8941_WLED_REG_FREQ			0x4c
-#define  PM8941_WLED_REG_FREQ_MASK		0x0f
-
-#define PM8941_WLED_REG_OVP			0x4d
-#define  PM8941_WLED_REG_OVP_MASK		0x03
-
-#define PM8941_WLED_REG_BOOST			0x4e
-#define  PM8941_WLED_REG_BOOST_MASK		0x07
-
-#define PM8941_WLED_REG_SINK			0x4f
-#define  PM8941_WLED_REG_SINK_MASK		0xe0
-#define  PM8941_WLED_REG_SINK_SHFT		0x05
-
-/* Per-'string' registers below */
-#define PM8941_WLED_REG_STR_OFFSET		0x10
-
-#define PM8941_WLED_REG_STR_MOD_EN_BASE		0x60
-#define  PM8941_WLED_REG_STR_MOD_MASK		BIT(7)
-#define  PM8941_WLED_REG_STR_MOD_EN		BIT(7)
-
-#define PM8941_WLED_REG_STR_SCALE_BASE		0x62
-#define  PM8941_WLED_REG_STR_SCALE_MASK		0x1f
-
-#define PM8941_WLED_REG_STR_MOD_SRC_BASE	0x63
-#define  PM8941_WLED_REG_STR_MOD_SRC_MASK	0x01
-#define  PM8941_WLED_REG_STR_MOD_SRC_INT	0x00
-#define  PM8941_WLED_REG_STR_MOD_SRC_EXT	0x01
-
-#define PM8941_WLED_REG_STR_CABC_BASE		0x66
-#define  PM8941_WLED_REG_STR_CABC_MASK		BIT(7)
-#define  PM8941_WLED_REG_STR_CABC_EN		BIT(7)
-
-struct pm8941_wled_config {
-	u32 i_boost_limit;
-	u32 ovp;
-	u32 switch_freq;
-	u32 num_strings;
-	u32 i_limit;
-	bool cs_out_en;
-	bool ext_gen;
-	bool cabc_en;
-};
-
-struct pm8941_wled {
-	struct regmap *regmap;
-	u16 addr;
-
-	struct led_classdev cdev;
-
-	struct pm8941_wled_config cfg;
-};
-
-static int pm8941_wled_set(struct led_classdev *cdev,
-			   enum led_brightness value)
-{
-	struct pm8941_wled *wled;
-	u8 ctrl = 0;
-	u16 val;
-	int rc;
-	int i;
-
-	wled = container_of(cdev, struct pm8941_wled, cdev);
-
-	if (value != 0)
-		ctrl = PM8941_WLED_REG_MOD_EN_BIT;
-
-	val = value * PM8941_WLED_REG_VAL_MAX / LED_FULL;
-
-	rc = regmap_update_bits(wled->regmap,
-			wled->addr + PM8941_WLED_REG_MOD_EN,
-			PM8941_WLED_REG_MOD_EN_MASK, ctrl);
-	if (rc)
-		return rc;
-
-	for (i = 0; i < wled->cfg.num_strings; ++i) {
-		u8 v[2] = { val & 0xff, (val >> 8) & 0xf };
-
-		rc = regmap_bulk_write(wled->regmap,
-				wled->addr + PM8941_WLED_REG_VAL_BASE + 2 * i,
-				v, 2);
-		if (rc)
-			return rc;
-	}
-
-	rc = regmap_update_bits(wled->regmap,
-			wled->addr + PM8941_WLED_REG_SYNC,
-			PM8941_WLED_REG_SYNC_MASK, PM8941_WLED_REG_SYNC_ALL);
-	if (rc)
-		return rc;
-
-	rc = regmap_update_bits(wled->regmap,
-			wled->addr + PM8941_WLED_REG_SYNC,
-			PM8941_WLED_REG_SYNC_MASK, PM8941_WLED_REG_SYNC_CLEAR);
-	return rc;
-}
-
-static void pm8941_wled_set_brightness(struct led_classdev *cdev,
-				       enum led_brightness value)
-{
-	if (pm8941_wled_set(cdev, value)) {
-		dev_err(cdev->dev, "Unable to set brightness\n");
-		return;
-	}
-	cdev->brightness = value;
-}
-
-static int pm8941_wled_setup(struct pm8941_wled *wled)
-{
-	int rc;
-	int i;
-
-	rc = regmap_update_bits(wled->regmap,
-			wled->addr + PM8941_WLED_REG_OVP,
-			PM8941_WLED_REG_OVP_MASK, wled->cfg.ovp);
-	if (rc)
-		return rc;
-
-	rc = regmap_update_bits(wled->regmap,
-			wled->addr + PM8941_WLED_REG_BOOST,
-			PM8941_WLED_REG_BOOST_MASK, wled->cfg.i_boost_limit);
-	if (rc)
-		return rc;
-
-	rc = regmap_update_bits(wled->regmap,
-			wled->addr + PM8941_WLED_REG_FREQ,
-			PM8941_WLED_REG_FREQ_MASK, wled->cfg.switch_freq);
-	if (rc)
-		return rc;
-
-	if (wled->cfg.cs_out_en) {
-		u8 all = (BIT(wled->cfg.num_strings) - 1)
-				<< PM8941_WLED_REG_SINK_SHFT;
-
-		rc = regmap_update_bits(wled->regmap,
-				wled->addr + PM8941_WLED_REG_SINK,
-				PM8941_WLED_REG_SINK_MASK, all);
-		if (rc)
-			return rc;
-	}
-
-	for (i = 0; i < wled->cfg.num_strings; ++i) {
-		u16 addr = wled->addr + PM8941_WLED_REG_STR_OFFSET * i;
-
-		rc = regmap_update_bits(wled->regmap,
-				addr + PM8941_WLED_REG_STR_MOD_EN_BASE,
-				PM8941_WLED_REG_STR_MOD_MASK,
-				PM8941_WLED_REG_STR_MOD_EN);
-		if (rc)
-			return rc;
-
-		if (wled->cfg.ext_gen) {
-			rc = regmap_update_bits(wled->regmap,
-					addr + PM8941_WLED_REG_STR_MOD_SRC_BASE,
-					PM8941_WLED_REG_STR_MOD_SRC_MASK,
-					PM8941_WLED_REG_STR_MOD_SRC_EXT);
-			if (rc)
-				return rc;
-		}
-
-		rc = regmap_update_bits(wled->regmap,
-				addr + PM8941_WLED_REG_STR_SCALE_BASE,
-				PM8941_WLED_REG_STR_SCALE_MASK,
-				wled->cfg.i_limit);
-		if (rc)
-			return rc;
-
-		rc = regmap_update_bits(wled->regmap,
-				addr + PM8941_WLED_REG_STR_CABC_BASE,
-				PM8941_WLED_REG_STR_CABC_MASK,
-				wled->cfg.cabc_en ?
-					PM8941_WLED_REG_STR_CABC_EN : 0);
-		if (rc)
-			return rc;
-	}
-
-	return 0;
-}
-
-static const struct pm8941_wled_config pm8941_wled_config_defaults = {
-	.i_boost_limit = 3,
-	.i_limit = 20,
-	.ovp = 2,
-	.switch_freq = 5,
-	.num_strings = 0,
-	.cs_out_en = false,
-	.ext_gen = false,
-	.cabc_en = false,
-};
-
-struct pm8941_wled_var_cfg {
-	const u32 *values;
-	u32 (*fn)(u32);
-	int size;
-};
-
-static const u32 pm8941_wled_i_boost_limit_values[] = {
-	105, 385, 525, 805, 980, 1260, 1400, 1680,
-};
-
-static const struct pm8941_wled_var_cfg pm8941_wled_i_boost_limit_cfg = {
-	.values = pm8941_wled_i_boost_limit_values,
-	.size = ARRAY_SIZE(pm8941_wled_i_boost_limit_values),
-};
-
-static const u32 pm8941_wled_ovp_values[] = {
-	35, 32, 29, 27,
-};
-
-static const struct pm8941_wled_var_cfg pm8941_wled_ovp_cfg = {
-	.values = pm8941_wled_ovp_values,
-	.size = ARRAY_SIZE(pm8941_wled_ovp_values),
-};
-
-static u32 pm8941_wled_num_strings_values_fn(u32 idx)
-{
-	return idx + 1;
-}
-
-static const struct pm8941_wled_var_cfg pm8941_wled_num_strings_cfg = {
-	.fn = pm8941_wled_num_strings_values_fn,
-	.size = 3,
-};
-
-static u32 pm8941_wled_switch_freq_values_fn(u32 idx)
-{
-	return 19200 / (2 * (1 + idx));
-}
-
-static const struct pm8941_wled_var_cfg pm8941_wled_switch_freq_cfg = {
-	.fn = pm8941_wled_switch_freq_values_fn,
-	.size = 16,
-};
-
-static const struct pm8941_wled_var_cfg pm8941_wled_i_limit_cfg = {
-	.size = 26,
-};
-
-static u32 pm8941_wled_values(const struct pm8941_wled_var_cfg *cfg, u32 idx)
-{
-	if (idx >= cfg->size)
-		return UINT_MAX;
-	if (cfg->fn)
-		return cfg->fn(idx);
-	if (cfg->values)
-		return cfg->values[idx];
-	return idx;
-}
-
-static int pm8941_wled_configure(struct pm8941_wled *wled, struct device *dev)
-{
-	struct pm8941_wled_config *cfg = &wled->cfg;
-	u32 val;
-	int rc;
-	u32 c;
-	int i;
-	int j;
-
-	const struct {
-		const char *name;
-		u32 *val_ptr;
-		const struct pm8941_wled_var_cfg *cfg;
-	} u32_opts[] = {
-		{
-			"qcom,current-boost-limit",
-			&cfg->i_boost_limit,
-			.cfg = &pm8941_wled_i_boost_limit_cfg,
-		},
-		{
-			"qcom,current-limit",
-			&cfg->i_limit,
-			.cfg = &pm8941_wled_i_limit_cfg,
-		},
-		{
-			"qcom,ovp",
-			&cfg->ovp,
-			.cfg = &pm8941_wled_ovp_cfg,
-		},
-		{
-			"qcom,switching-freq",
-			&cfg->switch_freq,
-			.cfg = &pm8941_wled_switch_freq_cfg,
-		},
-		{
-			"qcom,num-strings",
-			&cfg->num_strings,
-			.cfg = &pm8941_wled_num_strings_cfg,
-		},
-	};
-	const struct {
-		const char *name;
-		bool *val_ptr;
-	} bool_opts[] = {
-		{ "qcom,cs-out", &cfg->cs_out_en, },
-		{ "qcom,ext-gen", &cfg->ext_gen, },
-		{ "qcom,cabc", &cfg->cabc_en, },
-	};
-
-	rc = of_property_read_u32(dev->of_node, "reg", &val);
-	if (rc || val > 0xffff) {
-		dev_err(dev, "invalid IO resources\n");
-		return rc ? rc : -EINVAL;
-	}
-	wled->addr = val;
-
-	rc = of_property_read_string(dev->of_node, "label", &wled->cdev.name);
-	if (rc)
-		wled->cdev.name = dev->of_node->name;
-
-	wled->cdev.default_trigger = of_get_property(dev->of_node,
-			"linux,default-trigger", NULL);
-
-	*cfg = pm8941_wled_config_defaults;
-	for (i = 0; i < ARRAY_SIZE(u32_opts); ++i) {
-		rc = of_property_read_u32(dev->of_node, u32_opts[i].name, &val);
-		if (rc == -EINVAL) {
-			continue;
-		} else if (rc) {
-			dev_err(dev, "error reading '%s'\n", u32_opts[i].name);
-			return rc;
-		}
-
-		c = UINT_MAX;
-		for (j = 0; c != val; j++) {
-			c = pm8941_wled_values(u32_opts[i].cfg, j);
-			if (c == UINT_MAX) {
-				dev_err(dev, "invalid value for '%s'\n",
-					u32_opts[i].name);
-				return -EINVAL;
-			}
-		}
-
-		dev_dbg(dev, "'%s' = %u\n", u32_opts[i].name, c);
-		*u32_opts[i].val_ptr = j;
-	}
-
-	for (i = 0; i < ARRAY_SIZE(bool_opts); ++i) {
-		if (of_property_read_bool(dev->of_node, bool_opts[i].name))
-			*bool_opts[i].val_ptr = true;
-	}
-
-	cfg->num_strings = cfg->num_strings + 1;
-
-	return 0;
-}
-
-static int pm8941_wled_probe(struct platform_device *pdev)
-{
-	struct pm8941_wled *wled;
-	struct regmap *regmap;
-	int rc;
-
-	regmap = dev_get_regmap(pdev->dev.parent, NULL);
-	if (!regmap) {
-		dev_err(&pdev->dev, "Unable to get regmap\n");
-		return -EINVAL;
-	}
-
-	wled = devm_kzalloc(&pdev->dev, sizeof(*wled), GFP_KERNEL);
-	if (!wled)
-		return -ENOMEM;
-
-	wled->regmap = regmap;
-
-	rc = pm8941_wled_configure(wled, &pdev->dev);
-	if (rc)
-		return rc;
-
-	rc = pm8941_wled_setup(wled);
-	if (rc)
-		return rc;
-
-	wled->cdev.brightness_set = pm8941_wled_set_brightness;
-
-	rc = devm_led_classdev_register(&pdev->dev, &wled->cdev);
-	if (rc)
-		return rc;
-
-	platform_set_drvdata(pdev, wled);
-
-	return 0;
-};
-
-static const struct of_device_id pm8941_wled_match_table[] = {
-	{ .compatible = "qcom,pm8941-wled" },
-	{}
-};
-MODULE_DEVICE_TABLE(of, pm8941_wled_match_table);
-
-static struct platform_driver pm8941_wled_driver = {
-	.probe = pm8941_wled_probe,
-	.driver	= {
-		.name = "pm8941-wled",
-		.of_match_table	= pm8941_wled_match_table,
-	},
-};
-
-module_platform_driver(pm8941_wled_driver);
-
-MODULE_DESCRIPTION("pm8941 wled driver");
-MODULE_LICENSE("GPL v2");
-MODULE_ALIAS("platform:pm8941-wled");
diff --git a/drivers/video/backlight/Kconfig b/drivers/video/backlight/Kconfig
index 0505b796d743..5ffa4b4e26c0 100644
--- a/drivers/video/backlight/Kconfig
+++ b/drivers/video/backlight/Kconfig
@@ -299,6 +299,13 @@ config BACKLIGHT_TOSA
 	  If you have an Sharp SL-6000 Zaurus say Y to enable a driver
 	  for its backlight
 
+config BACKLIGHT_PM8941_WLED
+	tristate "Qualcomm PM8941 WLED Driver"
+	select REGMAP
+	help
+	  If you have the Qualcomm PM8941, say Y to enable a driver for the
+	  WLED block.
+
 config BACKLIGHT_SAHARA
 	tristate "Tabletkiosk Sahara Touch-iT Backlight Driver"
 	depends on X86
diff --git a/drivers/video/backlight/Makefile b/drivers/video/backlight/Makefile
index d67073f9d421..16ec534cff30 100644
--- a/drivers/video/backlight/Makefile
+++ b/drivers/video/backlight/Makefile
@@ -48,6 +48,7 @@ obj-$(CONFIG_BACKLIGHT_OMAP1)		+= omap1_bl.o
 obj-$(CONFIG_BACKLIGHT_OT200)		+= ot200_bl.o
 obj-$(CONFIG_BACKLIGHT_PANDORA)		+= pandora_bl.o
 obj-$(CONFIG_BACKLIGHT_PCF50633)	+= pcf50633-backlight.o
+obj-$(CONFIG_BACKLIGHT_PM8941_WLED)	+= pm8941-wled.o
 obj-$(CONFIG_BACKLIGHT_PWM)		+= pwm_bl.o
 obj-$(CONFIG_BACKLIGHT_SAHARA)		+= kb3886_bl.o
 obj-$(CONFIG_BACKLIGHT_SKY81452)	+= sky81452-backlight.o
diff --git a/drivers/video/backlight/pm8941-wled.c b/drivers/video/backlight/pm8941-wled.c
new file mode 100644
index 000000000000..c704c3236034
--- /dev/null
+++ b/drivers/video/backlight/pm8941-wled.c
@@ -0,0 +1,427 @@
+/* Copyright (c) 2015, Sony Mobile Communications, AB.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 and
+ * only version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include <linux/kernel.h>
+#include <linux/backlight.h>
+#include <linux/module.h>
+#include <linux/of.h>
+#include <linux/of_device.h>
+#include <linux/regmap.h>
+
+#define PM8941_WLED_REG_VAL_BASE		0x40
+#define  PM8941_WLED_REG_VAL_MAX		0xFFF
+
+#define PM8941_WLED_REG_MOD_EN			0x46
+#define  PM8941_WLED_REG_MOD_EN_BIT		BIT(7)
+#define  PM8941_WLED_REG_MOD_EN_MASK		BIT(7)
+
+#define PM8941_WLED_REG_SYNC			0x47
+#define  PM8941_WLED_REG_SYNC_MASK		0x07
+#define  PM8941_WLED_REG_SYNC_LED1		BIT(0)
+#define  PM8941_WLED_REG_SYNC_LED2		BIT(1)
+#define  PM8941_WLED_REG_SYNC_LED3		BIT(2)
+#define  PM8941_WLED_REG_SYNC_ALL		0x07
+#define  PM8941_WLED_REG_SYNC_CLEAR		0x00
+
+#define PM8941_WLED_REG_FREQ			0x4c
+#define  PM8941_WLED_REG_FREQ_MASK		0x0f
+
+#define PM8941_WLED_REG_OVP			0x4d
+#define  PM8941_WLED_REG_OVP_MASK		0x03
+
+#define PM8941_WLED_REG_BOOST			0x4e
+#define  PM8941_WLED_REG_BOOST_MASK		0x07
+
+#define PM8941_WLED_REG_SINK			0x4f
+#define  PM8941_WLED_REG_SINK_MASK		0xe0
+#define  PM8941_WLED_REG_SINK_SHFT		0x05
+
+/* Per-'string' registers below */
+#define PM8941_WLED_REG_STR_OFFSET		0x10
+
+#define PM8941_WLED_REG_STR_MOD_EN_BASE		0x60
+#define  PM8941_WLED_REG_STR_MOD_MASK		BIT(7)
+#define  PM8941_WLED_REG_STR_MOD_EN		BIT(7)
+
+#define PM8941_WLED_REG_STR_SCALE_BASE		0x62
+#define  PM8941_WLED_REG_STR_SCALE_MASK		0x1f
+
+#define PM8941_WLED_REG_STR_MOD_SRC_BASE	0x63
+#define  PM8941_WLED_REG_STR_MOD_SRC_MASK	0x01
+#define  PM8941_WLED_REG_STR_MOD_SRC_INT	0x00
+#define  PM8941_WLED_REG_STR_MOD_SRC_EXT	0x01
+
+#define PM8941_WLED_REG_STR_CABC_BASE		0x66
+#define  PM8941_WLED_REG_STR_CABC_MASK		BIT(7)
+#define  PM8941_WLED_REG_STR_CABC_EN		BIT(7)
+
+struct pm8941_wled_config {
+	u32 i_boost_limit;
+	u32 ovp;
+	u32 switch_freq;
+	u32 num_strings;
+	u32 i_limit;
+	bool cs_out_en;
+	bool ext_gen;
+	bool cabc_en;
+};
+
+struct pm8941_wled {
+	const char *name;
+	struct regmap *regmap;
+	u16 addr;
+
+	struct pm8941_wled_config cfg;
+};
+
+static int pm8941_wled_update_status(struct backlight_device *bl)
+{
+	struct pm8941_wled *wled = bl_get_data(bl);
+	u16 val = bl->props.brightness;
+	u8 ctrl = 0;
+	int rc;
+	int i;
+
+	if (bl->props.power != FB_BLANK_UNBLANK ||
+	    bl->props.fb_blank != FB_BLANK_UNBLANK ||
+	    bl->props.state & BL_CORE_FBBLANK)
+		val = 0;
+
+	if (val != 0)
+		ctrl = PM8941_WLED_REG_MOD_EN_BIT;
+
+	rc = regmap_update_bits(wled->regmap,
+			wled->addr + PM8941_WLED_REG_MOD_EN,
+			PM8941_WLED_REG_MOD_EN_MASK, ctrl);
+	if (rc)
+		return rc;
+
+	for (i = 0; i < wled->cfg.num_strings; ++i) {
+		u8 v[2] = { val & 0xff, (val >> 8) & 0xf };
+
+		rc = regmap_bulk_write(wled->regmap,
+				wled->addr + PM8941_WLED_REG_VAL_BASE + 2 * i,
+				v, 2);
+		if (rc)
+			return rc;
+	}
+
+	rc = regmap_update_bits(wled->regmap,
+			wled->addr + PM8941_WLED_REG_SYNC,
+			PM8941_WLED_REG_SYNC_MASK, PM8941_WLED_REG_SYNC_ALL);
+	if (rc)
+		return rc;
+
+	rc = regmap_update_bits(wled->regmap,
+			wled->addr + PM8941_WLED_REG_SYNC,
+			PM8941_WLED_REG_SYNC_MASK, PM8941_WLED_REG_SYNC_CLEAR);
+	return rc;
+}
+
+static int pm8941_wled_setup(struct pm8941_wled *wled)
+{
+	int rc;
+	int i;
+
+	rc = regmap_update_bits(wled->regmap,
+			wled->addr + PM8941_WLED_REG_OVP,
+			PM8941_WLED_REG_OVP_MASK, wled->cfg.ovp);
+	if (rc)
+		return rc;
+
+	rc = regmap_update_bits(wled->regmap,
+			wled->addr + PM8941_WLED_REG_BOOST,
+			PM8941_WLED_REG_BOOST_MASK, wled->cfg.i_boost_limit);
+	if (rc)
+		return rc;
+
+	rc = regmap_update_bits(wled->regmap,
+			wled->addr + PM8941_WLED_REG_FREQ,
+			PM8941_WLED_REG_FREQ_MASK, wled->cfg.switch_freq);
+	if (rc)
+		return rc;
+
+	if (wled->cfg.cs_out_en) {
+		u8 all = (BIT(wled->cfg.num_strings) - 1)
+				<< PM8941_WLED_REG_SINK_SHFT;
+
+		rc = regmap_update_bits(wled->regmap,
+				wled->addr + PM8941_WLED_REG_SINK,
+				PM8941_WLED_REG_SINK_MASK, all);
+		if (rc)
+			return rc;
+	}
+
+	for (i = 0; i < wled->cfg.num_strings; ++i) {
+		u16 addr = wled->addr + PM8941_WLED_REG_STR_OFFSET * i;
+
+		rc = regmap_update_bits(wled->regmap,
+				addr + PM8941_WLED_REG_STR_MOD_EN_BASE,
+				PM8941_WLED_REG_STR_MOD_MASK,
+				PM8941_WLED_REG_STR_MOD_EN);
+		if (rc)
+			return rc;
+
+		if (wled->cfg.ext_gen) {
+			rc = regmap_update_bits(wled->regmap,
+					addr + PM8941_WLED_REG_STR_MOD_SRC_BASE,
+					PM8941_WLED_REG_STR_MOD_SRC_MASK,
+					PM8941_WLED_REG_STR_MOD_SRC_EXT);
+			if (rc)
+				return rc;
+		}
+
+		rc = regmap_update_bits(wled->regmap,
+				addr + PM8941_WLED_REG_STR_SCALE_BASE,
+				PM8941_WLED_REG_STR_SCALE_MASK,
+				wled->cfg.i_limit);
+		if (rc)
+			return rc;
+
+		rc = regmap_update_bits(wled->regmap,
+				addr + PM8941_WLED_REG_STR_CABC_BASE,
+				PM8941_WLED_REG_STR_CABC_MASK,
+				wled->cfg.cabc_en ?
+					PM8941_WLED_REG_STR_CABC_EN : 0);
+		if (rc)
+			return rc;
+	}
+
+	return 0;
+}
+
+static const struct pm8941_wled_config pm8941_wled_config_defaults = {
+	.i_boost_limit = 3,
+	.i_limit = 20,
+	.ovp = 2,
+	.switch_freq = 5,
+	.num_strings = 0,
+	.cs_out_en = false,
+	.ext_gen = false,
+	.cabc_en = false,
+};
+
+struct pm8941_wled_var_cfg {
+	const u32 *values;
+	u32 (*fn)(u32);
+	int size;
+};
+
+static const u32 pm8941_wled_i_boost_limit_values[] = {
+	105, 385, 525, 805, 980, 1260, 1400, 1680,
+};
+
+static const struct pm8941_wled_var_cfg pm8941_wled_i_boost_limit_cfg = {
+	.values = pm8941_wled_i_boost_limit_values,
+	.size = ARRAY_SIZE(pm8941_wled_i_boost_limit_values),
+};
+
+static const u32 pm8941_wled_ovp_values[] = {
+	35, 32, 29, 27,
+};
+
+static const struct pm8941_wled_var_cfg pm8941_wled_ovp_cfg = {
+	.values = pm8941_wled_ovp_values,
+	.size = ARRAY_SIZE(pm8941_wled_ovp_values),
+};
+
+static u32 pm8941_wled_num_strings_values_fn(u32 idx)
+{
+	return idx + 1;
+}
+
+static const struct pm8941_wled_var_cfg pm8941_wled_num_strings_cfg = {
+	.fn = pm8941_wled_num_strings_values_fn,
+	.size = 3,
+};
+
+static u32 pm8941_wled_switch_freq_values_fn(u32 idx)
+{
+	return 19200 / (2 * (1 + idx));
+}
+
+static const struct pm8941_wled_var_cfg pm8941_wled_switch_freq_cfg = {
+	.fn = pm8941_wled_switch_freq_values_fn,
+	.size = 16,
+};
+
+static const struct pm8941_wled_var_cfg pm8941_wled_i_limit_cfg = {
+	.size = 26,
+};
+
+static u32 pm8941_wled_values(const struct pm8941_wled_var_cfg *cfg, u32 idx)
+{
+	if (idx >= cfg->size)
+		return UINT_MAX;
+	if (cfg->fn)
+		return cfg->fn(idx);
+	if (cfg->values)
+		return cfg->values[idx];
+	return idx;
+}
+
+static int pm8941_wled_configure(struct pm8941_wled *wled, struct device *dev)
+{
+	struct pm8941_wled_config *cfg = &wled->cfg;
+	u32 val;
+	int rc;
+	u32 c;
+	int i;
+	int j;
+
+	const struct {
+		const char *name;
+		u32 *val_ptr;
+		const struct pm8941_wled_var_cfg *cfg;
+	} u32_opts[] = {
+		{
+			"qcom,current-boost-limit",
+			&cfg->i_boost_limit,
+			.cfg = &pm8941_wled_i_boost_limit_cfg,
+		},
+		{
+			"qcom,current-limit",
+			&cfg->i_limit,
+			.cfg = &pm8941_wled_i_limit_cfg,
+		},
+		{
+			"qcom,ovp",
+			&cfg->ovp,
+			.cfg = &pm8941_wled_ovp_cfg,
+		},
+		{
+			"qcom,switching-freq",
+			&cfg->switch_freq,
+			.cfg = &pm8941_wled_switch_freq_cfg,
+		},
+		{
+			"qcom,num-strings",
+			&cfg->num_strings,
+			.cfg = &pm8941_wled_num_strings_cfg,
+		},
+	};
+	const struct {
+		const char *name;
+		bool *val_ptr;
+	} bool_opts[] = {
+		{ "qcom,cs-out", &cfg->cs_out_en, },
+		{ "qcom,ext-gen", &cfg->ext_gen, },
+		{ "qcom,cabc", &cfg->cabc_en, },
+	};
+
+	rc = of_property_read_u32(dev->of_node, "reg", &val);
+	if (rc || val > 0xffff) {
+		dev_err(dev, "invalid IO resources\n");
+		return rc ? rc : -EINVAL;
+	}
+	wled->addr = val;
+
+	rc = of_property_read_string(dev->of_node, "label", &wled->name);
+	if (rc)
+		wled->name = dev->of_node->name;
+
+	*cfg = pm8941_wled_config_defaults;
+	for (i = 0; i < ARRAY_SIZE(u32_opts); ++i) {
+		rc = of_property_read_u32(dev->of_node, u32_opts[i].name, &val);
+		if (rc == -EINVAL) {
+			continue;
+		} else if (rc) {
+			dev_err(dev, "error reading '%s'\n", u32_opts[i].name);
+			return rc;
+		}
+
+		c = UINT_MAX;
+		for (j = 0; c != val; j++) {
+			c = pm8941_wled_values(u32_opts[i].cfg, j);
+			if (c == UINT_MAX) {
+				dev_err(dev, "invalid value for '%s'\n",
+					u32_opts[i].name);
+				return -EINVAL;
+			}
+		}
+
+		dev_dbg(dev, "'%s' = %u\n", u32_opts[i].name, c);
+		*u32_opts[i].val_ptr = j;
+	}
+
+	for (i = 0; i < ARRAY_SIZE(bool_opts); ++i) {
+		if (of_property_read_bool(dev->of_node, bool_opts[i].name))
+			*bool_opts[i].val_ptr = true;
+	}
+
+	cfg->num_strings = cfg->num_strings + 1;
+
+	return 0;
+}
+
+static const struct backlight_ops pm8941_wled_ops = {
+	.update_status = pm8941_wled_update_status,
+};
+
+static int pm8941_wled_probe(struct platform_device *pdev)
+{
+	struct backlight_properties props;
+	struct backlight_device *bl;
+	struct pm8941_wled *wled;
+	struct regmap *regmap;
+	int rc;
+
+	regmap = dev_get_regmap(pdev->dev.parent, NULL);
+	if (!regmap) {
+		dev_err(&pdev->dev, "Unable to get regmap\n");
+		return -EINVAL;
+	}
+
+	wled = devm_kzalloc(&pdev->dev, sizeof(*wled), GFP_KERNEL);
+	if (!wled)
+		return -ENOMEM;
+
+	wled->regmap = regmap;
+
+	rc = pm8941_wled_configure(wled, &pdev->dev);
+	if (rc)
+		return rc;
+
+	rc = pm8941_wled_setup(wled);
+	if (rc)
+		return rc;
+
+	memset(&props, 0, sizeof(struct backlight_properties));
+	props.type = BACKLIGHT_RAW;
+	props.max_brightness = PM8941_WLED_REG_VAL_MAX;
+	bl = devm_backlight_device_register(&pdev->dev, wled->name,
+					    &pdev->dev, wled,
+					    &pm8941_wled_ops, &props);
+	if (IS_ERR(bl))
+		return PTR_ERR(bl);
+
+	return 0;
+};
+
+static const struct of_device_id pm8941_wled_match_table[] = {
+	{ .compatible = "qcom,pm8941-wled" },
+	{}
+};
+MODULE_DEVICE_TABLE(of, pm8941_wled_match_table);
+
+static struct platform_driver pm8941_wled_driver = {
+	.probe = pm8941_wled_probe,
+	.driver	= {
+		.name = "pm8941-wled",
+		.of_match_table	= pm8941_wled_match_table,
+	},
+};
+
+module_platform_driver(pm8941_wled_driver);
+
+MODULE_DESCRIPTION("pm8941 wled driver");
+MODULE_LICENSE("GPL v2");
-- 
cgit v1.2.3


From ca30475698696af3a03f6eaee16472ae09d42269 Mon Sep 17 00:00:00 2001
From: "Xiao, Nan" <nan.xiao@hp.com>
Date: Mon, 24 Aug 2015 06:22:42 +0000
Subject: x86/vt-d: Fix documentation of DRHD

According to "Intel Virtualization Technology for Directed
I/O" specification, DRHD stands for "DMA Remapping Hardware
Unit Definition" , not "DMA Engine Reporting Structure".

Signed-off-by: Nan Xiao <nan.xiao@hp.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 Documentation/Intel-IOMMU.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/Intel-IOMMU.txt b/Documentation/Intel-IOMMU.txt
index cf9431db8731..7b57fc087088 100644
--- a/Documentation/Intel-IOMMU.txt
+++ b/Documentation/Intel-IOMMU.txt
@@ -10,7 +10,7 @@ This guide gives a quick cheat sheet for some basic understanding.
 Some Keywords
 
 DMAR - DMA remapping
-DRHD - DMA Engine Reporting Structure
+DRHD - DMA Remapping Hardware Unit Definition
 RMRR - Reserved memory Region Reporting Structure
 ZLR  - Zero length reads from PCI devices
 IOVA - IO Virtual address.
-- 
cgit v1.2.3


From 068627a5978a885b33f0f49da81860ce78c3ac6f Mon Sep 17 00:00:00 2001
From: Geert Uytterhoeven <geert+renesas@glider.be>
Date: Tue, 4 Aug 2015 15:11:21 +0200
Subject: of: Add vendor prefix for JEDEC Solid State Technology Association

Add the "jedec" vendor prefix for the "JEDEC Solid State Technology
Association" (formerly known as the "Joint Electron Device Engineering
Council"), which is already in use in several bindings.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Rob Herring <robh@kernel.org>
---
 Documentation/devicetree/bindings/vendor-prefixes.txt | 1 +
 1 file changed, 1 insertion(+)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/vendor-prefixes.txt b/Documentation/devicetree/bindings/vendor-prefixes.txt
index 9a4ac79038e9..953aa8f173cf 100644
--- a/Documentation/devicetree/bindings/vendor-prefixes.txt
+++ b/Documentation/devicetree/bindings/vendor-prefixes.txt
@@ -112,6 +112,7 @@ intel	Intel Corporation
 intercontrol	Inter Control Group
 isee	ISEE 2007 S.L.
 isil	Intersil
+jedec	JEDEC Solid State Technology Association
 karo	Ka-Ro electronics GmbH
 keymile	Keymile GmbH
 kinetic Kinetic Technologies
-- 
cgit v1.2.3


From 126b16e2ad98c46aa0f53dbf62f71c09ba6b1d99 Mon Sep 17 00:00:00 2001
From: Mark Rutland <mark.rutland@arm.com>
Date: Thu, 23 Jul 2015 17:52:43 +0100
Subject: Docs: dt: add generic MSI bindings

Currently msi-parent is used in a couple of drivers despite being fairly
underspecified. This patch adds a generic binding for MSIs (including
the existing msi-parent property) enabling the description of platform
devices capable of using MSIs.

While MSIs are primarily distinguished by doorbell and payload, some MSI
controllers (e.g. the GICv3 ITS) also use side-band information
accompanying the write to identify the master which originated the MSI,
to allow for sandboxing. This sideband information is non-probeable and
needs to be described in the DT. Other MSI controllers may have
additional configuration details which need to be described per-master.

This patch adds a generic msi-parent binding document, extending the
de-facto standard with a new (optional) #msi-cells which can be used to
express any per-master configuration and/or sideband data. This is
sufficient to describe non-hotpluggable devices.

For busses where sideband data may be derived from some bus-specific
master ID scheme, other properties will be required to describe the
mapping.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Rob Herring <robh@kernel.org>
---
 .../bindings/interrupt-controller/msi.txt          | 135 +++++++++++++++++++++
 1 file changed, 135 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/interrupt-controller/msi.txt

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/interrupt-controller/msi.txt b/Documentation/devicetree/bindings/interrupt-controller/msi.txt
new file mode 100644
index 000000000000..c60c034dcf19
--- /dev/null
+++ b/Documentation/devicetree/bindings/interrupt-controller/msi.txt
@@ -0,0 +1,135 @@
+This document describes the generic device tree binding for MSI controllers and
+their master(s).
+
+Message Signaled Interrupts (MSIs) are a class of interrupts generated by a
+write to an MMIO address.
+
+MSIs were originally specified by PCI (and are used with PCIe), but may also be
+used with other busses, and hence a mechanism is required to relate devices on
+those busses to the MSI controllers which they are capable of using,
+potentially including additional information.
+
+MSIs are distinguished by some combination of:
+
+- The doorbell (the MMIO address written to).
+  
+  Devices may be configured by software to write to arbitrary doorbells which
+  they can address. An MSI controller may feature a number of doorbells.
+
+- The payload (the value written to the doorbell).
+  
+  Devices may be configured to write an arbitrary payload chosen by software.
+  MSI controllers may have restrictions on permitted payloads.
+
+- Sideband information accompanying the write.
+  
+  Typically this is neither configurable nor probeable, and depends on the path
+  taken through the memory system (i.e. it is a property of the combination of
+  MSI controller and device rather than a property of either in isolation).
+
+
+MSI controllers:
+================
+
+An MSI controller signals interrupts to a CPU when a write is made to an MMIO
+address by some master. An MSI controller may feature a number of doorbells.
+
+Required properties:
+--------------------
+
+- msi-controller: Identifies the node as an MSI controller.
+
+Optional properties:
+--------------------
+
+- #msi-cells: The number of cells in an msi-specifier, required if not zero.
+
+  Typically this will encode information related to sideband data, and will
+  not encode doorbells or payloads as these can be configured dynamically.
+
+  The meaning of the msi-specifier is defined by the device tree binding of
+  the specific MSI controller. 
+
+
+MSI clients
+===========
+
+MSI clients are devices which generate MSIs. For each MSI they wish to
+generate, the doorbell and payload may be configured, though sideband
+information may not be configurable.
+
+Required properties:
+--------------------
+
+- msi-parent: A list of phandle + msi-specifier pairs, one for each MSI
+  controller which the device is capable of using.
+
+  This property is unordered, and MSIs may be allocated from any combination of
+  MSI controllers listed in the msi-parent property.
+
+  If a device has restrictions on the allocation of MSIs, these restrictions
+  must be described with additional properties.
+
+  When #msi-cells is non-zero, busses with an msi-parent will require
+  additional properties to describe the relationship between devices on the bus
+  and the set of MSIs they can potentially generate.
+
+
+Example
+=======
+
+/ {
+	#address-cells = <1>;
+	#size-cells = <1>;
+
+	msi_a: msi-controller@a {
+		reg = <0xa 0xf00>;
+		compatible = "vendor-a,some-controller";
+		msi-controller;
+		/* No sideband data, so #msi-cells omitted */
+	};
+
+	msi_b: msi-controller@b {
+		reg = <0xb 0xf00>;
+		compatible = "vendor-b,another-controller";
+		msi-controller;
+		/* Each device has some unique ID */
+		#msi-cells = <1>;
+	};
+
+	msi_c: msi-controller@c {
+		reg = <0xb 0xf00>;
+		compatible = "vendor-b,another-controller";
+		msi-controller;
+		/* Each device has some unique ID */
+		#msi-cells = <1>;
+	};
+
+	dev@0 {
+		reg = <0x0 0xf00>;
+		compatible = "vendor-c,some-device";
+
+		/* Can only generate MSIs to msi_a */
+		msi-parent = <&msi_a>;
+	};
+
+	dev@1 {
+		reg = <0x1 0xf00>;
+		compatible = "vendor-c,some-device";
+
+		/* 
+		 * Can generate MSIs to either A or B.
+		 */
+		msi-parent = <&msi_a>, <&msi_b 0x17>;
+	};
+
+	dev@2 {
+		reg = <0x2 0xf00>;
+		compatible = "vendor-c,some-device";
+		/*
+		 * Has different IDs at each MSI controller.
+		 * Can generate MSIs to all of the MSI controllers.
+		 */
+		msi-parent = <&msi_a>, <&msi_b 0x17>, <&msi_c 0x53>;
+	};
+};
-- 
cgit v1.2.3


From 43e122b014c955a33220fabbd09c4b5e4f422c3c Mon Sep 17 00:00:00 2001
From: Eric Dumazet <edumazet@google.com>
Date: Fri, 21 Aug 2015 17:38:02 -0700
Subject: tcp: refine pacing rate determination

When TCP pacing was added back in linux-3.12, we chose
to apply a fixed ratio of 200 % against current rate,
to allow probing for optimal throughput even during
slow start phase, where cwnd can be doubled every other gRTT.

At Google, we found it was better applying a different ratio
while in Congestion Avoidance phase.
This ratio was set to 120 %.

We've used the normal tcp_in_slow_start() helper for a while,
then tuned the condition to select the conservative ratio
as soon as cwnd >= ssthresh/2 :

- After cwnd reduction, it is safer to ramp up more slowly,
  as we approach optimal cwnd.
- Initial ramp up (ssthresh == INFINITY) still allows doubling
  cwnd every other RTT.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 Documentation/networking/ip-sysctl.txt | 15 +++++++++++++++
 include/net/tcp.h                      |  2 ++
 net/ipv4/sysctl_net_ipv4.c             | 19 +++++++++++++++++++
 net/ipv4/tcp_input.c                   | 18 +++++++++++++++++-
 4 files changed, 53 insertions(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 46e88ed7f41d..ac77a13d2ea2 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -586,6 +586,21 @@ tcp_min_tso_segs - INTEGER
 	if available window is too small.
 	Default: 2
 
+tcp_pacing_ss_ratio - INTEGER
+	sk->sk_pacing_rate is set by TCP stack using a ratio applied
+	to current rate. (current_rate = cwnd * mss / srtt)
+	If TCP is in slow start, tcp_pacing_ss_ratio is applied
+	to let TCP probe for bigger speeds, assuming cwnd can be
+	doubled every other RTT.
+	Default: 200
+
+tcp_pacing_ca_ratio - INTEGER
+	sk->sk_pacing_rate is set by TCP stack using a ratio applied
+	to current rate. (current_rate = cwnd * mss / srtt)
+	If TCP is in congestion avoidance phase, tcp_pacing_ca_ratio
+	is applied to conservatively probe for bigger throughput.
+	Default: 120
+
 tcp_tso_win_divisor - INTEGER
 	This allows control over what percentage of the congestion window
 	can be consumed by a single TSO frame.
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 309801f7eb82..4a7b03947a38 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -281,6 +281,8 @@ extern unsigned int sysctl_tcp_notsent_lowat;
 extern int sysctl_tcp_min_tso_segs;
 extern int sysctl_tcp_autocorking;
 extern int sysctl_tcp_invalid_ratelimit;
+extern int sysctl_tcp_pacing_ss_ratio;
+extern int sysctl_tcp_pacing_ca_ratio;
 
 extern atomic_long_t tcp_memory_allocated;
 extern struct percpu_counter tcp_sockets_allocated;
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 0330ab2e2b63..879bdc5c95b1 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -29,6 +29,7 @@
 static int zero;
 static int one = 1;
 static int four = 4;
+static int thousand = 1000;
 static int gso_max_segs = GSO_MAX_SEGS;
 static int tcp_retr1_max = 255;
 static int ip_local_port_range_min[] = { 1, 1 };
@@ -711,6 +712,24 @@ static struct ctl_table ipv4_table[] = {
 		.extra1		= &one,
 		.extra2		= &gso_max_segs,
 	},
+	{
+		.procname	= "tcp_pacing_ss_ratio",
+		.data		= &sysctl_tcp_pacing_ss_ratio,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &zero,
+		.extra2		= &thousand,
+	},
+	{
+		.procname	= "tcp_pacing_ca_ratio",
+		.data		= &sysctl_tcp_pacing_ca_ratio,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &zero,
+		.extra2		= &thousand,
+	},
 	{
 		.procname	= "tcp_autocorking",
 		.data		= &sysctl_tcp_autocorking,
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 0abca2841de2..dc08e2352665 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -753,13 +753,29 @@ static void tcp_rtt_estimator(struct sock *sk, long mrtt_us)
  * TCP pacing, to smooth the burst on large writes when packets
  * in flight is significantly lower than cwnd (or rwin)
  */
+int sysctl_tcp_pacing_ss_ratio __read_mostly = 200;
+int sysctl_tcp_pacing_ca_ratio __read_mostly = 120;
+
 static void tcp_update_pacing_rate(struct sock *sk)
 {
 	const struct tcp_sock *tp = tcp_sk(sk);
 	u64 rate;
 
 	/* set sk_pacing_rate to 200 % of current rate (mss * cwnd / srtt) */
-	rate = (u64)tp->mss_cache * 2 * (USEC_PER_SEC << 3);
+	rate = (u64)tp->mss_cache * ((USEC_PER_SEC / 100) << 3);
+
+	/* current rate is (cwnd * mss) / srtt
+	 * In Slow Start [1], set sk_pacing_rate to 200 % the current rate.
+	 * In Congestion Avoidance phase, set it to 120 % the current rate.
+	 *
+	 * [1] : Normal Slow Start condition is (tp->snd_cwnd < tp->snd_ssthresh)
+	 *	 If snd_cwnd >= (tp->snd_ssthresh / 2), we are approaching
+	 *	 end of slow start and should slow down.
+	 */
+	if (tp->snd_cwnd < tp->snd_ssthresh / 2)
+		rate *= sysctl_tcp_pacing_ss_ratio;
+	else
+		rate *= sysctl_tcp_pacing_ca_ratio;
 
 	rate *= max(tp->snd_cwnd, tp->packets_out);
 
-- 
cgit v1.2.3


From bbf58bf3488e41f346536aa89d62bdf2fe771128 Mon Sep 17 00:00:00 2001
From: Trond Myklebust <trond.myklebust@primarydata.com>
Date: Mon, 24 Aug 2015 20:39:18 -0400
Subject: NFSv4.2/pnfs: Make the layoutstats timer configurable

Allow advanced users to set the layoutstats timer in order to lengthen
or shorten the period between layoutstat transmissions to the server.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
---
 Documentation/kernel-parameters.txt    | 9 +++++++++
 fs/nfs/flexfilelayout/flexfilelayout.c | 5 ++++-
 fs/nfs/pnfs.c                          | 4 ++++
 fs/nfs/pnfs.h                          | 3 +++
 4 files changed, 20 insertions(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 1d6f0459cd7b..30d78b561574 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2279,6 +2279,15 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			The default parameter value of '0' causes the kernel
 			not to attempt recovery of lost locks.
 
+	nfs4.layoutstats_timer =
+			[NFSv4.2] Change the rate at which the kernel sends
+			layoutstats to the pNFS metadata server.
+
+			Setting this to value to 0 causes the kernel to use
+			whatever value is the default set by the layout
+			driver. A non-zero value sets the minimum interval
+			in seconds between layoutstats transmissions.
+
 	nfsd.nfs4_disable_idmapping=
 			[NFSv4] When set to the default of '1', the NFSv4
 			server will return only numeric uids and gids to
diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c
index 0fbf37de2a41..9f6fb8876b3f 100644
--- a/fs/nfs/flexfilelayout/flexfilelayout.c
+++ b/fs/nfs/flexfilelayout/flexfilelayout.c
@@ -533,14 +533,17 @@ nfs4_ff_layoutstat_start_io(struct nfs4_ff_layout_mirror *mirror,
 			    ktime_t now)
 {
 	static const ktime_t notime = {0};
+	s64 report_interval = FF_LAYOUTSTATS_REPORT_INTERVAL;
 
 	nfs4_ff_start_busy_timer(&layoutstat->busy_timer, now);
 	if (ktime_equal(mirror->start_time, notime))
 		mirror->start_time = now;
 	if (ktime_equal(mirror->last_report_time, notime))
 		mirror->last_report_time = now;
+	if (layoutstats_timer != 0)
+		report_interval = (s64)layoutstats_timer * 1000LL;
 	if (ktime_to_ms(ktime_sub(now, mirror->last_report_time)) >=
-			FF_LAYOUTSTATS_REPORT_INTERVAL) {
+			report_interval) {
 		mirror->last_report_time = now;
 		return true;
 	}
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 247c5a5d2d6b..3530bb703214 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -2285,3 +2285,7 @@ out_put:
 }
 EXPORT_SYMBOL_GPL(pnfs_report_layoutstat);
 #endif
+
+unsigned int layoutstats_timer;
+module_param(layoutstats_timer, uint, 0644);
+EXPORT_SYMBOL_GPL(layoutstats_timer);
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 02c27f93caf1..d3979dd1037a 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -528,12 +528,15 @@ pnfs_use_threshold(struct nfs4_threshold **dst, struct nfs4_threshold *src,
 					nfss->pnfs_curr_ld->id == src->l_type);
 }
 
+extern unsigned int layoutstats_timer;
+
 #ifdef NFS_DEBUG
 void nfs4_print_deviceid(const struct nfs4_deviceid *dev_id);
 #else
 static inline void nfs4_print_deviceid(const struct nfs4_deviceid *dev_id)
 {
 }
+
 #endif /* NFS_DEBUG */
 #else  /* CONFIG_NFS_V4_1 */
 
-- 
cgit v1.2.3


From 77760e94928f910b745ab8d00298a7c8b5786fb3 Mon Sep 17 00:00:00 2001
From: Florian Fainelli <f.fainelli@gmail.com>
Date: Tue, 25 Aug 2015 15:33:13 -0700
Subject: Documentation: networking: add a DSA document

Describe how the DSA subsystem works, its design principles,
limitations, and describe in details how to implement a DSA switch
driver.

Acked-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 Documentation/networking/dsa/dsa.txt | 615 +++++++++++++++++++++++++++++++++++
 1 file changed, 615 insertions(+)
 create mode 100644 Documentation/networking/dsa/dsa.txt

(limited to 'Documentation')

diff --git a/Documentation/networking/dsa/dsa.txt b/Documentation/networking/dsa/dsa.txt
new file mode 100644
index 000000000000..aa9c1f9313cd
--- /dev/null
+++ b/Documentation/networking/dsa/dsa.txt
@@ -0,0 +1,615 @@
+Distributed Switch Architecture
+===============================
+
+Introduction
+============
+
+This document describes the Distributed Switch Architecture (DSA) subsystem
+design principles, limitations, interactions with other subsystems, and how to
+develop drivers for this subsystem as well as a TODO for developers interested
+in joining the effort.
+
+Design principles
+=================
+
+The Distributed Switch Architecture is a subsystem which was primarily designed
+to support Marvell Ethernet switches (MV88E6xxx, a.k.a Linkstreet product line)
+using Linux, but has since evolved to support other vendors as well.
+
+The original philosophy behind this design was to be able to use unmodified
+Linux tools such as bridge, iproute2, ifconfig to work transparently whether
+they configured/queried a switch port network device or a regular network
+device.
+
+An Ethernet switch is typically comprised of multiple front-panel ports, and one
+or more CPU or management port. The DSA subsystem currently relies on the
+presence of a management port connected to an Ethernet controller capable of
+receiving Ethernet frames from the switch. This is a very common setup for all
+kinds of Ethernet switches found in Small Home and Office products: routers,
+gateways, or even top-of-the rack switches. This host Ethernet controller will
+be later referred to as "master" and "cpu" in DSA terminology and code.
+
+The D in DSA stands for Distributed, because the subsystem has been designed
+with the ability to configure and manage cascaded switches on top of each other
+using upstream and downstream Ethernet links between switches. These specific
+ports are referred to as "dsa" ports in DSA terminology and code. A collection
+of multiple switches connected to each other is called a "switch tree".
+
+For each front-panel port, DSA will create specialized network devices which are
+used as controlling and data-flowing endpoints for use by the Linux networking
+stack. These specialized network interfaces are referred to as "slave" network
+interfaces in DSA terminology and code.
+
+The ideal case for using DSA is when an Ethernet switch supports a "switch tag"
+which is a hardware feature making the switch insert a specific tag for each
+Ethernet frames it received to/from specific ports to help the management
+interface figure out:
+
+- what port is this frame coming from
+- what was the reason why this frame got forwarded
+- how to send CPU originated traffic to specific ports
+
+The subsystem does support switches not capable of inserting/stripping tags, but
+the features might be slightly limited in that case (traffic separation relies
+on Port-based VLAN IDs).
+
+Note that DSA does not currently create network interfaces for the "cpu" and
+"dsa" ports because:
+
+- the "cpu" port is the Ethernet switch facing side of the management
+  controller, and as such, would create a duplication of feature, since you
+  would get two interfaces for the same conduit: master netdev, and "cpu" netdev
+
+- the "dsa" port(s) are just conduits between two or more switches, and as such
+  cannot really be used as proper network interfaces either, only the
+  downstream, or the top-most upstream interface makes sense with that model
+
+Switch tagging protocols
+------------------------
+
+DSA currently supports 4 different tagging protocols, and a tag-less mode as
+well. The different protocols are implemented in:
+
+net/dsa/tag_trailer.c: Marvell's 4 trailer tag mode (legacy)
+net/dsa/tag_dsa.c: Marvell's original DSA tag
+net/dsa/tag_edsa.c: Marvell's enhanced DSA tag
+net/dsa/tag_brcm.c: Broadcom's 4 bytes tag
+
+The exact format of the tag protocol is vendor specific, but in general, they
+all contain something which:
+
+- identifies which port the Ethernet frame came from/should be sent to
+- provides a reason why this frame was forwarded to the management interface
+
+Master network devices
+----------------------
+
+Master network devices are regular, unmodified Linux network device drivers for
+the CPU/management Ethernet interface. Such a driver might occasionally need to
+know whether DSA is enabled (e.g.: to enable/disable specific offload features),
+but the DSA subsystem has been proven to work with industry standard drivers:
+e1000e, mv643xx_eth etc. without having to introduce modifications to these
+drivers. Such network devices are also often referred to as conduit network
+devices since they act as a pipe between the host processor and the hardware
+Ethernet switch.
+
+Networking stack hooks
+----------------------
+
+When a master netdev is used with DSA, a small hook is placed in in the
+networking stack is in order to have the DSA subsystem process the Ethernet
+switch specific tagging protocol. DSA accomplishes this by registering a
+specific (and fake) Ethernet type (later becoming skb->protocol) with the
+networking stack, this is also known as a ptype or packet_type. A typical
+Ethernet Frame receive sequence looks like this:
+
+Master network device (e.g.: e1000e):
+
+Receive interrupt fires:
+- receive function is invoked
+- basic packet processing is done: getting length, status etc.
+- packet is prepared to be processed by the Ethernet layer by calling
+  eth_type_trans
+
+net/ethernet/eth.c:
+
+eth_type_trans(skb, dev)
+	if (dev->dsa_ptr != NULL)
+		-> skb->protocol = ETH_P_XDSA
+
+drivers/net/ethernet/*:
+
+netif_receive_skb(skb)
+	-> iterate over registered packet_type
+		-> invoke handler for ETH_P_XDSA, calls dsa_switch_rcv()
+
+net/dsa/dsa.c:
+	-> dsa_switch_rcv()
+		-> invoke switch tag specific protocol handler in
+		   net/dsa/tag_*.c
+
+net/dsa/tag_*.c:
+	-> inspect and strip switch tag protocol to determine originating port
+	-> locate per-port network device
+	-> invoke eth_type_trans() with the DSA slave network device
+	-> invoked netif_receive_skb()
+
+Past this point, the DSA slave network devices get delivered regular Ethernet
+frames that can be processed by the networking stack.
+
+Slave network devices
+---------------------
+
+Slave network devices created by DSA are stacked on top of their master network
+device, each of these network interfaces will be responsible for being a
+controlling and data-flowing end-point for each front-panel port of the switch.
+These interfaces are specialized in order to:
+
+- insert/remove the switch tag protocol (if it exists) when sending traffic
+  to/from specific switch ports
+- query the switch for ethtool operations: statistics, link state,
+  Wake-on-LAN, register dumps...
+- external/internal PHY management: link, auto-negotiation etc.
+
+These slave network devices have custom net_device_ops and ethtool_ops function
+pointers which allow DSA to introduce a level of layering between the networking
+stack/ethtool, and the switch driver implementation.
+
+Upon frame transmission from these slave network devices, DSA will look up which
+switch tagging protocol is currently registered with these network devices, and
+invoke a specific transmit routine which takes care of adding the relevant
+switch tag in the Ethernet frames.
+
+These frames are then queued for transmission using the master network device
+ndo_start_xmit() function, since they contain the appropriate switch tag, the
+Ethernet switch will be able to process these incoming frames from the
+management interface and delivers these frames to the physical switch port.
+
+Graphical representation
+------------------------
+
+Summarized, this is basically how DSA looks like from a network device
+perspective:
+
+
+			|---------------------------
+			| CPU network device (eth0)|
+			----------------------------
+			| <tag added by switch     |
+			|                          |
+			|                          |
+			|        tag added by CPU> |
+		|--------------------------------------------|
+		| Switch driver				     |
+		|--------------------------------------------|
+                    ||        ||         ||
+		|-------|  |-------|  |-------|
+		| sw0p0 |  | sw0p1 |  | sw0p2 |
+		|-------|  |-------|  |-------|
+
+Slave MDIO bus
+--------------
+
+In order to be able to read to/from a switch PHY built into it, DSA creates a
+slave MDIO bus which allows a specific switch driver to divert and intercept
+MDIO reads/writes towards specific PHY addresses. In most MDIO-connected
+switches, these functions would utilize direct or indirect PHY addressing mode
+to return standard MII registers from the switch builtin PHYs, allowing the PHY
+library and/or to return link status, link partner pages, auto-negotiation
+results etc..
+
+For Ethernet switches which have both external and internal MDIO busses, the
+slave MII bus can be utilized to mux/demux MDIO reads and writes towards either
+internal or external MDIO devices this switch might be connected to: internal
+PHYs, external PHYs, or even external switches.
+
+Data structures
+---------------
+
+DSA data structures are defined in include/net/dsa.h as well as
+net/dsa/dsa_priv.h.
+
+dsa_chip_data: platform data configuration for a given switch device, this
+structure describes a switch device's parent device, its address, as well as
+various properties of its ports: names/labels, and finally a routing table
+indication (when cascading switches)
+
+dsa_platform_data: platform device configuration data which can reference a
+collection of dsa_chip_data structure if multiples switches are cascaded, the
+master network device this switch tree is attached to needs to be referenced
+
+dsa_switch_tree: structure assigned to the master network device under
+"dsa_ptr", this structure references a dsa_platform_data structure as well as
+the tagging protocol supported by the switch tree, and which receive/transmit
+function hooks should be invoked, information about the directly attached switch
+is also provided: CPU port. Finally, a collection of dsa_switch are referenced
+to address individual switches in the tree.
+
+dsa_switch: structure describing a switch device in the tree, referencing a
+dsa_switch_tree as a backpointer, slave network devices, master network device,
+and a reference to the backing dsa_switch_driver
+
+dsa_switch_driver: structure referencing function pointers, see below for a full
+description.
+
+Design limitations
+==================
+
+DSA is a platform device driver
+-------------------------------
+
+DSA is implemented as a DSA platform device driver which is convenient because
+it will register the entire DSA switch tree attached to a master network device
+in one-shot, facilitating the device creation and simplifying the device driver
+model a bit, this comes however with a number of limitations:
+
+- building DSA and its switch drivers as modules is currently not working
+- the device driver parenting does not necessarily reflect the original
+  bus/device the switch can be created from
+- supporting non-MDIO and non-MMIO (platform) switches is not possible
+
+Limits on the number of devices and ports
+-----------------------------------------
+
+DSA currently limits the number of maximum switches within a tree to 4
+(DSA_MAX_SWITCHES), and the number of ports per switch to 12 (DSA_MAX_PORTS).
+These limits could be extended to support larger configurations would this need
+arise.
+
+Lack of CPU/DSA network devices
+-------------------------------
+
+DSA does not currently create slave network devices for the CPU or DSA ports, as
+described before. This might be an issue in the following cases:
+
+- inability to fetch switch CPU port statistics counters using ethtool, which
+  can make it harder to debug MDIO switch connected using xMII interfaces
+
+- inability to configure the CPU port link parameters based on the Ethernet
+  controller capabilities attached to it: http://patchwork.ozlabs.org/patch/509806/
+
+- inability to configure specific VLAN IDs / trunking VLANs between switches
+  when using a cascaded setup
+
+Common pitfalls using DSA setups
+--------------------------------
+
+Once a master network device is configured to use DSA (dev->dsa_ptr becomes
+non-NULL), and the switch behind it expects a tagging protocol, this network
+interface can only exclusively be used as a conduit interface. Sending packets
+directly through this interface (e.g.: opening a socket using this interface)
+will not make us go through the switch tagging protocol transmit function, so
+the Ethernet switch on the other end, expecting a tag will typically drop this
+frame.
+
+Slave network devices check that the master network device is UP before allowing
+you to administratively bring UP these slave network devices. A common
+configuration mistake is forgetting to bring UP the master network device first.
+
+Interactions with other subsystems
+==================================
+
+DSA currently leverages the following subsystems:
+
+- MDIO/PHY library: drivers/net/phy/phy.c, mdio_bus.c
+- Switchdev: net/switchdev/*
+- Device Tree for various of_* functions
+- HWMON: drivers/hwmon/*
+
+MDIO/PHY library
+----------------
+
+Slave network devices exposed by DSA may or may not be interfacing with PHY
+devices (struct phy_device as defined in include/linux/phy.h), but the DSA
+subsystem deals with all possible combinations:
+
+- internal PHY devices, built into the Ethernet switch hardware
+- external PHY devices, connected via an internal or external MDIO bus
+- internal PHY devices, connected via an internal MDIO bus
+- special, non-autonegotiated or non MDIO-managed PHY devices: SFPs, MoCA; a.k.a
+  fixed PHYs
+
+The PHY configuration is done by the dsa_slave_phy_setup() function and the
+logic basically looks like this:
+
+- if Device Tree is used, the PHY device is looked up using the standard
+  "phy-handle" property, if found, this PHY device is created and registered
+  using of_phy_connect()
+
+- if Device Tree is used, and the PHY device is "fixed", that is, conforms to
+  the definition of a non-MDIO managed PHY as defined in
+  Documentation/devicetree/bindings/net/fixed-link.txt, the PHY is registered
+  and connected transparently using the special fixed MDIO bus driver
+
+- finally, if the PHY is built into the switch, as is very common with
+  standalone switch packages, the PHY is probed using the slave MII bus created
+  by DSA
+
+
+SWITCHDEV
+---------
+
+DSA directly utilizes SWITCHDEV when interfacing with the bridge layer, and
+more specifically with its VLAN filtering portion when configuring VLANs on top
+of per-port slave network devices. Since DSA primarily deals with
+MDIO-connected switches, although not exclusively, SWITCHDEV's
+prepare/abort/commit phases are often simplified into a prepare phase which
+checks whether the operation is supporte by the DSA switch driver, and a commit
+phase which applies the changes.
+
+As of today, the only SWITCHDEV objects supported by DSA are the FDB and VLAN
+objects.
+
+Device Tree
+-----------
+
+DSA features a standardized binding which is documented in
+Documentation/devicetree/bindings/net/dsa/dsa.txt. PHY/MDIO library helper
+functions such as of_get_phy_mode(), of_phy_connect() are also used to query
+per-port PHY specific details: interface connection, MDIO bus location etc..
+
+HWMON
+-----
+
+Some switch drivers feature internal temperature sensors which are exposed as
+regular HWMON devices in /sys/class/hwmon/.
+
+Driver development
+==================
+
+DSA switch drivers need to implement a dsa_switch_driver structure which will
+contain the various members described below.
+
+register_switch_driver() registers this dsa_switch_driver in its internal list
+of drivers to probe for. unregister_switch_driver() does the exact opposite.
+
+Unless requested differently by setting the priv_size member accordingly, DSA
+does not allocate any driver private context space.
+
+Switch configuration
+--------------------
+
+- priv_size: additional size needed by the switch driver for its private context
+
+- tag_protocol: this is to indicate what kind of tagging protocol is supported,
+  should be a valid value from the dsa_tag_protocol enum
+
+- probe: probe routine which will be invoked by the DSA platform device upon
+  registration to test for the presence/absence of a switch device. For MDIO
+  devices, it is recommended to issue a read towards internal registers using
+  the switch pseudo-PHY and return whether this is a supported device. For other
+  buses, return a non-NULL string
+
+- setup: setup function for the switch, this function is responsible for setting
+  up the dsa_switch_driver private structure with all it needs: register maps,
+  interrupts, mutexes, locks etc.. This function is also expected to properly
+  configure the switch to separate all network interfaces from each other, that
+  is, they should be isolated by the switch hardware itself, typically by creating
+  a Port-based VLAN ID for each port and allowing only the CPU port and the
+  specific port to be in the forwarding vector. Ports that are unused by the
+  platform should be disabled. Past this function, the switch is expected to be
+  fully configured and ready to serve any kind of request. It is recommended
+  to issue a software reset of the switch during this setup function in order to
+  avoid relying on what a previous software agent such as a bootloader/firmware
+  may have previously configured.
+
+- set_addr: Some switches require the programming of the management interface's
+  Ethernet MAC address, switch drivers can also disable ageing of MAC addresses
+  on the management interface and "hardcode"/"force" this MAC address for the
+  CPU/management interface as an optimization
+
+PHY devices and link management
+-------------------------------
+
+- get_phy_flags: Some switches are interfaced to various kinds of Ethernet PHYs,
+  if the PHY library PHY driver needs to know about information it cannot obtain
+  on its own (e.g.: coming from switch memory mapped registers), this function
+  should return a 32-bits bitmask of "flags", that is private between the switch
+  driver and the Ethernet PHY driver in drivers/net/phy/*.
+
+- phy_read: Function invoked by the DSA slave MDIO bus when attempting to read
+  the switch port MDIO registers. If unavailable, return 0xffff for each read.
+  For builtin switch Ethernet PHYs, this function should allow reading the link
+  status, auto-negotiation results, link partner pages etc..
+
+- phy_write: Function invoked by the DSA slave MDIO bus when attempting to write
+  to the switch port MDIO registers. If unavailable return a negative error
+  code.
+
+- poll_link: Function invoked by DSA to query the link state of the switch
+  builtin Ethernet PHYs, per port. This function is responsible for calling
+  netif_carrier_{on,off} when appropriate, and can be used to poll all ports in a
+  single call. Executes from workqueue context.
+
+- adjust_link: Function invoked by the PHY library when a slave network device
+  is attached to a PHY device. This function is responsible for appropriately
+  configuring the switch port link parameters: speed, duplex, pause based on
+  what the phy_device is providing.
+
+- fixed_link_update: Function invoked by the PHY library, and specifically by
+  the fixed PHY driver asking the switch driver for link parameters that could
+  not be auto-negotiated, or obtained by reading the PHY registers through MDIO.
+  This is particularly useful for specific kinds of hardware such as QSGMII,
+  MoCA or other kinds of non-MDIO managed PHYs where out of band link
+  information is obtained
+
+Ethtool operations
+------------------
+
+- get_strings: ethtool function used to query the driver's strings, will
+  typically return statistics strings, private flags strings etc.
+
+- get_ethtool_stats: ethtool function used to query per-port statistics and
+  return their values. DSA overlays slave network devices general statistics:
+  RX/TX counters from the network device, with switch driver specific statistics
+  per port
+
+- get_sset_count: ethtool function used to query the number of statistics items
+
+- get_wol: ethtool function used to obtain Wake-on-LAN settings per-port, this
+  function may, for certain implementations also query the master network device
+  Wake-on-LAN settings if this interface needs to participate in Wake-on-LAN
+
+- set_wol: ethtool function used to configure Wake-on-LAN settings per-port,
+  direct counterpart to set_wol with similar restrictions
+
+- set_eee: ethtool function which is used to configure a switch port EEE (Green
+  Ethernet) settings, can optionally invoke the PHY library to enable EEE at the
+  PHY level if relevant. This function should enable EEE at the switch port MAC
+  controller and data-processing logic
+
+- get_eee: ethtool function which is used to query a switch port EEE settings,
+  this function should return the EEE state of the switch port MAC controller
+  and data-processing logic as well as query the PHY for its currently configured
+  EEE settings
+
+- get_eeprom_len: ethtool function returning for a given switch the EEPROM
+  length/size in bytes
+
+- get_eeprom: ethtool function returning for a given switch the EEPROM contents
+
+- set_eeprom: ethtool function writing specified data to a given switch EEPROM
+
+- get_regs_len: ethtool function returning the register length for a given
+  switch
+
+- get_regs: ethtool function returning the Ethernet switch internal register
+  contents. This function might require user-land code in ethtool to
+  pretty-print register values and registers
+
+Power management
+----------------
+
+- suspend: function invoked by the DSA platform device when the system goes to
+  suspend, should quiesce all Ethernet switch activities, but keep ports
+  participating in Wake-on-LAN active as well as additional wake-up logic if
+  supported
+
+- resume: function invoked by the DSA platform device when the system resumes,
+  should resume all Ethernet switch activities and re-configure the switch to be
+  in a fully active state
+
+- port_enable: function invoked by the DSA slave network device ndo_open
+  function when a port is administratively brought up, this function should be
+  fully enabling a given switch port. DSA takes care of marking the port with
+  BR_STATE_BLOCKING if the port is a bridge member, or BR_STATE_FORWARDING if it
+  was not, and propagating these changes down to the hardware
+
+- port_disable: function invoked by the DSA slave network device ndo_close
+  function when a port is administratively brought down, this function should be
+  fully disabling a given switch port. DSA takes care of marking the port with
+  BR_STATE_DISABLED and propagating changes to the hardware if this port is
+  disabled while being a bridge member
+
+Hardware monitoring
+-------------------
+
+These callbacks are only available if CONFIG_NET_DSA_HWMON is enabled:
+
+- get_temp: this function queries the given switch for its temperature
+
+- get_temp_limit: this function returns the switch current maximum temperature
+  limit
+
+- set_temp_limit: this function configures the maximum temperature limit allowed
+
+- get_temp_alarm: this function returns the critical temperature threshold
+  returning an alarm notification
+
+See Documentation/hwmon/sysfs-interface for details.
+
+Bridge layer
+------------
+
+- port_join_bridge: bridge layer function invoked when a given switch port is
+  added to a bridge, this function should be doing the necessary at the switch
+  level to permit the joining port from being added to the relevant logical
+  domain for it to ingress/egress traffic with other members of the bridge. DSA
+  does nothing but calculate a bitmask of switch ports currently members of the
+  specified bridge being requested the join
+
+- port_leave_bridge: bridge layer function invoked when a given switch port is
+  removed from a bridge, this function should be doing the necessary at the
+  switch level to deny the leaving port from ingress/egress traffic from the
+  remaining bridge members. When the port leaves the bridge, it should be aged
+  out at the switch hardware for the switch to (re) learn MAC addresses behind
+  this port. DSA calculates the bitmask of ports still members of the bridge
+  being left
+
+- port_stp_update: bridge layer function invoked when a given switch port STP
+  state is computed by the bridge layer and should be propagated to switch
+  hardware to forward/block/learn traffic. The switch driver is responsible for
+  computing a STP state change based on current and asked parameters and perform
+  the relevant ageing based on the intersection results
+
+Bridge VLAN filtering
+---------------------
+
+- port_pvid_get: bridge layer function invoked when a Port-based VLAN ID is
+  queried for the given switch port
+
+- port_pvid_set: bridge layer function invoked when a Port-based VLAN ID needs
+  to be configured on the given switch port
+
+- port_vlan_add: bridge layer function invoked when a VLAN is configured
+  (tagged or untagged) for the given switch port
+
+- port_vlan_del: bridge layer function invoked when a VLAN is removed from the
+  given switch port
+
+- vlan_getnext: bridge layer function invoked to query the next configured VLAN
+  in the switch, i.e. returns the bitmaps of members and untagged ports
+
+- port_fdb_add: bridge layer function invoked when the bridge wants to install a
+  Forwarding Database entry, the switch hardware should be programmed with the
+  specified address in the specified VLAN Id in the forwarding database
+  associated with this VLAN ID
+
+Note: VLAN ID 0 corresponds to the port private database, which, in the context
+of DSA, would be the its port-based VLAN, used by the associated bridge device.
+
+- port_fdb_del: bridge layer function invoked when the bridge wants to remove a
+  Forwarding Database entry, the switch hardware should be programmed to delete
+  the specified MAC address from the specified VLAN ID if it was mapped into
+  this port forwarding database
+
+TODO
+====
+
+The platform device problem
+---------------------------
+DSA is currently implemented as a platform device driver which is far from ideal
+as was discussed in this thread:
+
+http://permalink.gmane.org/gmane.linux.network/329848
+
+This basically prevents the device driver model to be properly used and applied,
+and support non-MDIO, non-MMIO Ethernet connected switches.
+
+Another problem with the platform device driver approach is that it prevents the
+use of a modular switch drivers build due to a circular dependency, illustrated
+here:
+
+http://comments.gmane.org/gmane.linux.network/345803
+
+Attempts of reworking this has been done here:
+
+https://lwn.net/Articles/643149/
+
+Making SWITCHDEV and DSA converge towards an unified codebase
+-------------------------------------------------------------
+
+SWITCHDEV properly takes care of abstracting the networking stack with offload
+capable hardware, but does not enforce a strict switch device driver model. On
+the other DSA enforces a fairly strict device driver model, and deals with most
+of the switch specific. At some point we should envision a merger between these
+two subsystems and get the best of both worlds.
+
+Other hanging fruits
+--------------------
+
+- making the number of ports fully dynamic and not dependent on DSA_MAX_PORTS
+- allowing more than one CPU/management interface:
+  http://comments.gmane.org/gmane.linux.network/365657
+- porting more drivers from other vendors:
+  http://comments.gmane.org/gmane.linux.network/365510
-- 
cgit v1.2.3


From ef6346386b096549972d5b62f773eafb772682e3 Mon Sep 17 00:00:00 2001
From: Florian Fainelli <f.fainelli@gmail.com>
Date: Tue, 25 Aug 2015 15:33:14 -0700
Subject: Documentation: networking: dsa: Add Broadcom SF2 document

Add a document describing the Broadcom Starfigther 2 switch hardware,
its specifics, and how the driver is implemented and its specifics.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 Documentation/networking/dsa/bcm_sf2.txt | 114 +++++++++++++++++++++++++++++++
 1 file changed, 114 insertions(+)
 create mode 100644 Documentation/networking/dsa/bcm_sf2.txt

(limited to 'Documentation')

diff --git a/Documentation/networking/dsa/bcm_sf2.txt b/Documentation/networking/dsa/bcm_sf2.txt
new file mode 100644
index 000000000000..d999d0c1c5b8
--- /dev/null
+++ b/Documentation/networking/dsa/bcm_sf2.txt
@@ -0,0 +1,114 @@
+Broadcom Starfighter 2 Ethernet switch driver
+=============================================
+
+Broadcom's Starfighter 2 Ethernet switch hardware block is commonly found and
+deployed in the following products:
+
+- xDSL gateways such as BCM63138
+- streaming/multimedia Set Top Box such as BCM7445
+- Cable Modem/residential gateways such as BCM7145/BCM3390
+
+The switch is typically deployed in a configuration involving between 5 to 13
+ports, offering a range of built-in and customizable interfaces:
+
+- single integrated Gigabit PHY
+- quad integrated Gigabit PHY
+- quad external Gigabit PHY w/ MDIO multiplexer
+- integrated MoCA PHY
+- several external MII/RevMII/GMII/RGMII interfaces
+
+The switch also supports specific congestion control features which allow MoCA
+fail-over not to lose packets during a MoCA role re-election, as well as out of
+band back-pressure to the host CPU network interface when downstream interfaces
+are connected at a lower speed.
+
+The switch hardware block is typically interfaced using MMIO accesses and
+contains a bunch of sub-blocks/registers:
+
+* SWITCH_CORE: common switch registers
+* SWITCH_REG: external interfaces switch register
+* SWITCH_MDIO: external MDIO bus controller (there is another one in SWITCH_CORE,
+  which is used for indirect PHY accesses)
+* SWITCH_INDIR_RW: 64-bits wide register helper block
+* SWITCH_INTRL2_0/1: Level-2 interrupt controllers
+* SWITCH_ACB: Admission control block
+* SWITCH_FCB: Fail-over control block
+
+Implementation details
+======================
+
+The driver is located in drivers/net/dsa/bcm_sf2.c and is implemented as a DSA
+driver; see Documentation/networking/dsa/dsa.txt for details on the subsytem
+and what it provides.
+
+The SF2 switch is configured to enable a Broadcom specific 4-bytes switch tag
+which gets inserted by the switch for every packet forwarded to the CPU
+interface, conversely, the CPU network interface should insert a similar tag for
+packets entering the CPU port. The tag format is described in
+net/dsa/tag_brcm.c.
+
+Overall, the SF2 driver is a fairly regular DSA driver; there are a few
+specifics covered below.
+
+Device Tree probing
+-------------------
+
+The DSA platform device driver is probed using a specific compatible string
+provided in net/dsa/dsa.c. The reason for that is because the DSA subsystem gets
+registered as a platform device driver currently. DSA will provide the needed
+device_node pointers which are then accessible by the switch driver setup
+function to setup resources such as register ranges and interrupts. This
+currently works very well because none of the of_* functions utilized by the
+driver require a struct device to be bound to a struct device_node, but things
+may change in the future.
+
+MDIO indirect accesses
+----------------------
+
+Due to a limitation in how Broadcom switches have been designed, external
+Broadcom switches connected to a SF2 require the use of the DSA slave MDIO bus
+in order to properly configure them. By default, the SF2 pseudo-PHY address, and
+an external switch pseudo-PHY address will both be snooping for incoming MDIO
+transactions, since they are at the same address (30), resulting in some kind of
+"double" programming. Using DSA, and setting ds->phys_mii_mask accordingly, we
+selectively divert reads and writes towards external Broadcom switches
+pseudo-PHY addresses. Newer revisions of the SF2 hardware have introduced a
+configurable pseudo-PHY address which circumvents the initial design limitation.
+
+Multimedia over CoAxial (MoCA) interfaces
+-----------------------------------------
+
+MoCA interfaces are fairly specific and require the use of a firmware blob which
+gets loaded onto the MoCA processor(s) for packet processing. The switch
+hardware contains logic which will assert/de-assert link states accordingly for
+the MoCA interface whenever the MoCA coaxial cable gets disconnected or the
+firmware gets reloaded. The SF2 driver relies on such events to properly set its
+MoCA interface carrier state and properly report this to the networking stack.
+
+The MoCA interfaces are supported using the PHY library's fixed PHY/emulated PHY
+device and the switch driver registers a fixed_link_update callback for such
+PHYs which reflects the link state obtained from the interrupt handler.
+
+
+Power Management
+----------------
+
+Whenever possible, the SF2 driver tries to minimize the overall switch power
+consumption by applying a combination of:
+
+- turning off internal buffers/memories
+- disabling packet processing logic
+- putting integrated PHYs in IDDQ/low-power
+- reducing the switch core clock based on the active port count
+- enabling and advertising EEE
+- turning off RGMII data processing logic when the link goes down
+
+Wake-on-LAN
+-----------
+
+Wake-on-LAN is currently implemented by utilizing the host processor Ethernet
+MAC controller wake-on logic. Whenever Wake-on-LAN is requested, an intersection
+between the user request and the supported host Ethernet interface WoL
+capabilities is done and the intersection result gets configured. During
+system-wide suspend/resume, only ports not participating in Wake-on-LAN are
+disabled.
-- 
cgit v1.2.3


From 3fffd12839273429a185d68431f117f0a3654b07 Mon Sep 17 00:00:00 2001
From: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Date: Mon, 17 Aug 2015 23:52:51 -0700
Subject: i2c: allow specifying separate wakeup interrupt in device tree

Instead of having each i2c driver individually parse device tree data in
case it or platform supports separate wakeup interrupt, and handle
enabling and disabling wakeup interrupts in their power management
routines, let's have i2c core do that for us.

Platforms wishing to specify separate wakeup interrupt for the device
should use named interrupt syntax in their DTSes:

	interrupt-parent = <&intc1>;
	interrupts = <5 0>, <6 0>;
	interrupt-names = "irq", "wakeup";

This patch is inspired by work done by Vignesh R <vigneshr@ti.com> for
pixcir_i2c_ts driver.

Note that the original code tried to preserve any existing wakeup
settings from userspace but was not quite right in that regard:
it would preserve wakeup flag set by userspace upon driver rebinding;
but it would re-arm the wakeup flag if it was disabled by userspace.

We think that resetting the flag upon re-binding the driver is proper
behavior as the driver is responsible for setting up and handling
wakeups.

Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Tested-by: Vignesh R <vigneshr@ti.com>
[wsa: updated the commit message]
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
---
 Documentation/devicetree/bindings/i2c/i2c.txt | 16 +++++++--
 drivers/i2c/i2c-core.c                        | 51 ++++++++++++++++++++++-----
 2 files changed, 56 insertions(+), 11 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/i2c/i2c.txt b/Documentation/devicetree/bindings/i2c/i2c.txt
index 1175efed4a41..8a99150ac3a7 100644
--- a/Documentation/devicetree/bindings/i2c/i2c.txt
+++ b/Documentation/devicetree/bindings/i2c/i2c.txt
@@ -12,7 +12,7 @@ Required properties
 - compatible      - name of I2C bus controller following generic names
 		    recommended practice.
 
-For other required properties e.g. to describe register sets, interrupts,
+For other required properties e.g. to describe register sets,
 clocks, etc. check the binding documentation of the specific driver.
 
 The cells properties above define that an address of children of an I2C bus
@@ -29,5 +29,17 @@ Optional properties
 These properties may not be supported by all drivers. However, if a driver
 wants to support one of the below features, it should adapt the bindings below.
 
-- clock-frequency	- frequency of bus clock in Hz
+- clock-frequency	- frequency of bus clock in Hz.
 - wakeup-source		- device can be used as a wakeup source.
+
+- interrupts		- interrupts used by the device.
+- interrupt-names	- "irq" and "wakeup" names are recognized by I2C core,
+			  other names are left to individual drivers.
+
+Binding may contain optional "interrupts" property, describing interrupts
+used by the device. I2C core will assign "irq" interrupt (or the very first
+interrupt if not using interrupt names) as primary interrupt for the slave.
+
+Also, if device is marked as a wakeup source, I2C core will set up "wakeup"
+interrupt for the device. If "wakeup" interrupt name is not present in the
+binding, then primary interrupt will be used as wakeup interrupt.
diff --git a/drivers/i2c/i2c-core.c b/drivers/i2c/i2c-core.c
index 98f6c75b1d18..5f89f1e3c2f2 100644
--- a/drivers/i2c/i2c-core.c
+++ b/drivers/i2c/i2c-core.c
@@ -48,6 +48,7 @@
 #include <linux/rwsem.h>
 #include <linux/pm_runtime.h>
 #include <linux/pm_domain.h>
+#include <linux/pm_wakeirq.h>
 #include <linux/acpi.h>
 #include <linux/jump_label.h>
 #include <asm/uaccess.h>
@@ -645,11 +646,13 @@ static int i2c_device_probe(struct device *dev)
 	if (!client->irq) {
 		int irq = -ENOENT;
 
-		if (dev->of_node)
-			irq = of_irq_get(dev->of_node, 0);
-		else if (ACPI_COMPANION(dev))
+		if (dev->of_node) {
+			irq = of_irq_get_byname(dev->of_node, "irq");
+			if (irq == -EINVAL || irq == -ENODATA)
+				irq = of_irq_get(dev->of_node, 0);
+		} else if (ACPI_COMPANION(dev)) {
 			irq = acpi_dev_gpio_irq_get(ACPI_COMPANION(dev), 0);
-
+		}
 		if (irq == -EPROBE_DEFER)
 			return irq;
 		if (irq < 0)
@@ -662,23 +665,49 @@ static int i2c_device_probe(struct device *dev)
 	if (!driver->probe || !driver->id_table)
 		return -ENODEV;
 
-	if (!device_can_wakeup(&client->dev))
-		device_init_wakeup(&client->dev,
-					client->flags & I2C_CLIENT_WAKE);
+	if (client->flags & I2C_CLIENT_WAKE) {
+		int wakeirq = -ENOENT;
+
+		if (dev->of_node) {
+			wakeirq = of_irq_get_byname(dev->of_node, "wakeup");
+			if (wakeirq == -EPROBE_DEFER)
+				return wakeirq;
+		}
+
+		device_init_wakeup(&client->dev, true);
+
+		if (wakeirq > 0 && wakeirq != client->irq)
+			status = dev_pm_set_dedicated_wake_irq(dev, wakeirq);
+		else if (client->irq > 0)
+			status = dev_pm_set_wake_irq(dev, wakeirq);
+		else
+			status = 0;
+
+		if (status)
+			dev_warn(&client->dev, "failed to set up wakeup irq");
+	}
+
 	dev_dbg(dev, "probe\n");
 
 	status = of_clk_set_defaults(dev->of_node, false);
 	if (status < 0)
-		return status;
+		goto err_clear_wakeup_irq;
 
 	status = dev_pm_domain_attach(&client->dev, true);
 	if (status != -EPROBE_DEFER) {
 		status = driver->probe(client, i2c_match_id(driver->id_table,
 					client));
 		if (status)
-			dev_pm_domain_detach(&client->dev, true);
+			goto err_detach_pm_domain;
 	}
 
+	return 0;
+
+err_detach_pm_domain:
+	dev_pm_domain_detach(&client->dev, true);
+err_clear_wakeup_irq:
+	dev_pm_clear_wake_irq(&client->dev);
+	device_init_wakeup(&client->dev, false);
 	return status;
 }
 
@@ -698,6 +727,10 @@ static int i2c_device_remove(struct device *dev)
 	}
 
 	dev_pm_domain_detach(&client->dev, true);
+
+	dev_pm_clear_wake_irq(&client->dev);
+	device_init_wakeup(&client->dev, false);
+
 	return status;
 }
 
-- 
cgit v1.2.3


From 65be2c79acc3aa0f9c0e8d4871f5a451d854465a Mon Sep 17 00:00:00 2001
From: "Matthew R. Ochs" <mrochs@linux.vnet.ibm.com>
Date: Thu, 13 Aug 2015 21:47:43 -0500
Subject: cxlflash: Superpipe support

Add superpipe supporting infrastructure to device driver for the IBM CXL
Flash adapter. This patch allows userspace applications to take advantage
of the accelerated I/O features that this adapter provides and bypass the
traditional filesystem stack.

Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
Signed-off-by: Manoj N. Kumar <manoj@linux.vnet.ibm.com>
Reviewed-by: Michael Neuling <mikey@neuling.org>
Reviewed-by: Wen Xiong <wenxiong@linux.vnet.ibm.com>
Reviewed-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
---
 Documentation/ioctl/ioctl-number.txt |    1 +
 Documentation/powerpc/cxlflash.txt   |  257 +++++
 drivers/scsi/cxlflash/Makefile       |    2 +-
 drivers/scsi/cxlflash/common.h       |   19 +
 drivers/scsi/cxlflash/lunmgt.c       |  263 +++++
 drivers/scsi/cxlflash/main.c         |   38 +-
 drivers/scsi/cxlflash/sislite.h      |    5 +-
 drivers/scsi/cxlflash/superpipe.c    | 2014 ++++++++++++++++++++++++++++++++++
 drivers/scsi/cxlflash/superpipe.h    |  132 +++
 include/uapi/scsi/Kbuild             |    1 +
 include/uapi/scsi/cxlflash_ioctl.h   |  140 +++
 11 files changed, 2868 insertions(+), 4 deletions(-)
 create mode 100644 Documentation/powerpc/cxlflash.txt
 create mode 100644 drivers/scsi/cxlflash/lunmgt.c
 create mode 100644 drivers/scsi/cxlflash/superpipe.c
 create mode 100644 drivers/scsi/cxlflash/superpipe.h
 create mode 100644 include/uapi/scsi/cxlflash_ioctl.h

(limited to 'Documentation')

diff --git a/Documentation/ioctl/ioctl-number.txt b/Documentation/ioctl/ioctl-number.txt
index 611c52267d24..9bd118d26a8a 100644
--- a/Documentation/ioctl/ioctl-number.txt
+++ b/Documentation/ioctl/ioctl-number.txt
@@ -314,6 +314,7 @@ Code  Seq#(hex)	Include File		Comments
 0xB3	00	linux/mmc/ioctl.h
 0xC0	00-0F	linux/usb/iowarrior.h
 0xCA	00-0F	uapi/misc/cxl.h
+0xCA	80-8F	uapi/scsi/cxlflash_ioctl.h
 0xCB	00-1F	CBM serial IEC bus	in development:
 					<mailto:michael.klein@puffin.lb.shuttle.de>
 0xCD	01	linux/reiserfs_fs.h
diff --git a/Documentation/powerpc/cxlflash.txt b/Documentation/powerpc/cxlflash.txt
new file mode 100644
index 000000000000..f943967f90ce
--- /dev/null
+++ b/Documentation/powerpc/cxlflash.txt
@@ -0,0 +1,257 @@
+Introduction
+============
+
+    The IBM Power architecture provides support for CAPI (Coherent
+    Accelerator Power Interface), which is available to certain PCIe slots
+    on Power 8 systems. CAPI can be thought of as a special tunneling
+    protocol through PCIe that allow PCIe adapters to look like special
+    purpose co-processors which can read or write an application's
+    memory and generate page faults. As a result, the host interface to
+    an adapter running in CAPI mode does not require the data buffers to
+    be mapped to the device's memory (IOMMU bypass) nor does it require
+    memory to be pinned.
+
+    On Linux, Coherent Accelerator (CXL) kernel services present CAPI
+    devices as a PCI device by implementing a virtual PCI host bridge.
+    This abstraction simplifies the infrastructure and programming
+    model, allowing for drivers to look similar to other native PCI
+    device drivers.
+
+    CXL provides a mechanism by which user space applications can
+    directly talk to a device (network or storage) bypassing the typical
+    kernel/device driver stack. The CXL Flash Adapter Driver enables a
+    user space application direct access to Flash storage.
+
+    The CXL Flash Adapter Driver is a kernel module that sits in the
+    SCSI stack as a low level device driver (below the SCSI disk and
+    protocol drivers) for the IBM CXL Flash Adapter. This driver is
+    responsible for the initialization of the adapter, setting up the
+    special path for user space access, and performing error recovery. It
+    communicates directly the Flash Accelerator Functional Unit (AFU)
+    as described in Documentation/powerpc/cxl.txt.
+
+    The cxlflash driver supports two, mutually exclusive, modes of
+    operation at the device (LUN) level:
+
+        - Any flash device (LUN) can be configured to be accessed as a
+          regular disk device (i.e.: /dev/sdc). This is the default mode.
+
+        - Any flash device (LUN) can be configured to be accessed from
+          user space with a special block library. This mode further
+          specifies the means of accessing the device and provides for
+          either raw access to the entire LUN (referred to as direct
+          or physical LUN access) or access to a kernel/AFU-mediated
+          partition of the LUN (referred to as virtual LUN access). The
+          segmentation of a disk device into virtual LUNs is assisted
+          by special translation services provided by the Flash AFU.
+
+Overview
+========
+
+    The Coherent Accelerator Interface Architecture (CAIA) introduces a
+    concept of a master context. A master typically has special privileges
+    granted to it by the kernel or hypervisor allowing it to perform AFU
+    wide management and control. The master may or may not be involved
+    directly in each user I/O, but at the minimum is involved in the
+    initial setup before the user application is allowed to send requests
+    directly to the AFU.
+
+    The CXL Flash Adapter Driver establishes a master context with the
+    AFU. It uses memory mapped I/O (MMIO) for this control and setup. The
+    Adapter Problem Space Memory Map looks like this:
+
+                     +-------------------------------+
+                     |    512 * 64 KB User MMIO      |
+                     |        (per context)          |
+                     |       User Accessible         |
+                     +-------------------------------+
+                     |    512 * 128 B per context    |
+                     |    Provisioning and Control   |
+                     |   Trusted Process accessible  |
+                     +-------------------------------+
+                     |         64 KB Global          |
+                     |   Trusted Process accessible  |
+                     +-------------------------------+
+
+    This driver configures itself into the SCSI software stack as an
+    adapter driver. The driver is the only entity that is considered a
+    Trusted Process to program the Provisioning and Control and Global
+    areas in the MMIO Space shown above.  The master context driver
+    discovers all LUNs attached to the CXL Flash adapter and instantiates
+    scsi block devices (/dev/sdb, /dev/sdc etc.) for each unique LUN
+    seen from each path.
+
+    Once these scsi block devices are instantiated, an application
+    written to a specification provided by the block library may get
+    access to the Flash from user space (without requiring a system call).
+
+    This master context driver also provides a series of ioctls for this
+    block library to enable this user space access.  The driver supports
+    two modes for accessing the block device.
+
+    The first mode is called a virtual mode. In this mode a single scsi
+    block device (/dev/sdb) may be carved up into any number of distinct
+    virtual LUNs. The virtual LUNs may be resized as long as the sum of
+    the sizes of all the virtual LUNs, along with the meta-data associated
+    with it does not exceed the physical capacity.
+
+    The second mode is called the physical mode. In this mode a single
+    block device (/dev/sdb) may be opened directly by the block library
+    and the entire space for the LUN is available to the application.
+
+    Only the physical mode provides persistence of the data.  i.e. The
+    data written to the block device will survive application exit and
+    restart and also reboot. The virtual LUNs do not persist (i.e. do
+    not survive after the application terminates or the system reboots).
+
+
+Block library API
+=================
+
+    Applications intending to get access to the CXL Flash from user
+    space should use the block library, as it abstracts the details of
+    interfacing directly with the cxlflash driver that are necessary for
+    performing administrative actions (i.e.: setup, tear down, resize).
+    The block library can be thought of as a 'user' of services,
+    implemented as IOCTLs, that are provided by the cxlflash driver
+    specifically for devices (LUNs) operating in user space access
+    mode. While it is not a requirement that applications understand
+    the interface between the block library and the cxlflash driver,
+    a high-level overview of each supported service (IOCTL) is provided
+    below.
+
+    The block library can be found on GitHub:
+    http://www.github.com/mikehollinger/ibmcapikv
+
+
+CXL Flash Driver IOCTLs
+=======================
+
+    Users, such as the block library, that wish to interface with a flash
+    device (LUN) via user space access need to use the services provided
+    by the cxlflash driver. As these services are implemented as ioctls,
+    a file descriptor handle must first be obtained in order to establish
+    the communication channel between a user and the kernel.  This file
+    descriptor is obtained by opening the device special file associated
+    with the scsi disk device (/dev/sdb) that was created during LUN
+    discovery. As per the location of the cxlflash driver within the
+    SCSI protocol stack, this open is actually not seen by the cxlflash
+    driver. Upon successful open, the user receives a file descriptor
+    (herein referred to as fd1) that should be used for issuing the
+    subsequent ioctls listed below.
+
+    The structure definitions for these IOCTLs are available in:
+    uapi/scsi/cxlflash_ioctl.h
+
+DK_CXLFLASH_ATTACH
+------------------
+
+    This ioctl obtains, initializes, and starts a context using the CXL
+    kernel services. These services specify a context id (u16) by which
+    to uniquely identify the context and its allocated resources. The
+    services additionally provide a second file descriptor (herein
+    referred to as fd2) that is used by the block library to initiate
+    memory mapped I/O (via mmap()) to the CXL flash device and poll for
+    completion events. This file descriptor is intentionally installed by
+    this driver and not the CXL kernel services to allow for intermediary
+    notification and access in the event of a non-user-initiated close(),
+    such as a killed process. This design point is described in further
+    detail in the description for the DK_CXLFLASH_DETACH ioctl.
+
+    There are a few important aspects regarding the "tokens" (context id
+    and fd2) that are provided back to the user:
+
+        - These tokens are only valid for the process under which they
+          were created. The child of a forked process cannot continue
+          to use the context id or file descriptor created by its parent.
+
+        - These tokens are only valid for the lifetime of the context and
+          the process under which they were created. Once either is
+          destroyed, the tokens are to be considered stale and subsequent
+          usage will result in errors.
+
+        - When a context is no longer needed, the user shall detach from
+          the context via the DK_CXLFLASH_DETACH ioctl.
+
+        - A close on fd2 will invalidate the tokens. This operation is not
+          required by the user.
+
+DK_CXLFLASH_USER_DIRECT
+-----------------------
+    This ioctl is responsible for transitioning the LUN to direct
+    (physical) mode access and configuring the AFU for direct access from
+    user space on a per-context basis. Additionally, the block size and
+    last logical block address (LBA) are returned to the user.
+
+    As mentioned previously, when operating in user space access mode,
+    LUNs may be accessed in whole or in part. Only one mode is allowed
+    at a time and if one mode is active (outstanding references exist),
+    requests to use the LUN in a different mode are denied.
+
+    The AFU is configured for direct access from user space by adding an
+    entry to the AFU's resource handle table. The index of the entry is
+    treated as a resource handle that is returned to the user. The user
+    is then able to use the handle to reference the LUN during I/O.
+
+DK_CXLFLASH_RELEASE
+-------------------
+    This ioctl is responsible for releasing a previously obtained
+    reference to either a physical or virtual LUN. This can be
+    thought of as the inverse of the DK_CXLFLASH_USER_DIRECT or
+    DK_CXLFLASH_USER_VIRTUAL ioctls. Upon success, the resource handle
+    is no longer valid and the entry in the resource handle table is
+    made available to be used again.
+
+    As part of the release process for virtual LUNs, the virtual LUN
+    is first resized to 0 to clear out and free the translation tables
+    associated with the virtual LUN reference.
+
+DK_CXLFLASH_DETACH
+------------------
+    This ioctl is responsible for unregistering a context with the
+    cxlflash driver and release outstanding resources that were
+    not explicitly released via the DK_CXLFLASH_RELEASE ioctl. Upon
+    success, all "tokens" which had been provided to the user from the
+    DK_CXLFLASH_ATTACH onward are no longer valid.
+
+DK_CXLFLASH_VERIFY
+------------------
+    This ioctl is used to detect various changes such as the capacity of
+    the disk changing, the number of LUNs visible changing, etc. In cases
+    where the changes affect the application (such as a LUN resize), the
+    cxlflash driver will report the changed state to the application.
+
+    The user calls in when they want to validate that a LUN hasn't been
+    changed in response to a check condition. As the user is operating out
+    of band from the kernel, they will see these types of events without
+    the kernel's knowledge. When encountered, the user's architected
+    behavior is to call in to this ioctl, indicating what they want to
+    verify and passing along any appropriate information. For now, only
+    verifying a LUN change (ie: size different) with sense data is
+    supported.
+
+DK_CXLFLASH_RECOVER_AFU
+-----------------------
+    This ioctl is used to drive recovery (if such an action is warranted)
+    of a specified user context. Any state associated with the user context
+    is re-established upon successful recovery.
+
+    User contexts are put into an error condition when the device needs to
+    be reset or is terminating. Users are notified of this error condition
+    by seeing all 0xF's on an MMIO read. Upon encountering this, the
+    architected behavior for a user is to call into this ioctl to recover
+    their context. A user may also call into this ioctl at any time to
+    check if the device is operating normally. If a failure is returned
+    from this ioctl, the user is expected to gracefully clean up their
+    context via release/detach ioctls. Until they do, the context they
+    hold is not relinquished. The user may also optionally exit the process
+    at which time the context/resources they held will be freed as part of
+    the release fop.
+
+DK_CXLFLASH_MANAGE_LUN
+----------------------
+    This ioctl is used to switch a LUN from a mode where it is available
+    for file-system access (legacy), to a mode where it is set aside for
+    exclusive user space access (superpipe). In case a LUN is visible
+    across multiple ports and adapters, this ioctl is used to uniquely
+    identify each LUN by its World Wide Node Name (WWNN).
diff --git a/drivers/scsi/cxlflash/Makefile b/drivers/scsi/cxlflash/Makefile
index dc95e203e3af..c14d24c720d6 100644
--- a/drivers/scsi/cxlflash/Makefile
+++ b/drivers/scsi/cxlflash/Makefile
@@ -1,2 +1,2 @@
 obj-$(CONFIG_CXLFLASH) += cxlflash.o
-cxlflash-y += main.o
+cxlflash-y += main.o superpipe.o lunmgt.o
diff --git a/drivers/scsi/cxlflash/common.h b/drivers/scsi/cxlflash/common.h
index ffdbc572d180..d3e54e61c7a5 100644
--- a/drivers/scsi/cxlflash/common.h
+++ b/drivers/scsi/cxlflash/common.h
@@ -107,6 +107,17 @@ struct cxlflash_cfg {
 	struct pci_pool *cxlflash_cmd_pool;
 	struct pci_dev *parent_dev;
 
+	atomic_t recovery_threads;
+	struct mutex ctx_recovery_mutex;
+	struct mutex ctx_tbl_list_mutex;
+	struct ctx_info *ctx_tbl[MAX_CONTEXT];
+	struct list_head ctx_err_recovery; /* contexts w/ recovery pending */
+	struct file_operations cxl_fops;
+
+	atomic_t num_user_contexts;
+
+	struct list_head lluns; /* list of llun_info structs */
+
 	wait_queue_head_t tmf_waitq;
 	bool tmf_active;
 	wait_queue_head_t limbo_waitq;
@@ -182,4 +193,12 @@ int cxlflash_afu_reset(struct cxlflash_cfg *);
 struct afu_cmd *cxlflash_cmd_checkout(struct afu *);
 void cxlflash_cmd_checkin(struct afu_cmd *);
 int cxlflash_afu_sync(struct afu *, ctx_hndl_t, res_hndl_t, u8);
+void cxlflash_list_init(void);
+void cxlflash_term_global_luns(void);
+void cxlflash_free_errpage(void);
+int cxlflash_ioctl(struct scsi_device *, int, void __user *);
+void cxlflash_stop_term_user_contexts(struct cxlflash_cfg *);
+int cxlflash_mark_contexts_error(struct cxlflash_cfg *);
+void cxlflash_term_local_luns(struct cxlflash_cfg *);
+
 #endif /* ifndef _CXLFLASH_COMMON_H */
diff --git a/drivers/scsi/cxlflash/lunmgt.c b/drivers/scsi/cxlflash/lunmgt.c
new file mode 100644
index 000000000000..66d5bef11ee6
--- /dev/null
+++ b/drivers/scsi/cxlflash/lunmgt.c
@@ -0,0 +1,263 @@
+/*
+ * CXL Flash Device Driver
+ *
+ * Written by: Manoj N. Kumar <manoj@linux.vnet.ibm.com>, IBM Corporation
+ *             Matthew R. Ochs <mrochs@linux.vnet.ibm.com>, IBM Corporation
+ *
+ * Copyright (C) 2015 IBM Corporation
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <misc/cxl.h>
+#include <asm/unaligned.h>
+
+#include <scsi/scsi_host.h>
+#include <uapi/scsi/cxlflash_ioctl.h>
+
+#include "sislite.h"
+#include "common.h"
+#include "superpipe.h"
+
+/**
+ * create_local() - allocate and initialize a local LUN information structure
+ * @sdev:	SCSI device associated with LUN.
+ * @wwid:	World Wide Node Name for LUN.
+ *
+ * Return: Allocated local llun_info structure on success, NULL on failure
+ */
+static struct llun_info *create_local(struct scsi_device *sdev, u8 *wwid)
+{
+	struct llun_info *lli = NULL;
+
+	lli = kzalloc(sizeof(*lli), GFP_KERNEL);
+	if (unlikely(!lli)) {
+		pr_err("%s: could not allocate lli\n", __func__);
+		goto out;
+	}
+
+	lli->sdev = sdev;
+	lli->newly_created = true;
+	lli->host_no = sdev->host->host_no;
+
+	memcpy(lli->wwid, wwid, DK_CXLFLASH_MANAGE_LUN_WWID_LEN);
+out:
+	return lli;
+}
+
+/**
+ * create_global() - allocate and initialize a global LUN information structure
+ * @sdev:	SCSI device associated with LUN.
+ * @wwid:	World Wide Node Name for LUN.
+ *
+ * Return: Allocated global glun_info structure on success, NULL on failure
+ */
+static struct glun_info *create_global(struct scsi_device *sdev, u8 *wwid)
+{
+	struct glun_info *gli = NULL;
+
+	gli = kzalloc(sizeof(*gli), GFP_KERNEL);
+	if (unlikely(!gli)) {
+		pr_err("%s: could not allocate gli\n", __func__);
+		goto out;
+	}
+
+	mutex_init(&gli->mutex);
+	memcpy(gli->wwid, wwid, DK_CXLFLASH_MANAGE_LUN_WWID_LEN);
+out:
+	return gli;
+}
+
+/**
+ * refresh_local() - find and update local LUN information structure by WWID
+ * @cfg:	Internal structure associated with the host.
+ * @wwid:	WWID associated with LUN.
+ *
+ * When the LUN is found, mark it by updating it's newly_created field.
+ *
+ * Return: Found local lun_info structure on success, NULL on failure
+ * If a LUN with the WWID is found in the list, refresh it's state.
+ */
+static struct llun_info *refresh_local(struct cxlflash_cfg *cfg, u8 *wwid)
+{
+	struct llun_info *lli, *temp;
+
+	list_for_each_entry_safe(lli, temp, &cfg->lluns, list)
+		if (!memcmp(lli->wwid, wwid, DK_CXLFLASH_MANAGE_LUN_WWID_LEN)) {
+			lli->newly_created = false;
+			return lli;
+		}
+
+	return NULL;
+}
+
+/**
+ * lookup_global() - find a global LUN information structure by WWID
+ * @wwid:	WWID associated with LUN.
+ *
+ * Return: Found global lun_info structure on success, NULL on failure
+ */
+static struct glun_info *lookup_global(u8 *wwid)
+{
+	struct glun_info *gli, *temp;
+
+	list_for_each_entry_safe(gli, temp, &global.gluns, list)
+		if (!memcmp(gli->wwid, wwid, DK_CXLFLASH_MANAGE_LUN_WWID_LEN))
+			return gli;
+
+	return NULL;
+}
+
+/**
+ * find_and_create_lun() - find or create a local LUN information structure
+ * @sdev:	SCSI device associated with LUN.
+ * @wwid:	WWID associated with LUN.
+ *
+ * The LUN is kept both in a local list (per adapter) and in a global list
+ * (across all adapters). Certain attributes of the LUN are local to the
+ * adapter (such as index, port selection mask etc.).
+ * The block allocation map is shared across all adapters (i.e. associated
+ * wih the global list). Since different attributes are associated with
+ * the per adapter and global entries, allocate two separate structures for each
+ * LUN (one local, one global).
+ *
+ * Keep a pointer back from the local to the global entry.
+ *
+ * Return: Found/Allocated local lun_info structure on success, NULL on failure
+ */
+static struct llun_info *find_and_create_lun(struct scsi_device *sdev, u8 *wwid)
+{
+	struct llun_info *lli = NULL;
+	struct glun_info *gli = NULL;
+	struct Scsi_Host *shost = sdev->host;
+	struct cxlflash_cfg *cfg = shost_priv(shost);
+
+	mutex_lock(&global.mutex);
+	if (unlikely(!wwid))
+		goto out;
+
+	lli = refresh_local(cfg, wwid);
+	if (lli)
+		goto out;
+
+	lli = create_local(sdev, wwid);
+	if (unlikely(!lli))
+		goto out;
+
+	gli = lookup_global(wwid);
+	if (gli) {
+		lli->parent = gli;
+		list_add(&lli->list, &cfg->lluns);
+		goto out;
+	}
+
+	gli = create_global(sdev, wwid);
+	if (unlikely(!gli)) {
+		kfree(lli);
+		lli = NULL;
+		goto out;
+	}
+
+	lli->parent = gli;
+	list_add(&lli->list, &cfg->lluns);
+
+	list_add(&gli->list, &global.gluns);
+
+out:
+	mutex_unlock(&global.mutex);
+	pr_debug("%s: returning %p\n", __func__, lli);
+	return lli;
+}
+
+/**
+ * cxlflash_term_local_luns() - Delete all entries from local LUN list, free.
+ * @cfg:	Internal structure associated with the host.
+ */
+void cxlflash_term_local_luns(struct cxlflash_cfg *cfg)
+{
+	struct llun_info *lli, *temp;
+
+	mutex_lock(&global.mutex);
+	list_for_each_entry_safe(lli, temp, &cfg->lluns, list) {
+		list_del(&lli->list);
+		kfree(lli);
+	}
+	mutex_unlock(&global.mutex);
+}
+
+/**
+ * cxlflash_list_init() - initializes the global LUN list
+ */
+void cxlflash_list_init(void)
+{
+	INIT_LIST_HEAD(&global.gluns);
+	mutex_init(&global.mutex);
+	global.err_page = NULL;
+}
+
+/**
+ * cxlflash_term_global_luns() - frees resources associated with global LUN list
+ */
+void cxlflash_term_global_luns(void)
+{
+	struct glun_info *gli, *temp;
+
+	mutex_lock(&global.mutex);
+	list_for_each_entry_safe(gli, temp, &global.gluns, list) {
+		list_del(&gli->list);
+		kfree(gli);
+	}
+	mutex_unlock(&global.mutex);
+}
+
+/**
+ * cxlflash_manage_lun() - handles LUN management activities
+ * @sdev:	SCSI device associated with LUN.
+ * @manage:	Manage ioctl data structure.
+ *
+ * This routine is used to notify the driver about a LUN's WWID and associate
+ * SCSI devices (sdev) with a global LUN instance. Additionally it serves to
+ * change a LUN's operating mode: legacy or superpipe.
+ *
+ * Return: 0 on success, -errno on failure
+ */
+int cxlflash_manage_lun(struct scsi_device *sdev,
+			struct dk_cxlflash_manage_lun *manage)
+{
+	int rc = 0;
+	struct llun_info *lli = NULL;
+	u64 flags = manage->hdr.flags;
+	u32 chan = sdev->channel;
+
+	lli = find_and_create_lun(sdev, manage->wwid);
+	pr_debug("%s: ENTER: WWID = %016llX%016llX, flags = %016llX li = %p\n",
+		 __func__, get_unaligned_le64(&manage->wwid[0]),
+		 get_unaligned_le64(&manage->wwid[8]),
+		 manage->hdr.flags, lli);
+	if (unlikely(!lli)) {
+		rc = -ENOMEM;
+		goto out;
+	}
+
+	if (flags & DK_CXLFLASH_MANAGE_LUN_ENABLE_SUPERPIPE) {
+		if (lli->newly_created)
+			lli->port_sel = CHAN2PORT(chan);
+		else
+			lli->port_sel = BOTH_PORTS;
+		/* Store off lun in unpacked, AFU-friendly format */
+		lli->lun_id[chan] = lun_to_lunid(sdev->lun);
+		sdev->hostdata = lli;
+	} else if (flags & DK_CXLFLASH_MANAGE_LUN_DISABLE_SUPERPIPE) {
+		if (lli->parent->mode != MODE_NONE)
+			rc = -EBUSY;
+		else
+			sdev->hostdata = NULL;
+	}
+
+out:
+	pr_debug("%s: returning rc=%d\n", __func__, rc);
+	return rc;
+}
diff --git a/drivers/scsi/cxlflash/main.c b/drivers/scsi/cxlflash/main.c
index 3ae8dca236ef..02d464f41b7f 100644
--- a/drivers/scsi/cxlflash/main.c
+++ b/drivers/scsi/cxlflash/main.c
@@ -23,6 +23,7 @@
 
 #include <scsi/scsi_cmnd.h>
 #include <scsi/scsi_host.h>
+#include <uapi/scsi/cxlflash_ioctl.h>
 
 #include "main.h"
 #include "sislite.h"
@@ -519,7 +520,7 @@ static int cxlflash_eh_host_reset_handler(struct scsi_cmnd *scp)
 	case STATE_NORMAL:
 		cfg->state = STATE_LIMBO;
 		scsi_block_requests(cfg->host);
-
+		cxlflash_mark_contexts_error(cfg);
 		rcr = cxlflash_afu_reset(cfg);
 		if (rcr) {
 			rc = FAILED;
@@ -662,6 +663,21 @@ static ssize_t cxlflash_store_lun_mode(struct device *dev,
 	return count;
 }
 
+/**
+ * cxlflash_show_ioctl_version() - presents the current ioctl version of the host
+ * @dev:	Generic device associated with the host.
+ * @attr:	Device attribute representing the ioctl version.
+ * @buf:	Buffer of length PAGE_SIZE to report back the ioctl version.
+ *
+ * Return: The size of the ASCII string returned in @buf.
+ */
+static ssize_t cxlflash_show_ioctl_version(struct device *dev,
+					   struct device_attribute *attr,
+					   char *buf)
+{
+	return scnprintf(buf, PAGE_SIZE, "%u\n", DK_CXLFLASH_VERSION_0);
+}
+
 /**
  * cxlflash_show_dev_mode() - presents the current mode of the device
  * @dev:	Generic device associated with the device.
@@ -700,11 +716,13 @@ static DEVICE_ATTR(port0, S_IRUGO, cxlflash_show_port_status, NULL);
 static DEVICE_ATTR(port1, S_IRUGO, cxlflash_show_port_status, NULL);
 static DEVICE_ATTR(lun_mode, S_IRUGO | S_IWUSR, cxlflash_show_lun_mode,
 		   cxlflash_store_lun_mode);
+static DEVICE_ATTR(ioctl_version, S_IRUGO, cxlflash_show_ioctl_version, NULL);
 
 static struct device_attribute *cxlflash_host_attrs[] = {
 	&dev_attr_port0,
 	&dev_attr_port1,
 	&dev_attr_lun_mode,
+	&dev_attr_ioctl_version,
 	NULL
 };
 
@@ -725,6 +743,7 @@ static struct scsi_host_template driver_template = {
 	.module = THIS_MODULE,
 	.name = CXLFLASH_ADAPTER_NAME,
 	.info = cxlflash_driver_info,
+	.ioctl = cxlflash_ioctl,
 	.proc_name = CXLFLASH_NAME,
 	.queuecommand = cxlflash_queuecommand,
 	.eh_device_reset_handler = cxlflash_eh_device_reset_handler,
@@ -872,9 +891,11 @@ static void cxlflash_remove(struct pci_dev *pdev)
 	spin_unlock_irqrestore(&cfg->tmf_waitq.lock, lock_flags);
 
 	cfg->state = STATE_FAILTERM;
+	cxlflash_stop_term_user_contexts(cfg);
 
 	switch (cfg->init_state) {
 	case INIT_STATE_SCSI:
+		cxlflash_term_local_luns(cfg);
 		scsi_remove_host(cfg->host);
 		scsi_host_put(cfg->host);
 		/* Fall through */
@@ -2274,6 +2295,10 @@ static int cxlflash_probe(struct pci_dev *pdev,
 	INIT_WORK(&cfg->work_q, cxlflash_worker_thread);
 	cfg->lr_state = LINK_RESET_INVALID;
 	cfg->lr_port = -1;
+	mutex_init(&cfg->ctx_tbl_list_mutex);
+	mutex_init(&cfg->ctx_recovery_mutex);
+	INIT_LIST_HEAD(&cfg->ctx_err_recovery);
+	INIT_LIST_HEAD(&cfg->lluns);
 
 	pci_set_drvdata(pdev, cfg);
 
@@ -2335,6 +2360,7 @@ out_remove:
 static pci_ers_result_t cxlflash_pci_error_detected(struct pci_dev *pdev,
 						    pci_channel_state_t state)
 {
+	int rc = 0;
 	struct cxlflash_cfg *cfg = pci_get_drvdata(pdev);
 	struct device *dev = &cfg->dev->dev;
 
@@ -2346,7 +2372,10 @@ static pci_ers_result_t cxlflash_pci_error_detected(struct pci_dev *pdev,
 
 		/* Turn off legacy I/O */
 		scsi_block_requests(cfg->host);
-
+		rc = cxlflash_mark_contexts_error(cfg);
+		if (unlikely(rc))
+			dev_err(dev, "%s: Failed to mark user contexts!(%d)\n",
+				__func__, rc);
 		term_mc(cfg, UNDO_START);
 		stop_afu(cfg);
 
@@ -2431,6 +2460,8 @@ static int __init init_cxlflash(void)
 	pr_info("%s: IBM Power CXL Flash Adapter: %s\n",
 		__func__, CXLFLASH_DRIVER_DATE);
 
+	cxlflash_list_init();
+
 	return pci_register_driver(&cxlflash_driver);
 }
 
@@ -2439,6 +2470,9 @@ static int __init init_cxlflash(void)
  */
 static void __exit exit_cxlflash(void)
 {
+	cxlflash_term_global_luns();
+	cxlflash_free_errpage();
+
 	pci_unregister_driver(&cxlflash_driver);
 }
 
diff --git a/drivers/scsi/cxlflash/sislite.h b/drivers/scsi/cxlflash/sislite.h
index bf5d39978630..66b889151a4c 100644
--- a/drivers/scsi/cxlflash/sislite.h
+++ b/drivers/scsi/cxlflash/sislite.h
@@ -409,7 +409,10 @@ struct sisl_lxt_entry {
 
 };
 
-/* Per the SISlite spec, RHT entries are to be 16-byte aligned */
+/*
+ * RHT - Resource Handle Table
+ * Per the SISlite spec, RHT entries are to be 16-byte aligned
+ */
 struct sisl_rht_entry {
 	struct sisl_lxt_entry *lxt_start;
 	u32 lxt_cnt;
diff --git a/drivers/scsi/cxlflash/superpipe.c b/drivers/scsi/cxlflash/superpipe.c
new file mode 100644
index 000000000000..3c8bce8bbb0b
--- /dev/null
+++ b/drivers/scsi/cxlflash/superpipe.c
@@ -0,0 +1,2014 @@
+/*
+ * CXL Flash Device Driver
+ *
+ * Written by: Manoj N. Kumar <manoj@linux.vnet.ibm.com>, IBM Corporation
+ *             Matthew R. Ochs <mrochs@linux.vnet.ibm.com>, IBM Corporation
+ *
+ * Copyright (C) 2015 IBM Corporation
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <linux/delay.h>
+#include <linux/file.h>
+#include <linux/syscalls.h>
+#include <misc/cxl.h>
+#include <asm/unaligned.h>
+
+#include <scsi/scsi.h>
+#include <scsi/scsi_host.h>
+#include <scsi/scsi_cmnd.h>
+#include <scsi/scsi_eh.h>
+#include <uapi/scsi/cxlflash_ioctl.h>
+
+#include "sislite.h"
+#include "common.h"
+#include "superpipe.h"
+
+struct cxlflash_global global;
+
+/**
+ * marshal_det_to_rele() - translate detach to release structure
+ * @detach:	Destination structure for the translate/copy.
+ * @rele:	Source structure from which to translate/copy.
+ */
+static void marshal_det_to_rele(struct dk_cxlflash_detach *detach,
+				struct dk_cxlflash_release *release)
+{
+	release->hdr = detach->hdr;
+	release->context_id = detach->context_id;
+}
+
+/**
+ * cxlflash_free_errpage() - frees resources associated with global error page
+ */
+void cxlflash_free_errpage(void)
+{
+
+	mutex_lock(&global.mutex);
+	if (global.err_page) {
+		__free_page(global.err_page);
+		global.err_page = NULL;
+	}
+	mutex_unlock(&global.mutex);
+}
+
+/**
+ * cxlflash_stop_term_user_contexts() - stops/terminates known user contexts
+ * @cfg:	Internal structure associated with the host.
+ *
+ * When the host needs to go down, all users must be quiesced and their
+ * memory freed. This is accomplished by putting the contexts in error
+ * state which will notify the user and let them 'drive' the tear-down.
+ * Meanwhile, this routine camps until all user contexts have been removed.
+ */
+void cxlflash_stop_term_user_contexts(struct cxlflash_cfg *cfg)
+{
+	struct device *dev = &cfg->dev->dev;
+	int i, found;
+
+	cxlflash_mark_contexts_error(cfg);
+
+	while (true) {
+		found = false;
+
+		for (i = 0; i < MAX_CONTEXT; i++)
+			if (cfg->ctx_tbl[i]) {
+				found = true;
+				break;
+			}
+
+		if (!found && list_empty(&cfg->ctx_err_recovery))
+			return;
+
+		dev_dbg(dev, "%s: Wait for user contexts to quiesce...\n",
+			__func__);
+		wake_up_all(&cfg->limbo_waitq);
+		ssleep(1);
+	}
+}
+
+/**
+ * find_error_context() - locates a context by cookie on the error recovery list
+ * @cfg:	Internal structure associated with the host.
+ * @rctxid:	Desired context by id.
+ * @file:	Desired context by file.
+ *
+ * Return: Found context on success, NULL on failure
+ */
+static struct ctx_info *find_error_context(struct cxlflash_cfg *cfg, u64 rctxid,
+					   struct file *file)
+{
+	struct ctx_info *ctxi;
+
+	list_for_each_entry(ctxi, &cfg->ctx_err_recovery, list)
+		if ((ctxi->ctxid == rctxid) || (ctxi->file == file))
+			return ctxi;
+
+	return NULL;
+}
+
+/**
+ * get_context() - obtains a validated and locked context reference
+ * @cfg:	Internal structure associated with the host.
+ * @rctxid:	Desired context (raw, un-decoded format).
+ * @arg:	LUN information or file associated with request.
+ * @ctx_ctrl:	Control information to 'steer' desired lookup.
+ *
+ * NOTE: despite the name pid, in linux, current->pid actually refers
+ * to the lightweight process id (tid) and can change if the process is
+ * multi threaded. The tgid remains constant for the process and only changes
+ * when the process of fork. For all intents and purposes, think of tgid
+ * as a pid in the traditional sense.
+ *
+ * Return: Validated context on success, NULL on failure
+ */
+struct ctx_info *get_context(struct cxlflash_cfg *cfg, u64 rctxid,
+			     void *arg, enum ctx_ctrl ctx_ctrl)
+{
+	struct device *dev = &cfg->dev->dev;
+	struct ctx_info *ctxi = NULL;
+	struct lun_access *lun_access = NULL;
+	struct file *file = NULL;
+	struct llun_info *lli = arg;
+	u64 ctxid = DECODE_CTXID(rctxid);
+	int rc;
+	pid_t pid = current->tgid, ctxpid = 0;
+
+	if (ctx_ctrl & CTX_CTRL_FILE) {
+		lli = NULL;
+		file = (struct file *)arg;
+	}
+
+	if (ctx_ctrl & CTX_CTRL_CLONE)
+		pid = current->parent->tgid;
+
+	if (likely(ctxid < MAX_CONTEXT)) {
+		while (true) {
+			rc = mutex_lock_interruptible(&cfg->ctx_tbl_list_mutex);
+			if (rc)
+				goto out;
+
+			ctxi = cfg->ctx_tbl[ctxid];
+			if (ctxi)
+				if ((file && (ctxi->file != file)) ||
+				    (!file && (ctxi->ctxid != rctxid)))
+					ctxi = NULL;
+
+			if ((ctx_ctrl & CTX_CTRL_ERR) ||
+			    (!ctxi && (ctx_ctrl & CTX_CTRL_ERR_FALLBACK)))
+				ctxi = find_error_context(cfg, rctxid, file);
+			if (!ctxi) {
+				mutex_unlock(&cfg->ctx_tbl_list_mutex);
+				goto out;
+			}
+
+			/*
+			 * Need to acquire ownership of the context while still
+			 * under the table/list lock to serialize with a remove
+			 * thread. Use the 'try' to avoid stalling the
+			 * table/list lock for a single context.
+			 *
+			 * Note that the lock order is:
+			 *
+			 *	cfg->ctx_tbl_list_mutex -> ctxi->mutex
+			 *
+			 * Therefore release ctx_tbl_list_mutex before retrying.
+			 */
+			rc = mutex_trylock(&ctxi->mutex);
+			mutex_unlock(&cfg->ctx_tbl_list_mutex);
+			if (rc)
+				break; /* got the context's lock! */
+		}
+
+		if (ctxi->unavail)
+			goto denied;
+
+		ctxpid = ctxi->pid;
+		if (likely(!(ctx_ctrl & CTX_CTRL_NOPID)))
+			if (pid != ctxpid)
+				goto denied;
+
+		if (lli) {
+			list_for_each_entry(lun_access, &ctxi->luns, list)
+				if (lun_access->lli == lli)
+					goto out;
+			goto denied;
+		}
+	}
+
+out:
+	dev_dbg(dev, "%s: rctxid=%016llX ctxinfo=%p ctxpid=%u pid=%u "
+		"ctx_ctrl=%u\n", __func__, rctxid, ctxi, ctxpid, pid,
+		ctx_ctrl);
+
+	return ctxi;
+
+denied:
+	mutex_unlock(&ctxi->mutex);
+	ctxi = NULL;
+	goto out;
+}
+
+/**
+ * put_context() - release a context that was retrieved from get_context()
+ * @ctxi:	Context to release.
+ *
+ * For now, releasing the context equates to unlocking it's mutex.
+ */
+void put_context(struct ctx_info *ctxi)
+{
+	mutex_unlock(&ctxi->mutex);
+}
+
+/**
+ * afu_attach() - attach a context to the AFU
+ * @cfg:	Internal structure associated with the host.
+ * @ctxi:	Context to attach.
+ *
+ * Upon setting the context capabilities, they must be confirmed with
+ * a read back operation as the context might have been closed since
+ * the mailbox was unlocked. When this occurs, registration is failed.
+ *
+ * Return: 0 on success, -errno on failure
+ */
+static int afu_attach(struct cxlflash_cfg *cfg, struct ctx_info *ctxi)
+{
+	struct device *dev = &cfg->dev->dev;
+	struct afu *afu = cfg->afu;
+	struct sisl_ctrl_map *ctrl_map = ctxi->ctrl_map;
+	int rc = 0;
+	u64 val;
+
+	/* Unlock cap and restrict user to read/write cmds in translated mode */
+	readq_be(&ctrl_map->mbox_r);
+	val = (SISL_CTX_CAP_READ_CMD | SISL_CTX_CAP_WRITE_CMD);
+	writeq_be(val, &ctrl_map->ctx_cap);
+	val = readq_be(&ctrl_map->ctx_cap);
+	if (val != (SISL_CTX_CAP_READ_CMD | SISL_CTX_CAP_WRITE_CMD)) {
+		dev_err(dev, "%s: ctx may be closed val=%016llX\n",
+			__func__, val);
+		rc = -EAGAIN;
+		goto out;
+	}
+
+	/* Set up MMIO registers pointing to the RHT */
+	writeq_be((u64)ctxi->rht_start, &ctrl_map->rht_start);
+	val = SISL_RHT_CNT_ID((u64)MAX_RHT_PER_CONTEXT, (u64)(afu->ctx_hndl));
+	writeq_be(val, &ctrl_map->rht_cnt_id);
+out:
+	dev_dbg(dev, "%s: returning rc=%d\n", __func__, rc);
+	return rc;
+}
+
+/**
+ * read_cap16() - issues a SCSI READ_CAP16 command
+ * @sdev:	SCSI device associated with LUN.
+ * @lli:	LUN destined for capacity request.
+ *
+ * Return: 0 on success, -errno on failure
+ */
+static int read_cap16(struct scsi_device *sdev, struct llun_info *lli)
+{
+	struct cxlflash_cfg *cfg = (struct cxlflash_cfg *)sdev->host->hostdata;
+	struct device *dev = &cfg->dev->dev;
+	struct glun_info *gli = lli->parent;
+	u8 *cmd_buf = NULL;
+	u8 *scsi_cmd = NULL;
+	u8 *sense_buf = NULL;
+	int rc = 0;
+	int result = 0;
+	int retry_cnt = 0;
+	u32 tout = (MC_DISCOVERY_TIMEOUT * HZ);
+
+retry:
+	cmd_buf = kzalloc(CMD_BUFSIZE, GFP_KERNEL);
+	scsi_cmd = kzalloc(MAX_COMMAND_SIZE, GFP_KERNEL);
+	sense_buf = kzalloc(SCSI_SENSE_BUFFERSIZE, GFP_KERNEL);
+	if (unlikely(!cmd_buf || !scsi_cmd || !sense_buf)) {
+		rc = -ENOMEM;
+		goto out;
+	}
+
+	scsi_cmd[0] = SERVICE_ACTION_IN_16;	/* read cap(16) */
+	scsi_cmd[1] = SAI_READ_CAPACITY_16;	/* service action */
+	put_unaligned_be32(CMD_BUFSIZE, &scsi_cmd[10]);
+
+	dev_dbg(dev, "%s: %ssending cmd(0x%x)\n", __func__,
+		retry_cnt ? "re" : "", scsi_cmd[0]);
+
+	result = scsi_execute(sdev, scsi_cmd, DMA_FROM_DEVICE, cmd_buf,
+			      CMD_BUFSIZE, sense_buf, tout, 5, 0, NULL);
+
+	if (driver_byte(result) == DRIVER_SENSE) {
+		result &= ~(0xFF<<24); /* DRIVER_SENSE is not an error */
+		if (result & SAM_STAT_CHECK_CONDITION) {
+			struct scsi_sense_hdr sshdr;
+
+			scsi_normalize_sense(sense_buf, SCSI_SENSE_BUFFERSIZE,
+					    &sshdr);
+			switch (sshdr.sense_key) {
+			case NO_SENSE:
+			case RECOVERED_ERROR:
+				/* fall through */
+			case NOT_READY:
+				result &= ~SAM_STAT_CHECK_CONDITION;
+				break;
+			case UNIT_ATTENTION:
+				switch (sshdr.asc) {
+				case 0x29: /* Power on Reset or Device Reset */
+					/* fall through */
+				case 0x2A: /* Device capacity changed */
+				case 0x3F: /* Report LUNs changed */
+					/* Retry the command once more */
+					if (retry_cnt++ < 1) {
+						kfree(cmd_buf);
+						kfree(scsi_cmd);
+						kfree(sense_buf);
+						goto retry;
+					}
+				}
+				break;
+			default:
+				break;
+			}
+		}
+	}
+
+	if (result) {
+		dev_err(dev, "%s: command failed, result=0x%x\n",
+			__func__, result);
+		rc = -EIO;
+		goto out;
+	}
+
+	/*
+	 * Read cap was successful, grab values from the buffer;
+	 * note that we don't need to worry about unaligned access
+	 * as the buffer is allocated on an aligned boundary.
+	 */
+	mutex_lock(&gli->mutex);
+	gli->max_lba = be64_to_cpu(*((u64 *)&cmd_buf[0]));
+	gli->blk_len = be32_to_cpu(*((u32 *)&cmd_buf[8]));
+	mutex_unlock(&gli->mutex);
+
+out:
+	kfree(cmd_buf);
+	kfree(scsi_cmd);
+	kfree(sense_buf);
+
+	dev_dbg(dev, "%s: maxlba=%lld blklen=%d rc=%d\n",
+		__func__, gli->max_lba, gli->blk_len, rc);
+	return rc;
+}
+
+/**
+ * get_rhte() - obtains validated resource handle table entry reference
+ * @ctxi:	Context owning the resource handle.
+ * @rhndl:	Resource handle associated with entry.
+ * @lli:	LUN associated with request.
+ *
+ * Return: Validated RHTE on success, NULL on failure
+ */
+struct sisl_rht_entry *get_rhte(struct ctx_info *ctxi, res_hndl_t rhndl,
+				struct llun_info *lli)
+{
+	struct sisl_rht_entry *rhte = NULL;
+
+	if (unlikely(!ctxi->rht_start)) {
+		pr_debug("%s: Context does not have allocated RHT!\n",
+			 __func__);
+		goto out;
+	}
+
+	if (unlikely(rhndl >= MAX_RHT_PER_CONTEXT)) {
+		pr_debug("%s: Bad resource handle! (%d)\n", __func__, rhndl);
+		goto out;
+	}
+
+	if (unlikely(ctxi->rht_lun[rhndl] != lli)) {
+		pr_debug("%s: Bad resource handle LUN! (%d)\n",
+			 __func__, rhndl);
+		goto out;
+	}
+
+	rhte = &ctxi->rht_start[rhndl];
+	if (unlikely(rhte->nmask == 0)) {
+		pr_debug("%s: Unopened resource handle! (%d)\n",
+			 __func__, rhndl);
+		rhte = NULL;
+		goto out;
+	}
+
+out:
+	return rhte;
+}
+
+/**
+ * rhte_checkout() - obtains free/empty resource handle table entry
+ * @ctxi:	Context owning the resource handle.
+ * @lli:	LUN associated with request.
+ *
+ * Return: Free RHTE on success, NULL on failure
+ */
+struct sisl_rht_entry *rhte_checkout(struct ctx_info *ctxi,
+				     struct llun_info *lli)
+{
+	struct sisl_rht_entry *rhte = NULL;
+	int i;
+
+	/* Find a free RHT entry */
+	for (i = 0; i < MAX_RHT_PER_CONTEXT; i++)
+		if (ctxi->rht_start[i].nmask == 0) {
+			rhte = &ctxi->rht_start[i];
+			ctxi->rht_out++;
+			break;
+		}
+
+	if (likely(rhte))
+		ctxi->rht_lun[i] = lli;
+
+	pr_debug("%s: returning rhte=%p (%d)\n", __func__, rhte, i);
+	return rhte;
+}
+
+/**
+ * rhte_checkin() - releases a resource handle table entry
+ * @ctxi:	Context owning the resource handle.
+ * @rhte:	RHTE to release.
+ */
+void rhte_checkin(struct ctx_info *ctxi,
+		  struct sisl_rht_entry *rhte)
+{
+	u32 rsrc_handle = rhte - ctxi->rht_start;
+
+	rhte->nmask = 0;
+	rhte->fp = 0;
+	ctxi->rht_out--;
+	ctxi->rht_lun[rsrc_handle] = NULL;
+}
+
+/**
+ * rhte_format1() - populates a RHTE for format 1
+ * @rhte:	RHTE to populate.
+ * @lun_id:	LUN ID of LUN associated with RHTE.
+ * @perm:	Desired permissions for RHTE.
+ * @port_sel:	Port selection mask
+ */
+static void rht_format1(struct sisl_rht_entry *rhte, u64 lun_id, u32 perm,
+			u32 port_sel)
+{
+	/*
+	 * Populate the Format 1 RHT entry for direct access (physical
+	 * LUN) using the synchronization sequence defined in the
+	 * SISLite specification.
+	 */
+	struct sisl_rht_entry_f1 dummy = { 0 };
+	struct sisl_rht_entry_f1 *rhte_f1 = (struct sisl_rht_entry_f1 *)rhte;
+
+	memset(rhte_f1, 0, sizeof(*rhte_f1));
+	rhte_f1->fp = SISL_RHT_FP(1U, 0);
+	dma_wmb(); /* Make setting of format bit visible */
+
+	rhte_f1->lun_id = lun_id;
+	dma_wmb(); /* Make setting of LUN id visible */
+
+	/*
+	 * Use a dummy RHT Format 1 entry to build the second dword
+	 * of the entry that must be populated in a single write when
+	 * enabled (valid bit set to TRUE).
+	 */
+	dummy.valid = 0x80;
+	dummy.fp = SISL_RHT_FP(1U, perm);
+	dummy.port_sel = port_sel;
+	rhte_f1->dw = dummy.dw;
+
+	dma_wmb(); /* Make remaining RHT entry fields visible */
+}
+
+/**
+ * cxlflash_lun_attach() - attaches a user to a LUN and manages the LUN's mode
+ * @gli:	LUN to attach.
+ * @mode:	Desired mode of the LUN.
+ * @locked:	Mutex status on current thread.
+ *
+ * Return: 0 on success, -errno on failure
+ */
+int cxlflash_lun_attach(struct glun_info *gli, enum lun_mode mode, bool locked)
+{
+	int rc = 0;
+
+	if (!locked)
+		mutex_lock(&gli->mutex);
+
+	if (gli->mode == MODE_NONE)
+		gli->mode = mode;
+	else if (gli->mode != mode) {
+		pr_debug("%s: LUN operating in mode %d, requested mode %d\n",
+			 __func__, gli->mode, mode);
+		rc = -EINVAL;
+		goto out;
+	}
+
+	gli->users++;
+	WARN_ON(gli->users <= 0);
+out:
+	pr_debug("%s: Returning rc=%d gli->mode=%u gli->users=%u\n",
+		 __func__, rc, gli->mode, gli->users);
+	if (!locked)
+		mutex_unlock(&gli->mutex);
+	return rc;
+}
+
+/**
+ * cxlflash_lun_detach() - detaches a user from a LUN and resets the LUN's mode
+ * @gli:	LUN to detach.
+ */
+void cxlflash_lun_detach(struct glun_info *gli)
+{
+	mutex_lock(&gli->mutex);
+	WARN_ON(gli->mode == MODE_NONE);
+	if (--gli->users == 0)
+		gli->mode = MODE_NONE;
+	pr_debug("%s: gli->users=%u\n", __func__, gli->users);
+	WARN_ON(gli->users < 0);
+	mutex_unlock(&gli->mutex);
+}
+
+/**
+ * _cxlflash_disk_release() - releases the specified resource entry
+ * @sdev:	SCSI device associated with LUN.
+ * @ctxi:	Context owning resources.
+ * @release:	Release ioctl data structure.
+ *
+ * Note that the AFU sync should _not_ be performed when the context is sitting
+ * on the error recovery list. A context on the error recovery list is not known
+ * to the AFU due to reset. When the context is recovered, it will be reattached
+ * and made known again to the AFU.
+ *
+ * Return: 0 on success, -errno on failure
+ */
+int _cxlflash_disk_release(struct scsi_device *sdev,
+			   struct ctx_info *ctxi,
+			   struct dk_cxlflash_release *release)
+{
+	struct cxlflash_cfg *cfg = (struct cxlflash_cfg *)sdev->host->hostdata;
+	struct device *dev = &cfg->dev->dev;
+	struct llun_info *lli = sdev->hostdata;
+	struct glun_info *gli = lli->parent;
+	struct afu *afu = cfg->afu;
+	bool put_ctx = false;
+
+	res_hndl_t rhndl = release->rsrc_handle;
+
+	int rc = 0;
+	u64 ctxid = DECODE_CTXID(release->context_id),
+	    rctxid = release->context_id;
+
+	struct sisl_rht_entry *rhte;
+	struct sisl_rht_entry_f1 *rhte_f1;
+
+	dev_dbg(dev, "%s: ctxid=%llu rhndl=0x%llx gli->mode=%u gli->users=%u\n",
+		__func__, ctxid, release->rsrc_handle, gli->mode, gli->users);
+
+	if (!ctxi) {
+		ctxi = get_context(cfg, rctxid, lli, CTX_CTRL_ERR_FALLBACK);
+		if (unlikely(!ctxi)) {
+			dev_dbg(dev, "%s: Bad context! (%llu)\n",
+				__func__, ctxid);
+			rc = -EINVAL;
+			goto out;
+		}
+
+		put_ctx = true;
+	}
+
+	rhte = get_rhte(ctxi, rhndl, lli);
+	if (unlikely(!rhte)) {
+		dev_dbg(dev, "%s: Bad resource handle! (%d)\n",
+			__func__, rhndl);
+		rc = -EINVAL;
+		goto out;
+	}
+
+	switch (gli->mode) {
+	case MODE_PHYSICAL:
+		/*
+		 * Clear the Format 1 RHT entry for direct access
+		 * (physical LUN) using the synchronization sequence
+		 * defined in the SISLite specification.
+		 */
+		rhte_f1 = (struct sisl_rht_entry_f1 *)rhte;
+
+		rhte_f1->valid = 0;
+		dma_wmb(); /* Make revocation of RHT entry visible */
+
+		rhte_f1->lun_id = 0;
+		dma_wmb(); /* Make clearing of LUN id visible */
+
+		rhte_f1->dw = 0;
+		dma_wmb(); /* Make RHT entry bottom-half clearing visible */
+
+		if (!ctxi->err_recovery_active)
+			cxlflash_afu_sync(afu, ctxid, rhndl, AFU_HW_SYNC);
+		break;
+	default:
+		WARN(1, "Unsupported LUN mode!");
+		goto out;
+	}
+
+	rhte_checkin(ctxi, rhte);
+	cxlflash_lun_detach(gli);
+
+out:
+	if (put_ctx)
+		put_context(ctxi);
+	dev_dbg(dev, "%s: returning rc=%d\n", __func__, rc);
+	return rc;
+}
+
+int cxlflash_disk_release(struct scsi_device *sdev,
+			  struct dk_cxlflash_release *release)
+{
+	return _cxlflash_disk_release(sdev, NULL, release);
+}
+
+/**
+ * destroy_context() - releases a context
+ * @cfg:	Internal structure associated with the host.
+ * @ctxi:	Context to release.
+ *
+ * Note that the rht_lun member of the context was cut from a single
+ * allocation when the context was created and therefore does not need
+ * to be explicitly freed. Also note that we conditionally check for the
+ * existence of the context control map before clearing the RHT registers
+ * and context capabilities because it is possible to destroy a context
+ * while the context is in the error state (previous mapping was removed
+ * [so we don't have to worry about clearing] and context is waiting for
+ * a new mapping).
+ */
+static void destroy_context(struct cxlflash_cfg *cfg,
+			    struct ctx_info *ctxi)
+{
+	struct afu *afu = cfg->afu;
+
+	WARN_ON(!list_empty(&ctxi->luns));
+
+	/* Clear RHT registers and drop all capabilities for this context */
+	if (afu->afu_map && ctxi->ctrl_map) {
+		writeq_be(0, &ctxi->ctrl_map->rht_start);
+		writeq_be(0, &ctxi->ctrl_map->rht_cnt_id);
+		writeq_be(0, &ctxi->ctrl_map->ctx_cap);
+	}
+
+	/* Free memory associated with context */
+	free_page((ulong)ctxi->rht_start);
+	kfree(ctxi->rht_lun);
+	kfree(ctxi);
+	atomic_dec_if_positive(&cfg->num_user_contexts);
+}
+
+/**
+ * create_context() - allocates and initializes a context
+ * @cfg:	Internal structure associated with the host.
+ * @ctx:	Previously obtained CXL context reference.
+ * @ctxid:	Previously obtained process element associated with CXL context.
+ * @adap_fd:	Previously obtained adapter fd associated with CXL context.
+ * @file:	Previously obtained file associated with CXL context.
+ * @perms:	User-specified permissions.
+ *
+ * The context's mutex is locked when an allocated context is returned.
+ *
+ * Return: Allocated context on success, NULL on failure
+ */
+static struct ctx_info *create_context(struct cxlflash_cfg *cfg,
+				       struct cxl_context *ctx, int ctxid,
+				       int adap_fd, struct file *file,
+				       u32 perms)
+{
+	struct device *dev = &cfg->dev->dev;
+	struct afu *afu = cfg->afu;
+	struct ctx_info *ctxi = NULL;
+	struct llun_info **lli = NULL;
+	struct sisl_rht_entry *rhte;
+
+	ctxi = kzalloc(sizeof(*ctxi), GFP_KERNEL);
+	lli = kzalloc((MAX_RHT_PER_CONTEXT * sizeof(*lli)), GFP_KERNEL);
+	if (unlikely(!ctxi || !lli)) {
+		dev_err(dev, "%s: Unable to allocate context!\n", __func__);
+		goto err;
+	}
+
+	rhte = (struct sisl_rht_entry *)get_zeroed_page(GFP_KERNEL);
+	if (unlikely(!rhte)) {
+		dev_err(dev, "%s: Unable to allocate RHT!\n", __func__);
+		goto err;
+	}
+
+	ctxi->rht_lun = lli;
+	ctxi->rht_start = rhte;
+	ctxi->rht_perms = perms;
+
+	ctxi->ctrl_map = &afu->afu_map->ctrls[ctxid].ctrl;
+	ctxi->ctxid = ENCODE_CTXID(ctxi, ctxid);
+	ctxi->lfd = adap_fd;
+	ctxi->pid = current->tgid; /* tgid = pid */
+	ctxi->ctx = ctx;
+	ctxi->file = file;
+	mutex_init(&ctxi->mutex);
+	INIT_LIST_HEAD(&ctxi->luns);
+	INIT_LIST_HEAD(&ctxi->list); /* initialize for list_empty() */
+
+	atomic_inc(&cfg->num_user_contexts);
+	mutex_lock(&ctxi->mutex);
+out:
+	return ctxi;
+
+err:
+	kfree(lli);
+	kfree(ctxi);
+	ctxi = NULL;
+	goto out;
+}
+
+/**
+ * _cxlflash_disk_detach() - detaches a LUN from a context
+ * @sdev:	SCSI device associated with LUN.
+ * @ctxi:	Context owning resources.
+ * @detach:	Detach ioctl data structure.
+ *
+ * As part of the detach, all per-context resources associated with the LUN
+ * are cleaned up. When detaching the last LUN for a context, the context
+ * itself is cleaned up and released.
+ *
+ * Return: 0 on success, -errno on failure
+ */
+static int _cxlflash_disk_detach(struct scsi_device *sdev,
+				 struct ctx_info *ctxi,
+				 struct dk_cxlflash_detach *detach)
+{
+	struct cxlflash_cfg *cfg = (struct cxlflash_cfg *)sdev->host->hostdata;
+	struct device *dev = &cfg->dev->dev;
+	struct llun_info *lli = sdev->hostdata;
+	struct lun_access *lun_access, *t;
+	struct dk_cxlflash_release rel;
+	bool put_ctx = false;
+
+	int i;
+	int rc = 0;
+	int lfd;
+	u64 ctxid = DECODE_CTXID(detach->context_id),
+	    rctxid = detach->context_id;
+
+	dev_dbg(dev, "%s: ctxid=%llu\n", __func__, ctxid);
+
+	if (!ctxi) {
+		ctxi = get_context(cfg, rctxid, lli, CTX_CTRL_ERR_FALLBACK);
+		if (unlikely(!ctxi)) {
+			dev_dbg(dev, "%s: Bad context! (%llu)\n",
+				__func__, ctxid);
+			rc = -EINVAL;
+			goto out;
+		}
+
+		put_ctx = true;
+	}
+
+	/* Cleanup outstanding resources tied to this LUN */
+	if (ctxi->rht_out) {
+		marshal_det_to_rele(detach, &rel);
+		for (i = 0; i < MAX_RHT_PER_CONTEXT; i++) {
+			if (ctxi->rht_lun[i] == lli) {
+				rel.rsrc_handle = i;
+				_cxlflash_disk_release(sdev, ctxi, &rel);
+			}
+
+			/* No need to loop further if we're done */
+			if (ctxi->rht_out == 0)
+				break;
+		}
+	}
+
+	/* Take our LUN out of context, free the node */
+	list_for_each_entry_safe(lun_access, t, &ctxi->luns, list)
+		if (lun_access->lli == lli) {
+			list_del(&lun_access->list);
+			kfree(lun_access);
+			lun_access = NULL;
+			break;
+		}
+
+	/* Tear down context following last LUN cleanup */
+	if (list_empty(&ctxi->luns)) {
+		ctxi->unavail = true;
+		mutex_unlock(&ctxi->mutex);
+		mutex_lock(&cfg->ctx_tbl_list_mutex);
+		mutex_lock(&ctxi->mutex);
+
+		/* Might not have been in error list so conditionally remove */
+		if (!list_empty(&ctxi->list))
+			list_del(&ctxi->list);
+		cfg->ctx_tbl[ctxid] = NULL;
+		mutex_unlock(&cfg->ctx_tbl_list_mutex);
+		mutex_unlock(&ctxi->mutex);
+
+		lfd = ctxi->lfd;
+		destroy_context(cfg, ctxi);
+		ctxi = NULL;
+		put_ctx = false;
+
+		/*
+		 * As a last step, clean up external resources when not
+		 * already on an external cleanup thread, i.e.: close(adap_fd).
+		 *
+		 * NOTE: this will free up the context from the CXL services,
+		 * allowing it to dole out the same context_id on a future
+		 * (or even currently in-flight) disk_attach operation.
+		 */
+		if (lfd != -1)
+			sys_close(lfd);
+	}
+
+out:
+	if (put_ctx)
+		put_context(ctxi);
+	dev_dbg(dev, "%s: returning rc=%d\n", __func__, rc);
+	return rc;
+}
+
+static int cxlflash_disk_detach(struct scsi_device *sdev,
+				struct dk_cxlflash_detach *detach)
+{
+	return _cxlflash_disk_detach(sdev, NULL, detach);
+}
+
+/**
+ * cxlflash_cxl_release() - release handler for adapter file descriptor
+ * @inode:	File-system inode associated with fd.
+ * @file:	File installed with adapter file descriptor.
+ *
+ * This routine is the release handler for the fops registered with
+ * the CXL services on an initial attach for a context. It is called
+ * when a close is performed on the adapter file descriptor returned
+ * to the user. Programmatically, the user is not required to perform
+ * the close, as it is handled internally via the detach ioctl when
+ * a context is being removed. Note that nothing prevents the user
+ * from performing a close, but the user should be aware that doing
+ * so is considered catastrophic and subsequent usage of the superpipe
+ * API with previously saved off tokens will fail.
+ *
+ * When initiated from an external close (either by the user or via
+ * a process tear down), the routine derives the context reference
+ * and calls detach for each LUN associated with the context. The
+ * final detach operation will cause the context itself to be freed.
+ * Note that the saved off lfd is reset prior to calling detach to
+ * signify that the final detach should not perform a close.
+ *
+ * When initiated from a detach operation as part of the tear down
+ * of a context, the context is first completely freed and then the
+ * close is performed. This routine will fail to derive the context
+ * reference (due to the context having already been freed) and then
+ * call into the CXL release entry point.
+ *
+ * Thus, with exception to when the CXL process element (context id)
+ * lookup fails (a case that should theoretically never occur), every
+ * call into this routine results in a complete freeing of a context.
+ *
+ * As part of the detach, all per-context resources associated with the LUN
+ * are cleaned up. When detaching the last LUN for a context, the context
+ * itself is cleaned up and released.
+ *
+ * Return: 0 on success
+ */
+static int cxlflash_cxl_release(struct inode *inode, struct file *file)
+{
+	struct cxl_context *ctx = cxl_fops_get_context(file);
+	struct cxlflash_cfg *cfg = container_of(file->f_op, struct cxlflash_cfg,
+						cxl_fops);
+	struct device *dev = &cfg->dev->dev;
+	struct ctx_info *ctxi = NULL;
+	struct dk_cxlflash_detach detach = { { 0 }, 0 };
+	struct lun_access *lun_access, *t;
+	enum ctx_ctrl ctrl = CTX_CTRL_ERR_FALLBACK | CTX_CTRL_FILE;
+	int ctxid;
+
+	ctxid = cxl_process_element(ctx);
+	if (unlikely(ctxid < 0)) {
+		dev_err(dev, "%s: Context %p was closed! (%d)\n",
+			__func__, ctx, ctxid);
+		goto out;
+	}
+
+	ctxi = get_context(cfg, ctxid, file, ctrl);
+	if (unlikely(!ctxi)) {
+		ctxi = get_context(cfg, ctxid, file, ctrl | CTX_CTRL_CLONE);
+		if (!ctxi) {
+			dev_dbg(dev, "%s: Context %d already free!\n",
+				__func__, ctxid);
+			goto out_release;
+		}
+
+		dev_dbg(dev, "%s: Another process owns context %d!\n",
+			__func__, ctxid);
+		put_context(ctxi);
+		goto out;
+	}
+
+	dev_dbg(dev, "%s: close(%d) for context %d\n",
+		__func__, ctxi->lfd, ctxid);
+
+	/* Reset the file descriptor to indicate we're on a close() thread */
+	ctxi->lfd = -1;
+	detach.context_id = ctxi->ctxid;
+	list_for_each_entry_safe(lun_access, t, &ctxi->luns, list)
+		_cxlflash_disk_detach(lun_access->sdev, ctxi, &detach);
+out_release:
+	cxl_fd_release(inode, file);
+out:
+	dev_dbg(dev, "%s: returning\n", __func__);
+	return 0;
+}
+
+/**
+ * unmap_context() - clears a previously established mapping
+ * @ctxi:	Context owning the mapping.
+ *
+ * This routine is used to switch between the error notification page
+ * (dummy page of all 1's) and the real mapping (established by the CXL
+ * fault handler).
+ */
+static void unmap_context(struct ctx_info *ctxi)
+{
+	unmap_mapping_range(ctxi->file->f_mapping, 0, 0, 1);
+}
+
+/**
+ * get_err_page() - obtains and allocates the error notification page
+ *
+ * Return: error notification page on success, NULL on failure
+ */
+static struct page *get_err_page(void)
+{
+	struct page *err_page = global.err_page;
+
+	if (unlikely(!err_page)) {
+		err_page = alloc_page(GFP_KERNEL);
+		if (unlikely(!err_page)) {
+			pr_err("%s: Unable to allocate err_page!\n", __func__);
+			goto out;
+		}
+
+		memset(page_address(err_page), -1, PAGE_SIZE);
+
+		/* Serialize update w/ other threads to avoid a leak */
+		mutex_lock(&global.mutex);
+		if (likely(!global.err_page))
+			global.err_page = err_page;
+		else {
+			__free_page(err_page);
+			err_page = global.err_page;
+		}
+		mutex_unlock(&global.mutex);
+	}
+
+out:
+	pr_debug("%s: returning err_page=%p\n", __func__, err_page);
+	return err_page;
+}
+
+/**
+ * cxlflash_mmap_fault() - mmap fault handler for adapter file descriptor
+ * @vma:	VM area associated with mapping.
+ * @vmf:	VM fault associated with current fault.
+ *
+ * To support error notification via MMIO, faults are 'caught' by this routine
+ * that was inserted before passing back the adapter file descriptor on attach.
+ * When a fault occurs, this routine evaluates if error recovery is active and
+ * if so, installs the error page to 'notify' the user about the error state.
+ * During normal operation, the fault is simply handled by the original fault
+ * handler that was installed by CXL services as part of initializing the
+ * adapter file descriptor. The VMA's page protection bits are toggled to
+ * indicate cached/not-cached depending on the memory backing the fault.
+ *
+ * Return: 0 on success, VM_FAULT_SIGBUS on failure
+ */
+static int cxlflash_mmap_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
+{
+	struct file *file = vma->vm_file;
+	struct cxl_context *ctx = cxl_fops_get_context(file);
+	struct cxlflash_cfg *cfg = container_of(file->f_op, struct cxlflash_cfg,
+						cxl_fops);
+	struct device *dev = &cfg->dev->dev;
+	struct ctx_info *ctxi = NULL;
+	struct page *err_page = NULL;
+	enum ctx_ctrl ctrl = CTX_CTRL_ERR_FALLBACK | CTX_CTRL_FILE;
+	int rc = 0;
+	int ctxid;
+
+	ctxid = cxl_process_element(ctx);
+	if (unlikely(ctxid < 0)) {
+		dev_err(dev, "%s: Context %p was closed! (%d)\n",
+			__func__, ctx, ctxid);
+		goto err;
+	}
+
+	ctxi = get_context(cfg, ctxid, file, ctrl);
+	if (unlikely(!ctxi)) {
+		dev_dbg(dev, "%s: Bad context! (%d)\n", __func__, ctxid);
+		goto err;
+	}
+
+	dev_dbg(dev, "%s: fault(%d) for context %d\n",
+		__func__, ctxi->lfd, ctxid);
+
+	if (likely(!ctxi->err_recovery_active)) {
+		vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
+		rc = ctxi->cxl_mmap_vmops->fault(vma, vmf);
+	} else {
+		dev_dbg(dev, "%s: err recovery active, use err_page!\n",
+			__func__);
+
+		err_page = get_err_page();
+		if (unlikely(!err_page)) {
+			dev_err(dev, "%s: Could not obtain error page!\n",
+				__func__);
+			rc = VM_FAULT_RETRY;
+			goto out;
+		}
+
+		get_page(err_page);
+		vmf->page = err_page;
+		vma->vm_page_prot = pgprot_cached(vma->vm_page_prot);
+	}
+
+out:
+	if (likely(ctxi))
+		put_context(ctxi);
+	dev_dbg(dev, "%s: returning rc=%d\n", __func__, rc);
+	return rc;
+
+err:
+	rc = VM_FAULT_SIGBUS;
+	goto out;
+}
+
+/*
+ * Local MMAP vmops to 'catch' faults
+ */
+static const struct vm_operations_struct cxlflash_mmap_vmops = {
+	.fault = cxlflash_mmap_fault,
+};
+
+/**
+ * cxlflash_cxl_mmap() - mmap handler for adapter file descriptor
+ * @file:	File installed with adapter file descriptor.
+ * @vma:	VM area associated with mapping.
+ *
+ * Installs local mmap vmops to 'catch' faults for error notification support.
+ *
+ * Return: 0 on success, -errno on failure
+ */
+static int cxlflash_cxl_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	struct cxl_context *ctx = cxl_fops_get_context(file);
+	struct cxlflash_cfg *cfg = container_of(file->f_op, struct cxlflash_cfg,
+						cxl_fops);
+	struct device *dev = &cfg->dev->dev;
+	struct ctx_info *ctxi = NULL;
+	enum ctx_ctrl ctrl = CTX_CTRL_ERR_FALLBACK | CTX_CTRL_FILE;
+	int ctxid;
+	int rc = 0;
+
+	ctxid = cxl_process_element(ctx);
+	if (unlikely(ctxid < 0)) {
+		dev_err(dev, "%s: Context %p was closed! (%d)\n",
+			__func__, ctx, ctxid);
+		rc = -EIO;
+		goto out;
+	}
+
+	ctxi = get_context(cfg, ctxid, file, ctrl);
+	if (unlikely(!ctxi)) {
+		dev_dbg(dev, "%s: Bad context! (%d)\n", __func__, ctxid);
+		rc = -EIO;
+		goto out;
+	}
+
+	dev_dbg(dev, "%s: mmap(%d) for context %d\n",
+		__func__, ctxi->lfd, ctxid);
+
+	rc = cxl_fd_mmap(file, vma);
+	if (likely(!rc)) {
+		/* Insert ourself in the mmap fault handler path */
+		ctxi->cxl_mmap_vmops = vma->vm_ops;
+		vma->vm_ops = &cxlflash_mmap_vmops;
+	}
+
+out:
+	if (likely(ctxi))
+		put_context(ctxi);
+	return rc;
+}
+
+/*
+ * Local fops for adapter file descriptor
+ */
+static const struct file_operations cxlflash_cxl_fops = {
+	.owner = THIS_MODULE,
+	.mmap = cxlflash_cxl_mmap,
+	.release = cxlflash_cxl_release,
+};
+
+/**
+ * cxlflash_mark_contexts_error() - move contexts to error state and list
+ * @cfg:	Internal structure associated with the host.
+ *
+ * A context is only moved over to the error list when there are no outstanding
+ * references to it. This ensures that a running operation has completed.
+ *
+ * Return: 0 on success, -errno on failure
+ */
+int cxlflash_mark_contexts_error(struct cxlflash_cfg *cfg)
+{
+	int i, rc = 0;
+	struct ctx_info *ctxi = NULL;
+
+	mutex_lock(&cfg->ctx_tbl_list_mutex);
+
+	for (i = 0; i < MAX_CONTEXT; i++) {
+		ctxi = cfg->ctx_tbl[i];
+		if (ctxi) {
+			mutex_lock(&ctxi->mutex);
+			cfg->ctx_tbl[i] = NULL;
+			list_add(&ctxi->list, &cfg->ctx_err_recovery);
+			ctxi->err_recovery_active = true;
+			ctxi->ctrl_map = NULL;
+			unmap_context(ctxi);
+			mutex_unlock(&ctxi->mutex);
+		}
+	}
+
+	mutex_unlock(&cfg->ctx_tbl_list_mutex);
+	return rc;
+}
+
+/*
+ * Dummy NULL fops
+ */
+static const struct file_operations null_fops = {
+	.owner = THIS_MODULE,
+};
+
+/**
+ * cxlflash_disk_attach() - attach a LUN to a context
+ * @sdev:	SCSI device associated with LUN.
+ * @attach:	Attach ioctl data structure.
+ *
+ * Creates a context and attaches LUN to it. A LUN can only be attached
+ * one time to a context (subsequent attaches for the same context/LUN pair
+ * are not supported). Additional LUNs can be attached to a context by
+ * specifying the 'reuse' flag defined in the cxlflash_ioctl.h header.
+ *
+ * Return: 0 on success, -errno on failure
+ */
+static int cxlflash_disk_attach(struct scsi_device *sdev,
+				struct dk_cxlflash_attach *attach)
+{
+	struct cxlflash_cfg *cfg = (struct cxlflash_cfg *)sdev->host->hostdata;
+	struct device *dev = &cfg->dev->dev;
+	struct afu *afu = cfg->afu;
+	struct llun_info *lli = sdev->hostdata;
+	struct glun_info *gli = lli->parent;
+	struct cxl_ioctl_start_work *work;
+	struct ctx_info *ctxi = NULL;
+	struct lun_access *lun_access = NULL;
+	int rc = 0;
+	u32 perms;
+	int ctxid = -1;
+	u64 rctxid = 0UL;
+	struct file *file;
+
+	struct cxl_context *ctx;
+
+	int fd = -1;
+
+	/* On first attach set fileops */
+	if (atomic_read(&cfg->num_user_contexts) == 0)
+		cfg->cxl_fops = cxlflash_cxl_fops;
+
+	if (attach->num_interrupts > 4) {
+		dev_dbg(dev, "%s: Cannot support this many interrupts %llu\n",
+			__func__, attach->num_interrupts);
+		rc = -EINVAL;
+		goto out;
+	}
+
+	if (gli->max_lba == 0) {
+		dev_dbg(dev, "%s: No capacity info for this LUN (%016llX)\n",
+			__func__, lli->lun_id[sdev->channel]);
+		rc = read_cap16(sdev, lli);
+		if (rc) {
+			dev_err(dev, "%s: Invalid device! (%d)\n",
+				__func__, rc);
+			rc = -ENODEV;
+			goto out;
+		}
+		dev_dbg(dev, "%s: LBA = %016llX\n", __func__, gli->max_lba);
+		dev_dbg(dev, "%s: BLK_LEN = %08X\n", __func__, gli->blk_len);
+	}
+
+	if (attach->hdr.flags & DK_CXLFLASH_ATTACH_REUSE_CONTEXT) {
+		rctxid = attach->context_id;
+		ctxi = get_context(cfg, rctxid, NULL, 0);
+		if (!ctxi) {
+			dev_dbg(dev, "%s: Bad context! (%016llX)\n",
+				__func__, rctxid);
+			rc = -EINVAL;
+			goto out;
+		}
+
+		list_for_each_entry(lun_access, &ctxi->luns, list)
+			if (lun_access->lli == lli) {
+				dev_dbg(dev, "%s: Already attached!\n",
+					__func__);
+				rc = -EINVAL;
+				goto out;
+			}
+	}
+
+	lun_access = kzalloc(sizeof(*lun_access), GFP_KERNEL);
+	if (unlikely(!lun_access)) {
+		dev_err(dev, "%s: Unable to allocate lun_access!\n", __func__);
+		rc = -ENOMEM;
+		goto out;
+	}
+
+	lun_access->lli = lli;
+	lun_access->sdev = sdev;
+
+	/* Non-NULL context indicates reuse */
+	if (ctxi) {
+		dev_dbg(dev, "%s: Reusing context for LUN! (%016llX)\n",
+			__func__, rctxid);
+		list_add(&lun_access->list, &ctxi->luns);
+		fd = ctxi->lfd;
+		goto out_attach;
+	}
+
+	ctx = cxl_dev_context_init(cfg->dev);
+	if (unlikely(IS_ERR_OR_NULL(ctx))) {
+		dev_err(dev, "%s: Could not initialize context %p\n",
+			__func__, ctx);
+		rc = -ENODEV;
+		goto err0;
+	}
+
+	ctxid = cxl_process_element(ctx);
+	if (unlikely((ctxid > MAX_CONTEXT) || (ctxid < 0))) {
+		dev_err(dev, "%s: ctxid (%d) invalid!\n", __func__, ctxid);
+		rc = -EPERM;
+		goto err1;
+	}
+
+	file = cxl_get_fd(ctx, &cfg->cxl_fops, &fd);
+	if (unlikely(fd < 0)) {
+		rc = -ENODEV;
+		dev_err(dev, "%s: Could not get file descriptor\n", __func__);
+		goto err1;
+	}
+
+	/* Translate read/write O_* flags from fcntl.h to AFU permission bits */
+	perms = SISL_RHT_PERM(attach->hdr.flags + 1);
+
+	ctxi = create_context(cfg, ctx, ctxid, fd, file, perms);
+	if (unlikely(!ctxi)) {
+		dev_err(dev, "%s: Failed to create context! (%d)\n",
+			__func__, ctxid);
+		goto err2;
+	}
+
+	work = &ctxi->work;
+	work->num_interrupts = attach->num_interrupts;
+	work->flags = CXL_START_WORK_NUM_IRQS;
+
+	rc = cxl_start_work(ctx, work);
+	if (unlikely(rc)) {
+		dev_dbg(dev, "%s: Could not start context rc=%d\n",
+			__func__, rc);
+		goto err3;
+	}
+
+	rc = afu_attach(cfg, ctxi);
+	if (unlikely(rc)) {
+		dev_err(dev, "%s: Could not attach AFU rc %d\n", __func__, rc);
+		goto err4;
+	}
+
+	/*
+	 * No error paths after this point. Once the fd is installed it's
+	 * visible to user space and can't be undone safely on this thread.
+	 * There is no need to worry about a deadlock here because no one
+	 * knows about us yet; we can be the only one holding our mutex.
+	 */
+	list_add(&lun_access->list, &ctxi->luns);
+	mutex_unlock(&ctxi->mutex);
+	mutex_lock(&cfg->ctx_tbl_list_mutex);
+	mutex_lock(&ctxi->mutex);
+	cfg->ctx_tbl[ctxid] = ctxi;
+	mutex_unlock(&cfg->ctx_tbl_list_mutex);
+	fd_install(fd, file);
+
+out_attach:
+	attach->hdr.return_flags = 0;
+	attach->context_id = ctxi->ctxid;
+	attach->block_size = gli->blk_len;
+	attach->mmio_size = sizeof(afu->afu_map->hosts[0].harea);
+	attach->last_lba = gli->max_lba;
+	attach->max_xfer = (sdev->host->max_sectors * 512) / gli->blk_len;
+
+out:
+	attach->adap_fd = fd;
+
+	if (ctxi)
+		put_context(ctxi);
+
+	dev_dbg(dev, "%s: returning ctxid=%d fd=%d bs=%lld rc=%d llba=%lld\n",
+		__func__, ctxid, fd, attach->block_size, rc, attach->last_lba);
+	return rc;
+
+err4:
+	cxl_stop_context(ctx);
+err3:
+	put_context(ctxi);
+	destroy_context(cfg, ctxi);
+	ctxi = NULL;
+err2:
+	/*
+	 * Here, we're overriding the fops with a dummy all-NULL fops because
+	 * fput() calls the release fop, which will cause us to mistakenly
+	 * call into the CXL code. Rather than try to add yet more complexity
+	 * to that routine (cxlflash_cxl_release) we should try to fix the
+	 * issue here.
+	 */
+	file->f_op = &null_fops;
+	fput(file);
+	put_unused_fd(fd);
+	fd = -1;
+err1:
+	cxl_release_context(ctx);
+err0:
+	kfree(lun_access);
+	goto out;
+}
+
+/**
+ * recover_context() - recovers a context in error
+ * @cfg:	Internal structure associated with the host.
+ * @ctxi:	Context to release.
+ *
+ * Restablishes the state for a context-in-error.
+ *
+ * Return: 0 on success, -errno on failure
+ */
+static int recover_context(struct cxlflash_cfg *cfg, struct ctx_info *ctxi)
+{
+	struct device *dev = &cfg->dev->dev;
+	int rc = 0;
+	int old_fd, fd = -1;
+	int ctxid = -1;
+	struct file *file;
+	struct cxl_context *ctx;
+	struct afu *afu = cfg->afu;
+
+	ctx = cxl_dev_context_init(cfg->dev);
+	if (unlikely(IS_ERR_OR_NULL(ctx))) {
+		dev_err(dev, "%s: Could not initialize context %p\n",
+			__func__, ctx);
+		rc = -ENODEV;
+		goto out;
+	}
+
+	ctxid = cxl_process_element(ctx);
+	if (unlikely((ctxid > MAX_CONTEXT) || (ctxid < 0))) {
+		dev_err(dev, "%s: ctxid (%d) invalid!\n", __func__, ctxid);
+		rc = -EPERM;
+		goto err1;
+	}
+
+	file = cxl_get_fd(ctx, &cfg->cxl_fops, &fd);
+	if (unlikely(fd < 0)) {
+		rc = -ENODEV;
+		dev_err(dev, "%s: Could not get file descriptor\n", __func__);
+		goto err1;
+	}
+
+	rc = cxl_start_work(ctx, &ctxi->work);
+	if (unlikely(rc)) {
+		dev_dbg(dev, "%s: Could not start context rc=%d\n",
+			__func__, rc);
+		goto err2;
+	}
+
+	/* Update with new MMIO area based on updated context id */
+	ctxi->ctrl_map = &afu->afu_map->ctrls[ctxid].ctrl;
+
+	rc = afu_attach(cfg, ctxi);
+	if (rc) {
+		dev_err(dev, "%s: Could not attach AFU rc %d\n", __func__, rc);
+		goto err3;
+	}
+
+	/*
+	 * No error paths after this point. Once the fd is installed it's
+	 * visible to user space and can't be undone safely on this thread.
+	 */
+	old_fd = ctxi->lfd;
+	ctxi->ctxid = ENCODE_CTXID(ctxi, ctxid);
+	ctxi->lfd = fd;
+	ctxi->ctx = ctx;
+	ctxi->file = file;
+
+	/*
+	 * Put context back in table (note the reinit of the context list);
+	 * we must first drop the context's mutex and then acquire it in
+	 * order with the table/list mutex to avoid a deadlock - safe to do
+	 * here because no one can find us at this moment in time.
+	 */
+	mutex_unlock(&ctxi->mutex);
+	mutex_lock(&cfg->ctx_tbl_list_mutex);
+	mutex_lock(&ctxi->mutex);
+	list_del_init(&ctxi->list);
+	cfg->ctx_tbl[ctxid] = ctxi;
+	mutex_unlock(&cfg->ctx_tbl_list_mutex);
+	fd_install(fd, file);
+
+	/* Release the original adapter fd and associated CXL resources */
+	sys_close(old_fd);
+out:
+	dev_dbg(dev, "%s: returning ctxid=%d fd=%d rc=%d\n",
+		__func__, ctxid, fd, rc);
+	return rc;
+
+err3:
+	cxl_stop_context(ctx);
+err2:
+	fput(file);
+	put_unused_fd(fd);
+err1:
+	cxl_release_context(ctx);
+	goto out;
+}
+
+/**
+ * check_state() - checks and responds to the current adapter state
+ * @cfg:	Internal structure associated with the host.
+ *
+ * This routine can block and should only be used on process context.
+ * Note that when waking up from waiting in limbo, the state is unknown
+ * and must be checked again before proceeding.
+ *
+ * Return: 0 on success, -errno on failure
+ */
+static int check_state(struct cxlflash_cfg *cfg)
+{
+	struct device *dev = &cfg->dev->dev;
+	int rc = 0;
+
+retry:
+	switch (cfg->state) {
+	case STATE_LIMBO:
+		dev_dbg(dev, "%s: Limbo, going to wait...\n", __func__);
+		rc = wait_event_interruptible(cfg->limbo_waitq,
+					      cfg->state != STATE_LIMBO);
+		if (unlikely(rc))
+			break;
+		goto retry;
+	case STATE_FAILTERM:
+		dev_dbg(dev, "%s: Failed/Terminating!\n", __func__);
+		rc = -ENODEV;
+		break;
+	default:
+		break;
+	}
+
+	return rc;
+}
+
+/**
+ * cxlflash_afu_recover() - initiates AFU recovery
+ * @sdev:	SCSI device associated with LUN.
+ * @recover:	Recover ioctl data structure.
+ *
+ * Only a single recovery is allowed at a time to avoid exhausting CXL
+ * resources (leading to recovery failure) in the event that we're up
+ * against the maximum number of contexts limit. For similar reasons,
+ * a context recovery is retried if there are multiple recoveries taking
+ * place at the same time and the failure was due to CXL services being
+ * unable to keep up.
+ *
+ * Because a user can detect an error condition before the kernel, it is
+ * quite possible for this routine to act as the kernel's EEH detection
+ * source (MMIO read of mbox_r). Because of this, there is a window of
+ * time where an EEH might have been detected but not yet 'serviced'
+ * (callback invoked, causing the device to enter limbo state). To avoid
+ * looping in this routine during that window, a 1 second sleep is in place
+ * between the time the MMIO failure is detected and the time a wait on the
+ * limbo wait queue is attempted via check_state().
+ *
+ * Return: 0 on success, -errno on failure
+ */
+static int cxlflash_afu_recover(struct scsi_device *sdev,
+				struct dk_cxlflash_recover_afu *recover)
+{
+	struct cxlflash_cfg *cfg = (struct cxlflash_cfg *)sdev->host->hostdata;
+	struct device *dev = &cfg->dev->dev;
+	struct llun_info *lli = sdev->hostdata;
+	struct afu *afu = cfg->afu;
+	struct ctx_info *ctxi = NULL;
+	struct mutex *mutex = &cfg->ctx_recovery_mutex;
+	u64 ctxid = DECODE_CTXID(recover->context_id),
+	    rctxid = recover->context_id;
+	long reg;
+	int lretry = 20; /* up to 2 seconds */
+	int rc = 0;
+
+	atomic_inc(&cfg->recovery_threads);
+	rc = mutex_lock_interruptible(mutex);
+	if (rc)
+		goto out;
+
+	dev_dbg(dev, "%s: reason 0x%016llX rctxid=%016llX\n",
+		__func__, recover->reason, rctxid);
+
+retry:
+	/* Ensure that this process is attached to the context */
+	ctxi = get_context(cfg, rctxid, lli, CTX_CTRL_ERR_FALLBACK);
+	if (unlikely(!ctxi)) {
+		dev_dbg(dev, "%s: Bad context! (%llu)\n", __func__, ctxid);
+		rc = -EINVAL;
+		goto out;
+	}
+
+	if (ctxi->err_recovery_active) {
+retry_recover:
+		rc = recover_context(cfg, ctxi);
+		if (unlikely(rc)) {
+			dev_err(dev, "%s: Recovery failed for context %llu (rc=%d)\n",
+				__func__, ctxid, rc);
+			if ((rc == -ENODEV) &&
+			    ((atomic_read(&cfg->recovery_threads) > 1) ||
+			     (lretry--))) {
+				dev_dbg(dev, "%s: Going to try again!\n",
+					__func__);
+				mutex_unlock(mutex);
+				msleep(100);
+				rc = mutex_lock_interruptible(mutex);
+				if (rc)
+					goto out;
+				goto retry_recover;
+			}
+
+			goto out;
+		}
+
+		ctxi->err_recovery_active = false;
+		recover->context_id = ctxi->ctxid;
+		recover->adap_fd = ctxi->lfd;
+		recover->mmio_size = sizeof(afu->afu_map->hosts[0].harea);
+		recover->hdr.return_flags |=
+			DK_CXLFLASH_RECOVER_AFU_CONTEXT_RESET;
+		goto out;
+	}
+
+	/* Test if in error state */
+	reg = readq_be(&afu->ctrl_map->mbox_r);
+	if (reg == -1) {
+		dev_dbg(dev, "%s: MMIO read fail! Wait for recovery...\n",
+			__func__);
+		mutex_unlock(&ctxi->mutex);
+		ctxi = NULL;
+		ssleep(1);
+		rc = check_state(cfg);
+		if (unlikely(rc))
+			goto out;
+		goto retry;
+	}
+
+	dev_dbg(dev, "%s: MMIO working, no recovery required!\n", __func__);
+out:
+	if (likely(ctxi))
+		put_context(ctxi);
+	mutex_unlock(mutex);
+	atomic_dec_if_positive(&cfg->recovery_threads);
+	return rc;
+}
+
+/**
+ * process_sense() - evaluates and processes sense data
+ * @sdev:	SCSI device associated with LUN.
+ * @verify:	Verify ioctl data structure.
+ *
+ * Return: 0 on success, -errno on failure
+ */
+static int process_sense(struct scsi_device *sdev,
+			 struct dk_cxlflash_verify *verify)
+{
+	struct cxlflash_cfg *cfg = (struct cxlflash_cfg *)sdev->host->hostdata;
+	struct device *dev = &cfg->dev->dev;
+	struct llun_info *lli = sdev->hostdata;
+	struct glun_info *gli = lli->parent;
+	u64 prev_lba = gli->max_lba;
+	struct scsi_sense_hdr sshdr = { 0 };
+	int rc = 0;
+
+	rc = scsi_normalize_sense((const u8 *)&verify->sense_data,
+				  DK_CXLFLASH_VERIFY_SENSE_LEN, &sshdr);
+	if (!rc) {
+		dev_err(dev, "%s: Failed to normalize sense data!\n", __func__);
+		rc = -EINVAL;
+		goto out;
+	}
+
+	switch (sshdr.sense_key) {
+	case NO_SENSE:
+	case RECOVERED_ERROR:
+		/* fall through */
+	case NOT_READY:
+		break;
+	case UNIT_ATTENTION:
+		switch (sshdr.asc) {
+		case 0x29: /* Power on Reset or Device Reset */
+			/* fall through */
+		case 0x2A: /* Device settings/capacity changed */
+			rc = read_cap16(sdev, lli);
+			if (rc) {
+				rc = -ENODEV;
+				break;
+			}
+			if (prev_lba != gli->max_lba)
+				dev_dbg(dev, "%s: Capacity changed old=%lld "
+					"new=%lld\n", __func__, prev_lba,
+					gli->max_lba);
+			break;
+		case 0x3F: /* Report LUNs changed, Rescan. */
+			scsi_scan_host(cfg->host);
+			break;
+		default:
+			rc = -EIO;
+			break;
+		}
+		break;
+	default:
+		rc = -EIO;
+		break;
+	}
+out:
+	dev_dbg(dev, "%s: sense_key %x asc %x ascq %x rc %d\n", __func__,
+		sshdr.sense_key, sshdr.asc, sshdr.ascq, rc);
+	return rc;
+}
+
+/**
+ * cxlflash_disk_verify() - verifies a LUN is the same and handle size changes
+ * @sdev:	SCSI device associated with LUN.
+ * @verify:	Verify ioctl data structure.
+ *
+ * Return: 0 on success, -errno on failure
+ */
+static int cxlflash_disk_verify(struct scsi_device *sdev,
+				struct dk_cxlflash_verify *verify)
+{
+	int rc = 0;
+	struct ctx_info *ctxi = NULL;
+	struct cxlflash_cfg *cfg = (struct cxlflash_cfg *)sdev->host->hostdata;
+	struct device *dev = &cfg->dev->dev;
+	struct llun_info *lli = sdev->hostdata;
+	struct glun_info *gli = lli->parent;
+	struct sisl_rht_entry *rhte = NULL;
+	res_hndl_t rhndl = verify->rsrc_handle;
+	u64 ctxid = DECODE_CTXID(verify->context_id),
+	    rctxid = verify->context_id;
+	u64 last_lba = 0;
+
+	dev_dbg(dev, "%s: ctxid=%llu rhndl=%016llX, hint=%016llX, "
+		"flags=%016llX\n", __func__, ctxid, verify->rsrc_handle,
+		verify->hint, verify->hdr.flags);
+
+	ctxi = get_context(cfg, rctxid, lli, 0);
+	if (unlikely(!ctxi)) {
+		dev_dbg(dev, "%s: Bad context! (%llu)\n", __func__, ctxid);
+		rc = -EINVAL;
+		goto out;
+	}
+
+	rhte = get_rhte(ctxi, rhndl, lli);
+	if (unlikely(!rhte)) {
+		dev_dbg(dev, "%s: Bad resource handle! (%d)\n",
+			__func__, rhndl);
+		rc = -EINVAL;
+		goto out;
+	}
+
+	/*
+	 * Look at the hint/sense to see if it requires us to redrive
+	 * inquiry (i.e. the Unit attention is due to the WWN changing).
+	 */
+	if (verify->hint & DK_CXLFLASH_VERIFY_HINT_SENSE) {
+		rc = process_sense(sdev, verify);
+		if (unlikely(rc)) {
+			dev_err(dev, "%s: Failed to validate sense data (%d)\n",
+				__func__, rc);
+			goto out;
+		}
+	}
+
+	switch (gli->mode) {
+	case MODE_PHYSICAL:
+		last_lba = gli->max_lba;
+		break;
+	default:
+		WARN(1, "Unsupported LUN mode!");
+	}
+
+	verify->last_lba = last_lba;
+
+out:
+	if (likely(ctxi))
+		put_context(ctxi);
+	dev_dbg(dev, "%s: returning rc=%d llba=%llX\n",
+		__func__, rc, verify->last_lba);
+	return rc;
+}
+
+/**
+ * decode_ioctl() - translates an encoded ioctl to an easily identifiable string
+ * @cmd:	The ioctl command to decode.
+ *
+ * Return: A string identifying the decoded ioctl.
+ */
+static char *decode_ioctl(int cmd)
+{
+	switch (cmd) {
+	case DK_CXLFLASH_ATTACH:
+		return __stringify_1(DK_CXLFLASH_ATTACH);
+	case DK_CXLFLASH_USER_DIRECT:
+		return __stringify_1(DK_CXLFLASH_USER_DIRECT);
+	case DK_CXLFLASH_RELEASE:
+		return __stringify_1(DK_CXLFLASH_RELEASE);
+	case DK_CXLFLASH_DETACH:
+		return __stringify_1(DK_CXLFLASH_DETACH);
+	case DK_CXLFLASH_VERIFY:
+		return __stringify_1(DK_CXLFLASH_VERIFY);
+	case DK_CXLFLASH_RECOVER_AFU:
+		return __stringify_1(DK_CXLFLASH_RECOVER_AFU);
+	case DK_CXLFLASH_MANAGE_LUN:
+		return __stringify_1(DK_CXLFLASH_MANAGE_LUN);
+	}
+
+	return "UNKNOWN";
+}
+
+/**
+ * cxlflash_disk_direct_open() - opens a direct (physical) disk
+ * @sdev:	SCSI device associated with LUN.
+ * @arg:	UDirect ioctl data structure.
+ *
+ * On successful return, the user is informed of the resource handle
+ * to be used to identify the direct lun and the size (in blocks) of
+ * the direct lun in last LBA format.
+ *
+ * Return: 0 on success, -errno on failure
+ */
+static int cxlflash_disk_direct_open(struct scsi_device *sdev, void *arg)
+{
+	struct cxlflash_cfg *cfg = (struct cxlflash_cfg *)sdev->host->hostdata;
+	struct device *dev = &cfg->dev->dev;
+	struct afu *afu = cfg->afu;
+	struct llun_info *lli = sdev->hostdata;
+	struct glun_info *gli = lli->parent;
+
+	struct dk_cxlflash_udirect *pphys = (struct dk_cxlflash_udirect *)arg;
+
+	u64 ctxid = DECODE_CTXID(pphys->context_id),
+	    rctxid = pphys->context_id;
+	u64 lun_size = 0;
+	u64 last_lba = 0;
+	u64 rsrc_handle = -1;
+	u32 port = CHAN2PORT(sdev->channel);
+
+	int rc = 0;
+
+	struct ctx_info *ctxi = NULL;
+	struct sisl_rht_entry *rhte = NULL;
+
+	pr_debug("%s: ctxid=%llu ls=0x%llx\n", __func__, ctxid, lun_size);
+
+	rc = cxlflash_lun_attach(gli, MODE_PHYSICAL, false);
+	if (unlikely(rc)) {
+		dev_dbg(dev, "%s: Failed to attach to LUN! (PHYSICAL)\n",
+			__func__);
+		goto out;
+	}
+
+	ctxi = get_context(cfg, rctxid, lli, 0);
+	if (unlikely(!ctxi)) {
+		dev_dbg(dev, "%s: Bad context! (%llu)\n", __func__, ctxid);
+		rc = -EINVAL;
+		goto err1;
+	}
+
+	rhte = rhte_checkout(ctxi, lli);
+	if (unlikely(!rhte)) {
+		dev_dbg(dev, "%s: too many opens for this context\n", __func__);
+		rc = -EMFILE;	/* too many opens  */
+		goto err1;
+	}
+
+	rsrc_handle = (rhte - ctxi->rht_start);
+
+	rht_format1(rhte, lli->lun_id[sdev->channel], ctxi->rht_perms, port);
+	cxlflash_afu_sync(afu, ctxid, rsrc_handle, AFU_LW_SYNC);
+
+	last_lba = gli->max_lba;
+	pphys->hdr.return_flags = 0;
+	pphys->last_lba = last_lba;
+	pphys->rsrc_handle = rsrc_handle;
+
+out:
+	if (likely(ctxi))
+		put_context(ctxi);
+	dev_dbg(dev, "%s: returning handle 0x%llx rc=%d llba %lld\n",
+		__func__, rsrc_handle, rc, last_lba);
+	return rc;
+
+err1:
+	cxlflash_lun_detach(gli);
+	goto out;
+}
+
+/**
+ * ioctl_common() - common IOCTL handler for driver
+ * @sdev:	SCSI device associated with LUN.
+ * @cmd:	IOCTL command.
+ *
+ * Handles common fencing operations that are valid for multiple ioctls. Always
+ * allow through ioctls that are cleanup oriented in nature, even when operating
+ * in a failed/terminating state.
+ *
+ * Return: 0 on success, -errno on failure
+ */
+static int ioctl_common(struct scsi_device *sdev, int cmd)
+{
+	struct cxlflash_cfg *cfg = (struct cxlflash_cfg *)sdev->host->hostdata;
+	struct device *dev = &cfg->dev->dev;
+	struct llun_info *lli = sdev->hostdata;
+	int rc = 0;
+
+	if (unlikely(!lli)) {
+		dev_dbg(dev, "%s: Unknown LUN\n", __func__);
+		rc = -EINVAL;
+		goto out;
+	}
+
+	rc = check_state(cfg);
+	if (unlikely(rc) && (cfg->state == STATE_FAILTERM)) {
+		switch (cmd) {
+		case DK_CXLFLASH_RELEASE:
+		case DK_CXLFLASH_DETACH:
+			dev_dbg(dev, "%s: Command override! (%d)\n",
+				__func__, rc);
+			rc = 0;
+			break;
+		}
+	}
+out:
+	return rc;
+}
+
+/**
+ * cxlflash_ioctl() - IOCTL handler for driver
+ * @sdev:	SCSI device associated with LUN.
+ * @cmd:	IOCTL command.
+ * @arg:	Userspace ioctl data structure.
+ *
+ * Return: 0 on success, -errno on failure
+ */
+int cxlflash_ioctl(struct scsi_device *sdev, int cmd, void __user *arg)
+{
+	typedef int (*sioctl) (struct scsi_device *, void *);
+
+	struct cxlflash_cfg *cfg = (struct cxlflash_cfg *)sdev->host->hostdata;
+	struct device *dev = &cfg->dev->dev;
+	struct afu *afu = cfg->afu;
+	struct dk_cxlflash_hdr *hdr;
+	char buf[sizeof(union cxlflash_ioctls)];
+	size_t size = 0;
+	bool known_ioctl = false;
+	int idx;
+	int rc = 0;
+	struct Scsi_Host *shost = sdev->host;
+	sioctl do_ioctl = NULL;
+
+	static const struct {
+		size_t size;
+		sioctl ioctl;
+	} ioctl_tbl[] = {	/* NOTE: order matters here */
+	{sizeof(struct dk_cxlflash_attach), (sioctl)cxlflash_disk_attach},
+	{sizeof(struct dk_cxlflash_udirect), cxlflash_disk_direct_open},
+	{sizeof(struct dk_cxlflash_release), (sioctl)cxlflash_disk_release},
+	{sizeof(struct dk_cxlflash_detach), (sioctl)cxlflash_disk_detach},
+	{sizeof(struct dk_cxlflash_verify), (sioctl)cxlflash_disk_verify},
+	{sizeof(struct dk_cxlflash_recover_afu), (sioctl)cxlflash_afu_recover},
+	{sizeof(struct dk_cxlflash_manage_lun), (sioctl)cxlflash_manage_lun},
+	};
+
+	/* Restrict command set to physical support only for internal LUN */
+	if (afu->internal_lun)
+		switch (cmd) {
+		case DK_CXLFLASH_RELEASE:
+			dev_dbg(dev, "%s: %s not supported for lun_mode=%d\n",
+				__func__, decode_ioctl(cmd), afu->internal_lun);
+			rc = -EINVAL;
+			goto cxlflash_ioctl_exit;
+		}
+
+	switch (cmd) {
+	case DK_CXLFLASH_ATTACH:
+	case DK_CXLFLASH_USER_DIRECT:
+	case DK_CXLFLASH_RELEASE:
+	case DK_CXLFLASH_DETACH:
+	case DK_CXLFLASH_VERIFY:
+	case DK_CXLFLASH_RECOVER_AFU:
+		dev_dbg(dev, "%s: %s (%08X) on dev(%d/%d/%d/%llu)\n",
+			__func__, decode_ioctl(cmd), cmd, shost->host_no,
+			sdev->channel, sdev->id, sdev->lun);
+		rc = ioctl_common(sdev, cmd);
+		if (unlikely(rc))
+			goto cxlflash_ioctl_exit;
+
+		/* fall through */
+
+	case DK_CXLFLASH_MANAGE_LUN:
+		known_ioctl = true;
+		idx = _IOC_NR(cmd) - _IOC_NR(DK_CXLFLASH_ATTACH);
+		size = ioctl_tbl[idx].size;
+		do_ioctl = ioctl_tbl[idx].ioctl;
+
+		if (likely(do_ioctl))
+			break;
+
+		/* fall through */
+	default:
+		rc = -EINVAL;
+		goto cxlflash_ioctl_exit;
+	}
+
+	if (unlikely(copy_from_user(&buf, arg, size))) {
+		dev_err(dev, "%s: copy_from_user() fail! "
+			"size=%lu cmd=%d (%s) arg=%p\n",
+			__func__, size, cmd, decode_ioctl(cmd), arg);
+		rc = -EFAULT;
+		goto cxlflash_ioctl_exit;
+	}
+
+	hdr = (struct dk_cxlflash_hdr *)&buf;
+	if (hdr->version != DK_CXLFLASH_VERSION_0) {
+		dev_dbg(dev, "%s: Version %u not supported for %s\n",
+			__func__, hdr->version, decode_ioctl(cmd));
+		rc = -EINVAL;
+		goto cxlflash_ioctl_exit;
+	}
+
+	if (hdr->rsvd[0] || hdr->rsvd[1] || hdr->rsvd[2] || hdr->return_flags) {
+		dev_dbg(dev, "%s: Reserved/rflags populated!\n", __func__);
+		rc = -EINVAL;
+		goto cxlflash_ioctl_exit;
+	}
+
+	rc = do_ioctl(sdev, (void *)&buf);
+	if (likely(!rc))
+		if (unlikely(copy_to_user(arg, &buf, size))) {
+			dev_err(dev, "%s: copy_to_user() fail! "
+				"size=%lu cmd=%d (%s) arg=%p\n",
+				__func__, size, cmd, decode_ioctl(cmd), arg);
+			rc = -EFAULT;
+		}
+
+	/* fall through to exit */
+
+cxlflash_ioctl_exit:
+	if (unlikely(rc && known_ioctl))
+		dev_err(dev, "%s: ioctl %s (%08X) on dev(%d/%d/%d/%llu) "
+			"returned rc %d\n", __func__,
+			decode_ioctl(cmd), cmd, shost->host_no,
+			sdev->channel, sdev->id, sdev->lun, rc);
+	else
+		dev_dbg(dev, "%s: ioctl %s (%08X) on dev(%d/%d/%d/%llu) "
+			"returned rc %d\n", __func__, decode_ioctl(cmd),
+			cmd, shost->host_no, sdev->channel, sdev->id,
+			sdev->lun, rc);
+	return rc;
+}
diff --git a/drivers/scsi/cxlflash/superpipe.h b/drivers/scsi/cxlflash/superpipe.h
new file mode 100644
index 000000000000..ae39b9627118
--- /dev/null
+++ b/drivers/scsi/cxlflash/superpipe.h
@@ -0,0 +1,132 @@
+/*
+ * CXL Flash Device Driver
+ *
+ * Written by: Manoj N. Kumar <manoj@linux.vnet.ibm.com>, IBM Corporation
+ *             Matthew R. Ochs <mrochs@linux.vnet.ibm.com>, IBM Corporation
+ *
+ * Copyright (C) 2015 IBM Corporation
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#ifndef _CXLFLASH_SUPERPIPE_H
+#define _CXLFLASH_SUPERPIPE_H
+
+extern struct cxlflash_global global;
+
+/*
+ * Terminology: use afu (and not adapter) to refer to the HW.
+ * Adapter is the entire slot and includes PSL out of which
+ * only the AFU is visible to user space.
+ */
+
+/* Chunk size parms: note sislite minimum chunk size is
+   0x10000 LBAs corresponding to a NMASK or 16.
+*/
+#define MC_CHUNK_SIZE     (1 << MC_RHT_NMASK)	/* in LBAs */
+
+#define MC_DISCOVERY_TIMEOUT 5  /* 5 secs */
+
+#define CHAN2PORT(_x)	((_x) + 1)
+
+enum lun_mode {
+	MODE_NONE = 0,
+	MODE_PHYSICAL
+};
+
+/* Global (entire driver, spans adapters) lun_info structure */
+struct glun_info {
+	u64 max_lba;		/* from read cap(16) */
+	u32 blk_len;		/* from read cap(16) */
+	enum lun_mode mode;	/* NONE, PHYSICAL */
+	int users;		/* Number of users w/ references to LUN */
+
+	u8 wwid[16];
+
+	struct mutex mutex;
+
+	struct list_head list;
+};
+
+/* Local (per-adapter) lun_info structure */
+struct llun_info {
+	u64 lun_id[CXLFLASH_NUM_FC_PORTS]; /* from REPORT_LUNS */
+	u32 lun_index;		/* Index in the LUN table */
+	u32 host_no;		/* host_no from Scsi_host */
+	u32 port_sel;		/* What port to use for this LUN */
+	bool newly_created;	/* Whether the LUN was just discovered */
+
+	u8 wwid[16];		/* Keep a duplicate copy here? */
+
+	struct glun_info *parent; /* Pointer to entry in global LUN structure */
+	struct scsi_device *sdev;
+	struct list_head list;
+};
+
+struct lun_access {
+	struct llun_info *lli;
+	struct scsi_device *sdev;
+	struct list_head list;
+};
+
+enum ctx_ctrl {
+	CTX_CTRL_CLONE		= (1 << 1),
+	CTX_CTRL_ERR		= (1 << 2),
+	CTX_CTRL_ERR_FALLBACK	= (1 << 3),
+	CTX_CTRL_NOPID		= (1 << 4),
+	CTX_CTRL_FILE		= (1 << 5)
+};
+
+#define ENCODE_CTXID(_ctx, _id)	(((((u64)_ctx) & 0xFFFFFFFF0) << 28) | _id)
+#define DECODE_CTXID(_val)	(_val & 0xFFFFFFFF)
+
+struct ctx_info {
+	struct sisl_ctrl_map *ctrl_map; /* initialized at startup */
+	struct sisl_rht_entry *rht_start; /* 1 page (req'd for alignment),
+					     alloc/free on attach/detach */
+	u32 rht_out;		/* Number of checked out RHT entries */
+	u32 rht_perms;		/* User-defined permissions for RHT entries */
+	struct llun_info **rht_lun;       /* Mapping of RHT entries to LUNs */
+
+	struct cxl_ioctl_start_work work;
+	u64 ctxid;
+	int lfd;
+	pid_t pid;
+	bool unavail;
+	bool err_recovery_active;
+	struct mutex mutex; /* Context protection */
+	struct cxl_context *ctx;
+	struct list_head luns;	/* LUNs attached to this context */
+	const struct vm_operations_struct *cxl_mmap_vmops;
+	struct file *file;
+	struct list_head list; /* Link contexts in error recovery */
+};
+
+struct cxlflash_global {
+	struct mutex mutex;
+	struct list_head gluns;/* list of glun_info structs */
+	struct page *err_page; /* One page of all 0xF for error notification */
+};
+
+int cxlflash_disk_release(struct scsi_device *, struct dk_cxlflash_release *);
+int _cxlflash_disk_release(struct scsi_device *, struct ctx_info *,
+			   struct dk_cxlflash_release *);
+
+int cxlflash_lun_attach(struct glun_info *, enum lun_mode, bool);
+void cxlflash_lun_detach(struct glun_info *);
+
+struct ctx_info *get_context(struct cxlflash_cfg *, u64, void *, enum ctx_ctrl);
+void put_context(struct ctx_info *);
+
+struct sisl_rht_entry *get_rhte(struct ctx_info *, res_hndl_t,
+				struct llun_info *);
+
+struct sisl_rht_entry *rhte_checkout(struct ctx_info *, struct llun_info *);
+void rhte_checkin(struct ctx_info *, struct sisl_rht_entry *);
+
+int cxlflash_manage_lun(struct scsi_device *, struct dk_cxlflash_manage_lun *);
+
+#endif /* ifndef _CXLFLASH_SUPERPIPE_H */
diff --git a/include/uapi/scsi/Kbuild b/include/uapi/scsi/Kbuild
index 75746d52f208..d791e0ad509d 100644
--- a/include/uapi/scsi/Kbuild
+++ b/include/uapi/scsi/Kbuild
@@ -3,3 +3,4 @@ header-y += fc/
 header-y += scsi_bsg_fc.h
 header-y += scsi_netlink.h
 header-y += scsi_netlink_fc.h
+header-y += cxlflash_ioctl.h
diff --git a/include/uapi/scsi/cxlflash_ioctl.h b/include/uapi/scsi/cxlflash_ioctl.h
new file mode 100644
index 000000000000..570773406531
--- /dev/null
+++ b/include/uapi/scsi/cxlflash_ioctl.h
@@ -0,0 +1,140 @@
+/*
+ * CXL Flash Device Driver
+ *
+ * Written by: Manoj N. Kumar <manoj@linux.vnet.ibm.com>, IBM Corporation
+ *             Matthew R. Ochs <mrochs@linux.vnet.ibm.com>, IBM Corporation
+ *
+ * Copyright (C) 2015 IBM Corporation
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#ifndef _CXLFLASH_IOCTL_H
+#define _CXLFLASH_IOCTL_H
+
+#include <linux/types.h>
+
+/*
+ * Structure and flag definitions CXL Flash superpipe ioctls
+ */
+
+#define DK_CXLFLASH_VERSION_0	0
+
+struct dk_cxlflash_hdr {
+	__u16 version;			/* Version data */
+	__u16 rsvd[3];			/* Reserved for future use */
+	__u64 flags;			/* Input flags */
+	__u64 return_flags;		/* Returned flags */
+};
+
+/*
+ * Notes:
+ * -----
+ * The 'context_id' field of all ioctl structures contains the context
+ * identifier for a context in the lower 32-bits (upper 32-bits are not
+ * to be used when identifying a context to the AFU). That said, the value
+ * in its entirety (all 64-bits) is to be treated as an opaque cookie and
+ * should be presented as such when issuing ioctls.
+ *
+ * For DK_CXLFLASH_ATTACH ioctl, user specifies read/write access
+ * permissions via the O_RDONLY, O_WRONLY, and O_RDWR flags defined in
+ * the fcntl.h header file.
+ */
+#define DK_CXLFLASH_ATTACH_REUSE_CONTEXT	0x8000000000000000ULL
+
+struct dk_cxlflash_attach {
+	struct dk_cxlflash_hdr hdr;	/* Common fields */
+	__u64 num_interrupts;		/* Requested number of interrupts */
+	__u64 context_id;		/* Returned context */
+	__u64 mmio_size;		/* Returned size of MMIO area */
+	__u64 block_size;		/* Returned block size, in bytes */
+	__u64 adap_fd;			/* Returned adapter file descriptor */
+	__u64 last_lba;			/* Returned last LBA on the device */
+	__u64 max_xfer;			/* Returned max transfer size, blocks */
+	__u64 reserved[8];		/* Reserved for future use */
+};
+
+struct dk_cxlflash_detach {
+	struct dk_cxlflash_hdr hdr;	/* Common fields */
+	__u64 context_id;		/* Context to detach */
+	__u64 reserved[8];		/* Reserved for future use */
+};
+
+struct dk_cxlflash_udirect {
+	struct dk_cxlflash_hdr hdr;	/* Common fields */
+	__u64 context_id;		/* Context to own physical resources */
+	__u64 rsrc_handle;		/* Returned resource handle */
+	__u64 last_lba;			/* Returned last LBA on the device */
+	__u64 reserved[8];		/* Reserved for future use */
+};
+
+struct dk_cxlflash_release {
+	struct dk_cxlflash_hdr hdr;	/* Common fields */
+	__u64 context_id;		/* Context owning resources */
+	__u64 rsrc_handle;		/* Resource handle to release */
+	__u64 reserved[8];		/* Reserved for future use */
+};
+
+#define DK_CXLFLASH_VERIFY_SENSE_LEN	18
+#define DK_CXLFLASH_VERIFY_HINT_SENSE	0x8000000000000000ULL
+
+struct dk_cxlflash_verify {
+	struct dk_cxlflash_hdr hdr;	/* Common fields */
+	__u64 context_id;		/* Context owning resources to verify */
+	__u64 rsrc_handle;		/* Resource handle of LUN */
+	__u64 hint;			/* Reasons for verify */
+	__u64 last_lba;			/* Returned last LBA of device */
+	__u8 sense_data[DK_CXLFLASH_VERIFY_SENSE_LEN]; /* SCSI sense data */
+	__u8 pad[6];			/* Pad to next 8-byte boundary */
+	__u64 reserved[8];		/* Reserved for future use */
+};
+
+#define DK_CXLFLASH_RECOVER_AFU_CONTEXT_RESET	0x8000000000000000ULL
+
+struct dk_cxlflash_recover_afu {
+	struct dk_cxlflash_hdr hdr;	/* Common fields */
+	__u64 reason;			/* Reason for recovery request */
+	__u64 context_id;		/* Context to recover / updated ID */
+	__u64 mmio_size;		/* Returned size of MMIO area */
+	__u64 adap_fd;			/* Returned adapter file descriptor */
+	__u64 reserved[8];		/* Reserved for future use */
+};
+
+#define DK_CXLFLASH_MANAGE_LUN_WWID_LEN			16
+#define DK_CXLFLASH_MANAGE_LUN_ENABLE_SUPERPIPE		0x8000000000000000ULL
+#define DK_CXLFLASH_MANAGE_LUN_DISABLE_SUPERPIPE	0x4000000000000000ULL
+#define DK_CXLFLASH_MANAGE_LUN_ALL_PORTS_ACCESSIBLE	0x2000000000000000ULL
+
+struct dk_cxlflash_manage_lun {
+	struct dk_cxlflash_hdr hdr;			/* Common fields */
+	__u8 wwid[DK_CXLFLASH_MANAGE_LUN_WWID_LEN];	/* Page83 WWID, NAA-6 */
+	__u64 reserved[8];				/* Rsvd, future use */
+};
+
+union cxlflash_ioctls {
+	struct dk_cxlflash_attach attach;
+	struct dk_cxlflash_detach detach;
+	struct dk_cxlflash_udirect udirect;
+	struct dk_cxlflash_release release;
+	struct dk_cxlflash_verify verify;
+	struct dk_cxlflash_recover_afu recover_afu;
+	struct dk_cxlflash_manage_lun manage_lun;
+};
+
+#define MAX_CXLFLASH_IOCTL_SZ	(sizeof(union cxlflash_ioctls))
+
+#define CXL_MAGIC 0xCA
+#define CXL_IOWR(_n, _s)	_IOWR(CXL_MAGIC, _n, struct _s)
+
+#define DK_CXLFLASH_ATTACH		CXL_IOWR(0x80, dk_cxlflash_attach)
+#define DK_CXLFLASH_USER_DIRECT		CXL_IOWR(0x81, dk_cxlflash_udirect)
+#define DK_CXLFLASH_RELEASE		CXL_IOWR(0x82, dk_cxlflash_release)
+#define DK_CXLFLASH_DETACH		CXL_IOWR(0x83, dk_cxlflash_detach)
+#define DK_CXLFLASH_VERIFY		CXL_IOWR(0x84, dk_cxlflash_verify)
+#define DK_CXLFLASH_RECOVER_AFU		CXL_IOWR(0x85, dk_cxlflash_recover_afu)
+#define DK_CXLFLASH_MANAGE_LUN		CXL_IOWR(0x86, dk_cxlflash_manage_lun)
+
+#endif /* ifndef _CXLFLASH_IOCTL_H */
-- 
cgit v1.2.3


From 2cb79266d6b229dbebd31fe114af1bdab25c8076 Mon Sep 17 00:00:00 2001
From: "Matthew R. Ochs" <mrochs@linux.vnet.ibm.com>
Date: Thu, 13 Aug 2015 21:47:53 -0500
Subject: cxlflash: Virtual LUN support

Add support for physical LUN segmentation (virtual LUNs) to device
driver supporting the IBM CXL Flash adapter. This patch allows user
space applications to virtually segment a physical LUN into N virtual
LUNs, taking advantage of the translation features provided by this
adapter.

Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
Signed-off-by: Manoj N. Kumar <manoj@linux.vnet.ibm.com>
Reviewed-by: Michael Neuling <mikey@neuling.org>
Reviewed-by: Wen Xiong <wenxiong@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
---
 Documentation/powerpc/cxlflash.txt |   63 +-
 drivers/scsi/cxlflash/Makefile     |    2 +-
 drivers/scsi/cxlflash/common.h     |    4 +
 drivers/scsi/cxlflash/lunmgt.c     |    3 +
 drivers/scsi/cxlflash/main.c       |   13 +
 drivers/scsi/cxlflash/sislite.h    |   20 +-
 drivers/scsi/cxlflash/superpipe.c  |   82 ++-
 drivers/scsi/cxlflash/superpipe.h  |   17 +-
 drivers/scsi/cxlflash/vlun.c       | 1243 ++++++++++++++++++++++++++++++++++++
 drivers/scsi/cxlflash/vlun.h       |   86 +++
 include/uapi/scsi/cxlflash_ioctl.h |   34 +
 11 files changed, 1550 insertions(+), 17 deletions(-)
 create mode 100644 drivers/scsi/cxlflash/vlun.c
 create mode 100644 drivers/scsi/cxlflash/vlun.h

(limited to 'Documentation')

diff --git a/Documentation/powerpc/cxlflash.txt b/Documentation/powerpc/cxlflash.txt
index f943967f90ce..4202d1bc583c 100644
--- a/Documentation/powerpc/cxlflash.txt
+++ b/Documentation/powerpc/cxlflash.txt
@@ -163,7 +163,8 @@ DK_CXLFLASH_ATTACH
 
         - These tokens are only valid for the process under which they
           were created. The child of a forked process cannot continue
-          to use the context id or file descriptor created by its parent.
+          to use the context id or file descriptor created by its parent
+          (see DK_CXLFLASH_VLUN_CLONE for further details).
 
         - These tokens are only valid for the lifetime of the context and
           the process under which they were created. Once either is
@@ -193,6 +194,45 @@ DK_CXLFLASH_USER_DIRECT
     treated as a resource handle that is returned to the user. The user
     is then able to use the handle to reference the LUN during I/O.
 
+DK_CXLFLASH_USER_VIRTUAL
+------------------------
+    This ioctl is responsible for transitioning the LUN to virtual mode
+    of access and configuring the AFU for virtual access from user space
+    on a per-context basis. Additionally, the block size and last logical
+    block address (LBA) are returned to the user.
+
+    As mentioned previously, when operating in user space access mode,
+    LUNs may be accessed in whole or in part. Only one mode is allowed
+    at a time and if one mode is active (outstanding references exist),
+    requests to use the LUN in a different mode are denied.
+
+    The AFU is configured for virtual access from user space by adding
+    an entry to the AFU's resource handle table. The index of the entry
+    is treated as a resource handle that is returned to the user. The
+    user is then able to use the handle to reference the LUN during I/O.
+
+    By default, the virtual LUN is created with a size of 0. The user
+    would need to use the DK_CXLFLASH_VLUN_RESIZE ioctl to adjust the grow
+    the virtual LUN to a desired size. To avoid having to perform this
+    resize for the initial creation of the virtual LUN, the user has the
+    option of specifying a size as part of the DK_CXLFLASH_USER_VIRTUAL
+    ioctl, such that when success is returned to the user, the
+    resource handle that is provided is already referencing provisioned
+    storage. This is reflected by the last LBA being a non-zero value.
+
+DK_CXLFLASH_VLUN_RESIZE
+-----------------------
+    This ioctl is responsible for resizing a previously created virtual
+    LUN and will fail if invoked upon a LUN that is not in virtual
+    mode. Upon success, an updated last LBA is returned to the user
+    indicating the new size of the virtual LUN associated with the
+    resource handle.
+
+    The partitioning of virtual LUNs is jointly mediated by the cxlflash
+    driver and the AFU. An allocation table is kept for each LUN that is
+    operating in the virtual mode and used to program a LUN translation
+    table that the AFU references when provided with a resource handle.
+
 DK_CXLFLASH_RELEASE
 -------------------
     This ioctl is responsible for releasing a previously obtained
@@ -214,6 +254,27 @@ DK_CXLFLASH_DETACH
     success, all "tokens" which had been provided to the user from the
     DK_CXLFLASH_ATTACH onward are no longer valid.
 
+DK_CXLFLASH_VLUN_CLONE
+----------------------
+    This ioctl is responsible for cloning a previously created
+    context to a more recently created context. It exists solely to
+    support maintaining user space access to storage after a process
+    forks. Upon success, the child process (which invoked the ioctl)
+    will have access to the same LUNs via the same resource handle(s)
+    and fd2 as the parent, but under a different context.
+
+    Context sharing across processes is not supported with CXL and
+    therefore each fork must be met with establishing a new context
+    for the child process. This ioctl simplifies the state management
+    and playback required by a user in such a scenario. When a process
+    forks, child process can clone the parents context by first creating
+    a context (via DK_CXLFLASH_ATTACH) and then using this ioctl to
+    perform the clone from the parent to the child.
+
+    The clone itself is fairly simple. The resource handle and lun
+    translation tables are copied from the parent context to the child's
+    and then synced with the AFU.
+
 DK_CXLFLASH_VERIFY
 ------------------
     This ioctl is used to detect various changes such as the capacity of
diff --git a/drivers/scsi/cxlflash/Makefile b/drivers/scsi/cxlflash/Makefile
index c14d24c720d6..9e39866d473b 100644
--- a/drivers/scsi/cxlflash/Makefile
+++ b/drivers/scsi/cxlflash/Makefile
@@ -1,2 +1,2 @@
 obj-$(CONFIG_CXLFLASH) += cxlflash.o
-cxlflash-y += main.o superpipe.o lunmgt.o
+cxlflash-y += main.o superpipe.o lunmgt.o vlun.o
diff --git a/drivers/scsi/cxlflash/common.h b/drivers/scsi/cxlflash/common.h
index d3e54e61c7a5..1c56037146e1 100644
--- a/drivers/scsi/cxlflash/common.h
+++ b/drivers/scsi/cxlflash/common.h
@@ -116,6 +116,9 @@ struct cxlflash_cfg {
 
 	atomic_t num_user_contexts;
 
+	/* Parameters that are LUN table related */
+	int last_lun_index[CXLFLASH_NUM_FC_PORTS];
+	int promote_lun_index;
 	struct list_head lluns; /* list of llun_info structs */
 
 	wait_queue_head_t tmf_waitq;
@@ -200,5 +203,6 @@ int cxlflash_ioctl(struct scsi_device *, int, void __user *);
 void cxlflash_stop_term_user_contexts(struct cxlflash_cfg *);
 int cxlflash_mark_contexts_error(struct cxlflash_cfg *);
 void cxlflash_term_local_luns(struct cxlflash_cfg *);
+void cxlflash_restore_luntable(struct cxlflash_cfg *);
 
 #endif /* ifndef _CXLFLASH_COMMON_H */
diff --git a/drivers/scsi/cxlflash/lunmgt.c b/drivers/scsi/cxlflash/lunmgt.c
index 66d5bef11ee6..d98ad0ff64c1 100644
--- a/drivers/scsi/cxlflash/lunmgt.c
+++ b/drivers/scsi/cxlflash/lunmgt.c
@@ -20,6 +20,7 @@
 
 #include "sislite.h"
 #include "common.h"
+#include "vlun.h"
 #include "superpipe.h"
 
 /**
@@ -42,6 +43,7 @@ static struct llun_info *create_local(struct scsi_device *sdev, u8 *wwid)
 	lli->sdev = sdev;
 	lli->newly_created = true;
 	lli->host_no = sdev->host->host_no;
+	lli->in_table = false;
 
 	memcpy(lli->wwid, wwid, DK_CXLFLASH_MANAGE_LUN_WWID_LEN);
 out:
@@ -208,6 +210,7 @@ void cxlflash_term_global_luns(void)
 	mutex_lock(&global.mutex);
 	list_for_each_entry_safe(gli, temp, &global.gluns, list) {
 		list_del(&gli->list);
+		cxlflash_ba_terminate(&gli->blka.ba_lun);
 		kfree(gli);
 	}
 	mutex_unlock(&global.mutex);
diff --git a/drivers/scsi/cxlflash/main.c b/drivers/scsi/cxlflash/main.c
index 02d464f41b7f..458ed838f83a 100644
--- a/drivers/scsi/cxlflash/main.c
+++ b/drivers/scsi/cxlflash/main.c
@@ -1989,6 +1989,8 @@ static int init_afu(struct cxlflash_cfg *cfg)
 	afu_err_intr_init(cfg->afu);
 	atomic64_set(&afu->room, readq_be(&afu->host_map->cmd_room));
 
+	/* Restore the LUN mappings */
+	cxlflash_restore_luntable(cfg);
 err1:
 	pr_debug("%s: returning rc=%d\n", __func__, rc);
 	return rc;
@@ -2286,6 +2288,17 @@ static int cxlflash_probe(struct pci_dev *pdev,
 
 	cfg->init_state = INIT_STATE_NONE;
 	cfg->dev = pdev;
+
+	/*
+	 * The promoted LUNs move to the top of the LUN table. The rest stay
+	 * on the bottom half. The bottom half grows from the end
+	 * (index = 255), whereas the top half grows from the beginning
+	 * (index = 0).
+	 */
+	cfg->promote_lun_index  = 0;
+	cfg->last_lun_index[0] = CXLFLASH_NUM_VLUNS/2 - 1;
+	cfg->last_lun_index[1] = CXLFLASH_NUM_VLUNS/2 - 1;
+
 	cfg->dev_id = (struct pci_device_id *)dev_id;
 	cfg->mcctx = NULL;
 
diff --git a/drivers/scsi/cxlflash/sislite.h b/drivers/scsi/cxlflash/sislite.h
index 66b889151a4c..63bf394fe78c 100644
--- a/drivers/scsi/cxlflash/sislite.h
+++ b/drivers/scsi/cxlflash/sislite.h
@@ -397,16 +397,17 @@ struct cxlflash_afu_map {
 	};
 };
 
-/* LBA translation control blocks */
-
+/*
+ * LXT - LBA Translation Table
+ * LXT control blocks
+ */
 struct sisl_lxt_entry {
 	u64 rlba_base;	/* bits 0:47 is base
-				 * b48:55 is lun index
-				 * b58:59 is write & read perms
-				 * (if no perm, afu_rc=0x15)
-				 * b60:63 is port_sel mask
-				 */
-
+			 * b48:55 is lun index
+			 * b58:59 is write & read perms
+			 * (if no perm, afu_rc=0x15)
+			 * b60:63 is port_sel mask
+			 */
 };
 
 /*
@@ -465,4 +466,7 @@ struct sisl_rht_entry_f1 {
 #define TMF_LUN_RESET  0x1U
 #define TMF_CLEAR_ACA  0x2U
 
+
+#define SISLITE_MAX_WS_BLOCKS 512
+
 #endif /* _SISLITE_H */
diff --git a/drivers/scsi/cxlflash/superpipe.c b/drivers/scsi/cxlflash/superpipe.c
index 3c8bce8bbb0b..f1b62cea75b1 100644
--- a/drivers/scsi/cxlflash/superpipe.c
+++ b/drivers/scsi/cxlflash/superpipe.c
@@ -26,10 +26,24 @@
 
 #include "sislite.h"
 #include "common.h"
+#include "vlun.h"
 #include "superpipe.h"
 
 struct cxlflash_global global;
 
+/**
+ * marshal_rele_to_resize() - translate release to resize structure
+ * @rele:	Source structure from which to translate/copy.
+ * @resize:	Destination structure for the translate/copy.
+ */
+static void marshal_rele_to_resize(struct dk_cxlflash_release *release,
+				   struct dk_cxlflash_resize *resize)
+{
+	resize->hdr = release->hdr;
+	resize->context_id = release->context_id;
+	resize->rsrc_handle = release->rsrc_handle;
+}
+
 /**
  * marshal_det_to_rele() - translate detach to release structure
  * @detach:	Destination structure for the translate/copy.
@@ -449,6 +463,7 @@ void rhte_checkin(struct ctx_info *ctxi,
 	rhte->fp = 0;
 	ctxi->rht_out--;
 	ctxi->rht_lun[rsrc_handle] = NULL;
+	ctxi->rht_needs_ws[rsrc_handle] = false;
 }
 
 /**
@@ -526,13 +541,21 @@ out:
 /**
  * cxlflash_lun_detach() - detaches a user from a LUN and resets the LUN's mode
  * @gli:	LUN to detach.
+ *
+ * When resetting the mode, terminate block allocation resources as they
+ * are no longer required (service is safe to call even when block allocation
+ * resources were not present - such as when transitioning from physical mode).
+ * These resources will be reallocated when needed (subsequent transition to
+ * virtual mode).
  */
 void cxlflash_lun_detach(struct glun_info *gli)
 {
 	mutex_lock(&gli->mutex);
 	WARN_ON(gli->mode == MODE_NONE);
-	if (--gli->users == 0)
+	if (--gli->users == 0) {
 		gli->mode = MODE_NONE;
+		cxlflash_ba_terminate(&gli->blka.ba_lun);
+	}
 	pr_debug("%s: gli->users=%u\n", __func__, gli->users);
 	WARN_ON(gli->users < 0);
 	mutex_unlock(&gli->mutex);
@@ -544,10 +567,12 @@ void cxlflash_lun_detach(struct glun_info *gli)
  * @ctxi:	Context owning resources.
  * @release:	Release ioctl data structure.
  *
- * Note that the AFU sync should _not_ be performed when the context is sitting
- * on the error recovery list. A context on the error recovery list is not known
- * to the AFU due to reset. When the context is recovered, it will be reattached
- * and made known again to the AFU.
+ * For LUNs in virtual mode, the virtual LUN associated with the specified
+ * resource handle is resized to 0 prior to releasing the RHTE. Note that the
+ * AFU sync should _not_ be performed when the context is sitting on the error
+ * recovery list. A context on the error recovery list is not known to the AFU
+ * due to reset. When the context is recovered, it will be reattached and made
+ * known again to the AFU.
  *
  * Return: 0 on success, -errno on failure
  */
@@ -562,6 +587,7 @@ int _cxlflash_disk_release(struct scsi_device *sdev,
 	struct afu *afu = cfg->afu;
 	bool put_ctx = false;
 
+	struct dk_cxlflash_resize size;
 	res_hndl_t rhndl = release->rsrc_handle;
 
 	int rc = 0;
@@ -594,7 +620,24 @@ int _cxlflash_disk_release(struct scsi_device *sdev,
 		goto out;
 	}
 
+	/*
+	 * Resize to 0 for virtual LUNS by setting the size
+	 * to 0. This will clear LXT_START and LXT_CNT fields
+	 * in the RHT entry and properly sync with the AFU.
+	 *
+	 * Afterwards we clear the remaining fields.
+	 */
 	switch (gli->mode) {
+	case MODE_VIRTUAL:
+		marshal_rele_to_resize(release, &size);
+		size.req_size = 0;
+		rc = _cxlflash_vlun_resize(sdev, ctxi, &size);
+		if (rc) {
+			dev_dbg(dev, "%s: resize failed rc %d\n", __func__, rc);
+			goto out;
+		}
+
+		break;
 	case MODE_PHYSICAL:
 		/*
 		 * Clear the Format 1 RHT entry for direct access
@@ -666,6 +709,7 @@ static void destroy_context(struct cxlflash_cfg *cfg,
 
 	/* Free memory associated with context */
 	free_page((ulong)ctxi->rht_start);
+	kfree(ctxi->rht_needs_ws);
 	kfree(ctxi->rht_lun);
 	kfree(ctxi);
 	atomic_dec_if_positive(&cfg->num_user_contexts);
@@ -693,11 +737,13 @@ static struct ctx_info *create_context(struct cxlflash_cfg *cfg,
 	struct afu *afu = cfg->afu;
 	struct ctx_info *ctxi = NULL;
 	struct llun_info **lli = NULL;
+	bool *ws = NULL;
 	struct sisl_rht_entry *rhte;
 
 	ctxi = kzalloc(sizeof(*ctxi), GFP_KERNEL);
 	lli = kzalloc((MAX_RHT_PER_CONTEXT * sizeof(*lli)), GFP_KERNEL);
-	if (unlikely(!ctxi || !lli)) {
+	ws = kzalloc((MAX_RHT_PER_CONTEXT * sizeof(*ws)), GFP_KERNEL);
+	if (unlikely(!ctxi || !lli || !ws)) {
 		dev_err(dev, "%s: Unable to allocate context!\n", __func__);
 		goto err;
 	}
@@ -709,6 +755,7 @@ static struct ctx_info *create_context(struct cxlflash_cfg *cfg,
 	}
 
 	ctxi->rht_lun = lli;
+	ctxi->rht_needs_ws = ws;
 	ctxi->rht_start = rhte;
 	ctxi->rht_perms = perms;
 
@@ -728,6 +775,7 @@ out:
 	return ctxi;
 
 err:
+	kfree(ws);
 	kfree(lli);
 	kfree(ctxi);
 	ctxi = NULL;
@@ -1729,6 +1777,12 @@ static int cxlflash_disk_verify(struct scsi_device *sdev,
 	case MODE_PHYSICAL:
 		last_lba = gli->max_lba;
 		break;
+	case MODE_VIRTUAL:
+		/* Cast lxt_cnt to u64 for multiply to be treated as 64bit op */
+		last_lba = ((u64)rhte->lxt_cnt * MC_CHUNK_SIZE * gli->blk_len);
+		last_lba /= CXLFLASH_BLOCK_SIZE;
+		last_lba--;
+		break;
 	default:
 		WARN(1, "Unsupported LUN mode!");
 	}
@@ -1756,12 +1810,18 @@ static char *decode_ioctl(int cmd)
 		return __stringify_1(DK_CXLFLASH_ATTACH);
 	case DK_CXLFLASH_USER_DIRECT:
 		return __stringify_1(DK_CXLFLASH_USER_DIRECT);
+	case DK_CXLFLASH_USER_VIRTUAL:
+		return __stringify_1(DK_CXLFLASH_USER_VIRTUAL);
+	case DK_CXLFLASH_VLUN_RESIZE:
+		return __stringify_1(DK_CXLFLASH_VLUN_RESIZE);
 	case DK_CXLFLASH_RELEASE:
 		return __stringify_1(DK_CXLFLASH_RELEASE);
 	case DK_CXLFLASH_DETACH:
 		return __stringify_1(DK_CXLFLASH_DETACH);
 	case DK_CXLFLASH_VERIFY:
 		return __stringify_1(DK_CXLFLASH_VERIFY);
+	case DK_CXLFLASH_VLUN_CLONE:
+		return __stringify_1(DK_CXLFLASH_VLUN_CLONE);
 	case DK_CXLFLASH_RECOVER_AFU:
 		return __stringify_1(DK_CXLFLASH_RECOVER_AFU);
 	case DK_CXLFLASH_MANAGE_LUN:
@@ -1876,6 +1936,7 @@ static int ioctl_common(struct scsi_device *sdev, int cmd)
 	rc = check_state(cfg);
 	if (unlikely(rc) && (cfg->state == STATE_FAILTERM)) {
 		switch (cmd) {
+		case DK_CXLFLASH_VLUN_RESIZE:
 		case DK_CXLFLASH_RELEASE:
 		case DK_CXLFLASH_DETACH:
 			dev_dbg(dev, "%s: Command override! (%d)\n",
@@ -1923,12 +1984,18 @@ int cxlflash_ioctl(struct scsi_device *sdev, int cmd, void __user *arg)
 	{sizeof(struct dk_cxlflash_verify), (sioctl)cxlflash_disk_verify},
 	{sizeof(struct dk_cxlflash_recover_afu), (sioctl)cxlflash_afu_recover},
 	{sizeof(struct dk_cxlflash_manage_lun), (sioctl)cxlflash_manage_lun},
+	{sizeof(struct dk_cxlflash_uvirtual), cxlflash_disk_virtual_open},
+	{sizeof(struct dk_cxlflash_resize), (sioctl)cxlflash_vlun_resize},
+	{sizeof(struct dk_cxlflash_clone), (sioctl)cxlflash_disk_clone},
 	};
 
 	/* Restrict command set to physical support only for internal LUN */
 	if (afu->internal_lun)
 		switch (cmd) {
 		case DK_CXLFLASH_RELEASE:
+		case DK_CXLFLASH_USER_VIRTUAL:
+		case DK_CXLFLASH_VLUN_RESIZE:
+		case DK_CXLFLASH_VLUN_CLONE:
 			dev_dbg(dev, "%s: %s not supported for lun_mode=%d\n",
 				__func__, decode_ioctl(cmd), afu->internal_lun);
 			rc = -EINVAL;
@@ -1942,6 +2009,9 @@ int cxlflash_ioctl(struct scsi_device *sdev, int cmd, void __user *arg)
 	case DK_CXLFLASH_DETACH:
 	case DK_CXLFLASH_VERIFY:
 	case DK_CXLFLASH_RECOVER_AFU:
+	case DK_CXLFLASH_USER_VIRTUAL:
+	case DK_CXLFLASH_VLUN_RESIZE:
+	case DK_CXLFLASH_VLUN_CLONE:
 		dev_dbg(dev, "%s: %s (%08X) on dev(%d/%d/%d/%llu)\n",
 			__func__, decode_ioctl(cmd), cmd, shost->host_no,
 			sdev->channel, sdev->id, sdev->lun);
diff --git a/drivers/scsi/cxlflash/superpipe.h b/drivers/scsi/cxlflash/superpipe.h
index ae39b9627118..d7dc88bc64a4 100644
--- a/drivers/scsi/cxlflash/superpipe.h
+++ b/drivers/scsi/cxlflash/superpipe.h
@@ -31,9 +31,11 @@ extern struct cxlflash_global global;
 #define MC_DISCOVERY_TIMEOUT 5  /* 5 secs */
 
 #define CHAN2PORT(_x)	((_x) + 1)
+#define PORT2CHAN(_x)	((_x) - 1)
 
 enum lun_mode {
 	MODE_NONE = 0,
+	MODE_VIRTUAL,
 	MODE_PHYSICAL
 };
 
@@ -41,13 +43,14 @@ enum lun_mode {
 struct glun_info {
 	u64 max_lba;		/* from read cap(16) */
 	u32 blk_len;		/* from read cap(16) */
-	enum lun_mode mode;	/* NONE, PHYSICAL */
+	enum lun_mode mode;	/* NONE, VIRTUAL, PHYSICAL */
 	int users;		/* Number of users w/ references to LUN */
 
 	u8 wwid[16];
 
 	struct mutex mutex;
 
+	struct blka blka;
 	struct list_head list;
 };
 
@@ -58,6 +61,7 @@ struct llun_info {
 	u32 host_no;		/* host_no from Scsi_host */
 	u32 port_sel;		/* What port to use for this LUN */
 	bool newly_created;	/* Whether the LUN was just discovered */
+	bool in_table;		/* Whether a LUN table entry was created */
 
 	u8 wwid[16];		/* Keep a duplicate copy here? */
 
@@ -90,6 +94,7 @@ struct ctx_info {
 	u32 rht_out;		/* Number of checked out RHT entries */
 	u32 rht_perms;		/* User-defined permissions for RHT entries */
 	struct llun_info **rht_lun;       /* Mapping of RHT entries to LUNs */
+	bool *rht_needs_ws;	/* User-desired write-same function per RHTE */
 
 	struct cxl_ioctl_start_work work;
 	u64 ctxid;
@@ -111,10 +116,18 @@ struct cxlflash_global {
 	struct page *err_page; /* One page of all 0xF for error notification */
 };
 
+int cxlflash_vlun_resize(struct scsi_device *, struct dk_cxlflash_resize *);
+int _cxlflash_vlun_resize(struct scsi_device *, struct ctx_info *,
+			  struct dk_cxlflash_resize *);
+
 int cxlflash_disk_release(struct scsi_device *, struct dk_cxlflash_release *);
 int _cxlflash_disk_release(struct scsi_device *, struct ctx_info *,
 			   struct dk_cxlflash_release *);
 
+int cxlflash_disk_clone(struct scsi_device *, struct dk_cxlflash_clone *);
+
+int cxlflash_disk_virtual_open(struct scsi_device *, void *);
+
 int cxlflash_lun_attach(struct glun_info *, enum lun_mode, bool);
 void cxlflash_lun_detach(struct glun_info *);
 
@@ -127,6 +140,8 @@ struct sisl_rht_entry *get_rhte(struct ctx_info *, res_hndl_t,
 struct sisl_rht_entry *rhte_checkout(struct ctx_info *, struct llun_info *);
 void rhte_checkin(struct ctx_info *, struct sisl_rht_entry *);
 
+void cxlflash_ba_terminate(struct ba_lun *);
+
 int cxlflash_manage_lun(struct scsi_device *, struct dk_cxlflash_manage_lun *);
 
 #endif /* ifndef _CXLFLASH_SUPERPIPE_H */
diff --git a/drivers/scsi/cxlflash/vlun.c b/drivers/scsi/cxlflash/vlun.c
new file mode 100644
index 000000000000..6155cb1d4ed3
--- /dev/null
+++ b/drivers/scsi/cxlflash/vlun.c
@@ -0,0 +1,1243 @@
+/*
+ * CXL Flash Device Driver
+ *
+ * Written by: Manoj N. Kumar <manoj@linux.vnet.ibm.com>, IBM Corporation
+ *             Matthew R. Ochs <mrochs@linux.vnet.ibm.com>, IBM Corporation
+ *
+ * Copyright (C) 2015 IBM Corporation
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <linux/syscalls.h>
+#include <misc/cxl.h>
+#include <asm/unaligned.h>
+#include <asm/bitsperlong.h>
+
+#include <scsi/scsi_cmnd.h>
+#include <scsi/scsi_host.h>
+#include <uapi/scsi/cxlflash_ioctl.h>
+
+#include "sislite.h"
+#include "common.h"
+#include "vlun.h"
+#include "superpipe.h"
+
+/**
+ * marshal_virt_to_resize() - translate uvirtual to resize structure
+ * @virt:	Source structure from which to translate/copy.
+ * @resize:	Destination structure for the translate/copy.
+ */
+static void marshal_virt_to_resize(struct dk_cxlflash_uvirtual *virt,
+				   struct dk_cxlflash_resize *resize)
+{
+	resize->hdr = virt->hdr;
+	resize->context_id = virt->context_id;
+	resize->rsrc_handle = virt->rsrc_handle;
+	resize->req_size = virt->lun_size;
+	resize->last_lba = virt->last_lba;
+}
+
+/**
+ * marshal_clone_to_rele() - translate clone to release structure
+ * @clone:	Source structure from which to translate/copy.
+ * @rele:	Destination structure for the translate/copy.
+ */
+static void marshal_clone_to_rele(struct dk_cxlflash_clone *clone,
+				  struct dk_cxlflash_release *release)
+{
+	release->hdr = clone->hdr;
+	release->context_id = clone->context_id_dst;
+}
+
+/**
+ * ba_init() - initializes a block allocator
+ * @ba_lun:	Block allocator to initialize.
+ *
+ * Return: 0 on success, -errno on failure
+ */
+static int ba_init(struct ba_lun *ba_lun)
+{
+	struct ba_lun_info *bali = NULL;
+	int lun_size_au = 0, i = 0;
+	int last_word_underflow = 0;
+	u64 *lam;
+
+	pr_debug("%s: Initializing LUN: lun_id = %llX, "
+		 "ba_lun->lsize = %lX, ba_lun->au_size = %lX\n",
+		__func__, ba_lun->lun_id, ba_lun->lsize, ba_lun->au_size);
+
+	/* Calculate bit map size */
+	lun_size_au = ba_lun->lsize / ba_lun->au_size;
+	if (lun_size_au == 0) {
+		pr_debug("%s: Requested LUN size of 0!\n", __func__);
+		return -EINVAL;
+	}
+
+	/* Allocate lun information container */
+	bali = kzalloc(sizeof(struct ba_lun_info), GFP_KERNEL);
+	if (unlikely(!bali)) {
+		pr_err("%s: Failed to allocate lun_info for lun_id %llX\n",
+		       __func__, ba_lun->lun_id);
+		return -ENOMEM;
+	}
+
+	bali->total_aus = lun_size_au;
+	bali->lun_bmap_size = lun_size_au / BITS_PER_LONG;
+
+	if (lun_size_au % BITS_PER_LONG)
+		bali->lun_bmap_size++;
+
+	/* Allocate bitmap space */
+	bali->lun_alloc_map = kzalloc((bali->lun_bmap_size * sizeof(u64)),
+				      GFP_KERNEL);
+	if (unlikely(!bali->lun_alloc_map)) {
+		pr_err("%s: Failed to allocate lun allocation map: "
+		       "lun_id = %llX\n", __func__, ba_lun->lun_id);
+		kfree(bali);
+		return -ENOMEM;
+	}
+
+	/* Initialize the bit map size and set all bits to '1' */
+	bali->free_aun_cnt = lun_size_au;
+
+	for (i = 0; i < bali->lun_bmap_size; i++)
+		bali->lun_alloc_map[i] = 0xFFFFFFFFFFFFFFFFULL;
+
+	/* If the last word not fully utilized, mark extra bits as allocated */
+	last_word_underflow = (bali->lun_bmap_size * BITS_PER_LONG);
+	last_word_underflow -= bali->free_aun_cnt;
+	if (last_word_underflow > 0) {
+		lam = &bali->lun_alloc_map[bali->lun_bmap_size - 1];
+		for (i = (HIBIT - last_word_underflow + 1);
+		     i < BITS_PER_LONG;
+		     i++)
+			clear_bit(i, (ulong *)lam);
+	}
+
+	/* Initialize high elevator index, low/curr already at 0 from kzalloc */
+	bali->free_high_idx = bali->lun_bmap_size;
+
+	/* Allocate clone map */
+	bali->aun_clone_map = kzalloc((bali->total_aus * sizeof(u8)),
+				      GFP_KERNEL);
+	if (unlikely(!bali->aun_clone_map)) {
+		pr_err("%s: Failed to allocate clone map: lun_id = %llX\n",
+		       __func__, ba_lun->lun_id);
+		kfree(bali->lun_alloc_map);
+		kfree(bali);
+		return -ENOMEM;
+	}
+
+	/* Pass the allocated lun info as a handle to the user */
+	ba_lun->ba_lun_handle = bali;
+
+	pr_debug("%s: Successfully initialized the LUN: "
+		 "lun_id = %llX, bitmap size = %X, free_aun_cnt = %llX\n",
+		__func__, ba_lun->lun_id, bali->lun_bmap_size,
+		bali->free_aun_cnt);
+	return 0;
+}
+
+/**
+ * find_free_range() - locates a free bit within the block allocator
+ * @low:	First word in block allocator to start search.
+ * @high:	Last word in block allocator to search.
+ * @bali:	LUN information structure owning the block allocator to search.
+ * @bit_word:	Passes back the word in the block allocator owning the free bit.
+ *
+ * Return: The bit position within the passed back word, -1 on failure
+ */
+static int find_free_range(u32 low,
+			   u32 high,
+			   struct ba_lun_info *bali, int *bit_word)
+{
+	int i;
+	u64 bit_pos = -1;
+	ulong *lam, num_bits;
+
+	for (i = low; i < high; i++)
+		if (bali->lun_alloc_map[i] != 0) {
+			lam = (ulong *)&bali->lun_alloc_map[i];
+			num_bits = (sizeof(*lam) * BITS_PER_BYTE);
+			bit_pos = find_first_bit(lam, num_bits);
+
+			pr_devel("%s: Found free bit %llX in lun "
+				 "map entry %llX at bitmap index = %X\n",
+				 __func__, bit_pos, bali->lun_alloc_map[i],
+				 i);
+
+			*bit_word = i;
+			bali->free_aun_cnt--;
+			clear_bit(bit_pos, lam);
+			break;
+		}
+
+	return bit_pos;
+}
+
+/**
+ * ba_alloc() - allocates a block from the block allocator
+ * @ba_lun:	Block allocator from which to allocate a block.
+ *
+ * Return: The allocated block, -1 on failure
+ */
+static u64 ba_alloc(struct ba_lun *ba_lun)
+{
+	u64 bit_pos = -1;
+	int bit_word = 0;
+	struct ba_lun_info *bali = NULL;
+
+	bali = ba_lun->ba_lun_handle;
+
+	pr_debug("%s: Received block allocation request: "
+		 "lun_id = %llX, free_aun_cnt = %llX\n",
+		 __func__, ba_lun->lun_id, bali->free_aun_cnt);
+
+	if (bali->free_aun_cnt == 0) {
+		pr_debug("%s: No space left on LUN: lun_id = %llX\n",
+			 __func__, ba_lun->lun_id);
+		return -1ULL;
+	}
+
+	/* Search to find a free entry, curr->high then low->curr */
+	bit_pos = find_free_range(bali->free_curr_idx,
+				  bali->free_high_idx, bali, &bit_word);
+	if (bit_pos == -1) {
+		bit_pos = find_free_range(bali->free_low_idx,
+					  bali->free_curr_idx,
+					  bali, &bit_word);
+		if (bit_pos == -1) {
+			pr_debug("%s: Could not find an allocation unit on LUN:"
+				 " lun_id = %llX\n", __func__, ba_lun->lun_id);
+			return -1ULL;
+		}
+	}
+
+	/* Update the free_curr_idx */
+	if (bit_pos == HIBIT)
+		bali->free_curr_idx = bit_word + 1;
+	else
+		bali->free_curr_idx = bit_word;
+
+	pr_debug("%s: Allocating AU number %llX, on lun_id %llX, "
+		 "free_aun_cnt = %llX\n", __func__,
+		 ((bit_word * BITS_PER_LONG) + bit_pos), ba_lun->lun_id,
+		 bali->free_aun_cnt);
+
+	return (u64) ((bit_word * BITS_PER_LONG) + bit_pos);
+}
+
+/**
+ * validate_alloc() - validates the specified block has been allocated
+ * @ba_lun_info:	LUN info owning the block allocator.
+ * @aun:		Block to validate.
+ *
+ * Return: 0 on success, -1 on failure
+ */
+static int validate_alloc(struct ba_lun_info *bali, u64 aun)
+{
+	int idx = 0, bit_pos = 0;
+
+	idx = aun / BITS_PER_LONG;
+	bit_pos = aun % BITS_PER_LONG;
+
+	if (test_bit(bit_pos, (ulong *)&bali->lun_alloc_map[idx]))
+		return -1;
+
+	return 0;
+}
+
+/**
+ * ba_free() - frees a block from the block allocator
+ * @ba_lun:	Block allocator from which to allocate a block.
+ * @to_free:	Block to free.
+ *
+ * Return: 0 on success, -1 on failure
+ */
+static int ba_free(struct ba_lun *ba_lun, u64 to_free)
+{
+	int idx = 0, bit_pos = 0;
+	struct ba_lun_info *bali = NULL;
+
+	bali = ba_lun->ba_lun_handle;
+
+	if (validate_alloc(bali, to_free)) {
+		pr_debug("%s: The AUN %llX is not allocated on lun_id %llX\n",
+			 __func__, to_free, ba_lun->lun_id);
+		return -1;
+	}
+
+	pr_debug("%s: Received a request to free AU %llX on lun_id %llX, "
+		 "free_aun_cnt = %llX\n", __func__, to_free, ba_lun->lun_id,
+		 bali->free_aun_cnt);
+
+	if (bali->aun_clone_map[to_free] > 0) {
+		pr_debug("%s: AUN %llX on lun_id %llX has been cloned. Clone "
+			 "count = %X\n", __func__, to_free, ba_lun->lun_id,
+			 bali->aun_clone_map[to_free]);
+		bali->aun_clone_map[to_free]--;
+		return 0;
+	}
+
+	idx = to_free / BITS_PER_LONG;
+	bit_pos = to_free % BITS_PER_LONG;
+
+	set_bit(bit_pos, (ulong *)&bali->lun_alloc_map[idx]);
+	bali->free_aun_cnt++;
+
+	if (idx < bali->free_low_idx)
+		bali->free_low_idx = idx;
+	else if (idx > bali->free_high_idx)
+		bali->free_high_idx = idx;
+
+	pr_debug("%s: Successfully freed AU at bit_pos %X, bit map index %X on "
+		 "lun_id %llX, free_aun_cnt = %llX\n", __func__, bit_pos, idx,
+		 ba_lun->lun_id, bali->free_aun_cnt);
+
+	return 0;
+}
+
+/**
+ * ba_clone() - Clone a chunk of the block allocation table
+ * @ba_lun:	Block allocator from which to allocate a block.
+ * @to_free:	Block to free.
+ *
+ * Return: 0 on success, -1 on failure
+ */
+static int ba_clone(struct ba_lun *ba_lun, u64 to_clone)
+{
+	struct ba_lun_info *bali = ba_lun->ba_lun_handle;
+
+	if (validate_alloc(bali, to_clone)) {
+		pr_debug("%s: AUN %llX is not allocated on lun_id %llX\n",
+			 __func__, to_clone, ba_lun->lun_id);
+		return -1;
+	}
+
+	pr_debug("%s: Received a request to clone AUN %llX on lun_id %llX\n",
+		 __func__, to_clone, ba_lun->lun_id);
+
+	if (bali->aun_clone_map[to_clone] == MAX_AUN_CLONE_CNT) {
+		pr_debug("%s: AUN %llX on lun_id %llX hit max clones already\n",
+			 __func__, to_clone, ba_lun->lun_id);
+		return -1;
+	}
+
+	bali->aun_clone_map[to_clone]++;
+
+	return 0;
+}
+
+/**
+ * ba_space() - returns the amount of free space left in the block allocator
+ * @ba_lun:	Block allocator.
+ *
+ * Return: Amount of free space in block allocator
+ */
+static u64 ba_space(struct ba_lun *ba_lun)
+{
+	struct ba_lun_info *bali = ba_lun->ba_lun_handle;
+
+	return bali->free_aun_cnt;
+}
+
+/**
+ * cxlflash_ba_terminate() - frees resources associated with the block allocator
+ * @ba_lun:	Block allocator.
+ *
+ * Safe to call in a partially allocated state.
+ */
+void cxlflash_ba_terminate(struct ba_lun *ba_lun)
+{
+	struct ba_lun_info *bali = ba_lun->ba_lun_handle;
+
+	if (bali) {
+		kfree(bali->aun_clone_map);
+		kfree(bali->lun_alloc_map);
+		kfree(bali);
+		ba_lun->ba_lun_handle = NULL;
+	}
+}
+
+/**
+ * init_vlun() - initializes a LUN for virtual use
+ * @lun_info:	LUN information structure that owns the block allocator.
+ *
+ * Return: 0 on success, -errno on failure
+ */
+static int init_vlun(struct llun_info *lli)
+{
+	int rc = 0;
+	struct glun_info *gli = lli->parent;
+	struct blka *blka = &gli->blka;
+
+	memset(blka, 0, sizeof(*blka));
+	mutex_init(&blka->mutex);
+
+	/* LUN IDs are unique per port, save the index instead */
+	blka->ba_lun.lun_id = lli->lun_index;
+	blka->ba_lun.lsize = gli->max_lba + 1;
+	blka->ba_lun.lba_size = gli->blk_len;
+
+	blka->ba_lun.au_size = MC_CHUNK_SIZE;
+	blka->nchunk = blka->ba_lun.lsize / MC_CHUNK_SIZE;
+
+	rc = ba_init(&blka->ba_lun);
+	if (unlikely(rc))
+		pr_debug("%s: cannot init block_alloc, rc=%d\n", __func__, rc);
+
+	pr_debug("%s: returning rc=%d lli=%p\n", __func__, rc, lli);
+	return rc;
+}
+
+/**
+ * write_same16() - sends a SCSI WRITE_SAME16 (0) command to specified LUN
+ * @sdev:	SCSI device associated with LUN.
+ * @lba:	Logical block address to start write same.
+ * @nblks:	Number of logical blocks to write same.
+ *
+ * Return: 0 on success, -errno on failure
+ */
+static int write_same16(struct scsi_device *sdev,
+			u64 lba,
+			u32 nblks)
+{
+	u8 *cmd_buf = NULL;
+	u8 *scsi_cmd = NULL;
+	u8 *sense_buf = NULL;
+	int rc = 0;
+	int result = 0;
+	int ws_limit = SISLITE_MAX_WS_BLOCKS;
+	u64 offset = lba;
+	int left = nblks;
+	u32 tout = sdev->request_queue->rq_timeout;
+	struct cxlflash_cfg *cfg = (struct cxlflash_cfg *)sdev->host->hostdata;
+	struct device *dev = &cfg->dev->dev;
+
+	cmd_buf = kzalloc(CMD_BUFSIZE, GFP_KERNEL);
+	scsi_cmd = kzalloc(MAX_COMMAND_SIZE, GFP_KERNEL);
+	sense_buf = kzalloc(SCSI_SENSE_BUFFERSIZE, GFP_KERNEL);
+	if (unlikely(!cmd_buf || !scsi_cmd || !sense_buf)) {
+		rc = -ENOMEM;
+		goto out;
+	}
+
+	while (left > 0) {
+
+		scsi_cmd[0] = WRITE_SAME_16;
+		put_unaligned_be64(offset, &scsi_cmd[2]);
+		put_unaligned_be32(ws_limit < left ? ws_limit : left,
+				   &scsi_cmd[10]);
+
+		result = scsi_execute(sdev, scsi_cmd, DMA_TO_DEVICE, cmd_buf,
+				      CMD_BUFSIZE, sense_buf, tout, 5, 0, NULL);
+		if (result) {
+			dev_err_ratelimited(dev, "%s: command failed for "
+					    "offset %lld result=0x%x\n",
+					    __func__, offset, result);
+			rc = -EIO;
+			goto out;
+		}
+		left -= ws_limit;
+		offset += ws_limit;
+	}
+
+out:
+	kfree(cmd_buf);
+	kfree(scsi_cmd);
+	kfree(sense_buf);
+	pr_debug("%s: returning rc=%d\n", __func__, rc);
+	return rc;
+}
+
+/**
+ * grow_lxt() - expands the translation table associated with the specified RHTE
+ * @afu:	AFU associated with the host.
+ * @sdev:	SCSI device associated with LUN.
+ * @ctxid:	Context ID of context owning the RHTE.
+ * @rhndl:	Resource handle associated with the RHTE.
+ * @rhte:	Resource handle entry (RHTE).
+ * @new_size:	Number of translation entries associated with RHTE.
+ *
+ * By design, this routine employs a 'best attempt' allocation and will
+ * truncate the requested size down if there is not sufficient space in
+ * the block allocator to satisfy the request but there does exist some
+ * amount of space. The user is made aware of this by returning the size
+ * allocated.
+ *
+ * Return: 0 on success, -errno on failure
+ */
+static int grow_lxt(struct afu *afu,
+		    struct scsi_device *sdev,
+		    ctx_hndl_t ctxid,
+		    res_hndl_t rhndl,
+		    struct sisl_rht_entry *rhte,
+		    u64 *new_size)
+{
+	struct sisl_lxt_entry *lxt = NULL, *lxt_old = NULL;
+	struct llun_info *lli = sdev->hostdata;
+	struct glun_info *gli = lli->parent;
+	struct blka *blka = &gli->blka;
+	u32 av_size;
+	u32 ngrps, ngrps_old;
+	u64 aun;		/* chunk# allocated by block allocator */
+	u64 delta = *new_size - rhte->lxt_cnt;
+	u64 my_new_size;
+	int i, rc = 0;
+
+	/*
+	 * Check what is available in the block allocator before re-allocating
+	 * LXT array. This is done up front under the mutex which must not be
+	 * released until after allocation is complete.
+	 */
+	mutex_lock(&blka->mutex);
+	av_size = ba_space(&blka->ba_lun);
+	if (unlikely(av_size <= 0)) {
+		pr_debug("%s: ba_space error: av_size %d\n", __func__, av_size);
+		mutex_unlock(&blka->mutex);
+		rc = -ENOSPC;
+		goto out;
+	}
+
+	if (av_size < delta)
+		delta = av_size;
+
+	lxt_old = rhte->lxt_start;
+	ngrps_old = LXT_NUM_GROUPS(rhte->lxt_cnt);
+	ngrps = LXT_NUM_GROUPS(rhte->lxt_cnt + delta);
+
+	if (ngrps != ngrps_old) {
+		/* reallocate to fit new size */
+		lxt = kzalloc((sizeof(*lxt) * LXT_GROUP_SIZE * ngrps),
+			      GFP_KERNEL);
+		if (unlikely(!lxt)) {
+			mutex_unlock(&blka->mutex);
+			rc = -ENOMEM;
+			goto out;
+		}
+
+		/* copy over all old entries */
+		memcpy(lxt, lxt_old, (sizeof(*lxt) * rhte->lxt_cnt));
+	} else
+		lxt = lxt_old;
+
+	/* nothing can fail from now on */
+	my_new_size = rhte->lxt_cnt + delta;
+
+	/* add new entries to the end */
+	for (i = rhte->lxt_cnt; i < my_new_size; i++) {
+		/*
+		 * Due to the earlier check of available space, ba_alloc
+		 * cannot fail here. If it did due to internal error,
+		 * leave a rlba_base of -1u which will likely be a
+		 * invalid LUN (too large).
+		 */
+		aun = ba_alloc(&blka->ba_lun);
+		if ((aun == -1ULL) || (aun >= blka->nchunk))
+			pr_debug("%s: ba_alloc error: allocated chunk# %llX, "
+				 "max %llX\n", __func__, aun, blka->nchunk - 1);
+
+		/* select both ports, use r/w perms from RHT */
+		lxt[i].rlba_base = ((aun << MC_CHUNK_SHIFT) |
+				    (lli->lun_index << LXT_LUNIDX_SHIFT) |
+				    (RHT_PERM_RW << LXT_PERM_SHIFT |
+				     lli->port_sel));
+	}
+
+	mutex_unlock(&blka->mutex);
+
+	/*
+	 * The following sequence is prescribed in the SISlite spec
+	 * for syncing up with the AFU when adding LXT entries.
+	 */
+	dma_wmb(); /* Make LXT updates are visible */
+
+	rhte->lxt_start = lxt;
+	dma_wmb(); /* Make RHT entry's LXT table update visible */
+
+	rhte->lxt_cnt = my_new_size;
+	dma_wmb(); /* Make RHT entry's LXT table size update visible */
+
+	cxlflash_afu_sync(afu, ctxid, rhndl, AFU_LW_SYNC);
+
+	/* free old lxt if reallocated */
+	if (lxt != lxt_old)
+		kfree(lxt_old);
+	*new_size = my_new_size;
+out:
+	pr_debug("%s: returning rc=%d\n", __func__, rc);
+	return rc;
+}
+
+/**
+ * shrink_lxt() - reduces translation table associated with the specified RHTE
+ * @afu:	AFU associated with the host.
+ * @sdev:	SCSI device associated with LUN.
+ * @rhndl:	Resource handle associated with the RHTE.
+ * @rhte:	Resource handle entry (RHTE).
+ * @ctxi:	Context owning resources.
+ * @new_size:	Number of translation entries associated with RHTE.
+ *
+ * Return: 0 on success, -errno on failure
+ */
+static int shrink_lxt(struct afu *afu,
+		      struct scsi_device *sdev,
+		      res_hndl_t rhndl,
+		      struct sisl_rht_entry *rhte,
+		      struct ctx_info *ctxi,
+		      u64 *new_size)
+{
+	struct sisl_lxt_entry *lxt, *lxt_old;
+	struct llun_info *lli = sdev->hostdata;
+	struct glun_info *gli = lli->parent;
+	struct blka *blka = &gli->blka;
+	ctx_hndl_t ctxid = DECODE_CTXID(ctxi->ctxid);
+	bool needs_ws = ctxi->rht_needs_ws[rhndl];
+	bool needs_sync = !ctxi->err_recovery_active;
+	u32 ngrps, ngrps_old;
+	u64 aun;		/* chunk# allocated by block allocator */
+	u64 delta = rhte->lxt_cnt - *new_size;
+	u64 my_new_size;
+	int i, rc = 0;
+
+	lxt_old = rhte->lxt_start;
+	ngrps_old = LXT_NUM_GROUPS(rhte->lxt_cnt);
+	ngrps = LXT_NUM_GROUPS(rhte->lxt_cnt - delta);
+
+	if (ngrps != ngrps_old) {
+		/* Reallocate to fit new size unless new size is 0 */
+		if (ngrps) {
+			lxt = kzalloc((sizeof(*lxt) * LXT_GROUP_SIZE * ngrps),
+				      GFP_KERNEL);
+			if (unlikely(!lxt)) {
+				rc = -ENOMEM;
+				goto out;
+			}
+
+			/* Copy over old entries that will remain */
+			memcpy(lxt, lxt_old,
+			       (sizeof(*lxt) * (rhte->lxt_cnt - delta)));
+		} else
+			lxt = NULL;
+	} else
+		lxt = lxt_old;
+
+	/* Nothing can fail from now on */
+	my_new_size = rhte->lxt_cnt - delta;
+
+	/*
+	 * The following sequence is prescribed in the SISlite spec
+	 * for syncing up with the AFU when removing LXT entries.
+	 */
+	rhte->lxt_cnt = my_new_size;
+	dma_wmb(); /* Make RHT entry's LXT table size update visible */
+
+	rhte->lxt_start = lxt;
+	dma_wmb(); /* Make RHT entry's LXT table update visible */
+
+	if (needs_sync)
+		cxlflash_afu_sync(afu, ctxid, rhndl, AFU_HW_SYNC);
+
+	if (needs_ws) {
+		/*
+		 * Mark the context as unavailable, so that we can release
+		 * the mutex safely.
+		 */
+		ctxi->unavail = true;
+		mutex_unlock(&ctxi->mutex);
+	}
+
+	/* Free LBAs allocated to freed chunks */
+	mutex_lock(&blka->mutex);
+	for (i = delta - 1; i >= 0; i--) {
+		/* Mask the higher 48 bits before shifting, even though
+		 * it is a noop
+		 */
+		aun = (lxt_old[my_new_size + i].rlba_base & SISL_ASTATUS_MASK);
+		aun = (aun >> MC_CHUNK_SHIFT);
+		if (needs_ws)
+			write_same16(sdev, aun, MC_CHUNK_SIZE);
+		ba_free(&blka->ba_lun, aun);
+	}
+	mutex_unlock(&blka->mutex);
+
+	if (needs_ws) {
+		/* Make the context visible again */
+		mutex_lock(&ctxi->mutex);
+		ctxi->unavail = false;
+	}
+
+	/* Free old lxt if reallocated */
+	if (lxt != lxt_old)
+		kfree(lxt_old);
+	*new_size = my_new_size;
+out:
+	pr_debug("%s: returning rc=%d\n", __func__, rc);
+	return rc;
+}
+
+/**
+ * _cxlflash_vlun_resize() - changes the size of a virtual lun
+ * @sdev:	SCSI device associated with LUN owning virtual LUN.
+ * @ctxi:	Context owning resources.
+ * @resize:	Resize ioctl data structure.
+ *
+ * On successful return, the user is informed of the new size (in blocks)
+ * of the virtual lun in last LBA format. When the size of the virtual
+ * lun is zero, the last LBA is reflected as -1. See comment in the
+ * prologue for _cxlflash_disk_release() regarding AFU syncs and contexts
+ * on the error recovery list.
+ *
+ * Return: 0 on success, -errno on failure
+ */
+int _cxlflash_vlun_resize(struct scsi_device *sdev,
+			  struct ctx_info *ctxi,
+			  struct dk_cxlflash_resize *resize)
+{
+	struct cxlflash_cfg *cfg = (struct cxlflash_cfg *)sdev->host->hostdata;
+	struct llun_info *lli = sdev->hostdata;
+	struct glun_info *gli = lli->parent;
+	struct afu *afu = cfg->afu;
+	bool put_ctx = false;
+
+	res_hndl_t rhndl = resize->rsrc_handle;
+	u64 new_size;
+	u64 nsectors;
+	u64 ctxid = DECODE_CTXID(resize->context_id),
+	    rctxid = resize->context_id;
+
+	struct sisl_rht_entry *rhte;
+
+	int rc = 0;
+
+	/*
+	 * The requested size (req_size) is always assumed to be in 4k blocks,
+	 * so we have to convert it here from 4k to chunk size.
+	 */
+	nsectors = (resize->req_size * CXLFLASH_BLOCK_SIZE) / gli->blk_len;
+	new_size = DIV_ROUND_UP(nsectors, MC_CHUNK_SIZE);
+
+	pr_debug("%s: ctxid=%llu rhndl=0x%llx, req_size=0x%llx,"
+		 "new_size=%llx\n", __func__, ctxid, resize->rsrc_handle,
+		 resize->req_size, new_size);
+
+	if (unlikely(gli->mode != MODE_VIRTUAL)) {
+		pr_debug("%s: LUN mode does not support resize! (%d)\n",
+			 __func__, gli->mode);
+		rc = -EINVAL;
+		goto out;
+
+	}
+
+	if (!ctxi) {
+		ctxi = get_context(cfg, rctxid, lli, CTX_CTRL_ERR_FALLBACK);
+		if (unlikely(!ctxi)) {
+			pr_debug("%s: Bad context! (%llu)\n", __func__, ctxid);
+			rc = -EINVAL;
+			goto out;
+		}
+
+		put_ctx = true;
+	}
+
+	rhte = get_rhte(ctxi, rhndl, lli);
+	if (unlikely(!rhte)) {
+		pr_debug("%s: Bad resource handle! (%u)\n", __func__, rhndl);
+		rc = -EINVAL;
+		goto out;
+	}
+
+	if (new_size > rhte->lxt_cnt)
+		rc = grow_lxt(afu, sdev, ctxid, rhndl, rhte, &new_size);
+	else if (new_size < rhte->lxt_cnt)
+		rc = shrink_lxt(afu, sdev, rhndl, rhte, ctxi, &new_size);
+
+	resize->hdr.return_flags = 0;
+	resize->last_lba = (new_size * MC_CHUNK_SIZE * gli->blk_len);
+	resize->last_lba /= CXLFLASH_BLOCK_SIZE;
+	resize->last_lba--;
+
+out:
+	if (put_ctx)
+		put_context(ctxi);
+	pr_debug("%s: resized to %lld returning rc=%d\n",
+		 __func__, resize->last_lba, rc);
+	return rc;
+}
+
+int cxlflash_vlun_resize(struct scsi_device *sdev,
+			 struct dk_cxlflash_resize *resize)
+{
+	return _cxlflash_vlun_resize(sdev, NULL, resize);
+}
+
+/**
+ * cxlflash_restore_luntable() - Restore LUN table to prior state
+ * @cfg:	Internal structure associated with the host.
+ */
+void cxlflash_restore_luntable(struct cxlflash_cfg *cfg)
+{
+	struct llun_info *lli, *temp;
+	u32 chan;
+	u32 lind;
+	struct afu *afu = cfg->afu;
+	struct sisl_global_map *agm = &afu->afu_map->global;
+
+	mutex_lock(&global.mutex);
+
+	list_for_each_entry_safe(lli, temp, &cfg->lluns, list) {
+		if (!lli->in_table)
+			continue;
+
+		lind = lli->lun_index;
+
+		if (lli->port_sel == BOTH_PORTS) {
+			writeq_be(lli->lun_id[0], &agm->fc_port[0][lind]);
+			writeq_be(lli->lun_id[1], &agm->fc_port[1][lind]);
+			pr_debug("%s: Virtual LUN on slot %d  id0=%llx, "
+				 "id1=%llx\n", __func__, lind,
+				 lli->lun_id[0], lli->lun_id[1]);
+		} else {
+			chan = PORT2CHAN(lli->port_sel);
+			writeq_be(lli->lun_id[chan], &agm->fc_port[chan][lind]);
+			pr_debug("%s: Virtual LUN on slot %d chan=%d, "
+				 "id=%llx\n", __func__, lind, chan,
+				 lli->lun_id[chan]);
+		}
+	}
+
+	mutex_unlock(&global.mutex);
+}
+
+/**
+ * init_luntable() - write an entry in the LUN table
+ * @cfg:	Internal structure associated with the host.
+ * @lli:	Per adapter LUN information structure.
+ *
+ * On successful return, a LUN table entry is created.
+ * At the top for LUNs visible on both ports.
+ * At the bottom for LUNs visible only on one port.
+ *
+ * Return: 0 on success, -errno on failure
+ */
+static int init_luntable(struct cxlflash_cfg *cfg, struct llun_info *lli)
+{
+	u32 chan;
+	u32 lind;
+	int rc = 0;
+	struct afu *afu = cfg->afu;
+	struct sisl_global_map *agm = &afu->afu_map->global;
+
+	mutex_lock(&global.mutex);
+
+	if (lli->in_table)
+		goto out;
+
+	if (lli->port_sel == BOTH_PORTS) {
+		/*
+		 * If this LUN is visible from both ports, we will put
+		 * it in the top half of the LUN table.
+		 */
+		if ((cfg->promote_lun_index == cfg->last_lun_index[0]) ||
+		    (cfg->promote_lun_index == cfg->last_lun_index[1])) {
+			rc = -ENOSPC;
+			goto out;
+		}
+
+		lind = lli->lun_index = cfg->promote_lun_index;
+		writeq_be(lli->lun_id[0], &agm->fc_port[0][lind]);
+		writeq_be(lli->lun_id[1], &agm->fc_port[1][lind]);
+		cfg->promote_lun_index++;
+		pr_debug("%s: Virtual LUN on slot %d  id0=%llx, id1=%llx\n",
+			 __func__, lind, lli->lun_id[0], lli->lun_id[1]);
+	} else {
+		/*
+		 * If this LUN is visible only from one port, we will put
+		 * it in the bottom half of the LUN table.
+		 */
+		chan = PORT2CHAN(lli->port_sel);
+		if (cfg->promote_lun_index == cfg->last_lun_index[chan]) {
+			rc = -ENOSPC;
+			goto out;
+		}
+
+		lind = lli->lun_index = cfg->last_lun_index[chan];
+		writeq_be(lli->lun_id[chan], &agm->fc_port[chan][lind]);
+		cfg->last_lun_index[chan]--;
+		pr_debug("%s: Virtual LUN on slot %d  chan=%d, id=%llx\n",
+			 __func__, lind, chan, lli->lun_id[chan]);
+	}
+
+	lli->in_table = true;
+out:
+	mutex_unlock(&global.mutex);
+	pr_debug("%s: returning rc=%d\n", __func__, rc);
+	return rc;
+}
+
+/**
+ * cxlflash_disk_virtual_open() - open a virtual disk of specified size
+ * @sdev:	SCSI device associated with LUN owning virtual LUN.
+ * @arg:	UVirtual ioctl data structure.
+ *
+ * On successful return, the user is informed of the resource handle
+ * to be used to identify the virtual lun and the size (in blocks) of
+ * the virtual lun in last LBA format. When the size of the virtual lun
+ * is zero, the last LBA is reflected as -1.
+ *
+ * Return: 0 on success, -errno on failure
+ */
+int cxlflash_disk_virtual_open(struct scsi_device *sdev, void *arg)
+{
+	struct cxlflash_cfg *cfg = (struct cxlflash_cfg *)sdev->host->hostdata;
+	struct device *dev = &cfg->dev->dev;
+	struct llun_info *lli = sdev->hostdata;
+	struct glun_info *gli = lli->parent;
+
+	struct dk_cxlflash_uvirtual *virt = (struct dk_cxlflash_uvirtual *)arg;
+	struct dk_cxlflash_resize resize;
+
+	u64 ctxid = DECODE_CTXID(virt->context_id),
+	    rctxid = virt->context_id;
+	u64 lun_size = virt->lun_size;
+	u64 last_lba = 0;
+	u64 rsrc_handle = -1;
+
+	int rc = 0;
+
+	struct ctx_info *ctxi = NULL;
+	struct sisl_rht_entry *rhte = NULL;
+
+	pr_debug("%s: ctxid=%llu ls=0x%llx\n", __func__, ctxid, lun_size);
+
+	mutex_lock(&gli->mutex);
+	if (gli->mode == MODE_NONE) {
+		/* Setup the LUN table and block allocator on first call */
+		rc = init_luntable(cfg, lli);
+		if (rc) {
+			dev_err(dev, "%s: call to init_luntable failed "
+				"rc=%d!\n", __func__, rc);
+			goto err0;
+		}
+
+		rc = init_vlun(lli);
+		if (rc) {
+			dev_err(dev, "%s: call to init_vlun failed rc=%d!\n",
+				__func__, rc);
+			rc = -ENOMEM;
+			goto err0;
+		}
+	}
+
+	rc = cxlflash_lun_attach(gli, MODE_VIRTUAL, true);
+	if (unlikely(rc)) {
+		dev_err(dev, "%s: Failed to attach to LUN! (VIRTUAL)\n",
+			__func__);
+		goto err0;
+	}
+	mutex_unlock(&gli->mutex);
+
+	ctxi = get_context(cfg, rctxid, lli, 0);
+	if (unlikely(!ctxi)) {
+		dev_err(dev, "%s: Bad context! (%llu)\n", __func__, ctxid);
+		rc = -EINVAL;
+		goto err1;
+	}
+
+	rhte = rhte_checkout(ctxi, lli);
+	if (unlikely(!rhte)) {
+		dev_err(dev, "%s: too many opens for this context\n", __func__);
+		rc = -EMFILE;	/* too many opens  */
+		goto err1;
+	}
+
+	rsrc_handle = (rhte - ctxi->rht_start);
+
+	/* Populate RHT format 0 */
+	rhte->nmask = MC_RHT_NMASK;
+	rhte->fp = SISL_RHT_FP(0U, ctxi->rht_perms);
+
+	/* Resize even if requested size is 0 */
+	marshal_virt_to_resize(virt, &resize);
+	resize.rsrc_handle = rsrc_handle;
+	rc = _cxlflash_vlun_resize(sdev, ctxi, &resize);
+	if (rc) {
+		dev_err(dev, "%s: resize failed rc %d\n", __func__, rc);
+		goto err2;
+	}
+	last_lba = resize.last_lba;
+
+	if (virt->hdr.flags & DK_CXLFLASH_UVIRTUAL_NEED_WRITE_SAME)
+		ctxi->rht_needs_ws[rsrc_handle] = true;
+
+	virt->hdr.return_flags = 0;
+	virt->last_lba = last_lba;
+	virt->rsrc_handle = rsrc_handle;
+
+out:
+	if (likely(ctxi))
+		put_context(ctxi);
+	pr_debug("%s: returning handle 0x%llx rc=%d llba %lld\n",
+		 __func__, rsrc_handle, rc, last_lba);
+	return rc;
+
+err2:
+	rhte_checkin(ctxi, rhte);
+err1:
+	cxlflash_lun_detach(gli);
+	goto out;
+err0:
+	/* Special common cleanup prior to successful LUN attach */
+	cxlflash_ba_terminate(&gli->blka.ba_lun);
+	mutex_unlock(&gli->mutex);
+	goto out;
+}
+
+/**
+ * clone_lxt() - copies translation tables from source to destination RHTE
+ * @afu:	AFU associated with the host.
+ * @blka:	Block allocator associated with LUN.
+ * @ctxid:	Context ID of context owning the RHTE.
+ * @rhndl:	Resource handle associated with the RHTE.
+ * @rhte:	Destination resource handle entry (RHTE).
+ * @rhte_src:	Source resource handle entry (RHTE).
+ *
+ * Return: 0 on success, -errno on failure
+ */
+static int clone_lxt(struct afu *afu,
+		     struct blka *blka,
+		     ctx_hndl_t ctxid,
+		     res_hndl_t rhndl,
+		     struct sisl_rht_entry *rhte,
+		     struct sisl_rht_entry *rhte_src)
+{
+	struct sisl_lxt_entry *lxt;
+	u32 ngrps;
+	u64 aun;		/* chunk# allocated by block allocator */
+	int i, j;
+
+	ngrps = LXT_NUM_GROUPS(rhte_src->lxt_cnt);
+
+	if (ngrps) {
+		/* allocate new LXTs for clone */
+		lxt = kzalloc((sizeof(*lxt) * LXT_GROUP_SIZE * ngrps),
+				GFP_KERNEL);
+		if (unlikely(!lxt))
+			return -ENOMEM;
+
+		/* copy over */
+		memcpy(lxt, rhte_src->lxt_start,
+		       (sizeof(*lxt) * rhte_src->lxt_cnt));
+
+		/* clone the LBAs in block allocator via ref_cnt */
+		mutex_lock(&blka->mutex);
+		for (i = 0; i < rhte_src->lxt_cnt; i++) {
+			aun = (lxt[i].rlba_base >> MC_CHUNK_SHIFT);
+			if (ba_clone(&blka->ba_lun, aun) == -1ULL) {
+				/* free the clones already made */
+				for (j = 0; j < i; j++) {
+					aun = (lxt[j].rlba_base >>
+					       MC_CHUNK_SHIFT);
+					ba_free(&blka->ba_lun, aun);
+				}
+
+				mutex_unlock(&blka->mutex);
+				kfree(lxt);
+				return -EIO;
+			}
+		}
+		mutex_unlock(&blka->mutex);
+	} else {
+		lxt = NULL;
+	}
+
+	/*
+	 * The following sequence is prescribed in the SISlite spec
+	 * for syncing up with the AFU when adding LXT entries.
+	 */
+	dma_wmb(); /* Make LXT updates are visible */
+
+	rhte->lxt_start = lxt;
+	dma_wmb(); /* Make RHT entry's LXT table update visible */
+
+	rhte->lxt_cnt = rhte_src->lxt_cnt;
+	dma_wmb(); /* Make RHT entry's LXT table size update visible */
+
+	cxlflash_afu_sync(afu, ctxid, rhndl, AFU_LW_SYNC);
+
+	pr_debug("%s: returning\n", __func__);
+	return 0;
+}
+
+/**
+ * cxlflash_disk_clone() - clone a context by making snapshot of another
+ * @sdev:	SCSI device associated with LUN owning virtual LUN.
+ * @clone:	Clone ioctl data structure.
+ *
+ * This routine effectively performs cxlflash_disk_open operation for each
+ * in-use virtual resource in the source context. Note that the destination
+ * context must be in pristine state and cannot have any resource handles
+ * open at the time of the clone.
+ *
+ * Return: 0 on success, -errno on failure
+ */
+int cxlflash_disk_clone(struct scsi_device *sdev,
+			struct dk_cxlflash_clone *clone)
+{
+	struct cxlflash_cfg *cfg = (struct cxlflash_cfg *)sdev->host->hostdata;
+	struct llun_info *lli = sdev->hostdata;
+	struct glun_info *gli = lli->parent;
+	struct blka *blka = &gli->blka;
+	struct afu *afu = cfg->afu;
+	struct dk_cxlflash_release release = { { 0 }, 0 };
+
+	struct ctx_info *ctxi_src = NULL,
+			*ctxi_dst = NULL;
+	struct lun_access *lun_access_src, *lun_access_dst;
+	u32 perms;
+	u64 ctxid_src = DECODE_CTXID(clone->context_id_src),
+	    ctxid_dst = DECODE_CTXID(clone->context_id_dst),
+	    rctxid_src = clone->context_id_src,
+	    rctxid_dst = clone->context_id_dst;
+	int adap_fd_src = clone->adap_fd_src;
+	int i, j;
+	int rc = 0;
+	bool found;
+	LIST_HEAD(sidecar);
+
+	pr_debug("%s: ctxid_src=%llu ctxid_dst=%llu adap_fd_src=%d\n",
+		 __func__, ctxid_src, ctxid_dst, adap_fd_src);
+
+	/* Do not clone yourself */
+	if (unlikely(rctxid_src == rctxid_dst)) {
+		rc = -EINVAL;
+		goto out;
+	}
+
+	if (unlikely(gli->mode != MODE_VIRTUAL)) {
+		rc = -EINVAL;
+		pr_debug("%s: Clone not supported on physical LUNs! (%d)\n",
+			 __func__, gli->mode);
+		goto out;
+	}
+
+	ctxi_src = get_context(cfg, rctxid_src, lli, CTX_CTRL_CLONE);
+	ctxi_dst = get_context(cfg, rctxid_dst, lli, 0);
+	if (unlikely(!ctxi_src || !ctxi_dst)) {
+		pr_debug("%s: Bad context! (%llu,%llu)\n", __func__,
+			 ctxid_src, ctxid_dst);
+		rc = -EINVAL;
+		goto out;
+	}
+
+	if (unlikely(adap_fd_src != ctxi_src->lfd)) {
+		pr_debug("%s: Invalid source adapter fd! (%d)\n",
+			 __func__, adap_fd_src);
+		rc = -EINVAL;
+		goto out;
+	}
+
+	/* Verify there is no open resource handle in the destination context */
+	for (i = 0; i < MAX_RHT_PER_CONTEXT; i++)
+		if (ctxi_dst->rht_start[i].nmask != 0) {
+			rc = -EINVAL;
+			goto out;
+		}
+
+	/* Clone LUN access list */
+	list_for_each_entry(lun_access_src, &ctxi_src->luns, list) {
+		found = false;
+		list_for_each_entry(lun_access_dst, &ctxi_dst->luns, list)
+			if (lun_access_dst->sdev == lun_access_src->sdev) {
+				found = true;
+				break;
+			}
+
+		if (!found) {
+			lun_access_dst = kzalloc(sizeof(*lun_access_dst),
+						 GFP_KERNEL);
+			if (unlikely(!lun_access_dst)) {
+				pr_err("%s: Unable to allocate lun_access!\n",
+				       __func__);
+				rc = -ENOMEM;
+				goto out;
+			}
+
+			*lun_access_dst = *lun_access_src;
+			list_add(&lun_access_dst->list, &sidecar);
+		}
+	}
+
+	if (unlikely(!ctxi_src->rht_out)) {
+		pr_debug("%s: Nothing to clone!\n", __func__);
+		goto out_success;
+	}
+
+	/* User specified permission on attach */
+	perms = ctxi_dst->rht_perms;
+
+	/*
+	 * Copy over checked-out RHT (and their associated LXT) entries by
+	 * hand, stopping after we've copied all outstanding entries and
+	 * cleaning up if the clone fails.
+	 *
+	 * Note: This loop is equivalent to performing cxlflash_disk_open and
+	 * cxlflash_vlun_resize. As such, LUN accounting needs to be taken into
+	 * account by attaching after each successful RHT entry clone. In the
+	 * event that a clone failure is experienced, the LUN detach is handled
+	 * via the cleanup performed by _cxlflash_disk_release.
+	 */
+	for (i = 0; i < MAX_RHT_PER_CONTEXT; i++) {
+		if (ctxi_src->rht_out == ctxi_dst->rht_out)
+			break;
+		if (ctxi_src->rht_start[i].nmask == 0)
+			continue;
+
+		/* Consume a destination RHT entry */
+		ctxi_dst->rht_out++;
+		ctxi_dst->rht_start[i].nmask = ctxi_src->rht_start[i].nmask;
+		ctxi_dst->rht_start[i].fp =
+		    SISL_RHT_FP_CLONE(ctxi_src->rht_start[i].fp, perms);
+		ctxi_dst->rht_lun[i] = ctxi_src->rht_lun[i];
+
+		rc = clone_lxt(afu, blka, ctxid_dst, i,
+			       &ctxi_dst->rht_start[i],
+			       &ctxi_src->rht_start[i]);
+		if (rc) {
+			marshal_clone_to_rele(clone, &release);
+			for (j = 0; j < i; j++) {
+				release.rsrc_handle = j;
+				_cxlflash_disk_release(sdev, ctxi_dst,
+						       &release);
+			}
+
+			/* Put back the one we failed on */
+			rhte_checkin(ctxi_dst, &ctxi_dst->rht_start[i]);
+			goto err;
+		}
+
+		cxlflash_lun_attach(gli, gli->mode, false);
+	}
+
+out_success:
+	list_splice(&sidecar, &ctxi_dst->luns);
+	sys_close(adap_fd_src);
+
+	/* fall through */
+out:
+	if (ctxi_src)
+		put_context(ctxi_src);
+	if (ctxi_dst)
+		put_context(ctxi_dst);
+	pr_debug("%s: returning rc=%d\n", __func__, rc);
+	return rc;
+
+err:
+	list_for_each_entry_safe(lun_access_src, lun_access_dst, &sidecar, list)
+		kfree(lun_access_src);
+	goto out;
+}
diff --git a/drivers/scsi/cxlflash/vlun.h b/drivers/scsi/cxlflash/vlun.h
new file mode 100644
index 000000000000..8b29a74946e4
--- /dev/null
+++ b/drivers/scsi/cxlflash/vlun.h
@@ -0,0 +1,86 @@
+/*
+ * CXL Flash Device Driver
+ *
+ * Written by: Manoj N. Kumar <manoj@linux.vnet.ibm.com>, IBM Corporation
+ *             Matthew R. Ochs <mrochs@linux.vnet.ibm.com>, IBM Corporation
+ *
+ * Copyright (C) 2015 IBM Corporation
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#ifndef _CXLFLASH_VLUN_H
+#define _CXLFLASH_VLUN_H
+
+/* RHT - Resource Handle Table */
+#define MC_RHT_NMASK      16	/* in bits */
+#define MC_CHUNK_SHIFT    MC_RHT_NMASK	/* shift to go from LBA to chunk# */
+
+#define HIBIT             (BITS_PER_LONG - 1)
+
+#define MAX_AUN_CLONE_CNT 0xFF
+
+/*
+ * LXT - LBA Translation Table
+ *
+ * +-------+-------+-------+-------+-------+-------+-------+---+---+
+ * | RLBA_BASE                                     |LUN_IDX| P |SEL|
+ * +-------+-------+-------+-------+-------+-------+-------+---+---+
+ *
+ * The LXT Entry contains the physical LBA where the chunk starts (RLBA_BASE).
+ * AFU ORes the low order bits from the virtual LBA (offset into the chunk)
+ * with RLBA_BASE. The result is the physical LBA to be sent to storage.
+ * The LXT Entry also contains an index to a LUN TBL and a bitmask of which
+ * outgoing (FC) * ports can be selected. The port select bit-mask is ANDed
+ * with a global port select bit-mask maintained by the driver.
+ * In addition, it has permission bits that are ANDed with the
+ * RHT permissions to arrive at the final permissions for the chunk.
+ *
+ * LXT tables are allocated dynamically in groups. This is done to avoid
+ * a malloc/free overhead each time the LXT has to grow or shrink.
+ *
+ * Based on the current lxt_cnt (used), it is always possible to know
+ * how many are allocated (used+free). The number of allocated entries is
+ * not stored anywhere.
+ *
+ * The LXT table is re-allocated whenever it needs to cross into another group.
+*/
+#define LXT_GROUP_SIZE          8
+#define LXT_NUM_GROUPS(lxt_cnt) (((lxt_cnt) + 7)/8)	/* alloc'ed groups */
+#define LXT_LUNIDX_SHIFT  8	/* LXT entry, shift for LUN index */
+#define LXT_PERM_SHIFT    4	/* LXT entry, shift for permission bits */
+
+struct ba_lun_info {
+	u64 *lun_alloc_map;
+	u32 lun_bmap_size;
+	u32 total_aus;
+	u64 free_aun_cnt;
+
+	/* indices to be used for elevator lookup of free map */
+	u32 free_low_idx;
+	u32 free_curr_idx;
+	u32 free_high_idx;
+
+	u8 *aun_clone_map;
+};
+
+struct ba_lun {
+	u64 lun_id;
+	u64 wwpn;
+	size_t lsize;		/* LUN size in number of LBAs             */
+	size_t lba_size;	/* LBA size in number of bytes            */
+	size_t au_size;		/* Allocation Unit size in number of LBAs */
+	struct ba_lun_info *ba_lun_handle;
+};
+
+/* Block Allocator */
+struct blka {
+	struct ba_lun ba_lun;
+	u64 nchunk;		/* number of chunks */
+	struct mutex mutex;
+};
+
+#endif /* ifndef _CXLFLASH_SUPERPIPE_H */
diff --git a/include/uapi/scsi/cxlflash_ioctl.h b/include/uapi/scsi/cxlflash_ioctl.h
index 570773406531..831351b2e660 100644
--- a/include/uapi/scsi/cxlflash_ioctl.h
+++ b/include/uapi/scsi/cxlflash_ioctl.h
@@ -71,6 +71,17 @@ struct dk_cxlflash_udirect {
 	__u64 reserved[8];		/* Reserved for future use */
 };
 
+#define DK_CXLFLASH_UVIRTUAL_NEED_WRITE_SAME	0x8000000000000000ULL
+
+struct dk_cxlflash_uvirtual {
+	struct dk_cxlflash_hdr hdr;	/* Common fields */
+	__u64 context_id;		/* Context to own virtual resources */
+	__u64 lun_size;			/* Requested size, in 4K blocks */
+	__u64 rsrc_handle;		/* Returned resource handle */
+	__u64 last_lba;			/* Returned last LBA of LUN */
+	__u64 reserved[8];		/* Reserved for future use */
+};
+
 struct dk_cxlflash_release {
 	struct dk_cxlflash_hdr hdr;	/* Common fields */
 	__u64 context_id;		/* Context owning resources */
@@ -78,6 +89,23 @@ struct dk_cxlflash_release {
 	__u64 reserved[8];		/* Reserved for future use */
 };
 
+struct dk_cxlflash_resize {
+	struct dk_cxlflash_hdr hdr;	/* Common fields */
+	__u64 context_id;		/* Context owning resources */
+	__u64 rsrc_handle;		/* Resource handle of LUN to resize */
+	__u64 req_size;			/* New requested size, in 4K blocks */
+	__u64 last_lba;			/* Returned last LBA of LUN */
+	__u64 reserved[8];		/* Reserved for future use */
+};
+
+struct dk_cxlflash_clone {
+	struct dk_cxlflash_hdr hdr;	/* Common fields */
+	__u64 context_id_src;		/* Context to clone from */
+	__u64 context_id_dst;		/* Context to clone to */
+	__u64 adap_fd_src;		/* Source context adapter fd */
+	__u64 reserved[8];		/* Reserved for future use */
+};
+
 #define DK_CXLFLASH_VERIFY_SENSE_LEN	18
 #define DK_CXLFLASH_VERIFY_HINT_SENSE	0x8000000000000000ULL
 
@@ -118,7 +146,10 @@ union cxlflash_ioctls {
 	struct dk_cxlflash_attach attach;
 	struct dk_cxlflash_detach detach;
 	struct dk_cxlflash_udirect udirect;
+	struct dk_cxlflash_uvirtual uvirtual;
 	struct dk_cxlflash_release release;
+	struct dk_cxlflash_resize resize;
+	struct dk_cxlflash_clone clone;
 	struct dk_cxlflash_verify verify;
 	struct dk_cxlflash_recover_afu recover_afu;
 	struct dk_cxlflash_manage_lun manage_lun;
@@ -136,5 +167,8 @@ union cxlflash_ioctls {
 #define DK_CXLFLASH_VERIFY		CXL_IOWR(0x84, dk_cxlflash_verify)
 #define DK_CXLFLASH_RECOVER_AFU		CXL_IOWR(0x85, dk_cxlflash_recover_afu)
 #define DK_CXLFLASH_MANAGE_LUN		CXL_IOWR(0x86, dk_cxlflash_manage_lun)
+#define DK_CXLFLASH_USER_VIRTUAL	CXL_IOWR(0x87, dk_cxlflash_uvirtual)
+#define DK_CXLFLASH_VLUN_RESIZE		CXL_IOWR(0x88, dk_cxlflash_resize)
+#define DK_CXLFLASH_VLUN_CLONE		CXL_IOWR(0x89, dk_cxlflash_clone)
 
 #endif /* ifndef _CXLFLASH_IOCTL_H */
-- 
cgit v1.2.3


From 9b28829d6da391f67a76dbba07a167e2b554bd10 Mon Sep 17 00:00:00 2001
From: Vineet Gupta <vgupta@synopsys.com>
Date: Tue, 18 Nov 2014 17:36:11 +0530
Subject: ARCv2: perf: Finally introduce HS perf unit

With all features in place, the ARC HS pct block can now be effectively
allowed to be probed/used

Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
---
 Documentation/devicetree/bindings/arc/archs-pct.txt | 17 +++++++++++++++++
 MAINTAINERS                                         |  2 +-
 arch/arc/include/asm/perf_event.h                   |  5 ++++-
 arch/arc/kernel/perf_event.c                        |  3 ++-
 4 files changed, 24 insertions(+), 3 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/arc/archs-pct.txt

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/arc/archs-pct.txt b/Documentation/devicetree/bindings/arc/archs-pct.txt
new file mode 100644
index 000000000000..1ae98b87c640
--- /dev/null
+++ b/Documentation/devicetree/bindings/arc/archs-pct.txt
@@ -0,0 +1,17 @@
+* ARC HS Performance Counters
+
+The ARC HS can be configured with a pipeline performance monitor for counting
+CPU and cache events like cache misses and hits. Like conventional PCT there
+are 100+ hardware conditions dynamically mapped to upto 32 counters.
+It also supports overflow interrupts.
+
+Required properties:
+
+- compatible : should contain
+	"snps,archs-pct"
+
+Example:
+
+pmu {
+        compatible = "snps,archs-pct";
+};
diff --git a/MAINTAINERS b/MAINTAINERS
index d7ab7363b263..d4cda4273131 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9874,7 +9874,7 @@ SYNOPSYS ARC ARCHITECTURE
 M:	Vineet Gupta <vgupta@synopsys.com>
 S:	Supported
 F:	arch/arc/
-F:	Documentation/devicetree/bindings/arc/
+F:	Documentation/devicetree/bindings/arc/*
 F:	drivers/tty/serial/arc_uart.c
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc.git
 
diff --git a/arch/arc/include/asm/perf_event.h b/arch/arc/include/asm/perf_event.h
index 5824ab46cb71..5f071762fb1c 100644
--- a/arch/arc/include/asm/perf_event.h
+++ b/arch/arc/include/asm/perf_event.h
@@ -105,8 +105,11 @@ static const char * const arc_pmu_ev_hw_map[] = {
 	[PERF_COUNT_HW_INSTRUCTIONS] = "iall",
 	[PERF_COUNT_HW_BRANCH_INSTRUCTIONS] = "ijmp", /* Excludes ZOL jumps */
 	[PERF_COUNT_ARC_BPOK]         = "bpok",	  /* NP-NT, PT-T, PNT-NT */
+#ifdef CONFIG_ISA_ARCV2
+	[PERF_COUNT_HW_BRANCH_MISSES] = "bpmp",
+#else
 	[PERF_COUNT_HW_BRANCH_MISSES] = "bpfail", /* NP-T, PT-NT, PNT-T */
-
+#endif
 	[PERF_COUNT_ARC_LDC] = "imemrdc",	/* Instr: mem read cached */
 	[PERF_COUNT_ARC_STC] = "imemwrc",	/* Instr: mem write cached */
 
diff --git a/arch/arc/kernel/perf_event.c b/arch/arc/kernel/perf_event.c
index 743065251e23..0c08bb1ce15a 100644
--- a/arch/arc/kernel/perf_event.c
+++ b/arch/arc/kernel/perf_event.c
@@ -551,6 +551,7 @@ static int arc_pmu_device_probe(struct platform_device *pdev)
 #ifdef CONFIG_OF
 static const struct of_device_id arc_pmu_match[] = {
 	{ .compatible = "snps,arc700-pct" },
+	{ .compatible = "snps,archs-pct" },
 	{},
 };
 MODULE_DEVICE_TABLE(of, arc_pmu_match);
@@ -558,7 +559,7 @@ MODULE_DEVICE_TABLE(of, arc_pmu_match);
 
 static struct platform_driver arc_pmu_driver = {
 	.driver	= {
-		.name		= "arc700-pct",
+		.name		= "arc-pct",
 		.of_match_table = of_match_ptr(arc_pmu_match),
 	},
 	.probe		= arc_pmu_device_probe,
-- 
cgit v1.2.3


From 2a2a7ea7c0126d388c14c28927cdba429b4858dd Mon Sep 17 00:00:00 2001
From: Haibo Chen <haibo.chen@freescale.com>
Date: Tue, 11 Aug 2015 19:38:28 +0800
Subject: mmc: sdhci-esdhc-imx: Document new DT bindings for imx7d support

Add a required property "fsl,imx7d-usdhc" in binding doc.
Add an optional property "fsl,tuning-step" in binding doc.

Signed-off-by: Haibo Chen <haibo.chen@freescale.com>
Acked-by: Dong Aisheng <aisheng.dong@freescale.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
---
 Documentation/devicetree/bindings/mmc/fsl-imx-esdhc.txt | 6 ++++++
 1 file changed, 6 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/mmc/fsl-imx-esdhc.txt b/Documentation/devicetree/bindings/mmc/fsl-imx-esdhc.txt
index 211e7785f4d2..dca56d6248f5 100644
--- a/Documentation/devicetree/bindings/mmc/fsl-imx-esdhc.txt
+++ b/Documentation/devicetree/bindings/mmc/fsl-imx-esdhc.txt
@@ -15,6 +15,7 @@ Required properties:
 	       "fsl,imx6q-usdhc"
 	       "fsl,imx6sl-usdhc"
 	       "fsl,imx6sx-usdhc"
+	       "fsl,imx7d-usdhc"
 
 Optional properties:
 - fsl,wp-controller : Indicate to use controller internal write protection
@@ -27,6 +28,11 @@ Optional properties:
   transparent level shifters on the outputs of the controller. Two cells are
   required, first cell specifies minimum slot voltage (mV), second cell
   specifies maximum slot voltage (mV). Several ranges could be specified.
+- fsl,tuning-step: Specify the increasing delay cell steps in tuning procedure.
+  The uSDHC use one delay cell as default increasing step to do tuning process.
+  This property allows user to change the tuning step to more than one delay
+  cells which is useful for some special boards or cards when the default
+  tuning step can't find the proper delay window within limited tuning retries.
 
 Examples:
 
-- 
cgit v1.2.3


From da795ec26e2542f1e306598a1d7a31c0762f2bd7 Mon Sep 17 00:00:00 2001
From: Shawn Lin <shawn.lin@rock-chips.com>
Date: Tue, 11 Aug 2015 15:57:05 +0800
Subject: mmc: sdhci-of-arasan: Add the support for sdhci-5.1

This patch adds the compatible string in sdhci-of-arasan.c to
support sdhci-arasan5.1 version of controller. No documented
controller IP version is found in the TRM, so we use ths version
of command queueing engine integrated into this controller by arasan
to specify our controller.

Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
Acked-by: Michal Simek <michal.simek@xilinx.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
---
 Documentation/devicetree/bindings/mmc/arasan,sdhci.txt | 2 +-
 drivers/mmc/host/sdhci-of-arasan.c                     | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/mmc/arasan,sdhci.txt b/Documentation/devicetree/bindings/mmc/arasan,sdhci.txt
index 7e9490313d5a..da541c3631f8 100644
--- a/Documentation/devicetree/bindings/mmc/arasan,sdhci.txt
+++ b/Documentation/devicetree/bindings/mmc/arasan,sdhci.txt
@@ -9,7 +9,7 @@ Device Tree Bindings for the Arasan SDHCI Controller
 
 Required Properties:
   - compatible: Compatibility string. Must be 'arasan,sdhci-8.9a' or
-                'arasan,sdhci-4.9a'
+                'arasan,sdhci-4.9a' or 'arasan,sdhci-5.1'
   - reg: From mmc bindings: Register location and length.
   - clocks: From clock bindings: Handles to clock inputs.
   - clock-names: From clock bindings: Tuple including "clk_xin" and "clk_ahb"
diff --git a/drivers/mmc/host/sdhci-of-arasan.c b/drivers/mmc/host/sdhci-of-arasan.c
index ef5a7d241323..75379cb0fb35 100644
--- a/drivers/mmc/host/sdhci-of-arasan.c
+++ b/drivers/mmc/host/sdhci-of-arasan.c
@@ -217,6 +217,7 @@ static int sdhci_arasan_remove(struct platform_device *pdev)
 
 static const struct of_device_id sdhci_arasan_of_match[] = {
 	{ .compatible = "arasan,sdhci-8.9a" },
+	{ .compatible = "arasan,sdhci-5.1" },
 	{ .compatible = "arasan,sdhci-4.9a" },
 	{ }
 };
-- 
cgit v1.2.3


From 5aeb5d205e122b70742df94a7d51a3502f1a8277 Mon Sep 17 00:00:00 2001
From: Huang Rui <ray.huang@amd.com>
Date: Thu, 27 Aug 2015 16:07:37 +0800
Subject: hwmon: (fam15h_power) Add documentation for new processors support

This patch updates description of fam15h_power driver, its scope is
extended to family 16h processsors.

Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
---
 Documentation/hwmon/fam15h_power | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/hwmon/fam15h_power b/Documentation/hwmon/fam15h_power
index 80654813d04a..e2b1b69eebea 100644
--- a/Documentation/hwmon/fam15h_power
+++ b/Documentation/hwmon/fam15h_power
@@ -3,12 +3,13 @@ Kernel driver fam15h_power
 
 Supported chips:
 * AMD Family 15h Processors
+* AMD Family 16h Processors
 
   Prefix: 'fam15h_power'
   Addresses scanned: PCI space
   Datasheets:
   BIOS and Kernel Developer's Guide (BKDG) For AMD Family 15h Processors
-    (not yet published)
+  BIOS and Kernel Developer's Guide (BKDG) For AMD Family 16h Processors
 
 Author: Andreas Herrmann <herrmann.der.user@googlemail.com>
 
@@ -16,10 +17,11 @@ Description
 -----------
 
 This driver permits reading of registers providing power information
-of AMD Family 15h processors.
+of AMD Family 15h and 16h processors.
 
-For AMD Family 15h processors the following power values can be
-calculated using different processor northbridge function registers:
+For AMD Family 15h and 16h processors the following power values can
+be calculated using different processor northbridge function
+registers:
 
 * BasePwrWatts: Specifies in watts the maximum amount of power
   consumed by the processor for NB and logic external to the core.
-- 
cgit v1.2.3


From 3b7ce99748f0d006f9d1aa85709872e7b46787f7 Mon Sep 17 00:00:00 2001
From: Ricard Wanderlof <ricard.wanderlof@axis.com>
Date: Thu, 27 Aug 2015 11:35:20 +0200
Subject: ASoC: ics43432: Add codec driver for InvenSense ICS-43432

Add support for the InvenSense ICS-43432 I2S MEMS microphone.

This is a non-software-configurable MEMS microphone with I2S output.

Tested on a setup with a single ICS-43432 (the device itself supports
stereo operation using a hardware pin controlling left vs. right channel
output).

Signed-off-by: Ricard Wanderlof <ricardw@axis.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
---
 .../devicetree/bindings/sound/ics43432.txt         | 17 +++++
 .../devicetree/bindings/vendor-prefixes.txt        |  1 +
 sound/soc/codecs/Kconfig                           |  4 ++
 sound/soc/codecs/Makefile                          |  2 +
 sound/soc/codecs/ics43432.c                        | 77 ++++++++++++++++++++++
 5 files changed, 101 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/sound/ics43432.txt
 create mode 100644 sound/soc/codecs/ics43432.c

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/sound/ics43432.txt b/Documentation/devicetree/bindings/sound/ics43432.txt
new file mode 100644
index 000000000000..b02e3a6c0fef
--- /dev/null
+++ b/Documentation/devicetree/bindings/sound/ics43432.txt
@@ -0,0 +1,17 @@
+Invensense ICS-43432 MEMS microphone with I2S output.
+
+There are no software configuration options for this device, indeed, the only
+host connection is the I2S interface. Apart from requirements on clock
+frequency (460 kHz to 3.379 MHz according to the data sheet) there must be
+64 clock cycles in each stereo output frame; 24 of the 32 available bits
+contain audio data. A hardware pin determines if the device outputs data
+on the left or right channel of the I2S frame.
+
+Required properties:
+  - compatible : Must be "invensense,ics43432"
+
+Example:
+
+	ics43432: ics43432 {
+		compatible = "invensense,ics43432";
+	};
diff --git a/Documentation/devicetree/bindings/vendor-prefixes.txt b/Documentation/devicetree/bindings/vendor-prefixes.txt
index d444757c4d9e..80bf96d4795f 100644
--- a/Documentation/devicetree/bindings/vendor-prefixes.txt
+++ b/Documentation/devicetree/bindings/vendor-prefixes.txt
@@ -110,6 +110,7 @@ ingenic	Ingenic Semiconductor
 innolux	Innolux Corporation
 intel	Intel Corporation
 intercontrol	Inter Control Group
+invensense	InvenSense Inc.
 isee	ISEE 2007 S.L.
 isil	Intersil
 karo	Ka-Ro electronics GmbH
diff --git a/sound/soc/codecs/Kconfig b/sound/soc/codecs/Kconfig
index efaafce8ba38..1bb6446dce53 100644
--- a/sound/soc/codecs/Kconfig
+++ b/sound/soc/codecs/Kconfig
@@ -62,6 +62,7 @@ config SND_SOC_ALL_CODECS
 	select SND_SOC_BT_SCO
 	select SND_SOC_ES8328_SPI if SPI_MASTER
 	select SND_SOC_ES8328_I2C if I2C
+	select SND_SOC_ICS43432
 	select SND_SOC_ISABELLE if I2C
 	select SND_SOC_JZ4740_CODEC
 	select SND_SOC_LM4857 if I2C
@@ -446,6 +447,9 @@ config SND_SOC_ES8328_SPI
 	tristate
 	select SND_SOC_ES8328
 
+config SND_SOC_ICS43432
+	tristate
+
 config SND_SOC_ISABELLE
         tristate
 
diff --git a/sound/soc/codecs/Makefile b/sound/soc/codecs/Makefile
index cf160d972cb3..5a19a670a82b 100644
--- a/sound/soc/codecs/Makefile
+++ b/sound/soc/codecs/Makefile
@@ -55,6 +55,7 @@ snd-soc-dmic-objs := dmic.o
 snd-soc-es8328-objs := es8328.o
 snd-soc-es8328-i2c-objs := es8328-i2c.o
 snd-soc-es8328-spi-objs := es8328-spi.o
+snd-soc-ics43432-objs := ics43432.o
 snd-soc-isabelle-objs := isabelle.o
 snd-soc-jz4740-codec-objs := jz4740.o
 snd-soc-l3-objs := l3.o
@@ -242,6 +243,7 @@ obj-$(CONFIG_SND_SOC_DMIC)	+= snd-soc-dmic.o
 obj-$(CONFIG_SND_SOC_ES8328)	+= snd-soc-es8328.o
 obj-$(CONFIG_SND_SOC_ES8328_I2C)+= snd-soc-es8328-i2c.o
 obj-$(CONFIG_SND_SOC_ES8328_SPI)+= snd-soc-es8328-spi.o
+obj-$(CONFIG_SND_SOC_ICS43432)	+= snd-soc-ics43432.o
 obj-$(CONFIG_SND_SOC_ISABELLE)	+= snd-soc-isabelle.o
 obj-$(CONFIG_SND_SOC_JZ4740_CODEC)	+= snd-soc-jz4740-codec.o
 obj-$(CONFIG_SND_SOC_L3)	+= snd-soc-l3.o
diff --git a/sound/soc/codecs/ics43432.c b/sound/soc/codecs/ics43432.c
new file mode 100644
index 000000000000..4f202c15df39
--- /dev/null
+++ b/sound/soc/codecs/ics43432.c
@@ -0,0 +1,77 @@
+/*
+ * I2S MEMS microphone driver for InvenSense ICS-43432
+ *
+ * - Non configurable.
+ * - I2S interface, 64 BCLs per frame, 32 bits per channel, 24 bit data
+ *
+ * Copyright (c) 2015 Axis Communications AB
+ *
+ * Licensed under GPL2.
+ */
+
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/slab.h>
+#include <sound/core.h>
+#include <sound/pcm.h>
+#include <sound/pcm_params.h>
+#include <sound/soc.h>
+#include <sound/initval.h>
+#include <sound/tlv.h>
+
+#define ICS43432_RATE_MIN 7190 /* Hz, from data sheet */
+#define ICS43432_RATE_MAX 52800  /* Hz, from data sheet */
+
+#define ICS43432_FORMATS (SNDRV_PCM_FMTBIT_S24_LE | SNDRV_PCM_FMTBIT_S32)
+
+static struct snd_soc_dai_driver ics43432_dai = {
+	.name = "ics43432-hifi",
+	.capture = {
+		.stream_name = "Capture",
+		.channels_min = 1,
+		.channels_max = 2,
+		.rate_min = ICS43432_RATE_MIN,
+		.rate_max = ICS43432_RATE_MAX,
+		.rates = SNDRV_PCM_RATE_CONTINUOUS,
+		.formats = ICS43432_FORMATS,
+	},
+};
+
+static struct snd_soc_codec_driver ics43432_codec_driver = {
+};
+
+static int ics43432_probe(struct platform_device *pdev)
+{
+	return snd_soc_register_codec(&pdev->dev, &ics43432_codec_driver,
+			&ics43432_dai, 1);
+}
+
+static int ics43432_remove(struct platform_device *pdev)
+{
+	snd_soc_unregister_codec(&pdev->dev);
+	return 0;
+}
+
+#ifdef CONFIG_OF
+static const struct of_device_id ics43432_ids[] = {
+	{ .compatible = "invensense,ics43432", },
+	{ }
+};
+MODULE_DEVICE_TABLE(of, ics43432_dt_ids);
+#endif
+
+static struct platform_driver ics43432_driver = {
+	.driver = {
+		.name = "ics43432",
+		.owner = THIS_MODULE,
+		.of_match_table = of_match_ptr(ics43432_ids),
+	},
+	.probe = ics43432_probe,
+	.remove = ics43432_remove,
+};
+
+module_platform_driver(ics43432_driver);
+
+MODULE_DESCRIPTION("ASoC ICS43432 driver");
+MODULE_AUTHOR("Ricard Wanderlof <ricardw@axis.com>");
+MODULE_LICENSE("GPLv2");
-- 
cgit v1.2.3


From 3f1d44ae640172482a8c0125efe9ca93331b056b Mon Sep 17 00:00:00 2001
From: David Howells <dhowells@redhat.com>
Date: Thu, 27 Aug 2015 11:13:36 +0100
Subject: Documentation/Changes: Now need OpenSSL devel packages for module
 signing

The module signing script (sign-file) used to be a wrapper around the
openssl program.  It has now been replaced by a C program that uses the
crypto library from the OpenSSL package meaning that the OpenSSL devel
packages are necessary to provide the devel library link and the header
files.

This would be openssl-devel on Fedora and libssl-dev on Debian.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: James Morris <james.l.morris@oracle.com>
---
 Documentation/Changes | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/Changes b/Documentation/Changes
index 646cdaa6e9d1..6d8863004858 100644
--- a/Documentation/Changes
+++ b/Documentation/Changes
@@ -43,6 +43,7 @@ o  udev                   081                     # udevd --version
 o  grub                   0.93                    # grub --version || grub-install --version
 o  mcelog                 0.6                     # mcelog --version
 o  iptables               1.4.2                   # iptables -V
+o  openssl & libcrypto    1.0.1k                  # openssl version
 
 
 Kernel compilation
@@ -79,6 +80,17 @@ BC
 You will need bc to build kernels 3.10 and higher
 
 
+OpenSSL
+-------
+
+Module signing and external certificate handling use the OpenSSL program and
+crypto library to do key creation and signature generation.
+
+You will need openssl to build kernels 3.7 and higher if module signing is
+enabled.  You will also need openssl development packages to build kernels 4.3
+and higher.
+
+
 System utilities
 ================
 
@@ -295,6 +307,10 @@ Binutils
 --------
 o  <ftp://ftp.kernel.org/pub/linux/devel/binutils/>
 
+OpenSSL
+-------
+o  <https://www.openssl.org/>
+
 System utilities
 ****************
 
@@ -392,4 +408,3 @@ o  <http://oprofile.sf.net/download/>
 NFS-Utils
 ---------
 o  <http://nfs.sourceforge.net/>
-
-- 
cgit v1.2.3


From 2baa891e42d84159b693eadd44f6fe1486285bdc Mon Sep 17 00:00:00 2001
From: "Luis R. Rodriguez" <mcgrof@suse.com>
Date: Mon, 24 Aug 2015 12:13:33 -0700
Subject: x86/mm/mtrr: Remove kernel internal MTRR interfaces: unexport
 mtrr_add() and mtrr_del()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The effort to replace mtrr_add() with architecture agnostic
arch_phys_wc_add() is complete, this will ensure write-combining
implementations (PAT on x86) is taken advantage instead of using
MTRR. With the effort done now, hide direct MTRR access for
drivers.

The legacy user-space /proc/mtrr ABI is not affected.

Update x86 documentation on MTRR to reflect the completion of
the phasing out of direct access to MTRR, also add a note on
platform firmware code use of MTRRs based on the obituary
discussion of MTRRs on Linux [0].

  [0] http://lkml.kernel.org/r/1438991330.3109.196.camel@hp.com

Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Cc: <syrjala@sci.fi>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Walls <awalls@md.metrocast.net>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Doug Ledford <dledford@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: Ville Syrjälä <syrjala@sci.fi>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: airlied@linux.ie
Cc: benh@kernel.crashing.org
Cc: bhelgaas@google.com
Cc: dan.j.williams@intel.com
Cc: konrad.wilk@oracle.com
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: mst@redhat.com
Cc: netdev@vger.kernel.org
Cc: vinod.koul@intel.com
Cc: xen-devel@lists.xensource.com
Link: http://lkml.kernel.org/r/1440443613-13696-12-git-send-email-mcgrof@do-not-panic.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 Documentation/x86/mtrr.txt      | 20 ++++++++++++++++----
 arch/x86/kernel/cpu/mtrr/main.c |  2 --
 2 files changed, 16 insertions(+), 6 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/x86/mtrr.txt b/Documentation/x86/mtrr.txt
index 860bc3adc223..dc3e703913ac 100644
--- a/Documentation/x86/mtrr.txt
+++ b/Documentation/x86/mtrr.txt
@@ -6,10 +6,22 @@ Luis R. Rodriguez <mcgrof@do-not-panic.com> - April 9, 2015
 ===============================================================================
 Phasing out MTRR use
 
-MTRR use is replaced on modern x86 hardware with PAT. Over time the only type
-of effective MTRR that is expected to be supported will be for write-combining.
-As MTRR use is phased out device drivers should use arch_phys_wc_add() to make
-MTRR effective on non-PAT systems while a no-op on PAT enabled systems.
+MTRR use is replaced on modern x86 hardware with PAT. Direct MTRR use by
+drivers on Linux is now completely phased out, device drivers should use
+arch_phys_wc_add() in combination with ioremap_wc() to make MTRR effective on
+non-PAT systems while a no-op but equally effective on PAT enabled systems.
+
+Even if Linux does not use MTRRs directly, some x86 platform firmware may still
+set up MTRRs early before booting the OS. They do this as some platform
+firmware may still have implemented access to MTRRs which would be controlled
+and handled by the platform firmware directly. An example of platform use of
+MTRRs is through the use of SMI handlers, one case could be for fan control,
+the platform code would need uncachable access to some of its fan control
+registers. Such platform access does not need any Operating System MTRR code in
+place other than mtrr_type_lookup() to ensure any OS specific mapping requests
+are aligned with platform MTRR setup. If MTRRs are only set up by the platform
+firmware code though and the OS does not make any specific MTRR mapping
+requests mtrr_type_lookup() should always return MTRR_TYPE_INVALID.
 
 For details refer to Documentation/x86/pat.txt.
 
diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
index e7ed0d8ebacb..f891b4750f04 100644
--- a/arch/x86/kernel/cpu/mtrr/main.c
+++ b/arch/x86/kernel/cpu/mtrr/main.c
@@ -448,7 +448,6 @@ int mtrr_add(unsigned long base, unsigned long size, unsigned int type,
 	return mtrr_add_page(base >> PAGE_SHIFT, size >> PAGE_SHIFT, type,
 			     increment);
 }
-EXPORT_SYMBOL(mtrr_add);
 
 /**
  * mtrr_del_page - delete a memory type region
@@ -537,7 +536,6 @@ int mtrr_del(int reg, unsigned long base, unsigned long size)
 		return -EINVAL;
 	return mtrr_del_page(reg, base >> PAGE_SHIFT, size >> PAGE_SHIFT);
 }
-EXPORT_SYMBOL(mtrr_del);
 
 /**
  * arch_phys_wc_add - add a WC MTRR and handle errors if PAT is unavailable
-- 
cgit v1.2.3


From aa14318aa02f14044acf81f418e1a1cbda29fcfc Mon Sep 17 00:00:00 2001
From: Jacek Anaszewski <j.anaszewski@samsung.com>
Date: Fri, 10 Apr 2015 10:36:56 +0200
Subject: DT: leds: Improve description of flash LEDs related properties

1. Since max-microamp property has had no users so far, then rename
   it to more descriptive led-max-microamp.
2. Since flash-timeout-us property has had no users so far, then rename
   it to more accurate flash-max-timeout-us.
3. Describe led-max-microamp property as mandatory for specific board
   configurations.
4. Make flash-max-microamp and flash-max-timeout-us properties mandatory
   for devices with configurable flash current and flash timeout settings
   respectively.

Signed-off-by: Jacek Anaszewski <j.anaszewski@samsung.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: devicetree@vger.kernel.org
Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Acked-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
---
 Documentation/devicetree/bindings/leds/common.txt | 27 +++++++++++++++--------
 1 file changed, 18 insertions(+), 9 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/leds/common.txt b/Documentation/devicetree/bindings/leds/common.txt
index 747c53805eec..68419843e32f 100644
--- a/Documentation/devicetree/bindings/leds/common.txt
+++ b/Documentation/devicetree/bindings/leds/common.txt
@@ -29,14 +29,23 @@ Optional properties for child nodes:
      "ide-disk" - LED indicates disk activity
      "timer" - LED flashes at a fixed, configurable rate
 
-- max-microamp : maximum intensity in microamperes of the LED
-		 (torch LED for flash devices)
-- flash-max-microamp : maximum intensity in microamperes of the
-                       flash LED; it is mandatory if the LED should
-		       support the flash mode
-- flash-timeout-us : timeout in microseconds after which the flash
-                     LED is turned off
+- led-max-microamp : Maximum LED supply current in microamperes. This property
+                     can be made mandatory for the board configurations
+                     introducing a risk of hardware damage in case an excessive
+                     current is set.
+                     For flash LED controllers with configurable current this
+                     property is mandatory for the LEDs in the non-flash modes
+                     (e.g. torch or indicator).
 
+Required properties for flash LED child nodes:
+- flash-max-microamp : Maximum flash LED supply current in microamperes.
+- flash-max-timeout-us : Maximum timeout in microseconds after which the flash
+                         LED is turned off.
+
+For controllers that have no configurable current the flash-max-microamp
+property can be omitted.
+For controllers that have no configurable timeout the flash-max-timeout-us
+property can be omitted.
 
 Examples:
 
@@ -49,7 +58,7 @@ system-status {
 camera-flash {
 	label = "Flash";
 	led-sources = <0>, <1>;
-	max-microamp = <50000>;
+	led-max-microamp = <50000>;
 	flash-max-microamp = <320000>;
-	flash-timeout-us = <500000>;
+	flash-max-timeout-us = <500000>;
 };
-- 
cgit v1.2.3


From f7fafd083ccc340502448903aaddc76f10785c8c Mon Sep 17 00:00:00 2001
From: Vincent Donnefort <vdonnefort@gmail.com>
Date: Thu, 2 Jul 2015 19:56:40 +0200
Subject: leds: leds-ns2: move LED modes mapping outside of the driver

On the board n090401 (Seagate NAS 4-Bay), the LED mode mapping (GPIO
values to LED mode) is different from the one used on other boards
supported by the leds-ns2 driver.

With this patch the hardcoded mapping is removed from leds-ns2. Now,
it must be defined either in the platform data (if an old-fashion board
setup file is used) or in the DT node. In order to allow the later, this
patch also introduces a modes-map property for the leds-ns2 DT binding.

Signed-off-by: Vincent Donnefort <vdonnefort@gmail.com>
Signed-off-by: Jacek Anaszewski <j.anaszewski@samsung.com>
---
 .../devicetree/bindings/leds/leds-ns2.txt          |   9 ++
 drivers/leds/leds-ns2.c                            | 102 +++++++++++----------
 include/dt-bindings/leds/leds-ns2.h                |   8 ++
 include/linux/platform_data/leds-kirkwood-ns2.h    |  14 +++
 4 files changed, 85 insertions(+), 48 deletions(-)
 create mode 100644 include/dt-bindings/leds/leds-ns2.h

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/leds/leds-ns2.txt b/Documentation/devicetree/bindings/leds/leds-ns2.txt
index aef3aca34d2d..9f81258a5b6e 100644
--- a/Documentation/devicetree/bindings/leds/leds-ns2.txt
+++ b/Documentation/devicetree/bindings/leds/leds-ns2.txt
@@ -8,6 +8,9 @@ Each LED is represented as a sub-node of the ns2-leds device.
 Required sub-node properties:
 - cmd-gpio: Command LED GPIO. See OF device-tree GPIO specification.
 - slow-gpio: Slow LED GPIO. See OF device-tree GPIO specification.
+- modes-map: A mapping between LED modes (off, on or SATA activity blinking) and
+  the corresponding cmd-gpio/slow-gpio values. All the GPIO values combinations
+  should be given in order to avoid having an unknown mode at driver probe time.
 
 Optional sub-node properties:
 - label: Name for this LED. If omitted, the label is taken from the node name.
@@ -15,6 +18,8 @@ Optional sub-node properties:
 
 Example:
 
+#include <dt-bindings/leds/leds-ns2.h>
+
 ns2-leds {
 	compatible = "lacie,ns2-leds";
 
@@ -22,5 +27,9 @@ ns2-leds {
 		label = "ns2:blue:sata";
 		slow-gpio = <&gpio0 29 0>;
 		cmd-gpio = <&gpio0 30 0>;
+		modes-map = <NS_V2_LED_OFF  0 1
+			     NS_V2_LED_ON   1 0
+			     NS_V2_LED_ON   0 0
+			     NS_V2_LED_SATA 1 1>;
 	};
 };
diff --git a/drivers/leds/leds-ns2.c b/drivers/leds/leds-ns2.c
index 1fd6adbb43b7..b0bc03539dbb 100644
--- a/drivers/leds/leds-ns2.c
+++ b/drivers/leds/leds-ns2.c
@@ -33,46 +33,20 @@
 #include <linux/of_gpio.h>
 
 /*
- * The Network Space v2 dual-GPIO LED is wired to a CPLD and can blink in
- * relation with the SATA activity. This capability is exposed through the
- * "sata" sysfs attribute.
- *
- * The following array detail the different LED registers and the combination
- * of their possible values:
- *
- *  cmd_led   |  slow_led  | /SATA active | LED state
- *            |            |              |
- *     1      |     0      |      x       |  off
- *     -      |     1      |      x       |  on
- *     0      |     0      |      1       |  on
- *     0      |     0      |      0       |  blink (rate 300ms)
+ * The Network Space v2 dual-GPIO LED is wired to a CPLD. Three different LED
+ * modes are available: off, on and SATA activity blinking. The LED modes are
+ * controlled through two GPIOs (command and slow): each combination of values
+ * for the command/slow GPIOs corresponds to a LED mode.
  */
 
-enum ns2_led_modes {
-	NS_V2_LED_OFF,
-	NS_V2_LED_ON,
-	NS_V2_LED_SATA,
-};
-
-struct ns2_led_mode_value {
-	enum ns2_led_modes	mode;
-	int			cmd_level;
-	int			slow_level;
-};
-
-static struct ns2_led_mode_value ns2_led_modval[] = {
-	{ NS_V2_LED_OFF	, 1, 0 },
-	{ NS_V2_LED_ON	, 0, 1 },
-	{ NS_V2_LED_ON	, 1, 1 },
-	{ NS_V2_LED_SATA, 0, 0 },
-};
-
 struct ns2_led_data {
 	struct led_classdev	cdev;
 	unsigned		cmd;
 	unsigned		slow;
 	unsigned char		sata; /* True when SATA mode active. */
 	rwlock_t		rw_lock; /* Lock GPIOs. */
+	int			num_modes;
+	struct ns2_led_modval	*modval;
 };
 
 static int ns2_led_get_mode(struct ns2_led_data *led_dat,
@@ -88,10 +62,10 @@ static int ns2_led_get_mode(struct ns2_led_data *led_dat,
 	cmd_level = gpio_get_value(led_dat->cmd);
 	slow_level = gpio_get_value(led_dat->slow);
 
-	for (i = 0; i < ARRAY_SIZE(ns2_led_modval); i++) {
-		if (cmd_level == ns2_led_modval[i].cmd_level &&
-		    slow_level == ns2_led_modval[i].slow_level) {
-			*mode = ns2_led_modval[i].mode;
+	for (i = 0; i < led_dat->num_modes; i++) {
+		if (cmd_level == led_dat->modval[i].cmd_level &&
+		    slow_level == led_dat->modval[i].slow_level) {
+			*mode = led_dat->modval[i].mode;
 			ret = 0;
 			break;
 		}
@@ -110,12 +84,12 @@ static void ns2_led_set_mode(struct ns2_led_data *led_dat,
 
 	write_lock_irqsave(&led_dat->rw_lock, flags);
 
-	for (i = 0; i < ARRAY_SIZE(ns2_led_modval); i++) {
-		if (mode == ns2_led_modval[i].mode) {
+	for (i = 0; i < led_dat->num_modes; i++) {
+		if (mode == led_dat->modval[i].mode) {
 			gpio_set_value(led_dat->cmd,
-				       ns2_led_modval[i].cmd_level);
+				       led_dat->modval[i].cmd_level);
 			gpio_set_value(led_dat->slow,
-				       ns2_led_modval[i].slow_level);
+				       led_dat->modval[i].slow_level);
 		}
 	}
 
@@ -228,6 +202,8 @@ create_ns2_led(struct platform_device *pdev, struct ns2_led_data *led_dat,
 	led_dat->cdev.groups = ns2_led_groups;
 	led_dat->cmd = template->cmd;
 	led_dat->slow = template->slow;
+	led_dat->modval = template->modval;
+	led_dat->num_modes = template->num_modes;
 
 	ret = ns2_led_get_mode(led_dat, &mode);
 	if (ret < 0)
@@ -259,9 +235,8 @@ ns2_leds_get_of_pdata(struct device *dev, struct ns2_led_platform_data *pdata)
 {
 	struct device_node *np = dev->of_node;
 	struct device_node *child;
-	struct ns2_led *leds;
+	struct ns2_led *led, *leds;
 	int num_leds = 0;
-	int i = 0;
 
 	num_leds = of_get_child_count(np);
 	if (!num_leds)
@@ -272,26 +247,57 @@ ns2_leds_get_of_pdata(struct device *dev, struct ns2_led_platform_data *pdata)
 	if (!leds)
 		return -ENOMEM;
 
+	led = leds;
 	for_each_child_of_node(np, child) {
 		const char *string;
-		int ret;
+		int ret, i, num_modes;
+		struct ns2_led_modval *modval;
 
 		ret = of_get_named_gpio(child, "cmd-gpio", 0);
 		if (ret < 0)
 			return ret;
-		leds[i].cmd = ret;
+		led->cmd = ret;
 		ret = of_get_named_gpio(child, "slow-gpio", 0);
 		if (ret < 0)
 			return ret;
-		leds[i].slow = ret;
+		led->slow = ret;
 		ret = of_property_read_string(child, "label", &string);
-		leds[i].name = (ret == 0) ? string : child->name;
+		led->name = (ret == 0) ? string : child->name;
 		ret = of_property_read_string(child, "linux,default-trigger",
 					      &string);
 		if (ret == 0)
-			leds[i].default_trigger = string;
+			led->default_trigger = string;
+
+		ret = of_property_count_u32_elems(child, "modes-map");
+		if (ret < 0 || ret % 3) {
+			dev_err(dev,
+				"Missing or malformed modes-map property\n");
+			return -EINVAL;
+		}
+
+		num_modes = ret / 3;
+		modval = devm_kzalloc(dev,
+				      num_modes * sizeof(struct ns2_led_modval),
+				      GFP_KERNEL);
+		if (!modval)
+			return -ENOMEM;
+
+		for (i = 0; i < num_modes; i++) {
+			of_property_read_u32_index(child,
+						"modes-map", 3 * i,
+						(u32 *) &modval[i].mode);
+			of_property_read_u32_index(child,
+						"modes-map", 3 * i + 1,
+						(u32 *) &modval[i].cmd_level);
+			of_property_read_u32_index(child,
+						"modes-map", 3 * i + 2,
+						(u32 *) &modval[i].slow_level);
+		}
+
+		led->num_modes = num_modes;
+		led->modval = modval;
 
-		i++;
+		led++;
 	}
 
 	pdata->leds = leds;
diff --git a/include/dt-bindings/leds/leds-ns2.h b/include/dt-bindings/leds/leds-ns2.h
new file mode 100644
index 000000000000..491c5f974a92
--- /dev/null
+++ b/include/dt-bindings/leds/leds-ns2.h
@@ -0,0 +1,8 @@
+#ifndef _DT_BINDINGS_LEDS_NS2_H
+#define _DT_BINDINGS_LEDS_NS2_H
+
+#define NS_V2_LED_OFF	0
+#define NS_V2_LED_ON	1
+#define NS_V2_LED_SATA	2
+
+#endif
diff --git a/include/linux/platform_data/leds-kirkwood-ns2.h b/include/linux/platform_data/leds-kirkwood-ns2.h
index 6a9fed57f346..eb8a6860e816 100644
--- a/include/linux/platform_data/leds-kirkwood-ns2.h
+++ b/include/linux/platform_data/leds-kirkwood-ns2.h
@@ -9,11 +9,25 @@
 #ifndef __LEDS_KIRKWOOD_NS2_H
 #define __LEDS_KIRKWOOD_NS2_H
 
+enum ns2_led_modes {
+	NS_V2_LED_OFF,
+	NS_V2_LED_ON,
+	NS_V2_LED_SATA,
+};
+
+struct ns2_led_modval {
+	enum ns2_led_modes	mode;
+	int			cmd_level;
+	int			slow_level;
+};
+
 struct ns2_led {
 	const char	*name;
 	const char	*default_trigger;
 	unsigned	cmd;
 	unsigned	slow;
+	int		num_modes;
+	struct ns2_led_modval *modval;
 };
 
 struct ns2_led_platform_data {
-- 
cgit v1.2.3


From ce14c5831364118324b10c0355dead062b9ddd40 Mon Sep 17 00:00:00 2001
From: Prarit Bhargava <prarit@redhat.com>
Date: Tue, 25 Aug 2015 13:34:53 -0400
Subject: Documentation, add kernel-parameters.txt entry for dis_ucode_ldr

dis_ucode_ldr was introduced in 65cef13 ("x86, microcode: Add a disable
chicken bit") and will disable microcode loading on x86.  This kernel
parameter is buried in the code and should be added to the Documentation.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/kernel-parameters.txt | 2 ++
 1 file changed, 2 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 1d6f0459cd7b..34f2afbf7c15 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -910,6 +910,8 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			Disable PIN 1 of APIC timer
 			Can be useful to work around chipset bugs.
 
+	dis_ucode_ldr	[X86] Disable the microcode loader.
+
 	dma_debug=off	If the kernel is compiled with DMA_API_DEBUG support,
 			this option disables the debugging code at boot.
 
-- 
cgit v1.2.3


From bf9628373418328daf67b6508fc3713fc6f5067d Mon Sep 17 00:00:00 2001
From: Kamlakant Patel <kamlakant.patel@broadcom.com>
Date: Thu, 27 Aug 2015 17:49:29 +0530
Subject: spi: Add DT bindings documentation for Netlogic XLP SPI controller

Add DT bindings documentation for SPI controller driver used by
Netlogic XLP MIPS64 SoCs.

Signed-off-by: Kamlakant Patel <kamlakant.patel@broadcom.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
---
 Documentation/devicetree/bindings/spi/spi-xlp.txt | 39 +++++++++++++++++++++++
 1 file changed, 39 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/spi/spi-xlp.txt

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/spi/spi-xlp.txt b/Documentation/devicetree/bindings/spi/spi-xlp.txt
new file mode 100644
index 000000000000..40e82d51efec
--- /dev/null
+++ b/Documentation/devicetree/bindings/spi/spi-xlp.txt
@@ -0,0 +1,39 @@
+SPI Master controller for Netlogic XLP MIPS64 SOCs
+==================================================
+
+Currently this SPI controller driver is supported for the following
+Netlogic XLP SoCs:
+	XLP832, XLP316, XLP208, XLP980, XLP532
+
+Required properties:
+- compatible		: Should be "netlogic,xlp832-spi".
+- #address-cells	: Number of cells required to define a chip select address
+			  on the SPI bus.
+- #size-cells		: Should be zero.
+- reg			: Should contain register location and length.
+- clocks		: Phandle of the spi clock
+- interrupts		: Interrupt number used by this controller.
+- interrupt-parent	: Phandle of the parent interrupt controller.
+
+SPI slave nodes must be children of the SPI master node and can contain
+properties described in Documentation/devicetree/bindings/spi/spi-bus.txt.
+
+Example:
+
+	spi: xlp_spi@3a100 {
+		compatible = "netlogic,xlp832-spi";
+		#address-cells = <1>;
+		#size-cells = <0>;
+		reg = <0 0x3a100 0x100>;
+		clocks = <&spi_clk>;
+		interrupts = <34>;
+		interrupt-parent = <&pic>;
+
+		spi_nor@1 {
+			compatible = "spansion,s25sl12801";
+			#address-cells = <1>;
+			#size-cells = <1>;
+			reg = <1>;	/* Chip Select */
+			spi-max-frequency = <40000000>;
+		};
+};
-- 
cgit v1.2.3


From 628536ea0627e71da654bd34b1942c85832dbdba Mon Sep 17 00:00:00 2001
From: Jonathan Corbet <corbet@lwn.net>
Date: Tue, 25 Aug 2015 01:14:48 -0600
Subject: ASoC: Clean up docbook warnings

A number of functions and structures in the sound subsystem had incomplete
and/or obsolete DocBook comments, leading to warnings when the docs were
built.  Correct those comments so that we can enjoy our audio in the
absence of warning noise.

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Mark Brown <broonie@kernel.org>
---
 Documentation/DocBook/alsa-driver-api.tmpl |  2 +-
 include/sound/soc.h                        | 11 +++++++----
 sound/soc/soc-core.c                       | 13 +++++++++----
 sound/soc/soc-dapm.c                       |  2 +-
 4 files changed, 18 insertions(+), 10 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/DocBook/alsa-driver-api.tmpl b/Documentation/DocBook/alsa-driver-api.tmpl
index 71f9246127ec..e94a10bb4a9e 100644
--- a/Documentation/DocBook/alsa-driver-api.tmpl
+++ b/Documentation/DocBook/alsa-driver-api.tmpl
@@ -108,7 +108,7 @@
      <sect1><title>ASoC Core API</title>
 !Iinclude/sound/soc.h
 !Esound/soc/soc-core.c
-!Esound/soc/soc-cache.c
+<!-- !Esound/soc/soc-cache.c no docbook comments here -->
 !Esound/soc/soc-devres.c
 !Esound/soc/soc-io.c
 !Esound/soc/soc-pcm.c
diff --git a/include/sound/soc.h b/include/sound/soc.h
index 93df8bf9d54a..4537e81eeeda 100644
--- a/include/sound/soc.h
+++ b/include/sound/soc.h
@@ -619,6 +619,7 @@ int snd_soc_put_strobe(struct snd_kcontrol *kcontrol,
  * @pin:    name of the pin to update
  * @mask:   bits to check for in reported jack status
  * @invert: if non-zero then pin is enabled when status is not reported
+ * @list:   internal list entry
  */
 struct snd_soc_jack_pin {
 	struct list_head list;
@@ -635,7 +636,7 @@ struct snd_soc_jack_pin {
  * @jack_type: type of jack that is expected for this voltage
  * @debounce_time: debounce_time for jack, codec driver should wait for this
  *		duration before reading the adc for voltages
- * @:list: list container
+ * @list:   internal list entry
  */
 struct snd_soc_jack_zone {
 	unsigned int min_mv;
@@ -651,12 +652,12 @@ struct snd_soc_jack_zone {
  * @gpio:         legacy gpio number
  * @idx:          gpio descriptor index within the function of the GPIO
  *                consumer device
- * @gpiod_dev     GPIO consumer device
+ * @gpiod_dev:    GPIO consumer device
  * @name:         gpio name. Also as connection ID for the GPIO consumer
  *                device function name lookup
  * @report:       value to report when jack detected
  * @invert:       report presence in low state
- * @debouce_time: debouce time in ms
+ * @debounce_time: debounce time in ms
  * @wake:	  enable as wake source
  * @jack_status_check: callback function which overrides the detection
  *		       to provide more complex checks (eg, reading an
@@ -672,11 +673,13 @@ struct snd_soc_jack_gpio {
 	int debounce_time;
 	bool wake;
 
+	/* private: */
 	struct snd_soc_jack *jack;
 	struct delayed_work work;
 	struct gpio_desc *desc;
 
 	void *data;
+	/* public: */
 	int (*jack_status_check)(void *data);
 };
 
@@ -1319,7 +1322,7 @@ static inline struct snd_soc_dapm_context *snd_soc_codec_get_dapm(
 
 /**
  * snd_soc_dapm_init_bias_level() - Initialize CODEC DAPM bias level
- * @dapm: The CODEC for which to initialize the DAPM bias level
+ * @codec: The CODEC for which to initialize the DAPM bias level
  * @level: The DAPM level to initialize to
  *
  * Initializes the CODEC DAPM bias level. See snd_soc_dapm_init_bias_level().
diff --git a/sound/soc/soc-core.c b/sound/soc/soc-core.c
index 90d6335de17a..32242512d828 100644
--- a/sound/soc/soc-core.c
+++ b/sound/soc/soc-core.c
@@ -2798,6 +2798,7 @@ EXPORT_SYMBOL_GPL(snd_soc_register_component);
 /**
  * snd_soc_unregister_component - Unregister a component from the ASoC core
  *
+ * @dev: The device to unregister
  */
 void snd_soc_unregister_component(struct device *dev)
 {
@@ -2877,7 +2878,8 @@ EXPORT_SYMBOL_GPL(snd_soc_add_platform);
 /**
  * snd_soc_register_platform - Register a platform with the ASoC core
  *
- * @platform: platform to register
+ * @dev: The device for the platform
+ * @platform_drv: The driver for the platform
  */
 int snd_soc_register_platform(struct device *dev,
 		const struct snd_soc_platform_driver *platform_drv)
@@ -2938,7 +2940,7 @@ EXPORT_SYMBOL_GPL(snd_soc_lookup_platform);
 /**
  * snd_soc_unregister_platform - Unregister a platform from the ASoC core
  *
- * @platform: platform to unregister
+ * @dev: platform to unregister
  */
 void snd_soc_unregister_platform(struct device *dev)
 {
@@ -3029,7 +3031,10 @@ static int snd_soc_codec_set_bias_level(struct snd_soc_dapm_context *dapm,
 /**
  * snd_soc_register_codec - Register a codec with the ASoC core
  *
- * @codec: codec to register
+ * @dev: The parent device for this codec
+ * @codec_drv: Codec driver
+ * @dai_drv: The associated DAI driver
+ * @num_dai: Number of DAIs
  */
 int snd_soc_register_codec(struct device *dev,
 			   const struct snd_soc_codec_driver *codec_drv,
@@ -3128,7 +3133,7 @@ EXPORT_SYMBOL_GPL(snd_soc_register_codec);
 /**
  * snd_soc_unregister_codec - Unregister a codec from the ASoC core
  *
- * @codec: codec to unregister
+ * @dev: codec to unregister
  */
 void snd_soc_unregister_codec(struct device *dev)
 {
diff --git a/sound/soc/soc-dapm.c b/sound/soc/soc-dapm.c
index aa327c92480c..c4e3720bea41 100644
--- a/sound/soc/soc-dapm.c
+++ b/sound/soc/soc-dapm.c
@@ -2911,7 +2911,7 @@ EXPORT_SYMBOL_GPL(snd_soc_dapm_weak_routes);
 
 /**
  * snd_soc_dapm_new_widgets - add new dapm widgets
- * @dapm: DAPM context
+ * @card: card to be checked for new dapm widgets
  *
  * Checks the codec for any new dapm widgets and creates them if found.
  *
-- 
cgit v1.2.3


From 72b236d60218fe211a8e1210be31c31e81684b86 Mon Sep 17 00:00:00 2001
From: Aaron Skomra <skomra@gmail.com>
Date: Thu, 20 Aug 2015 16:05:17 -0700
Subject: HID: wacom: Add support for Express Key Remote.

This device is pad (buttons) only, there is no stylus or touch. Up to
five remotes can pair with the device's associated USB dongle.

Signed-off-by: Aaron Skomra <aaron.skomra@wacom.com>
Reviewed-by: Jason Gerecke <jason.gerecke@wacom.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
---
 Documentation/ABI/testing/sysfs-driver-wacom |  19 +++
 drivers/hid/wacom.h                          |   7 +-
 drivers/hid/wacom_sys.c                      | 195 +++++++++++++++++++++++++++
 drivers/hid/wacom_wac.c                      | 144 ++++++++++++++++++++
 drivers/hid/wacom_wac.h                      |   7 +-
 5 files changed, 370 insertions(+), 2 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/ABI/testing/sysfs-driver-wacom b/Documentation/ABI/testing/sysfs-driver-wacom
index c4f0fed64a6e..dca429340772 100644
--- a/Documentation/ABI/testing/sysfs-driver-wacom
+++ b/Documentation/ABI/testing/sysfs-driver-wacom
@@ -77,3 +77,22 @@ Description:
 		The format is also scrambled, like in the USB mode, and it can
 		be summarized by converting 76543210 into GECA6420.
 					    HGFEDCBA      HFDB7531
+
+What:		/sys/bus/hid/devices/<bus>:<vid>:<pid>.<n>/wacom_remote/unpair_remote
+Date:		July 2015
+Contact:	linux-input@vger.kernel.org
+Description:
+		Writing the character sequence '*' followed by a newline to
+		this file will delete all of the current pairings on the
+		device. Other character sequences are reserved. This file is
+		write only.
+
+What:		/sys/bus/hid/devices/<bus>:<vid>:<pid>.<n>/wacom_remote/<serial_number>/remote_mode
+Date:		July 2015
+Contact:	linux-input@vger.kernel.org
+Description:
+		Reading from this file reports the mode status of the
+		remote as indicated by the LED lights on the device. If no
+		reports have been received from the paired device, reading
+		from this file will report '-1'. The mode is read-only
+		and cannot be set through the driver.
diff --git a/drivers/hid/wacom.h b/drivers/hid/wacom.h
index a533787a6d85..4681a65a4579 100644
--- a/drivers/hid/wacom.h
+++ b/drivers/hid/wacom.h
@@ -113,7 +113,7 @@ struct wacom {
 	struct mutex lock;
 	struct work_struct work;
 	struct wacom_led {
-		u8 select[2]; /* status led selector (0..3) */
+		u8 select[5]; /* status led selector (0..3) */
 		u8 llv;       /* status led brightness no button (1..127) */
 		u8 hlv;       /* status led brightness button pressed (1..127) */
 		u8 img_lum;   /* OLED matrix display brightness */
@@ -123,6 +123,8 @@ struct wacom {
 	struct power_supply *ac;
 	struct power_supply_desc battery_desc;
 	struct power_supply_desc ac_desc;
+	struct kobject *remote_dir;
+	struct attribute_group remote_group[5];
 };
 
 static inline void wacom_schedule_work(struct wacom_wac *wacom_wac)
@@ -147,4 +149,7 @@ int wacom_wac_event(struct hid_device *hdev, struct hid_field *field,
 		struct hid_usage *usage, __s32 value);
 void wacom_wac_report(struct hid_device *hdev, struct hid_report *report);
 void wacom_battery_work(struct work_struct *work);
+int wacom_remote_create_attr_group(struct wacom *wacom, __u32 serial,
+				   int index);
+void wacom_remote_destroy_attr_group(struct wacom *wacom, __u32 serial);
 #endif
diff --git a/drivers/hid/wacom_sys.c b/drivers/hid/wacom_sys.c
index 6edb7d136476..5f6e48e55df9 100644
--- a/drivers/hid/wacom_sys.c
+++ b/drivers/hid/wacom_sys.c
@@ -23,9 +23,13 @@
 #define WAC_CMD_ICON_XFER	0x23
 #define WAC_CMD_ICON_BT_XFER	0x26
 #define WAC_CMD_RETRIES		10
+#define WAC_CMD_DELETE_PAIRING	0x20
+#define WAC_CMD_UNPAIR_ALL	0xFF
+#define WAC_REMOTE_SERIAL_MAX_STRLEN	9
 
 #define DEV_ATTR_RW_PERM (S_IRUGO | S_IWUSR | S_IWGRP)
 #define DEV_ATTR_WO_PERM (S_IWUSR | S_IWGRP)
+#define DEV_ATTR_RO_PERM (S_IRUSR | S_IRGRP)
 
 static int wacom_get_report(struct hid_device *hdev, u8 type, u8 *buf,
 			    size_t size, unsigned int retries)
@@ -1119,6 +1123,189 @@ static ssize_t wacom_store_speed(struct device *dev,
 static DEVICE_ATTR(speed, DEV_ATTR_RW_PERM,
 		wacom_show_speed, wacom_store_speed);
 
+
+static ssize_t wacom_show_remote_mode(struct kobject *kobj,
+				      struct kobj_attribute *kattr,
+				      char *buf, int index)
+{
+	struct device *dev = container_of(kobj->parent, struct device, kobj);
+	struct hid_device *hdev = container_of(dev, struct hid_device, dev);
+	struct wacom *wacom = hid_get_drvdata(hdev);
+	u8 mode;
+
+	mode = wacom->led.select[index];
+	if (mode >= 0 && mode < 3)
+		return snprintf(buf, PAGE_SIZE, "%d\n", mode);
+	else
+		return snprintf(buf, PAGE_SIZE, "%d\n", -1);
+}
+
+#define DEVICE_EKR_ATTR_GROUP(SET_ID)					\
+static ssize_t wacom_show_remote##SET_ID##_mode(struct kobject *kobj,	\
+			       struct kobj_attribute *kattr, char *buf)	\
+{									\
+	return wacom_show_remote_mode(kobj, kattr, buf, SET_ID);	\
+}									\
+static struct kobj_attribute remote##SET_ID##_mode_attr = {		\
+	.attr = {.name = "remote_mode",					\
+		.mode = DEV_ATTR_RO_PERM},				\
+	.show = wacom_show_remote##SET_ID##_mode,			\
+};									\
+static struct attribute *remote##SET_ID##_serial_attrs[] = {		\
+	&remote##SET_ID##_mode_attr.attr,				\
+	NULL								\
+};									\
+static struct attribute_group remote##SET_ID##_serial_group = {		\
+	.name = NULL,							\
+	.attrs = remote##SET_ID##_serial_attrs,				\
+}
+
+DEVICE_EKR_ATTR_GROUP(0);
+DEVICE_EKR_ATTR_GROUP(1);
+DEVICE_EKR_ATTR_GROUP(2);
+DEVICE_EKR_ATTR_GROUP(3);
+DEVICE_EKR_ATTR_GROUP(4);
+
+int wacom_remote_create_attr_group(struct wacom *wacom, __u32 serial, int index)
+{
+	int error = 0;
+	char *buf;
+	struct wacom_wac *wacom_wac = &wacom->wacom_wac;
+
+	wacom_wac->serial[index] = serial;
+
+	buf = kzalloc(WAC_REMOTE_SERIAL_MAX_STRLEN, GFP_KERNEL);
+	if (!buf)
+		return -ENOMEM;
+	snprintf(buf, WAC_REMOTE_SERIAL_MAX_STRLEN, "%d", serial);
+	wacom->remote_group[index].name = buf;
+
+	error = sysfs_create_group(wacom->remote_dir,
+				   &wacom->remote_group[index]);
+	if (error) {
+		hid_err(wacom->hdev,
+			"cannot create sysfs group err: %d\n", error);
+		kobject_put(wacom->remote_dir);
+		return error;
+	}
+
+	return 0;
+}
+
+void wacom_remote_destroy_attr_group(struct wacom *wacom, __u32 serial)
+{
+	struct wacom_wac *wacom_wac = &wacom->wacom_wac;
+	int i;
+
+	if (!serial)
+		return;
+
+	for (i = 0; i < WACOM_MAX_REMOTES; i++) {
+		if (wacom_wac->serial[i] == serial) {
+			wacom_wac->serial[i] = 0;
+			wacom->led.select[i] = WACOM_STATUS_UNKNOWN;
+			if (wacom->remote_group[i].name) {
+				sysfs_remove_group(wacom->remote_dir,
+						   &wacom->remote_group[i]);
+				kfree(wacom->remote_group[i].name);
+				wacom->remote_group[i].name = NULL;
+			}
+		}
+	}
+}
+
+static int wacom_cmd_unpair_remote(struct wacom *wacom, unsigned char selector)
+{
+	const size_t buf_size = 2;
+	unsigned char *buf;
+	int retval;
+
+	buf = kzalloc(buf_size, GFP_KERNEL);
+	if (!buf)
+		return -ENOMEM;
+
+	buf[0] = WAC_CMD_DELETE_PAIRING;
+	buf[1] = selector;
+
+	retval = wacom_set_report(wacom->hdev, HID_OUTPUT_REPORT, buf,
+				  buf_size, WAC_CMD_RETRIES);
+	kfree(buf);
+
+	return retval;
+}
+
+static ssize_t wacom_store_unpair_remote(struct kobject *kobj,
+					 struct kobj_attribute *attr,
+					 const char *buf, size_t count)
+{
+	unsigned char selector = 0;
+	struct device *dev = container_of(kobj->parent, struct device, kobj);
+	struct hid_device *hdev = container_of(dev, struct hid_device, dev);
+	struct wacom *wacom = hid_get_drvdata(hdev);
+	int err;
+
+	if (!strncmp(buf, "*\n", 2)) {
+		selector = WAC_CMD_UNPAIR_ALL;
+	} else {
+		hid_info(wacom->hdev, "remote: unrecognized unpair code: %s\n",
+			 buf);
+		return -1;
+	}
+
+	mutex_lock(&wacom->lock);
+
+	err = wacom_cmd_unpair_remote(wacom, selector);
+	mutex_unlock(&wacom->lock);
+
+	return err < 0 ? err : count;
+}
+
+static struct kobj_attribute unpair_remote_attr = {
+	.attr = {.name = "unpair_remote", .mode = 0200},
+	.store = wacom_store_unpair_remote,
+};
+
+static const struct attribute *remote_unpair_attrs[] = {
+	&unpair_remote_attr.attr,
+	NULL
+};
+
+static int wacom_initialize_remote(struct wacom *wacom)
+{
+	int error = 0;
+	struct wacom_wac *wacom_wac = &(wacom->wacom_wac);
+	int i;
+
+	if (wacom->wacom_wac.features.type != REMOTE)
+		return 0;
+
+	wacom->remote_group[0] = remote0_serial_group;
+	wacom->remote_group[1] = remote1_serial_group;
+	wacom->remote_group[2] = remote2_serial_group;
+	wacom->remote_group[3] = remote3_serial_group;
+	wacom->remote_group[4] = remote4_serial_group;
+
+	wacom->remote_dir = kobject_create_and_add("wacom_remote",
+						   &wacom->hdev->dev.kobj);
+	if (!wacom->remote_dir)
+		return -ENOMEM;
+
+	error = sysfs_create_files(wacom->remote_dir, remote_unpair_attrs);
+
+	if (error) {
+		hid_err(wacom->hdev,
+			"cannot create sysfs group err: %d\n", error);
+		return error;
+	}
+
+	for (i = 0; i < WACOM_MAX_REMOTES; i++) {
+		wacom->led.select[i] = WACOM_STATUS_UNKNOWN;
+		wacom_wac->serial[i] = 0;
+	}
+
+	return 0;
+}
+
 static struct input_dev *wacom_allocate_input(struct wacom *wacom)
 {
 	struct input_dev *input_dev;
@@ -1164,6 +1351,8 @@ static void wacom_clean_inputs(struct wacom *wacom)
 		else
 			input_free_device(wacom->wacom_wac.pad_input);
 	}
+	if (wacom->remote_dir)
+		kobject_put(wacom->remote_dir);
 	wacom->wacom_wac.pen_input = NULL;
 	wacom->wacom_wac.touch_input = NULL;
 	wacom->wacom_wac.pad_input = NULL;
@@ -1243,10 +1432,16 @@ static int wacom_register_inputs(struct wacom *wacom)
 		error = wacom_initialize_leds(wacom);
 		if (error)
 			goto fail_leds;
+
+		error = wacom_initialize_remote(wacom);
+		if (error)
+			goto fail_remote;
 	}
 
 	return 0;
 
+fail_remote:
+	wacom_destroy_leds(wacom);
 fail_leds:
 	input_unregister_device(pad_input_dev);
 	pad_input_dev = NULL;
diff --git a/drivers/hid/wacom_wac.c b/drivers/hid/wacom_wac.c
index ee5d278afa3f..391a68731fe3 100644
--- a/drivers/hid/wacom_wac.c
+++ b/drivers/hid/wacom_wac.c
@@ -631,6 +631,130 @@ static int wacom_intuos_inout(struct wacom_wac *wacom)
 	return 0;
 }
 
+static int wacom_remote_irq(struct wacom_wac *wacom_wac, size_t len)
+{
+	unsigned char *data = wacom_wac->data;
+	struct input_dev *input = wacom_wac->pad_input;
+	struct wacom *wacom = container_of(wacom_wac, struct wacom, wacom_wac);
+	struct wacom_features *features = &wacom_wac->features;
+	int bat_charging, bat_percent, touch_ring_mode;
+	__u32 serial;
+	int i;
+
+	if (data[0] != WACOM_REPORT_REMOTE) {
+		dev_dbg(input->dev.parent,
+			"%s: received unknown report #%d", __func__, data[0]);
+		return 0;
+	}
+
+	serial = data[3] + (data[4] << 8) + (data[5] << 16);
+	wacom_wac->id[0] = PAD_DEVICE_ID;
+
+	input_report_key(input, BTN_0, (data[9] & 0x01));
+	input_report_key(input, BTN_1, (data[9] & 0x02));
+	input_report_key(input, BTN_2, (data[9] & 0x04));
+	input_report_key(input, BTN_3, (data[9] & 0x08));
+	input_report_key(input, BTN_4, (data[9] & 0x10));
+	input_report_key(input, BTN_5, (data[9] & 0x20));
+	input_report_key(input, BTN_6, (data[9] & 0x40));
+	input_report_key(input, BTN_7, (data[9] & 0x80));
+
+	input_report_key(input, BTN_8, (data[10] & 0x01));
+	input_report_key(input, BTN_9, (data[10] & 0x02));
+	input_report_key(input, BTN_A, (data[10] & 0x04));
+	input_report_key(input, BTN_B, (data[10] & 0x08));
+	input_report_key(input, BTN_C, (data[10] & 0x10));
+	input_report_key(input, BTN_X, (data[10] & 0x20));
+	input_report_key(input, BTN_Y, (data[10] & 0x40));
+	input_report_key(input, BTN_Z, (data[10] & 0x80));
+
+	input_report_key(input, BTN_BASE, (data[11] & 0x01));
+	input_report_key(input, BTN_BASE2, (data[11] & 0x02));
+
+	if (data[12] & 0x80)
+		input_report_abs(input, ABS_WHEEL, (data[12] & 0x7f));
+	else
+		input_report_abs(input, ABS_WHEEL, 0);
+
+	bat_percent = data[7] & 0x7f;
+	bat_charging = !!(data[7] & 0x80);
+
+	if (data[9] | data[10] | (data[11] & 0x03) | data[12])
+		input_report_abs(input, ABS_MISC, PAD_DEVICE_ID);
+	else
+		input_report_abs(input, ABS_MISC, 0);
+
+	input_event(input, EV_MSC, MSC_SERIAL, serial);
+
+	/*Which mode select (LED light) is currently on?*/
+	touch_ring_mode = (data[11] & 0xC0) >> 6;
+
+	for (i = 0; i < WACOM_MAX_REMOTES; i++) {
+		if (wacom_wac->serial[i] == serial)
+			wacom->led.select[i] = touch_ring_mode;
+	}
+
+	if (!wacom->battery &&
+	    !(features->quirks & WACOM_QUIRK_BATTERY)) {
+		features->quirks |= WACOM_QUIRK_BATTERY;
+		INIT_WORK(&wacom->work, wacom_battery_work);
+		wacom_schedule_work(wacom_wac);
+	}
+
+	wacom_notify_battery(wacom_wac, bat_percent, bat_charging, 1,
+			     bat_charging);
+
+	return 1;
+}
+
+static int wacom_remote_status_irq(struct wacom_wac *wacom_wac, size_t len)
+{
+	struct wacom *wacom = container_of(wacom_wac, struct wacom, wacom_wac);
+	unsigned char *data = wacom_wac->data;
+	int i;
+
+	if (data[0] != WACOM_REPORT_DEVICE_LIST)
+		return 0;
+
+	for (i = 0; i < WACOM_MAX_REMOTES; i++) {
+		int j = i * 6;
+		int serial = (data[j+6] << 16) + (data[j+5] << 8) + data[j+4];
+		bool connected = data[j+2];
+
+		if (connected) {
+			int k;
+
+			if (wacom_wac->serial[i] == serial)
+				continue;
+
+			if (wacom_wac->serial[i]) {
+				wacom_remote_destroy_attr_group(wacom,
+							wacom_wac->serial[i]);
+			}
+
+			/* A remote can pair more than once with an EKR,
+			 * check to make sure this serial isn't already paired.
+			 */
+			for (k = 0; k < WACOM_MAX_REMOTES; k++) {
+				if (wacom_wac->serial[k] == serial)
+					break;
+			}
+
+			if (k < WACOM_MAX_REMOTES) {
+				wacom_wac->serial[i] = serial;
+				continue;
+			}
+			wacom_remote_create_attr_group(wacom, serial, i);
+
+		} else if (wacom_wac->serial[i]) {
+			wacom_remote_destroy_attr_group(wacom,
+							wacom_wac->serial[i]);
+		}
+	}
+
+	return 0;
+}
+
 static void wacom_intuos_general(struct wacom_wac *wacom)
 {
 	struct wacom_features *features = &wacom->features;
@@ -2191,6 +2315,13 @@ void wacom_wac_irq(struct wacom_wac *wacom_wac, size_t len)
 		sync = wacom_wireless_irq(wacom_wac, len);
 		break;
 
+	case REMOTE:
+		if (wacom_wac->data[0] == WACOM_REPORT_DEVICE_LIST)
+			sync = wacom_remote_status_irq(wacom_wac, len);
+		else
+			sync = wacom_remote_irq(wacom_wac, len);
+		break;
+
 	default:
 		sync = false;
 		break;
@@ -2298,6 +2429,9 @@ void wacom_setup_device_quirks(struct wacom *wacom)
 	if (features->type == BAMBOO_PAD)
 		features->device_type = WACOM_DEVICETYPE_TOUCH;
 
+	if (features->type == REMOTE)
+		features->device_type = WACOM_DEVICETYPE_PAD;
+
 	if (wacom->hdev->bus == BUS_BLUETOOTH)
 		features->quirks |= WACOM_QUIRK_BATTERY;
 
@@ -2717,6 +2851,11 @@ int wacom_setup_pad_input_capabilities(struct input_dev *input_dev,
 
 		break;
 
+	case REMOTE:
+		input_set_capability(input_dev, EV_MSC, MSC_SERIAL);
+		input_set_abs_params(input_dev, ABS_WHEEL, 0, 71, 0, 0);
+		break;
+
 	default:
 		/* no pad supported */
 		return -ENODEV;
@@ -3186,6 +3325,10 @@ static const struct wacom_features wacom_features_0x323 =
 	{ "Wacom Intuos P M", 21600, 13500, 1023, 31,
 	  INTUOSHT, WACOM_INTUOS_RES, WACOM_INTUOS_RES,
 	  .check_for_hid_type = true, .hid_type = HID_TYPE_USBNONE };
+static const struct wacom_features wacom_features_0x331 =
+	{ "Wacom Express Key Remote", 0, 0, 0, 0,
+	  REMOTE, 0, 0, 18, .check_for_hid_type = true,
+	  .hid_type = HID_TYPE_USBNONE };
 
 static const struct wacom_features wacom_features_HID_ANY_ID =
 	{ "Wacom HID", .type = HID_GENERIC };
@@ -3341,6 +3484,7 @@ const struct hid_device_id wacom_ids[] = {
 	{ USB_DEVICE_WACOM(0x32B) },
 	{ USB_DEVICE_WACOM(0x32C) },
 	{ USB_DEVICE_WACOM(0x32F) },
+	{ USB_DEVICE_WACOM(0x331) },
 	{ USB_DEVICE_WACOM(0x333) },
 	{ USB_DEVICE_WACOM(0x335) },
 	{ USB_DEVICE_WACOM(0x336) },
diff --git a/drivers/hid/wacom_wac.h b/drivers/hid/wacom_wac.h
index 4ee5c13b4e75..1e270d401e18 100644
--- a/drivers/hid/wacom_wac.h
+++ b/drivers/hid/wacom_wac.h
@@ -16,6 +16,8 @@
 #define WACOM_PKGLEN_MAX	192
 
 #define WACOM_NAME_MAX		64
+#define WACOM_MAX_REMOTES	5
+#define WACOM_STATUS_UNKNOWN	255
 
 /* packet length for individual models */
 #define WACOM_PKGLEN_BBFUN	 9
@@ -65,6 +67,8 @@
 #define WACOM_REPORT_USB		192
 #define WACOM_REPORT_BPAD_PEN		3
 #define WACOM_REPORT_BPAD_TOUCH		16
+#define WACOM_REPORT_DEVICE_LIST	16
+#define WACOM_REPORT_REMOTE		17
 
 /* device quirks */
 #define WACOM_QUIRK_BBTOUCH_LOWRES	0x0001
@@ -129,6 +133,7 @@ enum {
 	WACOM_24HDT,
 	WACOM_27QHDT,
 	BAMBOO_PAD,
+	REMOTE,
 	TABLETPC,   /* add new TPC below */
 	TABLETPCE,
 	TABLETPC2FG,
@@ -208,7 +213,7 @@ struct wacom_wac {
 	unsigned char data[WACOM_PKGLEN_MAX];
 	int tool[2];
 	int id[2];
-	__u32 serial[2];
+	__u32 serial[5];
 	bool reporting_data;
 	struct wacom_features features;
 	struct wacom_shared *shared;
-- 
cgit v1.2.3


From 7724105686e718ac476a6ad3304fea2fbcfcffde Mon Sep 17 00:00:00 2001
From: Mike Marciniszyn <mike.marciniszyn@intel.com>
Date: Thu, 30 Jul 2015 15:17:43 -0400
Subject: IB/hfi1: add driver files

Signed-off-by: Andrew Friedley <andrew.friedley@intel.com>
Signed-off-by: Arthur Kepner <arthur.kepner@intel.com>
Signed-off-by: Brendan Cunningham <brendan.cunningham@intel.com>
Signed-off-by: Brian Welty <brian.welty@intel.com>
Signed-off-by: Caz Yokoyama <caz.yokoyama@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Jim Snow <jim.m.snow@intel.com>
Signed-off-by: John Gregor <john.a.gregor@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Kevin Pine <kevin.pine@intel.com>
Signed-off-by: Kyle Liddell <kyle.liddell@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Ravi Krishnaswamy <ravi.krishnaswamy@intel.com>
Signed-off-by: Sadanand Warrier <sadanand.warrier@intel.com>
Signed-off-by: Sanath Kumar <sanath.s.kumar@intel.com>
Signed-off-by: Sudeep Dutt <sudeep.dutt@intel.com>
Signed-off-by: Vlad Danushevsky <vladimir.danusevsky@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
---
 Documentation/infiniband/sysfs.txt          |    20 +
 MAINTAINERS                                 |     6 +
 drivers/staging/rdma/Kconfig                |     2 +
 drivers/staging/rdma/Makefile               |     1 +
 drivers/staging/rdma/hfi1/Kconfig           |    37 +
 drivers/staging/rdma/hfi1/Makefile          |    19 +
 drivers/staging/rdma/hfi1/TODO              |     6 +
 drivers/staging/rdma/hfi1/chip.c            | 10798 ++++++++++++++++++++++++++
 drivers/staging/rdma/hfi1/chip.h            |  1035 +++
 drivers/staging/rdma/hfi1/chip_registers.h  |  1289 +++
 drivers/staging/rdma/hfi1/common.h          |   415 +
 drivers/staging/rdma/hfi1/cq.c              |   558 ++
 drivers/staging/rdma/hfi1/debugfs.c         |   899 +++
 drivers/staging/rdma/hfi1/debugfs.h         |    78 +
 drivers/staging/rdma/hfi1/device.c          |   142 +
 drivers/staging/rdma/hfi1/device.h          |    61 +
 drivers/staging/rdma/hfi1/diag.c            |  1873 +++++
 drivers/staging/rdma/hfi1/dma.c             |   186 +
 drivers/staging/rdma/hfi1/driver.c          |  1241 +++
 drivers/staging/rdma/hfi1/eprom.c           |   475 ++
 drivers/staging/rdma/hfi1/eprom.h           |    55 +
 drivers/staging/rdma/hfi1/file_ops.c        |  2140 +++++
 drivers/staging/rdma/hfi1/firmware.c        |  1620 ++++
 drivers/staging/rdma/hfi1/hfi.h             |  1821 +++++
 drivers/staging/rdma/hfi1/init.c            |  1722 ++++
 drivers/staging/rdma/hfi1/intr.c            |   207 +
 drivers/staging/rdma/hfi1/iowait.h          |   186 +
 drivers/staging/rdma/hfi1/keys.c            |   411 +
 drivers/staging/rdma/hfi1/mad.c             |  4257 ++++++++++
 drivers/staging/rdma/hfi1/mad.h             |   325 +
 drivers/staging/rdma/hfi1/mmap.c            |   192 +
 drivers/staging/rdma/hfi1/mr.c              |   546 ++
 drivers/staging/rdma/hfi1/opa_compat.h      |   129 +
 drivers/staging/rdma/hfi1/pcie.c            |  1253 +++
 drivers/staging/rdma/hfi1/pio.c             |  1771 +++++
 drivers/staging/rdma/hfi1/pio.h             |   224 +
 drivers/staging/rdma/hfi1/pio_copy.c        |   858 ++
 drivers/staging/rdma/hfi1/platform_config.h |   286 +
 drivers/staging/rdma/hfi1/qp.c              |  1687 ++++
 drivers/staging/rdma/hfi1/qp.h              |   235 +
 drivers/staging/rdma/hfi1/qsfp.c            |   546 ++
 drivers/staging/rdma/hfi1/qsfp.h            |   222 +
 drivers/staging/rdma/hfi1/rc.c              |  2426 ++++++
 drivers/staging/rdma/hfi1/ruc.c             |   948 +++
 drivers/staging/rdma/hfi1/sdma.c            |  2962 +++++++
 drivers/staging/rdma/hfi1/sdma.h            |  1123 +++
 drivers/staging/rdma/hfi1/srq.c             |   397 +
 drivers/staging/rdma/hfi1/sysfs.c           |   739 ++
 drivers/staging/rdma/hfi1/trace.c           |   221 +
 drivers/staging/rdma/hfi1/trace.h           |  1409 ++++
 drivers/staging/rdma/hfi1/twsi.c            |   518 ++
 drivers/staging/rdma/hfi1/twsi.h            |    68 +
 drivers/staging/rdma/hfi1/uc.c              |   585 ++
 drivers/staging/rdma/hfi1/ud.c              |   885 +++
 drivers/staging/rdma/hfi1/user_pages.c      |   156 +
 drivers/staging/rdma/hfi1/user_sdma.c       |  1444 ++++
 drivers/staging/rdma/hfi1/user_sdma.h       |    89 +
 drivers/staging/rdma/hfi1/verbs.c           |  2142 +++++
 drivers/staging/rdma/hfi1/verbs.h           |  1149 +++
 drivers/staging/rdma/hfi1/verbs_mcast.c     |   385 +
 60 files changed, 57480 insertions(+)
 create mode 100644 drivers/staging/rdma/hfi1/Kconfig
 create mode 100644 drivers/staging/rdma/hfi1/Makefile
 create mode 100644 drivers/staging/rdma/hfi1/TODO
 create mode 100644 drivers/staging/rdma/hfi1/chip.c
 create mode 100644 drivers/staging/rdma/hfi1/chip.h
 create mode 100644 drivers/staging/rdma/hfi1/chip_registers.h
 create mode 100644 drivers/staging/rdma/hfi1/common.h
 create mode 100644 drivers/staging/rdma/hfi1/cq.c
 create mode 100644 drivers/staging/rdma/hfi1/debugfs.c
 create mode 100644 drivers/staging/rdma/hfi1/debugfs.h
 create mode 100644 drivers/staging/rdma/hfi1/device.c
 create mode 100644 drivers/staging/rdma/hfi1/device.h
 create mode 100644 drivers/staging/rdma/hfi1/diag.c
 create mode 100644 drivers/staging/rdma/hfi1/dma.c
 create mode 100644 drivers/staging/rdma/hfi1/driver.c
 create mode 100644 drivers/staging/rdma/hfi1/eprom.c
 create mode 100644 drivers/staging/rdma/hfi1/eprom.h
 create mode 100644 drivers/staging/rdma/hfi1/file_ops.c
 create mode 100644 drivers/staging/rdma/hfi1/firmware.c
 create mode 100644 drivers/staging/rdma/hfi1/hfi.h
 create mode 100644 drivers/staging/rdma/hfi1/init.c
 create mode 100644 drivers/staging/rdma/hfi1/intr.c
 create mode 100644 drivers/staging/rdma/hfi1/iowait.h
 create mode 100644 drivers/staging/rdma/hfi1/keys.c
 create mode 100644 drivers/staging/rdma/hfi1/mad.c
 create mode 100644 drivers/staging/rdma/hfi1/mad.h
 create mode 100644 drivers/staging/rdma/hfi1/mmap.c
 create mode 100644 drivers/staging/rdma/hfi1/mr.c
 create mode 100644 drivers/staging/rdma/hfi1/opa_compat.h
 create mode 100644 drivers/staging/rdma/hfi1/pcie.c
 create mode 100644 drivers/staging/rdma/hfi1/pio.c
 create mode 100644 drivers/staging/rdma/hfi1/pio.h
 create mode 100644 drivers/staging/rdma/hfi1/pio_copy.c
 create mode 100644 drivers/staging/rdma/hfi1/platform_config.h
 create mode 100644 drivers/staging/rdma/hfi1/qp.c
 create mode 100644 drivers/staging/rdma/hfi1/qp.h
 create mode 100644 drivers/staging/rdma/hfi1/qsfp.c
 create mode 100644 drivers/staging/rdma/hfi1/qsfp.h
 create mode 100644 drivers/staging/rdma/hfi1/rc.c
 create mode 100644 drivers/staging/rdma/hfi1/ruc.c
 create mode 100644 drivers/staging/rdma/hfi1/sdma.c
 create mode 100644 drivers/staging/rdma/hfi1/sdma.h
 create mode 100644 drivers/staging/rdma/hfi1/srq.c
 create mode 100644 drivers/staging/rdma/hfi1/sysfs.c
 create mode 100644 drivers/staging/rdma/hfi1/trace.c
 create mode 100644 drivers/staging/rdma/hfi1/trace.h
 create mode 100644 drivers/staging/rdma/hfi1/twsi.c
 create mode 100644 drivers/staging/rdma/hfi1/twsi.h
 create mode 100644 drivers/staging/rdma/hfi1/uc.c
 create mode 100644 drivers/staging/rdma/hfi1/ud.c
 create mode 100644 drivers/staging/rdma/hfi1/user_pages.c
 create mode 100644 drivers/staging/rdma/hfi1/user_sdma.c
 create mode 100644 drivers/staging/rdma/hfi1/user_sdma.h
 create mode 100644 drivers/staging/rdma/hfi1/verbs.c
 create mode 100644 drivers/staging/rdma/hfi1/verbs.h
 create mode 100644 drivers/staging/rdma/hfi1/verbs_mcast.c

(limited to 'Documentation')

diff --git a/Documentation/infiniband/sysfs.txt b/Documentation/infiniband/sysfs.txt
index ddd519b72ee1..9028b025501a 100644
--- a/Documentation/infiniband/sysfs.txt
+++ b/Documentation/infiniband/sysfs.txt
@@ -64,3 +64,23 @@ MTHCA
     fw_ver   - Firmware version
     hca_type - HCA type: "MT23108", "MT25208 (MT23108 compat mode)",
                or "MT25208"
+
+HFI1
+
+  The hfi1 driver also creates these additional files:
+
+   hw_rev - hardware revision
+   board_id - manufacturing board id
+   tempsense - thermal sense information
+   serial - board serial number
+   nfreectxts - number of free user contexts
+   nctxts - number of allowed contexts (PSM2)
+   chip_reset - diagnostic (root only)
+   boardversion - board version
+   ports/1/
+          CMgtA/
+               cc_settings_bin - CCA tables used by PSM2
+               cc_table_bin
+          sc2v/ - 32 files (0 - 31) used to translate sl->vl
+          sl2sc/ - 32 files (0 - 31) used to translate sl->sc
+          vl2mtu/ - 16 (0 - 15) files used to determine MTU for vl
diff --git a/MAINTAINERS b/MAINTAINERS
index db1a523ed493..2249ac889076 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9809,6 +9809,12 @@ M:	Arnaud Patard <arnaud.patard@rtp-net.org>
 S:	Odd Fixes
 F:	drivers/staging/xgifb/
 
+HFI1 DRIVER
+M:	Mike Marciniszyn <infinipath@intel.com>
+L:	linux-rdma@vger.kernel.org
+S:	Supported
+F:	drivers/staging/rdma/hfi1
+
 STARFIRE/DURALAN NETWORK DRIVER
 M:	Ion Badulescu <ionut@badula.org>
 S:	Odd Fixes
diff --git a/drivers/staging/rdma/Kconfig b/drivers/staging/rdma/Kconfig
index 5084088c8946..cf5fe9bb87a1 100644
--- a/drivers/staging/rdma/Kconfig
+++ b/drivers/staging/rdma/Kconfig
@@ -24,6 +24,8 @@ if STAGING_RDMA
 
 source "drivers/staging/rdma/amso1100/Kconfig"
 
+source "drivers/staging/rdma/hfi1/Kconfig"
+
 source "drivers/staging/rdma/ipath/Kconfig"
 
 endif
diff --git a/drivers/staging/rdma/Makefile b/drivers/staging/rdma/Makefile
index a2a459ac8d67..cbd915ac7f20 100644
--- a/drivers/staging/rdma/Makefile
+++ b/drivers/staging/rdma/Makefile
@@ -1,3 +1,4 @@
 # Entries for RDMA_STAGING tree
 obj-$(CONFIG_INFINIBAND_AMSO1100)	+= amso1100/
+obj-$(CONFIG_INFINIBAND_HFI1)	+= hfi1/
 obj-$(CONFIG_INFINIBAND_IPATH)	+= ipath/
diff --git a/drivers/staging/rdma/hfi1/Kconfig b/drivers/staging/rdma/hfi1/Kconfig
new file mode 100644
index 000000000000..fd25078ee923
--- /dev/null
+++ b/drivers/staging/rdma/hfi1/Kconfig
@@ -0,0 +1,37 @@
+config INFINIBAND_HFI1
+	tristate "Intel OPA Gen1 support"
+	depends on X86_64
+	default m
+	---help---
+	This is a low-level driver for Intel OPA Gen1 adapter.
+config HFI1_DEBUG_SDMA_ORDER
+	bool "HFI1 SDMA Order debug"
+	depends on INFINIBAND_HFI1
+	default n
+	---help---
+	This is a debug flag to test for out of order
+	sdma completions for unit testing
+config HFI1_VERBS_31BIT_PSN
+	bool "HFI1 enable 31 bit PSN"
+	depends on INFINIBAND_HFI1
+	default y
+	---help---
+	Setting this enables 31 BIT PSN
+	For verbs RC/UC
+config SDMA_VERBOSITY
+	bool "Config SDMA Verbosity"
+	depends on INFINIBAND_HFI1
+	default n
+	---help---
+	This is a configuration flag to enable verbose
+	SDMA debug
+config PRESCAN_RXQ
+	bool "Enable prescanning of the RX queue for ECNs"
+	depends on INFINIBAND_HFI1
+	default n
+	---help---
+	This option toggles the prescanning of the receive queue for
+	Explicit Congestion Notifications. If an ECN is detected, it
+	is processed as quickly as possible, the ECN is toggled off.
+	After the prescanning step, the receive queue is processed as
+	usual.
diff --git a/drivers/staging/rdma/hfi1/Makefile b/drivers/staging/rdma/hfi1/Makefile
new file mode 100644
index 000000000000..2e5daa6cdcc2
--- /dev/null
+++ b/drivers/staging/rdma/hfi1/Makefile
@@ -0,0 +1,19 @@
+#
+# HFI driver
+#
+#
+#
+# Called from the kernel module build system.
+#
+obj-$(CONFIG_INFINIBAND_HFI1) += hfi1.o
+
+hfi1-y := chip.o cq.o device.o diag.o dma.o driver.o eprom.o file_ops.o firmware.o \
+	init.o intr.o keys.o mad.o mmap.o mr.o pcie.o pio.o pio_copy.o \
+	qp.o qsfp.o rc.o ruc.o sdma.o srq.o sysfs.o trace.o twsi.o \
+	uc.o ud.o user_pages.o user_sdma.o verbs_mcast.o verbs.o
+hfi1-$(CONFIG_DEBUG_FS) += debugfs.o
+
+CFLAGS_trace.o = -I$(src)
+ifdef MVERSION
+CFLAGS_driver.o = -DHFI_DRIVER_VERSION_BASE=\"$(MVERSION)\"
+endif
diff --git a/drivers/staging/rdma/hfi1/TODO b/drivers/staging/rdma/hfi1/TODO
new file mode 100644
index 000000000000..05de0dad8762
--- /dev/null
+++ b/drivers/staging/rdma/hfi1/TODO
@@ -0,0 +1,6 @@
+July, 2015
+
+- Remove unneeded file entries in sysfs
+- Remove software processing of IB protocol and place in library for use
+  by qib, ipath (if still present), hfi1, and eventually soft-roce
+
diff --git a/drivers/staging/rdma/hfi1/chip.c b/drivers/staging/rdma/hfi1/chip.c
new file mode 100644
index 000000000000..654eafef1d30
--- /dev/null
+++ b/drivers/staging/rdma/hfi1/chip.c
@@ -0,0 +1,10798 @@
+/*
+ *
+ * This file is provided under a dual BSD/GPLv2 license.  When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * GPL LICENSE SUMMARY
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * BSD LICENSE
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ *  - Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ *  - Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in
+ *    the documentation and/or other materials provided with the
+ *    distribution.
+ *  - Neither the name of Intel Corporation nor the names of its
+ *    contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ */
+
+/*
+ * This file contains all of the code that is specific to the HFI chip
+ */
+
+#include <linux/pci.h>
+#include <linux/delay.h>
+#include <linux/interrupt.h>
+#include <linux/module.h>
+
+#include "hfi.h"
+#include "trace.h"
+#include "mad.h"
+#include "pio.h"
+#include "sdma.h"
+#include "eprom.h"
+
+#define NUM_IB_PORTS 1
+
+uint kdeth_qp;
+module_param_named(kdeth_qp, kdeth_qp, uint, S_IRUGO);
+MODULE_PARM_DESC(kdeth_qp, "Set the KDETH queue pair prefix");
+
+uint num_vls = HFI1_MAX_VLS_SUPPORTED;
+module_param(num_vls, uint, S_IRUGO);
+MODULE_PARM_DESC(num_vls, "Set number of Virtual Lanes to use (1-8)");
+
+/*
+ * Default time to aggregate two 10K packets from the idle state
+ * (timer not running). The timer starts at the end of the first packet,
+ * so only the time for one 10K packet and header plus a bit extra is needed.
+ * 10 * 1024 + 64 header byte = 10304 byte
+ * 10304 byte / 12.5 GB/s = 824.32ns
+ */
+uint rcv_intr_timeout = (824 + 16); /* 16 is for coalescing interrupt */
+module_param(rcv_intr_timeout, uint, S_IRUGO);
+MODULE_PARM_DESC(rcv_intr_timeout, "Receive interrupt mitigation timeout in ns");
+
+uint rcv_intr_count = 16; /* same as qib */
+module_param(rcv_intr_count, uint, S_IRUGO);
+MODULE_PARM_DESC(rcv_intr_count, "Receive interrupt mitigation count");
+
+ushort link_crc_mask = SUPPORTED_CRCS;
+module_param(link_crc_mask, ushort, S_IRUGO);
+MODULE_PARM_DESC(link_crc_mask, "CRCs to use on the link");
+
+uint loopback;
+module_param_named(loopback, loopback, uint, S_IRUGO);
+MODULE_PARM_DESC(loopback, "Put into loopback mode (1 = serdes, 3 = external cable");
+
+/* Other driver tunables */
+uint rcv_intr_dynamic = 1; /* enable dynamic mode for rcv int mitigation*/
+static ushort crc_14b_sideband = 1;
+static uint use_flr = 1;
+uint quick_linkup; /* skip LNI */
+
+struct flag_table {
+	u64 flag;	/* the flag */
+	char *str;	/* description string */
+	u16 extra;	/* extra information */
+	u16 unused0;
+	u32 unused1;
+};
+
+/* str must be a string constant */
+#define FLAG_ENTRY(str, extra, flag) {flag, str, extra}
+#define FLAG_ENTRY0(str, flag) {flag, str, 0}
+
+/* Send Error Consequences */
+#define SEC_WRITE_DROPPED	0x1
+#define SEC_PACKET_DROPPED	0x2
+#define SEC_SC_HALTED		0x4	/* per-context only */
+#define SEC_SPC_FREEZE		0x8	/* per-HFI only */
+
+#define VL15CTXT                  1
+#define MIN_KERNEL_KCTXTS         2
+#define NUM_MAP_REGS             32
+
+/* Bit offset into the GUID which carries HFI id information */
+#define GUID_HFI_INDEX_SHIFT     39
+
+/* extract the emulation revision */
+#define emulator_rev(dd) ((dd)->irev >> 8)
+/* parallel and serial emulation versions are 3 and 4 respectively */
+#define is_emulator_p(dd) ((((dd)->irev) & 0xf) == 3)
+#define is_emulator_s(dd) ((((dd)->irev) & 0xf) == 4)
+
+/* RSM fields */
+
+/* packet type */
+#define IB_PACKET_TYPE         2ull
+#define QW_SHIFT               6ull
+/* QPN[7..1] */
+#define QPN_WIDTH              7ull
+
+/* LRH.BTH: QW 0, OFFSET 48 - for match */
+#define LRH_BTH_QW             0ull
+#define LRH_BTH_BIT_OFFSET     48ull
+#define LRH_BTH_OFFSET(off)    ((LRH_BTH_QW << QW_SHIFT) | (off))
+#define LRH_BTH_MATCH_OFFSET   LRH_BTH_OFFSET(LRH_BTH_BIT_OFFSET)
+#define LRH_BTH_SELECT
+#define LRH_BTH_MASK           3ull
+#define LRH_BTH_VALUE          2ull
+
+/* LRH.SC[3..0] QW 0, OFFSET 56 - for match */
+#define LRH_SC_QW              0ull
+#define LRH_SC_BIT_OFFSET      56ull
+#define LRH_SC_OFFSET(off)     ((LRH_SC_QW << QW_SHIFT) | (off))
+#define LRH_SC_MATCH_OFFSET    LRH_SC_OFFSET(LRH_SC_BIT_OFFSET)
+#define LRH_SC_MASK            128ull
+#define LRH_SC_VALUE           0ull
+
+/* SC[n..0] QW 0, OFFSET 60 - for select */
+#define LRH_SC_SELECT_OFFSET  ((LRH_SC_QW << QW_SHIFT) | (60ull))
+
+/* QPN[m+n:1] QW 1, OFFSET 1 */
+#define QPN_SELECT_OFFSET      ((1ull << QW_SHIFT) | (1ull))
+
+/* defines to build power on SC2VL table */
+#define SC2VL_VAL( \
+	num, \
+	sc0, sc0val, \
+	sc1, sc1val, \
+	sc2, sc2val, \
+	sc3, sc3val, \
+	sc4, sc4val, \
+	sc5, sc5val, \
+	sc6, sc6val, \
+	sc7, sc7val) \
+( \
+	((u64)(sc0val) << SEND_SC2VLT##num##_SC##sc0##_SHIFT) | \
+	((u64)(sc1val) << SEND_SC2VLT##num##_SC##sc1##_SHIFT) | \
+	((u64)(sc2val) << SEND_SC2VLT##num##_SC##sc2##_SHIFT) | \
+	((u64)(sc3val) << SEND_SC2VLT##num##_SC##sc3##_SHIFT) | \
+	((u64)(sc4val) << SEND_SC2VLT##num##_SC##sc4##_SHIFT) | \
+	((u64)(sc5val) << SEND_SC2VLT##num##_SC##sc5##_SHIFT) | \
+	((u64)(sc6val) << SEND_SC2VLT##num##_SC##sc6##_SHIFT) | \
+	((u64)(sc7val) << SEND_SC2VLT##num##_SC##sc7##_SHIFT)   \
+)
+
+#define DC_SC_VL_VAL( \
+	range, \
+	e0, e0val, \
+	e1, e1val, \
+	e2, e2val, \
+	e3, e3val, \
+	e4, e4val, \
+	e5, e5val, \
+	e6, e6val, \
+	e7, e7val, \
+	e8, e8val, \
+	e9, e9val, \
+	e10, e10val, \
+	e11, e11val, \
+	e12, e12val, \
+	e13, e13val, \
+	e14, e14val, \
+	e15, e15val) \
+( \
+	((u64)(e0val) << DCC_CFG_SC_VL_TABLE_##range##_ENTRY##e0##_SHIFT) | \
+	((u64)(e1val) << DCC_CFG_SC_VL_TABLE_##range##_ENTRY##e1##_SHIFT) | \
+	((u64)(e2val) << DCC_CFG_SC_VL_TABLE_##range##_ENTRY##e2##_SHIFT) | \
+	((u64)(e3val) << DCC_CFG_SC_VL_TABLE_##range##_ENTRY##e3##_SHIFT) | \
+	((u64)(e4val) << DCC_CFG_SC_VL_TABLE_##range##_ENTRY##e4##_SHIFT) | \
+	((u64)(e5val) << DCC_CFG_SC_VL_TABLE_##range##_ENTRY##e5##_SHIFT) | \
+	((u64)(e6val) << DCC_CFG_SC_VL_TABLE_##range##_ENTRY##e6##_SHIFT) | \
+	((u64)(e7val) << DCC_CFG_SC_VL_TABLE_##range##_ENTRY##e7##_SHIFT) | \
+	((u64)(e8val) << DCC_CFG_SC_VL_TABLE_##range##_ENTRY##e8##_SHIFT) | \
+	((u64)(e9val) << DCC_CFG_SC_VL_TABLE_##range##_ENTRY##e9##_SHIFT) | \
+	((u64)(e10val) << DCC_CFG_SC_VL_TABLE_##range##_ENTRY##e10##_SHIFT) | \
+	((u64)(e11val) << DCC_CFG_SC_VL_TABLE_##range##_ENTRY##e11##_SHIFT) | \
+	((u64)(e12val) << DCC_CFG_SC_VL_TABLE_##range##_ENTRY##e12##_SHIFT) | \
+	((u64)(e13val) << DCC_CFG_SC_VL_TABLE_##range##_ENTRY##e13##_SHIFT) | \
+	((u64)(e14val) << DCC_CFG_SC_VL_TABLE_##range##_ENTRY##e14##_SHIFT) | \
+	((u64)(e15val) << DCC_CFG_SC_VL_TABLE_##range##_ENTRY##e15##_SHIFT) \
+)
+
+/* all CceStatus sub-block freeze bits */
+#define ALL_FROZE (CCE_STATUS_SDMA_FROZE_SMASK \
+			| CCE_STATUS_RXE_FROZE_SMASK \
+			| CCE_STATUS_TXE_FROZE_SMASK \
+			| CCE_STATUS_TXE_PIO_FROZE_SMASK)
+/* all CceStatus sub-block TXE pause bits */
+#define ALL_TXE_PAUSE (CCE_STATUS_TXE_PIO_PAUSED_SMASK \
+			| CCE_STATUS_TXE_PAUSED_SMASK \
+			| CCE_STATUS_SDMA_PAUSED_SMASK)
+/* all CceStatus sub-block RXE pause bits */
+#define ALL_RXE_PAUSE CCE_STATUS_RXE_PAUSED_SMASK
+
+/*
+ * CCE Error flags.
+ */
+static struct flag_table cce_err_status_flags[] = {
+/* 0*/	FLAG_ENTRY0("CceCsrParityErr",
+		CCE_ERR_STATUS_CCE_CSR_PARITY_ERR_SMASK),
+/* 1*/	FLAG_ENTRY0("CceCsrReadBadAddrErr",
+		CCE_ERR_STATUS_CCE_CSR_READ_BAD_ADDR_ERR_SMASK),
+/* 2*/	FLAG_ENTRY0("CceCsrWriteBadAddrErr",
+		CCE_ERR_STATUS_CCE_CSR_WRITE_BAD_ADDR_ERR_SMASK),
+/* 3*/	FLAG_ENTRY0("CceTrgtAsyncFifoParityErr",
+		CCE_ERR_STATUS_CCE_TRGT_ASYNC_FIFO_PARITY_ERR_SMASK),
+/* 4*/	FLAG_ENTRY0("CceTrgtAccessErr",
+		CCE_ERR_STATUS_CCE_TRGT_ACCESS_ERR_SMASK),
+/* 5*/	FLAG_ENTRY0("CceRspdDataParityErr",
+		CCE_ERR_STATUS_CCE_RSPD_DATA_PARITY_ERR_SMASK),
+/* 6*/	FLAG_ENTRY0("CceCli0AsyncFifoParityErr",
+		CCE_ERR_STATUS_CCE_CLI0_ASYNC_FIFO_PARITY_ERR_SMASK),
+/* 7*/	FLAG_ENTRY0("CceCsrCfgBusParityErr",
+		CCE_ERR_STATUS_CCE_CSR_CFG_BUS_PARITY_ERR_SMASK),
+/* 8*/	FLAG_ENTRY0("CceCli2AsyncFifoParityErr",
+		CCE_ERR_STATUS_CCE_CLI2_ASYNC_FIFO_PARITY_ERR_SMASK),
+/* 9*/	FLAG_ENTRY0("CceCli1AsyncFifoPioCrdtParityErr",
+	    CCE_ERR_STATUS_CCE_CLI1_ASYNC_FIFO_PIO_CRDT_PARITY_ERR_SMASK),
+/*10*/	FLAG_ENTRY0("CceCli1AsyncFifoPioCrdtParityErr",
+	    CCE_ERR_STATUS_CCE_CLI1_ASYNC_FIFO_SDMA_HD_PARITY_ERR_SMASK),
+/*11*/	FLAG_ENTRY0("CceCli1AsyncFifoRxdmaParityError",
+	    CCE_ERR_STATUS_CCE_CLI1_ASYNC_FIFO_RXDMA_PARITY_ERROR_SMASK),
+/*12*/	FLAG_ENTRY0("CceCli1AsyncFifoDbgParityError",
+		CCE_ERR_STATUS_CCE_CLI1_ASYNC_FIFO_DBG_PARITY_ERROR_SMASK),
+/*13*/	FLAG_ENTRY0("PcicRetryMemCorErr",
+		CCE_ERR_STATUS_PCIC_RETRY_MEM_COR_ERR_SMASK),
+/*14*/	FLAG_ENTRY0("PcicRetryMemCorErr",
+		CCE_ERR_STATUS_PCIC_RETRY_SOT_MEM_COR_ERR_SMASK),
+/*15*/	FLAG_ENTRY0("PcicPostHdQCorErr",
+		CCE_ERR_STATUS_PCIC_POST_HD_QCOR_ERR_SMASK),
+/*16*/	FLAG_ENTRY0("PcicPostHdQCorErr",
+		CCE_ERR_STATUS_PCIC_POST_DAT_QCOR_ERR_SMASK),
+/*17*/	FLAG_ENTRY0("PcicPostHdQCorErr",
+		CCE_ERR_STATUS_PCIC_CPL_HD_QCOR_ERR_SMASK),
+/*18*/	FLAG_ENTRY0("PcicCplDatQCorErr",
+		CCE_ERR_STATUS_PCIC_CPL_DAT_QCOR_ERR_SMASK),
+/*19*/	FLAG_ENTRY0("PcicNPostHQParityErr",
+		CCE_ERR_STATUS_PCIC_NPOST_HQ_PARITY_ERR_SMASK),
+/*20*/	FLAG_ENTRY0("PcicNPostDatQParityErr",
+		CCE_ERR_STATUS_PCIC_NPOST_DAT_QPARITY_ERR_SMASK),
+/*21*/	FLAG_ENTRY0("PcicRetryMemUncErr",
+		CCE_ERR_STATUS_PCIC_RETRY_MEM_UNC_ERR_SMASK),
+/*22*/	FLAG_ENTRY0("PcicRetrySotMemUncErr",
+		CCE_ERR_STATUS_PCIC_RETRY_SOT_MEM_UNC_ERR_SMASK),
+/*23*/	FLAG_ENTRY0("PcicPostHdQUncErr",
+		CCE_ERR_STATUS_PCIC_POST_HD_QUNC_ERR_SMASK),
+/*24*/	FLAG_ENTRY0("PcicPostDatQUncErr",
+		CCE_ERR_STATUS_PCIC_POST_DAT_QUNC_ERR_SMASK),
+/*25*/	FLAG_ENTRY0("PcicCplHdQUncErr",
+		CCE_ERR_STATUS_PCIC_CPL_HD_QUNC_ERR_SMASK),
+/*26*/	FLAG_ENTRY0("PcicCplDatQUncErr",
+		CCE_ERR_STATUS_PCIC_CPL_DAT_QUNC_ERR_SMASK),
+/*27*/	FLAG_ENTRY0("PcicTransmitFrontParityErr",
+		CCE_ERR_STATUS_PCIC_TRANSMIT_FRONT_PARITY_ERR_SMASK),
+/*28*/	FLAG_ENTRY0("PcicTransmitBackParityErr",
+		CCE_ERR_STATUS_PCIC_TRANSMIT_BACK_PARITY_ERR_SMASK),
+/*29*/	FLAG_ENTRY0("PcicReceiveParityErr",
+		CCE_ERR_STATUS_PCIC_RECEIVE_PARITY_ERR_SMASK),
+/*30*/	FLAG_ENTRY0("CceTrgtCplTimeoutErr",
+		CCE_ERR_STATUS_CCE_TRGT_CPL_TIMEOUT_ERR_SMASK),
+/*31*/	FLAG_ENTRY0("LATriggered",
+		CCE_ERR_STATUS_LA_TRIGGERED_SMASK),
+/*32*/	FLAG_ENTRY0("CceSegReadBadAddrErr",
+		CCE_ERR_STATUS_CCE_SEG_READ_BAD_ADDR_ERR_SMASK),
+/*33*/	FLAG_ENTRY0("CceSegWriteBadAddrErr",
+		CCE_ERR_STATUS_CCE_SEG_WRITE_BAD_ADDR_ERR_SMASK),
+/*34*/	FLAG_ENTRY0("CceRcplAsyncFifoParityErr",
+		CCE_ERR_STATUS_CCE_RCPL_ASYNC_FIFO_PARITY_ERR_SMASK),
+/*35*/	FLAG_ENTRY0("CceRxdmaConvFifoParityErr",
+		CCE_ERR_STATUS_CCE_RXDMA_CONV_FIFO_PARITY_ERR_SMASK),
+/*36*/	FLAG_ENTRY0("CceMsixTableCorErr",
+		CCE_ERR_STATUS_CCE_MSIX_TABLE_COR_ERR_SMASK),
+/*37*/	FLAG_ENTRY0("CceMsixTableUncErr",
+		CCE_ERR_STATUS_CCE_MSIX_TABLE_UNC_ERR_SMASK),
+/*38*/	FLAG_ENTRY0("CceIntMapCorErr",
+		CCE_ERR_STATUS_CCE_INT_MAP_COR_ERR_SMASK),
+/*39*/	FLAG_ENTRY0("CceIntMapUncErr",
+		CCE_ERR_STATUS_CCE_INT_MAP_UNC_ERR_SMASK),
+/*40*/	FLAG_ENTRY0("CceMsixCsrParityErr",
+		CCE_ERR_STATUS_CCE_MSIX_CSR_PARITY_ERR_SMASK),
+/*41-63 reserved*/
+};
+
+/*
+ * Misc Error flags
+ */
+#define MES(text) MISC_ERR_STATUS_MISC_##text##_ERR_SMASK
+static struct flag_table misc_err_status_flags[] = {
+/* 0*/	FLAG_ENTRY0("CSR_PARITY", MES(CSR_PARITY)),
+/* 1*/	FLAG_ENTRY0("CSR_READ_BAD_ADDR", MES(CSR_READ_BAD_ADDR)),
+/* 2*/	FLAG_ENTRY0("CSR_WRITE_BAD_ADDR", MES(CSR_WRITE_BAD_ADDR)),
+/* 3*/	FLAG_ENTRY0("SBUS_WRITE_FAILED", MES(SBUS_WRITE_FAILED)),
+/* 4*/	FLAG_ENTRY0("KEY_MISMATCH", MES(KEY_MISMATCH)),
+/* 5*/	FLAG_ENTRY0("FW_AUTH_FAILED", MES(FW_AUTH_FAILED)),
+/* 6*/	FLAG_ENTRY0("EFUSE_CSR_PARITY", MES(EFUSE_CSR_PARITY)),
+/* 7*/	FLAG_ENTRY0("EFUSE_READ_BAD_ADDR", MES(EFUSE_READ_BAD_ADDR)),
+/* 8*/	FLAG_ENTRY0("EFUSE_WRITE", MES(EFUSE_WRITE)),
+/* 9*/	FLAG_ENTRY0("EFUSE_DONE_PARITY", MES(EFUSE_DONE_PARITY)),
+/*10*/	FLAG_ENTRY0("INVALID_EEP_CMD", MES(INVALID_EEP_CMD)),
+/*11*/	FLAG_ENTRY0("MBIST_FAIL", MES(MBIST_FAIL)),
+/*12*/	FLAG_ENTRY0("PLL_LOCK_FAIL", MES(PLL_LOCK_FAIL))
+};
+
+/*
+ * TXE PIO Error flags and consequences
+ */
+static struct flag_table pio_err_status_flags[] = {
+/* 0*/	FLAG_ENTRY("PioWriteBadCtxt",
+	SEC_WRITE_DROPPED,
+	SEND_PIO_ERR_STATUS_PIO_WRITE_BAD_CTXT_ERR_SMASK),
+/* 1*/	FLAG_ENTRY("PioWriteAddrParity",
+	SEC_SPC_FREEZE,
+	SEND_PIO_ERR_STATUS_PIO_WRITE_ADDR_PARITY_ERR_SMASK),
+/* 2*/	FLAG_ENTRY("PioCsrParity",
+	SEC_SPC_FREEZE,
+	SEND_PIO_ERR_STATUS_PIO_CSR_PARITY_ERR_SMASK),
+/* 3*/	FLAG_ENTRY("PioSbMemFifo0",
+	SEC_SPC_FREEZE,
+	SEND_PIO_ERR_STATUS_PIO_SB_MEM_FIFO0_ERR_SMASK),
+/* 4*/	FLAG_ENTRY("PioSbMemFifo1",
+	SEC_SPC_FREEZE,
+	SEND_PIO_ERR_STATUS_PIO_SB_MEM_FIFO1_ERR_SMASK),
+/* 5*/	FLAG_ENTRY("PioPccFifoParity",
+	SEC_SPC_FREEZE,
+	SEND_PIO_ERR_STATUS_PIO_PCC_FIFO_PARITY_ERR_SMASK),
+/* 6*/	FLAG_ENTRY("PioPecFifoParity",
+	SEC_SPC_FREEZE,
+	SEND_PIO_ERR_STATUS_PIO_PEC_FIFO_PARITY_ERR_SMASK),
+/* 7*/	FLAG_ENTRY("PioSbrdctlCrrelParity",
+	SEC_SPC_FREEZE,
+	SEND_PIO_ERR_STATUS_PIO_SBRDCTL_CRREL_PARITY_ERR_SMASK),
+/* 8*/	FLAG_ENTRY("PioSbrdctrlCrrelFifoParity",
+	SEC_SPC_FREEZE,
+	SEND_PIO_ERR_STATUS_PIO_SBRDCTRL_CRREL_FIFO_PARITY_ERR_SMASK),
+/* 9*/	FLAG_ENTRY("PioPktEvictFifoParityErr",
+	SEC_SPC_FREEZE,
+	SEND_PIO_ERR_STATUS_PIO_PKT_EVICT_FIFO_PARITY_ERR_SMASK),
+/*10*/	FLAG_ENTRY("PioSmPktResetParity",
+	SEC_SPC_FREEZE,
+	SEND_PIO_ERR_STATUS_PIO_SM_PKT_RESET_PARITY_ERR_SMASK),
+/*11*/	FLAG_ENTRY("PioVlLenMemBank0Unc",
+	SEC_SPC_FREEZE,
+	SEND_PIO_ERR_STATUS_PIO_VL_LEN_MEM_BANK0_UNC_ERR_SMASK),
+/*12*/	FLAG_ENTRY("PioVlLenMemBank1Unc",
+	SEC_SPC_FREEZE,
+	SEND_PIO_ERR_STATUS_PIO_VL_LEN_MEM_BANK1_UNC_ERR_SMASK),
+/*13*/	FLAG_ENTRY("PioVlLenMemBank0Cor",
+	0,
+	SEND_PIO_ERR_STATUS_PIO_VL_LEN_MEM_BANK0_COR_ERR_SMASK),
+/*14*/	FLAG_ENTRY("PioVlLenMemBank1Cor",
+	0,
+	SEND_PIO_ERR_STATUS_PIO_VL_LEN_MEM_BANK1_COR_ERR_SMASK),
+/*15*/	FLAG_ENTRY("PioCreditRetFifoParity",
+	SEC_SPC_FREEZE,
+	SEND_PIO_ERR_STATUS_PIO_CREDIT_RET_FIFO_PARITY_ERR_SMASK),
+/*16*/	FLAG_ENTRY("PioPpmcPblFifo",
+	SEC_SPC_FREEZE,
+	SEND_PIO_ERR_STATUS_PIO_PPMC_PBL_FIFO_ERR_SMASK),
+/*17*/	FLAG_ENTRY("PioInitSmIn",
+	0,
+	SEND_PIO_ERR_STATUS_PIO_INIT_SM_IN_ERR_SMASK),
+/*18*/	FLAG_ENTRY("PioPktEvictSmOrArbSm",
+	SEC_SPC_FREEZE,
+	SEND_PIO_ERR_STATUS_PIO_PKT_EVICT_SM_OR_ARB_SM_ERR_SMASK),
+/*19*/	FLAG_ENTRY("PioHostAddrMemUnc",
+	SEC_SPC_FREEZE,
+	SEND_PIO_ERR_STATUS_PIO_HOST_ADDR_MEM_UNC_ERR_SMASK),
+/*20*/	FLAG_ENTRY("PioHostAddrMemCor",
+	0,
+	SEND_PIO_ERR_STATUS_PIO_HOST_ADDR_MEM_COR_ERR_SMASK),
+/*21*/	FLAG_ENTRY("PioWriteDataParity",
+	SEC_SPC_FREEZE,
+	SEND_PIO_ERR_STATUS_PIO_WRITE_DATA_PARITY_ERR_SMASK),
+/*22*/	FLAG_ENTRY("PioStateMachine",
+	SEC_SPC_FREEZE,
+	SEND_PIO_ERR_STATUS_PIO_STATE_MACHINE_ERR_SMASK),
+/*23*/	FLAG_ENTRY("PioWriteQwValidParity",
+	SEC_WRITE_DROPPED|SEC_SPC_FREEZE,
+	SEND_PIO_ERR_STATUS_PIO_WRITE_QW_VALID_PARITY_ERR_SMASK),
+/*24*/	FLAG_ENTRY("PioBlockQwCountParity",
+	SEC_WRITE_DROPPED|SEC_SPC_FREEZE,
+	SEND_PIO_ERR_STATUS_PIO_BLOCK_QW_COUNT_PARITY_ERR_SMASK),
+/*25*/	FLAG_ENTRY("PioVlfVlLenParity",
+	SEC_SPC_FREEZE,
+	SEND_PIO_ERR_STATUS_PIO_VLF_VL_LEN_PARITY_ERR_SMASK),
+/*26*/	FLAG_ENTRY("PioVlfSopParity",
+	SEC_SPC_FREEZE,
+	SEND_PIO_ERR_STATUS_PIO_VLF_SOP_PARITY_ERR_SMASK),
+/*27*/	FLAG_ENTRY("PioVlFifoParity",
+	SEC_SPC_FREEZE,
+	SEND_PIO_ERR_STATUS_PIO_VL_FIFO_PARITY_ERR_SMASK),
+/*28*/	FLAG_ENTRY("PioPpmcBqcMemParity",
+	SEC_SPC_FREEZE,
+	SEND_PIO_ERR_STATUS_PIO_PPMC_BQC_MEM_PARITY_ERR_SMASK),
+/*29*/	FLAG_ENTRY("PioPpmcSopLen",
+	SEC_SPC_FREEZE,
+	SEND_PIO_ERR_STATUS_PIO_PPMC_SOP_LEN_ERR_SMASK),
+/*30-31 reserved*/
+/*32*/	FLAG_ENTRY("PioCurrentFreeCntParity",
+	SEC_SPC_FREEZE,
+	SEND_PIO_ERR_STATUS_PIO_CURRENT_FREE_CNT_PARITY_ERR_SMASK),
+/*33*/	FLAG_ENTRY("PioLastReturnedCntParity",
+	SEC_SPC_FREEZE,
+	SEND_PIO_ERR_STATUS_PIO_LAST_RETURNED_CNT_PARITY_ERR_SMASK),
+/*34*/	FLAG_ENTRY("PioPccSopHeadParity",
+	SEC_SPC_FREEZE,
+	SEND_PIO_ERR_STATUS_PIO_PCC_SOP_HEAD_PARITY_ERR_SMASK),
+/*35*/	FLAG_ENTRY("PioPecSopHeadParityErr",
+	SEC_SPC_FREEZE,
+	SEND_PIO_ERR_STATUS_PIO_PEC_SOP_HEAD_PARITY_ERR_SMASK),
+/*36-63 reserved*/
+};
+
+/* TXE PIO errors that cause an SPC freeze */
+#define ALL_PIO_FREEZE_ERR \
+	(SEND_PIO_ERR_STATUS_PIO_WRITE_ADDR_PARITY_ERR_SMASK \
+	| SEND_PIO_ERR_STATUS_PIO_CSR_PARITY_ERR_SMASK \
+	| SEND_PIO_ERR_STATUS_PIO_SB_MEM_FIFO0_ERR_SMASK \
+	| SEND_PIO_ERR_STATUS_PIO_SB_MEM_FIFO1_ERR_SMASK \
+	| SEND_PIO_ERR_STATUS_PIO_PCC_FIFO_PARITY_ERR_SMASK \
+	| SEND_PIO_ERR_STATUS_PIO_PEC_FIFO_PARITY_ERR_SMASK \
+	| SEND_PIO_ERR_STATUS_PIO_SBRDCTL_CRREL_PARITY_ERR_SMASK \
+	| SEND_PIO_ERR_STATUS_PIO_SBRDCTRL_CRREL_FIFO_PARITY_ERR_SMASK \
+	| SEND_PIO_ERR_STATUS_PIO_PKT_EVICT_FIFO_PARITY_ERR_SMASK \
+	| SEND_PIO_ERR_STATUS_PIO_SM_PKT_RESET_PARITY_ERR_SMASK \
+	| SEND_PIO_ERR_STATUS_PIO_VL_LEN_MEM_BANK0_UNC_ERR_SMASK \
+	| SEND_PIO_ERR_STATUS_PIO_VL_LEN_MEM_BANK1_UNC_ERR_SMASK \
+	| SEND_PIO_ERR_STATUS_PIO_CREDIT_RET_FIFO_PARITY_ERR_SMASK \
+	| SEND_PIO_ERR_STATUS_PIO_PPMC_PBL_FIFO_ERR_SMASK \
+	| SEND_PIO_ERR_STATUS_PIO_PKT_EVICT_SM_OR_ARB_SM_ERR_SMASK \
+	| SEND_PIO_ERR_STATUS_PIO_HOST_ADDR_MEM_UNC_ERR_SMASK \
+	| SEND_PIO_ERR_STATUS_PIO_WRITE_DATA_PARITY_ERR_SMASK \
+	| SEND_PIO_ERR_STATUS_PIO_STATE_MACHINE_ERR_SMASK \
+	| SEND_PIO_ERR_STATUS_PIO_WRITE_QW_VALID_PARITY_ERR_SMASK \
+	| SEND_PIO_ERR_STATUS_PIO_BLOCK_QW_COUNT_PARITY_ERR_SMASK \
+	| SEND_PIO_ERR_STATUS_PIO_VLF_VL_LEN_PARITY_ERR_SMASK \
+	| SEND_PIO_ERR_STATUS_PIO_VLF_SOP_PARITY_ERR_SMASK \
+	| SEND_PIO_ERR_STATUS_PIO_VL_FIFO_PARITY_ERR_SMASK \
+	| SEND_PIO_ERR_STATUS_PIO_PPMC_BQC_MEM_PARITY_ERR_SMASK \
+	| SEND_PIO_ERR_STATUS_PIO_PPMC_SOP_LEN_ERR_SMASK \
+	| SEND_PIO_ERR_STATUS_PIO_CURRENT_FREE_CNT_PARITY_ERR_SMASK \
+	| SEND_PIO_ERR_STATUS_PIO_LAST_RETURNED_CNT_PARITY_ERR_SMASK \
+	| SEND_PIO_ERR_STATUS_PIO_PCC_SOP_HEAD_PARITY_ERR_SMASK \
+	| SEND_PIO_ERR_STATUS_PIO_PEC_SOP_HEAD_PARITY_ERR_SMASK)
+
+/*
+ * TXE SDMA Error flags
+ */
+static struct flag_table sdma_err_status_flags[] = {
+/* 0*/	FLAG_ENTRY0("SDmaRpyTagErr",
+		SEND_DMA_ERR_STATUS_SDMA_RPY_TAG_ERR_SMASK),
+/* 1*/	FLAG_ENTRY0("SDmaCsrParityErr",
+		SEND_DMA_ERR_STATUS_SDMA_CSR_PARITY_ERR_SMASK),
+/* 2*/	FLAG_ENTRY0("SDmaPcieReqTrackingUncErr",
+		SEND_DMA_ERR_STATUS_SDMA_PCIE_REQ_TRACKING_UNC_ERR_SMASK),
+/* 3*/	FLAG_ENTRY0("SDmaPcieReqTrackingCorErr",
+		SEND_DMA_ERR_STATUS_SDMA_PCIE_REQ_TRACKING_COR_ERR_SMASK),
+/*04-63 reserved*/
+};
+
+/* TXE SDMA errors that cause an SPC freeze */
+#define ALL_SDMA_FREEZE_ERR  \
+		(SEND_DMA_ERR_STATUS_SDMA_RPY_TAG_ERR_SMASK \
+		| SEND_DMA_ERR_STATUS_SDMA_CSR_PARITY_ERR_SMASK \
+		| SEND_DMA_ERR_STATUS_SDMA_PCIE_REQ_TRACKING_UNC_ERR_SMASK)
+
+/*
+ * TXE Egress Error flags
+ */
+#define SEES(text) SEND_EGRESS_ERR_STATUS_##text##_ERR_SMASK
+static struct flag_table egress_err_status_flags[] = {
+/* 0*/	FLAG_ENTRY0("TxPktIntegrityMemCorErr", SEES(TX_PKT_INTEGRITY_MEM_COR)),
+/* 1*/	FLAG_ENTRY0("TxPktIntegrityMemUncErr", SEES(TX_PKT_INTEGRITY_MEM_UNC)),
+/* 2 reserved */
+/* 3*/	FLAG_ENTRY0("TxEgressFifoUnderrunOrParityErr",
+		SEES(TX_EGRESS_FIFO_UNDERRUN_OR_PARITY)),
+/* 4*/	FLAG_ENTRY0("TxLinkdownErr", SEES(TX_LINKDOWN)),
+/* 5*/	FLAG_ENTRY0("TxIncorrectLinkStateErr", SEES(TX_INCORRECT_LINK_STATE)),
+/* 6 reserved */
+/* 7*/	FLAG_ENTRY0("TxPioLaunchIntfParityErr",
+		SEES(TX_PIO_LAUNCH_INTF_PARITY)),
+/* 8*/	FLAG_ENTRY0("TxSdmaLaunchIntfParityErr",
+		SEES(TX_SDMA_LAUNCH_INTF_PARITY)),
+/* 9-10 reserved */
+/*11*/	FLAG_ENTRY0("TxSbrdCtlStateMachineParityErr",
+		SEES(TX_SBRD_CTL_STATE_MACHINE_PARITY)),
+/*12*/	FLAG_ENTRY0("TxIllegalVLErr", SEES(TX_ILLEGAL_VL)),
+/*13*/	FLAG_ENTRY0("TxLaunchCsrParityErr", SEES(TX_LAUNCH_CSR_PARITY)),
+/*14*/	FLAG_ENTRY0("TxSbrdCtlCsrParityErr", SEES(TX_SBRD_CTL_CSR_PARITY)),
+/*15*/	FLAG_ENTRY0("TxConfigParityErr", SEES(TX_CONFIG_PARITY)),
+/*16*/	FLAG_ENTRY0("TxSdma0DisallowedPacketErr",
+		SEES(TX_SDMA0_DISALLOWED_PACKET)),
+/*17*/	FLAG_ENTRY0("TxSdma1DisallowedPacketErr",
+		SEES(TX_SDMA1_DISALLOWED_PACKET)),
+/*18*/	FLAG_ENTRY0("TxSdma2DisallowedPacketErr",
+		SEES(TX_SDMA2_DISALLOWED_PACKET)),
+/*19*/	FLAG_ENTRY0("TxSdma3DisallowedPacketErr",
+		SEES(TX_SDMA3_DISALLOWED_PACKET)),
+/*20*/	FLAG_ENTRY0("TxSdma4DisallowedPacketErr",
+		SEES(TX_SDMA4_DISALLOWED_PACKET)),
+/*21*/	FLAG_ENTRY0("TxSdma5DisallowedPacketErr",
+		SEES(TX_SDMA5_DISALLOWED_PACKET)),
+/*22*/	FLAG_ENTRY0("TxSdma6DisallowedPacketErr",
+		SEES(TX_SDMA6_DISALLOWED_PACKET)),
+/*23*/	FLAG_ENTRY0("TxSdma7DisallowedPacketErr",
+		SEES(TX_SDMA7_DISALLOWED_PACKET)),
+/*24*/	FLAG_ENTRY0("TxSdma8DisallowedPacketErr",
+		SEES(TX_SDMA8_DISALLOWED_PACKET)),
+/*25*/	FLAG_ENTRY0("TxSdma9DisallowedPacketErr",
+		SEES(TX_SDMA9_DISALLOWED_PACKET)),
+/*26*/	FLAG_ENTRY0("TxSdma10DisallowedPacketErr",
+		SEES(TX_SDMA10_DISALLOWED_PACKET)),
+/*27*/	FLAG_ENTRY0("TxSdma11DisallowedPacketErr",
+		SEES(TX_SDMA11_DISALLOWED_PACKET)),
+/*28*/	FLAG_ENTRY0("TxSdma12DisallowedPacketErr",
+		SEES(TX_SDMA12_DISALLOWED_PACKET)),
+/*29*/	FLAG_ENTRY0("TxSdma13DisallowedPacketErr",
+		SEES(TX_SDMA13_DISALLOWED_PACKET)),
+/*30*/	FLAG_ENTRY0("TxSdma14DisallowedPacketErr",
+		SEES(TX_SDMA14_DISALLOWED_PACKET)),
+/*31*/	FLAG_ENTRY0("TxSdma15DisallowedPacketErr",
+		SEES(TX_SDMA15_DISALLOWED_PACKET)),
+/*32*/	FLAG_ENTRY0("TxLaunchFifo0UncOrParityErr",
+		SEES(TX_LAUNCH_FIFO0_UNC_OR_PARITY)),
+/*33*/	FLAG_ENTRY0("TxLaunchFifo1UncOrParityErr",
+		SEES(TX_LAUNCH_FIFO1_UNC_OR_PARITY)),
+/*34*/	FLAG_ENTRY0("TxLaunchFifo2UncOrParityErr",
+		SEES(TX_LAUNCH_FIFO2_UNC_OR_PARITY)),
+/*35*/	FLAG_ENTRY0("TxLaunchFifo3UncOrParityErr",
+		SEES(TX_LAUNCH_FIFO3_UNC_OR_PARITY)),
+/*36*/	FLAG_ENTRY0("TxLaunchFifo4UncOrParityErr",
+		SEES(TX_LAUNCH_FIFO4_UNC_OR_PARITY)),
+/*37*/	FLAG_ENTRY0("TxLaunchFifo5UncOrParityErr",
+		SEES(TX_LAUNCH_FIFO5_UNC_OR_PARITY)),
+/*38*/	FLAG_ENTRY0("TxLaunchFifo6UncOrParityErr",
+		SEES(TX_LAUNCH_FIFO6_UNC_OR_PARITY)),
+/*39*/	FLAG_ENTRY0("TxLaunchFifo7UncOrParityErr",
+		SEES(TX_LAUNCH_FIFO7_UNC_OR_PARITY)),
+/*40*/	FLAG_ENTRY0("TxLaunchFifo8UncOrParityErr",
+		SEES(TX_LAUNCH_FIFO8_UNC_OR_PARITY)),
+/*41*/	FLAG_ENTRY0("TxCreditReturnParityErr", SEES(TX_CREDIT_RETURN_PARITY)),
+/*42*/	FLAG_ENTRY0("TxSbHdrUncErr", SEES(TX_SB_HDR_UNC)),
+/*43*/	FLAG_ENTRY0("TxReadSdmaMemoryUncErr", SEES(TX_READ_SDMA_MEMORY_UNC)),
+/*44*/	FLAG_ENTRY0("TxReadPioMemoryUncErr", SEES(TX_READ_PIO_MEMORY_UNC)),
+/*45*/	FLAG_ENTRY0("TxEgressFifoUncErr", SEES(TX_EGRESS_FIFO_UNC)),
+/*46*/	FLAG_ENTRY0("TxHcrcInsertionErr", SEES(TX_HCRC_INSERTION)),
+/*47*/	FLAG_ENTRY0("TxCreditReturnVLErr", SEES(TX_CREDIT_RETURN_VL)),
+/*48*/	FLAG_ENTRY0("TxLaunchFifo0CorErr", SEES(TX_LAUNCH_FIFO0_COR)),
+/*49*/	FLAG_ENTRY0("TxLaunchFifo1CorErr", SEES(TX_LAUNCH_FIFO1_COR)),
+/*50*/	FLAG_ENTRY0("TxLaunchFifo2CorErr", SEES(TX_LAUNCH_FIFO2_COR)),
+/*51*/	FLAG_ENTRY0("TxLaunchFifo3CorErr", SEES(TX_LAUNCH_FIFO3_COR)),
+/*52*/	FLAG_ENTRY0("TxLaunchFifo4CorErr", SEES(TX_LAUNCH_FIFO4_COR)),
+/*53*/	FLAG_ENTRY0("TxLaunchFifo5CorErr", SEES(TX_LAUNCH_FIFO5_COR)),
+/*54*/	FLAG_ENTRY0("TxLaunchFifo6CorErr", SEES(TX_LAUNCH_FIFO6_COR)),
+/*55*/	FLAG_ENTRY0("TxLaunchFifo7CorErr", SEES(TX_LAUNCH_FIFO7_COR)),
+/*56*/	FLAG_ENTRY0("TxLaunchFifo8CorErr", SEES(TX_LAUNCH_FIFO8_COR)),
+/*57*/	FLAG_ENTRY0("TxCreditOverrunErr", SEES(TX_CREDIT_OVERRUN)),
+/*58*/	FLAG_ENTRY0("TxSbHdrCorErr", SEES(TX_SB_HDR_COR)),
+/*59*/	FLAG_ENTRY0("TxReadSdmaMemoryCorErr", SEES(TX_READ_SDMA_MEMORY_COR)),
+/*60*/	FLAG_ENTRY0("TxReadPioMemoryCorErr", SEES(TX_READ_PIO_MEMORY_COR)),
+/*61*/	FLAG_ENTRY0("TxEgressFifoCorErr", SEES(TX_EGRESS_FIFO_COR)),
+/*62*/	FLAG_ENTRY0("TxReadSdmaMemoryCsrUncErr",
+		SEES(TX_READ_SDMA_MEMORY_CSR_UNC)),
+/*63*/	FLAG_ENTRY0("TxReadPioMemoryCsrUncErr",
+		SEES(TX_READ_PIO_MEMORY_CSR_UNC)),
+};
+
+/*
+ * TXE Egress Error Info flags
+ */
+#define SEEI(text) SEND_EGRESS_ERR_INFO_##text##_ERR_SMASK
+static struct flag_table egress_err_info_flags[] = {
+/* 0*/	FLAG_ENTRY0("Reserved", 0ull),
+/* 1*/	FLAG_ENTRY0("VLErr", SEEI(VL)),
+/* 2*/	FLAG_ENTRY0("JobKeyErr", SEEI(JOB_KEY)),
+/* 3*/	FLAG_ENTRY0("JobKeyErr", SEEI(JOB_KEY)),
+/* 4*/	FLAG_ENTRY0("PartitionKeyErr", SEEI(PARTITION_KEY)),
+/* 5*/	FLAG_ENTRY0("SLIDErr", SEEI(SLID)),
+/* 6*/	FLAG_ENTRY0("OpcodeErr", SEEI(OPCODE)),
+/* 7*/	FLAG_ENTRY0("VLMappingErr", SEEI(VL_MAPPING)),
+/* 8*/	FLAG_ENTRY0("RawErr", SEEI(RAW)),
+/* 9*/	FLAG_ENTRY0("RawIPv6Err", SEEI(RAW_IPV6)),
+/*10*/	FLAG_ENTRY0("GRHErr", SEEI(GRH)),
+/*11*/	FLAG_ENTRY0("BypassErr", SEEI(BYPASS)),
+/*12*/	FLAG_ENTRY0("KDETHPacketsErr", SEEI(KDETH_PACKETS)),
+/*13*/	FLAG_ENTRY0("NonKDETHPacketsErr", SEEI(NON_KDETH_PACKETS)),
+/*14*/	FLAG_ENTRY0("TooSmallIBPacketsErr", SEEI(TOO_SMALL_IB_PACKETS)),
+/*15*/	FLAG_ENTRY0("TooSmallBypassPacketsErr", SEEI(TOO_SMALL_BYPASS_PACKETS)),
+/*16*/	FLAG_ENTRY0("PbcTestErr", SEEI(PBC_TEST)),
+/*17*/	FLAG_ENTRY0("BadPktLenErr", SEEI(BAD_PKT_LEN)),
+/*18*/	FLAG_ENTRY0("TooLongIBPacketErr", SEEI(TOO_LONG_IB_PACKET)),
+/*19*/	FLAG_ENTRY0("TooLongBypassPacketsErr", SEEI(TOO_LONG_BYPASS_PACKETS)),
+/*20*/	FLAG_ENTRY0("PbcStaticRateControlErr", SEEI(PBC_STATIC_RATE_CONTROL)),
+/*21*/	FLAG_ENTRY0("BypassBadPktLenErr", SEEI(BAD_PKT_LEN)),
+};
+
+/* TXE Egress errors that cause an SPC freeze */
+#define ALL_TXE_EGRESS_FREEZE_ERR \
+	(SEES(TX_EGRESS_FIFO_UNDERRUN_OR_PARITY) \
+	| SEES(TX_PIO_LAUNCH_INTF_PARITY) \
+	| SEES(TX_SDMA_LAUNCH_INTF_PARITY) \
+	| SEES(TX_SBRD_CTL_STATE_MACHINE_PARITY) \
+	| SEES(TX_LAUNCH_CSR_PARITY) \
+	| SEES(TX_SBRD_CTL_CSR_PARITY) \
+	| SEES(TX_CONFIG_PARITY) \
+	| SEES(TX_LAUNCH_FIFO0_UNC_OR_PARITY) \
+	| SEES(TX_LAUNCH_FIFO1_UNC_OR_PARITY) \
+	| SEES(TX_LAUNCH_FIFO2_UNC_OR_PARITY) \
+	| SEES(TX_LAUNCH_FIFO3_UNC_OR_PARITY) \
+	| SEES(TX_LAUNCH_FIFO4_UNC_OR_PARITY) \
+	| SEES(TX_LAUNCH_FIFO5_UNC_OR_PARITY) \
+	| SEES(TX_LAUNCH_FIFO6_UNC_OR_PARITY) \
+	| SEES(TX_LAUNCH_FIFO7_UNC_OR_PARITY) \
+	| SEES(TX_LAUNCH_FIFO8_UNC_OR_PARITY) \
+	| SEES(TX_CREDIT_RETURN_PARITY))
+
+/*
+ * TXE Send error flags
+ */
+#define SES(name) SEND_ERR_STATUS_SEND_##name##_ERR_SMASK
+static struct flag_table send_err_status_flags[] = {
+/* 0*/	FLAG_ENTRY0("SDmaRpyTagErr", SES(CSR_PARITY)),
+/* 1*/	FLAG_ENTRY0("SendCsrReadBadAddrErr", SES(CSR_READ_BAD_ADDR)),
+/* 2*/	FLAG_ENTRY0("SendCsrWriteBadAddrErr", SES(CSR_WRITE_BAD_ADDR))
+};
+
+/*
+ * TXE Send Context Error flags and consequences
+ */
+static struct flag_table sc_err_status_flags[] = {
+/* 0*/	FLAG_ENTRY("InconsistentSop",
+		SEC_PACKET_DROPPED | SEC_SC_HALTED,
+		SEND_CTXT_ERR_STATUS_PIO_INCONSISTENT_SOP_ERR_SMASK),
+/* 1*/	FLAG_ENTRY("DisallowedPacket",
+		SEC_PACKET_DROPPED | SEC_SC_HALTED,
+		SEND_CTXT_ERR_STATUS_PIO_DISALLOWED_PACKET_ERR_SMASK),
+/* 2*/	FLAG_ENTRY("WriteCrossesBoundary",
+		SEC_WRITE_DROPPED | SEC_SC_HALTED,
+		SEND_CTXT_ERR_STATUS_PIO_WRITE_CROSSES_BOUNDARY_ERR_SMASK),
+/* 3*/	FLAG_ENTRY("WriteOverflow",
+		SEC_WRITE_DROPPED | SEC_SC_HALTED,
+		SEND_CTXT_ERR_STATUS_PIO_WRITE_OVERFLOW_ERR_SMASK),
+/* 4*/	FLAG_ENTRY("WriteOutOfBounds",
+		SEC_WRITE_DROPPED | SEC_SC_HALTED,
+		SEND_CTXT_ERR_STATUS_PIO_WRITE_OUT_OF_BOUNDS_ERR_SMASK),
+/* 5-63 reserved*/
+};
+
+/*
+ * RXE Receive Error flags
+ */
+#define RXES(name) RCV_ERR_STATUS_RX_##name##_ERR_SMASK
+static struct flag_table rxe_err_status_flags[] = {
+/* 0*/	FLAG_ENTRY0("RxDmaCsrCorErr", RXES(DMA_CSR_COR)),
+/* 1*/	FLAG_ENTRY0("RxDcIntfParityErr", RXES(DC_INTF_PARITY)),
+/* 2*/	FLAG_ENTRY0("RxRcvHdrUncErr", RXES(RCV_HDR_UNC)),
+/* 3*/	FLAG_ENTRY0("RxRcvHdrCorErr", RXES(RCV_HDR_COR)),
+/* 4*/	FLAG_ENTRY0("RxRcvDataUncErr", RXES(RCV_DATA_UNC)),
+/* 5*/	FLAG_ENTRY0("RxRcvDataCorErr", RXES(RCV_DATA_COR)),
+/* 6*/	FLAG_ENTRY0("RxRcvQpMapTableUncErr", RXES(RCV_QP_MAP_TABLE_UNC)),
+/* 7*/	FLAG_ENTRY0("RxRcvQpMapTableCorErr", RXES(RCV_QP_MAP_TABLE_COR)),
+/* 8*/	FLAG_ENTRY0("RxRcvCsrParityErr", RXES(RCV_CSR_PARITY)),
+/* 9*/	FLAG_ENTRY0("RxDcSopEopParityErr", RXES(DC_SOP_EOP_PARITY)),
+/*10*/	FLAG_ENTRY0("RxDmaFlagUncErr", RXES(DMA_FLAG_UNC)),
+/*11*/	FLAG_ENTRY0("RxDmaFlagCorErr", RXES(DMA_FLAG_COR)),
+/*12*/	FLAG_ENTRY0("RxRcvFsmEncodingErr", RXES(RCV_FSM_ENCODING)),
+/*13*/	FLAG_ENTRY0("RxRbufFreeListUncErr", RXES(RBUF_FREE_LIST_UNC)),
+/*14*/	FLAG_ENTRY0("RxRbufFreeListCorErr", RXES(RBUF_FREE_LIST_COR)),
+/*15*/	FLAG_ENTRY0("RxRbufLookupDesRegUncErr", RXES(RBUF_LOOKUP_DES_REG_UNC)),
+/*16*/	FLAG_ENTRY0("RxRbufLookupDesRegUncCorErr",
+		RXES(RBUF_LOOKUP_DES_REG_UNC_COR)),
+/*17*/	FLAG_ENTRY0("RxRbufLookupDesUncErr", RXES(RBUF_LOOKUP_DES_UNC)),
+/*18*/	FLAG_ENTRY0("RxRbufLookupDesCorErr", RXES(RBUF_LOOKUP_DES_COR)),
+/*19*/	FLAG_ENTRY0("RxRbufBlockListReadUncErr",
+		RXES(RBUF_BLOCK_LIST_READ_UNC)),
+/*20*/	FLAG_ENTRY0("RxRbufBlockListReadCorErr",
+		RXES(RBUF_BLOCK_LIST_READ_COR)),
+/*21*/	FLAG_ENTRY0("RxRbufCsrQHeadBufNumParityErr",
+		RXES(RBUF_CSR_QHEAD_BUF_NUM_PARITY)),
+/*22*/	FLAG_ENTRY0("RxRbufCsrQEntCntParityErr",
+		RXES(RBUF_CSR_QENT_CNT_PARITY)),
+/*23*/	FLAG_ENTRY0("RxRbufCsrQNextBufParityErr",
+		RXES(RBUF_CSR_QNEXT_BUF_PARITY)),
+/*24*/	FLAG_ENTRY0("RxRbufCsrQVldBitParityErr",
+		RXES(RBUF_CSR_QVLD_BIT_PARITY)),
+/*25*/	FLAG_ENTRY0("RxRbufCsrQHdPtrParityErr", RXES(RBUF_CSR_QHD_PTR_PARITY)),
+/*26*/	FLAG_ENTRY0("RxRbufCsrQTlPtrParityErr", RXES(RBUF_CSR_QTL_PTR_PARITY)),
+/*27*/	FLAG_ENTRY0("RxRbufCsrQNumOfPktParityErr",
+		RXES(RBUF_CSR_QNUM_OF_PKT_PARITY)),
+/*28*/	FLAG_ENTRY0("RxRbufCsrQEOPDWParityErr", RXES(RBUF_CSR_QEOPDW_PARITY)),
+/*29*/	FLAG_ENTRY0("RxRbufCtxIdParityErr", RXES(RBUF_CTX_ID_PARITY)),
+/*30*/	FLAG_ENTRY0("RxRBufBadLookupErr", RXES(RBUF_BAD_LOOKUP)),
+/*31*/	FLAG_ENTRY0("RxRbufFullErr", RXES(RBUF_FULL)),
+/*32*/	FLAG_ENTRY0("RxRbufEmptyErr", RXES(RBUF_EMPTY)),
+/*33*/	FLAG_ENTRY0("RxRbufFlRdAddrParityErr", RXES(RBUF_FL_RD_ADDR_PARITY)),
+/*34*/	FLAG_ENTRY0("RxRbufFlWrAddrParityErr", RXES(RBUF_FL_WR_ADDR_PARITY)),
+/*35*/	FLAG_ENTRY0("RxRbufFlInitdoneParityErr",
+		RXES(RBUF_FL_INITDONE_PARITY)),
+/*36*/	FLAG_ENTRY0("RxRbufFlInitWrAddrParityErr",
+		RXES(RBUF_FL_INIT_WR_ADDR_PARITY)),
+/*37*/	FLAG_ENTRY0("RxRbufNextFreeBufUncErr", RXES(RBUF_NEXT_FREE_BUF_UNC)),
+/*38*/	FLAG_ENTRY0("RxRbufNextFreeBufCorErr", RXES(RBUF_NEXT_FREE_BUF_COR)),
+/*39*/	FLAG_ENTRY0("RxLookupDesPart1UncErr", RXES(LOOKUP_DES_PART1_UNC)),
+/*40*/	FLAG_ENTRY0("RxLookupDesPart1UncCorErr",
+		RXES(LOOKUP_DES_PART1_UNC_COR)),
+/*41*/	FLAG_ENTRY0("RxLookupDesPart2ParityErr",
+		RXES(LOOKUP_DES_PART2_PARITY)),
+/*42*/	FLAG_ENTRY0("RxLookupRcvArrayUncErr", RXES(LOOKUP_RCV_ARRAY_UNC)),
+/*43*/	FLAG_ENTRY0("RxLookupRcvArrayCorErr", RXES(LOOKUP_RCV_ARRAY_COR)),
+/*44*/	FLAG_ENTRY0("RxLookupCsrParityErr", RXES(LOOKUP_CSR_PARITY)),
+/*45*/	FLAG_ENTRY0("RxHqIntrCsrParityErr", RXES(HQ_INTR_CSR_PARITY)),
+/*46*/	FLAG_ENTRY0("RxHqIntrFsmErr", RXES(HQ_INTR_FSM)),
+/*47*/	FLAG_ENTRY0("RxRbufDescPart1UncErr", RXES(RBUF_DESC_PART1_UNC)),
+/*48*/	FLAG_ENTRY0("RxRbufDescPart1CorErr", RXES(RBUF_DESC_PART1_COR)),
+/*49*/	FLAG_ENTRY0("RxRbufDescPart2UncErr", RXES(RBUF_DESC_PART2_UNC)),
+/*50*/	FLAG_ENTRY0("RxRbufDescPart2CorErr", RXES(RBUF_DESC_PART2_COR)),
+/*51*/	FLAG_ENTRY0("RxDmaHdrFifoRdUncErr", RXES(DMA_HDR_FIFO_RD_UNC)),
+/*52*/	FLAG_ENTRY0("RxDmaHdrFifoRdCorErr", RXES(DMA_HDR_FIFO_RD_COR)),
+/*53*/	FLAG_ENTRY0("RxDmaDataFifoRdUncErr", RXES(DMA_DATA_FIFO_RD_UNC)),
+/*54*/	FLAG_ENTRY0("RxDmaDataFifoRdCorErr", RXES(DMA_DATA_FIFO_RD_COR)),
+/*55*/	FLAG_ENTRY0("RxRbufDataUncErr", RXES(RBUF_DATA_UNC)),
+/*56*/	FLAG_ENTRY0("RxRbufDataCorErr", RXES(RBUF_DATA_COR)),
+/*57*/	FLAG_ENTRY0("RxDmaCsrParityErr", RXES(DMA_CSR_PARITY)),
+/*58*/	FLAG_ENTRY0("RxDmaEqFsmEncodingErr", RXES(DMA_EQ_FSM_ENCODING)),
+/*59*/	FLAG_ENTRY0("RxDmaDqFsmEncodingErr", RXES(DMA_DQ_FSM_ENCODING)),
+/*60*/	FLAG_ENTRY0("RxDmaCsrUncErr", RXES(DMA_CSR_UNC)),
+/*61*/	FLAG_ENTRY0("RxCsrReadBadAddrErr", RXES(CSR_READ_BAD_ADDR)),
+/*62*/	FLAG_ENTRY0("RxCsrWriteBadAddrErr", RXES(CSR_WRITE_BAD_ADDR)),
+/*63*/	FLAG_ENTRY0("RxCsrParityErr", RXES(CSR_PARITY))
+};
+
+/* RXE errors that will trigger an SPC freeze */
+#define ALL_RXE_FREEZE_ERR  \
+	(RCV_ERR_STATUS_RX_RCV_QP_MAP_TABLE_UNC_ERR_SMASK \
+	| RCV_ERR_STATUS_RX_RCV_CSR_PARITY_ERR_SMASK \
+	| RCV_ERR_STATUS_RX_DMA_FLAG_UNC_ERR_SMASK \
+	| RCV_ERR_STATUS_RX_RCV_FSM_ENCODING_ERR_SMASK \
+	| RCV_ERR_STATUS_RX_RBUF_FREE_LIST_UNC_ERR_SMASK \
+	| RCV_ERR_STATUS_RX_RBUF_LOOKUP_DES_REG_UNC_ERR_SMASK \
+	| RCV_ERR_STATUS_RX_RBUF_LOOKUP_DES_REG_UNC_COR_ERR_SMASK \
+	| RCV_ERR_STATUS_RX_RBUF_LOOKUP_DES_UNC_ERR_SMASK \
+	| RCV_ERR_STATUS_RX_RBUF_BLOCK_LIST_READ_UNC_ERR_SMASK \
+	| RCV_ERR_STATUS_RX_RBUF_CSR_QHEAD_BUF_NUM_PARITY_ERR_SMASK \
+	| RCV_ERR_STATUS_RX_RBUF_CSR_QENT_CNT_PARITY_ERR_SMASK \
+	| RCV_ERR_STATUS_RX_RBUF_CSR_QNEXT_BUF_PARITY_ERR_SMASK \
+	| RCV_ERR_STATUS_RX_RBUF_CSR_QVLD_BIT_PARITY_ERR_SMASK \
+	| RCV_ERR_STATUS_RX_RBUF_CSR_QHD_PTR_PARITY_ERR_SMASK \
+	| RCV_ERR_STATUS_RX_RBUF_CSR_QTL_PTR_PARITY_ERR_SMASK \
+	| RCV_ERR_STATUS_RX_RBUF_CSR_QNUM_OF_PKT_PARITY_ERR_SMASK \
+	| RCV_ERR_STATUS_RX_RBUF_CSR_QEOPDW_PARITY_ERR_SMASK \
+	| RCV_ERR_STATUS_RX_RBUF_CTX_ID_PARITY_ERR_SMASK \
+	| RCV_ERR_STATUS_RX_RBUF_BAD_LOOKUP_ERR_SMASK \
+	| RCV_ERR_STATUS_RX_RBUF_FULL_ERR_SMASK \
+	| RCV_ERR_STATUS_RX_RBUF_EMPTY_ERR_SMASK \
+	| RCV_ERR_STATUS_RX_RBUF_FL_RD_ADDR_PARITY_ERR_SMASK \
+	| RCV_ERR_STATUS_RX_RBUF_FL_WR_ADDR_PARITY_ERR_SMASK \
+	| RCV_ERR_STATUS_RX_RBUF_FL_INITDONE_PARITY_ERR_SMASK \
+	| RCV_ERR_STATUS_RX_RBUF_FL_INIT_WR_ADDR_PARITY_ERR_SMASK \
+	| RCV_ERR_STATUS_RX_RBUF_NEXT_FREE_BUF_UNC_ERR_SMASK \
+	| RCV_ERR_STATUS_RX_LOOKUP_DES_PART1_UNC_ERR_SMASK \
+	| RCV_ERR_STATUS_RX_LOOKUP_DES_PART1_UNC_COR_ERR_SMASK \
+	| RCV_ERR_STATUS_RX_LOOKUP_DES_PART2_PARITY_ERR_SMASK \
+	| RCV_ERR_STATUS_RX_LOOKUP_RCV_ARRAY_UNC_ERR_SMASK \
+	| RCV_ERR_STATUS_RX_LOOKUP_CSR_PARITY_ERR_SMASK \
+	| RCV_ERR_STATUS_RX_HQ_INTR_CSR_PARITY_ERR_SMASK \
+	| RCV_ERR_STATUS_RX_HQ_INTR_FSM_ERR_SMASK \
+	| RCV_ERR_STATUS_RX_RBUF_DESC_PART1_UNC_ERR_SMASK \
+	| RCV_ERR_STATUS_RX_RBUF_DESC_PART1_COR_ERR_SMASK \
+	| RCV_ERR_STATUS_RX_RBUF_DESC_PART2_UNC_ERR_SMASK \
+	| RCV_ERR_STATUS_RX_DMA_HDR_FIFO_RD_UNC_ERR_SMASK \
+	| RCV_ERR_STATUS_RX_DMA_DATA_FIFO_RD_UNC_ERR_SMASK \
+	| RCV_ERR_STATUS_RX_RBUF_DATA_UNC_ERR_SMASK \
+	| RCV_ERR_STATUS_RX_DMA_CSR_PARITY_ERR_SMASK \
+	| RCV_ERR_STATUS_RX_DMA_EQ_FSM_ENCODING_ERR_SMASK \
+	| RCV_ERR_STATUS_RX_DMA_DQ_FSM_ENCODING_ERR_SMASK \
+	| RCV_ERR_STATUS_RX_DMA_CSR_UNC_ERR_SMASK \
+	| RCV_ERR_STATUS_RX_CSR_PARITY_ERR_SMASK)
+
+#define RXE_FREEZE_ABORT_MASK \
+	(RCV_ERR_STATUS_RX_DMA_CSR_UNC_ERR_SMASK | \
+	RCV_ERR_STATUS_RX_DMA_HDR_FIFO_RD_UNC_ERR_SMASK | \
+	RCV_ERR_STATUS_RX_DMA_DATA_FIFO_RD_UNC_ERR_SMASK)
+
+/*
+ * DCC Error Flags
+ */
+#define DCCE(name) DCC_ERR_FLG_##name##_SMASK
+static struct flag_table dcc_err_flags[] = {
+	FLAG_ENTRY0("bad_l2_err", DCCE(BAD_L2_ERR)),
+	FLAG_ENTRY0("bad_sc_err", DCCE(BAD_SC_ERR)),
+	FLAG_ENTRY0("bad_mid_tail_err", DCCE(BAD_MID_TAIL_ERR)),
+	FLAG_ENTRY0("bad_preemption_err", DCCE(BAD_PREEMPTION_ERR)),
+	FLAG_ENTRY0("preemption_err", DCCE(PREEMPTION_ERR)),
+	FLAG_ENTRY0("preemptionvl15_err", DCCE(PREEMPTIONVL15_ERR)),
+	FLAG_ENTRY0("bad_vl_marker_err", DCCE(BAD_VL_MARKER_ERR)),
+	FLAG_ENTRY0("bad_dlid_target_err", DCCE(BAD_DLID_TARGET_ERR)),
+	FLAG_ENTRY0("bad_lver_err", DCCE(BAD_LVER_ERR)),
+	FLAG_ENTRY0("uncorrectable_err", DCCE(UNCORRECTABLE_ERR)),
+	FLAG_ENTRY0("bad_crdt_ack_err", DCCE(BAD_CRDT_ACK_ERR)),
+	FLAG_ENTRY0("unsup_pkt_type", DCCE(UNSUP_PKT_TYPE)),
+	FLAG_ENTRY0("bad_ctrl_flit_err", DCCE(BAD_CTRL_FLIT_ERR)),
+	FLAG_ENTRY0("event_cntr_parity_err", DCCE(EVENT_CNTR_PARITY_ERR)),
+	FLAG_ENTRY0("event_cntr_rollover_err", DCCE(EVENT_CNTR_ROLLOVER_ERR)),
+	FLAG_ENTRY0("link_err", DCCE(LINK_ERR)),
+	FLAG_ENTRY0("misc_cntr_rollover_err", DCCE(MISC_CNTR_ROLLOVER_ERR)),
+	FLAG_ENTRY0("bad_ctrl_dist_err", DCCE(BAD_CTRL_DIST_ERR)),
+	FLAG_ENTRY0("bad_tail_dist_err", DCCE(BAD_TAIL_DIST_ERR)),
+	FLAG_ENTRY0("bad_head_dist_err", DCCE(BAD_HEAD_DIST_ERR)),
+	FLAG_ENTRY0("nonvl15_state_err", DCCE(NONVL15_STATE_ERR)),
+	FLAG_ENTRY0("vl15_multi_err", DCCE(VL15_MULTI_ERR)),
+	FLAG_ENTRY0("bad_pkt_length_err", DCCE(BAD_PKT_LENGTH_ERR)),
+	FLAG_ENTRY0("unsup_vl_err", DCCE(UNSUP_VL_ERR)),
+	FLAG_ENTRY0("perm_nvl15_err", DCCE(PERM_NVL15_ERR)),
+	FLAG_ENTRY0("slid_zero_err", DCCE(SLID_ZERO_ERR)),
+	FLAG_ENTRY0("dlid_zero_err", DCCE(DLID_ZERO_ERR)),
+	FLAG_ENTRY0("length_mtu_err", DCCE(LENGTH_MTU_ERR)),
+	FLAG_ENTRY0("rx_early_drop_err", DCCE(RX_EARLY_DROP_ERR)),
+	FLAG_ENTRY0("late_short_err", DCCE(LATE_SHORT_ERR)),
+	FLAG_ENTRY0("late_long_err", DCCE(LATE_LONG_ERR)),
+	FLAG_ENTRY0("late_ebp_err", DCCE(LATE_EBP_ERR)),
+	FLAG_ENTRY0("fpe_tx_fifo_ovflw_err", DCCE(FPE_TX_FIFO_OVFLW_ERR)),
+	FLAG_ENTRY0("fpe_tx_fifo_unflw_err", DCCE(FPE_TX_FIFO_UNFLW_ERR)),
+	FLAG_ENTRY0("csr_access_blocked_host", DCCE(CSR_ACCESS_BLOCKED_HOST)),
+	FLAG_ENTRY0("csr_access_blocked_uc", DCCE(CSR_ACCESS_BLOCKED_UC)),
+	FLAG_ENTRY0("tx_ctrl_parity_err", DCCE(TX_CTRL_PARITY_ERR)),
+	FLAG_ENTRY0("tx_ctrl_parity_mbe_err", DCCE(TX_CTRL_PARITY_MBE_ERR)),
+	FLAG_ENTRY0("tx_sc_parity_err", DCCE(TX_SC_PARITY_ERR)),
+	FLAG_ENTRY0("rx_ctrl_parity_mbe_err", DCCE(RX_CTRL_PARITY_MBE_ERR)),
+	FLAG_ENTRY0("csr_parity_err", DCCE(CSR_PARITY_ERR)),
+	FLAG_ENTRY0("csr_inval_addr", DCCE(CSR_INVAL_ADDR)),
+	FLAG_ENTRY0("tx_byte_shft_parity_err", DCCE(TX_BYTE_SHFT_PARITY_ERR)),
+	FLAG_ENTRY0("rx_byte_shft_parity_err", DCCE(RX_BYTE_SHFT_PARITY_ERR)),
+	FLAG_ENTRY0("fmconfig_err", DCCE(FMCONFIG_ERR)),
+	FLAG_ENTRY0("rcvport_err", DCCE(RCVPORT_ERR)),
+};
+
+/*
+ * LCB error flags
+ */
+#define LCBE(name) DC_LCB_ERR_FLG_##name##_SMASK
+static struct flag_table lcb_err_flags[] = {
+/* 0*/	FLAG_ENTRY0("CSR_PARITY_ERR", LCBE(CSR_PARITY_ERR)),
+/* 1*/	FLAG_ENTRY0("INVALID_CSR_ADDR", LCBE(INVALID_CSR_ADDR)),
+/* 2*/	FLAG_ENTRY0("RST_FOR_FAILED_DESKEW", LCBE(RST_FOR_FAILED_DESKEW)),
+/* 3*/	FLAG_ENTRY0("ALL_LNS_FAILED_REINIT_TEST",
+		LCBE(ALL_LNS_FAILED_REINIT_TEST)),
+/* 4*/	FLAG_ENTRY0("LOST_REINIT_STALL_OR_TOS", LCBE(LOST_REINIT_STALL_OR_TOS)),
+/* 5*/	FLAG_ENTRY0("TX_LESS_THAN_FOUR_LNS", LCBE(TX_LESS_THAN_FOUR_LNS)),
+/* 6*/	FLAG_ENTRY0("RX_LESS_THAN_FOUR_LNS", LCBE(RX_LESS_THAN_FOUR_LNS)),
+/* 7*/	FLAG_ENTRY0("SEQ_CRC_ERR", LCBE(SEQ_CRC_ERR)),
+/* 8*/	FLAG_ENTRY0("REINIT_FROM_PEER", LCBE(REINIT_FROM_PEER)),
+/* 9*/	FLAG_ENTRY0("REINIT_FOR_LN_DEGRADE", LCBE(REINIT_FOR_LN_DEGRADE)),
+/*10*/	FLAG_ENTRY0("CRC_ERR_CNT_HIT_LIMIT", LCBE(CRC_ERR_CNT_HIT_LIMIT)),
+/*11*/	FLAG_ENTRY0("RCLK_STOPPED", LCBE(RCLK_STOPPED)),
+/*12*/	FLAG_ENTRY0("UNEXPECTED_REPLAY_MARKER", LCBE(UNEXPECTED_REPLAY_MARKER)),
+/*13*/	FLAG_ENTRY0("UNEXPECTED_ROUND_TRIP_MARKER",
+		LCBE(UNEXPECTED_ROUND_TRIP_MARKER)),
+/*14*/	FLAG_ENTRY0("ILLEGAL_NULL_LTP", LCBE(ILLEGAL_NULL_LTP)),
+/*15*/	FLAG_ENTRY0("ILLEGAL_FLIT_ENCODING", LCBE(ILLEGAL_FLIT_ENCODING)),
+/*16*/	FLAG_ENTRY0("FLIT_INPUT_BUF_OFLW", LCBE(FLIT_INPUT_BUF_OFLW)),
+/*17*/	FLAG_ENTRY0("VL_ACK_INPUT_BUF_OFLW", LCBE(VL_ACK_INPUT_BUF_OFLW)),
+/*18*/	FLAG_ENTRY0("VL_ACK_INPUT_PARITY_ERR", LCBE(VL_ACK_INPUT_PARITY_ERR)),
+/*19*/	FLAG_ENTRY0("VL_ACK_INPUT_WRONG_CRC_MODE",
+		LCBE(VL_ACK_INPUT_WRONG_CRC_MODE)),
+/*20*/	FLAG_ENTRY0("FLIT_INPUT_BUF_MBE", LCBE(FLIT_INPUT_BUF_MBE)),
+/*21*/	FLAG_ENTRY0("FLIT_INPUT_BUF_SBE", LCBE(FLIT_INPUT_BUF_SBE)),
+/*22*/	FLAG_ENTRY0("REPLAY_BUF_MBE", LCBE(REPLAY_BUF_MBE)),
+/*23*/	FLAG_ENTRY0("REPLAY_BUF_SBE", LCBE(REPLAY_BUF_SBE)),
+/*24*/	FLAG_ENTRY0("CREDIT_RETURN_FLIT_MBE", LCBE(CREDIT_RETURN_FLIT_MBE)),
+/*25*/	FLAG_ENTRY0("RST_FOR_LINK_TIMEOUT", LCBE(RST_FOR_LINK_TIMEOUT)),
+/*26*/	FLAG_ENTRY0("RST_FOR_INCOMPLT_RND_TRIP",
+		LCBE(RST_FOR_INCOMPLT_RND_TRIP)),
+/*27*/	FLAG_ENTRY0("HOLD_REINIT", LCBE(HOLD_REINIT)),
+/*28*/	FLAG_ENTRY0("NEG_EDGE_LINK_TRANSFER_ACTIVE",
+		LCBE(NEG_EDGE_LINK_TRANSFER_ACTIVE)),
+/*29*/	FLAG_ENTRY0("REDUNDANT_FLIT_PARITY_ERR",
+		LCBE(REDUNDANT_FLIT_PARITY_ERR))
+};
+
+/*
+ * DC8051 Error Flags
+ */
+#define D8E(name) DC_DC8051_ERR_FLG_##name##_SMASK
+static struct flag_table dc8051_err_flags[] = {
+	FLAG_ENTRY0("SET_BY_8051", D8E(SET_BY_8051)),
+	FLAG_ENTRY0("LOST_8051_HEART_BEAT", D8E(LOST_8051_HEART_BEAT)),
+	FLAG_ENTRY0("CRAM_MBE", D8E(CRAM_MBE)),
+	FLAG_ENTRY0("CRAM_SBE", D8E(CRAM_SBE)),
+	FLAG_ENTRY0("DRAM_MBE", D8E(DRAM_MBE)),
+	FLAG_ENTRY0("DRAM_SBE", D8E(DRAM_SBE)),
+	FLAG_ENTRY0("IRAM_MBE", D8E(IRAM_MBE)),
+	FLAG_ENTRY0("IRAM_SBE", D8E(IRAM_SBE)),
+	FLAG_ENTRY0("UNMATCHED_SECURE_MSG_ACROSS_BCC_LANES",
+		D8E(UNMATCHED_SECURE_MSG_ACROSS_BCC_LANES)),
+	FLAG_ENTRY0("INVALID_CSR_ADDR", D8E(INVALID_CSR_ADDR)),
+};
+
+/*
+ * DC8051 Information Error flags
+ *
+ * Flags in DC8051_DBG_ERR_INFO_SET_BY_8051.ERROR field.
+ */
+static struct flag_table dc8051_info_err_flags[] = {
+	FLAG_ENTRY0("Spico ROM check failed",  SPICO_ROM_FAILED),
+	FLAG_ENTRY0("Unknown frame received",  UNKNOWN_FRAME),
+	FLAG_ENTRY0("Target BER not met",      TARGET_BER_NOT_MET),
+	FLAG_ENTRY0("Serdes internal loopback failure",
+					FAILED_SERDES_INTERNAL_LOOPBACK),
+	FLAG_ENTRY0("Failed SerDes init",      FAILED_SERDES_INIT),
+	FLAG_ENTRY0("Failed LNI(Polling)",     FAILED_LNI_POLLING),
+	FLAG_ENTRY0("Failed LNI(Debounce)",    FAILED_LNI_DEBOUNCE),
+	FLAG_ENTRY0("Failed LNI(EstbComm)",    FAILED_LNI_ESTBCOMM),
+	FLAG_ENTRY0("Failed LNI(OptEq)",       FAILED_LNI_OPTEQ),
+	FLAG_ENTRY0("Failed LNI(VerifyCap_1)", FAILED_LNI_VERIFY_CAP1),
+	FLAG_ENTRY0("Failed LNI(VerifyCap_2)", FAILED_LNI_VERIFY_CAP2),
+	FLAG_ENTRY0("Failed LNI(ConfigLT)",    FAILED_LNI_CONFIGLT)
+};
+
+/*
+ * DC8051 Information Host Information flags
+ *
+ * Flags in DC8051_DBG_ERR_INFO_SET_BY_8051.HOST_MSG field.
+ */
+static struct flag_table dc8051_info_host_msg_flags[] = {
+	FLAG_ENTRY0("Host request done", 0x0001),
+	FLAG_ENTRY0("BC SMA message", 0x0002),
+	FLAG_ENTRY0("BC PWR_MGM message", 0x0004),
+	FLAG_ENTRY0("BC Unknown message (BCC)", 0x0008),
+	FLAG_ENTRY0("BC Unknown message (LCB)", 0x0010),
+	FLAG_ENTRY0("External device config request", 0x0020),
+	FLAG_ENTRY0("VerifyCap all frames received", 0x0040),
+	FLAG_ENTRY0("LinkUp achieved", 0x0080),
+	FLAG_ENTRY0("Link going down", 0x0100),
+};
+
+
+static u32 encoded_size(u32 size);
+static u32 chip_to_opa_lstate(struct hfi1_devdata *dd, u32 chip_lstate);
+static int set_physical_link_state(struct hfi1_devdata *dd, u64 state);
+static void read_vc_remote_phy(struct hfi1_devdata *dd, u8 *power_management,
+			       u8 *continuous);
+static void read_vc_remote_fabric(struct hfi1_devdata *dd, u8 *vau, u8 *z,
+				  u8 *vcu, u16 *vl15buf, u8 *crc_sizes);
+static void read_vc_remote_link_width(struct hfi1_devdata *dd,
+				      u8 *remote_tx_rate, u16 *link_widths);
+static void read_vc_local_link_width(struct hfi1_devdata *dd, u8 *misc_bits,
+				     u8 *flag_bits, u16 *link_widths);
+static void read_remote_device_id(struct hfi1_devdata *dd, u16 *device_id,
+				  u8 *device_rev);
+static void read_mgmt_allowed(struct hfi1_devdata *dd, u8 *mgmt_allowed);
+static void read_local_lni(struct hfi1_devdata *dd, u8 *enable_lane_rx);
+static int read_tx_settings(struct hfi1_devdata *dd, u8 *enable_lane_tx,
+			    u8 *tx_polarity_inversion,
+			    u8 *rx_polarity_inversion, u8 *max_rate);
+static void handle_sdma_eng_err(struct hfi1_devdata *dd,
+				unsigned int context, u64 err_status);
+static void handle_qsfp_int(struct hfi1_devdata *dd, u32 source, u64 reg);
+static void handle_dcc_err(struct hfi1_devdata *dd,
+			   unsigned int context, u64 err_status);
+static void handle_lcb_err(struct hfi1_devdata *dd,
+			   unsigned int context, u64 err_status);
+static void handle_8051_interrupt(struct hfi1_devdata *dd, u32 unused, u64 reg);
+static void handle_cce_err(struct hfi1_devdata *dd, u32 unused, u64 reg);
+static void handle_rxe_err(struct hfi1_devdata *dd, u32 unused, u64 reg);
+static void handle_misc_err(struct hfi1_devdata *dd, u32 unused, u64 reg);
+static void handle_pio_err(struct hfi1_devdata *dd, u32 unused, u64 reg);
+static void handle_sdma_err(struct hfi1_devdata *dd, u32 unused, u64 reg);
+static void handle_egress_err(struct hfi1_devdata *dd, u32 unused, u64 reg);
+static void handle_txe_err(struct hfi1_devdata *dd, u32 unused, u64 reg);
+static void set_partition_keys(struct hfi1_pportdata *);
+static const char *link_state_name(u32 state);
+static const char *link_state_reason_name(struct hfi1_pportdata *ppd,
+					  u32 state);
+static int do_8051_command(struct hfi1_devdata *dd, u32 type, u64 in_data,
+			   u64 *out_data);
+static int read_idle_sma(struct hfi1_devdata *dd, u64 *data);
+static int thermal_init(struct hfi1_devdata *dd);
+
+static int wait_logical_linkstate(struct hfi1_pportdata *ppd, u32 state,
+				  int msecs);
+static void read_planned_down_reason_code(struct hfi1_devdata *dd, u8 *pdrrc);
+static void handle_temp_err(struct hfi1_devdata *);
+static void dc_shutdown(struct hfi1_devdata *);
+static void dc_start(struct hfi1_devdata *);
+
+/*
+ * Error interrupt table entry.  This is used as input to the interrupt
+ * "clear down" routine used for all second tier error interrupt register.
+ * Second tier interrupt registers have a single bit representing them
+ * in the top-level CceIntStatus.
+ */
+struct err_reg_info {
+	u32 status;		/* status CSR offset */
+	u32 clear;		/* clear CSR offset */
+	u32 mask;		/* mask CSR offset */
+	void (*handler)(struct hfi1_devdata *dd, u32 source, u64 reg);
+	const char *desc;
+};
+
+#define NUM_MISC_ERRS (IS_GENERAL_ERR_END - IS_GENERAL_ERR_START)
+#define NUM_DC_ERRS (IS_DC_END - IS_DC_START)
+#define NUM_VARIOUS (IS_VARIOUS_END - IS_VARIOUS_START)
+
+/*
+ * Helpers for building HFI and DC error interrupt table entries.  Different
+ * helpers are needed because of inconsistent register names.
+ */
+#define EE(reg, handler, desc) \
+	{ reg##_STATUS, reg##_CLEAR, reg##_MASK, \
+		handler, desc }
+#define DC_EE1(reg, handler, desc) \
+	{ reg##_FLG, reg##_FLG_CLR, reg##_FLG_EN, handler, desc }
+#define DC_EE2(reg, handler, desc) \
+	{ reg##_FLG, reg##_CLR, reg##_EN, handler, desc }
+
+/*
+ * Table of the "misc" grouping of error interrupts.  Each entry refers to
+ * another register containing more information.
+ */
+static const struct err_reg_info misc_errs[NUM_MISC_ERRS] = {
+/* 0*/	EE(CCE_ERR,		handle_cce_err,    "CceErr"),
+/* 1*/	EE(RCV_ERR,		handle_rxe_err,    "RxeErr"),
+/* 2*/	EE(MISC_ERR,	handle_misc_err,   "MiscErr"),
+/* 3*/	{ 0, 0, 0, NULL }, /* reserved */
+/* 4*/	EE(SEND_PIO_ERR,    handle_pio_err,    "PioErr"),
+/* 5*/	EE(SEND_DMA_ERR,    handle_sdma_err,   "SDmaErr"),
+/* 6*/	EE(SEND_EGRESS_ERR, handle_egress_err, "EgressErr"),
+/* 7*/	EE(SEND_ERR,	handle_txe_err,    "TxeErr")
+	/* the rest are reserved */
+};
+
+/*
+ * Index into the Various section of the interrupt sources
+ * corresponding to the Critical Temperature interrupt.
+ */
+#define TCRIT_INT_SOURCE 4
+
+/*
+ * SDMA error interrupt entry - refers to another register containing more
+ * information.
+ */
+static const struct err_reg_info sdma_eng_err =
+	EE(SEND_DMA_ENG_ERR, handle_sdma_eng_err, "SDmaEngErr");
+
+static const struct err_reg_info various_err[NUM_VARIOUS] = {
+/* 0*/	{ 0, 0, 0, NULL }, /* PbcInt */
+/* 1*/	{ 0, 0, 0, NULL }, /* GpioAssertInt */
+/* 2*/	EE(ASIC_QSFP1,	handle_qsfp_int,	"QSFP1"),
+/* 3*/	EE(ASIC_QSFP2,	handle_qsfp_int,	"QSFP2"),
+/* 4*/	{ 0, 0, 0, NULL }, /* TCritInt */
+	/* rest are reserved */
+};
+
+/*
+ * The DC encoding of mtu_cap for 10K MTU in the DCC_CFG_PORT_CONFIG
+ * register can not be derived from the MTU value because 10K is not
+ * a power of 2. Therefore, we need a constant. Everything else can
+ * be calculated.
+ */
+#define DCC_CFG_PORT_MTU_CAP_10240 7
+
+/*
+ * Table of the DC grouping of error interrupts.  Each entry refers to
+ * another register containing more information.
+ */
+static const struct err_reg_info dc_errs[NUM_DC_ERRS] = {
+/* 0*/	DC_EE1(DCC_ERR,		handle_dcc_err,	       "DCC Err"),
+/* 1*/	DC_EE2(DC_LCB_ERR,	handle_lcb_err,	       "LCB Err"),
+/* 2*/	DC_EE2(DC_DC8051_ERR,	handle_8051_interrupt, "DC8051 Interrupt"),
+/* 3*/	/* dc_lbm_int - special, see is_dc_int() */
+	/* the rest are reserved */
+};
+
+struct cntr_entry {
+	/*
+	 * counter name
+	 */
+	char *name;
+
+	/*
+	 * csr to read for name (if applicable)
+	 */
+	u64 csr;
+
+	/*
+	 * offset into dd or ppd to store the counter's value
+	 */
+	int offset;
+
+	/*
+	 * flags
+	 */
+	u8 flags;
+
+	/*
+	 * accessor for stat element, context either dd or ppd
+	 */
+	u64 (*rw_cntr)(const struct cntr_entry *,
+			       void *context,
+			       int vl,
+			       int mode,
+			       u64 data);
+};
+
+#define C_RCV_HDR_OVF_FIRST C_RCV_HDR_OVF_0
+#define C_RCV_HDR_OVF_LAST C_RCV_HDR_OVF_159
+
+#define CNTR_ELEM(name, csr, offset, flags, accessor) \
+{ \
+	name, \
+	csr, \
+	offset, \
+	flags, \
+	accessor \
+}
+
+/* 32bit RXE */
+#define RXE32_PORT_CNTR_ELEM(name, counter, flags) \
+CNTR_ELEM(#name, \
+	  (counter * 8 + RCV_COUNTER_ARRAY32), \
+	  0, flags | CNTR_32BIT, \
+	  port_access_u32_csr)
+
+#define RXE32_DEV_CNTR_ELEM(name, counter, flags) \
+CNTR_ELEM(#name, \
+	  (counter * 8 + RCV_COUNTER_ARRAY32), \
+	  0, flags | CNTR_32BIT, \
+	  dev_access_u32_csr)
+
+/* 64bit RXE */
+#define RXE64_PORT_CNTR_ELEM(name, counter, flags) \
+CNTR_ELEM(#name, \
+	  (counter * 8 + RCV_COUNTER_ARRAY64), \
+	  0, flags, \
+	  port_access_u64_csr)
+
+#define RXE64_DEV_CNTR_ELEM(name, counter, flags) \
+CNTR_ELEM(#name, \
+	  (counter * 8 + RCV_COUNTER_ARRAY64), \
+	  0, flags, \
+	  dev_access_u64_csr)
+
+#define OVR_LBL(ctx) C_RCV_HDR_OVF_ ## ctx
+#define OVR_ELM(ctx) \
+CNTR_ELEM("RcvHdrOvr" #ctx, \
+	  (RCV_HDR_OVFL_CNT + ctx*0x100), \
+	  0, CNTR_NORMAL, port_access_u64_csr)
+
+/* 32bit TXE */
+#define TXE32_PORT_CNTR_ELEM(name, counter, flags) \
+CNTR_ELEM(#name, \
+	  (counter * 8 + SEND_COUNTER_ARRAY32), \
+	  0, flags | CNTR_32BIT, \
+	  port_access_u32_csr)
+
+/* 64bit TXE */
+#define TXE64_PORT_CNTR_ELEM(name, counter, flags) \
+CNTR_ELEM(#name, \
+	  (counter * 8 + SEND_COUNTER_ARRAY64), \
+	  0, flags, \
+	  port_access_u64_csr)
+
+# define TX64_DEV_CNTR_ELEM(name, counter, flags) \
+CNTR_ELEM(#name,\
+	  counter * 8 + SEND_COUNTER_ARRAY64, \
+	  0, \
+	  flags, \
+	  dev_access_u64_csr)
+
+/* CCE */
+#define CCE_PERF_DEV_CNTR_ELEM(name, counter, flags) \
+CNTR_ELEM(#name, \
+	  (counter * 8 + CCE_COUNTER_ARRAY32), \
+	  0, flags | CNTR_32BIT, \
+	  dev_access_u32_csr)
+
+#define CCE_INT_DEV_CNTR_ELEM(name, counter, flags) \
+CNTR_ELEM(#name, \
+	  (counter * 8 + CCE_INT_COUNTER_ARRAY32), \
+	  0, flags | CNTR_32BIT, \
+	  dev_access_u32_csr)
+
+/* DC */
+#define DC_PERF_CNTR(name, counter, flags) \
+CNTR_ELEM(#name, \
+	  counter, \
+	  0, \
+	  flags, \
+	  dev_access_u64_csr)
+
+#define DC_PERF_CNTR_LCB(name, counter, flags) \
+CNTR_ELEM(#name, \
+	  counter, \
+	  0, \
+	  flags, \
+	  dc_access_lcb_cntr)
+
+/* ibp counters */
+#define SW_IBP_CNTR(name, cntr) \
+CNTR_ELEM(#name, \
+	  0, \
+	  0, \
+	  CNTR_SYNTH, \
+	  access_ibp_##cntr)
+
+u64 read_csr(const struct hfi1_devdata *dd, u32 offset)
+{
+	u64 val;
+
+	if (dd->flags & HFI1_PRESENT) {
+		val = readq((void __iomem *)dd->kregbase + offset);
+		return val;
+	}
+	return -1;
+}
+
+void write_csr(const struct hfi1_devdata *dd, u32 offset, u64 value)
+{
+	if (dd->flags & HFI1_PRESENT)
+		writeq(value, (void __iomem *)dd->kregbase + offset);
+}
+
+void __iomem *get_csr_addr(
+	struct hfi1_devdata *dd,
+	u32 offset)
+{
+	return (void __iomem *)dd->kregbase + offset;
+}
+
+static inline u64 read_write_csr(const struct hfi1_devdata *dd, u32 csr,
+				 int mode, u64 value)
+{
+	u64 ret;
+
+
+	if (mode == CNTR_MODE_R) {
+		ret = read_csr(dd, csr);
+	} else if (mode == CNTR_MODE_W) {
+		write_csr(dd, csr, value);
+		ret = value;
+	} else {
+		dd_dev_err(dd, "Invalid cntr register access mode");
+		return 0;
+	}
+
+	hfi1_cdbg(CNTR, "csr 0x%x val 0x%llx mode %d", csr, ret, mode);
+	return ret;
+}
+
+/* Dev Access */
+static u64 dev_access_u32_csr(const struct cntr_entry *entry,
+			    void *context, int vl, int mode, u64 data)
+{
+	struct hfi1_devdata *dd = (struct hfi1_devdata *)context;
+
+	if (vl != CNTR_INVALID_VL)
+		return 0;
+	return read_write_csr(dd, entry->csr, mode, data);
+}
+
+static u64 dev_access_u64_csr(const struct cntr_entry *entry, void *context,
+			    int vl, int mode, u64 data)
+{
+	struct hfi1_devdata *dd = (struct hfi1_devdata *)context;
+
+	u64 val = 0;
+	u64 csr = entry->csr;
+
+	if (entry->flags & CNTR_VL) {
+		if (vl == CNTR_INVALID_VL)
+			return 0;
+		csr += 8 * vl;
+	} else {
+		if (vl != CNTR_INVALID_VL)
+			return 0;
+	}
+
+	val = read_write_csr(dd, csr, mode, data);
+	return val;
+}
+
+static u64 dc_access_lcb_cntr(const struct cntr_entry *entry, void *context,
+			    int vl, int mode, u64 data)
+{
+	struct hfi1_devdata *dd = (struct hfi1_devdata *)context;
+	u32 csr = entry->csr;
+	int ret = 0;
+
+	if (vl != CNTR_INVALID_VL)
+		return 0;
+	if (mode == CNTR_MODE_R)
+		ret = read_lcb_csr(dd, csr, &data);
+	else if (mode == CNTR_MODE_W)
+		ret = write_lcb_csr(dd, csr, data);
+
+	if (ret) {
+		dd_dev_err(dd, "Could not acquire LCB for counter 0x%x", csr);
+		return 0;
+	}
+
+	hfi1_cdbg(CNTR, "csr 0x%x val 0x%llx mode %d", csr, data, mode);
+	return data;
+}
+
+/* Port Access */
+static u64 port_access_u32_csr(const struct cntr_entry *entry, void *context,
+			     int vl, int mode, u64 data)
+{
+	struct hfi1_pportdata *ppd = (struct hfi1_pportdata *)context;
+
+	if (vl != CNTR_INVALID_VL)
+		return 0;
+	return read_write_csr(ppd->dd, entry->csr, mode, data);
+}
+
+static u64 port_access_u64_csr(const struct cntr_entry *entry,
+			     void *context, int vl, int mode, u64 data)
+{
+	struct hfi1_pportdata *ppd = (struct hfi1_pportdata *)context;
+	u64 val;
+	u64 csr = entry->csr;
+
+	if (entry->flags & CNTR_VL) {
+		if (vl == CNTR_INVALID_VL)
+			return 0;
+		csr += 8 * vl;
+	} else {
+		if (vl != CNTR_INVALID_VL)
+			return 0;
+	}
+	val = read_write_csr(ppd->dd, csr, mode, data);
+	return val;
+}
+
+/* Software defined */
+static inline u64 read_write_sw(struct hfi1_devdata *dd, u64 *cntr, int mode,
+				u64 data)
+{
+	u64 ret;
+
+	if (mode == CNTR_MODE_R) {
+		ret = *cntr;
+	} else if (mode == CNTR_MODE_W) {
+		*cntr = data;
+		ret = data;
+	} else {
+		dd_dev_err(dd, "Invalid cntr sw access mode");
+		return 0;
+	}
+
+	hfi1_cdbg(CNTR, "val 0x%llx mode %d", ret, mode);
+
+	return ret;
+}
+
+static u64 access_sw_link_dn_cnt(const struct cntr_entry *entry, void *context,
+			       int vl, int mode, u64 data)
+{
+	struct hfi1_pportdata *ppd = (struct hfi1_pportdata *)context;
+
+	if (vl != CNTR_INVALID_VL)
+		return 0;
+	return read_write_sw(ppd->dd, &ppd->link_downed, mode, data);
+}
+
+static u64 access_sw_link_up_cnt(const struct cntr_entry *entry, void *context,
+			       int vl, int mode, u64 data)
+{
+	struct hfi1_pportdata *ppd = (struct hfi1_pportdata *)context;
+
+	if (vl != CNTR_INVALID_VL)
+		return 0;
+	return read_write_sw(ppd->dd, &ppd->link_up, mode, data);
+}
+
+static u64 access_sw_xmit_discards(const struct cntr_entry *entry,
+				    void *context, int vl, int mode, u64 data)
+{
+	struct hfi1_pportdata *ppd = (struct hfi1_pportdata *)context;
+
+	if (vl != CNTR_INVALID_VL)
+		return 0;
+
+	return read_write_sw(ppd->dd, &ppd->port_xmit_discards, mode, data);
+}
+
+static u64 access_xmit_constraint_errs(const struct cntr_entry *entry,
+				     void *context, int vl, int mode, u64 data)
+{
+	struct hfi1_pportdata *ppd = (struct hfi1_pportdata *)context;
+
+	if (vl != CNTR_INVALID_VL)
+		return 0;
+
+	return read_write_sw(ppd->dd, &ppd->port_xmit_constraint_errors,
+			     mode, data);
+}
+
+static u64 access_rcv_constraint_errs(const struct cntr_entry *entry,
+				     void *context, int vl, int mode, u64 data)
+{
+	struct hfi1_pportdata *ppd = (struct hfi1_pportdata *)context;
+
+	if (vl != CNTR_INVALID_VL)
+		return 0;
+
+	return read_write_sw(ppd->dd, &ppd->port_rcv_constraint_errors,
+			     mode, data);
+}
+
+u64 get_all_cpu_total(u64 __percpu *cntr)
+{
+	int cpu;
+	u64 counter = 0;
+
+	for_each_possible_cpu(cpu)
+		counter += *per_cpu_ptr(cntr, cpu);
+	return counter;
+}
+
+static u64 read_write_cpu(struct hfi1_devdata *dd, u64 *z_val,
+			  u64 __percpu *cntr,
+			  int vl, int mode, u64 data)
+{
+
+	u64 ret = 0;
+
+	if (vl != CNTR_INVALID_VL)
+		return 0;
+
+	if (mode == CNTR_MODE_R) {
+		ret = get_all_cpu_total(cntr) - *z_val;
+	} else if (mode == CNTR_MODE_W) {
+		/* A write can only zero the counter */
+		if (data == 0)
+			*z_val = get_all_cpu_total(cntr);
+		else
+			dd_dev_err(dd, "Per CPU cntrs can only be zeroed");
+	} else {
+		dd_dev_err(dd, "Invalid cntr sw cpu access mode");
+		return 0;
+	}
+
+	return ret;
+}
+
+static u64 access_sw_cpu_intr(const struct cntr_entry *entry,
+			      void *context, int vl, int mode, u64 data)
+{
+	struct hfi1_devdata *dd = (struct hfi1_devdata *)context;
+
+	return read_write_cpu(dd, &dd->z_int_counter, dd->int_counter, vl,
+			      mode, data);
+}
+
+static u64 access_sw_cpu_rcv_limit(const struct cntr_entry *entry,
+			      void *context, int vl, int mode, u64 data)
+{
+	struct hfi1_devdata *dd = (struct hfi1_devdata *)context;
+
+	return read_write_cpu(dd, &dd->z_rcv_limit, dd->rcv_limit, vl,
+			      mode, data);
+}
+
+static u64 access_sw_pio_wait(const struct cntr_entry *entry,
+			      void *context, int vl, int mode, u64 data)
+{
+	struct hfi1_devdata *dd = (struct hfi1_devdata *)context;
+
+	return dd->verbs_dev.n_piowait;
+}
+
+static u64 access_sw_vtx_wait(const struct cntr_entry *entry,
+			      void *context, int vl, int mode, u64 data)
+{
+	struct hfi1_devdata *dd = (struct hfi1_devdata *)context;
+
+	return dd->verbs_dev.n_txwait;
+}
+
+static u64 access_sw_kmem_wait(const struct cntr_entry *entry,
+			       void *context, int vl, int mode, u64 data)
+{
+	struct hfi1_devdata *dd = (struct hfi1_devdata *)context;
+
+	return dd->verbs_dev.n_kmem_wait;
+}
+
+#define def_access_sw_cpu(cntr) \
+static u64 access_sw_cpu_##cntr(const struct cntr_entry *entry,		      \
+			      void *context, int vl, int mode, u64 data)      \
+{									      \
+	struct hfi1_pportdata *ppd = (struct hfi1_pportdata *)context;	      \
+	return read_write_cpu(ppd->dd, &ppd->ibport_data.z_ ##cntr,	      \
+			      ppd->ibport_data.cntr, vl,		      \
+			      mode, data);				      \
+}
+
+def_access_sw_cpu(rc_acks);
+def_access_sw_cpu(rc_qacks);
+def_access_sw_cpu(rc_delayed_comp);
+
+#define def_access_ibp_counter(cntr) \
+static u64 access_ibp_##cntr(const struct cntr_entry *entry,		      \
+				void *context, int vl, int mode, u64 data)    \
+{									      \
+	struct hfi1_pportdata *ppd = (struct hfi1_pportdata *)context;	      \
+									      \
+	if (vl != CNTR_INVALID_VL)					      \
+		return 0;						      \
+									      \
+	return read_write_sw(ppd->dd, &ppd->ibport_data.n_ ##cntr,	      \
+			     mode, data);				      \
+}
+
+def_access_ibp_counter(loop_pkts);
+def_access_ibp_counter(rc_resends);
+def_access_ibp_counter(rnr_naks);
+def_access_ibp_counter(other_naks);
+def_access_ibp_counter(rc_timeouts);
+def_access_ibp_counter(pkt_drops);
+def_access_ibp_counter(dmawait);
+def_access_ibp_counter(rc_seqnak);
+def_access_ibp_counter(rc_dupreq);
+def_access_ibp_counter(rdma_seq);
+def_access_ibp_counter(unaligned);
+def_access_ibp_counter(seq_naks);
+
+static struct cntr_entry dev_cntrs[DEV_CNTR_LAST] = {
+[C_RCV_OVF] = RXE32_DEV_CNTR_ELEM(RcvOverflow, RCV_BUF_OVFL_CNT, CNTR_SYNTH),
+[C_RX_TID_FULL] = RXE32_DEV_CNTR_ELEM(RxTIDFullEr, RCV_TID_FULL_ERR_CNT,
+			CNTR_NORMAL),
+[C_RX_TID_INVALID] = RXE32_DEV_CNTR_ELEM(RxTIDInvalid, RCV_TID_VALID_ERR_CNT,
+			CNTR_NORMAL),
+[C_RX_TID_FLGMS] = RXE32_DEV_CNTR_ELEM(RxTidFLGMs,
+			RCV_TID_FLOW_GEN_MISMATCH_CNT,
+			CNTR_NORMAL),
+[C_RX_CTX_RHQS] = RXE32_DEV_CNTR_ELEM(RxCtxRHQS, RCV_CONTEXT_RHQ_STALL,
+			CNTR_NORMAL),
+[C_RX_CTX_EGRS] = RXE32_DEV_CNTR_ELEM(RxCtxEgrS, RCV_CONTEXT_EGR_STALL,
+			CNTR_NORMAL),
+[C_RCV_TID_FLSMS] = RXE32_DEV_CNTR_ELEM(RxTidFLSMs,
+			RCV_TID_FLOW_SEQ_MISMATCH_CNT, CNTR_NORMAL),
+[C_CCE_PCI_CR_ST] = CCE_PERF_DEV_CNTR_ELEM(CcePciCrSt,
+			CCE_PCIE_POSTED_CRDT_STALL_CNT, CNTR_NORMAL),
+[C_CCE_PCI_TR_ST] = CCE_PERF_DEV_CNTR_ELEM(CcePciTrSt, CCE_PCIE_TRGT_STALL_CNT,
+			CNTR_NORMAL),
+[C_CCE_PIO_WR_ST] = CCE_PERF_DEV_CNTR_ELEM(CcePioWrSt, CCE_PIO_WR_STALL_CNT,
+			CNTR_NORMAL),
+[C_CCE_ERR_INT] = CCE_INT_DEV_CNTR_ELEM(CceErrInt, CCE_ERR_INT_CNT,
+			CNTR_NORMAL),
+[C_CCE_SDMA_INT] = CCE_INT_DEV_CNTR_ELEM(CceSdmaInt, CCE_SDMA_INT_CNT,
+			CNTR_NORMAL),
+[C_CCE_MISC_INT] = CCE_INT_DEV_CNTR_ELEM(CceMiscInt, CCE_MISC_INT_CNT,
+			CNTR_NORMAL),
+[C_CCE_RCV_AV_INT] = CCE_INT_DEV_CNTR_ELEM(CceRcvAvInt, CCE_RCV_AVAIL_INT_CNT,
+			CNTR_NORMAL),
+[C_CCE_RCV_URG_INT] = CCE_INT_DEV_CNTR_ELEM(CceRcvUrgInt,
+			CCE_RCV_URGENT_INT_CNT,	CNTR_NORMAL),
+[C_CCE_SEND_CR_INT] = CCE_INT_DEV_CNTR_ELEM(CceSndCrInt,
+			CCE_SEND_CREDIT_INT_CNT, CNTR_NORMAL),
+[C_DC_UNC_ERR] = DC_PERF_CNTR(DcUnctblErr, DCC_ERR_UNCORRECTABLE_CNT,
+			      CNTR_SYNTH),
+[C_DC_RCV_ERR] = DC_PERF_CNTR(DcRecvErr, DCC_ERR_PORTRCV_ERR_CNT, CNTR_SYNTH),
+[C_DC_FM_CFG_ERR] = DC_PERF_CNTR(DcFmCfgErr, DCC_ERR_FMCONFIG_ERR_CNT,
+				 CNTR_SYNTH),
+[C_DC_RMT_PHY_ERR] = DC_PERF_CNTR(DcRmtPhyErr, DCC_ERR_RCVREMOTE_PHY_ERR_CNT,
+				  CNTR_SYNTH),
+[C_DC_DROPPED_PKT] = DC_PERF_CNTR(DcDroppedPkt, DCC_ERR_DROPPED_PKT_CNT,
+				  CNTR_SYNTH),
+[C_DC_MC_XMIT_PKTS] = DC_PERF_CNTR(DcMcXmitPkts,
+				   DCC_PRF_PORT_XMIT_MULTICAST_CNT, CNTR_SYNTH),
+[C_DC_MC_RCV_PKTS] = DC_PERF_CNTR(DcMcRcvPkts,
+				  DCC_PRF_PORT_RCV_MULTICAST_PKT_CNT,
+				  CNTR_SYNTH),
+[C_DC_XMIT_CERR] = DC_PERF_CNTR(DcXmitCorr,
+				DCC_PRF_PORT_XMIT_CORRECTABLE_CNT, CNTR_SYNTH),
+[C_DC_RCV_CERR] = DC_PERF_CNTR(DcRcvCorrCnt, DCC_PRF_PORT_RCV_CORRECTABLE_CNT,
+			       CNTR_SYNTH),
+[C_DC_RCV_FCC] = DC_PERF_CNTR(DcRxFCntl, DCC_PRF_RX_FLOW_CRTL_CNT,
+			      CNTR_SYNTH),
+[C_DC_XMIT_FCC] = DC_PERF_CNTR(DcXmitFCntl, DCC_PRF_TX_FLOW_CRTL_CNT,
+			       CNTR_SYNTH),
+[C_DC_XMIT_FLITS] = DC_PERF_CNTR(DcXmitFlits, DCC_PRF_PORT_XMIT_DATA_CNT,
+				 CNTR_SYNTH),
+[C_DC_RCV_FLITS] = DC_PERF_CNTR(DcRcvFlits, DCC_PRF_PORT_RCV_DATA_CNT,
+				CNTR_SYNTH),
+[C_DC_XMIT_PKTS] = DC_PERF_CNTR(DcXmitPkts, DCC_PRF_PORT_XMIT_PKTS_CNT,
+				CNTR_SYNTH),
+[C_DC_RCV_PKTS] = DC_PERF_CNTR(DcRcvPkts, DCC_PRF_PORT_RCV_PKTS_CNT,
+			       CNTR_SYNTH),
+[C_DC_RX_FLIT_VL] = DC_PERF_CNTR(DcRxFlitVl, DCC_PRF_PORT_VL_RCV_DATA_CNT,
+				 CNTR_SYNTH | CNTR_VL),
+[C_DC_RX_PKT_VL] = DC_PERF_CNTR(DcRxPktVl, DCC_PRF_PORT_VL_RCV_PKTS_CNT,
+				CNTR_SYNTH | CNTR_VL),
+[C_DC_RCV_FCN] = DC_PERF_CNTR(DcRcvFcn, DCC_PRF_PORT_RCV_FECN_CNT, CNTR_SYNTH),
+[C_DC_RCV_FCN_VL] = DC_PERF_CNTR(DcRcvFcnVl, DCC_PRF_PORT_VL_RCV_FECN_CNT,
+				 CNTR_SYNTH | CNTR_VL),
+[C_DC_RCV_BCN] = DC_PERF_CNTR(DcRcvBcn, DCC_PRF_PORT_RCV_BECN_CNT, CNTR_SYNTH),
+[C_DC_RCV_BCN_VL] = DC_PERF_CNTR(DcRcvBcnVl, DCC_PRF_PORT_VL_RCV_BECN_CNT,
+				 CNTR_SYNTH | CNTR_VL),
+[C_DC_RCV_BBL] = DC_PERF_CNTR(DcRcvBbl, DCC_PRF_PORT_RCV_BUBBLE_CNT,
+			      CNTR_SYNTH),
+[C_DC_RCV_BBL_VL] = DC_PERF_CNTR(DcRcvBblVl, DCC_PRF_PORT_VL_RCV_BUBBLE_CNT,
+				 CNTR_SYNTH | CNTR_VL),
+[C_DC_MARK_FECN] = DC_PERF_CNTR(DcMarkFcn, DCC_PRF_PORT_MARK_FECN_CNT,
+				CNTR_SYNTH),
+[C_DC_MARK_FECN_VL] = DC_PERF_CNTR(DcMarkFcnVl, DCC_PRF_PORT_VL_MARK_FECN_CNT,
+				   CNTR_SYNTH | CNTR_VL),
+[C_DC_TOTAL_CRC] =
+	DC_PERF_CNTR_LCB(DcTotCrc, DC_LCB_ERR_INFO_TOTAL_CRC_ERR,
+			 CNTR_SYNTH),
+[C_DC_CRC_LN0] = DC_PERF_CNTR_LCB(DcCrcLn0, DC_LCB_ERR_INFO_CRC_ERR_LN0,
+				  CNTR_SYNTH),
+[C_DC_CRC_LN1] = DC_PERF_CNTR_LCB(DcCrcLn1, DC_LCB_ERR_INFO_CRC_ERR_LN1,
+				  CNTR_SYNTH),
+[C_DC_CRC_LN2] = DC_PERF_CNTR_LCB(DcCrcLn2, DC_LCB_ERR_INFO_CRC_ERR_LN2,
+				  CNTR_SYNTH),
+[C_DC_CRC_LN3] = DC_PERF_CNTR_LCB(DcCrcLn3, DC_LCB_ERR_INFO_CRC_ERR_LN3,
+				  CNTR_SYNTH),
+[C_DC_CRC_MULT_LN] =
+	DC_PERF_CNTR_LCB(DcMultLn, DC_LCB_ERR_INFO_CRC_ERR_MULTI_LN,
+			 CNTR_SYNTH),
+[C_DC_TX_REPLAY] = DC_PERF_CNTR_LCB(DcTxReplay, DC_LCB_ERR_INFO_TX_REPLAY_CNT,
+				    CNTR_SYNTH),
+[C_DC_RX_REPLAY] = DC_PERF_CNTR_LCB(DcRxReplay, DC_LCB_ERR_INFO_RX_REPLAY_CNT,
+				    CNTR_SYNTH),
+[C_DC_SEQ_CRC_CNT] =
+	DC_PERF_CNTR_LCB(DcLinkSeqCrc, DC_LCB_ERR_INFO_SEQ_CRC_CNT,
+			 CNTR_SYNTH),
+[C_DC_ESC0_ONLY_CNT] =
+	DC_PERF_CNTR_LCB(DcEsc0, DC_LCB_ERR_INFO_ESCAPE_0_ONLY_CNT,
+			 CNTR_SYNTH),
+[C_DC_ESC0_PLUS1_CNT] =
+	DC_PERF_CNTR_LCB(DcEsc1, DC_LCB_ERR_INFO_ESCAPE_0_PLUS1_CNT,
+			 CNTR_SYNTH),
+[C_DC_ESC0_PLUS2_CNT] =
+	DC_PERF_CNTR_LCB(DcEsc0Plus2, DC_LCB_ERR_INFO_ESCAPE_0_PLUS2_CNT,
+			 CNTR_SYNTH),
+[C_DC_REINIT_FROM_PEER_CNT] =
+	DC_PERF_CNTR_LCB(DcReinitPeer, DC_LCB_ERR_INFO_REINIT_FROM_PEER_CNT,
+			 CNTR_SYNTH),
+[C_DC_SBE_CNT] = DC_PERF_CNTR_LCB(DcSbe, DC_LCB_ERR_INFO_SBE_CNT,
+				  CNTR_SYNTH),
+[C_DC_MISC_FLG_CNT] =
+	DC_PERF_CNTR_LCB(DcMiscFlg, DC_LCB_ERR_INFO_MISC_FLG_CNT,
+			 CNTR_SYNTH),
+[C_DC_PRF_GOOD_LTP_CNT] =
+	DC_PERF_CNTR_LCB(DcGoodLTP, DC_LCB_PRF_GOOD_LTP_CNT, CNTR_SYNTH),
+[C_DC_PRF_ACCEPTED_LTP_CNT] =
+	DC_PERF_CNTR_LCB(DcAccLTP, DC_LCB_PRF_ACCEPTED_LTP_CNT,
+			 CNTR_SYNTH),
+[C_DC_PRF_RX_FLIT_CNT] =
+	DC_PERF_CNTR_LCB(DcPrfRxFlit, DC_LCB_PRF_RX_FLIT_CNT, CNTR_SYNTH),
+[C_DC_PRF_TX_FLIT_CNT] =
+	DC_PERF_CNTR_LCB(DcPrfTxFlit, DC_LCB_PRF_TX_FLIT_CNT, CNTR_SYNTH),
+[C_DC_PRF_CLK_CNTR] =
+	DC_PERF_CNTR_LCB(DcPrfClk, DC_LCB_PRF_CLK_CNTR, CNTR_SYNTH),
+[C_DC_PG_DBG_FLIT_CRDTS_CNT] =
+	DC_PERF_CNTR_LCB(DcFltCrdts, DC_LCB_PG_DBG_FLIT_CRDTS_CNT, CNTR_SYNTH),
+[C_DC_PG_STS_PAUSE_COMPLETE_CNT] =
+	DC_PERF_CNTR_LCB(DcPauseComp, DC_LCB_PG_STS_PAUSE_COMPLETE_CNT,
+			 CNTR_SYNTH),
+[C_DC_PG_STS_TX_SBE_CNT] =
+	DC_PERF_CNTR_LCB(DcStsTxSbe, DC_LCB_PG_STS_TX_SBE_CNT, CNTR_SYNTH),
+[C_DC_PG_STS_TX_MBE_CNT] =
+	DC_PERF_CNTR_LCB(DcStsTxMbe, DC_LCB_PG_STS_TX_MBE_CNT,
+			 CNTR_SYNTH),
+[C_SW_CPU_INTR] = CNTR_ELEM("Intr", 0, 0, CNTR_NORMAL,
+			    access_sw_cpu_intr),
+[C_SW_CPU_RCV_LIM] = CNTR_ELEM("RcvLimit", 0, 0, CNTR_NORMAL,
+			    access_sw_cpu_rcv_limit),
+[C_SW_VTX_WAIT] = CNTR_ELEM("vTxWait", 0, 0, CNTR_NORMAL,
+			    access_sw_vtx_wait),
+[C_SW_PIO_WAIT] = CNTR_ELEM("PioWait", 0, 0, CNTR_NORMAL,
+			    access_sw_pio_wait),
+[C_SW_KMEM_WAIT] = CNTR_ELEM("KmemWait", 0, 0, CNTR_NORMAL,
+			    access_sw_kmem_wait),
+};
+
+static struct cntr_entry port_cntrs[PORT_CNTR_LAST] = {
+[C_TX_UNSUP_VL] = TXE32_PORT_CNTR_ELEM(TxUnVLErr, SEND_UNSUP_VL_ERR_CNT,
+			CNTR_NORMAL),
+[C_TX_INVAL_LEN] = TXE32_PORT_CNTR_ELEM(TxInvalLen, SEND_LEN_ERR_CNT,
+			CNTR_NORMAL),
+[C_TX_MM_LEN_ERR] = TXE32_PORT_CNTR_ELEM(TxMMLenErr, SEND_MAX_MIN_LEN_ERR_CNT,
+			CNTR_NORMAL),
+[C_TX_UNDERRUN] = TXE32_PORT_CNTR_ELEM(TxUnderrun, SEND_UNDERRUN_CNT,
+			CNTR_NORMAL),
+[C_TX_FLOW_STALL] = TXE32_PORT_CNTR_ELEM(TxFlowStall, SEND_FLOW_STALL_CNT,
+			CNTR_NORMAL),
+[C_TX_DROPPED] = TXE32_PORT_CNTR_ELEM(TxDropped, SEND_DROPPED_PKT_CNT,
+			CNTR_NORMAL),
+[C_TX_HDR_ERR] = TXE32_PORT_CNTR_ELEM(TxHdrErr, SEND_HEADERS_ERR_CNT,
+			CNTR_NORMAL),
+[C_TX_PKT] = TXE64_PORT_CNTR_ELEM(TxPkt, SEND_DATA_PKT_CNT, CNTR_NORMAL),
+[C_TX_WORDS] = TXE64_PORT_CNTR_ELEM(TxWords, SEND_DWORD_CNT, CNTR_NORMAL),
+[C_TX_WAIT] = TXE64_PORT_CNTR_ELEM(TxWait, SEND_WAIT_CNT, CNTR_SYNTH),
+[C_TX_FLIT_VL] = TXE64_PORT_CNTR_ELEM(TxFlitVL, SEND_DATA_VL0_CNT,
+			CNTR_SYNTH | CNTR_VL),
+[C_TX_PKT_VL] = TXE64_PORT_CNTR_ELEM(TxPktVL, SEND_DATA_PKT_VL0_CNT,
+			CNTR_SYNTH | CNTR_VL),
+[C_TX_WAIT_VL] = TXE64_PORT_CNTR_ELEM(TxWaitVL, SEND_WAIT_VL0_CNT,
+			CNTR_SYNTH | CNTR_VL),
+[C_RX_PKT] = RXE64_PORT_CNTR_ELEM(RxPkt, RCV_DATA_PKT_CNT, CNTR_NORMAL),
+[C_RX_WORDS] = RXE64_PORT_CNTR_ELEM(RxWords, RCV_DWORD_CNT, CNTR_NORMAL),
+[C_SW_LINK_DOWN] = CNTR_ELEM("SwLinkDown", 0, 0, CNTR_SYNTH | CNTR_32BIT,
+			access_sw_link_dn_cnt),
+[C_SW_LINK_UP] = CNTR_ELEM("SwLinkUp", 0, 0, CNTR_SYNTH | CNTR_32BIT,
+			access_sw_link_up_cnt),
+[C_SW_XMIT_DSCD] = CNTR_ELEM("XmitDscd", 0, 0, CNTR_SYNTH | CNTR_32BIT,
+			access_sw_xmit_discards),
+[C_SW_XMIT_DSCD_VL] = CNTR_ELEM("XmitDscdVl", 0, 0,
+			CNTR_SYNTH | CNTR_32BIT | CNTR_VL,
+			access_sw_xmit_discards),
+[C_SW_XMIT_CSTR_ERR] = CNTR_ELEM("XmitCstrErr", 0, 0, CNTR_SYNTH,
+			access_xmit_constraint_errs),
+[C_SW_RCV_CSTR_ERR] = CNTR_ELEM("RcvCstrErr", 0, 0, CNTR_SYNTH,
+			access_rcv_constraint_errs),
+[C_SW_IBP_LOOP_PKTS] = SW_IBP_CNTR(LoopPkts, loop_pkts),
+[C_SW_IBP_RC_RESENDS] = SW_IBP_CNTR(RcResend, rc_resends),
+[C_SW_IBP_RNR_NAKS] = SW_IBP_CNTR(RnrNak, rnr_naks),
+[C_SW_IBP_OTHER_NAKS] = SW_IBP_CNTR(OtherNak, other_naks),
+[C_SW_IBP_RC_TIMEOUTS] = SW_IBP_CNTR(RcTimeOut, rc_timeouts),
+[C_SW_IBP_PKT_DROPS] = SW_IBP_CNTR(PktDrop, pkt_drops),
+[C_SW_IBP_DMA_WAIT] = SW_IBP_CNTR(DmaWait, dmawait),
+[C_SW_IBP_RC_SEQNAK] = SW_IBP_CNTR(RcSeqNak, rc_seqnak),
+[C_SW_IBP_RC_DUPREQ] = SW_IBP_CNTR(RcDupRew, rc_dupreq),
+[C_SW_IBP_RDMA_SEQ] = SW_IBP_CNTR(RdmaSeq, rdma_seq),
+[C_SW_IBP_UNALIGNED] = SW_IBP_CNTR(Unaligned, unaligned),
+[C_SW_IBP_SEQ_NAK] = SW_IBP_CNTR(SeqNak, seq_naks),
+[C_SW_CPU_RC_ACKS] = CNTR_ELEM("RcAcks", 0, 0, CNTR_NORMAL,
+			       access_sw_cpu_rc_acks),
+[C_SW_CPU_RC_QACKS] = CNTR_ELEM("RcQacks", 0, 0, CNTR_NORMAL,
+			       access_sw_cpu_rc_qacks),
+[C_SW_CPU_RC_DELAYED_COMP] = CNTR_ELEM("RcDelayComp", 0, 0, CNTR_NORMAL,
+			       access_sw_cpu_rc_delayed_comp),
+[OVR_LBL(0)] = OVR_ELM(0), [OVR_LBL(1)] = OVR_ELM(1),
+[OVR_LBL(2)] = OVR_ELM(2), [OVR_LBL(3)] = OVR_ELM(3),
+[OVR_LBL(4)] = OVR_ELM(4), [OVR_LBL(5)] = OVR_ELM(5),
+[OVR_LBL(6)] = OVR_ELM(6), [OVR_LBL(7)] = OVR_ELM(7),
+[OVR_LBL(8)] = OVR_ELM(8), [OVR_LBL(9)] = OVR_ELM(9),
+[OVR_LBL(10)] = OVR_ELM(10), [OVR_LBL(11)] = OVR_ELM(11),
+[OVR_LBL(12)] = OVR_ELM(12), [OVR_LBL(13)] = OVR_ELM(13),
+[OVR_LBL(14)] = OVR_ELM(14), [OVR_LBL(15)] = OVR_ELM(15),
+[OVR_LBL(16)] = OVR_ELM(16), [OVR_LBL(17)] = OVR_ELM(17),
+[OVR_LBL(18)] = OVR_ELM(18), [OVR_LBL(19)] = OVR_ELM(19),
+[OVR_LBL(20)] = OVR_ELM(20), [OVR_LBL(21)] = OVR_ELM(21),
+[OVR_LBL(22)] = OVR_ELM(22), [OVR_LBL(23)] = OVR_ELM(23),
+[OVR_LBL(24)] = OVR_ELM(24), [OVR_LBL(25)] = OVR_ELM(25),
+[OVR_LBL(26)] = OVR_ELM(26), [OVR_LBL(27)] = OVR_ELM(27),
+[OVR_LBL(28)] = OVR_ELM(28), [OVR_LBL(29)] = OVR_ELM(29),
+[OVR_LBL(30)] = OVR_ELM(30), [OVR_LBL(31)] = OVR_ELM(31),
+[OVR_LBL(32)] = OVR_ELM(32), [OVR_LBL(33)] = OVR_ELM(33),
+[OVR_LBL(34)] = OVR_ELM(34), [OVR_LBL(35)] = OVR_ELM(35),
+[OVR_LBL(36)] = OVR_ELM(36), [OVR_LBL(37)] = OVR_ELM(37),
+[OVR_LBL(38)] = OVR_ELM(38), [OVR_LBL(39)] = OVR_ELM(39),
+[OVR_LBL(40)] = OVR_ELM(40), [OVR_LBL(41)] = OVR_ELM(41),
+[OVR_LBL(42)] = OVR_ELM(42), [OVR_LBL(43)] = OVR_ELM(43),
+[OVR_LBL(44)] = OVR_ELM(44), [OVR_LBL(45)] = OVR_ELM(45),
+[OVR_LBL(46)] = OVR_ELM(46), [OVR_LBL(47)] = OVR_ELM(47),
+[OVR_LBL(48)] = OVR_ELM(48), [OVR_LBL(49)] = OVR_ELM(49),
+[OVR_LBL(50)] = OVR_ELM(50), [OVR_LBL(51)] = OVR_ELM(51),
+[OVR_LBL(52)] = OVR_ELM(52), [OVR_LBL(53)] = OVR_ELM(53),
+[OVR_LBL(54)] = OVR_ELM(54), [OVR_LBL(55)] = OVR_ELM(55),
+[OVR_LBL(56)] = OVR_ELM(56), [OVR_LBL(57)] = OVR_ELM(57),
+[OVR_LBL(58)] = OVR_ELM(58), [OVR_LBL(59)] = OVR_ELM(59),
+[OVR_LBL(60)] = OVR_ELM(60), [OVR_LBL(61)] = OVR_ELM(61),
+[OVR_LBL(62)] = OVR_ELM(62), [OVR_LBL(63)] = OVR_ELM(63),
+[OVR_LBL(64)] = OVR_ELM(64), [OVR_LBL(65)] = OVR_ELM(65),
+[OVR_LBL(66)] = OVR_ELM(66), [OVR_LBL(67)] = OVR_ELM(67),
+[OVR_LBL(68)] = OVR_ELM(68), [OVR_LBL(69)] = OVR_ELM(69),
+[OVR_LBL(70)] = OVR_ELM(70), [OVR_LBL(71)] = OVR_ELM(71),
+[OVR_LBL(72)] = OVR_ELM(72), [OVR_LBL(73)] = OVR_ELM(73),
+[OVR_LBL(74)] = OVR_ELM(74), [OVR_LBL(75)] = OVR_ELM(75),
+[OVR_LBL(76)] = OVR_ELM(76), [OVR_LBL(77)] = OVR_ELM(77),
+[OVR_LBL(78)] = OVR_ELM(78), [OVR_LBL(79)] = OVR_ELM(79),
+[OVR_LBL(80)] = OVR_ELM(80), [OVR_LBL(81)] = OVR_ELM(81),
+[OVR_LBL(82)] = OVR_ELM(82), [OVR_LBL(83)] = OVR_ELM(83),
+[OVR_LBL(84)] = OVR_ELM(84), [OVR_LBL(85)] = OVR_ELM(85),
+[OVR_LBL(86)] = OVR_ELM(86), [OVR_LBL(87)] = OVR_ELM(87),
+[OVR_LBL(88)] = OVR_ELM(88), [OVR_LBL(89)] = OVR_ELM(89),
+[OVR_LBL(90)] = OVR_ELM(90), [OVR_LBL(91)] = OVR_ELM(91),
+[OVR_LBL(92)] = OVR_ELM(92), [OVR_LBL(93)] = OVR_ELM(93),
+[OVR_LBL(94)] = OVR_ELM(94), [OVR_LBL(95)] = OVR_ELM(95),
+[OVR_LBL(96)] = OVR_ELM(96), [OVR_LBL(97)] = OVR_ELM(97),
+[OVR_LBL(98)] = OVR_ELM(98), [OVR_LBL(99)] = OVR_ELM(99),
+[OVR_LBL(100)] = OVR_ELM(100), [OVR_LBL(101)] = OVR_ELM(101),
+[OVR_LBL(102)] = OVR_ELM(102), [OVR_LBL(103)] = OVR_ELM(103),
+[OVR_LBL(104)] = OVR_ELM(104), [OVR_LBL(105)] = OVR_ELM(105),
+[OVR_LBL(106)] = OVR_ELM(106), [OVR_LBL(107)] = OVR_ELM(107),
+[OVR_LBL(108)] = OVR_ELM(108), [OVR_LBL(109)] = OVR_ELM(109),
+[OVR_LBL(110)] = OVR_ELM(110), [OVR_LBL(111)] = OVR_ELM(111),
+[OVR_LBL(112)] = OVR_ELM(112), [OVR_LBL(113)] = OVR_ELM(113),
+[OVR_LBL(114)] = OVR_ELM(114), [OVR_LBL(115)] = OVR_ELM(115),
+[OVR_LBL(116)] = OVR_ELM(116), [OVR_LBL(117)] = OVR_ELM(117),
+[OVR_LBL(118)] = OVR_ELM(118), [OVR_LBL(119)] = OVR_ELM(119),
+[OVR_LBL(120)] = OVR_ELM(120), [OVR_LBL(121)] = OVR_ELM(121),
+[OVR_LBL(122)] = OVR_ELM(122), [OVR_LBL(123)] = OVR_ELM(123),
+[OVR_LBL(124)] = OVR_ELM(124), [OVR_LBL(125)] = OVR_ELM(125),
+[OVR_LBL(126)] = OVR_ELM(126), [OVR_LBL(127)] = OVR_ELM(127),
+[OVR_LBL(128)] = OVR_ELM(128), [OVR_LBL(129)] = OVR_ELM(129),
+[OVR_LBL(130)] = OVR_ELM(130), [OVR_LBL(131)] = OVR_ELM(131),
+[OVR_LBL(132)] = OVR_ELM(132), [OVR_LBL(133)] = OVR_ELM(133),
+[OVR_LBL(134)] = OVR_ELM(134), [OVR_LBL(135)] = OVR_ELM(135),
+[OVR_LBL(136)] = OVR_ELM(136), [OVR_LBL(137)] = OVR_ELM(137),
+[OVR_LBL(138)] = OVR_ELM(138), [OVR_LBL(139)] = OVR_ELM(139),
+[OVR_LBL(140)] = OVR_ELM(140), [OVR_LBL(141)] = OVR_ELM(141),
+[OVR_LBL(142)] = OVR_ELM(142), [OVR_LBL(143)] = OVR_ELM(143),
+[OVR_LBL(144)] = OVR_ELM(144), [OVR_LBL(145)] = OVR_ELM(145),
+[OVR_LBL(146)] = OVR_ELM(146), [OVR_LBL(147)] = OVR_ELM(147),
+[OVR_LBL(148)] = OVR_ELM(148), [OVR_LBL(149)] = OVR_ELM(149),
+[OVR_LBL(150)] = OVR_ELM(150), [OVR_LBL(151)] = OVR_ELM(151),
+[OVR_LBL(152)] = OVR_ELM(152), [OVR_LBL(153)] = OVR_ELM(153),
+[OVR_LBL(154)] = OVR_ELM(154), [OVR_LBL(155)] = OVR_ELM(155),
+[OVR_LBL(156)] = OVR_ELM(156), [OVR_LBL(157)] = OVR_ELM(157),
+[OVR_LBL(158)] = OVR_ELM(158), [OVR_LBL(159)] = OVR_ELM(159),
+};
+
+/* ======================================================================== */
+
+/* return true if this is chip revision revision a0 */
+int is_a0(struct hfi1_devdata *dd)
+{
+	return ((dd->revision >> CCE_REVISION_CHIP_REV_MINOR_SHIFT)
+			& CCE_REVISION_CHIP_REV_MINOR_MASK) == 0;
+}
+
+/* return true if this is chip revision revision a */
+int is_ax(struct hfi1_devdata *dd)
+{
+	u8 chip_rev_minor =
+		dd->revision >> CCE_REVISION_CHIP_REV_MINOR_SHIFT
+			& CCE_REVISION_CHIP_REV_MINOR_MASK;
+	return (chip_rev_minor & 0xf0) == 0;
+}
+
+/* return true if this is chip revision revision b */
+int is_bx(struct hfi1_devdata *dd)
+{
+	u8 chip_rev_minor =
+		dd->revision >> CCE_REVISION_CHIP_REV_MINOR_SHIFT
+			& CCE_REVISION_CHIP_REV_MINOR_MASK;
+	return !!(chip_rev_minor & 0x10);
+}
+
+/*
+ * Append string s to buffer buf.  Arguments curp and len are the current
+ * position and remaining length, respectively.
+ *
+ * return 0 on success, 1 on out of room
+ */
+static int append_str(char *buf, char **curp, int *lenp, const char *s)
+{
+	char *p = *curp;
+	int len = *lenp;
+	int result = 0; /* success */
+	char c;
+
+	/* add a comma, if first in the buffer */
+	if (p != buf) {
+		if (len == 0) {
+			result = 1; /* out of room */
+			goto done;
+		}
+		*p++ = ',';
+		len--;
+	}
+
+	/* copy the string */
+	while ((c = *s++) != 0) {
+		if (len == 0) {
+			result = 1; /* out of room */
+			goto done;
+		}
+		*p++ = c;
+		len--;
+	}
+
+done:
+	/* write return values */
+	*curp = p;
+	*lenp = len;
+
+	return result;
+}
+
+/*
+ * Using the given flag table, print a comma separated string into
+ * the buffer.  End in '*' if the buffer is too short.
+ */
+static char *flag_string(char *buf, int buf_len, u64 flags,
+				struct flag_table *table, int table_size)
+{
+	char extra[32];
+	char *p = buf;
+	int len = buf_len;
+	int no_room = 0;
+	int i;
+
+	/* make sure there is at least 2 so we can form "*" */
+	if (len < 2)
+		return "";
+
+	len--;	/* leave room for a nul */
+	for (i = 0; i < table_size; i++) {
+		if (flags & table[i].flag) {
+			no_room = append_str(buf, &p, &len, table[i].str);
+			if (no_room)
+				break;
+			flags &= ~table[i].flag;
+		}
+	}
+
+	/* any undocumented bits left? */
+	if (!no_room && flags) {
+		snprintf(extra, sizeof(extra), "bits 0x%llx", flags);
+		no_room = append_str(buf, &p, &len, extra);
+	}
+
+	/* add * if ran out of room */
+	if (no_room) {
+		/* may need to back up to add space for a '*' */
+		if (len == 0)
+			--p;
+		*p++ = '*';
+	}
+
+	/* add final nul - space already allocated above */
+	*p = 0;
+	return buf;
+}
+
+/* first 8 CCE error interrupt source names */
+static const char * const cce_misc_names[] = {
+	"CceErrInt",		/* 0 */
+	"RxeErrInt",		/* 1 */
+	"MiscErrInt",		/* 2 */
+	"Reserved3",		/* 3 */
+	"PioErrInt",		/* 4 */
+	"SDmaErrInt",		/* 5 */
+	"EgressErrInt",		/* 6 */
+	"TxeErrInt"		/* 7 */
+};
+
+/*
+ * Return the miscellaneous error interrupt name.
+ */
+static char *is_misc_err_name(char *buf, size_t bsize, unsigned int source)
+{
+	if (source < ARRAY_SIZE(cce_misc_names))
+		strncpy(buf, cce_misc_names[source], bsize);
+	else
+		snprintf(buf,
+			bsize,
+			"Reserved%u",
+			source + IS_GENERAL_ERR_START);
+
+	return buf;
+}
+
+/*
+ * Return the SDMA engine error interrupt name.
+ */
+static char *is_sdma_eng_err_name(char *buf, size_t bsize, unsigned int source)
+{
+	snprintf(buf, bsize, "SDmaEngErrInt%u", source);
+	return buf;
+}
+
+/*
+ * Return the send context error interrupt name.
+ */
+static char *is_sendctxt_err_name(char *buf, size_t bsize, unsigned int source)
+{
+	snprintf(buf, bsize, "SendCtxtErrInt%u", source);
+	return buf;
+}
+
+static const char * const various_names[] = {
+	"PbcInt",
+	"GpioAssertInt",
+	"Qsfp1Int",
+	"Qsfp2Int",
+	"TCritInt"
+};
+
+/*
+ * Return the various interrupt name.
+ */
+static char *is_various_name(char *buf, size_t bsize, unsigned int source)
+{
+	if (source < ARRAY_SIZE(various_names))
+		strncpy(buf, various_names[source], bsize);
+	else
+		snprintf(buf, bsize, "Reserved%u", source+IS_VARIOUS_START);
+	return buf;
+}
+
+/*
+ * Return the DC interrupt name.
+ */
+static char *is_dc_name(char *buf, size_t bsize, unsigned int source)
+{
+	static const char * const dc_int_names[] = {
+		"common",
+		"lcb",
+		"8051",
+		"lbm"	/* local block merge */
+	};
+
+	if (source < ARRAY_SIZE(dc_int_names))
+		snprintf(buf, bsize, "dc_%s_int", dc_int_names[source]);
+	else
+		snprintf(buf, bsize, "DCInt%u", source);
+	return buf;
+}
+
+static const char * const sdma_int_names[] = {
+	"SDmaInt",
+	"SdmaIdleInt",
+	"SdmaProgressInt",
+};
+
+/*
+ * Return the SDMA engine interrupt name.
+ */
+static char *is_sdma_eng_name(char *buf, size_t bsize, unsigned int source)
+{
+	/* what interrupt */
+	unsigned int what  = source / TXE_NUM_SDMA_ENGINES;
+	/* which engine */
+	unsigned int which = source % TXE_NUM_SDMA_ENGINES;
+
+	if (likely(what < 3))
+		snprintf(buf, bsize, "%s%u", sdma_int_names[what], which);
+	else
+		snprintf(buf, bsize, "Invalid SDMA interrupt %u", source);
+	return buf;
+}
+
+/*
+ * Return the receive available interrupt name.
+ */
+static char *is_rcv_avail_name(char *buf, size_t bsize, unsigned int source)
+{
+	snprintf(buf, bsize, "RcvAvailInt%u", source);
+	return buf;
+}
+
+/*
+ * Return the receive urgent interrupt name.
+ */
+static char *is_rcv_urgent_name(char *buf, size_t bsize, unsigned int source)
+{
+	snprintf(buf, bsize, "RcvUrgentInt%u", source);
+	return buf;
+}
+
+/*
+ * Return the send credit interrupt name.
+ */
+static char *is_send_credit_name(char *buf, size_t bsize, unsigned int source)
+{
+	snprintf(buf, bsize, "SendCreditInt%u", source);
+	return buf;
+}
+
+/*
+ * Return the reserved interrupt name.
+ */
+static char *is_reserved_name(char *buf, size_t bsize, unsigned int source)
+{
+	snprintf(buf, bsize, "Reserved%u", source + IS_RESERVED_START);
+	return buf;
+}
+
+static char *cce_err_status_string(char *buf, int buf_len, u64 flags)
+{
+	return flag_string(buf, buf_len, flags,
+			cce_err_status_flags, ARRAY_SIZE(cce_err_status_flags));
+}
+
+static char *rxe_err_status_string(char *buf, int buf_len, u64 flags)
+{
+	return flag_string(buf, buf_len, flags,
+			rxe_err_status_flags, ARRAY_SIZE(rxe_err_status_flags));
+}
+
+static char *misc_err_status_string(char *buf, int buf_len, u64 flags)
+{
+	return flag_string(buf, buf_len, flags, misc_err_status_flags,
+			ARRAY_SIZE(misc_err_status_flags));
+}
+
+static char *pio_err_status_string(char *buf, int buf_len, u64 flags)
+{
+	return flag_string(buf, buf_len, flags,
+			pio_err_status_flags, ARRAY_SIZE(pio_err_status_flags));
+}
+
+static char *sdma_err_status_string(char *buf, int buf_len, u64 flags)
+{
+	return flag_string(buf, buf_len, flags,
+			sdma_err_status_flags,
+			ARRAY_SIZE(sdma_err_status_flags));
+}
+
+static char *egress_err_status_string(char *buf, int buf_len, u64 flags)
+{
+	return flag_string(buf, buf_len, flags,
+		egress_err_status_flags, ARRAY_SIZE(egress_err_status_flags));
+}
+
+static char *egress_err_info_string(char *buf, int buf_len, u64 flags)
+{
+	return flag_string(buf, buf_len, flags,
+		egress_err_info_flags, ARRAY_SIZE(egress_err_info_flags));
+}
+
+static char *send_err_status_string(char *buf, int buf_len, u64 flags)
+{
+	return flag_string(buf, buf_len, flags,
+			send_err_status_flags,
+			ARRAY_SIZE(send_err_status_flags));
+}
+
+static void handle_cce_err(struct hfi1_devdata *dd, u32 unused, u64 reg)
+{
+	char buf[96];
+
+	/*
+	 * For most these errors, there is nothing that can be done except
+	 * report or record it.
+	 */
+	dd_dev_info(dd, "CCE Error: %s\n",
+		cce_err_status_string(buf, sizeof(buf), reg));
+
+	if ((reg & CCE_ERR_STATUS_CCE_CLI2_ASYNC_FIFO_PARITY_ERR_SMASK)
+			&& is_a0(dd)
+			&& (dd->icode != ICODE_FUNCTIONAL_SIMULATOR)) {
+		/* this error requires a manual drop into SPC freeze mode */
+		/* then a fix up */
+		start_freeze_handling(dd->pport, FREEZE_SELF);
+	}
+}
+
+/*
+ * Check counters for receive errors that do not have an interrupt
+ * associated with them.
+ */
+#define RCVERR_CHECK_TIME 10
+static void update_rcverr_timer(unsigned long opaque)
+{
+	struct hfi1_devdata *dd = (struct hfi1_devdata *)opaque;
+	struct hfi1_pportdata *ppd = dd->pport;
+	u32 cur_ovfl_cnt = read_dev_cntr(dd, C_RCV_OVF, CNTR_INVALID_VL);
+
+	if (dd->rcv_ovfl_cnt < cur_ovfl_cnt &&
+		ppd->port_error_action & OPA_PI_MASK_EX_BUFFER_OVERRUN) {
+		dd_dev_info(dd, "%s: PortErrorAction bounce\n", __func__);
+		set_link_down_reason(ppd,
+		  OPA_LINKDOWN_REASON_EXCESSIVE_BUFFER_OVERRUN, 0,
+			OPA_LINKDOWN_REASON_EXCESSIVE_BUFFER_OVERRUN);
+		queue_work(ppd->hfi1_wq, &ppd->link_bounce_work);
+	}
+	dd->rcv_ovfl_cnt = (u32) cur_ovfl_cnt;
+
+	mod_timer(&dd->rcverr_timer, jiffies + HZ * RCVERR_CHECK_TIME);
+}
+
+static int init_rcverr(struct hfi1_devdata *dd)
+{
+	init_timer(&dd->rcverr_timer);
+	dd->rcverr_timer.function = update_rcverr_timer;
+	dd->rcverr_timer.data = (unsigned long) dd;
+	/* Assume the hardware counter has been reset */
+	dd->rcv_ovfl_cnt = 0;
+	return mod_timer(&dd->rcverr_timer, jiffies + HZ * RCVERR_CHECK_TIME);
+}
+
+static void free_rcverr(struct hfi1_devdata *dd)
+{
+	if (dd->rcverr_timer.data)
+		del_timer_sync(&dd->rcverr_timer);
+	dd->rcverr_timer.data = 0;
+}
+
+static void handle_rxe_err(struct hfi1_devdata *dd, u32 unused, u64 reg)
+{
+	char buf[96];
+
+	dd_dev_info(dd, "Receive Error: %s\n",
+		rxe_err_status_string(buf, sizeof(buf), reg));
+
+	if (reg & ALL_RXE_FREEZE_ERR) {
+		int flags = 0;
+
+		/*
+		 * Freeze mode recovery is disabled for the errors
+		 * in RXE_FREEZE_ABORT_MASK
+		 */
+		if (is_a0(dd) && (reg & RXE_FREEZE_ABORT_MASK))
+			flags = FREEZE_ABORT;
+
+		start_freeze_handling(dd->pport, flags);
+	}
+}
+
+static void handle_misc_err(struct hfi1_devdata *dd, u32 unused, u64 reg)
+{
+	char buf[96];
+
+	dd_dev_info(dd, "Misc Error: %s",
+		misc_err_status_string(buf, sizeof(buf), reg));
+}
+
+static void handle_pio_err(struct hfi1_devdata *dd, u32 unused, u64 reg)
+{
+	char buf[96];
+
+	dd_dev_info(dd, "PIO Error: %s\n",
+		pio_err_status_string(buf, sizeof(buf), reg));
+
+	if (reg & ALL_PIO_FREEZE_ERR)
+		start_freeze_handling(dd->pport, 0);
+}
+
+static void handle_sdma_err(struct hfi1_devdata *dd, u32 unused, u64 reg)
+{
+	char buf[96];
+
+	dd_dev_info(dd, "SDMA Error: %s\n",
+		sdma_err_status_string(buf, sizeof(buf), reg));
+
+	if (reg & ALL_SDMA_FREEZE_ERR)
+		start_freeze_handling(dd->pport, 0);
+}
+
+static void count_port_inactive(struct hfi1_devdata *dd)
+{
+	struct hfi1_pportdata *ppd = dd->pport;
+
+	if (ppd->port_xmit_discards < ~(u64)0)
+		ppd->port_xmit_discards++;
+}
+
+/*
+ * We have had a "disallowed packet" error during egress. Determine the
+ * integrity check which failed, and update relevant error counter, etc.
+ *
+ * Note that the SEND_EGRESS_ERR_INFO register has only a single
+ * bit of state per integrity check, and so we can miss the reason for an
+ * egress error if more than one packet fails the same integrity check
+ * since we cleared the corresponding bit in SEND_EGRESS_ERR_INFO.
+ */
+static void handle_send_egress_err_info(struct hfi1_devdata *dd)
+{
+	struct hfi1_pportdata *ppd = dd->pport;
+	u64 src = read_csr(dd, SEND_EGRESS_ERR_SOURCE); /* read first */
+	u64 info = read_csr(dd, SEND_EGRESS_ERR_INFO);
+	char buf[96];
+
+	/* clear down all observed info as quickly as possible after read */
+	write_csr(dd, SEND_EGRESS_ERR_INFO, info);
+
+	dd_dev_info(dd,
+		"Egress Error Info: 0x%llx, %s Egress Error Src 0x%llx\n",
+		info, egress_err_info_string(buf, sizeof(buf), info), src);
+
+	/* Eventually add other counters for each bit */
+
+	if (info & SEND_EGRESS_ERR_INFO_TOO_LONG_IB_PACKET_ERR_SMASK) {
+		if (ppd->port_xmit_discards < ~(u64)0)
+			ppd->port_xmit_discards++;
+	}
+}
+
+/*
+ * Input value is a bit position within the SEND_EGRESS_ERR_STATUS
+ * register. Does it represent a 'port inactive' error?
+ */
+static inline int port_inactive_err(u64 posn)
+{
+	return (posn >= SEES(TX_LINKDOWN) &&
+		posn <= SEES(TX_INCORRECT_LINK_STATE));
+}
+
+/*
+ * Input value is a bit position within the SEND_EGRESS_ERR_STATUS
+ * register. Does it represent a 'disallowed packet' error?
+ */
+static inline int disallowed_pkt_err(u64 posn)
+{
+	return (posn >= SEES(TX_SDMA0_DISALLOWED_PACKET) &&
+		posn <= SEES(TX_SDMA15_DISALLOWED_PACKET));
+}
+
+static void handle_egress_err(struct hfi1_devdata *dd, u32 unused, u64 reg)
+{
+	u64 reg_copy = reg, handled = 0;
+	char buf[96];
+
+	if (reg & ALL_TXE_EGRESS_FREEZE_ERR)
+		start_freeze_handling(dd->pport, 0);
+	if (is_a0(dd) && (reg &
+		    SEND_EGRESS_ERR_STATUS_TX_CREDIT_RETURN_VL_ERR_SMASK)
+		    && (dd->icode != ICODE_FUNCTIONAL_SIMULATOR))
+		start_freeze_handling(dd->pport, 0);
+
+	while (reg_copy) {
+		int posn = fls64(reg_copy);
+		/*
+		 * fls64() returns a 1-based offset, but we generally
+		 * want 0-based offsets.
+		 */
+		int shift = posn - 1;
+
+		if (port_inactive_err(shift)) {
+			count_port_inactive(dd);
+			handled |= (1ULL << shift);
+		} else if (disallowed_pkt_err(shift)) {
+			handle_send_egress_err_info(dd);
+			handled |= (1ULL << shift);
+		}
+		clear_bit(shift, (unsigned long *)&reg_copy);
+	}
+
+	reg &= ~handled;
+
+	if (reg)
+		dd_dev_info(dd, "Egress Error: %s\n",
+			egress_err_status_string(buf, sizeof(buf), reg));
+}
+
+static void handle_txe_err(struct hfi1_devdata *dd, u32 unused, u64 reg)
+{
+	char buf[96];
+
+	dd_dev_info(dd, "Send Error: %s\n",
+		send_err_status_string(buf, sizeof(buf), reg));
+
+}
+
+/*
+ * The maximum number of times the error clear down will loop before
+ * blocking a repeating error.  This value is arbitrary.
+ */
+#define MAX_CLEAR_COUNT 20
+
+/*
+ * Clear and handle an error register.  All error interrupts are funneled
+ * through here to have a central location to correctly handle single-
+ * or multi-shot errors.
+ *
+ * For non per-context registers, call this routine with a context value
+ * of 0 so the per-context offset is zero.
+ *
+ * If the handler loops too many times, assume that something is wrong
+ * and can't be fixed, so mask the error bits.
+ */
+static void interrupt_clear_down(struct hfi1_devdata *dd,
+				 u32 context,
+				 const struct err_reg_info *eri)
+{
+	u64 reg;
+	u32 count;
+
+	/* read in a loop until no more errors are seen */
+	count = 0;
+	while (1) {
+		reg = read_kctxt_csr(dd, context, eri->status);
+		if (reg == 0)
+			break;
+		write_kctxt_csr(dd, context, eri->clear, reg);
+		if (likely(eri->handler))
+			eri->handler(dd, context, reg);
+		count++;
+		if (count > MAX_CLEAR_COUNT) {
+			u64 mask;
+
+			dd_dev_err(dd, "Repeating %s bits 0x%llx - masking\n",
+				eri->desc, reg);
+			/*
+			 * Read-modify-write so any other masked bits
+			 * remain masked.
+			 */
+			mask = read_kctxt_csr(dd, context, eri->mask);
+			mask &= ~reg;
+			write_kctxt_csr(dd, context, eri->mask, mask);
+			break;
+		}
+	}
+}
+
+/*
+ * CCE block "misc" interrupt.  Source is < 16.
+ */
+static void is_misc_err_int(struct hfi1_devdata *dd, unsigned int source)
+{
+	const struct err_reg_info *eri = &misc_errs[source];
+
+	if (eri->handler) {
+		interrupt_clear_down(dd, 0, eri);
+	} else {
+		dd_dev_err(dd, "Unexpected misc interrupt (%u) - reserved\n",
+			source);
+	}
+}
+
+static char *send_context_err_status_string(char *buf, int buf_len, u64 flags)
+{
+	return flag_string(buf, buf_len, flags,
+			sc_err_status_flags, ARRAY_SIZE(sc_err_status_flags));
+}
+
+/*
+ * Send context error interrupt.  Source (hw_context) is < 160.
+ *
+ * All send context errors cause the send context to halt.  The normal
+ * clear-down mechanism cannot be used because we cannot clear the
+ * error bits until several other long-running items are done first.
+ * This is OK because with the context halted, nothing else is going
+ * to happen on it anyway.
+ */
+static void is_sendctxt_err_int(struct hfi1_devdata *dd,
+				unsigned int hw_context)
+{
+	struct send_context_info *sci;
+	struct send_context *sc;
+	char flags[96];
+	u64 status;
+	u32 sw_index;
+
+	sw_index = dd->hw_to_sw[hw_context];
+	if (sw_index >= dd->num_send_contexts) {
+		dd_dev_err(dd,
+			"out of range sw index %u for send context %u\n",
+			sw_index, hw_context);
+		return;
+	}
+	sci = &dd->send_contexts[sw_index];
+	sc = sci->sc;
+	if (!sc) {
+		dd_dev_err(dd, "%s: context %u(%u): no sc?\n", __func__,
+			sw_index, hw_context);
+		return;
+	}
+
+	/* tell the software that a halt has begun */
+	sc_stop(sc, SCF_HALTED);
+
+	status = read_kctxt_csr(dd, hw_context, SEND_CTXT_ERR_STATUS);
+
+	dd_dev_info(dd, "Send Context %u(%u) Error: %s\n", sw_index, hw_context,
+		send_context_err_status_string(flags, sizeof(flags), status));
+
+	if (status & SEND_CTXT_ERR_STATUS_PIO_DISALLOWED_PACKET_ERR_SMASK)
+		handle_send_egress_err_info(dd);
+
+	/*
+	 * Automatically restart halted kernel contexts out of interrupt
+	 * context.  User contexts must ask the driver to restart the context.
+	 */
+	if (sc->type != SC_USER)
+		queue_work(dd->pport->hfi1_wq, &sc->halt_work);
+}
+
+static void handle_sdma_eng_err(struct hfi1_devdata *dd,
+				unsigned int source, u64 status)
+{
+	struct sdma_engine *sde;
+
+	sde = &dd->per_sdma[source];
+#ifdef CONFIG_SDMA_VERBOSITY
+	dd_dev_err(sde->dd, "CONFIG SDMA(%u) %s:%d %s()\n", sde->this_idx,
+		   slashstrip(__FILE__), __LINE__, __func__);
+	dd_dev_err(sde->dd, "CONFIG SDMA(%u) source: %u status 0x%llx\n",
+		   sde->this_idx, source, (unsigned long long)status);
+#endif
+	sdma_engine_error(sde, status);
+}
+
+/*
+ * CCE block SDMA error interrupt.  Source is < 16.
+ */
+static void is_sdma_eng_err_int(struct hfi1_devdata *dd, unsigned int source)
+{
+#ifdef CONFIG_SDMA_VERBOSITY
+	struct sdma_engine *sde = &dd->per_sdma[source];
+
+	dd_dev_err(dd, "CONFIG SDMA(%u) %s:%d %s()\n", sde->this_idx,
+		   slashstrip(__FILE__), __LINE__, __func__);
+	dd_dev_err(dd, "CONFIG SDMA(%u) source: %u\n", sde->this_idx,
+		   source);
+	sdma_dumpstate(sde);
+#endif
+	interrupt_clear_down(dd, source, &sdma_eng_err);
+}
+
+/*
+ * CCE block "various" interrupt.  Source is < 8.
+ */
+static void is_various_int(struct hfi1_devdata *dd, unsigned int source)
+{
+	const struct err_reg_info *eri = &various_err[source];
+
+	/*
+	 * TCritInt cannot go through interrupt_clear_down()
+	 * because it is not a second tier interrupt. The handler
+	 * should be called directly.
+	 */
+	if (source == TCRIT_INT_SOURCE)
+		handle_temp_err(dd);
+	else if (eri->handler)
+		interrupt_clear_down(dd, 0, eri);
+	else
+		dd_dev_info(dd,
+			"%s: Unimplemented/reserved interrupt %d\n",
+			__func__, source);
+}
+
+static void handle_qsfp_int(struct hfi1_devdata *dd, u32 src_ctx, u64 reg)
+{
+	/* source is always zero */
+	struct hfi1_pportdata *ppd = dd->pport;
+	unsigned long flags;
+	u64 qsfp_int_mgmt = (u64)(QSFP_HFI0_INT_N | QSFP_HFI0_MODPRST_N);
+
+	if (reg & QSFP_HFI0_MODPRST_N) {
+
+		dd_dev_info(dd, "%s: ModPresent triggered QSFP interrupt\n",
+				__func__);
+
+		if (!qsfp_mod_present(ppd)) {
+			ppd->driver_link_ready = 0;
+			/*
+			 * Cable removed, reset all our information about the
+			 * cache and cable capabilities
+			 */
+
+			spin_lock_irqsave(&ppd->qsfp_info.qsfp_lock, flags);
+			/*
+			 * We don't set cache_refresh_required here as we expect
+			 * an interrupt when a cable is inserted
+			 */
+			ppd->qsfp_info.cache_valid = 0;
+			ppd->qsfp_info.qsfp_interrupt_functional = 0;
+			spin_unlock_irqrestore(&ppd->qsfp_info.qsfp_lock,
+						flags);
+			write_csr(dd,
+					dd->hfi1_id ?
+						ASIC_QSFP2_INVERT :
+						ASIC_QSFP1_INVERT,
+				qsfp_int_mgmt);
+			if (ppd->host_link_state == HLS_DN_POLL) {
+				/*
+				 * The link is still in POLL. This means
+				 * that the normal link down processing
+				 * will not happen. We have to do it here
+				 * before turning the DC off.
+				 */
+				queue_work(ppd->hfi1_wq, &ppd->link_down_work);
+			}
+		} else {
+			spin_lock_irqsave(&ppd->qsfp_info.qsfp_lock, flags);
+			ppd->qsfp_info.cache_valid = 0;
+			ppd->qsfp_info.cache_refresh_required = 1;
+			spin_unlock_irqrestore(&ppd->qsfp_info.qsfp_lock,
+						flags);
+
+			qsfp_int_mgmt &= ~(u64)QSFP_HFI0_MODPRST_N;
+			write_csr(dd,
+					dd->hfi1_id ?
+						ASIC_QSFP2_INVERT :
+						ASIC_QSFP1_INVERT,
+				qsfp_int_mgmt);
+		}
+	}
+
+	if (reg & QSFP_HFI0_INT_N) {
+
+		dd_dev_info(dd, "%s: IntN triggered QSFP interrupt\n",
+				__func__);
+		spin_lock_irqsave(&ppd->qsfp_info.qsfp_lock, flags);
+		ppd->qsfp_info.check_interrupt_flags = 1;
+		ppd->qsfp_info.qsfp_interrupt_functional = 1;
+		spin_unlock_irqrestore(&ppd->qsfp_info.qsfp_lock, flags);
+	}
+
+	/* Schedule the QSFP work only if there is a cable attached. */
+	if (qsfp_mod_present(ppd))
+		queue_work(ppd->hfi1_wq, &ppd->qsfp_info.qsfp_work);
+}
+
+static int request_host_lcb_access(struct hfi1_devdata *dd)
+{
+	int ret;
+
+	ret = do_8051_command(dd, HCMD_MISC,
+		(u64)HCMD_MISC_REQUEST_LCB_ACCESS << LOAD_DATA_FIELD_ID_SHIFT,
+		NULL);
+	if (ret != HCMD_SUCCESS) {
+		dd_dev_err(dd, "%s: command failed with error %d\n",
+			__func__, ret);
+	}
+	return ret == HCMD_SUCCESS ? 0 : -EBUSY;
+}
+
+static int request_8051_lcb_access(struct hfi1_devdata *dd)
+{
+	int ret;
+
+	ret = do_8051_command(dd, HCMD_MISC,
+		(u64)HCMD_MISC_GRANT_LCB_ACCESS << LOAD_DATA_FIELD_ID_SHIFT,
+		NULL);
+	if (ret != HCMD_SUCCESS) {
+		dd_dev_err(dd, "%s: command failed with error %d\n",
+			__func__, ret);
+	}
+	return ret == HCMD_SUCCESS ? 0 : -EBUSY;
+}
+
+/*
+ * Set the LCB selector - allow host access.  The DCC selector always
+ * points to the host.
+ */
+static inline void set_host_lcb_access(struct hfi1_devdata *dd)
+{
+	write_csr(dd, DC_DC8051_CFG_CSR_ACCESS_SEL,
+				DC_DC8051_CFG_CSR_ACCESS_SEL_DCC_SMASK
+				| DC_DC8051_CFG_CSR_ACCESS_SEL_LCB_SMASK);
+}
+
+/*
+ * Clear the LCB selector - allow 8051 access.  The DCC selector always
+ * points to the host.
+ */
+static inline void set_8051_lcb_access(struct hfi1_devdata *dd)
+{
+	write_csr(dd, DC_DC8051_CFG_CSR_ACCESS_SEL,
+				DC_DC8051_CFG_CSR_ACCESS_SEL_DCC_SMASK);
+}
+
+/*
+ * Acquire LCB access from the 8051.  If the host already has access,
+ * just increment a counter.  Otherwise, inform the 8051 that the
+ * host is taking access.
+ *
+ * Returns:
+ *	0 on success
+ *	-EBUSY if the 8051 has control and cannot be disturbed
+ *	-errno if unable to acquire access from the 8051
+ */
+int acquire_lcb_access(struct hfi1_devdata *dd, int sleep_ok)
+{
+	struct hfi1_pportdata *ppd = dd->pport;
+	int ret = 0;
+
+	/*
+	 * Use the host link state lock so the operation of this routine
+	 * { link state check, selector change, count increment } can occur
+	 * as a unit against a link state change.  Otherwise there is a
+	 * race between the state change and the count increment.
+	 */
+	if (sleep_ok) {
+		mutex_lock(&ppd->hls_lock);
+	} else {
+		while (mutex_trylock(&ppd->hls_lock) == EBUSY)
+			udelay(1);
+	}
+
+	/* this access is valid only when the link is up */
+	if ((ppd->host_link_state & HLS_UP) == 0) {
+		dd_dev_info(dd, "%s: link state %s not up\n",
+			__func__, link_state_name(ppd->host_link_state));
+		ret = -EBUSY;
+		goto done;
+	}
+
+	if (dd->lcb_access_count == 0) {
+		ret = request_host_lcb_access(dd);
+		if (ret) {
+			dd_dev_err(dd,
+				"%s: unable to acquire LCB access, err %d\n",
+				__func__, ret);
+			goto done;
+		}
+		set_host_lcb_access(dd);
+	}
+	dd->lcb_access_count++;
+done:
+	mutex_unlock(&ppd->hls_lock);
+	return ret;
+}
+
+/*
+ * Release LCB access by decrementing the use count.  If the count is moving
+ * from 1 to 0, inform 8051 that it has control back.
+ *
+ * Returns:
+ *	0 on success
+ *	-errno if unable to release access to the 8051
+ */
+int release_lcb_access(struct hfi1_devdata *dd, int sleep_ok)
+{
+	int ret = 0;
+
+	/*
+	 * Use the host link state lock because the acquire needed it.
+	 * Here, we only need to keep { selector change, count decrement }
+	 * as a unit.
+	 */
+	if (sleep_ok) {
+		mutex_lock(&dd->pport->hls_lock);
+	} else {
+		while (mutex_trylock(&dd->pport->hls_lock) == EBUSY)
+			udelay(1);
+	}
+
+	if (dd->lcb_access_count == 0) {
+		dd_dev_err(dd, "%s: LCB access count is zero.  Skipping.\n",
+			__func__);
+		goto done;
+	}
+
+	if (dd->lcb_access_count == 1) {
+		set_8051_lcb_access(dd);
+		ret = request_8051_lcb_access(dd);
+		if (ret) {
+			dd_dev_err(dd,
+				"%s: unable to release LCB access, err %d\n",
+				__func__, ret);
+			/* restore host access if the grant didn't work */
+			set_host_lcb_access(dd);
+			goto done;
+		}
+	}
+	dd->lcb_access_count--;
+done:
+	mutex_unlock(&dd->pport->hls_lock);
+	return ret;
+}
+
+/*
+ * Initialize LCB access variables and state.  Called during driver load,
+ * after most of the initialization is finished.
+ *
+ * The DC default is LCB access on for the host.  The driver defaults to
+ * leaving access to the 8051.  Assign access now - this constrains the call
+ * to this routine to be after all LCB set-up is done.  In particular, after
+ * hf1_init_dd() -> set_up_interrupts() -> clear_all_interrupts()
+ */
+static void init_lcb_access(struct hfi1_devdata *dd)
+{
+	dd->lcb_access_count = 0;
+}
+
+/*
+ * Write a response back to a 8051 request.
+ */
+static void hreq_response(struct hfi1_devdata *dd, u8 return_code, u16 rsp_data)
+{
+	write_csr(dd, DC_DC8051_CFG_EXT_DEV_0,
+		DC_DC8051_CFG_EXT_DEV_0_COMPLETED_SMASK
+		| (u64)return_code << DC_DC8051_CFG_EXT_DEV_0_RETURN_CODE_SHIFT
+		| (u64)rsp_data << DC_DC8051_CFG_EXT_DEV_0_RSP_DATA_SHIFT);
+}
+
+/*
+ * Handle requests from the 8051.
+ */
+static void handle_8051_request(struct hfi1_devdata *dd)
+{
+	u64 reg;
+	u16 data;
+	u8 type;
+
+	reg = read_csr(dd, DC_DC8051_CFG_EXT_DEV_1);
+	if ((reg & DC_DC8051_CFG_EXT_DEV_1_REQ_NEW_SMASK) == 0)
+		return;	/* no request */
+
+	/* zero out COMPLETED so the response is seen */
+	write_csr(dd, DC_DC8051_CFG_EXT_DEV_0, 0);
+
+	/* extract request details */
+	type = (reg >> DC_DC8051_CFG_EXT_DEV_1_REQ_TYPE_SHIFT)
+			& DC_DC8051_CFG_EXT_DEV_1_REQ_TYPE_MASK;
+	data = (reg >> DC_DC8051_CFG_EXT_DEV_1_REQ_DATA_SHIFT)
+			& DC_DC8051_CFG_EXT_DEV_1_REQ_DATA_MASK;
+
+	switch (type) {
+	case HREQ_LOAD_CONFIG:
+	case HREQ_SAVE_CONFIG:
+	case HREQ_READ_CONFIG:
+	case HREQ_SET_TX_EQ_ABS:
+	case HREQ_SET_TX_EQ_REL:
+	case HREQ_ENABLE:
+		dd_dev_info(dd, "8051 request: request 0x%x not supported\n",
+			type);
+		hreq_response(dd, HREQ_NOT_SUPPORTED, 0);
+		break;
+
+	case HREQ_CONFIG_DONE:
+		hreq_response(dd, HREQ_SUCCESS, 0);
+		break;
+
+	case HREQ_INTERFACE_TEST:
+		hreq_response(dd, HREQ_SUCCESS, data);
+		break;
+
+	default:
+		dd_dev_err(dd, "8051 request: unknown request 0x%x\n", type);
+		hreq_response(dd, HREQ_NOT_SUPPORTED, 0);
+		break;
+	}
+}
+
+static void write_global_credit(struct hfi1_devdata *dd,
+				u8 vau, u16 total, u16 shared)
+{
+	write_csr(dd, SEND_CM_GLOBAL_CREDIT,
+		((u64)total
+			<< SEND_CM_GLOBAL_CREDIT_TOTAL_CREDIT_LIMIT_SHIFT)
+		| ((u64)shared
+			<< SEND_CM_GLOBAL_CREDIT_SHARED_LIMIT_SHIFT)
+		| ((u64)vau << SEND_CM_GLOBAL_CREDIT_AU_SHIFT));
+}
+
+/*
+ * Set up initial VL15 credits of the remote.  Assumes the rest of
+ * the CM credit registers are zero from a previous global or credit reset .
+ */
+void set_up_vl15(struct hfi1_devdata *dd, u8 vau, u16 vl15buf)
+{
+	/* leave shared count at zero for both global and VL15 */
+	write_global_credit(dd, vau, vl15buf, 0);
+
+	/* We may need some credits for another VL when sending packets
+	 * with the snoop interface. Dividing it down the middle for VL15
+	 * and VL0 should suffice.
+	 */
+	if (unlikely(dd->hfi1_snoop.mode_flag == HFI1_PORT_SNOOP_MODE)) {
+		write_csr(dd, SEND_CM_CREDIT_VL15, (u64)(vl15buf >> 1)
+		    << SEND_CM_CREDIT_VL15_DEDICATED_LIMIT_VL_SHIFT);
+		write_csr(dd, SEND_CM_CREDIT_VL, (u64)(vl15buf >> 1)
+		    << SEND_CM_CREDIT_VL_DEDICATED_LIMIT_VL_SHIFT);
+	} else {
+		write_csr(dd, SEND_CM_CREDIT_VL15, (u64)vl15buf
+			<< SEND_CM_CREDIT_VL15_DEDICATED_LIMIT_VL_SHIFT);
+	}
+}
+
+/*
+ * Zero all credit details from the previous connection and
+ * reset the CM manager's internal counters.
+ */
+void reset_link_credits(struct hfi1_devdata *dd)
+{
+	int i;
+
+	/* remove all previous VL credit limits */
+	for (i = 0; i < TXE_NUM_DATA_VL; i++)
+		write_csr(dd, SEND_CM_CREDIT_VL + (8*i), 0);
+	write_csr(dd, SEND_CM_CREDIT_VL15, 0);
+	write_global_credit(dd, 0, 0, 0);
+	/* reset the CM block */
+	pio_send_control(dd, PSC_CM_RESET);
+}
+
+/* convert a vCU to a CU */
+static u32 vcu_to_cu(u8 vcu)
+{
+	return 1 << vcu;
+}
+
+/* convert a CU to a vCU */
+static u8 cu_to_vcu(u32 cu)
+{
+	return ilog2(cu);
+}
+
+/* convert a vAU to an AU */
+static u32 vau_to_au(u8 vau)
+{
+	return 8 * (1 << vau);
+}
+
+static void set_linkup_defaults(struct hfi1_pportdata *ppd)
+{
+	ppd->sm_trap_qp = 0x0;
+	ppd->sa_qp = 0x1;
+}
+
+/*
+ * Graceful LCB shutdown.  This leaves the LCB FIFOs in reset.
+ */
+static void lcb_shutdown(struct hfi1_devdata *dd, int abort)
+{
+	u64 reg;
+
+	/* clear lcb run: LCB_CFG_RUN.EN = 0 */
+	write_csr(dd, DC_LCB_CFG_RUN, 0);
+	/* set tx fifo reset: LCB_CFG_TX_FIFOS_RESET.VAL = 1 */
+	write_csr(dd, DC_LCB_CFG_TX_FIFOS_RESET,
+		1ull << DC_LCB_CFG_TX_FIFOS_RESET_VAL_SHIFT);
+	/* set dcc reset csr: DCC_CFG_RESET.{reset_lcb,reset_rx_fpe} = 1 */
+	dd->lcb_err_en = read_csr(dd, DC_LCB_ERR_EN);
+	reg = read_csr(dd, DCC_CFG_RESET);
+	write_csr(dd, DCC_CFG_RESET,
+		reg
+		| (1ull << DCC_CFG_RESET_RESET_LCB_SHIFT)
+		| (1ull << DCC_CFG_RESET_RESET_RX_FPE_SHIFT));
+	(void) read_csr(dd, DCC_CFG_RESET); /* make sure the write completed */
+	if (!abort) {
+		udelay(1);    /* must hold for the longer of 16cclks or 20ns */
+		write_csr(dd, DCC_CFG_RESET, reg);
+		write_csr(dd, DC_LCB_ERR_EN, dd->lcb_err_en);
+	}
+}
+
+/*
+ * This routine should be called after the link has been transitioned to
+ * OFFLINE (OFFLINE state has the side effect of putting the SerDes into
+ * reset).
+ *
+ * The expectation is that the caller of this routine would have taken
+ * care of properly transitioning the link into the correct state.
+ */
+static void dc_shutdown(struct hfi1_devdata *dd)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&dd->dc8051_lock, flags);
+	if (dd->dc_shutdown) {
+		spin_unlock_irqrestore(&dd->dc8051_lock, flags);
+		return;
+	}
+	dd->dc_shutdown = 1;
+	spin_unlock_irqrestore(&dd->dc8051_lock, flags);
+	/* Shutdown the LCB */
+	lcb_shutdown(dd, 1);
+	/* Going to OFFLINE would have causes the 8051 to put the
+	 * SerDes into reset already. Just need to shut down the 8051,
+	 * itself. */
+	write_csr(dd, DC_DC8051_CFG_RST, 0x1);
+}
+
+/* Calling this after the DC has been brought out of reset should not
+ * do any damage. */
+static void dc_start(struct hfi1_devdata *dd)
+{
+	unsigned long flags;
+	int ret;
+
+	spin_lock_irqsave(&dd->dc8051_lock, flags);
+	if (!dd->dc_shutdown)
+		goto done;
+	spin_unlock_irqrestore(&dd->dc8051_lock, flags);
+	/* Take the 8051 out of reset */
+	write_csr(dd, DC_DC8051_CFG_RST, 0ull);
+	/* Wait until 8051 is ready */
+	ret = wait_fm_ready(dd, TIMEOUT_8051_START);
+	if (ret) {
+		dd_dev_err(dd, "%s: timeout starting 8051 firmware\n",
+			__func__);
+	}
+	/* Take away reset for LCB and RX FPE (set in lcb_shutdown). */
+	write_csr(dd, DCC_CFG_RESET, 0x10);
+	/* lcb_shutdown() with abort=1 does not restore these */
+	write_csr(dd, DC_LCB_ERR_EN, dd->lcb_err_en);
+	spin_lock_irqsave(&dd->dc8051_lock, flags);
+	dd->dc_shutdown = 0;
+done:
+	spin_unlock_irqrestore(&dd->dc8051_lock, flags);
+}
+
+/*
+ * These LCB adjustments are for the Aurora SerDes core in the FPGA.
+ */
+static void adjust_lcb_for_fpga_serdes(struct hfi1_devdata *dd)
+{
+	u64 rx_radr, tx_radr;
+	u32 version;
+
+	if (dd->icode != ICODE_FPGA_EMULATION)
+		return;
+
+	/*
+	 * These LCB defaults on emulator _s are good, nothing to do here:
+	 *	LCB_CFG_TX_FIFOS_RADR
+	 *	LCB_CFG_RX_FIFOS_RADR
+	 *	LCB_CFG_LN_DCLK
+	 *	LCB_CFG_IGNORE_LOST_RCLK
+	 */
+	if (is_emulator_s(dd))
+		return;
+	/* else this is _p */
+
+	version = emulator_rev(dd);
+	if (!is_a0(dd))
+		version = 0x2d;	/* all B0 use 0x2d or higher settings */
+
+	if (version <= 0x12) {
+		/* release 0x12 and below */
+
+		/*
+		 * LCB_CFG_RX_FIFOS_RADR.RST_VAL = 0x9
+		 * LCB_CFG_RX_FIFOS_RADR.OK_TO_JUMP_VAL = 0x9
+		 * LCB_CFG_RX_FIFOS_RADR.DO_NOT_JUMP_VAL = 0xa
+		 */
+		rx_radr =
+		      0xaull << DC_LCB_CFG_RX_FIFOS_RADR_DO_NOT_JUMP_VAL_SHIFT
+		    | 0x9ull << DC_LCB_CFG_RX_FIFOS_RADR_OK_TO_JUMP_VAL_SHIFT
+		    | 0x9ull << DC_LCB_CFG_RX_FIFOS_RADR_RST_VAL_SHIFT;
+		/*
+		 * LCB_CFG_TX_FIFOS_RADR.ON_REINIT = 0 (default)
+		 * LCB_CFG_TX_FIFOS_RADR.RST_VAL = 6
+		 */
+		tx_radr = 6ull << DC_LCB_CFG_TX_FIFOS_RADR_RST_VAL_SHIFT;
+	} else if (version <= 0x18) {
+		/* release 0x13 up to 0x18 */
+		/* LCB_CFG_RX_FIFOS_RADR = 0x988 */
+		rx_radr =
+		      0x9ull << DC_LCB_CFG_RX_FIFOS_RADR_DO_NOT_JUMP_VAL_SHIFT
+		    | 0x8ull << DC_LCB_CFG_RX_FIFOS_RADR_OK_TO_JUMP_VAL_SHIFT
+		    | 0x8ull << DC_LCB_CFG_RX_FIFOS_RADR_RST_VAL_SHIFT;
+		tx_radr = 7ull << DC_LCB_CFG_TX_FIFOS_RADR_RST_VAL_SHIFT;
+	} else if (version == 0x19) {
+		/* release 0x19 */
+		/* LCB_CFG_RX_FIFOS_RADR = 0xa99 */
+		rx_radr =
+		      0xAull << DC_LCB_CFG_RX_FIFOS_RADR_DO_NOT_JUMP_VAL_SHIFT
+		    | 0x9ull << DC_LCB_CFG_RX_FIFOS_RADR_OK_TO_JUMP_VAL_SHIFT
+		    | 0x9ull << DC_LCB_CFG_RX_FIFOS_RADR_RST_VAL_SHIFT;
+		tx_radr = 3ull << DC_LCB_CFG_TX_FIFOS_RADR_RST_VAL_SHIFT;
+	} else if (version == 0x1a) {
+		/* release 0x1a */
+		/* LCB_CFG_RX_FIFOS_RADR = 0x988 */
+		rx_radr =
+		      0x9ull << DC_LCB_CFG_RX_FIFOS_RADR_DO_NOT_JUMP_VAL_SHIFT
+		    | 0x8ull << DC_LCB_CFG_RX_FIFOS_RADR_OK_TO_JUMP_VAL_SHIFT
+		    | 0x8ull << DC_LCB_CFG_RX_FIFOS_RADR_RST_VAL_SHIFT;
+		tx_radr = 7ull << DC_LCB_CFG_TX_FIFOS_RADR_RST_VAL_SHIFT;
+		write_csr(dd, DC_LCB_CFG_LN_DCLK, 1ull);
+	} else {
+		/* release 0x1b and higher */
+		/* LCB_CFG_RX_FIFOS_RADR = 0x877 */
+		rx_radr =
+		      0x8ull << DC_LCB_CFG_RX_FIFOS_RADR_DO_NOT_JUMP_VAL_SHIFT
+		    | 0x7ull << DC_LCB_CFG_RX_FIFOS_RADR_OK_TO_JUMP_VAL_SHIFT
+		    | 0x7ull << DC_LCB_CFG_RX_FIFOS_RADR_RST_VAL_SHIFT;
+		tx_radr = 3ull << DC_LCB_CFG_TX_FIFOS_RADR_RST_VAL_SHIFT;
+	}
+
+	write_csr(dd, DC_LCB_CFG_RX_FIFOS_RADR, rx_radr);
+	/* LCB_CFG_IGNORE_LOST_RCLK.EN = 1 */
+	write_csr(dd, DC_LCB_CFG_IGNORE_LOST_RCLK,
+		DC_LCB_CFG_IGNORE_LOST_RCLK_EN_SMASK);
+	write_csr(dd, DC_LCB_CFG_TX_FIFOS_RADR, tx_radr);
+}
+
+/*
+ * Handle a SMA idle message
+ *
+ * This is a work-queue function outside of the interrupt.
+ */
+void handle_sma_message(struct work_struct *work)
+{
+	struct hfi1_pportdata *ppd = container_of(work, struct hfi1_pportdata,
+							sma_message_work);
+	struct hfi1_devdata *dd = ppd->dd;
+	u64 msg;
+	int ret;
+
+	/* msg is bytes 1-4 of the 40-bit idle message - the command code
+	   is stripped off */
+	ret = read_idle_sma(dd, &msg);
+	if (ret)
+		return;
+	dd_dev_info(dd, "%s: SMA message 0x%llx\n", __func__, msg);
+	/*
+	 * React to the SMA message.  Byte[1] (0 for us) is the command.
+	 */
+	switch (msg & 0xff) {
+	case SMA_IDLE_ARM:
+		/*
+		 * See OPAv1 table 9-14 - HFI and External Switch Ports Key
+		 * State Transitions
+		 *
+		 * Only expected in INIT or ARMED, discard otherwise.
+		 */
+		if (ppd->host_link_state & (HLS_UP_INIT | HLS_UP_ARMED))
+			ppd->neighbor_normal = 1;
+		break;
+	case SMA_IDLE_ACTIVE:
+		/*
+		 * See OPAv1 table 9-14 - HFI and External Switch Ports Key
+		 * State Transitions
+		 *
+		 * Can activate the node.  Discard otherwise.
+		 */
+		if (ppd->host_link_state == HLS_UP_ARMED
+					&& ppd->is_active_optimize_enabled) {
+			ppd->neighbor_normal = 1;
+			ret = set_link_state(ppd, HLS_UP_ACTIVE);
+			if (ret)
+				dd_dev_err(
+					dd,
+					"%s: received Active SMA idle message, couldn't set link to Active\n",
+					__func__);
+		}
+		break;
+	default:
+		dd_dev_err(dd,
+			"%s: received unexpected SMA idle message 0x%llx\n",
+			__func__, msg);
+		break;
+	}
+}
+
+static void adjust_rcvctrl(struct hfi1_devdata *dd, u64 add, u64 clear)
+{
+	u64 rcvctrl;
+	unsigned long flags;
+
+	spin_lock_irqsave(&dd->rcvctrl_lock, flags);
+	rcvctrl = read_csr(dd, RCV_CTRL);
+	rcvctrl |= add;
+	rcvctrl &= ~clear;
+	write_csr(dd, RCV_CTRL, rcvctrl);
+	spin_unlock_irqrestore(&dd->rcvctrl_lock, flags);
+}
+
+static inline void add_rcvctrl(struct hfi1_devdata *dd, u64 add)
+{
+	adjust_rcvctrl(dd, add, 0);
+}
+
+static inline void clear_rcvctrl(struct hfi1_devdata *dd, u64 clear)
+{
+	adjust_rcvctrl(dd, 0, clear);
+}
+
+/*
+ * Called from all interrupt handlers to start handling an SPC freeze.
+ */
+void start_freeze_handling(struct hfi1_pportdata *ppd, int flags)
+{
+	struct hfi1_devdata *dd = ppd->dd;
+	struct send_context *sc;
+	int i;
+
+	if (flags & FREEZE_SELF)
+		write_csr(dd, CCE_CTRL, CCE_CTRL_SPC_FREEZE_SMASK);
+
+	/* enter frozen mode */
+	dd->flags |= HFI1_FROZEN;
+
+	/* notify all SDMA engines that they are going into a freeze */
+	sdma_freeze_notify(dd, !!(flags & FREEZE_LINK_DOWN));
+
+	/* do halt pre-handling on all enabled send contexts */
+	for (i = 0; i < dd->num_send_contexts; i++) {
+		sc = dd->send_contexts[i].sc;
+		if (sc && (sc->flags & SCF_ENABLED))
+			sc_stop(sc, SCF_FROZEN | SCF_HALTED);
+	}
+
+	/* Send context are frozen. Notify user space */
+	hfi1_set_uevent_bits(ppd, _HFI1_EVENT_FROZEN_BIT);
+
+	if (flags & FREEZE_ABORT) {
+		dd_dev_err(dd,
+			   "Aborted freeze recovery. Please REBOOT system\n");
+		return;
+	}
+	/* queue non-interrupt handler */
+	queue_work(ppd->hfi1_wq, &ppd->freeze_work);
+}
+
+/*
+ * Wait until all 4 sub-blocks indicate that they have frozen or unfrozen,
+ * depending on the "freeze" parameter.
+ *
+ * No need to return an error if it times out, our only option
+ * is to proceed anyway.
+ */
+static void wait_for_freeze_status(struct hfi1_devdata *dd, int freeze)
+{
+	unsigned long timeout;
+	u64 reg;
+
+	timeout = jiffies + msecs_to_jiffies(FREEZE_STATUS_TIMEOUT);
+	while (1) {
+		reg = read_csr(dd, CCE_STATUS);
+		if (freeze) {
+			/* waiting until all indicators are set */
+			if ((reg & ALL_FROZE) == ALL_FROZE)
+				return;	/* all done */
+		} else {
+			/* waiting until all indicators are clear */
+			if ((reg & ALL_FROZE) == 0)
+				return; /* all done */
+		}
+
+		if (time_after(jiffies, timeout)) {
+			dd_dev_err(dd,
+				"Time out waiting for SPC %sfreeze, bits 0x%llx, expecting 0x%llx, continuing",
+				freeze ? "" : "un",
+				reg & ALL_FROZE,
+				freeze ? ALL_FROZE : 0ull);
+			return;
+		}
+		usleep_range(80, 120);
+	}
+}
+
+/*
+ * Do all freeze handling for the RXE block.
+ */
+static void rxe_freeze(struct hfi1_devdata *dd)
+{
+	int i;
+
+	/* disable port */
+	clear_rcvctrl(dd, RCV_CTRL_RCV_PORT_ENABLE_SMASK);
+
+	/* disable all receive contexts */
+	for (i = 0; i < dd->num_rcv_contexts; i++)
+		hfi1_rcvctrl(dd, HFI1_RCVCTRL_CTXT_DIS, i);
+}
+
+/*
+ * Unfreeze handling for the RXE block - kernel contexts only.
+ * This will also enable the port.  User contexts will do unfreeze
+ * handling on a per-context basis as they call into the driver.
+ *
+ */
+static void rxe_kernel_unfreeze(struct hfi1_devdata *dd)
+{
+	int i;
+
+	/* enable all kernel contexts */
+	for (i = 0; i < dd->n_krcv_queues; i++)
+		hfi1_rcvctrl(dd, HFI1_RCVCTRL_CTXT_ENB, i);
+
+	/* enable port */
+	add_rcvctrl(dd, RCV_CTRL_RCV_PORT_ENABLE_SMASK);
+}
+
+/*
+ * Non-interrupt SPC freeze handling.
+ *
+ * This is a work-queue function outside of the triggering interrupt.
+ */
+void handle_freeze(struct work_struct *work)
+{
+	struct hfi1_pportdata *ppd = container_of(work, struct hfi1_pportdata,
+								freeze_work);
+	struct hfi1_devdata *dd = ppd->dd;
+
+	/* wait for freeze indicators on all affected blocks */
+	dd_dev_info(dd, "Entering SPC freeze\n");
+	wait_for_freeze_status(dd, 1);
+
+	/* SPC is now frozen */
+
+	/* do send PIO freeze steps */
+	pio_freeze(dd);
+
+	/* do send DMA freeze steps */
+	sdma_freeze(dd);
+
+	/* do send egress freeze steps - nothing to do */
+
+	/* do receive freeze steps */
+	rxe_freeze(dd);
+
+	/*
+	 * Unfreeze the hardware - clear the freeze, wait for each
+	 * block's frozen bit to clear, then clear the frozen flag.
+	 */
+	write_csr(dd, CCE_CTRL, CCE_CTRL_SPC_UNFREEZE_SMASK);
+	wait_for_freeze_status(dd, 0);
+
+	if (is_a0(dd)) {
+		write_csr(dd, CCE_CTRL, CCE_CTRL_SPC_FREEZE_SMASK);
+		wait_for_freeze_status(dd, 1);
+		write_csr(dd, CCE_CTRL, CCE_CTRL_SPC_UNFREEZE_SMASK);
+		wait_for_freeze_status(dd, 0);
+	}
+
+	/* do send PIO unfreeze steps for kernel contexts */
+	pio_kernel_unfreeze(dd);
+
+	/* do send DMA unfreeze steps */
+	sdma_unfreeze(dd);
+
+	/* do send egress unfreeze steps - nothing to do */
+
+	/* do receive unfreeze steps for kernel contexts */
+	rxe_kernel_unfreeze(dd);
+
+	/*
+	 * The unfreeze procedure touches global device registers when
+	 * it disables and re-enables RXE. Mark the device unfrozen
+	 * after all that is done so other parts of the driver waiting
+	 * for the device to unfreeze don't do things out of order.
+	 *
+	 * The above implies that the meaning of HFI1_FROZEN flag is
+	 * "Device has gone into freeze mode and freeze mode handling
+	 * is still in progress."
+	 *
+	 * The flag will be removed when freeze mode processing has
+	 * completed.
+	 */
+	dd->flags &= ~HFI1_FROZEN;
+	wake_up(&dd->event_queue);
+
+	/* no longer frozen */
+	dd_dev_err(dd, "Exiting SPC freeze\n");
+}
+
+/*
+ * Handle a link up interrupt from the 8051.
+ *
+ * This is a work-queue function outside of the interrupt.
+ */
+void handle_link_up(struct work_struct *work)
+{
+	struct hfi1_pportdata *ppd = container_of(work, struct hfi1_pportdata,
+								link_up_work);
+	set_link_state(ppd, HLS_UP_INIT);
+
+	/* cache the read of DC_LCB_STS_ROUND_TRIP_LTP_CNT */
+	read_ltp_rtt(ppd->dd);
+	/*
+	 * OPA specifies that certain counters are cleared on a transition
+	 * to link up, so do that.
+	 */
+	clear_linkup_counters(ppd->dd);
+	/*
+	 * And (re)set link up default values.
+	 */
+	set_linkup_defaults(ppd);
+
+	/* enforce link speed enabled */
+	if ((ppd->link_speed_active & ppd->link_speed_enabled) == 0) {
+		/* oops - current speed is not enabled, bounce */
+		dd_dev_err(ppd->dd,
+			"Link speed active 0x%x is outside enabled 0x%x, downing link\n",
+			ppd->link_speed_active, ppd->link_speed_enabled);
+		set_link_down_reason(ppd, OPA_LINKDOWN_REASON_SPEED_POLICY, 0,
+			OPA_LINKDOWN_REASON_SPEED_POLICY);
+		set_link_state(ppd, HLS_DN_OFFLINE);
+		start_link(ppd);
+	}
+}
+
+/* Several pieces of LNI information were cached for SMA in ppd.
+ * Reset these on link down */
+static void reset_neighbor_info(struct hfi1_pportdata *ppd)
+{
+	ppd->neighbor_guid = 0;
+	ppd->neighbor_port_number = 0;
+	ppd->neighbor_type = 0;
+	ppd->neighbor_fm_security = 0;
+}
+
+/*
+ * Handle a link down interrupt from the 8051.
+ *
+ * This is a work-queue function outside of the interrupt.
+ */
+void handle_link_down(struct work_struct *work)
+{
+	u8 lcl_reason, neigh_reason = 0;
+	struct hfi1_pportdata *ppd = container_of(work, struct hfi1_pportdata,
+								link_down_work);
+
+	/* go offline first, then deal with reasons */
+	set_link_state(ppd, HLS_DN_OFFLINE);
+
+	lcl_reason = 0;
+	read_planned_down_reason_code(ppd->dd, &neigh_reason);
+
+	/*
+	 * If no reason, assume peer-initiated but missed
+	 * LinkGoingDown idle flits.
+	 */
+	if (neigh_reason == 0)
+		lcl_reason = OPA_LINKDOWN_REASON_NEIGHBOR_UNKNOWN;
+
+	set_link_down_reason(ppd, lcl_reason, neigh_reason, 0);
+
+	reset_neighbor_info(ppd);
+
+	/* disable the port */
+	clear_rcvctrl(ppd->dd, RCV_CTRL_RCV_PORT_ENABLE_SMASK);
+
+	/* If there is no cable attached, turn the DC off. Otherwise,
+	 * start the link bring up. */
+	if (!qsfp_mod_present(ppd))
+		dc_shutdown(ppd->dd);
+	else
+		start_link(ppd);
+}
+
+void handle_link_bounce(struct work_struct *work)
+{
+	struct hfi1_pportdata *ppd = container_of(work, struct hfi1_pportdata,
+							link_bounce_work);
+
+	/*
+	 * Only do something if the link is currently up.
+	 */
+	if (ppd->host_link_state & HLS_UP) {
+		set_link_state(ppd, HLS_DN_OFFLINE);
+		start_link(ppd);
+	} else {
+		dd_dev_info(ppd->dd, "%s: link not up (%s), nothing to do\n",
+			__func__, link_state_name(ppd->host_link_state));
+	}
+}
+
+/*
+ * Mask conversion: Capability exchange to Port LTP.  The capability
+ * exchange has an implicit 16b CRC that is mandatory.
+ */
+static int cap_to_port_ltp(int cap)
+{
+	int port_ltp = PORT_LTP_CRC_MODE_16; /* this mode is mandatory */
+
+	if (cap & CAP_CRC_14B)
+		port_ltp |= PORT_LTP_CRC_MODE_14;
+	if (cap & CAP_CRC_48B)
+		port_ltp |= PORT_LTP_CRC_MODE_48;
+	if (cap & CAP_CRC_12B_16B_PER_LANE)
+		port_ltp |= PORT_LTP_CRC_MODE_PER_LANE;
+
+	return port_ltp;
+}
+
+/*
+ * Convert an OPA Port LTP mask to capability mask
+ */
+int port_ltp_to_cap(int port_ltp)
+{
+	int cap_mask = 0;
+
+	if (port_ltp & PORT_LTP_CRC_MODE_14)
+		cap_mask |= CAP_CRC_14B;
+	if (port_ltp & PORT_LTP_CRC_MODE_48)
+		cap_mask |= CAP_CRC_48B;
+	if (port_ltp & PORT_LTP_CRC_MODE_PER_LANE)
+		cap_mask |= CAP_CRC_12B_16B_PER_LANE;
+
+	return cap_mask;
+}
+
+/*
+ * Convert a single DC LCB CRC mode to an OPA Port LTP mask.
+ */
+static int lcb_to_port_ltp(int lcb_crc)
+{
+	int port_ltp = 0;
+
+	if (lcb_crc == LCB_CRC_12B_16B_PER_LANE)
+		port_ltp = PORT_LTP_CRC_MODE_PER_LANE;
+	else if (lcb_crc == LCB_CRC_48B)
+		port_ltp = PORT_LTP_CRC_MODE_48;
+	else if (lcb_crc == LCB_CRC_14B)
+		port_ltp = PORT_LTP_CRC_MODE_14;
+	else
+		port_ltp = PORT_LTP_CRC_MODE_16;
+
+	return port_ltp;
+}
+
+/*
+ * Our neighbor has indicated that we are allowed to act as a fabric
+ * manager, so place the full management partition key in the second
+ * (0-based) pkey array position (see OPAv1, section 20.2.2.6.8). Note
+ * that we should already have the limited management partition key in
+ * array element 1, and also that the port is not yet up when
+ * add_full_mgmt_pkey() is invoked.
+ */
+static void add_full_mgmt_pkey(struct hfi1_pportdata *ppd)
+{
+	struct hfi1_devdata *dd = ppd->dd;
+
+	/* Sanity check - ppd->pkeys[2] should be 0 */
+	if (ppd->pkeys[2] != 0)
+		dd_dev_err(dd, "%s pkey[2] already set to 0x%x, resetting it to 0x%x\n",
+			   __func__, ppd->pkeys[2], FULL_MGMT_P_KEY);
+	ppd->pkeys[2] = FULL_MGMT_P_KEY;
+	(void)hfi1_set_ib_cfg(ppd, HFI1_IB_CFG_PKEYS, 0);
+}
+
+/*
+ * Convert the given link width to the OPA link width bitmask.
+ */
+static u16 link_width_to_bits(struct hfi1_devdata *dd, u16 width)
+{
+	switch (width) {
+	case 0:
+		/*
+		 * Simulator and quick linkup do not set the width.
+		 * Just set it to 4x without complaint.
+		 */
+		if (dd->icode == ICODE_FUNCTIONAL_SIMULATOR || quick_linkup)
+			return OPA_LINK_WIDTH_4X;
+		return 0; /* no lanes up */
+	case 1: return OPA_LINK_WIDTH_1X;
+	case 2: return OPA_LINK_WIDTH_2X;
+	case 3: return OPA_LINK_WIDTH_3X;
+	default:
+		dd_dev_info(dd, "%s: invalid width %d, using 4\n",
+			__func__, width);
+		/* fall through */
+	case 4: return OPA_LINK_WIDTH_4X;
+	}
+}
+
+/*
+ * Do a population count on the bottom nibble.
+ */
+static const u8 bit_counts[16] = {
+	0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4
+};
+static inline u8 nibble_to_count(u8 nibble)
+{
+	return bit_counts[nibble & 0xf];
+}
+
+/*
+ * Read the active lane information from the 8051 registers and return
+ * their widths.
+ *
+ * Active lane information is found in these 8051 registers:
+ *	enable_lane_tx
+ *	enable_lane_rx
+ */
+static void get_link_widths(struct hfi1_devdata *dd, u16 *tx_width,
+			    u16 *rx_width)
+{
+	u16 tx, rx;
+	u8 enable_lane_rx;
+	u8 enable_lane_tx;
+	u8 tx_polarity_inversion;
+	u8 rx_polarity_inversion;
+	u8 max_rate;
+
+	/* read the active lanes */
+	read_tx_settings(dd, &enable_lane_tx, &tx_polarity_inversion,
+				&rx_polarity_inversion, &max_rate);
+	read_local_lni(dd, &enable_lane_rx);
+
+	/* convert to counts */
+	tx = nibble_to_count(enable_lane_tx);
+	rx = nibble_to_count(enable_lane_rx);
+
+	/*
+	 * Set link_speed_active here, overriding what was set in
+	 * handle_verify_cap().  The ASIC 8051 firmware does not correctly
+	 * set the max_rate field in handle_verify_cap until v0.19.
+	 */
+	if ((dd->icode == ICODE_RTL_SILICON)
+				&& (dd->dc8051_ver < dc8051_ver(0, 19))) {
+		/* max_rate: 0 = 12.5G, 1 = 25G */
+		switch (max_rate) {
+		case 0:
+			dd->pport[0].link_speed_active = OPA_LINK_SPEED_12_5G;
+			break;
+		default:
+			dd_dev_err(dd,
+				"%s: unexpected max rate %d, using 25Gb\n",
+				__func__, (int)max_rate);
+			/* fall through */
+		case 1:
+			dd->pport[0].link_speed_active = OPA_LINK_SPEED_25G;
+			break;
+		}
+	}
+
+	dd_dev_info(dd,
+		"Fabric active lanes (width): tx 0x%x (%d), rx 0x%x (%d)\n",
+		enable_lane_tx, tx, enable_lane_rx, rx);
+	*tx_width = link_width_to_bits(dd, tx);
+	*rx_width = link_width_to_bits(dd, rx);
+}
+
+/*
+ * Read verify_cap_local_fm_link_width[1] to obtain the link widths.
+ * Valid after the end of VerifyCap and during LinkUp.  Does not change
+ * after link up.  I.e. look elsewhere for downgrade information.
+ *
+ * Bits are:
+ *	+ bits [7:4] contain the number of active transmitters
+ *	+ bits [3:0] contain the number of active receivers
+ * These are numbers 1 through 4 and can be different values if the
+ * link is asymmetric.
+ *
+ * verify_cap_local_fm_link_width[0] retains its original value.
+ */
+static void get_linkup_widths(struct hfi1_devdata *dd, u16 *tx_width,
+			      u16 *rx_width)
+{
+	u16 widths, tx, rx;
+	u8 misc_bits, local_flags;
+	u16 active_tx, active_rx;
+
+	read_vc_local_link_width(dd, &misc_bits, &local_flags, &widths);
+	tx = widths >> 12;
+	rx = (widths >> 8) & 0xf;
+
+	*tx_width = link_width_to_bits(dd, tx);
+	*rx_width = link_width_to_bits(dd, rx);
+
+	/* print the active widths */
+	get_link_widths(dd, &active_tx, &active_rx);
+}
+
+/*
+ * Set ppd->link_width_active and ppd->link_width_downgrade_active using
+ * hardware information when the link first comes up.
+ *
+ * The link width is not available until after VerifyCap.AllFramesReceived
+ * (the trigger for handle_verify_cap), so this is outside that routine
+ * and should be called when the 8051 signals linkup.
+ */
+void get_linkup_link_widths(struct hfi1_pportdata *ppd)
+{
+	u16 tx_width, rx_width;
+
+	/* get end-of-LNI link widths */
+	get_linkup_widths(ppd->dd, &tx_width, &rx_width);
+
+	/* use tx_width as the link is supposed to be symmetric on link up */
+	ppd->link_width_active = tx_width;
+	/* link width downgrade active (LWD.A) starts out matching LW.A */
+	ppd->link_width_downgrade_tx_active = ppd->link_width_active;
+	ppd->link_width_downgrade_rx_active = ppd->link_width_active;
+	/* per OPA spec, on link up LWD.E resets to LWD.S */
+	ppd->link_width_downgrade_enabled = ppd->link_width_downgrade_supported;
+	/* cache the active egress rate (units {10^6 bits/sec]) */
+	ppd->current_egress_rate = active_egress_rate(ppd);
+}
+
+/*
+ * Handle a verify capabilities interrupt from the 8051.
+ *
+ * This is a work-queue function outside of the interrupt.
+ */
+void handle_verify_cap(struct work_struct *work)
+{
+	struct hfi1_pportdata *ppd = container_of(work, struct hfi1_pportdata,
+								link_vc_work);
+	struct hfi1_devdata *dd = ppd->dd;
+	u64 reg;
+	u8 power_management;
+	u8 continious;
+	u8 vcu;
+	u8 vau;
+	u8 z;
+	u16 vl15buf;
+	u16 link_widths;
+	u16 crc_mask;
+	u16 crc_val;
+	u16 device_id;
+	u16 active_tx, active_rx;
+	u8 partner_supported_crc;
+	u8 remote_tx_rate;
+	u8 device_rev;
+
+	set_link_state(ppd, HLS_VERIFY_CAP);
+
+	lcb_shutdown(dd, 0);
+	adjust_lcb_for_fpga_serdes(dd);
+
+	/*
+	 * These are now valid:
+	 *	remote VerifyCap fields in the general LNI config
+	 *	CSR DC8051_STS_REMOTE_GUID
+	 *	CSR DC8051_STS_REMOTE_NODE_TYPE
+	 *	CSR DC8051_STS_REMOTE_FM_SECURITY
+	 *	CSR DC8051_STS_REMOTE_PORT_NO
+	 */
+
+	read_vc_remote_phy(dd, &power_management, &continious);
+	read_vc_remote_fabric(
+		dd,
+		&vau,
+		&z,
+		&vcu,
+		&vl15buf,
+		&partner_supported_crc);
+	read_vc_remote_link_width(dd, &remote_tx_rate, &link_widths);
+	read_remote_device_id(dd, &device_id, &device_rev);
+	/*
+	 * And the 'MgmtAllowed' information, which is exchanged during
+	 * LNI, is also be available at this point.
+	 */
+	read_mgmt_allowed(dd, &ppd->mgmt_allowed);
+	/* print the active widths */
+	get_link_widths(dd, &active_tx, &active_rx);
+	dd_dev_info(dd,
+		"Peer PHY: power management 0x%x, continuous updates 0x%x\n",
+		(int)power_management, (int)continious);
+	dd_dev_info(dd,
+		"Peer Fabric: vAU %d, Z %d, vCU %d, vl15 credits 0x%x, CRC sizes 0x%x\n",
+		(int)vau,
+		(int)z,
+		(int)vcu,
+		(int)vl15buf,
+		(int)partner_supported_crc);
+	dd_dev_info(dd, "Peer Link Width: tx rate 0x%x, widths 0x%x\n",
+		(u32)remote_tx_rate, (u32)link_widths);
+	dd_dev_info(dd, "Peer Device ID: 0x%04x, Revision 0x%02x\n",
+		(u32)device_id, (u32)device_rev);
+	/*
+	 * The peer vAU value just read is the peer receiver value.  HFI does
+	 * not support a transmit vAU of 0 (AU == 8).  We advertised that
+	 * with Z=1 in the fabric capabilities sent to the peer.  The peer
+	 * will see our Z=1, and, if it advertised a vAU of 0, will move its
+	 * receive to vAU of 1 (AU == 16).  Do the same here.  We do not care
+	 * about the peer Z value - our sent vAU is 3 (hardwired) and is not
+	 * subject to the Z value exception.
+	 */
+	if (vau == 0)
+		vau = 1;
+	set_up_vl15(dd, vau, vl15buf);
+
+	/* set up the LCB CRC mode */
+	crc_mask = ppd->port_crc_mode_enabled & partner_supported_crc;
+
+	/* order is important: use the lowest bit in common */
+	if (crc_mask & CAP_CRC_14B)
+		crc_val = LCB_CRC_14B;
+	else if (crc_mask & CAP_CRC_48B)
+		crc_val = LCB_CRC_48B;
+	else if (crc_mask & CAP_CRC_12B_16B_PER_LANE)
+		crc_val = LCB_CRC_12B_16B_PER_LANE;
+	else
+		crc_val = LCB_CRC_16B;
+
+	dd_dev_info(dd, "Final LCB CRC mode: %d\n", (int)crc_val);
+	write_csr(dd, DC_LCB_CFG_CRC_MODE,
+		  (u64)crc_val << DC_LCB_CFG_CRC_MODE_TX_VAL_SHIFT);
+
+	/* set (14b only) or clear sideband credit */
+	reg = read_csr(dd, SEND_CM_CTRL);
+	if (crc_val == LCB_CRC_14B && crc_14b_sideband) {
+		write_csr(dd, SEND_CM_CTRL,
+			reg | SEND_CM_CTRL_FORCE_CREDIT_MODE_SMASK);
+	} else {
+		write_csr(dd, SEND_CM_CTRL,
+			reg & ~SEND_CM_CTRL_FORCE_CREDIT_MODE_SMASK);
+	}
+
+	ppd->link_speed_active = 0;	/* invalid value */
+	if (dd->dc8051_ver < dc8051_ver(0, 20)) {
+		/* remote_tx_rate: 0 = 12.5G, 1 = 25G */
+		switch (remote_tx_rate) {
+		case 0:
+			ppd->link_speed_active = OPA_LINK_SPEED_12_5G;
+			break;
+		case 1:
+			ppd->link_speed_active = OPA_LINK_SPEED_25G;
+			break;
+		}
+	} else {
+		/* actual rate is highest bit of the ANDed rates */
+		u8 rate = remote_tx_rate & ppd->local_tx_rate;
+
+		if (rate & 2)
+			ppd->link_speed_active = OPA_LINK_SPEED_25G;
+		else if (rate & 1)
+			ppd->link_speed_active = OPA_LINK_SPEED_12_5G;
+	}
+	if (ppd->link_speed_active == 0) {
+		dd_dev_err(dd, "%s: unexpected remote tx rate %d, using 25Gb\n",
+			__func__, (int)remote_tx_rate);
+		ppd->link_speed_active = OPA_LINK_SPEED_25G;
+	}
+
+	/*
+	 * Cache the values of the supported, enabled, and active
+	 * LTP CRC modes to return in 'portinfo' queries. But the bit
+	 * flags that are returned in the portinfo query differ from
+	 * what's in the link_crc_mask, crc_sizes, and crc_val
+	 * variables. Convert these here.
+	 */
+	ppd->port_ltp_crc_mode = cap_to_port_ltp(link_crc_mask) << 8;
+		/* supported crc modes */
+	ppd->port_ltp_crc_mode |=
+		cap_to_port_ltp(ppd->port_crc_mode_enabled) << 4;
+		/* enabled crc modes */
+	ppd->port_ltp_crc_mode |= lcb_to_port_ltp(crc_val);
+		/* active crc mode */
+
+	/* set up the remote credit return table */
+	assign_remote_cm_au_table(dd, vcu);
+
+	/*
+	 * The LCB is reset on entry to handle_verify_cap(), so this must
+	 * be applied on every link up.
+	 *
+	 * Adjust LCB error kill enable to kill the link if
+	 * these RBUF errors are seen:
+	 *	REPLAY_BUF_MBE_SMASK
+	 *	FLIT_INPUT_BUF_MBE_SMASK
+	 */
+	if (is_a0(dd)) {			/* fixed in B0 */
+		reg = read_csr(dd, DC_LCB_CFG_LINK_KILL_EN);
+		reg |= DC_LCB_CFG_LINK_KILL_EN_REPLAY_BUF_MBE_SMASK
+			| DC_LCB_CFG_LINK_KILL_EN_FLIT_INPUT_BUF_MBE_SMASK;
+		write_csr(dd, DC_LCB_CFG_LINK_KILL_EN, reg);
+	}
+
+	/* pull LCB fifos out of reset - all fifo clocks must be stable */
+	write_csr(dd, DC_LCB_CFG_TX_FIFOS_RESET, 0);
+
+	/* give 8051 access to the LCB CSRs */
+	write_csr(dd, DC_LCB_ERR_EN, 0); /* mask LCB errors */
+	set_8051_lcb_access(dd);
+
+	ppd->neighbor_guid =
+		read_csr(dd, DC_DC8051_STS_REMOTE_GUID);
+	ppd->neighbor_port_number = read_csr(dd, DC_DC8051_STS_REMOTE_PORT_NO) &
+					DC_DC8051_STS_REMOTE_PORT_NO_VAL_SMASK;
+	ppd->neighbor_type =
+		read_csr(dd, DC_DC8051_STS_REMOTE_NODE_TYPE) &
+		DC_DC8051_STS_REMOTE_NODE_TYPE_VAL_MASK;
+	ppd->neighbor_fm_security =
+		read_csr(dd, DC_DC8051_STS_REMOTE_FM_SECURITY) &
+		DC_DC8051_STS_LOCAL_FM_SECURITY_DISABLED_MASK;
+	dd_dev_info(dd,
+		"Neighbor Guid: %llx Neighbor type %d MgmtAllowed %d FM security bypass %d\n",
+		ppd->neighbor_guid, ppd->neighbor_type,
+		ppd->mgmt_allowed, ppd->neighbor_fm_security);
+	if (ppd->mgmt_allowed)
+		add_full_mgmt_pkey(ppd);
+
+	/* tell the 8051 to go to LinkUp */
+	set_link_state(ppd, HLS_GOING_UP);
+}
+
+/*
+ * Apply the link width downgrade enabled policy against the current active
+ * link widths.
+ *
+ * Called when the enabled policy changes or the active link widths change.
+ */
+void apply_link_downgrade_policy(struct hfi1_pportdata *ppd, int refresh_widths)
+{
+	int skip = 1;
+	int do_bounce = 0;
+	u16 lwde = ppd->link_width_downgrade_enabled;
+	u16 tx, rx;
+
+	mutex_lock(&ppd->hls_lock);
+	/* only apply if the link is up */
+	if (ppd->host_link_state & HLS_UP)
+		skip = 0;
+	mutex_unlock(&ppd->hls_lock);
+	if (skip)
+		return;
+
+	if (refresh_widths) {
+		get_link_widths(ppd->dd, &tx, &rx);
+		ppd->link_width_downgrade_tx_active = tx;
+		ppd->link_width_downgrade_rx_active = rx;
+	}
+
+	if (lwde == 0) {
+		/* downgrade is disabled */
+
+		/* bounce if not at starting active width */
+		if ((ppd->link_width_active !=
+					ppd->link_width_downgrade_tx_active)
+				|| (ppd->link_width_active !=
+					ppd->link_width_downgrade_rx_active)) {
+			dd_dev_err(ppd->dd,
+				"Link downgrade is disabled and link has downgraded, downing link\n");
+			dd_dev_err(ppd->dd,
+				"  original 0x%x, tx active 0x%x, rx active 0x%x\n",
+				ppd->link_width_active,
+				ppd->link_width_downgrade_tx_active,
+				ppd->link_width_downgrade_rx_active);
+			do_bounce = 1;
+		}
+	} else if ((lwde & ppd->link_width_downgrade_tx_active) == 0
+		|| (lwde & ppd->link_width_downgrade_rx_active) == 0) {
+		/* Tx or Rx is outside the enabled policy */
+		dd_dev_err(ppd->dd,
+			"Link is outside of downgrade allowed, downing link\n");
+		dd_dev_err(ppd->dd,
+			"  enabled 0x%x, tx active 0x%x, rx active 0x%x\n",
+			lwde,
+			ppd->link_width_downgrade_tx_active,
+			ppd->link_width_downgrade_rx_active);
+		do_bounce = 1;
+	}
+
+	if (do_bounce) {
+		set_link_down_reason(ppd, OPA_LINKDOWN_REASON_WIDTH_POLICY, 0,
+		  OPA_LINKDOWN_REASON_WIDTH_POLICY);
+		set_link_state(ppd, HLS_DN_OFFLINE);
+		start_link(ppd);
+	}
+}
+
+/*
+ * Handle a link downgrade interrupt from the 8051.
+ *
+ * This is a work-queue function outside of the interrupt.
+ */
+void handle_link_downgrade(struct work_struct *work)
+{
+	struct hfi1_pportdata *ppd = container_of(work, struct hfi1_pportdata,
+							link_downgrade_work);
+
+	dd_dev_info(ppd->dd, "8051: Link width downgrade\n");
+	apply_link_downgrade_policy(ppd, 1);
+}
+
+static char *dcc_err_string(char *buf, int buf_len, u64 flags)
+{
+	return flag_string(buf, buf_len, flags, dcc_err_flags,
+		ARRAY_SIZE(dcc_err_flags));
+}
+
+static char *lcb_err_string(char *buf, int buf_len, u64 flags)
+{
+	return flag_string(buf, buf_len, flags, lcb_err_flags,
+		ARRAY_SIZE(lcb_err_flags));
+}
+
+static char *dc8051_err_string(char *buf, int buf_len, u64 flags)
+{
+	return flag_string(buf, buf_len, flags, dc8051_err_flags,
+		ARRAY_SIZE(dc8051_err_flags));
+}
+
+static char *dc8051_info_err_string(char *buf, int buf_len, u64 flags)
+{
+	return flag_string(buf, buf_len, flags, dc8051_info_err_flags,
+		ARRAY_SIZE(dc8051_info_err_flags));
+}
+
+static char *dc8051_info_host_msg_string(char *buf, int buf_len, u64 flags)
+{
+	return flag_string(buf, buf_len, flags, dc8051_info_host_msg_flags,
+		ARRAY_SIZE(dc8051_info_host_msg_flags));
+}
+
+static void handle_8051_interrupt(struct hfi1_devdata *dd, u32 unused, u64 reg)
+{
+	struct hfi1_pportdata *ppd = dd->pport;
+	u64 info, err, host_msg;
+	int queue_link_down = 0;
+	char buf[96];
+
+	/* look at the flags */
+	if (reg & DC_DC8051_ERR_FLG_SET_BY_8051_SMASK) {
+		/* 8051 information set by firmware */
+		/* read DC8051_DBG_ERR_INFO_SET_BY_8051 for details */
+		info = read_csr(dd, DC_DC8051_DBG_ERR_INFO_SET_BY_8051);
+		err = (info >> DC_DC8051_DBG_ERR_INFO_SET_BY_8051_ERROR_SHIFT)
+			& DC_DC8051_DBG_ERR_INFO_SET_BY_8051_ERROR_MASK;
+		host_msg = (info >>
+			DC_DC8051_DBG_ERR_INFO_SET_BY_8051_HOST_MSG_SHIFT)
+			& DC_DC8051_DBG_ERR_INFO_SET_BY_8051_HOST_MSG_MASK;
+
+		/*
+		 * Handle error flags.
+		 */
+		if (err & FAILED_LNI) {
+			/*
+			 * LNI error indications are cleared by the 8051
+			 * only when starting polling.  Only pay attention
+			 * to them when in the states that occur during
+			 * LNI.
+			 */
+			if (ppd->host_link_state
+			    & (HLS_DN_POLL | HLS_VERIFY_CAP | HLS_GOING_UP)) {
+				queue_link_down = 1;
+				dd_dev_info(dd, "Link error: %s\n",
+					dc8051_info_err_string(buf,
+						sizeof(buf),
+						err & FAILED_LNI));
+			}
+			err &= ~(u64)FAILED_LNI;
+		}
+		if (err) {
+			/* report remaining errors, but do not do anything */
+			dd_dev_err(dd, "8051 info error: %s\n",
+				dc8051_info_err_string(buf, sizeof(buf), err));
+		}
+
+		/*
+		 * Handle host message flags.
+		 */
+		if (host_msg & HOST_REQ_DONE) {
+			/*
+			 * Presently, the driver does a busy wait for
+			 * host requests to complete.  This is only an
+			 * informational message.
+			 * NOTE: The 8051 clears the host message
+			 * information *on the next 8051 command*.
+			 * Therefore, when linkup is achieved,
+			 * this flag will still be set.
+			 */
+			host_msg &= ~(u64)HOST_REQ_DONE;
+		}
+		if (host_msg & BC_SMA_MSG) {
+			queue_work(ppd->hfi1_wq, &ppd->sma_message_work);
+			host_msg &= ~(u64)BC_SMA_MSG;
+		}
+		if (host_msg & LINKUP_ACHIEVED) {
+			dd_dev_info(dd, "8051: Link up\n");
+			queue_work(ppd->hfi1_wq, &ppd->link_up_work);
+			host_msg &= ~(u64)LINKUP_ACHIEVED;
+		}
+		if (host_msg & EXT_DEVICE_CFG_REQ) {
+			handle_8051_request(dd);
+			host_msg &= ~(u64)EXT_DEVICE_CFG_REQ;
+		}
+		if (host_msg & VERIFY_CAP_FRAME) {
+			queue_work(ppd->hfi1_wq, &ppd->link_vc_work);
+			host_msg &= ~(u64)VERIFY_CAP_FRAME;
+		}
+		if (host_msg & LINK_GOING_DOWN) {
+			const char *extra = "";
+			/* no downgrade action needed if going down */
+			if (host_msg & LINK_WIDTH_DOWNGRADED) {
+				host_msg &= ~(u64)LINK_WIDTH_DOWNGRADED;
+				extra = " (ignoring downgrade)";
+			}
+			dd_dev_info(dd, "8051: Link down%s\n", extra);
+			queue_link_down = 1;
+			host_msg &= ~(u64)LINK_GOING_DOWN;
+		}
+		if (host_msg & LINK_WIDTH_DOWNGRADED) {
+			queue_work(ppd->hfi1_wq, &ppd->link_downgrade_work);
+			host_msg &= ~(u64)LINK_WIDTH_DOWNGRADED;
+		}
+		if (host_msg) {
+			/* report remaining messages, but do not do anything */
+			dd_dev_info(dd, "8051 info host message: %s\n",
+				dc8051_info_host_msg_string(buf, sizeof(buf),
+					host_msg));
+		}
+
+		reg &= ~DC_DC8051_ERR_FLG_SET_BY_8051_SMASK;
+	}
+	if (reg & DC_DC8051_ERR_FLG_LOST_8051_HEART_BEAT_SMASK) {
+		/*
+		 * Lost the 8051 heartbeat.  If this happens, we
+		 * receive constant interrupts about it.  Disable
+		 * the interrupt after the first.
+		 */
+		dd_dev_err(dd, "Lost 8051 heartbeat\n");
+		write_csr(dd, DC_DC8051_ERR_EN,
+			read_csr(dd, DC_DC8051_ERR_EN)
+			  & ~DC_DC8051_ERR_EN_LOST_8051_HEART_BEAT_SMASK);
+
+		reg &= ~DC_DC8051_ERR_FLG_LOST_8051_HEART_BEAT_SMASK;
+	}
+	if (reg) {
+		/* report the error, but do not do anything */
+		dd_dev_err(dd, "8051 error: %s\n",
+			dc8051_err_string(buf, sizeof(buf), reg));
+	}
+
+	if (queue_link_down) {
+		/* if the link is already going down or disabled, do not
+		 * queue another */
+		if ((ppd->host_link_state
+				    & (HLS_GOING_OFFLINE|HLS_LINK_COOLDOWN))
+				|| ppd->link_enabled == 0) {
+			dd_dev_info(dd, "%s: not queuing link down\n",
+				__func__);
+		} else {
+			queue_work(ppd->hfi1_wq, &ppd->link_down_work);
+		}
+	}
+}
+
+static const char * const fm_config_txt[] = {
+[0] =
+	"BadHeadDist: Distance violation between two head flits",
+[1] =
+	"BadTailDist: Distance violation between two tail flits",
+[2] =
+	"BadCtrlDist: Distance violation between two credit control flits",
+[3] =
+	"BadCrdAck: Credits return for unsupported VL",
+[4] =
+	"UnsupportedVLMarker: Received VL Marker",
+[5] =
+	"BadPreempt: Exceeded the preemption nesting level",
+[6] =
+	"BadControlFlit: Received unsupported control flit",
+/* no 7 */
+[8] =
+	"UnsupportedVLMarker: Received VL Marker for unconfigured or disabled VL",
+};
+
+static const char * const port_rcv_txt[] = {
+[1] =
+	"BadPktLen: Illegal PktLen",
+[2] =
+	"PktLenTooLong: Packet longer than PktLen",
+[3] =
+	"PktLenTooShort: Packet shorter than PktLen",
+[4] =
+	"BadSLID: Illegal SLID (0, using multicast as SLID, does not include security validation of SLID)",
+[5] =
+	"BadDLID: Illegal DLID (0, doesn't match HFI)",
+[6] =
+	"BadL2: Illegal L2 opcode",
+[7] =
+	"BadSC: Unsupported SC",
+[9] =
+	"BadRC: Illegal RC",
+[11] =
+	"PreemptError: Preempting with same VL",
+[12] =
+	"PreemptVL15: Preempting a VL15 packet",
+};
+
+#define OPA_LDR_FMCONFIG_OFFSET 16
+#define OPA_LDR_PORTRCV_OFFSET 0
+static void handle_dcc_err(struct hfi1_devdata *dd, u32 unused, u64 reg)
+{
+	u64 info, hdr0, hdr1;
+	const char *extra;
+	char buf[96];
+	struct hfi1_pportdata *ppd = dd->pport;
+	u8 lcl_reason = 0;
+	int do_bounce = 0;
+
+	if (reg & DCC_ERR_FLG_UNCORRECTABLE_ERR_SMASK) {
+		if (!(dd->err_info_uncorrectable & OPA_EI_STATUS_SMASK)) {
+			info = read_csr(dd, DCC_ERR_INFO_UNCORRECTABLE);
+			dd->err_info_uncorrectable = info & OPA_EI_CODE_SMASK;
+			/* set status bit */
+			dd->err_info_uncorrectable |= OPA_EI_STATUS_SMASK;
+		}
+		reg &= ~DCC_ERR_FLG_UNCORRECTABLE_ERR_SMASK;
+	}
+
+	if (reg & DCC_ERR_FLG_LINK_ERR_SMASK) {
+		struct hfi1_pportdata *ppd = dd->pport;
+		/* this counter saturates at (2^32) - 1 */
+		if (ppd->link_downed < (u32)UINT_MAX)
+			ppd->link_downed++;
+		reg &= ~DCC_ERR_FLG_LINK_ERR_SMASK;
+	}
+
+	if (reg & DCC_ERR_FLG_FMCONFIG_ERR_SMASK) {
+		u8 reason_valid = 1;
+
+		info = read_csr(dd, DCC_ERR_INFO_FMCONFIG);
+		if (!(dd->err_info_fmconfig & OPA_EI_STATUS_SMASK)) {
+			dd->err_info_fmconfig = info & OPA_EI_CODE_SMASK;
+			/* set status bit */
+			dd->err_info_fmconfig |= OPA_EI_STATUS_SMASK;
+		}
+		switch (info) {
+		case 0:
+		case 1:
+		case 2:
+		case 3:
+		case 4:
+		case 5:
+		case 6:
+			extra = fm_config_txt[info];
+			break;
+		case 8:
+			extra = fm_config_txt[info];
+			if (ppd->port_error_action &
+			    OPA_PI_MASK_FM_CFG_UNSUPPORTED_VL_MARKER) {
+				do_bounce = 1;
+				/*
+				 * lcl_reason cannot be derived from info
+				 * for this error
+				 */
+				lcl_reason =
+				  OPA_LINKDOWN_REASON_UNSUPPORTED_VL_MARKER;
+			}
+			break;
+		default:
+			reason_valid = 0;
+			snprintf(buf, sizeof(buf), "reserved%lld", info);
+			extra = buf;
+			break;
+		}
+
+		if (reason_valid && !do_bounce) {
+			do_bounce = ppd->port_error_action &
+					(1 << (OPA_LDR_FMCONFIG_OFFSET + info));
+			lcl_reason = info + OPA_LINKDOWN_REASON_BAD_HEAD_DIST;
+		}
+
+		/* just report this */
+		dd_dev_info(dd, "DCC Error: fmconfig error: %s\n", extra);
+		reg &= ~DCC_ERR_FLG_FMCONFIG_ERR_SMASK;
+	}
+
+	if (reg & DCC_ERR_FLG_RCVPORT_ERR_SMASK) {
+		u8 reason_valid = 1;
+
+		info = read_csr(dd, DCC_ERR_INFO_PORTRCV);
+		hdr0 = read_csr(dd, DCC_ERR_INFO_PORTRCV_HDR0);
+		hdr1 = read_csr(dd, DCC_ERR_INFO_PORTRCV_HDR1);
+		if (!(dd->err_info_rcvport.status_and_code &
+		      OPA_EI_STATUS_SMASK)) {
+			dd->err_info_rcvport.status_and_code =
+				info & OPA_EI_CODE_SMASK;
+			/* set status bit */
+			dd->err_info_rcvport.status_and_code |=
+				OPA_EI_STATUS_SMASK;
+			/* save first 2 flits in the packet that caused
+			 * the error */
+			 dd->err_info_rcvport.packet_flit1 = hdr0;
+			 dd->err_info_rcvport.packet_flit2 = hdr1;
+		}
+		switch (info) {
+		case 1:
+		case 2:
+		case 3:
+		case 4:
+		case 5:
+		case 6:
+		case 7:
+		case 9:
+		case 11:
+		case 12:
+			extra = port_rcv_txt[info];
+			break;
+		default:
+			reason_valid = 0;
+			snprintf(buf, sizeof(buf), "reserved%lld", info);
+			extra = buf;
+			break;
+		}
+
+		if (reason_valid && !do_bounce) {
+			do_bounce = ppd->port_error_action &
+					(1 << (OPA_LDR_PORTRCV_OFFSET + info));
+			lcl_reason = info + OPA_LINKDOWN_REASON_RCV_ERROR_0;
+		}
+
+		/* just report this */
+		dd_dev_info(dd, "DCC Error: PortRcv error: %s\n", extra);
+		dd_dev_info(dd, "           hdr0 0x%llx, hdr1 0x%llx\n",
+			hdr0, hdr1);
+
+		reg &= ~DCC_ERR_FLG_RCVPORT_ERR_SMASK;
+	}
+
+	if (reg & DCC_ERR_FLG_EN_CSR_ACCESS_BLOCKED_UC_SMASK) {
+		/* informative only */
+		dd_dev_info(dd, "8051 access to LCB blocked\n");
+		reg &= ~DCC_ERR_FLG_EN_CSR_ACCESS_BLOCKED_UC_SMASK;
+	}
+	if (reg & DCC_ERR_FLG_EN_CSR_ACCESS_BLOCKED_HOST_SMASK) {
+		/* informative only */
+		dd_dev_info(dd, "host access to LCB blocked\n");
+		reg &= ~DCC_ERR_FLG_EN_CSR_ACCESS_BLOCKED_HOST_SMASK;
+	}
+
+	/* report any remaining errors */
+	if (reg)
+		dd_dev_info(dd, "DCC Error: %s\n",
+			dcc_err_string(buf, sizeof(buf), reg));
+
+	if (lcl_reason == 0)
+		lcl_reason = OPA_LINKDOWN_REASON_UNKNOWN;
+
+	if (do_bounce) {
+		dd_dev_info(dd, "%s: PortErrorAction bounce\n", __func__);
+		set_link_down_reason(ppd, lcl_reason, 0, lcl_reason);
+		queue_work(ppd->hfi1_wq, &ppd->link_bounce_work);
+	}
+}
+
+static void handle_lcb_err(struct hfi1_devdata *dd, u32 unused, u64 reg)
+{
+	char buf[96];
+
+	dd_dev_info(dd, "LCB Error: %s\n",
+		lcb_err_string(buf, sizeof(buf), reg));
+}
+
+/*
+ * CCE block DC interrupt.  Source is < 8.
+ */
+static void is_dc_int(struct hfi1_devdata *dd, unsigned int source)
+{
+	const struct err_reg_info *eri = &dc_errs[source];
+
+	if (eri->handler) {
+		interrupt_clear_down(dd, 0, eri);
+	} else if (source == 3 /* dc_lbm_int */) {
+		/*
+		 * This indicates that a parity error has occurred on the
+		 * address/control lines presented to the LBM.  The error
+		 * is a single pulse, there is no associated error flag,
+		 * and it is non-maskable.  This is because if a parity
+		 * error occurs on the request the request is dropped.
+		 * This should never occur, but it is nice to know if it
+		 * ever does.
+		 */
+		dd_dev_err(dd, "Parity error in DC LBM block\n");
+	} else {
+		dd_dev_err(dd, "Invalid DC interrupt %u\n", source);
+	}
+}
+
+/*
+ * TX block send credit interrupt.  Source is < 160.
+ */
+static void is_send_credit_int(struct hfi1_devdata *dd, unsigned int source)
+{
+	sc_group_release_update(dd, source);
+}
+
+/*
+ * TX block SDMA interrupt.  Source is < 48.
+ *
+ * SDMA interrupts are grouped by type:
+ *
+ *	 0 -  N-1 = SDma
+ *	 N - 2N-1 = SDmaProgress
+ *	2N - 3N-1 = SDmaIdle
+ */
+static void is_sdma_eng_int(struct hfi1_devdata *dd, unsigned int source)
+{
+	/* what interrupt */
+	unsigned int what  = source / TXE_NUM_SDMA_ENGINES;
+	/* which engine */
+	unsigned int which = source % TXE_NUM_SDMA_ENGINES;
+
+#ifdef CONFIG_SDMA_VERBOSITY
+	dd_dev_err(dd, "CONFIG SDMA(%u) %s:%d %s()\n", which,
+		   slashstrip(__FILE__), __LINE__, __func__);
+	sdma_dumpstate(&dd->per_sdma[which]);
+#endif
+
+	if (likely(what < 3 && which < dd->num_sdma)) {
+		sdma_engine_interrupt(&dd->per_sdma[which], 1ull << source);
+	} else {
+		/* should not happen */
+		dd_dev_err(dd, "Invalid SDMA interrupt 0x%x\n", source);
+	}
+}
+
+/*
+ * RX block receive available interrupt.  Source is < 160.
+ */
+static void is_rcv_avail_int(struct hfi1_devdata *dd, unsigned int source)
+{
+	struct hfi1_ctxtdata *rcd;
+	char *err_detail;
+
+	if (likely(source < dd->num_rcv_contexts)) {
+		rcd = dd->rcd[source];
+		if (rcd) {
+			if (source < dd->first_user_ctxt)
+				rcd->do_interrupt(rcd);
+			else
+				handle_user_interrupt(rcd);
+			return;	/* OK */
+		}
+		/* received an interrupt, but no rcd */
+		err_detail = "dataless";
+	} else {
+		/* received an interrupt, but are not using that context */
+		err_detail = "out of range";
+	}
+	dd_dev_err(dd, "unexpected %s receive available context interrupt %u\n",
+		err_detail, source);
+}
+
+/*
+ * RX block receive urgent interrupt.  Source is < 160.
+ */
+static void is_rcv_urgent_int(struct hfi1_devdata *dd, unsigned int source)
+{
+	struct hfi1_ctxtdata *rcd;
+	char *err_detail;
+
+	if (likely(source < dd->num_rcv_contexts)) {
+		rcd = dd->rcd[source];
+		if (rcd) {
+			/* only pay attention to user urgent interrupts */
+			if (source >= dd->first_user_ctxt)
+				handle_user_interrupt(rcd);
+			return;	/* OK */
+		}
+		/* received an interrupt, but no rcd */
+		err_detail = "dataless";
+	} else {
+		/* received an interrupt, but are not using that context */
+		err_detail = "out of range";
+	}
+	dd_dev_err(dd, "unexpected %s receive urgent context interrupt %u\n",
+		err_detail, source);
+}
+
+/*
+ * Reserved range interrupt.  Should not be called in normal operation.
+ */
+static void is_reserved_int(struct hfi1_devdata *dd, unsigned int source)
+{
+	char name[64];
+
+	dd_dev_err(dd, "unexpected %s interrupt\n",
+				is_reserved_name(name, sizeof(name), source));
+}
+
+static const struct is_table is_table[] = {
+/* start		     end
+				name func		interrupt func */
+{ IS_GENERAL_ERR_START,  IS_GENERAL_ERR_END,
+				is_misc_err_name,	is_misc_err_int },
+{ IS_SDMAENG_ERR_START,  IS_SDMAENG_ERR_END,
+				is_sdma_eng_err_name,	is_sdma_eng_err_int },
+{ IS_SENDCTXT_ERR_START, IS_SENDCTXT_ERR_END,
+				is_sendctxt_err_name,	is_sendctxt_err_int },
+{ IS_SDMA_START,	     IS_SDMA_END,
+				is_sdma_eng_name,	is_sdma_eng_int },
+{ IS_VARIOUS_START,	     IS_VARIOUS_END,
+				is_various_name,	is_various_int },
+{ IS_DC_START,	     IS_DC_END,
+				is_dc_name,		is_dc_int },
+{ IS_RCVAVAIL_START,     IS_RCVAVAIL_END,
+				is_rcv_avail_name,	is_rcv_avail_int },
+{ IS_RCVURGENT_START,    IS_RCVURGENT_END,
+				is_rcv_urgent_name,	is_rcv_urgent_int },
+{ IS_SENDCREDIT_START,   IS_SENDCREDIT_END,
+				is_send_credit_name,	is_send_credit_int},
+{ IS_RESERVED_START,     IS_RESERVED_END,
+				is_reserved_name,	is_reserved_int},
+};
+
+/*
+ * Interrupt source interrupt - called when the given source has an interrupt.
+ * Source is a bit index into an array of 64-bit integers.
+ */
+static void is_interrupt(struct hfi1_devdata *dd, unsigned int source)
+{
+	const struct is_table *entry;
+
+	/* avoids a double compare by walking the table in-order */
+	for (entry = &is_table[0]; entry->is_name; entry++) {
+		if (source < entry->end) {
+			trace_hfi1_interrupt(dd, entry, source);
+			entry->is_int(dd, source - entry->start);
+			return;
+		}
+	}
+	/* fell off the end */
+	dd_dev_err(dd, "invalid interrupt source %u\n", source);
+}
+
+/*
+ * General interrupt handler.  This is able to correctly handle
+ * all interrupts in case INTx is used.
+ */
+static irqreturn_t general_interrupt(int irq, void *data)
+{
+	struct hfi1_devdata *dd = data;
+	u64 regs[CCE_NUM_INT_CSRS];
+	u32 bit;
+	int i;
+
+	this_cpu_inc(*dd->int_counter);
+
+	/* phase 1: scan and clear all handled interrupts */
+	for (i = 0; i < CCE_NUM_INT_CSRS; i++) {
+		if (dd->gi_mask[i] == 0) {
+			regs[i] = 0;	/* used later */
+			continue;
+		}
+		regs[i] = read_csr(dd, CCE_INT_STATUS + (8 * i)) &
+				dd->gi_mask[i];
+		/* only clear if anything is set */
+		if (regs[i])
+			write_csr(dd, CCE_INT_CLEAR + (8 * i), regs[i]);
+	}
+
+	/* phase 2: call the appropriate handler */
+	for_each_set_bit(bit, (unsigned long *)&regs[0],
+						CCE_NUM_INT_CSRS*64) {
+		is_interrupt(dd, bit);
+	}
+
+	return IRQ_HANDLED;
+}
+
+static irqreturn_t sdma_interrupt(int irq, void *data)
+{
+	struct sdma_engine *sde = data;
+	struct hfi1_devdata *dd = sde->dd;
+	u64 status;
+
+#ifdef CONFIG_SDMA_VERBOSITY
+	dd_dev_err(dd, "CONFIG SDMA(%u) %s:%d %s()\n", sde->this_idx,
+		   slashstrip(__FILE__), __LINE__, __func__);
+	sdma_dumpstate(sde);
+#endif
+
+	this_cpu_inc(*dd->int_counter);
+
+	/* This read_csr is really bad in the hot path */
+	status = read_csr(dd,
+			CCE_INT_STATUS + (8*(IS_SDMA_START/64)))
+			& sde->imask;
+	if (likely(status)) {
+		/* clear the interrupt(s) */
+		write_csr(dd,
+			CCE_INT_CLEAR + (8*(IS_SDMA_START/64)),
+			status);
+
+		/* handle the interrupt(s) */
+		sdma_engine_interrupt(sde, status);
+	} else
+		dd_dev_err(dd, "SDMA engine %u interrupt, but no status bits set\n",
+			sde->this_idx);
+
+	return IRQ_HANDLED;
+}
+
+/*
+ * NOTE: this routine expects to be on its own MSI-X interrupt.  If
+ * multiple receive contexts share the same MSI-X interrupt, then this
+ * routine must check for who received it.
+ */
+static irqreturn_t receive_context_interrupt(int irq, void *data)
+{
+	struct hfi1_ctxtdata *rcd = data;
+	struct hfi1_devdata *dd = rcd->dd;
+
+	trace_hfi1_receive_interrupt(dd, rcd->ctxt);
+	this_cpu_inc(*dd->int_counter);
+
+	/* clear the interrupt */
+	write_csr(rcd->dd, CCE_INT_CLEAR + (8*rcd->ireg), rcd->imask);
+
+	/* handle the interrupt */
+	rcd->do_interrupt(rcd);
+
+	return IRQ_HANDLED;
+}
+
+/* ========================================================================= */
+
+u32 read_physical_state(struct hfi1_devdata *dd)
+{
+	u64 reg;
+
+	reg = read_csr(dd, DC_DC8051_STS_CUR_STATE);
+	return (reg >> DC_DC8051_STS_CUR_STATE_PORT_SHIFT)
+				& DC_DC8051_STS_CUR_STATE_PORT_MASK;
+}
+
+static u32 read_logical_state(struct hfi1_devdata *dd)
+{
+	u64 reg;
+
+	reg = read_csr(dd, DCC_CFG_PORT_CONFIG);
+	return (reg >> DCC_CFG_PORT_CONFIG_LINK_STATE_SHIFT)
+				& DCC_CFG_PORT_CONFIG_LINK_STATE_MASK;
+}
+
+static void set_logical_state(struct hfi1_devdata *dd, u32 chip_lstate)
+{
+	u64 reg;
+
+	reg = read_csr(dd, DCC_CFG_PORT_CONFIG);
+	/* clear current state, set new state */
+	reg &= ~DCC_CFG_PORT_CONFIG_LINK_STATE_SMASK;
+	reg |= (u64)chip_lstate << DCC_CFG_PORT_CONFIG_LINK_STATE_SHIFT;
+	write_csr(dd, DCC_CFG_PORT_CONFIG, reg);
+}
+
+/*
+ * Use the 8051 to read a LCB CSR.
+ */
+static int read_lcb_via_8051(struct hfi1_devdata *dd, u32 addr, u64 *data)
+{
+	u32 regno;
+	int ret;
+
+	if (dd->icode == ICODE_FUNCTIONAL_SIMULATOR) {
+		if (acquire_lcb_access(dd, 0) == 0) {
+			*data = read_csr(dd, addr);
+			release_lcb_access(dd, 0);
+			return 0;
+		}
+		return -EBUSY;
+	}
+
+	/* register is an index of LCB registers: (offset - base) / 8 */
+	regno = (addr - DC_LCB_CFG_RUN) >> 3;
+	ret = do_8051_command(dd, HCMD_READ_LCB_CSR, regno, data);
+	if (ret != HCMD_SUCCESS)
+		return -EBUSY;
+	return 0;
+}
+
+/*
+ * Read an LCB CSR.  Access may not be in host control, so check.
+ * Return 0 on success, -EBUSY on failure.
+ */
+int read_lcb_csr(struct hfi1_devdata *dd, u32 addr, u64 *data)
+{
+	struct hfi1_pportdata *ppd = dd->pport;
+
+	/* if up, go through the 8051 for the value */
+	if (ppd->host_link_state & HLS_UP)
+		return read_lcb_via_8051(dd, addr, data);
+	/* if going up or down, no access */
+	if (ppd->host_link_state & (HLS_GOING_UP | HLS_GOING_OFFLINE))
+		return -EBUSY;
+	/* otherwise, host has access */
+	*data = read_csr(dd, addr);
+	return 0;
+}
+
+/*
+ * Use the 8051 to write a LCB CSR.
+ */
+static int write_lcb_via_8051(struct hfi1_devdata *dd, u32 addr, u64 data)
+{
+
+	if (acquire_lcb_access(dd, 0) == 0) {
+		write_csr(dd, addr, data);
+		release_lcb_access(dd, 0);
+		return 0;
+	}
+	return -EBUSY;
+}
+
+/*
+ * Write an LCB CSR.  Access may not be in host control, so check.
+ * Return 0 on success, -EBUSY on failure.
+ */
+int write_lcb_csr(struct hfi1_devdata *dd, u32 addr, u64 data)
+{
+	struct hfi1_pportdata *ppd = dd->pport;
+
+	/* if up, go through the 8051 for the value */
+	if (ppd->host_link_state & HLS_UP)
+		return write_lcb_via_8051(dd, addr, data);
+	/* if going up or down, no access */
+	if (ppd->host_link_state & (HLS_GOING_UP | HLS_GOING_OFFLINE))
+		return -EBUSY;
+	/* otherwise, host has access */
+	write_csr(dd, addr, data);
+	return 0;
+}
+
+/*
+ * Returns:
+ *	< 0 = Linux error, not able to get access
+ *	> 0 = 8051 command RETURN_CODE
+ */
+static int do_8051_command(
+	struct hfi1_devdata *dd,
+	u32 type,
+	u64 in_data,
+	u64 *out_data)
+{
+	u64 reg, completed;
+	int return_code;
+	unsigned long flags;
+	unsigned long timeout;
+
+	hfi1_cdbg(DC8051, "type %d, data 0x%012llx", type, in_data);
+
+	/*
+	 * Alternative to holding the lock for a long time:
+	 * - keep busy wait - have other users bounce off
+	 */
+	spin_lock_irqsave(&dd->dc8051_lock, flags);
+
+	/* We can't send any commands to the 8051 if it's in reset */
+	if (dd->dc_shutdown) {
+		return_code = -ENODEV;
+		goto fail;
+	}
+
+	/*
+	 * If an 8051 host command timed out previously, then the 8051 is
+	 * stuck.
+	 *
+	 * On first timeout, attempt to reset and restart the entire DC
+	 * block (including 8051). (Is this too big of a hammer?)
+	 *
+	 * If the 8051 times out a second time, the reset did not bring it
+	 * back to healthy life. In that case, fail any subsequent commands.
+	 */
+	if (dd->dc8051_timed_out) {
+		if (dd->dc8051_timed_out > 1) {
+			dd_dev_err(dd,
+				   "Previous 8051 host command timed out, skipping command %u\n",
+				   type);
+			return_code = -ENXIO;
+			goto fail;
+		}
+		spin_unlock_irqrestore(&dd->dc8051_lock, flags);
+		dc_shutdown(dd);
+		dc_start(dd);
+		spin_lock_irqsave(&dd->dc8051_lock, flags);
+	}
+
+	/*
+	 * If there is no timeout, then the 8051 command interface is
+	 * waiting for a command.
+	 */
+
+	/*
+	 * Do two writes: the first to stabilize the type and req_data, the
+	 * second to activate.
+	 */
+	reg = ((u64)type & DC_DC8051_CFG_HOST_CMD_0_REQ_TYPE_MASK)
+			<< DC_DC8051_CFG_HOST_CMD_0_REQ_TYPE_SHIFT
+		| (in_data & DC_DC8051_CFG_HOST_CMD_0_REQ_DATA_MASK)
+			<< DC_DC8051_CFG_HOST_CMD_0_REQ_DATA_SHIFT;
+	write_csr(dd, DC_DC8051_CFG_HOST_CMD_0, reg);
+	reg |= DC_DC8051_CFG_HOST_CMD_0_REQ_NEW_SMASK;
+	write_csr(dd, DC_DC8051_CFG_HOST_CMD_0, reg);
+
+	/* wait for completion, alternate: interrupt */
+	timeout = jiffies + msecs_to_jiffies(DC8051_COMMAND_TIMEOUT);
+	while (1) {
+		reg = read_csr(dd, DC_DC8051_CFG_HOST_CMD_1);
+		completed = reg & DC_DC8051_CFG_HOST_CMD_1_COMPLETED_SMASK;
+		if (completed)
+			break;
+		if (time_after(jiffies, timeout)) {
+			dd->dc8051_timed_out++;
+			dd_dev_err(dd, "8051 host command %u timeout\n", type);
+			if (out_data)
+				*out_data = 0;
+			return_code = -ETIMEDOUT;
+			goto fail;
+		}
+		udelay(2);
+	}
+
+	if (out_data) {
+		*out_data = (reg >> DC_DC8051_CFG_HOST_CMD_1_RSP_DATA_SHIFT)
+				& DC_DC8051_CFG_HOST_CMD_1_RSP_DATA_MASK;
+		if (type == HCMD_READ_LCB_CSR) {
+			/* top 16 bits are in a different register */
+			*out_data |= (read_csr(dd, DC_DC8051_CFG_EXT_DEV_1)
+				& DC_DC8051_CFG_EXT_DEV_1_REQ_DATA_SMASK)
+				<< (48
+				    - DC_DC8051_CFG_EXT_DEV_1_REQ_DATA_SHIFT);
+		}
+	}
+	return_code = (reg >> DC_DC8051_CFG_HOST_CMD_1_RETURN_CODE_SHIFT)
+				& DC_DC8051_CFG_HOST_CMD_1_RETURN_CODE_MASK;
+	dd->dc8051_timed_out = 0;
+	/*
+	 * Clear command for next user.
+	 */
+	write_csr(dd, DC_DC8051_CFG_HOST_CMD_0, 0);
+
+fail:
+	spin_unlock_irqrestore(&dd->dc8051_lock, flags);
+
+	return return_code;
+}
+
+static int set_physical_link_state(struct hfi1_devdata *dd, u64 state)
+{
+	return do_8051_command(dd, HCMD_CHANGE_PHY_STATE, state, NULL);
+}
+
+static int load_8051_config(struct hfi1_devdata *dd, u8 field_id,
+			    u8 lane_id, u32 config_data)
+{
+	u64 data;
+	int ret;
+
+	data = (u64)field_id << LOAD_DATA_FIELD_ID_SHIFT
+		| (u64)lane_id << LOAD_DATA_LANE_ID_SHIFT
+		| (u64)config_data << LOAD_DATA_DATA_SHIFT;
+	ret = do_8051_command(dd, HCMD_LOAD_CONFIG_DATA, data, NULL);
+	if (ret != HCMD_SUCCESS) {
+		dd_dev_err(dd,
+			"load 8051 config: field id %d, lane %d, err %d\n",
+			(int)field_id, (int)lane_id, ret);
+	}
+	return ret;
+}
+
+/*
+ * Read the 8051 firmware "registers".  Use the RAM directly.  Always
+ * set the result, even on error.
+ * Return 0 on success, -errno on failure
+ */
+static int read_8051_config(struct hfi1_devdata *dd, u8 field_id, u8 lane_id,
+			    u32 *result)
+{
+	u64 big_data;
+	u32 addr;
+	int ret;
+
+	/* address start depends on the lane_id */
+	if (lane_id < 4)
+		addr = (4 * NUM_GENERAL_FIELDS)
+			+ (lane_id * 4 * NUM_LANE_FIELDS);
+	else
+		addr = 0;
+	addr += field_id * 4;
+
+	/* read is in 8-byte chunks, hardware will truncate the address down */
+	ret = read_8051_data(dd, addr, 8, &big_data);
+
+	if (ret == 0) {
+		/* extract the 4 bytes we want */
+		if (addr & 0x4)
+			*result = (u32)(big_data >> 32);
+		else
+			*result = (u32)big_data;
+	} else {
+		*result = 0;
+		dd_dev_err(dd, "%s: direct read failed, lane %d, field %d!\n",
+			__func__, lane_id, field_id);
+	}
+
+	return ret;
+}
+
+static int write_vc_local_phy(struct hfi1_devdata *dd, u8 power_management,
+			      u8 continuous)
+{
+	u32 frame;
+
+	frame = continuous << CONTINIOUS_REMOTE_UPDATE_SUPPORT_SHIFT
+		| power_management << POWER_MANAGEMENT_SHIFT;
+	return load_8051_config(dd, VERIFY_CAP_LOCAL_PHY,
+				GENERAL_CONFIG, frame);
+}
+
+static int write_vc_local_fabric(struct hfi1_devdata *dd, u8 vau, u8 z, u8 vcu,
+				 u16 vl15buf, u8 crc_sizes)
+{
+	u32 frame;
+
+	frame = (u32)vau << VAU_SHIFT
+		| (u32)z << Z_SHIFT
+		| (u32)vcu << VCU_SHIFT
+		| (u32)vl15buf << VL15BUF_SHIFT
+		| (u32)crc_sizes << CRC_SIZES_SHIFT;
+	return load_8051_config(dd, VERIFY_CAP_LOCAL_FABRIC,
+				GENERAL_CONFIG, frame);
+}
+
+static void read_vc_local_link_width(struct hfi1_devdata *dd, u8 *misc_bits,
+				     u8 *flag_bits, u16 *link_widths)
+{
+	u32 frame;
+
+	read_8051_config(dd, VERIFY_CAP_LOCAL_LINK_WIDTH, GENERAL_CONFIG,
+				&frame);
+	*misc_bits = (frame >> MISC_CONFIG_BITS_SHIFT) & MISC_CONFIG_BITS_MASK;
+	*flag_bits = (frame >> LOCAL_FLAG_BITS_SHIFT) & LOCAL_FLAG_BITS_MASK;
+	*link_widths = (frame >> LINK_WIDTH_SHIFT) & LINK_WIDTH_MASK;
+}
+
+static int write_vc_local_link_width(struct hfi1_devdata *dd,
+				     u8 misc_bits,
+				     u8 flag_bits,
+				     u16 link_widths)
+{
+	u32 frame;
+
+	frame = (u32)misc_bits << MISC_CONFIG_BITS_SHIFT
+		| (u32)flag_bits << LOCAL_FLAG_BITS_SHIFT
+		| (u32)link_widths << LINK_WIDTH_SHIFT;
+	return load_8051_config(dd, VERIFY_CAP_LOCAL_LINK_WIDTH, GENERAL_CONFIG,
+		     frame);
+}
+
+static int write_local_device_id(struct hfi1_devdata *dd, u16 device_id,
+				 u8 device_rev)
+{
+	u32 frame;
+
+	frame = ((u32)device_id << LOCAL_DEVICE_ID_SHIFT)
+		| ((u32)device_rev << LOCAL_DEVICE_REV_SHIFT);
+	return load_8051_config(dd, LOCAL_DEVICE_ID, GENERAL_CONFIG, frame);
+}
+
+static void read_remote_device_id(struct hfi1_devdata *dd, u16 *device_id,
+				  u8 *device_rev)
+{
+	u32 frame;
+
+	read_8051_config(dd, REMOTE_DEVICE_ID, GENERAL_CONFIG, &frame);
+	*device_id = (frame >> REMOTE_DEVICE_ID_SHIFT) & REMOTE_DEVICE_ID_MASK;
+	*device_rev = (frame >> REMOTE_DEVICE_REV_SHIFT)
+			& REMOTE_DEVICE_REV_MASK;
+}
+
+void read_misc_status(struct hfi1_devdata *dd, u8 *ver_a, u8 *ver_b)
+{
+	u32 frame;
+
+	read_8051_config(dd, MISC_STATUS, GENERAL_CONFIG, &frame);
+	*ver_a = (frame >> STS_FM_VERSION_A_SHIFT) & STS_FM_VERSION_A_MASK;
+	*ver_b = (frame >> STS_FM_VERSION_B_SHIFT) & STS_FM_VERSION_B_MASK;
+}
+
+static void read_vc_remote_phy(struct hfi1_devdata *dd, u8 *power_management,
+			       u8 *continuous)
+{
+	u32 frame;
+
+	read_8051_config(dd, VERIFY_CAP_REMOTE_PHY, GENERAL_CONFIG, &frame);
+	*power_management = (frame >> POWER_MANAGEMENT_SHIFT)
+					& POWER_MANAGEMENT_MASK;
+	*continuous = (frame >> CONTINIOUS_REMOTE_UPDATE_SUPPORT_SHIFT)
+					& CONTINIOUS_REMOTE_UPDATE_SUPPORT_MASK;
+}
+
+static void read_vc_remote_fabric(struct hfi1_devdata *dd, u8 *vau, u8 *z,
+				  u8 *vcu, u16 *vl15buf, u8 *crc_sizes)
+{
+	u32 frame;
+
+	read_8051_config(dd, VERIFY_CAP_REMOTE_FABRIC, GENERAL_CONFIG, &frame);
+	*vau = (frame >> VAU_SHIFT) & VAU_MASK;
+	*z = (frame >> Z_SHIFT) & Z_MASK;
+	*vcu = (frame >> VCU_SHIFT) & VCU_MASK;
+	*vl15buf = (frame >> VL15BUF_SHIFT) & VL15BUF_MASK;
+	*crc_sizes = (frame >> CRC_SIZES_SHIFT) & CRC_SIZES_MASK;
+}
+
+static void read_vc_remote_link_width(struct hfi1_devdata *dd,
+				      u8 *remote_tx_rate,
+				      u16 *link_widths)
+{
+	u32 frame;
+
+	read_8051_config(dd, VERIFY_CAP_REMOTE_LINK_WIDTH, GENERAL_CONFIG,
+				&frame);
+	*remote_tx_rate = (frame >> REMOTE_TX_RATE_SHIFT)
+				& REMOTE_TX_RATE_MASK;
+	*link_widths = (frame >> LINK_WIDTH_SHIFT) & LINK_WIDTH_MASK;
+}
+
+static void read_local_lni(struct hfi1_devdata *dd, u8 *enable_lane_rx)
+{
+	u32 frame;
+
+	read_8051_config(dd, LOCAL_LNI_INFO, GENERAL_CONFIG, &frame);
+	*enable_lane_rx = (frame >> ENABLE_LANE_RX_SHIFT) & ENABLE_LANE_RX_MASK;
+}
+
+static void read_mgmt_allowed(struct hfi1_devdata *dd, u8 *mgmt_allowed)
+{
+	u32 frame;
+
+	read_8051_config(dd, REMOTE_LNI_INFO, GENERAL_CONFIG, &frame);
+	*mgmt_allowed = (frame >> MGMT_ALLOWED_SHIFT) & MGMT_ALLOWED_MASK;
+}
+
+static void read_last_local_state(struct hfi1_devdata *dd, u32 *lls)
+{
+	read_8051_config(dd, LAST_LOCAL_STATE_COMPLETE, GENERAL_CONFIG, lls);
+}
+
+static void read_last_remote_state(struct hfi1_devdata *dd, u32 *lrs)
+{
+	read_8051_config(dd, LAST_REMOTE_STATE_COMPLETE, GENERAL_CONFIG, lrs);
+}
+
+void hfi1_read_link_quality(struct hfi1_devdata *dd, u8 *link_quality)
+{
+	u32 frame;
+	int ret;
+
+	*link_quality = 0;
+	if (dd->pport->host_link_state & HLS_UP) {
+		ret = read_8051_config(dd, LINK_QUALITY_INFO, GENERAL_CONFIG,
+					&frame);
+		if (ret == 0)
+			*link_quality = (frame >> LINK_QUALITY_SHIFT)
+						& LINK_QUALITY_MASK;
+	}
+}
+
+static void read_planned_down_reason_code(struct hfi1_devdata *dd, u8 *pdrrc)
+{
+	u32 frame;
+
+	read_8051_config(dd, LINK_QUALITY_INFO, GENERAL_CONFIG, &frame);
+	*pdrrc = (frame >> DOWN_REMOTE_REASON_SHIFT) & DOWN_REMOTE_REASON_MASK;
+}
+
+static int read_tx_settings(struct hfi1_devdata *dd,
+			    u8 *enable_lane_tx,
+			    u8 *tx_polarity_inversion,
+			    u8 *rx_polarity_inversion,
+			    u8 *max_rate)
+{
+	u32 frame;
+	int ret;
+
+	ret = read_8051_config(dd, TX_SETTINGS, GENERAL_CONFIG, &frame);
+	*enable_lane_tx = (frame >> ENABLE_LANE_TX_SHIFT)
+				& ENABLE_LANE_TX_MASK;
+	*tx_polarity_inversion = (frame >> TX_POLARITY_INVERSION_SHIFT)
+				& TX_POLARITY_INVERSION_MASK;
+	*rx_polarity_inversion = (frame >> RX_POLARITY_INVERSION_SHIFT)
+				& RX_POLARITY_INVERSION_MASK;
+	*max_rate = (frame >> MAX_RATE_SHIFT) & MAX_RATE_MASK;
+	return ret;
+}
+
+static int write_tx_settings(struct hfi1_devdata *dd,
+			     u8 enable_lane_tx,
+			     u8 tx_polarity_inversion,
+			     u8 rx_polarity_inversion,
+			     u8 max_rate)
+{
+	u32 frame;
+
+	/* no need to mask, all variable sizes match field widths */
+	frame = enable_lane_tx << ENABLE_LANE_TX_SHIFT
+		| tx_polarity_inversion << TX_POLARITY_INVERSION_SHIFT
+		| rx_polarity_inversion << RX_POLARITY_INVERSION_SHIFT
+		| max_rate << MAX_RATE_SHIFT;
+	return load_8051_config(dd, TX_SETTINGS, GENERAL_CONFIG, frame);
+}
+
+static void check_fabric_firmware_versions(struct hfi1_devdata *dd)
+{
+	u32 frame, version, prod_id;
+	int ret, lane;
+
+	/* 4 lanes */
+	for (lane = 0; lane < 4; lane++) {
+		ret = read_8051_config(dd, SPICO_FW_VERSION, lane, &frame);
+		if (ret) {
+			dd_dev_err(
+				dd,
+				"Unable to read lane %d firmware details\n",
+				lane);
+			continue;
+		}
+		version = (frame >> SPICO_ROM_VERSION_SHIFT)
+					& SPICO_ROM_VERSION_MASK;
+		prod_id = (frame >> SPICO_ROM_PROD_ID_SHIFT)
+					& SPICO_ROM_PROD_ID_MASK;
+		dd_dev_info(dd,
+			"Lane %d firmware: version 0x%04x, prod_id 0x%04x\n",
+			lane, version, prod_id);
+	}
+}
+
+/*
+ * Read an idle LCB message.
+ *
+ * Returns 0 on success, -EINVAL on error
+ */
+static int read_idle_message(struct hfi1_devdata *dd, u64 type, u64 *data_out)
+{
+	int ret;
+
+	ret = do_8051_command(dd, HCMD_READ_LCB_IDLE_MSG,
+		type, data_out);
+	if (ret != HCMD_SUCCESS) {
+		dd_dev_err(dd, "read idle message: type %d, err %d\n",
+			(u32)type, ret);
+		return -EINVAL;
+	}
+	dd_dev_info(dd, "%s: read idle message 0x%llx\n", __func__, *data_out);
+	/* return only the payload as we already know the type */
+	*data_out >>= IDLE_PAYLOAD_SHIFT;
+	return 0;
+}
+
+/*
+ * Read an idle SMA message.  To be done in response to a notification from
+ * the 8051.
+ *
+ * Returns 0 on success, -EINVAL on error
+ */
+static int read_idle_sma(struct hfi1_devdata *dd, u64 *data)
+{
+	return read_idle_message(dd,
+			(u64)IDLE_SMA << IDLE_MSG_TYPE_SHIFT, data);
+}
+
+/*
+ * Send an idle LCB message.
+ *
+ * Returns 0 on success, -EINVAL on error
+ */
+static int send_idle_message(struct hfi1_devdata *dd, u64 data)
+{
+	int ret;
+
+	dd_dev_info(dd, "%s: sending idle message 0x%llx\n", __func__, data);
+	ret = do_8051_command(dd, HCMD_SEND_LCB_IDLE_MSG, data, NULL);
+	if (ret != HCMD_SUCCESS) {
+		dd_dev_err(dd, "send idle message: data 0x%llx, err %d\n",
+			data, ret);
+		return -EINVAL;
+	}
+	return 0;
+}
+
+/*
+ * Send an idle SMA message.
+ *
+ * Returns 0 on success, -EINVAL on error
+ */
+int send_idle_sma(struct hfi1_devdata *dd, u64 message)
+{
+	u64 data;
+
+	data = ((message & IDLE_PAYLOAD_MASK) << IDLE_PAYLOAD_SHIFT)
+		| ((u64)IDLE_SMA << IDLE_MSG_TYPE_SHIFT);
+	return send_idle_message(dd, data);
+}
+
+/*
+ * Initialize the LCB then do a quick link up.  This may or may not be
+ * in loopback.
+ *
+ * return 0 on success, -errno on error
+ */
+static int do_quick_linkup(struct hfi1_devdata *dd)
+{
+	u64 reg;
+	unsigned long timeout;
+	int ret;
+
+	lcb_shutdown(dd, 0);
+
+	if (loopback) {
+		/* LCB_CFG_LOOPBACK.VAL = 2 */
+		/* LCB_CFG_LANE_WIDTH.VAL = 0 */
+		write_csr(dd, DC_LCB_CFG_LOOPBACK,
+			IB_PACKET_TYPE << DC_LCB_CFG_LOOPBACK_VAL_SHIFT);
+		write_csr(dd, DC_LCB_CFG_LANE_WIDTH, 0);
+	}
+
+	/* start the LCBs */
+	/* LCB_CFG_TX_FIFOS_RESET.VAL = 0 */
+	write_csr(dd, DC_LCB_CFG_TX_FIFOS_RESET, 0);
+
+	/* simulator only loopback steps */
+	if (loopback && dd->icode == ICODE_FUNCTIONAL_SIMULATOR) {
+		/* LCB_CFG_RUN.EN = 1 */
+		write_csr(dd, DC_LCB_CFG_RUN,
+			1ull << DC_LCB_CFG_RUN_EN_SHIFT);
+
+		/* watch LCB_STS_LINK_TRANSFER_ACTIVE */
+		timeout = jiffies + msecs_to_jiffies(10);
+		while (1) {
+			reg = read_csr(dd,
+				DC_LCB_STS_LINK_TRANSFER_ACTIVE);
+			if (reg)
+				break;
+			if (time_after(jiffies, timeout)) {
+				dd_dev_err(dd,
+					"timeout waiting for LINK_TRANSFER_ACTIVE\n");
+				return -ETIMEDOUT;
+			}
+			udelay(2);
+		}
+
+		write_csr(dd, DC_LCB_CFG_ALLOW_LINK_UP,
+			1ull << DC_LCB_CFG_ALLOW_LINK_UP_VAL_SHIFT);
+	}
+
+	if (!loopback) {
+		/*
+		 * When doing quick linkup and not in loopback, both
+		 * sides must be done with LCB set-up before either
+		 * starts the quick linkup.  Put a delay here so that
+		 * both sides can be started and have a chance to be
+		 * done with LCB set up before resuming.
+		 */
+		dd_dev_err(dd,
+			"Pausing for peer to be finished with LCB set up\n");
+		msleep(5000);
+		dd_dev_err(dd,
+			"Continuing with quick linkup\n");
+	}
+
+	write_csr(dd, DC_LCB_ERR_EN, 0); /* mask LCB errors */
+	set_8051_lcb_access(dd);
+
+	/*
+	 * State "quick" LinkUp request sets the physical link state to
+	 * LinkUp without a verify capability sequence.
+	 * This state is in simulator v37 and later.
+	 */
+	ret = set_physical_link_state(dd, PLS_QUICK_LINKUP);
+	if (ret != HCMD_SUCCESS) {
+		dd_dev_err(dd,
+			"%s: set physical link state to quick LinkUp failed with return %d\n",
+			__func__, ret);
+
+		set_host_lcb_access(dd);
+		write_csr(dd, DC_LCB_ERR_EN, ~0ull); /* watch LCB errors */
+
+		if (ret >= 0)
+			ret = -EINVAL;
+		return ret;
+	}
+
+	return 0; /* success */
+}
+
+/*
+ * Set the SerDes to internal loopback mode.
+ * Returns 0 on success, -errno on error.
+ */
+static int set_serdes_loopback_mode(struct hfi1_devdata *dd)
+{
+	int ret;
+
+	ret = set_physical_link_state(dd, PLS_INTERNAL_SERDES_LOOPBACK);
+	if (ret == HCMD_SUCCESS)
+		return 0;
+	dd_dev_err(dd,
+		"Set physical link state to SerDes Loopback failed with return %d\n",
+		ret);
+	if (ret >= 0)
+		ret = -EINVAL;
+	return ret;
+}
+
+/*
+ * Do all special steps to set up loopback.
+ */
+static int init_loopback(struct hfi1_devdata *dd)
+{
+	dd_dev_info(dd, "Entering loopback mode\n");
+
+	/* all loopbacks should disable self GUID check */
+	write_csr(dd, DC_DC8051_CFG_MODE,
+		(read_csr(dd, DC_DC8051_CFG_MODE) | DISABLE_SELF_GUID_CHECK));
+
+	/*
+	 * The simulator has only one loopback option - LCB.  Switch
+	 * to that option, which includes quick link up.
+	 *
+	 * Accept all valid loopback values.
+	 */
+	if ((dd->icode == ICODE_FUNCTIONAL_SIMULATOR)
+		&& (loopback == LOOPBACK_SERDES
+			|| loopback == LOOPBACK_LCB
+			|| loopback == LOOPBACK_CABLE)) {
+		loopback = LOOPBACK_LCB;
+		quick_linkup = 1;
+		return 0;
+	}
+
+	/* handle serdes loopback */
+	if (loopback == LOOPBACK_SERDES) {
+		/* internal serdes loopack needs quick linkup on RTL */
+		if (dd->icode == ICODE_RTL_SILICON)
+			quick_linkup = 1;
+		return set_serdes_loopback_mode(dd);
+	}
+
+	/* LCB loopback - handled at poll time */
+	if (loopback == LOOPBACK_LCB) {
+		quick_linkup = 1; /* LCB is always quick linkup */
+
+		/* not supported in emulation due to emulation RTL changes */
+		if (dd->icode == ICODE_FPGA_EMULATION) {
+			dd_dev_err(dd,
+				"LCB loopback not supported in emulation\n");
+			return -EINVAL;
+		}
+		return 0;
+	}
+
+	/* external cable loopback requires no extra steps */
+	if (loopback == LOOPBACK_CABLE)
+		return 0;
+
+	dd_dev_err(dd, "Invalid loopback mode %d\n", loopback);
+	return -EINVAL;
+}
+
+/*
+ * Translate from the OPA_LINK_WIDTH handed to us by the FM to bits
+ * used in the Verify Capability link width attribute.
+ */
+static u16 opa_to_vc_link_widths(u16 opa_widths)
+{
+	int i;
+	u16 result = 0;
+
+	static const struct link_bits {
+		u16 from;
+		u16 to;
+	} opa_link_xlate[] = {
+		{ OPA_LINK_WIDTH_1X, 1 << (1-1)  },
+		{ OPA_LINK_WIDTH_2X, 1 << (2-1)  },
+		{ OPA_LINK_WIDTH_3X, 1 << (3-1)  },
+		{ OPA_LINK_WIDTH_4X, 1 << (4-1)  },
+	};
+
+	for (i = 0; i < ARRAY_SIZE(opa_link_xlate); i++) {
+		if (opa_widths & opa_link_xlate[i].from)
+			result |= opa_link_xlate[i].to;
+	}
+	return result;
+}
+
+/*
+ * Set link attributes before moving to polling.
+ */
+static int set_local_link_attributes(struct hfi1_pportdata *ppd)
+{
+	struct hfi1_devdata *dd = ppd->dd;
+	u8 enable_lane_tx;
+	u8 tx_polarity_inversion;
+	u8 rx_polarity_inversion;
+	int ret;
+
+	/* reset our fabric serdes to clear any lingering problems */
+	fabric_serdes_reset(dd);
+
+	/* set the local tx rate - need to read-modify-write */
+	ret = read_tx_settings(dd, &enable_lane_tx, &tx_polarity_inversion,
+		&rx_polarity_inversion, &ppd->local_tx_rate);
+	if (ret)
+		goto set_local_link_attributes_fail;
+
+	if (dd->dc8051_ver < dc8051_ver(0, 20)) {
+		/* set the tx rate to the fastest enabled */
+		if (ppd->link_speed_enabled & OPA_LINK_SPEED_25G)
+			ppd->local_tx_rate = 1;
+		else
+			ppd->local_tx_rate = 0;
+	} else {
+		/* set the tx rate to all enabled */
+		ppd->local_tx_rate = 0;
+		if (ppd->link_speed_enabled & OPA_LINK_SPEED_25G)
+			ppd->local_tx_rate |= 2;
+		if (ppd->link_speed_enabled & OPA_LINK_SPEED_12_5G)
+			ppd->local_tx_rate |= 1;
+	}
+	ret = write_tx_settings(dd, enable_lane_tx, tx_polarity_inversion,
+		     rx_polarity_inversion, ppd->local_tx_rate);
+	if (ret != HCMD_SUCCESS)
+		goto set_local_link_attributes_fail;
+
+	/*
+	 * DC supports continuous updates.
+	 */
+	ret = write_vc_local_phy(dd, 0 /* no power management */,
+				     1 /* continuous updates */);
+	if (ret != HCMD_SUCCESS)
+		goto set_local_link_attributes_fail;
+
+	/* z=1 in the next call: AU of 0 is not supported by the hardware */
+	ret = write_vc_local_fabric(dd, dd->vau, 1, dd->vcu, dd->vl15_init,
+				    ppd->port_crc_mode_enabled);
+	if (ret != HCMD_SUCCESS)
+		goto set_local_link_attributes_fail;
+
+	ret = write_vc_local_link_width(dd, 0, 0,
+		     opa_to_vc_link_widths(ppd->link_width_enabled));
+	if (ret != HCMD_SUCCESS)
+		goto set_local_link_attributes_fail;
+
+	/* let peer know who we are */
+	ret = write_local_device_id(dd, dd->pcidev->device, dd->minrev);
+	if (ret == HCMD_SUCCESS)
+		return 0;
+
+set_local_link_attributes_fail:
+	dd_dev_err(dd,
+		"Failed to set local link attributes, return 0x%x\n",
+		ret);
+	return ret;
+}
+
+/*
+ * Call this to start the link.  Schedule a retry if the cable is not
+ * present or if unable to start polling.  Do not do anything if the
+ * link is disabled.  Returns 0 if link is disabled or moved to polling
+ */
+int start_link(struct hfi1_pportdata *ppd)
+{
+	if (!ppd->link_enabled) {
+		dd_dev_info(ppd->dd,
+			"%s: stopping link start because link is disabled\n",
+			__func__);
+		return 0;
+	}
+	if (!ppd->driver_link_ready) {
+		dd_dev_info(ppd->dd,
+			"%s: stopping link start because driver is not ready\n",
+			__func__);
+		return 0;
+	}
+
+	if (qsfp_mod_present(ppd) || loopback == LOOPBACK_SERDES ||
+			loopback == LOOPBACK_LCB ||
+			ppd->dd->icode == ICODE_FUNCTIONAL_SIMULATOR)
+		return set_link_state(ppd, HLS_DN_POLL);
+
+	dd_dev_info(ppd->dd,
+		"%s: stopping link start because no cable is present\n",
+		__func__);
+	return -EAGAIN;
+}
+
+static void reset_qsfp(struct hfi1_pportdata *ppd)
+{
+	struct hfi1_devdata *dd = ppd->dd;
+	u64 mask, qsfp_mask;
+
+	mask = (u64)QSFP_HFI0_RESET_N;
+	qsfp_mask = read_csr(dd,
+		dd->hfi1_id ? ASIC_QSFP2_OE : ASIC_QSFP1_OE);
+	qsfp_mask |= mask;
+	write_csr(dd,
+		dd->hfi1_id ? ASIC_QSFP2_OE : ASIC_QSFP1_OE,
+		qsfp_mask);
+
+	qsfp_mask = read_csr(dd,
+		dd->hfi1_id ? ASIC_QSFP2_OUT : ASIC_QSFP1_OUT);
+	qsfp_mask &= ~mask;
+	write_csr(dd,
+		dd->hfi1_id ? ASIC_QSFP2_OUT : ASIC_QSFP1_OUT,
+		qsfp_mask);
+
+	udelay(10);
+
+	qsfp_mask |= mask;
+	write_csr(dd,
+		dd->hfi1_id ? ASIC_QSFP2_OUT : ASIC_QSFP1_OUT,
+		qsfp_mask);
+}
+
+static int handle_qsfp_error_conditions(struct hfi1_pportdata *ppd,
+					u8 *qsfp_interrupt_status)
+{
+	struct hfi1_devdata *dd = ppd->dd;
+
+	if ((qsfp_interrupt_status[0] & QSFP_HIGH_TEMP_ALARM) ||
+		(qsfp_interrupt_status[0] & QSFP_HIGH_TEMP_WARNING))
+		dd_dev_info(dd,
+			"%s: QSFP cable on fire\n",
+			__func__);
+
+	if ((qsfp_interrupt_status[0] & QSFP_LOW_TEMP_ALARM) ||
+		(qsfp_interrupt_status[0] & QSFP_LOW_TEMP_WARNING))
+		dd_dev_info(dd,
+			"%s: QSFP cable temperature too low\n",
+			__func__);
+
+	if ((qsfp_interrupt_status[1] & QSFP_HIGH_VCC_ALARM) ||
+		(qsfp_interrupt_status[1] & QSFP_HIGH_VCC_WARNING))
+		dd_dev_info(dd,
+			"%s: QSFP supply voltage too high\n",
+			__func__);
+
+	if ((qsfp_interrupt_status[1] & QSFP_LOW_VCC_ALARM) ||
+		(qsfp_interrupt_status[1] & QSFP_LOW_VCC_WARNING))
+		dd_dev_info(dd,
+			"%s: QSFP supply voltage too low\n",
+			__func__);
+
+	/* Byte 2 is vendor specific */
+
+	if ((qsfp_interrupt_status[3] & QSFP_HIGH_POWER_ALARM) ||
+		(qsfp_interrupt_status[3] & QSFP_HIGH_POWER_WARNING))
+		dd_dev_info(dd,
+			"%s: Cable RX channel 1/2 power too high\n",
+			__func__);
+
+	if ((qsfp_interrupt_status[3] & QSFP_LOW_POWER_ALARM) ||
+		(qsfp_interrupt_status[3] & QSFP_LOW_POWER_WARNING))
+		dd_dev_info(dd,
+			"%s: Cable RX channel 1/2 power too low\n",
+			__func__);
+
+	if ((qsfp_interrupt_status[4] & QSFP_HIGH_POWER_ALARM) ||
+		(qsfp_interrupt_status[4] & QSFP_HIGH_POWER_WARNING))
+		dd_dev_info(dd,
+			"%s: Cable RX channel 3/4 power too high\n",
+			__func__);
+
+	if ((qsfp_interrupt_status[4] & QSFP_LOW_POWER_ALARM) ||
+		(qsfp_interrupt_status[4] & QSFP_LOW_POWER_WARNING))
+		dd_dev_info(dd,
+			"%s: Cable RX channel 3/4 power too low\n",
+			__func__);
+
+	if ((qsfp_interrupt_status[5] & QSFP_HIGH_BIAS_ALARM) ||
+		(qsfp_interrupt_status[5] & QSFP_HIGH_BIAS_WARNING))
+		dd_dev_info(dd,
+			"%s: Cable TX channel 1/2 bias too high\n",
+			__func__);
+
+	if ((qsfp_interrupt_status[5] & QSFP_LOW_BIAS_ALARM) ||
+		(qsfp_interrupt_status[5] & QSFP_LOW_BIAS_WARNING))
+		dd_dev_info(dd,
+			"%s: Cable TX channel 1/2 bias too low\n",
+			__func__);
+
+	if ((qsfp_interrupt_status[6] & QSFP_HIGH_BIAS_ALARM) ||
+		(qsfp_interrupt_status[6] & QSFP_HIGH_BIAS_WARNING))
+		dd_dev_info(dd,
+			"%s: Cable TX channel 3/4 bias too high\n",
+			__func__);
+
+	if ((qsfp_interrupt_status[6] & QSFP_LOW_BIAS_ALARM) ||
+		(qsfp_interrupt_status[6] & QSFP_LOW_BIAS_WARNING))
+		dd_dev_info(dd,
+			"%s: Cable TX channel 3/4 bias too low\n",
+			__func__);
+
+	if ((qsfp_interrupt_status[7] & QSFP_HIGH_POWER_ALARM) ||
+		(qsfp_interrupt_status[7] & QSFP_HIGH_POWER_WARNING))
+		dd_dev_info(dd,
+			"%s: Cable TX channel 1/2 power too high\n",
+			__func__);
+
+	if ((qsfp_interrupt_status[7] & QSFP_LOW_POWER_ALARM) ||
+		(qsfp_interrupt_status[7] & QSFP_LOW_POWER_WARNING))
+		dd_dev_info(dd,
+			"%s: Cable TX channel 1/2 power too low\n",
+			__func__);
+
+	if ((qsfp_interrupt_status[8] & QSFP_HIGH_POWER_ALARM) ||
+		(qsfp_interrupt_status[8] & QSFP_HIGH_POWER_WARNING))
+		dd_dev_info(dd,
+			"%s: Cable TX channel 3/4 power too high\n",
+			__func__);
+
+	if ((qsfp_interrupt_status[8] & QSFP_LOW_POWER_ALARM) ||
+		(qsfp_interrupt_status[8] & QSFP_LOW_POWER_WARNING))
+		dd_dev_info(dd,
+			"%s: Cable TX channel 3/4 power too low\n",
+			__func__);
+
+	/* Bytes 9-10 and 11-12 are reserved */
+	/* Bytes 13-15 are vendor specific */
+
+	return 0;
+}
+
+static int do_pre_lni_host_behaviors(struct hfi1_pportdata *ppd)
+{
+	refresh_qsfp_cache(ppd, &ppd->qsfp_info);
+
+	return 0;
+}
+
+static int do_qsfp_intr_fallback(struct hfi1_pportdata *ppd)
+{
+	struct hfi1_devdata *dd = ppd->dd;
+	u8 qsfp_interrupt_status = 0;
+
+	if (qsfp_read(ppd, dd->hfi1_id, 2, &qsfp_interrupt_status, 1)
+		!= 1) {
+		dd_dev_info(dd,
+			"%s: Failed to read status of QSFP module\n",
+			__func__);
+		return -EIO;
+	}
+
+	/* We don't care about alarms & warnings with a non-functional INT_N */
+	if (!(qsfp_interrupt_status & QSFP_DATA_NOT_READY))
+		do_pre_lni_host_behaviors(ppd);
+
+	return 0;
+}
+
+/* This routine will only be scheduled if the QSFP module is present */
+static void qsfp_event(struct work_struct *work)
+{
+	struct qsfp_data *qd;
+	struct hfi1_pportdata *ppd;
+	struct hfi1_devdata *dd;
+
+	qd = container_of(work, struct qsfp_data, qsfp_work);
+	ppd = qd->ppd;
+	dd = ppd->dd;
+
+	/* Sanity check */
+	if (!qsfp_mod_present(ppd))
+		return;
+
+	/*
+	 * Turn DC back on after cables has been
+	 * re-inserted. Up until now, the DC has been in
+	 * reset to save power.
+	 */
+	dc_start(dd);
+
+	if (qd->cache_refresh_required) {
+		msleep(3000);
+		reset_qsfp(ppd);
+
+		/* Check for QSFP interrupt after t_init (SFF 8679)
+		 * + extra
+		 */
+		msleep(3000);
+		if (!qd->qsfp_interrupt_functional) {
+			if (do_qsfp_intr_fallback(ppd) < 0)
+				dd_dev_info(dd, "%s: QSFP fallback failed\n",
+					__func__);
+			ppd->driver_link_ready = 1;
+			start_link(ppd);
+		}
+	}
+
+	if (qd->check_interrupt_flags) {
+		u8 qsfp_interrupt_status[16] = {0,};
+
+		if (qsfp_read(ppd, dd->hfi1_id, 6,
+			      &qsfp_interrupt_status[0], 16) != 16) {
+			dd_dev_info(dd,
+				"%s: Failed to read status of QSFP module\n",
+				__func__);
+		} else {
+			unsigned long flags;
+			u8 data_status;
+
+			spin_lock_irqsave(&ppd->qsfp_info.qsfp_lock, flags);
+			ppd->qsfp_info.check_interrupt_flags = 0;
+			spin_unlock_irqrestore(&ppd->qsfp_info.qsfp_lock,
+								flags);
+
+			if (qsfp_read(ppd, dd->hfi1_id, 2, &data_status, 1)
+				 != 1) {
+				dd_dev_info(dd,
+				"%s: Failed to read status of QSFP module\n",
+					__func__);
+			}
+			if (!(data_status & QSFP_DATA_NOT_READY)) {
+				do_pre_lni_host_behaviors(ppd);
+				start_link(ppd);
+			} else
+				handle_qsfp_error_conditions(ppd,
+						qsfp_interrupt_status);
+		}
+	}
+}
+
+void init_qsfp(struct hfi1_pportdata *ppd)
+{
+	struct hfi1_devdata *dd = ppd->dd;
+	u64 qsfp_mask;
+
+	if (loopback == LOOPBACK_SERDES || loopback == LOOPBACK_LCB ||
+			ppd->dd->icode == ICODE_FUNCTIONAL_SIMULATOR ||
+			!HFI1_CAP_IS_KSET(QSFP_ENABLED)) {
+		ppd->driver_link_ready = 1;
+		return;
+	}
+
+	ppd->qsfp_info.ppd = ppd;
+	INIT_WORK(&ppd->qsfp_info.qsfp_work, qsfp_event);
+
+	qsfp_mask = (u64)(QSFP_HFI0_INT_N | QSFP_HFI0_MODPRST_N);
+	/* Clear current status to avoid spurious interrupts */
+	write_csr(dd,
+			dd->hfi1_id ?
+				ASIC_QSFP2_CLEAR :
+				ASIC_QSFP1_CLEAR,
+		qsfp_mask);
+
+	/* Handle active low nature of INT_N and MODPRST_N pins */
+	if (qsfp_mod_present(ppd))
+		qsfp_mask &= ~(u64)QSFP_HFI0_MODPRST_N;
+	write_csr(dd,
+		  dd->hfi1_id ? ASIC_QSFP2_INVERT : ASIC_QSFP1_INVERT,
+		  qsfp_mask);
+
+	/* Allow only INT_N and MODPRST_N to trigger QSFP interrupts */
+	qsfp_mask |= (u64)QSFP_HFI0_MODPRST_N;
+	write_csr(dd,
+		dd->hfi1_id ? ASIC_QSFP2_MASK : ASIC_QSFP1_MASK,
+		qsfp_mask);
+
+	if (qsfp_mod_present(ppd)) {
+		msleep(3000);
+		reset_qsfp(ppd);
+
+		/* Check for QSFP interrupt after t_init (SFF 8679)
+		 * + extra
+		 */
+		msleep(3000);
+		if (!ppd->qsfp_info.qsfp_interrupt_functional) {
+			if (do_qsfp_intr_fallback(ppd) < 0)
+				dd_dev_info(dd,
+					"%s: QSFP fallback failed\n",
+					__func__);
+			ppd->driver_link_ready = 1;
+		}
+	}
+}
+
+int bringup_serdes(struct hfi1_pportdata *ppd)
+{
+	struct hfi1_devdata *dd = ppd->dd;
+	u64 guid;
+	int ret;
+
+	if (HFI1_CAP_IS_KSET(EXTENDED_PSN))
+		add_rcvctrl(dd, RCV_CTRL_RCV_EXTENDED_PSN_ENABLE_SMASK);
+
+	guid = ppd->guid;
+	if (!guid) {
+		if (dd->base_guid)
+			guid = dd->base_guid + ppd->port - 1;
+		ppd->guid = guid;
+	}
+
+	/* the link defaults to enabled */
+	ppd->link_enabled = 1;
+	/* Set linkinit_reason on power up per OPA spec */
+	ppd->linkinit_reason = OPA_LINKINIT_REASON_LINKUP;
+
+	if (loopback) {
+		ret = init_loopback(dd);
+		if (ret < 0)
+			return ret;
+	}
+
+	return start_link(ppd);
+}
+
+void hfi1_quiet_serdes(struct hfi1_pportdata *ppd)
+{
+	struct hfi1_devdata *dd = ppd->dd;
+
+	/*
+	 * Shut down the link and keep it down.   First turn off that the
+	 * driver wants to allow the link to be up (driver_link_ready).
+	 * Then make sure the link is not automatically restarted
+	 * (link_enabled).  Cancel any pending restart.  And finally
+	 * go offline.
+	 */
+	ppd->driver_link_ready = 0;
+	ppd->link_enabled = 0;
+
+	set_link_down_reason(ppd, OPA_LINKDOWN_REASON_SMA_DISABLED, 0,
+	  OPA_LINKDOWN_REASON_SMA_DISABLED);
+	set_link_state(ppd, HLS_DN_OFFLINE);
+
+	/* disable the port */
+	clear_rcvctrl(dd, RCV_CTRL_RCV_PORT_ENABLE_SMASK);
+}
+
+static inline int init_cpu_counters(struct hfi1_devdata *dd)
+{
+	struct hfi1_pportdata *ppd;
+	int i;
+
+	ppd = (struct hfi1_pportdata *)(dd + 1);
+	for (i = 0; i < dd->num_pports; i++, ppd++) {
+		ppd->ibport_data.rc_acks = NULL;
+		ppd->ibport_data.rc_qacks = NULL;
+		ppd->ibport_data.rc_acks = alloc_percpu(u64);
+		ppd->ibport_data.rc_qacks = alloc_percpu(u64);
+		ppd->ibport_data.rc_delayed_comp = alloc_percpu(u64);
+		if ((ppd->ibport_data.rc_acks == NULL) ||
+		    (ppd->ibport_data.rc_delayed_comp == NULL) ||
+		    (ppd->ibport_data.rc_qacks == NULL))
+			return -ENOMEM;
+	}
+
+	return 0;
+}
+
+static const char * const pt_names[] = {
+	"expected",
+	"eager",
+	"invalid"
+};
+
+static const char *pt_name(u32 type)
+{
+	return type >= ARRAY_SIZE(pt_names) ? "unknown" : pt_names[type];
+}
+
+/*
+ * index is the index into the receive array
+ */
+void hfi1_put_tid(struct hfi1_devdata *dd, u32 index,
+		  u32 type, unsigned long pa, u16 order)
+{
+	u64 reg;
+	void __iomem *base = (dd->rcvarray_wc ? dd->rcvarray_wc :
+			      (dd->kregbase + RCV_ARRAY));
+
+	if (!(dd->flags & HFI1_PRESENT))
+		goto done;
+
+	if (type == PT_INVALID) {
+		pa = 0;
+	} else if (type > PT_INVALID) {
+		dd_dev_err(dd,
+			"unexpected receive array type %u for index %u, not handled\n",
+			type, index);
+		goto done;
+	}
+
+	hfi1_cdbg(TID, "type %s, index 0x%x, pa 0x%lx, bsize 0x%lx",
+		  pt_name(type), index, pa, (unsigned long)order);
+
+#define RT_ADDR_SHIFT 12	/* 4KB kernel address boundary */
+	reg = RCV_ARRAY_RT_WRITE_ENABLE_SMASK
+		| (u64)order << RCV_ARRAY_RT_BUF_SIZE_SHIFT
+		| ((pa >> RT_ADDR_SHIFT) & RCV_ARRAY_RT_ADDR_MASK)
+					<< RCV_ARRAY_RT_ADDR_SHIFT;
+	writeq(reg, base + (index * 8));
+
+	if (type == PT_EAGER)
+		/*
+		 * Eager entries are written one-by-one so we have to push them
+		 * after we write the entry.
+		 */
+		flush_wc();
+done:
+	return;
+}
+
+void hfi1_clear_tids(struct hfi1_ctxtdata *rcd)
+{
+	struct hfi1_devdata *dd = rcd->dd;
+	u32 i;
+
+	/* this could be optimized */
+	for (i = rcd->eager_base; i < rcd->eager_base +
+		     rcd->egrbufs.alloced; i++)
+		hfi1_put_tid(dd, i, PT_INVALID, 0, 0);
+
+	for (i = rcd->expected_base;
+			i < rcd->expected_base + rcd->expected_count; i++)
+		hfi1_put_tid(dd, i, PT_INVALID, 0, 0);
+}
+
+int hfi1_get_base_kinfo(struct hfi1_ctxtdata *rcd,
+			struct hfi1_ctxt_info *kinfo)
+{
+	kinfo->runtime_flags = (HFI1_MISC_GET() << HFI1_CAP_USER_SHIFT) |
+		HFI1_CAP_UGET(MASK) | HFI1_CAP_KGET(K2U);
+	return 0;
+}
+
+struct hfi1_message_header *hfi1_get_msgheader(
+				struct hfi1_devdata *dd, __le32 *rhf_addr)
+{
+	u32 offset = rhf_hdrq_offset(rhf_to_cpu(rhf_addr));
+
+	return (struct hfi1_message_header *)
+		(rhf_addr - dd->rhf_offset + offset);
+}
+
+static const char * const ib_cfg_name_strings[] = {
+	"HFI1_IB_CFG_LIDLMC",
+	"HFI1_IB_CFG_LWID_DG_ENB",
+	"HFI1_IB_CFG_LWID_ENB",
+	"HFI1_IB_CFG_LWID",
+	"HFI1_IB_CFG_SPD_ENB",
+	"HFI1_IB_CFG_SPD",
+	"HFI1_IB_CFG_RXPOL_ENB",
+	"HFI1_IB_CFG_LREV_ENB",
+	"HFI1_IB_CFG_LINKLATENCY",
+	"HFI1_IB_CFG_HRTBT",
+	"HFI1_IB_CFG_OP_VLS",
+	"HFI1_IB_CFG_VL_HIGH_CAP",
+	"HFI1_IB_CFG_VL_LOW_CAP",
+	"HFI1_IB_CFG_OVERRUN_THRESH",
+	"HFI1_IB_CFG_PHYERR_THRESH",
+	"HFI1_IB_CFG_LINKDEFAULT",
+	"HFI1_IB_CFG_PKEYS",
+	"HFI1_IB_CFG_MTU",
+	"HFI1_IB_CFG_LSTATE",
+	"HFI1_IB_CFG_VL_HIGH_LIMIT",
+	"HFI1_IB_CFG_PMA_TICKS",
+	"HFI1_IB_CFG_PORT"
+};
+
+static const char *ib_cfg_name(int which)
+{
+	if (which < 0 || which >= ARRAY_SIZE(ib_cfg_name_strings))
+		return "invalid";
+	return ib_cfg_name_strings[which];
+}
+
+int hfi1_get_ib_cfg(struct hfi1_pportdata *ppd, int which)
+{
+	struct hfi1_devdata *dd = ppd->dd;
+	int val = 0;
+
+	switch (which) {
+	case HFI1_IB_CFG_LWID_ENB: /* allowed Link-width */
+		val = ppd->link_width_enabled;
+		break;
+	case HFI1_IB_CFG_LWID: /* currently active Link-width */
+		val = ppd->link_width_active;
+		break;
+	case HFI1_IB_CFG_SPD_ENB: /* allowed Link speeds */
+		val = ppd->link_speed_enabled;
+		break;
+	case HFI1_IB_CFG_SPD: /* current Link speed */
+		val = ppd->link_speed_active;
+		break;
+
+	case HFI1_IB_CFG_RXPOL_ENB: /* Auto-RX-polarity enable */
+	case HFI1_IB_CFG_LREV_ENB: /* Auto-Lane-reversal enable */
+	case HFI1_IB_CFG_LINKLATENCY:
+		goto unimplemented;
+
+	case HFI1_IB_CFG_OP_VLS:
+		val = ppd->vls_operational;
+		break;
+	case HFI1_IB_CFG_VL_HIGH_CAP: /* VL arb high priority table size */
+		val = VL_ARB_HIGH_PRIO_TABLE_SIZE;
+		break;
+	case HFI1_IB_CFG_VL_LOW_CAP: /* VL arb low priority table size */
+		val = VL_ARB_LOW_PRIO_TABLE_SIZE;
+		break;
+	case HFI1_IB_CFG_OVERRUN_THRESH: /* IB overrun threshold */
+		val = ppd->overrun_threshold;
+		break;
+	case HFI1_IB_CFG_PHYERR_THRESH: /* IB PHY error threshold */
+		val = ppd->phy_error_threshold;
+		break;
+	case HFI1_IB_CFG_LINKDEFAULT: /* IB link default (sleep/poll) */
+		val = dd->link_default;
+		break;
+
+	case HFI1_IB_CFG_HRTBT: /* Heartbeat off/enable/auto */
+	case HFI1_IB_CFG_PMA_TICKS:
+	default:
+unimplemented:
+		if (HFI1_CAP_IS_KSET(PRINT_UNIMPL))
+			dd_dev_info(
+				dd,
+				"%s: which %s: not implemented\n",
+				__func__,
+				ib_cfg_name(which));
+		break;
+	}
+
+	return val;
+}
+
+/*
+ * The largest MAD packet size.
+ */
+#define MAX_MAD_PACKET 2048
+
+/*
+ * Return the maximum header bytes that can go on the _wire_
+ * for this device. This count includes the ICRC which is
+ * not part of the packet held in memory but it is appended
+ * by the HW.
+ * This is dependent on the device's receive header entry size.
+ * HFI allows this to be set per-receive context, but the
+ * driver presently enforces a global value.
+ */
+u32 lrh_max_header_bytes(struct hfi1_devdata *dd)
+{
+	/*
+	 * The maximum non-payload (MTU) bytes in LRH.PktLen are
+	 * the Receive Header Entry Size minus the PBC (or RHF) size
+	 * plus one DW for the ICRC appended by HW.
+	 *
+	 * dd->rcd[0].rcvhdrqentsize is in DW.
+	 * We use rcd[0] as all context will have the same value. Also,
+	 * the first kernel context would have been allocated by now so
+	 * we are guaranteed a valid value.
+	 */
+	return (dd->rcd[0]->rcvhdrqentsize - 2/*PBC/RHF*/ + 1/*ICRC*/) << 2;
+}
+
+/*
+ * Set Send Length
+ * @ppd - per port data
+ *
+ * Set the MTU by limiting how many DWs may be sent.  The SendLenCheck*
+ * registers compare against LRH.PktLen, so use the max bytes included
+ * in the LRH.
+ *
+ * This routine changes all VL values except VL15, which it maintains at
+ * the same value.
+ */
+static void set_send_length(struct hfi1_pportdata *ppd)
+{
+	struct hfi1_devdata *dd = ppd->dd;
+	u32 max_hb = lrh_max_header_bytes(dd), maxvlmtu = 0, dcmtu;
+	u64 len1 = 0, len2 = (((dd->vld[15].mtu + max_hb) >> 2)
+			      & SEND_LEN_CHECK1_LEN_VL15_MASK) <<
+		SEND_LEN_CHECK1_LEN_VL15_SHIFT;
+	int i;
+
+	for (i = 0; i < ppd->vls_supported; i++) {
+		if (dd->vld[i].mtu > maxvlmtu)
+			maxvlmtu = dd->vld[i].mtu;
+		if (i <= 3)
+			len1 |= (((dd->vld[i].mtu + max_hb) >> 2)
+				 & SEND_LEN_CHECK0_LEN_VL0_MASK) <<
+				((i % 4) * SEND_LEN_CHECK0_LEN_VL1_SHIFT);
+		else
+			len2 |= (((dd->vld[i].mtu + max_hb) >> 2)
+				 & SEND_LEN_CHECK1_LEN_VL4_MASK) <<
+				((i % 4) * SEND_LEN_CHECK1_LEN_VL5_SHIFT);
+	}
+	write_csr(dd, SEND_LEN_CHECK0, len1);
+	write_csr(dd, SEND_LEN_CHECK1, len2);
+	/* adjust kernel credit return thresholds based on new MTUs */
+	/* all kernel receive contexts have the same hdrqentsize */
+	for (i = 0; i < ppd->vls_supported; i++) {
+		sc_set_cr_threshold(dd->vld[i].sc,
+			sc_mtu_to_threshold(dd->vld[i].sc, dd->vld[i].mtu,
+				dd->rcd[0]->rcvhdrqentsize));
+	}
+	sc_set_cr_threshold(dd->vld[15].sc,
+		sc_mtu_to_threshold(dd->vld[15].sc, dd->vld[15].mtu,
+			dd->rcd[0]->rcvhdrqentsize));
+
+	/* Adjust maximum MTU for the port in DC */
+	dcmtu = maxvlmtu == 10240 ? DCC_CFG_PORT_MTU_CAP_10240 :
+		(ilog2(maxvlmtu >> 8) + 1);
+	len1 = read_csr(ppd->dd, DCC_CFG_PORT_CONFIG);
+	len1 &= ~DCC_CFG_PORT_CONFIG_MTU_CAP_SMASK;
+	len1 |= ((u64)dcmtu & DCC_CFG_PORT_CONFIG_MTU_CAP_MASK) <<
+		DCC_CFG_PORT_CONFIG_MTU_CAP_SHIFT;
+	write_csr(ppd->dd, DCC_CFG_PORT_CONFIG, len1);
+}
+
+static void set_lidlmc(struct hfi1_pportdata *ppd)
+{
+	int i;
+	u64 sreg = 0;
+	struct hfi1_devdata *dd = ppd->dd;
+	u32 mask = ~((1U << ppd->lmc) - 1);
+	u64 c1 = read_csr(ppd->dd, DCC_CFG_PORT_CONFIG1);
+
+	if (dd->hfi1_snoop.mode_flag)
+		dd_dev_info(dd, "Set lid/lmc while snooping");
+
+	c1 &= ~(DCC_CFG_PORT_CONFIG1_TARGET_DLID_SMASK
+		| DCC_CFG_PORT_CONFIG1_DLID_MASK_SMASK);
+	c1 |= ((ppd->lid & DCC_CFG_PORT_CONFIG1_TARGET_DLID_MASK)
+			<< DCC_CFG_PORT_CONFIG1_TARGET_DLID_SHIFT)|
+	      ((mask & DCC_CFG_PORT_CONFIG1_DLID_MASK_MASK)
+			<< DCC_CFG_PORT_CONFIG1_DLID_MASK_SHIFT);
+	write_csr(ppd->dd, DCC_CFG_PORT_CONFIG1, c1);
+
+	/*
+	 * Iterate over all the send contexts and set their SLID check
+	 */
+	sreg = ((mask & SEND_CTXT_CHECK_SLID_MASK_MASK) <<
+			SEND_CTXT_CHECK_SLID_MASK_SHIFT) |
+	       (((ppd->lid & mask) & SEND_CTXT_CHECK_SLID_VALUE_MASK) <<
+			SEND_CTXT_CHECK_SLID_VALUE_SHIFT);
+
+	for (i = 0; i < dd->chip_send_contexts; i++) {
+		hfi1_cdbg(LINKVERB, "SendContext[%d].SLID_CHECK = 0x%x",
+			  i, (u32)sreg);
+		write_kctxt_csr(dd, i, SEND_CTXT_CHECK_SLID, sreg);
+	}
+
+	/* Now we have to do the same thing for the sdma engines */
+	sdma_update_lmc(dd, mask, ppd->lid);
+}
+
+static int wait_phy_linkstate(struct hfi1_devdata *dd, u32 state, u32 msecs)
+{
+	unsigned long timeout;
+	u32 curr_state;
+
+	timeout = jiffies + msecs_to_jiffies(msecs);
+	while (1) {
+		curr_state = read_physical_state(dd);
+		if (curr_state == state)
+			break;
+		if (time_after(jiffies, timeout)) {
+			dd_dev_err(dd,
+				"timeout waiting for phy link state 0x%x, current state is 0x%x\n",
+				state, curr_state);
+			return -ETIMEDOUT;
+		}
+		usleep_range(1950, 2050); /* sleep 2ms-ish */
+	}
+
+	return 0;
+}
+
+/*
+ * Helper for set_link_state().  Do not call except from that routine.
+ * Expects ppd->hls_mutex to be held.
+ *
+ * @rem_reason value to be sent to the neighbor
+ *
+ * LinkDownReasons only set if transition succeeds.
+ */
+static int goto_offline(struct hfi1_pportdata *ppd, u8 rem_reason)
+{
+	struct hfi1_devdata *dd = ppd->dd;
+	u32 pstate, previous_state;
+	u32 last_local_state;
+	u32 last_remote_state;
+	int ret;
+	int do_transition;
+	int do_wait;
+
+	previous_state = ppd->host_link_state;
+	ppd->host_link_state = HLS_GOING_OFFLINE;
+	pstate = read_physical_state(dd);
+	if (pstate == PLS_OFFLINE) {
+		do_transition = 0;	/* in right state */
+		do_wait = 0;		/* ...no need to wait */
+	} else if ((pstate & 0xff) == PLS_OFFLINE) {
+		do_transition = 0;	/* in an offline transient state */
+		do_wait = 1;		/* ...wait for it to settle */
+	} else {
+		do_transition = 1;	/* need to move to offline */
+		do_wait = 1;		/* ...will need to wait */
+	}
+
+	if (do_transition) {
+		ret = set_physical_link_state(dd,
+			PLS_OFFLINE | (rem_reason << 8));
+
+		if (ret != HCMD_SUCCESS) {
+			dd_dev_err(dd,
+				"Failed to transition to Offline link state, return %d\n",
+				ret);
+			return -EINVAL;
+		}
+		if (ppd->offline_disabled_reason == OPA_LINKDOWN_REASON_NONE)
+			ppd->offline_disabled_reason =
+			OPA_LINKDOWN_REASON_TRANSIENT;
+	}
+
+	if (do_wait) {
+		/* it can take a while for the link to go down */
+		ret = wait_phy_linkstate(dd, PLS_OFFLINE, 5000);
+		if (ret < 0)
+			return ret;
+	}
+
+	/* make sure the logical state is also down */
+	wait_logical_linkstate(ppd, IB_PORT_DOWN, 1000);
+
+	/*
+	 * Now in charge of LCB - must be after the physical state is
+	 * offline.quiet and before host_link_state is changed.
+	 */
+	set_host_lcb_access(dd);
+	write_csr(dd, DC_LCB_ERR_EN, ~0ull); /* watch LCB errors */
+	ppd->host_link_state = HLS_LINK_COOLDOWN; /* LCB access allowed */
+
+	/*
+	 * The LNI has a mandatory wait time after the physical state
+	 * moves to Offline.Quiet.  The wait time may be different
+	 * depending on how the link went down.  The 8051 firmware
+	 * will observe the needed wait time and only move to ready
+	 * when that is completed.  The largest of the quiet timeouts
+	 * is 2.5s, so wait that long and then a bit more.
+	 */
+	ret = wait_fm_ready(dd, 3000);
+	if (ret) {
+		dd_dev_err(dd,
+			"After going offline, timed out waiting for the 8051 to become ready to accept host requests\n");
+		/* state is really offline, so make it so */
+		ppd->host_link_state = HLS_DN_OFFLINE;
+		return ret;
+	}
+
+	/*
+	 * The state is now offline and the 8051 is ready to accept host
+	 * requests.
+	 *	- change our state
+	 *	- notify others if we were previously in a linkup state
+	 */
+	ppd->host_link_state = HLS_DN_OFFLINE;
+	if (previous_state & HLS_UP) {
+		/* went down while link was up */
+		handle_linkup_change(dd, 0);
+	} else if (previous_state
+			& (HLS_DN_POLL | HLS_VERIFY_CAP | HLS_GOING_UP)) {
+		/* went down while attempting link up */
+		/* byte 1 of last_*_state is the failure reason */
+		read_last_local_state(dd, &last_local_state);
+		read_last_remote_state(dd, &last_remote_state);
+		dd_dev_err(dd,
+			"LNI failure last states: local 0x%08x, remote 0x%08x\n",
+			last_local_state, last_remote_state);
+	}
+
+	/* the active link width (downgrade) is 0 on link down */
+	ppd->link_width_active = 0;
+	ppd->link_width_downgrade_tx_active = 0;
+	ppd->link_width_downgrade_rx_active = 0;
+	ppd->current_egress_rate = 0;
+	return 0;
+}
+
+/* return the link state name */
+static const char *link_state_name(u32 state)
+{
+	const char *name;
+	int n = ilog2(state);
+	static const char * const names[] = {
+		[__HLS_UP_INIT_BP]	 = "INIT",
+		[__HLS_UP_ARMED_BP]	 = "ARMED",
+		[__HLS_UP_ACTIVE_BP]	 = "ACTIVE",
+		[__HLS_DN_DOWNDEF_BP]	 = "DOWNDEF",
+		[__HLS_DN_POLL_BP]	 = "POLL",
+		[__HLS_DN_DISABLE_BP]	 = "DISABLE",
+		[__HLS_DN_OFFLINE_BP]	 = "OFFLINE",
+		[__HLS_VERIFY_CAP_BP]	 = "VERIFY_CAP",
+		[__HLS_GOING_UP_BP]	 = "GOING_UP",
+		[__HLS_GOING_OFFLINE_BP] = "GOING_OFFLINE",
+		[__HLS_LINK_COOLDOWN_BP] = "LINK_COOLDOWN"
+	};
+
+	name = n < ARRAY_SIZE(names) ? names[n] : NULL;
+	return name ? name : "unknown";
+}
+
+/* return the link state reason name */
+static const char *link_state_reason_name(struct hfi1_pportdata *ppd, u32 state)
+{
+	if (state == HLS_UP_INIT) {
+		switch (ppd->linkinit_reason) {
+		case OPA_LINKINIT_REASON_LINKUP:
+			return "(LINKUP)";
+		case OPA_LINKINIT_REASON_FLAPPING:
+			return "(FLAPPING)";
+		case OPA_LINKINIT_OUTSIDE_POLICY:
+			return "(OUTSIDE_POLICY)";
+		case OPA_LINKINIT_QUARANTINED:
+			return "(QUARANTINED)";
+		case OPA_LINKINIT_INSUFIC_CAPABILITY:
+			return "(INSUFIC_CAPABILITY)";
+		default:
+			break;
+		}
+	}
+	return "";
+}
+
+/*
+ * driver_physical_state - convert the driver's notion of a port's
+ * state (an HLS_*) into a physical state (a {IB,OPA}_PORTPHYSSTATE_*).
+ * Return -1 (converted to a u32) to indicate error.
+ */
+u32 driver_physical_state(struct hfi1_pportdata *ppd)
+{
+	switch (ppd->host_link_state) {
+	case HLS_UP_INIT:
+	case HLS_UP_ARMED:
+	case HLS_UP_ACTIVE:
+		return IB_PORTPHYSSTATE_LINKUP;
+	case HLS_DN_POLL:
+		return IB_PORTPHYSSTATE_POLLING;
+	case HLS_DN_DISABLE:
+		return IB_PORTPHYSSTATE_DISABLED;
+	case HLS_DN_OFFLINE:
+		return OPA_PORTPHYSSTATE_OFFLINE;
+	case HLS_VERIFY_CAP:
+		return IB_PORTPHYSSTATE_POLLING;
+	case HLS_GOING_UP:
+		return IB_PORTPHYSSTATE_POLLING;
+	case HLS_GOING_OFFLINE:
+		return OPA_PORTPHYSSTATE_OFFLINE;
+	case HLS_LINK_COOLDOWN:
+		return OPA_PORTPHYSSTATE_OFFLINE;
+	case HLS_DN_DOWNDEF:
+	default:
+		dd_dev_err(ppd->dd, "invalid host_link_state 0x%x\n",
+			   ppd->host_link_state);
+		return  -1;
+	}
+}
+
+/*
+ * driver_logical_state - convert the driver's notion of a port's
+ * state (an HLS_*) into a logical state (a IB_PORT_*). Return -1
+ * (converted to a u32) to indicate error.
+ */
+u32 driver_logical_state(struct hfi1_pportdata *ppd)
+{
+	if (ppd->host_link_state && !(ppd->host_link_state & HLS_UP))
+		return IB_PORT_DOWN;
+
+	switch (ppd->host_link_state & HLS_UP) {
+	case HLS_UP_INIT:
+		return IB_PORT_INIT;
+	case HLS_UP_ARMED:
+		return IB_PORT_ARMED;
+	case HLS_UP_ACTIVE:
+		return IB_PORT_ACTIVE;
+	default:
+		dd_dev_err(ppd->dd, "invalid host_link_state 0x%x\n",
+			   ppd->host_link_state);
+	return -1;
+	}
+}
+
+void set_link_down_reason(struct hfi1_pportdata *ppd, u8 lcl_reason,
+			  u8 neigh_reason, u8 rem_reason)
+{
+	if (ppd->local_link_down_reason.latest == 0 &&
+	    ppd->neigh_link_down_reason.latest == 0) {
+		ppd->local_link_down_reason.latest = lcl_reason;
+		ppd->neigh_link_down_reason.latest = neigh_reason;
+		ppd->remote_link_down_reason = rem_reason;
+	}
+}
+
+/*
+ * Change the physical and/or logical link state.
+ *
+ * Do not call this routine while inside an interrupt.  It contains
+ * calls to routines that can take multiple seconds to finish.
+ *
+ * Returns 0 on success, -errno on failure.
+ */
+int set_link_state(struct hfi1_pportdata *ppd, u32 state)
+{
+	struct hfi1_devdata *dd = ppd->dd;
+	struct ib_event event = {.device = NULL};
+	int ret1, ret = 0;
+	int was_up, is_down;
+	int orig_new_state, poll_bounce;
+
+	mutex_lock(&ppd->hls_lock);
+
+	orig_new_state = state;
+	if (state == HLS_DN_DOWNDEF)
+		state = dd->link_default;
+
+	/* interpret poll -> poll as a link bounce */
+	poll_bounce = ppd->host_link_state == HLS_DN_POLL
+				&& state == HLS_DN_POLL;
+
+	dd_dev_info(dd, "%s: current %s, new %s %s%s\n", __func__,
+		link_state_name(ppd->host_link_state),
+		link_state_name(orig_new_state),
+		poll_bounce ? "(bounce) " : "",
+		link_state_reason_name(ppd, state));
+
+	was_up = !!(ppd->host_link_state & HLS_UP);
+
+	/*
+	 * If we're going to a (HLS_*) link state that implies the logical
+	 * link state is neither of (IB_PORT_ARMED, IB_PORT_ACTIVE), then
+	 * reset is_sm_config_started to 0.
+	 */
+	if (!(state & (HLS_UP_ARMED | HLS_UP_ACTIVE)))
+		ppd->is_sm_config_started = 0;
+
+	/*
+	 * Do nothing if the states match.  Let a poll to poll link bounce
+	 * go through.
+	 */
+	if (ppd->host_link_state == state && !poll_bounce)
+		goto done;
+
+	switch (state) {
+	case HLS_UP_INIT:
+		if (ppd->host_link_state == HLS_DN_POLL && (quick_linkup
+			    || dd->icode == ICODE_FUNCTIONAL_SIMULATOR)) {
+			/*
+			 * Quick link up jumps from polling to here.
+			 *
+			 * Whether in normal or loopback mode, the
+			 * simulator jumps from polling to link up.
+			 * Accept that here.
+			 */
+			/* OK */;
+		} else if (ppd->host_link_state != HLS_GOING_UP) {
+			goto unexpected;
+		}
+
+		ppd->host_link_state = HLS_UP_INIT;
+		ret = wait_logical_linkstate(ppd, IB_PORT_INIT, 1000);
+		if (ret) {
+			/* logical state didn't change, stay at going_up */
+			ppd->host_link_state = HLS_GOING_UP;
+			dd_dev_err(dd,
+				"%s: logical state did not change to INIT\n",
+				__func__);
+		} else {
+			/* clear old transient LINKINIT_REASON code */
+			if (ppd->linkinit_reason >= OPA_LINKINIT_REASON_CLEAR)
+				ppd->linkinit_reason =
+					OPA_LINKINIT_REASON_LINKUP;
+
+			/* enable the port */
+			add_rcvctrl(dd, RCV_CTRL_RCV_PORT_ENABLE_SMASK);
+
+			handle_linkup_change(dd, 1);
+		}
+		break;
+	case HLS_UP_ARMED:
+		if (ppd->host_link_state != HLS_UP_INIT)
+			goto unexpected;
+
+		ppd->host_link_state = HLS_UP_ARMED;
+		set_logical_state(dd, LSTATE_ARMED);
+		ret = wait_logical_linkstate(ppd, IB_PORT_ARMED, 1000);
+		if (ret) {
+			/* logical state didn't change, stay at init */
+			ppd->host_link_state = HLS_UP_INIT;
+			dd_dev_err(dd,
+				"%s: logical state did not change to ARMED\n",
+				__func__);
+		}
+		/*
+		 * The simulator does not currently implement SMA messages,
+		 * so neighbor_normal is not set.  Set it here when we first
+		 * move to Armed.
+		 */
+		if (dd->icode == ICODE_FUNCTIONAL_SIMULATOR)
+			ppd->neighbor_normal = 1;
+		break;
+	case HLS_UP_ACTIVE:
+		if (ppd->host_link_state != HLS_UP_ARMED)
+			goto unexpected;
+
+		ppd->host_link_state = HLS_UP_ACTIVE;
+		set_logical_state(dd, LSTATE_ACTIVE);
+		ret = wait_logical_linkstate(ppd, IB_PORT_ACTIVE, 1000);
+		if (ret) {
+			/* logical state didn't change, stay at armed */
+			ppd->host_link_state = HLS_UP_ARMED;
+			dd_dev_err(dd,
+				"%s: logical state did not change to ACTIVE\n",
+				__func__);
+		} else {
+
+			/* tell all engines to go running */
+			sdma_all_running(dd);
+
+			/* Signal the IB layer that the port has went active */
+			event.device = &dd->verbs_dev.ibdev;
+			event.element.port_num = ppd->port;
+			event.event = IB_EVENT_PORT_ACTIVE;
+		}
+		break;
+	case HLS_DN_POLL:
+		if ((ppd->host_link_state == HLS_DN_DISABLE ||
+		     ppd->host_link_state == HLS_DN_OFFLINE) &&
+		    dd->dc_shutdown)
+			dc_start(dd);
+		/* Hand LED control to the DC */
+		write_csr(dd, DCC_CFG_LED_CNTRL, 0);
+
+		if (ppd->host_link_state != HLS_DN_OFFLINE) {
+			u8 tmp = ppd->link_enabled;
+
+			ret = goto_offline(ppd, ppd->remote_link_down_reason);
+			if (ret) {
+				ppd->link_enabled = tmp;
+				break;
+			}
+			ppd->remote_link_down_reason = 0;
+
+			if (ppd->driver_link_ready)
+				ppd->link_enabled = 1;
+		}
+
+		ret = set_local_link_attributes(ppd);
+		if (ret)
+			break;
+
+		ppd->port_error_action = 0;
+		ppd->host_link_state = HLS_DN_POLL;
+
+		if (quick_linkup) {
+			/* quick linkup does not go into polling */
+			ret = do_quick_linkup(dd);
+		} else {
+			ret1 = set_physical_link_state(dd, PLS_POLLING);
+			if (ret1 != HCMD_SUCCESS) {
+				dd_dev_err(dd,
+					"Failed to transition to Polling link state, return 0x%x\n",
+					ret1);
+				ret = -EINVAL;
+			}
+		}
+		ppd->offline_disabled_reason = OPA_LINKDOWN_REASON_NONE;
+		/*
+		 * If an error occurred above, go back to offline.  The
+		 * caller may reschedule another attempt.
+		 */
+		if (ret)
+			goto_offline(ppd, 0);
+		break;
+	case HLS_DN_DISABLE:
+		/* link is disabled */
+		ppd->link_enabled = 0;
+
+		/* allow any state to transition to disabled */
+
+		/* must transition to offline first */
+		if (ppd->host_link_state != HLS_DN_OFFLINE) {
+			ret = goto_offline(ppd, ppd->remote_link_down_reason);
+			if (ret)
+				break;
+			ppd->remote_link_down_reason = 0;
+		}
+
+		ret1 = set_physical_link_state(dd, PLS_DISABLED);
+		if (ret1 != HCMD_SUCCESS) {
+			dd_dev_err(dd,
+				"Failed to transition to Disabled link state, return 0x%x\n",
+				ret1);
+			ret = -EINVAL;
+			break;
+		}
+		ppd->host_link_state = HLS_DN_DISABLE;
+		dc_shutdown(dd);
+		break;
+	case HLS_DN_OFFLINE:
+		if (ppd->host_link_state == HLS_DN_DISABLE)
+			dc_start(dd);
+
+		/* allow any state to transition to offline */
+		ret = goto_offline(ppd, ppd->remote_link_down_reason);
+		if (!ret)
+			ppd->remote_link_down_reason = 0;
+		break;
+	case HLS_VERIFY_CAP:
+		if (ppd->host_link_state != HLS_DN_POLL)
+			goto unexpected;
+		ppd->host_link_state = HLS_VERIFY_CAP;
+		break;
+	case HLS_GOING_UP:
+		if (ppd->host_link_state != HLS_VERIFY_CAP)
+			goto unexpected;
+
+		ret1 = set_physical_link_state(dd, PLS_LINKUP);
+		if (ret1 != HCMD_SUCCESS) {
+			dd_dev_err(dd,
+				"Failed to transition to link up state, return 0x%x\n",
+				ret1);
+			ret = -EINVAL;
+			break;
+		}
+		ppd->host_link_state = HLS_GOING_UP;
+		break;
+
+	case HLS_GOING_OFFLINE:		/* transient within goto_offline() */
+	case HLS_LINK_COOLDOWN:		/* transient within goto_offline() */
+	default:
+		dd_dev_info(dd, "%s: state 0x%x: not supported\n",
+			__func__, state);
+		ret = -EINVAL;
+		break;
+	}
+
+	is_down = !!(ppd->host_link_state & (HLS_DN_POLL |
+			HLS_DN_DISABLE | HLS_DN_OFFLINE));
+
+	if (was_up && is_down && ppd->local_link_down_reason.sma == 0 &&
+	    ppd->neigh_link_down_reason.sma == 0) {
+		ppd->local_link_down_reason.sma =
+		  ppd->local_link_down_reason.latest;
+		ppd->neigh_link_down_reason.sma =
+		  ppd->neigh_link_down_reason.latest;
+	}
+
+	goto done;
+
+unexpected:
+	dd_dev_err(dd, "%s: unexpected state transition from %s to %s\n",
+		__func__, link_state_name(ppd->host_link_state),
+		link_state_name(state));
+	ret = -EINVAL;
+
+done:
+	mutex_unlock(&ppd->hls_lock);
+
+	if (event.device)
+		ib_dispatch_event(&event);
+
+	return ret;
+}
+
+int hfi1_set_ib_cfg(struct hfi1_pportdata *ppd, int which, u32 val)
+{
+	u64 reg;
+	int ret = 0;
+
+	switch (which) {
+	case HFI1_IB_CFG_LIDLMC:
+		set_lidlmc(ppd);
+		break;
+	case HFI1_IB_CFG_VL_HIGH_LIMIT:
+		/*
+		 * The VL Arbitrator high limit is sent in units of 4k
+		 * bytes, while HFI stores it in units of 64 bytes.
+		 */
+		val *= 4096/64;
+		reg = ((u64)val & SEND_HIGH_PRIORITY_LIMIT_LIMIT_MASK)
+			<< SEND_HIGH_PRIORITY_LIMIT_LIMIT_SHIFT;
+		write_csr(ppd->dd, SEND_HIGH_PRIORITY_LIMIT, reg);
+		break;
+	case HFI1_IB_CFG_LINKDEFAULT: /* IB link default (sleep/poll) */
+		/* HFI only supports POLL as the default link down state */
+		if (val != HLS_DN_POLL)
+			ret = -EINVAL;
+		break;
+	case HFI1_IB_CFG_OP_VLS:
+		if (ppd->vls_operational != val) {
+			ppd->vls_operational = val;
+			if (!ppd->port)
+				ret = -EINVAL;
+			else
+				ret = sdma_map_init(
+					ppd->dd,
+					ppd->port - 1,
+					val,
+					NULL);
+		}
+		break;
+	/*
+	 * For link width, link width downgrade, and speed enable, always AND
+	 * the setting with what is actually supported.  This has two benefits.
+	 * First, enabled can't have unsupported values, no matter what the
+	 * SM or FM might want.  Second, the ALL_SUPPORTED wildcards that mean
+	 * "fill in with your supported value" have all the bits in the
+	 * field set, so simply ANDing with supported has the desired result.
+	 */
+	case HFI1_IB_CFG_LWID_ENB: /* set allowed Link-width */
+		ppd->link_width_enabled = val & ppd->link_width_supported;
+		break;
+	case HFI1_IB_CFG_LWID_DG_ENB: /* set allowed link width downgrade */
+		ppd->link_width_downgrade_enabled =
+				val & ppd->link_width_downgrade_supported;
+		break;
+	case HFI1_IB_CFG_SPD_ENB: /* allowed Link speeds */
+		ppd->link_speed_enabled = val & ppd->link_speed_supported;
+		break;
+	case HFI1_IB_CFG_OVERRUN_THRESH: /* IB overrun threshold */
+		/*
+		 * HFI does not follow IB specs, save this value
+		 * so we can report it, if asked.
+		 */
+		ppd->overrun_threshold = val;
+		break;
+	case HFI1_IB_CFG_PHYERR_THRESH: /* IB PHY error threshold */
+		/*
+		 * HFI does not follow IB specs, save this value
+		 * so we can report it, if asked.
+		 */
+		ppd->phy_error_threshold = val;
+		break;
+
+	case HFI1_IB_CFG_MTU:
+		set_send_length(ppd);
+		break;
+
+	case HFI1_IB_CFG_PKEYS:
+		if (HFI1_CAP_IS_KSET(PKEY_CHECK))
+			set_partition_keys(ppd);
+		break;
+
+	default:
+		if (HFI1_CAP_IS_KSET(PRINT_UNIMPL))
+			dd_dev_info(ppd->dd,
+			  "%s: which %s, val 0x%x: not implemented\n",
+			  __func__, ib_cfg_name(which), val);
+		break;
+	}
+	return ret;
+}
+
+/* begin functions related to vl arbitration table caching */
+static void init_vl_arb_caches(struct hfi1_pportdata *ppd)
+{
+	int i;
+
+	BUILD_BUG_ON(VL_ARB_TABLE_SIZE !=
+			VL_ARB_LOW_PRIO_TABLE_SIZE);
+	BUILD_BUG_ON(VL_ARB_TABLE_SIZE !=
+			VL_ARB_HIGH_PRIO_TABLE_SIZE);
+
+	/*
+	 * Note that we always return values directly from the
+	 * 'vl_arb_cache' (and do no CSR reads) in response to a
+	 * 'Get(VLArbTable)'. This is obviously correct after a
+	 * 'Set(VLArbTable)', since the cache will then be up to
+	 * date. But it's also correct prior to any 'Set(VLArbTable)'
+	 * since then both the cache, and the relevant h/w registers
+	 * will be zeroed.
+	 */
+
+	for (i = 0; i < MAX_PRIO_TABLE; i++)
+		spin_lock_init(&ppd->vl_arb_cache[i].lock);
+}
+
+/*
+ * vl_arb_lock_cache
+ *
+ * All other vl_arb_* functions should be called only after locking
+ * the cache.
+ */
+static inline struct vl_arb_cache *
+vl_arb_lock_cache(struct hfi1_pportdata *ppd, int idx)
+{
+	if (idx != LO_PRIO_TABLE && idx != HI_PRIO_TABLE)
+		return NULL;
+	spin_lock(&ppd->vl_arb_cache[idx].lock);
+	return &ppd->vl_arb_cache[idx];
+}
+
+static inline void vl_arb_unlock_cache(struct hfi1_pportdata *ppd, int idx)
+{
+	spin_unlock(&ppd->vl_arb_cache[idx].lock);
+}
+
+static void vl_arb_get_cache(struct vl_arb_cache *cache,
+			     struct ib_vl_weight_elem *vl)
+{
+	memcpy(vl, cache->table, VL_ARB_TABLE_SIZE * sizeof(*vl));
+}
+
+static void vl_arb_set_cache(struct vl_arb_cache *cache,
+			     struct ib_vl_weight_elem *vl)
+{
+	memcpy(cache->table, vl, VL_ARB_TABLE_SIZE * sizeof(*vl));
+}
+
+static int vl_arb_match_cache(struct vl_arb_cache *cache,
+			      struct ib_vl_weight_elem *vl)
+{
+	return !memcmp(cache->table, vl, VL_ARB_TABLE_SIZE * sizeof(*vl));
+}
+/* end functions related to vl arbitration table caching */
+
+static int set_vl_weights(struct hfi1_pportdata *ppd, u32 target,
+			  u32 size, struct ib_vl_weight_elem *vl)
+{
+	struct hfi1_devdata *dd = ppd->dd;
+	u64 reg;
+	unsigned int i, is_up = 0;
+	int drain, ret = 0;
+
+	mutex_lock(&ppd->hls_lock);
+
+	if (ppd->host_link_state & HLS_UP)
+		is_up = 1;
+
+	drain = !is_ax(dd) && is_up;
+
+	if (drain)
+		/*
+		 * Before adjusting VL arbitration weights, empty per-VL
+		 * FIFOs, otherwise a packet whose VL weight is being
+		 * set to 0 could get stuck in a FIFO with no chance to
+		 * egress.
+		 */
+		ret = stop_drain_data_vls(dd);
+
+	if (ret) {
+		dd_dev_err(
+			dd,
+			"%s: cannot stop/drain VLs - refusing to change VL arbitration weights\n",
+			__func__);
+		goto err;
+	}
+
+	for (i = 0; i < size; i++, vl++) {
+		/*
+		 * NOTE: The low priority shift and mask are used here, but
+		 * they are the same for both the low and high registers.
+		 */
+		reg = (((u64)vl->vl & SEND_LOW_PRIORITY_LIST_VL_MASK)
+				<< SEND_LOW_PRIORITY_LIST_VL_SHIFT)
+		      | (((u64)vl->weight
+				& SEND_LOW_PRIORITY_LIST_WEIGHT_MASK)
+				<< SEND_LOW_PRIORITY_LIST_WEIGHT_SHIFT);
+		write_csr(dd, target + (i * 8), reg);
+	}
+	pio_send_control(dd, PSC_GLOBAL_VLARB_ENABLE);
+
+	if (drain)
+		open_fill_data_vls(dd); /* reopen all VLs */
+
+err:
+	mutex_unlock(&ppd->hls_lock);
+
+	return ret;
+}
+
+/*
+ * Read one credit merge VL register.
+ */
+static void read_one_cm_vl(struct hfi1_devdata *dd, u32 csr,
+			   struct vl_limit *vll)
+{
+	u64 reg = read_csr(dd, csr);
+
+	vll->dedicated = cpu_to_be16(
+		(reg >> SEND_CM_CREDIT_VL_DEDICATED_LIMIT_VL_SHIFT)
+		& SEND_CM_CREDIT_VL_DEDICATED_LIMIT_VL_MASK);
+	vll->shared = cpu_to_be16(
+		(reg >> SEND_CM_CREDIT_VL_SHARED_LIMIT_VL_SHIFT)
+		& SEND_CM_CREDIT_VL_SHARED_LIMIT_VL_MASK);
+}
+
+/*
+ * Read the current credit merge limits.
+ */
+static int get_buffer_control(struct hfi1_devdata *dd,
+			      struct buffer_control *bc, u16 *overall_limit)
+{
+	u64 reg;
+	int i;
+
+	/* not all entries are filled in */
+	memset(bc, 0, sizeof(*bc));
+
+	/* OPA and HFI have a 1-1 mapping */
+	for (i = 0; i < TXE_NUM_DATA_VL; i++)
+		read_one_cm_vl(dd, SEND_CM_CREDIT_VL + (8*i), &bc->vl[i]);
+
+	/* NOTE: assumes that VL* and VL15 CSRs are bit-wise identical */
+	read_one_cm_vl(dd, SEND_CM_CREDIT_VL15, &bc->vl[15]);
+
+	reg = read_csr(dd, SEND_CM_GLOBAL_CREDIT);
+	bc->overall_shared_limit = cpu_to_be16(
+		(reg >> SEND_CM_GLOBAL_CREDIT_SHARED_LIMIT_SHIFT)
+		& SEND_CM_GLOBAL_CREDIT_SHARED_LIMIT_MASK);
+	if (overall_limit)
+		*overall_limit = (reg
+			>> SEND_CM_GLOBAL_CREDIT_TOTAL_CREDIT_LIMIT_SHIFT)
+			& SEND_CM_GLOBAL_CREDIT_TOTAL_CREDIT_LIMIT_MASK;
+	return sizeof(struct buffer_control);
+}
+
+static int get_sc2vlnt(struct hfi1_devdata *dd, struct sc2vlnt *dp)
+{
+	u64 reg;
+	int i;
+
+	/* each register contains 16 SC->VLnt mappings, 4 bits each */
+	reg = read_csr(dd, DCC_CFG_SC_VL_TABLE_15_0);
+	for (i = 0; i < sizeof(u64); i++) {
+		u8 byte = *(((u8 *)&reg) + i);
+
+		dp->vlnt[2 * i] = byte & 0xf;
+		dp->vlnt[(2 * i) + 1] = (byte & 0xf0) >> 4;
+	}
+
+	reg = read_csr(dd, DCC_CFG_SC_VL_TABLE_31_16);
+	for (i = 0; i < sizeof(u64); i++) {
+		u8 byte = *(((u8 *)&reg) + i);
+
+		dp->vlnt[16 + (2 * i)] = byte & 0xf;
+		dp->vlnt[16 + (2 * i) + 1] = (byte & 0xf0) >> 4;
+	}
+	return sizeof(struct sc2vlnt);
+}
+
+static void get_vlarb_preempt(struct hfi1_devdata *dd, u32 nelems,
+			      struct ib_vl_weight_elem *vl)
+{
+	unsigned int i;
+
+	for (i = 0; i < nelems; i++, vl++) {
+		vl->vl = 0xf;
+		vl->weight = 0;
+	}
+}
+
+static void set_sc2vlnt(struct hfi1_devdata *dd, struct sc2vlnt *dp)
+{
+	write_csr(dd, DCC_CFG_SC_VL_TABLE_15_0,
+		DC_SC_VL_VAL(15_0,
+		0, dp->vlnt[0] & 0xf,
+		1, dp->vlnt[1] & 0xf,
+		2, dp->vlnt[2] & 0xf,
+		3, dp->vlnt[3] & 0xf,
+		4, dp->vlnt[4] & 0xf,
+		5, dp->vlnt[5] & 0xf,
+		6, dp->vlnt[6] & 0xf,
+		7, dp->vlnt[7] & 0xf,
+		8, dp->vlnt[8] & 0xf,
+		9, dp->vlnt[9] & 0xf,
+		10, dp->vlnt[10] & 0xf,
+		11, dp->vlnt[11] & 0xf,
+		12, dp->vlnt[12] & 0xf,
+		13, dp->vlnt[13] & 0xf,
+		14, dp->vlnt[14] & 0xf,
+		15, dp->vlnt[15] & 0xf));
+	write_csr(dd, DCC_CFG_SC_VL_TABLE_31_16,
+		DC_SC_VL_VAL(31_16,
+		16, dp->vlnt[16] & 0xf,
+		17, dp->vlnt[17] & 0xf,
+		18, dp->vlnt[18] & 0xf,
+		19, dp->vlnt[19] & 0xf,
+		20, dp->vlnt[20] & 0xf,
+		21, dp->vlnt[21] & 0xf,
+		22, dp->vlnt[22] & 0xf,
+		23, dp->vlnt[23] & 0xf,
+		24, dp->vlnt[24] & 0xf,
+		25, dp->vlnt[25] & 0xf,
+		26, dp->vlnt[26] & 0xf,
+		27, dp->vlnt[27] & 0xf,
+		28, dp->vlnt[28] & 0xf,
+		29, dp->vlnt[29] & 0xf,
+		30, dp->vlnt[30] & 0xf,
+		31, dp->vlnt[31] & 0xf));
+}
+
+static void nonzero_msg(struct hfi1_devdata *dd, int idx, const char *what,
+			u16 limit)
+{
+	if (limit != 0)
+		dd_dev_info(dd, "Invalid %s limit %d on VL %d, ignoring\n",
+			what, (int)limit, idx);
+}
+
+/* change only the shared limit portion of SendCmGLobalCredit */
+static void set_global_shared(struct hfi1_devdata *dd, u16 limit)
+{
+	u64 reg;
+
+	reg = read_csr(dd, SEND_CM_GLOBAL_CREDIT);
+	reg &= ~SEND_CM_GLOBAL_CREDIT_SHARED_LIMIT_SMASK;
+	reg |= (u64)limit << SEND_CM_GLOBAL_CREDIT_SHARED_LIMIT_SHIFT;
+	write_csr(dd, SEND_CM_GLOBAL_CREDIT, reg);
+}
+
+/* change only the total credit limit portion of SendCmGLobalCredit */
+static void set_global_limit(struct hfi1_devdata *dd, u16 limit)
+{
+	u64 reg;
+
+	reg = read_csr(dd, SEND_CM_GLOBAL_CREDIT);
+	reg &= ~SEND_CM_GLOBAL_CREDIT_TOTAL_CREDIT_LIMIT_SMASK;
+	reg |= (u64)limit << SEND_CM_GLOBAL_CREDIT_TOTAL_CREDIT_LIMIT_SHIFT;
+	write_csr(dd, SEND_CM_GLOBAL_CREDIT, reg);
+}
+
+/* set the given per-VL shared limit */
+static void set_vl_shared(struct hfi1_devdata *dd, int vl, u16 limit)
+{
+	u64 reg;
+	u32 addr;
+
+	if (vl < TXE_NUM_DATA_VL)
+		addr = SEND_CM_CREDIT_VL + (8 * vl);
+	else
+		addr = SEND_CM_CREDIT_VL15;
+
+	reg = read_csr(dd, addr);
+	reg &= ~SEND_CM_CREDIT_VL_SHARED_LIMIT_VL_SMASK;
+	reg |= (u64)limit << SEND_CM_CREDIT_VL_SHARED_LIMIT_VL_SHIFT;
+	write_csr(dd, addr, reg);
+}
+
+/* set the given per-VL dedicated limit */
+static void set_vl_dedicated(struct hfi1_devdata *dd, int vl, u16 limit)
+{
+	u64 reg;
+	u32 addr;
+
+	if (vl < TXE_NUM_DATA_VL)
+		addr = SEND_CM_CREDIT_VL + (8 * vl);
+	else
+		addr = SEND_CM_CREDIT_VL15;
+
+	reg = read_csr(dd, addr);
+	reg &= ~SEND_CM_CREDIT_VL_DEDICATED_LIMIT_VL_SMASK;
+	reg |= (u64)limit << SEND_CM_CREDIT_VL_DEDICATED_LIMIT_VL_SHIFT;
+	write_csr(dd, addr, reg);
+}
+
+/* spin until the given per-VL status mask bits clear */
+static void wait_for_vl_status_clear(struct hfi1_devdata *dd, u64 mask,
+				     const char *which)
+{
+	unsigned long timeout;
+	u64 reg;
+
+	timeout = jiffies + msecs_to_jiffies(VL_STATUS_CLEAR_TIMEOUT);
+	while (1) {
+		reg = read_csr(dd, SEND_CM_CREDIT_USED_STATUS) & mask;
+
+		if (reg == 0)
+			return;	/* success */
+		if (time_after(jiffies, timeout))
+			break;		/* timed out */
+		udelay(1);
+	}
+
+	dd_dev_err(dd,
+		"%s credit change status not clearing after %dms, mask 0x%llx, not clear 0x%llx\n",
+		which, VL_STATUS_CLEAR_TIMEOUT, mask, reg);
+	/*
+	 * If this occurs, it is likely there was a credit loss on the link.
+	 * The only recovery from that is a link bounce.
+	 */
+	dd_dev_err(dd,
+		"Continuing anyway.  A credit loss may occur.  Suggest a link bounce\n");
+}
+
+/*
+ * The number of credits on the VLs may be changed while everything
+ * is "live", but the following algorithm must be followed due to
+ * how the hardware is actually implemented.  In particular,
+ * Return_Credit_Status[] is the only correct status check.
+ *
+ * if (reducing Global_Shared_Credit_Limit or any shared limit changing)
+ *     set Global_Shared_Credit_Limit = 0
+ *     use_all_vl = 1
+ * mask0 = all VLs that are changing either dedicated or shared limits
+ * set Shared_Limit[mask0] = 0
+ * spin until Return_Credit_Status[use_all_vl ? all VL : mask0] == 0
+ * if (changing any dedicated limit)
+ *     mask1 = all VLs that are lowering dedicated limits
+ *     lower Dedicated_Limit[mask1]
+ *     spin until Return_Credit_Status[mask1] == 0
+ *     raise Dedicated_Limits
+ * raise Shared_Limits
+ * raise Global_Shared_Credit_Limit
+ *
+ * lower = if the new limit is lower, set the limit to the new value
+ * raise = if the new limit is higher than the current value (may be changed
+ *	earlier in the algorithm), set the new limit to the new value
+ */
+static int set_buffer_control(struct hfi1_devdata *dd,
+			      struct buffer_control *new_bc)
+{
+	u64 changing_mask, ld_mask, stat_mask;
+	int change_count;
+	int i, use_all_mask;
+	int this_shared_changing;
+	/*
+	 * A0: add the variable any_shared_limit_changing below and in the
+	 * algorithm above.  If removing A0 support, it can be removed.
+	 */
+	int any_shared_limit_changing;
+	struct buffer_control cur_bc;
+	u8 changing[OPA_MAX_VLS];
+	u8 lowering_dedicated[OPA_MAX_VLS];
+	u16 cur_total;
+	u32 new_total = 0;
+	const u64 all_mask =
+	SEND_CM_CREDIT_USED_STATUS_VL0_RETURN_CREDIT_STATUS_SMASK
+	 | SEND_CM_CREDIT_USED_STATUS_VL1_RETURN_CREDIT_STATUS_SMASK
+	 | SEND_CM_CREDIT_USED_STATUS_VL2_RETURN_CREDIT_STATUS_SMASK
+	 | SEND_CM_CREDIT_USED_STATUS_VL3_RETURN_CREDIT_STATUS_SMASK
+	 | SEND_CM_CREDIT_USED_STATUS_VL4_RETURN_CREDIT_STATUS_SMASK
+	 | SEND_CM_CREDIT_USED_STATUS_VL5_RETURN_CREDIT_STATUS_SMASK
+	 | SEND_CM_CREDIT_USED_STATUS_VL6_RETURN_CREDIT_STATUS_SMASK
+	 | SEND_CM_CREDIT_USED_STATUS_VL7_RETURN_CREDIT_STATUS_SMASK
+	 | SEND_CM_CREDIT_USED_STATUS_VL15_RETURN_CREDIT_STATUS_SMASK;
+
+#define valid_vl(idx) ((idx) < TXE_NUM_DATA_VL || (idx) == 15)
+#define NUM_USABLE_VLS 16	/* look at VL15 and less */
+
+
+	/* find the new total credits, do sanity check on unused VLs */
+	for (i = 0; i < OPA_MAX_VLS; i++) {
+		if (valid_vl(i)) {
+			new_total += be16_to_cpu(new_bc->vl[i].dedicated);
+			continue;
+		}
+		nonzero_msg(dd, i, "dedicated",
+			be16_to_cpu(new_bc->vl[i].dedicated));
+		nonzero_msg(dd, i, "shared",
+			be16_to_cpu(new_bc->vl[i].shared));
+		new_bc->vl[i].dedicated = 0;
+		new_bc->vl[i].shared = 0;
+	}
+	new_total += be16_to_cpu(new_bc->overall_shared_limit);
+	if (new_total > (u32)dd->link_credits)
+		return -EINVAL;
+	/* fetch the current values */
+	get_buffer_control(dd, &cur_bc, &cur_total);
+
+	/*
+	 * Create the masks we will use.
+	 */
+	memset(changing, 0, sizeof(changing));
+	memset(lowering_dedicated, 0, sizeof(lowering_dedicated));
+	/* NOTE: Assumes that the individual VL bits are adjacent and in
+	   increasing order */
+	stat_mask =
+		SEND_CM_CREDIT_USED_STATUS_VL0_RETURN_CREDIT_STATUS_SMASK;
+	changing_mask = 0;
+	ld_mask = 0;
+	change_count = 0;
+	any_shared_limit_changing = 0;
+	for (i = 0; i < NUM_USABLE_VLS; i++, stat_mask <<= 1) {
+		if (!valid_vl(i))
+			continue;
+		this_shared_changing = new_bc->vl[i].shared
+						!= cur_bc.vl[i].shared;
+		if (this_shared_changing)
+			any_shared_limit_changing = 1;
+		if (new_bc->vl[i].dedicated != cur_bc.vl[i].dedicated
+				|| this_shared_changing) {
+			changing[i] = 1;
+			changing_mask |= stat_mask;
+			change_count++;
+		}
+		if (be16_to_cpu(new_bc->vl[i].dedicated) <
+					be16_to_cpu(cur_bc.vl[i].dedicated)) {
+			lowering_dedicated[i] = 1;
+			ld_mask |= stat_mask;
+		}
+	}
+
+	/* bracket the credit change with a total adjustment */
+	if (new_total > cur_total)
+		set_global_limit(dd, new_total);
+
+	/*
+	 * Start the credit change algorithm.
+	 */
+	use_all_mask = 0;
+	if ((be16_to_cpu(new_bc->overall_shared_limit) <
+				be16_to_cpu(cur_bc.overall_shared_limit))
+			|| (is_a0(dd) && any_shared_limit_changing)) {
+		set_global_shared(dd, 0);
+		cur_bc.overall_shared_limit = 0;
+		use_all_mask = 1;
+	}
+
+	for (i = 0; i < NUM_USABLE_VLS; i++) {
+		if (!valid_vl(i))
+			continue;
+
+		if (changing[i]) {
+			set_vl_shared(dd, i, 0);
+			cur_bc.vl[i].shared = 0;
+		}
+	}
+
+	wait_for_vl_status_clear(dd, use_all_mask ? all_mask : changing_mask,
+		"shared");
+
+	if (change_count > 0) {
+		for (i = 0; i < NUM_USABLE_VLS; i++) {
+			if (!valid_vl(i))
+				continue;
+
+			if (lowering_dedicated[i]) {
+				set_vl_dedicated(dd, i,
+					be16_to_cpu(new_bc->vl[i].dedicated));
+				cur_bc.vl[i].dedicated =
+						new_bc->vl[i].dedicated;
+			}
+		}
+
+		wait_for_vl_status_clear(dd, ld_mask, "dedicated");
+
+		/* now raise all dedicated that are going up */
+		for (i = 0; i < NUM_USABLE_VLS; i++) {
+			if (!valid_vl(i))
+				continue;
+
+			if (be16_to_cpu(new_bc->vl[i].dedicated) >
+					be16_to_cpu(cur_bc.vl[i].dedicated))
+				set_vl_dedicated(dd, i,
+					be16_to_cpu(new_bc->vl[i].dedicated));
+		}
+	}
+
+	/* next raise all shared that are going up */
+	for (i = 0; i < NUM_USABLE_VLS; i++) {
+		if (!valid_vl(i))
+			continue;
+
+		if (be16_to_cpu(new_bc->vl[i].shared) >
+				be16_to_cpu(cur_bc.vl[i].shared))
+			set_vl_shared(dd, i, be16_to_cpu(new_bc->vl[i].shared));
+	}
+
+	/* finally raise the global shared */
+	if (be16_to_cpu(new_bc->overall_shared_limit) >
+			be16_to_cpu(cur_bc.overall_shared_limit))
+		set_global_shared(dd,
+			be16_to_cpu(new_bc->overall_shared_limit));
+
+	/* bracket the credit change with a total adjustment */
+	if (new_total < cur_total)
+		set_global_limit(dd, new_total);
+	return 0;
+}
+
+/*
+ * Read the given fabric manager table. Return the size of the
+ * table (in bytes) on success, and a negative error code on
+ * failure.
+ */
+int fm_get_table(struct hfi1_pportdata *ppd, int which, void *t)
+
+{
+	int size;
+	struct vl_arb_cache *vlc;
+
+	switch (which) {
+	case FM_TBL_VL_HIGH_ARB:
+		size = 256;
+		/*
+		 * OPA specifies 128 elements (of 2 bytes each), though
+		 * HFI supports only 16 elements in h/w.
+		 */
+		vlc = vl_arb_lock_cache(ppd, HI_PRIO_TABLE);
+		vl_arb_get_cache(vlc, t);
+		vl_arb_unlock_cache(ppd, HI_PRIO_TABLE);
+		break;
+	case FM_TBL_VL_LOW_ARB:
+		size = 256;
+		/*
+		 * OPA specifies 128 elements (of 2 bytes each), though
+		 * HFI supports only 16 elements in h/w.
+		 */
+		vlc = vl_arb_lock_cache(ppd, LO_PRIO_TABLE);
+		vl_arb_get_cache(vlc, t);
+		vl_arb_unlock_cache(ppd, LO_PRIO_TABLE);
+		break;
+	case FM_TBL_BUFFER_CONTROL:
+		size = get_buffer_control(ppd->dd, t, NULL);
+		break;
+	case FM_TBL_SC2VLNT:
+		size = get_sc2vlnt(ppd->dd, t);
+		break;
+	case FM_TBL_VL_PREEMPT_ELEMS:
+		size = 256;
+		/* OPA specifies 128 elements, of 2 bytes each */
+		get_vlarb_preempt(ppd->dd, OPA_MAX_VLS, t);
+		break;
+	case FM_TBL_VL_PREEMPT_MATRIX:
+		size = 256;
+		/*
+		 * OPA specifies that this is the same size as the VL
+		 * arbitration tables (i.e., 256 bytes).
+		 */
+		break;
+	default:
+		return -EINVAL;
+	}
+	return size;
+}
+
+/*
+ * Write the given fabric manager table.
+ */
+int fm_set_table(struct hfi1_pportdata *ppd, int which, void *t)
+{
+	int ret = 0;
+	struct vl_arb_cache *vlc;
+
+	switch (which) {
+	case FM_TBL_VL_HIGH_ARB:
+		vlc = vl_arb_lock_cache(ppd, HI_PRIO_TABLE);
+		if (vl_arb_match_cache(vlc, t)) {
+			vl_arb_unlock_cache(ppd, HI_PRIO_TABLE);
+			break;
+		}
+		vl_arb_set_cache(vlc, t);
+		vl_arb_unlock_cache(ppd, HI_PRIO_TABLE);
+		ret = set_vl_weights(ppd, SEND_HIGH_PRIORITY_LIST,
+				     VL_ARB_HIGH_PRIO_TABLE_SIZE, t);
+		break;
+	case FM_TBL_VL_LOW_ARB:
+		vlc = vl_arb_lock_cache(ppd, LO_PRIO_TABLE);
+		if (vl_arb_match_cache(vlc, t)) {
+			vl_arb_unlock_cache(ppd, LO_PRIO_TABLE);
+			break;
+		}
+		vl_arb_set_cache(vlc, t);
+		vl_arb_unlock_cache(ppd, LO_PRIO_TABLE);
+		ret = set_vl_weights(ppd, SEND_LOW_PRIORITY_LIST,
+				     VL_ARB_LOW_PRIO_TABLE_SIZE, t);
+		break;
+	case FM_TBL_BUFFER_CONTROL:
+		ret = set_buffer_control(ppd->dd, t);
+		break;
+	case FM_TBL_SC2VLNT:
+		set_sc2vlnt(ppd->dd, t);
+		break;
+	default:
+		ret = -EINVAL;
+	}
+	return ret;
+}
+
+/*
+ * Disable all data VLs.
+ *
+ * Return 0 if disabled, non-zero if the VLs cannot be disabled.
+ */
+static int disable_data_vls(struct hfi1_devdata *dd)
+{
+	if (is_a0(dd))
+		return 1;
+
+	pio_send_control(dd, PSC_DATA_VL_DISABLE);
+
+	return 0;
+}
+
+/*
+ * open_fill_data_vls() - the counterpart to stop_drain_data_vls().
+ * Just re-enables all data VLs (the "fill" part happens
+ * automatically - the name was chosen for symmetry with
+ * stop_drain_data_vls()).
+ *
+ * Return 0 if successful, non-zero if the VLs cannot be enabled.
+ */
+int open_fill_data_vls(struct hfi1_devdata *dd)
+{
+	if (is_a0(dd))
+		return 1;
+
+	pio_send_control(dd, PSC_DATA_VL_ENABLE);
+
+	return 0;
+}
+
+/*
+ * drain_data_vls() - assumes that disable_data_vls() has been called,
+ * wait for occupancy (of per-VL FIFOs) for all contexts, and SDMA
+ * engines to drop to 0.
+ */
+static void drain_data_vls(struct hfi1_devdata *dd)
+{
+	sc_wait(dd);
+	sdma_wait(dd);
+	pause_for_credit_return(dd);
+}
+
+/*
+ * stop_drain_data_vls() - disable, then drain all per-VL fifos.
+ *
+ * Use open_fill_data_vls() to resume using data VLs.  This pair is
+ * meant to be used like this:
+ *
+ * stop_drain_data_vls(dd);
+ * // do things with per-VL resources
+ * open_fill_data_vls(dd);
+ */
+int stop_drain_data_vls(struct hfi1_devdata *dd)
+{
+	int ret;
+
+	ret = disable_data_vls(dd);
+	if (ret == 0)
+		drain_data_vls(dd);
+
+	return ret;
+}
+
+/*
+ * Convert a nanosecond time to a cclock count.  No matter how slow
+ * the cclock, a non-zero ns will always have a non-zero result.
+ */
+u32 ns_to_cclock(struct hfi1_devdata *dd, u32 ns)
+{
+	u32 cclocks;
+
+	if (dd->icode == ICODE_FPGA_EMULATION)
+		cclocks = (ns * 1000) / FPGA_CCLOCK_PS;
+	else  /* simulation pretends to be ASIC */
+		cclocks = (ns * 1000) / ASIC_CCLOCK_PS;
+	if (ns && !cclocks)	/* if ns nonzero, must be at least 1 */
+		cclocks = 1;
+	return cclocks;
+}
+
+/*
+ * Convert a cclock count to nanoseconds. Not matter how slow
+ * the cclock, a non-zero cclocks will always have a non-zero result.
+ */
+u32 cclock_to_ns(struct hfi1_devdata *dd, u32 cclocks)
+{
+	u32 ns;
+
+	if (dd->icode == ICODE_FPGA_EMULATION)
+		ns = (cclocks * FPGA_CCLOCK_PS) / 1000;
+	else  /* simulation pretends to be ASIC */
+		ns = (cclocks * ASIC_CCLOCK_PS) / 1000;
+	if (cclocks && !ns)
+		ns = 1;
+	return ns;
+}
+
+/*
+ * Dynamically adjust the receive interrupt timeout for a context based on
+ * incoming packet rate.
+ *
+ * NOTE: Dynamic adjustment does not allow rcv_intr_count to be zero.
+ */
+static void adjust_rcv_timeout(struct hfi1_ctxtdata *rcd, u32 npkts)
+{
+	struct hfi1_devdata *dd = rcd->dd;
+	u32 timeout = rcd->rcvavail_timeout;
+
+	/*
+	 * This algorithm doubles or halves the timeout depending on whether
+	 * the number of packets received in this interrupt were less than or
+	 * greater equal the interrupt count.
+	 *
+	 * The calculations below do not allow a steady state to be achieved.
+	 * Only at the endpoints it is possible to have an unchanging
+	 * timeout.
+	 */
+	if (npkts < rcv_intr_count) {
+		/*
+		 * Not enough packets arrived before the timeout, adjust
+		 * timeout downward.
+		 */
+		if (timeout < 2) /* already at minimum? */
+			return;
+		timeout >>= 1;
+	} else {
+		/*
+		 * More than enough packets arrived before the timeout, adjust
+		 * timeout upward.
+		 */
+		if (timeout >= dd->rcv_intr_timeout_csr) /* already at max? */
+			return;
+		timeout = min(timeout << 1, dd->rcv_intr_timeout_csr);
+	}
+
+	rcd->rcvavail_timeout = timeout;
+	/* timeout cannot be larger than rcv_intr_timeout_csr which has already
+	   been verified to be in range */
+	write_kctxt_csr(dd, rcd->ctxt, RCV_AVAIL_TIME_OUT,
+		(u64)timeout << RCV_AVAIL_TIME_OUT_TIME_OUT_RELOAD_SHIFT);
+}
+
+void update_usrhead(struct hfi1_ctxtdata *rcd, u32 hd, u32 updegr, u32 egrhd,
+		    u32 intr_adjust, u32 npkts)
+{
+	struct hfi1_devdata *dd = rcd->dd;
+	u64 reg;
+	u32 ctxt = rcd->ctxt;
+
+	/*
+	 * Need to write timeout register before updating RcvHdrHead to ensure
+	 * that a new value is used when the HW decides to restart counting.
+	 */
+	if (intr_adjust)
+		adjust_rcv_timeout(rcd, npkts);
+	if (updegr) {
+		reg = (egrhd & RCV_EGR_INDEX_HEAD_HEAD_MASK)
+			<< RCV_EGR_INDEX_HEAD_HEAD_SHIFT;
+		write_uctxt_csr(dd, ctxt, RCV_EGR_INDEX_HEAD, reg);
+	}
+	mmiowb();
+	reg = ((u64)rcv_intr_count << RCV_HDR_HEAD_COUNTER_SHIFT) |
+		(((u64)hd & RCV_HDR_HEAD_HEAD_MASK)
+			<< RCV_HDR_HEAD_HEAD_SHIFT);
+	write_uctxt_csr(dd, ctxt, RCV_HDR_HEAD, reg);
+	mmiowb();
+}
+
+u32 hdrqempty(struct hfi1_ctxtdata *rcd)
+{
+	u32 head, tail;
+
+	head = (read_uctxt_csr(rcd->dd, rcd->ctxt, RCV_HDR_HEAD)
+		& RCV_HDR_HEAD_HEAD_SMASK) >> RCV_HDR_HEAD_HEAD_SHIFT;
+
+	if (rcd->rcvhdrtail_kvaddr)
+		tail = get_rcvhdrtail(rcd);
+	else
+		tail = read_uctxt_csr(rcd->dd, rcd->ctxt, RCV_HDR_TAIL);
+
+	return head == tail;
+}
+
+/*
+ * Context Control and Receive Array encoding for buffer size:
+ *	0x0 invalid
+ *	0x1   4 KB
+ *	0x2   8 KB
+ *	0x3  16 KB
+ *	0x4  32 KB
+ *	0x5  64 KB
+ *	0x6 128 KB
+ *	0x7 256 KB
+ *	0x8 512 KB (Receive Array only)
+ *	0x9   1 MB (Receive Array only)
+ *	0xa   2 MB (Receive Array only)
+ *
+ *	0xB-0xF - reserved (Receive Array only)
+ *
+ *
+ * This routine assumes that the value has already been sanity checked.
+ */
+static u32 encoded_size(u32 size)
+{
+	switch (size) {
+	case   4*1024: return 0x1;
+	case   8*1024: return 0x2;
+	case  16*1024: return 0x3;
+	case  32*1024: return 0x4;
+	case  64*1024: return 0x5;
+	case 128*1024: return 0x6;
+	case 256*1024: return 0x7;
+	case 512*1024: return 0x8;
+	case   1*1024*1024: return 0x9;
+	case   2*1024*1024: return 0xa;
+	}
+	return 0x1;	/* if invalid, go with the minimum size */
+}
+
+void hfi1_rcvctrl(struct hfi1_devdata *dd, unsigned int op, int ctxt)
+{
+	struct hfi1_ctxtdata *rcd;
+	u64 rcvctrl, reg;
+	int did_enable = 0;
+
+	rcd = dd->rcd[ctxt];
+	if (!rcd)
+		return;
+
+	hfi1_cdbg(RCVCTRL, "ctxt %d op 0x%x", ctxt, op);
+
+	rcvctrl = read_kctxt_csr(dd, ctxt, RCV_CTXT_CTRL);
+	/* if the context already enabled, don't do the extra steps */
+	if ((op & HFI1_RCVCTRL_CTXT_ENB)
+			&& !(rcvctrl & RCV_CTXT_CTRL_ENABLE_SMASK)) {
+		/* reset the tail and hdr addresses, and sequence count */
+		write_kctxt_csr(dd, ctxt, RCV_HDR_ADDR,
+				rcd->rcvhdrq_phys);
+		if (HFI1_CAP_KGET_MASK(rcd->flags, DMA_RTAIL))
+			write_kctxt_csr(dd, ctxt, RCV_HDR_TAIL_ADDR,
+					rcd->rcvhdrqtailaddr_phys);
+		rcd->seq_cnt = 1;
+
+		/* reset the cached receive header queue head value */
+		rcd->head = 0;
+
+		/*
+		 * Zero the receive header queue so we don't get false
+		 * positives when checking the sequence number.  The
+		 * sequence numbers could land exactly on the same spot.
+		 * E.g. a rcd restart before the receive header wrapped.
+		 */
+		memset(rcd->rcvhdrq, 0, rcd->rcvhdrq_size);
+
+		/* starting timeout */
+		rcd->rcvavail_timeout = dd->rcv_intr_timeout_csr;
+
+		/* enable the context */
+		rcvctrl |= RCV_CTXT_CTRL_ENABLE_SMASK;
+
+		/* clean the egr buffer size first */
+		rcvctrl &= ~RCV_CTXT_CTRL_EGR_BUF_SIZE_SMASK;
+		rcvctrl |= ((u64)encoded_size(rcd->egrbufs.rcvtid_size)
+				& RCV_CTXT_CTRL_EGR_BUF_SIZE_MASK)
+					<< RCV_CTXT_CTRL_EGR_BUF_SIZE_SHIFT;
+
+		/* zero RcvHdrHead - set RcvHdrHead.Counter after enable */
+		write_uctxt_csr(dd, ctxt, RCV_HDR_HEAD, 0);
+		did_enable = 1;
+
+		/* zero RcvEgrIndexHead */
+		write_uctxt_csr(dd, ctxt, RCV_EGR_INDEX_HEAD, 0);
+
+		/* set eager count and base index */
+		reg = (((u64)(rcd->egrbufs.alloced >> RCV_SHIFT)
+			& RCV_EGR_CTRL_EGR_CNT_MASK)
+		       << RCV_EGR_CTRL_EGR_CNT_SHIFT) |
+			(((rcd->eager_base >> RCV_SHIFT)
+			  & RCV_EGR_CTRL_EGR_BASE_INDEX_MASK)
+			 << RCV_EGR_CTRL_EGR_BASE_INDEX_SHIFT);
+		write_kctxt_csr(dd, ctxt, RCV_EGR_CTRL, reg);
+
+		/*
+		 * Set TID (expected) count and base index.
+		 * rcd->expected_count is set to individual RcvArray entries,
+		 * not pairs, and the CSR takes a pair-count in groups of
+		 * four, so divide by 8.
+		 */
+		reg = (((rcd->expected_count >> RCV_SHIFT)
+					& RCV_TID_CTRL_TID_PAIR_CNT_MASK)
+				<< RCV_TID_CTRL_TID_PAIR_CNT_SHIFT) |
+		      (((rcd->expected_base >> RCV_SHIFT)
+					& RCV_TID_CTRL_TID_BASE_INDEX_MASK)
+				<< RCV_TID_CTRL_TID_BASE_INDEX_SHIFT);
+		write_kctxt_csr(dd, ctxt, RCV_TID_CTRL, reg);
+		if (ctxt == VL15CTXT)
+			write_csr(dd, RCV_VL15, VL15CTXT);
+	}
+	if (op & HFI1_RCVCTRL_CTXT_DIS) {
+		write_csr(dd, RCV_VL15, 0);
+		rcvctrl &= ~RCV_CTXT_CTRL_ENABLE_SMASK;
+	}
+	if (op & HFI1_RCVCTRL_INTRAVAIL_ENB)
+		rcvctrl |= RCV_CTXT_CTRL_INTR_AVAIL_SMASK;
+	if (op & HFI1_RCVCTRL_INTRAVAIL_DIS)
+		rcvctrl &= ~RCV_CTXT_CTRL_INTR_AVAIL_SMASK;
+	if (op & HFI1_RCVCTRL_TAILUPD_ENB && rcd->rcvhdrqtailaddr_phys)
+		rcvctrl |= RCV_CTXT_CTRL_TAIL_UPD_SMASK;
+	if (op & HFI1_RCVCTRL_TAILUPD_DIS)
+		rcvctrl &= ~RCV_CTXT_CTRL_TAIL_UPD_SMASK;
+	if (op & HFI1_RCVCTRL_TIDFLOW_ENB)
+		rcvctrl |= RCV_CTXT_CTRL_TID_FLOW_ENABLE_SMASK;
+	if (op & HFI1_RCVCTRL_TIDFLOW_DIS)
+		rcvctrl &= ~RCV_CTXT_CTRL_TID_FLOW_ENABLE_SMASK;
+	if (op & HFI1_RCVCTRL_ONE_PKT_EGR_ENB) {
+		/* In one-packet-per-eager mode, the size comes from
+		   the RcvArray entry. */
+		rcvctrl &= ~RCV_CTXT_CTRL_EGR_BUF_SIZE_SMASK;
+		rcvctrl |= RCV_CTXT_CTRL_ONE_PACKET_PER_EGR_BUFFER_SMASK;
+	}
+	if (op & HFI1_RCVCTRL_ONE_PKT_EGR_DIS)
+		rcvctrl &= ~RCV_CTXT_CTRL_ONE_PACKET_PER_EGR_BUFFER_SMASK;
+	if (op & HFI1_RCVCTRL_NO_RHQ_DROP_ENB)
+		rcvctrl |= RCV_CTXT_CTRL_DONT_DROP_RHQ_FULL_SMASK;
+	if (op & HFI1_RCVCTRL_NO_RHQ_DROP_DIS)
+		rcvctrl &= ~RCV_CTXT_CTRL_DONT_DROP_RHQ_FULL_SMASK;
+	if (op & HFI1_RCVCTRL_NO_EGR_DROP_ENB)
+		rcvctrl |= RCV_CTXT_CTRL_DONT_DROP_EGR_FULL_SMASK;
+	if (op & HFI1_RCVCTRL_NO_EGR_DROP_DIS)
+		rcvctrl &= ~RCV_CTXT_CTRL_DONT_DROP_EGR_FULL_SMASK;
+	rcd->rcvctrl = rcvctrl;
+	hfi1_cdbg(RCVCTRL, "ctxt %d rcvctrl 0x%llx\n", ctxt, rcvctrl);
+	write_kctxt_csr(dd, ctxt, RCV_CTXT_CTRL, rcd->rcvctrl);
+
+	/* work around sticky RcvCtxtStatus.BlockedRHQFull */
+	if (did_enable
+	    && (rcvctrl & RCV_CTXT_CTRL_DONT_DROP_RHQ_FULL_SMASK)) {
+		reg = read_kctxt_csr(dd, ctxt, RCV_CTXT_STATUS);
+		if (reg != 0) {
+			dd_dev_info(dd, "ctxt %d status %lld (blocked)\n",
+				ctxt, reg);
+			read_uctxt_csr(dd, ctxt, RCV_HDR_HEAD);
+			write_uctxt_csr(dd, ctxt, RCV_HDR_HEAD, 0x10);
+			write_uctxt_csr(dd, ctxt, RCV_HDR_HEAD, 0x00);
+			read_uctxt_csr(dd, ctxt, RCV_HDR_HEAD);
+			reg = read_kctxt_csr(dd, ctxt, RCV_CTXT_STATUS);
+			dd_dev_info(dd, "ctxt %d status %lld (%s blocked)\n",
+				ctxt, reg, reg == 0 ? "not" : "still");
+		}
+	}
+
+	if (did_enable) {
+		/*
+		 * The interrupt timeout and count must be set after
+		 * the context is enabled to take effect.
+		 */
+		/* set interrupt timeout */
+		write_kctxt_csr(dd, ctxt, RCV_AVAIL_TIME_OUT,
+			(u64)rcd->rcvavail_timeout <<
+				RCV_AVAIL_TIME_OUT_TIME_OUT_RELOAD_SHIFT);
+
+		/* set RcvHdrHead.Counter, zero RcvHdrHead.Head (again) */
+		reg = (u64)rcv_intr_count << RCV_HDR_HEAD_COUNTER_SHIFT;
+		write_uctxt_csr(dd, ctxt, RCV_HDR_HEAD, reg);
+	}
+
+	if (op & (HFI1_RCVCTRL_TAILUPD_DIS | HFI1_RCVCTRL_CTXT_DIS))
+		/*
+		 * If the context has been disabled and the Tail Update has
+		 * been cleared, clear the RCV_HDR_TAIL_ADDR CSR so
+		 * it doesn't contain an address that is invalid.
+		 */
+		write_kctxt_csr(dd, ctxt, RCV_HDR_TAIL_ADDR, 0);
+}
+
+u32 hfi1_read_cntrs(struct hfi1_devdata *dd, loff_t pos, char **namep,
+		    u64 **cntrp)
+{
+	int ret;
+	u64 val = 0;
+
+	if (namep) {
+		ret = dd->cntrnameslen;
+		if (pos != 0) {
+			dd_dev_err(dd, "read_cntrs does not support indexing");
+			return 0;
+		}
+		*namep = dd->cntrnames;
+	} else {
+		const struct cntr_entry *entry;
+		int i, j;
+
+		ret = (dd->ndevcntrs) * sizeof(u64);
+		if (pos != 0) {
+			dd_dev_err(dd, "read_cntrs does not support indexing");
+			return 0;
+		}
+
+		/* Get the start of the block of counters */
+		*cntrp = dd->cntrs;
+
+		/*
+		 * Now go and fill in each counter in the block.
+		 */
+		for (i = 0; i < DEV_CNTR_LAST; i++) {
+			entry = &dev_cntrs[i];
+			hfi1_cdbg(CNTR, "reading %s", entry->name);
+			if (entry->flags & CNTR_DISABLED) {
+				/* Nothing */
+				hfi1_cdbg(CNTR, "\tDisabled\n");
+			} else {
+				if (entry->flags & CNTR_VL) {
+					hfi1_cdbg(CNTR, "\tPer VL\n");
+					for (j = 0; j < C_VL_COUNT; j++) {
+						val = entry->rw_cntr(entry,
+								  dd, j,
+								  CNTR_MODE_R,
+								  0);
+						hfi1_cdbg(
+						   CNTR,
+						   "\t\tRead 0x%llx for %d\n",
+						   val, j);
+						dd->cntrs[entry->offset + j] =
+									    val;
+					}
+				} else {
+					val = entry->rw_cntr(entry, dd,
+							CNTR_INVALID_VL,
+							CNTR_MODE_R, 0);
+					dd->cntrs[entry->offset] = val;
+					hfi1_cdbg(CNTR, "\tRead 0x%llx", val);
+				}
+			}
+		}
+	}
+	return ret;
+}
+
+/*
+ * Used by sysfs to create files for hfi stats to read
+ */
+u32 hfi1_read_portcntrs(struct hfi1_devdata *dd, loff_t pos, u32 port,
+			char **namep, u64 **cntrp)
+{
+	int ret;
+	u64 val = 0;
+
+	if (namep) {
+		ret = dd->portcntrnameslen;
+		if (pos != 0) {
+			dd_dev_err(dd, "index not supported");
+			return 0;
+		}
+		*namep = dd->portcntrnames;
+	} else {
+		const struct cntr_entry *entry;
+		struct hfi1_pportdata *ppd;
+		int i, j;
+
+		ret = (dd->nportcntrs) * sizeof(u64);
+		if (pos != 0) {
+			dd_dev_err(dd, "indexing not supported");
+			return 0;
+		}
+		ppd = (struct hfi1_pportdata *)(dd + 1 + port);
+		*cntrp = ppd->cntrs;
+
+		for (i = 0; i < PORT_CNTR_LAST; i++) {
+			entry = &port_cntrs[i];
+			hfi1_cdbg(CNTR, "reading %s", entry->name);
+			if (entry->flags & CNTR_DISABLED) {
+				/* Nothing */
+				hfi1_cdbg(CNTR, "\tDisabled\n");
+				continue;
+			}
+
+			if (entry->flags & CNTR_VL) {
+				hfi1_cdbg(CNTR, "\tPer VL");
+				for (j = 0; j < C_VL_COUNT; j++) {
+					val = entry->rw_cntr(entry, ppd, j,
+							       CNTR_MODE_R,
+							       0);
+					hfi1_cdbg(
+					   CNTR,
+					   "\t\tRead 0x%llx for %d",
+					   val, j);
+					ppd->cntrs[entry->offset + j] = val;
+				}
+			} else {
+				val = entry->rw_cntr(entry, ppd,
+						       CNTR_INVALID_VL,
+						       CNTR_MODE_R,
+						       0);
+				ppd->cntrs[entry->offset] = val;
+				hfi1_cdbg(CNTR, "\tRead 0x%llx", val);
+			}
+		}
+	}
+	return ret;
+}
+
+static void free_cntrs(struct hfi1_devdata *dd)
+{
+	struct hfi1_pportdata *ppd;
+	int i;
+
+	if (dd->synth_stats_timer.data)
+		del_timer_sync(&dd->synth_stats_timer);
+	dd->synth_stats_timer.data = 0;
+	ppd = (struct hfi1_pportdata *)(dd + 1);
+	for (i = 0; i < dd->num_pports; i++, ppd++) {
+		kfree(ppd->cntrs);
+		kfree(ppd->scntrs);
+		free_percpu(ppd->ibport_data.rc_acks);
+		free_percpu(ppd->ibport_data.rc_qacks);
+		free_percpu(ppd->ibport_data.rc_delayed_comp);
+		ppd->cntrs = NULL;
+		ppd->scntrs = NULL;
+		ppd->ibport_data.rc_acks = NULL;
+		ppd->ibport_data.rc_qacks = NULL;
+		ppd->ibport_data.rc_delayed_comp = NULL;
+	}
+	kfree(dd->portcntrnames);
+	dd->portcntrnames = NULL;
+	kfree(dd->cntrs);
+	dd->cntrs = NULL;
+	kfree(dd->scntrs);
+	dd->scntrs = NULL;
+	kfree(dd->cntrnames);
+	dd->cntrnames = NULL;
+}
+
+#define CNTR_MAX 0xFFFFFFFFFFFFFFFFULL
+#define CNTR_32BIT_MAX 0x00000000FFFFFFFF
+
+static u64 read_dev_port_cntr(struct hfi1_devdata *dd, struct cntr_entry *entry,
+			      u64 *psval, void *context, int vl)
+{
+	u64 val;
+	u64 sval = *psval;
+
+	if (entry->flags & CNTR_DISABLED) {
+		dd_dev_err(dd, "Counter %s not enabled", entry->name);
+		return 0;
+	}
+
+	hfi1_cdbg(CNTR, "cntr: %s vl %d psval 0x%llx", entry->name, vl, *psval);
+
+	val = entry->rw_cntr(entry, context, vl, CNTR_MODE_R, 0);
+
+	/* If its a synthetic counter there is more work we need to do */
+	if (entry->flags & CNTR_SYNTH) {
+		if (sval == CNTR_MAX) {
+			/* No need to read already saturated */
+			return CNTR_MAX;
+		}
+
+		if (entry->flags & CNTR_32BIT) {
+			/* 32bit counters can wrap multiple times */
+			u64 upper = sval >> 32;
+			u64 lower = (sval << 32) >> 32;
+
+			if (lower > val) { /* hw wrapped */
+				if (upper == CNTR_32BIT_MAX)
+					val = CNTR_MAX;
+				else
+					upper++;
+			}
+
+			if (val != CNTR_MAX)
+				val = (upper << 32) | val;
+
+		} else {
+			/* If we rolled we are saturated */
+			if ((val < sval) || (val > CNTR_MAX))
+				val = CNTR_MAX;
+		}
+	}
+
+	*psval = val;
+
+	hfi1_cdbg(CNTR, "\tNew val=0x%llx", val);
+
+	return val;
+}
+
+static u64 write_dev_port_cntr(struct hfi1_devdata *dd,
+			       struct cntr_entry *entry,
+			       u64 *psval, void *context, int vl, u64 data)
+{
+	u64 val;
+
+	if (entry->flags & CNTR_DISABLED) {
+		dd_dev_err(dd, "Counter %s not enabled", entry->name);
+		return 0;
+	}
+
+	hfi1_cdbg(CNTR, "cntr: %s vl %d psval 0x%llx", entry->name, vl, *psval);
+
+	if (entry->flags & CNTR_SYNTH) {
+		*psval = data;
+		if (entry->flags & CNTR_32BIT) {
+			val = entry->rw_cntr(entry, context, vl, CNTR_MODE_W,
+					     (data << 32) >> 32);
+			val = data; /* return the full 64bit value */
+		} else {
+			val = entry->rw_cntr(entry, context, vl, CNTR_MODE_W,
+					     data);
+		}
+	} else {
+		val = entry->rw_cntr(entry, context, vl, CNTR_MODE_W, data);
+	}
+
+	*psval = val;
+
+	hfi1_cdbg(CNTR, "\tNew val=0x%llx", val);
+
+	return val;
+}
+
+u64 read_dev_cntr(struct hfi1_devdata *dd, int index, int vl)
+{
+	struct cntr_entry *entry;
+	u64 *sval;
+
+	entry = &dev_cntrs[index];
+	sval = dd->scntrs + entry->offset;
+
+	if (vl != CNTR_INVALID_VL)
+		sval += vl;
+
+	return read_dev_port_cntr(dd, entry, sval, dd, vl);
+}
+
+u64 write_dev_cntr(struct hfi1_devdata *dd, int index, int vl, u64 data)
+{
+	struct cntr_entry *entry;
+	u64 *sval;
+
+	entry = &dev_cntrs[index];
+	sval = dd->scntrs + entry->offset;
+
+	if (vl != CNTR_INVALID_VL)
+		sval += vl;
+
+	return write_dev_port_cntr(dd, entry, sval, dd, vl, data);
+}
+
+u64 read_port_cntr(struct hfi1_pportdata *ppd, int index, int vl)
+{
+	struct cntr_entry *entry;
+	u64 *sval;
+
+	entry = &port_cntrs[index];
+	sval = ppd->scntrs + entry->offset;
+
+	if (vl != CNTR_INVALID_VL)
+		sval += vl;
+
+	if ((index >= C_RCV_HDR_OVF_FIRST + ppd->dd->num_rcv_contexts) &&
+	    (index <= C_RCV_HDR_OVF_LAST)) {
+		/* We do not want to bother for disabled contexts */
+		return 0;
+	}
+
+	return read_dev_port_cntr(ppd->dd, entry, sval, ppd, vl);
+}
+
+u64 write_port_cntr(struct hfi1_pportdata *ppd, int index, int vl, u64 data)
+{
+	struct cntr_entry *entry;
+	u64 *sval;
+
+	entry = &port_cntrs[index];
+	sval = ppd->scntrs + entry->offset;
+
+	if (vl != CNTR_INVALID_VL)
+		sval += vl;
+
+	if ((index >= C_RCV_HDR_OVF_FIRST + ppd->dd->num_rcv_contexts) &&
+	    (index <= C_RCV_HDR_OVF_LAST)) {
+		/* We do not want to bother for disabled contexts */
+		return 0;
+	}
+
+	return write_dev_port_cntr(ppd->dd, entry, sval, ppd, vl, data);
+}
+
+static void update_synth_timer(unsigned long opaque)
+{
+	u64 cur_tx;
+	u64 cur_rx;
+	u64 total_flits;
+	u8 update = 0;
+	int i, j, vl;
+	struct hfi1_pportdata *ppd;
+	struct cntr_entry *entry;
+
+	struct hfi1_devdata *dd = (struct hfi1_devdata *)opaque;
+
+	/*
+	 * Rather than keep beating on the CSRs pick a minimal set that we can
+	 * check to watch for potential roll over. We can do this by looking at
+	 * the number of flits sent/recv. If the total flits exceeds 32bits then
+	 * we have to iterate all the counters and update.
+	 */
+	entry = &dev_cntrs[C_DC_RCV_FLITS];
+	cur_rx = entry->rw_cntr(entry, dd, CNTR_INVALID_VL, CNTR_MODE_R, 0);
+
+	entry = &dev_cntrs[C_DC_XMIT_FLITS];
+	cur_tx = entry->rw_cntr(entry, dd, CNTR_INVALID_VL, CNTR_MODE_R, 0);
+
+	hfi1_cdbg(
+	    CNTR,
+	    "[%d] curr tx=0x%llx rx=0x%llx :: last tx=0x%llx rx=0x%llx\n",
+	    dd->unit, cur_tx, cur_rx, dd->last_tx, dd->last_rx);
+
+	if ((cur_tx < dd->last_tx) || (cur_rx < dd->last_rx)) {
+		/*
+		 * May not be strictly necessary to update but it won't hurt and
+		 * simplifies the logic here.
+		 */
+		update = 1;
+		hfi1_cdbg(CNTR, "[%d] Tripwire counter rolled, updating",
+			  dd->unit);
+	} else {
+		total_flits = (cur_tx - dd->last_tx) + (cur_rx - dd->last_rx);
+		hfi1_cdbg(CNTR,
+			  "[%d] total flits 0x%llx limit 0x%llx\n", dd->unit,
+			  total_flits, (u64)CNTR_32BIT_MAX);
+		if (total_flits >= CNTR_32BIT_MAX) {
+			hfi1_cdbg(CNTR, "[%d] 32bit limit hit, updating",
+				  dd->unit);
+			update = 1;
+		}
+	}
+
+	if (update) {
+		hfi1_cdbg(CNTR, "[%d] Updating dd and ppd counters", dd->unit);
+		for (i = 0; i < DEV_CNTR_LAST; i++) {
+			entry = &dev_cntrs[i];
+			if (entry->flags & CNTR_VL) {
+				for (vl = 0; vl < C_VL_COUNT; vl++)
+					read_dev_cntr(dd, i, vl);
+			} else {
+				read_dev_cntr(dd, i, CNTR_INVALID_VL);
+			}
+		}
+		ppd = (struct hfi1_pportdata *)(dd + 1);
+		for (i = 0; i < dd->num_pports; i++, ppd++) {
+			for (j = 0; j < PORT_CNTR_LAST; j++) {
+				entry = &port_cntrs[j];
+				if (entry->flags & CNTR_VL) {
+					for (vl = 0; vl < C_VL_COUNT; vl++)
+						read_port_cntr(ppd, j, vl);
+				} else {
+					read_port_cntr(ppd, j, CNTR_INVALID_VL);
+				}
+			}
+		}
+
+		/*
+		 * We want the value in the register. The goal is to keep track
+		 * of the number of "ticks" not the counter value. In other
+		 * words if the register rolls we want to notice it and go ahead
+		 * and force an update.
+		 */
+		entry = &dev_cntrs[C_DC_XMIT_FLITS];
+		dd->last_tx = entry->rw_cntr(entry, dd, CNTR_INVALID_VL,
+						CNTR_MODE_R, 0);
+
+		entry = &dev_cntrs[C_DC_RCV_FLITS];
+		dd->last_rx = entry->rw_cntr(entry, dd, CNTR_INVALID_VL,
+						CNTR_MODE_R, 0);
+
+		hfi1_cdbg(CNTR, "[%d] setting last tx/rx to 0x%llx 0x%llx",
+			  dd->unit, dd->last_tx, dd->last_rx);
+
+	} else {
+		hfi1_cdbg(CNTR, "[%d] No update necessary", dd->unit);
+	}
+
+mod_timer(&dd->synth_stats_timer, jiffies + HZ * SYNTH_CNT_TIME);
+}
+
+#define C_MAX_NAME 13 /* 12 chars + one for /0 */
+static int init_cntrs(struct hfi1_devdata *dd)
+{
+	int i, rcv_ctxts, index, j;
+	size_t sz;
+	char *p;
+	char name[C_MAX_NAME];
+	struct hfi1_pportdata *ppd;
+
+	/* set up the stats timer; the add_timer is done at the end */
+	init_timer(&dd->synth_stats_timer);
+	dd->synth_stats_timer.function = update_synth_timer;
+	dd->synth_stats_timer.data = (unsigned long) dd;
+
+	/***********************/
+	/* per device counters */
+	/***********************/
+
+	/* size names and determine how many we have*/
+	dd->ndevcntrs = 0;
+	sz = 0;
+	index = 0;
+
+	for (i = 0; i < DEV_CNTR_LAST; i++) {
+		hfi1_dbg_early("Init cntr %s\n", dev_cntrs[i].name);
+		if (dev_cntrs[i].flags & CNTR_DISABLED) {
+			hfi1_dbg_early("\tSkipping %s\n", dev_cntrs[i].name);
+			continue;
+		}
+
+		if (dev_cntrs[i].flags & CNTR_VL) {
+			hfi1_dbg_early("\tProcessing VL cntr\n");
+			dev_cntrs[i].offset = index;
+			for (j = 0; j < C_VL_COUNT; j++) {
+				memset(name, '\0', C_MAX_NAME);
+				snprintf(name, C_MAX_NAME, "%s%d",
+					dev_cntrs[i].name,
+					vl_from_idx(j));
+				sz += strlen(name);
+				sz++;
+				hfi1_dbg_early("\t\t%s\n", name);
+				dd->ndevcntrs++;
+				index++;
+			}
+		} else {
+			/* +1 for newline  */
+			sz += strlen(dev_cntrs[i].name) + 1;
+			dd->ndevcntrs++;
+			dev_cntrs[i].offset = index;
+			index++;
+			hfi1_dbg_early("\tAdding %s\n", dev_cntrs[i].name);
+		}
+	}
+
+	/* allocate space for the counter values */
+	dd->cntrs = kcalloc(index, sizeof(u64), GFP_KERNEL);
+	if (!dd->cntrs)
+		goto bail;
+
+	dd->scntrs = kcalloc(index, sizeof(u64), GFP_KERNEL);
+	if (!dd->scntrs)
+		goto bail;
+
+
+	/* allocate space for the counter names */
+	dd->cntrnameslen = sz;
+	dd->cntrnames = kmalloc(sz, GFP_KERNEL);
+	if (!dd->cntrnames)
+		goto bail;
+
+	/* fill in the names */
+	for (p = dd->cntrnames, i = 0, index = 0; i < DEV_CNTR_LAST; i++) {
+		if (dev_cntrs[i].flags & CNTR_DISABLED) {
+			/* Nothing */
+		} else {
+			if (dev_cntrs[i].flags & CNTR_VL) {
+				for (j = 0; j < C_VL_COUNT; j++) {
+					memset(name, '\0', C_MAX_NAME);
+					snprintf(name, C_MAX_NAME, "%s%d",
+						dev_cntrs[i].name,
+						vl_from_idx(j));
+					memcpy(p, name, strlen(name));
+					p += strlen(name);
+					*p++ = '\n';
+				}
+			} else {
+				memcpy(p, dev_cntrs[i].name,
+				       strlen(dev_cntrs[i].name));
+				p += strlen(dev_cntrs[i].name);
+				*p++ = '\n';
+			}
+			index++;
+		}
+	}
+
+	/*********************/
+	/* per port counters */
+	/*********************/
+
+	/*
+	 * Go through the counters for the overflows and disable the ones we
+	 * don't need. This varies based on platform so we need to do it
+	 * dynamically here.
+	 */
+	rcv_ctxts = dd->num_rcv_contexts;
+	for (i = C_RCV_HDR_OVF_FIRST + rcv_ctxts;
+	     i <= C_RCV_HDR_OVF_LAST; i++) {
+		port_cntrs[i].flags |= CNTR_DISABLED;
+	}
+
+	/* size port counter names and determine how many we have*/
+	sz = 0;
+	dd->nportcntrs = 0;
+	for (i = 0; i < PORT_CNTR_LAST; i++) {
+		hfi1_dbg_early("Init pcntr %s\n", port_cntrs[i].name);
+		if (port_cntrs[i].flags & CNTR_DISABLED) {
+			hfi1_dbg_early("\tSkipping %s\n", port_cntrs[i].name);
+			continue;
+		}
+
+		if (port_cntrs[i].flags & CNTR_VL) {
+			hfi1_dbg_early("\tProcessing VL cntr\n");
+			port_cntrs[i].offset = dd->nportcntrs;
+			for (j = 0; j < C_VL_COUNT; j++) {
+				memset(name, '\0', C_MAX_NAME);
+				snprintf(name, C_MAX_NAME, "%s%d",
+					port_cntrs[i].name,
+					vl_from_idx(j));
+				sz += strlen(name);
+				sz++;
+				hfi1_dbg_early("\t\t%s\n", name);
+				dd->nportcntrs++;
+			}
+		} else {
+			/* +1 for newline  */
+			sz += strlen(port_cntrs[i].name) + 1;
+			port_cntrs[i].offset = dd->nportcntrs;
+			dd->nportcntrs++;
+			hfi1_dbg_early("\tAdding %s\n", port_cntrs[i].name);
+		}
+	}
+
+	/* allocate space for the counter names */
+	dd->portcntrnameslen = sz;
+	dd->portcntrnames = kmalloc(sz, GFP_KERNEL);
+	if (!dd->portcntrnames)
+		goto bail;
+
+	/* fill in port cntr names */
+	for (p = dd->portcntrnames, i = 0; i < PORT_CNTR_LAST; i++) {
+		if (port_cntrs[i].flags & CNTR_DISABLED)
+			continue;
+
+		if (port_cntrs[i].flags & CNTR_VL) {
+			for (j = 0; j < C_VL_COUNT; j++) {
+				memset(name, '\0', C_MAX_NAME);
+				snprintf(name, C_MAX_NAME, "%s%d",
+					port_cntrs[i].name,
+					vl_from_idx(j));
+				memcpy(p, name, strlen(name));
+				p += strlen(name);
+				*p++ = '\n';
+			}
+		} else {
+			memcpy(p, port_cntrs[i].name,
+			       strlen(port_cntrs[i].name));
+			p += strlen(port_cntrs[i].name);
+			*p++ = '\n';
+		}
+	}
+
+	/* allocate per port storage for counter values */
+	ppd = (struct hfi1_pportdata *)(dd + 1);
+	for (i = 0; i < dd->num_pports; i++, ppd++) {
+		ppd->cntrs = kcalloc(dd->nportcntrs, sizeof(u64), GFP_KERNEL);
+		if (!ppd->cntrs)
+			goto bail;
+
+		ppd->scntrs = kcalloc(dd->nportcntrs, sizeof(u64), GFP_KERNEL);
+		if (!ppd->scntrs)
+			goto bail;
+	}
+
+	/* CPU counters need to be allocated and zeroed */
+	if (init_cpu_counters(dd))
+		goto bail;
+
+	mod_timer(&dd->synth_stats_timer, jiffies + HZ * SYNTH_CNT_TIME);
+	return 0;
+bail:
+	free_cntrs(dd);
+	return -ENOMEM;
+}
+
+
+static u32 chip_to_opa_lstate(struct hfi1_devdata *dd, u32 chip_lstate)
+{
+	switch (chip_lstate) {
+	default:
+		dd_dev_err(dd,
+			 "Unknown logical state 0x%x, reporting IB_PORT_DOWN\n",
+			 chip_lstate);
+		/* fall through */
+	case LSTATE_DOWN:
+		return IB_PORT_DOWN;
+	case LSTATE_INIT:
+		return IB_PORT_INIT;
+	case LSTATE_ARMED:
+		return IB_PORT_ARMED;
+	case LSTATE_ACTIVE:
+		return IB_PORT_ACTIVE;
+	}
+}
+
+u32 chip_to_opa_pstate(struct hfi1_devdata *dd, u32 chip_pstate)
+{
+	/* look at the HFI meta-states only */
+	switch (chip_pstate & 0xf0) {
+	default:
+		dd_dev_err(dd, "Unexpected chip physical state of 0x%x\n",
+			chip_pstate);
+		/* fall through */
+	case PLS_DISABLED:
+		return IB_PORTPHYSSTATE_DISABLED;
+	case PLS_OFFLINE:
+		return OPA_PORTPHYSSTATE_OFFLINE;
+	case PLS_POLLING:
+		return IB_PORTPHYSSTATE_POLLING;
+	case PLS_CONFIGPHY:
+		return IB_PORTPHYSSTATE_TRAINING;
+	case PLS_LINKUP:
+		return IB_PORTPHYSSTATE_LINKUP;
+	case PLS_PHYTEST:
+		return IB_PORTPHYSSTATE_PHY_TEST;
+	}
+}
+
+/* return the OPA port logical state name */
+const char *opa_lstate_name(u32 lstate)
+{
+	static const char * const port_logical_names[] = {
+		"PORT_NOP",
+		"PORT_DOWN",
+		"PORT_INIT",
+		"PORT_ARMED",
+		"PORT_ACTIVE",
+		"PORT_ACTIVE_DEFER",
+	};
+	if (lstate < ARRAY_SIZE(port_logical_names))
+		return port_logical_names[lstate];
+	return "unknown";
+}
+
+/* return the OPA port physical state name */
+const char *opa_pstate_name(u32 pstate)
+{
+	static const char * const port_physical_names[] = {
+		"PHYS_NOP",
+		"reserved1",
+		"PHYS_POLL",
+		"PHYS_DISABLED",
+		"PHYS_TRAINING",
+		"PHYS_LINKUP",
+		"PHYS_LINK_ERR_RECOVER",
+		"PHYS_PHY_TEST",
+		"reserved8",
+		"PHYS_OFFLINE",
+		"PHYS_GANGED",
+		"PHYS_TEST",
+	};
+	if (pstate < ARRAY_SIZE(port_physical_names))
+		return port_physical_names[pstate];
+	return "unknown";
+}
+
+/*
+ * Read the hardware link state and set the driver's cached value of it.
+ * Return the (new) current value.
+ */
+u32 get_logical_state(struct hfi1_pportdata *ppd)
+{
+	u32 new_state;
+
+	new_state = chip_to_opa_lstate(ppd->dd, read_logical_state(ppd->dd));
+	if (new_state != ppd->lstate) {
+		dd_dev_info(ppd->dd, "logical state changed to %s (0x%x)\n",
+			opa_lstate_name(new_state), new_state);
+		ppd->lstate = new_state;
+	}
+	/*
+	 * Set port status flags in the page mapped into userspace
+	 * memory. Do it here to ensure a reliable state - this is
+	 * the only function called by all state handling code.
+	 * Always set the flags due to the fact that the cache value
+	 * might have been changed explicitly outside of this
+	 * function.
+	 */
+	if (ppd->statusp) {
+		switch (ppd->lstate) {
+		case IB_PORT_DOWN:
+		case IB_PORT_INIT:
+			*ppd->statusp &= ~(HFI1_STATUS_IB_CONF |
+					   HFI1_STATUS_IB_READY);
+			break;
+		case IB_PORT_ARMED:
+			*ppd->statusp |= HFI1_STATUS_IB_CONF;
+			break;
+		case IB_PORT_ACTIVE:
+			*ppd->statusp |= HFI1_STATUS_IB_READY;
+			break;
+		}
+	}
+	return ppd->lstate;
+}
+
+/**
+ * wait_logical_linkstate - wait for an IB link state change to occur
+ * @ppd: port device
+ * @state: the state to wait for
+ * @msecs: the number of milliseconds to wait
+ *
+ * Wait up to msecs milliseconds for IB link state change to occur.
+ * For now, take the easy polling route.
+ * Returns 0 if state reached, otherwise -ETIMEDOUT.
+ */
+static int wait_logical_linkstate(struct hfi1_pportdata *ppd, u32 state,
+				  int msecs)
+{
+	unsigned long timeout;
+
+	timeout = jiffies + msecs_to_jiffies(msecs);
+	while (1) {
+		if (get_logical_state(ppd) == state)
+			return 0;
+		if (time_after(jiffies, timeout))
+			break;
+		msleep(20);
+	}
+	dd_dev_err(ppd->dd, "timeout waiting for link state 0x%x\n", state);
+
+	return -ETIMEDOUT;
+}
+
+u8 hfi1_ibphys_portstate(struct hfi1_pportdata *ppd)
+{
+	static u32 remembered_state = 0xff;
+	u32 pstate;
+	u32 ib_pstate;
+
+	pstate = read_physical_state(ppd->dd);
+	ib_pstate = chip_to_opa_pstate(ppd->dd, pstate);
+	if (remembered_state != ib_pstate) {
+		dd_dev_info(ppd->dd,
+			"%s: physical state changed to %s (0x%x), phy 0x%x\n",
+			__func__, opa_pstate_name(ib_pstate), ib_pstate,
+			pstate);
+		remembered_state = ib_pstate;
+	}
+	return ib_pstate;
+}
+
+/*
+ * Read/modify/write ASIC_QSFP register bits as selected by mask
+ * data: 0 or 1 in the positions depending on what needs to be written
+ * dir: 0 for read, 1 for write
+ * mask: select by setting
+ *      I2CCLK  (bit 0)
+ *      I2CDATA (bit 1)
+ */
+u64 hfi1_gpio_mod(struct hfi1_devdata *dd, u32 target, u32 data, u32 dir,
+		  u32 mask)
+{
+	u64 qsfp_oe, target_oe;
+
+	target_oe = target ? ASIC_QSFP2_OE : ASIC_QSFP1_OE;
+	if (mask) {
+		/* We are writing register bits, so lock access */
+		dir &= mask;
+		data &= mask;
+
+		qsfp_oe = read_csr(dd, target_oe);
+		qsfp_oe = (qsfp_oe & ~(u64)mask) | (u64)dir;
+		write_csr(dd, target_oe, qsfp_oe);
+	}
+	/* We are exclusively reading bits here, but it is unlikely
+	 * we'll get valid data when we set the direction of the pin
+	 * in the same call, so read should call this function again
+	 * to get valid data
+	 */
+	return read_csr(dd, target ? ASIC_QSFP2_IN : ASIC_QSFP1_IN);
+}
+
+#define CLEAR_STATIC_RATE_CONTROL_SMASK(r) \
+(r &= ~SEND_CTXT_CHECK_ENABLE_DISALLOW_PBC_STATIC_RATE_CONTROL_SMASK)
+
+#define SET_STATIC_RATE_CONTROL_SMASK(r) \
+(r |= SEND_CTXT_CHECK_ENABLE_DISALLOW_PBC_STATIC_RATE_CONTROL_SMASK)
+
+int hfi1_init_ctxt(struct send_context *sc)
+{
+	if (sc != NULL) {
+		struct hfi1_devdata *dd = sc->dd;
+		u64 reg;
+		u8 set = (sc->type == SC_USER ?
+			  HFI1_CAP_IS_USET(STATIC_RATE_CTRL) :
+			  HFI1_CAP_IS_KSET(STATIC_RATE_CTRL));
+		reg = read_kctxt_csr(dd, sc->hw_context,
+				     SEND_CTXT_CHECK_ENABLE);
+		if (set)
+			CLEAR_STATIC_RATE_CONTROL_SMASK(reg);
+		else
+			SET_STATIC_RATE_CONTROL_SMASK(reg);
+		write_kctxt_csr(dd, sc->hw_context,
+				SEND_CTXT_CHECK_ENABLE, reg);
+	}
+	return 0;
+}
+
+int hfi1_tempsense_rd(struct hfi1_devdata *dd, struct hfi1_temp *temp)
+{
+	int ret = 0;
+	u64 reg;
+
+	if (dd->icode != ICODE_RTL_SILICON) {
+		if (HFI1_CAP_IS_KSET(PRINT_UNIMPL))
+			dd_dev_info(dd, "%s: tempsense not supported by HW\n",
+				    __func__);
+		return -EINVAL;
+	}
+	reg = read_csr(dd, ASIC_STS_THERM);
+	temp->curr = ((reg >> ASIC_STS_THERM_CURR_TEMP_SHIFT) &
+		      ASIC_STS_THERM_CURR_TEMP_MASK);
+	temp->lo_lim = ((reg >> ASIC_STS_THERM_LO_TEMP_SHIFT) &
+			ASIC_STS_THERM_LO_TEMP_MASK);
+	temp->hi_lim = ((reg >> ASIC_STS_THERM_HI_TEMP_SHIFT) &
+			ASIC_STS_THERM_HI_TEMP_MASK);
+	temp->crit_lim = ((reg >> ASIC_STS_THERM_CRIT_TEMP_SHIFT) &
+			  ASIC_STS_THERM_CRIT_TEMP_MASK);
+	/* triggers is a 3-bit value - 1 bit per trigger. */
+	temp->triggers = (u8)((reg >> ASIC_STS_THERM_LOW_SHIFT) & 0x7);
+
+	return ret;
+}
+
+/* ========================================================================= */
+
+/*
+ * Enable/disable chip from delivering interrupts.
+ */
+void set_intr_state(struct hfi1_devdata *dd, u32 enable)
+{
+	int i;
+
+	/*
+	 * In HFI, the mask needs to be 1 to allow interrupts.
+	 */
+	if (enable) {
+		u64 cce_int_mask;
+		const int qsfp1_int_smask = QSFP1_INT % 64;
+		const int qsfp2_int_smask = QSFP2_INT % 64;
+
+		/* enable all interrupts */
+		for (i = 0; i < CCE_NUM_INT_CSRS; i++)
+			write_csr(dd, CCE_INT_MASK + (8*i), ~(u64)0);
+
+		/*
+		 * disable QSFP1 interrupts for HFI1, QSFP2 interrupts for HFI0
+		 * Qsfp1Int and Qsfp2Int are adjacent bits in the same CSR,
+		 * therefore just one of QSFP1_INT/QSFP2_INT can be used to find
+		 * the index of the appropriate CSR in the CCEIntMask CSR array
+		 */
+		cce_int_mask = read_csr(dd, CCE_INT_MASK +
+						(8*(QSFP1_INT/64)));
+		if (dd->hfi1_id) {
+			cce_int_mask &= ~((u64)1 << qsfp1_int_smask);
+			write_csr(dd, CCE_INT_MASK + (8*(QSFP1_INT/64)),
+					cce_int_mask);
+		} else {
+			cce_int_mask &= ~((u64)1 << qsfp2_int_smask);
+			write_csr(dd, CCE_INT_MASK + (8*(QSFP2_INT/64)),
+					cce_int_mask);
+		}
+	} else {
+		for (i = 0; i < CCE_NUM_INT_CSRS; i++)
+			write_csr(dd, CCE_INT_MASK + (8*i), 0ull);
+	}
+}
+
+/*
+ * Clear all interrupt sources on the chip.
+ */
+static void clear_all_interrupts(struct hfi1_devdata *dd)
+{
+	int i;
+
+	for (i = 0; i < CCE_NUM_INT_CSRS; i++)
+		write_csr(dd, CCE_INT_CLEAR + (8*i), ~(u64)0);
+
+	write_csr(dd, CCE_ERR_CLEAR, ~(u64)0);
+	write_csr(dd, MISC_ERR_CLEAR, ~(u64)0);
+	write_csr(dd, RCV_ERR_CLEAR, ~(u64)0);
+	write_csr(dd, SEND_ERR_CLEAR, ~(u64)0);
+	write_csr(dd, SEND_PIO_ERR_CLEAR, ~(u64)0);
+	write_csr(dd, SEND_DMA_ERR_CLEAR, ~(u64)0);
+	write_csr(dd, SEND_EGRESS_ERR_CLEAR, ~(u64)0);
+	for (i = 0; i < dd->chip_send_contexts; i++)
+		write_kctxt_csr(dd, i, SEND_CTXT_ERR_CLEAR, ~(u64)0);
+	for (i = 0; i < dd->chip_sdma_engines; i++)
+		write_kctxt_csr(dd, i, SEND_DMA_ENG_ERR_CLEAR, ~(u64)0);
+
+	write_csr(dd, DCC_ERR_FLG_CLR, ~(u64)0);
+	write_csr(dd, DC_LCB_ERR_CLR, ~(u64)0);
+	write_csr(dd, DC_DC8051_ERR_CLR, ~(u64)0);
+}
+
+/* Move to pcie.c? */
+static void disable_intx(struct pci_dev *pdev)
+{
+	pci_intx(pdev, 0);
+}
+
+static void clean_up_interrupts(struct hfi1_devdata *dd)
+{
+	int i;
+
+	/* remove irqs - must happen before disabling/turning off */
+	if (dd->num_msix_entries) {
+		/* MSI-X */
+		struct hfi1_msix_entry *me = dd->msix_entries;
+
+		for (i = 0; i < dd->num_msix_entries; i++, me++) {
+			if (me->arg == NULL) /* => no irq, no affinity */
+				break;
+			irq_set_affinity_hint(dd->msix_entries[i].msix.vector,
+					NULL);
+			free_irq(me->msix.vector, me->arg);
+		}
+	} else {
+		/* INTx */
+		if (dd->requested_intx_irq) {
+			free_irq(dd->pcidev->irq, dd);
+			dd->requested_intx_irq = 0;
+		}
+	}
+
+	/* turn off interrupts */
+	if (dd->num_msix_entries) {
+		/* MSI-X */
+		hfi1_nomsix(dd);
+	} else {
+		/* INTx */
+		disable_intx(dd->pcidev);
+	}
+
+	/* clean structures */
+	for (i = 0; i < dd->num_msix_entries; i++)
+		free_cpumask_var(dd->msix_entries[i].mask);
+	kfree(dd->msix_entries);
+	dd->msix_entries = NULL;
+	dd->num_msix_entries = 0;
+}
+
+/*
+ * Remap the interrupt source from the general handler to the given MSI-X
+ * interrupt.
+ */
+static void remap_intr(struct hfi1_devdata *dd, int isrc, int msix_intr)
+{
+	u64 reg;
+	int m, n;
+
+	/* clear from the handled mask of the general interrupt */
+	m = isrc / 64;
+	n = isrc % 64;
+	dd->gi_mask[m] &= ~((u64)1 << n);
+
+	/* direct the chip source to the given MSI-X interrupt */
+	m = isrc / 8;
+	n = isrc % 8;
+	reg = read_csr(dd, CCE_INT_MAP + (8*m));
+	reg &= ~((u64)0xff << (8*n));
+	reg |= ((u64)msix_intr & 0xff) << (8*n);
+	write_csr(dd, CCE_INT_MAP + (8*m), reg);
+}
+
+static void remap_sdma_interrupts(struct hfi1_devdata *dd,
+				  int engine, int msix_intr)
+{
+	/*
+	 * SDMA engine interrupt sources grouped by type, rather than
+	 * engine.  Per-engine interrupts are as follows:
+	 *	SDMA
+	 *	SDMAProgress
+	 *	SDMAIdle
+	 */
+	remap_intr(dd, IS_SDMA_START + 0*TXE_NUM_SDMA_ENGINES + engine,
+		msix_intr);
+	remap_intr(dd, IS_SDMA_START + 1*TXE_NUM_SDMA_ENGINES + engine,
+		msix_intr);
+	remap_intr(dd, IS_SDMA_START + 2*TXE_NUM_SDMA_ENGINES + engine,
+		msix_intr);
+}
+
+static void remap_receive_available_interrupt(struct hfi1_devdata *dd,
+					      int rx, int msix_intr)
+{
+	remap_intr(dd, IS_RCVAVAIL_START + rx, msix_intr);
+}
+
+static int request_intx_irq(struct hfi1_devdata *dd)
+{
+	int ret;
+
+	snprintf(dd->intx_name, sizeof(dd->intx_name), DRIVER_NAME"_%d",
+		dd->unit);
+	ret = request_irq(dd->pcidev->irq, general_interrupt,
+				  IRQF_SHARED, dd->intx_name, dd);
+	if (ret)
+		dd_dev_err(dd, "unable to request INTx interrupt, err %d\n",
+				ret);
+	else
+		dd->requested_intx_irq = 1;
+	return ret;
+}
+
+static int request_msix_irqs(struct hfi1_devdata *dd)
+{
+	const struct cpumask *local_mask;
+	cpumask_var_t def, rcv;
+	bool def_ret, rcv_ret;
+	int first_general, last_general;
+	int first_sdma, last_sdma;
+	int first_rx, last_rx;
+	int first_cpu, restart_cpu, curr_cpu;
+	int rcv_cpu, sdma_cpu;
+	int i, ret = 0, possible;
+	int ht;
+
+	/* calculate the ranges we are going to use */
+	first_general = 0;
+	first_sdma = last_general = first_general + 1;
+	first_rx = last_sdma = first_sdma + dd->num_sdma;
+	last_rx = first_rx + dd->n_krcv_queues;
+
+	/*
+	 * Interrupt affinity.
+	 *
+	 * non-rcv avail gets a default mask that
+	 * starts as possible cpus with threads reset
+	 * and each rcv avail reset.
+	 *
+	 * rcv avail gets node relative 1 wrapping back
+	 * to the node relative 1 as necessary.
+	 *
+	 */
+	local_mask = cpumask_of_pcibus(dd->pcidev->bus);
+	/* if first cpu is invalid, use NUMA 0 */
+	if (cpumask_first(local_mask) >= nr_cpu_ids)
+		local_mask = topology_core_cpumask(0);
+
+	def_ret = zalloc_cpumask_var(&def, GFP_KERNEL);
+	rcv_ret = zalloc_cpumask_var(&rcv, GFP_KERNEL);
+	if (!def_ret || !rcv_ret)
+		goto bail;
+	/* use local mask as default */
+	cpumask_copy(def, local_mask);
+	possible = cpumask_weight(def);
+	/* disarm threads from default */
+	ht = cpumask_weight(
+			topology_sibling_cpumask(cpumask_first(local_mask)));
+	for (i = possible/ht; i < possible; i++)
+		cpumask_clear_cpu(i, def);
+	/* reset possible */
+	possible = cpumask_weight(def);
+	/* def now has full cores on chosen node*/
+	first_cpu = cpumask_first(def);
+	if (nr_cpu_ids >= first_cpu)
+		first_cpu++;
+	restart_cpu = first_cpu;
+	curr_cpu = restart_cpu;
+
+	for (i = first_cpu; i < dd->n_krcv_queues + first_cpu; i++) {
+		cpumask_clear_cpu(curr_cpu, def);
+		cpumask_set_cpu(curr_cpu, rcv);
+		if (curr_cpu >= possible)
+			curr_cpu = restart_cpu;
+		else
+			curr_cpu++;
+	}
+	/* def mask has non-rcv, rcv has recv mask */
+	rcv_cpu = cpumask_first(rcv);
+	sdma_cpu = cpumask_first(def);
+
+	/*
+	 * Sanity check - the code expects all SDMA chip source
+	 * interrupts to be in the same CSR, starting at bit 0.  Verify
+	 * that this is true by checking the bit location of the start.
+	 */
+	BUILD_BUG_ON(IS_SDMA_START % 64);
+
+	for (i = 0; i < dd->num_msix_entries; i++) {
+		struct hfi1_msix_entry *me = &dd->msix_entries[i];
+		const char *err_info;
+		irq_handler_t handler;
+		void *arg;
+		int idx;
+		struct hfi1_ctxtdata *rcd = NULL;
+		struct sdma_engine *sde = NULL;
+
+		/* obtain the arguments to request_irq */
+		if (first_general <= i && i < last_general) {
+			idx = i - first_general;
+			handler = general_interrupt;
+			arg = dd;
+			snprintf(me->name, sizeof(me->name),
+				DRIVER_NAME"_%d", dd->unit);
+			err_info = "general";
+		} else if (first_sdma <= i && i < last_sdma) {
+			idx = i - first_sdma;
+			sde = &dd->per_sdma[idx];
+			handler = sdma_interrupt;
+			arg = sde;
+			snprintf(me->name, sizeof(me->name),
+				DRIVER_NAME"_%d sdma%d", dd->unit, idx);
+			err_info = "sdma";
+			remap_sdma_interrupts(dd, idx, i);
+		} else if (first_rx <= i && i < last_rx) {
+			idx = i - first_rx;
+			rcd = dd->rcd[idx];
+			/* no interrupt if no rcd */
+			if (!rcd)
+				continue;
+			/*
+			 * Set the interrupt register and mask for this
+			 * context's interrupt.
+			 */
+			rcd->ireg = (IS_RCVAVAIL_START+idx) / 64;
+			rcd->imask = ((u64)1) <<
+					((IS_RCVAVAIL_START+idx) % 64);
+			handler = receive_context_interrupt;
+			arg = rcd;
+			snprintf(me->name, sizeof(me->name),
+				DRIVER_NAME"_%d kctxt%d", dd->unit, idx);
+			err_info = "receive context";
+			remap_receive_available_interrupt(dd, idx, i);
+		} else {
+			/* not in our expected range - complain, then
+			   ignore it */
+			dd_dev_err(dd,
+				"Unexpected extra MSI-X interrupt %d\n", i);
+			continue;
+		}
+		/* no argument, no interrupt */
+		if (arg == NULL)
+			continue;
+		/* make sure the name is terminated */
+		me->name[sizeof(me->name)-1] = 0;
+
+		ret = request_irq(me->msix.vector, handler, 0, me->name, arg);
+		if (ret) {
+			dd_dev_err(dd,
+				"unable to allocate %s interrupt, vector %d, index %d, err %d\n",
+				 err_info, me->msix.vector, idx, ret);
+			return ret;
+		}
+		/*
+		 * assign arg after request_irq call, so it will be
+		 * cleaned up
+		 */
+		me->arg = arg;
+
+		if (!zalloc_cpumask_var(
+			&dd->msix_entries[i].mask,
+			GFP_KERNEL))
+			goto bail;
+		if (handler == sdma_interrupt) {
+			dd_dev_info(dd, "sdma engine %d cpu %d\n",
+				sde->this_idx, sdma_cpu);
+			cpumask_set_cpu(sdma_cpu, dd->msix_entries[i].mask);
+			sdma_cpu = cpumask_next(sdma_cpu, def);
+			if (sdma_cpu >= nr_cpu_ids)
+				sdma_cpu = cpumask_first(def);
+		} else if (handler == receive_context_interrupt) {
+			dd_dev_info(dd, "rcv ctxt %d cpu %d\n",
+				rcd->ctxt, rcv_cpu);
+			cpumask_set_cpu(rcv_cpu, dd->msix_entries[i].mask);
+			rcv_cpu = cpumask_next(rcv_cpu, rcv);
+			if (rcv_cpu >= nr_cpu_ids)
+				rcv_cpu = cpumask_first(rcv);
+		} else {
+			/* otherwise first def */
+			dd_dev_info(dd, "%s cpu %d\n",
+				err_info, cpumask_first(def));
+			cpumask_set_cpu(
+				cpumask_first(def), dd->msix_entries[i].mask);
+		}
+		irq_set_affinity_hint(
+			dd->msix_entries[i].msix.vector,
+			dd->msix_entries[i].mask);
+	}
+
+out:
+	free_cpumask_var(def);
+	free_cpumask_var(rcv);
+	return ret;
+bail:
+	ret = -ENOMEM;
+	goto  out;
+}
+
+/*
+ * Set the general handler to accept all interrupts, remap all
+ * chip interrupts back to MSI-X 0.
+ */
+static void reset_interrupts(struct hfi1_devdata *dd)
+{
+	int i;
+
+	/* all interrupts handled by the general handler */
+	for (i = 0; i < CCE_NUM_INT_CSRS; i++)
+		dd->gi_mask[i] = ~(u64)0;
+
+	/* all chip interrupts map to MSI-X 0 */
+	for (i = 0; i < CCE_NUM_INT_MAP_CSRS; i++)
+		write_csr(dd, CCE_INT_MAP + (8*i), 0);
+}
+
+static int set_up_interrupts(struct hfi1_devdata *dd)
+{
+	struct hfi1_msix_entry *entries;
+	u32 total, request;
+	int i, ret;
+	int single_interrupt = 0; /* we expect to have all the interrupts */
+
+	/*
+	 * Interrupt count:
+	 *	1 general, "slow path" interrupt (includes the SDMA engines
+	 *		slow source, SDMACleanupDone)
+	 *	N interrupts - one per used SDMA engine
+	 *	M interrupt - one per kernel receive context
+	 */
+	total = 1 + dd->num_sdma + dd->n_krcv_queues;
+
+	entries = kcalloc(total, sizeof(*entries), GFP_KERNEL);
+	if (!entries) {
+		dd_dev_err(dd, "cannot allocate msix table\n");
+		ret = -ENOMEM;
+		goto fail;
+	}
+	/* 1-1 MSI-X entry assignment */
+	for (i = 0; i < total; i++)
+		entries[i].msix.entry = i;
+
+	/* ask for MSI-X interrupts */
+	request = total;
+	request_msix(dd, &request, entries);
+
+	if (request == 0) {
+		/* using INTx */
+		/* dd->num_msix_entries already zero */
+		kfree(entries);
+		single_interrupt = 1;
+		dd_dev_err(dd, "MSI-X failed, using INTx interrupts\n");
+	} else {
+		/* using MSI-X */
+		dd->num_msix_entries = request;
+		dd->msix_entries = entries;
+
+		if (request != total) {
+			/* using MSI-X, with reduced interrupts */
+			dd_dev_err(
+				dd,
+				"cannot handle reduced interrupt case, want %u, got %u\n",
+				total, request);
+			ret = -EINVAL;
+			goto fail;
+		}
+		dd_dev_info(dd, "%u MSI-X interrupts allocated\n", total);
+	}
+
+	/* mask all interrupts */
+	set_intr_state(dd, 0);
+	/* clear all pending interrupts */
+	clear_all_interrupts(dd);
+
+	/* reset general handler mask, chip MSI-X mappings */
+	reset_interrupts(dd);
+
+	if (single_interrupt)
+		ret = request_intx_irq(dd);
+	else
+		ret = request_msix_irqs(dd);
+	if (ret)
+		goto fail;
+
+	return 0;
+
+fail:
+	clean_up_interrupts(dd);
+	return ret;
+}
+
+/*
+ * Set up context values in dd.  Sets:
+ *
+ *	num_rcv_contexts - number of contexts being used
+ *	n_krcv_queues - number of kernel contexts
+ *	first_user_ctxt - first non-kernel context in array of contexts
+ *	freectxts  - number of free user contexts
+ *	num_send_contexts - number of PIO send contexts being used
+ */
+static int set_up_context_variables(struct hfi1_devdata *dd)
+{
+	int num_kernel_contexts;
+	int num_user_contexts;
+	int total_contexts;
+	int ret;
+	unsigned ngroups;
+
+	/*
+	 * Kernel contexts: (to be fixed later):
+	 * - min or 2 or 1 context/numa
+	 * - Context 0 - default/errors
+	 * - Context 1 - VL15
+	 */
+	if (n_krcvqs)
+		num_kernel_contexts = n_krcvqs + MIN_KERNEL_KCTXTS;
+	else
+		num_kernel_contexts = num_online_nodes();
+	num_kernel_contexts =
+		max_t(int, MIN_KERNEL_KCTXTS, num_kernel_contexts);
+	/*
+	 * Every kernel receive context needs an ACK send context.
+	 * one send context is allocated for each VL{0-7} and VL15
+	 */
+	if (num_kernel_contexts > (dd->chip_send_contexts - num_vls - 1)) {
+		dd_dev_err(dd,
+			   "Reducing # kernel rcv contexts to: %d, from %d\n",
+			   (int)(dd->chip_send_contexts - num_vls - 1),
+			   (int)num_kernel_contexts);
+		num_kernel_contexts = dd->chip_send_contexts - num_vls - 1;
+	}
+	/*
+	 * User contexts: (to be fixed later)
+	 *	- set to num_rcv_contexts if non-zero
+	 *	- default to 1 user context per CPU
+	 */
+	if (num_rcv_contexts)
+		num_user_contexts = num_rcv_contexts;
+	else
+		num_user_contexts = num_online_cpus();
+
+	total_contexts = num_kernel_contexts + num_user_contexts;
+
+	/*
+	 * Adjust the counts given a global max.
+	 */
+	if (total_contexts > dd->chip_rcv_contexts) {
+		dd_dev_err(dd,
+			   "Reducing # user receive contexts to: %d, from %d\n",
+			   (int)(dd->chip_rcv_contexts - num_kernel_contexts),
+			   (int)num_user_contexts);
+		num_user_contexts = dd->chip_rcv_contexts - num_kernel_contexts;
+		/* recalculate */
+		total_contexts = num_kernel_contexts + num_user_contexts;
+	}
+
+	/* the first N are kernel contexts, the rest are user contexts */
+	dd->num_rcv_contexts = total_contexts;
+	dd->n_krcv_queues = num_kernel_contexts;
+	dd->first_user_ctxt = num_kernel_contexts;
+	dd->freectxts = num_user_contexts;
+	dd_dev_info(dd,
+		"rcv contexts: chip %d, used %d (kernel %d, user %d)\n",
+		(int)dd->chip_rcv_contexts,
+		(int)dd->num_rcv_contexts,
+		(int)dd->n_krcv_queues,
+		(int)dd->num_rcv_contexts - dd->n_krcv_queues);
+
+	/*
+	 * Receive array allocation:
+	 *   All RcvArray entries are divided into groups of 8. This
+	 *   is required by the hardware and will speed up writes to
+	 *   consecutive entries by using write-combining of the entire
+	 *   cacheline.
+	 *
+	 *   The number of groups are evenly divided among all contexts.
+	 *   any left over groups will be given to the first N user
+	 *   contexts.
+	 */
+	dd->rcv_entries.group_size = RCV_INCREMENT;
+	ngroups = dd->chip_rcv_array_count / dd->rcv_entries.group_size;
+	dd->rcv_entries.ngroups = ngroups / dd->num_rcv_contexts;
+	dd->rcv_entries.nctxt_extra = ngroups -
+		(dd->num_rcv_contexts * dd->rcv_entries.ngroups);
+	dd_dev_info(dd, "RcvArray groups %u, ctxts extra %u\n",
+		    dd->rcv_entries.ngroups,
+		    dd->rcv_entries.nctxt_extra);
+	if (dd->rcv_entries.ngroups * dd->rcv_entries.group_size >
+	    MAX_EAGER_ENTRIES * 2) {
+		dd->rcv_entries.ngroups = (MAX_EAGER_ENTRIES * 2) /
+			dd->rcv_entries.group_size;
+		dd_dev_info(dd,
+		   "RcvArray group count too high, change to %u\n",
+		   dd->rcv_entries.ngroups);
+		dd->rcv_entries.nctxt_extra = 0;
+	}
+	/*
+	 * PIO send contexts
+	 */
+	ret = init_sc_pools_and_sizes(dd);
+	if (ret >= 0) {	/* success */
+		dd->num_send_contexts = ret;
+		dd_dev_info(
+			dd,
+			"send contexts: chip %d, used %d (kernel %d, ack %d, user %d)\n",
+			dd->chip_send_contexts,
+			dd->num_send_contexts,
+			dd->sc_sizes[SC_KERNEL].count,
+			dd->sc_sizes[SC_ACK].count,
+			dd->sc_sizes[SC_USER].count);
+		ret = 0;	/* success */
+	}
+
+	return ret;
+}
+
+/*
+ * Set the device/port partition key table. The MAD code
+ * will ensure that, at least, the partial management
+ * partition key is present in the table.
+ */
+static void set_partition_keys(struct hfi1_pportdata *ppd)
+{
+	struct hfi1_devdata *dd = ppd->dd;
+	u64 reg = 0;
+	int i;
+
+	dd_dev_info(dd, "Setting partition keys\n");
+	for (i = 0; i < hfi1_get_npkeys(dd); i++) {
+		reg |= (ppd->pkeys[i] &
+			RCV_PARTITION_KEY_PARTITION_KEY_A_MASK) <<
+			((i % 4) *
+			 RCV_PARTITION_KEY_PARTITION_KEY_B_SHIFT);
+		/* Each register holds 4 PKey values. */
+		if ((i % 4) == 3) {
+			write_csr(dd, RCV_PARTITION_KEY +
+				  ((i - 3) * 2), reg);
+			reg = 0;
+		}
+	}
+
+	/* Always enable HW pkeys check when pkeys table is set */
+	add_rcvctrl(dd, RCV_CTRL_RCV_PARTITION_KEY_ENABLE_SMASK);
+}
+
+/*
+ * These CSRs and memories are uninitialized on reset and must be
+ * written before reading to set the ECC/parity bits.
+ *
+ * NOTE: All user context CSRs that are not mmaped write-only
+ * (e.g. the TID flows) must be initialized even if the driver never
+ * reads them.
+ */
+static void write_uninitialized_csrs_and_memories(struct hfi1_devdata *dd)
+{
+	int i, j;
+
+	/* CceIntMap */
+	for (i = 0; i < CCE_NUM_INT_MAP_CSRS; i++)
+		write_csr(dd, CCE_INT_MAP+(8*i), 0);
+
+	/* SendCtxtCreditReturnAddr */
+	for (i = 0; i < dd->chip_send_contexts; i++)
+		write_kctxt_csr(dd, i, SEND_CTXT_CREDIT_RETURN_ADDR, 0);
+
+	/* PIO Send buffers */
+	/* SDMA Send buffers */
+	/* These are not normally read, and (presently) have no method
+	   to be read, so are not pre-initialized */
+
+	/* RcvHdrAddr */
+	/* RcvHdrTailAddr */
+	/* RcvTidFlowTable */
+	for (i = 0; i < dd->chip_rcv_contexts; i++) {
+		write_kctxt_csr(dd, i, RCV_HDR_ADDR, 0);
+		write_kctxt_csr(dd, i, RCV_HDR_TAIL_ADDR, 0);
+		for (j = 0; j < RXE_NUM_TID_FLOWS; j++)
+			write_uctxt_csr(dd, i, RCV_TID_FLOW_TABLE+(8*j), 0);
+	}
+
+	/* RcvArray */
+	for (i = 0; i < dd->chip_rcv_array_count; i++)
+		write_csr(dd, RCV_ARRAY + (8*i),
+					RCV_ARRAY_RT_WRITE_ENABLE_SMASK);
+
+	/* RcvQPMapTable */
+	for (i = 0; i < 32; i++)
+		write_csr(dd, RCV_QP_MAP_TABLE + (8 * i), 0);
+}
+
+/*
+ * Use the ctrl_bits in CceCtrl to clear the status_bits in CceStatus.
+ */
+static void clear_cce_status(struct hfi1_devdata *dd, u64 status_bits,
+			     u64 ctrl_bits)
+{
+	unsigned long timeout;
+	u64 reg;
+
+	/* is the condition present? */
+	reg = read_csr(dd, CCE_STATUS);
+	if ((reg & status_bits) == 0)
+		return;
+
+	/* clear the condition */
+	write_csr(dd, CCE_CTRL, ctrl_bits);
+
+	/* wait for the condition to clear */
+	timeout = jiffies + msecs_to_jiffies(CCE_STATUS_TIMEOUT);
+	while (1) {
+		reg = read_csr(dd, CCE_STATUS);
+		if ((reg & status_bits) == 0)
+			return;
+		if (time_after(jiffies, timeout)) {
+			dd_dev_err(dd,
+				"Timeout waiting for CceStatus to clear bits 0x%llx, remaining 0x%llx\n",
+				status_bits, reg & status_bits);
+			return;
+		}
+		udelay(1);
+	}
+}
+
+/* set CCE CSRs to chip reset defaults */
+static void reset_cce_csrs(struct hfi1_devdata *dd)
+{
+	int i;
+
+	/* CCE_REVISION read-only */
+	/* CCE_REVISION2 read-only */
+	/* CCE_CTRL - bits clear automatically */
+	/* CCE_STATUS read-only, use CceCtrl to clear */
+	clear_cce_status(dd, ALL_FROZE, CCE_CTRL_SPC_UNFREEZE_SMASK);
+	clear_cce_status(dd, ALL_TXE_PAUSE, CCE_CTRL_TXE_RESUME_SMASK);
+	clear_cce_status(dd, ALL_RXE_PAUSE, CCE_CTRL_RXE_RESUME_SMASK);
+	for (i = 0; i < CCE_NUM_SCRATCH; i++)
+		write_csr(dd, CCE_SCRATCH + (8 * i), 0);
+	/* CCE_ERR_STATUS read-only */
+	write_csr(dd, CCE_ERR_MASK, 0);
+	write_csr(dd, CCE_ERR_CLEAR, ~0ull);
+	/* CCE_ERR_FORCE leave alone */
+	for (i = 0; i < CCE_NUM_32_BIT_COUNTERS; i++)
+		write_csr(dd, CCE_COUNTER_ARRAY32 + (8 * i), 0);
+	write_csr(dd, CCE_DC_CTRL, CCE_DC_CTRL_RESETCSR);
+	/* CCE_PCIE_CTRL leave alone */
+	for (i = 0; i < CCE_NUM_MSIX_VECTORS; i++) {
+		write_csr(dd, CCE_MSIX_TABLE_LOWER + (8 * i), 0);
+		write_csr(dd, CCE_MSIX_TABLE_UPPER + (8 * i),
+					CCE_MSIX_TABLE_UPPER_RESETCSR);
+	}
+	for (i = 0; i < CCE_NUM_MSIX_PBAS; i++) {
+		/* CCE_MSIX_PBA read-only */
+		write_csr(dd, CCE_MSIX_INT_GRANTED, ~0ull);
+		write_csr(dd, CCE_MSIX_VEC_CLR_WITHOUT_INT, ~0ull);
+	}
+	for (i = 0; i < CCE_NUM_INT_MAP_CSRS; i++)
+		write_csr(dd, CCE_INT_MAP, 0);
+	for (i = 0; i < CCE_NUM_INT_CSRS; i++) {
+		/* CCE_INT_STATUS read-only */
+		write_csr(dd, CCE_INT_MASK + (8 * i), 0);
+		write_csr(dd, CCE_INT_CLEAR + (8 * i), ~0ull);
+		/* CCE_INT_FORCE leave alone */
+		/* CCE_INT_BLOCKED read-only */
+	}
+	for (i = 0; i < CCE_NUM_32_BIT_INT_COUNTERS; i++)
+		write_csr(dd, CCE_INT_COUNTER_ARRAY32 + (8 * i), 0);
+}
+
+/* set ASIC CSRs to chip reset defaults */
+static void reset_asic_csrs(struct hfi1_devdata *dd)
+{
+	static DEFINE_MUTEX(asic_mutex);
+	static int called;
+	int i;
+
+	/*
+	 * If the HFIs are shared between separate nodes or VMs,
+	 * then more will need to be done here.  One idea is a module
+	 * parameter that returns early, letting the first power-on or
+	 * a known first load do the reset and blocking all others.
+	 */
+
+	/*
+	 * These CSRs should only be reset once - the first one here will
+	 * do the work.  Use a mutex so that a non-first caller waits until
+	 * the first is finished before it can proceed.
+	 */
+	mutex_lock(&asic_mutex);
+	if (called)
+		goto done;
+	called = 1;
+
+	if (dd->icode != ICODE_FPGA_EMULATION) {
+		/* emulation does not have an SBus - leave these alone */
+		/*
+		 * All writes to ASIC_CFG_SBUS_REQUEST do something.
+		 * Notes:
+		 * o The reset is not zero if aimed at the core.  See the
+		 *   SBus documentation for details.
+		 * o If the SBus firmware has been updated (e.g. by the BIOS),
+		 *   will the reset revert that?
+		 */
+		/* ASIC_CFG_SBUS_REQUEST leave alone */
+		write_csr(dd, ASIC_CFG_SBUS_EXECUTE, 0);
+	}
+	/* ASIC_SBUS_RESULT read-only */
+	write_csr(dd, ASIC_STS_SBUS_COUNTERS, 0);
+	for (i = 0; i < ASIC_NUM_SCRATCH; i++)
+		write_csr(dd, ASIC_CFG_SCRATCH + (8 * i), 0);
+	write_csr(dd, ASIC_CFG_MUTEX, 0);	/* this will clear it */
+	write_csr(dd, ASIC_CFG_DRV_STR, 0);
+	write_csr(dd, ASIC_CFG_THERM_POLL_EN, 0);
+	/* ASIC_STS_THERM read-only */
+	/* ASIC_CFG_RESET leave alone */
+
+	write_csr(dd, ASIC_PCIE_SD_HOST_CMD, 0);
+	/* ASIC_PCIE_SD_HOST_STATUS read-only */
+	write_csr(dd, ASIC_PCIE_SD_INTRPT_DATA_CODE, 0);
+	write_csr(dd, ASIC_PCIE_SD_INTRPT_ENABLE, 0);
+	/* ASIC_PCIE_SD_INTRPT_PROGRESS read-only */
+	write_csr(dd, ASIC_PCIE_SD_INTRPT_STATUS, ~0ull); /* clear */
+	/* ASIC_HFI0_PCIE_SD_INTRPT_RSPD_DATA read-only */
+	/* ASIC_HFI1_PCIE_SD_INTRPT_RSPD_DATA read-only */
+	for (i = 0; i < 16; i++)
+		write_csr(dd, ASIC_PCIE_SD_INTRPT_LIST + (8 * i), 0);
+
+	/* ASIC_GPIO_IN read-only */
+	write_csr(dd, ASIC_GPIO_OE, 0);
+	write_csr(dd, ASIC_GPIO_INVERT, 0);
+	write_csr(dd, ASIC_GPIO_OUT, 0);
+	write_csr(dd, ASIC_GPIO_MASK, 0);
+	/* ASIC_GPIO_STATUS read-only */
+	write_csr(dd, ASIC_GPIO_CLEAR, ~0ull);
+	/* ASIC_GPIO_FORCE leave alone */
+
+	/* ASIC_QSFP1_IN read-only */
+	write_csr(dd, ASIC_QSFP1_OE, 0);
+	write_csr(dd, ASIC_QSFP1_INVERT, 0);
+	write_csr(dd, ASIC_QSFP1_OUT, 0);
+	write_csr(dd, ASIC_QSFP1_MASK, 0);
+	/* ASIC_QSFP1_STATUS read-only */
+	write_csr(dd, ASIC_QSFP1_CLEAR, ~0ull);
+	/* ASIC_QSFP1_FORCE leave alone */
+
+	/* ASIC_QSFP2_IN read-only */
+	write_csr(dd, ASIC_QSFP2_OE, 0);
+	write_csr(dd, ASIC_QSFP2_INVERT, 0);
+	write_csr(dd, ASIC_QSFP2_OUT, 0);
+	write_csr(dd, ASIC_QSFP2_MASK, 0);
+	/* ASIC_QSFP2_STATUS read-only */
+	write_csr(dd, ASIC_QSFP2_CLEAR, ~0ull);
+	/* ASIC_QSFP2_FORCE leave alone */
+
+	write_csr(dd, ASIC_EEP_CTL_STAT, ASIC_EEP_CTL_STAT_RESETCSR);
+	/* this also writes a NOP command, clearing paging mode */
+	write_csr(dd, ASIC_EEP_ADDR_CMD, 0);
+	write_csr(dd, ASIC_EEP_DATA, 0);
+
+done:
+	mutex_unlock(&asic_mutex);
+}
+
+/* set MISC CSRs to chip reset defaults */
+static void reset_misc_csrs(struct hfi1_devdata *dd)
+{
+	int i;
+
+	for (i = 0; i < 32; i++) {
+		write_csr(dd, MISC_CFG_RSA_R2 + (8 * i), 0);
+		write_csr(dd, MISC_CFG_RSA_SIGNATURE + (8 * i), 0);
+		write_csr(dd, MISC_CFG_RSA_MODULUS + (8 * i), 0);
+	}
+	/* MISC_CFG_SHA_PRELOAD leave alone - always reads 0 and can
+	   only be written 128-byte chunks */
+	/* init RSA engine to clear lingering errors */
+	write_csr(dd, MISC_CFG_RSA_CMD, 1);
+	write_csr(dd, MISC_CFG_RSA_MU, 0);
+	write_csr(dd, MISC_CFG_FW_CTRL, 0);
+	/* MISC_STS_8051_DIGEST read-only */
+	/* MISC_STS_SBM_DIGEST read-only */
+	/* MISC_STS_PCIE_DIGEST read-only */
+	/* MISC_STS_FAB_DIGEST read-only */
+	/* MISC_ERR_STATUS read-only */
+	write_csr(dd, MISC_ERR_MASK, 0);
+	write_csr(dd, MISC_ERR_CLEAR, ~0ull);
+	/* MISC_ERR_FORCE leave alone */
+}
+
+/* set TXE CSRs to chip reset defaults */
+static void reset_txe_csrs(struct hfi1_devdata *dd)
+{
+	int i;
+
+	/*
+	 * TXE Kernel CSRs
+	 */
+	write_csr(dd, SEND_CTRL, 0);
+	__cm_reset(dd, 0);	/* reset CM internal state */
+	/* SEND_CONTEXTS read-only */
+	/* SEND_DMA_ENGINES read-only */
+	/* SEND_PIO_MEM_SIZE read-only */
+	/* SEND_DMA_MEM_SIZE read-only */
+	write_csr(dd, SEND_HIGH_PRIORITY_LIMIT, 0);
+	pio_reset_all(dd);	/* SEND_PIO_INIT_CTXT */
+	/* SEND_PIO_ERR_STATUS read-only */
+	write_csr(dd, SEND_PIO_ERR_MASK, 0);
+	write_csr(dd, SEND_PIO_ERR_CLEAR, ~0ull);
+	/* SEND_PIO_ERR_FORCE leave alone */
+	/* SEND_DMA_ERR_STATUS read-only */
+	write_csr(dd, SEND_DMA_ERR_MASK, 0);
+	write_csr(dd, SEND_DMA_ERR_CLEAR, ~0ull);
+	/* SEND_DMA_ERR_FORCE leave alone */
+	/* SEND_EGRESS_ERR_STATUS read-only */
+	write_csr(dd, SEND_EGRESS_ERR_MASK, 0);
+	write_csr(dd, SEND_EGRESS_ERR_CLEAR, ~0ull);
+	/* SEND_EGRESS_ERR_FORCE leave alone */
+	write_csr(dd, SEND_BTH_QP, 0);
+	write_csr(dd, SEND_STATIC_RATE_CONTROL, 0);
+	write_csr(dd, SEND_SC2VLT0, 0);
+	write_csr(dd, SEND_SC2VLT1, 0);
+	write_csr(dd, SEND_SC2VLT2, 0);
+	write_csr(dd, SEND_SC2VLT3, 0);
+	write_csr(dd, SEND_LEN_CHECK0, 0);
+	write_csr(dd, SEND_LEN_CHECK1, 0);
+	/* SEND_ERR_STATUS read-only */
+	write_csr(dd, SEND_ERR_MASK, 0);
+	write_csr(dd, SEND_ERR_CLEAR, ~0ull);
+	/* SEND_ERR_FORCE read-only */
+	for (i = 0; i < VL_ARB_LOW_PRIO_TABLE_SIZE; i++)
+		write_csr(dd, SEND_LOW_PRIORITY_LIST + (8*i), 0);
+	for (i = 0; i < VL_ARB_HIGH_PRIO_TABLE_SIZE; i++)
+		write_csr(dd, SEND_HIGH_PRIORITY_LIST + (8*i), 0);
+	for (i = 0; i < dd->chip_send_contexts/NUM_CONTEXTS_PER_SET; i++)
+		write_csr(dd, SEND_CONTEXT_SET_CTRL + (8*i), 0);
+	for (i = 0; i < TXE_NUM_32_BIT_COUNTER; i++)
+		write_csr(dd, SEND_COUNTER_ARRAY32 + (8*i), 0);
+	for (i = 0; i < TXE_NUM_64_BIT_COUNTER; i++)
+		write_csr(dd, SEND_COUNTER_ARRAY64 + (8*i), 0);
+	write_csr(dd, SEND_CM_CTRL, SEND_CM_CTRL_RESETCSR);
+	write_csr(dd, SEND_CM_GLOBAL_CREDIT,
+					SEND_CM_GLOBAL_CREDIT_RESETCSR);
+	/* SEND_CM_CREDIT_USED_STATUS read-only */
+	write_csr(dd, SEND_CM_TIMER_CTRL, 0);
+	write_csr(dd, SEND_CM_LOCAL_AU_TABLE0_TO3, 0);
+	write_csr(dd, SEND_CM_LOCAL_AU_TABLE4_TO7, 0);
+	write_csr(dd, SEND_CM_REMOTE_AU_TABLE0_TO3, 0);
+	write_csr(dd, SEND_CM_REMOTE_AU_TABLE4_TO7, 0);
+	for (i = 0; i < TXE_NUM_DATA_VL; i++)
+		write_csr(dd, SEND_CM_CREDIT_VL + (8*i), 0);
+	write_csr(dd, SEND_CM_CREDIT_VL15, 0);
+	/* SEND_CM_CREDIT_USED_VL read-only */
+	/* SEND_CM_CREDIT_USED_VL15 read-only */
+	/* SEND_EGRESS_CTXT_STATUS read-only */
+	/* SEND_EGRESS_SEND_DMA_STATUS read-only */
+	write_csr(dd, SEND_EGRESS_ERR_INFO, ~0ull);
+	/* SEND_EGRESS_ERR_INFO read-only */
+	/* SEND_EGRESS_ERR_SOURCE read-only */
+
+	/*
+	 * TXE Per-Context CSRs
+	 */
+	for (i = 0; i < dd->chip_send_contexts; i++) {
+		write_kctxt_csr(dd, i, SEND_CTXT_CTRL, 0);
+		write_kctxt_csr(dd, i, SEND_CTXT_CREDIT_CTRL, 0);
+		write_kctxt_csr(dd, i, SEND_CTXT_CREDIT_RETURN_ADDR, 0);
+		write_kctxt_csr(dd, i, SEND_CTXT_CREDIT_FORCE, 0);
+		write_kctxt_csr(dd, i, SEND_CTXT_ERR_MASK, 0);
+		write_kctxt_csr(dd, i, SEND_CTXT_ERR_CLEAR, ~0ull);
+		write_kctxt_csr(dd, i, SEND_CTXT_CHECK_ENABLE, 0);
+		write_kctxt_csr(dd, i, SEND_CTXT_CHECK_VL, 0);
+		write_kctxt_csr(dd, i, SEND_CTXT_CHECK_JOB_KEY, 0);
+		write_kctxt_csr(dd, i, SEND_CTXT_CHECK_PARTITION_KEY, 0);
+		write_kctxt_csr(dd, i, SEND_CTXT_CHECK_SLID, 0);
+		write_kctxt_csr(dd, i, SEND_CTXT_CHECK_OPCODE, 0);
+	}
+
+	/*
+	 * TXE Per-SDMA CSRs
+	 */
+	for (i = 0; i < dd->chip_sdma_engines; i++) {
+		write_kctxt_csr(dd, i, SEND_DMA_CTRL, 0);
+		/* SEND_DMA_STATUS read-only */
+		write_kctxt_csr(dd, i, SEND_DMA_BASE_ADDR, 0);
+		write_kctxt_csr(dd, i, SEND_DMA_LEN_GEN, 0);
+		write_kctxt_csr(dd, i, SEND_DMA_TAIL, 0);
+		/* SEND_DMA_HEAD read-only */
+		write_kctxt_csr(dd, i, SEND_DMA_HEAD_ADDR, 0);
+		write_kctxt_csr(dd, i, SEND_DMA_PRIORITY_THLD, 0);
+		/* SEND_DMA_IDLE_CNT read-only */
+		write_kctxt_csr(dd, i, SEND_DMA_RELOAD_CNT, 0);
+		write_kctxt_csr(dd, i, SEND_DMA_DESC_CNT, 0);
+		/* SEND_DMA_DESC_FETCHED_CNT read-only */
+		/* SEND_DMA_ENG_ERR_STATUS read-only */
+		write_kctxt_csr(dd, i, SEND_DMA_ENG_ERR_MASK, 0);
+		write_kctxt_csr(dd, i, SEND_DMA_ENG_ERR_CLEAR, ~0ull);
+		/* SEND_DMA_ENG_ERR_FORCE leave alone */
+		write_kctxt_csr(dd, i, SEND_DMA_CHECK_ENABLE, 0);
+		write_kctxt_csr(dd, i, SEND_DMA_CHECK_VL, 0);
+		write_kctxt_csr(dd, i, SEND_DMA_CHECK_JOB_KEY, 0);
+		write_kctxt_csr(dd, i, SEND_DMA_CHECK_PARTITION_KEY, 0);
+		write_kctxt_csr(dd, i, SEND_DMA_CHECK_SLID, 0);
+		write_kctxt_csr(dd, i, SEND_DMA_CHECK_OPCODE, 0);
+		write_kctxt_csr(dd, i, SEND_DMA_MEMORY, 0);
+	}
+}
+
+/*
+ * Expect on entry:
+ * o Packet ingress is disabled, i.e. RcvCtrl.RcvPortEnable == 0
+ */
+static void init_rbufs(struct hfi1_devdata *dd)
+{
+	u64 reg;
+	int count;
+
+	/*
+	 * Wait for DMA to stop: RxRbufPktPending and RxPktInProgress are
+	 * clear.
+	 */
+	count = 0;
+	while (1) {
+		reg = read_csr(dd, RCV_STATUS);
+		if ((reg & (RCV_STATUS_RX_RBUF_PKT_PENDING_SMASK
+			    | RCV_STATUS_RX_PKT_IN_PROGRESS_SMASK)) == 0)
+			break;
+		/*
+		 * Give up after 1ms - maximum wait time.
+		 *
+		 * RBuf size is 148KiB.  Slowest possible is PCIe Gen1 x1 at
+		 * 250MB/s bandwidth.  Lower rate to 66% for overhead to get:
+		 *	148 KB / (66% * 250MB/s) = 920us
+		 */
+		if (count++ > 500) {
+			dd_dev_err(dd,
+				"%s: in-progress DMA not clearing: RcvStatus 0x%llx, continuing\n",
+				__func__, reg);
+			break;
+		}
+		udelay(2); /* do not busy-wait the CSR */
+	}
+
+	/* start the init - expect RcvCtrl to be 0 */
+	write_csr(dd, RCV_CTRL, RCV_CTRL_RX_RBUF_INIT_SMASK);
+
+	/*
+	 * Read to force the write of Rcvtrl.RxRbufInit.  There is a brief
+	 * period after the write before RcvStatus.RxRbufInitDone is valid.
+	 * The delay in the first run through the loop below is sufficient and
+	 * required before the first read of RcvStatus.RxRbufInintDone.
+	 */
+	read_csr(dd, RCV_CTRL);
+
+	/* wait for the init to finish */
+	count = 0;
+	while (1) {
+		/* delay is required first time through - see above */
+		udelay(2); /* do not busy-wait the CSR */
+		reg = read_csr(dd, RCV_STATUS);
+		if (reg & (RCV_STATUS_RX_RBUF_INIT_DONE_SMASK))
+			break;
+
+		/* give up after 100us - slowest possible at 33MHz is 73us */
+		if (count++ > 50) {
+			dd_dev_err(dd,
+				"%s: RcvStatus.RxRbufInit not set, continuing\n",
+				__func__);
+			break;
+		}
+	}
+}
+
+/* set RXE CSRs to chip reset defaults */
+static void reset_rxe_csrs(struct hfi1_devdata *dd)
+{
+	int i, j;
+
+	/*
+	 * RXE Kernel CSRs
+	 */
+	write_csr(dd, RCV_CTRL, 0);
+	init_rbufs(dd);
+	/* RCV_STATUS read-only */
+	/* RCV_CONTEXTS read-only */
+	/* RCV_ARRAY_CNT read-only */
+	/* RCV_BUF_SIZE read-only */
+	write_csr(dd, RCV_BTH_QP, 0);
+	write_csr(dd, RCV_MULTICAST, 0);
+	write_csr(dd, RCV_BYPASS, 0);
+	write_csr(dd, RCV_VL15, 0);
+	/* this is a clear-down */
+	write_csr(dd, RCV_ERR_INFO,
+			RCV_ERR_INFO_RCV_EXCESS_BUFFER_OVERRUN_SMASK);
+	/* RCV_ERR_STATUS read-only */
+	write_csr(dd, RCV_ERR_MASK, 0);
+	write_csr(dd, RCV_ERR_CLEAR, ~0ull);
+	/* RCV_ERR_FORCE leave alone */
+	for (i = 0; i < 32; i++)
+		write_csr(dd, RCV_QP_MAP_TABLE + (8 * i), 0);
+	for (i = 0; i < 4; i++)
+		write_csr(dd, RCV_PARTITION_KEY + (8 * i), 0);
+	for (i = 0; i < RXE_NUM_32_BIT_COUNTERS; i++)
+		write_csr(dd, RCV_COUNTER_ARRAY32 + (8 * i), 0);
+	for (i = 0; i < RXE_NUM_64_BIT_COUNTERS; i++)
+		write_csr(dd, RCV_COUNTER_ARRAY64 + (8 * i), 0);
+	for (i = 0; i < RXE_NUM_RSM_INSTANCES; i++) {
+		write_csr(dd, RCV_RSM_CFG + (8 * i), 0);
+		write_csr(dd, RCV_RSM_SELECT + (8 * i), 0);
+		write_csr(dd, RCV_RSM_MATCH + (8 * i), 0);
+	}
+	for (i = 0; i < 32; i++)
+		write_csr(dd, RCV_RSM_MAP_TABLE + (8 * i), 0);
+
+	/*
+	 * RXE Kernel and User Per-Context CSRs
+	 */
+	for (i = 0; i < dd->chip_rcv_contexts; i++) {
+		/* kernel */
+		write_kctxt_csr(dd, i, RCV_CTXT_CTRL, 0);
+		/* RCV_CTXT_STATUS read-only */
+		write_kctxt_csr(dd, i, RCV_EGR_CTRL, 0);
+		write_kctxt_csr(dd, i, RCV_TID_CTRL, 0);
+		write_kctxt_csr(dd, i, RCV_KEY_CTRL, 0);
+		write_kctxt_csr(dd, i, RCV_HDR_ADDR, 0);
+		write_kctxt_csr(dd, i, RCV_HDR_CNT, 0);
+		write_kctxt_csr(dd, i, RCV_HDR_ENT_SIZE, 0);
+		write_kctxt_csr(dd, i, RCV_HDR_SIZE, 0);
+		write_kctxt_csr(dd, i, RCV_HDR_TAIL_ADDR, 0);
+		write_kctxt_csr(dd, i, RCV_AVAIL_TIME_OUT, 0);
+		write_kctxt_csr(dd, i, RCV_HDR_OVFL_CNT, 0);
+
+		/* user */
+		/* RCV_HDR_TAIL read-only */
+		write_uctxt_csr(dd, i, RCV_HDR_HEAD, 0);
+		/* RCV_EGR_INDEX_TAIL read-only */
+		write_uctxt_csr(dd, i, RCV_EGR_INDEX_HEAD, 0);
+		/* RCV_EGR_OFFSET_TAIL read-only */
+		for (j = 0; j < RXE_NUM_TID_FLOWS; j++) {
+			write_uctxt_csr(dd, i, RCV_TID_FLOW_TABLE + (8 * j),
+				0);
+		}
+	}
+}
+
+/*
+ * Set sc2vl tables.
+ *
+ * They power on to zeros, so to avoid send context errors
+ * they need to be set:
+ *
+ * SC 0-7 -> VL 0-7 (respectively)
+ * SC 15  -> VL 15
+ * otherwise
+ *        -> VL 0
+ */
+static void init_sc2vl_tables(struct hfi1_devdata *dd)
+{
+	int i;
+	/* init per architecture spec, constrained by hardware capability */
+
+	/* HFI maps sent packets */
+	write_csr(dd, SEND_SC2VLT0, SC2VL_VAL(
+		0,
+		0, 0, 1, 1,
+		2, 2, 3, 3,
+		4, 4, 5, 5,
+		6, 6, 7, 7));
+	write_csr(dd, SEND_SC2VLT1, SC2VL_VAL(
+		1,
+		8, 0, 9, 0,
+		10, 0, 11, 0,
+		12, 0, 13, 0,
+		14, 0, 15, 15));
+	write_csr(dd, SEND_SC2VLT2, SC2VL_VAL(
+		2,
+		16, 0, 17, 0,
+		18, 0, 19, 0,
+		20, 0, 21, 0,
+		22, 0, 23, 0));
+	write_csr(dd, SEND_SC2VLT3, SC2VL_VAL(
+		3,
+		24, 0, 25, 0,
+		26, 0, 27, 0,
+		28, 0, 29, 0,
+		30, 0, 31, 0));
+
+	/* DC maps received packets */
+	write_csr(dd, DCC_CFG_SC_VL_TABLE_15_0, DC_SC_VL_VAL(
+		15_0,
+		0, 0, 1, 1,  2, 2,  3, 3,  4, 4,  5, 5,  6, 6,  7,  7,
+		8, 0, 9, 0, 10, 0, 11, 0, 12, 0, 13, 0, 14, 0, 15, 15));
+	write_csr(dd, DCC_CFG_SC_VL_TABLE_31_16, DC_SC_VL_VAL(
+		31_16,
+		16, 0, 17, 0, 18, 0, 19, 0, 20, 0, 21, 0, 22, 0, 23, 0,
+		24, 0, 25, 0, 26, 0, 27, 0, 28, 0, 29, 0, 30, 0, 31, 0));
+
+	/* initialize the cached sc2vl values consistently with h/w */
+	for (i = 0; i < 32; i++) {
+		if (i < 8 || i == 15)
+			*((u8 *)(dd->sc2vl) + i) = (u8)i;
+		else
+			*((u8 *)(dd->sc2vl) + i) = 0;
+	}
+}
+
+/*
+ * Read chip sizes and then reset parts to sane, disabled, values.  We cannot
+ * depend on the chip going through a power-on reset - a driver may be loaded
+ * and unloaded many times.
+ *
+ * Do not write any CSR values to the chip in this routine - there may be
+ * a reset following the (possible) FLR in this routine.
+ *
+ */
+static void init_chip(struct hfi1_devdata *dd)
+{
+	int i;
+
+	/*
+	 * Put the HFI CSRs in a known state.
+	 * Combine this with a DC reset.
+	 *
+	 * Stop the device from doing anything while we do a
+	 * reset.  We know there are no other active users of
+	 * the device since we are now in charge.  Turn off
+	 * off all outbound and inbound traffic and make sure
+	 * the device does not generate any interrupts.
+	 */
+
+	/* disable send contexts and SDMA engines */
+	write_csr(dd, SEND_CTRL, 0);
+	for (i = 0; i < dd->chip_send_contexts; i++)
+		write_kctxt_csr(dd, i, SEND_CTXT_CTRL, 0);
+	for (i = 0; i < dd->chip_sdma_engines; i++)
+		write_kctxt_csr(dd, i, SEND_DMA_CTRL, 0);
+	/* disable port (turn off RXE inbound traffic) and contexts */
+	write_csr(dd, RCV_CTRL, 0);
+	for (i = 0; i < dd->chip_rcv_contexts; i++)
+		write_csr(dd, RCV_CTXT_CTRL, 0);
+	/* mask all interrupt sources */
+	for (i = 0; i < CCE_NUM_INT_CSRS; i++)
+		write_csr(dd, CCE_INT_MASK + (8*i), 0ull);
+
+	/*
+	 * DC Reset: do a full DC reset before the register clear.
+	 * A recommended length of time to hold is one CSR read,
+	 * so reread the CceDcCtrl.  Then, hold the DC in reset
+	 * across the clear.
+	 */
+	write_csr(dd, CCE_DC_CTRL, CCE_DC_CTRL_DC_RESET_SMASK);
+	(void) read_csr(dd, CCE_DC_CTRL);
+
+	if (use_flr) {
+		/*
+		 * A FLR will reset the SPC core and part of the PCIe.
+		 * The parts that need to be restored have already been
+		 * saved.
+		 */
+		dd_dev_info(dd, "Resetting CSRs with FLR\n");
+
+		/* do the FLR, the DC reset will remain */
+		hfi1_pcie_flr(dd);
+
+		/* restore command and BARs */
+		restore_pci_variables(dd);
+
+		if (is_a0(dd)) {
+			dd_dev_info(dd, "Resetting CSRs with FLR\n");
+			hfi1_pcie_flr(dd);
+			restore_pci_variables(dd);
+		}
+
+	} else {
+		dd_dev_info(dd, "Resetting CSRs with writes\n");
+		reset_cce_csrs(dd);
+		reset_txe_csrs(dd);
+		reset_rxe_csrs(dd);
+		reset_asic_csrs(dd);
+		reset_misc_csrs(dd);
+	}
+	/* clear the DC reset */
+	write_csr(dd, CCE_DC_CTRL, 0);
+	/* Set the LED off */
+	if (is_a0(dd))
+		setextled(dd, 0);
+	/*
+	 * Clear the QSFP reset.
+	 * A0 leaves the out lines floating on power on, then on an FLR
+	 * enforces a 0 on all out pins.  The driver does not touch
+	 * ASIC_QSFPn_OUT otherwise.  This leaves RESET_N low and
+	 * anything  plugged constantly in reset, if it pays attention
+	 * to RESET_N.
+	 * A prime example of this is SiPh. For now, set all pins high.
+	 * I2CCLK and I2CDAT will change per direction, and INT_N and
+	 * MODPRS_N are input only and their value is ignored.
+	 */
+	if (is_a0(dd)) {
+		write_csr(dd, ASIC_QSFP1_OUT, 0x1f);
+		write_csr(dd, ASIC_QSFP2_OUT, 0x1f);
+	}
+}
+
+static void init_early_variables(struct hfi1_devdata *dd)
+{
+	int i;
+
+	/* assign link credit variables */
+	dd->vau = CM_VAU;
+	dd->link_credits = CM_GLOBAL_CREDITS;
+	if (is_a0(dd))
+		dd->link_credits--;
+	dd->vcu = cu_to_vcu(hfi1_cu);
+	/* enough room for 8 MAD packets plus header - 17K */
+	dd->vl15_init = (8 * (2048 + 128)) / vau_to_au(dd->vau);
+	if (dd->vl15_init > dd->link_credits)
+		dd->vl15_init = dd->link_credits;
+
+	write_uninitialized_csrs_and_memories(dd);
+
+	if (HFI1_CAP_IS_KSET(PKEY_CHECK))
+		for (i = 0; i < dd->num_pports; i++) {
+			struct hfi1_pportdata *ppd = &dd->pport[i];
+
+			set_partition_keys(ppd);
+		}
+	init_sc2vl_tables(dd);
+}
+
+static void init_kdeth_qp(struct hfi1_devdata *dd)
+{
+	/* user changed the KDETH_QP */
+	if (kdeth_qp != 0 && kdeth_qp >= 0xff) {
+		/* out of range or illegal value */
+		dd_dev_err(dd, "Invalid KDETH queue pair prefix, ignoring");
+		kdeth_qp = 0;
+	}
+	if (kdeth_qp == 0)	/* not set, or failed range check */
+		kdeth_qp = DEFAULT_KDETH_QP;
+
+	write_csr(dd, SEND_BTH_QP,
+			(kdeth_qp & SEND_BTH_QP_KDETH_QP_MASK)
+				<< SEND_BTH_QP_KDETH_QP_SHIFT);
+
+	write_csr(dd, RCV_BTH_QP,
+			(kdeth_qp & RCV_BTH_QP_KDETH_QP_MASK)
+				<< RCV_BTH_QP_KDETH_QP_SHIFT);
+}
+
+/**
+ * init_qpmap_table
+ * @dd - device data
+ * @first_ctxt - first context
+ * @last_ctxt - first context
+ *
+ * This return sets the qpn mapping table that
+ * is indexed by qpn[8:1].
+ *
+ * The routine will round robin the 256 settings
+ * from first_ctxt to last_ctxt.
+ *
+ * The first/last looks ahead to having specialized
+ * receive contexts for mgmt and bypass.  Normal
+ * verbs traffic will assumed to be on a range
+ * of receive contexts.
+ */
+static void init_qpmap_table(struct hfi1_devdata *dd,
+			     u32 first_ctxt,
+			     u32 last_ctxt)
+{
+	u64 reg = 0;
+	u64 regno = RCV_QP_MAP_TABLE;
+	int i;
+	u64 ctxt = first_ctxt;
+
+	for (i = 0; i < 256;) {
+		if (ctxt == VL15CTXT) {
+			ctxt++;
+			if (ctxt > last_ctxt)
+				ctxt = first_ctxt;
+			continue;
+		}
+		reg |= ctxt << (8 * (i % 8));
+		i++;
+		ctxt++;
+		if (ctxt > last_ctxt)
+			ctxt = first_ctxt;
+		if (i % 8 == 0) {
+			write_csr(dd, regno, reg);
+			reg = 0;
+			regno += 8;
+		}
+	}
+	if (i % 8)
+		write_csr(dd, regno, reg);
+
+	add_rcvctrl(dd, RCV_CTRL_RCV_QP_MAP_ENABLE_SMASK
+			| RCV_CTRL_RCV_BYPASS_ENABLE_SMASK);
+}
+
+/**
+ * init_qos - init RX qos
+ * @dd - device data
+ * @first_context
+ *
+ * This routine initializes Rule 0 and the
+ * RSM map table to implement qos.
+ *
+ * If all of the limit tests succeed,
+ * qos is applied based on the array
+ * interpretation of krcvqs where
+ * entry 0 is VL0.
+ *
+ * The number of vl bits (n) and the number of qpn
+ * bits (m) are computed to feed both the RSM map table
+ * and the single rule.
+ *
+ */
+static void init_qos(struct hfi1_devdata *dd, u32 first_ctxt)
+{
+	u8 max_by_vl = 0;
+	unsigned qpns_per_vl, ctxt, i, qpn, n = 1, m;
+	u64 *rsmmap;
+	u64 reg;
+	u8  rxcontext = is_a0(dd) ? 0 : 0xff;  /* 0 is default if a0 ver. */
+
+	/* validate */
+	if (dd->n_krcv_queues <= MIN_KERNEL_KCTXTS ||
+	    num_vls == 1 ||
+	    krcvqsset <= 1)
+		goto bail;
+	for (i = 0; i < min_t(unsigned, num_vls, krcvqsset); i++)
+		if (krcvqs[i] > max_by_vl)
+			max_by_vl = krcvqs[i];
+	if (max_by_vl > 32)
+		goto bail;
+	qpns_per_vl = __roundup_pow_of_two(max_by_vl);
+	/* determine bits vl */
+	n = ilog2(num_vls);
+	/* determine bits for qpn */
+	m = ilog2(qpns_per_vl);
+	if ((m + n) > 7)
+		goto bail;
+	if (num_vls * qpns_per_vl > dd->chip_rcv_contexts)
+		goto bail;
+	rsmmap = kmalloc_array(NUM_MAP_REGS, sizeof(u64), GFP_KERNEL);
+	memset(rsmmap, rxcontext, NUM_MAP_REGS * sizeof(u64));
+	/* init the local copy of the table */
+	for (i = 0, ctxt = first_ctxt; i < num_vls; i++) {
+		unsigned tctxt;
+
+		for (qpn = 0, tctxt = ctxt;
+		     krcvqs[i] && qpn < qpns_per_vl; qpn++) {
+			unsigned idx, regoff, regidx;
+
+			/* generate index <= 128 */
+			idx = (qpn << n) ^ i;
+			regoff = (idx % 8) * 8;
+			regidx = idx / 8;
+			reg = rsmmap[regidx];
+			/* replace 0xff with context number */
+			reg &= ~(RCV_RSM_MAP_TABLE_RCV_CONTEXT_A_MASK
+				<< regoff);
+			reg |= (u64)(tctxt++) << regoff;
+			rsmmap[regidx] = reg;
+			if (tctxt == ctxt + krcvqs[i])
+				tctxt = ctxt;
+		}
+		ctxt += krcvqs[i];
+	}
+	/* flush cached copies to chip */
+	for (i = 0; i < NUM_MAP_REGS; i++)
+		write_csr(dd, RCV_RSM_MAP_TABLE + (8 * i), rsmmap[i]);
+	/* add rule0 */
+	write_csr(dd, RCV_RSM_CFG /* + (8 * 0) */,
+		RCV_RSM_CFG_ENABLE_OR_CHAIN_RSM0_MASK
+			<< RCV_RSM_CFG_ENABLE_OR_CHAIN_RSM0_SHIFT |
+		2ull << RCV_RSM_CFG_PACKET_TYPE_SHIFT);
+	write_csr(dd, RCV_RSM_SELECT /* + (8 * 0) */,
+		LRH_BTH_MATCH_OFFSET
+			<< RCV_RSM_SELECT_FIELD1_OFFSET_SHIFT |
+		LRH_SC_MATCH_OFFSET << RCV_RSM_SELECT_FIELD2_OFFSET_SHIFT |
+		LRH_SC_SELECT_OFFSET << RCV_RSM_SELECT_INDEX1_OFFSET_SHIFT |
+		((u64)n) << RCV_RSM_SELECT_INDEX1_WIDTH_SHIFT |
+		QPN_SELECT_OFFSET << RCV_RSM_SELECT_INDEX2_OFFSET_SHIFT |
+		((u64)m + (u64)n) << RCV_RSM_SELECT_INDEX2_WIDTH_SHIFT);
+	write_csr(dd, RCV_RSM_MATCH /* + (8 * 0) */,
+		LRH_BTH_MASK << RCV_RSM_MATCH_MASK1_SHIFT |
+		LRH_BTH_VALUE << RCV_RSM_MATCH_VALUE1_SHIFT |
+		LRH_SC_MASK << RCV_RSM_MATCH_MASK2_SHIFT |
+		LRH_SC_VALUE << RCV_RSM_MATCH_VALUE2_SHIFT);
+	/* Enable RSM */
+	add_rcvctrl(dd, RCV_CTRL_RCV_RSM_ENABLE_SMASK);
+	kfree(rsmmap);
+	/* map everything else (non-VL15) to context 0 */
+	init_qpmap_table(
+		dd,
+		0,
+		0);
+	dd->qos_shift = n + 1;
+	return;
+bail:
+	dd->qos_shift = 1;
+	init_qpmap_table(
+		dd,
+		dd->n_krcv_queues > MIN_KERNEL_KCTXTS ? MIN_KERNEL_KCTXTS : 0,
+		dd->n_krcv_queues - 1);
+}
+
+static void init_rxe(struct hfi1_devdata *dd)
+{
+	/* enable all receive errors */
+	write_csr(dd, RCV_ERR_MASK, ~0ull);
+	/* setup QPN map table - start where VL15 context leaves off */
+	init_qos(
+		dd,
+		dd->n_krcv_queues > MIN_KERNEL_KCTXTS ? MIN_KERNEL_KCTXTS : 0);
+	/*
+	 * make sure RcvCtrl.RcvWcb <= PCIe Device Control
+	 * Register Max_Payload_Size (PCI_EXP_DEVCTL in Linux PCIe config
+	 * space, PciCfgCap2.MaxPayloadSize in HFI).  There is only one
+	 * invalid configuration: RcvCtrl.RcvWcb set to its max of 256 and
+	 * Max_PayLoad_Size set to its minimum of 128.
+	 *
+	 * Presently, RcvCtrl.RcvWcb is not modified from its default of 0
+	 * (64 bytes).  Max_Payload_Size is possibly modified upward in
+	 * tune_pcie_caps() which is called after this routine.
+	 */
+}
+
+static void init_other(struct hfi1_devdata *dd)
+{
+	/* enable all CCE errors */
+	write_csr(dd, CCE_ERR_MASK, ~0ull);
+	/* enable *some* Misc errors */
+	write_csr(dd, MISC_ERR_MASK, DRIVER_MISC_MASK);
+	/* enable all DC errors, except LCB */
+	write_csr(dd, DCC_ERR_FLG_EN, ~0ull);
+	write_csr(dd, DC_DC8051_ERR_EN, ~0ull);
+}
+
+/*
+ * Fill out the given AU table using the given CU.  A CU is defined in terms
+ * AUs.  The table is a an encoding: given the index, how many AUs does that
+ * represent?
+ *
+ * NOTE: Assumes that the register layout is the same for the
+ * local and remote tables.
+ */
+static void assign_cm_au_table(struct hfi1_devdata *dd, u32 cu,
+			       u32 csr0to3, u32 csr4to7)
+{
+	write_csr(dd, csr0to3,
+		   0ull <<
+			SEND_CM_LOCAL_AU_TABLE0_TO3_LOCAL_AU_TABLE0_SHIFT
+		|  1ull <<
+			SEND_CM_LOCAL_AU_TABLE0_TO3_LOCAL_AU_TABLE1_SHIFT
+		|  2ull * cu <<
+			SEND_CM_LOCAL_AU_TABLE0_TO3_LOCAL_AU_TABLE2_SHIFT
+		|  4ull * cu <<
+			SEND_CM_LOCAL_AU_TABLE0_TO3_LOCAL_AU_TABLE3_SHIFT);
+	write_csr(dd, csr4to7,
+		   8ull * cu <<
+			SEND_CM_LOCAL_AU_TABLE4_TO7_LOCAL_AU_TABLE4_SHIFT
+		| 16ull * cu <<
+			SEND_CM_LOCAL_AU_TABLE4_TO7_LOCAL_AU_TABLE5_SHIFT
+		| 32ull * cu <<
+			SEND_CM_LOCAL_AU_TABLE4_TO7_LOCAL_AU_TABLE6_SHIFT
+		| 64ull * cu <<
+			SEND_CM_LOCAL_AU_TABLE4_TO7_LOCAL_AU_TABLE7_SHIFT);
+
+}
+
+static void assign_local_cm_au_table(struct hfi1_devdata *dd, u8 vcu)
+{
+	assign_cm_au_table(dd, vcu_to_cu(vcu), SEND_CM_LOCAL_AU_TABLE0_TO3,
+					SEND_CM_LOCAL_AU_TABLE4_TO7);
+}
+
+void assign_remote_cm_au_table(struct hfi1_devdata *dd, u8 vcu)
+{
+	assign_cm_au_table(dd, vcu_to_cu(vcu), SEND_CM_REMOTE_AU_TABLE0_TO3,
+					SEND_CM_REMOTE_AU_TABLE4_TO7);
+}
+
+static void init_txe(struct hfi1_devdata *dd)
+{
+	int i;
+
+	/* enable all PIO, SDMA, general, and Egress errors */
+	write_csr(dd, SEND_PIO_ERR_MASK, ~0ull);
+	write_csr(dd, SEND_DMA_ERR_MASK, ~0ull);
+	write_csr(dd, SEND_ERR_MASK, ~0ull);
+	write_csr(dd, SEND_EGRESS_ERR_MASK, ~0ull);
+
+	/* enable all per-context and per-SDMA engine errors */
+	for (i = 0; i < dd->chip_send_contexts; i++)
+		write_kctxt_csr(dd, i, SEND_CTXT_ERR_MASK, ~0ull);
+	for (i = 0; i < dd->chip_sdma_engines; i++)
+		write_kctxt_csr(dd, i, SEND_DMA_ENG_ERR_MASK, ~0ull);
+
+	/* set the local CU to AU mapping */
+	assign_local_cm_au_table(dd, dd->vcu);
+
+	/*
+	 * Set reasonable default for Credit Return Timer
+	 * Don't set on Simulator - causes it to choke.
+	 */
+	if (dd->icode != ICODE_FUNCTIONAL_SIMULATOR)
+		write_csr(dd, SEND_CM_TIMER_CTRL, HFI1_CREDIT_RETURN_RATE);
+}
+
+int hfi1_set_ctxt_jkey(struct hfi1_devdata *dd, unsigned ctxt, u16 jkey)
+{
+	struct hfi1_ctxtdata *rcd = dd->rcd[ctxt];
+	unsigned sctxt;
+	int ret = 0;
+	u64 reg;
+
+	if (!rcd || !rcd->sc) {
+		ret = -EINVAL;
+		goto done;
+	}
+	sctxt = rcd->sc->hw_context;
+	reg = SEND_CTXT_CHECK_JOB_KEY_MASK_SMASK | /* mask is always 1's */
+		((jkey & SEND_CTXT_CHECK_JOB_KEY_VALUE_MASK) <<
+		 SEND_CTXT_CHECK_JOB_KEY_VALUE_SHIFT);
+	/* JOB_KEY_ALLOW_PERMISSIVE is not allowed by default */
+	if (HFI1_CAP_KGET_MASK(rcd->flags, ALLOW_PERM_JKEY))
+		reg |= SEND_CTXT_CHECK_JOB_KEY_ALLOW_PERMISSIVE_SMASK;
+	write_kctxt_csr(dd, sctxt, SEND_CTXT_CHECK_JOB_KEY, reg);
+	/*
+	 * Enable send-side J_KEY integrity check, unless this is A0 h/w
+	 * (due to A0 erratum).
+	 */
+	if (!is_a0(dd)) {
+		reg = read_kctxt_csr(dd, sctxt, SEND_CTXT_CHECK_ENABLE);
+		reg |= SEND_CTXT_CHECK_ENABLE_CHECK_JOB_KEY_SMASK;
+		write_kctxt_csr(dd, sctxt, SEND_CTXT_CHECK_ENABLE, reg);
+	}
+
+	/* Enable J_KEY check on receive context. */
+	reg = RCV_KEY_CTRL_JOB_KEY_ENABLE_SMASK |
+		((jkey & RCV_KEY_CTRL_JOB_KEY_VALUE_MASK) <<
+		 RCV_KEY_CTRL_JOB_KEY_VALUE_SHIFT);
+	write_kctxt_csr(dd, ctxt, RCV_KEY_CTRL, reg);
+done:
+	return ret;
+}
+
+int hfi1_clear_ctxt_jkey(struct hfi1_devdata *dd, unsigned ctxt)
+{
+	struct hfi1_ctxtdata *rcd = dd->rcd[ctxt];
+	unsigned sctxt;
+	int ret = 0;
+	u64 reg;
+
+	if (!rcd || !rcd->sc) {
+		ret = -EINVAL;
+		goto done;
+	}
+	sctxt = rcd->sc->hw_context;
+	write_kctxt_csr(dd, sctxt, SEND_CTXT_CHECK_JOB_KEY, 0);
+	/*
+	 * Disable send-side J_KEY integrity check, unless this is A0 h/w.
+	 * This check would not have been enabled for A0 h/w, see
+	 * set_ctxt_jkey().
+	 */
+	if (!is_a0(dd)) {
+		reg = read_kctxt_csr(dd, sctxt, SEND_CTXT_CHECK_ENABLE);
+		reg &= ~SEND_CTXT_CHECK_ENABLE_CHECK_JOB_KEY_SMASK;
+		write_kctxt_csr(dd, sctxt, SEND_CTXT_CHECK_ENABLE, reg);
+	}
+	/* Turn off the J_KEY on the receive side */
+	write_kctxt_csr(dd, ctxt, RCV_KEY_CTRL, 0);
+done:
+	return ret;
+}
+
+int hfi1_set_ctxt_pkey(struct hfi1_devdata *dd, unsigned ctxt, u16 pkey)
+{
+	struct hfi1_ctxtdata *rcd;
+	unsigned sctxt;
+	int ret = 0;
+	u64 reg;
+
+	if (ctxt < dd->num_rcv_contexts)
+		rcd = dd->rcd[ctxt];
+	else {
+		ret = -EINVAL;
+		goto done;
+	}
+	if (!rcd || !rcd->sc) {
+		ret = -EINVAL;
+		goto done;
+	}
+	sctxt = rcd->sc->hw_context;
+	reg = ((u64)pkey & SEND_CTXT_CHECK_PARTITION_KEY_VALUE_MASK) <<
+		SEND_CTXT_CHECK_PARTITION_KEY_VALUE_SHIFT;
+	write_kctxt_csr(dd, sctxt, SEND_CTXT_CHECK_PARTITION_KEY, reg);
+	reg = read_kctxt_csr(dd, sctxt, SEND_CTXT_CHECK_ENABLE);
+	reg |= SEND_CTXT_CHECK_ENABLE_CHECK_PARTITION_KEY_SMASK;
+	write_kctxt_csr(dd, sctxt, SEND_CTXT_CHECK_ENABLE, reg);
+done:
+	return ret;
+}
+
+int hfi1_clear_ctxt_pkey(struct hfi1_devdata *dd, unsigned ctxt)
+{
+	struct hfi1_ctxtdata *rcd;
+	unsigned sctxt;
+	int ret = 0;
+	u64 reg;
+
+	if (ctxt < dd->num_rcv_contexts)
+		rcd = dd->rcd[ctxt];
+	else {
+		ret = -EINVAL;
+		goto done;
+	}
+	if (!rcd || !rcd->sc) {
+		ret = -EINVAL;
+		goto done;
+	}
+	sctxt = rcd->sc->hw_context;
+	reg = read_kctxt_csr(dd, sctxt, SEND_CTXT_CHECK_ENABLE);
+	reg &= ~SEND_CTXT_CHECK_ENABLE_CHECK_PARTITION_KEY_SMASK;
+	write_kctxt_csr(dd, sctxt, SEND_CTXT_CHECK_ENABLE, reg);
+	write_kctxt_csr(dd, sctxt, SEND_CTXT_CHECK_PARTITION_KEY, 0);
+done:
+	return ret;
+}
+
+/*
+ * Start doing the clean up the the chip. Our clean up happens in multiple
+ * stages and this is just the first.
+ */
+void hfi1_start_cleanup(struct hfi1_devdata *dd)
+{
+	free_cntrs(dd);
+	free_rcverr(dd);
+	clean_up_interrupts(dd);
+}
+
+#define HFI_BASE_GUID(dev) \
+	((dev)->base_guid & ~(1ULL << GUID_HFI_INDEX_SHIFT))
+
+/*
+ * Certain chip functions need to be initialized only once per asic
+ * instead of per-device. This function finds the peer device and
+ * checks whether that chip initialization needs to be done by this
+ * device.
+ */
+static void asic_should_init(struct hfi1_devdata *dd)
+{
+	unsigned long flags;
+	struct hfi1_devdata *tmp, *peer = NULL;
+
+	spin_lock_irqsave(&hfi1_devs_lock, flags);
+	/* Find our peer device */
+	list_for_each_entry(tmp, &hfi1_dev_list, list) {
+		if ((HFI_BASE_GUID(dd) == HFI_BASE_GUID(tmp)) &&
+		    dd->unit != tmp->unit) {
+			peer = tmp;
+			break;
+		}
+	}
+
+	/*
+	 * "Claim" the ASIC for initialization if it hasn't been
+	 " "claimed" yet.
+	 */
+	if (!peer || !(peer->flags & HFI1_DO_INIT_ASIC))
+		dd->flags |= HFI1_DO_INIT_ASIC;
+	spin_unlock_irqrestore(&hfi1_devs_lock, flags);
+}
+
+/**
+ * Allocate an initialize the device structure for the hfi.
+ * @dev: the pci_dev for hfi1_ib device
+ * @ent: pci_device_id struct for this dev
+ *
+ * Also allocates, initializes, and returns the devdata struct for this
+ * device instance
+ *
+ * This is global, and is called directly at init to set up the
+ * chip-specific function pointers for later use.
+ */
+struct hfi1_devdata *hfi1_init_dd(struct pci_dev *pdev,
+				  const struct pci_device_id *ent)
+{
+	struct hfi1_devdata *dd;
+	struct hfi1_pportdata *ppd;
+	u64 reg;
+	int i, ret;
+	static const char * const inames[] = { /* implementation names */
+		"RTL silicon",
+		"RTL VCS simulation",
+		"RTL FPGA emulation",
+		"Functional simulator"
+	};
+
+	dd = hfi1_alloc_devdata(pdev,
+		NUM_IB_PORTS * sizeof(struct hfi1_pportdata));
+	if (IS_ERR(dd))
+		goto bail;
+	ppd = dd->pport;
+	for (i = 0; i < dd->num_pports; i++, ppd++) {
+		int vl;
+		/* init common fields */
+		hfi1_init_pportdata(pdev, ppd, dd, 0, 1);
+		/* DC supports 4 link widths */
+		ppd->link_width_supported =
+			OPA_LINK_WIDTH_1X | OPA_LINK_WIDTH_2X |
+			OPA_LINK_WIDTH_3X | OPA_LINK_WIDTH_4X;
+		ppd->link_width_downgrade_supported =
+			ppd->link_width_supported;
+		/* start out enabling only 4X */
+		ppd->link_width_enabled = OPA_LINK_WIDTH_4X;
+		ppd->link_width_downgrade_enabled =
+					ppd->link_width_downgrade_supported;
+		/* link width active is 0 when link is down */
+		/* link width downgrade active is 0 when link is down */
+
+		if (num_vls < HFI1_MIN_VLS_SUPPORTED
+			|| num_vls > HFI1_MAX_VLS_SUPPORTED) {
+			hfi1_early_err(&pdev->dev,
+				       "Invalid num_vls %u, using %u VLs\n",
+				    num_vls, HFI1_MAX_VLS_SUPPORTED);
+			num_vls = HFI1_MAX_VLS_SUPPORTED;
+		}
+		ppd->vls_supported = num_vls;
+		ppd->vls_operational = ppd->vls_supported;
+		/* Set the default MTU. */
+		for (vl = 0; vl < num_vls; vl++)
+			dd->vld[vl].mtu = hfi1_max_mtu;
+		dd->vld[15].mtu = MAX_MAD_PACKET;
+		/*
+		 * Set the initial values to reasonable default, will be set
+		 * for real when link is up.
+		 */
+		ppd->lstate = IB_PORT_DOWN;
+		ppd->overrun_threshold = 0x4;
+		ppd->phy_error_threshold = 0xf;
+		ppd->port_crc_mode_enabled = link_crc_mask;
+		/* initialize supported LTP CRC mode */
+		ppd->port_ltp_crc_mode = cap_to_port_ltp(link_crc_mask) << 8;
+		/* initialize enabled LTP CRC mode */
+		ppd->port_ltp_crc_mode |= cap_to_port_ltp(link_crc_mask) << 4;
+		/* start in offline */
+		ppd->host_link_state = HLS_DN_OFFLINE;
+		init_vl_arb_caches(ppd);
+	}
+
+	dd->link_default = HLS_DN_POLL;
+
+	/*
+	 * Do remaining PCIe setup and save PCIe values in dd.
+	 * Any error printing is already done by the init code.
+	 * On return, we have the chip mapped.
+	 */
+	ret = hfi1_pcie_ddinit(dd, pdev, ent);
+	if (ret < 0)
+		goto bail_free;
+
+	/* verify that reads actually work, save revision for reset check */
+	dd->revision = read_csr(dd, CCE_REVISION);
+	if (dd->revision == ~(u64)0) {
+		dd_dev_err(dd, "cannot read chip CSRs\n");
+		ret = -EINVAL;
+		goto bail_cleanup;
+	}
+	dd->majrev = (dd->revision >> CCE_REVISION_CHIP_REV_MAJOR_SHIFT)
+			& CCE_REVISION_CHIP_REV_MAJOR_MASK;
+	dd->minrev = (dd->revision >> CCE_REVISION_CHIP_REV_MINOR_SHIFT)
+			& CCE_REVISION_CHIP_REV_MINOR_MASK;
+
+	/* obtain the hardware ID - NOT related to unit, which is a
+	   software enumeration */
+	reg = read_csr(dd, CCE_REVISION2);
+	dd->hfi1_id = (reg >> CCE_REVISION2_HFI_ID_SHIFT)
+					& CCE_REVISION2_HFI_ID_MASK;
+	/* the variable size will remove unwanted bits */
+	dd->icode = reg >> CCE_REVISION2_IMPL_CODE_SHIFT;
+	dd->irev = reg >> CCE_REVISION2_IMPL_REVISION_SHIFT;
+	dd_dev_info(dd, "Implementation: %s, revision 0x%x\n",
+		dd->icode < ARRAY_SIZE(inames) ? inames[dd->icode] : "unknown",
+		(int)dd->irev);
+
+	/* speeds the hardware can support */
+	dd->pport->link_speed_supported = OPA_LINK_SPEED_25G;
+	/* speeds allowed to run at */
+	dd->pport->link_speed_enabled = dd->pport->link_speed_supported;
+	/* give a reasonable active value, will be set on link up */
+	dd->pport->link_speed_active = OPA_LINK_SPEED_25G;
+
+	dd->chip_rcv_contexts = read_csr(dd, RCV_CONTEXTS);
+	dd->chip_send_contexts = read_csr(dd, SEND_CONTEXTS);
+	dd->chip_sdma_engines = read_csr(dd, SEND_DMA_ENGINES);
+	dd->chip_pio_mem_size = read_csr(dd, SEND_PIO_MEM_SIZE);
+	dd->chip_sdma_mem_size = read_csr(dd, SEND_DMA_MEM_SIZE);
+	/* fix up link widths for emulation _p */
+	ppd = dd->pport;
+	if (dd->icode == ICODE_FPGA_EMULATION && is_emulator_p(dd)) {
+		ppd->link_width_supported =
+			ppd->link_width_enabled =
+			ppd->link_width_downgrade_supported =
+			ppd->link_width_downgrade_enabled =
+				OPA_LINK_WIDTH_1X;
+	}
+	/* insure num_vls isn't larger than number of sdma engines */
+	if (HFI1_CAP_IS_KSET(SDMA) && num_vls > dd->chip_sdma_engines) {
+		dd_dev_err(dd, "num_vls %u too large, using %u VLs\n",
+				num_vls, HFI1_MAX_VLS_SUPPORTED);
+		ppd->vls_supported = num_vls = HFI1_MAX_VLS_SUPPORTED;
+		ppd->vls_operational = ppd->vls_supported;
+	}
+
+	/*
+	 * Convert the ns parameter to the 64 * cclocks used in the CSR.
+	 * Limit the max if larger than the field holds.  If timeout is
+	 * non-zero, then the calculated field will be at least 1.
+	 *
+	 * Must be after icode is set up - the cclock rate depends
+	 * on knowing the hardware being used.
+	 */
+	dd->rcv_intr_timeout_csr = ns_to_cclock(dd, rcv_intr_timeout) / 64;
+	if (dd->rcv_intr_timeout_csr >
+			RCV_AVAIL_TIME_OUT_TIME_OUT_RELOAD_MASK)
+		dd->rcv_intr_timeout_csr =
+			RCV_AVAIL_TIME_OUT_TIME_OUT_RELOAD_MASK;
+	else if (dd->rcv_intr_timeout_csr == 0 && rcv_intr_timeout)
+		dd->rcv_intr_timeout_csr = 1;
+
+	/* obtain chip sizes, reset chip CSRs */
+	init_chip(dd);
+
+	/* read in the PCIe link speed information */
+	ret = pcie_speeds(dd);
+	if (ret)
+		goto bail_cleanup;
+
+	/* needs to be done before we look for the peer device */
+	read_guid(dd);
+
+	asic_should_init(dd);
+
+	/* read in firmware */
+	ret = hfi1_firmware_init(dd);
+	if (ret)
+		goto bail_cleanup;
+
+	/*
+	 * In general, the PCIe Gen3 transition must occur after the
+	 * chip has been idled (so it won't initiate any PCIe transactions
+	 * e.g. an interrupt) and before the driver changes any registers
+	 * (the transition will reset the registers).
+	 *
+	 * In particular, place this call after:
+	 * - init_chip()     - the chip will not initiate any PCIe transactions
+	 * - pcie_speeds()   - reads the current link speed
+	 * - hfi1_firmware_init() - the needed firmware is ready to be
+	 *			    downloaded
+	 */
+	ret = do_pcie_gen3_transition(dd);
+	if (ret)
+		goto bail_cleanup;
+
+	/* start setting dd values and adjusting CSRs */
+	init_early_variables(dd);
+
+	parse_platform_config(dd);
+
+	/* add board names as they are defined */
+	dd->boardname = kmalloc(64, GFP_KERNEL);
+	if (!dd->boardname)
+		goto bail_cleanup;
+	snprintf(dd->boardname, 64, "Board ID 0x%llx",
+		 dd->revision >> CCE_REVISION_BOARD_ID_LOWER_NIBBLE_SHIFT
+		    & CCE_REVISION_BOARD_ID_LOWER_NIBBLE_MASK);
+
+	snprintf(dd->boardversion, BOARD_VERS_MAX,
+		 "ChipABI %u.%u, %s, ChipRev %u.%u, SW Compat %llu\n",
+		 HFI1_CHIP_VERS_MAJ, HFI1_CHIP_VERS_MIN,
+		 dd->boardname,
+		 (u32)dd->majrev,
+		 (u32)dd->minrev,
+		 (dd->revision >> CCE_REVISION_SW_SHIFT)
+		    & CCE_REVISION_SW_MASK);
+
+	ret = set_up_context_variables(dd);
+	if (ret)
+		goto bail_cleanup;
+
+	/* set initial RXE CSRs */
+	init_rxe(dd);
+	/* set initial TXE CSRs */
+	init_txe(dd);
+	/* set initial non-RXE, non-TXE CSRs */
+	init_other(dd);
+	/* set up KDETH QP prefix in both RX and TX CSRs */
+	init_kdeth_qp(dd);
+
+	/* send contexts must be set up before receive contexts */
+	ret = init_send_contexts(dd);
+	if (ret)
+		goto bail_cleanup;
+
+	ret = hfi1_create_ctxts(dd);
+	if (ret)
+		goto bail_cleanup;
+
+	dd->rcvhdrsize = DEFAULT_RCVHDRSIZE;
+	/*
+	 * rcd[0] is guaranteed to be valid by this point. Also, all
+	 * context are using the same value, as per the module parameter.
+	 */
+	dd->rhf_offset = dd->rcd[0]->rcvhdrqentsize - sizeof(u64) / sizeof(u32);
+
+	ret = init_pervl_scs(dd);
+	if (ret)
+		goto bail_cleanup;
+
+	/* sdma init */
+	for (i = 0; i < dd->num_pports; ++i) {
+		ret = sdma_init(dd, i);
+		if (ret)
+			goto bail_cleanup;
+	}
+
+	/* use contexts created by hfi1_create_ctxts */
+	ret = set_up_interrupts(dd);
+	if (ret)
+		goto bail_cleanup;
+
+	/* set up LCB access - must be after set_up_interrupts() */
+	init_lcb_access(dd);
+
+	snprintf(dd->serial, SERIAL_MAX, "0x%08llx\n",
+		 dd->base_guid & 0xFFFFFF);
+
+	dd->oui1 = dd->base_guid >> 56 & 0xFF;
+	dd->oui2 = dd->base_guid >> 48 & 0xFF;
+	dd->oui3 = dd->base_guid >> 40 & 0xFF;
+
+	ret = load_firmware(dd); /* asymmetric with dispose_firmware() */
+	if (ret)
+		goto bail_clear_intr;
+	check_fabric_firmware_versions(dd);
+
+	thermal_init(dd);
+
+	ret = init_cntrs(dd);
+	if (ret)
+		goto bail_clear_intr;
+
+	ret = init_rcverr(dd);
+	if (ret)
+		goto bail_free_cntrs;
+
+	ret = eprom_init(dd);
+	if (ret)
+		goto bail_free_rcverr;
+
+	goto bail;
+
+bail_free_rcverr:
+	free_rcverr(dd);
+bail_free_cntrs:
+	free_cntrs(dd);
+bail_clear_intr:
+	clean_up_interrupts(dd);
+bail_cleanup:
+	hfi1_pcie_ddcleanup(dd);
+bail_free:
+	hfi1_free_devdata(dd);
+	dd = ERR_PTR(ret);
+bail:
+	return dd;
+}
+
+static u16 delay_cycles(struct hfi1_pportdata *ppd, u32 desired_egress_rate,
+			u32 dw_len)
+{
+	u32 delta_cycles;
+	u32 current_egress_rate = ppd->current_egress_rate;
+	/* rates here are in units of 10^6 bits/sec */
+
+	if (desired_egress_rate == -1)
+		return 0; /* shouldn't happen */
+
+	if (desired_egress_rate >= current_egress_rate)
+		return 0; /* we can't help go faster, only slower */
+
+	delta_cycles = egress_cycles(dw_len * 4, desired_egress_rate) -
+			egress_cycles(dw_len * 4, current_egress_rate);
+
+	return (u16)delta_cycles;
+}
+
+
+/**
+ * create_pbc - build a pbc for transmission
+ * @flags: special case flags or-ed in built pbc
+ * @srate: static rate
+ * @vl: vl
+ * @dwlen: dword length (header words + data words + pbc words)
+ *
+ * Create a PBC with the given flags, rate, VL, and length.
+ *
+ * NOTE: The PBC created will not insert any HCRC - all callers but one are
+ * for verbs, which does not use this PSM feature.  The lone other caller
+ * is for the diagnostic interface which calls this if the user does not
+ * supply their own PBC.
+ */
+u64 create_pbc(struct hfi1_pportdata *ppd, u64 flags, int srate_mbs, u32 vl,
+	       u32 dw_len)
+{
+	u64 pbc, delay = 0;
+
+	if (unlikely(srate_mbs))
+		delay = delay_cycles(ppd, srate_mbs, dw_len);
+
+	pbc = flags
+		| (delay << PBC_STATIC_RATE_CONTROL_COUNT_SHIFT)
+		| ((u64)PBC_IHCRC_NONE << PBC_INSERT_HCRC_SHIFT)
+		| (vl & PBC_VL_MASK) << PBC_VL_SHIFT
+		| (dw_len & PBC_LENGTH_DWS_MASK)
+			<< PBC_LENGTH_DWS_SHIFT;
+
+	return pbc;
+}
+
+#define SBUS_THERMAL    0x4f
+#define SBUS_THERM_MONITOR_MODE 0x1
+
+#define THERM_FAILURE(dev, ret, reason) \
+	dd_dev_err((dd),						\
+		   "Thermal sensor initialization failed: %s (%d)\n",	\
+		   (reason), (ret))
+
+/*
+ * Initialize the Avago Thermal sensor.
+ *
+ * After initialization, enable polling of thermal sensor through
+ * SBus interface. In order for this to work, the SBus Master
+ * firmware has to be loaded due to the fact that the HW polling
+ * logic uses SBus interrupts, which are not supported with
+ * default firmware. Otherwise, no data will be returned through
+ * the ASIC_STS_THERM CSR.
+ */
+static int thermal_init(struct hfi1_devdata *dd)
+{
+	int ret = 0;
+
+	if (dd->icode != ICODE_RTL_SILICON ||
+	    !(dd->flags & HFI1_DO_INIT_ASIC))
+		return ret;
+
+	acquire_hw_mutex(dd);
+	dd_dev_info(dd, "Initializing thermal sensor\n");
+	/* Thermal Sensor Initialization */
+	/*    Step 1: Reset the Thermal SBus Receiver */
+	ret = sbus_request_slow(dd, SBUS_THERMAL, 0x0,
+				RESET_SBUS_RECEIVER, 0);
+	if (ret) {
+		THERM_FAILURE(dd, ret, "Bus Reset");
+		goto done;
+	}
+	/*    Step 2: Set Reset bit in Thermal block */
+	ret = sbus_request_slow(dd, SBUS_THERMAL, 0x0,
+				WRITE_SBUS_RECEIVER, 0x1);
+	if (ret) {
+		THERM_FAILURE(dd, ret, "Therm Block Reset");
+		goto done;
+	}
+	/*    Step 3: Write clock divider value (100MHz -> 2MHz) */
+	ret = sbus_request_slow(dd, SBUS_THERMAL, 0x1,
+				WRITE_SBUS_RECEIVER, 0x32);
+	if (ret) {
+		THERM_FAILURE(dd, ret, "Write Clock Div");
+		goto done;
+	}
+	/*    Step 4: Select temperature mode */
+	ret = sbus_request_slow(dd, SBUS_THERMAL, 0x3,
+				WRITE_SBUS_RECEIVER,
+				SBUS_THERM_MONITOR_MODE);
+	if (ret) {
+		THERM_FAILURE(dd, ret, "Write Mode Sel");
+		goto done;
+	}
+	/*    Step 5: De-assert block reset and start conversion */
+	ret = sbus_request_slow(dd, SBUS_THERMAL, 0x0,
+				WRITE_SBUS_RECEIVER, 0x2);
+	if (ret) {
+		THERM_FAILURE(dd, ret, "Write Reset Deassert");
+		goto done;
+	}
+	/*    Step 5.1: Wait for first conversion (21.5ms per spec) */
+	msleep(22);
+
+	/* Enable polling of thermal readings */
+	write_csr(dd, ASIC_CFG_THERM_POLL_EN, 0x1);
+done:
+	release_hw_mutex(dd);
+	return ret;
+}
+
+static void handle_temp_err(struct hfi1_devdata *dd)
+{
+	struct hfi1_pportdata *ppd = &dd->pport[0];
+	/*
+	 * Thermal Critical Interrupt
+	 * Put the device into forced freeze mode, take link down to
+	 * offline, and put DC into reset.
+	 */
+	dd_dev_emerg(dd,
+		     "Critical temperature reached! Forcing device into freeze mode!\n");
+	dd->flags |= HFI1_FORCED_FREEZE;
+	start_freeze_handling(ppd, FREEZE_SELF|FREEZE_ABORT);
+	/*
+	 * Shut DC down as much and as quickly as possible.
+	 *
+	 * Step 1: Take the link down to OFFLINE. This will cause the
+	 *         8051 to put the Serdes in reset. However, we don't want to
+	 *         go through the entire link state machine since we want to
+	 *         shutdown ASAP. Furthermore, this is not a graceful shutdown
+	 *         but rather an attempt to save the chip.
+	 *         Code below is almost the same as quiet_serdes() but avoids
+	 *         all the extra work and the sleeps.
+	 */
+	ppd->driver_link_ready = 0;
+	ppd->link_enabled = 0;
+	set_physical_link_state(dd, PLS_OFFLINE |
+				(OPA_LINKDOWN_REASON_SMA_DISABLED << 8));
+	/*
+	 * Step 2: Shutdown LCB and 8051
+	 *         After shutdown, do not restore DC_CFG_RESET value.
+	 */
+	dc_shutdown(dd);
+}
diff --git a/drivers/staging/rdma/hfi1/chip.h b/drivers/staging/rdma/hfi1/chip.h
new file mode 100644
index 000000000000..f89a432c7334
--- /dev/null
+++ b/drivers/staging/rdma/hfi1/chip.h
@@ -0,0 +1,1035 @@
+#ifndef _CHIP_H
+#define _CHIP_H
+/*
+ *
+ * This file is provided under a dual BSD/GPLv2 license.  When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * GPL LICENSE SUMMARY
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * BSD LICENSE
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ *  - Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ *  - Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in
+ *    the documentation and/or other materials provided with the
+ *    distribution.
+ *  - Neither the name of Intel Corporation nor the names of its
+ *    contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ */
+
+/*
+ * This file contains all of the defines that is specific to the HFI chip
+ */
+
+/* sizes */
+#define CCE_NUM_MSIX_VECTORS 256
+#define CCE_NUM_INT_CSRS 12
+#define CCE_NUM_INT_MAP_CSRS 96
+#define NUM_INTERRUPT_SOURCES 768
+#define RXE_NUM_CONTEXTS 160
+#define RXE_PER_CONTEXT_SIZE 0x1000	/* 4k */
+#define RXE_NUM_TID_FLOWS 32
+#define RXE_NUM_DATA_VL 8
+#define TXE_NUM_CONTEXTS 160
+#define TXE_NUM_SDMA_ENGINES 16
+#define NUM_CONTEXTS_PER_SET 8
+#define VL_ARB_HIGH_PRIO_TABLE_SIZE 16
+#define VL_ARB_LOW_PRIO_TABLE_SIZE 16
+#define VL_ARB_TABLE_SIZE 16
+#define TXE_NUM_32_BIT_COUNTER 7
+#define TXE_NUM_64_BIT_COUNTER 30
+#define TXE_NUM_DATA_VL 8
+#define TXE_PIO_SIZE (32 * 0x100000)	/* 32 MB */
+#define PIO_BLOCK_SIZE 64			/* bytes */
+#define SDMA_BLOCK_SIZE 64			/* bytes */
+#define RCV_BUF_BLOCK_SIZE 64               /* bytes */
+#define PIO_CMASK 0x7ff	/* counter mask for free and fill counters */
+#define MAX_EAGER_ENTRIES    2048	/* max receive eager entries */
+#define MAX_TID_PAIR_ENTRIES 1024	/* max receive expected pairs */
+/* Virtual? Allocation Unit, defined as AU = 8*2^vAU, 64 bytes, AU is fixed
+   at 64 bytes for all generation one devices */
+#define CM_VAU 3
+/* HFI link credit count, AKA receive buffer depth (RBUF_DEPTH) */
+#define CM_GLOBAL_CREDITS 0x940
+/* Number of PKey entries in the HW */
+#define MAX_PKEY_VALUES 16
+
+#include "chip_registers.h"
+
+#define RXE_PER_CONTEXT_USER   (RXE + RXE_PER_CONTEXT_OFFSET)
+#define TXE_PIO_SEND (TXE + TXE_PIO_SEND_OFFSET)
+
+/* PBC flags */
+#define PBC_INTR		(1ull << 31)
+#define PBC_DC_INFO_SHIFT	(30)
+#define PBC_DC_INFO		(1ull << PBC_DC_INFO_SHIFT)
+#define PBC_TEST_EBP	(1ull << 29)
+#define PBC_PACKET_BYPASS	(1ull << 28)
+#define PBC_CREDIT_RETURN	(1ull << 25)
+#define PBC_INSERT_BYPASS_ICRC (1ull << 24)
+#define PBC_TEST_BAD_ICRC	(1ull << 23)
+#define PBC_FECN		(1ull << 22)
+
+/* PbcInsertHcrc field settings */
+#define PBC_IHCRC_LKDETH 0x0	/* insert @ local KDETH offset */
+#define PBC_IHCRC_GKDETH 0x1	/* insert @ global KDETH offset */
+#define PBC_IHCRC_NONE   0x2	/* no HCRC inserted */
+
+/* PBC fields */
+#define PBC_STATIC_RATE_CONTROL_COUNT_SHIFT 32
+#define PBC_STATIC_RATE_CONTROL_COUNT_MASK 0xffffull
+#define PBC_STATIC_RATE_CONTROL_COUNT_SMASK \
+	(PBC_STATIC_RATE_CONTROL_COUNT_MASK << \
+	PBC_STATIC_RATE_CONTROL_COUNT_SHIFT)
+
+#define PBC_INSERT_HCRC_SHIFT 26
+#define PBC_INSERT_HCRC_MASK 0x3ull
+#define PBC_INSERT_HCRC_SMASK \
+	(PBC_INSERT_HCRC_MASK << PBC_INSERT_HCRC_SHIFT)
+
+#define PBC_VL_SHIFT 12
+#define PBC_VL_MASK 0xfull
+#define PBC_VL_SMASK (PBC_VL_MASK << PBC_VL_SHIFT)
+
+#define PBC_LENGTH_DWS_SHIFT 0
+#define PBC_LENGTH_DWS_MASK 0xfffull
+#define PBC_LENGTH_DWS_SMASK \
+	(PBC_LENGTH_DWS_MASK << PBC_LENGTH_DWS_SHIFT)
+
+/* Credit Return Fields */
+#define CR_COUNTER_SHIFT 0
+#define CR_COUNTER_MASK 0x7ffull
+#define CR_COUNTER_SMASK (CR_COUNTER_MASK << CR_COUNTER_SHIFT)
+
+#define CR_STATUS_SHIFT 11
+#define CR_STATUS_MASK 0x1ull
+#define CR_STATUS_SMASK (CR_STATUS_MASK << CR_STATUS_SHIFT)
+
+#define CR_CREDIT_RETURN_DUE_TO_PBC_SHIFT 12
+#define CR_CREDIT_RETURN_DUE_TO_PBC_MASK 0x1ull
+#define CR_CREDIT_RETURN_DUE_TO_PBC_SMASK \
+	(CR_CREDIT_RETURN_DUE_TO_PBC_MASK << \
+	CR_CREDIT_RETURN_DUE_TO_PBC_SHIFT)
+
+#define CR_CREDIT_RETURN_DUE_TO_THRESHOLD_SHIFT 13
+#define CR_CREDIT_RETURN_DUE_TO_THRESHOLD_MASK 0x1ull
+#define CR_CREDIT_RETURN_DUE_TO_THRESHOLD_SMASK \
+	(CR_CREDIT_RETURN_DUE_TO_THRESHOLD_MASK << \
+	CR_CREDIT_RETURN_DUE_TO_THRESHOLD_SHIFT)
+
+#define CR_CREDIT_RETURN_DUE_TO_ERR_SHIFT 14
+#define CR_CREDIT_RETURN_DUE_TO_ERR_MASK 0x1ull
+#define CR_CREDIT_RETURN_DUE_TO_ERR_SMASK \
+	(CR_CREDIT_RETURN_DUE_TO_ERR_MASK << \
+	CR_CREDIT_RETURN_DUE_TO_ERR_SHIFT)
+
+#define CR_CREDIT_RETURN_DUE_TO_FORCE_SHIFT 15
+#define CR_CREDIT_RETURN_DUE_TO_FORCE_MASK 0x1ull
+#define CR_CREDIT_RETURN_DUE_TO_FORCE_SMASK \
+	(CR_CREDIT_RETURN_DUE_TO_FORCE_MASK << \
+	CR_CREDIT_RETURN_DUE_TO_FORCE_SHIFT)
+
+/* interrupt source numbers */
+#define IS_GENERAL_ERR_START	  0
+#define IS_SDMAENG_ERR_START	 16
+#define IS_SENDCTXT_ERR_START	 32
+#define IS_SDMA_START		192 /* includes SDmaProgress,SDmaIdle */
+#define IS_VARIOUS_START		240
+#define IS_DC_START			248
+#define IS_RCVAVAIL_START		256
+#define IS_RCVURGENT_START		416
+#define IS_SENDCREDIT_START		576
+#define IS_RESERVED_START		736
+#define IS_MAX_SOURCES		768
+
+/* derived interrupt source values */
+#define IS_GENERAL_ERR_END		IS_SDMAENG_ERR_START
+#define IS_SDMAENG_ERR_END		IS_SENDCTXT_ERR_START
+#define IS_SENDCTXT_ERR_END		IS_SDMA_START
+#define IS_SDMA_END			IS_VARIOUS_START
+#define IS_VARIOUS_END		IS_DC_START
+#define IS_DC_END			IS_RCVAVAIL_START
+#define IS_RCVAVAIL_END		IS_RCVURGENT_START
+#define IS_RCVURGENT_END		IS_SENDCREDIT_START
+#define IS_SENDCREDIT_END		IS_RESERVED_START
+#define IS_RESERVED_END		IS_MAX_SOURCES
+
+/* absolute interrupt numbers for QSFP1Int and QSFP2Int */
+#define QSFP1_INT		242
+#define QSFP2_INT		243
+
+/* DCC_CFG_PORT_CONFIG logical link states */
+#define LSTATE_DOWN    0x1
+#define LSTATE_INIT    0x2
+#define LSTATE_ARMED   0x3
+#define LSTATE_ACTIVE  0x4
+
+/* DC8051_STS_CUR_STATE port values (physical link states) */
+#define PLS_DISABLED			   0x30
+#define PLS_OFFLINE				   0x90
+#define PLS_OFFLINE_QUIET			   0x90
+#define PLS_OFFLINE_PLANNED_DOWN_INFORM	   0x91
+#define PLS_OFFLINE_READY_TO_QUIET_LT	   0x92
+#define PLS_OFFLINE_REPORT_FAILURE		   0x93
+#define PLS_OFFLINE_READY_TO_QUIET_BCC	   0x94
+#define PLS_POLLING				   0x20
+#define PLS_POLLING_QUIET			   0x20
+#define PLS_POLLING_ACTIVE			   0x21
+#define PLS_CONFIGPHY			   0x40
+#define PLS_CONFIGPHY_DEBOUCE		   0x40
+#define PLS_CONFIGPHY_ESTCOMM		   0x41
+#define PLS_CONFIGPHY_ESTCOMM_TXRX_HUNT	   0x42
+#define PLS_CONFIGPHY_ESTcOMM_LOCAL_COMPLETE   0x43
+#define PLS_CONFIGPHY_OPTEQ			   0x44
+#define PLS_CONFIGPHY_OPTEQ_OPTIMIZING	   0x44
+#define PLS_CONFIGPHY_OPTEQ_LOCAL_COMPLETE	   0x45
+#define PLS_CONFIGPHY_VERIFYCAP		   0x46
+#define PLS_CONFIGPHY_VERIFYCAP_EXCHANGE	   0x46
+#define PLS_CONFIGPHY_VERIFYCAP_LOCAL_COMPLETE 0x47
+#define PLS_CONFIGLT			   0x48
+#define PLS_CONFIGLT_CONFIGURE		   0x48
+#define PLS_CONFIGLT_LINK_TRANSFER_ACTIVE	   0x49
+#define PLS_LINKUP				   0x50
+#define PLS_PHYTEST				   0xB0
+#define PLS_INTERNAL_SERDES_LOOPBACK	   0xe1
+#define PLS_QUICK_LINKUP			   0xe2
+
+/* DC_DC8051_CFG_HOST_CMD_0.REQ_TYPE - 8051 host commands */
+#define HCMD_LOAD_CONFIG_DATA  0x01
+#define HCMD_READ_CONFIG_DATA  0x02
+#define HCMD_CHANGE_PHY_STATE  0x03
+#define HCMD_SEND_LCB_IDLE_MSG 0x04
+#define HCMD_MISC		   0x05
+#define HCMD_READ_LCB_IDLE_MSG 0x06
+#define HCMD_READ_LCB_CSR      0x07
+#define HCMD_INTERFACE_TEST	   0xff
+
+/* DC_DC8051_CFG_HOST_CMD_1.RETURN_CODE - 8051 host command return */
+#define HCMD_SUCCESS 2
+
+/* DC_DC8051_DBG_ERR_INFO_SET_BY_8051.ERROR - error flags */
+#define SPICO_ROM_FAILED		    (1 <<  0)
+#define UNKNOWN_FRAME		    (1 <<  1)
+#define TARGET_BER_NOT_MET		    (1 <<  2)
+#define FAILED_SERDES_INTERNAL_LOOPBACK (1 <<  3)
+#define FAILED_SERDES_INIT		    (1 <<  4)
+#define FAILED_LNI_POLLING		    (1 <<  5)
+#define FAILED_LNI_DEBOUNCE		    (1 <<  6)
+#define FAILED_LNI_ESTBCOMM		    (1 <<  7)
+#define FAILED_LNI_OPTEQ		    (1 <<  8)
+#define FAILED_LNI_VERIFY_CAP1	    (1 <<  9)
+#define FAILED_LNI_VERIFY_CAP2	    (1 << 10)
+#define FAILED_LNI_CONFIGLT		    (1 << 11)
+
+#define FAILED_LNI (FAILED_LNI_POLLING | FAILED_LNI_DEBOUNCE \
+			| FAILED_LNI_ESTBCOMM | FAILED_LNI_OPTEQ \
+			| FAILED_LNI_VERIFY_CAP1 \
+			| FAILED_LNI_VERIFY_CAP2 \
+			| FAILED_LNI_CONFIGLT)
+
+/* DC_DC8051_DBG_ERR_INFO_SET_BY_8051.HOST_MSG - host message flags */
+#define HOST_REQ_DONE	   (1 << 0)
+#define BC_PWR_MGM_MSG	   (1 << 1)
+#define BC_SMA_MSG		   (1 << 2)
+#define BC_BCC_UNKOWN_MSG	   (1 << 3)
+#define BC_IDLE_UNKNOWN_MSG	   (1 << 4)
+#define EXT_DEVICE_CFG_REQ	   (1 << 5)
+#define VERIFY_CAP_FRAME	   (1 << 6)
+#define LINKUP_ACHIEVED	   (1 << 7)
+#define LINK_GOING_DOWN	   (1 << 8)
+#define LINK_WIDTH_DOWNGRADED  (1 << 9)
+
+/* DC_DC8051_CFG_EXT_DEV_1.REQ_TYPE - 8051 host requests */
+#define HREQ_LOAD_CONFIG	0x01
+#define HREQ_SAVE_CONFIG	0x02
+#define HREQ_READ_CONFIG	0x03
+#define HREQ_SET_TX_EQ_ABS	0x04
+#define HREQ_SET_TX_EQ_REL	0x05
+#define HREQ_ENABLE		0x06
+#define HREQ_CONFIG_DONE	0xfe
+#define HREQ_INTERFACE_TEST	0xff
+
+/* DC_DC8051_CFG_EXT_DEV_0.RETURN_CODE - 8051 host request return codes */
+#define HREQ_INVALID		0x01
+#define HREQ_SUCCESS		0x02
+#define HREQ_NOT_SUPPORTED		0x03
+#define HREQ_FEATURE_NOT_SUPPORTED	0x04 /* request specific feature */
+#define HREQ_REQUEST_REJECTED	0xfe
+#define HREQ_EXECUTION_ONGOING	0xff
+
+/* MISC host command functions */
+#define HCMD_MISC_REQUEST_LCB_ACCESS 0x1
+#define HCMD_MISC_GRANT_LCB_ACCESS   0x2
+
+/* idle flit message types */
+#define IDLE_PHYSICAL_LINK_MGMT 0x1
+#define IDLE_CRU		    0x2
+#define IDLE_SMA		    0x3
+#define IDLE_POWER_MGMT	    0x4
+
+/* idle flit message send fields (both send and read) */
+#define IDLE_PAYLOAD_MASK 0xffffffffffull /* 40 bits */
+#define IDLE_PAYLOAD_SHIFT 8
+#define IDLE_MSG_TYPE_MASK 0xf
+#define IDLE_MSG_TYPE_SHIFT 0
+
+/* idle flit message read fields */
+#define READ_IDLE_MSG_TYPE_MASK 0xf
+#define READ_IDLE_MSG_TYPE_SHIFT 0
+
+/* SMA idle flit payload commands */
+#define SMA_IDLE_ARM	1
+#define SMA_IDLE_ACTIVE 2
+
+/* DC_DC8051_CFG_MODE.GENERAL bits */
+#define DISABLE_SELF_GUID_CHECK 0x2
+
+/*
+ * Eager buffer minimum and maximum sizes supported by the hardware.
+ * All power-of-two sizes in between are supported as well.
+ * MAX_EAGER_BUFFER_TOTAL is the maximum size of memory
+ * allocatable for Eager buffer to a single context. All others
+ * are limits for the RcvArray entries.
+ */
+#define MIN_EAGER_BUFFER       (4 * 1024)
+#define MAX_EAGER_BUFFER       (256 * 1024)
+#define MAX_EAGER_BUFFER_TOTAL (64 * (1 << 20)) /* max per ctxt 64MB */
+#define MAX_EXPECTED_BUFFER    (2048 * 1024)
+
+/*
+ * Receive expected base and count and eager base and count increment -
+ * the CSR fields hold multiples of this value.
+ */
+#define RCV_SHIFT 3
+#define RCV_INCREMENT (1 << RCV_SHIFT)
+
+/*
+ * Receive header queue entry increment - the CSR holds multiples of
+ * this value.
+ */
+#define HDRQ_SIZE_SHIFT 5
+#define HDRQ_INCREMENT (1 << HDRQ_SIZE_SHIFT)
+
+/*
+ * Freeze handling flags
+ */
+#define FREEZE_ABORT     0x01	/* do not do recovery */
+#define FREEZE_SELF	     0x02	/* initiate the freeze */
+#define FREEZE_LINK_DOWN 0x04	/* link is down */
+
+/*
+ * Chip implementation codes.
+ */
+#define ICODE_RTL_SILICON		0x00
+#define ICODE_RTL_VCS_SIMULATION	0x01
+#define ICODE_FPGA_EMULATION	0x02
+#define ICODE_FUNCTIONAL_SIMULATOR	0x03
+
+/*
+ * 8051 data memory size.
+ */
+#define DC8051_DATA_MEM_SIZE 0x1000
+
+/*
+ * 8051 firmware registers
+ */
+#define NUM_GENERAL_FIELDS 0x17
+#define NUM_LANE_FIELDS    0x8
+
+/* 8051 general register Field IDs */
+#define TX_SETTINGS		     0x06
+#define VERIFY_CAP_LOCAL_PHY	     0x07
+#define VERIFY_CAP_LOCAL_FABRIC	     0x08
+#define VERIFY_CAP_LOCAL_LINK_WIDTH  0x09
+#define LOCAL_DEVICE_ID		     0x0a
+#define LOCAL_LNI_INFO		     0x0c
+#define REMOTE_LNI_INFO              0x0d
+#define MISC_STATUS		     0x0e
+#define VERIFY_CAP_REMOTE_PHY	     0x0f
+#define VERIFY_CAP_REMOTE_FABRIC     0x10
+#define VERIFY_CAP_REMOTE_LINK_WIDTH 0x11
+#define LAST_LOCAL_STATE_COMPLETE    0x12
+#define LAST_REMOTE_STATE_COMPLETE   0x13
+#define LINK_QUALITY_INFO            0x14
+#define REMOTE_DEVICE_ID	     0x15
+
+/* Lane ID for general configuration registers */
+#define GENERAL_CONFIG 4
+
+/* LOAD_DATA 8051 command shifts and fields */
+#define LOAD_DATA_FIELD_ID_SHIFT 40
+#define LOAD_DATA_FIELD_ID_MASK 0xfull
+#define LOAD_DATA_LANE_ID_SHIFT 32
+#define LOAD_DATA_LANE_ID_MASK 0xfull
+#define LOAD_DATA_DATA_SHIFT   0x0
+#define LOAD_DATA_DATA_MASK   0xffffffffull
+
+/* READ_DATA 8051 command shifts and fields */
+#define READ_DATA_FIELD_ID_SHIFT 40
+#define READ_DATA_FIELD_ID_MASK 0xffull
+#define READ_DATA_LANE_ID_SHIFT 32
+#define READ_DATA_LANE_ID_MASK 0xffull
+#define READ_DATA_DATA_SHIFT   0x0
+#define READ_DATA_DATA_MASK   0xffffffffull
+
+/* TX settings fields */
+#define ENABLE_LANE_TX_SHIFT		0
+#define ENABLE_LANE_TX_MASK		0xff
+#define TX_POLARITY_INVERSION_SHIFT	8
+#define TX_POLARITY_INVERSION_MASK	0xff
+#define RX_POLARITY_INVERSION_SHIFT	16
+#define RX_POLARITY_INVERSION_MASK	0xff
+#define MAX_RATE_SHIFT			24
+#define MAX_RATE_MASK			0xff
+
+/* verify capability PHY fields */
+#define CONTINIOUS_REMOTE_UPDATE_SUPPORT_SHIFT	0x4
+#define CONTINIOUS_REMOTE_UPDATE_SUPPORT_MASK	0x1
+#define POWER_MANAGEMENT_SHIFT			0x0
+#define POWER_MANAGEMENT_MASK			0xf
+
+/* 8051 lane register Field IDs */
+#define SPICO_FW_VERSION 0x7	/* SPICO firmware version */
+
+/* SPICO firmware version fields */
+#define SPICO_ROM_VERSION_SHIFT 0
+#define SPICO_ROM_VERSION_MASK 0xffff
+#define SPICO_ROM_PROD_ID_SHIFT 16
+#define SPICO_ROM_PROD_ID_MASK 0xffff
+
+/* verify capability fabric fields */
+#define VAU_SHIFT	0
+#define VAU_MASK	0x0007
+#define Z_SHIFT		3
+#define Z_MASK		0x0001
+#define VCU_SHIFT	4
+#define VCU_MASK	0x0007
+#define VL15BUF_SHIFT	8
+#define VL15BUF_MASK	0x0fff
+#define CRC_SIZES_SHIFT 20
+#define CRC_SIZES_MASK	0x7
+
+/* verify capability local link width fields */
+#define LINK_WIDTH_SHIFT 0		/* also for remote link width */
+#define LINK_WIDTH_MASK 0xffff		/* also for remote link width */
+#define LOCAL_FLAG_BITS_SHIFT 16
+#define LOCAL_FLAG_BITS_MASK 0xff
+#define MISC_CONFIG_BITS_SHIFT 24
+#define MISC_CONFIG_BITS_MASK 0xff
+
+/* verify capability remote link width fields */
+#define REMOTE_TX_RATE_SHIFT 16
+#define REMOTE_TX_RATE_MASK 0xff
+
+/* LOCAL_DEVICE_ID fields */
+#define LOCAL_DEVICE_REV_SHIFT 0
+#define LOCAL_DEVICE_REV_MASK 0xff
+#define LOCAL_DEVICE_ID_SHIFT 8
+#define LOCAL_DEVICE_ID_MASK 0xffff
+
+/* REMOTE_DEVICE_ID fields */
+#define REMOTE_DEVICE_REV_SHIFT 0
+#define REMOTE_DEVICE_REV_MASK 0xff
+#define REMOTE_DEVICE_ID_SHIFT 8
+#define REMOTE_DEVICE_ID_MASK 0xffff
+
+/* local LNI link width fields */
+#define ENABLE_LANE_RX_SHIFT 16
+#define ENABLE_LANE_RX_MASK  0xff
+
+/* mask, shift for reading 'mgmt_enabled' value from REMOTE_LNI_INFO field */
+#define MGMT_ALLOWED_SHIFT 23
+#define MGMT_ALLOWED_MASK 0x1
+
+/* mask, shift for 'link_quality' within LINK_QUALITY_INFO field */
+#define LINK_QUALITY_SHIFT 24
+#define LINK_QUALITY_MASK  0x7
+
+/*
+ * mask, shift for reading 'planned_down_remote_reason_code'
+ * from LINK_QUALITY_INFO field
+ */
+#define DOWN_REMOTE_REASON_SHIFT 16
+#define DOWN_REMOTE_REASON_MASK  0xff
+
+/* verify capability PHY power management bits */
+#define PWRM_BER_CONTROL	0x1
+#define PWRM_BANDWIDTH_CONTROL	0x2
+
+/* verify capability fabric CRC size bits */
+enum {
+	CAP_CRC_14B = (1 << 0), /* 14b CRC */
+	CAP_CRC_48B = (1 << 1), /* 48b CRC */
+	CAP_CRC_12B_16B_PER_LANE = (1 << 2) /* 12b-16b per lane CRC */
+};
+
+#define SUPPORTED_CRCS (CAP_CRC_14B | CAP_CRC_48B)
+
+/* misc status version fields */
+#define STS_FM_VERSION_A_SHIFT 16
+#define STS_FM_VERSION_A_MASK  0xff
+#define STS_FM_VERSION_B_SHIFT 24
+#define STS_FM_VERSION_B_MASK  0xff
+
+/* LCB_CFG_CRC_MODE TX_VAL and RX_VAL CRC mode values */
+#define LCB_CRC_16B			0x0	/* 16b CRC */
+#define LCB_CRC_14B			0x1	/* 14b CRC */
+#define LCB_CRC_48B			0x2	/* 48b CRC */
+#define LCB_CRC_12B_16B_PER_LANE	0x3	/* 12b-16b per lane CRC */
+
+/* the following enum is (almost) a copy/paste of the definition
+ * in the OPA spec, section 20.2.2.6.8 (PortInfo) */
+enum {
+	PORT_LTP_CRC_MODE_NONE = 0,
+	PORT_LTP_CRC_MODE_14 = 1, /* 14-bit LTP CRC mode (optional) */
+	PORT_LTP_CRC_MODE_16 = 2, /* 16-bit LTP CRC mode */
+	PORT_LTP_CRC_MODE_48 = 4,
+		/* 48-bit overlapping LTP CRC mode (optional) */
+	PORT_LTP_CRC_MODE_PER_LANE = 8
+		/* 12 to 16 bit per lane LTP CRC mode (optional) */
+};
+
+/* timeouts */
+#define LINK_RESTART_DELAY 1000		/* link restart delay, in ms */
+#define TIMEOUT_8051_START 5000         /* 8051 start timeout, in ms */
+#define DC8051_COMMAND_TIMEOUT 20000	/* DC8051 command timeout, in ms */
+#define FREEZE_STATUS_TIMEOUT 20	/* wait for freeze indicators, in ms */
+#define VL_STATUS_CLEAR_TIMEOUT 5000	/* per-VL status clear, in ms */
+#define CCE_STATUS_TIMEOUT 10		/* time to clear CCE Status, in ms */
+
+/* cclock tick time, in picoseconds per tick: 1/speed * 10^12  */
+#define ASIC_CCLOCK_PS  1242	/* 805 MHz */
+#define FPGA_CCLOCK_PS 30300	/*  33 MHz */
+
+/*
+ * Mask of enabled MISC errors.  Do not enable the two RSA engine errors -
+ * see firmware.c:run_rsa() for details.
+ */
+#define DRIVER_MISC_MASK \
+	(~(MISC_ERR_STATUS_MISC_FW_AUTH_FAILED_ERR_SMASK \
+		| MISC_ERR_STATUS_MISC_KEY_MISMATCH_ERR_SMASK))
+
+/* valid values for the loopback module parameter */
+#define LOOPBACK_NONE	0	/* no loopback - default */
+#define LOOPBACK_SERDES 1
+#define LOOPBACK_LCB	2
+#define LOOPBACK_CABLE	3	/* external cable */
+
+/* read and write hardware registers */
+u64 read_csr(const struct hfi1_devdata *dd, u32 offset);
+void write_csr(const struct hfi1_devdata *dd, u32 offset, u64 value);
+
+/*
+ * The *_kctxt_* flavor of the CSR read/write functions are for
+ * per-context or per-SDMA CSRs that are not mappable to user-space.
+ * Their spacing is not a PAGE_SIZE multiple.
+ */
+static inline u64 read_kctxt_csr(const struct hfi1_devdata *dd, int ctxt,
+				 u32 offset0)
+{
+	/* kernel per-context CSRs are separated by 0x100 */
+	return read_csr(dd, offset0 + (0x100 * ctxt));
+}
+
+static inline void write_kctxt_csr(struct hfi1_devdata *dd, int ctxt,
+				   u32 offset0, u64 value)
+{
+	/* kernel per-context CSRs are separated by 0x100 */
+	write_csr(dd, offset0 + (0x100 * ctxt), value);
+}
+
+int read_lcb_csr(struct hfi1_devdata *dd, u32 offset, u64 *data);
+int write_lcb_csr(struct hfi1_devdata *dd, u32 offset, u64 data);
+
+void __iomem *get_csr_addr(
+	struct hfi1_devdata *dd,
+	u32 offset);
+
+static inline void __iomem *get_kctxt_csr_addr(
+	struct hfi1_devdata *dd,
+	int ctxt,
+	u32 offset0)
+{
+	return get_csr_addr(dd, offset0 + (0x100 * ctxt));
+}
+
+/*
+ * The *_uctxt_* flavor of the CSR read/write functions are for
+ * per-context CSRs that are mappable to user space. All these CSRs
+ * are spaced by a PAGE_SIZE multiple in order to be mappable to
+ * different processes without exposing other contexts' CSRs
+ */
+static inline u64 read_uctxt_csr(const struct hfi1_devdata *dd, int ctxt,
+				 u32 offset0)
+{
+	/* user per-context CSRs are separated by 0x1000 */
+	return read_csr(dd, offset0 + (0x1000 * ctxt));
+}
+
+static inline void write_uctxt_csr(struct hfi1_devdata *dd, int ctxt,
+				   u32 offset0, u64 value)
+{
+	/* user per-context CSRs are separated by 0x1000 */
+	write_csr(dd, offset0 + (0x1000 * ctxt), value);
+}
+
+u64 create_pbc(struct hfi1_pportdata *ppd, u64, int, u32, u32);
+
+/* firmware.c */
+#define NUM_PCIE_SERDES 16	/* number of PCIe serdes on the SBus */
+extern const u8 pcie_serdes_broadcast[];
+extern const u8 pcie_pcs_addrs[2][NUM_PCIE_SERDES];
+/* SBus commands */
+#define RESET_SBUS_RECEIVER 0x20
+#define WRITE_SBUS_RECEIVER 0x21
+void sbus_request(struct hfi1_devdata *dd,
+		  u8 receiver_addr, u8 data_addr, u8 command, u32 data_in);
+int sbus_request_slow(struct hfi1_devdata *dd,
+		      u8 receiver_addr, u8 data_addr, u8 command, u32 data_in);
+void set_sbus_fast_mode(struct hfi1_devdata *dd);
+void clear_sbus_fast_mode(struct hfi1_devdata *dd);
+int hfi1_firmware_init(struct hfi1_devdata *dd);
+int load_pcie_firmware(struct hfi1_devdata *dd);
+int load_firmware(struct hfi1_devdata *dd);
+void dispose_firmware(void);
+int acquire_hw_mutex(struct hfi1_devdata *dd);
+void release_hw_mutex(struct hfi1_devdata *dd);
+void fabric_serdes_reset(struct hfi1_devdata *dd);
+int read_8051_data(struct hfi1_devdata *dd, u32 addr, u32 len, u64 *result);
+
+/* chip.c */
+void read_misc_status(struct hfi1_devdata *dd, u8 *ver_a, u8 *ver_b);
+void read_guid(struct hfi1_devdata *dd);
+int wait_fm_ready(struct hfi1_devdata *dd, u32 mstimeout);
+void set_link_down_reason(struct hfi1_pportdata *ppd, u8 lcl_reason,
+			  u8 neigh_reason, u8 rem_reason);
+int set_link_state(struct hfi1_pportdata *, u32 state);
+int port_ltp_to_cap(int port_ltp);
+void handle_verify_cap(struct work_struct *work);
+void handle_freeze(struct work_struct *work);
+void handle_link_up(struct work_struct *work);
+void handle_link_down(struct work_struct *work);
+void handle_link_downgrade(struct work_struct *work);
+void handle_link_bounce(struct work_struct *work);
+void handle_sma_message(struct work_struct *work);
+void start_freeze_handling(struct hfi1_pportdata *ppd, int flags);
+int send_idle_sma(struct hfi1_devdata *dd, u64 message);
+int start_link(struct hfi1_pportdata *ppd);
+void init_qsfp(struct hfi1_pportdata *ppd);
+int bringup_serdes(struct hfi1_pportdata *ppd);
+void set_intr_state(struct hfi1_devdata *dd, u32 enable);
+void apply_link_downgrade_policy(struct hfi1_pportdata *ppd,
+				 int refresh_widths);
+void update_usrhead(struct hfi1_ctxtdata *, u32, u32, u32, u32, u32);
+int stop_drain_data_vls(struct hfi1_devdata *dd);
+int open_fill_data_vls(struct hfi1_devdata *dd);
+u32 ns_to_cclock(struct hfi1_devdata *dd, u32 ns);
+u32 cclock_to_ns(struct hfi1_devdata *dd, u32 cclock);
+void get_linkup_link_widths(struct hfi1_pportdata *ppd);
+void read_ltp_rtt(struct hfi1_devdata *dd);
+void clear_linkup_counters(struct hfi1_devdata *dd);
+u32 hdrqempty(struct hfi1_ctxtdata *rcd);
+int is_a0(struct hfi1_devdata *dd);
+int is_ax(struct hfi1_devdata *dd);
+int is_bx(struct hfi1_devdata *dd);
+u32 read_physical_state(struct hfi1_devdata *dd);
+u32 chip_to_opa_pstate(struct hfi1_devdata *dd, u32 chip_pstate);
+u32 get_logical_state(struct hfi1_pportdata *ppd);
+const char *opa_lstate_name(u32 lstate);
+const char *opa_pstate_name(u32 pstate);
+u32 driver_physical_state(struct hfi1_pportdata *ppd);
+u32 driver_logical_state(struct hfi1_pportdata *ppd);
+
+int acquire_lcb_access(struct hfi1_devdata *dd, int sleep_ok);
+int release_lcb_access(struct hfi1_devdata *dd, int sleep_ok);
+#define LCB_START DC_LCB_CSRS
+#define LCB_END   DC_8051_CSRS /* next block is 8051 */
+static inline int is_lcb_offset(u32 offset)
+{
+	return (offset >= LCB_START && offset < LCB_END);
+}
+
+extern uint num_vls;
+
+extern uint disable_integrity;
+u64 read_dev_cntr(struct hfi1_devdata *dd, int index, int vl);
+u64 write_dev_cntr(struct hfi1_devdata *dd, int index, int vl, u64 data);
+u64 read_port_cntr(struct hfi1_pportdata *ppd, int index, int vl);
+u64 write_port_cntr(struct hfi1_pportdata *ppd, int index, int vl, u64 data);
+
+/* Per VL indexes */
+enum {
+	C_VL_0 = 0,
+	C_VL_1,
+	C_VL_2,
+	C_VL_3,
+	C_VL_4,
+	C_VL_5,
+	C_VL_6,
+	C_VL_7,
+	C_VL_15,
+	C_VL_COUNT
+};
+
+static inline int vl_from_idx(int idx)
+{
+	return (idx == C_VL_15 ? 15 : idx);
+}
+
+static inline int idx_from_vl(int vl)
+{
+	return (vl == 15 ? C_VL_15 : vl);
+}
+
+/* Per device counter indexes */
+enum {
+	C_RCV_OVF = 0,
+	C_RX_TID_FULL,
+	C_RX_TID_INVALID,
+	C_RX_TID_FLGMS,
+	C_RX_CTX_RHQS,
+	C_RX_CTX_EGRS,
+	C_RCV_TID_FLSMS,
+	C_CCE_PCI_CR_ST,
+	C_CCE_PCI_TR_ST,
+	C_CCE_PIO_WR_ST,
+	C_CCE_ERR_INT,
+	C_CCE_SDMA_INT,
+	C_CCE_MISC_INT,
+	C_CCE_RCV_AV_INT,
+	C_CCE_RCV_URG_INT,
+	C_CCE_SEND_CR_INT,
+	C_DC_UNC_ERR,
+	C_DC_RCV_ERR,
+	C_DC_FM_CFG_ERR,
+	C_DC_RMT_PHY_ERR,
+	C_DC_DROPPED_PKT,
+	C_DC_MC_XMIT_PKTS,
+	C_DC_MC_RCV_PKTS,
+	C_DC_XMIT_CERR,
+	C_DC_RCV_CERR,
+	C_DC_RCV_FCC,
+	C_DC_XMIT_FCC,
+	C_DC_XMIT_FLITS,
+	C_DC_RCV_FLITS,
+	C_DC_XMIT_PKTS,
+	C_DC_RCV_PKTS,
+	C_DC_RX_FLIT_VL,
+	C_DC_RX_PKT_VL,
+	C_DC_RCV_FCN,
+	C_DC_RCV_FCN_VL,
+	C_DC_RCV_BCN,
+	C_DC_RCV_BCN_VL,
+	C_DC_RCV_BBL,
+	C_DC_RCV_BBL_VL,
+	C_DC_MARK_FECN,
+	C_DC_MARK_FECN_VL,
+	C_DC_TOTAL_CRC,
+	C_DC_CRC_LN0,
+	C_DC_CRC_LN1,
+	C_DC_CRC_LN2,
+	C_DC_CRC_LN3,
+	C_DC_CRC_MULT_LN,
+	C_DC_TX_REPLAY,
+	C_DC_RX_REPLAY,
+	C_DC_SEQ_CRC_CNT,
+	C_DC_ESC0_ONLY_CNT,
+	C_DC_ESC0_PLUS1_CNT,
+	C_DC_ESC0_PLUS2_CNT,
+	C_DC_REINIT_FROM_PEER_CNT,
+	C_DC_SBE_CNT,
+	C_DC_MISC_FLG_CNT,
+	C_DC_PRF_GOOD_LTP_CNT,
+	C_DC_PRF_ACCEPTED_LTP_CNT,
+	C_DC_PRF_RX_FLIT_CNT,
+	C_DC_PRF_TX_FLIT_CNT,
+	C_DC_PRF_CLK_CNTR,
+	C_DC_PG_DBG_FLIT_CRDTS_CNT,
+	C_DC_PG_STS_PAUSE_COMPLETE_CNT,
+	C_DC_PG_STS_TX_SBE_CNT,
+	C_DC_PG_STS_TX_MBE_CNT,
+	C_SW_CPU_INTR,
+	C_SW_CPU_RCV_LIM,
+	C_SW_VTX_WAIT,
+	C_SW_PIO_WAIT,
+	C_SW_KMEM_WAIT,
+	DEV_CNTR_LAST  /* Must be kept last */
+};
+
+/* Per port counter indexes */
+enum {
+	C_TX_UNSUP_VL = 0,
+	C_TX_INVAL_LEN,
+	C_TX_MM_LEN_ERR,
+	C_TX_UNDERRUN,
+	C_TX_FLOW_STALL,
+	C_TX_DROPPED,
+	C_TX_HDR_ERR,
+	C_TX_PKT,
+	C_TX_WORDS,
+	C_TX_WAIT,
+	C_TX_FLIT_VL,
+	C_TX_PKT_VL,
+	C_TX_WAIT_VL,
+	C_RX_PKT,
+	C_RX_WORDS,
+	C_SW_LINK_DOWN,
+	C_SW_LINK_UP,
+	C_SW_XMIT_DSCD,
+	C_SW_XMIT_DSCD_VL,
+	C_SW_XMIT_CSTR_ERR,
+	C_SW_RCV_CSTR_ERR,
+	C_SW_IBP_LOOP_PKTS,
+	C_SW_IBP_RC_RESENDS,
+	C_SW_IBP_RNR_NAKS,
+	C_SW_IBP_OTHER_NAKS,
+	C_SW_IBP_RC_TIMEOUTS,
+	C_SW_IBP_PKT_DROPS,
+	C_SW_IBP_DMA_WAIT,
+	C_SW_IBP_RC_SEQNAK,
+	C_SW_IBP_RC_DUPREQ,
+	C_SW_IBP_RDMA_SEQ,
+	C_SW_IBP_UNALIGNED,
+	C_SW_IBP_SEQ_NAK,
+	C_SW_CPU_RC_ACKS,
+	C_SW_CPU_RC_QACKS,
+	C_SW_CPU_RC_DELAYED_COMP,
+	C_RCV_HDR_OVF_0,
+	C_RCV_HDR_OVF_1,
+	C_RCV_HDR_OVF_2,
+	C_RCV_HDR_OVF_3,
+	C_RCV_HDR_OVF_4,
+	C_RCV_HDR_OVF_5,
+	C_RCV_HDR_OVF_6,
+	C_RCV_HDR_OVF_7,
+	C_RCV_HDR_OVF_8,
+	C_RCV_HDR_OVF_9,
+	C_RCV_HDR_OVF_10,
+	C_RCV_HDR_OVF_11,
+	C_RCV_HDR_OVF_12,
+	C_RCV_HDR_OVF_13,
+	C_RCV_HDR_OVF_14,
+	C_RCV_HDR_OVF_15,
+	C_RCV_HDR_OVF_16,
+	C_RCV_HDR_OVF_17,
+	C_RCV_HDR_OVF_18,
+	C_RCV_HDR_OVF_19,
+	C_RCV_HDR_OVF_20,
+	C_RCV_HDR_OVF_21,
+	C_RCV_HDR_OVF_22,
+	C_RCV_HDR_OVF_23,
+	C_RCV_HDR_OVF_24,
+	C_RCV_HDR_OVF_25,
+	C_RCV_HDR_OVF_26,
+	C_RCV_HDR_OVF_27,
+	C_RCV_HDR_OVF_28,
+	C_RCV_HDR_OVF_29,
+	C_RCV_HDR_OVF_30,
+	C_RCV_HDR_OVF_31,
+	C_RCV_HDR_OVF_32,
+	C_RCV_HDR_OVF_33,
+	C_RCV_HDR_OVF_34,
+	C_RCV_HDR_OVF_35,
+	C_RCV_HDR_OVF_36,
+	C_RCV_HDR_OVF_37,
+	C_RCV_HDR_OVF_38,
+	C_RCV_HDR_OVF_39,
+	C_RCV_HDR_OVF_40,
+	C_RCV_HDR_OVF_41,
+	C_RCV_HDR_OVF_42,
+	C_RCV_HDR_OVF_43,
+	C_RCV_HDR_OVF_44,
+	C_RCV_HDR_OVF_45,
+	C_RCV_HDR_OVF_46,
+	C_RCV_HDR_OVF_47,
+	C_RCV_HDR_OVF_48,
+	C_RCV_HDR_OVF_49,
+	C_RCV_HDR_OVF_50,
+	C_RCV_HDR_OVF_51,
+	C_RCV_HDR_OVF_52,
+	C_RCV_HDR_OVF_53,
+	C_RCV_HDR_OVF_54,
+	C_RCV_HDR_OVF_55,
+	C_RCV_HDR_OVF_56,
+	C_RCV_HDR_OVF_57,
+	C_RCV_HDR_OVF_58,
+	C_RCV_HDR_OVF_59,
+	C_RCV_HDR_OVF_60,
+	C_RCV_HDR_OVF_61,
+	C_RCV_HDR_OVF_62,
+	C_RCV_HDR_OVF_63,
+	C_RCV_HDR_OVF_64,
+	C_RCV_HDR_OVF_65,
+	C_RCV_HDR_OVF_66,
+	C_RCV_HDR_OVF_67,
+	C_RCV_HDR_OVF_68,
+	C_RCV_HDR_OVF_69,
+	C_RCV_HDR_OVF_70,
+	C_RCV_HDR_OVF_71,
+	C_RCV_HDR_OVF_72,
+	C_RCV_HDR_OVF_73,
+	C_RCV_HDR_OVF_74,
+	C_RCV_HDR_OVF_75,
+	C_RCV_HDR_OVF_76,
+	C_RCV_HDR_OVF_77,
+	C_RCV_HDR_OVF_78,
+	C_RCV_HDR_OVF_79,
+	C_RCV_HDR_OVF_80,
+	C_RCV_HDR_OVF_81,
+	C_RCV_HDR_OVF_82,
+	C_RCV_HDR_OVF_83,
+	C_RCV_HDR_OVF_84,
+	C_RCV_HDR_OVF_85,
+	C_RCV_HDR_OVF_86,
+	C_RCV_HDR_OVF_87,
+	C_RCV_HDR_OVF_88,
+	C_RCV_HDR_OVF_89,
+	C_RCV_HDR_OVF_90,
+	C_RCV_HDR_OVF_91,
+	C_RCV_HDR_OVF_92,
+	C_RCV_HDR_OVF_93,
+	C_RCV_HDR_OVF_94,
+	C_RCV_HDR_OVF_95,
+	C_RCV_HDR_OVF_96,
+	C_RCV_HDR_OVF_97,
+	C_RCV_HDR_OVF_98,
+	C_RCV_HDR_OVF_99,
+	C_RCV_HDR_OVF_100,
+	C_RCV_HDR_OVF_101,
+	C_RCV_HDR_OVF_102,
+	C_RCV_HDR_OVF_103,
+	C_RCV_HDR_OVF_104,
+	C_RCV_HDR_OVF_105,
+	C_RCV_HDR_OVF_106,
+	C_RCV_HDR_OVF_107,
+	C_RCV_HDR_OVF_108,
+	C_RCV_HDR_OVF_109,
+	C_RCV_HDR_OVF_110,
+	C_RCV_HDR_OVF_111,
+	C_RCV_HDR_OVF_112,
+	C_RCV_HDR_OVF_113,
+	C_RCV_HDR_OVF_114,
+	C_RCV_HDR_OVF_115,
+	C_RCV_HDR_OVF_116,
+	C_RCV_HDR_OVF_117,
+	C_RCV_HDR_OVF_118,
+	C_RCV_HDR_OVF_119,
+	C_RCV_HDR_OVF_120,
+	C_RCV_HDR_OVF_121,
+	C_RCV_HDR_OVF_122,
+	C_RCV_HDR_OVF_123,
+	C_RCV_HDR_OVF_124,
+	C_RCV_HDR_OVF_125,
+	C_RCV_HDR_OVF_126,
+	C_RCV_HDR_OVF_127,
+	C_RCV_HDR_OVF_128,
+	C_RCV_HDR_OVF_129,
+	C_RCV_HDR_OVF_130,
+	C_RCV_HDR_OVF_131,
+	C_RCV_HDR_OVF_132,
+	C_RCV_HDR_OVF_133,
+	C_RCV_HDR_OVF_134,
+	C_RCV_HDR_OVF_135,
+	C_RCV_HDR_OVF_136,
+	C_RCV_HDR_OVF_137,
+	C_RCV_HDR_OVF_138,
+	C_RCV_HDR_OVF_139,
+	C_RCV_HDR_OVF_140,
+	C_RCV_HDR_OVF_141,
+	C_RCV_HDR_OVF_142,
+	C_RCV_HDR_OVF_143,
+	C_RCV_HDR_OVF_144,
+	C_RCV_HDR_OVF_145,
+	C_RCV_HDR_OVF_146,
+	C_RCV_HDR_OVF_147,
+	C_RCV_HDR_OVF_148,
+	C_RCV_HDR_OVF_149,
+	C_RCV_HDR_OVF_150,
+	C_RCV_HDR_OVF_151,
+	C_RCV_HDR_OVF_152,
+	C_RCV_HDR_OVF_153,
+	C_RCV_HDR_OVF_154,
+	C_RCV_HDR_OVF_155,
+	C_RCV_HDR_OVF_156,
+	C_RCV_HDR_OVF_157,
+	C_RCV_HDR_OVF_158,
+	C_RCV_HDR_OVF_159,
+	PORT_CNTR_LAST /* Must be kept last */
+};
+
+u64 get_all_cpu_total(u64 __percpu *cntr);
+void hfi1_start_cleanup(struct hfi1_devdata *dd);
+void hfi1_clear_tids(struct hfi1_ctxtdata *rcd);
+struct hfi1_message_header *hfi1_get_msgheader(
+				struct hfi1_devdata *dd, __le32 *rhf_addr);
+int hfi1_get_base_kinfo(struct hfi1_ctxtdata *rcd,
+			struct hfi1_ctxt_info *kinfo);
+u64 hfi1_gpio_mod(struct hfi1_devdata *dd, u32 target, u32 data, u32 dir,
+		  u32 mask);
+int hfi1_init_ctxt(struct send_context *sc);
+void hfi1_put_tid(struct hfi1_devdata *dd, u32 index,
+		  u32 type, unsigned long pa, u16 order);
+void hfi1_quiet_serdes(struct hfi1_pportdata *ppd);
+void hfi1_rcvctrl(struct hfi1_devdata *dd, unsigned int op, int ctxt);
+u32 hfi1_read_cntrs(struct hfi1_devdata *dd, loff_t pos, char **namep,
+		    u64 **cntrp);
+u32 hfi1_read_portcntrs(struct hfi1_devdata *dd, loff_t pos, u32 port,
+			char **namep, u64 **cntrp);
+u8 hfi1_ibphys_portstate(struct hfi1_pportdata *ppd);
+int hfi1_get_ib_cfg(struct hfi1_pportdata *ppd, int which);
+int hfi1_set_ib_cfg(struct hfi1_pportdata *ppd, int which, u32 val);
+int hfi1_set_ctxt_jkey(struct hfi1_devdata *dd, unsigned ctxt, u16 jkey);
+int hfi1_clear_ctxt_jkey(struct hfi1_devdata *dd, unsigned ctxt);
+int hfi1_set_ctxt_pkey(struct hfi1_devdata *dd, unsigned ctxt, u16 pkey);
+int hfi1_clear_ctxt_pkey(struct hfi1_devdata *dd, unsigned ctxt);
+void hfi1_read_link_quality(struct hfi1_devdata *dd, u8 *link_quality);
+
+/*
+ * Interrupt source table.
+ *
+ * Each entry is an interrupt source "type".  It is ordered by increasing
+ * number.
+ */
+struct is_table {
+	int start;	 /* interrupt source type start */
+	int end;	 /* interrupt source type end */
+	/* routine that returns the name of the interrupt source */
+	char *(*is_name)(char *name, size_t size, unsigned int source);
+	/* routine to call when receiving an interrupt */
+	void (*is_int)(struct hfi1_devdata *dd, unsigned int source);
+};
+
+#endif /* _CHIP_H */
diff --git a/drivers/staging/rdma/hfi1/chip_registers.h b/drivers/staging/rdma/hfi1/chip_registers.h
new file mode 100644
index 000000000000..6521030018d8
--- /dev/null
+++ b/drivers/staging/rdma/hfi1/chip_registers.h
@@ -0,0 +1,1289 @@
+#ifndef DEF_CHIP_REG
+#define DEF_CHIP_REG
+
+/*
+ *
+ * This file is provided under a dual BSD/GPLv2 license.  When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * GPL LICENSE SUMMARY
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * BSD LICENSE
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ *  - Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ *  - Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in
+ *    the documentation and/or other materials provided with the
+ *    distribution.
+ *  - Neither the name of Intel Corporation nor the names of its
+ *    contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ */
+
+#define CORE		0x000000000000
+#define CCE			(CORE + 0x000000000000)
+#define ASIC		(CORE + 0x000000400000)
+#define MISC		(CORE + 0x000000500000)
+#define DC_TOP_CSRS		(CORE + 0x000000600000)
+#define CHIP_DEBUG		(CORE + 0x000000700000)
+#define RXE			(CORE + 0x000001000000)
+#define TXE			(CORE + 0x000001800000)
+#define DCC_CSRS		(DC_TOP_CSRS + 0x000000000000)
+#define DC_LCB_CSRS		(DC_TOP_CSRS + 0x000000001000)
+#define DC_8051_CSRS		(DC_TOP_CSRS + 0x000000002000)
+#define PCIE		0
+
+#define ASIC_NUM_SCRATCH 4
+#define CCE_ERR_INT_CNT 0
+#define CCE_MISC_INT_CNT 2
+#define CCE_NUM_32_BIT_COUNTERS 3
+#define CCE_NUM_32_BIT_INT_COUNTERS 6
+#define CCE_NUM_INT_CSRS 12
+#define CCE_NUM_INT_MAP_CSRS 96
+#define CCE_NUM_MSIX_PBAS 4
+#define CCE_NUM_MSIX_VECTORS 256
+#define CCE_NUM_SCRATCH 4
+#define CCE_PCIE_POSTED_CRDT_STALL_CNT 2
+#define CCE_PCIE_TRGT_STALL_CNT 0
+#define CCE_PIO_WR_STALL_CNT 1
+#define CCE_RCV_AVAIL_INT_CNT 3
+#define CCE_RCV_URGENT_INT_CNT 4
+#define CCE_SDMA_INT_CNT 1
+#define CCE_SEND_CREDIT_INT_CNT 5
+#define DCC_CFG_LED_CNTRL (DCC_CSRS + 0x000000000040)
+#define DCC_CFG_LED_CNTRL_LED_CNTRL_SMASK 0x10ull
+#define DCC_CFG_LED_CNTRL_LED_SW_BLINK_RATE_SHIFT 0
+#define DCC_CFG_LED_CNTRL_LED_SW_BLINK_RATE_SMASK 0xFull
+#define DCC_CFG_PORT_CONFIG (DCC_CSRS + 0x000000000008)
+#define DCC_CFG_PORT_CONFIG1 (DCC_CSRS + 0x000000000010)
+#define DCC_CFG_PORT_CONFIG1_DLID_MASK_MASK 0xFFFFull
+#define DCC_CFG_PORT_CONFIG1_DLID_MASK_SHIFT 16
+#define DCC_CFG_PORT_CONFIG1_DLID_MASK_SMASK 0xFFFF0000ull
+#define DCC_CFG_PORT_CONFIG1_TARGET_DLID_MASK 0xFFFFull
+#define DCC_CFG_PORT_CONFIG1_TARGET_DLID_SHIFT 0
+#define DCC_CFG_PORT_CONFIG1_TARGET_DLID_SMASK 0xFFFFull
+#define DCC_CFG_PORT_CONFIG_LINK_STATE_MASK 0x7ull
+#define DCC_CFG_PORT_CONFIG_LINK_STATE_SHIFT 48
+#define DCC_CFG_PORT_CONFIG_LINK_STATE_SMASK 0x7000000000000ull
+#define DCC_CFG_PORT_CONFIG_MTU_CAP_MASK 0x7ull
+#define DCC_CFG_PORT_CONFIG_MTU_CAP_SHIFT 32
+#define DCC_CFG_PORT_CONFIG_MTU_CAP_SMASK 0x700000000ull
+#define DCC_CFG_RESET (DCC_CSRS + 0x000000000000)
+#define DCC_CFG_RESET_RESET_LCB_SHIFT 0
+#define DCC_CFG_RESET_RESET_RX_FPE_SHIFT 2
+#define DCC_CFG_SC_VL_TABLE_15_0 (DCC_CSRS + 0x000000000028)
+#define DCC_CFG_SC_VL_TABLE_15_0_ENTRY0_SHIFT 0
+#define DCC_CFG_SC_VL_TABLE_15_0_ENTRY10_SHIFT 40
+#define DCC_CFG_SC_VL_TABLE_15_0_ENTRY11_SHIFT 44
+#define DCC_CFG_SC_VL_TABLE_15_0_ENTRY12_SHIFT 48
+#define DCC_CFG_SC_VL_TABLE_15_0_ENTRY13_SHIFT 52
+#define DCC_CFG_SC_VL_TABLE_15_0_ENTRY14_SHIFT 56
+#define DCC_CFG_SC_VL_TABLE_15_0_ENTRY15_SHIFT 60
+#define DCC_CFG_SC_VL_TABLE_15_0_ENTRY1_SHIFT 4
+#define DCC_CFG_SC_VL_TABLE_15_0_ENTRY2_SHIFT 8
+#define DCC_CFG_SC_VL_TABLE_15_0_ENTRY3_SHIFT 12
+#define DCC_CFG_SC_VL_TABLE_15_0_ENTRY4_SHIFT 16
+#define DCC_CFG_SC_VL_TABLE_15_0_ENTRY5_SHIFT 20
+#define DCC_CFG_SC_VL_TABLE_15_0_ENTRY6_SHIFT 24
+#define DCC_CFG_SC_VL_TABLE_15_0_ENTRY7_SHIFT 28
+#define DCC_CFG_SC_VL_TABLE_15_0_ENTRY8_SHIFT 32
+#define DCC_CFG_SC_VL_TABLE_15_0_ENTRY9_SHIFT 36
+#define DCC_CFG_SC_VL_TABLE_31_16 (DCC_CSRS + 0x000000000030)
+#define DCC_CFG_SC_VL_TABLE_31_16_ENTRY16_SHIFT 0
+#define DCC_CFG_SC_VL_TABLE_31_16_ENTRY17_SHIFT 4
+#define DCC_CFG_SC_VL_TABLE_31_16_ENTRY18_SHIFT 8
+#define DCC_CFG_SC_VL_TABLE_31_16_ENTRY19_SHIFT 12
+#define DCC_CFG_SC_VL_TABLE_31_16_ENTRY20_SHIFT 16
+#define DCC_CFG_SC_VL_TABLE_31_16_ENTRY21_SHIFT 20
+#define DCC_CFG_SC_VL_TABLE_31_16_ENTRY22_SHIFT 24
+#define DCC_CFG_SC_VL_TABLE_31_16_ENTRY23_SHIFT 28
+#define DCC_CFG_SC_VL_TABLE_31_16_ENTRY24_SHIFT 32
+#define DCC_CFG_SC_VL_TABLE_31_16_ENTRY25_SHIFT 36
+#define DCC_CFG_SC_VL_TABLE_31_16_ENTRY26_SHIFT 40
+#define DCC_CFG_SC_VL_TABLE_31_16_ENTRY27_SHIFT 44
+#define DCC_CFG_SC_VL_TABLE_31_16_ENTRY28_SHIFT 48
+#define DCC_CFG_SC_VL_TABLE_31_16_ENTRY29_SHIFT 52
+#define DCC_CFG_SC_VL_TABLE_31_16_ENTRY30_SHIFT 56
+#define DCC_CFG_SC_VL_TABLE_31_16_ENTRY31_SHIFT 60
+#define DCC_ERR_DROPPED_PKT_CNT (DCC_CSRS + 0x000000000120)
+#define DCC_ERR_FLG (DCC_CSRS + 0x000000000050)
+#define DCC_ERR_FLG_BAD_CRDT_ACK_ERR_SMASK 0x4000ull
+#define DCC_ERR_FLG_BAD_CTRL_DIST_ERR_SMASK 0x200000ull
+#define DCC_ERR_FLG_BAD_CTRL_FLIT_ERR_SMASK 0x10000ull
+#define DCC_ERR_FLG_BAD_DLID_TARGET_ERR_SMASK 0x200ull
+#define DCC_ERR_FLG_BAD_HEAD_DIST_ERR_SMASK 0x800000ull
+#define DCC_ERR_FLG_BAD_L2_ERR_SMASK 0x2ull
+#define DCC_ERR_FLG_BAD_LVER_ERR_SMASK 0x400ull
+#define DCC_ERR_FLG_BAD_MID_TAIL_ERR_SMASK 0x8ull
+#define DCC_ERR_FLG_BAD_PKT_LENGTH_ERR_SMASK 0x4000000ull
+#define DCC_ERR_FLG_BAD_PREEMPTION_ERR_SMASK 0x10ull
+#define DCC_ERR_FLG_BAD_SC_ERR_SMASK 0x4ull
+#define DCC_ERR_FLG_BAD_TAIL_DIST_ERR_SMASK 0x400000ull
+#define DCC_ERR_FLG_BAD_VL_MARKER_ERR_SMASK 0x80ull
+#define DCC_ERR_FLG_CLR (DCC_CSRS + 0x000000000060)
+#define DCC_ERR_FLG_CSR_ACCESS_BLOCKED_HOST_SMASK 0x8000000000ull
+#define DCC_ERR_FLG_CSR_ACCESS_BLOCKED_UC_SMASK 0x10000000000ull
+#define DCC_ERR_FLG_CSR_INVAL_ADDR_SMASK 0x400000000000ull
+#define DCC_ERR_FLG_CSR_PARITY_ERR_SMASK 0x200000000000ull
+#define DCC_ERR_FLG_DLID_ZERO_ERR_SMASK 0x40000000ull
+#define DCC_ERR_FLG_EN (DCC_CSRS + 0x000000000058)
+#define DCC_ERR_FLG_EN_CSR_ACCESS_BLOCKED_HOST_SMASK 0x8000000000ull
+#define DCC_ERR_FLG_EN_CSR_ACCESS_BLOCKED_UC_SMASK 0x10000000000ull
+#define DCC_ERR_FLG_EVENT_CNTR_PARITY_ERR_SMASK 0x20000ull
+#define DCC_ERR_FLG_EVENT_CNTR_ROLLOVER_ERR_SMASK 0x40000ull
+#define DCC_ERR_FLG_FMCONFIG_ERR_SMASK 0x40000000000000ull
+#define DCC_ERR_FLG_FPE_TX_FIFO_OVFLW_ERR_SMASK 0x2000000000ull
+#define DCC_ERR_FLG_FPE_TX_FIFO_UNFLW_ERR_SMASK 0x4000000000ull
+#define DCC_ERR_FLG_LATE_EBP_ERR_SMASK 0x1000000000ull
+#define DCC_ERR_FLG_LATE_LONG_ERR_SMASK 0x800000000ull
+#define DCC_ERR_FLG_LATE_SHORT_ERR_SMASK 0x400000000ull
+#define DCC_ERR_FLG_LENGTH_MTU_ERR_SMASK 0x80000000ull
+#define DCC_ERR_FLG_LINK_ERR_SMASK 0x80000ull
+#define DCC_ERR_FLG_MISC_CNTR_ROLLOVER_ERR_SMASK 0x100000ull
+#define DCC_ERR_FLG_NONVL15_STATE_ERR_SMASK 0x1000000ull
+#define DCC_ERR_FLG_PERM_NVL15_ERR_SMASK 0x10000000ull
+#define DCC_ERR_FLG_PREEMPTION_ERR_SMASK 0x20ull
+#define DCC_ERR_FLG_PREEMPTIONVL15_ERR_SMASK 0x40ull
+#define DCC_ERR_FLG_RCVPORT_ERR_SMASK 0x80000000000000ull
+#define DCC_ERR_FLG_RX_BYTE_SHFT_PARITY_ERR_SMASK 0x1000000000000ull
+#define DCC_ERR_FLG_RX_CTRL_PARITY_MBE_ERR_SMASK 0x100000000000ull
+#define DCC_ERR_FLG_RX_EARLY_DROP_ERR_SMASK 0x200000000ull
+#define DCC_ERR_FLG_SLID_ZERO_ERR_SMASK 0x20000000ull
+#define DCC_ERR_FLG_TX_BYTE_SHFT_PARITY_ERR_SMASK 0x800000000000ull
+#define DCC_ERR_FLG_TX_CTRL_PARITY_ERR_SMASK 0x20000000000ull
+#define DCC_ERR_FLG_TX_CTRL_PARITY_MBE_ERR_SMASK 0x40000000000ull
+#define DCC_ERR_FLG_TX_SC_PARITY_ERR_SMASK 0x80000000000ull
+#define DCC_ERR_FLG_UNCORRECTABLE_ERR_SMASK 0x2000ull
+#define DCC_ERR_FLG_UNSUP_PKT_TYPE_SMASK 0x8000ull
+#define DCC_ERR_FLG_UNSUP_VL_ERR_SMASK 0x8000000ull
+#define DCC_ERR_FLG_VL15_MULTI_ERR_SMASK 0x2000000ull
+#define DCC_ERR_FMCONFIG_ERR_CNT (DCC_CSRS + 0x000000000110)
+#define DCC_ERR_INFO_FMCONFIG (DCC_CSRS + 0x000000000090)
+#define DCC_ERR_INFO_PORTRCV (DCC_CSRS + 0x000000000078)
+#define DCC_ERR_INFO_PORTRCV_HDR0 (DCC_CSRS + 0x000000000080)
+#define DCC_ERR_INFO_PORTRCV_HDR1 (DCC_CSRS + 0x000000000088)
+#define DCC_ERR_INFO_UNCORRECTABLE (DCC_CSRS + 0x000000000098)
+#define DCC_ERR_PORTRCV_ERR_CNT (DCC_CSRS + 0x000000000108)
+#define DCC_ERR_RCVREMOTE_PHY_ERR_CNT (DCC_CSRS + 0x000000000118)
+#define DCC_ERR_UNCORRECTABLE_CNT (DCC_CSRS + 0x000000000100)
+#define DCC_PRF_PORT_MARK_FECN_CNT (DCC_CSRS + 0x000000000330)
+#define DCC_PRF_PORT_RCV_BECN_CNT (DCC_CSRS + 0x000000000290)
+#define DCC_PRF_PORT_RCV_BUBBLE_CNT (DCC_CSRS + 0x0000000002E0)
+#define DCC_PRF_PORT_RCV_CORRECTABLE_CNT (DCC_CSRS + 0x000000000140)
+#define DCC_PRF_PORT_RCV_DATA_CNT (DCC_CSRS + 0x000000000198)
+#define DCC_PRF_PORT_RCV_FECN_CNT (DCC_CSRS + 0x000000000240)
+#define DCC_PRF_PORT_RCV_MULTICAST_PKT_CNT (DCC_CSRS + 0x000000000130)
+#define DCC_PRF_PORT_RCV_PKTS_CNT (DCC_CSRS + 0x0000000001A8)
+#define DCC_PRF_PORT_VL_MARK_FECN_CNT (DCC_CSRS + 0x000000000338)
+#define DCC_PRF_PORT_VL_RCV_BECN_CNT (DCC_CSRS + 0x000000000298)
+#define DCC_PRF_PORT_VL_RCV_BUBBLE_CNT (DCC_CSRS + 0x0000000002E8)
+#define DCC_PRF_PORT_VL_RCV_DATA_CNT (DCC_CSRS + 0x0000000001B0)
+#define DCC_PRF_PORT_VL_RCV_FECN_CNT (DCC_CSRS + 0x000000000248)
+#define DCC_PRF_PORT_VL_RCV_PKTS_CNT (DCC_CSRS + 0x0000000001F8)
+#define DCC_PRF_PORT_XMIT_CORRECTABLE_CNT (DCC_CSRS + 0x000000000138)
+#define DCC_PRF_PORT_XMIT_DATA_CNT (DCC_CSRS + 0x000000000190)
+#define DCC_PRF_PORT_XMIT_MULTICAST_CNT (DCC_CSRS + 0x000000000128)
+#define DCC_PRF_PORT_XMIT_PKTS_CNT (DCC_CSRS + 0x0000000001A0)
+#define DCC_PRF_RX_FLOW_CRTL_CNT (DCC_CSRS + 0x000000000180)
+#define DCC_PRF_TX_FLOW_CRTL_CNT (DCC_CSRS + 0x000000000188)
+#define DC_DC8051_CFG_CSR_ACCESS_SEL (DC_8051_CSRS + 0x000000000110)
+#define DC_DC8051_CFG_CSR_ACCESS_SEL_DCC_SMASK 0x2ull
+#define DC_DC8051_CFG_CSR_ACCESS_SEL_LCB_SMASK 0x1ull
+#define DC_DC8051_CFG_EXT_DEV_0 (DC_8051_CSRS + 0x000000000118)
+#define DC_DC8051_CFG_EXT_DEV_0_COMPLETED_SMASK 0x1ull
+#define DC_DC8051_CFG_EXT_DEV_0_RETURN_CODE_SHIFT 8
+#define DC_DC8051_CFG_EXT_DEV_0_RSP_DATA_SHIFT 16
+#define DC_DC8051_CFG_EXT_DEV_1 (DC_8051_CSRS + 0x000000000120)
+#define DC_DC8051_CFG_EXT_DEV_1_REQ_DATA_MASK 0xFFFFull
+#define DC_DC8051_CFG_EXT_DEV_1_REQ_DATA_SHIFT 16
+#define DC_DC8051_CFG_EXT_DEV_1_REQ_DATA_SMASK 0xFFFF0000ull
+#define DC_DC8051_CFG_EXT_DEV_1_REQ_NEW_SMASK 0x1ull
+#define DC_DC8051_CFG_EXT_DEV_1_REQ_TYPE_MASK 0xFFull
+#define DC_DC8051_CFG_EXT_DEV_1_REQ_TYPE_SHIFT 8
+#define DC_DC8051_CFG_HOST_CMD_0 (DC_8051_CSRS + 0x000000000028)
+#define DC_DC8051_CFG_HOST_CMD_0_REQ_DATA_MASK 0xFFFFFFFFFFFFull
+#define DC_DC8051_CFG_HOST_CMD_0_REQ_DATA_SHIFT 16
+#define DC_DC8051_CFG_HOST_CMD_0_REQ_NEW_SMASK 0x1ull
+#define DC_DC8051_CFG_HOST_CMD_0_REQ_TYPE_MASK 0xFFull
+#define DC_DC8051_CFG_HOST_CMD_0_REQ_TYPE_SHIFT 8
+#define DC_DC8051_CFG_HOST_CMD_1 (DC_8051_CSRS + 0x000000000030)
+#define DC_DC8051_CFG_HOST_CMD_1_COMPLETED_SMASK 0x1ull
+#define DC_DC8051_CFG_HOST_CMD_1_RETURN_CODE_MASK 0xFFull
+#define DC_DC8051_CFG_HOST_CMD_1_RETURN_CODE_SHIFT 8
+#define DC_DC8051_CFG_HOST_CMD_1_RSP_DATA_MASK 0xFFFFFFFFFFFFull
+#define DC_DC8051_CFG_HOST_CMD_1_RSP_DATA_SHIFT 16
+#define DC_DC8051_CFG_LOCAL_GUID (DC_8051_CSRS + 0x000000000038)
+#define DC_DC8051_CFG_MODE (DC_8051_CSRS + 0x000000000070)
+#define DC_DC8051_CFG_RAM_ACCESS_CTRL (DC_8051_CSRS + 0x000000000008)
+#define DC_DC8051_CFG_RAM_ACCESS_CTRL_ADDRESS_MASK 0x7FFFull
+#define DC_DC8051_CFG_RAM_ACCESS_CTRL_ADDRESS_SHIFT 0
+#define DC_DC8051_CFG_RAM_ACCESS_CTRL_WRITE_ENA_SMASK 0x1000000ull
+#define DC_DC8051_CFG_RAM_ACCESS_CTRL_READ_ENA_SMASK 0x10000ull
+#define DC_DC8051_CFG_RAM_ACCESS_SETUP (DC_8051_CSRS + 0x000000000000)
+#define DC_DC8051_CFG_RAM_ACCESS_SETUP_AUTO_INCR_ADDR_SMASK 0x100ull
+#define DC_DC8051_CFG_RAM_ACCESS_SETUP_RAM_SEL_SMASK 0x1ull
+#define DC_DC8051_CFG_RAM_ACCESS_STATUS (DC_8051_CSRS + 0x000000000018)
+#define DC_DC8051_CFG_RAM_ACCESS_STATUS_ACCESS_COMPLETED_SMASK 0x10000ull
+#define DC_DC8051_CFG_RAM_ACCESS_WR_DATA (DC_8051_CSRS + 0x000000000010)
+#define DC_DC8051_CFG_RAM_ACCESS_RD_DATA (DC_8051_CSRS + 0x000000000020)
+#define DC_DC8051_CFG_RST (DC_8051_CSRS + 0x000000000068)
+#define DC_DC8051_CFG_RST_CRAM_SMASK 0x2ull
+#define DC_DC8051_CFG_RST_DRAM_SMASK 0x4ull
+#define DC_DC8051_CFG_RST_IRAM_SMASK 0x8ull
+#define DC_DC8051_CFG_RST_M8051W_SMASK 0x1ull
+#define DC_DC8051_CFG_RST_SFR_SMASK 0x10ull
+#define DC_DC8051_DBG_ERR_INFO_SET_BY_8051 (DC_8051_CSRS + 0x0000000000D8)
+#define DC_DC8051_DBG_ERR_INFO_SET_BY_8051_ERROR_MASK 0xFFFFFFFFull
+#define DC_DC8051_DBG_ERR_INFO_SET_BY_8051_ERROR_SHIFT 16
+#define DC_DC8051_DBG_ERR_INFO_SET_BY_8051_HOST_MSG_MASK 0xFFFFull
+#define DC_DC8051_DBG_ERR_INFO_SET_BY_8051_HOST_MSG_SHIFT 0
+#define DC_DC8051_ERR_CLR (DC_8051_CSRS + 0x0000000000E8)
+#define DC_DC8051_ERR_EN (DC_8051_CSRS + 0x0000000000F0)
+#define DC_DC8051_ERR_EN_LOST_8051_HEART_BEAT_SMASK 0x2ull
+#define DC_DC8051_ERR_FLG (DC_8051_CSRS + 0x0000000000E0)
+#define DC_DC8051_ERR_FLG_CRAM_MBE_SMASK 0x4ull
+#define DC_DC8051_ERR_FLG_CRAM_SBE_SMASK 0x8ull
+#define DC_DC8051_ERR_FLG_DRAM_MBE_SMASK 0x10ull
+#define DC_DC8051_ERR_FLG_DRAM_SBE_SMASK 0x20ull
+#define DC_DC8051_ERR_FLG_INVALID_CSR_ADDR_SMASK 0x400ull
+#define DC_DC8051_ERR_FLG_IRAM_MBE_SMASK 0x40ull
+#define DC_DC8051_ERR_FLG_IRAM_SBE_SMASK 0x80ull
+#define DC_DC8051_ERR_FLG_LOST_8051_HEART_BEAT_SMASK 0x2ull
+#define DC_DC8051_ERR_FLG_SET_BY_8051_SMASK 0x1ull
+#define DC_DC8051_ERR_FLG_UNMATCHED_SECURE_MSG_ACROSS_BCC_LANES_SMASK 0x100ull
+#define DC_DC8051_STS_CUR_STATE (DC_8051_CSRS + 0x000000000060)
+#define DC_DC8051_STS_CUR_STATE_FIRMWARE_MASK 0xFFull
+#define DC_DC8051_STS_CUR_STATE_FIRMWARE_SHIFT 16
+#define DC_DC8051_STS_CUR_STATE_PORT_MASK 0xFFull
+#define DC_DC8051_STS_CUR_STATE_PORT_SHIFT 0
+#define DC_DC8051_STS_LOCAL_FM_SECURITY (DC_8051_CSRS + 0x000000000050)
+#define DC_DC8051_STS_LOCAL_FM_SECURITY_DISABLED_MASK 0x1ull
+#define DC_DC8051_STS_REMOTE_FM_SECURITY (DC_8051_CSRS + 0x000000000058)
+#define DC_DC8051_STS_REMOTE_GUID (DC_8051_CSRS + 0x000000000040)
+#define DC_DC8051_STS_REMOTE_NODE_TYPE (DC_8051_CSRS + 0x000000000048)
+#define DC_DC8051_STS_REMOTE_NODE_TYPE_VAL_MASK 0x3ull
+#define DC_DC8051_STS_REMOTE_PORT_NO (DC_8051_CSRS + 0x000000000130)
+#define DC_DC8051_STS_REMOTE_PORT_NO_VAL_SMASK 0xFFull
+#define DC_LCB_CFG_ALLOW_LINK_UP (DC_LCB_CSRS + 0x000000000128)
+#define DC_LCB_CFG_ALLOW_LINK_UP_VAL_SHIFT 0
+#define DC_LCB_CFG_CRC_MODE (DC_LCB_CSRS + 0x000000000058)
+#define DC_LCB_CFG_CRC_MODE_TX_VAL_SHIFT 0
+#define DC_LCB_CFG_IGNORE_LOST_RCLK (DC_LCB_CSRS + 0x000000000020)
+#define DC_LCB_CFG_IGNORE_LOST_RCLK_EN_SMASK 0x1ull
+#define DC_LCB_CFG_LANE_WIDTH (DC_LCB_CSRS + 0x000000000100)
+#define DC_LCB_CFG_LINK_KILL_EN (DC_LCB_CSRS + 0x000000000120)
+#define DC_LCB_CFG_LINK_KILL_EN_FLIT_INPUT_BUF_MBE_SMASK 0x100000ull
+#define DC_LCB_CFG_LINK_KILL_EN_REPLAY_BUF_MBE_SMASK 0x400000ull
+#define DC_LCB_CFG_LN_DCLK (DC_LCB_CSRS + 0x000000000060)
+#define DC_LCB_CFG_LOOPBACK (DC_LCB_CSRS + 0x0000000000F8)
+#define DC_LCB_CFG_LOOPBACK_VAL_SHIFT 0
+#define DC_LCB_CFG_RUN (DC_LCB_CSRS + 0x000000000000)
+#define DC_LCB_CFG_RUN_EN_SHIFT 0
+#define DC_LCB_CFG_RX_FIFOS_RADR (DC_LCB_CSRS + 0x000000000018)
+#define DC_LCB_CFG_RX_FIFOS_RADR_DO_NOT_JUMP_VAL_SHIFT 8
+#define DC_LCB_CFG_RX_FIFOS_RADR_OK_TO_JUMP_VAL_SHIFT 4
+#define DC_LCB_CFG_RX_FIFOS_RADR_RST_VAL_SHIFT 0
+#define DC_LCB_CFG_TX_FIFOS_RADR (DC_LCB_CSRS + 0x000000000010)
+#define DC_LCB_CFG_TX_FIFOS_RADR_RST_VAL_SHIFT 0
+#define DC_LCB_CFG_TX_FIFOS_RESET (DC_LCB_CSRS + 0x000000000008)
+#define DC_LCB_CFG_TX_FIFOS_RESET_VAL_SHIFT 0
+#define DC_LCB_ERR_CLR (DC_LCB_CSRS + 0x000000000308)
+#define DC_LCB_ERR_EN (DC_LCB_CSRS + 0x000000000310)
+#define DC_LCB_ERR_FLG (DC_LCB_CSRS + 0x000000000300)
+#define DC_LCB_ERR_FLG_REDUNDANT_FLIT_PARITY_ERR_SMASK 0x20000000ull
+#define DC_LCB_ERR_FLG_NEG_EDGE_LINK_TRANSFER_ACTIVE_SMASK 0x10000000ull
+#define DC_LCB_ERR_FLG_HOLD_REINIT_SMASK 0x8000000ull
+#define DC_LCB_ERR_FLG_RST_FOR_INCOMPLT_RND_TRIP_SMASK 0x4000000ull
+#define DC_LCB_ERR_FLG_RST_FOR_LINK_TIMEOUT_SMASK 0x2000000ull
+#define DC_LCB_ERR_FLG_CREDIT_RETURN_FLIT_MBE_SMASK 0x1000000ull
+#define DC_LCB_ERR_FLG_REPLAY_BUF_SBE_SMASK 0x800000ull
+#define DC_LCB_ERR_FLG_REPLAY_BUF_MBE_SMASK 0x400000ull
+#define DC_LCB_ERR_FLG_FLIT_INPUT_BUF_SBE_SMASK 0x200000ull
+#define DC_LCB_ERR_FLG_FLIT_INPUT_BUF_MBE_SMASK 0x100000ull
+#define DC_LCB_ERR_FLG_VL_ACK_INPUT_WRONG_CRC_MODE_SMASK 0x80000ull
+#define DC_LCB_ERR_FLG_VL_ACK_INPUT_PARITY_ERR_SMASK 0x40000ull
+#define DC_LCB_ERR_FLG_VL_ACK_INPUT_BUF_OFLW_SMASK 0x20000ull
+#define DC_LCB_ERR_FLG_FLIT_INPUT_BUF_OFLW_SMASK 0x10000ull
+#define DC_LCB_ERR_FLG_ILLEGAL_FLIT_ENCODING_SMASK 0x8000ull
+#define DC_LCB_ERR_FLG_ILLEGAL_NULL_LTP_SMASK 0x4000ull
+#define DC_LCB_ERR_FLG_UNEXPECTED_ROUND_TRIP_MARKER_SMASK 0x2000ull
+#define DC_LCB_ERR_FLG_UNEXPECTED_REPLAY_MARKER_SMASK 0x1000ull
+#define DC_LCB_ERR_FLG_RCLK_STOPPED_SMASK 0x800ull
+#define DC_LCB_ERR_FLG_CRC_ERR_CNT_HIT_LIMIT_SMASK 0x400ull
+#define DC_LCB_ERR_FLG_REINIT_FOR_LN_DEGRADE_SMASK 0x200ull
+#define DC_LCB_ERR_FLG_REINIT_FROM_PEER_SMASK 0x100ull
+#define DC_LCB_ERR_FLG_SEQ_CRC_ERR_SMASK 0x80ull
+#define DC_LCB_ERR_FLG_RX_LESS_THAN_FOUR_LNS_SMASK 0x40ull
+#define DC_LCB_ERR_FLG_TX_LESS_THAN_FOUR_LNS_SMASK 0x20ull
+#define DC_LCB_ERR_FLG_LOST_REINIT_STALL_OR_TOS_SMASK 0x10ull
+#define DC_LCB_ERR_FLG_ALL_LNS_FAILED_REINIT_TEST_SMASK 0x8ull
+#define DC_LCB_ERR_FLG_RST_FOR_FAILED_DESKEW_SMASK 0x4ull
+#define DC_LCB_ERR_FLG_INVALID_CSR_ADDR_SMASK 0x2ull
+#define DC_LCB_ERR_FLG_CSR_PARITY_ERR_SMASK 0x1ull
+#define DC_LCB_ERR_INFO_CRC_ERR_LN0 (DC_LCB_CSRS + 0x000000000328)
+#define DC_LCB_ERR_INFO_CRC_ERR_LN1 (DC_LCB_CSRS + 0x000000000330)
+#define DC_LCB_ERR_INFO_CRC_ERR_LN2 (DC_LCB_CSRS + 0x000000000338)
+#define DC_LCB_ERR_INFO_CRC_ERR_LN3 (DC_LCB_CSRS + 0x000000000340)
+#define DC_LCB_ERR_INFO_CRC_ERR_MULTI_LN (DC_LCB_CSRS + 0x000000000348)
+#define DC_LCB_ERR_INFO_ESCAPE_0_ONLY_CNT (DC_LCB_CSRS + 0x000000000368)
+#define DC_LCB_ERR_INFO_ESCAPE_0_PLUS1_CNT (DC_LCB_CSRS + 0x000000000370)
+#define DC_LCB_ERR_INFO_ESCAPE_0_PLUS2_CNT (DC_LCB_CSRS + 0x000000000378)
+#define DC_LCB_ERR_INFO_MISC_FLG_CNT (DC_LCB_CSRS + 0x000000000390)
+#define DC_LCB_ERR_INFO_REINIT_FROM_PEER_CNT (DC_LCB_CSRS + 0x000000000380)
+#define DC_LCB_ERR_INFO_RX_REPLAY_CNT (DC_LCB_CSRS + 0x000000000358)
+#define DC_LCB_ERR_INFO_SBE_CNT (DC_LCB_CSRS + 0x000000000388)
+#define DC_LCB_ERR_INFO_SEQ_CRC_CNT (DC_LCB_CSRS + 0x000000000360)
+#define DC_LCB_ERR_INFO_TOTAL_CRC_ERR (DC_LCB_CSRS + 0x000000000320)
+#define DC_LCB_ERR_INFO_TX_REPLAY_CNT (DC_LCB_CSRS + 0x000000000350)
+#define DC_LCB_PG_DBG_FLIT_CRDTS_CNT (DC_LCB_CSRS + 0x000000000580)
+#define DC_LCB_PG_STS_PAUSE_COMPLETE_CNT (DC_LCB_CSRS + 0x0000000005F8)
+#define DC_LCB_PG_STS_TX_MBE_CNT (DC_LCB_CSRS + 0x000000000608)
+#define DC_LCB_PG_STS_TX_SBE_CNT (DC_LCB_CSRS + 0x000000000600)
+#define DC_LCB_PRF_ACCEPTED_LTP_CNT (DC_LCB_CSRS + 0x000000000408)
+#define DC_LCB_PRF_CLK_CNTR (DC_LCB_CSRS + 0x000000000420)
+#define DC_LCB_PRF_GOOD_LTP_CNT (DC_LCB_CSRS + 0x000000000400)
+#define DC_LCB_PRF_RX_FLIT_CNT (DC_LCB_CSRS + 0x000000000410)
+#define DC_LCB_PRF_TX_FLIT_CNT (DC_LCB_CSRS + 0x000000000418)
+#define DC_LCB_STS_LINK_TRANSFER_ACTIVE (DC_LCB_CSRS + 0x000000000468)
+#define DC_LCB_STS_ROUND_TRIP_LTP_CNT (DC_LCB_CSRS + 0x0000000004B0)
+#define RCV_BUF_OVFL_CNT 10
+#define RCV_CONTEXT_EGR_STALL 22
+#define RCV_CONTEXT_RHQ_STALL 21
+#define RCV_DATA_PKT_CNT 0
+#define RCV_DWORD_CNT 1
+#define RCV_TID_FLOW_GEN_MISMATCH_CNT 20
+#define RCV_TID_FLOW_SEQ_MISMATCH_CNT 23
+#define RCV_TID_FULL_ERR_CNT 18
+#define RCV_TID_VALID_ERR_CNT 19
+#define RXE_NUM_32_BIT_COUNTERS 24
+#define RXE_NUM_64_BIT_COUNTERS 2
+#define RXE_NUM_RSM_INSTANCES 4
+#define RXE_NUM_TID_FLOWS 32
+#define RXE_PER_CONTEXT_OFFSET 0x0300000
+#define SEND_DATA_PKT_CNT 0
+#define SEND_DATA_PKT_VL0_CNT 12
+#define SEND_DATA_VL0_CNT 3
+#define SEND_DROPPED_PKT_CNT 5
+#define SEND_DWORD_CNT 1
+#define SEND_FLOW_STALL_CNT 4
+#define SEND_HEADERS_ERR_CNT 6
+#define SEND_LEN_ERR_CNT 1
+#define SEND_MAX_MIN_LEN_ERR_CNT 2
+#define SEND_UNDERRUN_CNT 3
+#define SEND_UNSUP_VL_ERR_CNT 0
+#define SEND_WAIT_CNT 2
+#define SEND_WAIT_VL0_CNT 21
+#define TXE_PIO_SEND_OFFSET 0x0800000
+#define ASIC_CFG_DRV_STR (ASIC + 0x000000000048)
+#define ASIC_CFG_MUTEX (ASIC + 0x000000000040)
+#define ASIC_CFG_SBUS_EXECUTE (ASIC + 0x000000000008)
+#define ASIC_CFG_SBUS_EXECUTE_EXECUTE_SMASK 0x1ull
+#define ASIC_CFG_SBUS_EXECUTE_FAST_MODE_SMASK 0x2ull
+#define ASIC_CFG_SBUS_REQUEST (ASIC + 0x000000000000)
+#define ASIC_CFG_SBUS_REQUEST_COMMAND_SHIFT 16
+#define ASIC_CFG_SBUS_REQUEST_DATA_ADDR_SHIFT 8
+#define ASIC_CFG_SBUS_REQUEST_DATA_IN_SHIFT 32
+#define ASIC_CFG_SBUS_REQUEST_RECEIVER_ADDR_SHIFT 0
+#define ASIC_CFG_SCRATCH (ASIC + 0x000000000020)
+#define ASIC_CFG_THERM_POLL_EN (ASIC + 0x000000000050)
+#define ASIC_EEP_ADDR_CMD (ASIC + 0x000000000308)
+#define ASIC_EEP_ADDR_CMD_EP_ADDR_MASK 0xFFFFFFull
+#define ASIC_EEP_CTL_STAT (ASIC + 0x000000000300)
+#define ASIC_EEP_CTL_STAT_EP_RESET_SMASK 0x4ull
+#define ASIC_EEP_CTL_STAT_RATE_SPI_SHIFT 8
+#define ASIC_EEP_CTL_STAT_RESETCSR 0x0000000083818000ull
+#define ASIC_EEP_DATA (ASIC + 0x000000000310)
+#define ASIC_GPIO_CLEAR (ASIC + 0x000000000230)
+#define ASIC_GPIO_FORCE (ASIC + 0x000000000238)
+#define ASIC_GPIO_IN (ASIC + 0x000000000200)
+#define ASIC_GPIO_INVERT (ASIC + 0x000000000210)
+#define ASIC_GPIO_MASK (ASIC + 0x000000000220)
+#define ASIC_GPIO_OE (ASIC + 0x000000000208)
+#define ASIC_GPIO_OUT (ASIC + 0x000000000218)
+#define ASIC_PCIE_SD_HOST_CMD (ASIC + 0x000000000100)
+#define ASIC_PCIE_SD_HOST_CMD_INTRPT_CMD_SHIFT 0
+#define ASIC_PCIE_SD_HOST_CMD_SBR_MODE_SMASK 0x400ull
+#define ASIC_PCIE_SD_HOST_CMD_SBUS_RCVR_ADDR_SHIFT 2
+#define ASIC_PCIE_SD_HOST_CMD_TIMER_MASK 0xFFFFFull
+#define ASIC_PCIE_SD_HOST_CMD_TIMER_SHIFT 12
+#define ASIC_PCIE_SD_HOST_STATUS (ASIC + 0x000000000108)
+#define ASIC_PCIE_SD_HOST_STATUS_FW_DNLD_ERR_MASK 0x7ull
+#define ASIC_PCIE_SD_HOST_STATUS_FW_DNLD_ERR_SHIFT 2
+#define ASIC_PCIE_SD_HOST_STATUS_FW_DNLD_STS_MASK 0x3ull
+#define ASIC_PCIE_SD_HOST_STATUS_FW_DNLD_STS_SHIFT 0
+#define ASIC_PCIE_SD_INTRPT_DATA_CODE (ASIC + 0x000000000110)
+#define ASIC_PCIE_SD_INTRPT_ENABLE (ASIC + 0x000000000118)
+#define ASIC_PCIE_SD_INTRPT_LIST (ASIC + 0x000000000180)
+#define ASIC_PCIE_SD_INTRPT_LIST_INTRPT_CODE_SHIFT 16
+#define ASIC_PCIE_SD_INTRPT_LIST_INTRPT_DATA_SHIFT 0
+#define ASIC_PCIE_SD_INTRPT_STATUS (ASIC + 0x000000000128)
+#define ASIC_QSFP1_CLEAR (ASIC + 0x000000000270)
+#define ASIC_QSFP1_FORCE (ASIC + 0x000000000278)
+#define ASIC_QSFP1_IN (ASIC + 0x000000000240)
+#define ASIC_QSFP1_INVERT (ASIC + 0x000000000250)
+#define ASIC_QSFP1_MASK (ASIC + 0x000000000260)
+#define ASIC_QSFP1_OE (ASIC + 0x000000000248)
+#define ASIC_QSFP1_OUT (ASIC + 0x000000000258)
+#define ASIC_QSFP1_STATUS (ASIC + 0x000000000268)
+#define ASIC_QSFP2_CLEAR (ASIC + 0x0000000002B0)
+#define ASIC_QSFP2_FORCE (ASIC + 0x0000000002B8)
+#define ASIC_QSFP2_IN (ASIC + 0x000000000280)
+#define ASIC_QSFP2_INVERT (ASIC + 0x000000000290)
+#define ASIC_QSFP2_MASK (ASIC + 0x0000000002A0)
+#define ASIC_QSFP2_OE (ASIC + 0x000000000288)
+#define ASIC_QSFP2_OUT (ASIC + 0x000000000298)
+#define ASIC_QSFP2_STATUS (ASIC + 0x0000000002A8)
+#define ASIC_STS_SBUS_COUNTERS (ASIC + 0x000000000018)
+#define ASIC_STS_SBUS_COUNTERS_EXECUTE_CNT_MASK 0xFFFFull
+#define ASIC_STS_SBUS_COUNTERS_EXECUTE_CNT_SHIFT 0
+#define ASIC_STS_SBUS_COUNTERS_RCV_DATA_VALID_CNT_MASK 0xFFFFull
+#define ASIC_STS_SBUS_COUNTERS_RCV_DATA_VALID_CNT_SHIFT 16
+#define ASIC_STS_SBUS_RESULT (ASIC + 0x000000000010)
+#define ASIC_STS_SBUS_RESULT_DONE_SMASK 0x1ull
+#define ASIC_STS_SBUS_RESULT_RCV_DATA_VALID_SMASK 0x2ull
+#define ASIC_STS_THERM (ASIC + 0x000000000058)
+#define ASIC_STS_THERM_CRIT_TEMP_MASK 0x7FFull
+#define ASIC_STS_THERM_CRIT_TEMP_SHIFT 18
+#define ASIC_STS_THERM_CURR_TEMP_MASK 0x7FFull
+#define ASIC_STS_THERM_CURR_TEMP_SHIFT 2
+#define ASIC_STS_THERM_HI_TEMP_MASK 0x7FFull
+#define ASIC_STS_THERM_HI_TEMP_SHIFT 50
+#define ASIC_STS_THERM_LO_TEMP_MASK 0x7FFull
+#define ASIC_STS_THERM_LO_TEMP_SHIFT 34
+#define ASIC_STS_THERM_LOW_SHIFT 13
+#define CCE_COUNTER_ARRAY32 (CCE + 0x000000000060)
+#define CCE_CTRL (CCE + 0x000000000010)
+#define CCE_CTRL_RXE_RESUME_SMASK 0x800ull
+#define CCE_CTRL_SPC_FREEZE_SMASK 0x100ull
+#define CCE_CTRL_SPC_UNFREEZE_SMASK 0x200ull
+#define CCE_CTRL_TXE_RESUME_SMASK 0x2000ull
+#define CCE_DC_CTRL (CCE + 0x0000000000B8)
+#define CCE_DC_CTRL_DC_RESET_SMASK 0x1ull
+#define CCE_DC_CTRL_RESETCSR 0x0000000000000001ull
+#define CCE_ERR_CLEAR (CCE + 0x000000000050)
+#define CCE_ERR_MASK (CCE + 0x000000000048)
+#define CCE_ERR_STATUS (CCE + 0x000000000040)
+#define CCE_ERR_STATUS_CCE_CLI0_ASYNC_FIFO_PARITY_ERR_SMASK 0x40ull
+#define CCE_ERR_STATUS_CCE_CLI1_ASYNC_FIFO_DBG_PARITY_ERROR_SMASK 0x1000ull
+#define CCE_ERR_STATUS_CCE_CLI1_ASYNC_FIFO_PIO_CRDT_PARITY_ERR_SMASK \
+		0x200ull
+#define CCE_ERR_STATUS_CCE_CLI1_ASYNC_FIFO_RXDMA_PARITY_ERROR_SMASK \
+		0x800ull
+#define CCE_ERR_STATUS_CCE_CLI1_ASYNC_FIFO_SDMA_HD_PARITY_ERR_SMASK \
+		0x400ull
+#define CCE_ERR_STATUS_CCE_CLI2_ASYNC_FIFO_PARITY_ERR_SMASK 0x100ull
+#define CCE_ERR_STATUS_CCE_CSR_CFG_BUS_PARITY_ERR_SMASK 0x80ull
+#define CCE_ERR_STATUS_CCE_CSR_PARITY_ERR_SMASK 0x1ull
+#define CCE_ERR_STATUS_CCE_CSR_READ_BAD_ADDR_ERR_SMASK 0x2ull
+#define CCE_ERR_STATUS_CCE_CSR_WRITE_BAD_ADDR_ERR_SMASK 0x4ull
+#define CCE_ERR_STATUS_CCE_INT_MAP_COR_ERR_SMASK 0x4000000000ull
+#define CCE_ERR_STATUS_CCE_INT_MAP_UNC_ERR_SMASK 0x8000000000ull
+#define CCE_ERR_STATUS_CCE_MSIX_CSR_PARITY_ERR_SMASK 0x10000000000ull
+#define CCE_ERR_STATUS_CCE_MSIX_TABLE_COR_ERR_SMASK 0x1000000000ull
+#define CCE_ERR_STATUS_CCE_MSIX_TABLE_UNC_ERR_SMASK 0x2000000000ull
+#define CCE_ERR_STATUS_CCE_RCPL_ASYNC_FIFO_PARITY_ERR_SMASK 0x400000000ull
+#define CCE_ERR_STATUS_CCE_RSPD_DATA_PARITY_ERR_SMASK 0x20ull
+#define CCE_ERR_STATUS_CCE_RXDMA_CONV_FIFO_PARITY_ERR_SMASK 0x800000000ull
+#define CCE_ERR_STATUS_CCE_SEG_READ_BAD_ADDR_ERR_SMASK 0x100000000ull
+#define CCE_ERR_STATUS_CCE_SEG_WRITE_BAD_ADDR_ERR_SMASK 0x200000000ull
+#define CCE_ERR_STATUS_CCE_TRGT_ACCESS_ERR_SMASK 0x10ull
+#define CCE_ERR_STATUS_CCE_TRGT_ASYNC_FIFO_PARITY_ERR_SMASK 0x8ull
+#define CCE_ERR_STATUS_CCE_TRGT_CPL_TIMEOUT_ERR_SMASK 0x40000000ull
+#define CCE_ERR_STATUS_LA_TRIGGERED_SMASK 0x80000000ull
+#define CCE_ERR_STATUS_PCIC_CPL_DAT_QCOR_ERR_SMASK 0x40000ull
+#define CCE_ERR_STATUS_PCIC_CPL_DAT_QUNC_ERR_SMASK 0x4000000ull
+#define CCE_ERR_STATUS_PCIC_CPL_HD_QCOR_ERR_SMASK 0x20000ull
+#define CCE_ERR_STATUS_PCIC_CPL_HD_QUNC_ERR_SMASK 0x2000000ull
+#define CCE_ERR_STATUS_PCIC_NPOST_DAT_QPARITY_ERR_SMASK 0x100000ull
+#define CCE_ERR_STATUS_PCIC_NPOST_HQ_PARITY_ERR_SMASK 0x80000ull
+#define CCE_ERR_STATUS_PCIC_POST_DAT_QCOR_ERR_SMASK 0x10000ull
+#define CCE_ERR_STATUS_PCIC_POST_DAT_QUNC_ERR_SMASK 0x1000000ull
+#define CCE_ERR_STATUS_PCIC_POST_HD_QCOR_ERR_SMASK 0x8000ull
+#define CCE_ERR_STATUS_PCIC_POST_HD_QUNC_ERR_SMASK 0x800000ull
+#define CCE_ERR_STATUS_PCIC_RECEIVE_PARITY_ERR_SMASK 0x20000000ull
+#define CCE_ERR_STATUS_PCIC_RETRY_MEM_COR_ERR_SMASK 0x2000ull
+#define CCE_ERR_STATUS_PCIC_RETRY_MEM_UNC_ERR_SMASK 0x200000ull
+#define CCE_ERR_STATUS_PCIC_RETRY_SOT_MEM_COR_ERR_SMASK 0x4000ull
+#define CCE_ERR_STATUS_PCIC_RETRY_SOT_MEM_UNC_ERR_SMASK 0x400000ull
+#define CCE_ERR_STATUS_PCIC_TRANSMIT_BACK_PARITY_ERR_SMASK 0x10000000ull
+#define CCE_ERR_STATUS_PCIC_TRANSMIT_FRONT_PARITY_ERR_SMASK 0x8000000ull
+#define CCE_INT_CLEAR (CCE + 0x000000110A00)
+#define CCE_INT_COUNTER_ARRAY32 (CCE + 0x000000110D00)
+#define CCE_INT_FORCE (CCE + 0x000000110B00)
+#define CCE_INT_MAP (CCE + 0x000000110500)
+#define CCE_INT_MASK (CCE + 0x000000110900)
+#define CCE_INT_STATUS (CCE + 0x000000110800)
+#define CCE_MSIX_INT_GRANTED (CCE + 0x000000110200)
+#define CCE_MSIX_TABLE_LOWER (CCE + 0x000000100000)
+#define CCE_MSIX_TABLE_UPPER (CCE + 0x000000100008)
+#define CCE_MSIX_TABLE_UPPER_RESETCSR 0x0000000100000000ull
+#define CCE_MSIX_VEC_CLR_WITHOUT_INT (CCE + 0x000000110400)
+#define CCE_REVISION (CCE + 0x000000000000)
+#define CCE_REVISION2 (CCE + 0x000000000008)
+#define CCE_REVISION2_HFI_ID_MASK 0x1ull
+#define CCE_REVISION2_HFI_ID_SHIFT 0
+#define CCE_REVISION2_IMPL_CODE_SHIFT 8
+#define CCE_REVISION2_IMPL_REVISION_SHIFT 16
+#define CCE_REVISION_BOARD_ID_LOWER_NIBBLE_MASK 0xFull
+#define CCE_REVISION_BOARD_ID_LOWER_NIBBLE_SHIFT 32
+#define CCE_REVISION_CHIP_REV_MAJOR_MASK 0xFFull
+#define CCE_REVISION_CHIP_REV_MAJOR_SHIFT 8
+#define CCE_REVISION_CHIP_REV_MINOR_MASK 0xFFull
+#define CCE_REVISION_CHIP_REV_MINOR_SHIFT 0
+#define CCE_REVISION_SW_MASK 0xFFull
+#define CCE_REVISION_SW_SHIFT 24
+#define CCE_SCRATCH (CCE + 0x000000000020)
+#define CCE_STATUS (CCE + 0x000000000018)
+#define CCE_STATUS_RXE_FROZE_SMASK 0x2ull
+#define CCE_STATUS_RXE_PAUSED_SMASK 0x20ull
+#define CCE_STATUS_SDMA_FROZE_SMASK 0x1ull
+#define CCE_STATUS_SDMA_PAUSED_SMASK 0x10ull
+#define CCE_STATUS_TXE_FROZE_SMASK 0x4ull
+#define CCE_STATUS_TXE_PAUSED_SMASK 0x40ull
+#define CCE_STATUS_TXE_PIO_FROZE_SMASK 0x8ull
+#define CCE_STATUS_TXE_PIO_PAUSED_SMASK 0x80ull
+#define MISC_CFG_FW_CTRL (MISC + 0x000000001000)
+#define MISC_CFG_FW_CTRL_FW_8051_LOADED_SMASK 0x2ull
+#define MISC_CFG_FW_CTRL_RSA_STATUS_SHIFT 2
+#define MISC_CFG_FW_CTRL_RSA_STATUS_SMASK 0xCull
+#define MISC_CFG_RSA_CMD (MISC + 0x000000000A08)
+#define MISC_CFG_RSA_MODULUS (MISC + 0x000000000400)
+#define MISC_CFG_RSA_MU (MISC + 0x000000000A10)
+#define MISC_CFG_RSA_R2 (MISC + 0x000000000000)
+#define MISC_CFG_RSA_SIGNATURE (MISC + 0x000000000200)
+#define MISC_CFG_SHA_PRELOAD (MISC + 0x000000000A00)
+#define MISC_ERR_CLEAR (MISC + 0x000000002010)
+#define MISC_ERR_MASK (MISC + 0x000000002008)
+#define MISC_ERR_STATUS (MISC + 0x000000002000)
+#define MISC_ERR_STATUS_MISC_PLL_LOCK_FAIL_ERR_SMASK 0x1000ull
+#define MISC_ERR_STATUS_MISC_MBIST_FAIL_ERR_SMASK 0x800ull
+#define MISC_ERR_STATUS_MISC_INVALID_EEP_CMD_ERR_SMASK 0x400ull
+#define MISC_ERR_STATUS_MISC_EFUSE_DONE_PARITY_ERR_SMASK 0x200ull
+#define MISC_ERR_STATUS_MISC_EFUSE_WRITE_ERR_SMASK 0x100ull
+#define MISC_ERR_STATUS_MISC_EFUSE_READ_BAD_ADDR_ERR_SMASK 0x80ull
+#define MISC_ERR_STATUS_MISC_EFUSE_CSR_PARITY_ERR_SMASK 0x40ull
+#define MISC_ERR_STATUS_MISC_FW_AUTH_FAILED_ERR_SMASK 0x20ull
+#define MISC_ERR_STATUS_MISC_KEY_MISMATCH_ERR_SMASK 0x10ull
+#define MISC_ERR_STATUS_MISC_SBUS_WRITE_FAILED_ERR_SMASK 0x8ull
+#define MISC_ERR_STATUS_MISC_CSR_WRITE_BAD_ADDR_ERR_SMASK 0x4ull
+#define MISC_ERR_STATUS_MISC_CSR_READ_BAD_ADDR_ERR_SMASK 0x2ull
+#define MISC_ERR_STATUS_MISC_CSR_PARITY_ERR_SMASK 0x1ull
+#define PCI_CFG_MSIX0 (PCIE + 0x0000000000B0)
+#define PCI_CFG_REG1 (PCIE + 0x000000000004)
+#define PCI_CFG_REG11 (PCIE + 0x00000000002C)
+#define PCIE_CFG_SPCIE1 (PCIE + 0x00000000014C)
+#define PCIE_CFG_SPCIE2 (PCIE + 0x000000000150)
+#define PCIE_CFG_TPH2 (PCIE + 0x000000000180)
+#define RCV_ARRAY (RXE + 0x000000200000)
+#define RCV_ARRAY_CNT (RXE + 0x000000000018)
+#define RCV_ARRAY_RT_ADDR_MASK 0xFFFFFFFFFull
+#define RCV_ARRAY_RT_ADDR_SHIFT 0
+#define RCV_ARRAY_RT_BUF_SIZE_SHIFT 36
+#define RCV_ARRAY_RT_WRITE_ENABLE_SMASK 0x8000000000000000ull
+#define RCV_AVAIL_TIME_OUT (RXE + 0x000000100050)
+#define RCV_AVAIL_TIME_OUT_TIME_OUT_RELOAD_MASK 0xFFull
+#define RCV_AVAIL_TIME_OUT_TIME_OUT_RELOAD_SHIFT 0
+#define RCV_BTH_QP (RXE + 0x000000000028)
+#define RCV_BTH_QP_KDETH_QP_MASK 0xFFull
+#define RCV_BTH_QP_KDETH_QP_SHIFT 16
+#define RCV_BYPASS (RXE + 0x000000000038)
+#define RCV_CONTEXTS (RXE + 0x000000000010)
+#define RCV_COUNTER_ARRAY32 (RXE + 0x000000000400)
+#define RCV_COUNTER_ARRAY64 (RXE + 0x000000000500)
+#define RCV_CTRL (RXE + 0x000000000000)
+#define RCV_CTRL_RCV_BYPASS_ENABLE_SMASK 0x10ull
+#define RCV_CTRL_RCV_EXTENDED_PSN_ENABLE_SMASK 0x40ull
+#define RCV_CTRL_RCV_PARTITION_KEY_ENABLE_SMASK 0x4ull
+#define RCV_CTRL_RCV_PORT_ENABLE_SMASK 0x1ull
+#define RCV_CTRL_RCV_QP_MAP_ENABLE_SMASK 0x2ull
+#define RCV_CTRL_RCV_RSM_ENABLE_SMASK 0x20ull
+#define RCV_CTRL_RX_RBUF_INIT_SMASK 0x200ull
+#define RCV_CTXT_CTRL (RXE + 0x000000100000)
+#define RCV_CTXT_CTRL_DONT_DROP_EGR_FULL_SMASK 0x4ull
+#define RCV_CTXT_CTRL_DONT_DROP_RHQ_FULL_SMASK 0x8ull
+#define RCV_CTXT_CTRL_EGR_BUF_SIZE_MASK 0x7ull
+#define RCV_CTXT_CTRL_EGR_BUF_SIZE_SHIFT 8
+#define RCV_CTXT_CTRL_EGR_BUF_SIZE_SMASK 0x700ull
+#define RCV_CTXT_CTRL_ENABLE_SMASK 0x1ull
+#define RCV_CTXT_CTRL_INTR_AVAIL_SMASK 0x20ull
+#define RCV_CTXT_CTRL_ONE_PACKET_PER_EGR_BUFFER_SMASK 0x2ull
+#define RCV_CTXT_CTRL_TAIL_UPD_SMASK 0x40ull
+#define RCV_CTXT_CTRL_TID_FLOW_ENABLE_SMASK 0x10ull
+#define RCV_CTXT_STATUS (RXE + 0x000000100008)
+#define RCV_EGR_CTRL (RXE + 0x000000100010)
+#define RCV_EGR_CTRL_EGR_BASE_INDEX_MASK 0x1FFFull
+#define RCV_EGR_CTRL_EGR_BASE_INDEX_SHIFT 0
+#define RCV_EGR_CTRL_EGR_CNT_MASK 0x1FFull
+#define RCV_EGR_CTRL_EGR_CNT_SHIFT 32
+#define RCV_EGR_INDEX_HEAD (RXE + 0x000000300018)
+#define RCV_EGR_INDEX_HEAD_HEAD_MASK 0x7FFull
+#define RCV_EGR_INDEX_HEAD_HEAD_SHIFT 0
+#define RCV_ERR_CLEAR (RXE + 0x000000000070)
+#define RCV_ERR_INFO (RXE + 0x000000000050)
+#define RCV_ERR_INFO_RCV_EXCESS_BUFFER_OVERRUN_SC_SMASK 0x1Full
+#define RCV_ERR_INFO_RCV_EXCESS_BUFFER_OVERRUN_SMASK 0x20ull
+#define RCV_ERR_MASK (RXE + 0x000000000068)
+#define RCV_ERR_STATUS (RXE + 0x000000000060)
+#define RCV_ERR_STATUS_RX_CSR_PARITY_ERR_SMASK 0x8000000000000000ull
+#define RCV_ERR_STATUS_RX_CSR_READ_BAD_ADDR_ERR_SMASK 0x2000000000000000ull
+#define RCV_ERR_STATUS_RX_CSR_WRITE_BAD_ADDR_ERR_SMASK \
+		0x4000000000000000ull
+#define RCV_ERR_STATUS_RX_DC_INTF_PARITY_ERR_SMASK 0x2ull
+#define RCV_ERR_STATUS_RX_DC_SOP_EOP_PARITY_ERR_SMASK 0x200ull
+#define RCV_ERR_STATUS_RX_DMA_CSR_COR_ERR_SMASK 0x1ull
+#define RCV_ERR_STATUS_RX_DMA_CSR_PARITY_ERR_SMASK 0x200000000000000ull
+#define RCV_ERR_STATUS_RX_DMA_CSR_UNC_ERR_SMASK 0x1000000000000000ull
+#define RCV_ERR_STATUS_RX_DMA_DATA_FIFO_RD_COR_ERR_SMASK \
+		0x40000000000000ull
+#define RCV_ERR_STATUS_RX_DMA_DATA_FIFO_RD_UNC_ERR_SMASK \
+		0x20000000000000ull
+#define RCV_ERR_STATUS_RX_DMA_DQ_FSM_ENCODING_ERR_SMASK \
+		0x800000000000000ull
+#define RCV_ERR_STATUS_RX_DMA_EQ_FSM_ENCODING_ERR_SMASK \
+		0x400000000000000ull
+#define RCV_ERR_STATUS_RX_DMA_FLAG_COR_ERR_SMASK 0x800ull
+#define RCV_ERR_STATUS_RX_DMA_FLAG_UNC_ERR_SMASK 0x400ull
+#define RCV_ERR_STATUS_RX_DMA_HDR_FIFO_RD_COR_ERR_SMASK 0x10000000000000ull
+#define RCV_ERR_STATUS_RX_DMA_HDR_FIFO_RD_UNC_ERR_SMASK 0x8000000000000ull
+#define RCV_ERR_STATUS_RX_HQ_INTR_CSR_PARITY_ERR_SMASK 0x200000000000ull
+#define RCV_ERR_STATUS_RX_HQ_INTR_FSM_ERR_SMASK 0x400000000000ull
+#define RCV_ERR_STATUS_RX_LOOKUP_CSR_PARITY_ERR_SMASK 0x100000000000ull
+#define RCV_ERR_STATUS_RX_LOOKUP_DES_PART1_UNC_COR_ERR_SMASK \
+		0x10000000000ull
+#define RCV_ERR_STATUS_RX_LOOKUP_DES_PART1_UNC_ERR_SMASK 0x8000000000ull
+#define RCV_ERR_STATUS_RX_LOOKUP_DES_PART2_PARITY_ERR_SMASK \
+		0x20000000000ull
+#define RCV_ERR_STATUS_RX_LOOKUP_RCV_ARRAY_COR_ERR_SMASK 0x80000000000ull
+#define RCV_ERR_STATUS_RX_LOOKUP_RCV_ARRAY_UNC_ERR_SMASK 0x40000000000ull
+#define RCV_ERR_STATUS_RX_RBUF_BAD_LOOKUP_ERR_SMASK 0x40000000ull
+#define RCV_ERR_STATUS_RX_RBUF_BLOCK_LIST_READ_COR_ERR_SMASK 0x100000ull
+#define RCV_ERR_STATUS_RX_RBUF_BLOCK_LIST_READ_UNC_ERR_SMASK 0x80000ull
+#define RCV_ERR_STATUS_RX_RBUF_CSR_QENT_CNT_PARITY_ERR_SMASK 0x400000ull
+#define RCV_ERR_STATUS_RX_RBUF_CSR_QEOPDW_PARITY_ERR_SMASK 0x10000000ull
+#define RCV_ERR_STATUS_RX_RBUF_CSR_QHD_PTR_PARITY_ERR_SMASK 0x2000000ull
+#define RCV_ERR_STATUS_RX_RBUF_CSR_QHEAD_BUF_NUM_PARITY_ERR_SMASK \
+		0x200000ull
+#define RCV_ERR_STATUS_RX_RBUF_CSR_QNEXT_BUF_PARITY_ERR_SMASK 0x800000ull
+#define RCV_ERR_STATUS_RX_RBUF_CSR_QNUM_OF_PKT_PARITY_ERR_SMASK \
+		0x8000000ull
+#define RCV_ERR_STATUS_RX_RBUF_CSR_QTL_PTR_PARITY_ERR_SMASK 0x4000000ull
+#define RCV_ERR_STATUS_RX_RBUF_CSR_QVLD_BIT_PARITY_ERR_SMASK 0x1000000ull
+#define RCV_ERR_STATUS_RX_RBUF_CTX_ID_PARITY_ERR_SMASK 0x20000000ull
+#define RCV_ERR_STATUS_RX_RBUF_DATA_COR_ERR_SMASK 0x100000000000000ull
+#define RCV_ERR_STATUS_RX_RBUF_DATA_UNC_ERR_SMASK 0x80000000000000ull
+#define RCV_ERR_STATUS_RX_RBUF_DESC_PART1_COR_ERR_SMASK 0x1000000000000ull
+#define RCV_ERR_STATUS_RX_RBUF_DESC_PART1_UNC_ERR_SMASK 0x800000000000ull
+#define RCV_ERR_STATUS_RX_RBUF_DESC_PART2_COR_ERR_SMASK 0x4000000000000ull
+#define RCV_ERR_STATUS_RX_RBUF_DESC_PART2_UNC_ERR_SMASK 0x2000000000000ull
+#define RCV_ERR_STATUS_RX_RBUF_EMPTY_ERR_SMASK 0x100000000ull
+#define RCV_ERR_STATUS_RX_RBUF_FL_INITDONE_PARITY_ERR_SMASK 0x800000000ull
+#define RCV_ERR_STATUS_RX_RBUF_FL_INIT_WR_ADDR_PARITY_ERR_SMASK \
+		0x1000000000ull
+#define RCV_ERR_STATUS_RX_RBUF_FL_RD_ADDR_PARITY_ERR_SMASK 0x200000000ull
+#define RCV_ERR_STATUS_RX_RBUF_FL_WR_ADDR_PARITY_ERR_SMASK 0x400000000ull
+#define RCV_ERR_STATUS_RX_RBUF_FREE_LIST_COR_ERR_SMASK 0x4000ull
+#define RCV_ERR_STATUS_RX_RBUF_FREE_LIST_UNC_ERR_SMASK 0x2000ull
+#define RCV_ERR_STATUS_RX_RBUF_FULL_ERR_SMASK 0x80000000ull
+#define RCV_ERR_STATUS_RX_RBUF_LOOKUP_DES_COR_ERR_SMASK 0x40000ull
+#define RCV_ERR_STATUS_RX_RBUF_LOOKUP_DES_REG_UNC_COR_ERR_SMASK 0x10000ull
+#define RCV_ERR_STATUS_RX_RBUF_LOOKUP_DES_REG_UNC_ERR_SMASK 0x8000ull
+#define RCV_ERR_STATUS_RX_RBUF_LOOKUP_DES_UNC_ERR_SMASK 0x20000ull
+#define RCV_ERR_STATUS_RX_RBUF_NEXT_FREE_BUF_COR_ERR_SMASK 0x4000000000ull
+#define RCV_ERR_STATUS_RX_RBUF_NEXT_FREE_BUF_UNC_ERR_SMASK 0x2000000000ull
+#define RCV_ERR_STATUS_RX_RCV_CSR_PARITY_ERR_SMASK 0x100ull
+#define RCV_ERR_STATUS_RX_RCV_DATA_COR_ERR_SMASK 0x20ull
+#define RCV_ERR_STATUS_RX_RCV_DATA_UNC_ERR_SMASK 0x10ull
+#define RCV_ERR_STATUS_RX_RCV_FSM_ENCODING_ERR_SMASK 0x1000ull
+#define RCV_ERR_STATUS_RX_RCV_HDR_COR_ERR_SMASK 0x8ull
+#define RCV_ERR_STATUS_RX_RCV_HDR_UNC_ERR_SMASK 0x4ull
+#define RCV_ERR_STATUS_RX_RCV_QP_MAP_TABLE_COR_ERR_SMASK 0x80ull
+#define RCV_ERR_STATUS_RX_RCV_QP_MAP_TABLE_UNC_ERR_SMASK 0x40ull
+#define RCV_HDR_ADDR (RXE + 0x000000100028)
+#define RCV_HDR_CNT (RXE + 0x000000100030)
+#define RCV_HDR_CNT_CNT_MASK 0x1FFull
+#define RCV_HDR_CNT_CNT_SHIFT 0
+#define RCV_HDR_ENT_SIZE (RXE + 0x000000100038)
+#define RCV_HDR_ENT_SIZE_ENT_SIZE_MASK 0x7ull
+#define RCV_HDR_ENT_SIZE_ENT_SIZE_SHIFT 0
+#define RCV_HDR_HEAD (RXE + 0x000000300008)
+#define RCV_HDR_HEAD_COUNTER_MASK 0xFFull
+#define RCV_HDR_HEAD_COUNTER_SHIFT 32
+#define RCV_HDR_HEAD_HEAD_MASK 0x7FFFFull
+#define RCV_HDR_HEAD_HEAD_SHIFT 0
+#define RCV_HDR_HEAD_HEAD_SMASK 0x7FFFFull
+#define RCV_HDR_OVFL_CNT (RXE + 0x000000100058)
+#define RCV_HDR_SIZE (RXE + 0x000000100040)
+#define RCV_HDR_SIZE_HDR_SIZE_MASK 0x1Full
+#define RCV_HDR_SIZE_HDR_SIZE_SHIFT 0
+#define RCV_HDR_TAIL (RXE + 0x000000300000)
+#define RCV_HDR_TAIL_ADDR (RXE + 0x000000100048)
+#define RCV_KEY_CTRL (RXE + 0x000000100020)
+#define RCV_KEY_CTRL_JOB_KEY_ENABLE_SMASK 0x200000000ull
+#define RCV_KEY_CTRL_JOB_KEY_VALUE_MASK 0xFFFFull
+#define RCV_KEY_CTRL_JOB_KEY_VALUE_SHIFT 0
+#define RCV_MULTICAST (RXE + 0x000000000030)
+#define RCV_PARTITION_KEY (RXE + 0x000000000200)
+#define RCV_PARTITION_KEY_PARTITION_KEY_A_MASK 0xFFFFull
+#define RCV_PARTITION_KEY_PARTITION_KEY_B_SHIFT 16
+#define RCV_QP_MAP_TABLE (RXE + 0x000000000100)
+#define RCV_RSM_CFG (RXE + 0x000000000600)
+#define RCV_RSM_CFG_ENABLE_OR_CHAIN_RSM0_MASK 0x1ull
+#define RCV_RSM_CFG_ENABLE_OR_CHAIN_RSM0_SHIFT 0
+#define RCV_RSM_CFG_PACKET_TYPE_SHIFT 60
+#define RCV_RSM_MAP_TABLE (RXE + 0x000000000900)
+#define RCV_RSM_MAP_TABLE_RCV_CONTEXT_A_MASK 0xFFull
+#define RCV_RSM_MATCH (RXE + 0x000000000800)
+#define RCV_RSM_MATCH_MASK1_SHIFT 0
+#define RCV_RSM_MATCH_MASK2_SHIFT 16
+#define RCV_RSM_MATCH_VALUE1_SHIFT 8
+#define RCV_RSM_MATCH_VALUE2_SHIFT 24
+#define RCV_RSM_SELECT (RXE + 0x000000000700)
+#define RCV_RSM_SELECT_FIELD1_OFFSET_SHIFT 0
+#define RCV_RSM_SELECT_FIELD2_OFFSET_SHIFT 16
+#define RCV_RSM_SELECT_INDEX1_OFFSET_SHIFT 32
+#define RCV_RSM_SELECT_INDEX1_WIDTH_SHIFT 44
+#define RCV_RSM_SELECT_INDEX2_OFFSET_SHIFT 48
+#define RCV_RSM_SELECT_INDEX2_WIDTH_SHIFT 60
+#define RCV_STATUS (RXE + 0x000000000008)
+#define RCV_STATUS_RX_PKT_IN_PROGRESS_SMASK 0x1ull
+#define RCV_STATUS_RX_RBUF_INIT_DONE_SMASK 0x200ull
+#define RCV_STATUS_RX_RBUF_PKT_PENDING_SMASK 0x40ull
+#define RCV_TID_CTRL (RXE + 0x000000100018)
+#define RCV_TID_CTRL_TID_BASE_INDEX_MASK 0x1FFFull
+#define RCV_TID_CTRL_TID_BASE_INDEX_SHIFT 0
+#define RCV_TID_CTRL_TID_PAIR_CNT_MASK 0x1FFull
+#define RCV_TID_CTRL_TID_PAIR_CNT_SHIFT 32
+#define RCV_TID_FLOW_TABLE (RXE + 0x000000300800)
+#define RCV_VL15 (RXE + 0x000000000048)
+#define SEND_BTH_QP (TXE + 0x0000000000A0)
+#define SEND_BTH_QP_KDETH_QP_MASK 0xFFull
+#define SEND_BTH_QP_KDETH_QP_SHIFT 16
+#define SEND_CM_CREDIT_USED_STATUS (TXE + 0x000000000510)
+#define SEND_CM_CREDIT_USED_STATUS_VL0_RETURN_CREDIT_STATUS_SMASK \
+		0x1000000000000ull
+#define SEND_CM_CREDIT_USED_STATUS_VL15_RETURN_CREDIT_STATUS_SMASK \
+		0x8000000000000000ull
+#define SEND_CM_CREDIT_USED_STATUS_VL1_RETURN_CREDIT_STATUS_SMASK \
+		0x2000000000000ull
+#define SEND_CM_CREDIT_USED_STATUS_VL2_RETURN_CREDIT_STATUS_SMASK \
+		0x4000000000000ull
+#define SEND_CM_CREDIT_USED_STATUS_VL3_RETURN_CREDIT_STATUS_SMASK \
+		0x8000000000000ull
+#define SEND_CM_CREDIT_USED_STATUS_VL4_RETURN_CREDIT_STATUS_SMASK \
+		0x10000000000000ull
+#define SEND_CM_CREDIT_USED_STATUS_VL5_RETURN_CREDIT_STATUS_SMASK \
+		0x20000000000000ull
+#define SEND_CM_CREDIT_USED_STATUS_VL6_RETURN_CREDIT_STATUS_SMASK \
+		0x40000000000000ull
+#define SEND_CM_CREDIT_USED_STATUS_VL7_RETURN_CREDIT_STATUS_SMASK \
+		0x80000000000000ull
+#define SEND_CM_CREDIT_VL (TXE + 0x000000000600)
+#define SEND_CM_CREDIT_VL15 (TXE + 0x000000000678)
+#define SEND_CM_CREDIT_VL15_DEDICATED_LIMIT_VL_SHIFT 0
+#define SEND_CM_CREDIT_VL_DEDICATED_LIMIT_VL_MASK 0xFFFFull
+#define SEND_CM_CREDIT_VL_DEDICATED_LIMIT_VL_SHIFT 0
+#define SEND_CM_CREDIT_VL_DEDICATED_LIMIT_VL_SMASK 0xFFFFull
+#define SEND_CM_CREDIT_VL_SHARED_LIMIT_VL_MASK 0xFFFFull
+#define SEND_CM_CREDIT_VL_SHARED_LIMIT_VL_SHIFT 16
+#define SEND_CM_CREDIT_VL_SHARED_LIMIT_VL_SMASK 0xFFFF0000ull
+#define SEND_CM_CTRL (TXE + 0x000000000500)
+#define SEND_CM_CTRL_FORCE_CREDIT_MODE_SMASK 0x8ull
+#define SEND_CM_CTRL_RESETCSR 0x0000000000000020ull
+#define SEND_CM_GLOBAL_CREDIT (TXE + 0x000000000508)
+#define SEND_CM_GLOBAL_CREDIT_AU_SHIFT 16
+#define SEND_CM_GLOBAL_CREDIT_RESETCSR 0x0000094000030000ull
+#define SEND_CM_GLOBAL_CREDIT_SHARED_LIMIT_MASK 0xFFFFull
+#define SEND_CM_GLOBAL_CREDIT_SHARED_LIMIT_SHIFT 0
+#define SEND_CM_GLOBAL_CREDIT_SHARED_LIMIT_SMASK 0xFFFFull
+#define SEND_CM_GLOBAL_CREDIT_TOTAL_CREDIT_LIMIT_MASK 0xFFFFull
+#define SEND_CM_GLOBAL_CREDIT_TOTAL_CREDIT_LIMIT_SHIFT 32
+#define SEND_CM_GLOBAL_CREDIT_TOTAL_CREDIT_LIMIT_SMASK 0xFFFF00000000ull
+#define SEND_CM_LOCAL_AU_TABLE0_TO3 (TXE + 0x000000000520)
+#define SEND_CM_LOCAL_AU_TABLE0_TO3_LOCAL_AU_TABLE0_SHIFT 0
+#define SEND_CM_LOCAL_AU_TABLE0_TO3_LOCAL_AU_TABLE1_SHIFT 16
+#define SEND_CM_LOCAL_AU_TABLE0_TO3_LOCAL_AU_TABLE2_SHIFT 32
+#define SEND_CM_LOCAL_AU_TABLE0_TO3_LOCAL_AU_TABLE3_SHIFT 48
+#define SEND_CM_LOCAL_AU_TABLE4_TO7 (TXE + 0x000000000528)
+#define SEND_CM_LOCAL_AU_TABLE4_TO7_LOCAL_AU_TABLE4_SHIFT 0
+#define SEND_CM_LOCAL_AU_TABLE4_TO7_LOCAL_AU_TABLE5_SHIFT 16
+#define SEND_CM_LOCAL_AU_TABLE4_TO7_LOCAL_AU_TABLE6_SHIFT 32
+#define SEND_CM_LOCAL_AU_TABLE4_TO7_LOCAL_AU_TABLE7_SHIFT 48
+#define SEND_CM_REMOTE_AU_TABLE0_TO3 (TXE + 0x000000000530)
+#define SEND_CM_REMOTE_AU_TABLE4_TO7 (TXE + 0x000000000538)
+#define SEND_CM_TIMER_CTRL (TXE + 0x000000000518)
+#define SEND_CONTEXTS (TXE + 0x000000000010)
+#define SEND_CONTEXT_SET_CTRL (TXE + 0x000000000200)
+#define SEND_COUNTER_ARRAY32 (TXE + 0x000000000300)
+#define SEND_COUNTER_ARRAY64 (TXE + 0x000000000400)
+#define SEND_CTRL (TXE + 0x000000000000)
+#define SEND_CTRL_CM_RESET_SMASK 0x4ull
+#define SEND_CTRL_SEND_ENABLE_SMASK 0x1ull
+#define SEND_CTRL_VL_ARBITER_ENABLE_SMASK 0x2ull
+#define SEND_CTXT_CHECK_ENABLE (TXE + 0x000000100080)
+#define SEND_CTXT_CHECK_ENABLE_CHECK_BYPASS_VL_MAPPING_SMASK 0x80ull
+#define SEND_CTXT_CHECK_ENABLE_CHECK_ENABLE_SMASK 0x1ull
+#define SEND_CTXT_CHECK_ENABLE_CHECK_JOB_KEY_SMASK 0x4ull
+#define SEND_CTXT_CHECK_ENABLE_CHECK_OPCODE_SMASK 0x20ull
+#define SEND_CTXT_CHECK_ENABLE_CHECK_PARTITION_KEY_SMASK 0x8ull
+#define SEND_CTXT_CHECK_ENABLE_CHECK_SLID_SMASK 0x10ull
+#define SEND_CTXT_CHECK_ENABLE_CHECK_VL_MAPPING_SMASK 0x40ull
+#define SEND_CTXT_CHECK_ENABLE_CHECK_VL_SMASK 0x2ull
+#define SEND_CTXT_CHECK_ENABLE_DISALLOW_BAD_PKT_LEN_SMASK 0x20000ull
+#define SEND_CTXT_CHECK_ENABLE_DISALLOW_BYPASS_BAD_PKT_LEN_SMASK \
+		0x200000ull
+#define SEND_CTXT_CHECK_ENABLE_DISALLOW_BYPASS_SMASK 0x800ull
+#define SEND_CTXT_CHECK_ENABLE_DISALLOW_GRH_SMASK 0x400ull
+#define SEND_CTXT_CHECK_ENABLE_DISALLOW_KDETH_PACKETS_SMASK 0x1000ull
+#define SEND_CTXT_CHECK_ENABLE_DISALLOW_NON_KDETH_PACKETS_SMASK 0x2000ull
+#define SEND_CTXT_CHECK_ENABLE_DISALLOW_PBC_STATIC_RATE_CONTROL_SMASK \
+		0x100000ull
+#define SEND_CTXT_CHECK_ENABLE_DISALLOW_PBC_TEST_SMASK 0x10000ull
+#define SEND_CTXT_CHECK_ENABLE_DISALLOW_RAW_IPV6_SMASK 0x200ull
+#define SEND_CTXT_CHECK_ENABLE_DISALLOW_RAW_SMASK 0x100ull
+#define SEND_CTXT_CHECK_ENABLE_DISALLOW_TOO_LONG_BYPASS_PACKETS_SMASK \
+		0x80000ull
+#define SEND_CTXT_CHECK_ENABLE_DISALLOW_TOO_LONG_IB_PACKETS_SMASK \
+		0x40000ull
+#define SEND_CTXT_CHECK_ENABLE_DISALLOW_TOO_SMALL_BYPASS_PACKETS_SMASK \
+		0x8000ull
+#define SEND_CTXT_CHECK_ENABLE_DISALLOW_TOO_SMALL_IB_PACKETS_SMASK \
+		0x4000ull
+#define SEND_CTXT_CHECK_JOB_KEY (TXE + 0x000000100090)
+#define SEND_CTXT_CHECK_JOB_KEY_ALLOW_PERMISSIVE_SMASK 0x100000000ull
+#define SEND_CTXT_CHECK_JOB_KEY_MASK_SMASK 0xFFFF0000ull
+#define SEND_CTXT_CHECK_JOB_KEY_VALUE_MASK 0xFFFFull
+#define SEND_CTXT_CHECK_JOB_KEY_VALUE_SHIFT 0
+#define SEND_CTXT_CHECK_OPCODE (TXE + 0x0000001000A8)
+#define SEND_CTXT_CHECK_OPCODE_MASK_SHIFT 8
+#define SEND_CTXT_CHECK_OPCODE_VALUE_SHIFT 0
+#define SEND_CTXT_CHECK_PARTITION_KEY (TXE + 0x000000100098)
+#define SEND_CTXT_CHECK_PARTITION_KEY_VALUE_MASK 0xFFFFull
+#define SEND_CTXT_CHECK_PARTITION_KEY_VALUE_SHIFT 0
+#define SEND_CTXT_CHECK_SLID (TXE + 0x0000001000A0)
+#define SEND_CTXT_CHECK_SLID_MASK_MASK 0xFFFFull
+#define SEND_CTXT_CHECK_SLID_MASK_SHIFT 16
+#define SEND_CTXT_CHECK_SLID_VALUE_MASK 0xFFFFull
+#define SEND_CTXT_CHECK_SLID_VALUE_SHIFT 0
+#define SEND_CTXT_CHECK_VL (TXE + 0x000000100088)
+#define SEND_CTXT_CREDIT_CTRL (TXE + 0x000000100010)
+#define SEND_CTXT_CREDIT_CTRL_CREDIT_INTR_SMASK 0x20000ull
+#define SEND_CTXT_CREDIT_CTRL_EARLY_RETURN_SMASK 0x10000ull
+#define SEND_CTXT_CREDIT_CTRL_THRESHOLD_MASK 0x7FFull
+#define SEND_CTXT_CREDIT_CTRL_THRESHOLD_SHIFT 0
+#define SEND_CTXT_CREDIT_CTRL_THRESHOLD_SMASK 0x7FFull
+#define SEND_CTXT_CREDIT_FORCE (TXE + 0x000000100028)
+#define SEND_CTXT_CREDIT_FORCE_FORCE_RETURN_SMASK 0x1ull
+#define SEND_CTXT_CREDIT_RETURN_ADDR (TXE + 0x000000100020)
+#define SEND_CTXT_CREDIT_RETURN_ADDR_ADDRESS_SMASK 0xFFFFFFFFFFC0ull
+#define SEND_CTXT_CTRL (TXE + 0x000000100000)
+#define SEND_CTXT_CTRL_CTXT_BASE_MASK 0x3FFFull
+#define SEND_CTXT_CTRL_CTXT_BASE_SHIFT 32
+#define SEND_CTXT_CTRL_CTXT_DEPTH_MASK 0x7FFull
+#define SEND_CTXT_CTRL_CTXT_DEPTH_SHIFT 48
+#define SEND_CTXT_CTRL_CTXT_ENABLE_SMASK 0x1ull
+#define SEND_CTXT_ERR_CLEAR (TXE + 0x000000100050)
+#define SEND_CTXT_ERR_MASK (TXE + 0x000000100048)
+#define SEND_CTXT_ERR_STATUS (TXE + 0x000000100040)
+#define SEND_CTXT_ERR_STATUS_PIO_DISALLOWED_PACKET_ERR_SMASK 0x2ull
+#define SEND_CTXT_ERR_STATUS_PIO_INCONSISTENT_SOP_ERR_SMASK 0x1ull
+#define SEND_CTXT_ERR_STATUS_PIO_WRITE_CROSSES_BOUNDARY_ERR_SMASK 0x4ull
+#define SEND_CTXT_ERR_STATUS_PIO_WRITE_OUT_OF_BOUNDS_ERR_SMASK 0x10ull
+#define SEND_CTXT_ERR_STATUS_PIO_WRITE_OVERFLOW_ERR_SMASK 0x8ull
+#define SEND_CTXT_STATUS (TXE + 0x000000100008)
+#define SEND_CTXT_STATUS_CTXT_HALTED_SMASK 0x1ull
+#define SEND_DMA_BASE_ADDR (TXE + 0x000000200010)
+#define SEND_DMA_CHECK_ENABLE (TXE + 0x000000200080)
+#define SEND_DMA_CHECK_ENABLE_CHECK_BYPASS_VL_MAPPING_SMASK 0x80ull
+#define SEND_DMA_CHECK_ENABLE_CHECK_ENABLE_SMASK 0x1ull
+#define SEND_DMA_CHECK_ENABLE_CHECK_JOB_KEY_SMASK 0x4ull
+#define SEND_DMA_CHECK_ENABLE_CHECK_OPCODE_SMASK 0x20ull
+#define SEND_DMA_CHECK_ENABLE_CHECK_PARTITION_KEY_SMASK 0x8ull
+#define SEND_DMA_CHECK_ENABLE_CHECK_SLID_SMASK 0x10ull
+#define SEND_DMA_CHECK_ENABLE_CHECK_VL_MAPPING_SMASK 0x40ull
+#define SEND_DMA_CHECK_ENABLE_CHECK_VL_SMASK 0x2ull
+#define SEND_DMA_CHECK_ENABLE_DISALLOW_BAD_PKT_LEN_SMASK 0x20000ull
+#define SEND_DMA_CHECK_ENABLE_DISALLOW_BYPASS_BAD_PKT_LEN_SMASK 0x200000ull
+#define SEND_DMA_CHECK_ENABLE_DISALLOW_PBC_STATIC_RATE_CONTROL_SMASK \
+		0x100000ull
+#define SEND_DMA_CHECK_ENABLE_DISALLOW_RAW_IPV6_SMASK 0x200ull
+#define SEND_DMA_CHECK_ENABLE_DISALLOW_RAW_SMASK 0x100ull
+#define SEND_DMA_CHECK_ENABLE_DISALLOW_TOO_LONG_BYPASS_PACKETS_SMASK \
+		0x80000ull
+#define SEND_DMA_CHECK_ENABLE_DISALLOW_TOO_LONG_IB_PACKETS_SMASK 0x40000ull
+#define SEND_DMA_CHECK_ENABLE_DISALLOW_TOO_SMALL_BYPASS_PACKETS_SMASK \
+		0x8000ull
+#define SEND_DMA_CHECK_ENABLE_DISALLOW_TOO_SMALL_IB_PACKETS_SMASK 0x4000ull
+#define SEND_DMA_CHECK_JOB_KEY (TXE + 0x000000200090)
+#define SEND_DMA_CHECK_OPCODE (TXE + 0x0000002000A8)
+#define SEND_DMA_CHECK_PARTITION_KEY (TXE + 0x000000200098)
+#define SEND_DMA_CHECK_SLID (TXE + 0x0000002000A0)
+#define SEND_DMA_CHECK_SLID_MASK_MASK 0xFFFFull
+#define SEND_DMA_CHECK_SLID_MASK_SHIFT 16
+#define SEND_DMA_CHECK_SLID_VALUE_MASK 0xFFFFull
+#define SEND_DMA_CHECK_SLID_VALUE_SHIFT 0
+#define SEND_DMA_CHECK_VL (TXE + 0x000000200088)
+#define SEND_DMA_CTRL (TXE + 0x000000200000)
+#define SEND_DMA_CTRL_SDMA_CLEANUP_SMASK 0x4ull
+#define SEND_DMA_CTRL_SDMA_ENABLE_SMASK 0x1ull
+#define SEND_DMA_CTRL_SDMA_HALT_SMASK 0x2ull
+#define SEND_DMA_CTRL_SDMA_INT_ENABLE_SMASK 0x8ull
+#define SEND_DMA_DESC_CNT (TXE + 0x000000200050)
+#define SEND_DMA_DESC_CNT_CNT_MASK 0xFFFFull
+#define SEND_DMA_DESC_CNT_CNT_SHIFT 0
+#define SEND_DMA_ENG_ERR_CLEAR (TXE + 0x000000200070)
+#define SEND_DMA_ENG_ERR_CLEAR_SDMA_HEADER_REQUEST_FIFO_UNC_ERR_MASK 0x1ull
+#define SEND_DMA_ENG_ERR_CLEAR_SDMA_HEADER_REQUEST_FIFO_UNC_ERR_SHIFT 18
+#define SEND_DMA_ENG_ERR_MASK (TXE + 0x000000200068)
+#define SEND_DMA_ENG_ERR_STATUS (TXE + 0x000000200060)
+#define SEND_DMA_ENG_ERR_STATUS_SDMA_ASSEMBLY_UNC_ERR_SMASK 0x8000ull
+#define SEND_DMA_ENG_ERR_STATUS_SDMA_DESC_TABLE_UNC_ERR_SMASK 0x4000ull
+#define SEND_DMA_ENG_ERR_STATUS_SDMA_FIRST_DESC_ERR_SMASK 0x10ull
+#define SEND_DMA_ENG_ERR_STATUS_SDMA_GEN_MISMATCH_ERR_SMASK 0x2ull
+#define SEND_DMA_ENG_ERR_STATUS_SDMA_HALT_ERR_SMASK 0x40ull
+#define SEND_DMA_ENG_ERR_STATUS_SDMA_HEADER_ADDRESS_ERR_SMASK 0x800ull
+#define SEND_DMA_ENG_ERR_STATUS_SDMA_HEADER_LENGTH_ERR_SMASK 0x1000ull
+#define SEND_DMA_ENG_ERR_STATUS_SDMA_HEADER_REQUEST_FIFO_UNC_ERR_SMASK \
+		0x40000ull
+#define SEND_DMA_ENG_ERR_STATUS_SDMA_HEADER_SELECT_ERR_SMASK 0x400ull
+#define SEND_DMA_ENG_ERR_STATUS_SDMA_HEADER_STORAGE_UNC_ERR_SMASK \
+		0x20000ull
+#define SEND_DMA_ENG_ERR_STATUS_SDMA_LENGTH_MISMATCH_ERR_SMASK 0x80ull
+#define SEND_DMA_ENG_ERR_STATUS_SDMA_MEM_READ_ERR_SMASK 0x20ull
+#define SEND_DMA_ENG_ERR_STATUS_SDMA_PACKET_DESC_OVERFLOW_ERR_SMASK \
+		0x100ull
+#define SEND_DMA_ENG_ERR_STATUS_SDMA_PACKET_TRACKING_UNC_ERR_SMASK \
+		0x10000ull
+#define SEND_DMA_ENG_ERR_STATUS_SDMA_TAIL_OUT_OF_BOUNDS_ERR_SMASK 0x8ull
+#define SEND_DMA_ENG_ERR_STATUS_SDMA_TIMEOUT_ERR_SMASK 0x2000ull
+#define SEND_DMA_ENG_ERR_STATUS_SDMA_TOO_LONG_ERR_SMASK 0x4ull
+#define SEND_DMA_ENG_ERR_STATUS_SDMA_WRONG_DW_ERR_SMASK 0x1ull
+#define SEND_DMA_ENGINES (TXE + 0x000000000018)
+#define SEND_DMA_ERR_CLEAR (TXE + 0x000000000070)
+#define SEND_DMA_ERR_MASK (TXE + 0x000000000068)
+#define SEND_DMA_ERR_STATUS (TXE + 0x000000000060)
+#define SEND_DMA_ERR_STATUS_SDMA_CSR_PARITY_ERR_SMASK 0x2ull
+#define SEND_DMA_ERR_STATUS_SDMA_PCIE_REQ_TRACKING_COR_ERR_SMASK 0x8ull
+#define SEND_DMA_ERR_STATUS_SDMA_PCIE_REQ_TRACKING_UNC_ERR_SMASK 0x4ull
+#define SEND_DMA_ERR_STATUS_SDMA_RPY_TAG_ERR_SMASK 0x1ull
+#define SEND_DMA_HEAD (TXE + 0x000000200028)
+#define SEND_DMA_HEAD_ADDR (TXE + 0x000000200030)
+#define SEND_DMA_LEN_GEN (TXE + 0x000000200018)
+#define SEND_DMA_LEN_GEN_GENERATION_SHIFT 16
+#define SEND_DMA_LEN_GEN_LENGTH_SHIFT 6
+#define SEND_DMA_MEMORY (TXE + 0x0000002000B0)
+#define SEND_DMA_MEMORY_SDMA_MEMORY_CNT_SHIFT 16
+#define SEND_DMA_MEMORY_SDMA_MEMORY_INDEX_SHIFT 0
+#define SEND_DMA_MEM_SIZE (TXE + 0x000000000028)
+#define SEND_DMA_PRIORITY_THLD (TXE + 0x000000200038)
+#define SEND_DMA_RELOAD_CNT (TXE + 0x000000200048)
+#define SEND_DMA_STATUS (TXE + 0x000000200008)
+#define SEND_DMA_STATUS_ENG_CLEANED_UP_SMASK 0x200000000000000ull
+#define SEND_DMA_STATUS_ENG_HALTED_SMASK 0x100000000000000ull
+#define SEND_DMA_TAIL (TXE + 0x000000200020)
+#define SEND_EGRESS_CTXT_STATUS (TXE + 0x000000000800)
+#define SEND_EGRESS_CTXT_STATUS_CTXT_EGRESS_HALT_STATUS_SMASK 0x10000ull
+#define SEND_EGRESS_CTXT_STATUS_CTXT_EGRESS_PACKET_OCCUPANCY_SHIFT 0
+#define SEND_EGRESS_CTXT_STATUS_CTXT_EGRESS_PACKET_OCCUPANCY_SMASK \
+		0x3FFFull
+#define SEND_EGRESS_ERR_CLEAR (TXE + 0x000000000090)
+#define SEND_EGRESS_ERR_INFO (TXE + 0x000000000F00)
+#define SEND_EGRESS_ERR_INFO_BAD_PKT_LEN_ERR_SMASK 0x20000ull
+#define SEND_EGRESS_ERR_INFO_BYPASS_ERR_SMASK 0x800ull
+#define SEND_EGRESS_ERR_INFO_GRH_ERR_SMASK 0x400ull
+#define SEND_EGRESS_ERR_INFO_JOB_KEY_ERR_SMASK 0x4ull
+#define SEND_EGRESS_ERR_INFO_KDETH_PACKETS_ERR_SMASK 0x1000ull
+#define SEND_EGRESS_ERR_INFO_NON_KDETH_PACKETS_ERR_SMASK 0x2000ull
+#define SEND_EGRESS_ERR_INFO_OPCODE_ERR_SMASK 0x20ull
+#define SEND_EGRESS_ERR_INFO_PARTITION_KEY_ERR_SMASK 0x8ull
+#define SEND_EGRESS_ERR_INFO_PBC_STATIC_RATE_CONTROL_ERR_SMASK 0x100000ull
+#define SEND_EGRESS_ERR_INFO_PBC_TEST_ERR_SMASK 0x10000ull
+#define SEND_EGRESS_ERR_INFO_RAW_ERR_SMASK 0x100ull
+#define SEND_EGRESS_ERR_INFO_RAW_IPV6_ERR_SMASK 0x200ull
+#define SEND_EGRESS_ERR_INFO_SLID_ERR_SMASK 0x10ull
+#define SEND_EGRESS_ERR_INFO_TOO_LONG_BYPASS_PACKETS_ERR_SMASK 0x80000ull
+#define SEND_EGRESS_ERR_INFO_TOO_LONG_IB_PACKET_ERR_SMASK 0x40000ull
+#define SEND_EGRESS_ERR_INFO_TOO_SMALL_BYPASS_PACKETS_ERR_SMASK 0x8000ull
+#define SEND_EGRESS_ERR_INFO_TOO_SMALL_IB_PACKETS_ERR_SMASK 0x4000ull
+#define SEND_EGRESS_ERR_INFO_VL_ERR_SMASK 0x2ull
+#define SEND_EGRESS_ERR_INFO_VL_MAPPING_ERR_SMASK 0x40ull
+#define SEND_EGRESS_ERR_MASK (TXE + 0x000000000088)
+#define SEND_EGRESS_ERR_SOURCE (TXE + 0x000000000F08)
+#define SEND_EGRESS_ERR_STATUS (TXE + 0x000000000080)
+#define SEND_EGRESS_ERR_STATUS_TX_CONFIG_PARITY_ERR_SMASK 0x8000ull
+#define SEND_EGRESS_ERR_STATUS_TX_CREDIT_OVERRUN_ERR_SMASK \
+		0x200000000000000ull
+#define SEND_EGRESS_ERR_STATUS_TX_CREDIT_RETURN_PARITY_ERR_SMASK \
+		0x20000000000ull
+#define SEND_EGRESS_ERR_STATUS_TX_CREDIT_RETURN_VL_ERR_SMASK \
+		0x800000000000ull
+#define SEND_EGRESS_ERR_STATUS_TX_EGRESS_FIFO_COR_ERR_SMASK \
+		0x2000000000000000ull
+#define SEND_EGRESS_ERR_STATUS_TX_EGRESS_FIFO_UNC_ERR_SMASK \
+		0x200000000000ull
+#define SEND_EGRESS_ERR_STATUS_TX_EGRESS_FIFO_UNDERRUN_OR_PARITY_ERR_SMASK \
+		0x8ull
+#define SEND_EGRESS_ERR_STATUS_TX_HCRC_INSERTION_ERR_SMASK \
+		0x400000000000ull
+#define SEND_EGRESS_ERR_STATUS_TX_ILLEGAL_VL_ERR_SMASK 0x1000ull
+#define SEND_EGRESS_ERR_STATUS_TX_INCORRECT_LINK_STATE_ERR_SMASK 0x20ull
+#define SEND_EGRESS_ERR_STATUS_TX_LAUNCH_CSR_PARITY_ERR_SMASK 0x2000ull
+#define SEND_EGRESS_ERR_STATUS_TX_LAUNCH_FIFO0_COR_ERR_SMASK \
+		0x1000000000000ull
+#define SEND_EGRESS_ERR_STATUS_TX_LAUNCH_FIFO0_UNC_OR_PARITY_ERR_SMASK \
+		0x100000000ull
+#define SEND_EGRESS_ERR_STATUS_TX_LAUNCH_FIFO1_COR_ERR_SMASK \
+		0x2000000000000ull
+#define SEND_EGRESS_ERR_STATUS_TX_LAUNCH_FIFO1_UNC_OR_PARITY_ERR_SMASK \
+		0x200000000ull
+#define SEND_EGRESS_ERR_STATUS_TX_LAUNCH_FIFO2_COR_ERR_SMASK \
+		0x4000000000000ull
+#define SEND_EGRESS_ERR_STATUS_TX_LAUNCH_FIFO2_UNC_OR_PARITY_ERR_SMASK \
+		0x400000000ull
+#define SEND_EGRESS_ERR_STATUS_TX_LAUNCH_FIFO3_COR_ERR_SMASK \
+		0x8000000000000ull
+#define SEND_EGRESS_ERR_STATUS_TX_LAUNCH_FIFO3_UNC_OR_PARITY_ERR_SMASK \
+		0x800000000ull
+#define SEND_EGRESS_ERR_STATUS_TX_LAUNCH_FIFO4_COR_ERR_SMASK \
+		0x10000000000000ull
+#define SEND_EGRESS_ERR_STATUS_TX_LAUNCH_FIFO4_UNC_OR_PARITY_ERR_SMASK \
+		0x1000000000ull
+#define SEND_EGRESS_ERR_STATUS_TX_LAUNCH_FIFO5_COR_ERR_SMASK \
+		0x20000000000000ull
+#define SEND_EGRESS_ERR_STATUS_TX_LAUNCH_FIFO5_UNC_OR_PARITY_ERR_SMASK \
+		0x2000000000ull
+#define SEND_EGRESS_ERR_STATUS_TX_LAUNCH_FIFO6_COR_ERR_SMASK \
+		0x40000000000000ull
+#define SEND_EGRESS_ERR_STATUS_TX_LAUNCH_FIFO6_UNC_OR_PARITY_ERR_SMASK \
+		0x4000000000ull
+#define SEND_EGRESS_ERR_STATUS_TX_LAUNCH_FIFO7_COR_ERR_SMASK \
+		0x80000000000000ull
+#define SEND_EGRESS_ERR_STATUS_TX_LAUNCH_FIFO7_UNC_OR_PARITY_ERR_SMASK \
+		0x8000000000ull
+#define SEND_EGRESS_ERR_STATUS_TX_LAUNCH_FIFO8_COR_ERR_SMASK \
+		0x100000000000000ull
+#define SEND_EGRESS_ERR_STATUS_TX_LAUNCH_FIFO8_UNC_OR_PARITY_ERR_SMASK \
+		0x10000000000ull
+#define SEND_EGRESS_ERR_STATUS_TX_LINKDOWN_ERR_SMASK 0x10ull
+#define SEND_EGRESS_ERR_STATUS_TX_PIO_LAUNCH_INTF_PARITY_ERR_SMASK 0x80ull
+#define SEND_EGRESS_ERR_STATUS_TX_PKT_INTEGRITY_MEM_COR_ERR_SMASK 0x1ull
+#define SEND_EGRESS_ERR_STATUS_TX_PKT_INTEGRITY_MEM_UNC_ERR_SMASK 0x2ull
+#define SEND_EGRESS_ERR_STATUS_TX_READ_PIO_MEMORY_COR_ERR_SMASK \
+		0x1000000000000000ull
+#define SEND_EGRESS_ERR_STATUS_TX_READ_PIO_MEMORY_CSR_UNC_ERR_SMASK \
+		0x8000000000000000ull
+#define SEND_EGRESS_ERR_STATUS_TX_READ_PIO_MEMORY_UNC_ERR_SMASK \
+		0x100000000000ull
+#define SEND_EGRESS_ERR_STATUS_TX_READ_SDMA_MEMORY_COR_ERR_SMASK \
+		0x800000000000000ull
+#define SEND_EGRESS_ERR_STATUS_TX_READ_SDMA_MEMORY_CSR_UNC_ERR_SMASK \
+		0x4000000000000000ull
+#define SEND_EGRESS_ERR_STATUS_TX_READ_SDMA_MEMORY_UNC_ERR_SMASK \
+		0x80000000000ull
+#define SEND_EGRESS_ERR_STATUS_TX_SB_HDR_COR_ERR_SMASK 0x400000000000000ull
+#define SEND_EGRESS_ERR_STATUS_TX_SB_HDR_UNC_ERR_SMASK 0x40000000000ull
+#define SEND_EGRESS_ERR_STATUS_TX_SBRD_CTL_CSR_PARITY_ERR_SMASK 0x4000ull
+#define SEND_EGRESS_ERR_STATUS_TX_SBRD_CTL_STATE_MACHINE_PARITY_ERR_SMASK \
+		0x800ull
+#define SEND_EGRESS_ERR_STATUS_TX_SDMA0_DISALLOWED_PACKET_ERR_SMASK \
+		0x10000ull
+#define SEND_EGRESS_ERR_STATUS_TX_SDMA10_DISALLOWED_PACKET_ERR_SMASK \
+		0x4000000ull
+#define SEND_EGRESS_ERR_STATUS_TX_SDMA11_DISALLOWED_PACKET_ERR_SMASK \
+		0x8000000ull
+#define SEND_EGRESS_ERR_STATUS_TX_SDMA12_DISALLOWED_PACKET_ERR_SMASK \
+		0x10000000ull
+#define SEND_EGRESS_ERR_STATUS_TX_SDMA13_DISALLOWED_PACKET_ERR_SMASK \
+		0x20000000ull
+#define SEND_EGRESS_ERR_STATUS_TX_SDMA14_DISALLOWED_PACKET_ERR_SMASK \
+		0x40000000ull
+#define SEND_EGRESS_ERR_STATUS_TX_SDMA15_DISALLOWED_PACKET_ERR_SMASK \
+		0x80000000ull
+#define SEND_EGRESS_ERR_STATUS_TX_SDMA1_DISALLOWED_PACKET_ERR_SMASK \
+		0x20000ull
+#define SEND_EGRESS_ERR_STATUS_TX_SDMA2_DISALLOWED_PACKET_ERR_SMASK \
+		0x40000ull
+#define SEND_EGRESS_ERR_STATUS_TX_SDMA3_DISALLOWED_PACKET_ERR_SMASK \
+		0x80000ull
+#define SEND_EGRESS_ERR_STATUS_TX_SDMA4_DISALLOWED_PACKET_ERR_SMASK \
+		0x100000ull
+#define SEND_EGRESS_ERR_STATUS_TX_SDMA5_DISALLOWED_PACKET_ERR_SMASK \
+		0x200000ull
+#define SEND_EGRESS_ERR_STATUS_TX_SDMA6_DISALLOWED_PACKET_ERR_SMASK \
+		0x400000ull
+#define SEND_EGRESS_ERR_STATUS_TX_SDMA7_DISALLOWED_PACKET_ERR_SMASK \
+		0x800000ull
+#define SEND_EGRESS_ERR_STATUS_TX_SDMA8_DISALLOWED_PACKET_ERR_SMASK \
+		0x1000000ull
+#define SEND_EGRESS_ERR_STATUS_TX_SDMA9_DISALLOWED_PACKET_ERR_SMASK \
+		0x2000000ull
+#define SEND_EGRESS_ERR_STATUS_TX_SDMA_LAUNCH_INTF_PARITY_ERR_SMASK \
+		0x100ull
+#define SEND_EGRESS_SEND_DMA_STATUS (TXE + 0x000000000E00)
+#define SEND_EGRESS_SEND_DMA_STATUS_SDMA_EGRESS_PACKET_OCCUPANCY_SHIFT 0
+#define SEND_EGRESS_SEND_DMA_STATUS_SDMA_EGRESS_PACKET_OCCUPANCY_SMASK \
+		0x3FFFull
+#define SEND_ERR_CLEAR (TXE + 0x0000000000F0)
+#define SEND_ERR_MASK (TXE + 0x0000000000E8)
+#define SEND_ERR_STATUS (TXE + 0x0000000000E0)
+#define SEND_ERR_STATUS_SEND_CSR_PARITY_ERR_SMASK 0x1ull
+#define SEND_ERR_STATUS_SEND_CSR_READ_BAD_ADDR_ERR_SMASK 0x2ull
+#define SEND_ERR_STATUS_SEND_CSR_WRITE_BAD_ADDR_ERR_SMASK 0x4ull
+#define SEND_HIGH_PRIORITY_LIMIT (TXE + 0x000000000030)
+#define SEND_HIGH_PRIORITY_LIMIT_LIMIT_MASK 0x3FFFull
+#define SEND_HIGH_PRIORITY_LIMIT_LIMIT_SHIFT 0
+#define SEND_HIGH_PRIORITY_LIST (TXE + 0x000000000180)
+#define SEND_LEN_CHECK0 (TXE + 0x0000000000D0)
+#define SEND_LEN_CHECK0_LEN_VL0_MASK 0xFFFull
+#define SEND_LEN_CHECK0_LEN_VL1_SHIFT 12
+#define SEND_LEN_CHECK1 (TXE + 0x0000000000D8)
+#define SEND_LEN_CHECK1_LEN_VL15_MASK 0xFFFull
+#define SEND_LEN_CHECK1_LEN_VL15_SHIFT 48
+#define SEND_LEN_CHECK1_LEN_VL4_MASK 0xFFFull
+#define SEND_LEN_CHECK1_LEN_VL5_SHIFT 12
+#define SEND_LOW_PRIORITY_LIST (TXE + 0x000000000100)
+#define SEND_LOW_PRIORITY_LIST_VL_MASK 0x7ull
+#define SEND_LOW_PRIORITY_LIST_VL_SHIFT 16
+#define SEND_LOW_PRIORITY_LIST_WEIGHT_MASK 0xFFull
+#define SEND_LOW_PRIORITY_LIST_WEIGHT_SHIFT 0
+#define SEND_PIO_ERR_CLEAR (TXE + 0x000000000050)
+#define SEND_PIO_ERR_CLEAR_PIO_INIT_SM_IN_ERR_SMASK 0x20000ull
+#define SEND_PIO_ERR_MASK (TXE + 0x000000000048)
+#define SEND_PIO_ERR_STATUS (TXE + 0x000000000040)
+#define SEND_PIO_ERR_STATUS_PIO_BLOCK_QW_COUNT_PARITY_ERR_SMASK \
+		0x1000000ull
+#define SEND_PIO_ERR_STATUS_PIO_CREDIT_RET_FIFO_PARITY_ERR_SMASK 0x8000ull
+#define SEND_PIO_ERR_STATUS_PIO_CSR_PARITY_ERR_SMASK 0x4ull
+#define SEND_PIO_ERR_STATUS_PIO_CURRENT_FREE_CNT_PARITY_ERR_SMASK \
+		0x100000000ull
+#define SEND_PIO_ERR_STATUS_PIO_HOST_ADDR_MEM_COR_ERR_SMASK 0x100000ull
+#define SEND_PIO_ERR_STATUS_PIO_HOST_ADDR_MEM_UNC_ERR_SMASK 0x80000ull
+#define SEND_PIO_ERR_STATUS_PIO_INIT_SM_IN_ERR_SMASK 0x20000ull
+#define SEND_PIO_ERR_STATUS_PIO_LAST_RETURNED_CNT_PARITY_ERR_SMASK \
+		0x200000000ull
+#define SEND_PIO_ERR_STATUS_PIO_PCC_FIFO_PARITY_ERR_SMASK 0x20ull
+#define SEND_PIO_ERR_STATUS_PIO_PCC_SOP_HEAD_PARITY_ERR_SMASK \
+		0x400000000ull
+#define SEND_PIO_ERR_STATUS_PIO_PEC_FIFO_PARITY_ERR_SMASK 0x40ull
+#define SEND_PIO_ERR_STATUS_PIO_PEC_SOP_HEAD_PARITY_ERR_SMASK \
+		0x800000000ull
+#define SEND_PIO_ERR_STATUS_PIO_PKT_EVICT_FIFO_PARITY_ERR_SMASK 0x200ull
+#define SEND_PIO_ERR_STATUS_PIO_PKT_EVICT_SM_OR_ARB_SM_ERR_SMASK 0x40000ull
+#define SEND_PIO_ERR_STATUS_PIO_PPMC_BQC_MEM_PARITY_ERR_SMASK 0x10000000ull
+#define SEND_PIO_ERR_STATUS_PIO_PPMC_PBL_FIFO_ERR_SMASK 0x10000ull
+#define SEND_PIO_ERR_STATUS_PIO_PPMC_SOP_LEN_ERR_SMASK 0x20000000ull
+#define SEND_PIO_ERR_STATUS_PIO_SB_MEM_FIFO0_ERR_SMASK 0x8ull
+#define SEND_PIO_ERR_STATUS_PIO_SB_MEM_FIFO1_ERR_SMASK 0x10ull
+#define SEND_PIO_ERR_STATUS_PIO_SBRDCTL_CRREL_PARITY_ERR_SMASK 0x80ull
+#define SEND_PIO_ERR_STATUS_PIO_SBRDCTRL_CRREL_FIFO_PARITY_ERR_SMASK \
+		0x100ull
+#define SEND_PIO_ERR_STATUS_PIO_SM_PKT_RESET_PARITY_ERR_SMASK 0x400ull
+#define SEND_PIO_ERR_STATUS_PIO_STATE_MACHINE_ERR_SMASK 0x400000ull
+#define SEND_PIO_ERR_STATUS_PIO_VL_FIFO_PARITY_ERR_SMASK 0x8000000ull
+#define SEND_PIO_ERR_STATUS_PIO_VLF_SOP_PARITY_ERR_SMASK 0x4000000ull
+#define SEND_PIO_ERR_STATUS_PIO_VLF_VL_LEN_PARITY_ERR_SMASK 0x2000000ull
+#define SEND_PIO_ERR_STATUS_PIO_VL_LEN_MEM_BANK0_COR_ERR_SMASK 0x2000ull
+#define SEND_PIO_ERR_STATUS_PIO_VL_LEN_MEM_BANK0_UNC_ERR_SMASK 0x800ull
+#define SEND_PIO_ERR_STATUS_PIO_VL_LEN_MEM_BANK1_COR_ERR_SMASK 0x4000ull
+#define SEND_PIO_ERR_STATUS_PIO_VL_LEN_MEM_BANK1_UNC_ERR_SMASK 0x1000ull
+#define SEND_PIO_ERR_STATUS_PIO_WRITE_ADDR_PARITY_ERR_SMASK 0x2ull
+#define SEND_PIO_ERR_STATUS_PIO_WRITE_BAD_CTXT_ERR_SMASK 0x1ull
+#define SEND_PIO_ERR_STATUS_PIO_WRITE_DATA_PARITY_ERR_SMASK 0x200000ull
+#define SEND_PIO_ERR_STATUS_PIO_WRITE_QW_VALID_PARITY_ERR_SMASK 0x800000ull
+#define SEND_PIO_INIT_CTXT (TXE + 0x000000000038)
+#define SEND_PIO_INIT_CTXT_PIO_ALL_CTXT_INIT_SMASK 0x1ull
+#define SEND_PIO_INIT_CTXT_PIO_CTXT_NUM_MASK 0xFFull
+#define SEND_PIO_INIT_CTXT_PIO_CTXT_NUM_SHIFT 8
+#define SEND_PIO_INIT_CTXT_PIO_INIT_ERR_SMASK 0x8ull
+#define SEND_PIO_INIT_CTXT_PIO_INIT_IN_PROGRESS_SMASK 0x4ull
+#define SEND_PIO_INIT_CTXT_PIO_SINGLE_CTXT_INIT_SMASK 0x2ull
+#define SEND_PIO_MEM_SIZE (TXE + 0x000000000020)
+#define SEND_SC2VLT0 (TXE + 0x0000000000B0)
+#define SEND_SC2VLT0_SC0_SHIFT 0
+#define SEND_SC2VLT0_SC1_SHIFT 8
+#define SEND_SC2VLT0_SC2_SHIFT 16
+#define SEND_SC2VLT0_SC3_SHIFT 24
+#define SEND_SC2VLT0_SC4_SHIFT 32
+#define SEND_SC2VLT0_SC5_SHIFT 40
+#define SEND_SC2VLT0_SC6_SHIFT 48
+#define SEND_SC2VLT0_SC7_SHIFT 56
+#define SEND_SC2VLT1 (TXE + 0x0000000000B8)
+#define SEND_SC2VLT1_SC10_SHIFT 16
+#define SEND_SC2VLT1_SC11_SHIFT 24
+#define SEND_SC2VLT1_SC12_SHIFT 32
+#define SEND_SC2VLT1_SC13_SHIFT 40
+#define SEND_SC2VLT1_SC14_SHIFT 48
+#define SEND_SC2VLT1_SC15_SHIFT 56
+#define SEND_SC2VLT1_SC8_SHIFT 0
+#define SEND_SC2VLT1_SC9_SHIFT 8
+#define SEND_SC2VLT2 (TXE + 0x0000000000C0)
+#define SEND_SC2VLT2_SC16_SHIFT 0
+#define SEND_SC2VLT2_SC17_SHIFT 8
+#define SEND_SC2VLT2_SC18_SHIFT 16
+#define SEND_SC2VLT2_SC19_SHIFT 24
+#define SEND_SC2VLT2_SC20_SHIFT 32
+#define SEND_SC2VLT2_SC21_SHIFT 40
+#define SEND_SC2VLT2_SC22_SHIFT 48
+#define SEND_SC2VLT2_SC23_SHIFT 56
+#define SEND_SC2VLT3 (TXE + 0x0000000000C8)
+#define SEND_SC2VLT3_SC24_SHIFT 0
+#define SEND_SC2VLT3_SC25_SHIFT 8
+#define SEND_SC2VLT3_SC26_SHIFT 16
+#define SEND_SC2VLT3_SC27_SHIFT 24
+#define SEND_SC2VLT3_SC28_SHIFT 32
+#define SEND_SC2VLT3_SC29_SHIFT 40
+#define SEND_SC2VLT3_SC30_SHIFT 48
+#define SEND_SC2VLT3_SC31_SHIFT 56
+#define SEND_STATIC_RATE_CONTROL (TXE + 0x0000000000A8)
+#define SEND_STATIC_RATE_CONTROL_CSR_SRC_RELOAD_SHIFT 0
+#define SEND_STATIC_RATE_CONTROL_CSR_SRC_RELOAD_SMASK 0xFFFFull
+#define PCIE_CFG_REG_PL2 (PCIE + 0x000000000708)
+#define PCIE_CFG_REG_PL102 (PCIE + 0x000000000898)
+#define PCIE_CFG_REG_PL102_GEN3_EQ_POST_CURSOR_PSET_SHIFT 12
+#define PCIE_CFG_REG_PL102_GEN3_EQ_CURSOR_PSET_SHIFT 6
+#define PCIE_CFG_REG_PL102_GEN3_EQ_PRE_CURSOR_PSET_SHIFT 0
+#define PCIE_CFG_REG_PL103 (PCIE + 0x00000000089C)
+#define PCIE_CFG_REG_PL105 (PCIE + 0x0000000008A4)
+#define PCIE_CFG_REG_PL105_GEN3_EQ_VIOLATE_COEF_RULES_SMASK 0x1ull
+#define PCIE_CFG_REG_PL2_LOW_PWR_ENT_CNT_SHIFT 24
+#define PCIE_CFG_REG_PL100 (PCIE + 0x000000000890)
+#define PCIE_CFG_REG_PL100_EQ_EIEOS_CNT_SMASK 0x400ull
+#define PCIE_CFG_REG_PL101 (PCIE + 0x000000000894)
+#define PCIE_CFG_REG_PL101_GEN3_EQ_LOCAL_FS_SHIFT 6
+#define PCIE_CFG_REG_PL101_GEN3_EQ_LOCAL_LF_SHIFT 0
+#define PCIE_CFG_REG_PL106 (PCIE + 0x0000000008A8)
+#define PCIE_CFG_REG_PL106_GEN3_EQ_PSET_REQ_VEC_SHIFT 8
+#define PCIE_CFG_REG_PL106_GEN3_EQ_EVAL2MS_DISABLE_SMASK 0x20ull
+#define PCIE_CFG_REG_PL106_GEN3_EQ_PHASE23_EXIT_MODE_SMASK 0x10ull
+
+#endif          /* DEF_CHIP_REG */
diff --git a/drivers/staging/rdma/hfi1/common.h b/drivers/staging/rdma/hfi1/common.h
new file mode 100644
index 000000000000..5f2293729cf9
--- /dev/null
+++ b/drivers/staging/rdma/hfi1/common.h
@@ -0,0 +1,415 @@
+/*
+ *
+ * This file is provided under a dual BSD/GPLv2 license.  When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * GPL LICENSE SUMMARY
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * BSD LICENSE
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ *  - Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ *  - Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in
+ *    the documentation and/or other materials provided with the
+ *    distribution.
+ *  - Neither the name of Intel Corporation nor the names of its
+ *    contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ */
+
+#ifndef _COMMON_H
+#define _COMMON_H
+
+#include <rdma/hfi/hfi1_user.h>
+
+/*
+ * This file contains defines, structures, etc. that are used
+ * to communicate between kernel and user code.
+ */
+
+/* version of protocol header (known to chip also). In the long run,
+ * we should be able to generate and accept a range of version numbers;
+ * for now we only accept one, and it's compiled in.
+ */
+#define IPS_PROTO_VERSION 2
+
+/*
+ * These are compile time constants that you may want to enable or disable
+ * if you are trying to debug problems with code or performance.
+ * HFI1_VERBOSE_TRACING define as 1 if you want additional tracing in
+ * fast path code
+ * HFI1_TRACE_REGWRITES define as 1 if you want register writes to be
+ * traced in fast path code
+ * _HFI1_TRACING define as 0 if you want to remove all tracing in a
+ * compilation unit
+ */
+
+/*
+ * If a packet's QP[23:16] bits match this value, then it is
+ * a PSM packet and the hardware will expect a KDETH header
+ * following the BTH.
+ */
+#define DEFAULT_KDETH_QP 0x80
+
+/* driver/hw feature set bitmask */
+#define HFI1_CAP_USER_SHIFT      24
+#define HFI1_CAP_MASK            ((1UL << HFI1_CAP_USER_SHIFT) - 1)
+/* locked flag - if set, only HFI1_CAP_WRITABLE_MASK bits can be set */
+#define HFI1_CAP_LOCKED_SHIFT    63
+#define HFI1_CAP_LOCKED_MASK     0x1ULL
+#define HFI1_CAP_LOCKED_SMASK    (HFI1_CAP_LOCKED_MASK << HFI1_CAP_LOCKED_SHIFT)
+/* extra bits used between kernel and user processes */
+#define HFI1_CAP_MISC_SHIFT      (HFI1_CAP_USER_SHIFT * 2)
+#define HFI1_CAP_MISC_MASK       ((1ULL << (HFI1_CAP_LOCKED_SHIFT - \
+					   HFI1_CAP_MISC_SHIFT)) - 1)
+
+#define HFI1_CAP_KSET(cap) ({ hfi1_cap_mask |= HFI1_CAP_##cap; hfi1_cap_mask; })
+#define HFI1_CAP_KCLEAR(cap)						\
+	({								\
+		hfi1_cap_mask &= ~HFI1_CAP_##cap;			\
+		hfi1_cap_mask;						\
+	})
+#define HFI1_CAP_USET(cap)						\
+	({								\
+		hfi1_cap_mask |= (HFI1_CAP_##cap << HFI1_CAP_USER_SHIFT); \
+		hfi1_cap_mask;						\
+		})
+#define HFI1_CAP_UCLEAR(cap)						\
+	({								\
+		hfi1_cap_mask &= ~(HFI1_CAP_##cap << HFI1_CAP_USER_SHIFT); \
+		hfi1_cap_mask;						\
+	})
+#define HFI1_CAP_SET(cap)						\
+	({								\
+		hfi1_cap_mask |= (HFI1_CAP_##cap | (HFI1_CAP_##cap <<	\
+						  HFI1_CAP_USER_SHIFT)); \
+		hfi1_cap_mask;						\
+	})
+#define HFI1_CAP_CLEAR(cap)						\
+	({								\
+		hfi1_cap_mask &= ~(HFI1_CAP_##cap |			\
+				  (HFI1_CAP_##cap << HFI1_CAP_USER_SHIFT)); \
+		hfi1_cap_mask;						\
+	})
+#define HFI1_CAP_LOCK()							\
+	({ hfi1_cap_mask |= HFI1_CAP_LOCKED_SMASK; hfi1_cap_mask; })
+#define HFI1_CAP_LOCKED() (!!(hfi1_cap_mask & HFI1_CAP_LOCKED_SMASK))
+/*
+ * The set of capability bits that can be changed after initial load
+ * This set is the same for kernel and user contexts. However, for
+ * user contexts, the set can be further filtered by using the
+ * HFI1_CAP_RESERVED_MASK bits.
+ */
+#define HFI1_CAP_WRITABLE_MASK   (HFI1_CAP_SDMA_AHG |			\
+				 HFI1_CAP_HDRSUPP |			\
+				 HFI1_CAP_MULTI_PKT_EGR |		\
+				 HFI1_CAP_NODROP_RHQ_FULL |		\
+				 HFI1_CAP_NODROP_EGR_FULL |		\
+				 HFI1_CAP_ALLOW_PERM_JKEY |		\
+				 HFI1_CAP_STATIC_RATE_CTRL |		\
+				 HFI1_CAP_PRINT_UNIMPL)
+/*
+ * A set of capability bits that are "global" and are not allowed to be
+ * set in the user bitmask.
+ */
+#define HFI1_CAP_RESERVED_MASK   ((HFI1_CAP_SDMA |			\
+				  HFI1_CAP_USE_SDMA_HEAD |		\
+				  HFI1_CAP_EXTENDED_PSN |		\
+				  HFI1_CAP_PRINT_UNIMPL |		\
+				  HFI1_CAP_QSFP_ENABLED |		\
+				  HFI1_CAP_NO_INTEGRITY |		\
+				  HFI1_CAP_PKEY_CHECK) <<		\
+				 HFI1_CAP_USER_SHIFT)
+/*
+ * Set of capabilities that need to be enabled for kernel context in
+ * order to be allowed for user contexts, as well.
+ */
+#define HFI1_CAP_MUST_HAVE_KERN (HFI1_CAP_STATIC_RATE_CTRL)
+/* Default enabled capabilities (both kernel and user) */
+#define HFI1_CAP_MASK_DEFAULT    (HFI1_CAP_HDRSUPP |			\
+				 HFI1_CAP_NODROP_RHQ_FULL |		\
+				 HFI1_CAP_NODROP_EGR_FULL |		\
+				 HFI1_CAP_SDMA |			\
+				 HFI1_CAP_PRINT_UNIMPL |		\
+				 HFI1_CAP_STATIC_RATE_CTRL |		\
+				 HFI1_CAP_QSFP_ENABLED |		\
+				 HFI1_CAP_PKEY_CHECK |			\
+				 HFI1_CAP_MULTI_PKT_EGR |		\
+				 HFI1_CAP_EXTENDED_PSN |		\
+				 ((HFI1_CAP_HDRSUPP |			\
+				   HFI1_CAP_MULTI_PKT_EGR |		\
+				   HFI1_CAP_STATIC_RATE_CTRL |		\
+				   HFI1_CAP_PKEY_CHECK |		\
+				   HFI1_CAP_EARLY_CREDIT_RETURN) <<	\
+				  HFI1_CAP_USER_SHIFT))
+/*
+ * A bitmask of kernel/global capabilities that should be communicated
+ * to user level processes.
+ */
+#define HFI1_CAP_K2U (HFI1_CAP_SDMA |			\
+		     HFI1_CAP_EXTENDED_PSN |		\
+		     HFI1_CAP_PKEY_CHECK |		\
+		     HFI1_CAP_NO_INTEGRITY)
+
+#define HFI1_USER_SWVERSION ((HFI1_USER_SWMAJOR << 16) | HFI1_USER_SWMINOR)
+
+#ifndef HFI1_KERN_TYPE
+#define HFI1_KERN_TYPE 0
+#endif
+
+/*
+ * Similarly, this is the kernel version going back to the user.  It's
+ * slightly different, in that we want to tell if the driver was built as
+ * part of a Intel release, or from the driver from openfabrics.org,
+ * kernel.org, or a standard distribution, for support reasons.
+ * The high bit is 0 for non-Intel and 1 for Intel-built/supplied.
+ *
+ * It's returned by the driver to the user code during initialization in the
+ * spi_sw_version field of hfi1_base_info, so the user code can in turn
+ * check for compatibility with the kernel.
+*/
+#define HFI1_KERN_SWVERSION ((HFI1_KERN_TYPE << 31) | HFI1_USER_SWVERSION)
+
+/*
+ * Define the driver version number.  This is something that refers only
+ * to the driver itself, not the software interfaces it supports.
+ */
+#ifndef HFI1_DRIVER_VERSION_BASE
+#define HFI1_DRIVER_VERSION_BASE "0.9-248"
+#endif
+
+/* create the final driver version string */
+#ifdef HFI1_IDSTR
+#define HFI1_DRIVER_VERSION HFI1_DRIVER_VERSION_BASE " " HFI1_IDSTR
+#else
+#define HFI1_DRIVER_VERSION HFI1_DRIVER_VERSION_BASE
+#endif
+
+/*
+ * Diagnostics can send a packet by writing the following
+ * struct to the diag packet special file.
+ *
+ * This allows a custom PBC qword, so that special modes and deliberate
+ * changes to CRCs can be used.
+ */
+#define _DIAG_PKT_VERS 1
+struct diag_pkt {
+	__u16 version;		/* structure version */
+	__u16 unit;		/* which device */
+	__u16 sw_index;		/* send sw index to use */
+	__u16 len;		/* data length, in bytes */
+	__u16 port;		/* port number */
+	__u16 unused;
+	__u32 flags;		/* call flags */
+	__u64 data;		/* user data pointer */
+	__u64 pbc;		/* PBC for the packet */
+};
+
+/* diag_pkt flags */
+#define F_DIAGPKT_WAIT 0x1	/* wait until packet is sent */
+
+/*
+ * The next set of defines are for packet headers, and chip register
+ * and memory bits that are visible to and/or used by user-mode software.
+ */
+
+/*
+ * Receive Header Flags
+ */
+#define RHF_PKT_LEN_SHIFT	0
+#define RHF_PKT_LEN_MASK	0xfffull
+#define RHF_PKT_LEN_SMASK (RHF_PKT_LEN_MASK << RHF_PKT_LEN_SHIFT)
+
+#define RHF_RCV_TYPE_SHIFT	12
+#define RHF_RCV_TYPE_MASK	0x7ull
+#define RHF_RCV_TYPE_SMASK (RHF_RCV_TYPE_MASK << RHF_RCV_TYPE_SHIFT)
+
+#define RHF_USE_EGR_BFR_SHIFT	15
+#define RHF_USE_EGR_BFR_MASK	0x1ull
+#define RHF_USE_EGR_BFR_SMASK (RHF_USE_EGR_BFR_MASK << RHF_USE_EGR_BFR_SHIFT)
+
+#define RHF_EGR_INDEX_SHIFT	16
+#define RHF_EGR_INDEX_MASK	0x7ffull
+#define RHF_EGR_INDEX_SMASK (RHF_EGR_INDEX_MASK << RHF_EGR_INDEX_SHIFT)
+
+#define RHF_DC_INFO_SHIFT	27
+#define RHF_DC_INFO_MASK	0x1ull
+#define RHF_DC_INFO_SMASK (RHF_DC_INFO_MASK << RHF_DC_INFO_SHIFT)
+
+#define RHF_RCV_SEQ_SHIFT	28
+#define RHF_RCV_SEQ_MASK	0xfull
+#define RHF_RCV_SEQ_SMASK (RHF_RCV_SEQ_MASK << RHF_RCV_SEQ_SHIFT)
+
+#define RHF_EGR_OFFSET_SHIFT	32
+#define RHF_EGR_OFFSET_MASK	0xfffull
+#define RHF_EGR_OFFSET_SMASK (RHF_EGR_OFFSET_MASK << RHF_EGR_OFFSET_SHIFT)
+#define RHF_HDRQ_OFFSET_SHIFT	44
+#define RHF_HDRQ_OFFSET_MASK	0x1ffull
+#define RHF_HDRQ_OFFSET_SMASK (RHF_HDRQ_OFFSET_MASK << RHF_HDRQ_OFFSET_SHIFT)
+#define RHF_K_HDR_LEN_ERR	(0x1ull << 53)
+#define RHF_DC_UNC_ERR		(0x1ull << 54)
+#define RHF_DC_ERR		(0x1ull << 55)
+#define RHF_RCV_TYPE_ERR_SHIFT	56
+#define RHF_RCV_TYPE_ERR_MASK	0x7ul
+#define RHF_RCV_TYPE_ERR_SMASK (RHF_RCV_TYPE_ERR_MASK << RHF_RCV_TYPE_ERR_SHIFT)
+#define RHF_TID_ERR		(0x1ull << 59)
+#define RHF_LEN_ERR		(0x1ull << 60)
+#define RHF_ECC_ERR		(0x1ull << 61)
+#define RHF_VCRC_ERR		(0x1ull << 62)
+#define RHF_ICRC_ERR		(0x1ull << 63)
+
+#define RHF_ERROR_SMASK 0xffe0000000000000ull		/* bits 63:53 */
+
+/* RHF receive types */
+#define RHF_RCV_TYPE_EXPECTED 0
+#define RHF_RCV_TYPE_EAGER    1
+#define RHF_RCV_TYPE_IB       2 /* normal IB, IB Raw, or IPv6 */
+#define RHF_RCV_TYPE_ERROR    3
+#define RHF_RCV_TYPE_BYPASS   4
+#define RHF_RCV_TYPE_INVALID5 5
+#define RHF_RCV_TYPE_INVALID6 6
+#define RHF_RCV_TYPE_INVALID7 7
+
+/* RHF receive type error - expected packet errors */
+#define RHF_RTE_EXPECTED_FLOW_SEQ_ERR	0x2
+#define RHF_RTE_EXPECTED_FLOW_GEN_ERR	0x4
+
+/* RHF receive type error - eager packet errors */
+#define RHF_RTE_EAGER_NO_ERR		0x0
+
+/* RHF receive type error - IB packet errors */
+#define RHF_RTE_IB_NO_ERR		0x0
+
+/* RHF receive type error - error packet errors */
+#define RHF_RTE_ERROR_NO_ERR		0x0
+#define RHF_RTE_ERROR_OP_CODE_ERR	0x1
+#define RHF_RTE_ERROR_KHDR_MIN_LEN_ERR	0x2
+#define RHF_RTE_ERROR_KHDR_HCRC_ERR	0x3
+#define RHF_RTE_ERROR_KHDR_KVER_ERR	0x4
+#define RHF_RTE_ERROR_CONTEXT_ERR	0x5
+#define RHF_RTE_ERROR_KHDR_TID_ERR	0x6
+
+/* RHF receive type error - bypass packet errors */
+#define RHF_RTE_BYPASS_NO_ERR		0x0
+
+/*
+ * This structure contains the first field common to all protocols
+ * that employ this chip.
+ */
+struct hfi1_message_header {
+	__be16 lrh[4];
+};
+
+/* IB - LRH header constants */
+#define HFI1_LRH_GRH 0x0003      /* 1. word of IB LRH - next header: GRH */
+#define HFI1_LRH_BTH 0x0002      /* 1. word of IB LRH - next header: BTH */
+
+/* misc. */
+#define SIZE_OF_CRC 1
+
+#define LIM_MGMT_P_KEY       0x7FFF
+#define FULL_MGMT_P_KEY      0xFFFF
+
+#define DEFAULT_P_KEY LIM_MGMT_P_KEY
+#define HFI1_PERMISSIVE_LID 0xFFFF
+#define HFI1_AETH_CREDIT_SHIFT 24
+#define HFI1_AETH_CREDIT_MASK 0x1F
+#define HFI1_AETH_CREDIT_INVAL 0x1F
+#define HFI1_MSN_MASK 0xFFFFFF
+#define HFI1_QPN_MASK 0xFFFFFF
+#define HFI1_FECN_SHIFT 31
+#define HFI1_FECN_MASK 1
+#define HFI1_FECN_SMASK (1 << HFI1_FECN_SHIFT)
+#define HFI1_BECN_SHIFT 30
+#define HFI1_BECN_MASK 1
+#define HFI1_BECN_SMASK (1 << HFI1_BECN_SHIFT)
+#define HFI1_MULTICAST_LID_BASE 0xC000
+
+static inline __u64 rhf_to_cpu(const __le32 *rbuf)
+{
+	return __le64_to_cpu(*((__le64 *)rbuf));
+}
+
+static inline u64 rhf_err_flags(u64 rhf)
+{
+	return rhf & RHF_ERROR_SMASK;
+}
+
+static inline u32 rhf_rcv_type(u64 rhf)
+{
+	return (rhf >> RHF_RCV_TYPE_SHIFT) & RHF_RCV_TYPE_MASK;
+}
+
+static inline u32 rhf_rcv_type_err(u64 rhf)
+{
+	return (rhf >> RHF_RCV_TYPE_ERR_SHIFT) & RHF_RCV_TYPE_ERR_MASK;
+}
+
+/* return size is in bytes, not DWORDs */
+static inline u32 rhf_pkt_len(u64 rhf)
+{
+	return ((rhf & RHF_PKT_LEN_SMASK) >> RHF_PKT_LEN_SHIFT) << 2;
+}
+
+static inline u32 rhf_egr_index(u64 rhf)
+{
+	return (rhf >> RHF_EGR_INDEX_SHIFT) & RHF_EGR_INDEX_MASK;
+}
+
+static inline u32 rhf_rcv_seq(u64 rhf)
+{
+	return (rhf >> RHF_RCV_SEQ_SHIFT) & RHF_RCV_SEQ_MASK;
+}
+
+/* returned offset is in DWORDS */
+static inline u32 rhf_hdrq_offset(u64 rhf)
+{
+	return (rhf >> RHF_HDRQ_OFFSET_SHIFT) & RHF_HDRQ_OFFSET_MASK;
+}
+
+static inline u64 rhf_use_egr_bfr(u64 rhf)
+{
+	return rhf & RHF_USE_EGR_BFR_SMASK;
+}
+
+static inline u64 rhf_dc_info(u64 rhf)
+{
+	return rhf & RHF_DC_INFO_SMASK;
+}
+
+static inline u32 rhf_egr_buf_offset(u64 rhf)
+{
+	return (rhf >> RHF_EGR_OFFSET_SHIFT) & RHF_EGR_OFFSET_MASK;
+}
+#endif /* _COMMON_H */
diff --git a/drivers/staging/rdma/hfi1/cq.c b/drivers/staging/rdma/hfi1/cq.c
new file mode 100644
index 000000000000..4f046ffe7e60
--- /dev/null
+++ b/drivers/staging/rdma/hfi1/cq.c
@@ -0,0 +1,558 @@
+/*
+ *
+ * This file is provided under a dual BSD/GPLv2 license.  When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * GPL LICENSE SUMMARY
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * BSD LICENSE
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ *  - Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ *  - Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in
+ *    the documentation and/or other materials provided with the
+ *    distribution.
+ *  - Neither the name of Intel Corporation nor the names of its
+ *    contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ */
+
+#include <linux/err.h>
+#include <linux/slab.h>
+#include <linux/vmalloc.h>
+#include <linux/kthread.h>
+
+#include "verbs.h"
+#include "hfi.h"
+
+/**
+ * hfi1_cq_enter - add a new entry to the completion queue
+ * @cq: completion queue
+ * @entry: work completion entry to add
+ * @sig: true if @entry is a solicited entry
+ *
+ * This may be called with qp->s_lock held.
+ */
+void hfi1_cq_enter(struct hfi1_cq *cq, struct ib_wc *entry, int solicited)
+{
+	struct hfi1_cq_wc *wc;
+	unsigned long flags;
+	u32 head;
+	u32 next;
+
+	spin_lock_irqsave(&cq->lock, flags);
+
+	/*
+	 * Note that the head pointer might be writable by user processes.
+	 * Take care to verify it is a sane value.
+	 */
+	wc = cq->queue;
+	head = wc->head;
+	if (head >= (unsigned) cq->ibcq.cqe) {
+		head = cq->ibcq.cqe;
+		next = 0;
+	} else
+		next = head + 1;
+	if (unlikely(next == wc->tail)) {
+		spin_unlock_irqrestore(&cq->lock, flags);
+		if (cq->ibcq.event_handler) {
+			struct ib_event ev;
+
+			ev.device = cq->ibcq.device;
+			ev.element.cq = &cq->ibcq;
+			ev.event = IB_EVENT_CQ_ERR;
+			cq->ibcq.event_handler(&ev, cq->ibcq.cq_context);
+		}
+		return;
+	}
+	if (cq->ip) {
+		wc->uqueue[head].wr_id = entry->wr_id;
+		wc->uqueue[head].status = entry->status;
+		wc->uqueue[head].opcode = entry->opcode;
+		wc->uqueue[head].vendor_err = entry->vendor_err;
+		wc->uqueue[head].byte_len = entry->byte_len;
+		wc->uqueue[head].ex.imm_data =
+			(__u32 __force)entry->ex.imm_data;
+		wc->uqueue[head].qp_num = entry->qp->qp_num;
+		wc->uqueue[head].src_qp = entry->src_qp;
+		wc->uqueue[head].wc_flags = entry->wc_flags;
+		wc->uqueue[head].pkey_index = entry->pkey_index;
+		wc->uqueue[head].slid = entry->slid;
+		wc->uqueue[head].sl = entry->sl;
+		wc->uqueue[head].dlid_path_bits = entry->dlid_path_bits;
+		wc->uqueue[head].port_num = entry->port_num;
+		/* Make sure entry is written before the head index. */
+		smp_wmb();
+	} else
+		wc->kqueue[head] = *entry;
+	wc->head = next;
+
+	if (cq->notify == IB_CQ_NEXT_COMP ||
+	    (cq->notify == IB_CQ_SOLICITED &&
+	     (solicited || entry->status != IB_WC_SUCCESS))) {
+		struct kthread_worker *worker;
+		/*
+		 * This will cause send_complete() to be called in
+		 * another thread.
+		 */
+		smp_read_barrier_depends(); /* see hfi1_cq_exit */
+		worker = cq->dd->worker;
+		if (likely(worker)) {
+			cq->notify = IB_CQ_NONE;
+			cq->triggered++;
+			queue_kthread_work(worker, &cq->comptask);
+		}
+	}
+
+	spin_unlock_irqrestore(&cq->lock, flags);
+}
+
+/**
+ * hfi1_poll_cq - poll for work completion entries
+ * @ibcq: the completion queue to poll
+ * @num_entries: the maximum number of entries to return
+ * @entry: pointer to array where work completions are placed
+ *
+ * Returns the number of completion entries polled.
+ *
+ * This may be called from interrupt context.  Also called by ib_poll_cq()
+ * in the generic verbs code.
+ */
+int hfi1_poll_cq(struct ib_cq *ibcq, int num_entries, struct ib_wc *entry)
+{
+	struct hfi1_cq *cq = to_icq(ibcq);
+	struct hfi1_cq_wc *wc;
+	unsigned long flags;
+	int npolled;
+	u32 tail;
+
+	/* The kernel can only poll a kernel completion queue */
+	if (cq->ip) {
+		npolled = -EINVAL;
+		goto bail;
+	}
+
+	spin_lock_irqsave(&cq->lock, flags);
+
+	wc = cq->queue;
+	tail = wc->tail;
+	if (tail > (u32) cq->ibcq.cqe)
+		tail = (u32) cq->ibcq.cqe;
+	for (npolled = 0; npolled < num_entries; ++npolled, ++entry) {
+		if (tail == wc->head)
+			break;
+		/* The kernel doesn't need a RMB since it has the lock. */
+		*entry = wc->kqueue[tail];
+		if (tail >= cq->ibcq.cqe)
+			tail = 0;
+		else
+			tail++;
+	}
+	wc->tail = tail;
+
+	spin_unlock_irqrestore(&cq->lock, flags);
+
+bail:
+	return npolled;
+}
+
+static void send_complete(struct kthread_work *work)
+{
+	struct hfi1_cq *cq = container_of(work, struct hfi1_cq, comptask);
+
+	/*
+	 * The completion handler will most likely rearm the notification
+	 * and poll for all pending entries.  If a new completion entry
+	 * is added while we are in this routine, queue_work()
+	 * won't call us again until we return so we check triggered to
+	 * see if we need to call the handler again.
+	 */
+	for (;;) {
+		u8 triggered = cq->triggered;
+
+		/*
+		 * IPoIB connected mode assumes the callback is from a
+		 * soft IRQ. We simulate this by blocking "bottom halves".
+		 * See the implementation for ipoib_cm_handle_tx_wc(),
+		 * netif_tx_lock_bh() and netif_tx_lock().
+		 */
+		local_bh_disable();
+		cq->ibcq.comp_handler(&cq->ibcq, cq->ibcq.cq_context);
+		local_bh_enable();
+
+		if (cq->triggered == triggered)
+			return;
+	}
+}
+
+/**
+ * hfi1_create_cq - create a completion queue
+ * @ibdev: the device this completion queue is attached to
+ * @attr: creation attributes
+ * @context: unused by the driver
+ * @udata: user data for libibverbs.so
+ *
+ * Returns a pointer to the completion queue or negative errno values
+ * for failure.
+ *
+ * Called by ib_create_cq() in the generic verbs code.
+ */
+struct ib_cq *hfi1_create_cq(
+	struct ib_device *ibdev,
+	const struct ib_cq_init_attr *attr,
+	struct ib_ucontext *context,
+	struct ib_udata *udata)
+{
+	struct hfi1_ibdev *dev = to_idev(ibdev);
+	struct hfi1_cq *cq;
+	struct hfi1_cq_wc *wc;
+	struct ib_cq *ret;
+	u32 sz;
+	unsigned int entries = attr->cqe;
+
+	if (attr->flags)
+		return ERR_PTR(-EINVAL);
+
+	if (entries < 1 || entries > hfi1_max_cqes)
+		return ERR_PTR(-EINVAL);
+
+	/* Allocate the completion queue structure. */
+	cq = kmalloc(sizeof(*cq), GFP_KERNEL);
+	if (!cq)
+		return ERR_PTR(-ENOMEM);
+
+	/*
+	 * Allocate the completion queue entries and head/tail pointers.
+	 * This is allocated separately so that it can be resized and
+	 * also mapped into user space.
+	 * We need to use vmalloc() in order to support mmap and large
+	 * numbers of entries.
+	 */
+	sz = sizeof(*wc);
+	if (udata && udata->outlen >= sizeof(__u64))
+		sz += sizeof(struct ib_uverbs_wc) * (entries + 1);
+	else
+		sz += sizeof(struct ib_wc) * (entries + 1);
+	wc = vmalloc_user(sz);
+	if (!wc) {
+		ret = ERR_PTR(-ENOMEM);
+		goto bail_cq;
+	}
+
+	/*
+	 * Return the address of the WC as the offset to mmap.
+	 * See hfi1_mmap() for details.
+	 */
+	if (udata && udata->outlen >= sizeof(__u64)) {
+		int err;
+
+		cq->ip = hfi1_create_mmap_info(dev, sz, context, wc);
+		if (!cq->ip) {
+			ret = ERR_PTR(-ENOMEM);
+			goto bail_wc;
+		}
+
+		err = ib_copy_to_udata(udata, &cq->ip->offset,
+				       sizeof(cq->ip->offset));
+		if (err) {
+			ret = ERR_PTR(err);
+			goto bail_ip;
+		}
+	} else
+		cq->ip = NULL;
+
+	spin_lock(&dev->n_cqs_lock);
+	if (dev->n_cqs_allocated == hfi1_max_cqs) {
+		spin_unlock(&dev->n_cqs_lock);
+		ret = ERR_PTR(-ENOMEM);
+		goto bail_ip;
+	}
+
+	dev->n_cqs_allocated++;
+	spin_unlock(&dev->n_cqs_lock);
+
+	if (cq->ip) {
+		spin_lock_irq(&dev->pending_lock);
+		list_add(&cq->ip->pending_mmaps, &dev->pending_mmaps);
+		spin_unlock_irq(&dev->pending_lock);
+	}
+
+	/*
+	 * ib_create_cq() will initialize cq->ibcq except for cq->ibcq.cqe.
+	 * The number of entries should be >= the number requested or return
+	 * an error.
+	 */
+	cq->dd = dd_from_dev(dev);
+	cq->ibcq.cqe = entries;
+	cq->notify = IB_CQ_NONE;
+	cq->triggered = 0;
+	spin_lock_init(&cq->lock);
+	init_kthread_work(&cq->comptask, send_complete);
+	wc->head = 0;
+	wc->tail = 0;
+	cq->queue = wc;
+
+	ret = &cq->ibcq;
+
+	goto done;
+
+bail_ip:
+	kfree(cq->ip);
+bail_wc:
+	vfree(wc);
+bail_cq:
+	kfree(cq);
+done:
+	return ret;
+}
+
+/**
+ * hfi1_destroy_cq - destroy a completion queue
+ * @ibcq: the completion queue to destroy.
+ *
+ * Returns 0 for success.
+ *
+ * Called by ib_destroy_cq() in the generic verbs code.
+ */
+int hfi1_destroy_cq(struct ib_cq *ibcq)
+{
+	struct hfi1_ibdev *dev = to_idev(ibcq->device);
+	struct hfi1_cq *cq = to_icq(ibcq);
+
+	flush_kthread_work(&cq->comptask);
+	spin_lock(&dev->n_cqs_lock);
+	dev->n_cqs_allocated--;
+	spin_unlock(&dev->n_cqs_lock);
+	if (cq->ip)
+		kref_put(&cq->ip->ref, hfi1_release_mmap_info);
+	else
+		vfree(cq->queue);
+	kfree(cq);
+
+	return 0;
+}
+
+/**
+ * hfi1_req_notify_cq - change the notification type for a completion queue
+ * @ibcq: the completion queue
+ * @notify_flags: the type of notification to request
+ *
+ * Returns 0 for success.
+ *
+ * This may be called from interrupt context.  Also called by
+ * ib_req_notify_cq() in the generic verbs code.
+ */
+int hfi1_req_notify_cq(struct ib_cq *ibcq, enum ib_cq_notify_flags notify_flags)
+{
+	struct hfi1_cq *cq = to_icq(ibcq);
+	unsigned long flags;
+	int ret = 0;
+
+	spin_lock_irqsave(&cq->lock, flags);
+	/*
+	 * Don't change IB_CQ_NEXT_COMP to IB_CQ_SOLICITED but allow
+	 * any other transitions (see C11-31 and C11-32 in ch. 11.4.2.2).
+	 */
+	if (cq->notify != IB_CQ_NEXT_COMP)
+		cq->notify = notify_flags & IB_CQ_SOLICITED_MASK;
+
+	if ((notify_flags & IB_CQ_REPORT_MISSED_EVENTS) &&
+	    cq->queue->head != cq->queue->tail)
+		ret = 1;
+
+	spin_unlock_irqrestore(&cq->lock, flags);
+
+	return ret;
+}
+
+/**
+ * hfi1_resize_cq - change the size of the CQ
+ * @ibcq: the completion queue
+ *
+ * Returns 0 for success.
+ */
+int hfi1_resize_cq(struct ib_cq *ibcq, int cqe, struct ib_udata *udata)
+{
+	struct hfi1_cq *cq = to_icq(ibcq);
+	struct hfi1_cq_wc *old_wc;
+	struct hfi1_cq_wc *wc;
+	u32 head, tail, n;
+	int ret;
+	u32 sz;
+
+	if (cqe < 1 || cqe > hfi1_max_cqes) {
+		ret = -EINVAL;
+		goto bail;
+	}
+
+	/*
+	 * Need to use vmalloc() if we want to support large #s of entries.
+	 */
+	sz = sizeof(*wc);
+	if (udata && udata->outlen >= sizeof(__u64))
+		sz += sizeof(struct ib_uverbs_wc) * (cqe + 1);
+	else
+		sz += sizeof(struct ib_wc) * (cqe + 1);
+	wc = vmalloc_user(sz);
+	if (!wc) {
+		ret = -ENOMEM;
+		goto bail;
+	}
+
+	/* Check that we can write the offset to mmap. */
+	if (udata && udata->outlen >= sizeof(__u64)) {
+		__u64 offset = 0;
+
+		ret = ib_copy_to_udata(udata, &offset, sizeof(offset));
+		if (ret)
+			goto bail_free;
+	}
+
+	spin_lock_irq(&cq->lock);
+	/*
+	 * Make sure head and tail are sane since they
+	 * might be user writable.
+	 */
+	old_wc = cq->queue;
+	head = old_wc->head;
+	if (head > (u32) cq->ibcq.cqe)
+		head = (u32) cq->ibcq.cqe;
+	tail = old_wc->tail;
+	if (tail > (u32) cq->ibcq.cqe)
+		tail = (u32) cq->ibcq.cqe;
+	if (head < tail)
+		n = cq->ibcq.cqe + 1 + head - tail;
+	else
+		n = head - tail;
+	if (unlikely((u32)cqe < n)) {
+		ret = -EINVAL;
+		goto bail_unlock;
+	}
+	for (n = 0; tail != head; n++) {
+		if (cq->ip)
+			wc->uqueue[n] = old_wc->uqueue[tail];
+		else
+			wc->kqueue[n] = old_wc->kqueue[tail];
+		if (tail == (u32) cq->ibcq.cqe)
+			tail = 0;
+		else
+			tail++;
+	}
+	cq->ibcq.cqe = cqe;
+	wc->head = n;
+	wc->tail = 0;
+	cq->queue = wc;
+	spin_unlock_irq(&cq->lock);
+
+	vfree(old_wc);
+
+	if (cq->ip) {
+		struct hfi1_ibdev *dev = to_idev(ibcq->device);
+		struct hfi1_mmap_info *ip = cq->ip;
+
+		hfi1_update_mmap_info(dev, ip, sz, wc);
+
+		/*
+		 * Return the offset to mmap.
+		 * See hfi1_mmap() for details.
+		 */
+		if (udata && udata->outlen >= sizeof(__u64)) {
+			ret = ib_copy_to_udata(udata, &ip->offset,
+					       sizeof(ip->offset));
+			if (ret)
+				goto bail;
+		}
+
+		spin_lock_irq(&dev->pending_lock);
+		if (list_empty(&ip->pending_mmaps))
+			list_add(&ip->pending_mmaps, &dev->pending_mmaps);
+		spin_unlock_irq(&dev->pending_lock);
+	}
+
+	ret = 0;
+	goto bail;
+
+bail_unlock:
+	spin_unlock_irq(&cq->lock);
+bail_free:
+	vfree(wc);
+bail:
+	return ret;
+}
+
+int hfi1_cq_init(struct hfi1_devdata *dd)
+{
+	int ret = 0;
+	int cpu;
+	struct task_struct *task;
+
+	if (dd->worker)
+		return 0;
+	dd->worker = kzalloc(sizeof(*dd->worker), GFP_KERNEL);
+	if (!dd->worker)
+		return -ENOMEM;
+	init_kthread_worker(dd->worker);
+	task = kthread_create_on_node(
+		kthread_worker_fn,
+		dd->worker,
+		dd->assigned_node_id,
+		"hfi1_cq%d", dd->unit);
+	if (IS_ERR(task))
+		goto task_fail;
+	cpu = cpumask_first(cpumask_of_node(dd->assigned_node_id));
+	kthread_bind(task, cpu);
+	wake_up_process(task);
+out:
+	return ret;
+task_fail:
+	ret = PTR_ERR(task);
+	kfree(dd->worker);
+	dd->worker = NULL;
+	goto out;
+}
+
+void hfi1_cq_exit(struct hfi1_devdata *dd)
+{
+	struct kthread_worker *worker;
+
+	worker = dd->worker;
+	if (!worker)
+		return;
+	/* blocks future queuing from send_complete() */
+	dd->worker = NULL;
+	smp_wmb(); /* See hfi1_cq_enter */
+	flush_kthread_worker(worker);
+	kthread_stop(worker->task);
+	kfree(worker);
+}
diff --git a/drivers/staging/rdma/hfi1/debugfs.c b/drivers/staging/rdma/hfi1/debugfs.c
new file mode 100644
index 000000000000..acd2269e9f14
--- /dev/null
+++ b/drivers/staging/rdma/hfi1/debugfs.c
@@ -0,0 +1,899 @@
+#ifdef CONFIG_DEBUG_FS
+/*
+ *
+ * This file is provided under a dual BSD/GPLv2 license.  When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * GPL LICENSE SUMMARY
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * BSD LICENSE
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ *  - Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ *  - Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in
+ *    the documentation and/or other materials provided with the
+ *    distribution.
+ *  - Neither the name of Intel Corporation nor the names of its
+ *    contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ */
+#include <linux/debugfs.h>
+#include <linux/seq_file.h>
+#include <linux/kernel.h>
+#include <linux/export.h>
+
+#include "hfi.h"
+#include "debugfs.h"
+#include "device.h"
+#include "qp.h"
+#include "sdma.h"
+
+static struct dentry *hfi1_dbg_root;
+
+#define private2dd(file) (file_inode(file)->i_private)
+#define private2ppd(file) (file_inode(file)->i_private)
+
+#define DEBUGFS_SEQ_FILE_OPS(name) \
+static const struct seq_operations _##name##_seq_ops = { \
+	.start = _##name##_seq_start, \
+	.next  = _##name##_seq_next, \
+	.stop  = _##name##_seq_stop, \
+	.show  = _##name##_seq_show \
+}
+#define DEBUGFS_SEQ_FILE_OPEN(name) \
+static int _##name##_open(struct inode *inode, struct file *s) \
+{ \
+	struct seq_file *seq; \
+	int ret; \
+	ret =  seq_open(s, &_##name##_seq_ops); \
+	if (ret) \
+		return ret; \
+	seq = s->private_data; \
+	seq->private = inode->i_private; \
+	return 0; \
+}
+
+#define DEBUGFS_FILE_OPS(name) \
+static const struct file_operations _##name##_file_ops = { \
+	.owner   = THIS_MODULE, \
+	.open    = _##name##_open, \
+	.read    = seq_read, \
+	.llseek  = seq_lseek, \
+	.release = seq_release \
+}
+
+#define DEBUGFS_FILE_CREATE(name, parent, data, ops, mode)	\
+do { \
+	struct dentry *ent; \
+	ent = debugfs_create_file(name, mode, parent, \
+		data, ops); \
+	if (!ent) \
+		pr_warn("create of %s failed\n", name); \
+} while (0)
+
+
+#define DEBUGFS_SEQ_FILE_CREATE(name, parent, data) \
+	DEBUGFS_FILE_CREATE(#name, parent, data, &_##name##_file_ops, S_IRUGO)
+
+static void *_opcode_stats_seq_start(struct seq_file *s, loff_t *pos)
+__acquires(RCU)
+{
+	struct hfi1_opcode_stats_perctx *opstats;
+
+	rcu_read_lock();
+	if (*pos >= ARRAY_SIZE(opstats->stats))
+		return NULL;
+	return pos;
+}
+
+static void *_opcode_stats_seq_next(struct seq_file *s, void *v, loff_t *pos)
+{
+	struct hfi1_opcode_stats_perctx *opstats;
+
+	++*pos;
+	if (*pos >= ARRAY_SIZE(opstats->stats))
+		return NULL;
+	return pos;
+}
+
+
+static void _opcode_stats_seq_stop(struct seq_file *s, void *v)
+__releases(RCU)
+{
+	rcu_read_unlock();
+}
+
+static int _opcode_stats_seq_show(struct seq_file *s, void *v)
+{
+	loff_t *spos = v;
+	loff_t i = *spos, j;
+	u64 n_packets = 0, n_bytes = 0;
+	struct hfi1_ibdev *ibd = (struct hfi1_ibdev *)s->private;
+	struct hfi1_devdata *dd = dd_from_dev(ibd);
+
+	for (j = 0; j < dd->first_user_ctxt; j++) {
+		if (!dd->rcd[j])
+			continue;
+		n_packets += dd->rcd[j]->opstats->stats[i].n_packets;
+		n_bytes += dd->rcd[j]->opstats->stats[i].n_bytes;
+	}
+	if (!n_packets && !n_bytes)
+		return SEQ_SKIP;
+	seq_printf(s, "%02llx %llu/%llu\n", i,
+		(unsigned long long) n_packets,
+		(unsigned long long) n_bytes);
+
+	return 0;
+}
+
+DEBUGFS_SEQ_FILE_OPS(opcode_stats);
+DEBUGFS_SEQ_FILE_OPEN(opcode_stats)
+DEBUGFS_FILE_OPS(opcode_stats);
+
+static void *_ctx_stats_seq_start(struct seq_file *s, loff_t *pos)
+{
+	struct hfi1_ibdev *ibd = (struct hfi1_ibdev *)s->private;
+	struct hfi1_devdata *dd = dd_from_dev(ibd);
+
+	if (!*pos)
+		return SEQ_START_TOKEN;
+	if (*pos >= dd->first_user_ctxt)
+		return NULL;
+	return pos;
+}
+
+static void *_ctx_stats_seq_next(struct seq_file *s, void *v, loff_t *pos)
+{
+	struct hfi1_ibdev *ibd = (struct hfi1_ibdev *)s->private;
+	struct hfi1_devdata *dd = dd_from_dev(ibd);
+
+	if (v == SEQ_START_TOKEN)
+		return pos;
+
+	++*pos;
+	if (*pos >= dd->first_user_ctxt)
+		return NULL;
+	return pos;
+}
+
+static void _ctx_stats_seq_stop(struct seq_file *s, void *v)
+{
+	/* nothing allocated */
+}
+
+static int _ctx_stats_seq_show(struct seq_file *s, void *v)
+{
+	loff_t *spos;
+	loff_t i, j;
+	u64 n_packets = 0;
+	struct hfi1_ibdev *ibd = (struct hfi1_ibdev *)s->private;
+	struct hfi1_devdata *dd = dd_from_dev(ibd);
+
+	if (v == SEQ_START_TOKEN) {
+		seq_puts(s, "Ctx:npkts\n");
+		return 0;
+	}
+
+	spos = v;
+	i = *spos;
+
+	if (!dd->rcd[i])
+		return SEQ_SKIP;
+
+	for (j = 0; j < ARRAY_SIZE(dd->rcd[i]->opstats->stats); j++)
+		n_packets += dd->rcd[i]->opstats->stats[j].n_packets;
+
+	if (!n_packets)
+		return SEQ_SKIP;
+
+	seq_printf(s, "  %llu:%llu\n", i, n_packets);
+	return 0;
+}
+
+DEBUGFS_SEQ_FILE_OPS(ctx_stats);
+DEBUGFS_SEQ_FILE_OPEN(ctx_stats)
+DEBUGFS_FILE_OPS(ctx_stats);
+
+static void *_qp_stats_seq_start(struct seq_file *s, loff_t *pos)
+__acquires(RCU)
+{
+	struct qp_iter *iter;
+	loff_t n = *pos;
+
+	rcu_read_lock();
+	iter = qp_iter_init(s->private);
+	if (!iter)
+		return NULL;
+
+	while (n--) {
+		if (qp_iter_next(iter)) {
+			kfree(iter);
+			return NULL;
+		}
+	}
+
+	return iter;
+}
+
+static void *_qp_stats_seq_next(struct seq_file *s, void *iter_ptr,
+				   loff_t *pos)
+{
+	struct qp_iter *iter = iter_ptr;
+
+	(*pos)++;
+
+	if (qp_iter_next(iter)) {
+		kfree(iter);
+		return NULL;
+	}
+
+	return iter;
+}
+
+static void _qp_stats_seq_stop(struct seq_file *s, void *iter_ptr)
+__releases(RCU)
+{
+	rcu_read_unlock();
+}
+
+static int _qp_stats_seq_show(struct seq_file *s, void *iter_ptr)
+{
+	struct qp_iter *iter = iter_ptr;
+
+	if (!iter)
+		return 0;
+
+	qp_iter_print(s, iter);
+
+	return 0;
+}
+
+DEBUGFS_SEQ_FILE_OPS(qp_stats);
+DEBUGFS_SEQ_FILE_OPEN(qp_stats)
+DEBUGFS_FILE_OPS(qp_stats);
+
+static void *_sdes_seq_start(struct seq_file *s, loff_t *pos)
+__acquires(RCU)
+{
+	struct hfi1_ibdev *ibd;
+	struct hfi1_devdata *dd;
+
+	rcu_read_lock();
+	ibd = (struct hfi1_ibdev *)s->private;
+	dd = dd_from_dev(ibd);
+	if (!dd->per_sdma || *pos >= dd->num_sdma)
+		return NULL;
+	return pos;
+}
+
+static void *_sdes_seq_next(struct seq_file *s, void *v, loff_t *pos)
+{
+	struct hfi1_ibdev *ibd = (struct hfi1_ibdev *)s->private;
+	struct hfi1_devdata *dd = dd_from_dev(ibd);
+
+	++*pos;
+	if (!dd->per_sdma || *pos >= dd->num_sdma)
+		return NULL;
+	return pos;
+}
+
+
+static void _sdes_seq_stop(struct seq_file *s, void *v)
+__releases(RCU)
+{
+	rcu_read_unlock();
+}
+
+static int _sdes_seq_show(struct seq_file *s, void *v)
+{
+	struct hfi1_ibdev *ibd = (struct hfi1_ibdev *)s->private;
+	struct hfi1_devdata *dd = dd_from_dev(ibd);
+	loff_t *spos = v;
+	loff_t i = *spos;
+
+	sdma_seqfile_dump_sde(s, &dd->per_sdma[i]);
+	return 0;
+}
+
+DEBUGFS_SEQ_FILE_OPS(sdes);
+DEBUGFS_SEQ_FILE_OPEN(sdes)
+DEBUGFS_FILE_OPS(sdes);
+
+/* read the per-device counters */
+static ssize_t dev_counters_read(struct file *file, char __user *buf,
+				 size_t count, loff_t *ppos)
+{
+	u64 *counters;
+	size_t avail;
+	struct hfi1_devdata *dd;
+	ssize_t rval;
+
+	rcu_read_lock();
+	dd = private2dd(file);
+	avail = hfi1_read_cntrs(dd, *ppos, NULL, &counters);
+	rval =  simple_read_from_buffer(buf, count, ppos, counters, avail);
+	rcu_read_unlock();
+	return rval;
+}
+
+/* read the per-device counters */
+static ssize_t dev_names_read(struct file *file, char __user *buf,
+			      size_t count, loff_t *ppos)
+{
+	char *names;
+	size_t avail;
+	struct hfi1_devdata *dd;
+	ssize_t rval;
+
+	rcu_read_lock();
+	dd = private2dd(file);
+	avail = hfi1_read_cntrs(dd, *ppos, &names, NULL);
+	rval =  simple_read_from_buffer(buf, count, ppos, names, avail);
+	rcu_read_unlock();
+	return rval;
+}
+
+struct counter_info {
+	char *name;
+	const struct file_operations ops;
+};
+
+/*
+ * Could use file_inode(file)->i_ino to figure out which file,
+ * instead of separate routine for each, but for now, this works...
+ */
+
+/* read the per-port names (same for each port) */
+static ssize_t portnames_read(struct file *file, char __user *buf,
+			      size_t count, loff_t *ppos)
+{
+	char *names;
+	size_t avail;
+	struct hfi1_devdata *dd;
+	ssize_t rval;
+
+	rcu_read_lock();
+	dd = private2dd(file);
+	/* port number n/a here since names are constant */
+	avail = hfi1_read_portcntrs(dd, *ppos, 0, &names, NULL);
+	rval = simple_read_from_buffer(buf, count, ppos, names, avail);
+	rcu_read_unlock();
+	return rval;
+}
+
+/* read the per-port counters */
+static ssize_t portcntrs_debugfs_read(struct file *file, char __user *buf,
+				size_t count, loff_t *ppos)
+{
+	u64 *counters;
+	size_t avail;
+	struct hfi1_devdata *dd;
+	struct hfi1_pportdata *ppd;
+	ssize_t rval;
+
+	rcu_read_lock();
+	ppd = private2ppd(file);
+	dd = ppd->dd;
+	avail = hfi1_read_portcntrs(dd, *ppos, ppd->port - 1, NULL, &counters);
+	rval = simple_read_from_buffer(buf, count, ppos, counters, avail);
+	rcu_read_unlock();
+	return rval;
+}
+
+/*
+ * read the per-port QSFP data for ppd
+ */
+static ssize_t qsfp_debugfs_dump(struct file *file, char __user *buf,
+			   size_t count, loff_t *ppos)
+{
+	struct hfi1_pportdata *ppd;
+	char *tmp;
+	int ret;
+
+	rcu_read_lock();
+	ppd = private2ppd(file);
+	tmp = kmalloc(PAGE_SIZE, GFP_KERNEL);
+	if (!tmp) {
+		rcu_read_unlock();
+		return -ENOMEM;
+	}
+
+	ret = qsfp_dump(ppd, tmp, PAGE_SIZE);
+	if (ret > 0)
+		ret = simple_read_from_buffer(buf, count, ppos, tmp, ret);
+	rcu_read_unlock();
+	kfree(tmp);
+	return ret;
+}
+
+/* Do an i2c write operation on the chain for the given HFI. */
+static ssize_t __i2c_debugfs_write(struct file *file, const char __user *buf,
+			   size_t count, loff_t *ppos, u32 target)
+{
+	struct hfi1_pportdata *ppd;
+	char *buff;
+	int ret;
+	int i2c_addr;
+	int offset;
+	int total_written;
+
+	rcu_read_lock();
+	ppd = private2ppd(file);
+
+	buff = kmalloc(count, GFP_KERNEL);
+	if (!buff) {
+		ret = -ENOMEM;
+		goto _return;
+	}
+
+	ret = copy_from_user(buff, buf, count);
+	if (ret > 0) {
+		ret = -EFAULT;
+		goto _free;
+	}
+
+	i2c_addr = (*ppos >> 16) & 0xff;
+	offset = *ppos & 0xffff;
+
+	total_written = i2c_write(ppd, target, i2c_addr, offset, buff, count);
+	if (total_written < 0) {
+		ret = total_written;
+		goto _free;
+	}
+
+	*ppos += total_written;
+
+	ret = total_written;
+
+ _free:
+	kfree(buff);
+ _return:
+	rcu_read_unlock();
+	return ret;
+}
+
+/* Do an i2c write operation on chain for HFI 0. */
+static ssize_t i2c1_debugfs_write(struct file *file, const char __user *buf,
+			   size_t count, loff_t *ppos)
+{
+	return __i2c_debugfs_write(file, buf, count, ppos, 0);
+}
+
+/* Do an i2c write operation on chain for HFI 1. */
+static ssize_t i2c2_debugfs_write(struct file *file, const char __user *buf,
+			   size_t count, loff_t *ppos)
+{
+	return __i2c_debugfs_write(file, buf, count, ppos, 1);
+}
+
+/* Do an i2c read operation on the chain for the given HFI. */
+static ssize_t __i2c_debugfs_read(struct file *file, char __user *buf,
+			size_t count, loff_t *ppos, u32 target)
+{
+	struct hfi1_pportdata *ppd;
+	char *buff;
+	int ret;
+	int i2c_addr;
+	int offset;
+	int total_read;
+
+	rcu_read_lock();
+	ppd = private2ppd(file);
+
+	buff = kmalloc(count, GFP_KERNEL);
+	if (!buff) {
+		ret = -ENOMEM;
+		goto _return;
+	}
+
+	i2c_addr = (*ppos >> 16) & 0xff;
+	offset = *ppos & 0xffff;
+
+	total_read = i2c_read(ppd, target, i2c_addr, offset, buff, count);
+	if (total_read < 0) {
+		ret = total_read;
+		goto _free;
+	}
+
+	*ppos += total_read;
+
+	ret = copy_to_user(buf, buff, total_read);
+	if (ret > 0) {
+		ret = -EFAULT;
+		goto _free;
+	}
+
+	ret = total_read;
+
+ _free:
+	kfree(buff);
+ _return:
+	rcu_read_unlock();
+	return ret;
+}
+
+/* Do an i2c read operation on chain for HFI 0. */
+static ssize_t i2c1_debugfs_read(struct file *file, char __user *buf,
+			size_t count, loff_t *ppos)
+{
+	return __i2c_debugfs_read(file, buf, count, ppos, 0);
+}
+
+/* Do an i2c read operation on chain for HFI 1. */
+static ssize_t i2c2_debugfs_read(struct file *file, char __user *buf,
+			size_t count, loff_t *ppos)
+{
+	return __i2c_debugfs_read(file, buf, count, ppos, 1);
+}
+
+/* Do a QSFP write operation on the i2c chain for the given HFI. */
+static ssize_t __qsfp_debugfs_write(struct file *file, const char __user *buf,
+			   size_t count, loff_t *ppos, u32 target)
+{
+	struct hfi1_pportdata *ppd;
+	char *buff;
+	int ret;
+	int total_written;
+
+	rcu_read_lock();
+	if (*ppos + count > QSFP_PAGESIZE * 4) { /* base page + page00-page03 */
+		ret = -EINVAL;
+		goto _return;
+	}
+
+	ppd = private2ppd(file);
+
+	buff = kmalloc(count, GFP_KERNEL);
+	if (!buff) {
+		ret = -ENOMEM;
+		goto _return;
+	}
+
+	ret = copy_from_user(buff, buf, count);
+	if (ret > 0) {
+		ret = -EFAULT;
+		goto _free;
+	}
+
+	total_written = qsfp_write(ppd, target, *ppos, buff, count);
+	if (total_written < 0) {
+		ret = total_written;
+		goto _free;
+	}
+
+	*ppos += total_written;
+
+	ret = total_written;
+
+ _free:
+	kfree(buff);
+ _return:
+	rcu_read_unlock();
+	return ret;
+}
+
+/* Do a QSFP write operation on i2c chain for HFI 0. */
+static ssize_t qsfp1_debugfs_write(struct file *file, const char __user *buf,
+			   size_t count, loff_t *ppos)
+{
+	return __qsfp_debugfs_write(file, buf, count, ppos, 0);
+}
+
+/* Do a QSFP write operation on i2c chain for HFI 1. */
+static ssize_t qsfp2_debugfs_write(struct file *file, const char __user *buf,
+			   size_t count, loff_t *ppos)
+{
+	return __qsfp_debugfs_write(file, buf, count, ppos, 1);
+}
+
+/* Do a QSFP read operation on the i2c chain for the given HFI. */
+static ssize_t __qsfp_debugfs_read(struct file *file, char __user *buf,
+			size_t count, loff_t *ppos, u32 target)
+{
+	struct hfi1_pportdata *ppd;
+	char *buff;
+	int ret;
+	int total_read;
+
+	rcu_read_lock();
+	if (*ppos + count > QSFP_PAGESIZE * 4) { /* base page + page00-page03 */
+		ret = -EINVAL;
+		goto _return;
+	}
+
+	ppd = private2ppd(file);
+
+	buff = kmalloc(count, GFP_KERNEL);
+	if (!buff) {
+		ret = -ENOMEM;
+		goto _return;
+	}
+
+	total_read = qsfp_read(ppd, target, *ppos, buff, count);
+	if (total_read < 0) {
+		ret = total_read;
+		goto _free;
+	}
+
+	*ppos += total_read;
+
+	ret = copy_to_user(buf, buff, total_read);
+	if (ret > 0) {
+		ret = -EFAULT;
+		goto _free;
+	}
+
+	ret = total_read;
+
+ _free:
+	kfree(buff);
+ _return:
+	rcu_read_unlock();
+	return ret;
+}
+
+/* Do a QSFP read operation on i2c chain for HFI 0. */
+static ssize_t qsfp1_debugfs_read(struct file *file, char __user *buf,
+			size_t count, loff_t *ppos)
+{
+	return __qsfp_debugfs_read(file, buf, count, ppos, 0);
+}
+
+/* Do a QSFP read operation on i2c chain for HFI 1. */
+static ssize_t qsfp2_debugfs_read(struct file *file, char __user *buf,
+			size_t count, loff_t *ppos)
+{
+	return __qsfp_debugfs_read(file, buf, count, ppos, 1);
+}
+
+#define DEBUGFS_OPS(nm, readroutine, writeroutine)	\
+{ \
+	.name = nm, \
+	.ops = { \
+		.read = readroutine, \
+		.write = writeroutine, \
+		.llseek = generic_file_llseek, \
+	}, \
+}
+
+static const struct counter_info cntr_ops[] = {
+	DEBUGFS_OPS("counter_names", dev_names_read, NULL),
+	DEBUGFS_OPS("counters", dev_counters_read, NULL),
+	DEBUGFS_OPS("portcounter_names", portnames_read, NULL),
+};
+
+static const struct counter_info port_cntr_ops[] = {
+	DEBUGFS_OPS("port%dcounters", portcntrs_debugfs_read, NULL),
+	DEBUGFS_OPS("i2c1", i2c1_debugfs_read, i2c1_debugfs_write),
+	DEBUGFS_OPS("i2c2", i2c2_debugfs_read, i2c2_debugfs_write),
+	DEBUGFS_OPS("qsfp_dump%d", qsfp_debugfs_dump, NULL),
+	DEBUGFS_OPS("qsfp1", qsfp1_debugfs_read, qsfp1_debugfs_write),
+	DEBUGFS_OPS("qsfp2", qsfp2_debugfs_read, qsfp2_debugfs_write),
+};
+
+void hfi1_dbg_ibdev_init(struct hfi1_ibdev *ibd)
+{
+	char name[sizeof("port0counters") + 1];
+	char link[10];
+	struct hfi1_devdata *dd = dd_from_dev(ibd);
+	struct hfi1_pportdata *ppd;
+	int unit = dd->unit;
+	int i, j;
+
+	if (!hfi1_dbg_root)
+		return;
+	snprintf(name, sizeof(name), "%s_%d", class_name(), unit);
+	snprintf(link, sizeof(link), "%d", unit);
+	ibd->hfi1_ibdev_dbg = debugfs_create_dir(name, hfi1_dbg_root);
+	if (!ibd->hfi1_ibdev_dbg) {
+		pr_warn("create of %s failed\n", name);
+		return;
+	}
+	ibd->hfi1_ibdev_link =
+		debugfs_create_symlink(link, hfi1_dbg_root, name);
+	if (!ibd->hfi1_ibdev_link) {
+		pr_warn("create of %s symlink failed\n", name);
+		return;
+	}
+	DEBUGFS_SEQ_FILE_CREATE(opcode_stats, ibd->hfi1_ibdev_dbg, ibd);
+	DEBUGFS_SEQ_FILE_CREATE(ctx_stats, ibd->hfi1_ibdev_dbg, ibd);
+	DEBUGFS_SEQ_FILE_CREATE(qp_stats, ibd->hfi1_ibdev_dbg, ibd);
+	DEBUGFS_SEQ_FILE_CREATE(sdes, ibd->hfi1_ibdev_dbg, ibd);
+	/* dev counter files */
+	for (i = 0; i < ARRAY_SIZE(cntr_ops); i++)
+		DEBUGFS_FILE_CREATE(cntr_ops[i].name,
+				    ibd->hfi1_ibdev_dbg,
+				    dd,
+				    &cntr_ops[i].ops, S_IRUGO);
+	/* per port files */
+	for (ppd = dd->pport, j = 0; j < dd->num_pports; j++, ppd++)
+		for (i = 0; i < ARRAY_SIZE(port_cntr_ops); i++) {
+			snprintf(name,
+				 sizeof(name),
+				 port_cntr_ops[i].name,
+				 j + 1);
+			DEBUGFS_FILE_CREATE(name,
+					    ibd->hfi1_ibdev_dbg,
+					    ppd,
+					    &port_cntr_ops[i].ops,
+					    port_cntr_ops[i].ops.write == NULL ?
+					    S_IRUGO : S_IRUGO|S_IWUSR);
+		}
+}
+
+void hfi1_dbg_ibdev_exit(struct hfi1_ibdev *ibd)
+{
+	if (!hfi1_dbg_root)
+		goto out;
+	debugfs_remove(ibd->hfi1_ibdev_link);
+	debugfs_remove_recursive(ibd->hfi1_ibdev_dbg);
+out:
+	ibd->hfi1_ibdev_dbg = NULL;
+	synchronize_rcu();
+}
+
+/*
+ * driver stats field names, one line per stat, single string.  Used by
+ * programs like hfistats to print the stats in a way which works for
+ * different versions of drivers, without changing program source.
+ * if hfi1_ib_stats changes, this needs to change.  Names need to be
+ * 12 chars or less (w/o newline), for proper display by hfistats utility.
+ */
+static const char * const hfi1_statnames[] = {
+	/* must be element 0*/
+	"KernIntr",
+	"ErrorIntr",
+	"Tx_Errs",
+	"Rcv_Errs",
+	"H/W_Errs",
+	"NoPIOBufs",
+	"CtxtsOpen",
+	"RcvLen_Errs",
+	"EgrBufFull",
+	"EgrHdrFull"
+};
+
+static void *_driver_stats_names_seq_start(struct seq_file *s, loff_t *pos)
+__acquires(RCU)
+{
+	rcu_read_lock();
+	if (*pos >= ARRAY_SIZE(hfi1_statnames))
+		return NULL;
+	return pos;
+}
+
+static void *_driver_stats_names_seq_next(
+	struct seq_file *s,
+	void *v,
+	loff_t *pos)
+{
+	++*pos;
+	if (*pos >= ARRAY_SIZE(hfi1_statnames))
+		return NULL;
+	return pos;
+}
+
+static void _driver_stats_names_seq_stop(struct seq_file *s, void *v)
+__releases(RCU)
+{
+	rcu_read_unlock();
+}
+
+static int _driver_stats_names_seq_show(struct seq_file *s, void *v)
+{
+	loff_t *spos = v;
+
+	seq_printf(s, "%s\n", hfi1_statnames[*spos]);
+	return 0;
+}
+
+DEBUGFS_SEQ_FILE_OPS(driver_stats_names);
+DEBUGFS_SEQ_FILE_OPEN(driver_stats_names)
+DEBUGFS_FILE_OPS(driver_stats_names);
+
+static void *_driver_stats_seq_start(struct seq_file *s, loff_t *pos)
+__acquires(RCU)
+{
+	rcu_read_lock();
+	if (*pos >= ARRAY_SIZE(hfi1_statnames))
+		return NULL;
+	return pos;
+}
+
+static void *_driver_stats_seq_next(struct seq_file *s, void *v, loff_t *pos)
+{
+	++*pos;
+	if (*pos >= ARRAY_SIZE(hfi1_statnames))
+		return NULL;
+	return pos;
+}
+
+static void _driver_stats_seq_stop(struct seq_file *s, void *v)
+__releases(RCU)
+{
+	rcu_read_unlock();
+}
+
+static u64 hfi1_sps_ints(void)
+{
+	unsigned long flags;
+	struct hfi1_devdata *dd;
+	u64 sps_ints = 0;
+
+	spin_lock_irqsave(&hfi1_devs_lock, flags);
+	list_for_each_entry(dd, &hfi1_dev_list, list) {
+		sps_ints += get_all_cpu_total(dd->int_counter);
+	}
+	spin_unlock_irqrestore(&hfi1_devs_lock, flags);
+	return sps_ints;
+}
+
+static int _driver_stats_seq_show(struct seq_file *s, void *v)
+{
+	loff_t *spos = v;
+	char *buffer;
+	u64 *stats = (u64 *)&hfi1_stats;
+	size_t sz = seq_get_buf(s, &buffer);
+
+	if (sz < sizeof(u64))
+		return SEQ_SKIP;
+	/* special case for interrupts */
+	if (*spos == 0)
+		*(u64 *)buffer = hfi1_sps_ints();
+	else
+		*(u64 *)buffer = stats[*spos];
+	seq_commit(s,  sizeof(u64));
+	return 0;
+}
+
+DEBUGFS_SEQ_FILE_OPS(driver_stats);
+DEBUGFS_SEQ_FILE_OPEN(driver_stats)
+DEBUGFS_FILE_OPS(driver_stats);
+
+void hfi1_dbg_init(void)
+{
+	hfi1_dbg_root  = debugfs_create_dir(DRIVER_NAME, NULL);
+	if (!hfi1_dbg_root)
+		pr_warn("init of debugfs failed\n");
+	DEBUGFS_SEQ_FILE_CREATE(driver_stats_names, hfi1_dbg_root, NULL);
+	DEBUGFS_SEQ_FILE_CREATE(driver_stats, hfi1_dbg_root, NULL);
+}
+
+void hfi1_dbg_exit(void)
+{
+	debugfs_remove_recursive(hfi1_dbg_root);
+	hfi1_dbg_root = NULL;
+}
+
+#endif
diff --git a/drivers/staging/rdma/hfi1/debugfs.h b/drivers/staging/rdma/hfi1/debugfs.h
new file mode 100644
index 000000000000..92d6fe146714
--- /dev/null
+++ b/drivers/staging/rdma/hfi1/debugfs.h
@@ -0,0 +1,78 @@
+#ifndef _HFI1_DEBUGFS_H
+#define _HFI1_DEBUGFS_H
+/*
+ *
+ * This file is provided under a dual BSD/GPLv2 license.  When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * GPL LICENSE SUMMARY
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * BSD LICENSE
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ *  - Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ *  - Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in
+ *    the documentation and/or other materials provided with the
+ *    distribution.
+ *  - Neither the name of Intel Corporation nor the names of its
+ *    contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ */
+
+struct hfi1_ibdev;
+#ifdef CONFIG_DEBUG_FS
+void hfi1_dbg_ibdev_init(struct hfi1_ibdev *ibd);
+void hfi1_dbg_ibdev_exit(struct hfi1_ibdev *ibd);
+void hfi1_dbg_init(void);
+void hfi1_dbg_exit(void);
+#else
+static inline void hfi1_dbg_ibdev_init(struct hfi1_ibdev *ibd)
+{
+}
+
+void hfi1_dbg_ibdev_exit(struct hfi1_ibdev *ibd)
+{
+}
+
+void hfi1_dbg_init(void)
+{
+}
+
+void hfi1_dbg_exit(void)
+{
+}
+
+#endif
+
+#endif                          /* _HFI1_DEBUGFS_H */
diff --git a/drivers/staging/rdma/hfi1/device.c b/drivers/staging/rdma/hfi1/device.c
new file mode 100644
index 000000000000..07c87a87775f
--- /dev/null
+++ b/drivers/staging/rdma/hfi1/device.c
@@ -0,0 +1,142 @@
+/*
+ *
+ * This file is provided under a dual BSD/GPLv2 license.  When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * GPL LICENSE SUMMARY
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * BSD LICENSE
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ *  - Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ *  - Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in
+ *    the documentation and/or other materials provided with the
+ *    distribution.
+ *  - Neither the name of Intel Corporation nor the names of its
+ *    contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ */
+
+#include <linux/cdev.h>
+#include <linux/module.h>
+#include <linux/device.h>
+#include <linux/fs.h>
+
+#include "hfi.h"
+#include "device.h"
+
+static struct class *class;
+static dev_t hfi1_dev;
+
+int hfi1_cdev_init(int minor, const char *name,
+		   const struct file_operations *fops,
+		   struct cdev *cdev, struct device **devp)
+{
+	const dev_t dev = MKDEV(MAJOR(hfi1_dev), minor);
+	struct device *device = NULL;
+	int ret;
+
+	cdev_init(cdev, fops);
+	cdev->owner = THIS_MODULE;
+	kobject_set_name(&cdev->kobj, name);
+
+	ret = cdev_add(cdev, dev, 1);
+	if (ret < 0) {
+		pr_err("Could not add cdev for minor %d, %s (err %d)\n",
+		       minor, name, -ret);
+		goto done;
+	}
+
+	device = device_create(class, NULL, dev, NULL, "%s", name);
+	if (!IS_ERR(device))
+		goto done;
+	ret = PTR_ERR(device);
+	device = NULL;
+	pr_err("Could not create device for minor %d, %s (err %d)\n",
+	       minor, name, -ret);
+	cdev_del(cdev);
+done:
+	*devp = device;
+	return ret;
+}
+
+void hfi1_cdev_cleanup(struct cdev *cdev, struct device **devp)
+{
+	struct device *device = *devp;
+
+	if (device) {
+		device_unregister(device);
+		*devp = NULL;
+
+		cdev_del(cdev);
+	}
+}
+
+static const char *hfi1_class_name = "hfi1";
+
+const char *class_name(void)
+{
+	return hfi1_class_name;
+}
+
+int __init dev_init(void)
+{
+	int ret;
+
+	ret = alloc_chrdev_region(&hfi1_dev, 0, HFI1_NMINORS, DRIVER_NAME);
+	if (ret < 0) {
+		pr_err("Could not allocate chrdev region (err %d)\n", -ret);
+		goto done;
+	}
+
+	class = class_create(THIS_MODULE, class_name());
+	if (IS_ERR(class)) {
+		ret = PTR_ERR(class);
+		pr_err("Could not create device class (err %d)\n", -ret);
+		unregister_chrdev_region(hfi1_dev, HFI1_NMINORS);
+	}
+
+done:
+	return ret;
+}
+
+void dev_cleanup(void)
+{
+	if (class) {
+		class_destroy(class);
+		class = NULL;
+	}
+
+	unregister_chrdev_region(hfi1_dev, HFI1_NMINORS);
+}
diff --git a/drivers/staging/rdma/hfi1/device.h b/drivers/staging/rdma/hfi1/device.h
new file mode 100644
index 000000000000..98caecd3d807
--- /dev/null
+++ b/drivers/staging/rdma/hfi1/device.h
@@ -0,0 +1,61 @@
+#ifndef _HFI1_DEVICE_H
+#define _HFI1_DEVICE_H
+/*
+ *
+ * This file is provided under a dual BSD/GPLv2 license.  When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * GPL LICENSE SUMMARY
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * BSD LICENSE
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ *  - Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ *  - Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in
+ *    the documentation and/or other materials provided with the
+ *    distribution.
+ *  - Neither the name of Intel Corporation nor the names of its
+ *    contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ */
+
+int hfi1_cdev_init(int minor, const char *name,
+		   const struct file_operations *fops,
+		   struct cdev *cdev, struct device **devp);
+void hfi1_cdev_cleanup(struct cdev *cdev, struct device **devp);
+const char *class_name(void);
+int __init dev_init(void);
+void dev_cleanup(void);
+
+#endif                          /* _HFI1_DEVICE_H */
diff --git a/drivers/staging/rdma/hfi1/diag.c b/drivers/staging/rdma/hfi1/diag.c
new file mode 100644
index 000000000000..6777d6b659cf
--- /dev/null
+++ b/drivers/staging/rdma/hfi1/diag.c
@@ -0,0 +1,1873 @@
+/*
+ *
+ * This file is provided under a dual BSD/GPLv2 license.  When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * GPL LICENSE SUMMARY
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * BSD LICENSE
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ *  - Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ *  - Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in
+ *    the documentation and/or other materials provided with the
+ *    distribution.
+ *  - Neither the name of Intel Corporation nor the names of its
+ *    contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ */
+
+/*
+ * This file contains support for diagnostic functions.  It is accessed by
+ * opening the hfi1_diag device, normally minor number 129.  Diagnostic use
+ * of the chip may render the chip or board unusable until the driver
+ * is unloaded, or in some cases, until the system is rebooted.
+ *
+ * Accesses to the chip through this interface are not similar to going
+ * through the /sys/bus/pci resource mmap interface.
+ */
+
+#include <linux/io.h>
+#include <linux/pci.h>
+#include <linux/poll.h>
+#include <linux/vmalloc.h>
+#include <linux/export.h>
+#include <linux/fs.h>
+#include <linux/uaccess.h>
+#include <linux/module.h>
+#include <rdma/ib_smi.h>
+#include "hfi.h"
+#include "device.h"
+#include "common.h"
+#include "trace.h"
+
+#undef pr_fmt
+#define pr_fmt(fmt) DRIVER_NAME ": " fmt
+#define snoop_dbg(fmt, ...) \
+	hfi1_cdbg(SNOOP, fmt, ##__VA_ARGS__)
+
+/* Snoop option mask */
+#define SNOOP_DROP_SEND	(1 << 0)
+#define SNOOP_USE_METADATA	(1 << 1)
+
+static u8 snoop_flags;
+
+/*
+ * Extract packet length from LRH header.
+ * Why & 0x7FF? Because len is only 11 bits in case it wasn't 0'd we throw the
+ * bogus bits away. This is in Dwords so multiply by 4 to get size in bytes
+ */
+#define HFI1_GET_PKT_LEN(x)      (((be16_to_cpu((x)->lrh[2]) & 0x7FF)) << 2)
+
+enum hfi1_filter_status {
+	HFI1_FILTER_HIT,
+	HFI1_FILTER_ERR,
+	HFI1_FILTER_MISS
+};
+
+/* snoop processing functions */
+rhf_rcv_function_ptr snoop_rhf_rcv_functions[8] = {
+	[RHF_RCV_TYPE_EXPECTED] = snoop_recv_handler,
+	[RHF_RCV_TYPE_EAGER]    = snoop_recv_handler,
+	[RHF_RCV_TYPE_IB]       = snoop_recv_handler,
+	[RHF_RCV_TYPE_ERROR]    = snoop_recv_handler,
+	[RHF_RCV_TYPE_BYPASS]   = snoop_recv_handler,
+	[RHF_RCV_TYPE_INVALID5] = process_receive_invalid,
+	[RHF_RCV_TYPE_INVALID6] = process_receive_invalid,
+	[RHF_RCV_TYPE_INVALID7] = process_receive_invalid
+};
+
+/* Snoop packet structure */
+struct snoop_packet {
+	struct list_head list;
+	u32 total_len;
+	u8 data[];
+};
+
+/* Do not make these an enum or it will blow up the capture_md */
+#define PKT_DIR_EGRESS 0x0
+#define PKT_DIR_INGRESS 0x1
+
+/* Packet capture metadata returned to the user with the packet. */
+struct capture_md {
+	u8 port;
+	u8 dir;
+	u8 reserved[6];
+	union {
+		u64 pbc;
+		u64 rhf;
+	} u;
+};
+
+static atomic_t diagpkt_count = ATOMIC_INIT(0);
+static struct cdev diagpkt_cdev;
+static struct device *diagpkt_device;
+
+static ssize_t diagpkt_write(struct file *fp, const char __user *data,
+				 size_t count, loff_t *off);
+
+static const struct file_operations diagpkt_file_ops = {
+	.owner = THIS_MODULE,
+	.write = diagpkt_write,
+	.llseek = noop_llseek,
+};
+
+/*
+ * This is used for communication with user space for snoop extended IOCTLs
+ */
+struct hfi1_link_info {
+	__be64 node_guid;
+	u8 port_mode;
+	u8 port_state;
+	u16 link_speed_active;
+	u16 link_width_active;
+	u16 vl15_init;
+	u8 port_number;
+	/*
+	 * Add padding to make this a full IB SMP payload. Note: changing the
+	 * size of this structure will make the IOCTLs created with _IOWR
+	 * change.
+	 * Be sure to run tests on all IOCTLs when making changes to this
+	 * structure.
+	 */
+	u8 res[47];
+};
+
+/*
+ * This starts our ioctl sequence numbers *way* off from the ones
+ * defined in ib_core.
+ */
+#define SNOOP_CAPTURE_VERSION 0x1
+
+#define IB_IOCTL_MAGIC          0x1b /* See Documentation/ioctl-number.txt */
+#define HFI1_SNOOP_IOC_MAGIC IB_IOCTL_MAGIC
+#define HFI1_SNOOP_IOC_BASE_SEQ 0x80
+
+#define HFI1_SNOOP_IOCGETLINKSTATE \
+	_IO(HFI1_SNOOP_IOC_MAGIC, HFI1_SNOOP_IOC_BASE_SEQ)
+#define HFI1_SNOOP_IOCSETLINKSTATE \
+	_IO(HFI1_SNOOP_IOC_MAGIC, HFI1_SNOOP_IOC_BASE_SEQ+1)
+#define HFI1_SNOOP_IOCCLEARQUEUE \
+	_IO(HFI1_SNOOP_IOC_MAGIC, HFI1_SNOOP_IOC_BASE_SEQ+2)
+#define HFI1_SNOOP_IOCCLEARFILTER \
+	_IO(HFI1_SNOOP_IOC_MAGIC, HFI1_SNOOP_IOC_BASE_SEQ+3)
+#define HFI1_SNOOP_IOCSETFILTER \
+	_IO(HFI1_SNOOP_IOC_MAGIC, HFI1_SNOOP_IOC_BASE_SEQ+4)
+#define HFI1_SNOOP_IOCGETVERSION \
+	_IO(HFI1_SNOOP_IOC_MAGIC, HFI1_SNOOP_IOC_BASE_SEQ+5)
+#define HFI1_SNOOP_IOCSET_OPTS \
+	_IO(HFI1_SNOOP_IOC_MAGIC, HFI1_SNOOP_IOC_BASE_SEQ+6)
+
+/*
+ * These offsets +6/+7 could change, but these are already known and used
+ * IOCTL numbers so don't change them without a good reason.
+ */
+#define HFI1_SNOOP_IOCGETLINKSTATE_EXTRA \
+	_IOWR(HFI1_SNOOP_IOC_MAGIC, HFI1_SNOOP_IOC_BASE_SEQ+6, \
+		struct hfi1_link_info)
+#define HFI1_SNOOP_IOCSETLINKSTATE_EXTRA \
+	_IOWR(HFI1_SNOOP_IOC_MAGIC, HFI1_SNOOP_IOC_BASE_SEQ+7, \
+		struct hfi1_link_info)
+
+static int hfi1_snoop_open(struct inode *in, struct file *fp);
+static ssize_t hfi1_snoop_read(struct file *fp, char __user *data,
+				size_t pkt_len, loff_t *off);
+static ssize_t hfi1_snoop_write(struct file *fp, const char __user *data,
+				 size_t count, loff_t *off);
+static long hfi1_ioctl(struct file *fp, unsigned int cmd, unsigned long arg);
+static unsigned int hfi1_snoop_poll(struct file *fp,
+					struct poll_table_struct *wait);
+static int hfi1_snoop_release(struct inode *in, struct file *fp);
+
+struct hfi1_packet_filter_command {
+	int opcode;
+	int length;
+	void *value_ptr;
+};
+
+/* Can't re-use PKT_DIR_*GRESS here because 0 means no packets for this */
+#define HFI1_SNOOP_INGRESS 0x1
+#define HFI1_SNOOP_EGRESS  0x2
+
+enum hfi1_packet_filter_opcodes {
+	FILTER_BY_LID,
+	FILTER_BY_DLID,
+	FILTER_BY_MAD_MGMT_CLASS,
+	FILTER_BY_QP_NUMBER,
+	FILTER_BY_PKT_TYPE,
+	FILTER_BY_SERVICE_LEVEL,
+	FILTER_BY_PKEY,
+	FILTER_BY_DIRECTION,
+};
+
+static const struct file_operations snoop_file_ops = {
+	.owner = THIS_MODULE,
+	.open = hfi1_snoop_open,
+	.read = hfi1_snoop_read,
+	.unlocked_ioctl = hfi1_ioctl,
+	.poll = hfi1_snoop_poll,
+	.write = hfi1_snoop_write,
+	.release = hfi1_snoop_release
+};
+
+struct hfi1_filter_array {
+	int (*filter)(void *, void *, void *);
+};
+
+static int hfi1_filter_lid(void *ibhdr, void *packet_data, void *value);
+static int hfi1_filter_dlid(void *ibhdr, void *packet_data, void *value);
+static int hfi1_filter_mad_mgmt_class(void *ibhdr, void *packet_data,
+				      void *value);
+static int hfi1_filter_qp_number(void *ibhdr, void *packet_data, void *value);
+static int hfi1_filter_ibpacket_type(void *ibhdr, void *packet_data,
+				     void *value);
+static int hfi1_filter_ib_service_level(void *ibhdr, void *packet_data,
+					void *value);
+static int hfi1_filter_ib_pkey(void *ibhdr, void *packet_data, void *value);
+static int hfi1_filter_direction(void *ibhdr, void *packet_data, void *value);
+
+static struct hfi1_filter_array hfi1_filters[] = {
+	{ hfi1_filter_lid },
+	{ hfi1_filter_dlid },
+	{ hfi1_filter_mad_mgmt_class },
+	{ hfi1_filter_qp_number },
+	{ hfi1_filter_ibpacket_type },
+	{ hfi1_filter_ib_service_level },
+	{ hfi1_filter_ib_pkey },
+	{ hfi1_filter_direction },
+};
+
+#define HFI1_MAX_FILTERS	ARRAY_SIZE(hfi1_filters)
+#define HFI1_DIAG_MINOR_BASE	129
+
+static int hfi1_snoop_add(struct hfi1_devdata *dd, const char *name);
+
+int hfi1_diag_add(struct hfi1_devdata *dd)
+{
+	char name[16];
+	int ret = 0;
+
+	snprintf(name, sizeof(name), "%s_diagpkt%d", class_name(),
+		 dd->unit);
+	/*
+	 * Do this for each device as opposed to the normal diagpkt
+	 * interface which is one per host
+	 */
+	ret = hfi1_snoop_add(dd, name);
+	if (ret)
+		dd_dev_err(dd, "Unable to init snoop/capture device");
+
+	snprintf(name, sizeof(name), "%s_diagpkt", class_name());
+	if (atomic_inc_return(&diagpkt_count) == 1) {
+		ret = hfi1_cdev_init(HFI1_DIAGPKT_MINOR, name,
+				     &diagpkt_file_ops, &diagpkt_cdev,
+				     &diagpkt_device);
+	}
+
+	return ret;
+}
+
+/* this must be called w/ dd->snoop_in_lock held */
+static void drain_snoop_list(struct list_head *queue)
+{
+	struct list_head *pos, *q;
+	struct snoop_packet *packet;
+
+	list_for_each_safe(pos, q, queue) {
+		packet = list_entry(pos, struct snoop_packet, list);
+		list_del(pos);
+		kfree(packet);
+	}
+}
+
+static void hfi1_snoop_remove(struct hfi1_devdata *dd)
+{
+	unsigned long flags = 0;
+
+	spin_lock_irqsave(&dd->hfi1_snoop.snoop_lock, flags);
+	drain_snoop_list(&dd->hfi1_snoop.queue);
+	hfi1_cdev_cleanup(&dd->hfi1_snoop.cdev, &dd->hfi1_snoop.class_dev);
+	spin_unlock_irqrestore(&dd->hfi1_snoop.snoop_lock, flags);
+}
+
+void hfi1_diag_remove(struct hfi1_devdata *dd)
+{
+
+	hfi1_snoop_remove(dd);
+	if (atomic_dec_and_test(&diagpkt_count))
+		hfi1_cdev_cleanup(&diagpkt_cdev, &diagpkt_device);
+	hfi1_cdev_cleanup(&dd->diag_cdev, &dd->diag_device);
+}
+
+
+/*
+ * Allocated structure shared between the credit return mechanism and
+ * diagpkt_send().
+ */
+struct diagpkt_wait {
+	struct completion credits_returned;
+	int code;
+	atomic_t count;
+};
+
+/*
+ * When each side is finished with the structure, they call this.
+ * The last user frees the structure.
+ */
+static void put_diagpkt_wait(struct diagpkt_wait *wait)
+{
+	if (atomic_dec_and_test(&wait->count))
+		kfree(wait);
+}
+
+/*
+ * Callback from the credit return code.  Set the complete, which
+ * will let diapkt_send() continue.
+ */
+static void diagpkt_complete(void *arg, int code)
+{
+	struct diagpkt_wait *wait = (struct diagpkt_wait *)arg;
+
+	wait->code = code;
+	complete(&wait->credits_returned);
+	put_diagpkt_wait(wait);	/* finished with the structure */
+}
+
+/**
+ * diagpkt_send - send a packet
+ * @dp: diag packet descriptor
+ */
+static ssize_t diagpkt_send(struct diag_pkt *dp)
+{
+	struct hfi1_devdata *dd;
+	struct send_context *sc;
+	struct pio_buf *pbuf;
+	u32 *tmpbuf = NULL;
+	ssize_t ret = 0;
+	u32 pkt_len, total_len;
+	pio_release_cb credit_cb = NULL;
+	void *credit_arg = NULL;
+	struct diagpkt_wait *wait = NULL;
+
+	dd = hfi1_lookup(dp->unit);
+	if (!dd || !(dd->flags & HFI1_PRESENT) || !dd->kregbase) {
+		ret = -ENODEV;
+		goto bail;
+	}
+	if (!(dd->flags & HFI1_INITTED)) {
+		/* no hardware, freeze, etc. */
+		ret = -ENODEV;
+		goto bail;
+	}
+
+	if (dp->version != _DIAG_PKT_VERS) {
+		dd_dev_err(dd, "Invalid version %u for diagpkt_write\n",
+			    dp->version);
+		ret = -EINVAL;
+		goto bail;
+	}
+
+	/* send count must be an exact number of dwords */
+	if (dp->len & 3) {
+		ret = -EINVAL;
+		goto bail;
+	}
+
+	/* there is only port 1 */
+	if (dp->port != 1) {
+		ret = -EINVAL;
+		goto bail;
+	}
+
+	/* need a valid context */
+	if (dp->sw_index >= dd->num_send_contexts) {
+		ret = -EINVAL;
+		goto bail;
+	}
+	/* can only use kernel contexts */
+	if (dd->send_contexts[dp->sw_index].type != SC_KERNEL) {
+		ret = -EINVAL;
+		goto bail;
+	}
+	/* must be allocated */
+	sc = dd->send_contexts[dp->sw_index].sc;
+	if (!sc) {
+		ret = -EINVAL;
+		goto bail;
+	}
+	/* must be enabled */
+	if (!(sc->flags & SCF_ENABLED)) {
+		ret = -EINVAL;
+		goto bail;
+	}
+
+	/* allocate a buffer and copy the data in */
+	tmpbuf = vmalloc(dp->len);
+	if (!tmpbuf) {
+		ret = -ENOMEM;
+		goto bail;
+	}
+
+	if (copy_from_user(tmpbuf,
+			   (const void __user *) (unsigned long) dp->data,
+			   dp->len)) {
+		ret = -EFAULT;
+		goto bail;
+	}
+
+	/*
+	 * pkt_len is how much data we have to write, includes header and data.
+	 * total_len is length of the packet in Dwords plus the PBC should not
+	 * include the CRC.
+	 */
+	pkt_len = dp->len >> 2;
+	total_len = pkt_len + 2; /* PBC + packet */
+
+	/* if 0, fill in a default */
+	if (dp->pbc == 0) {
+		struct hfi1_pportdata *ppd = dd->pport;
+
+		hfi1_cdbg(PKT, "Generating PBC");
+		dp->pbc = create_pbc(ppd, 0, 0, 0, total_len);
+	} else {
+		hfi1_cdbg(PKT, "Using passed in PBC");
+	}
+
+	hfi1_cdbg(PKT, "Egress PBC content is 0x%llx", dp->pbc);
+
+	/*
+	 * The caller wants to wait until the packet is sent and to
+	 * check for errors.  The best we can do is wait until
+	 * the buffer credits are returned and check if any packet
+	 * error has occurred.  If there are any late errors, this
+	 * could miss it.  If there are other senders who generate
+	 * an error, this may find it.  However, in general, it
+	 * should catch most.
+	 */
+	if (dp->flags & F_DIAGPKT_WAIT) {
+		/* always force a credit return */
+		dp->pbc |= PBC_CREDIT_RETURN;
+		/* turn on credit return interrupts */
+		sc_add_credit_return_intr(sc);
+		wait = kmalloc(sizeof(*wait), GFP_KERNEL);
+		if (!wait) {
+			ret = -ENOMEM;
+			goto bail;
+		}
+		init_completion(&wait->credits_returned);
+		atomic_set(&wait->count, 2);
+		wait->code = PRC_OK;
+
+		credit_cb = diagpkt_complete;
+		credit_arg = wait;
+	}
+
+	pbuf = sc_buffer_alloc(sc, total_len, credit_cb, credit_arg);
+	if (!pbuf) {
+		/*
+		 * No send buffer means no credit callback.  Undo
+		 * the wait set-up that was done above.  We free wait
+		 * because the callback will never be called.
+		 */
+		if (dp->flags & F_DIAGPKT_WAIT) {
+			sc_del_credit_return_intr(sc);
+			kfree(wait);
+			wait = NULL;
+		}
+		ret = -ENOSPC;
+		goto bail;
+	}
+
+	pio_copy(dd, pbuf, dp->pbc, tmpbuf, pkt_len);
+	/* no flush needed as the HW knows the packet size */
+
+	ret = sizeof(*dp);
+
+	if (dp->flags & F_DIAGPKT_WAIT) {
+		/* wait for credit return */
+		ret = wait_for_completion_interruptible(
+						&wait->credits_returned);
+		/*
+		 * If the wait returns an error, the wait was interrupted,
+		 * e.g. with a ^C in the user program.  The callback is
+		 * still pending.  This is OK as the wait structure is
+		 * kmalloc'ed and the structure will free itself when
+		 * all users are done with it.
+		 *
+		 * A context disable occurs on a send context restart, so
+		 * include that in the list of errors below to check for.
+		 * NOTE: PRC_FILL_ERR is at best informational and cannot
+		 * be depended on.
+		 */
+		if (!ret && (((wait->code & PRC_STATUS_ERR)
+				|| (wait->code & PRC_FILL_ERR)
+				|| (wait->code & PRC_SC_DISABLE))))
+			ret = -EIO;
+
+		put_diagpkt_wait(wait);	/* finished with the structure */
+		sc_del_credit_return_intr(sc);
+	}
+
+bail:
+	vfree(tmpbuf);
+	return ret;
+}
+
+static ssize_t diagpkt_write(struct file *fp, const char __user *data,
+				 size_t count, loff_t *off)
+{
+	struct hfi1_devdata *dd;
+	struct send_context *sc;
+	u8 vl;
+
+	struct diag_pkt dp;
+
+	if (count != sizeof(dp))
+		return -EINVAL;
+
+	if (copy_from_user(&dp, data, sizeof(dp)))
+		return -EFAULT;
+
+	/*
+	* The Send Context is derived from the PbcVL value
+	* if PBC is populated
+	*/
+	if (dp.pbc) {
+		dd = hfi1_lookup(dp.unit);
+		if (dd == NULL)
+			return -ENODEV;
+		vl = (dp.pbc >> PBC_VL_SHIFT) & PBC_VL_MASK;
+		sc = dd->vld[vl].sc;
+		if (sc) {
+			dp.sw_index = sc->sw_index;
+			hfi1_cdbg(
+			       PKT,
+			       "Packet sent over VL %d via Send Context %u(%u)",
+			       vl, sc->sw_index, sc->hw_context);
+		}
+	}
+
+	return diagpkt_send(&dp);
+}
+
+static int hfi1_snoop_add(struct hfi1_devdata *dd, const char *name)
+{
+	int ret = 0;
+
+	dd->hfi1_snoop.mode_flag = 0;
+	spin_lock_init(&dd->hfi1_snoop.snoop_lock);
+	INIT_LIST_HEAD(&dd->hfi1_snoop.queue);
+	init_waitqueue_head(&dd->hfi1_snoop.waitq);
+
+	ret = hfi1_cdev_init(HFI1_SNOOP_CAPTURE_BASE + dd->unit, name,
+			     &snoop_file_ops,
+			     &dd->hfi1_snoop.cdev, &dd->hfi1_snoop.class_dev);
+
+	if (ret) {
+		dd_dev_err(dd, "Couldn't create %s device: %d", name, ret);
+		hfi1_cdev_cleanup(&dd->hfi1_snoop.cdev,
+				 &dd->hfi1_snoop.class_dev);
+	}
+
+	return ret;
+}
+
+static struct hfi1_devdata *hfi1_dd_from_sc_inode(struct inode *in)
+{
+	int unit = iminor(in) - HFI1_SNOOP_CAPTURE_BASE;
+	struct hfi1_devdata *dd = NULL;
+
+	dd = hfi1_lookup(unit);
+	return dd;
+
+}
+
+/* clear or restore send context integrity checks */
+static void adjust_integrity_checks(struct hfi1_devdata *dd)
+{
+	struct send_context *sc;
+	unsigned long sc_flags;
+	int i;
+
+	spin_lock_irqsave(&dd->sc_lock, sc_flags);
+	for (i = 0; i < dd->num_send_contexts; i++) {
+		int enable;
+
+		sc = dd->send_contexts[i].sc;
+
+		if (!sc)
+			continue;	/* not allocated */
+
+		enable = likely(!HFI1_CAP_IS_KSET(NO_INTEGRITY)) &&
+			 dd->hfi1_snoop.mode_flag != HFI1_PORT_SNOOP_MODE;
+
+		set_pio_integrity(sc);
+
+		if (enable) /* take HFI_CAP_* flags into account */
+			hfi1_init_ctxt(sc);
+	}
+	spin_unlock_irqrestore(&dd->sc_lock, sc_flags);
+}
+
+static int hfi1_snoop_open(struct inode *in, struct file *fp)
+{
+	int ret;
+	int mode_flag = 0;
+	unsigned long flags = 0;
+	struct hfi1_devdata *dd;
+	struct list_head *queue;
+
+	mutex_lock(&hfi1_mutex);
+
+	dd = hfi1_dd_from_sc_inode(in);
+	if (dd == NULL) {
+		ret = -ENODEV;
+		goto bail;
+	}
+
+	/*
+	 * File mode determines snoop or capture. Some existing user
+	 * applications expect the capture device to be able to be opened RDWR
+	 * because they expect a dedicated capture device. For this reason we
+	 * support a module param to force capture mode even if the file open
+	 * mode matches snoop.
+	 */
+	if ((fp->f_flags & O_ACCMODE) == O_RDONLY) {
+		snoop_dbg("Capture Enabled");
+		mode_flag = HFI1_PORT_CAPTURE_MODE;
+	} else if ((fp->f_flags & O_ACCMODE) == O_RDWR) {
+		snoop_dbg("Snoop Enabled");
+		mode_flag = HFI1_PORT_SNOOP_MODE;
+	} else {
+		snoop_dbg("Invalid");
+		ret =  -EINVAL;
+		goto bail;
+	}
+	queue = &dd->hfi1_snoop.queue;
+
+	/*
+	 * We are not supporting snoop and capture at the same time.
+	 */
+	spin_lock_irqsave(&dd->hfi1_snoop.snoop_lock, flags);
+	if (dd->hfi1_snoop.mode_flag) {
+		ret = -EBUSY;
+		spin_unlock_irqrestore(&dd->hfi1_snoop.snoop_lock, flags);
+		goto bail;
+	}
+
+	dd->hfi1_snoop.mode_flag = mode_flag;
+	drain_snoop_list(queue);
+
+	dd->hfi1_snoop.filter_callback = NULL;
+	dd->hfi1_snoop.filter_value = NULL;
+
+	/*
+	 * Send side packet integrity checks are not helpful when snooping so
+	 * disable and re-enable when we stop snooping.
+	 */
+	if (mode_flag == HFI1_PORT_SNOOP_MODE) {
+		/* clear after snoop mode is on */
+		adjust_integrity_checks(dd); /* clear */
+
+		/*
+		 * We also do not want to be doing the DLID LMC check for
+		 * ingressed packets.
+		 */
+		dd->hfi1_snoop.dcc_cfg = read_csr(dd, DCC_CFG_PORT_CONFIG1);
+		write_csr(dd, DCC_CFG_PORT_CONFIG1,
+			  (dd->hfi1_snoop.dcc_cfg >> 32) << 32);
+	}
+
+	/*
+	 * As soon as we set these function pointers the recv and send handlers
+	 * are active. This is a race condition so we must make sure to drain
+	 * the queue and init filter values above. Technically we should add
+	 * locking here but all that will happen is on recv a packet will get
+	 * allocated and get stuck on the snoop_lock before getting added to the
+	 * queue. Same goes for send.
+	 */
+	dd->rhf_rcv_function_map = snoop_rhf_rcv_functions;
+	dd->process_pio_send = snoop_send_pio_handler;
+	dd->process_dma_send = snoop_send_pio_handler;
+	dd->pio_inline_send = snoop_inline_pio_send;
+
+	spin_unlock_irqrestore(&dd->hfi1_snoop.snoop_lock, flags);
+	ret = 0;
+
+bail:
+	mutex_unlock(&hfi1_mutex);
+
+	return ret;
+}
+
+static int hfi1_snoop_release(struct inode *in, struct file *fp)
+{
+	unsigned long flags = 0;
+	struct hfi1_devdata *dd;
+	int mode_flag;
+
+	dd = hfi1_dd_from_sc_inode(in);
+	if (dd == NULL)
+		return -ENODEV;
+
+	spin_lock_irqsave(&dd->hfi1_snoop.snoop_lock, flags);
+
+	/* clear the snoop mode before re-adjusting send context CSRs */
+	mode_flag = dd->hfi1_snoop.mode_flag;
+	dd->hfi1_snoop.mode_flag = 0;
+
+	/*
+	 * Drain the queue and clear the filters we are done with it. Don't
+	 * forget to restore the packet integrity checks
+	 */
+	drain_snoop_list(&dd->hfi1_snoop.queue);
+	if (mode_flag == HFI1_PORT_SNOOP_MODE) {
+		/* restore after snoop mode is clear */
+		adjust_integrity_checks(dd); /* restore */
+
+		/*
+		 * Also should probably reset the DCC_CONFIG1 register for DLID
+		 * checking on incoming packets again. Use the value saved when
+		 * opening the snoop device.
+		 */
+		write_csr(dd, DCC_CFG_PORT_CONFIG1, dd->hfi1_snoop.dcc_cfg);
+	}
+
+	dd->hfi1_snoop.filter_callback = NULL;
+	kfree(dd->hfi1_snoop.filter_value);
+	dd->hfi1_snoop.filter_value = NULL;
+
+	/*
+	 * User is done snooping and capturing, return control to the normal
+	 * handler. Re-enable SDMA handling.
+	 */
+	dd->rhf_rcv_function_map = dd->normal_rhf_rcv_functions;
+	dd->process_pio_send = hfi1_verbs_send_pio;
+	dd->process_dma_send = hfi1_verbs_send_dma;
+	dd->pio_inline_send = pio_copy;
+
+	spin_unlock_irqrestore(&dd->hfi1_snoop.snoop_lock, flags);
+
+	snoop_dbg("snoop/capture device released");
+
+	return 0;
+}
+
+static unsigned int hfi1_snoop_poll(struct file *fp,
+				    struct poll_table_struct *wait)
+{
+	int ret = 0;
+	unsigned long flags = 0;
+
+	struct hfi1_devdata *dd;
+
+	dd = hfi1_dd_from_sc_inode(fp->f_inode);
+	if (dd == NULL)
+		return -ENODEV;
+
+	spin_lock_irqsave(&dd->hfi1_snoop.snoop_lock, flags);
+
+	poll_wait(fp, &dd->hfi1_snoop.waitq, wait);
+	if (!list_empty(&dd->hfi1_snoop.queue))
+		ret |= POLLIN | POLLRDNORM;
+
+	spin_unlock_irqrestore(&dd->hfi1_snoop.snoop_lock, flags);
+	return ret;
+
+}
+
+static ssize_t hfi1_snoop_write(struct file *fp, const char __user *data,
+				size_t count, loff_t *off)
+{
+	struct diag_pkt dpkt;
+	struct hfi1_devdata *dd;
+	size_t ret;
+	u8 byte_two, sl, sc5, sc4, vl, byte_one;
+	struct send_context *sc;
+	u32 len;
+	u64 pbc;
+	struct hfi1_ibport *ibp;
+	struct hfi1_pportdata *ppd;
+
+	dd = hfi1_dd_from_sc_inode(fp->f_inode);
+	if (dd == NULL)
+		return -ENODEV;
+
+	ppd = dd->pport;
+	snoop_dbg("received %lu bytes from user", count);
+
+	memset(&dpkt, 0, sizeof(struct diag_pkt));
+	dpkt.version = _DIAG_PKT_VERS;
+	dpkt.unit = dd->unit;
+	dpkt.port = 1;
+
+	if (likely(!(snoop_flags & SNOOP_USE_METADATA))) {
+		/*
+		* We need to generate the PBC and not let diagpkt_send do it,
+		* to do this we need the VL and the length in dwords.
+		* The VL can be determined by using the SL and looking up the
+		* SC. Then the SC can be converted into VL. The exception to
+		* this is those packets which are from an SMI queue pair.
+		* Since we can't detect anything about the QP here we have to
+		* rely on the SC. If its 0xF then we assume its SMI and
+		* do not look at the SL.
+		*/
+		if (copy_from_user(&byte_one, data, 1))
+			return -EINVAL;
+
+		if (copy_from_user(&byte_two, data+1, 1))
+			return -EINVAL;
+
+		sc4 = (byte_one >> 4) & 0xf;
+		if (sc4 == 0xF) {
+			snoop_dbg("Detected VL15 packet ignoring SL in packet");
+			vl = sc4;
+		} else {
+			sl = (byte_two >> 4) & 0xf;
+			ibp = to_iport(&dd->verbs_dev.ibdev, 1);
+			sc5 = ibp->sl_to_sc[sl];
+			vl = sc_to_vlt(dd, sc5);
+			if (vl != sc4) {
+				snoop_dbg("VL %d does not match SC %d of packet",
+					  vl, sc4);
+				return -EINVAL;
+			}
+		}
+
+		sc = dd->vld[vl].sc; /* Look up the context based on VL */
+		if (sc) {
+			dpkt.sw_index = sc->sw_index;
+			snoop_dbg("Sending on context %u(%u)", sc->sw_index,
+				  sc->hw_context);
+		} else {
+			snoop_dbg("Could not find context for vl %d", vl);
+			return -EINVAL;
+		}
+
+		len = (count >> 2) + 2; /* Add in PBC */
+		pbc = create_pbc(ppd, 0, 0, vl, len);
+	} else {
+		if (copy_from_user(&pbc, data, sizeof(pbc)))
+			return -EINVAL;
+		vl = (pbc >> PBC_VL_SHIFT) & PBC_VL_MASK;
+		sc = dd->vld[vl].sc; /* Look up the context based on VL */
+		if (sc) {
+			dpkt.sw_index = sc->sw_index;
+		} else {
+			snoop_dbg("Could not find context for vl %d", vl);
+			return -EINVAL;
+		}
+		data += sizeof(pbc);
+		count -= sizeof(pbc);
+	}
+	dpkt.len = count;
+	dpkt.data = (unsigned long)data;
+
+	snoop_dbg("PBC: vl=0x%llx Length=0x%llx",
+		  (pbc >> 12) & 0xf,
+		  (pbc & 0xfff));
+
+	dpkt.pbc = pbc;
+	ret = diagpkt_send(&dpkt);
+	/*
+	 * diagpkt_send only returns number of bytes in the diagpkt so patch
+	 * that up here before returning.
+	 */
+	if (ret == sizeof(dpkt))
+		return count;
+
+	return ret;
+}
+
+static ssize_t hfi1_snoop_read(struct file *fp, char __user *data,
+			       size_t pkt_len, loff_t *off)
+{
+	ssize_t ret = 0;
+	unsigned long flags = 0;
+	struct snoop_packet *packet = NULL;
+	struct hfi1_devdata *dd;
+
+	dd = hfi1_dd_from_sc_inode(fp->f_inode);
+	if (dd == NULL)
+		return -ENODEV;
+
+	spin_lock_irqsave(&dd->hfi1_snoop.snoop_lock, flags);
+
+	while (list_empty(&dd->hfi1_snoop.queue)) {
+		spin_unlock_irqrestore(&dd->hfi1_snoop.snoop_lock, flags);
+
+		if (fp->f_flags & O_NONBLOCK)
+			return -EAGAIN;
+
+		if (wait_event_interruptible(
+				dd->hfi1_snoop.waitq,
+				!list_empty(&dd->hfi1_snoop.queue)))
+			return -EINTR;
+
+		spin_lock_irqsave(&dd->hfi1_snoop.snoop_lock, flags);
+	}
+
+	if (!list_empty(&dd->hfi1_snoop.queue)) {
+		packet = list_entry(dd->hfi1_snoop.queue.next,
+				    struct snoop_packet, list);
+		list_del(&packet->list);
+		spin_unlock_irqrestore(&dd->hfi1_snoop.snoop_lock, flags);
+		if (pkt_len >= packet->total_len) {
+			if (copy_to_user(data, packet->data,
+				packet->total_len))
+				ret = -EFAULT;
+			else
+				ret = packet->total_len;
+		} else
+			ret = -EINVAL;
+
+		kfree(packet);
+	} else
+		spin_unlock_irqrestore(&dd->hfi1_snoop.snoop_lock, flags);
+
+	return ret;
+}
+
+static long hfi1_ioctl(struct file *fp, unsigned int cmd, unsigned long arg)
+{
+	struct hfi1_devdata *dd;
+	void *filter_value = NULL;
+	long ret = 0;
+	int value = 0;
+	u8 physState = 0;
+	u8 linkState = 0;
+	u16 devState = 0;
+	unsigned long flags = 0;
+	unsigned long *argp = NULL;
+	struct hfi1_packet_filter_command filter_cmd = {0};
+	int mode_flag = 0;
+	struct hfi1_pportdata *ppd = NULL;
+	unsigned int index;
+	struct hfi1_link_info link_info;
+
+	dd = hfi1_dd_from_sc_inode(fp->f_inode);
+	if (dd == NULL)
+		return -ENODEV;
+
+	spin_lock_irqsave(&dd->hfi1_snoop.snoop_lock, flags);
+
+	mode_flag = dd->hfi1_snoop.mode_flag;
+
+	if (((_IOC_DIR(cmd) & _IOC_READ)
+	    && !access_ok(VERIFY_WRITE, (void __user *)arg, _IOC_SIZE(cmd)))
+	    || ((_IOC_DIR(cmd) & _IOC_WRITE)
+	    && !access_ok(VERIFY_READ, (void __user *)arg, _IOC_SIZE(cmd)))) {
+		ret = -EFAULT;
+	} else if (!capable(CAP_SYS_ADMIN)) {
+		ret = -EPERM;
+	} else if ((mode_flag & HFI1_PORT_CAPTURE_MODE) &&
+		   (cmd != HFI1_SNOOP_IOCCLEARQUEUE) &&
+		   (cmd != HFI1_SNOOP_IOCCLEARFILTER) &&
+		   (cmd != HFI1_SNOOP_IOCSETFILTER)) {
+		/* Capture devices are allowed only 3 operations
+		 * 1.Clear capture queue
+		 * 2.Clear capture filter
+		 * 3.Set capture filter
+		 * Other are invalid.
+		 */
+		ret = -EINVAL;
+	} else {
+		switch (cmd) {
+		case HFI1_SNOOP_IOCSETLINKSTATE:
+			snoop_dbg("HFI1_SNOOP_IOCSETLINKSTATE is not valid");
+			ret = -EINVAL;
+			break;
+
+		case HFI1_SNOOP_IOCSETLINKSTATE_EXTRA:
+			memset(&link_info, 0, sizeof(link_info));
+
+			ret = copy_from_user(&link_info,
+				(struct hfi1_link_info __user *)arg,
+				sizeof(link_info));
+			if (ret)
+				break;
+
+			value = link_info.port_state;
+			index = link_info.port_number;
+			if (index > dd->num_pports - 1) {
+				ret = -EINVAL;
+				break;
+			}
+
+			ppd = &dd->pport[index];
+			if (!ppd) {
+				ret = -EINVAL;
+				break;
+			}
+
+			/* What we want to transition to */
+			physState = (value >> 4) & 0xF;
+			linkState = value & 0xF;
+			snoop_dbg("Setting link state 0x%x", value);
+
+			switch (linkState) {
+			case IB_PORT_NOP:
+				if (physState == 0)
+					break;
+					/* fall through */
+			case IB_PORT_DOWN:
+				switch (physState) {
+				case 0:
+					devState = HLS_DN_DOWNDEF;
+					break;
+				case 2:
+					devState = HLS_DN_POLL;
+					break;
+				case 3:
+					devState = HLS_DN_DISABLE;
+					break;
+				default:
+					ret = -EINVAL;
+					goto done;
+				}
+				ret = set_link_state(ppd, devState);
+				break;
+			case IB_PORT_ARMED:
+				ret = set_link_state(ppd, HLS_UP_ARMED);
+				if (!ret)
+					send_idle_sma(dd, SMA_IDLE_ARM);
+				break;
+			case IB_PORT_ACTIVE:
+				ret = set_link_state(ppd, HLS_UP_ACTIVE);
+				if (!ret)
+					send_idle_sma(dd, SMA_IDLE_ACTIVE);
+				break;
+			default:
+				ret = -EINVAL;
+				break;
+			}
+
+			if (ret)
+				break;
+			/* fall through */
+		case HFI1_SNOOP_IOCGETLINKSTATE:
+		case HFI1_SNOOP_IOCGETLINKSTATE_EXTRA:
+			if (cmd == HFI1_SNOOP_IOCGETLINKSTATE_EXTRA) {
+				memset(&link_info, 0, sizeof(link_info));
+				ret = copy_from_user(&link_info,
+					(struct hfi1_link_info __user *)arg,
+					sizeof(link_info));
+				index = link_info.port_number;
+			} else {
+				ret = __get_user(index, (int __user *) arg);
+				if (ret !=  0)
+					break;
+			}
+
+			if (index > dd->num_pports - 1) {
+				ret = -EINVAL;
+				break;
+			}
+
+			ppd = &dd->pport[index];
+			if (!ppd) {
+				ret = -EINVAL;
+				break;
+			}
+			value = hfi1_ibphys_portstate(ppd);
+			value <<= 4;
+			value |= driver_lstate(ppd);
+
+			snoop_dbg("Link port | Link State: %d", value);
+
+			if ((cmd == HFI1_SNOOP_IOCGETLINKSTATE_EXTRA) ||
+			    (cmd == HFI1_SNOOP_IOCSETLINKSTATE_EXTRA)) {
+				link_info.port_state = value;
+				link_info.node_guid = cpu_to_be64(ppd->guid);
+				link_info.link_speed_active =
+							ppd->link_speed_active;
+				link_info.link_width_active =
+							ppd->link_width_active;
+				ret = copy_to_user(
+					(struct hfi1_link_info __user *)arg,
+					&link_info, sizeof(link_info));
+			} else {
+				ret = __put_user(value, (int __user *)arg);
+			}
+			break;
+
+		case HFI1_SNOOP_IOCCLEARQUEUE:
+			snoop_dbg("Clearing snoop queue");
+			drain_snoop_list(&dd->hfi1_snoop.queue);
+			break;
+
+		case HFI1_SNOOP_IOCCLEARFILTER:
+			snoop_dbg("Clearing filter");
+			if (dd->hfi1_snoop.filter_callback) {
+				/* Drain packets first */
+				drain_snoop_list(&dd->hfi1_snoop.queue);
+				dd->hfi1_snoop.filter_callback = NULL;
+			}
+			kfree(dd->hfi1_snoop.filter_value);
+			dd->hfi1_snoop.filter_value = NULL;
+			break;
+
+		case HFI1_SNOOP_IOCSETFILTER:
+			snoop_dbg("Setting filter");
+			/* just copy command structure */
+			argp = (unsigned long *)arg;
+			ret = copy_from_user(&filter_cmd, (void __user *)argp,
+					     sizeof(filter_cmd));
+			if (ret < 0) {
+				pr_alert("Error copying filter command\n");
+				break;
+			}
+			if (filter_cmd.opcode >= HFI1_MAX_FILTERS) {
+				pr_alert("Invalid opcode in request\n");
+				ret = -EINVAL;
+				break;
+			}
+
+			snoop_dbg("Opcode %d Len %d Ptr %p",
+				   filter_cmd.opcode, filter_cmd.length,
+				   filter_cmd.value_ptr);
+
+			filter_value = kzalloc(
+						filter_cmd.length * sizeof(u8),
+						GFP_KERNEL);
+			if (!filter_value) {
+				pr_alert("Not enough memory\n");
+				ret = -ENOMEM;
+				break;
+			}
+			/* copy remaining data from userspace */
+			ret = copy_from_user((u8 *)filter_value,
+					(void __user *)filter_cmd.value_ptr,
+					filter_cmd.length);
+			if (ret < 0) {
+				kfree(filter_value);
+				pr_alert("Error copying filter data\n");
+				break;
+			}
+			/* Drain packets first */
+			drain_snoop_list(&dd->hfi1_snoop.queue);
+			dd->hfi1_snoop.filter_callback =
+				hfi1_filters[filter_cmd.opcode].filter;
+			/* just in case we see back to back sets */
+			kfree(dd->hfi1_snoop.filter_value);
+			dd->hfi1_snoop.filter_value = filter_value;
+
+			break;
+		case HFI1_SNOOP_IOCGETVERSION:
+			value = SNOOP_CAPTURE_VERSION;
+			snoop_dbg("Getting version: %d", value);
+			ret = __put_user(value, (int __user *)arg);
+			break;
+		case HFI1_SNOOP_IOCSET_OPTS:
+			snoop_flags = 0;
+			ret = __get_user(value, (int __user *) arg);
+			if (ret != 0)
+				break;
+
+			snoop_dbg("Setting snoop option %d", value);
+			if (value & SNOOP_DROP_SEND)
+				snoop_flags |= SNOOP_DROP_SEND;
+			if (value & SNOOP_USE_METADATA)
+				snoop_flags |= SNOOP_USE_METADATA;
+			break;
+		default:
+			ret = -ENOTTY;
+			break;
+		}
+	}
+done:
+	spin_unlock_irqrestore(&dd->hfi1_snoop.snoop_lock, flags);
+	return ret;
+}
+
+static void snoop_list_add_tail(struct snoop_packet *packet,
+				struct hfi1_devdata *dd)
+{
+	unsigned long flags = 0;
+
+	spin_lock_irqsave(&dd->hfi1_snoop.snoop_lock, flags);
+	if (likely((dd->hfi1_snoop.mode_flag & HFI1_PORT_SNOOP_MODE) ||
+		   (dd->hfi1_snoop.mode_flag & HFI1_PORT_CAPTURE_MODE))) {
+		list_add_tail(&packet->list, &dd->hfi1_snoop.queue);
+		snoop_dbg("Added packet to list");
+	}
+
+	/*
+	 * Technically we can could have closed the snoop device while waiting
+	 * on the above lock and it is gone now. The snoop mode_flag will
+	 * prevent us from adding the packet to the queue though.
+	 */
+
+	spin_unlock_irqrestore(&dd->hfi1_snoop.snoop_lock, flags);
+	wake_up_interruptible(&dd->hfi1_snoop.waitq);
+}
+
+static inline int hfi1_filter_check(void *val, const char *msg)
+{
+	if (!val) {
+		snoop_dbg("Error invalid %s value for filter", msg);
+		return HFI1_FILTER_ERR;
+	}
+	return 0;
+}
+
+static int hfi1_filter_lid(void *ibhdr, void *packet_data, void *value)
+{
+	struct hfi1_ib_header *hdr;
+	int ret;
+
+	ret = hfi1_filter_check(ibhdr, "header");
+	if (ret)
+		return ret;
+	ret = hfi1_filter_check(value, "user");
+	if (ret)
+		return ret;
+	hdr = (struct hfi1_ib_header *)ibhdr;
+
+	if (*((u16 *)value) == be16_to_cpu(hdr->lrh[3])) /* matches slid */
+		return HFI1_FILTER_HIT; /* matched */
+
+	return HFI1_FILTER_MISS; /* Not matched */
+}
+
+static int hfi1_filter_dlid(void *ibhdr, void *packet_data, void *value)
+{
+	struct hfi1_ib_header *hdr;
+	int ret;
+
+	ret = hfi1_filter_check(ibhdr, "header");
+	if (ret)
+		return ret;
+	ret = hfi1_filter_check(value, "user");
+	if (ret)
+		return ret;
+
+	hdr = (struct hfi1_ib_header *)ibhdr;
+
+	if (*((u16 *)value) == be16_to_cpu(hdr->lrh[1]))
+		return HFI1_FILTER_HIT;
+
+	return HFI1_FILTER_MISS;
+}
+
+/* Not valid for outgoing packets, send handler passes null for data*/
+static int hfi1_filter_mad_mgmt_class(void *ibhdr, void *packet_data,
+				      void *value)
+{
+	struct hfi1_ib_header *hdr;
+	struct hfi1_other_headers *ohdr = NULL;
+	struct ib_smp *smp = NULL;
+	u32 qpn = 0;
+	int ret;
+
+	ret = hfi1_filter_check(ibhdr, "header");
+	if (ret)
+		return ret;
+	ret = hfi1_filter_check(packet_data, "packet_data");
+	if (ret)
+		return ret;
+	ret = hfi1_filter_check(value, "user");
+	if (ret)
+		return ret;
+
+	hdr = (struct hfi1_ib_header *)ibhdr;
+
+	/* Check for GRH */
+	if ((be16_to_cpu(hdr->lrh[0]) & 3) == HFI1_LRH_BTH)
+		ohdr = &hdr->u.oth; /* LRH + BTH + DETH */
+	else
+		ohdr = &hdr->u.l.oth; /* LRH + GRH + BTH + DETH */
+
+	qpn = be32_to_cpu(ohdr->bth[1]) & 0x00FFFFFF;
+	if (qpn <= 1) {
+		smp = (struct ib_smp *)packet_data;
+		if (*((u8 *)value) == smp->mgmt_class)
+			return HFI1_FILTER_HIT;
+		else
+			return HFI1_FILTER_MISS;
+	}
+	return HFI1_FILTER_ERR;
+}
+
+static int hfi1_filter_qp_number(void *ibhdr, void *packet_data, void *value)
+{
+
+	struct hfi1_ib_header *hdr;
+	struct hfi1_other_headers *ohdr = NULL;
+	int ret;
+
+	ret = hfi1_filter_check(ibhdr, "header");
+	if (ret)
+		return ret;
+	ret = hfi1_filter_check(value, "user");
+	if (ret)
+		return ret;
+
+	hdr = (struct hfi1_ib_header *)ibhdr;
+
+	/* Check for GRH */
+	if ((be16_to_cpu(hdr->lrh[0]) & 3) == HFI1_LRH_BTH)
+		ohdr = &hdr->u.oth; /* LRH + BTH + DETH */
+	else
+		ohdr = &hdr->u.l.oth; /* LRH + GRH + BTH + DETH */
+	if (*((u32 *)value) == (be32_to_cpu(ohdr->bth[1]) & 0x00FFFFFF))
+		return HFI1_FILTER_HIT;
+
+	return HFI1_FILTER_MISS;
+}
+
+static int hfi1_filter_ibpacket_type(void *ibhdr, void *packet_data,
+				     void *value)
+{
+	u32 lnh = 0;
+	u8 opcode = 0;
+	struct hfi1_ib_header *hdr;
+	struct hfi1_other_headers *ohdr = NULL;
+	int ret;
+
+	ret = hfi1_filter_check(ibhdr, "header");
+	if (ret)
+		return ret;
+	ret = hfi1_filter_check(value, "user");
+	if (ret)
+		return ret;
+
+	hdr = (struct hfi1_ib_header *)ibhdr;
+
+	lnh = (be16_to_cpu(hdr->lrh[0]) & 3);
+
+	if (lnh == HFI1_LRH_BTH)
+		ohdr = &hdr->u.oth;
+	else if (lnh == HFI1_LRH_GRH)
+		ohdr = &hdr->u.l.oth;
+	else
+		return HFI1_FILTER_ERR;
+
+	opcode = be32_to_cpu(ohdr->bth[0]) >> 24;
+
+	if (*((u8 *)value) == ((opcode >> 5) & 0x7))
+		return HFI1_FILTER_HIT;
+
+	return HFI1_FILTER_MISS;
+}
+
+static int hfi1_filter_ib_service_level(void *ibhdr, void *packet_data,
+					void *value)
+{
+	struct hfi1_ib_header *hdr;
+	int ret;
+
+	ret = hfi1_filter_check(ibhdr, "header");
+	if (ret)
+		return ret;
+	ret = hfi1_filter_check(value, "user");
+	if (ret)
+		return ret;
+
+	hdr = (struct hfi1_ib_header *)ibhdr;
+
+	if ((*((u8 *)value)) == ((be16_to_cpu(hdr->lrh[0]) >> 4) & 0xF))
+		return HFI1_FILTER_HIT;
+
+	return HFI1_FILTER_MISS;
+}
+
+static int hfi1_filter_ib_pkey(void *ibhdr, void *packet_data, void *value)
+{
+
+	u32 lnh = 0;
+	struct hfi1_ib_header *hdr;
+	struct hfi1_other_headers *ohdr = NULL;
+	int ret;
+
+	ret = hfi1_filter_check(ibhdr, "header");
+	if (ret)
+		return ret;
+	ret = hfi1_filter_check(value, "user");
+	if (ret)
+		return ret;
+
+	hdr = (struct hfi1_ib_header *)ibhdr;
+
+	lnh = (be16_to_cpu(hdr->lrh[0]) & 3);
+	if (lnh == HFI1_LRH_BTH)
+		ohdr = &hdr->u.oth;
+	else if (lnh == HFI1_LRH_GRH)
+		ohdr = &hdr->u.l.oth;
+	else
+		return HFI1_FILTER_ERR;
+
+	/* P_key is 16-bit entity, however top most bit indicates
+	 * type of membership. 0 for limited and 1 for Full.
+	 * Limited members cannot accept information from other
+	 * Limited members, but communication is allowed between
+	 * every other combination of membership.
+	 * Hence we'll omit comparing top-most bit while filtering
+	 */
+
+	if ((*(u16 *)value & 0x7FFF) ==
+		((be32_to_cpu(ohdr->bth[0])) & 0x7FFF))
+		return HFI1_FILTER_HIT;
+
+	return HFI1_FILTER_MISS;
+}
+
+/*
+ * If packet_data is NULL then this is coming from one of the send functions.
+ * Thus we know if its an ingressed or egressed packet.
+ */
+static int hfi1_filter_direction(void *ibhdr, void *packet_data, void *value)
+{
+	u8 user_dir = *(u8 *)value;
+	int ret;
+
+	ret = hfi1_filter_check(value, "user");
+	if (ret)
+		return ret;
+
+	if (packet_data) {
+		/* Incoming packet */
+		if (user_dir & HFI1_SNOOP_INGRESS)
+			return HFI1_FILTER_HIT;
+	} else {
+		/* Outgoing packet */
+		if (user_dir & HFI1_SNOOP_EGRESS)
+			return HFI1_FILTER_HIT;
+	}
+
+	return HFI1_FILTER_MISS;
+}
+
+/*
+ * Allocate a snoop packet. The structure that is stored in the ring buffer, not
+ * to be confused with an hfi packet type.
+ */
+static struct snoop_packet *allocate_snoop_packet(u32 hdr_len,
+						  u32 data_len,
+						  u32 md_len)
+{
+
+	struct snoop_packet *packet = NULL;
+
+	packet = kzalloc(sizeof(struct snoop_packet) + hdr_len + data_len
+			 + md_len,
+			 GFP_ATOMIC | __GFP_NOWARN);
+	if (likely(packet))
+		INIT_LIST_HEAD(&packet->list);
+
+
+	return packet;
+}
+
+/*
+ * Instead of having snoop and capture code intermixed with the recv functions,
+ * both the interrupt handler and hfi1_ib_rcv() we are going to hijack the call
+ * and land in here for snoop/capture but if not enabled the call will go
+ * through as before. This gives us a single point to constrain all of the snoop
+ * snoop recv logic. There is nothing special that needs to happen for bypass
+ * packets. This routine should not try to look into the packet. It just copied
+ * it. There is no guarantee for filters when it comes to bypass packets as
+ * there is no specific support. Bottom line is this routine does now even know
+ * what a bypass packet is.
+ */
+int snoop_recv_handler(struct hfi1_packet *packet)
+{
+	struct hfi1_pportdata *ppd = packet->rcd->ppd;
+	struct hfi1_ib_header *hdr = packet->hdr;
+	int header_size = packet->hlen;
+	void *data = packet->ebuf;
+	u32 tlen = packet->tlen;
+	struct snoop_packet *s_packet = NULL;
+	int ret;
+	int snoop_mode = 0;
+	u32 md_len = 0;
+	struct capture_md md;
+
+	snoop_dbg("PACKET IN: hdr size %d tlen %d data %p", header_size, tlen,
+		  data);
+
+	trace_snoop_capture(ppd->dd, header_size, hdr, tlen - header_size,
+			    data);
+
+	if (!ppd->dd->hfi1_snoop.filter_callback) {
+		snoop_dbg("filter not set");
+		ret = HFI1_FILTER_HIT;
+	} else {
+		ret = ppd->dd->hfi1_snoop.filter_callback(hdr, data,
+					ppd->dd->hfi1_snoop.filter_value);
+	}
+
+	switch (ret) {
+	case HFI1_FILTER_ERR:
+		snoop_dbg("Error in filter call");
+		break;
+	case HFI1_FILTER_MISS:
+		snoop_dbg("Filter Miss");
+		break;
+	case HFI1_FILTER_HIT:
+
+		if (ppd->dd->hfi1_snoop.mode_flag & HFI1_PORT_SNOOP_MODE)
+			snoop_mode = 1;
+		if ((snoop_mode == 0) ||
+		    unlikely(snoop_flags & SNOOP_USE_METADATA))
+			md_len = sizeof(struct capture_md);
+
+
+		s_packet = allocate_snoop_packet(header_size,
+						 tlen - header_size,
+						 md_len);
+
+		if (unlikely(s_packet == NULL)) {
+			dd_dev_warn_ratelimited(ppd->dd, "Unable to allocate snoop/capture packet\n");
+			break;
+		}
+
+		if (md_len > 0) {
+			memset(&md, 0, sizeof(struct capture_md));
+			md.port = 1;
+			md.dir = PKT_DIR_INGRESS;
+			md.u.rhf = packet->rhf;
+			memcpy(s_packet->data, &md, md_len);
+		}
+
+		/* We should always have a header */
+		if (hdr) {
+			memcpy(s_packet->data + md_len, hdr, header_size);
+		} else {
+			dd_dev_err(ppd->dd, "Unable to copy header to snoop/capture packet\n");
+			kfree(s_packet);
+			break;
+		}
+
+		/*
+		 * Packets with no data are possible. If there is no data needed
+		 * to take care of the last 4 bytes which are normally included
+		 * with data buffers and are included in tlen.  Since we kzalloc
+		 * the buffer we do not need to set any values but if we decide
+		 * not to use kzalloc we should zero them.
+		 */
+		if (data)
+			memcpy(s_packet->data + header_size + md_len, data,
+			       tlen - header_size);
+
+		s_packet->total_len = tlen + md_len;
+		snoop_list_add_tail(s_packet, ppd->dd);
+
+		/*
+		 * If we are snooping the packet not capturing then throw away
+		 * after adding to the list.
+		 */
+		snoop_dbg("Capturing packet");
+		if (ppd->dd->hfi1_snoop.mode_flag & HFI1_PORT_SNOOP_MODE) {
+			snoop_dbg("Throwing packet away");
+			/*
+			 * If we are dropping the packet we still may need to
+			 * handle the case where error flags are set, this is
+			 * normally done by the type specific handler but that
+			 * won't be called in this case.
+			 */
+			if (unlikely(rhf_err_flags(packet->rhf)))
+				handle_eflags(packet);
+
+			/* throw the packet on the floor */
+			return RHF_RCV_CONTINUE;
+		}
+		break;
+	default:
+		break;
+	}
+
+	/*
+	 * We do not care what type of packet came in here - just pass it off
+	 * to the normal handler.
+	 */
+	return ppd->dd->normal_rhf_rcv_functions[rhf_rcv_type(packet->rhf)]
+			(packet);
+}
+
+/*
+ * Handle snooping and capturing packets when sdma is being used.
+ */
+int snoop_send_dma_handler(struct hfi1_qp *qp, struct ahg_ib_header *ibhdr,
+			   u32 hdrwords, struct hfi1_sge_state *ss, u32 len,
+			   u32 plen, u32 dwords, u64 pbc)
+{
+	pr_alert("Snooping/Capture of  Send DMA Packets Is Not Supported!\n");
+	snoop_dbg("Unsupported Operation");
+	return hfi1_verbs_send_dma(qp, ibhdr, hdrwords, ss, len, plen, dwords,
+				  0);
+}
+
+/*
+ * Handle snooping and capturing packets when pio is being used. Does not handle
+ * bypass packets. The only way to send a bypass packet currently is to use the
+ * diagpkt interface. When that interface is enable snoop/capture is not.
+ */
+int snoop_send_pio_handler(struct hfi1_qp *qp, struct ahg_ib_header *ahdr,
+			   u32 hdrwords, struct hfi1_sge_state *ss, u32 len,
+			   u32 plen, u32 dwords, u64 pbc)
+{
+	struct hfi1_ibport *ibp = to_iport(qp->ibqp.device, qp->port_num);
+	struct hfi1_pportdata *ppd = ppd_from_ibp(ibp);
+	struct snoop_packet *s_packet = NULL;
+	u32 *hdr = (u32 *)&ahdr->ibh;
+	u32 length = 0;
+	struct hfi1_sge_state temp_ss;
+	void *data = NULL;
+	void *data_start = NULL;
+	int ret;
+	int snoop_mode = 0;
+	int md_len = 0;
+	struct capture_md md;
+	u32 vl;
+	u32 hdr_len = hdrwords << 2;
+	u32 tlen = HFI1_GET_PKT_LEN(&ahdr->ibh);
+
+	md.u.pbc = 0;
+
+	snoop_dbg("PACKET OUT: hdrword %u len %u plen %u dwords %u tlen %u",
+		  hdrwords, len, plen, dwords, tlen);
+	if (ppd->dd->hfi1_snoop.mode_flag & HFI1_PORT_SNOOP_MODE)
+		snoop_mode = 1;
+	if ((snoop_mode == 0) ||
+	    unlikely(snoop_flags & SNOOP_USE_METADATA))
+		md_len = sizeof(struct capture_md);
+
+	/* not using ss->total_len as arg 2 b/c that does not count CRC */
+	s_packet = allocate_snoop_packet(hdr_len, tlen - hdr_len, md_len);
+
+	if (unlikely(s_packet == NULL)) {
+		dd_dev_warn_ratelimited(ppd->dd, "Unable to allocate snoop/capture packet\n");
+		goto out;
+	}
+
+	s_packet->total_len = tlen + md_len;
+
+	if (md_len > 0) {
+		memset(&md, 0, sizeof(struct capture_md));
+		md.port = 1;
+		md.dir = PKT_DIR_EGRESS;
+		if (likely(pbc == 0)) {
+			vl = be16_to_cpu(ahdr->ibh.lrh[0]) >> 12;
+			md.u.pbc = create_pbc(ppd, 0, qp->s_srate, vl, plen);
+		} else {
+			md.u.pbc = 0;
+		}
+		memcpy(s_packet->data, &md, md_len);
+	} else {
+		md.u.pbc = pbc;
+	}
+
+	/* Copy header */
+	if (likely(hdr)) {
+		memcpy(s_packet->data + md_len, hdr, hdr_len);
+	} else {
+		dd_dev_err(ppd->dd,
+			   "Unable to copy header to snoop/capture packet\n");
+		kfree(s_packet);
+		goto out;
+	}
+
+	if (ss) {
+		data = s_packet->data + hdr_len + md_len;
+		data_start = data;
+
+		/*
+		 * Copy SGE State
+		 * The update_sge() function below will not modify the
+		 * individual SGEs in the array. It will make a copy each time
+		 * and operate on that. So we only need to copy this instance
+		 * and it won't impact PIO.
+		 */
+		temp_ss = *ss;
+		length = len;
+
+		snoop_dbg("Need to copy %d bytes", length);
+		while (length) {
+			void *addr = temp_ss.sge.vaddr;
+			u32 slen = temp_ss.sge.length;
+
+			if (slen > length) {
+				slen = length;
+				snoop_dbg("slen %d > len %d", slen, length);
+			}
+			snoop_dbg("copy %d to %p", slen, addr);
+			memcpy(data, addr, slen);
+			update_sge(&temp_ss, slen);
+			length -= slen;
+			data += slen;
+			snoop_dbg("data is now %p bytes left %d", data, length);
+		}
+		snoop_dbg("Completed SGE copy");
+	}
+
+	/*
+	 * Why do the filter check down here? Because the event tracing has its
+	 * own filtering and we need to have the walked the SGE list.
+	 */
+	if (!ppd->dd->hfi1_snoop.filter_callback) {
+		snoop_dbg("filter not set\n");
+		ret = HFI1_FILTER_HIT;
+	} else {
+		ret = ppd->dd->hfi1_snoop.filter_callback(
+					&ahdr->ibh,
+					NULL,
+					ppd->dd->hfi1_snoop.filter_value);
+	}
+
+	switch (ret) {
+	case HFI1_FILTER_ERR:
+		snoop_dbg("Error in filter call");
+		/* fall through */
+	case HFI1_FILTER_MISS:
+		snoop_dbg("Filter Miss");
+		kfree(s_packet);
+		break;
+	case HFI1_FILTER_HIT:
+		snoop_dbg("Capturing packet");
+		snoop_list_add_tail(s_packet, ppd->dd);
+
+		if (unlikely((snoop_flags & SNOOP_DROP_SEND) &&
+			     (ppd->dd->hfi1_snoop.mode_flag &
+			      HFI1_PORT_SNOOP_MODE))) {
+			unsigned long flags;
+
+			snoop_dbg("Dropping packet");
+			if (qp->s_wqe) {
+				spin_lock_irqsave(&qp->s_lock, flags);
+				hfi1_send_complete(
+					qp,
+					qp->s_wqe,
+					IB_WC_SUCCESS);
+				spin_unlock_irqrestore(&qp->s_lock, flags);
+			} else if (qp->ibqp.qp_type == IB_QPT_RC) {
+				spin_lock_irqsave(&qp->s_lock, flags);
+				hfi1_rc_send_complete(qp, &ahdr->ibh);
+				spin_unlock_irqrestore(&qp->s_lock, flags);
+			}
+			return 0;
+		}
+		break;
+	default:
+		kfree(s_packet);
+		break;
+	}
+out:
+	return hfi1_verbs_send_pio(qp, ahdr, hdrwords, ss, len, plen, dwords,
+				  md.u.pbc);
+}
+
+/*
+ * Callers of this must pass a hfi1_ib_header type for the from ptr. Currently
+ * this can be used anywhere, but the intention is for inline ACKs for RC and
+ * CCA packets. We don't restrict this usage though.
+ */
+void snoop_inline_pio_send(struct hfi1_devdata *dd, struct pio_buf *pbuf,
+			   u64 pbc, const void *from, size_t count)
+{
+	int snoop_mode = 0;
+	int md_len = 0;
+	struct capture_md md;
+	struct snoop_packet *s_packet = NULL;
+
+	/*
+	 * count is in dwords so we need to convert to bytes.
+	 * We also need to account for CRC which would be tacked on by hardware.
+	 */
+	int packet_len = (count << 2) + 4;
+	int ret;
+
+	snoop_dbg("ACK OUT: len %d", packet_len);
+
+	if (!dd->hfi1_snoop.filter_callback) {
+		snoop_dbg("filter not set");
+		ret = HFI1_FILTER_HIT;
+	} else {
+		ret = dd->hfi1_snoop.filter_callback(
+				(struct hfi1_ib_header *)from,
+				NULL,
+				dd->hfi1_snoop.filter_value);
+	}
+
+	switch (ret) {
+	case HFI1_FILTER_ERR:
+		snoop_dbg("Error in filter call");
+		/* fall through */
+	case HFI1_FILTER_MISS:
+		snoop_dbg("Filter Miss");
+		break;
+	case HFI1_FILTER_HIT:
+		snoop_dbg("Capturing packet");
+		if (dd->hfi1_snoop.mode_flag & HFI1_PORT_SNOOP_MODE)
+			snoop_mode = 1;
+		if ((snoop_mode == 0) ||
+		    unlikely(snoop_flags & SNOOP_USE_METADATA))
+			md_len = sizeof(struct capture_md);
+
+		s_packet = allocate_snoop_packet(packet_len, 0, md_len);
+
+		if (unlikely(s_packet == NULL)) {
+			dd_dev_warn_ratelimited(dd, "Unable to allocate snoop/capture packet\n");
+			goto inline_pio_out;
+		}
+
+		s_packet->total_len = packet_len + md_len;
+
+		/* Fill in the metadata for the packet */
+		if (md_len > 0) {
+			memset(&md, 0, sizeof(struct capture_md));
+			md.port = 1;
+			md.dir = PKT_DIR_EGRESS;
+			md.u.pbc = pbc;
+			memcpy(s_packet->data, &md, md_len);
+		}
+
+		/* Add the packet data which is a single buffer */
+		memcpy(s_packet->data + md_len, from, packet_len);
+
+		snoop_list_add_tail(s_packet, dd);
+
+		if (unlikely((snoop_flags & SNOOP_DROP_SEND) && snoop_mode)) {
+			snoop_dbg("Dropping packet");
+			return;
+		}
+		break;
+	default:
+		break;
+	}
+
+inline_pio_out:
+	pio_copy(dd, pbuf, pbc, from, count);
+
+}
diff --git a/drivers/staging/rdma/hfi1/dma.c b/drivers/staging/rdma/hfi1/dma.c
new file mode 100644
index 000000000000..e03bd735173c
--- /dev/null
+++ b/drivers/staging/rdma/hfi1/dma.c
@@ -0,0 +1,186 @@
+/*
+ *
+ * This file is provided under a dual BSD/GPLv2 license.  When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * GPL LICENSE SUMMARY
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * BSD LICENSE
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ *  - Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ *  - Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in
+ *    the documentation and/or other materials provided with the
+ *    distribution.
+ *  - Neither the name of Intel Corporation nor the names of its
+ *    contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ */
+#include <linux/types.h>
+#include <linux/scatterlist.h>
+
+#include "verbs.h"
+
+#define BAD_DMA_ADDRESS ((u64) 0)
+
+/*
+ * The following functions implement driver specific replacements
+ * for the ib_dma_*() functions.
+ *
+ * These functions return kernel virtual addresses instead of
+ * device bus addresses since the driver uses the CPU to copy
+ * data instead of using hardware DMA.
+ */
+
+static int hfi1_mapping_error(struct ib_device *dev, u64 dma_addr)
+{
+	return dma_addr == BAD_DMA_ADDRESS;
+}
+
+static u64 hfi1_dma_map_single(struct ib_device *dev, void *cpu_addr,
+			       size_t size, enum dma_data_direction direction)
+{
+	if (WARN_ON(!valid_dma_direction(direction)))
+		return BAD_DMA_ADDRESS;
+
+	return (u64) cpu_addr;
+}
+
+static void hfi1_dma_unmap_single(struct ib_device *dev, u64 addr, size_t size,
+				  enum dma_data_direction direction)
+{
+	/* This is a stub, nothing to be done here */
+}
+
+static u64 hfi1_dma_map_page(struct ib_device *dev, struct page *page,
+			     unsigned long offset, size_t size,
+			    enum dma_data_direction direction)
+{
+	u64 addr;
+
+	if (WARN_ON(!valid_dma_direction(direction)))
+		return BAD_DMA_ADDRESS;
+
+	if (offset + size > PAGE_SIZE)
+		return BAD_DMA_ADDRESS;
+
+	addr = (u64) page_address(page);
+	if (addr)
+		addr += offset;
+
+	return addr;
+}
+
+static void hfi1_dma_unmap_page(struct ib_device *dev, u64 addr, size_t size,
+				enum dma_data_direction direction)
+{
+	/* This is a stub, nothing to be done here */
+}
+
+static int hfi1_map_sg(struct ib_device *dev, struct scatterlist *sgl,
+		       int nents, enum dma_data_direction direction)
+{
+	struct scatterlist *sg;
+	u64 addr;
+	int i;
+	int ret = nents;
+
+	if (WARN_ON(!valid_dma_direction(direction)))
+		return BAD_DMA_ADDRESS;
+
+	for_each_sg(sgl, sg, nents, i) {
+		addr = (u64) page_address(sg_page(sg));
+		if (!addr) {
+			ret = 0;
+			break;
+		}
+		sg->dma_address = addr + sg->offset;
+#ifdef CONFIG_NEED_SG_DMA_LENGTH
+		sg->dma_length = sg->length;
+#endif
+	}
+	return ret;
+}
+
+static void hfi1_unmap_sg(struct ib_device *dev,
+			  struct scatterlist *sg, int nents,
+			 enum dma_data_direction direction)
+{
+	/* This is a stub, nothing to be done here */
+}
+
+static void hfi1_sync_single_for_cpu(struct ib_device *dev, u64 addr,
+				     size_t size, enum dma_data_direction dir)
+{
+}
+
+static void hfi1_sync_single_for_device(struct ib_device *dev, u64 addr,
+					size_t size,
+					enum dma_data_direction dir)
+{
+}
+
+static void *hfi1_dma_alloc_coherent(struct ib_device *dev, size_t size,
+				     u64 *dma_handle, gfp_t flag)
+{
+	struct page *p;
+	void *addr = NULL;
+
+	p = alloc_pages(flag, get_order(size));
+	if (p)
+		addr = page_address(p);
+	if (dma_handle)
+		*dma_handle = (u64) addr;
+	return addr;
+}
+
+static void hfi1_dma_free_coherent(struct ib_device *dev, size_t size,
+				   void *cpu_addr, u64 dma_handle)
+{
+	free_pages((unsigned long) cpu_addr, get_order(size));
+}
+
+struct ib_dma_mapping_ops hfi1_dma_mapping_ops = {
+	.mapping_error = hfi1_mapping_error,
+	.map_single = hfi1_dma_map_single,
+	.unmap_single = hfi1_dma_unmap_single,
+	.map_page = hfi1_dma_map_page,
+	.unmap_page = hfi1_dma_unmap_page,
+	.map_sg = hfi1_map_sg,
+	.unmap_sg = hfi1_unmap_sg,
+	.sync_single_for_cpu = hfi1_sync_single_for_cpu,
+	.sync_single_for_device = hfi1_sync_single_for_device,
+	.alloc_coherent = hfi1_dma_alloc_coherent,
+	.free_coherent = hfi1_dma_free_coherent
+};
diff --git a/drivers/staging/rdma/hfi1/driver.c b/drivers/staging/rdma/hfi1/driver.c
new file mode 100644
index 000000000000..c0a59001e5cd
--- /dev/null
+++ b/drivers/staging/rdma/hfi1/driver.c
@@ -0,0 +1,1241 @@
+/*
+ *
+ * This file is provided under a dual BSD/GPLv2 license.  When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * GPL LICENSE SUMMARY
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * BSD LICENSE
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ *  - Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ *  - Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in
+ *    the documentation and/or other materials provided with the
+ *    distribution.
+ *  - Neither the name of Intel Corporation nor the names of its
+ *    contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ */
+
+#include <linux/spinlock.h>
+#include <linux/pci.h>
+#include <linux/io.h>
+#include <linux/delay.h>
+#include <linux/netdevice.h>
+#include <linux/vmalloc.h>
+#include <linux/module.h>
+#include <linux/prefetch.h>
+
+#include "hfi.h"
+#include "trace.h"
+#include "qp.h"
+#include "sdma.h"
+
+#undef pr_fmt
+#define pr_fmt(fmt) DRIVER_NAME ": " fmt
+
+/*
+ * The size has to be longer than this string, so we can append
+ * board/chip information to it in the initialization code.
+ */
+const char ib_hfi1_version[] = HFI1_DRIVER_VERSION "\n";
+
+DEFINE_SPINLOCK(hfi1_devs_lock);
+LIST_HEAD(hfi1_dev_list);
+DEFINE_MUTEX(hfi1_mutex);	/* general driver use */
+
+unsigned int hfi1_max_mtu = HFI1_DEFAULT_MAX_MTU;
+module_param_named(max_mtu, hfi1_max_mtu, uint, S_IRUGO);
+MODULE_PARM_DESC(max_mtu, "Set max MTU bytes, default is 8192");
+
+unsigned int hfi1_cu = 1;
+module_param_named(cu, hfi1_cu, uint, S_IRUGO);
+MODULE_PARM_DESC(cu, "Credit return units");
+
+unsigned long hfi1_cap_mask = HFI1_CAP_MASK_DEFAULT;
+static int hfi1_caps_set(const char *, const struct kernel_param *);
+static int hfi1_caps_get(char *, const struct kernel_param *);
+static const struct kernel_param_ops cap_ops = {
+	.set = hfi1_caps_set,
+	.get = hfi1_caps_get
+};
+module_param_cb(cap_mask, &cap_ops, &hfi1_cap_mask, S_IWUSR | S_IRUGO);
+MODULE_PARM_DESC(cap_mask, "Bit mask of enabled/disabled HW features");
+
+MODULE_LICENSE("Dual BSD/GPL");
+MODULE_DESCRIPTION("Intel Omni-Path Architecture driver");
+MODULE_VERSION(HFI1_DRIVER_VERSION);
+
+/*
+ * MAX_PKT_RCV is the max # if packets processed per receive interrupt.
+ */
+#define MAX_PKT_RECV 64
+#define EGR_HEAD_UPDATE_THRESHOLD 16
+
+struct hfi1_ib_stats hfi1_stats;
+
+static int hfi1_caps_set(const char *val, const struct kernel_param *kp)
+{
+	int ret = 0;
+	unsigned long *cap_mask_ptr = (unsigned long *)kp->arg,
+		cap_mask = *cap_mask_ptr, value, diff,
+		write_mask = ((HFI1_CAP_WRITABLE_MASK << HFI1_CAP_USER_SHIFT) |
+			      HFI1_CAP_WRITABLE_MASK);
+
+	ret = kstrtoul(val, 0, &value);
+	if (ret) {
+		pr_warn("Invalid module parameter value for 'cap_mask'\n");
+		goto done;
+	}
+	/* Get the changed bits (except the locked bit) */
+	diff = value ^ (cap_mask & ~HFI1_CAP_LOCKED_SMASK);
+
+	/* Remove any bits that are not allowed to change after driver load */
+	if (HFI1_CAP_LOCKED() && (diff & ~write_mask)) {
+		pr_warn("Ignoring non-writable capability bits %#lx\n",
+			diff & ~write_mask);
+		diff &= write_mask;
+	}
+
+	/* Mask off any reserved bits */
+	diff &= ~HFI1_CAP_RESERVED_MASK;
+	/* Clear any previously set and changing bits */
+	cap_mask &= ~diff;
+	/* Update the bits with the new capability */
+	cap_mask |= (value & diff);
+	/* Check for any kernel/user restrictions */
+	diff = (cap_mask & (HFI1_CAP_MUST_HAVE_KERN << HFI1_CAP_USER_SHIFT)) ^
+		((cap_mask & HFI1_CAP_MUST_HAVE_KERN) << HFI1_CAP_USER_SHIFT);
+	cap_mask &= ~diff;
+	/* Set the bitmask to the final set */
+	*cap_mask_ptr = cap_mask;
+done:
+	return ret;
+}
+
+static int hfi1_caps_get(char *buffer, const struct kernel_param *kp)
+{
+	unsigned long cap_mask = *(unsigned long *)kp->arg;
+
+	cap_mask &= ~HFI1_CAP_LOCKED_SMASK;
+	cap_mask |= ((cap_mask & HFI1_CAP_K2U) << HFI1_CAP_USER_SHIFT);
+
+	return scnprintf(buffer, PAGE_SIZE, "0x%lx", cap_mask);
+}
+
+const char *get_unit_name(int unit)
+{
+	static char iname[16];
+
+	snprintf(iname, sizeof(iname), DRIVER_NAME"_%u", unit);
+	return iname;
+}
+
+/*
+ * Return count of units with at least one port ACTIVE.
+ */
+int hfi1_count_active_units(void)
+{
+	struct hfi1_devdata *dd;
+	struct hfi1_pportdata *ppd;
+	unsigned long flags;
+	int pidx, nunits_active = 0;
+
+	spin_lock_irqsave(&hfi1_devs_lock, flags);
+	list_for_each_entry(dd, &hfi1_dev_list, list) {
+		if (!(dd->flags & HFI1_PRESENT) || !dd->kregbase)
+			continue;
+		for (pidx = 0; pidx < dd->num_pports; ++pidx) {
+			ppd = dd->pport + pidx;
+			if (ppd->lid && ppd->linkup) {
+				nunits_active++;
+				break;
+			}
+		}
+	}
+	spin_unlock_irqrestore(&hfi1_devs_lock, flags);
+	return nunits_active;
+}
+
+/*
+ * Return count of all units, optionally return in arguments
+ * the number of usable (present) units, and the number of
+ * ports that are up.
+ */
+int hfi1_count_units(int *npresentp, int *nupp)
+{
+	int nunits = 0, npresent = 0, nup = 0;
+	struct hfi1_devdata *dd;
+	unsigned long flags;
+	int pidx;
+	struct hfi1_pportdata *ppd;
+
+	spin_lock_irqsave(&hfi1_devs_lock, flags);
+
+	list_for_each_entry(dd, &hfi1_dev_list, list) {
+		nunits++;
+		if ((dd->flags & HFI1_PRESENT) && dd->kregbase)
+			npresent++;
+		for (pidx = 0; pidx < dd->num_pports; ++pidx) {
+			ppd = dd->pport + pidx;
+			if (ppd->lid && ppd->linkup)
+				nup++;
+		}
+	}
+
+	spin_unlock_irqrestore(&hfi1_devs_lock, flags);
+
+	if (npresentp)
+		*npresentp = npresent;
+	if (nupp)
+		*nupp = nup;
+
+	return nunits;
+}
+
+/*
+ * Get address of eager buffer from it's index (allocated in chunks, not
+ * contiguous).
+ */
+static inline void *get_egrbuf(const struct hfi1_ctxtdata *rcd, u64 rhf,
+			       u8 *update)
+{
+	u32 idx = rhf_egr_index(rhf), offset = rhf_egr_buf_offset(rhf);
+
+	*update |= !(idx & (rcd->egrbufs.threshold - 1)) && !offset;
+	return (void *)(((u64)(rcd->egrbufs.rcvtids[idx].addr)) +
+			(offset * RCV_BUF_BLOCK_SIZE));
+}
+
+/*
+ * Validate and encode the a given RcvArray Buffer size.
+ * The function will check whether the given size falls within
+ * allowed size ranges for the respective type and, optionally,
+ * return the proper encoding.
+ */
+inline int hfi1_rcvbuf_validate(u32 size, u8 type, u16 *encoded)
+{
+	if (unlikely(!IS_ALIGNED(size, PAGE_SIZE)))
+		return 0;
+	if (unlikely(size < MIN_EAGER_BUFFER))
+		return 0;
+	if (size >
+	    (type == PT_EAGER ? MAX_EAGER_BUFFER : MAX_EXPECTED_BUFFER))
+		return 0;
+	if (encoded)
+		*encoded = ilog2(size / PAGE_SIZE) + 1;
+	return 1;
+}
+
+static void rcv_hdrerr(struct hfi1_ctxtdata *rcd, struct hfi1_pportdata *ppd,
+		       struct hfi1_packet *packet)
+{
+	struct hfi1_message_header *rhdr = packet->hdr;
+	u32 rte = rhf_rcv_type_err(packet->rhf);
+	int lnh = be16_to_cpu(rhdr->lrh[0]) & 3;
+	struct hfi1_ibport *ibp = &ppd->ibport_data;
+
+	if (packet->rhf & (RHF_VCRC_ERR | RHF_ICRC_ERR))
+		return;
+
+	if (packet->rhf & RHF_TID_ERR) {
+		/* For TIDERR and RC QPs preemptively schedule a NAK */
+		struct hfi1_ib_header *hdr = (struct hfi1_ib_header *)rhdr;
+		struct hfi1_other_headers *ohdr = NULL;
+		u32 tlen = rhf_pkt_len(packet->rhf); /* in bytes */
+		u16 lid  = be16_to_cpu(hdr->lrh[1]);
+		u32 qp_num;
+		u32 rcv_flags = 0;
+
+		/* Sanity check packet */
+		if (tlen < 24)
+			goto drop;
+
+		/* Check for GRH */
+		if (lnh == HFI1_LRH_BTH)
+			ohdr = &hdr->u.oth;
+		else if (lnh == HFI1_LRH_GRH) {
+			u32 vtf;
+
+			ohdr = &hdr->u.l.oth;
+			if (hdr->u.l.grh.next_hdr != IB_GRH_NEXT_HDR)
+				goto drop;
+			vtf = be32_to_cpu(hdr->u.l.grh.version_tclass_flow);
+			if ((vtf >> IB_GRH_VERSION_SHIFT) != IB_GRH_VERSION)
+				goto drop;
+			rcv_flags |= HFI1_HAS_GRH;
+		} else
+			goto drop;
+
+		/* Get the destination QP number. */
+		qp_num = be32_to_cpu(ohdr->bth[1]) & HFI1_QPN_MASK;
+		if (lid < HFI1_MULTICAST_LID_BASE) {
+			struct hfi1_qp *qp;
+
+			rcu_read_lock();
+			qp = hfi1_lookup_qpn(ibp, qp_num);
+			if (!qp) {
+				rcu_read_unlock();
+				goto drop;
+			}
+
+			/*
+			 * Handle only RC QPs - for other QP types drop error
+			 * packet.
+			 */
+			spin_lock(&qp->r_lock);
+
+			/* Check for valid receive state. */
+			if (!(ib_hfi1_state_ops[qp->state] &
+			      HFI1_PROCESS_RECV_OK)) {
+				ibp->n_pkt_drops++;
+			}
+
+			switch (qp->ibqp.qp_type) {
+			case IB_QPT_RC:
+				hfi1_rc_hdrerr(
+					rcd,
+					hdr,
+					rcv_flags,
+					qp);
+				break;
+			default:
+				/* For now don't handle any other QP types */
+				break;
+			}
+
+			spin_unlock(&qp->r_lock);
+			rcu_read_unlock();
+		} /* Unicast QP */
+	} /* Valid packet with TIDErr */
+
+	/* handle "RcvTypeErr" flags */
+	switch (rte) {
+	case RHF_RTE_ERROR_OP_CODE_ERR:
+	{
+		u32 opcode;
+		void *ebuf = NULL;
+		__be32 *bth = NULL;
+
+		if (rhf_use_egr_bfr(packet->rhf))
+			ebuf = packet->ebuf;
+
+		if (ebuf == NULL)
+			goto drop; /* this should never happen */
+
+		if (lnh == HFI1_LRH_BTH)
+			bth = (__be32 *)ebuf;
+		else if (lnh == HFI1_LRH_GRH)
+			bth = (__be32 *)((char *)ebuf + sizeof(struct ib_grh));
+		else
+			goto drop;
+
+		opcode = be32_to_cpu(bth[0]) >> 24;
+		opcode &= 0xff;
+
+		if (opcode == IB_OPCODE_CNP) {
+			/*
+			 * Only in pre-B0 h/w is the CNP_OPCODE handled
+			 * via this code path (errata 291394).
+			 */
+			struct hfi1_qp *qp = NULL;
+			u32 lqpn, rqpn;
+			u16 rlid;
+			u8 svc_type, sl, sc5;
+
+			sc5  = (be16_to_cpu(rhdr->lrh[0]) >> 12) & 0xf;
+			if (rhf_dc_info(packet->rhf))
+				sc5 |= 0x10;
+			sl = ibp->sc_to_sl[sc5];
+
+			lqpn = be32_to_cpu(bth[1]) & HFI1_QPN_MASK;
+			rcu_read_lock();
+			qp = hfi1_lookup_qpn(ibp, lqpn);
+			if (qp == NULL) {
+				rcu_read_unlock();
+				goto drop;
+			}
+
+			switch (qp->ibqp.qp_type) {
+			case IB_QPT_UD:
+				rlid = 0;
+				rqpn = 0;
+				svc_type = IB_CC_SVCTYPE_UD;
+				break;
+			case IB_QPT_UC:
+				rlid = be16_to_cpu(rhdr->lrh[3]);
+				rqpn = qp->remote_qpn;
+				svc_type = IB_CC_SVCTYPE_UC;
+				break;
+			default:
+				goto drop;
+			}
+
+			process_becn(ppd, sl, rlid, lqpn, rqpn, svc_type);
+			rcu_read_unlock();
+		}
+
+		packet->rhf &= ~RHF_RCV_TYPE_ERR_SMASK;
+		break;
+	}
+	default:
+		break;
+	}
+
+drop:
+	return;
+}
+
+static inline void init_packet(struct hfi1_ctxtdata *rcd,
+			      struct hfi1_packet *packet)
+{
+
+	packet->rsize = rcd->rcvhdrqentsize; /* words */
+	packet->maxcnt = rcd->rcvhdrq_cnt * packet->rsize; /* words */
+	packet->rcd = rcd;
+	packet->updegr = 0;
+	packet->etail = -1;
+	packet->rhf_addr = (__le32 *) rcd->rcvhdrq + rcd->head +
+			   rcd->dd->rhf_offset;
+	packet->rhf = rhf_to_cpu(packet->rhf_addr);
+	packet->rhqoff = rcd->head;
+	packet->numpkt = 0;
+	packet->rcv_flags = 0;
+}
+
+#ifndef CONFIG_PRESCAN_RXQ
+static void prescan_rxq(struct hfi1_packet *packet) {}
+#else /* CONFIG_PRESCAN_RXQ */
+static int prescan_receive_queue;
+
+static void process_ecn(struct hfi1_qp *qp, struct hfi1_ib_header *hdr,
+			struct hfi1_other_headers *ohdr,
+			u64 rhf, struct ib_grh *grh)
+{
+	struct hfi1_ibport *ibp = to_iport(qp->ibqp.device, qp->port_num);
+	u32 bth1;
+	u8 sc5, svc_type;
+	int is_fecn, is_becn;
+
+	switch (qp->ibqp.qp_type) {
+	case IB_QPT_UD:
+		svc_type = IB_CC_SVCTYPE_UD;
+		break;
+	case IB_QPT_UC:	/* LATER */
+	case IB_QPT_RC:	/* LATER */
+	default:
+		return;
+	}
+
+	is_fecn = (be32_to_cpu(ohdr->bth[1]) >> HFI1_FECN_SHIFT) &
+			HFI1_FECN_MASK;
+	is_becn = (be32_to_cpu(ohdr->bth[1]) >> HFI1_BECN_SHIFT) &
+			HFI1_BECN_MASK;
+
+	sc5 = (be16_to_cpu(hdr->lrh[0]) >> 12) & 0xf;
+	if (rhf_dc_info(rhf))
+		sc5 |= 0x10;
+
+	if (is_fecn) {
+		u32 src_qpn = be32_to_cpu(ohdr->u.ud.deth[1]) & HFI1_QPN_MASK;
+		u16 pkey = (u16)be32_to_cpu(ohdr->bth[0]);
+		u16 dlid = be16_to_cpu(hdr->lrh[1]);
+		u16 slid = be16_to_cpu(hdr->lrh[3]);
+
+		return_cnp(ibp, qp, src_qpn, pkey, dlid, slid, sc5, grh);
+	}
+
+	if (is_becn) {
+		struct hfi1_pportdata *ppd = ppd_from_ibp(ibp);
+		u32 lqpn =  be32_to_cpu(ohdr->bth[1]) & HFI1_QPN_MASK;
+		u8 sl = ibp->sc_to_sl[sc5];
+
+		process_becn(ppd, sl, 0, lqpn, 0, svc_type);
+	}
+
+	/* turn off BECN, or FECN */
+	bth1 = be32_to_cpu(ohdr->bth[1]);
+	bth1 &= ~(HFI1_FECN_MASK << HFI1_FECN_SHIFT);
+	bth1 &= ~(HFI1_BECN_MASK << HFI1_BECN_SHIFT);
+	ohdr->bth[1] = cpu_to_be32(bth1);
+}
+
+struct ps_mdata {
+	struct hfi1_ctxtdata *rcd;
+	u32 rsize;
+	u32 maxcnt;
+	u32 ps_head;
+	u32 ps_tail;
+	u32 ps_seq;
+};
+
+static inline void init_ps_mdata(struct ps_mdata *mdata,
+				 struct hfi1_packet *packet)
+{
+	struct hfi1_ctxtdata *rcd = packet->rcd;
+
+	mdata->rcd = rcd;
+	mdata->rsize = packet->rsize;
+	mdata->maxcnt = packet->maxcnt;
+
+	if (rcd->ps_state.initialized == 0) {
+		mdata->ps_head = packet->rhqoff;
+		rcd->ps_state.initialized++;
+	} else
+		mdata->ps_head = rcd->ps_state.ps_head;
+
+	if (HFI1_CAP_IS_KSET(DMA_RTAIL)) {
+		mdata->ps_tail = packet->hdrqtail;
+		mdata->ps_seq = 0; /* not used with DMA_RTAIL */
+	} else {
+		mdata->ps_tail = 0; /* used only with DMA_RTAIL*/
+		mdata->ps_seq = rcd->seq_cnt;
+	}
+}
+
+static inline int ps_done(struct ps_mdata *mdata, u64 rhf)
+{
+	if (HFI1_CAP_IS_KSET(DMA_RTAIL))
+		return mdata->ps_head == mdata->ps_tail;
+	return mdata->ps_seq != rhf_rcv_seq(rhf);
+}
+
+static inline void update_ps_mdata(struct ps_mdata *mdata)
+{
+	struct hfi1_ctxtdata *rcd = mdata->rcd;
+
+	mdata->ps_head += mdata->rsize;
+	if (mdata->ps_head > mdata->maxcnt)
+		mdata->ps_head = 0;
+	rcd->ps_state.ps_head = mdata->ps_head;
+	if (!HFI1_CAP_IS_KSET(DMA_RTAIL)) {
+		if (++mdata->ps_seq > 13)
+			mdata->ps_seq = 1;
+	}
+}
+
+/*
+ * prescan_rxq - search through the receive queue looking for packets
+ * containing Excplicit Congestion Notifications (FECNs, or BECNs).
+ * When an ECN is found, process the Congestion Notification, and toggle
+ * it off.
+ */
+static void prescan_rxq(struct hfi1_packet *packet)
+{
+	struct hfi1_ctxtdata *rcd = packet->rcd;
+	struct ps_mdata mdata;
+
+	if (!prescan_receive_queue)
+		return;
+
+	init_ps_mdata(&mdata, packet);
+
+	while (1) {
+		struct hfi1_devdata *dd = rcd->dd;
+		struct hfi1_ibport *ibp = &rcd->ppd->ibport_data;
+		__le32 *rhf_addr = (__le32 *) rcd->rcvhdrq + mdata.ps_head +
+					 dd->rhf_offset;
+		struct hfi1_qp *qp;
+		struct hfi1_ib_header *hdr;
+		struct hfi1_other_headers *ohdr;
+		struct ib_grh *grh = NULL;
+		u64 rhf = rhf_to_cpu(rhf_addr);
+		u32 etype = rhf_rcv_type(rhf), qpn;
+		int is_ecn = 0;
+		u8 lnh;
+
+		if (ps_done(&mdata, rhf))
+			break;
+
+		if (etype != RHF_RCV_TYPE_IB)
+			goto next;
+
+		hdr = (struct hfi1_ib_header *)
+			hfi1_get_msgheader(dd, rhf_addr);
+		lnh = be16_to_cpu(hdr->lrh[0]) & 3;
+
+		if (lnh == HFI1_LRH_BTH)
+			ohdr = &hdr->u.oth;
+		else if (lnh == HFI1_LRH_GRH) {
+			ohdr = &hdr->u.l.oth;
+			grh = &hdr->u.l.grh;
+		} else
+			goto next; /* just in case */
+
+		is_ecn |= be32_to_cpu(ohdr->bth[1]) &
+			(HFI1_FECN_MASK << HFI1_FECN_SHIFT);
+		is_ecn |= be32_to_cpu(ohdr->bth[1]) &
+			(HFI1_BECN_MASK << HFI1_BECN_SHIFT);
+
+		if (!is_ecn)
+			goto next;
+
+		qpn = be32_to_cpu(ohdr->bth[1]) & HFI1_QPN_MASK;
+		rcu_read_lock();
+		qp = hfi1_lookup_qpn(ibp, qpn);
+
+		if (qp == NULL) {
+			rcu_read_unlock();
+			goto next;
+		}
+
+		process_ecn(qp, hdr, ohdr, rhf, grh);
+		rcu_read_unlock();
+next:
+		update_ps_mdata(&mdata);
+	}
+}
+#endif /* CONFIG_PRESCAN_RXQ */
+
+#define RCV_PKT_OK 0x0
+#define RCV_PKT_MAX 0x1
+
+static inline int process_rcv_packet(struct hfi1_packet *packet)
+{
+	int ret = RCV_PKT_OK;
+
+	packet->hdr = hfi1_get_msgheader(packet->rcd->dd,
+					 packet->rhf_addr);
+	packet->hlen = (u8 *)packet->rhf_addr - (u8 *)packet->hdr;
+	packet->etype = rhf_rcv_type(packet->rhf);
+	/* total length */
+	packet->tlen = rhf_pkt_len(packet->rhf); /* in bytes */
+	/* retrieve eager buffer details */
+	packet->ebuf = NULL;
+	if (rhf_use_egr_bfr(packet->rhf)) {
+		packet->etail = rhf_egr_index(packet->rhf);
+		packet->ebuf = get_egrbuf(packet->rcd, packet->rhf,
+				 &packet->updegr);
+		/*
+		 * Prefetch the contents of the eager buffer.  It is
+		 * OK to send a negative length to prefetch_range().
+		 * The +2 is the size of the RHF.
+		 */
+		prefetch_range(packet->ebuf,
+			packet->tlen - ((packet->rcd->rcvhdrqentsize -
+				  (rhf_hdrq_offset(packet->rhf)+2)) * 4));
+	}
+
+	/*
+	 * Call a type specific handler for the packet. We
+	 * should be able to trust that etype won't be beyond
+	 * the range of valid indexes. If so something is really
+	 * wrong and we can probably just let things come
+	 * crashing down. There is no need to eat another
+	 * comparison in this performance critical code.
+	 */
+	packet->rcd->dd->rhf_rcv_function_map[packet->etype](packet);
+	packet->numpkt++;
+
+	/* Set up for the next packet */
+	packet->rhqoff += packet->rsize;
+	if (packet->rhqoff >= packet->maxcnt)
+		packet->rhqoff = 0;
+
+	if (packet->numpkt == MAX_PKT_RECV) {
+		ret = RCV_PKT_MAX;
+		this_cpu_inc(*packet->rcd->dd->rcv_limit);
+	}
+
+	packet->rhf_addr = (__le32 *) packet->rcd->rcvhdrq + packet->rhqoff +
+				      packet->rcd->dd->rhf_offset;
+	packet->rhf = rhf_to_cpu(packet->rhf_addr);
+
+	return ret;
+}
+
+static inline void process_rcv_update(int last, struct hfi1_packet *packet)
+{
+	/*
+	 * Update head regs etc., every 16 packets, if not last pkt,
+	 * to help prevent rcvhdrq overflows, when many packets
+	 * are processed and queue is nearly full.
+	 * Don't request an interrupt for intermediate updates.
+	 */
+	if (!last && !(packet->numpkt & 0xf)) {
+		update_usrhead(packet->rcd, packet->rhqoff, packet->updegr,
+			       packet->etail, 0, 0);
+		packet->updegr = 0;
+	}
+	packet->rcv_flags = 0;
+}
+
+static inline void finish_packet(struct hfi1_packet *packet)
+{
+
+	/*
+	 * Nothing we need to free for the packet.
+	 *
+	 * The only thing we need to do is a final update and call for an
+	 * interrupt
+	 */
+	update_usrhead(packet->rcd, packet->rcd->head, packet->updegr,
+		       packet->etail, rcv_intr_dynamic, packet->numpkt);
+
+}
+
+static inline void process_rcv_qp_work(struct hfi1_packet *packet)
+{
+
+	struct hfi1_ctxtdata *rcd;
+	struct hfi1_qp *qp, *nqp;
+
+	rcd = packet->rcd;
+	rcd->head = packet->rhqoff;
+
+	/*
+	 * Iterate over all QPs waiting to respond.
+	 * The list won't change since the IRQ is only run on one CPU.
+	 */
+	list_for_each_entry_safe(qp, nqp, &rcd->qp_wait_list, rspwait) {
+		list_del_init(&qp->rspwait);
+		if (qp->r_flags & HFI1_R_RSP_NAK) {
+			qp->r_flags &= ~HFI1_R_RSP_NAK;
+			hfi1_send_rc_ack(rcd, qp, 0);
+		}
+		if (qp->r_flags & HFI1_R_RSP_SEND) {
+			unsigned long flags;
+
+			qp->r_flags &= ~HFI1_R_RSP_SEND;
+			spin_lock_irqsave(&qp->s_lock, flags);
+			if (ib_hfi1_state_ops[qp->state] &
+					HFI1_PROCESS_OR_FLUSH_SEND)
+				hfi1_schedule_send(qp);
+			spin_unlock_irqrestore(&qp->s_lock, flags);
+		}
+		if (atomic_dec_and_test(&qp->refcount))
+			wake_up(&qp->wait);
+	}
+}
+
+/*
+ * Handle receive interrupts when using the no dma rtail option.
+ */
+void handle_receive_interrupt_nodma_rtail(struct hfi1_ctxtdata *rcd)
+{
+	u32 seq;
+	int last = 0;
+	struct hfi1_packet packet;
+
+	init_packet(rcd, &packet);
+	seq = rhf_rcv_seq(packet.rhf);
+	if (seq != rcd->seq_cnt)
+		goto bail;
+
+	prescan_rxq(&packet);
+
+	while (!last) {
+		last = process_rcv_packet(&packet);
+		seq = rhf_rcv_seq(packet.rhf);
+		if (++rcd->seq_cnt > 13)
+			rcd->seq_cnt = 1;
+		if (seq != rcd->seq_cnt)
+			last = 1;
+		process_rcv_update(last, &packet);
+	}
+	process_rcv_qp_work(&packet);
+bail:
+	finish_packet(&packet);
+}
+
+void handle_receive_interrupt_dma_rtail(struct hfi1_ctxtdata *rcd)
+{
+	u32 hdrqtail;
+	int last = 0;
+	struct hfi1_packet packet;
+
+	init_packet(rcd, &packet);
+	hdrqtail = get_rcvhdrtail(rcd);
+	if (packet.rhqoff == hdrqtail)
+		goto bail;
+	smp_rmb();  /* prevent speculative reads of dma'ed hdrq */
+
+	prescan_rxq(&packet);
+
+	while (!last) {
+		last = process_rcv_packet(&packet);
+		if (packet.rhqoff == hdrqtail)
+			last = 1;
+		process_rcv_update(last, &packet);
+	}
+	process_rcv_qp_work(&packet);
+bail:
+	finish_packet(&packet);
+
+}
+
+static inline void set_all_nodma_rtail(struct hfi1_devdata *dd)
+{
+	int i;
+
+	for (i = 0; i < dd->first_user_ctxt; i++)
+		dd->rcd[i]->do_interrupt =
+			&handle_receive_interrupt_nodma_rtail;
+}
+
+static inline void set_all_dma_rtail(struct hfi1_devdata *dd)
+{
+	int i;
+
+	for (i = 0; i < dd->first_user_ctxt; i++)
+		dd->rcd[i]->do_interrupt =
+			&handle_receive_interrupt_dma_rtail;
+}
+
+/*
+ * handle_receive_interrupt - receive a packet
+ * @rcd: the context
+ *
+ * Called from interrupt handler for errors or receive interrupt.
+ * This is the slow path interrupt handler.
+ */
+void handle_receive_interrupt(struct hfi1_ctxtdata *rcd)
+{
+
+	struct hfi1_devdata *dd = rcd->dd;
+	u32 hdrqtail;
+	int last = 0, needset = 1;
+	struct hfi1_packet packet;
+
+	init_packet(rcd, &packet);
+
+	if (!HFI1_CAP_IS_KSET(DMA_RTAIL)) {
+		u32 seq = rhf_rcv_seq(packet.rhf);
+
+		if (seq != rcd->seq_cnt)
+			goto bail;
+		hdrqtail = 0;
+	} else {
+		hdrqtail = get_rcvhdrtail(rcd);
+		if (packet.rhqoff == hdrqtail)
+			goto bail;
+		smp_rmb();  /* prevent speculative reads of dma'ed hdrq */
+	}
+
+	prescan_rxq(&packet);
+
+	while (!last) {
+
+		if (unlikely(dd->do_drop && atomic_xchg(&dd->drop_packet,
+			DROP_PACKET_OFF) == DROP_PACKET_ON)) {
+			dd->do_drop = 0;
+
+			/* On to the next packet */
+			packet.rhqoff += packet.rsize;
+			packet.rhf_addr = (__le32 *) rcd->rcvhdrq +
+					  packet.rhqoff +
+					  dd->rhf_offset;
+			packet.rhf = rhf_to_cpu(packet.rhf_addr);
+
+		} else {
+			last = process_rcv_packet(&packet);
+		}
+
+		if (!HFI1_CAP_IS_KSET(DMA_RTAIL)) {
+			u32 seq = rhf_rcv_seq(packet.rhf);
+
+			if (++rcd->seq_cnt > 13)
+				rcd->seq_cnt = 1;
+			if (seq != rcd->seq_cnt)
+				last = 1;
+			if (needset) {
+				dd_dev_info(dd,
+					"Switching to NO_DMA_RTAIL\n");
+				set_all_nodma_rtail(dd);
+				needset = 0;
+			}
+		} else {
+			if (packet.rhqoff == hdrqtail)
+				last = 1;
+			if (needset) {
+				dd_dev_info(dd,
+					    "Switching to DMA_RTAIL\n");
+				set_all_dma_rtail(dd);
+				needset = 0;
+			}
+		}
+
+		process_rcv_update(last, &packet);
+	}
+
+	process_rcv_qp_work(&packet);
+
+bail:
+	/*
+	 * Always write head at end, and setup rcv interrupt, even
+	 * if no packets were processed.
+	 */
+	finish_packet(&packet);
+}
+
+/*
+ * Convert a given MTU size to the on-wire MAD packet enumeration.
+ * Return -1 if the size is invalid.
+ */
+int mtu_to_enum(u32 mtu, int default_if_bad)
+{
+	switch (mtu) {
+	case     0: return OPA_MTU_0;
+	case   256: return OPA_MTU_256;
+	case   512: return OPA_MTU_512;
+	case  1024: return OPA_MTU_1024;
+	case  2048: return OPA_MTU_2048;
+	case  4096: return OPA_MTU_4096;
+	case  8192: return OPA_MTU_8192;
+	case 10240: return OPA_MTU_10240;
+	}
+	return default_if_bad;
+}
+
+u16 enum_to_mtu(int mtu)
+{
+	switch (mtu) {
+	case OPA_MTU_0:     return 0;
+	case OPA_MTU_256:   return 256;
+	case OPA_MTU_512:   return 512;
+	case OPA_MTU_1024:  return 1024;
+	case OPA_MTU_2048:  return 2048;
+	case OPA_MTU_4096:  return 4096;
+	case OPA_MTU_8192:  return 8192;
+	case OPA_MTU_10240: return 10240;
+	default: return 0xffff;
+	}
+}
+
+/*
+ * set_mtu - set the MTU
+ * @ppd: the per port data
+ *
+ * We can handle "any" incoming size, the issue here is whether we
+ * need to restrict our outgoing size.  We do not deal with what happens
+ * to programs that are already running when the size changes.
+ */
+int set_mtu(struct hfi1_pportdata *ppd)
+{
+	struct hfi1_devdata *dd = ppd->dd;
+	int i, drain, ret = 0, is_up = 0;
+
+	ppd->ibmtu = 0;
+	for (i = 0; i < ppd->vls_supported; i++)
+		if (ppd->ibmtu < dd->vld[i].mtu)
+			ppd->ibmtu = dd->vld[i].mtu;
+	ppd->ibmaxlen = ppd->ibmtu + lrh_max_header_bytes(ppd->dd);
+
+	mutex_lock(&ppd->hls_lock);
+	if (ppd->host_link_state == HLS_UP_INIT
+			|| ppd->host_link_state == HLS_UP_ARMED
+			|| ppd->host_link_state == HLS_UP_ACTIVE)
+		is_up = 1;
+
+	drain = !is_ax(dd) && is_up;
+
+	if (drain)
+		/*
+		 * MTU is specified per-VL. To ensure that no packet gets
+		 * stuck (due, e.g., to the MTU for the packet's VL being
+		 * reduced), empty the per-VL FIFOs before adjusting MTU.
+		 */
+		ret = stop_drain_data_vls(dd);
+
+	if (ret) {
+		dd_dev_err(dd, "%s: cannot stop/drain VLs - refusing to change per-VL MTUs\n",
+			   __func__);
+		goto err;
+	}
+
+	hfi1_set_ib_cfg(ppd, HFI1_IB_CFG_MTU, 0);
+
+	if (drain)
+		open_fill_data_vls(dd); /* reopen all VLs */
+
+err:
+	mutex_unlock(&ppd->hls_lock);
+
+	return ret;
+}
+
+int hfi1_set_lid(struct hfi1_pportdata *ppd, u32 lid, u8 lmc)
+{
+	struct hfi1_devdata *dd = ppd->dd;
+
+	ppd->lid = lid;
+	ppd->lmc = lmc;
+	hfi1_set_ib_cfg(ppd, HFI1_IB_CFG_LIDLMC, 0);
+
+	dd_dev_info(dd, "IB%u:%u got a lid: 0x%x\n", dd->unit, ppd->port, lid);
+
+	return 0;
+}
+
+/*
+ * Following deal with the "obviously simple" task of overriding the state
+ * of the LEDs, which normally indicate link physical and logical status.
+ * The complications arise in dealing with different hardware mappings
+ * and the board-dependent routine being called from interrupts.
+ * and then there's the requirement to _flash_ them.
+ */
+#define LED_OVER_FREQ_SHIFT 8
+#define LED_OVER_FREQ_MASK (0xFF<<LED_OVER_FREQ_SHIFT)
+/* Below is "non-zero" to force override, but both actual LEDs are off */
+#define LED_OVER_BOTH_OFF (8)
+
+static void run_led_override(unsigned long opaque)
+{
+	struct hfi1_pportdata *ppd = (struct hfi1_pportdata *)opaque;
+	struct hfi1_devdata *dd = ppd->dd;
+	int timeoff;
+	int ph_idx;
+
+	if (!(dd->flags & HFI1_INITTED))
+		return;
+
+	ph_idx = ppd->led_override_phase++ & 1;
+	ppd->led_override = ppd->led_override_vals[ph_idx];
+	timeoff = ppd->led_override_timeoff;
+
+	/*
+	 * don't re-fire the timer if user asked for it to be off; we let
+	 * it fire one more time after they turn it off to simplify
+	 */
+	if (ppd->led_override_vals[0] || ppd->led_override_vals[1])
+		mod_timer(&ppd->led_override_timer, jiffies + timeoff);
+}
+
+void hfi1_set_led_override(struct hfi1_pportdata *ppd, unsigned int val)
+{
+	struct hfi1_devdata *dd = ppd->dd;
+	int timeoff, freq;
+
+	if (!(dd->flags & HFI1_INITTED))
+		return;
+
+	/* First check if we are blinking. If not, use 1HZ polling */
+	timeoff = HZ;
+	freq = (val & LED_OVER_FREQ_MASK) >> LED_OVER_FREQ_SHIFT;
+
+	if (freq) {
+		/* For blink, set each phase from one nybble of val */
+		ppd->led_override_vals[0] = val & 0xF;
+		ppd->led_override_vals[1] = (val >> 4) & 0xF;
+		timeoff = (HZ << 4)/freq;
+	} else {
+		/* Non-blink set both phases the same. */
+		ppd->led_override_vals[0] = val & 0xF;
+		ppd->led_override_vals[1] = val & 0xF;
+	}
+	ppd->led_override_timeoff = timeoff;
+
+	/*
+	 * If the timer has not already been started, do so. Use a "quick"
+	 * timeout so the function will be called soon, to look at our request.
+	 */
+	if (atomic_inc_return(&ppd->led_override_timer_active) == 1) {
+		/* Need to start timer */
+		init_timer(&ppd->led_override_timer);
+		ppd->led_override_timer.function = run_led_override;
+		ppd->led_override_timer.data = (unsigned long) ppd;
+		ppd->led_override_timer.expires = jiffies + 1;
+		add_timer(&ppd->led_override_timer);
+	} else {
+		if (ppd->led_override_vals[0] || ppd->led_override_vals[1])
+			mod_timer(&ppd->led_override_timer, jiffies + 1);
+		atomic_dec(&ppd->led_override_timer_active);
+	}
+}
+
+/**
+ * hfi1_reset_device - reset the chip if possible
+ * @unit: the device to reset
+ *
+ * Whether or not reset is successful, we attempt to re-initialize the chip
+ * (that is, much like a driver unload/reload).  We clear the INITTED flag
+ * so that the various entry points will fail until we reinitialize.  For
+ * now, we only allow this if no user contexts are open that use chip resources
+ */
+int hfi1_reset_device(int unit)
+{
+	int ret, i;
+	struct hfi1_devdata *dd = hfi1_lookup(unit);
+	struct hfi1_pportdata *ppd;
+	unsigned long flags;
+	int pidx;
+
+	if (!dd) {
+		ret = -ENODEV;
+		goto bail;
+	}
+
+	dd_dev_info(dd, "Reset on unit %u requested\n", unit);
+
+	if (!dd->kregbase || !(dd->flags & HFI1_PRESENT)) {
+		dd_dev_info(dd,
+			"Invalid unit number %u or not initialized or not present\n",
+			unit);
+		ret = -ENXIO;
+		goto bail;
+	}
+
+	spin_lock_irqsave(&dd->uctxt_lock, flags);
+	if (dd->rcd)
+		for (i = dd->first_user_ctxt; i < dd->num_rcv_contexts; i++) {
+			if (!dd->rcd[i] || !dd->rcd[i]->cnt)
+				continue;
+			spin_unlock_irqrestore(&dd->uctxt_lock, flags);
+			ret = -EBUSY;
+			goto bail;
+		}
+	spin_unlock_irqrestore(&dd->uctxt_lock, flags);
+
+	for (pidx = 0; pidx < dd->num_pports; ++pidx) {
+		ppd = dd->pport + pidx;
+		if (atomic_read(&ppd->led_override_timer_active)) {
+			/* Need to stop LED timer, _then_ shut off LEDs */
+			del_timer_sync(&ppd->led_override_timer);
+			atomic_set(&ppd->led_override_timer_active, 0);
+		}
+
+		/* Shut off LEDs after we are sure timer is not running */
+		ppd->led_override = LED_OVER_BOTH_OFF;
+	}
+	if (dd->flags & HFI1_HAS_SEND_DMA)
+		sdma_exit(dd);
+
+	hfi1_reset_cpu_counters(dd);
+
+	ret = hfi1_init(dd, 1);
+
+	if (ret)
+		dd_dev_err(dd,
+			"Reinitialize unit %u after reset failed with %d\n",
+			unit, ret);
+	else
+		dd_dev_info(dd, "Reinitialized unit %u after resetting\n",
+			unit);
+
+bail:
+	return ret;
+}
+
+void handle_eflags(struct hfi1_packet *packet)
+{
+	struct hfi1_ctxtdata *rcd = packet->rcd;
+	u32 rte = rhf_rcv_type_err(packet->rhf);
+
+	dd_dev_err(rcd->dd,
+		"receive context %d: rhf 0x%016llx, errs [ %s%s%s%s%s%s%s%s] rte 0x%x\n",
+		rcd->ctxt, packet->rhf,
+		packet->rhf & RHF_K_HDR_LEN_ERR ? "k_hdr_len " : "",
+		packet->rhf & RHF_DC_UNC_ERR ? "dc_unc " : "",
+		packet->rhf & RHF_DC_ERR ? "dc " : "",
+		packet->rhf & RHF_TID_ERR ? "tid " : "",
+		packet->rhf & RHF_LEN_ERR ? "len " : "",
+		packet->rhf & RHF_ECC_ERR ? "ecc " : "",
+		packet->rhf & RHF_VCRC_ERR ? "vcrc " : "",
+		packet->rhf & RHF_ICRC_ERR ? "icrc " : "",
+		rte);
+
+	rcv_hdrerr(rcd, rcd->ppd, packet);
+}
+
+/*
+ * The following functions are called by the interrupt handler. They are type
+ * specific handlers for each packet type.
+ */
+int process_receive_ib(struct hfi1_packet *packet)
+{
+	trace_hfi1_rcvhdr(packet->rcd->ppd->dd,
+			  packet->rcd->ctxt,
+			  rhf_err_flags(packet->rhf),
+			  RHF_RCV_TYPE_IB,
+			  packet->hlen,
+			  packet->tlen,
+			  packet->updegr,
+			  rhf_egr_index(packet->rhf));
+
+	if (unlikely(rhf_err_flags(packet->rhf))) {
+		handle_eflags(packet);
+		return RHF_RCV_CONTINUE;
+	}
+
+	hfi1_ib_rcv(packet);
+	return RHF_RCV_CONTINUE;
+}
+
+int process_receive_bypass(struct hfi1_packet *packet)
+{
+	if (unlikely(rhf_err_flags(packet->rhf)))
+		handle_eflags(packet);
+
+	dd_dev_err(packet->rcd->dd,
+	   "Bypass packets are not supported in normal operation. Dropping\n");
+	return RHF_RCV_CONTINUE;
+}
+
+int process_receive_error(struct hfi1_packet *packet)
+{
+	handle_eflags(packet);
+
+	if (unlikely(rhf_err_flags(packet->rhf)))
+		dd_dev_err(packet->rcd->dd,
+			   "Unhandled error packet received. Dropping.\n");
+
+	return RHF_RCV_CONTINUE;
+}
+
+int kdeth_process_expected(struct hfi1_packet *packet)
+{
+	if (unlikely(rhf_err_flags(packet->rhf)))
+		handle_eflags(packet);
+
+	dd_dev_err(packet->rcd->dd,
+		   "Unhandled expected packet received. Dropping.\n");
+	return RHF_RCV_CONTINUE;
+}
+
+int kdeth_process_eager(struct hfi1_packet *packet)
+{
+	if (unlikely(rhf_err_flags(packet->rhf)))
+		handle_eflags(packet);
+
+	dd_dev_err(packet->rcd->dd,
+		   "Unhandled eager packet received. Dropping.\n");
+	return RHF_RCV_CONTINUE;
+}
+
+int process_receive_invalid(struct hfi1_packet *packet)
+{
+	dd_dev_err(packet->rcd->dd, "Invalid packet type %d. Dropping\n",
+		rhf_rcv_type(packet->rhf));
+	return RHF_RCV_CONTINUE;
+}
diff --git a/drivers/staging/rdma/hfi1/eprom.c b/drivers/staging/rdma/hfi1/eprom.c
new file mode 100644
index 000000000000..b61d3ae93ed1
--- /dev/null
+++ b/drivers/staging/rdma/hfi1/eprom.c
@@ -0,0 +1,475 @@
+/*
+ *
+ * This file is provided under a dual BSD/GPLv2 license.  When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * GPL LICENSE SUMMARY
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * BSD LICENSE
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ *  - Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ *  - Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in
+ *    the documentation and/or other materials provided with the
+ *    distribution.
+ *  - Neither the name of Intel Corporation nor the names of its
+ *    contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ */
+#include <linux/delay.h>
+#include "hfi.h"
+#include "common.h"
+#include "eprom.h"
+
+/*
+ * The EPROM is logically divided into two partitions:
+ *	partition 0: the first 128K, visible from PCI ROM BAR
+ *	partition 1: the rest
+ */
+#define P0_SIZE (128 * 1024)
+#define P1_START P0_SIZE
+
+/* largest erase size supported by the controller */
+#define SIZE_32KB (32 * 1024)
+#define MASK_32KB (SIZE_32KB - 1)
+
+/* controller page size, in bytes */
+#define EP_PAGE_SIZE 256
+#define EEP_PAGE_MASK (EP_PAGE_SIZE - 1)
+
+/* controller commands */
+#define CMD_SHIFT 24
+#define CMD_NOP			    (0)
+#define CMD_PAGE_PROGRAM(addr)	    ((0x02 << CMD_SHIFT) | addr)
+#define CMD_READ_DATA(addr)	    ((0x03 << CMD_SHIFT) | addr)
+#define CMD_READ_SR1		    ((0x05 << CMD_SHIFT))
+#define CMD_WRITE_ENABLE	    ((0x06 << CMD_SHIFT))
+#define CMD_SECTOR_ERASE_32KB(addr) ((0x52 << CMD_SHIFT) | addr)
+#define CMD_CHIP_ERASE		    ((0x60 << CMD_SHIFT))
+#define CMD_READ_MANUF_DEV_ID	    ((0x90 << CMD_SHIFT))
+#define CMD_RELEASE_POWERDOWN_NOID  ((0xab << CMD_SHIFT))
+
+/* controller interface speeds */
+#define EP_SPEED_FULL 0x2	/* full speed */
+
+/* controller status register 1 bits */
+#define SR1_BUSY 0x1ull		/* the BUSY bit in SR1 */
+
+/* sleep length while waiting for controller */
+#define WAIT_SLEEP_US 100	/* must be larger than 5 (see usage) */
+#define COUNT_DELAY_SEC(n) ((n) * (1000000/WAIT_SLEEP_US))
+
+/* GPIO pins */
+#define EPROM_WP_N (1ull << 14)	/* EPROM write line */
+
+/*
+ * Use the EP mutex to guard against other callers from within the driver.
+ * Also covers usage of eprom_available.
+ */
+static DEFINE_MUTEX(eprom_mutex);
+static int eprom_available;	/* default: not available */
+
+/*
+ * Turn on external enable line that allows writing on the flash.
+ */
+static void write_enable(struct hfi1_devdata *dd)
+{
+	/* raise signal */
+	write_csr(dd, ASIC_GPIO_OUT,
+		read_csr(dd, ASIC_GPIO_OUT) | EPROM_WP_N);
+	/* raise enable */
+	write_csr(dd, ASIC_GPIO_OE,
+		read_csr(dd, ASIC_GPIO_OE) | EPROM_WP_N);
+}
+
+/*
+ * Turn off external enable line that allows writing on the flash.
+ */
+static void write_disable(struct hfi1_devdata *dd)
+{
+	/* lower signal */
+	write_csr(dd, ASIC_GPIO_OUT,
+		read_csr(dd, ASIC_GPIO_OUT) & ~EPROM_WP_N);
+	/* lower enable */
+	write_csr(dd, ASIC_GPIO_OE,
+		read_csr(dd, ASIC_GPIO_OE) & ~EPROM_WP_N);
+}
+
+/*
+ * Wait for the device to become not busy.  Must be called after all
+ * write or erase operations.
+ */
+static int wait_for_not_busy(struct hfi1_devdata *dd)
+{
+	unsigned long count = 0;
+	u64 reg;
+	int ret = 0;
+
+	/* starts page mode */
+	write_csr(dd, ASIC_EEP_ADDR_CMD, CMD_READ_SR1);
+	while (1) {
+		udelay(WAIT_SLEEP_US);
+		usleep_range(WAIT_SLEEP_US - 5, WAIT_SLEEP_US + 5);
+		count++;
+		reg = read_csr(dd, ASIC_EEP_DATA);
+		if ((reg & SR1_BUSY) == 0)
+			break;
+		/* 200s is the largest time for a 128Mb device */
+		if (count > COUNT_DELAY_SEC(200)) {
+			dd_dev_err(dd, "waited too long for SPI FLASH busy to clear - failing\n");
+			ret = -ETIMEDOUT;
+			break; /* break, not goto - must stop page mode */
+		}
+	}
+
+	/* stop page mode with a NOP */
+	write_csr(dd, ASIC_EEP_ADDR_CMD, CMD_NOP);
+
+	return ret;
+}
+
+/*
+ * Read the device ID from the SPI controller.
+ */
+static u32 read_device_id(struct hfi1_devdata *dd)
+{
+	/* read the Manufacture Device ID */
+	write_csr(dd, ASIC_EEP_ADDR_CMD, CMD_READ_MANUF_DEV_ID);
+	return (u32)read_csr(dd, ASIC_EEP_DATA);
+}
+
+/*
+ * Erase the whole flash.
+ */
+static int erase_chip(struct hfi1_devdata *dd)
+{
+	int ret;
+
+	write_enable(dd);
+
+	write_csr(dd, ASIC_EEP_ADDR_CMD, CMD_WRITE_ENABLE);
+	write_csr(dd, ASIC_EEP_ADDR_CMD, CMD_CHIP_ERASE);
+	ret = wait_for_not_busy(dd);
+
+	write_disable(dd);
+
+	return ret;
+}
+
+/*
+ * Erase a range using the 32KB erase command.
+ */
+static int erase_32kb_range(struct hfi1_devdata *dd, u32 start, u32 end)
+{
+	int ret = 0;
+
+	if (end < start)
+		return -EINVAL;
+
+	if ((start & MASK_32KB) || (end & MASK_32KB)) {
+		dd_dev_err(dd,
+			"%s: non-aligned range (0x%x,0x%x) for a 32KB erase\n",
+			__func__, start, end);
+		return -EINVAL;
+	}
+
+	write_enable(dd);
+
+	for (; start < end; start += SIZE_32KB) {
+		write_csr(dd, ASIC_EEP_ADDR_CMD, CMD_WRITE_ENABLE);
+		write_csr(dd, ASIC_EEP_ADDR_CMD,
+						CMD_SECTOR_ERASE_32KB(start));
+		ret = wait_for_not_busy(dd);
+		if (ret)
+			goto done;
+	}
+
+done:
+	write_disable(dd);
+
+	return ret;
+}
+
+/*
+ * Read a 256 byte (64 dword) EPROM page.
+ * All callers have verified the offset is at a page boundary.
+ */
+static void read_page(struct hfi1_devdata *dd, u32 offset, u32 *result)
+{
+	int i;
+
+	write_csr(dd, ASIC_EEP_ADDR_CMD, CMD_READ_DATA(offset));
+	for (i = 0; i < EP_PAGE_SIZE/sizeof(u32); i++)
+		result[i] = (u32)read_csr(dd, ASIC_EEP_DATA);
+	write_csr(dd, ASIC_EEP_ADDR_CMD, CMD_NOP); /* close open page */
+}
+
+/*
+ * Read length bytes starting at offset.  Copy to user address addr.
+ */
+static int read_length(struct hfi1_devdata *dd, u32 start, u32 len, u64 addr)
+{
+	u32 offset;
+	u32 buffer[EP_PAGE_SIZE/sizeof(u32)];
+	int ret = 0;
+
+	/* reject anything not on an EPROM page boundary */
+	if ((start & EEP_PAGE_MASK) || (len & EEP_PAGE_MASK))
+		return -EINVAL;
+
+	for (offset = 0; offset < len; offset += EP_PAGE_SIZE) {
+		read_page(dd, start + offset, buffer);
+		if (copy_to_user((void __user *)(addr + offset),
+						buffer, EP_PAGE_SIZE)) {
+			ret = -EFAULT;
+			goto done;
+		}
+	}
+
+done:
+	return ret;
+}
+
+/*
+ * Write a 256 byte (64 dword) EPROM page.
+ * All callers have verified the offset is at a page boundary.
+ */
+static int write_page(struct hfi1_devdata *dd, u32 offset, u32 *data)
+{
+	int i;
+
+	write_csr(dd, ASIC_EEP_ADDR_CMD, CMD_WRITE_ENABLE);
+	write_csr(dd, ASIC_EEP_DATA, data[0]);
+	write_csr(dd, ASIC_EEP_ADDR_CMD, CMD_PAGE_PROGRAM(offset));
+	for (i = 1; i < EP_PAGE_SIZE/sizeof(u32); i++)
+		write_csr(dd, ASIC_EEP_DATA, data[i]);
+	/* will close the open page */
+	return wait_for_not_busy(dd);
+}
+
+/*
+ * Write length bytes starting at offset.  Read from user address addr.
+ */
+static int write_length(struct hfi1_devdata *dd, u32 start, u32 len, u64 addr)
+{
+	u32 offset;
+	u32 buffer[EP_PAGE_SIZE/sizeof(u32)];
+	int ret = 0;
+
+	/* reject anything not on an EPROM page boundary */
+	if ((start & EEP_PAGE_MASK) || (len & EEP_PAGE_MASK))
+		return -EINVAL;
+
+	write_enable(dd);
+
+	for (offset = 0; offset < len; offset += EP_PAGE_SIZE) {
+		if (copy_from_user(buffer, (void __user *)(addr + offset),
+						EP_PAGE_SIZE)) {
+			ret = -EFAULT;
+			goto done;
+		}
+		ret = write_page(dd, start + offset, buffer);
+		if (ret)
+			goto done;
+	}
+
+done:
+	write_disable(dd);
+	return ret;
+}
+
+/*
+ * Perform the given operation on the EPROM.  Called from user space.  The
+ * user credentials have already been checked.
+ *
+ * Return 0 on success, -ERRNO on error
+ */
+int handle_eprom_command(const struct hfi1_cmd *cmd)
+{
+	struct hfi1_devdata *dd;
+	u32 dev_id;
+	int ret = 0;
+
+	/*
+	 * The EPROM is per-device, so use unit 0 as that will always
+	 * exist.
+	 */
+	dd = hfi1_lookup(0);
+	if (!dd) {
+		pr_err("%s: cannot find unit 0!\n", __func__);
+		return -EINVAL;
+	}
+
+	/* lock against other callers touching the ASIC block */
+	mutex_lock(&eprom_mutex);
+
+	/* some platforms do not have an EPROM */
+	if (!eprom_available) {
+		ret = -ENOSYS;
+		goto done_asic;
+	}
+
+	/* lock against the other HFI on another OS */
+	ret = acquire_hw_mutex(dd);
+	if (ret) {
+		dd_dev_err(dd,
+			"%s: unable to acquire hw mutex, no EPROM support\n",
+			__func__);
+		goto done_asic;
+	}
+
+	dd_dev_info(dd, "%s: cmd: type %d, len 0x%x, addr 0x%016llx\n",
+		__func__, cmd->type, cmd->len, cmd->addr);
+
+	switch (cmd->type) {
+	case HFI1_CMD_EP_INFO:
+		if (cmd->len != sizeof(u32)) {
+			ret = -ERANGE;
+			break;
+		}
+		dev_id = read_device_id(dd);
+		/* addr points to a u32 user buffer */
+		if (copy_to_user((void __user *)cmd->addr, &dev_id,
+								sizeof(u32)))
+			ret = -EFAULT;
+		break;
+	case HFI1_CMD_EP_ERASE_CHIP:
+		ret = erase_chip(dd);
+		break;
+	case HFI1_CMD_EP_ERASE_P0:
+		if (cmd->len != P0_SIZE) {
+			ret = -ERANGE;
+			break;
+		}
+		ret = erase_32kb_range(dd, 0, cmd->len);
+		break;
+	case HFI1_CMD_EP_ERASE_P1:
+		/* check for overflow */
+		if (P1_START + cmd->len > ASIC_EEP_ADDR_CMD_EP_ADDR_MASK) {
+			ret = -ERANGE;
+			break;
+		}
+		ret = erase_32kb_range(dd, P1_START, P1_START + cmd->len);
+		break;
+	case HFI1_CMD_EP_READ_P0:
+		if (cmd->len != P0_SIZE) {
+			ret = -ERANGE;
+			break;
+		}
+		ret = read_length(dd, 0, cmd->len, cmd->addr);
+		break;
+	case HFI1_CMD_EP_READ_P1:
+		/* check for overflow */
+		if (P1_START + cmd->len > ASIC_EEP_ADDR_CMD_EP_ADDR_MASK) {
+			ret = -ERANGE;
+			break;
+		}
+		ret = read_length(dd, P1_START, cmd->len, cmd->addr);
+		break;
+	case HFI1_CMD_EP_WRITE_P0:
+		if (cmd->len > P0_SIZE) {
+			ret = -ERANGE;
+			break;
+		}
+		ret = write_length(dd, 0, cmd->len, cmd->addr);
+		break;
+	case HFI1_CMD_EP_WRITE_P1:
+		/* check for overflow */
+		if (P1_START + cmd->len > ASIC_EEP_ADDR_CMD_EP_ADDR_MASK) {
+			ret = -ERANGE;
+			break;
+		}
+		ret = write_length(dd, P1_START, cmd->len, cmd->addr);
+		break;
+	default:
+		dd_dev_err(dd, "%s: unexpected command %d\n",
+			__func__, cmd->type);
+		ret = -EINVAL;
+		break;
+	}
+
+	release_hw_mutex(dd);
+done_asic:
+	mutex_unlock(&eprom_mutex);
+	return ret;
+}
+
+/*
+ * Initialize the EPROM handler.
+ */
+int eprom_init(struct hfi1_devdata *dd)
+{
+	int ret = 0;
+
+	/* only the discrete chip has an EPROM, nothing to do */
+	if (dd->pcidev->device != PCI_DEVICE_ID_INTEL0)
+		return 0;
+
+	/* lock against other callers */
+	mutex_lock(&eprom_mutex);
+	if (eprom_available)	/* already initialized */
+		goto done_asic;
+
+	/*
+	 * Lock against the other HFI on another OS - the mutex above
+	 * would have caught anything in this driver.  It is OK if
+	 * both OSes reset the EPROM - as long as they don't do it at
+	 * the same time.
+	 */
+	ret = acquire_hw_mutex(dd);
+	if (ret) {
+		dd_dev_err(dd,
+			"%s: unable to acquire hw mutex, no EPROM support\n",
+			__func__);
+		goto done_asic;
+	}
+
+	/* reset EPROM to be sure it is in a good state */
+
+	/* set reset */
+	write_csr(dd, ASIC_EEP_CTL_STAT,
+					ASIC_EEP_CTL_STAT_EP_RESET_SMASK);
+	/* clear reset, set speed */
+	write_csr(dd, ASIC_EEP_CTL_STAT,
+			EP_SPEED_FULL << ASIC_EEP_CTL_STAT_RATE_SPI_SHIFT);
+
+	/* wake the device with command "release powerdown NoID" */
+	write_csr(dd, ASIC_EEP_ADDR_CMD, CMD_RELEASE_POWERDOWN_NOID);
+
+	eprom_available = 1;
+	release_hw_mutex(dd);
+done_asic:
+	mutex_unlock(&eprom_mutex);
+	return ret;
+}
diff --git a/drivers/staging/rdma/hfi1/eprom.h b/drivers/staging/rdma/hfi1/eprom.h
new file mode 100644
index 000000000000..64a64276be81
--- /dev/null
+++ b/drivers/staging/rdma/hfi1/eprom.h
@@ -0,0 +1,55 @@
+/*
+ *
+ * This file is provided under a dual BSD/GPLv2 license.  When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * GPL LICENSE SUMMARY
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * BSD LICENSE
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ *  - Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ *  - Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in
+ *    the documentation and/or other materials provided with the
+ *    distribution.
+ *  - Neither the name of Intel Corporation nor the names of its
+ *    contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ */
+
+struct hfi1_cmd;
+struct hfi1_devdata;
+
+int eprom_init(struct hfi1_devdata *dd);
+int handle_eprom_command(const struct hfi1_cmd *cmd);
diff --git a/drivers/staging/rdma/hfi1/file_ops.c b/drivers/staging/rdma/hfi1/file_ops.c
new file mode 100644
index 000000000000..469861750b76
--- /dev/null
+++ b/drivers/staging/rdma/hfi1/file_ops.c
@@ -0,0 +1,2140 @@
+/*
+ *
+ * This file is provided under a dual BSD/GPLv2 license.  When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * GPL LICENSE SUMMARY
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * BSD LICENSE
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ *  - Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ *  - Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in
+ *    the documentation and/or other materials provided with the
+ *    distribution.
+ *  - Neither the name of Intel Corporation nor the names of its
+ *    contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ */
+#include <linux/pci.h>
+#include <linux/poll.h>
+#include <linux/cdev.h>
+#include <linux/swap.h>
+#include <linux/vmalloc.h>
+#include <linux/highmem.h>
+#include <linux/io.h>
+#include <linux/jiffies.h>
+#include <asm/pgtable.h>
+#include <linux/delay.h>
+#include <linux/export.h>
+#include <linux/module.h>
+#include <linux/cred.h>
+#include <linux/uio.h>
+
+#include "hfi.h"
+#include "pio.h"
+#include "device.h"
+#include "common.h"
+#include "trace.h"
+#include "user_sdma.h"
+#include "eprom.h"
+
+#undef pr_fmt
+#define pr_fmt(fmt) DRIVER_NAME ": " fmt
+
+#define SEND_CTXT_HALT_TIMEOUT 1000 /* msecs */
+
+/*
+ * File operation functions
+ */
+static int hfi1_file_open(struct inode *, struct file *);
+static int hfi1_file_close(struct inode *, struct file *);
+static ssize_t hfi1_file_write(struct file *, const char __user *,
+			       size_t, loff_t *);
+static ssize_t hfi1_write_iter(struct kiocb *, struct iov_iter *);
+static unsigned int hfi1_poll(struct file *, struct poll_table_struct *);
+static int hfi1_file_mmap(struct file *, struct vm_area_struct *);
+
+static u64 kvirt_to_phys(void *);
+static int assign_ctxt(struct file *, struct hfi1_user_info *);
+static int init_subctxts(struct hfi1_ctxtdata *, const struct hfi1_user_info *);
+static int user_init(struct file *);
+static int get_ctxt_info(struct file *, void __user *, __u32);
+static int get_base_info(struct file *, void __user *, __u32);
+static int setup_ctxt(struct file *);
+static int setup_subctxt(struct hfi1_ctxtdata *);
+static int get_user_context(struct file *, struct hfi1_user_info *,
+			    int, unsigned);
+static int find_shared_ctxt(struct file *, const struct hfi1_user_info *);
+static int allocate_ctxt(struct file *, struct hfi1_devdata *,
+			 struct hfi1_user_info *);
+static unsigned int poll_urgent(struct file *, struct poll_table_struct *);
+static unsigned int poll_next(struct file *, struct poll_table_struct *);
+static int user_event_ack(struct hfi1_ctxtdata *, int, unsigned long);
+static int set_ctxt_pkey(struct hfi1_ctxtdata *, unsigned, u16);
+static int manage_rcvq(struct hfi1_ctxtdata *, unsigned, int);
+static int vma_fault(struct vm_area_struct *, struct vm_fault *);
+static int exp_tid_setup(struct file *, struct hfi1_tid_info *);
+static int exp_tid_free(struct file *, struct hfi1_tid_info *);
+static void unlock_exp_tids(struct hfi1_ctxtdata *);
+
+static const struct file_operations hfi1_file_ops = {
+	.owner = THIS_MODULE,
+	.write = hfi1_file_write,
+	.write_iter = hfi1_write_iter,
+	.open = hfi1_file_open,
+	.release = hfi1_file_close,
+	.poll = hfi1_poll,
+	.mmap = hfi1_file_mmap,
+	.llseek = noop_llseek,
+};
+
+static struct vm_operations_struct vm_ops = {
+	.fault = vma_fault,
+};
+
+/*
+ * Types of memories mapped into user processes' space
+ */
+enum mmap_types {
+	PIO_BUFS = 1,
+	PIO_BUFS_SOP,
+	PIO_CRED,
+	RCV_HDRQ,
+	RCV_EGRBUF,
+	UREGS,
+	EVENTS,
+	STATUS,
+	RTAIL,
+	SUBCTXT_UREGS,
+	SUBCTXT_RCV_HDRQ,
+	SUBCTXT_EGRBUF,
+	SDMA_COMP
+};
+
+/*
+ * Masks and offsets defining the mmap tokens
+ */
+#define HFI1_MMAP_OFFSET_MASK   0xfffULL
+#define HFI1_MMAP_OFFSET_SHIFT  0
+#define HFI1_MMAP_SUBCTXT_MASK  0xfULL
+#define HFI1_MMAP_SUBCTXT_SHIFT 12
+#define HFI1_MMAP_CTXT_MASK     0xffULL
+#define HFI1_MMAP_CTXT_SHIFT    16
+#define HFI1_MMAP_TYPE_MASK     0xfULL
+#define HFI1_MMAP_TYPE_SHIFT    24
+#define HFI1_MMAP_MAGIC_MASK    0xffffffffULL
+#define HFI1_MMAP_MAGIC_SHIFT   32
+
+#define HFI1_MMAP_MAGIC         0xdabbad00
+
+#define HFI1_MMAP_TOKEN_SET(field, val)	\
+	(((val) & HFI1_MMAP_##field##_MASK) << HFI1_MMAP_##field##_SHIFT)
+#define HFI1_MMAP_TOKEN_GET(field, token) \
+	(((token) >> HFI1_MMAP_##field##_SHIFT) & HFI1_MMAP_##field##_MASK)
+#define HFI1_MMAP_TOKEN(type, ctxt, subctxt, addr)   \
+	(HFI1_MMAP_TOKEN_SET(MAGIC, HFI1_MMAP_MAGIC) | \
+	HFI1_MMAP_TOKEN_SET(TYPE, type) | \
+	HFI1_MMAP_TOKEN_SET(CTXT, ctxt) | \
+	HFI1_MMAP_TOKEN_SET(SUBCTXT, subctxt) | \
+	HFI1_MMAP_TOKEN_SET(OFFSET, ((unsigned long)addr & ~PAGE_MASK)))
+
+#define EXP_TID_SET(field, value)			\
+	(((value) & EXP_TID_TID##field##_MASK) <<	\
+	 EXP_TID_TID##field##_SHIFT)
+#define EXP_TID_CLEAR(tid, field) {					\
+		(tid) &= ~(EXP_TID_TID##field##_MASK <<			\
+			   EXP_TID_TID##field##_SHIFT);			\
+			}
+#define EXP_TID_RESET(tid, field, value) do {				\
+		EXP_TID_CLEAR(tid, field);				\
+		(tid) |= EXP_TID_SET(field, value);			\
+	} while (0)
+
+#define dbg(fmt, ...)				\
+	pr_info(fmt, ##__VA_ARGS__)
+
+
+static inline int is_valid_mmap(u64 token)
+{
+	return (HFI1_MMAP_TOKEN_GET(MAGIC, token) == HFI1_MMAP_MAGIC);
+}
+
+static int hfi1_file_open(struct inode *inode, struct file *fp)
+{
+	/* The real work is performed later in assign_ctxt() */
+	fp->private_data = kzalloc(sizeof(struct hfi1_filedata), GFP_KERNEL);
+	if (fp->private_data) /* no cpu affinity by default */
+		((struct hfi1_filedata *)fp->private_data)->rec_cpu_num = -1;
+	return fp->private_data ? 0 : -ENOMEM;
+}
+
+static ssize_t hfi1_file_write(struct file *fp, const char __user *data,
+			       size_t count, loff_t *offset)
+{
+	const struct hfi1_cmd __user *ucmd;
+	struct hfi1_ctxtdata *uctxt = ctxt_fp(fp);
+	struct hfi1_cmd cmd;
+	struct hfi1_user_info uinfo;
+	struct hfi1_tid_info tinfo;
+	ssize_t consumed = 0, copy = 0, ret = 0;
+	void *dest = NULL;
+	__u64 user_val = 0;
+	int uctxt_required = 1;
+	int must_be_root = 0;
+
+	if (count < sizeof(cmd)) {
+		ret = -EINVAL;
+		goto bail;
+	}
+
+	ucmd = (const struct hfi1_cmd __user *)data;
+	if (copy_from_user(&cmd, ucmd, sizeof(cmd))) {
+		ret = -EFAULT;
+		goto bail;
+	}
+
+	consumed = sizeof(cmd);
+
+	switch (cmd.type) {
+	case HFI1_CMD_ASSIGN_CTXT:
+		uctxt_required = 0;	/* assigned user context not required */
+		copy = sizeof(uinfo);
+		dest = &uinfo;
+		break;
+	case HFI1_CMD_SDMA_STATUS_UPD:
+	case HFI1_CMD_CREDIT_UPD:
+		copy = 0;
+		break;
+	case HFI1_CMD_TID_UPDATE:
+	case HFI1_CMD_TID_FREE:
+		copy = sizeof(tinfo);
+		dest = &tinfo;
+		break;
+	case HFI1_CMD_USER_INFO:
+	case HFI1_CMD_RECV_CTRL:
+	case HFI1_CMD_POLL_TYPE:
+	case HFI1_CMD_ACK_EVENT:
+	case HFI1_CMD_CTXT_INFO:
+	case HFI1_CMD_SET_PKEY:
+	case HFI1_CMD_CTXT_RESET:
+		copy = 0;
+		user_val = cmd.addr;
+		break;
+	case HFI1_CMD_EP_INFO:
+	case HFI1_CMD_EP_ERASE_CHIP:
+	case HFI1_CMD_EP_ERASE_P0:
+	case HFI1_CMD_EP_ERASE_P1:
+	case HFI1_CMD_EP_READ_P0:
+	case HFI1_CMD_EP_READ_P1:
+	case HFI1_CMD_EP_WRITE_P0:
+	case HFI1_CMD_EP_WRITE_P1:
+		uctxt_required = 0;	/* assigned user context not required */
+		must_be_root = 1;	/* validate user */
+		copy = 0;
+		break;
+	default:
+		ret = -EINVAL;
+		goto bail;
+	}
+
+	/* If the command comes with user data, copy it. */
+	if (copy) {
+		if (copy_from_user(dest, (void __user *)cmd.addr, copy)) {
+			ret = -EFAULT;
+			goto bail;
+		}
+		consumed += copy;
+	}
+
+	/*
+	 * Make sure there is a uctxt when needed.
+	 */
+	if (uctxt_required && !uctxt) {
+		ret = -EINVAL;
+		goto bail;
+	}
+
+	/* only root can do these operations */
+	if (must_be_root && !capable(CAP_SYS_ADMIN)) {
+		ret = -EPERM;
+		goto bail;
+	}
+
+	switch (cmd.type) {
+	case HFI1_CMD_ASSIGN_CTXT:
+		ret = assign_ctxt(fp, &uinfo);
+		if (ret < 0)
+			goto bail;
+		ret = setup_ctxt(fp);
+		if (ret)
+			goto bail;
+		ret = user_init(fp);
+		break;
+	case HFI1_CMD_CTXT_INFO:
+		ret = get_ctxt_info(fp, (void __user *)(unsigned long)
+				    user_val, cmd.len);
+		break;
+	case HFI1_CMD_USER_INFO:
+		ret = get_base_info(fp, (void __user *)(unsigned long)
+				    user_val, cmd.len);
+		break;
+	case HFI1_CMD_SDMA_STATUS_UPD:
+		break;
+	case HFI1_CMD_CREDIT_UPD:
+		if (uctxt && uctxt->sc)
+			sc_return_credits(uctxt->sc);
+		break;
+	case HFI1_CMD_TID_UPDATE:
+		ret = exp_tid_setup(fp, &tinfo);
+		if (!ret) {
+			unsigned long addr;
+			/*
+			 * Copy the number of tidlist entries we used
+			 * and the length of the buffer we registered.
+			 * These fields are adjacent in the structure so
+			 * we can copy them at the same time.
+			 */
+			addr = (unsigned long)cmd.addr +
+				offsetof(struct hfi1_tid_info, tidcnt);
+			if (copy_to_user((void __user *)addr, &tinfo.tidcnt,
+					 sizeof(tinfo.tidcnt) +
+					 sizeof(tinfo.length)))
+				ret = -EFAULT;
+		}
+		break;
+	case HFI1_CMD_TID_FREE:
+		ret = exp_tid_free(fp, &tinfo);
+		break;
+	case HFI1_CMD_RECV_CTRL:
+		ret = manage_rcvq(uctxt, subctxt_fp(fp), (int)user_val);
+		break;
+	case HFI1_CMD_POLL_TYPE:
+		uctxt->poll_type = (typeof(uctxt->poll_type))user_val;
+		break;
+	case HFI1_CMD_ACK_EVENT:
+		ret = user_event_ack(uctxt, subctxt_fp(fp), user_val);
+		break;
+	case HFI1_CMD_SET_PKEY:
+		if (HFI1_CAP_IS_USET(PKEY_CHECK))
+			ret = set_ctxt_pkey(uctxt, subctxt_fp(fp), user_val);
+		else
+			ret = -EPERM;
+		break;
+	case HFI1_CMD_CTXT_RESET: {
+		struct send_context *sc;
+		struct hfi1_devdata *dd;
+
+		if (!uctxt || !uctxt->dd || !uctxt->sc) {
+			ret = -EINVAL;
+			break;
+		}
+		/*
+		 * There is no protection here. User level has to
+		 * guarantee that no one will be writing to the send
+		 * context while it is being re-initialized.
+		 * If user level breaks that guarantee, it will break
+		 * it's own context and no one else's.
+		 */
+		dd = uctxt->dd;
+		sc = uctxt->sc;
+		/*
+		 * Wait until the interrupt handler has marked the
+		 * context as halted or frozen. Report error if we time
+		 * out.
+		 */
+		wait_event_interruptible_timeout(
+			sc->halt_wait, (sc->flags & SCF_HALTED),
+			msecs_to_jiffies(SEND_CTXT_HALT_TIMEOUT));
+		if (!(sc->flags & SCF_HALTED)) {
+			ret = -ENOLCK;
+			break;
+		}
+		/*
+		 * If the send context was halted due to a Freeze,
+		 * wait until the device has been "unfrozen" before
+		 * resetting the context.
+		 */
+		if (sc->flags & SCF_FROZEN) {
+			wait_event_interruptible_timeout(
+				dd->event_queue,
+				!(ACCESS_ONCE(dd->flags) & HFI1_FROZEN),
+				msecs_to_jiffies(SEND_CTXT_HALT_TIMEOUT));
+			if (dd->flags & HFI1_FROZEN) {
+				ret = -ENOLCK;
+				break;
+			}
+			if (dd->flags & HFI1_FORCED_FREEZE) {
+				/* Don't allow context reset if we are into
+				 * forced freeze */
+				ret = -ENODEV;
+				break;
+			}
+			sc_disable(sc);
+			ret = sc_enable(sc);
+			hfi1_rcvctrl(dd, HFI1_RCVCTRL_CTXT_ENB,
+				     uctxt->ctxt);
+		} else
+			ret = sc_restart(sc);
+		if (!ret)
+			sc_return_credits(sc);
+		break;
+	}
+	case HFI1_CMD_EP_INFO:
+	case HFI1_CMD_EP_ERASE_CHIP:
+	case HFI1_CMD_EP_ERASE_P0:
+	case HFI1_CMD_EP_ERASE_P1:
+	case HFI1_CMD_EP_READ_P0:
+	case HFI1_CMD_EP_READ_P1:
+	case HFI1_CMD_EP_WRITE_P0:
+	case HFI1_CMD_EP_WRITE_P1:
+		ret = handle_eprom_command(&cmd);
+		break;
+	}
+
+	if (ret >= 0)
+		ret = consumed;
+bail:
+	return ret;
+}
+
+static ssize_t hfi1_write_iter(struct kiocb *kiocb, struct iov_iter *from)
+{
+	struct hfi1_user_sdma_pkt_q *pq;
+	struct hfi1_user_sdma_comp_q *cq;
+	int ret = 0, done = 0, reqs = 0;
+	unsigned long dim = from->nr_segs;
+
+	if (!user_sdma_comp_fp(kiocb->ki_filp) ||
+	    !user_sdma_pkt_fp(kiocb->ki_filp)) {
+		ret = -EIO;
+		goto done;
+	}
+
+	if (!iter_is_iovec(from) || !dim) {
+		ret = -EINVAL;
+		goto done;
+	}
+
+	hfi1_cdbg(SDMA, "SDMA request from %u:%u (%lu)",
+		  ctxt_fp(kiocb->ki_filp)->ctxt, subctxt_fp(kiocb->ki_filp),
+		  dim);
+	pq = user_sdma_pkt_fp(kiocb->ki_filp);
+	cq = user_sdma_comp_fp(kiocb->ki_filp);
+
+	if (atomic_read(&pq->n_reqs) == pq->n_max_reqs) {
+		ret = -ENOSPC;
+		goto done;
+	}
+
+	while (dim) {
+		unsigned long count = 0;
+
+		ret = hfi1_user_sdma_process_request(
+			kiocb->ki_filp,	(struct iovec *)(from->iov + done),
+			dim, &count);
+		if (ret)
+			goto done;
+		dim -= count;
+		done += count;
+		reqs++;
+	}
+done:
+	return ret ? ret : reqs;
+}
+
+static int hfi1_file_mmap(struct file *fp, struct vm_area_struct *vma)
+{
+	struct hfi1_ctxtdata *uctxt;
+	struct hfi1_devdata *dd;
+	unsigned long flags, pfn;
+	u64 token = vma->vm_pgoff << PAGE_SHIFT,
+		memaddr = 0;
+	u8 subctxt, mapio = 0, vmf = 0, type;
+	ssize_t memlen = 0;
+	int ret = 0;
+	u16 ctxt;
+
+	uctxt = ctxt_fp(fp);
+	if (!is_valid_mmap(token) || !uctxt ||
+	    !(vma->vm_flags & VM_SHARED)) {
+		ret = -EINVAL;
+		goto done;
+	}
+	dd = uctxt->dd;
+	ctxt = HFI1_MMAP_TOKEN_GET(CTXT, token);
+	subctxt = HFI1_MMAP_TOKEN_GET(SUBCTXT, token);
+	type = HFI1_MMAP_TOKEN_GET(TYPE, token);
+	if (ctxt != uctxt->ctxt || subctxt != subctxt_fp(fp)) {
+		ret = -EINVAL;
+		goto done;
+	}
+
+	flags = vma->vm_flags;
+
+	switch (type) {
+	case PIO_BUFS:
+	case PIO_BUFS_SOP:
+		memaddr = ((dd->physaddr + TXE_PIO_SEND) +
+				/* chip pio base */
+			   (uctxt->sc->hw_context * (1 << 16))) +
+				/* 64K PIO space / ctxt */
+			(type == PIO_BUFS_SOP ?
+				(TXE_PIO_SIZE / 2) : 0); /* sop? */
+		/*
+		 * Map only the amount allocated to the context, not the
+		 * entire available context's PIO space.
+		 */
+		memlen = ALIGN(uctxt->sc->credits * PIO_BLOCK_SIZE,
+			       PAGE_SIZE);
+		flags &= ~VM_MAYREAD;
+		flags |= VM_DONTCOPY | VM_DONTEXPAND;
+		vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
+		mapio = 1;
+		break;
+	case PIO_CRED:
+		if (flags & VM_WRITE) {
+			ret = -EPERM;
+			goto done;
+		}
+		/*
+		 * The credit return location for this context could be on the
+		 * second or third page allocated for credit returns (if number
+		 * of enabled contexts > 64 and 128 respectively).
+		 */
+		memaddr = dd->cr_base[uctxt->numa_id].pa +
+			(((u64)uctxt->sc->hw_free -
+			  (u64)dd->cr_base[uctxt->numa_id].va) & PAGE_MASK);
+		memlen = PAGE_SIZE;
+		flags &= ~VM_MAYWRITE;
+		flags |= VM_DONTCOPY | VM_DONTEXPAND;
+		/*
+		 * The driver has already allocated memory for credit
+		 * returns and programmed it into the chip. Has that
+		 * memory been flagged as non-cached?
+		 */
+		/* vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot); */
+		mapio = 1;
+		break;
+	case RCV_HDRQ:
+		memaddr = uctxt->rcvhdrq_phys;
+		memlen = uctxt->rcvhdrq_size;
+		break;
+	case RCV_EGRBUF: {
+		unsigned long addr;
+		int i;
+		/*
+		 * The RcvEgr buffer need to be handled differently
+		 * as multiple non-contiguous pages need to be mapped
+		 * into the user process.
+		 */
+		memlen = uctxt->egrbufs.size;
+		if ((vma->vm_end - vma->vm_start) != memlen) {
+			dd_dev_err(dd, "Eager buffer map size invalid (%lu != %lu)\n",
+				   (vma->vm_end - vma->vm_start), memlen);
+			ret = -EINVAL;
+			goto done;
+		}
+		if (vma->vm_flags & VM_WRITE) {
+			ret = -EPERM;
+			goto done;
+		}
+		vma->vm_flags &= ~VM_MAYWRITE;
+		addr = vma->vm_start;
+		for (i = 0 ; i < uctxt->egrbufs.numbufs; i++) {
+			ret = remap_pfn_range(
+				vma, addr,
+				uctxt->egrbufs.buffers[i].phys >> PAGE_SHIFT,
+				uctxt->egrbufs.buffers[i].len,
+				vma->vm_page_prot);
+			if (ret < 0)
+				goto done;
+			addr += uctxt->egrbufs.buffers[i].len;
+		}
+		ret = 0;
+		goto done;
+	}
+	case UREGS:
+		/*
+		 * Map only the page that contains this context's user
+		 * registers.
+		 */
+		memaddr = (unsigned long)
+			(dd->physaddr + RXE_PER_CONTEXT_USER)
+			+ (uctxt->ctxt * RXE_PER_CONTEXT_SIZE);
+		/*
+		 * TidFlow table is on the same page as the rest of the
+		 * user registers.
+		 */
+		memlen = PAGE_SIZE;
+		flags |= VM_DONTCOPY | VM_DONTEXPAND;
+		vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
+		mapio = 1;
+		break;
+	case EVENTS:
+		/*
+		 * Use the page where this context's flags are. User level
+		 * knows where it's own bitmap is within the page.
+		 */
+		memaddr = ((unsigned long)dd->events +
+			   ((uctxt->ctxt - dd->first_user_ctxt) *
+			    HFI1_MAX_SHARED_CTXTS)) & PAGE_MASK;
+		memlen = PAGE_SIZE;
+		/*
+		 * v3.7 removes VM_RESERVED but the effect is kept by
+		 * using VM_IO.
+		 */
+		flags |= VM_IO | VM_DONTEXPAND;
+		vmf = 1;
+		break;
+	case STATUS:
+		memaddr = kvirt_to_phys((void *)dd->status);
+		memlen = PAGE_SIZE;
+		flags |= VM_IO | VM_DONTEXPAND;
+		break;
+	case RTAIL:
+		if (!HFI1_CAP_IS_USET(DMA_RTAIL)) {
+			/*
+			 * If the memory allocation failed, the context alloc
+			 * also would have failed, so we would never get here
+			 */
+			ret = -EINVAL;
+			goto done;
+		}
+		if (flags & VM_WRITE) {
+			ret = -EPERM;
+			goto done;
+		}
+		memaddr = uctxt->rcvhdrqtailaddr_phys;
+		memlen = PAGE_SIZE;
+		flags &= ~VM_MAYWRITE;
+		break;
+	case SUBCTXT_UREGS:
+		memaddr = (u64)uctxt->subctxt_uregbase;
+		memlen = PAGE_SIZE;
+		flags |= VM_IO | VM_DONTEXPAND;
+		vmf = 1;
+		break;
+	case SUBCTXT_RCV_HDRQ:
+		memaddr = (u64)uctxt->subctxt_rcvhdr_base;
+		memlen = uctxt->rcvhdrq_size * uctxt->subctxt_cnt;
+		flags |= VM_IO | VM_DONTEXPAND;
+		vmf = 1;
+		break;
+	case SUBCTXT_EGRBUF:
+		memaddr = (u64)uctxt->subctxt_rcvegrbuf;
+		memlen = uctxt->egrbufs.size * uctxt->subctxt_cnt;
+		flags |= VM_IO | VM_DONTEXPAND;
+		flags &= ~VM_MAYWRITE;
+		vmf = 1;
+		break;
+	case SDMA_COMP: {
+		struct hfi1_user_sdma_comp_q *cq;
+
+		if (!user_sdma_comp_fp(fp)) {
+			ret = -EFAULT;
+			goto done;
+		}
+		cq = user_sdma_comp_fp(fp);
+		memaddr = (u64)cq->comps;
+		memlen = ALIGN(sizeof(*cq->comps) * cq->nentries, PAGE_SIZE);
+		flags |= VM_IO | VM_DONTEXPAND;
+		vmf = 1;
+		break;
+	}
+	default:
+		ret = -EINVAL;
+		break;
+	}
+
+	if ((vma->vm_end - vma->vm_start) != memlen) {
+		hfi1_cdbg(PROC, "%u:%u Memory size mismatch %lu:%lu",
+			  uctxt->ctxt, subctxt_fp(fp),
+			  (vma->vm_end - vma->vm_start), memlen);
+		ret = -EINVAL;
+		goto done;
+	}
+
+	vma->vm_flags = flags;
+	dd_dev_info(dd,
+		    "%s: %u:%u type:%u io/vf:%d/%d, addr:0x%llx, len:%lu(%lu), flags:0x%lx\n",
+		    __func__, ctxt, subctxt, type, mapio, vmf, memaddr, memlen,
+		    vma->vm_end - vma->vm_start, vma->vm_flags);
+	pfn = (unsigned long)(memaddr >> PAGE_SHIFT);
+	if (vmf) {
+		vma->vm_pgoff = pfn;
+		vma->vm_ops = &vm_ops;
+		ret = 0;
+	} else if (mapio) {
+		ret = io_remap_pfn_range(vma, vma->vm_start, pfn, memlen,
+					 vma->vm_page_prot);
+	} else {
+		ret = remap_pfn_range(vma, vma->vm_start, pfn, memlen,
+				      vma->vm_page_prot);
+	}
+done:
+	return ret;
+}
+
+/*
+ * Local (non-chip) user memory is not mapped right away but as it is
+ * accessed by the user-level code.
+ */
+static int vma_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
+{
+	struct page *page;
+
+	page = vmalloc_to_page((void *)(vmf->pgoff << PAGE_SHIFT));
+	if (!page)
+		return VM_FAULT_SIGBUS;
+
+	get_page(page);
+	vmf->page = page;
+
+	return 0;
+}
+
+static unsigned int hfi1_poll(struct file *fp, struct poll_table_struct *pt)
+{
+	struct hfi1_ctxtdata *uctxt;
+	unsigned pollflag;
+
+	uctxt = ctxt_fp(fp);
+	if (!uctxt)
+		pollflag = POLLERR;
+	else if (uctxt->poll_type == HFI1_POLL_TYPE_URGENT)
+		pollflag = poll_urgent(fp, pt);
+	else  if (uctxt->poll_type == HFI1_POLL_TYPE_ANYRCV)
+		pollflag = poll_next(fp, pt);
+	else /* invalid */
+		pollflag = POLLERR;
+
+	return pollflag;
+}
+
+static int hfi1_file_close(struct inode *inode, struct file *fp)
+{
+	struct hfi1_filedata *fdata = fp->private_data;
+	struct hfi1_ctxtdata *uctxt = fdata->uctxt;
+	struct hfi1_devdata *dd;
+	unsigned long flags, *ev;
+
+	fp->private_data = NULL;
+
+	if (!uctxt)
+		goto done;
+
+	hfi1_cdbg(PROC, "freeing ctxt %u:%u", uctxt->ctxt, fdata->subctxt);
+	dd = uctxt->dd;
+	mutex_lock(&hfi1_mutex);
+
+	flush_wc();
+	/* drain user sdma queue */
+	if (fdata->pq)
+		hfi1_user_sdma_free_queues(fdata);
+
+	/*
+	 * Clear any left over, unhandled events so the next process that
+	 * gets this context doesn't get confused.
+	 */
+	ev = dd->events + ((uctxt->ctxt - dd->first_user_ctxt) *
+			   HFI1_MAX_SHARED_CTXTS) + fdata->subctxt;
+	*ev = 0;
+
+	if (--uctxt->cnt) {
+		uctxt->active_slaves &= ~(1 << fdata->subctxt);
+		uctxt->subpid[fdata->subctxt] = 0;
+		mutex_unlock(&hfi1_mutex);
+		goto done;
+	}
+
+	spin_lock_irqsave(&dd->uctxt_lock, flags);
+	/*
+	 * Disable receive context and interrupt available, reset all
+	 * RcvCtxtCtrl bits to default values.
+	 */
+	hfi1_rcvctrl(dd, HFI1_RCVCTRL_CTXT_DIS |
+		     HFI1_RCVCTRL_TIDFLOW_DIS |
+		     HFI1_RCVCTRL_INTRAVAIL_DIS |
+		     HFI1_RCVCTRL_ONE_PKT_EGR_DIS |
+		     HFI1_RCVCTRL_NO_RHQ_DROP_DIS |
+		     HFI1_RCVCTRL_NO_EGR_DROP_DIS, uctxt->ctxt);
+	/* Clear the context's J_KEY */
+	hfi1_clear_ctxt_jkey(dd, uctxt->ctxt);
+	/*
+	 * Reset context integrity checks to default.
+	 * (writes to CSRs probably belong in chip.c)
+	 */
+	write_kctxt_csr(dd, uctxt->sc->hw_context, SEND_CTXT_CHECK_ENABLE,
+			hfi1_pkt_default_send_ctxt_mask(dd, uctxt->sc->type));
+	sc_disable(uctxt->sc);
+	uctxt->pid = 0;
+	spin_unlock_irqrestore(&dd->uctxt_lock, flags);
+
+	dd->rcd[uctxt->ctxt] = NULL;
+	uctxt->rcvwait_to = 0;
+	uctxt->piowait_to = 0;
+	uctxt->rcvnowait = 0;
+	uctxt->pionowait = 0;
+	uctxt->event_flags = 0;
+
+	hfi1_clear_tids(uctxt);
+	hfi1_clear_ctxt_pkey(dd, uctxt->ctxt);
+
+	if (uctxt->tid_pg_list)
+		unlock_exp_tids(uctxt);
+
+	hfi1_stats.sps_ctxts--;
+	dd->freectxts++;
+	mutex_unlock(&hfi1_mutex);
+	hfi1_free_ctxtdata(dd, uctxt);
+done:
+	kfree(fdata);
+	return 0;
+}
+
+/*
+ * Convert kernel *virtual* addresses to physical addresses.
+ * This is used to vmalloc'ed addresses.
+ */
+static u64 kvirt_to_phys(void *addr)
+{
+	struct page *page;
+	u64 paddr = 0;
+
+	page = vmalloc_to_page(addr);
+	if (page)
+		paddr = page_to_pfn(page) << PAGE_SHIFT;
+
+	return paddr;
+}
+
+static int assign_ctxt(struct file *fp, struct hfi1_user_info *uinfo)
+{
+	int i_minor, ret = 0;
+	unsigned swmajor, swminor, alg = HFI1_ALG_ACROSS;
+
+	swmajor = uinfo->userversion >> 16;
+	if (swmajor != HFI1_USER_SWMAJOR) {
+		ret = -ENODEV;
+		goto done;
+	}
+
+	swminor = uinfo->userversion & 0xffff;
+
+	if (uinfo->hfi1_alg < HFI1_ALG_COUNT)
+		alg = uinfo->hfi1_alg;
+
+	mutex_lock(&hfi1_mutex);
+	/* First, lets check if we need to setup a shared context? */
+	if (uinfo->subctxt_cnt)
+		ret = find_shared_ctxt(fp, uinfo);
+
+	/*
+	 * We execute the following block if we couldn't find a
+	 * shared context or if context sharing is not required.
+	 */
+	if (!ret) {
+		i_minor = iminor(file_inode(fp)) - HFI1_USER_MINOR_BASE;
+		ret = get_user_context(fp, uinfo, i_minor - 1, alg);
+	}
+	mutex_unlock(&hfi1_mutex);
+done:
+	return ret;
+}
+
+static int get_user_context(struct file *fp, struct hfi1_user_info *uinfo,
+			    int devno, unsigned alg)
+{
+	struct hfi1_devdata *dd = NULL;
+	int ret = 0, devmax, npresent, nup, dev;
+
+	devmax = hfi1_count_units(&npresent, &nup);
+	if (!npresent) {
+		ret = -ENXIO;
+		goto done;
+	}
+	if (!nup) {
+		ret = -ENETDOWN;
+		goto done;
+	}
+	if (devno >= 0) {
+		dd = hfi1_lookup(devno);
+		if (!dd)
+			ret = -ENODEV;
+		else if (!dd->freectxts)
+			ret = -EBUSY;
+	} else {
+		struct hfi1_devdata *pdd;
+
+		if (alg == HFI1_ALG_ACROSS) {
+			unsigned free = 0U;
+
+			for (dev = 0; dev < devmax; dev++) {
+				pdd = hfi1_lookup(dev);
+				if (pdd && pdd->freectxts &&
+				    pdd->freectxts > free) {
+					dd = pdd;
+					free = pdd->freectxts;
+				}
+			}
+		} else {
+			for (dev = 0; dev < devmax; dev++) {
+				pdd = hfi1_lookup(dev);
+				if (pdd && pdd->freectxts) {
+					dd = pdd;
+					break;
+				}
+			}
+		}
+		if (!dd)
+			ret = -EBUSY;
+	}
+done:
+	return ret ? ret : allocate_ctxt(fp, dd, uinfo);
+}
+
+static int find_shared_ctxt(struct file *fp,
+			    const struct hfi1_user_info *uinfo)
+{
+	int devmax, ndev, i;
+	int ret = 0;
+
+	devmax = hfi1_count_units(NULL, NULL);
+
+	for (ndev = 0; ndev < devmax; ndev++) {
+		struct hfi1_devdata *dd = hfi1_lookup(ndev);
+
+		/* device portion of usable() */
+		if (!(dd && (dd->flags & HFI1_PRESENT) && dd->kregbase))
+			continue;
+		for (i = dd->first_user_ctxt; i < dd->num_rcv_contexts; i++) {
+			struct hfi1_ctxtdata *uctxt = dd->rcd[i];
+
+			/* Skip ctxts which are not yet open */
+			if (!uctxt || !uctxt->cnt)
+				continue;
+			/* Skip ctxt if it doesn't match the requested one */
+			if (memcmp(uctxt->uuid, uinfo->uuid,
+				   sizeof(uctxt->uuid)) ||
+			    uctxt->subctxt_id != uinfo->subctxt_id ||
+			    uctxt->subctxt_cnt != uinfo->subctxt_cnt)
+				continue;
+
+			/* Verify the sharing process matches the master */
+			if (uctxt->userversion != uinfo->userversion ||
+			    uctxt->cnt >= uctxt->subctxt_cnt) {
+				ret = -EINVAL;
+				goto done;
+			}
+			ctxt_fp(fp) = uctxt;
+			subctxt_fp(fp) = uctxt->cnt++;
+			uctxt->subpid[subctxt_fp(fp)] = current->pid;
+			uctxt->active_slaves |= 1 << subctxt_fp(fp);
+			ret = 1;
+			goto done;
+		}
+	}
+
+done:
+	return ret;
+}
+
+static int allocate_ctxt(struct file *fp, struct hfi1_devdata *dd,
+			 struct hfi1_user_info *uinfo)
+{
+	struct hfi1_ctxtdata *uctxt;
+	unsigned ctxt;
+	int ret;
+
+	if (dd->flags & HFI1_FROZEN) {
+		/*
+		 * Pick an error that is unique from all other errors
+		 * that are returned so the user process knows that
+		 * it tried to allocate while the SPC was frozen.  It
+		 * it should be able to retry with success in a short
+		 * while.
+		 */
+		return -EIO;
+	}
+
+	for (ctxt = dd->first_user_ctxt; ctxt < dd->num_rcv_contexts; ctxt++)
+		if (!dd->rcd[ctxt])
+			break;
+
+	if (ctxt == dd->num_rcv_contexts)
+		return -EBUSY;
+
+	uctxt = hfi1_create_ctxtdata(dd->pport, ctxt);
+	if (!uctxt) {
+		dd_dev_err(dd,
+			   "Unable to allocate ctxtdata memory, failing open\n");
+		return -ENOMEM;
+	}
+	/*
+	 * Allocate and enable a PIO send context.
+	 */
+	uctxt->sc = sc_alloc(dd, SC_USER, uctxt->rcvhdrqentsize,
+			     uctxt->numa_id);
+	if (!uctxt->sc)
+		return -ENOMEM;
+
+	dbg("allocated send context %u(%u)\n", uctxt->sc->sw_index,
+		uctxt->sc->hw_context);
+	ret = sc_enable(uctxt->sc);
+	if (ret)
+		return ret;
+	/*
+	 * Setup shared context resources if the user-level has requested
+	 * shared contexts and this is the 'master' process.
+	 * This has to be done here so the rest of the sub-contexts find the
+	 * proper master.
+	 */
+	if (uinfo->subctxt_cnt && !subctxt_fp(fp)) {
+		ret = init_subctxts(uctxt, uinfo);
+		/*
+		 * On error, we don't need to disable and de-allocate the
+		 * send context because it will be done during file close
+		 */
+		if (ret)
+			return ret;
+	}
+	uctxt->userversion = uinfo->userversion;
+	uctxt->pid = current->pid;
+	uctxt->flags = HFI1_CAP_UGET(MASK);
+	init_waitqueue_head(&uctxt->wait);
+	strlcpy(uctxt->comm, current->comm, sizeof(uctxt->comm));
+	memcpy(uctxt->uuid, uinfo->uuid, sizeof(uctxt->uuid));
+	uctxt->jkey = generate_jkey(current_uid());
+	INIT_LIST_HEAD(&uctxt->sdma_queues);
+	spin_lock_init(&uctxt->sdma_qlock);
+	hfi1_stats.sps_ctxts++;
+	dd->freectxts--;
+	ctxt_fp(fp) = uctxt;
+
+	return 0;
+}
+
+static int init_subctxts(struct hfi1_ctxtdata *uctxt,
+			 const struct hfi1_user_info *uinfo)
+{
+	int ret = 0;
+	unsigned num_subctxts;
+
+	num_subctxts = uinfo->subctxt_cnt;
+	if (num_subctxts > HFI1_MAX_SHARED_CTXTS) {
+		ret = -EINVAL;
+		goto bail;
+	}
+
+	uctxt->subctxt_cnt = uinfo->subctxt_cnt;
+	uctxt->subctxt_id = uinfo->subctxt_id;
+	uctxt->active_slaves = 1;
+	uctxt->redirect_seq_cnt = 1;
+	set_bit(HFI1_CTXT_MASTER_UNINIT, &uctxt->event_flags);
+bail:
+	return ret;
+}
+
+static int setup_subctxt(struct hfi1_ctxtdata *uctxt)
+{
+	int ret = 0;
+	unsigned num_subctxts = uctxt->subctxt_cnt;
+
+	uctxt->subctxt_uregbase = vmalloc_user(PAGE_SIZE);
+	if (!uctxt->subctxt_uregbase) {
+		ret = -ENOMEM;
+		goto bail;
+	}
+	/* We can take the size of the RcvHdr Queue from the master */
+	uctxt->subctxt_rcvhdr_base = vmalloc_user(uctxt->rcvhdrq_size *
+						  num_subctxts);
+	if (!uctxt->subctxt_rcvhdr_base) {
+		ret = -ENOMEM;
+		goto bail_ureg;
+	}
+
+	uctxt->subctxt_rcvegrbuf = vmalloc_user(uctxt->egrbufs.size *
+						num_subctxts);
+	if (!uctxt->subctxt_rcvegrbuf) {
+		ret = -ENOMEM;
+		goto bail_rhdr;
+	}
+	goto bail;
+bail_rhdr:
+	vfree(uctxt->subctxt_rcvhdr_base);
+bail_ureg:
+	vfree(uctxt->subctxt_uregbase);
+	uctxt->subctxt_uregbase = NULL;
+bail:
+	return ret;
+}
+
+static int user_init(struct file *fp)
+{
+	int ret;
+	unsigned int rcvctrl_ops = 0;
+	struct hfi1_ctxtdata *uctxt = ctxt_fp(fp);
+
+	/* make sure that the context has already been setup */
+	if (!test_bit(HFI1_CTXT_SETUP_DONE, &uctxt->event_flags)) {
+		ret = -EFAULT;
+		goto done;
+	}
+
+	/*
+	 * Subctxts don't need to initialize anything since master
+	 * has done it.
+	 */
+	if (subctxt_fp(fp)) {
+		ret = wait_event_interruptible(uctxt->wait,
+			!test_bit(HFI1_CTXT_MASTER_UNINIT,
+			&uctxt->event_flags));
+		goto done;
+	}
+
+	/* initialize poll variables... */
+	uctxt->urgent = 0;
+	uctxt->urgent_poll = 0;
+
+	/*
+	 * Now enable the ctxt for receive.
+	 * For chips that are set to DMA the tail register to memory
+	 * when they change (and when the update bit transitions from
+	 * 0 to 1.  So for those chips, we turn it off and then back on.
+	 * This will (very briefly) affect any other open ctxts, but the
+	 * duration is very short, and therefore isn't an issue.  We
+	 * explicitly set the in-memory tail copy to 0 beforehand, so we
+	 * don't have to wait to be sure the DMA update has happened
+	 * (chip resets head/tail to 0 on transition to enable).
+	 */
+	if (uctxt->rcvhdrtail_kvaddr)
+		clear_rcvhdrtail(uctxt);
+
+	/* Setup J_KEY before enabling the context */
+	hfi1_set_ctxt_jkey(uctxt->dd, uctxt->ctxt, uctxt->jkey);
+
+	rcvctrl_ops = HFI1_RCVCTRL_CTXT_ENB;
+	if (HFI1_CAP_KGET_MASK(uctxt->flags, HDRSUPP))
+		rcvctrl_ops |= HFI1_RCVCTRL_TIDFLOW_ENB;
+	/*
+	 * Ignore the bit in the flags for now until proper
+	 * support for multiple packet per rcv array entry is
+	 * added.
+	 */
+	if (!HFI1_CAP_KGET_MASK(uctxt->flags, MULTI_PKT_EGR))
+		rcvctrl_ops |= HFI1_RCVCTRL_ONE_PKT_EGR_ENB;
+	if (HFI1_CAP_KGET_MASK(uctxt->flags, NODROP_EGR_FULL))
+		rcvctrl_ops |= HFI1_RCVCTRL_NO_EGR_DROP_ENB;
+	if (HFI1_CAP_KGET_MASK(uctxt->flags, NODROP_RHQ_FULL))
+		rcvctrl_ops |= HFI1_RCVCTRL_NO_RHQ_DROP_ENB;
+	if (HFI1_CAP_KGET_MASK(uctxt->flags, DMA_RTAIL))
+		rcvctrl_ops |= HFI1_RCVCTRL_TAILUPD_ENB;
+	hfi1_rcvctrl(uctxt->dd, rcvctrl_ops, uctxt->ctxt);
+
+	/* Notify any waiting slaves */
+	if (uctxt->subctxt_cnt) {
+		clear_bit(HFI1_CTXT_MASTER_UNINIT, &uctxt->event_flags);
+		wake_up(&uctxt->wait);
+	}
+	ret = 0;
+
+done:
+	return ret;
+}
+
+static int get_ctxt_info(struct file *fp, void __user *ubase, __u32 len)
+{
+	struct hfi1_ctxt_info cinfo;
+	struct hfi1_ctxtdata *uctxt = ctxt_fp(fp);
+	struct hfi1_filedata *fd = fp->private_data;
+	int ret = 0;
+
+	ret = hfi1_get_base_kinfo(uctxt, &cinfo);
+	if (ret < 0)
+		goto done;
+	cinfo.num_active = hfi1_count_active_units();
+	cinfo.unit = uctxt->dd->unit;
+	cinfo.ctxt = uctxt->ctxt;
+	cinfo.subctxt = subctxt_fp(fp);
+	cinfo.rcvtids = roundup(uctxt->egrbufs.alloced,
+				uctxt->dd->rcv_entries.group_size) +
+		uctxt->expected_count;
+	cinfo.credits = uctxt->sc->credits;
+	cinfo.numa_node = uctxt->numa_id;
+	cinfo.rec_cpu = fd->rec_cpu_num;
+	cinfo.send_ctxt = uctxt->sc->hw_context;
+
+	cinfo.egrtids = uctxt->egrbufs.alloced;
+	cinfo.rcvhdrq_cnt = uctxt->rcvhdrq_cnt;
+	cinfo.rcvhdrq_entsize = uctxt->rcvhdrqentsize << 2;
+	cinfo.sdma_ring_size = user_sdma_comp_fp(fp)->nentries;
+	cinfo.rcvegr_size = uctxt->egrbufs.rcvtid_size;
+
+	trace_hfi1_ctxt_info(uctxt->dd, uctxt->ctxt, subctxt_fp(fp), cinfo);
+	if (copy_to_user(ubase, &cinfo, sizeof(cinfo)))
+		ret = -EFAULT;
+done:
+	return ret;
+}
+
+static int setup_ctxt(struct file *fp)
+{
+	struct hfi1_ctxtdata *uctxt = ctxt_fp(fp);
+	struct hfi1_devdata *dd = uctxt->dd;
+	int ret = 0;
+
+	/*
+	 * Context should be set up only once (including allocation and
+	 * programming of eager buffers. This is done if context sharing
+	 * is not requested or by the master process.
+	 */
+	if (!uctxt->subctxt_cnt || !subctxt_fp(fp)) {
+		ret = hfi1_init_ctxt(uctxt->sc);
+		if (ret)
+			goto done;
+
+		/* Now allocate the RcvHdr queue and eager buffers. */
+		ret = hfi1_create_rcvhdrq(dd, uctxt);
+		if (ret)
+			goto done;
+		ret = hfi1_setup_eagerbufs(uctxt);
+		if (ret)
+			goto done;
+		if (uctxt->subctxt_cnt && !subctxt_fp(fp)) {
+			ret = setup_subctxt(uctxt);
+			if (ret)
+				goto done;
+		}
+		/* Setup Expected Rcv memories */
+		uctxt->tid_pg_list = vzalloc(uctxt->expected_count *
+					     sizeof(struct page **));
+		if (!uctxt->tid_pg_list) {
+			ret = -ENOMEM;
+			goto done;
+		}
+		uctxt->physshadow = vzalloc(uctxt->expected_count *
+					    sizeof(*uctxt->physshadow));
+		if (!uctxt->physshadow) {
+			ret = -ENOMEM;
+			goto done;
+		}
+		/* allocate expected TID map and initialize the cursor */
+		atomic_set(&uctxt->tidcursor, 0);
+		uctxt->numtidgroups = uctxt->expected_count /
+			dd->rcv_entries.group_size;
+		uctxt->tidmapcnt = uctxt->numtidgroups / BITS_PER_LONG +
+			!!(uctxt->numtidgroups % BITS_PER_LONG);
+		uctxt->tidusemap = kzalloc_node(uctxt->tidmapcnt *
+						sizeof(*uctxt->tidusemap),
+						GFP_KERNEL, uctxt->numa_id);
+		if (!uctxt->tidusemap) {
+			ret = -ENOMEM;
+			goto done;
+		}
+		/*
+		 * In case that the number of groups is not a multiple of
+		 * 64 (the number of groups in a tidusemap element), mark
+		 * the extra ones as used. This will effectively make them
+		 * permanently used and should never be assigned. Otherwise,
+		 * the code which checks how many free groups we have will
+		 * get completely confused about the state of the bits.
+		 */
+		if (uctxt->numtidgroups % BITS_PER_LONG)
+			uctxt->tidusemap[uctxt->tidmapcnt - 1] =
+				~((1ULL << (uctxt->numtidgroups %
+					    BITS_PER_LONG)) - 1);
+		trace_hfi1_exp_tid_map(uctxt->ctxt, subctxt_fp(fp), 0,
+				       uctxt->tidusemap, uctxt->tidmapcnt);
+	}
+	ret = hfi1_user_sdma_alloc_queues(uctxt, fp);
+	if (ret)
+		goto done;
+
+	set_bit(HFI1_CTXT_SETUP_DONE, &uctxt->event_flags);
+done:
+	return ret;
+}
+
+static int get_base_info(struct file *fp, void __user *ubase, __u32 len)
+{
+	struct hfi1_base_info binfo;
+	struct hfi1_ctxtdata *uctxt = ctxt_fp(fp);
+	struct hfi1_devdata *dd = uctxt->dd;
+	ssize_t sz;
+	unsigned offset;
+	int ret = 0;
+
+	trace_hfi1_uctxtdata(uctxt->dd, uctxt);
+
+	memset(&binfo, 0, sizeof(binfo));
+	binfo.hw_version = dd->revision;
+	binfo.sw_version = HFI1_KERN_SWVERSION;
+	binfo.bthqp = kdeth_qp;
+	binfo.jkey = uctxt->jkey;
+	/*
+	 * If more than 64 contexts are enabled the allocated credit
+	 * return will span two or three contiguous pages. Since we only
+	 * map the page containing the context's credit return address,
+	 * we need to calculate the offset in the proper page.
+	 */
+	offset = ((u64)uctxt->sc->hw_free -
+		  (u64)dd->cr_base[uctxt->numa_id].va) % PAGE_SIZE;
+	binfo.sc_credits_addr = HFI1_MMAP_TOKEN(PIO_CRED, uctxt->ctxt,
+					       subctxt_fp(fp), offset);
+	binfo.pio_bufbase = HFI1_MMAP_TOKEN(PIO_BUFS, uctxt->ctxt,
+					    subctxt_fp(fp),
+					    uctxt->sc->base_addr);
+	binfo.pio_bufbase_sop = HFI1_MMAP_TOKEN(PIO_BUFS_SOP,
+						uctxt->ctxt,
+						subctxt_fp(fp),
+						uctxt->sc->base_addr);
+	binfo.rcvhdr_bufbase = HFI1_MMAP_TOKEN(RCV_HDRQ, uctxt->ctxt,
+					       subctxt_fp(fp),
+					       uctxt->rcvhdrq);
+	binfo.rcvegr_bufbase = HFI1_MMAP_TOKEN(RCV_EGRBUF, uctxt->ctxt,
+					       subctxt_fp(fp),
+					       uctxt->egrbufs.rcvtids[0].phys);
+	binfo.sdma_comp_bufbase = HFI1_MMAP_TOKEN(SDMA_COMP, uctxt->ctxt,
+						 subctxt_fp(fp), 0);
+	/*
+	 * user regs are at
+	 * (RXE_PER_CONTEXT_USER + (ctxt * RXE_PER_CONTEXT_SIZE))
+	 */
+	binfo.user_regbase = HFI1_MMAP_TOKEN(UREGS, uctxt->ctxt,
+					    subctxt_fp(fp), 0);
+	offset = ((((uctxt->ctxt - dd->first_user_ctxt) *
+		    HFI1_MAX_SHARED_CTXTS) + subctxt_fp(fp)) *
+		  sizeof(*dd->events)) & ~PAGE_MASK;
+	binfo.events_bufbase = HFI1_MMAP_TOKEN(EVENTS, uctxt->ctxt,
+					      subctxt_fp(fp),
+					      offset);
+	binfo.status_bufbase = HFI1_MMAP_TOKEN(STATUS, uctxt->ctxt,
+					      subctxt_fp(fp),
+					      dd->status);
+	if (HFI1_CAP_IS_USET(DMA_RTAIL))
+		binfo.rcvhdrtail_base = HFI1_MMAP_TOKEN(RTAIL, uctxt->ctxt,
+						       subctxt_fp(fp), 0);
+	if (uctxt->subctxt_cnt) {
+		binfo.subctxt_uregbase = HFI1_MMAP_TOKEN(SUBCTXT_UREGS,
+							uctxt->ctxt,
+							subctxt_fp(fp), 0);
+		binfo.subctxt_rcvhdrbuf = HFI1_MMAP_TOKEN(SUBCTXT_RCV_HDRQ,
+							 uctxt->ctxt,
+							 subctxt_fp(fp), 0);
+		binfo.subctxt_rcvegrbuf = HFI1_MMAP_TOKEN(SUBCTXT_EGRBUF,
+							 uctxt->ctxt,
+							 subctxt_fp(fp), 0);
+	}
+	sz = (len < sizeof(binfo)) ? len : sizeof(binfo);
+	if (copy_to_user(ubase, &binfo, sz))
+		ret = -EFAULT;
+	return ret;
+}
+
+static unsigned int poll_urgent(struct file *fp,
+				struct poll_table_struct *pt)
+{
+	struct hfi1_ctxtdata *uctxt = ctxt_fp(fp);
+	struct hfi1_devdata *dd = uctxt->dd;
+	unsigned pollflag;
+
+	poll_wait(fp, &uctxt->wait, pt);
+
+	spin_lock_irq(&dd->uctxt_lock);
+	if (uctxt->urgent != uctxt->urgent_poll) {
+		pollflag = POLLIN | POLLRDNORM;
+		uctxt->urgent_poll = uctxt->urgent;
+	} else {
+		pollflag = 0;
+		set_bit(HFI1_CTXT_WAITING_URG, &uctxt->event_flags);
+	}
+	spin_unlock_irq(&dd->uctxt_lock);
+
+	return pollflag;
+}
+
+static unsigned int poll_next(struct file *fp,
+			      struct poll_table_struct *pt)
+{
+	struct hfi1_ctxtdata *uctxt = ctxt_fp(fp);
+	struct hfi1_devdata *dd = uctxt->dd;
+	unsigned pollflag;
+
+	poll_wait(fp, &uctxt->wait, pt);
+
+	spin_lock_irq(&dd->uctxt_lock);
+	if (hdrqempty(uctxt)) {
+		set_bit(HFI1_CTXT_WAITING_RCV, &uctxt->event_flags);
+		hfi1_rcvctrl(dd, HFI1_RCVCTRL_INTRAVAIL_ENB, uctxt->ctxt);
+		pollflag = 0;
+	} else
+		pollflag = POLLIN | POLLRDNORM;
+	spin_unlock_irq(&dd->uctxt_lock);
+
+	return pollflag;
+}
+
+/*
+ * Find all user contexts in use, and set the specified bit in their
+ * event mask.
+ * See also find_ctxt() for a similar use, that is specific to send buffers.
+ */
+int hfi1_set_uevent_bits(struct hfi1_pportdata *ppd, const int evtbit)
+{
+	struct hfi1_ctxtdata *uctxt;
+	struct hfi1_devdata *dd = ppd->dd;
+	unsigned ctxt;
+	int ret = 0;
+	unsigned long flags;
+
+	if (!dd->events) {
+		ret = -EINVAL;
+		goto done;
+	}
+
+	spin_lock_irqsave(&dd->uctxt_lock, flags);
+	for (ctxt = dd->first_user_ctxt; ctxt < dd->num_rcv_contexts;
+	     ctxt++) {
+		uctxt = dd->rcd[ctxt];
+		if (uctxt) {
+			unsigned long *evs = dd->events +
+				(uctxt->ctxt - dd->first_user_ctxt) *
+				HFI1_MAX_SHARED_CTXTS;
+			int i;
+			/*
+			 * subctxt_cnt is 0 if not shared, so do base
+			 * separately, first, then remaining subctxt, if any
+			 */
+			set_bit(evtbit, evs);
+			for (i = 1; i < uctxt->subctxt_cnt; i++)
+				set_bit(evtbit, evs + i);
+		}
+	}
+	spin_unlock_irqrestore(&dd->uctxt_lock, flags);
+done:
+	return ret;
+}
+
+/**
+ * manage_rcvq - manage a context's receive queue
+ * @uctxt: the context
+ * @subctxt: the sub-context
+ * @start_stop: action to carry out
+ *
+ * start_stop == 0 disables receive on the context, for use in queue
+ * overflow conditions.  start_stop==1 re-enables, to be used to
+ * re-init the software copy of the head register
+ */
+static int manage_rcvq(struct hfi1_ctxtdata *uctxt, unsigned subctxt,
+		       int start_stop)
+{
+	struct hfi1_devdata *dd = uctxt->dd;
+	unsigned int rcvctrl_op;
+
+	if (subctxt)
+		goto bail;
+	/* atomically clear receive enable ctxt. */
+	if (start_stop) {
+		/*
+		 * On enable, force in-memory copy of the tail register to
+		 * 0, so that protocol code doesn't have to worry about
+		 * whether or not the chip has yet updated the in-memory
+		 * copy or not on return from the system call. The chip
+		 * always resets it's tail register back to 0 on a
+		 * transition from disabled to enabled.
+		 */
+		if (uctxt->rcvhdrtail_kvaddr)
+			clear_rcvhdrtail(uctxt);
+		rcvctrl_op = HFI1_RCVCTRL_CTXT_ENB;
+	} else
+		rcvctrl_op = HFI1_RCVCTRL_CTXT_DIS;
+	hfi1_rcvctrl(dd, rcvctrl_op, uctxt->ctxt);
+	/* always; new head should be equal to new tail; see above */
+bail:
+	return 0;
+}
+
+/*
+ * clear the event notifier events for this context.
+ * User process then performs actions appropriate to bit having been
+ * set, if desired, and checks again in future.
+ */
+static int user_event_ack(struct hfi1_ctxtdata *uctxt, int subctxt,
+			  unsigned long events)
+{
+	int i;
+	struct hfi1_devdata *dd = uctxt->dd;
+	unsigned long *evs;
+
+	if (!dd->events)
+		return 0;
+
+	evs = dd->events + ((uctxt->ctxt - dd->first_user_ctxt) *
+			    HFI1_MAX_SHARED_CTXTS) + subctxt;
+
+	for (i = 0; i <= _HFI1_MAX_EVENT_BIT; i++) {
+		if (!test_bit(i, &events))
+			continue;
+		clear_bit(i, evs);
+	}
+	return 0;
+}
+
+#define num_user_pages(vaddr, len)					\
+	(1 + (((((unsigned long)(vaddr) +				\
+		 (unsigned long)(len) - 1) & PAGE_MASK) -		\
+	       ((unsigned long)vaddr & PAGE_MASK)) >> PAGE_SHIFT))
+
+/**
+ * tzcnt - count the number of trailing zeros in a 64bit value
+ * @value: the value to be examined
+ *
+ * Returns the number of trailing least significant zeros in the
+ * the input value. If the value is zero, return the number of
+ * bits of the value.
+ */
+static inline u8 tzcnt(u64 value)
+{
+	return value ? __builtin_ctzl(value) : sizeof(value) * 8;
+}
+
+static inline unsigned num_free_groups(unsigned long map, u16 *start)
+{
+	unsigned free;
+	u16 bitidx = *start;
+
+	if (bitidx >= BITS_PER_LONG)
+		return 0;
+	/* "Turn off" any bits set before our bit index */
+	map &= ~((1ULL << bitidx) - 1);
+	free = tzcnt(map) - bitidx;
+	while (!free && bitidx < BITS_PER_LONG) {
+		/* Zero out the last set bit so we look at the rest */
+		map &= ~(1ULL << bitidx);
+		/*
+		 * Account for the previously checked bits and advance
+		 * the bit index. We don't have to check for bitidx
+		 * getting bigger than BITS_PER_LONG here as it would
+		 * mean extra instructions that we don't need. If it
+		 * did happen, it would push free to a negative value
+		 * which will break the loop.
+		 */
+		free = tzcnt(map) - ++bitidx;
+	}
+	*start = bitidx;
+	return free;
+}
+
+static int exp_tid_setup(struct file *fp, struct hfi1_tid_info *tinfo)
+{
+	int ret = 0;
+	struct hfi1_ctxtdata *uctxt = ctxt_fp(fp);
+	struct hfi1_devdata *dd = uctxt->dd;
+	unsigned tid, mapped = 0, npages, ngroups, exp_groups,
+		tidpairs = uctxt->expected_count / 2;
+	struct page **pages;
+	unsigned long vaddr, tidmap[uctxt->tidmapcnt];
+	dma_addr_t *phys;
+	u32 tidlist[tidpairs], pairidx = 0, tidcursor;
+	u16 useidx, idx, bitidx, tidcnt = 0;
+
+	vaddr = tinfo->vaddr;
+
+	if (vaddr & ~PAGE_MASK) {
+		ret = -EINVAL;
+		goto bail;
+	}
+
+	npages = num_user_pages(vaddr, tinfo->length);
+	if (!npages) {
+		ret = -EINVAL;
+		goto bail;
+	}
+	if (!access_ok(VERIFY_WRITE, (void __user *)vaddr,
+		       npages * PAGE_SIZE)) {
+		dd_dev_err(dd, "Fail vaddr %p, %u pages, !access_ok\n",
+			   (void *)vaddr, npages);
+		ret = -EFAULT;
+		goto bail;
+	}
+
+	memset(tidmap, 0, sizeof(tidmap[0]) * uctxt->tidmapcnt);
+	memset(tidlist, 0, sizeof(tidlist[0]) * tidpairs);
+
+	exp_groups = uctxt->expected_count / dd->rcv_entries.group_size;
+	/* which group set do we look at first? */
+	tidcursor = atomic_read(&uctxt->tidcursor);
+	useidx = (tidcursor >> 16) & 0xffff;
+	bitidx = tidcursor & 0xffff;
+
+	/*
+	 * Keep going until we've mapped all pages or we've exhausted all
+	 * RcvArray entries.
+	 * This iterates over the number of tidmaps + 1
+	 * (idx <= uctxt->tidmapcnt) so we check the bitmap which we
+	 * started from one more time for any free bits before the
+	 * starting point bit.
+	 */
+	for (mapped = 0, idx = 0;
+	     mapped < npages && idx <= uctxt->tidmapcnt;) {
+		u64 i, offset = 0;
+		unsigned free, pinned, pmapped = 0, bits_used;
+		u16 grp;
+
+		/*
+		 * "Reserve" the needed group bits under lock so other
+		 * processes can't step in the middle of it. Once
+		 * reserved, we don't need the lock anymore since we
+		 * are guaranteed the groups.
+		 */
+		spin_lock(&uctxt->exp_lock);
+		if (uctxt->tidusemap[useidx] == -1ULL ||
+		    bitidx >= BITS_PER_LONG) {
+			/* no free groups in the set, use the next */
+			useidx = (useidx + 1) % uctxt->tidmapcnt;
+			idx++;
+			bitidx = 0;
+			spin_unlock(&uctxt->exp_lock);
+			continue;
+		}
+		ngroups = ((npages - mapped) / dd->rcv_entries.group_size) +
+			!!((npages - mapped) % dd->rcv_entries.group_size);
+
+		/*
+		 * If we've gotten here, the current set of groups does have
+		 * one or more free groups.
+		 */
+		free = num_free_groups(uctxt->tidusemap[useidx], &bitidx);
+		if (!free) {
+			/*
+			 * Despite the check above, free could still come back
+			 * as 0 because we don't check the entire bitmap but
+			 * we start from bitidx.
+			 */
+			spin_unlock(&uctxt->exp_lock);
+			continue;
+		}
+		bits_used = min(free, ngroups);
+		tidmap[useidx] |= ((1ULL << bits_used) - 1) << bitidx;
+		uctxt->tidusemap[useidx] |= tidmap[useidx];
+		spin_unlock(&uctxt->exp_lock);
+
+		/*
+		 * At this point, we know where in the map we have free bits.
+		 * properly offset into the various "shadow" arrays and compute
+		 * the RcvArray entry index.
+		 */
+		offset = ((useidx * BITS_PER_LONG) + bitidx) *
+			dd->rcv_entries.group_size;
+		pages = uctxt->tid_pg_list + offset;
+		phys = uctxt->physshadow + offset;
+		tid = uctxt->expected_base + offset;
+
+		/* Calculate how many pages we can pin based on free bits */
+		pinned = min((bits_used * dd->rcv_entries.group_size),
+			     (npages - mapped));
+		/*
+		 * Now that we know how many free RcvArray entries we have,
+		 * we can pin that many user pages.
+		 */
+		ret = hfi1_get_user_pages(vaddr + (mapped * PAGE_SIZE),
+					  pinned, pages);
+		if (ret) {
+			/*
+			 * We can't continue because the pages array won't be
+			 * initialized. This should never happen,
+			 * unless perhaps the user has mpin'ed the pages
+			 * themselves.
+			 */
+			dd_dev_info(dd,
+				    "Failed to lock addr %p, %u pages: errno %d\n",
+				    (void *) vaddr, pinned, -ret);
+			/*
+			 * Let go of the bits that we reserved since we are not
+			 * going to use them.
+			 */
+			spin_lock(&uctxt->exp_lock);
+			uctxt->tidusemap[useidx] &=
+				~(((1ULL << bits_used) - 1) << bitidx);
+			spin_unlock(&uctxt->exp_lock);
+			goto done;
+		}
+		/*
+		 * How many groups do we need based on how many pages we have
+		 * pinned?
+		 */
+		ngroups = (pinned / dd->rcv_entries.group_size) +
+			!!(pinned % dd->rcv_entries.group_size);
+		/*
+		 * Keep programming RcvArray entries for all the <ngroups> free
+		 * groups.
+		 */
+		for (i = 0, grp = 0; grp < ngroups; i++, grp++) {
+			unsigned j;
+			u32 pair_size = 0, tidsize;
+			/*
+			 * This inner loop will program an entire group or the
+			 * array of pinned pages (which ever limit is hit
+			 * first).
+			 */
+			for (j = 0; j < dd->rcv_entries.group_size &&
+				     pmapped < pinned; j++, pmapped++, tid++) {
+				tidsize = PAGE_SIZE;
+				phys[pmapped] = hfi1_map_page(dd->pcidev,
+						   pages[pmapped], 0,
+						   tidsize, PCI_DMA_FROMDEVICE);
+				trace_hfi1_exp_rcv_set(uctxt->ctxt,
+						       subctxt_fp(fp),
+						       tid, vaddr,
+						       phys[pmapped],
+						       pages[pmapped]);
+				/*
+				 * Each RcvArray entry is programmed with one
+				 * page * worth of memory. This will handle
+				 * the 8K MTU as well as anything smaller
+				 * due to the fact that both entries in the
+				 * RcvTidPair are programmed with a page.
+				 * PSM currently does not handle anything
+				 * bigger than 8K MTU, so should we even worry
+				 * about 10K here?
+				 */
+				hfi1_put_tid(dd, tid, PT_EXPECTED,
+					     phys[pmapped],
+					     ilog2(tidsize >> PAGE_SHIFT) + 1);
+				pair_size += tidsize >> PAGE_SHIFT;
+				EXP_TID_RESET(tidlist[pairidx], LEN, pair_size);
+				if (!(tid % 2)) {
+					tidlist[pairidx] |=
+					   EXP_TID_SET(IDX,
+						(tid - uctxt->expected_base)
+						       / 2);
+					tidlist[pairidx] |=
+						EXP_TID_SET(CTRL, 1);
+					tidcnt++;
+				} else {
+					tidlist[pairidx] |=
+						EXP_TID_SET(CTRL, 2);
+					pair_size = 0;
+					pairidx++;
+				}
+			}
+			/*
+			 * We've programmed the entire group (or as much of the
+			 * group as we'll use. Now, it's time to push it out...
+			 */
+			flush_wc();
+		}
+		mapped += pinned;
+		atomic_set(&uctxt->tidcursor,
+			   (((useidx & 0xffffff) << 16) |
+			    ((bitidx + bits_used) & 0xffffff)));
+	}
+	trace_hfi1_exp_tid_map(uctxt->ctxt, subctxt_fp(fp), 0, uctxt->tidusemap,
+			       uctxt->tidmapcnt);
+
+done:
+	/* If we've mapped anything, copy relevant info to user */
+	if (mapped) {
+		if (copy_to_user((void __user *)(unsigned long)tinfo->tidlist,
+				 tidlist, sizeof(tidlist[0]) * tidcnt)) {
+			ret = -EFAULT;
+			goto done;
+		}
+		/* copy TID info to user */
+		if (copy_to_user((void __user *)(unsigned long)tinfo->tidmap,
+				 tidmap, sizeof(tidmap[0]) * uctxt->tidmapcnt))
+			ret = -EFAULT;
+	}
+bail:
+	/*
+	 * Calculate mapped length. New Exp TID protocol does not "unwind" and
+	 * report an error if it can't map the entire buffer. It just reports
+	 * the length that was mapped.
+	 */
+	tinfo->length = mapped * PAGE_SIZE;
+	tinfo->tidcnt = tidcnt;
+	return ret;
+}
+
+static int exp_tid_free(struct file *fp, struct hfi1_tid_info *tinfo)
+{
+	struct hfi1_ctxtdata *uctxt = ctxt_fp(fp);
+	struct hfi1_devdata *dd = uctxt->dd;
+	unsigned long tidmap[uctxt->tidmapcnt];
+	struct page **pages;
+	dma_addr_t *phys;
+	u16 idx, bitidx, tid;
+	int ret = 0;
+
+	if (copy_from_user(&tidmap, (void __user *)(unsigned long)
+			   tinfo->tidmap,
+			   sizeof(tidmap[0]) * uctxt->tidmapcnt)) {
+		ret = -EFAULT;
+		goto done;
+	}
+	for (idx = 0; idx < uctxt->tidmapcnt; idx++) {
+		unsigned long map;
+
+		bitidx = 0;
+		if (!tidmap[idx])
+			continue;
+		map = tidmap[idx];
+		while ((bitidx = tzcnt(map)) < BITS_PER_LONG) {
+			int i, pcount = 0;
+			struct page *pshadow[dd->rcv_entries.group_size];
+			unsigned offset = ((idx * BITS_PER_LONG) + bitidx) *
+				dd->rcv_entries.group_size;
+
+			pages = uctxt->tid_pg_list + offset;
+			phys = uctxt->physshadow + offset;
+			tid = uctxt->expected_base + offset;
+			for (i = 0; i < dd->rcv_entries.group_size;
+			     i++, tid++) {
+				if (pages[i]) {
+					hfi1_put_tid(dd, tid, PT_INVALID,
+						      0, 0);
+					trace_hfi1_exp_rcv_free(uctxt->ctxt,
+								subctxt_fp(fp),
+								tid, phys[i],
+								pages[i]);
+					pci_unmap_page(dd->pcidev, phys[i],
+					      PAGE_SIZE, PCI_DMA_FROMDEVICE);
+					pshadow[pcount] = pages[i];
+					pages[i] = NULL;
+					pcount++;
+					phys[i] = 0;
+				}
+			}
+			flush_wc();
+			hfi1_release_user_pages(pshadow, pcount);
+			clear_bit(bitidx, &uctxt->tidusemap[idx]);
+			map &= ~(1ULL<<bitidx);
+		}
+	}
+	trace_hfi1_exp_tid_map(uctxt->ctxt, subctxt_fp(fp), 1, uctxt->tidusemap,
+			       uctxt->tidmapcnt);
+done:
+	return ret;
+}
+
+static void unlock_exp_tids(struct hfi1_ctxtdata *uctxt)
+{
+	struct hfi1_devdata *dd = uctxt->dd;
+	unsigned tid;
+
+	dd_dev_info(dd, "ctxt %u unlocking any locked expTID pages\n",
+		    uctxt->ctxt);
+	for (tid = 0; tid < uctxt->expected_count; tid++) {
+		struct page *p = uctxt->tid_pg_list[tid];
+		dma_addr_t phys;
+
+		if (!p)
+			continue;
+
+		phys = uctxt->physshadow[tid];
+		uctxt->physshadow[tid] = 0;
+		uctxt->tid_pg_list[tid] = NULL;
+		pci_unmap_page(dd->pcidev, phys, PAGE_SIZE, PCI_DMA_FROMDEVICE);
+		hfi1_release_user_pages(&p, 1);
+	}
+}
+
+static int set_ctxt_pkey(struct hfi1_ctxtdata *uctxt, unsigned subctxt,
+			 u16 pkey)
+{
+	int ret = -ENOENT, i, intable = 0;
+	struct hfi1_pportdata *ppd = uctxt->ppd;
+	struct hfi1_devdata *dd = uctxt->dd;
+
+	if (pkey == LIM_MGMT_P_KEY || pkey == FULL_MGMT_P_KEY) {
+		ret = -EINVAL;
+		goto done;
+	}
+
+	for (i = 0; i < ARRAY_SIZE(ppd->pkeys); i++)
+		if (pkey == ppd->pkeys[i]) {
+			intable = 1;
+			break;
+		}
+
+	if (intable)
+		ret = hfi1_set_ctxt_pkey(dd, uctxt->ctxt, pkey);
+done:
+	return ret;
+}
+
+static int ui_open(struct inode *inode, struct file *filp)
+{
+	struct hfi1_devdata *dd;
+
+	dd = container_of(inode->i_cdev, struct hfi1_devdata, ui_cdev);
+	filp->private_data = dd; /* for other methods */
+	return 0;
+}
+
+static int ui_release(struct inode *inode, struct file *filp)
+{
+	/* nothing to do */
+	return 0;
+}
+
+static loff_t ui_lseek(struct file *filp, loff_t offset, int whence)
+{
+	struct hfi1_devdata *dd = filp->private_data;
+
+	switch (whence) {
+	case SEEK_SET:
+		break;
+	case SEEK_CUR:
+		offset += filp->f_pos;
+		break;
+	case SEEK_END:
+		offset = ((dd->kregend - dd->kregbase) + DC8051_DATA_MEM_SIZE) -
+			offset;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	if (offset < 0)
+		return -EINVAL;
+
+	if (offset >= (dd->kregend - dd->kregbase) + DC8051_DATA_MEM_SIZE)
+		return -EINVAL;
+
+	filp->f_pos = offset;
+
+	return filp->f_pos;
+}
+
+
+/* NOTE: assumes unsigned long is 8 bytes */
+static ssize_t ui_read(struct file *filp, char __user *buf, size_t count,
+			loff_t *f_pos)
+{
+	struct hfi1_devdata *dd = filp->private_data;
+	void __iomem *base = dd->kregbase;
+	unsigned long total, csr_off,
+		barlen = (dd->kregend - dd->kregbase);
+	u64 data;
+
+	/* only read 8 byte quantities */
+	if ((count % 8) != 0)
+		return -EINVAL;
+	/* offset must be 8-byte aligned */
+	if ((*f_pos % 8) != 0)
+		return -EINVAL;
+	/* destination buffer must be 8-byte aligned */
+	if ((unsigned long)buf % 8 != 0)
+		return -EINVAL;
+	/* must be in range */
+	if (*f_pos + count > (barlen + DC8051_DATA_MEM_SIZE))
+		return -EINVAL;
+	/* only set the base if we are not starting past the BAR */
+	if (*f_pos < barlen)
+		base += *f_pos;
+	csr_off = *f_pos;
+	for (total = 0; total < count; total += 8, csr_off += 8) {
+		/* accessing LCB CSRs requires more checks */
+		if (is_lcb_offset(csr_off)) {
+			if (read_lcb_csr(dd, csr_off, (u64 *)&data))
+				break; /* failed */
+		}
+		/*
+		 * Cannot read ASIC GPIO/QSFP* clear and force CSRs without a
+		 * false parity error.  Avoid the whole issue by not reading
+		 * them.  These registers are defined as having a read value
+		 * of 0.
+		 */
+		else if (csr_off == ASIC_GPIO_CLEAR
+				|| csr_off == ASIC_GPIO_FORCE
+				|| csr_off == ASIC_QSFP1_CLEAR
+				|| csr_off == ASIC_QSFP1_FORCE
+				|| csr_off == ASIC_QSFP2_CLEAR
+				|| csr_off == ASIC_QSFP2_FORCE)
+			data = 0;
+		else if (csr_off >= barlen) {
+			/*
+			 * read_8051_data can read more than just 8 bytes at
+			 * a time. However, folding this into the loop and
+			 * handling the reads in 8 byte increments allows us
+			 * to smoothly transition from chip memory to 8051
+			 * memory.
+			 */
+			if (read_8051_data(dd,
+					   (u32)(csr_off - barlen),
+					   sizeof(data), &data))
+				break; /* failed */
+		} else
+			data = readq(base + total);
+		if (put_user(data, (unsigned long __user *)(buf + total)))
+			break;
+	}
+	*f_pos += total;
+	return total;
+}
+
+/* NOTE: assumes unsigned long is 8 bytes */
+static ssize_t ui_write(struct file *filp, const char __user *buf,
+			size_t count, loff_t *f_pos)
+{
+	struct hfi1_devdata *dd = filp->private_data;
+	void __iomem *base;
+	unsigned long total, data, csr_off;
+	int in_lcb;
+
+	/* only write 8 byte quantities */
+	if ((count % 8) != 0)
+		return -EINVAL;
+	/* offset must be 8-byte aligned */
+	if ((*f_pos % 8) != 0)
+		return -EINVAL;
+	/* source buffer must be 8-byte aligned */
+	if ((unsigned long)buf % 8 != 0)
+		return -EINVAL;
+	/* must be in range */
+	if (*f_pos + count > dd->kregend - dd->kregbase)
+		return -EINVAL;
+
+	base = (void __iomem *)dd->kregbase + *f_pos;
+	csr_off = *f_pos;
+	in_lcb = 0;
+	for (total = 0; total < count; total += 8, csr_off += 8) {
+		if (get_user(data, (unsigned long __user *)(buf + total)))
+			break;
+		/* accessing LCB CSRs requires a special procedure */
+		if (is_lcb_offset(csr_off)) {
+			if (!in_lcb) {
+				int ret = acquire_lcb_access(dd, 1);
+
+				if (ret)
+					break;
+				in_lcb = 1;
+			}
+		} else {
+			if (in_lcb) {
+				release_lcb_access(dd, 1);
+				in_lcb = 0;
+			}
+		}
+		writeq(data, base + total);
+	}
+	if (in_lcb)
+		release_lcb_access(dd, 1);
+	*f_pos += total;
+	return total;
+}
+
+static const struct file_operations ui_file_ops = {
+	.owner = THIS_MODULE,
+	.llseek = ui_lseek,
+	.read = ui_read,
+	.write = ui_write,
+	.open = ui_open,
+	.release = ui_release,
+};
+#define UI_OFFSET 192	/* device minor offset for UI devices */
+static int create_ui = 1;
+
+static struct cdev wildcard_cdev;
+static struct device *wildcard_device;
+
+static atomic_t user_count = ATOMIC_INIT(0);
+
+static void user_remove(struct hfi1_devdata *dd)
+{
+	if (atomic_dec_return(&user_count) == 0)
+		hfi1_cdev_cleanup(&wildcard_cdev, &wildcard_device);
+
+	hfi1_cdev_cleanup(&dd->user_cdev, &dd->user_device);
+	hfi1_cdev_cleanup(&dd->ui_cdev, &dd->ui_device);
+}
+
+static int user_add(struct hfi1_devdata *dd)
+{
+	char name[10];
+	int ret;
+
+	if (atomic_inc_return(&user_count) == 1) {
+		ret = hfi1_cdev_init(0, class_name(), &hfi1_file_ops,
+				     &wildcard_cdev, &wildcard_device);
+		if (ret)
+			goto done;
+	}
+
+	snprintf(name, sizeof(name), "%s_%d", class_name(), dd->unit);
+	ret = hfi1_cdev_init(dd->unit + 1, name, &hfi1_file_ops,
+			     &dd->user_cdev, &dd->user_device);
+	if (ret)
+		goto done;
+
+	if (create_ui) {
+		snprintf(name, sizeof(name),
+			 "%s_ui%d", class_name(), dd->unit);
+		ret = hfi1_cdev_init(dd->unit + UI_OFFSET, name, &ui_file_ops,
+				     &dd->ui_cdev, &dd->ui_device);
+		if (ret)
+			goto done;
+	}
+
+	return 0;
+done:
+	user_remove(dd);
+	return ret;
+}
+
+/*
+ * Create per-unit files in /dev
+ */
+int hfi1_device_create(struct hfi1_devdata *dd)
+{
+	int r, ret;
+
+	r = user_add(dd);
+	ret = hfi1_diag_add(dd);
+	if (r && !ret)
+		ret = r;
+	return ret;
+}
+
+/*
+ * Remove per-unit files in /dev
+ * void, core kernel returns no errors for this stuff
+ */
+void hfi1_device_remove(struct hfi1_devdata *dd)
+{
+	user_remove(dd);
+	hfi1_diag_remove(dd);
+}
diff --git a/drivers/staging/rdma/hfi1/firmware.c b/drivers/staging/rdma/hfi1/firmware.c
new file mode 100644
index 000000000000..5c2f2ed8f224
--- /dev/null
+++ b/drivers/staging/rdma/hfi1/firmware.c
@@ -0,0 +1,1620 @@
+/*
+ *
+ * This file is provided under a dual BSD/GPLv2 license.  When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * GPL LICENSE SUMMARY
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * BSD LICENSE
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ *  - Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ *  - Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in
+ *    the documentation and/or other materials provided with the
+ *    distribution.
+ *  - Neither the name of Intel Corporation nor the names of its
+ *    contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ */
+
+#include <linux/firmware.h>
+#include <linux/mutex.h>
+#include <linux/module.h>
+#include <linux/delay.h>
+#include <linux/crc32.h>
+
+#include "hfi.h"
+#include "trace.h"
+
+/*
+ * Make it easy to toggle firmware file name and if it gets loaded by
+ * editing the following. This may be something we do while in development
+ * but not necessarily something a user would ever need to use.
+ */
+#define DEFAULT_FW_8051_NAME_FPGA "hfi_dc8051.bin"
+#define DEFAULT_FW_8051_NAME_ASIC "hfi1_dc8051.fw"
+#define DEFAULT_FW_FABRIC_NAME "hfi1_fabric.fw"
+#define DEFAULT_FW_SBUS_NAME "hfi1_sbus.fw"
+#define DEFAULT_FW_PCIE_NAME "hfi1_pcie.fw"
+#define DEFAULT_PLATFORM_CONFIG_NAME "hfi1_platform.dat"
+
+static uint fw_8051_load = 1;
+static uint fw_fabric_serdes_load = 1;
+static uint fw_pcie_serdes_load = 1;
+static uint fw_sbus_load = 1;
+static uint platform_config_load = 1;
+
+/* Firmware file names get set in hfi1_firmware_init() based on the above */
+static char *fw_8051_name;
+static char *fw_fabric_serdes_name;
+static char *fw_sbus_name;
+static char *fw_pcie_serdes_name;
+static char *platform_config_name;
+
+#define SBUS_MAX_POLL_COUNT 100
+#define SBUS_COUNTER(reg, name) \
+	(((reg) >> ASIC_STS_SBUS_COUNTERS_##name##_CNT_SHIFT) & \
+	 ASIC_STS_SBUS_COUNTERS_##name##_CNT_MASK)
+
+/*
+ * Firmware security header.
+ */
+struct css_header {
+	u32 module_type;
+	u32 header_len;
+	u32 header_version;
+	u32 module_id;
+	u32 module_vendor;
+	u32 date;		/* BCD yyyymmdd */
+	u32 size;		/* in DWORDs */
+	u32 key_size;		/* in DWORDs */
+	u32 modulus_size;	/* in DWORDs */
+	u32 exponent_size;	/* in DWORDs */
+	u32 reserved[22];
+};
+/* expected field values */
+#define CSS_MODULE_TYPE	   0x00000006
+#define CSS_HEADER_LEN	   0x000000a1
+#define CSS_HEADER_VERSION 0x00010000
+#define CSS_MODULE_VENDOR  0x00008086
+
+#define KEY_SIZE      256
+#define MU_SIZE		8
+#define EXPONENT_SIZE	4
+
+/* the file itself */
+struct firmware_file {
+	struct css_header css_header;
+	u8 modulus[KEY_SIZE];
+	u8 exponent[EXPONENT_SIZE];
+	u8 signature[KEY_SIZE];
+	u8 firmware[];
+};
+
+struct augmented_firmware_file {
+	struct css_header css_header;
+	u8 modulus[KEY_SIZE];
+	u8 exponent[EXPONENT_SIZE];
+	u8 signature[KEY_SIZE];
+	u8 r2[KEY_SIZE];
+	u8 mu[MU_SIZE];
+	u8 firmware[];
+};
+
+/* augmented file size difference */
+#define AUGMENT_SIZE (sizeof(struct augmented_firmware_file) - \
+						sizeof(struct firmware_file))
+
+struct firmware_details {
+	/* Linux core piece */
+	const struct firmware *fw;
+
+	struct css_header *css_header;
+	u8 *firmware_ptr;		/* pointer to binary data */
+	u32 firmware_len;		/* length in bytes */
+	u8 *modulus;			/* pointer to the modulus */
+	u8 *exponent;			/* pointer to the exponent */
+	u8 *signature;			/* pointer to the signature */
+	u8 *r2;				/* pointer to r2 */
+	u8 *mu;				/* pointer to mu */
+	struct augmented_firmware_file dummy_header;
+};
+
+/*
+ * The mutex protects fw_state, fw_err, and all of the firmware_details
+ * variables.
+ */
+static DEFINE_MUTEX(fw_mutex);
+enum fw_state {
+	FW_EMPTY,
+	FW_ACQUIRED,
+	FW_ERR
+};
+static enum fw_state fw_state = FW_EMPTY;
+static int fw_err;
+static struct firmware_details fw_8051;
+static struct firmware_details fw_fabric;
+static struct firmware_details fw_pcie;
+static struct firmware_details fw_sbus;
+static const struct firmware *platform_config;
+
+/* flags for turn_off_spicos() */
+#define SPICO_SBUS   0x1
+#define SPICO_FABRIC 0x2
+#define ENABLE_SPICO_SMASK 0x1
+
+/* security block commands */
+#define RSA_CMD_INIT  0x1
+#define RSA_CMD_START 0x2
+
+/* security block status */
+#define RSA_STATUS_IDLE   0x0
+#define RSA_STATUS_ACTIVE 0x1
+#define RSA_STATUS_DONE   0x2
+#define RSA_STATUS_FAILED 0x3
+
+/* RSA engine timeout, in ms */
+#define RSA_ENGINE_TIMEOUT 100 /* ms */
+
+/* hardware mutex timeout, in ms */
+#define HM_TIMEOUT 4000 /* 4 s */
+
+/* 8051 memory access timeout, in us */
+#define DC8051_ACCESS_TIMEOUT 100 /* us */
+
+/* the number of fabric SerDes on the SBus */
+#define NUM_FABRIC_SERDES 4
+
+/* SBus fabric SerDes addresses, one set per HFI */
+static const u8 fabric_serdes_addrs[2][NUM_FABRIC_SERDES] = {
+	{ 0x01, 0x02, 0x03, 0x04 },
+	{ 0x28, 0x29, 0x2a, 0x2b }
+};
+
+/* SBus PCIe SerDes addresses, one set per HFI */
+static const u8 pcie_serdes_addrs[2][NUM_PCIE_SERDES] = {
+	{ 0x08, 0x0a, 0x0c, 0x0e, 0x10, 0x12, 0x14, 0x16,
+	  0x18, 0x1a, 0x1c, 0x1e, 0x20, 0x22, 0x24, 0x26 },
+	{ 0x2f, 0x31, 0x33, 0x35, 0x37, 0x39, 0x3b, 0x3d,
+	  0x3f, 0x41, 0x43, 0x45, 0x47, 0x49, 0x4b, 0x4d }
+};
+
+/* SBus PCIe PCS addresses, one set per HFI */
+const u8 pcie_pcs_addrs[2][NUM_PCIE_SERDES] = {
+	{ 0x09, 0x0b, 0x0d, 0x0f, 0x11, 0x13, 0x15, 0x17,
+	  0x19, 0x1b, 0x1d, 0x1f, 0x21, 0x23, 0x25, 0x27 },
+	{ 0x30, 0x32, 0x34, 0x36, 0x38, 0x3a, 0x3c, 0x3e,
+	  0x40, 0x42, 0x44, 0x46, 0x48, 0x4a, 0x4c, 0x4e }
+};
+
+/* SBus fabric SerDes broadcast addresses, one per HFI */
+static const u8 fabric_serdes_broadcast[2] = { 0xe4, 0xe5 };
+static const u8 all_fabric_serdes_broadcast = 0xe1;
+
+/* SBus PCIe SerDes broadcast addresses, one per HFI */
+const u8 pcie_serdes_broadcast[2] = { 0xe2, 0xe3 };
+static const u8 all_pcie_serdes_broadcast = 0xe0;
+
+/* forwards */
+static void dispose_one_firmware(struct firmware_details *fdet);
+
+/*
+ * Read a single 64-bit value from 8051 data memory.
+ *
+ * Expects:
+ * o caller to have already set up data read, no auto increment
+ * o caller to turn off read enable when finished
+ *
+ * The address argument is a byte offset.  Bits 0:2 in the address are
+ * ignored - i.e. the hardware will always do aligned 8-byte reads as if
+ * the lower bits are zero.
+ *
+ * Return 0 on success, -ENXIO on a read error (timeout).
+ */
+static int __read_8051_data(struct hfi1_devdata *dd, u32 addr, u64 *result)
+{
+	u64 reg;
+	int count;
+
+	/* start the read at the given address */
+	reg = ((addr & DC_DC8051_CFG_RAM_ACCESS_CTRL_ADDRESS_MASK)
+			<< DC_DC8051_CFG_RAM_ACCESS_CTRL_ADDRESS_SHIFT)
+		| DC_DC8051_CFG_RAM_ACCESS_CTRL_READ_ENA_SMASK;
+	write_csr(dd, DC_DC8051_CFG_RAM_ACCESS_CTRL, reg);
+
+	/* wait until ACCESS_COMPLETED is set */
+	count = 0;
+	while ((read_csr(dd, DC_DC8051_CFG_RAM_ACCESS_STATUS)
+		    & DC_DC8051_CFG_RAM_ACCESS_STATUS_ACCESS_COMPLETED_SMASK)
+		    == 0) {
+		count++;
+		if (count > DC8051_ACCESS_TIMEOUT) {
+			dd_dev_err(dd, "timeout reading 8051 data\n");
+			return -ENXIO;
+		}
+		ndelay(10);
+	}
+
+	/* gather the data */
+	*result = read_csr(dd, DC_DC8051_CFG_RAM_ACCESS_RD_DATA);
+
+	return 0;
+}
+
+/*
+ * Read 8051 data starting at addr, for len bytes.  Will read in 8-byte chunks.
+ * Return 0 on success, -errno on error.
+ */
+int read_8051_data(struct hfi1_devdata *dd, u32 addr, u32 len, u64 *result)
+{
+	unsigned long flags;
+	u32 done;
+	int ret = 0;
+
+	spin_lock_irqsave(&dd->dc8051_memlock, flags);
+
+	/* data read set-up, no auto-increment */
+	write_csr(dd, DC_DC8051_CFG_RAM_ACCESS_SETUP, 0);
+
+	for (done = 0; done < len; addr += 8, done += 8, result++) {
+		ret = __read_8051_data(dd, addr, result);
+		if (ret)
+			break;
+	}
+
+	/* turn off read enable */
+	write_csr(dd, DC_DC8051_CFG_RAM_ACCESS_CTRL, 0);
+
+	spin_unlock_irqrestore(&dd->dc8051_memlock, flags);
+
+	return ret;
+}
+
+/*
+ * Write data or code to the 8051 code or data RAM.
+ */
+static int write_8051(struct hfi1_devdata *dd, int code, u32 start,
+		      const u8 *data, u32 len)
+{
+	u64 reg;
+	u32 offset;
+	int aligned, count;
+
+	/* check alignment */
+	aligned = ((unsigned long)data & 0x7) == 0;
+
+	/* write set-up */
+	reg = (code ? DC_DC8051_CFG_RAM_ACCESS_SETUP_RAM_SEL_SMASK : 0ull)
+		| DC_DC8051_CFG_RAM_ACCESS_SETUP_AUTO_INCR_ADDR_SMASK;
+	write_csr(dd, DC_DC8051_CFG_RAM_ACCESS_SETUP, reg);
+
+	reg = ((start & DC_DC8051_CFG_RAM_ACCESS_CTRL_ADDRESS_MASK)
+			<< DC_DC8051_CFG_RAM_ACCESS_CTRL_ADDRESS_SHIFT)
+		| DC_DC8051_CFG_RAM_ACCESS_CTRL_WRITE_ENA_SMASK;
+	write_csr(dd, DC_DC8051_CFG_RAM_ACCESS_CTRL, reg);
+
+	/* write */
+	for (offset = 0; offset < len; offset += 8) {
+		int bytes = len - offset;
+
+		if (bytes < 8) {
+			reg = 0;
+			memcpy(&reg, &data[offset], bytes);
+		} else if (aligned) {
+			reg = *(u64 *)&data[offset];
+		} else {
+			memcpy(&reg, &data[offset], 8);
+		}
+		write_csr(dd, DC_DC8051_CFG_RAM_ACCESS_WR_DATA, reg);
+
+		/* wait until ACCESS_COMPLETED is set */
+		count = 0;
+		while ((read_csr(dd, DC_DC8051_CFG_RAM_ACCESS_STATUS)
+		    & DC_DC8051_CFG_RAM_ACCESS_STATUS_ACCESS_COMPLETED_SMASK)
+		    == 0) {
+			count++;
+			if (count > DC8051_ACCESS_TIMEOUT) {
+				dd_dev_err(dd, "timeout writing 8051 data\n");
+				return -ENXIO;
+			}
+			udelay(1);
+		}
+	}
+
+	/* turn off write access, auto increment (also sets to data access) */
+	write_csr(dd, DC_DC8051_CFG_RAM_ACCESS_CTRL, 0);
+	write_csr(dd, DC_DC8051_CFG_RAM_ACCESS_SETUP, 0);
+
+	return 0;
+}
+
+/* return 0 if values match, non-zero and complain otherwise */
+static int invalid_header(struct hfi1_devdata *dd, const char *what,
+			  u32 actual, u32 expected)
+{
+	if (actual == expected)
+		return 0;
+
+	dd_dev_err(dd,
+		"invalid firmware header field %s: expected 0x%x, actual 0x%x\n",
+		what, expected, actual);
+	return 1;
+}
+
+/*
+ * Verify that the static fields in the CSS header match.
+ */
+static int verify_css_header(struct hfi1_devdata *dd, struct css_header *css)
+{
+	/* verify CSS header fields (most sizes are in DW, so add /4) */
+	if (invalid_header(dd, "module_type", css->module_type, CSS_MODULE_TYPE)
+			|| invalid_header(dd, "header_len", css->header_len,
+					(sizeof(struct firmware_file)/4))
+			|| invalid_header(dd, "header_version",
+					css->header_version, CSS_HEADER_VERSION)
+			|| invalid_header(dd, "module_vendor",
+					css->module_vendor, CSS_MODULE_VENDOR)
+			|| invalid_header(dd, "key_size",
+					css->key_size, KEY_SIZE/4)
+			|| invalid_header(dd, "modulus_size",
+					css->modulus_size, KEY_SIZE/4)
+			|| invalid_header(dd, "exponent_size",
+					css->exponent_size, EXPONENT_SIZE/4)) {
+		return -EINVAL;
+	}
+	return 0;
+}
+
+/*
+ * Make sure there are at least some bytes after the prefix.
+ */
+static int payload_check(struct hfi1_devdata *dd, const char *name,
+			 long file_size, long prefix_size)
+{
+	/* make sure we have some payload */
+	if (prefix_size >= file_size) {
+		dd_dev_err(dd,
+			"firmware \"%s\", size %ld, must be larger than %ld bytes\n",
+			name, file_size, prefix_size);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+/*
+ * Request the firmware from the system.  Extract the pieces and fill in
+ * fdet.  If successful, the caller will need to call dispose_one_firmware().
+ * Returns 0 on success, -ERRNO on error.
+ */
+static int obtain_one_firmware(struct hfi1_devdata *dd, const char *name,
+			       struct firmware_details *fdet)
+{
+	struct css_header *css;
+	int ret;
+
+	memset(fdet, 0, sizeof(*fdet));
+
+	ret = request_firmware(&fdet->fw, name, &dd->pcidev->dev);
+	if (ret) {
+		dd_dev_err(dd, "cannot load firmware \"%s\", err %d\n",
+			name, ret);
+		return ret;
+	}
+
+	/* verify the firmware */
+	if (fdet->fw->size < sizeof(struct css_header)) {
+		dd_dev_err(dd, "firmware \"%s\" is too small\n", name);
+		ret = -EINVAL;
+		goto done;
+	}
+	css = (struct css_header *)fdet->fw->data;
+
+	hfi1_cdbg(FIRMWARE, "Firmware %s details:", name);
+	hfi1_cdbg(FIRMWARE, "file size: 0x%lx bytes", fdet->fw->size);
+	hfi1_cdbg(FIRMWARE, "CSS structure:");
+	hfi1_cdbg(FIRMWARE, "  module_type    0x%x", css->module_type);
+	hfi1_cdbg(FIRMWARE, "  header_len     0x%03x (0x%03x bytes)",
+		  css->header_len, 4 * css->header_len);
+	hfi1_cdbg(FIRMWARE, "  header_version 0x%x", css->header_version);
+	hfi1_cdbg(FIRMWARE, "  module_id      0x%x", css->module_id);
+	hfi1_cdbg(FIRMWARE, "  module_vendor  0x%x", css->module_vendor);
+	hfi1_cdbg(FIRMWARE, "  date           0x%x", css->date);
+	hfi1_cdbg(FIRMWARE, "  size           0x%03x (0x%03x bytes)",
+		  css->size, 4 * css->size);
+	hfi1_cdbg(FIRMWARE, "  key_size       0x%03x (0x%03x bytes)",
+		  css->key_size, 4 * css->key_size);
+	hfi1_cdbg(FIRMWARE, "  modulus_size   0x%03x (0x%03x bytes)",
+		  css->modulus_size, 4 * css->modulus_size);
+	hfi1_cdbg(FIRMWARE, "  exponent_size  0x%03x (0x%03x bytes)",
+		  css->exponent_size, 4 * css->exponent_size);
+	hfi1_cdbg(FIRMWARE, "firmware size: 0x%lx bytes",
+		  fdet->fw->size - sizeof(struct firmware_file));
+
+	/*
+	 * If the file does not have a valid CSS header, fail.
+	 * Otherwise, check the CSS size field for an expected size.
+	 * The augmented file has r2 and mu inserted after the header
+	 * was generated, so there will be a known difference between
+	 * the CSS header size and the actual file size.  Use this
+	 * difference to identify an augmented file.
+	 *
+	 * Note: css->size is in DWORDs, multiply by 4 to get bytes.
+	 */
+	ret = verify_css_header(dd, css);
+	if (ret) {
+		dd_dev_info(dd, "Invalid CSS header for \"%s\"\n", name);
+	} else if ((css->size*4) == fdet->fw->size) {
+		/* non-augmented firmware file */
+		struct firmware_file *ff = (struct firmware_file *)
+							fdet->fw->data;
+
+		/* make sure there are bytes in the payload */
+		ret = payload_check(dd, name, fdet->fw->size,
+						sizeof(struct firmware_file));
+		if (ret == 0) {
+			fdet->css_header = css;
+			fdet->modulus = ff->modulus;
+			fdet->exponent = ff->exponent;
+			fdet->signature = ff->signature;
+			fdet->r2 = fdet->dummy_header.r2; /* use dummy space */
+			fdet->mu = fdet->dummy_header.mu; /* use dummy space */
+			fdet->firmware_ptr = ff->firmware;
+			fdet->firmware_len = fdet->fw->size -
+						sizeof(struct firmware_file);
+			/*
+			 * Header does not include r2 and mu - generate here.
+			 * For now, fail.
+			 */
+			dd_dev_err(dd, "driver is unable to validate firmware without r2 and mu (not in firmware file)\n");
+			ret = -EINVAL;
+		}
+	} else if ((css->size*4) + AUGMENT_SIZE == fdet->fw->size) {
+		/* augmented firmware file */
+		struct augmented_firmware_file *aff =
+			(struct augmented_firmware_file *)fdet->fw->data;
+
+		/* make sure there are bytes in the payload */
+		ret = payload_check(dd, name, fdet->fw->size,
+					sizeof(struct augmented_firmware_file));
+		if (ret == 0) {
+			fdet->css_header = css;
+			fdet->modulus = aff->modulus;
+			fdet->exponent = aff->exponent;
+			fdet->signature = aff->signature;
+			fdet->r2 = aff->r2;
+			fdet->mu = aff->mu;
+			fdet->firmware_ptr = aff->firmware;
+			fdet->firmware_len = fdet->fw->size -
+					sizeof(struct augmented_firmware_file);
+		}
+	} else {
+		/* css->size check failed */
+		dd_dev_err(dd,
+			"invalid firmware header field size: expected 0x%lx or 0x%lx, actual 0x%x\n",
+			fdet->fw->size/4, (fdet->fw->size - AUGMENT_SIZE)/4,
+			css->size);
+
+		ret = -EINVAL;
+	}
+
+done:
+	/* if returning an error, clean up after ourselves */
+	if (ret)
+		dispose_one_firmware(fdet);
+	return ret;
+}
+
+static void dispose_one_firmware(struct firmware_details *fdet)
+{
+	release_firmware(fdet->fw);
+	fdet->fw = NULL;
+}
+
+/*
+ * Called by all HFIs when loading their firmware - i.e. device probe time.
+ * The first one will do the actual firmware load.  Use a mutex to resolve
+ * any possible race condition.
+ *
+ * The call to this routine cannot be moved to driver load because the kernel
+ * call request_firmware() requires a device which is only available after
+ * the first device probe.
+ */
+static int obtain_firmware(struct hfi1_devdata *dd)
+{
+	int err = 0;
+
+	mutex_lock(&fw_mutex);
+	if (fw_state == FW_ACQUIRED) {
+		goto done;	/* already acquired */
+	} else if (fw_state == FW_ERR) {
+		err = fw_err;
+		goto done;	/* already tried and failed */
+	}
+
+	if (fw_8051_load) {
+		err = obtain_one_firmware(dd, fw_8051_name, &fw_8051);
+		if (err)
+			goto done;
+	}
+
+	if (fw_fabric_serdes_load) {
+		err = obtain_one_firmware(dd, fw_fabric_serdes_name,
+			&fw_fabric);
+		if (err)
+			goto done;
+	}
+
+	if (fw_sbus_load) {
+		err = obtain_one_firmware(dd, fw_sbus_name, &fw_sbus);
+		if (err)
+			goto done;
+	}
+
+	if (fw_pcie_serdes_load) {
+		err = obtain_one_firmware(dd, fw_pcie_serdes_name, &fw_pcie);
+		if (err)
+			goto done;
+	}
+
+	if (platform_config_load) {
+		platform_config = NULL;
+		err = request_firmware(&platform_config, platform_config_name,
+						&dd->pcidev->dev);
+		if (err) {
+			err = 0;
+			platform_config = NULL;
+		}
+	}
+
+	/* success */
+	fw_state = FW_ACQUIRED;
+
+done:
+	if (err) {
+		fw_err = err;
+		fw_state = FW_ERR;
+	}
+	mutex_unlock(&fw_mutex);
+
+	return err;
+}
+
+/*
+ * Called when the driver unloads.  The timing is asymmetric with its
+ * counterpart, obtain_firmware().  If called at device remove time,
+ * then it is conceivable that another device could probe while the
+ * firmware is being disposed.  The mutexes can be moved to do that
+ * safely, but then the firmware would be requested from the OS multiple
+ * times.
+ *
+ * No mutex is needed as the driver is unloading and there cannot be any
+ * other callers.
+ */
+void dispose_firmware(void)
+{
+	dispose_one_firmware(&fw_8051);
+	dispose_one_firmware(&fw_fabric);
+	dispose_one_firmware(&fw_pcie);
+	dispose_one_firmware(&fw_sbus);
+
+	release_firmware(platform_config);
+	platform_config = NULL;
+
+	/* retain the error state, otherwise revert to empty */
+	if (fw_state != FW_ERR)
+		fw_state = FW_EMPTY;
+}
+
+/*
+ * Write a block of data to a given array CSR.  All calls will be in
+ * multiples of 8 bytes.
+ */
+static void write_rsa_data(struct hfi1_devdata *dd, int what,
+			   const u8 *data, int nbytes)
+{
+	int qw_size = nbytes/8;
+	int i;
+
+	if (((unsigned long)data & 0x7) == 0) {
+		/* aligned */
+		u64 *ptr = (u64 *)data;
+
+		for (i = 0; i < qw_size; i++, ptr++)
+			write_csr(dd, what + (8*i), *ptr);
+	} else {
+		/* not aligned */
+		for (i = 0; i < qw_size; i++, data += 8) {
+			u64 value;
+
+			memcpy(&value, data, 8);
+			write_csr(dd, what + (8*i), value);
+		}
+	}
+}
+
+/*
+ * Write a block of data to a given CSR as a stream of writes.  All calls will
+ * be in multiples of 8 bytes.
+ */
+static void write_streamed_rsa_data(struct hfi1_devdata *dd, int what,
+				    const u8 *data, int nbytes)
+{
+	u64 *ptr = (u64 *)data;
+	int qw_size = nbytes/8;
+
+	for (; qw_size > 0; qw_size--, ptr++)
+		write_csr(dd, what, *ptr);
+}
+
+/*
+ * Download the signature and start the RSA mechanism.  Wait for
+ * RSA_ENGINE_TIMEOUT before giving up.
+ */
+static int run_rsa(struct hfi1_devdata *dd, const char *who,
+		   const u8 *signature)
+{
+	unsigned long timeout;
+	u64 reg;
+	u32 status;
+	int ret = 0;
+
+	/* write the signature */
+	write_rsa_data(dd, MISC_CFG_RSA_SIGNATURE, signature, KEY_SIZE);
+
+	/* initialize RSA */
+	write_csr(dd, MISC_CFG_RSA_CMD, RSA_CMD_INIT);
+
+	/*
+	 * Make sure the engine is idle and insert a delay between the two
+	 * writes to MISC_CFG_RSA_CMD.
+	 */
+	status = (read_csr(dd, MISC_CFG_FW_CTRL)
+			   & MISC_CFG_FW_CTRL_RSA_STATUS_SMASK)
+			     >> MISC_CFG_FW_CTRL_RSA_STATUS_SHIFT;
+	if (status != RSA_STATUS_IDLE) {
+		dd_dev_err(dd, "%s security engine not idle - giving up\n",
+			who);
+		return -EBUSY;
+	}
+
+	/* start RSA */
+	write_csr(dd, MISC_CFG_RSA_CMD, RSA_CMD_START);
+
+	/*
+	 * Look for the result.
+	 *
+	 * The RSA engine is hooked up to two MISC errors.  The driver
+	 * masks these errors as they do not respond to the standard
+	 * error "clear down" mechanism.  Look for these errors here and
+	 * clear them when possible.  This routine will exit with the
+	 * errors of the current run still set.
+	 *
+	 * MISC_FW_AUTH_FAILED_ERR
+	 *	Firmware authorization failed.  This can be cleared by
+	 *	re-initializing the RSA engine, then clearing the status bit.
+	 *	Do not re-init the RSA angine immediately after a successful
+	 *	run - this will reset the current authorization.
+	 *
+	 * MISC_KEY_MISMATCH_ERR
+	 *	Key does not match.  The only way to clear this is to load
+	 *	a matching key then clear the status bit.  If this error
+	 *	is raised, it will persist outside of this routine until a
+	 *	matching key is loaded.
+	 */
+	timeout = msecs_to_jiffies(RSA_ENGINE_TIMEOUT) + jiffies;
+	while (1) {
+		status = (read_csr(dd, MISC_CFG_FW_CTRL)
+			   & MISC_CFG_FW_CTRL_RSA_STATUS_SMASK)
+			     >> MISC_CFG_FW_CTRL_RSA_STATUS_SHIFT;
+
+		if (status == RSA_STATUS_IDLE) {
+			/* should not happen */
+			dd_dev_err(dd, "%s firmware security bad idle state\n",
+				who);
+			ret = -EINVAL;
+			break;
+		} else if (status == RSA_STATUS_DONE) {
+			/* finished successfully */
+			break;
+		} else if (status == RSA_STATUS_FAILED) {
+			/* finished unsuccessfully */
+			ret = -EINVAL;
+			break;
+		}
+		/* else still active */
+
+		if (time_after(jiffies, timeout)) {
+			/*
+			 * Timed out while active.  We can't reset the engine
+			 * if it is stuck active, but run through the
+			 * error code to see what error bits are set.
+			 */
+			dd_dev_err(dd, "%s firmware security time out\n", who);
+			ret = -ETIMEDOUT;
+			break;
+		}
+
+		msleep(20);
+	}
+
+	/*
+	 * Arrive here on success or failure.  Clear all RSA engine
+	 * errors.  All current errors will stick - the RSA logic is keeping
+	 * error high.  All previous errors will clear - the RSA logic
+	 * is not keeping the error high.
+	 */
+	write_csr(dd, MISC_ERR_CLEAR,
+			MISC_ERR_STATUS_MISC_FW_AUTH_FAILED_ERR_SMASK
+			| MISC_ERR_STATUS_MISC_KEY_MISMATCH_ERR_SMASK);
+	/*
+	 * All that is left are the current errors.  Print failure details,
+	 * if any.
+	 */
+	reg = read_csr(dd, MISC_ERR_STATUS);
+	if (ret) {
+		if (reg & MISC_ERR_STATUS_MISC_FW_AUTH_FAILED_ERR_SMASK)
+			dd_dev_err(dd, "%s firmware authorization failed\n",
+				who);
+		if (reg & MISC_ERR_STATUS_MISC_KEY_MISMATCH_ERR_SMASK)
+			dd_dev_err(dd, "%s firmware key mismatch\n", who);
+	}
+
+	return ret;
+}
+
+static void load_security_variables(struct hfi1_devdata *dd,
+				    struct firmware_details *fdet)
+{
+	/* Security variables a.  Write the modulus */
+	write_rsa_data(dd, MISC_CFG_RSA_MODULUS, fdet->modulus, KEY_SIZE);
+	/* Security variables b.  Write the r2 */
+	write_rsa_data(dd, MISC_CFG_RSA_R2, fdet->r2, KEY_SIZE);
+	/* Security variables c.  Write the mu */
+	write_rsa_data(dd, MISC_CFG_RSA_MU, fdet->mu, MU_SIZE);
+	/* Security variables d.  Write the header */
+	write_streamed_rsa_data(dd, MISC_CFG_SHA_PRELOAD,
+			(u8 *)fdet->css_header, sizeof(struct css_header));
+}
+
+/* return the 8051 firmware state */
+static inline u32 get_firmware_state(struct hfi1_devdata *dd)
+{
+	u64 reg = read_csr(dd, DC_DC8051_STS_CUR_STATE);
+
+	return (reg >> DC_DC8051_STS_CUR_STATE_FIRMWARE_SHIFT)
+				& DC_DC8051_STS_CUR_STATE_FIRMWARE_MASK;
+}
+
+/*
+ * Wait until the firmware is up and ready to take host requests.
+ * Return 0 on success, -ETIMEDOUT on timeout.
+ */
+int wait_fm_ready(struct hfi1_devdata *dd, u32 mstimeout)
+{
+	unsigned long timeout;
+
+	/* in the simulator, the fake 8051 is always ready */
+	if (dd->icode == ICODE_FUNCTIONAL_SIMULATOR)
+		return 0;
+
+	timeout = msecs_to_jiffies(mstimeout) + jiffies;
+	while (1) {
+		if (get_firmware_state(dd) == 0xa0)	/* ready */
+			return 0;
+		if (time_after(jiffies, timeout))	/* timed out */
+			return -ETIMEDOUT;
+		usleep_range(1950, 2050); /* sleep 2ms-ish */
+	}
+}
+
+/*
+ * Load the 8051 firmware.
+ */
+static int load_8051_firmware(struct hfi1_devdata *dd,
+			      struct firmware_details *fdet)
+{
+	u64 reg;
+	int ret;
+	u8 ver_a, ver_b;
+
+	/*
+	 * DC Reset sequence
+	 * Load DC 8051 firmware
+	 */
+	/*
+	 * DC reset step 1: Reset DC8051
+	 */
+	reg = DC_DC8051_CFG_RST_M8051W_SMASK
+		| DC_DC8051_CFG_RST_CRAM_SMASK
+		| DC_DC8051_CFG_RST_DRAM_SMASK
+		| DC_DC8051_CFG_RST_IRAM_SMASK
+		| DC_DC8051_CFG_RST_SFR_SMASK;
+	write_csr(dd, DC_DC8051_CFG_RST, reg);
+
+	/*
+	 * DC reset step 2 (optional): Load 8051 data memory with link
+	 * configuration
+	 */
+
+	/*
+	 * DC reset step 3: Load DC8051 firmware
+	 */
+	/* release all but the core reset */
+	reg = DC_DC8051_CFG_RST_M8051W_SMASK;
+	write_csr(dd, DC_DC8051_CFG_RST, reg);
+
+	/* Firmware load step 1 */
+	load_security_variables(dd, fdet);
+
+	/*
+	 * Firmware load step 2.  Clear MISC_CFG_FW_CTRL.FW_8051_LOADED
+	 */
+	write_csr(dd, MISC_CFG_FW_CTRL, 0);
+
+	/* Firmware load steps 3-5 */
+	ret = write_8051(dd, 1/*code*/, 0, fdet->firmware_ptr,
+							fdet->firmware_len);
+	if (ret)
+		return ret;
+
+	/*
+	 * DC reset step 4. Host starts the DC8051 firmware
+	 */
+	/*
+	 * Firmware load step 6.  Set MISC_CFG_FW_CTRL.FW_8051_LOADED
+	 */
+	write_csr(dd, MISC_CFG_FW_CTRL, MISC_CFG_FW_CTRL_FW_8051_LOADED_SMASK);
+
+	/* Firmware load steps 7-10 */
+	ret = run_rsa(dd, "8051", fdet->signature);
+	if (ret)
+		return ret;
+
+	/* clear all reset bits, releasing the 8051 */
+	write_csr(dd, DC_DC8051_CFG_RST, 0ull);
+
+	/*
+	 * DC reset step 5. Wait for firmware to be ready to accept host
+	 * requests.
+	 */
+	ret = wait_fm_ready(dd, TIMEOUT_8051_START);
+	if (ret) { /* timed out */
+		dd_dev_err(dd, "8051 start timeout, current state 0x%x\n",
+			get_firmware_state(dd));
+		return -ETIMEDOUT;
+	}
+
+	read_misc_status(dd, &ver_a, &ver_b);
+	dd_dev_info(dd, "8051 firmware version %d.%d\n",
+		(int)ver_b, (int)ver_a);
+	dd->dc8051_ver = dc8051_ver(ver_b, ver_a);
+
+	return 0;
+}
+
+/* SBus Master broadcast address */
+#define SBUS_MASTER_BROADCAST 0xfd
+
+/*
+ * Write the SBus request register
+ *
+ * No need for masking - the arguments are sized exactly.
+ */
+void sbus_request(struct hfi1_devdata *dd,
+		  u8 receiver_addr, u8 data_addr, u8 command, u32 data_in)
+{
+	write_csr(dd, ASIC_CFG_SBUS_REQUEST,
+		((u64)data_in << ASIC_CFG_SBUS_REQUEST_DATA_IN_SHIFT)
+		| ((u64)command << ASIC_CFG_SBUS_REQUEST_COMMAND_SHIFT)
+		| ((u64)data_addr << ASIC_CFG_SBUS_REQUEST_DATA_ADDR_SHIFT)
+		| ((u64)receiver_addr
+			<< ASIC_CFG_SBUS_REQUEST_RECEIVER_ADDR_SHIFT));
+}
+
+/*
+ * Turn off the SBus and fabric serdes spicos.
+ *
+ * + Must be called with Sbus fast mode turned on.
+ * + Must be called after fabric serdes broadcast is set up.
+ * + Must be called before the 8051 is loaded - assumes 8051 is not loaded
+ *   when using MISC_CFG_FW_CTRL.
+ */
+static void turn_off_spicos(struct hfi1_devdata *dd, int flags)
+{
+	/* only needed on A0 */
+	if (!is_a0(dd))
+		return;
+
+	dd_dev_info(dd, "Turning off spicos:%s%s\n",
+		flags & SPICO_SBUS ? " SBus" : "",
+		flags & SPICO_FABRIC ? " fabric" : "");
+
+	write_csr(dd, MISC_CFG_FW_CTRL, ENABLE_SPICO_SMASK);
+	/* disable SBus spico */
+	if (flags & SPICO_SBUS)
+		sbus_request(dd, SBUS_MASTER_BROADCAST, 0x01,
+			WRITE_SBUS_RECEIVER, 0x00000040);
+
+	/* disable the fabric serdes spicos */
+	if (flags & SPICO_FABRIC)
+		sbus_request(dd, fabric_serdes_broadcast[dd->hfi1_id],
+			     0x07, WRITE_SBUS_RECEIVER, 0x00000000);
+	write_csr(dd, MISC_CFG_FW_CTRL, 0);
+}
+
+/*
+ *  Reset all of the fabric serdes for our HFI.
+ */
+void fabric_serdes_reset(struct hfi1_devdata *dd)
+{
+	u8 ra;
+
+	if (dd->icode != ICODE_RTL_SILICON) /* only for RTL */
+		return;
+
+	ra = fabric_serdes_broadcast[dd->hfi1_id];
+
+	acquire_hw_mutex(dd);
+	set_sbus_fast_mode(dd);
+	/* place SerDes in reset and disable SPICO */
+	sbus_request(dd, ra, 0x07, WRITE_SBUS_RECEIVER, 0x00000011);
+	/* wait 100 refclk cycles @ 156.25MHz => 640ns */
+	udelay(1);
+	/* remove SerDes reset */
+	sbus_request(dd, ra, 0x07, WRITE_SBUS_RECEIVER, 0x00000010);
+	/* turn SPICO enable on */
+	sbus_request(dd, ra, 0x07, WRITE_SBUS_RECEIVER, 0x00000002);
+	clear_sbus_fast_mode(dd);
+	release_hw_mutex(dd);
+}
+
+/* Access to the SBus in this routine should probably be serialized */
+int sbus_request_slow(struct hfi1_devdata *dd,
+		      u8 receiver_addr, u8 data_addr, u8 command, u32 data_in)
+{
+	u64 reg, count = 0;
+
+	sbus_request(dd, receiver_addr, data_addr, command, data_in);
+	write_csr(dd, ASIC_CFG_SBUS_EXECUTE,
+		  ASIC_CFG_SBUS_EXECUTE_EXECUTE_SMASK);
+	/* Wait for both DONE and RCV_DATA_VALID to go high */
+	reg = read_csr(dd, ASIC_STS_SBUS_RESULT);
+	while (!((reg & ASIC_STS_SBUS_RESULT_DONE_SMASK) &&
+		 (reg & ASIC_STS_SBUS_RESULT_RCV_DATA_VALID_SMASK))) {
+		if (count++ >= SBUS_MAX_POLL_COUNT) {
+			u64 counts = read_csr(dd, ASIC_STS_SBUS_COUNTERS);
+			/*
+			 * If the loop has timed out, we are OK if DONE bit
+			 * is set and RCV_DATA_VALID and EXECUTE counters
+			 * are the same. If not, we cannot proceed.
+			 */
+			if ((reg & ASIC_STS_SBUS_RESULT_DONE_SMASK) &&
+			    (SBUS_COUNTER(counts, RCV_DATA_VALID) ==
+			     SBUS_COUNTER(counts, EXECUTE)))
+				break;
+			return -ETIMEDOUT;
+		}
+		udelay(1);
+		reg = read_csr(dd, ASIC_STS_SBUS_RESULT);
+	}
+	count = 0;
+	write_csr(dd, ASIC_CFG_SBUS_EXECUTE, 0);
+	/* Wait for DONE to clear after EXECUTE is cleared */
+	reg = read_csr(dd, ASIC_STS_SBUS_RESULT);
+	while (reg & ASIC_STS_SBUS_RESULT_DONE_SMASK) {
+		if (count++ >= SBUS_MAX_POLL_COUNT)
+			return -ETIME;
+		udelay(1);
+		reg = read_csr(dd, ASIC_STS_SBUS_RESULT);
+	}
+	return 0;
+}
+
+static int load_fabric_serdes_firmware(struct hfi1_devdata *dd,
+				       struct firmware_details *fdet)
+{
+	int i, err;
+	const u8 ra = fabric_serdes_broadcast[dd->hfi1_id]; /* receiver addr */
+
+	dd_dev_info(dd, "Downloading fabric firmware\n");
+
+	/* step 1: load security variables */
+	load_security_variables(dd, fdet);
+	/* step 2: place SerDes in reset and disable SPICO */
+	sbus_request(dd, ra, 0x07, WRITE_SBUS_RECEIVER, 0x00000011);
+	/* wait 100 refclk cycles @ 156.25MHz => 640ns */
+	udelay(1);
+	/* step 3:  remove SerDes reset */
+	sbus_request(dd, ra, 0x07, WRITE_SBUS_RECEIVER, 0x00000010);
+	/* step 4: assert IMEM override */
+	sbus_request(dd, ra, 0x00, WRITE_SBUS_RECEIVER, 0x40000000);
+	/* step 5: download SerDes machine code */
+	for (i = 0; i < fdet->firmware_len; i += 4) {
+		sbus_request(dd, ra, 0x0a, WRITE_SBUS_RECEIVER,
+					*(u32 *)&fdet->firmware_ptr[i]);
+	}
+	/* step 6: IMEM override off */
+	sbus_request(dd, ra, 0x00, WRITE_SBUS_RECEIVER, 0x00000000);
+	/* step 7: turn ECC on */
+	sbus_request(dd, ra, 0x0b, WRITE_SBUS_RECEIVER, 0x000c0000);
+
+	/* steps 8-11: run the RSA engine */
+	err = run_rsa(dd, "fabric serdes", fdet->signature);
+	if (err)
+		return err;
+
+	/* step 12: turn SPICO enable on */
+	sbus_request(dd, ra, 0x07, WRITE_SBUS_RECEIVER, 0x00000002);
+	/* step 13: enable core hardware interrupts */
+	sbus_request(dd, ra, 0x08, WRITE_SBUS_RECEIVER, 0x00000000);
+
+	return 0;
+}
+
+static int load_sbus_firmware(struct hfi1_devdata *dd,
+			      struct firmware_details *fdet)
+{
+	int i, err;
+	const u8 ra = SBUS_MASTER_BROADCAST; /* receiver address */
+
+	dd_dev_info(dd, "Downloading SBus firmware\n");
+
+	/* step 1: load security variables */
+	load_security_variables(dd, fdet);
+	/* step 2: place SPICO into reset and enable off */
+	sbus_request(dd, ra, 0x01, WRITE_SBUS_RECEIVER, 0x000000c0);
+	/* step 3: remove reset, enable off, IMEM_CNTRL_EN on */
+	sbus_request(dd, ra, 0x01, WRITE_SBUS_RECEIVER, 0x00000240);
+	/* step 4: set starting IMEM address for burst download */
+	sbus_request(dd, ra, 0x03, WRITE_SBUS_RECEIVER, 0x80000000);
+	/* step 5: download the SBus Master machine code */
+	for (i = 0; i < fdet->firmware_len; i += 4) {
+		sbus_request(dd, ra, 0x14, WRITE_SBUS_RECEIVER,
+					*(u32 *)&fdet->firmware_ptr[i]);
+	}
+	/* step 6: set IMEM_CNTL_EN off */
+	sbus_request(dd, ra, 0x01, WRITE_SBUS_RECEIVER, 0x00000040);
+	/* step 7: turn ECC on */
+	sbus_request(dd, ra, 0x16, WRITE_SBUS_RECEIVER, 0x000c0000);
+
+	/* steps 8-11: run the RSA engine */
+	err = run_rsa(dd, "SBus", fdet->signature);
+	if (err)
+		return err;
+
+	/* step 12: set SPICO_ENABLE on */
+	sbus_request(dd, ra, 0x01, WRITE_SBUS_RECEIVER, 0x00000140);
+
+	return 0;
+}
+
+static int load_pcie_serdes_firmware(struct hfi1_devdata *dd,
+				     struct firmware_details *fdet)
+{
+	int i;
+	const u8 ra = SBUS_MASTER_BROADCAST; /* receiver address */
+
+	dd_dev_info(dd, "Downloading PCIe firmware\n");
+
+	/* step 1: load security variables */
+	load_security_variables(dd, fdet);
+	/* step 2: assert single step (halts the SBus Master spico) */
+	sbus_request(dd, ra, 0x05, WRITE_SBUS_RECEIVER, 0x00000001);
+	/* step 3: enable XDMEM access */
+	sbus_request(dd, ra, 0x01, WRITE_SBUS_RECEIVER, 0x00000d40);
+	/* step 4: load firmware into SBus Master XDMEM */
+	/* NOTE: the dmem address, write_en, and wdata are all pre-packed,
+	   we only need to pick up the bytes and write them */
+	for (i = 0; i < fdet->firmware_len; i += 4) {
+		sbus_request(dd, ra, 0x04, WRITE_SBUS_RECEIVER,
+					*(u32 *)&fdet->firmware_ptr[i]);
+	}
+	/* step 5: disable XDMEM access */
+	sbus_request(dd, ra, 0x01, WRITE_SBUS_RECEIVER, 0x00000140);
+	/* step 6: allow SBus Spico to run */
+	sbus_request(dd, ra, 0x05, WRITE_SBUS_RECEIVER, 0x00000000);
+
+	/* steps 7-11: run RSA, if it succeeds, firmware is available to
+	   be swapped */
+	return run_rsa(dd, "PCIe serdes", fdet->signature);
+}
+
+/*
+ * Set the given broadcast values on the given list of devices.
+ */
+static void set_serdes_broadcast(struct hfi1_devdata *dd, u8 bg1, u8 bg2,
+				 const u8 *addrs, int count)
+{
+	while (--count >= 0) {
+		/*
+		 * Set BROADCAST_GROUP_1 and BROADCAST_GROUP_2, leave
+		 * defaults for everything else.  Do not read-modify-write,
+		 * per instruction from the manufacturer.
+		 *
+		 * Register 0xfd:
+		 *	bits    what
+		 *	-----	---------------------------------
+		 *	  0	IGNORE_BROADCAST  (default 0)
+		 *	11:4	BROADCAST_GROUP_1 (default 0xff)
+		 *	23:16	BROADCAST_GROUP_2 (default 0xff)
+		 */
+		sbus_request(dd, addrs[count], 0xfd, WRITE_SBUS_RECEIVER,
+				(u32)bg1 << 4 | (u32)bg2 << 16);
+	}
+}
+
+int acquire_hw_mutex(struct hfi1_devdata *dd)
+{
+	unsigned long timeout;
+	int try = 0;
+	u8 mask = 1 << dd->hfi1_id;
+	u8 user;
+
+retry:
+	timeout = msecs_to_jiffies(HM_TIMEOUT) + jiffies;
+	while (1) {
+		write_csr(dd, ASIC_CFG_MUTEX, mask);
+		user = (u8)read_csr(dd, ASIC_CFG_MUTEX);
+		if (user == mask)
+			return 0; /* success */
+		if (time_after(jiffies, timeout))
+			break; /* timed out */
+		msleep(20);
+	}
+
+	/* timed out */
+	dd_dev_err(dd,
+		"Unable to acquire hardware mutex, mutex mask %u, my mask %u (%s)\n",
+		(u32)user, (u32)mask, (try == 0) ? "retrying" : "giving up");
+
+	if (try == 0) {
+		/* break mutex and retry */
+		write_csr(dd, ASIC_CFG_MUTEX, 0);
+		try++;
+		goto retry;
+	}
+
+	return -EBUSY;
+}
+
+void release_hw_mutex(struct hfi1_devdata *dd)
+{
+	write_csr(dd, ASIC_CFG_MUTEX, 0);
+}
+
+void set_sbus_fast_mode(struct hfi1_devdata *dd)
+{
+	write_csr(dd, ASIC_CFG_SBUS_EXECUTE,
+				ASIC_CFG_SBUS_EXECUTE_FAST_MODE_SMASK);
+}
+
+void clear_sbus_fast_mode(struct hfi1_devdata *dd)
+{
+	u64 reg, count = 0;
+
+	reg = read_csr(dd, ASIC_STS_SBUS_COUNTERS);
+	while (SBUS_COUNTER(reg, EXECUTE) !=
+	       SBUS_COUNTER(reg, RCV_DATA_VALID)) {
+		if (count++ >= SBUS_MAX_POLL_COUNT)
+			break;
+		udelay(1);
+		reg = read_csr(dd, ASIC_STS_SBUS_COUNTERS);
+	}
+	write_csr(dd, ASIC_CFG_SBUS_EXECUTE, 0);
+}
+
+int load_firmware(struct hfi1_devdata *dd)
+{
+	int ret;
+
+	if (fw_sbus_load || fw_fabric_serdes_load) {
+		ret = acquire_hw_mutex(dd);
+		if (ret)
+			return ret;
+
+		set_sbus_fast_mode(dd);
+
+		/*
+		 * The SBus contains part of the fabric firmware and so must
+		 * also be downloaded.
+		 */
+		if (fw_sbus_load) {
+			turn_off_spicos(dd, SPICO_SBUS);
+			ret = load_sbus_firmware(dd, &fw_sbus);
+			if (ret)
+				goto clear;
+		}
+
+		if (fw_fabric_serdes_load) {
+			set_serdes_broadcast(dd, all_fabric_serdes_broadcast,
+					fabric_serdes_broadcast[dd->hfi1_id],
+					fabric_serdes_addrs[dd->hfi1_id],
+					NUM_FABRIC_SERDES);
+			turn_off_spicos(dd, SPICO_FABRIC);
+			ret = load_fabric_serdes_firmware(dd, &fw_fabric);
+		}
+
+clear:
+		clear_sbus_fast_mode(dd);
+		release_hw_mutex(dd);
+		if (ret)
+			return ret;
+	}
+
+	if (fw_8051_load) {
+		ret = load_8051_firmware(dd, &fw_8051);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
+int hfi1_firmware_init(struct hfi1_devdata *dd)
+{
+	/* only RTL can use these */
+	if (dd->icode != ICODE_RTL_SILICON) {
+		fw_fabric_serdes_load = 0;
+		fw_pcie_serdes_load = 0;
+		fw_sbus_load = 0;
+	}
+
+	/* no 8051 or QSFP on simulator */
+	if (dd->icode == ICODE_FUNCTIONAL_SIMULATOR) {
+		fw_8051_load = 0;
+		platform_config_load = 0;
+	}
+
+	if (!fw_8051_name) {
+		if (dd->icode == ICODE_RTL_SILICON)
+			fw_8051_name = DEFAULT_FW_8051_NAME_ASIC;
+		else
+			fw_8051_name = DEFAULT_FW_8051_NAME_FPGA;
+	}
+	if (!fw_fabric_serdes_name)
+		fw_fabric_serdes_name = DEFAULT_FW_FABRIC_NAME;
+	if (!fw_sbus_name)
+		fw_sbus_name = DEFAULT_FW_SBUS_NAME;
+	if (!fw_pcie_serdes_name)
+		fw_pcie_serdes_name = DEFAULT_FW_PCIE_NAME;
+	if (!platform_config_name)
+		platform_config_name = DEFAULT_PLATFORM_CONFIG_NAME;
+
+	return obtain_firmware(dd);
+}
+
+int parse_platform_config(struct hfi1_devdata *dd)
+{
+	struct platform_config_cache *pcfgcache = &dd->pcfg_cache;
+	u32 *ptr = NULL;
+	u32 header1 = 0, header2 = 0, magic_num = 0, crc = 0;
+	u32 record_idx = 0, table_type = 0, table_length_dwords = 0;
+
+	if (platform_config == NULL) {
+		dd_dev_info(dd, "%s: Missing config file\n", __func__);
+		goto bail;
+	}
+	ptr = (u32 *)platform_config->data;
+
+	magic_num = *ptr;
+	ptr++;
+	if (magic_num != PLATFORM_CONFIG_MAGIC_NUM) {
+		dd_dev_info(dd, "%s: Bad config file\n", __func__);
+		goto bail;
+	}
+
+	while (ptr < (u32 *)(platform_config->data + platform_config->size)) {
+		header1 = *ptr;
+		header2 = *(ptr + 1);
+		if (header1 != ~header2) {
+			dd_dev_info(dd, "%s: Failed validation at offset %ld\n",
+				__func__, (ptr - (u32 *)platform_config->data));
+			goto bail;
+		}
+
+		record_idx = *ptr &
+			((1 << PLATFORM_CONFIG_HEADER_RECORD_IDX_LEN_BITS) - 1);
+
+		table_length_dwords = (*ptr >>
+				PLATFORM_CONFIG_HEADER_TABLE_LENGTH_SHIFT) &
+		      ((1 << PLATFORM_CONFIG_HEADER_TABLE_LENGTH_LEN_BITS) - 1);
+
+		table_type = (*ptr >> PLATFORM_CONFIG_HEADER_TABLE_TYPE_SHIFT) &
+			((1 << PLATFORM_CONFIG_HEADER_TABLE_TYPE_LEN_BITS) - 1);
+
+		/* Done with this set of headers */
+		ptr += 2;
+
+		if (record_idx) {
+			/* data table */
+			switch (table_type) {
+			case PLATFORM_CONFIG_SYSTEM_TABLE:
+				pcfgcache->config_tables[table_type].num_table =
+									1;
+				break;
+			case PLATFORM_CONFIG_PORT_TABLE:
+				pcfgcache->config_tables[table_type].num_table =
+									2;
+				break;
+			case PLATFORM_CONFIG_RX_PRESET_TABLE:
+				/* fall through */
+			case PLATFORM_CONFIG_TX_PRESET_TABLE:
+				/* fall through */
+			case PLATFORM_CONFIG_QSFP_ATTEN_TABLE:
+				/* fall through */
+			case PLATFORM_CONFIG_VARIABLE_SETTINGS_TABLE:
+				pcfgcache->config_tables[table_type].num_table =
+							table_length_dwords;
+				break;
+			default:
+				dd_dev_info(dd,
+				      "%s: Unknown data table %d, offset %ld\n",
+					__func__, table_type,
+				       (ptr - (u32 *)platform_config->data));
+				goto bail; /* We don't trust this file now */
+			}
+			pcfgcache->config_tables[table_type].table = ptr;
+		} else {
+			/* metadata table */
+			switch (table_type) {
+			case PLATFORM_CONFIG_SYSTEM_TABLE:
+				/* fall through */
+			case PLATFORM_CONFIG_PORT_TABLE:
+				/* fall through */
+			case PLATFORM_CONFIG_RX_PRESET_TABLE:
+				/* fall through */
+			case PLATFORM_CONFIG_TX_PRESET_TABLE:
+				/* fall through */
+			case PLATFORM_CONFIG_QSFP_ATTEN_TABLE:
+				/* fall through */
+			case PLATFORM_CONFIG_VARIABLE_SETTINGS_TABLE:
+				break;
+			default:
+				dd_dev_info(dd,
+				  "%s: Unknown metadata table %d, offset %ld\n",
+				  __func__, table_type,
+				  (ptr - (u32 *)platform_config->data));
+				goto bail; /* We don't trust this file now */
+			}
+			pcfgcache->config_tables[table_type].table_metadata =
+									ptr;
+		}
+
+		/* Calculate and check table crc */
+		crc = crc32_le(~(u32)0, (unsigned char const *)ptr,
+				(table_length_dwords * 4));
+		crc ^= ~(u32)0;
+
+		/* Jump the table */
+		ptr += table_length_dwords;
+		if (crc != *ptr) {
+			dd_dev_info(dd, "%s: Failed CRC check at offset %ld\n",
+				__func__, (ptr - (u32 *)platform_config->data));
+			goto bail;
+		}
+		/* Jump the CRC DWORD */
+		ptr++;
+	}
+
+	pcfgcache->cache_valid = 1;
+	return 0;
+bail:
+	memset(pcfgcache, 0, sizeof(struct platform_config_cache));
+	return -EINVAL;
+}
+
+static int get_platform_fw_field_metadata(struct hfi1_devdata *dd, int table,
+		int field, u32 *field_len_bits, u32 *field_start_bits)
+{
+	struct platform_config_cache *pcfgcache = &dd->pcfg_cache;
+	u32 *src_ptr = NULL;
+
+	if (!pcfgcache->cache_valid)
+		return -EINVAL;
+
+	switch (table) {
+	case PLATFORM_CONFIG_SYSTEM_TABLE:
+		/* fall through */
+	case PLATFORM_CONFIG_PORT_TABLE:
+		/* fall through */
+	case PLATFORM_CONFIG_RX_PRESET_TABLE:
+		/* fall through */
+	case PLATFORM_CONFIG_TX_PRESET_TABLE:
+		/* fall through */
+	case PLATFORM_CONFIG_QSFP_ATTEN_TABLE:
+		/* fall through */
+	case PLATFORM_CONFIG_VARIABLE_SETTINGS_TABLE:
+		if (field && field < platform_config_table_limits[table])
+			src_ptr =
+			pcfgcache->config_tables[table].table_metadata + field;
+		break;
+	default:
+		dd_dev_info(dd, "%s: Unknown table\n", __func__);
+		break;
+	}
+
+	if (!src_ptr)
+		return -EINVAL;
+
+	if (field_start_bits)
+		*field_start_bits = *src_ptr &
+		      ((1 << METADATA_TABLE_FIELD_START_LEN_BITS) - 1);
+
+	if (field_len_bits)
+		*field_len_bits = (*src_ptr >> METADATA_TABLE_FIELD_LEN_SHIFT)
+		       & ((1 << METADATA_TABLE_FIELD_LEN_LEN_BITS) - 1);
+
+	return 0;
+}
+
+/* This is the central interface to getting data out of the platform config
+ * file. It depends on parse_platform_config() having populated the
+ * platform_config_cache in hfi1_devdata, and checks the cache_valid member to
+ * validate the sanity of the cache.
+ *
+ * The non-obvious parameters:
+ * @table_index: Acts as a look up key into which instance of the tables the
+ * relevant field is fetched from.
+ *
+ * This applies to the data tables that have multiple instances. The port table
+ * is an exception to this rule as each HFI only has one port and thus the
+ * relevant table can be distinguished by hfi_id.
+ *
+ * @data: pointer to memory that will be populated with the field requested.
+ * @len: length of memory pointed by @data in bytes.
+ */
+int get_platform_config_field(struct hfi1_devdata *dd,
+			enum platform_config_table_type_encoding table_type,
+			int table_index, int field_index, u32 *data, u32 len)
+{
+	int ret = 0, wlen = 0, seek = 0;
+	u32 field_len_bits = 0, field_start_bits = 0, *src_ptr = NULL;
+	struct platform_config_cache *pcfgcache = &dd->pcfg_cache;
+
+	if (data)
+		memset(data, 0, len);
+	else
+		return -EINVAL;
+
+	ret = get_platform_fw_field_metadata(dd, table_type, field_index,
+					&field_len_bits, &field_start_bits);
+	if (ret)
+		return -EINVAL;
+
+	/* Convert length to bits */
+	len *= 8;
+
+	/* Our metadata function checked cache_valid and field_index for us */
+	switch (table_type) {
+	case PLATFORM_CONFIG_SYSTEM_TABLE:
+		src_ptr = pcfgcache->config_tables[table_type].table;
+
+		if (field_index != SYSTEM_TABLE_QSFP_POWER_CLASS_MAX) {
+			if (len < field_len_bits)
+				return -EINVAL;
+
+			seek = field_start_bits/8;
+			wlen = field_len_bits/8;
+
+			src_ptr = (u32 *)((u8 *)src_ptr + seek);
+
+			/* We expect the field to be byte aligned and whole byte
+			 * lengths if we are here */
+			memcpy(data, src_ptr, wlen);
+			return 0;
+		}
+		break;
+	case PLATFORM_CONFIG_PORT_TABLE:
+		/* Port table is 4 DWORDS in META_VERSION 0 */
+		src_ptr = dd->hfi1_id ?
+			pcfgcache->config_tables[table_type].table + 4 :
+			pcfgcache->config_tables[table_type].table;
+		break;
+	case PLATFORM_CONFIG_RX_PRESET_TABLE:
+		/* fall through */
+	case PLATFORM_CONFIG_TX_PRESET_TABLE:
+		/* fall through */
+	case PLATFORM_CONFIG_QSFP_ATTEN_TABLE:
+		/* fall through */
+	case PLATFORM_CONFIG_VARIABLE_SETTINGS_TABLE:
+		src_ptr = pcfgcache->config_tables[table_type].table;
+
+		if (table_index <
+			pcfgcache->config_tables[table_type].num_table)
+			src_ptr += table_index;
+		else
+			src_ptr = NULL;
+		break;
+	default:
+		dd_dev_info(dd, "%s: Unknown table\n", __func__);
+		break;
+	}
+
+	if (!src_ptr || len < field_len_bits)
+		return -EINVAL;
+
+	src_ptr += (field_start_bits/32);
+	*data = (*src_ptr >> (field_start_bits % 32)) &
+			((1 << field_len_bits) - 1);
+
+	return 0;
+}
+
+/*
+ * Download the firmware needed for the Gen3 PCIe SerDes.  An update
+ * to the SBus firmware is needed before updating the PCIe firmware.
+ *
+ * Note: caller must be holding the HW mutex.
+ */
+int load_pcie_firmware(struct hfi1_devdata *dd)
+{
+	int ret = 0;
+
+	/* both firmware loads below use the SBus */
+	set_sbus_fast_mode(dd);
+
+	if (fw_sbus_load) {
+		turn_off_spicos(dd, SPICO_SBUS);
+		ret = load_sbus_firmware(dd, &fw_sbus);
+		if (ret)
+			goto done;
+	}
+
+	if (fw_pcie_serdes_load) {
+		dd_dev_info(dd, "Setting PCIe SerDes broadcast\n");
+		set_serdes_broadcast(dd, all_pcie_serdes_broadcast,
+					pcie_serdes_broadcast[dd->hfi1_id],
+					pcie_serdes_addrs[dd->hfi1_id],
+					NUM_PCIE_SERDES);
+		ret = load_pcie_serdes_firmware(dd, &fw_pcie);
+		if (ret)
+			goto done;
+	}
+
+done:
+	clear_sbus_fast_mode(dd);
+
+	return ret;
+}
+
+/*
+ * Read the GUID from the hardware, store it in dd.
+ */
+void read_guid(struct hfi1_devdata *dd)
+{
+	dd->base_guid = read_csr(dd, DC_DC8051_CFG_LOCAL_GUID);
+	dd_dev_info(dd, "GUID %llx",
+		(unsigned long long)dd->base_guid);
+}
diff --git a/drivers/staging/rdma/hfi1/hfi.h b/drivers/staging/rdma/hfi1/hfi.h
new file mode 100644
index 000000000000..8ca171bf3e36
--- /dev/null
+++ b/drivers/staging/rdma/hfi1/hfi.h
@@ -0,0 +1,1821 @@
+#ifndef _HFI1_KERNEL_H
+#define _HFI1_KERNEL_H
+/*
+ *
+ * This file is provided under a dual BSD/GPLv2 license.  When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * GPL LICENSE SUMMARY
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * BSD LICENSE
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ *  - Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ *  - Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in
+ *    the documentation and/or other materials provided with the
+ *    distribution.
+ *  - Neither the name of Intel Corporation nor the names of its
+ *    contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ */
+
+#include <linux/interrupt.h>
+#include <linux/pci.h>
+#include <linux/dma-mapping.h>
+#include <linux/mutex.h>
+#include <linux/list.h>
+#include <linux/scatterlist.h>
+#include <linux/slab.h>
+#include <linux/io.h>
+#include <linux/fs.h>
+#include <linux/completion.h>
+#include <linux/kref.h>
+#include <linux/sched.h>
+#include <linux/cdev.h>
+#include <linux/delay.h>
+#include <linux/kthread.h>
+
+#include "chip_registers.h"
+#include "common.h"
+#include "verbs.h"
+#include "pio.h"
+#include "chip.h"
+#include "mad.h"
+#include "qsfp.h"
+#include "platform_config.h"
+
+/* bumped 1 from s/w major version of TrueScale */
+#define HFI1_CHIP_VERS_MAJ 3U
+
+/* don't care about this except printing */
+#define HFI1_CHIP_VERS_MIN 0U
+
+/* The Organization Unique Identifier (Mfg code), and its position in GUID */
+#define HFI1_OUI 0x001175
+#define HFI1_OUI_LSB 40
+
+#define DROP_PACKET_OFF		0
+#define DROP_PACKET_ON		1
+
+extern unsigned long hfi1_cap_mask;
+#define HFI1_CAP_KGET_MASK(mask, cap) ((mask) & HFI1_CAP_##cap)
+#define HFI1_CAP_UGET_MASK(mask, cap) \
+	(((mask) >> HFI1_CAP_USER_SHIFT) & HFI1_CAP_##cap)
+#define HFI1_CAP_KGET(cap) (HFI1_CAP_KGET_MASK(hfi1_cap_mask, cap))
+#define HFI1_CAP_UGET(cap) (HFI1_CAP_UGET_MASK(hfi1_cap_mask, cap))
+#define HFI1_CAP_IS_KSET(cap) (!!HFI1_CAP_KGET(cap))
+#define HFI1_CAP_IS_USET(cap) (!!HFI1_CAP_UGET(cap))
+#define HFI1_MISC_GET() ((hfi1_cap_mask >> HFI1_CAP_MISC_SHIFT) & \
+			HFI1_CAP_MISC_MASK)
+
+/*
+ * per driver stats, either not device nor port-specific, or
+ * summed over all of the devices and ports.
+ * They are described by name via ipathfs filesystem, so layout
+ * and number of elements can change without breaking compatibility.
+ * If members are added or deleted hfi1_statnames[] in debugfs.c must
+ * change to match.
+ */
+struct hfi1_ib_stats {
+	__u64 sps_ints; /* number of interrupts handled */
+	__u64 sps_errints; /* number of error interrupts */
+	__u64 sps_txerrs; /* tx-related packet errors */
+	__u64 sps_rcverrs; /* non-crc rcv packet errors */
+	__u64 sps_hwerrs; /* hardware errors reported (parity, etc.) */
+	__u64 sps_nopiobufs; /* no pio bufs avail from kernel */
+	__u64 sps_ctxts; /* number of contexts currently open */
+	__u64 sps_lenerrs; /* number of kernel packets where RHF != LRH len */
+	__u64 sps_buffull;
+	__u64 sps_hdrfull;
+};
+
+extern struct hfi1_ib_stats hfi1_stats;
+extern const struct pci_error_handlers hfi1_pci_err_handler;
+
+/*
+ * First-cut criterion for "device is active" is
+ * two thousand dwords combined Tx, Rx traffic per
+ * 5-second interval. SMA packets are 64 dwords,
+ * and occur "a few per second", presumably each way.
+ */
+#define HFI1_TRAFFIC_ACTIVE_THRESHOLD (2000)
+
+/*
+ * Below contains all data related to a single context (formerly called port).
+ */
+
+#ifdef CONFIG_DEBUG_FS
+struct hfi1_opcode_stats_perctx;
+#endif
+
+/*
+ * struct ps_state keeps state associated with RX queue "prescanning"
+ * (prescanning for FECNs, and BECNs), if prescanning is in use.
+ */
+struct ps_state {
+	u32 ps_head;
+	int initialized;
+};
+
+struct ctxt_eager_bufs {
+	ssize_t size;            /* total size of eager buffers */
+	u32 count;               /* size of buffers array */
+	u32 numbufs;             /* number of buffers allocated */
+	u32 alloced;             /* number of rcvarray entries used */
+	u32 rcvtid_size;         /* size of each eager rcv tid */
+	u32 threshold;           /* head update threshold */
+	struct eager_buffer {
+		void *addr;
+		dma_addr_t phys;
+		ssize_t len;
+	} *buffers;
+	struct {
+		void *addr;
+		dma_addr_t phys;
+	} *rcvtids;
+};
+
+struct hfi1_ctxtdata {
+	/* shadow the ctxt's RcvCtrl register */
+	u64 rcvctrl;
+	/* rcvhdrq base, needs mmap before useful */
+	void *rcvhdrq;
+	/* kernel virtual address where hdrqtail is updated */
+	volatile __le64 *rcvhdrtail_kvaddr;
+	/*
+	 * Shared page for kernel to signal user processes that send buffers
+	 * need disarming.  The process should call HFI1_CMD_DISARM_BUFS
+	 * or HFI1_CMD_ACK_EVENT with IPATH_EVENT_DISARM_BUFS set.
+	 */
+	unsigned long *user_event_mask;
+	/* when waiting for rcv or pioavail */
+	wait_queue_head_t wait;
+	/* rcvhdrq size (for freeing) */
+	size_t rcvhdrq_size;
+	/* number of rcvhdrq entries */
+	u16 rcvhdrq_cnt;
+	/* size of each of the rcvhdrq entries */
+	u16 rcvhdrqentsize;
+	/* mmap of hdrq, must fit in 44 bits */
+	dma_addr_t rcvhdrq_phys;
+	dma_addr_t rcvhdrqtailaddr_phys;
+	struct ctxt_eager_bufs egrbufs;
+	/* this receive context's assigned PIO ACK send context */
+	struct send_context *sc;
+
+	/* dynamic receive available interrupt timeout */
+	u32 rcvavail_timeout;
+	/*
+	 * number of opens (including slave sub-contexts) on this instance
+	 * (ignoring forks, dup, etc. for now)
+	 */
+	int cnt;
+	/*
+	 * how much space to leave at start of eager TID entries for
+	 * protocol use, on each TID
+	 */
+	/* instead of calculating it */
+	unsigned ctxt;
+	/* non-zero if ctxt is being shared. */
+	u16 subctxt_cnt;
+	/* non-zero if ctxt is being shared. */
+	u16 subctxt_id;
+	u8 uuid[16];
+	/* job key */
+	u16 jkey;
+	/* number of RcvArray groups for this context. */
+	u32 rcv_array_groups;
+	/* index of first eager TID entry. */
+	u32 eager_base;
+	/* number of expected TID entries */
+	u32 expected_count;
+	/* index of first expected TID entry. */
+	u32 expected_base;
+	/* cursor into the exp group sets */
+	atomic_t tidcursor;
+	/* number of exp TID groups assigned to the ctxt */
+	u16 numtidgroups;
+	/* size of exp TID group fields in tidusemap */
+	u16 tidmapcnt;
+	/* exp TID group usage bitfield array */
+	unsigned long *tidusemap;
+	/* pinned pages for exp sends, allocated at open */
+	struct page **tid_pg_list;
+	/* dma handles for exp tid pages */
+	dma_addr_t *physshadow;
+	/* lock protecting all Expected TID data */
+	spinlock_t exp_lock;
+	/* number of pio bufs for this ctxt (all procs, if shared) */
+	u32 piocnt;
+	/* first pio buffer for this ctxt */
+	u32 pio_base;
+	/* chip offset of PIO buffers for this ctxt */
+	u32 piobufs;
+	/* per-context configuration flags */
+	u16 flags;
+	/* per-context event flags for fileops/intr communication */
+	unsigned long event_flags;
+	/* WAIT_RCV that timed out, no interrupt */
+	u32 rcvwait_to;
+	/* WAIT_PIO that timed out, no interrupt */
+	u32 piowait_to;
+	/* WAIT_RCV already happened, no wait */
+	u32 rcvnowait;
+	/* WAIT_PIO already happened, no wait */
+	u32 pionowait;
+	/* total number of polled urgent packets */
+	u32 urgent;
+	/* saved total number of polled urgent packets for poll edge trigger */
+	u32 urgent_poll;
+	/* pid of process using this ctxt */
+	pid_t pid;
+	pid_t subpid[HFI1_MAX_SHARED_CTXTS];
+	/* same size as task_struct .comm[], command that opened context */
+	char comm[16];
+	/* so file ops can get at unit */
+	struct hfi1_devdata *dd;
+	/* so functions that need physical port can get it easily */
+	struct hfi1_pportdata *ppd;
+	/* A page of memory for rcvhdrhead, rcvegrhead, rcvegrtail * N */
+	void *subctxt_uregbase;
+	/* An array of pages for the eager receive buffers * N */
+	void *subctxt_rcvegrbuf;
+	/* An array of pages for the eager header queue entries * N */
+	void *subctxt_rcvhdr_base;
+	/* The version of the library which opened this ctxt */
+	u32 userversion;
+	/* Bitmask of active slaves */
+	u32 active_slaves;
+	/* Type of packets or conditions we want to poll for */
+	u16 poll_type;
+	/* receive packet sequence counter */
+	u8 seq_cnt;
+	u8 redirect_seq_cnt;
+	/* ctxt rcvhdrq head offset */
+	u32 head;
+	u32 pkt_count;
+	/* QPs waiting for context processing */
+	struct list_head qp_wait_list;
+	/* interrupt handling */
+	u64 imask;	/* clear interrupt mask */
+	int ireg;	/* clear interrupt register */
+	unsigned numa_id; /* numa node of this context */
+	/* verbs stats per CTX */
+	struct hfi1_opcode_stats_perctx *opstats;
+	/*
+	 * This is the kernel thread that will keep making
+	 * progress on the user sdma requests behind the scenes.
+	 * There is one per context (shared contexts use the master's).
+	 */
+	struct task_struct *progress;
+	struct list_head sdma_queues;
+	spinlock_t sdma_qlock;
+
+#ifdef CONFIG_PRESCAN_RXQ
+	struct ps_state ps_state;
+#endif /* CONFIG_PRESCAN_RXQ */
+
+	/*
+	 * The interrupt handler for a particular receive context can vary
+	 * throughout it's lifetime. This is not a lock protected data member so
+	 * it must be updated atomically and the prev and new value must always
+	 * be valid. Worst case is we process an extra interrupt and up to 64
+	 * packets with the wrong interrupt handler.
+	 */
+	void (*do_interrupt)(struct hfi1_ctxtdata *rcd);
+};
+
+/*
+ * Represents a single packet at a high level. Put commonly computed things in
+ * here so we do not have to keep doing them over and over. The rule of thumb is
+ * if something is used one time to derive some value, store that something in
+ * here. If it is used multiple times, then store the result of that derivation
+ * in here.
+ */
+struct hfi1_packet {
+	void *ebuf;
+	void *hdr;
+	struct hfi1_ctxtdata *rcd;
+	__le32 *rhf_addr;
+	struct hfi1_qp *qp;
+	struct hfi1_other_headers *ohdr;
+	u64 rhf;
+	u32 maxcnt;
+	u32 rhqoff;
+	u32 hdrqtail;
+	int numpkt;
+	u16 tlen;
+	u16 hlen;
+	s16 etail;
+	u16 rsize;
+	u8 updegr;
+	u8 rcv_flags;
+	u8 etype;
+};
+
+static inline bool has_sc4_bit(struct hfi1_packet *p)
+{
+	return !!rhf_dc_info(p->rhf);
+}
+
+/*
+ * Private data for snoop/capture support.
+ */
+struct hfi1_snoop_data {
+	int mode_flag;
+	struct cdev cdev;
+	struct device *class_dev;
+	spinlock_t snoop_lock;
+	struct list_head queue;
+	wait_queue_head_t waitq;
+	void *filter_value;
+	int (*filter_callback)(void *hdr, void *data, void *value);
+	u64 dcc_cfg; /* saved value of DCC Cfg register */
+};
+
+/* snoop mode_flag values */
+#define HFI1_PORT_SNOOP_MODE     1U
+#define HFI1_PORT_CAPTURE_MODE   2U
+
+struct hfi1_sge_state;
+
+/*
+ * Get/Set IB link-level config parameters for f_get/set_ib_cfg()
+ * Mostly for MADs that set or query link parameters, also ipath
+ * config interfaces
+ */
+#define HFI1_IB_CFG_LIDLMC 0 /* LID (LS16b) and Mask (MS16b) */
+#define HFI1_IB_CFG_LWID_DG_ENB 1 /* allowed Link-width downgrade */
+#define HFI1_IB_CFG_LWID_ENB 2 /* allowed Link-width */
+#define HFI1_IB_CFG_LWID 3 /* currently active Link-width */
+#define HFI1_IB_CFG_SPD_ENB 4 /* allowed Link speeds */
+#define HFI1_IB_CFG_SPD 5 /* current Link spd */
+#define HFI1_IB_CFG_RXPOL_ENB 6 /* Auto-RX-polarity enable */
+#define HFI1_IB_CFG_LREV_ENB 7 /* Auto-Lane-reversal enable */
+#define HFI1_IB_CFG_LINKLATENCY 8 /* Link Latency (IB1.2 only) */
+#define HFI1_IB_CFG_HRTBT 9 /* IB heartbeat off/enable/auto; DDR/QDR only */
+#define HFI1_IB_CFG_OP_VLS 10 /* operational VLs */
+#define HFI1_IB_CFG_VL_HIGH_CAP 11 /* num of VL high priority weights */
+#define HFI1_IB_CFG_VL_LOW_CAP 12 /* num of VL low priority weights */
+#define HFI1_IB_CFG_OVERRUN_THRESH 13 /* IB overrun threshold */
+#define HFI1_IB_CFG_PHYERR_THRESH 14 /* IB PHY error threshold */
+#define HFI1_IB_CFG_LINKDEFAULT 15 /* IB link default (sleep/poll) */
+#define HFI1_IB_CFG_PKEYS 16 /* update partition keys */
+#define HFI1_IB_CFG_MTU 17 /* update MTU in IBC */
+#define HFI1_IB_CFG_VL_HIGH_LIMIT 19
+#define HFI1_IB_CFG_PMA_TICKS 20 /* PMA sample tick resolution */
+#define HFI1_IB_CFG_PORT 21 /* switch port we are connected to */
+
+/*
+ * HFI or Host Link States
+ *
+ * These describe the states the driver thinks the logical and physical
+ * states are in.  Used as an argument to set_link_state().  Implemented
+ * as bits for easy multi-state checking.  The actual state can only be
+ * one.
+ */
+#define __HLS_UP_INIT_BP	0
+#define __HLS_UP_ARMED_BP	1
+#define __HLS_UP_ACTIVE_BP	2
+#define __HLS_DN_DOWNDEF_BP	3	/* link down default */
+#define __HLS_DN_POLL_BP	4
+#define __HLS_DN_DISABLE_BP	5
+#define __HLS_DN_OFFLINE_BP	6
+#define __HLS_VERIFY_CAP_BP	7
+#define __HLS_GOING_UP_BP	8
+#define __HLS_GOING_OFFLINE_BP  9
+#define __HLS_LINK_COOLDOWN_BP 10
+
+#define HLS_UP_INIT	  (1 << __HLS_UP_INIT_BP)
+#define HLS_UP_ARMED	  (1 << __HLS_UP_ARMED_BP)
+#define HLS_UP_ACTIVE	  (1 << __HLS_UP_ACTIVE_BP)
+#define HLS_DN_DOWNDEF	  (1 << __HLS_DN_DOWNDEF_BP) /* link down default */
+#define HLS_DN_POLL	  (1 << __HLS_DN_POLL_BP)
+#define HLS_DN_DISABLE	  (1 << __HLS_DN_DISABLE_BP)
+#define HLS_DN_OFFLINE	  (1 << __HLS_DN_OFFLINE_BP)
+#define HLS_VERIFY_CAP	  (1 << __HLS_VERIFY_CAP_BP)
+#define HLS_GOING_UP	  (1 << __HLS_GOING_UP_BP)
+#define HLS_GOING_OFFLINE (1 << __HLS_GOING_OFFLINE_BP)
+#define HLS_LINK_COOLDOWN (1 << __HLS_LINK_COOLDOWN_BP)
+
+#define HLS_UP (HLS_UP_INIT | HLS_UP_ARMED | HLS_UP_ACTIVE)
+
+/* use this MTU size if none other is given */
+#define HFI1_DEFAULT_ACTIVE_MTU 8192
+/* use this MTU size as the default maximum */
+#define HFI1_DEFAULT_MAX_MTU 8192
+/* default partition key */
+#define DEFAULT_PKEY 0xffff
+
+/*
+ * Possible fabric manager config parameters for fm_{get,set}_table()
+ */
+#define FM_TBL_VL_HIGH_ARB		1 /* Get/set VL high prio weights */
+#define FM_TBL_VL_LOW_ARB		2 /* Get/set VL low prio weights */
+#define FM_TBL_BUFFER_CONTROL		3 /* Get/set Buffer Control */
+#define FM_TBL_SC2VLNT			4 /* Get/set SC->VLnt */
+#define FM_TBL_VL_PREEMPT_ELEMS		5 /* Get (no set) VL preempt elems */
+#define FM_TBL_VL_PREEMPT_MATRIX	6 /* Get (no set) VL preempt matrix */
+
+/*
+ * Possible "operations" for f_rcvctrl(ppd, op, ctxt)
+ * these are bits so they can be combined, e.g.
+ * HFI1_RCVCTRL_INTRAVAIL_ENB | HFI1_RCVCTRL_CTXT_ENB
+ */
+#define HFI1_RCVCTRL_TAILUPD_ENB 0x01
+#define HFI1_RCVCTRL_TAILUPD_DIS 0x02
+#define HFI1_RCVCTRL_CTXT_ENB 0x04
+#define HFI1_RCVCTRL_CTXT_DIS 0x08
+#define HFI1_RCVCTRL_INTRAVAIL_ENB 0x10
+#define HFI1_RCVCTRL_INTRAVAIL_DIS 0x20
+#define HFI1_RCVCTRL_PKEY_ENB 0x40  /* Note, default is enabled */
+#define HFI1_RCVCTRL_PKEY_DIS 0x80
+#define HFI1_RCVCTRL_TIDFLOW_ENB 0x0400
+#define HFI1_RCVCTRL_TIDFLOW_DIS 0x0800
+#define HFI1_RCVCTRL_ONE_PKT_EGR_ENB 0x1000
+#define HFI1_RCVCTRL_ONE_PKT_EGR_DIS 0x2000
+#define HFI1_RCVCTRL_NO_RHQ_DROP_ENB 0x4000
+#define HFI1_RCVCTRL_NO_RHQ_DROP_DIS 0x8000
+#define HFI1_RCVCTRL_NO_EGR_DROP_ENB 0x10000
+#define HFI1_RCVCTRL_NO_EGR_DROP_DIS 0x20000
+
+/* partition enforcement flags */
+#define HFI1_PART_ENFORCE_IN	0x1
+#define HFI1_PART_ENFORCE_OUT	0x2
+
+/* how often we check for synthetic counter wrap around */
+#define SYNTH_CNT_TIME 2
+
+/* Counter flags */
+#define CNTR_NORMAL		0x0 /* Normal counters, just read register */
+#define CNTR_SYNTH		0x1 /* Synthetic counters, saturate at all 1s */
+#define CNTR_DISABLED		0x2 /* Disable this counter */
+#define CNTR_32BIT		0x4 /* Simulate 64 bits for this counter */
+#define CNTR_VL			0x8 /* Per VL counter */
+#define CNTR_INVALID_VL		-1  /* Specifies invalid VL */
+#define CNTR_MODE_W		0x0
+#define CNTR_MODE_R		0x1
+
+/* VLs Supported/Operational */
+#define HFI1_MIN_VLS_SUPPORTED 1
+#define HFI1_MAX_VLS_SUPPORTED 8
+
+static inline void incr_cntr64(u64 *cntr)
+{
+	if (*cntr < (u64)-1LL)
+		(*cntr)++;
+}
+
+static inline void incr_cntr32(u32 *cntr)
+{
+	if (*cntr < (u32)-1LL)
+		(*cntr)++;
+}
+
+#define MAX_NAME_SIZE 64
+struct hfi1_msix_entry {
+	struct msix_entry msix;
+	void *arg;
+	char name[MAX_NAME_SIZE];
+	cpumask_var_t mask;
+};
+
+/* per-SL CCA information */
+struct cca_timer {
+	struct hrtimer hrtimer;
+	struct hfi1_pportdata *ppd; /* read-only */
+	int sl; /* read-only */
+	u16 ccti; /* read/write - current value of CCTI */
+};
+
+struct link_down_reason {
+	/*
+	 * SMA-facing value.  Should be set from .latest when
+	 * HLS_UP_* -> HLS_DN_* transition actually occurs.
+	 */
+	u8 sma;
+	u8 latest;
+};
+
+enum {
+	LO_PRIO_TABLE,
+	HI_PRIO_TABLE,
+	MAX_PRIO_TABLE
+};
+
+struct vl_arb_cache {
+	spinlock_t lock;
+	struct ib_vl_weight_elem table[VL_ARB_TABLE_SIZE];
+};
+
+/*
+ * The structure below encapsulates data relevant to a physical IB Port.
+ * Current chips support only one such port, but the separation
+ * clarifies things a bit. Note that to conform to IB conventions,
+ * port-numbers are one-based. The first or only port is port1.
+ */
+struct hfi1_pportdata {
+	struct hfi1_ibport ibport_data;
+
+	struct hfi1_devdata *dd;
+	struct kobject pport_cc_kobj;
+	struct kobject sc2vl_kobj;
+	struct kobject sl2sc_kobj;
+	struct kobject vl2mtu_kobj;
+
+	/* QSFP support */
+	struct qsfp_data qsfp_info;
+
+	/* GUID for this interface, in host order */
+	u64 guid;
+	/* GUID for peer interface, in host order */
+	u64 neighbor_guid;
+
+	/* up or down physical link state */
+	u32 linkup;
+
+	/*
+	 * this address is mapped read-only into user processes so they can
+	 * get status cheaply, whenever they want.  One qword of status per port
+	 */
+	u64 *statusp;
+
+	/* SendDMA related entries */
+
+	struct workqueue_struct *hfi1_wq;
+
+	/* move out of interrupt context */
+	struct work_struct link_vc_work;
+	struct work_struct link_up_work;
+	struct work_struct link_down_work;
+	struct work_struct sma_message_work;
+	struct work_struct freeze_work;
+	struct work_struct link_downgrade_work;
+	struct work_struct link_bounce_work;
+	/* host link state variables */
+	struct mutex hls_lock;
+	u32 host_link_state;
+
+	spinlock_t            sdma_alllock ____cacheline_aligned_in_smp;
+
+	u32 lstate;	/* logical link state */
+
+	/* these are the "32 bit" regs */
+
+	u32 ibmtu; /* The MTU programmed for this unit */
+	/*
+	 * Current max size IB packet (in bytes) including IB headers, that
+	 * we can send. Changes when ibmtu changes.
+	 */
+	u32 ibmaxlen;
+	u32 current_egress_rate; /* units [10^6 bits/sec] */
+	/* LID programmed for this instance */
+	u16 lid;
+	/* list of pkeys programmed; 0 if not set */
+	u16 pkeys[MAX_PKEY_VALUES];
+	u16 link_width_supported;
+	u16 link_width_downgrade_supported;
+	u16 link_speed_supported;
+	u16 link_width_enabled;
+	u16 link_width_downgrade_enabled;
+	u16 link_speed_enabled;
+	u16 link_width_active;
+	u16 link_width_downgrade_tx_active;
+	u16 link_width_downgrade_rx_active;
+	u16 link_speed_active;
+	u8 vls_supported;
+	u8 vls_operational;
+	/* LID mask control */
+	u8 lmc;
+	/* Rx Polarity inversion (compensate for ~tx on partner) */
+	u8 rx_pol_inv;
+
+	u8 hw_pidx;     /* physical port index */
+	u8 port;        /* IB port number and index into dd->pports - 1 */
+	/* type of neighbor node */
+	u8 neighbor_type;
+	u8 neighbor_normal;
+	u8 neighbor_fm_security; /* 1 if firmware checking is disabled */
+	u8 neighbor_port_number;
+	u8 is_sm_config_started;
+	u8 offline_disabled_reason;
+	u8 is_active_optimize_enabled;
+	u8 driver_link_ready;	/* driver ready for active link */
+	u8 link_enabled;	/* link enabled? */
+	u8 linkinit_reason;
+	u8 local_tx_rate;	/* rate given to 8051 firmware */
+
+	/* placeholders for IB MAD packet settings */
+	u8 overrun_threshold;
+	u8 phy_error_threshold;
+
+	/* used to override LED behavior */
+	u8 led_override;  /* Substituted for normal value, if non-zero */
+	u16 led_override_timeoff; /* delta to next timer event */
+	u8 led_override_vals[2]; /* Alternates per blink-frame */
+	u8 led_override_phase; /* Just counts, LSB picks from vals[] */
+	atomic_t led_override_timer_active;
+	/* Used to flash LEDs in override mode */
+	struct timer_list led_override_timer;
+	u32 sm_trap_qp;
+	u32 sa_qp;
+
+	/*
+	 * cca_timer_lock protects access to the per-SL cca_timer
+	 * structures (specifically the ccti member).
+	 */
+	spinlock_t cca_timer_lock ____cacheline_aligned_in_smp;
+	struct cca_timer cca_timer[OPA_MAX_SLS];
+
+	/* List of congestion control table entries */
+	struct ib_cc_table_entry_shadow ccti_entries[CC_TABLE_SHADOW_MAX];
+
+	/* congestion entries, each entry corresponding to a SL */
+	struct opa_congestion_setting_entry_shadow
+		congestion_entries[OPA_MAX_SLS];
+
+	/*
+	 * cc_state_lock protects (write) access to the per-port
+	 * struct cc_state.
+	 */
+	spinlock_t cc_state_lock ____cacheline_aligned_in_smp;
+
+	struct cc_state __rcu *cc_state;
+
+	/* Total number of congestion control table entries */
+	u16 total_cct_entry;
+
+	/* Bit map identifying service level */
+	u32 cc_sl_control_map;
+
+	/* CA's max number of 64 entry units in the congestion control table */
+	u8 cc_max_table_entries;
+
+	/* begin congestion log related entries
+	 * cc_log_lock protects all congestion log related data */
+	spinlock_t cc_log_lock ____cacheline_aligned_in_smp;
+	u8 threshold_cong_event_map[OPA_MAX_SLS/8];
+	u16 threshold_event_counter;
+	struct opa_hfi1_cong_log_event_internal cc_events[OPA_CONG_LOG_ELEMS];
+	int cc_log_idx; /* index for logging events */
+	int cc_mad_idx; /* index for reporting events */
+	/* end congestion log related entries */
+
+	struct vl_arb_cache vl_arb_cache[MAX_PRIO_TABLE];
+
+	/* port relative counter buffer */
+	u64 *cntrs;
+	/* port relative synthetic counter buffer */
+	u64 *scntrs;
+	/* we synthesize port_xmit_discards from several egress errors */
+	u64 port_xmit_discards;
+	u64 port_xmit_constraint_errors;
+	u64 port_rcv_constraint_errors;
+	/* count of 'link_err' interrupts from DC */
+	u64 link_downed;
+	/* number of times link retrained successfully */
+	u64 link_up;
+	/* port_ltp_crc_mode is returned in 'portinfo' MADs */
+	u16 port_ltp_crc_mode;
+	/* port_crc_mode_enabled is the crc we support */
+	u8 port_crc_mode_enabled;
+	/* mgmt_allowed is also returned in 'portinfo' MADs */
+	u8 mgmt_allowed;
+	u8 part_enforce; /* partition enforcement flags */
+	struct link_down_reason local_link_down_reason;
+	struct link_down_reason neigh_link_down_reason;
+	/* Value to be sent to link peer on LinkDown .*/
+	u8 remote_link_down_reason;
+	/* Error events that will cause a port bounce. */
+	u32 port_error_action;
+};
+
+typedef int (*rhf_rcv_function_ptr)(struct hfi1_packet *packet);
+
+typedef void (*opcode_handler)(struct hfi1_packet *packet);
+
+/* return values for the RHF receive functions */
+#define RHF_RCV_CONTINUE  0	/* keep going */
+#define RHF_RCV_DONE	  1	/* stop, this packet processed */
+#define RHF_RCV_REPROCESS 2	/* stop. retain this packet */
+
+struct rcv_array_data {
+	u8 group_size;
+	u16 ngroups;
+	u16 nctxt_extra;
+};
+
+struct per_vl_data {
+	u16 mtu;
+	struct send_context *sc;
+};
+
+/* 16 to directly index */
+#define PER_VL_SEND_CONTEXTS 16
+
+struct err_info_rcvport {
+	u8 status_and_code;
+	u64 packet_flit1;
+	u64 packet_flit2;
+};
+
+struct err_info_constraint {
+	u8 status;
+	u16 pkey;
+	u32 slid;
+};
+
+struct hfi1_temp {
+	unsigned int curr;       /* current temperature */
+	unsigned int lo_lim;     /* low temperature limit */
+	unsigned int hi_lim;     /* high temperature limit */
+	unsigned int crit_lim;   /* critical temperature limit */
+	u8 triggers;      /* temperature triggers */
+};
+
+/* device data struct now contains only "general per-device" info.
+ * fields related to a physical IB port are in a hfi1_pportdata struct.
+ */
+struct sdma_engine;
+struct sdma_vl_map;
+
+#define BOARD_VERS_MAX 96 /* how long the version string can be */
+#define SERIAL_MAX 16 /* length of the serial number */
+
+struct hfi1_devdata {
+	struct hfi1_ibdev verbs_dev;     /* must be first */
+	struct list_head list;
+	/* pointers to related structs for this device */
+	/* pci access data structure */
+	struct pci_dev *pcidev;
+	struct cdev user_cdev;
+	struct cdev diag_cdev;
+	struct cdev ui_cdev;
+	struct device *user_device;
+	struct device *diag_device;
+	struct device *ui_device;
+
+	/* mem-mapped pointer to base of chip regs */
+	u8 __iomem *kregbase;
+	/* end of mem-mapped chip space excluding sendbuf and user regs */
+	u8 __iomem *kregend;
+	/* physical address of chip for io_remap, etc. */
+	resource_size_t physaddr;
+	/* receive context data */
+	struct hfi1_ctxtdata **rcd;
+	/* send context data */
+	struct send_context_info *send_contexts;
+	/* map hardware send contexts to software index */
+	u8 *hw_to_sw;
+	/* spinlock for allocating and releasing send context resources */
+	spinlock_t sc_lock;
+	/* Per VL data. Enough for all VLs but not all elements are set/used. */
+	struct per_vl_data vld[PER_VL_SEND_CONTEXTS];
+	/* seqlock for sc2vl */
+	seqlock_t sc2vl_lock;
+	u64 sc2vl[4];
+	/* Send Context initialization lock. */
+	spinlock_t sc_init_lock;
+
+	/* fields common to all SDMA engines */
+
+	/* default flags to last descriptor */
+	u64 default_desc1;
+	volatile __le64                    *sdma_heads_dma; /* DMA'ed by chip */
+	dma_addr_t                          sdma_heads_phys;
+	void                               *sdma_pad_dma; /* DMA'ed by chip */
+	dma_addr_t                          sdma_pad_phys;
+	/* for deallocation */
+	size_t                              sdma_heads_size;
+	/* number from the chip */
+	u32                                 chip_sdma_engines;
+	/* num used */
+	u32                                 num_sdma;
+	/* lock for sdma_map */
+	spinlock_t                          sde_map_lock;
+	/* array of engines sized by num_sdma */
+	struct sdma_engine                 *per_sdma;
+	/* array of vl maps */
+	struct sdma_vl_map __rcu           *sdma_map;
+	/* SPC freeze waitqueue and variable */
+	wait_queue_head_t		  sdma_unfreeze_wq;
+	atomic_t			  sdma_unfreeze_count;
+
+
+	/* hfi1_pportdata, points to array of (physical) port-specific
+	 * data structs, indexed by pidx (0..n-1)
+	 */
+	struct hfi1_pportdata *pport;
+
+	/* mem-mapped pointer to base of PIO buffers */
+	void __iomem *piobase;
+	/*
+	 * write-combining mem-mapped pointer to base of RcvArray
+	 * memory.
+	 */
+	void __iomem *rcvarray_wc;
+	/*
+	 * credit return base - a per-NUMA range of DMA address that
+	 * the chip will use to update the per-context free counter
+	 */
+	struct credit_return_base *cr_base;
+
+	/* send context numbers and sizes for each type */
+	struct sc_config_sizes sc_sizes[SC_MAX];
+
+	u32 lcb_access_count;		/* count of LCB users */
+
+	char *boardname; /* human readable board info */
+
+	/* device (not port) flags, basically device capabilities */
+	u32 flags;
+
+	/* reset value */
+	u64 z_int_counter;
+	u64 z_rcv_limit;
+	/* percpu int_counter */
+	u64 __percpu *int_counter;
+	u64 __percpu *rcv_limit;
+
+	/* number of receive contexts in use by the driver */
+	u32 num_rcv_contexts;
+	/* number of pio send contexts in use by the driver */
+	u32 num_send_contexts;
+	/*
+	 * number of ctxts available for PSM open
+	 */
+	u32 freectxts;
+	/* base receive interrupt timeout, in CSR units */
+	u32 rcv_intr_timeout_csr;
+
+	u64 __iomem *egrtidbase;
+	spinlock_t sendctrl_lock; /* protect changes to SendCtrl */
+	spinlock_t rcvctrl_lock; /* protect changes to RcvCtrl */
+	/* around rcd and (user ctxts) ctxt_cnt use (intr vs free) */
+	spinlock_t uctxt_lock; /* rcd and user context changes */
+	/* exclusive access to 8051 */
+	spinlock_t dc8051_lock;
+	/* exclusive access to 8051 memory */
+	spinlock_t dc8051_memlock;
+	int dc8051_timed_out;	/* remember if the 8051 timed out */
+	/*
+	 * A page that will hold event notification bitmaps for all
+	 * contexts. This page will be mapped into all processes.
+	 */
+	unsigned long *events;
+	/*
+	 * per unit status, see also portdata statusp
+	 * mapped read-only into user processes so they can get unit and
+	 * IB link status cheaply
+	 */
+	struct hfi1_status *status;
+	u32 freezelen; /* max length of freezemsg */
+
+	/* revision register shadow */
+	u64 revision;
+	/* Base GUID for device (network order) */
+	u64 base_guid;
+
+	/* these are the "32 bit" regs */
+
+	/* value we put in kr_rcvhdrsize */
+	u32 rcvhdrsize;
+	/* number of receive contexts the chip supports */
+	u32 chip_rcv_contexts;
+	/* number of receive array entries */
+	u32 chip_rcv_array_count;
+	/* number of PIO send contexts the chip supports */
+	u32 chip_send_contexts;
+	/* number of bytes in the PIO memory buffer */
+	u32 chip_pio_mem_size;
+	/* number of bytes in the SDMA memory buffer */
+	u32 chip_sdma_mem_size;
+
+	/* size of each rcvegrbuffer */
+	u32 rcvegrbufsize;
+	/* log2 of above */
+	u16 rcvegrbufsize_shift;
+	/* both sides of the PCIe link are gen3 capable */
+	u8 link_gen3_capable;
+	/* localbus width (1, 2,4,8,16,32) from config space  */
+	u32 lbus_width;
+	/* localbus speed in MHz */
+	u32 lbus_speed;
+	int unit; /* unit # of this chip */
+	int node; /* home node of this chip */
+
+	/* save these PCI fields to restore after a reset */
+	u32 pcibar0;
+	u32 pcibar1;
+	u32 pci_rom;
+	u16 pci_command;
+	u16 pcie_devctl;
+	u16 pcie_lnkctl;
+	u16 pcie_devctl2;
+	u32 pci_msix0;
+	u32 pci_lnkctl3;
+	u32 pci_tph2;
+
+	/*
+	 * ASCII serial number, from flash, large enough for original
+	 * all digit strings, and longer serial number format
+	 */
+	u8 serial[SERIAL_MAX];
+	/* human readable board version */
+	u8 boardversion[BOARD_VERS_MAX];
+	u8 lbus_info[32]; /* human readable localbus info */
+	/* chip major rev, from CceRevision */
+	u8 majrev;
+	/* chip minor rev, from CceRevision */
+	u8 minrev;
+	/* hardware ID */
+	u8 hfi1_id;
+	/* implementation code */
+	u8 icode;
+	/* default link down value (poll/sleep) */
+	u8 link_default;
+	/* vAU of this device */
+	u8 vau;
+	/* vCU of this device */
+	u8 vcu;
+	/* link credits of this device */
+	u16 link_credits;
+	/* initial vl15 credits to use */
+	u16 vl15_init;
+
+	/* Misc small ints */
+	/* Number of physical ports available */
+	u8 num_pports;
+	/* Lowest context number which can be used by user processes */
+	u8 first_user_ctxt;
+	u8 n_krcv_queues;
+	u8 qos_shift;
+	u8 qpn_mask;
+
+	u16 rhf_offset; /* offset of RHF within receive header entry */
+	u16 irev;	/* implementation revision */
+	u16 dc8051_ver; /* 8051 firmware version */
+
+	struct platform_config_cache pcfg_cache;
+	/* control high-level access to qsfp */
+	struct mutex qsfp_i2c_mutex;
+
+	struct diag_client *diag_client;
+	spinlock_t hfi1_diag_trans_lock; /* protect diag observer ops */
+
+	u8 psxmitwait_supported;
+	/* cycle length of PS* counters in HW (in picoseconds) */
+	u16 psxmitwait_check_rate;
+	/* high volume overflow errors deferred to tasklet */
+	struct tasklet_struct error_tasklet;
+	/* per device cq worker */
+	struct kthread_worker *worker;
+
+	/* MSI-X information */
+	struct hfi1_msix_entry *msix_entries;
+	u32 num_msix_entries;
+
+	/* INTx information */
+	u32 requested_intx_irq;		/* did we request one? */
+	char intx_name[MAX_NAME_SIZE];	/* INTx name */
+
+	/* general interrupt: mask of handled interrupts */
+	u64 gi_mask[CCE_NUM_INT_CSRS];
+
+	struct rcv_array_data rcv_entries;
+
+	/*
+	 * 64 bit synthetic counters
+	 */
+	struct timer_list synth_stats_timer;
+
+	/*
+	 * device counters
+	 */
+	char *cntrnames;
+	size_t cntrnameslen;
+	size_t ndevcntrs;
+	u64 *cntrs;
+	u64 *scntrs;
+
+	/*
+	 * remembered values for synthetic counters
+	 */
+	u64 last_tx;
+	u64 last_rx;
+
+	/*
+	 * per-port counters
+	 */
+	size_t nportcntrs;
+	char *portcntrnames;
+	size_t portcntrnameslen;
+
+	struct hfi1_snoop_data hfi1_snoop;
+
+	struct err_info_rcvport err_info_rcvport;
+	struct err_info_constraint err_info_rcv_constraint;
+	struct err_info_constraint err_info_xmit_constraint;
+	u8 err_info_uncorrectable;
+	u8 err_info_fmconfig;
+
+	atomic_t drop_packet;
+	u8 do_drop;
+
+	/* receive interrupt functions */
+	rhf_rcv_function_ptr *rhf_rcv_function_map;
+	rhf_rcv_function_ptr normal_rhf_rcv_functions[8];
+
+	/*
+	 * Handlers for outgoing data so that snoop/capture does not
+	 * have to have its hooks in the send path
+	 */
+	int (*process_pio_send)(struct hfi1_qp *qp, struct ahg_ib_header *ibhdr,
+				u32 hdrwords, struct hfi1_sge_state *ss,
+				u32 len, u32 plen, u32 dwords, u64 pbc);
+	int (*process_dma_send)(struct hfi1_qp *qp, struct ahg_ib_header *ibhdr,
+				u32 hdrwords, struct hfi1_sge_state *ss,
+				u32 len, u32 plen, u32 dwords, u64 pbc);
+	void (*pio_inline_send)(struct hfi1_devdata *dd, struct pio_buf *pbuf,
+				u64 pbc, const void *from, size_t count);
+
+	/* OUI comes from the HW. Used everywhere as 3 separate bytes. */
+	u8 oui1;
+	u8 oui2;
+	u8 oui3;
+	/* Timer and counter used to detect RcvBufOvflCnt changes */
+	struct timer_list rcverr_timer;
+	u32 rcv_ovfl_cnt;
+
+	int assigned_node_id;
+	wait_queue_head_t event_queue;
+
+	/* Save the enabled LCB error bits */
+	u64 lcb_err_en;
+	u8 dc_shutdown;
+};
+
+/* 8051 firmware version helper */
+#define dc8051_ver(a, b) ((a) << 8 | (b))
+
+/* f_put_tid types */
+#define PT_EXPECTED 0
+#define PT_EAGER    1
+#define PT_INVALID  2
+
+/* Private data for file operations */
+struct hfi1_filedata {
+	struct hfi1_ctxtdata *uctxt;
+	unsigned subctxt;
+	struct hfi1_user_sdma_comp_q *cq;
+	struct hfi1_user_sdma_pkt_q *pq;
+	/* for cpu affinity; -1 if none */
+	int rec_cpu_num;
+};
+
+extern struct list_head hfi1_dev_list;
+extern spinlock_t hfi1_devs_lock;
+struct hfi1_devdata *hfi1_lookup(int unit);
+extern u32 hfi1_cpulist_count;
+extern unsigned long *hfi1_cpulist;
+
+extern unsigned int snoop_drop_send;
+extern unsigned int snoop_force_capture;
+int hfi1_init(struct hfi1_devdata *, int);
+int hfi1_count_units(int *npresentp, int *nupp);
+int hfi1_count_active_units(void);
+
+int hfi1_diag_add(struct hfi1_devdata *);
+void hfi1_diag_remove(struct hfi1_devdata *);
+void handle_linkup_change(struct hfi1_devdata *dd, u32 linkup);
+
+void handle_user_interrupt(struct hfi1_ctxtdata *rcd);
+
+int hfi1_create_rcvhdrq(struct hfi1_devdata *, struct hfi1_ctxtdata *);
+int hfi1_setup_eagerbufs(struct hfi1_ctxtdata *);
+int hfi1_create_ctxts(struct hfi1_devdata *dd);
+struct hfi1_ctxtdata *hfi1_create_ctxtdata(struct hfi1_pportdata *, u32);
+void hfi1_init_pportdata(struct pci_dev *, struct hfi1_pportdata *,
+			 struct hfi1_devdata *, u8, u8);
+void hfi1_free_ctxtdata(struct hfi1_devdata *, struct hfi1_ctxtdata *);
+
+void handle_receive_interrupt(struct hfi1_ctxtdata *);
+void handle_receive_interrupt_nodma_rtail(struct hfi1_ctxtdata *rcd);
+void handle_receive_interrupt_dma_rtail(struct hfi1_ctxtdata *rcd);
+int hfi1_reset_device(int);
+
+/* return the driver's idea of the logical OPA port state */
+static inline u32 driver_lstate(struct hfi1_pportdata *ppd)
+{
+	return ppd->lstate; /* use the cached value */
+}
+
+static inline u16 generate_jkey(kuid_t uid)
+{
+	return from_kuid(current_user_ns(), uid) & 0xffff;
+}
+
+/*
+ * active_egress_rate
+ *
+ * returns the active egress rate in units of [10^6 bits/sec]
+ */
+static inline u32 active_egress_rate(struct hfi1_pportdata *ppd)
+{
+	u16 link_speed = ppd->link_speed_active;
+	u16 link_width = ppd->link_width_active;
+	u32 egress_rate;
+
+	if (link_speed == OPA_LINK_SPEED_25G)
+		egress_rate = 25000;
+	else /* assume OPA_LINK_SPEED_12_5G */
+		egress_rate = 12500;
+
+	switch (link_width) {
+	case OPA_LINK_WIDTH_4X:
+		egress_rate *= 4;
+		break;
+	case OPA_LINK_WIDTH_3X:
+		egress_rate *= 3;
+		break;
+	case OPA_LINK_WIDTH_2X:
+		egress_rate *= 2;
+		break;
+	default:
+		/* assume IB_WIDTH_1X */
+		break;
+	}
+
+	return egress_rate;
+}
+
+/*
+ * egress_cycles
+ *
+ * Returns the number of 'fabric clock cycles' to egress a packet
+ * of length 'len' bytes, at 'rate' Mbit/s. Since the fabric clock
+ * rate is (approximately) 805 MHz, the units of the returned value
+ * are (1/805 MHz).
+ */
+static inline u32 egress_cycles(u32 len, u32 rate)
+{
+	u32 cycles;
+
+	/*
+	 * cycles is:
+	 *
+	 *          (length) [bits] / (rate) [bits/sec]
+	 *  ---------------------------------------------------
+	 *  fabric_clock_period == 1 /(805 * 10^6) [cycles/sec]
+	 */
+
+	cycles = len * 8; /* bits */
+	cycles *= 805;
+	cycles /= rate;
+
+	return cycles;
+}
+
+void set_link_ipg(struct hfi1_pportdata *ppd);
+void process_becn(struct hfi1_pportdata *ppd, u8 sl,  u16 rlid, u32 lqpn,
+		  u32 rqpn, u8 svc_type);
+void return_cnp(struct hfi1_ibport *ibp, struct hfi1_qp *qp, u32 remote_qpn,
+		u32 pkey, u32 slid, u32 dlid, u8 sc5,
+		const struct ib_grh *old_grh);
+
+#define PACKET_EGRESS_TIMEOUT 350
+static inline void pause_for_credit_return(struct hfi1_devdata *dd)
+{
+	/* Pause at least 1us, to ensure chip returns all credits */
+	u32 usec = cclock_to_ns(dd, PACKET_EGRESS_TIMEOUT) / 1000;
+
+	udelay(usec ? usec : 1);
+}
+
+/**
+ * sc_to_vlt() reverse lookup sc to vl
+ * @dd - devdata
+ * @sc5 - 5 bit sc
+ */
+static inline u8 sc_to_vlt(struct hfi1_devdata *dd, u8 sc5)
+{
+	unsigned seq;
+	u8 rval;
+
+	if (sc5 >= OPA_MAX_SCS)
+		return (u8)(0xff);
+
+	do {
+		seq = read_seqbegin(&dd->sc2vl_lock);
+		rval = *(((u8 *)dd->sc2vl) + sc5);
+	} while (read_seqretry(&dd->sc2vl_lock, seq));
+
+	return rval;
+}
+
+#define PKEY_MEMBER_MASK 0x8000
+#define PKEY_LOW_15_MASK 0x7fff
+
+/*
+ * ingress_pkey_matches_entry - return 1 if the pkey matches ent (ent
+ * being an entry from the ingress partition key table), return 0
+ * otherwise. Use the matching criteria for ingress partition keys
+ * specified in the OPAv1 spec., section 9.10.14.
+ */
+static inline int ingress_pkey_matches_entry(u16 pkey, u16 ent)
+{
+	u16 mkey = pkey & PKEY_LOW_15_MASK;
+	u16 ment = ent & PKEY_LOW_15_MASK;
+
+	if (mkey == ment) {
+		/*
+		 * If pkey[15] is clear (limited partition member),
+		 * is bit 15 in the corresponding table element
+		 * clear (limited member)?
+		 */
+		if (!(pkey & PKEY_MEMBER_MASK))
+			return !!(ent & PKEY_MEMBER_MASK);
+		return 1;
+	}
+	return 0;
+}
+
+/*
+ * ingress_pkey_table_search - search the entire pkey table for
+ * an entry which matches 'pkey'. return 0 if a match is found,
+ * and 1 otherwise.
+ */
+static int ingress_pkey_table_search(struct hfi1_pportdata *ppd, u16 pkey)
+{
+	int i;
+
+	for (i = 0; i < MAX_PKEY_VALUES; i++) {
+		if (ingress_pkey_matches_entry(pkey, ppd->pkeys[i]))
+			return 0;
+	}
+	return 1;
+}
+
+/*
+ * ingress_pkey_table_fail - record a failure of ingress pkey validation,
+ * i.e., increment port_rcv_constraint_errors for the port, and record
+ * the 'error info' for this failure.
+ */
+static void ingress_pkey_table_fail(struct hfi1_pportdata *ppd, u16 pkey,
+				    u16 slid)
+{
+	struct hfi1_devdata *dd = ppd->dd;
+
+	incr_cntr64(&ppd->port_rcv_constraint_errors);
+	if (!(dd->err_info_rcv_constraint.status & OPA_EI_STATUS_SMASK)) {
+		dd->err_info_rcv_constraint.status |= OPA_EI_STATUS_SMASK;
+		dd->err_info_rcv_constraint.slid = slid;
+		dd->err_info_rcv_constraint.pkey = pkey;
+	}
+}
+
+/*
+ * ingress_pkey_check - Return 0 if the ingress pkey is valid, return 1
+ * otherwise. Use the criteria in the OPAv1 spec, section 9.10.14. idx
+ * is a hint as to the best place in the partition key table to begin
+ * searching. This function should not be called on the data path because
+ * of performance reasons. On datapath pkey check is expected to be done
+ * by HW and rcv_pkey_check function should be called instead.
+ */
+static inline int ingress_pkey_check(struct hfi1_pportdata *ppd, u16 pkey,
+				     u8 sc5, u8 idx, u16 slid)
+{
+	if (!(ppd->part_enforce & HFI1_PART_ENFORCE_IN))
+		return 0;
+
+	/* If SC15, pkey[0:14] must be 0x7fff */
+	if ((sc5 == 0xf) && ((pkey & PKEY_LOW_15_MASK) != PKEY_LOW_15_MASK))
+		goto bad;
+
+	/* Is the pkey = 0x0, or 0x8000? */
+	if ((pkey & PKEY_LOW_15_MASK) == 0)
+		goto bad;
+
+	/* The most likely matching pkey has index 'idx' */
+	if (ingress_pkey_matches_entry(pkey, ppd->pkeys[idx]))
+		return 0;
+
+	/* no match - try the whole table */
+	if (!ingress_pkey_table_search(ppd, pkey))
+		return 0;
+
+bad:
+	ingress_pkey_table_fail(ppd, pkey, slid);
+	return 1;
+}
+
+/*
+ * rcv_pkey_check - Return 0 if the ingress pkey is valid, return 1
+ * otherwise. It only ensures pkey is vlid for QP0. This function
+ * should be called on the data path instead of ingress_pkey_check
+ * as on data path, pkey check is done by HW (except for QP0).
+ */
+static inline int rcv_pkey_check(struct hfi1_pportdata *ppd, u16 pkey,
+				 u8 sc5, u16 slid)
+{
+	if (!(ppd->part_enforce & HFI1_PART_ENFORCE_IN))
+		return 0;
+
+	/* If SC15, pkey[0:14] must be 0x7fff */
+	if ((sc5 == 0xf) && ((pkey & PKEY_LOW_15_MASK) != PKEY_LOW_15_MASK))
+		goto bad;
+
+	return 0;
+bad:
+	ingress_pkey_table_fail(ppd, pkey, slid);
+	return 1;
+}
+
+/* MTU handling */
+
+/* MTU enumeration, 256-4k match IB */
+#define OPA_MTU_0     0
+#define OPA_MTU_256   1
+#define OPA_MTU_512   2
+#define OPA_MTU_1024  3
+#define OPA_MTU_2048  4
+#define OPA_MTU_4096  5
+
+u32 lrh_max_header_bytes(struct hfi1_devdata *dd);
+int mtu_to_enum(u32 mtu, int default_if_bad);
+u16 enum_to_mtu(int);
+static inline int valid_ib_mtu(unsigned int mtu)
+{
+	return mtu == 256 || mtu == 512 ||
+		mtu == 1024 || mtu == 2048 ||
+		mtu == 4096;
+}
+static inline int valid_opa_max_mtu(unsigned int mtu)
+{
+	return mtu >= 2048 &&
+		(valid_ib_mtu(mtu) || mtu == 8192 || mtu == 10240);
+}
+
+int set_mtu(struct hfi1_pportdata *);
+
+int hfi1_set_lid(struct hfi1_pportdata *, u32, u8);
+void hfi1_disable_after_error(struct hfi1_devdata *);
+int hfi1_set_uevent_bits(struct hfi1_pportdata *, const int);
+int hfi1_rcvbuf_validate(u32, u8, u16 *);
+
+int fm_get_table(struct hfi1_pportdata *, int, void *);
+int fm_set_table(struct hfi1_pportdata *, int, void *);
+
+void set_up_vl15(struct hfi1_devdata *dd, u8 vau, u16 vl15buf);
+void reset_link_credits(struct hfi1_devdata *dd);
+void assign_remote_cm_au_table(struct hfi1_devdata *dd, u8 vcu);
+
+int snoop_recv_handler(struct hfi1_packet *packet);
+int snoop_send_dma_handler(struct hfi1_qp *qp, struct ahg_ib_header *ibhdr,
+			   u32 hdrwords, struct hfi1_sge_state *ss, u32 len,
+			   u32 plen, u32 dwords, u64 pbc);
+int snoop_send_pio_handler(struct hfi1_qp *qp, struct ahg_ib_header *ibhdr,
+			   u32 hdrwords, struct hfi1_sge_state *ss, u32 len,
+			   u32 plen, u32 dwords, u64 pbc);
+void snoop_inline_pio_send(struct hfi1_devdata *dd, struct pio_buf *pbuf,
+			   u64 pbc, const void *from, size_t count);
+
+/* for use in system calls, where we want to know device type, etc. */
+#define ctxt_fp(fp) \
+	(((struct hfi1_filedata *)(fp)->private_data)->uctxt)
+#define subctxt_fp(fp) \
+	(((struct hfi1_filedata *)(fp)->private_data)->subctxt)
+#define tidcursor_fp(fp) \
+	(((struct hfi1_filedata *)(fp)->private_data)->tidcursor)
+#define user_sdma_pkt_fp(fp) \
+	(((struct hfi1_filedata *)(fp)->private_data)->pq)
+#define user_sdma_comp_fp(fp) \
+	(((struct hfi1_filedata *)(fp)->private_data)->cq)
+
+static inline struct hfi1_devdata *dd_from_ppd(struct hfi1_pportdata *ppd)
+{
+	return ppd->dd;
+}
+
+static inline struct hfi1_devdata *dd_from_dev(struct hfi1_ibdev *dev)
+{
+	return container_of(dev, struct hfi1_devdata, verbs_dev);
+}
+
+static inline struct hfi1_devdata *dd_from_ibdev(struct ib_device *ibdev)
+{
+	return dd_from_dev(to_idev(ibdev));
+}
+
+static inline struct hfi1_pportdata *ppd_from_ibp(struct hfi1_ibport *ibp)
+{
+	return container_of(ibp, struct hfi1_pportdata, ibport_data);
+}
+
+static inline struct hfi1_ibport *to_iport(struct ib_device *ibdev, u8 port)
+{
+	struct hfi1_devdata *dd = dd_from_ibdev(ibdev);
+	unsigned pidx = port - 1; /* IB number port from 1, hdw from 0 */
+
+	WARN_ON(pidx >= dd->num_pports);
+	return &dd->pport[pidx].ibport_data;
+}
+
+/*
+ * Return the indexed PKEY from the port PKEY table.
+ */
+static inline u16 hfi1_get_pkey(struct hfi1_ibport *ibp, unsigned index)
+{
+	struct hfi1_pportdata *ppd = ppd_from_ibp(ibp);
+	u16 ret;
+
+	if (index >= ARRAY_SIZE(ppd->pkeys))
+		ret = 0;
+	else
+		ret = ppd->pkeys[index];
+
+	return ret;
+}
+
+/*
+ * Readers of cc_state must call get_cc_state() under rcu_read_lock().
+ * Writers of cc_state must call get_cc_state() under cc_state_lock.
+ */
+static inline struct cc_state *get_cc_state(struct hfi1_pportdata *ppd)
+{
+	return rcu_dereference(ppd->cc_state);
+}
+
+/*
+ * values for dd->flags (_device_ related flags)
+ */
+#define HFI1_INITTED           0x1    /* chip and driver up and initted */
+#define HFI1_PRESENT           0x2    /* chip accesses can be done */
+#define HFI1_FROZEN            0x4    /* chip in SPC freeze */
+#define HFI1_HAS_SDMA_TIMEOUT  0x8
+#define HFI1_HAS_SEND_DMA      0x10   /* Supports Send DMA */
+#define HFI1_FORCED_FREEZE     0x80   /* driver forced freeze mode */
+#define HFI1_DO_INIT_ASIC      0x100  /* This device will init the ASIC */
+
+/* IB dword length mask in PBC (lower 11 bits); same for all chips */
+#define HFI1_PBC_LENGTH_MASK                     ((1 << 11) - 1)
+
+
+/* ctxt_flag bit offsets */
+		/* context has been setup */
+#define HFI1_CTXT_SETUP_DONE 1
+		/* waiting for a packet to arrive */
+#define HFI1_CTXT_WAITING_RCV   2
+		/* master has not finished initializing */
+#define HFI1_CTXT_MASTER_UNINIT 4
+		/* waiting for an urgent packet to arrive */
+#define HFI1_CTXT_WAITING_URG 5
+
+/* free up any allocated data at closes */
+struct hfi1_devdata *hfi1_init_dd(struct pci_dev *,
+				  const struct pci_device_id *);
+void hfi1_free_devdata(struct hfi1_devdata *);
+void cc_state_reclaim(struct rcu_head *rcu);
+struct hfi1_devdata *hfi1_alloc_devdata(struct pci_dev *pdev, size_t extra);
+
+/*
+ * Set LED override, only the two LSBs have "public" meaning, but
+ * any non-zero value substitutes them for the Link and LinkTrain
+ * LED states.
+ */
+#define HFI1_LED_PHYS 1 /* Physical (linktraining) GREEN LED */
+#define HFI1_LED_LOG 2  /* Logical (link) YELLOW LED */
+void hfi1_set_led_override(struct hfi1_pportdata *ppd, unsigned int val);
+
+#define HFI1_CREDIT_RETURN_RATE (100)
+
+/*
+ * The number of words for the KDETH protocol field.  If this is
+ * larger then the actual field used, then part of the payload
+ * will be in the header.
+ *
+ * Optimally, we want this sized so that a typical case will
+ * use full cache lines.  The typical local KDETH header would
+ * be:
+ *
+ *	Bytes	Field
+ *	  8	LRH
+ *	 12	BHT
+ *	 ??	KDETH
+ *	  8	RHF
+ *	---
+ *	 28 + KDETH
+ *
+ * For a 64-byte cache line, KDETH would need to be 36 bytes or 9 DWORDS
+ */
+#define DEFAULT_RCVHDRSIZE 9
+
+/*
+ * Maximal header byte count:
+ *
+ *	Bytes	Field
+ *	  8	LRH
+ *	 40	GRH (optional)
+ *	 12	BTH
+ *	 ??	KDETH
+ *	  8	RHF
+ *	---
+ *	 68 + KDETH
+ *
+ * We also want to maintain a cache line alignment to assist DMA'ing
+ * of the header bytes.  Round up to a good size.
+ */
+#define DEFAULT_RCVHDR_ENTSIZE 32
+
+int hfi1_get_user_pages(unsigned long, size_t, struct page **);
+void hfi1_release_user_pages(struct page **, size_t);
+
+static inline void clear_rcvhdrtail(const struct hfi1_ctxtdata *rcd)
+{
+	*((u64 *) rcd->rcvhdrtail_kvaddr) = 0ULL;
+}
+
+static inline u32 get_rcvhdrtail(const struct hfi1_ctxtdata *rcd)
+{
+	/*
+	 * volatile because it's a DMA target from the chip, routine is
+	 * inlined, and don't want register caching or reordering.
+	 */
+	return (u32) le64_to_cpu(*rcd->rcvhdrtail_kvaddr);
+}
+
+/*
+ * sysfs interface.
+ */
+
+extern const char ib_hfi1_version[];
+
+int hfi1_device_create(struct hfi1_devdata *);
+void hfi1_device_remove(struct hfi1_devdata *);
+
+int hfi1_create_port_files(struct ib_device *ibdev, u8 port_num,
+			   struct kobject *kobj);
+int hfi1_verbs_register_sysfs(struct hfi1_devdata *);
+void hfi1_verbs_unregister_sysfs(struct hfi1_devdata *);
+/* Hook for sysfs read of QSFP */
+int qsfp_dump(struct hfi1_pportdata *ppd, char *buf, int len);
+
+int hfi1_pcie_init(struct pci_dev *, const struct pci_device_id *);
+void hfi1_pcie_cleanup(struct pci_dev *);
+int hfi1_pcie_ddinit(struct hfi1_devdata *, struct pci_dev *,
+		     const struct pci_device_id *);
+void hfi1_pcie_ddcleanup(struct hfi1_devdata *);
+void hfi1_pcie_flr(struct hfi1_devdata *);
+int pcie_speeds(struct hfi1_devdata *);
+void request_msix(struct hfi1_devdata *, u32 *, struct hfi1_msix_entry *);
+void hfi1_enable_intx(struct pci_dev *);
+void hfi1_nomsix(struct hfi1_devdata *);
+void restore_pci_variables(struct hfi1_devdata *dd);
+int do_pcie_gen3_transition(struct hfi1_devdata *dd);
+int parse_platform_config(struct hfi1_devdata *dd);
+int get_platform_config_field(struct hfi1_devdata *dd,
+			enum platform_config_table_type_encoding table_type,
+			int table_index, int field_index, u32 *data, u32 len);
+
+dma_addr_t hfi1_map_page(struct pci_dev *, struct page *, unsigned long,
+			 size_t, int);
+const char *get_unit_name(int unit);
+
+/*
+ * Flush write combining store buffers (if present) and perform a write
+ * barrier.
+ */
+static inline void flush_wc(void)
+{
+	asm volatile("sfence" : : : "memory");
+}
+
+void handle_eflags(struct hfi1_packet *packet);
+int process_receive_ib(struct hfi1_packet *packet);
+int process_receive_bypass(struct hfi1_packet *packet);
+int process_receive_error(struct hfi1_packet *packet);
+int kdeth_process_expected(struct hfi1_packet *packet);
+int kdeth_process_eager(struct hfi1_packet *packet);
+int process_receive_invalid(struct hfi1_packet *packet);
+
+extern rhf_rcv_function_ptr snoop_rhf_rcv_functions[8];
+
+void update_sge(struct hfi1_sge_state *ss, u32 length);
+
+/* global module parameter variables */
+extern unsigned int hfi1_max_mtu;
+extern unsigned int hfi1_cu;
+extern unsigned int user_credit_return_threshold;
+extern uint num_rcv_contexts;
+extern unsigned n_krcvqs;
+extern u8 krcvqs[];
+extern int krcvqsset;
+extern uint kdeth_qp;
+extern uint loopback;
+extern uint quick_linkup;
+extern uint rcv_intr_timeout;
+extern uint rcv_intr_count;
+extern uint rcv_intr_dynamic;
+extern ushort link_crc_mask;
+
+extern struct mutex hfi1_mutex;
+
+/* Number of seconds before our card status check...  */
+#define STATUS_TIMEOUT 60
+
+#define DRIVER_NAME		"hfi1"
+#define HFI1_USER_MINOR_BASE     0
+#define HFI1_TRACE_MINOR         127
+#define HFI1_DIAGPKT_MINOR       128
+#define HFI1_DIAG_MINOR_BASE     129
+#define HFI1_SNOOP_CAPTURE_BASE  200
+#define HFI1_NMINORS             255
+
+#define PCI_VENDOR_ID_INTEL 0x8086
+#define PCI_DEVICE_ID_INTEL0 0x24f0
+#define PCI_DEVICE_ID_INTEL1 0x24f1
+
+#define HFI1_PKT_USER_SC_INTEGRITY					    \
+	(SEND_CTXT_CHECK_ENABLE_DISALLOW_NON_KDETH_PACKETS_SMASK	    \
+	| SEND_CTXT_CHECK_ENABLE_DISALLOW_BYPASS_SMASK		    \
+	| SEND_CTXT_CHECK_ENABLE_DISALLOW_GRH_SMASK)
+
+#define HFI1_PKT_KERNEL_SC_INTEGRITY					    \
+	(SEND_CTXT_CHECK_ENABLE_DISALLOW_KDETH_PACKETS_SMASK)
+
+static inline u64 hfi1_pkt_default_send_ctxt_mask(struct hfi1_devdata *dd,
+						  u16 ctxt_type)
+{
+	u64 base_sc_integrity =
+	SEND_CTXT_CHECK_ENABLE_DISALLOW_BYPASS_BAD_PKT_LEN_SMASK
+	| SEND_CTXT_CHECK_ENABLE_DISALLOW_PBC_STATIC_RATE_CONTROL_SMASK
+	| SEND_CTXT_CHECK_ENABLE_DISALLOW_TOO_LONG_BYPASS_PACKETS_SMASK
+	| SEND_CTXT_CHECK_ENABLE_DISALLOW_TOO_LONG_IB_PACKETS_SMASK
+	| SEND_CTXT_CHECK_ENABLE_DISALLOW_BAD_PKT_LEN_SMASK
+	| SEND_CTXT_CHECK_ENABLE_DISALLOW_PBC_TEST_SMASK
+	| SEND_CTXT_CHECK_ENABLE_DISALLOW_TOO_SMALL_BYPASS_PACKETS_SMASK
+	| SEND_CTXT_CHECK_ENABLE_DISALLOW_TOO_SMALL_IB_PACKETS_SMASK
+	| SEND_CTXT_CHECK_ENABLE_DISALLOW_RAW_IPV6_SMASK
+	| SEND_CTXT_CHECK_ENABLE_DISALLOW_RAW_SMASK
+	| SEND_CTXT_CHECK_ENABLE_CHECK_BYPASS_VL_MAPPING_SMASK
+	| SEND_CTXT_CHECK_ENABLE_CHECK_VL_MAPPING_SMASK
+	| SEND_CTXT_CHECK_ENABLE_CHECK_OPCODE_SMASK
+	| SEND_CTXT_CHECK_ENABLE_CHECK_SLID_SMASK
+	| SEND_CTXT_CHECK_ENABLE_CHECK_JOB_KEY_SMASK
+	| SEND_CTXT_CHECK_ENABLE_CHECK_VL_SMASK
+	| SEND_CTXT_CHECK_ENABLE_CHECK_ENABLE_SMASK;
+
+	if (ctxt_type == SC_USER)
+		base_sc_integrity |= HFI1_PKT_USER_SC_INTEGRITY;
+	else
+		base_sc_integrity |= HFI1_PKT_KERNEL_SC_INTEGRITY;
+
+	if (is_a0(dd))
+		/* turn off send-side job key checks - A0 erratum */
+		return base_sc_integrity &
+		       ~SEND_CTXT_CHECK_ENABLE_CHECK_JOB_KEY_SMASK;
+	return base_sc_integrity;
+}
+
+static inline u64 hfi1_pkt_base_sdma_integrity(struct hfi1_devdata *dd)
+{
+	u64 base_sdma_integrity =
+	SEND_DMA_CHECK_ENABLE_DISALLOW_BYPASS_BAD_PKT_LEN_SMASK
+	| SEND_DMA_CHECK_ENABLE_DISALLOW_PBC_STATIC_RATE_CONTROL_SMASK
+	| SEND_DMA_CHECK_ENABLE_DISALLOW_TOO_LONG_BYPASS_PACKETS_SMASK
+	| SEND_DMA_CHECK_ENABLE_DISALLOW_TOO_LONG_IB_PACKETS_SMASK
+	| SEND_DMA_CHECK_ENABLE_DISALLOW_BAD_PKT_LEN_SMASK
+	| SEND_DMA_CHECK_ENABLE_DISALLOW_TOO_SMALL_BYPASS_PACKETS_SMASK
+	| SEND_DMA_CHECK_ENABLE_DISALLOW_TOO_SMALL_IB_PACKETS_SMASK
+	| SEND_DMA_CHECK_ENABLE_DISALLOW_RAW_IPV6_SMASK
+	| SEND_DMA_CHECK_ENABLE_DISALLOW_RAW_SMASK
+	| SEND_DMA_CHECK_ENABLE_CHECK_BYPASS_VL_MAPPING_SMASK
+	| SEND_DMA_CHECK_ENABLE_CHECK_VL_MAPPING_SMASK
+	| SEND_DMA_CHECK_ENABLE_CHECK_OPCODE_SMASK
+	| SEND_DMA_CHECK_ENABLE_CHECK_SLID_SMASK
+	| SEND_DMA_CHECK_ENABLE_CHECK_JOB_KEY_SMASK
+	| SEND_DMA_CHECK_ENABLE_CHECK_VL_SMASK
+	| SEND_DMA_CHECK_ENABLE_CHECK_ENABLE_SMASK;
+
+	if (is_a0(dd))
+		/* turn off send-side job key checks - A0 erratum */
+		return base_sdma_integrity &
+		       ~SEND_DMA_CHECK_ENABLE_CHECK_JOB_KEY_SMASK;
+	return base_sdma_integrity;
+}
+
+/*
+ * hfi1_early_err is used (only!) to print early errors before devdata is
+ * allocated, or when dd->pcidev may not be valid, and at the tail end of
+ * cleanup when devdata may have been freed, etc.  hfi1_dev_porterr is
+ * the same as dd_dev_err, but is used when the message really needs
+ * the IB port# to be definitive as to what's happening..
+ */
+#define hfi1_early_err(dev, fmt, ...) \
+	dev_err(dev, fmt, ##__VA_ARGS__)
+
+#define hfi1_early_info(dev, fmt, ...) \
+	dev_info(dev, fmt, ##__VA_ARGS__)
+
+#define dd_dev_emerg(dd, fmt, ...) \
+	dev_emerg(&(dd)->pcidev->dev, "%s: " fmt, \
+		  get_unit_name((dd)->unit), ##__VA_ARGS__)
+#define dd_dev_err(dd, fmt, ...) \
+	dev_err(&(dd)->pcidev->dev, "%s: " fmt, \
+			get_unit_name((dd)->unit), ##__VA_ARGS__)
+#define dd_dev_warn(dd, fmt, ...) \
+	dev_warn(&(dd)->pcidev->dev, "%s: " fmt, \
+			get_unit_name((dd)->unit), ##__VA_ARGS__)
+
+#define dd_dev_warn_ratelimited(dd, fmt, ...) \
+	dev_warn_ratelimited(&(dd)->pcidev->dev, "%s: " fmt, \
+			get_unit_name((dd)->unit), ##__VA_ARGS__)
+
+#define dd_dev_info(dd, fmt, ...) \
+	dev_info(&(dd)->pcidev->dev, "%s: " fmt, \
+			get_unit_name((dd)->unit), ##__VA_ARGS__)
+
+#define hfi1_dev_porterr(dd, port, fmt, ...) \
+	dev_err(&(dd)->pcidev->dev, "%s: IB%u:%u " fmt, \
+			get_unit_name((dd)->unit), (dd)->unit, (port), \
+			##__VA_ARGS__)
+
+/*
+ * this is used for formatting hw error messages...
+ */
+struct hfi1_hwerror_msgs {
+	u64 mask;
+	const char *msg;
+	size_t sz;
+};
+
+/* in intr.c... */
+void hfi1_format_hwerrors(u64 hwerrs,
+			  const struct hfi1_hwerror_msgs *hwerrmsgs,
+			  size_t nhwerrmsgs, char *msg, size_t lmsg);
+
+#define USER_OPCODE_CHECK_VAL 0xC0
+#define USER_OPCODE_CHECK_MASK 0xC0
+#define OPCODE_CHECK_VAL_DISABLED 0x0
+#define OPCODE_CHECK_MASK_DISABLED 0x0
+
+static inline void hfi1_reset_cpu_counters(struct hfi1_devdata *dd)
+{
+	struct hfi1_pportdata *ppd;
+	int i;
+
+	dd->z_int_counter = get_all_cpu_total(dd->int_counter);
+	dd->z_rcv_limit = get_all_cpu_total(dd->rcv_limit);
+
+	ppd = (struct hfi1_pportdata *)(dd + 1);
+	for (i = 0; i < dd->num_pports; i++, ppd++) {
+		ppd->ibport_data.z_rc_acks =
+			get_all_cpu_total(ppd->ibport_data.rc_acks);
+		ppd->ibport_data.z_rc_qacks =
+			get_all_cpu_total(ppd->ibport_data.rc_qacks);
+	}
+}
+
+/* Control LED state */
+static inline void setextled(struct hfi1_devdata *dd, u32 on)
+{
+	if (on)
+		write_csr(dd, DCC_CFG_LED_CNTRL, 0x1F);
+	else
+		write_csr(dd, DCC_CFG_LED_CNTRL, 0x10);
+}
+
+int hfi1_tempsense_rd(struct hfi1_devdata *dd, struct hfi1_temp *temp);
+
+#endif                          /* _HFI1_KERNEL_H */
diff --git a/drivers/staging/rdma/hfi1/init.c b/drivers/staging/rdma/hfi1/init.c
new file mode 100644
index 000000000000..a877eda8c13c
--- /dev/null
+++ b/drivers/staging/rdma/hfi1/init.c
@@ -0,0 +1,1722 @@
+/*
+ *
+ * This file is provided under a dual BSD/GPLv2 license.  When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * GPL LICENSE SUMMARY
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * BSD LICENSE
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ *  - Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ *  - Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in
+ *    the documentation and/or other materials provided with the
+ *    distribution.
+ *  - Neither the name of Intel Corporation nor the names of its
+ *    contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ */
+
+#include <linux/pci.h>
+#include <linux/netdevice.h>
+#include <linux/vmalloc.h>
+#include <linux/delay.h>
+#include <linux/idr.h>
+#include <linux/module.h>
+#include <linux/printk.h>
+#include <linux/hrtimer.h>
+
+#include "hfi.h"
+#include "device.h"
+#include "common.h"
+#include "mad.h"
+#include "sdma.h"
+#include "debugfs.h"
+#include "verbs.h"
+
+#undef pr_fmt
+#define pr_fmt(fmt) DRIVER_NAME ": " fmt
+
+/*
+ * min buffers we want to have per context, after driver
+ */
+#define HFI1_MIN_USER_CTXT_BUFCNT 7
+
+#define HFI1_MIN_HDRQ_EGRBUF_CNT 2
+#define HFI1_MIN_EAGER_BUFFER_SIZE (4 * 1024) /* 4KB */
+#define HFI1_MAX_EAGER_BUFFER_SIZE (256 * 1024) /* 256KB */
+
+/*
+ * Number of user receive contexts we are configured to use (to allow for more
+ * pio buffers per ctxt, etc.)  Zero means use one user context per CPU.
+ */
+uint num_rcv_contexts;
+module_param_named(num_rcv_contexts, num_rcv_contexts, uint, S_IRUGO);
+MODULE_PARM_DESC(
+	num_rcv_contexts, "Set max number of user receive contexts to use");
+
+u8 krcvqs[RXE_NUM_DATA_VL];
+int krcvqsset;
+module_param_array(krcvqs, byte, &krcvqsset, S_IRUGO);
+MODULE_PARM_DESC(krcvqs, "Array of the number of kernel receive queues by VL");
+
+/* computed based on above array */
+unsigned n_krcvqs;
+
+static unsigned hfi1_rcvarr_split = 25;
+module_param_named(rcvarr_split, hfi1_rcvarr_split, uint, S_IRUGO);
+MODULE_PARM_DESC(rcvarr_split, "Percent of context's RcvArray entries used for Eager buffers");
+
+static uint eager_buffer_size = (2 << 20); /* 2MB */
+module_param(eager_buffer_size, uint, S_IRUGO);
+MODULE_PARM_DESC(eager_buffer_size, "Size of the eager buffers, default: 2MB");
+
+static uint rcvhdrcnt = 2048; /* 2x the max eager buffer count */
+module_param_named(rcvhdrcnt, rcvhdrcnt, uint, S_IRUGO);
+MODULE_PARM_DESC(rcvhdrcnt, "Receive header queue count (default 2048)");
+
+static uint hfi1_hdrq_entsize = 32;
+module_param_named(hdrq_entsize, hfi1_hdrq_entsize, uint, S_IRUGO);
+MODULE_PARM_DESC(hdrq_entsize, "Size of header queue entries: 2 - 8B, 16 - 64B (default), 32 - 128B");
+
+unsigned int user_credit_return_threshold = 33;	/* default is 33% */
+module_param(user_credit_return_threshold, uint, S_IRUGO);
+MODULE_PARM_DESC(user_credit_return_theshold, "Credit return threshold for user send contexts, return when unreturned credits passes this many blocks (in percent of allocated blocks, 0 is off)");
+
+static inline u64 encode_rcv_header_entry_size(u16);
+
+static struct idr hfi1_unit_table;
+u32 hfi1_cpulist_count;
+unsigned long *hfi1_cpulist;
+
+/*
+ * Common code for creating the receive context array.
+ */
+int hfi1_create_ctxts(struct hfi1_devdata *dd)
+{
+	unsigned i;
+	int ret;
+	int local_node_id = pcibus_to_node(dd->pcidev->bus);
+
+	if (local_node_id < 0)
+		local_node_id = numa_node_id();
+	dd->assigned_node_id = local_node_id;
+
+	dd->rcd = kcalloc(dd->num_rcv_contexts, sizeof(*dd->rcd), GFP_KERNEL);
+	if (!dd->rcd) {
+		dd_dev_err(dd,
+			"Unable to allocate receive context array, failing\n");
+		goto nomem;
+	}
+
+	/* create one or more kernel contexts */
+	for (i = 0; i < dd->first_user_ctxt; ++i) {
+		struct hfi1_pportdata *ppd;
+		struct hfi1_ctxtdata *rcd;
+
+		ppd = dd->pport + (i % dd->num_pports);
+		rcd = hfi1_create_ctxtdata(ppd, i);
+		if (!rcd) {
+			dd_dev_err(dd,
+				"Unable to allocate kernel receive context, failing\n");
+			goto nomem;
+		}
+		/*
+		 * Set up the kernel context flags here and now because they
+		 * use default values for all receive side memories.  User
+		 * contexts will be handled as they are created.
+		 */
+		rcd->flags = HFI1_CAP_KGET(MULTI_PKT_EGR) |
+			HFI1_CAP_KGET(NODROP_RHQ_FULL) |
+			HFI1_CAP_KGET(NODROP_EGR_FULL) |
+			HFI1_CAP_KGET(DMA_RTAIL);
+		rcd->seq_cnt = 1;
+
+		rcd->sc = sc_alloc(dd, SC_ACK, rcd->rcvhdrqentsize, dd->node);
+		if (!rcd->sc) {
+			dd_dev_err(dd,
+				"Unable to allocate kernel send context, failing\n");
+			dd->rcd[rcd->ctxt] = NULL;
+			hfi1_free_ctxtdata(dd, rcd);
+			goto nomem;
+		}
+
+		ret = hfi1_init_ctxt(rcd->sc);
+		if (ret < 0) {
+			dd_dev_err(dd,
+				   "Failed to setup kernel receive context, failing\n");
+			sc_free(rcd->sc);
+			dd->rcd[rcd->ctxt] = NULL;
+			hfi1_free_ctxtdata(dd, rcd);
+			ret = -EFAULT;
+			goto bail;
+		}
+	}
+
+	return 0;
+nomem:
+	ret = -ENOMEM;
+bail:
+	kfree(dd->rcd);
+	dd->rcd = NULL;
+	return ret;
+}
+
+/*
+ * Common code for user and kernel context setup.
+ */
+struct hfi1_ctxtdata *hfi1_create_ctxtdata(struct hfi1_pportdata *ppd, u32 ctxt)
+{
+	struct hfi1_devdata *dd = ppd->dd;
+	struct hfi1_ctxtdata *rcd;
+	unsigned kctxt_ngroups = 0;
+	u32 base;
+
+	if (dd->rcv_entries.nctxt_extra >
+	    dd->num_rcv_contexts - dd->first_user_ctxt)
+		kctxt_ngroups = (dd->rcv_entries.nctxt_extra -
+				 (dd->num_rcv_contexts - dd->first_user_ctxt));
+	rcd = kzalloc(sizeof(*rcd), GFP_KERNEL);
+	if (rcd) {
+		u32 rcvtids, max_entries;
+
+		dd_dev_info(dd, "%s: setting up context %u\n", __func__, ctxt);
+
+		INIT_LIST_HEAD(&rcd->qp_wait_list);
+		rcd->ppd = ppd;
+		rcd->dd = dd;
+		rcd->cnt = 1;
+		rcd->ctxt = ctxt;
+		dd->rcd[ctxt] = rcd;
+		rcd->numa_id = numa_node_id();
+		rcd->rcv_array_groups = dd->rcv_entries.ngroups;
+
+		spin_lock_init(&rcd->exp_lock);
+
+		/*
+		 * Calculate the context's RcvArray entry starting point.
+		 * We do this here because we have to take into account all
+		 * the RcvArray entries that previous context would have
+		 * taken and we have to account for any extra groups
+		 * assigned to the kernel or user contexts.
+		 */
+		if (ctxt < dd->first_user_ctxt) {
+			if (ctxt < kctxt_ngroups) {
+				base = ctxt * (dd->rcv_entries.ngroups + 1);
+				rcd->rcv_array_groups++;
+			} else
+				base = kctxt_ngroups +
+					(ctxt * dd->rcv_entries.ngroups);
+		} else {
+			u16 ct = ctxt - dd->first_user_ctxt;
+
+			base = ((dd->n_krcv_queues * dd->rcv_entries.ngroups) +
+				kctxt_ngroups);
+			if (ct < dd->rcv_entries.nctxt_extra) {
+				base += ct * (dd->rcv_entries.ngroups + 1);
+				rcd->rcv_array_groups++;
+			} else
+				base += dd->rcv_entries.nctxt_extra +
+					(ct * dd->rcv_entries.ngroups);
+		}
+		rcd->eager_base = base * dd->rcv_entries.group_size;
+
+		/* Validate and initialize Rcv Hdr Q variables */
+		if (rcvhdrcnt % HDRQ_INCREMENT) {
+			dd_dev_err(dd,
+				   "ctxt%u: header queue count %d must be divisible by %d\n",
+				   rcd->ctxt, rcvhdrcnt, HDRQ_INCREMENT);
+			goto bail;
+		}
+		rcd->rcvhdrq_cnt = rcvhdrcnt;
+		rcd->rcvhdrqentsize = hfi1_hdrq_entsize;
+		/*
+		 * Simple Eager buffer allocation: we have already pre-allocated
+		 * the number of RcvArray entry groups. Each ctxtdata structure
+		 * holds the number of groups for that context.
+		 *
+		 * To follow CSR requirements and maintain cacheline alignment,
+		 * make sure all sizes and bases are multiples of group_size.
+		 *
+		 * The expected entry count is what is left after assigning
+		 * eager.
+		 */
+		max_entries = rcd->rcv_array_groups *
+			dd->rcv_entries.group_size;
+		rcvtids = ((max_entries * hfi1_rcvarr_split) / 100);
+		rcd->egrbufs.count = round_down(rcvtids,
+						dd->rcv_entries.group_size);
+		if (rcd->egrbufs.count > MAX_EAGER_ENTRIES) {
+			dd_dev_err(dd, "ctxt%u: requested too many RcvArray entries.\n",
+				   rcd->ctxt);
+			rcd->egrbufs.count = MAX_EAGER_ENTRIES;
+		}
+		dd_dev_info(dd, "ctxt%u: max Eager buffer RcvArray entries: %u\n",
+			    rcd->ctxt, rcd->egrbufs.count);
+
+		/*
+		 * Allocate array that will hold the eager buffer accounting
+		 * data.
+		 * This will allocate the maximum possible buffer count based
+		 * on the value of the RcvArray split parameter.
+		 * The resulting value will be rounded down to the closest
+		 * multiple of dd->rcv_entries.group_size.
+		 */
+		rcd->egrbufs.buffers = kzalloc(sizeof(*rcd->egrbufs.buffers) *
+					       rcd->egrbufs.count, GFP_KERNEL);
+		if (!rcd->egrbufs.buffers)
+			goto bail;
+		rcd->egrbufs.rcvtids = kzalloc(sizeof(*rcd->egrbufs.rcvtids) *
+					       rcd->egrbufs.count, GFP_KERNEL);
+		if (!rcd->egrbufs.rcvtids)
+			goto bail;
+		rcd->egrbufs.size = eager_buffer_size;
+		/*
+		 * The size of the buffers programmed into the RcvArray
+		 * entries needs to be big enough to handle the highest
+		 * MTU supported.
+		 */
+		if (rcd->egrbufs.size < hfi1_max_mtu) {
+			rcd->egrbufs.size = __roundup_pow_of_two(hfi1_max_mtu);
+			dd_dev_info(dd,
+				    "ctxt%u: eager bufs size too small. Adjusting to %zu\n",
+				    rcd->ctxt, rcd->egrbufs.size);
+		}
+		rcd->egrbufs.rcvtid_size = HFI1_MAX_EAGER_BUFFER_SIZE;
+
+		if (ctxt < dd->first_user_ctxt) { /* N/A for PSM contexts */
+			rcd->opstats = kzalloc(sizeof(*rcd->opstats),
+				GFP_KERNEL);
+			if (!rcd->opstats) {
+				dd_dev_err(dd,
+					   "ctxt%u: Unable to allocate per ctxt stats buffer\n",
+					   rcd->ctxt);
+				goto bail;
+			}
+		}
+	}
+	return rcd;
+bail:
+	kfree(rcd->opstats);
+	kfree(rcd->egrbufs.rcvtids);
+	kfree(rcd->egrbufs.buffers);
+	kfree(rcd);
+	return NULL;
+}
+
+/*
+ * Convert a receive header entry size that to the encoding used in the CSR.
+ *
+ * Return a zero if the given size is invalid.
+ */
+static inline u64 encode_rcv_header_entry_size(u16 size)
+{
+	/* there are only 3 valid receive header entry sizes */
+	if (size == 2)
+		return 1;
+	if (size == 16)
+		return 2;
+	else if (size == 32)
+		return 4;
+	return 0; /* invalid */
+}
+
+/*
+ * Select the largest ccti value over all SLs to determine the intra-
+ * packet gap for the link.
+ *
+ * called with cca_timer_lock held (to protect access to cca_timer
+ * array), and rcu_read_lock() (to protect access to cc_state).
+ */
+void set_link_ipg(struct hfi1_pportdata *ppd)
+{
+	struct hfi1_devdata *dd = ppd->dd;
+	struct cc_state *cc_state;
+	int i;
+	u16 cce, ccti_limit, max_ccti = 0;
+	u16 shift, mult;
+	u64 src;
+	u32 current_egress_rate; /* Mbits /sec */
+	u32 max_pkt_time;
+	/*
+	 * max_pkt_time is the maximum packet egress time in units
+	 * of the fabric clock period 1/(805 MHz).
+	 */
+
+	cc_state = get_cc_state(ppd);
+
+	if (cc_state == NULL)
+		/*
+		 * This should _never_ happen - rcu_read_lock() is held,
+		 * and set_link_ipg() should not be called if cc_state
+		 * is NULL.
+		 */
+		return;
+
+	for (i = 0; i < OPA_MAX_SLS; i++) {
+		u16 ccti = ppd->cca_timer[i].ccti;
+
+		if (ccti > max_ccti)
+			max_ccti = ccti;
+	}
+
+	ccti_limit = cc_state->cct.ccti_limit;
+	if (max_ccti > ccti_limit)
+		max_ccti = ccti_limit;
+
+	cce = cc_state->cct.entries[max_ccti].entry;
+	shift = (cce & 0xc000) >> 14;
+	mult = (cce & 0x3fff);
+
+	current_egress_rate = active_egress_rate(ppd);
+
+	max_pkt_time = egress_cycles(ppd->ibmaxlen, current_egress_rate);
+
+	src = (max_pkt_time >> shift) * mult;
+
+	src &= SEND_STATIC_RATE_CONTROL_CSR_SRC_RELOAD_SMASK;
+	src <<= SEND_STATIC_RATE_CONTROL_CSR_SRC_RELOAD_SHIFT;
+
+	write_csr(dd, SEND_STATIC_RATE_CONTROL, src);
+}
+
+static enum hrtimer_restart cca_timer_fn(struct hrtimer *t)
+{
+	struct cca_timer *cca_timer;
+	struct hfi1_pportdata *ppd;
+	int sl;
+	u16 ccti, ccti_timer, ccti_min;
+	struct cc_state *cc_state;
+
+	cca_timer = container_of(t, struct cca_timer, hrtimer);
+	ppd = cca_timer->ppd;
+	sl = cca_timer->sl;
+
+	rcu_read_lock();
+
+	cc_state = get_cc_state(ppd);
+
+	if (cc_state == NULL) {
+		rcu_read_unlock();
+		return HRTIMER_NORESTART;
+	}
+
+	/*
+	 * 1) decrement ccti for SL
+	 * 2) calculate IPG for link (set_link_ipg())
+	 * 3) restart timer, unless ccti is at min value
+	 */
+
+	ccti_min = cc_state->cong_setting.entries[sl].ccti_min;
+	ccti_timer = cc_state->cong_setting.entries[sl].ccti_timer;
+
+	spin_lock(&ppd->cca_timer_lock);
+
+	ccti = cca_timer->ccti;
+
+	if (ccti > ccti_min) {
+		cca_timer->ccti--;
+		set_link_ipg(ppd);
+	}
+
+	spin_unlock(&ppd->cca_timer_lock);
+
+	rcu_read_unlock();
+
+	if (ccti > ccti_min) {
+		unsigned long nsec = 1024 * ccti_timer;
+		/* ccti_timer is in units of 1.024 usec */
+		hrtimer_forward_now(t, ns_to_ktime(nsec));
+		return HRTIMER_RESTART;
+	}
+	return HRTIMER_NORESTART;
+}
+
+/*
+ * Common code for initializing the physical port structure.
+ */
+void hfi1_init_pportdata(struct pci_dev *pdev, struct hfi1_pportdata *ppd,
+			 struct hfi1_devdata *dd, u8 hw_pidx, u8 port)
+{
+	int i, size;
+	uint default_pkey_idx;
+
+	ppd->dd = dd;
+	ppd->hw_pidx = hw_pidx;
+	ppd->port = port; /* IB port number, not index */
+
+	default_pkey_idx = 1;
+
+	ppd->pkeys[default_pkey_idx] = DEFAULT_P_KEY;
+	if (loopback) {
+		hfi1_early_err(&pdev->dev,
+			       "Faking data partition 0x8001 in idx %u\n",
+			       !default_pkey_idx);
+		ppd->pkeys[!default_pkey_idx] = 0x8001;
+	}
+
+	INIT_WORK(&ppd->link_vc_work, handle_verify_cap);
+	INIT_WORK(&ppd->link_up_work, handle_link_up);
+	INIT_WORK(&ppd->link_down_work, handle_link_down);
+	INIT_WORK(&ppd->freeze_work, handle_freeze);
+	INIT_WORK(&ppd->link_downgrade_work, handle_link_downgrade);
+	INIT_WORK(&ppd->sma_message_work, handle_sma_message);
+	INIT_WORK(&ppd->link_bounce_work, handle_link_bounce);
+	mutex_init(&ppd->hls_lock);
+	spin_lock_init(&ppd->sdma_alllock);
+	spin_lock_init(&ppd->qsfp_info.qsfp_lock);
+
+	ppd->sm_trap_qp = 0x0;
+	ppd->sa_qp = 0x1;
+
+	ppd->hfi1_wq = NULL;
+
+	spin_lock_init(&ppd->cca_timer_lock);
+
+	for (i = 0; i < OPA_MAX_SLS; i++) {
+		hrtimer_init(&ppd->cca_timer[i].hrtimer, CLOCK_MONOTONIC,
+			     HRTIMER_MODE_REL);
+		ppd->cca_timer[i].ppd = ppd;
+		ppd->cca_timer[i].sl = i;
+		ppd->cca_timer[i].ccti = 0;
+		ppd->cca_timer[i].hrtimer.function = cca_timer_fn;
+	}
+
+	ppd->cc_max_table_entries = IB_CC_TABLE_CAP_DEFAULT;
+
+	spin_lock_init(&ppd->cc_state_lock);
+	spin_lock_init(&ppd->cc_log_lock);
+	size = sizeof(struct cc_state);
+	RCU_INIT_POINTER(ppd->cc_state, kzalloc(size, GFP_KERNEL));
+	if (!rcu_dereference(ppd->cc_state))
+		goto bail;
+	return;
+
+bail:
+
+	hfi1_early_err(&pdev->dev,
+		       "Congestion Control Agent disabled for port %d\n", port);
+}
+
+/*
+ * Do initialization for device that is only needed on
+ * first detect, not on resets.
+ */
+static int loadtime_init(struct hfi1_devdata *dd)
+{
+	return 0;
+}
+
+/**
+ * init_after_reset - re-initialize after a reset
+ * @dd: the hfi1_ib device
+ *
+ * sanity check at least some of the values after reset, and
+ * ensure no receive or transmit (explicitly, in case reset
+ * failed
+ */
+static int init_after_reset(struct hfi1_devdata *dd)
+{
+	int i;
+
+	/*
+	 * Ensure chip does no sends or receives, tail updates, or
+	 * pioavail updates while we re-initialize.  This is mostly
+	 * for the driver data structures, not chip registers.
+	 */
+	for (i = 0; i < dd->num_rcv_contexts; i++)
+		hfi1_rcvctrl(dd, HFI1_RCVCTRL_CTXT_DIS |
+				  HFI1_RCVCTRL_INTRAVAIL_DIS |
+				  HFI1_RCVCTRL_TAILUPD_DIS, i);
+	pio_send_control(dd, PSC_GLOBAL_DISABLE);
+	for (i = 0; i < dd->num_send_contexts; i++)
+		sc_disable(dd->send_contexts[i].sc);
+
+	return 0;
+}
+
+static void enable_chip(struct hfi1_devdata *dd)
+{
+	u32 rcvmask;
+	u32 i;
+
+	/* enable PIO send */
+	pio_send_control(dd, PSC_GLOBAL_ENABLE);
+
+	/*
+	 * Enable kernel ctxts' receive and receive interrupt.
+	 * Other ctxts done as user opens and initializes them.
+	 */
+	rcvmask = HFI1_RCVCTRL_CTXT_ENB | HFI1_RCVCTRL_INTRAVAIL_ENB;
+	for (i = 0; i < dd->first_user_ctxt; ++i) {
+		rcvmask |= HFI1_CAP_KGET_MASK(dd->rcd[i]->flags, DMA_RTAIL) ?
+			HFI1_RCVCTRL_TAILUPD_ENB : HFI1_RCVCTRL_TAILUPD_DIS;
+		if (!HFI1_CAP_KGET_MASK(dd->rcd[i]->flags, MULTI_PKT_EGR))
+			rcvmask |= HFI1_RCVCTRL_ONE_PKT_EGR_ENB;
+		if (HFI1_CAP_KGET_MASK(dd->rcd[i]->flags, NODROP_RHQ_FULL))
+			rcvmask |= HFI1_RCVCTRL_NO_RHQ_DROP_ENB;
+		if (HFI1_CAP_KGET_MASK(dd->rcd[i]->flags, NODROP_EGR_FULL))
+			rcvmask |= HFI1_RCVCTRL_NO_EGR_DROP_ENB;
+		hfi1_rcvctrl(dd, rcvmask, i);
+		sc_enable(dd->rcd[i]->sc);
+	}
+}
+
+/**
+ * create_workqueues - create per port workqueues
+ * @dd: the hfi1_ib device
+ */
+static int create_workqueues(struct hfi1_devdata *dd)
+{
+	int pidx;
+	struct hfi1_pportdata *ppd;
+
+	for (pidx = 0; pidx < dd->num_pports; ++pidx) {
+		ppd = dd->pport + pidx;
+		if (!ppd->hfi1_wq) {
+			char wq_name[8]; /* 3 + 2 + 1 + 1 + 1 */
+
+			snprintf(wq_name, sizeof(wq_name), "hfi%d_%d",
+				 dd->unit, pidx);
+			ppd->hfi1_wq =
+				create_singlethread_workqueue(wq_name);
+			if (!ppd->hfi1_wq)
+				goto wq_error;
+		}
+	}
+	return 0;
+wq_error:
+	pr_err("create_singlethread_workqueue failed for port %d\n",
+		pidx + 1);
+	for (pidx = 0; pidx < dd->num_pports; ++pidx) {
+		ppd = dd->pport + pidx;
+		if (ppd->hfi1_wq) {
+			destroy_workqueue(ppd->hfi1_wq);
+			ppd->hfi1_wq = NULL;
+		}
+	}
+	return -ENOMEM;
+}
+
+/**
+ * hfi1_init - do the actual initialization sequence on the chip
+ * @dd: the hfi1_ib device
+ * @reinit: re-initializing, so don't allocate new memory
+ *
+ * Do the actual initialization sequence on the chip.  This is done
+ * both from the init routine called from the PCI infrastructure, and
+ * when we reset the chip, or detect that it was reset internally,
+ * or it's administratively re-enabled.
+ *
+ * Memory allocation here and in called routines is only done in
+ * the first case (reinit == 0).  We have to be careful, because even
+ * without memory allocation, we need to re-write all the chip registers
+ * TIDs, etc. after the reset or enable has completed.
+ */
+int hfi1_init(struct hfi1_devdata *dd, int reinit)
+{
+	int ret = 0, pidx, lastfail = 0;
+	unsigned i, len;
+	struct hfi1_ctxtdata *rcd;
+	struct hfi1_pportdata *ppd;
+
+	/* Set up recv low level handlers */
+	dd->normal_rhf_rcv_functions[RHF_RCV_TYPE_EXPECTED] =
+						kdeth_process_expected;
+	dd->normal_rhf_rcv_functions[RHF_RCV_TYPE_EAGER] =
+						kdeth_process_eager;
+	dd->normal_rhf_rcv_functions[RHF_RCV_TYPE_IB] = process_receive_ib;
+	dd->normal_rhf_rcv_functions[RHF_RCV_TYPE_ERROR] =
+						process_receive_error;
+	dd->normal_rhf_rcv_functions[RHF_RCV_TYPE_BYPASS] =
+						process_receive_bypass;
+	dd->normal_rhf_rcv_functions[RHF_RCV_TYPE_INVALID5] =
+						process_receive_invalid;
+	dd->normal_rhf_rcv_functions[RHF_RCV_TYPE_INVALID6] =
+						process_receive_invalid;
+	dd->normal_rhf_rcv_functions[RHF_RCV_TYPE_INVALID7] =
+						process_receive_invalid;
+	dd->rhf_rcv_function_map = dd->normal_rhf_rcv_functions;
+
+	/* Set up send low level handlers */
+	dd->process_pio_send = hfi1_verbs_send_pio;
+	dd->process_dma_send = hfi1_verbs_send_dma;
+	dd->pio_inline_send = pio_copy;
+
+	if (is_a0(dd)) {
+		atomic_set(&dd->drop_packet, DROP_PACKET_ON);
+		dd->do_drop = 1;
+	} else {
+		atomic_set(&dd->drop_packet, DROP_PACKET_OFF);
+		dd->do_drop = 0;
+	}
+
+	/* make sure the link is not "up" */
+	for (pidx = 0; pidx < dd->num_pports; ++pidx) {
+		ppd = dd->pport + pidx;
+		ppd->linkup = 0;
+	}
+
+	if (reinit)
+		ret = init_after_reset(dd);
+	else
+		ret = loadtime_init(dd);
+	if (ret)
+		goto done;
+
+	/* dd->rcd can be NULL if early initialization failed */
+	for (i = 0; dd->rcd && i < dd->first_user_ctxt; ++i) {
+		/*
+		 * Set up the (kernel) rcvhdr queue and egr TIDs.  If doing
+		 * re-init, the simplest way to handle this is to free
+		 * existing, and re-allocate.
+		 * Need to re-create rest of ctxt 0 ctxtdata as well.
+		 */
+		rcd = dd->rcd[i];
+		if (!rcd)
+			continue;
+
+		rcd->do_interrupt = &handle_receive_interrupt;
+
+		lastfail = hfi1_create_rcvhdrq(dd, rcd);
+		if (!lastfail)
+			lastfail = hfi1_setup_eagerbufs(rcd);
+		if (lastfail)
+			dd_dev_err(dd,
+				"failed to allocate kernel ctxt's rcvhdrq and/or egr bufs\n");
+	}
+	if (lastfail)
+		ret = lastfail;
+
+	/* Allocate enough memory for user event notification. */
+	len = ALIGN(dd->chip_rcv_contexts * HFI1_MAX_SHARED_CTXTS *
+		    sizeof(*dd->events), PAGE_SIZE);
+	dd->events = vmalloc_user(len);
+	if (!dd->events)
+		dd_dev_err(dd, "Failed to allocate user events page\n");
+	/*
+	 * Allocate a page for device and port status.
+	 * Page will be shared amongst all user processes.
+	 */
+	dd->status = vmalloc_user(PAGE_SIZE);
+	if (!dd->status)
+		dd_dev_err(dd, "Failed to allocate dev status page\n");
+	else
+		dd->freezelen = PAGE_SIZE - (sizeof(*dd->status) -
+					     sizeof(dd->status->freezemsg));
+	for (pidx = 0; pidx < dd->num_pports; ++pidx) {
+		ppd = dd->pport + pidx;
+		if (dd->status)
+			/* Currently, we only have one port */
+			ppd->statusp = &dd->status->port;
+
+		set_mtu(ppd);
+	}
+
+	/* enable chip even if we have an error, so we can debug cause */
+	enable_chip(dd);
+
+	ret = hfi1_cq_init(dd);
+done:
+	/*
+	 * Set status even if port serdes is not initialized
+	 * so that diags will work.
+	 */
+	if (dd->status)
+		dd->status->dev |= HFI1_STATUS_CHIP_PRESENT |
+			HFI1_STATUS_INITTED;
+	if (!ret) {
+		/* enable all interrupts from the chip */
+		set_intr_state(dd, 1);
+
+		/* chip is OK for user apps; mark it as initialized */
+		for (pidx = 0; pidx < dd->num_pports; ++pidx) {
+			ppd = dd->pport + pidx;
+
+			/* initialize the qsfp if it exists
+			 * Requires interrupts to be enabled so we are notified
+			 * when the QSFP completes reset, and has
+			 * to be done before bringing up the SERDES
+			 */
+			init_qsfp(ppd);
+
+			/* start the serdes - must be after interrupts are
+			   enabled so we are notified when the link goes up */
+			lastfail = bringup_serdes(ppd);
+			if (lastfail)
+				dd_dev_info(dd,
+					"Failed to bring up port %u\n",
+					ppd->port);
+
+			/*
+			 * Set status even if port serdes is not initialized
+			 * so that diags will work.
+			 */
+			if (ppd->statusp)
+				*ppd->statusp |= HFI1_STATUS_CHIP_PRESENT |
+							HFI1_STATUS_INITTED;
+			if (!ppd->link_speed_enabled)
+				continue;
+		}
+	}
+
+	/* if ret is non-zero, we probably should do some cleanup here... */
+	return ret;
+}
+
+static inline struct hfi1_devdata *__hfi1_lookup(int unit)
+{
+	return idr_find(&hfi1_unit_table, unit);
+}
+
+struct hfi1_devdata *hfi1_lookup(int unit)
+{
+	struct hfi1_devdata *dd;
+	unsigned long flags;
+
+	spin_lock_irqsave(&hfi1_devs_lock, flags);
+	dd = __hfi1_lookup(unit);
+	spin_unlock_irqrestore(&hfi1_devs_lock, flags);
+
+	return dd;
+}
+
+/*
+ * Stop the timers during unit shutdown, or after an error late
+ * in initialization.
+ */
+static void stop_timers(struct hfi1_devdata *dd)
+{
+	struct hfi1_pportdata *ppd;
+	int pidx;
+
+	for (pidx = 0; pidx < dd->num_pports; ++pidx) {
+		ppd = dd->pport + pidx;
+		if (ppd->led_override_timer.data) {
+			del_timer_sync(&ppd->led_override_timer);
+			atomic_set(&ppd->led_override_timer_active, 0);
+		}
+	}
+}
+
+/**
+ * shutdown_device - shut down a device
+ * @dd: the hfi1_ib device
+ *
+ * This is called to make the device quiet when we are about to
+ * unload the driver, and also when the device is administratively
+ * disabled.   It does not free any data structures.
+ * Everything it does has to be setup again by hfi1_init(dd, 1)
+ */
+static void shutdown_device(struct hfi1_devdata *dd)
+{
+	struct hfi1_pportdata *ppd;
+	unsigned pidx;
+	int i;
+
+	for (pidx = 0; pidx < dd->num_pports; ++pidx) {
+		ppd = dd->pport + pidx;
+
+		ppd->linkup = 0;
+		if (ppd->statusp)
+			*ppd->statusp &= ~(HFI1_STATUS_IB_CONF |
+					   HFI1_STATUS_IB_READY);
+	}
+	dd->flags &= ~HFI1_INITTED;
+
+	/* mask interrupts, but not errors */
+	set_intr_state(dd, 0);
+
+	for (pidx = 0; pidx < dd->num_pports; ++pidx) {
+		ppd = dd->pport + pidx;
+		for (i = 0; i < dd->num_rcv_contexts; i++)
+			hfi1_rcvctrl(dd, HFI1_RCVCTRL_TAILUPD_DIS |
+					  HFI1_RCVCTRL_CTXT_DIS |
+					  HFI1_RCVCTRL_INTRAVAIL_DIS |
+					  HFI1_RCVCTRL_PKEY_DIS |
+					  HFI1_RCVCTRL_ONE_PKT_EGR_DIS, i);
+		/*
+		 * Gracefully stop all sends allowing any in progress to
+		 * trickle out first.
+		 */
+		for (i = 0; i < dd->num_send_contexts; i++)
+			sc_flush(dd->send_contexts[i].sc);
+	}
+
+	/*
+	 * Enough for anything that's going to trickle out to have actually
+	 * done so.
+	 */
+	udelay(20);
+
+	for (pidx = 0; pidx < dd->num_pports; ++pidx) {
+		ppd = dd->pport + pidx;
+
+		/* disable all contexts */
+		for (i = 0; i < dd->num_send_contexts; i++)
+			sc_disable(dd->send_contexts[i].sc);
+		/* disable the send device */
+		pio_send_control(dd, PSC_GLOBAL_DISABLE);
+
+		/*
+		 * Clear SerdesEnable.
+		 * We can't count on interrupts since we are stopping.
+		 */
+		hfi1_quiet_serdes(ppd);
+
+		if (ppd->hfi1_wq) {
+			destroy_workqueue(ppd->hfi1_wq);
+			ppd->hfi1_wq = NULL;
+		}
+	}
+	sdma_exit(dd);
+}
+
+/**
+ * hfi1_free_ctxtdata - free a context's allocated data
+ * @dd: the hfi1_ib device
+ * @rcd: the ctxtdata structure
+ *
+ * free up any allocated data for a context
+ * This should not touch anything that would affect a simultaneous
+ * re-allocation of context data, because it is called after hfi1_mutex
+ * is released (and can be called from reinit as well).
+ * It should never change any chip state, or global driver state.
+ */
+void hfi1_free_ctxtdata(struct hfi1_devdata *dd, struct hfi1_ctxtdata *rcd)
+{
+	unsigned e;
+
+	if (!rcd)
+		return;
+
+	if (rcd->rcvhdrq) {
+		dma_free_coherent(&dd->pcidev->dev, rcd->rcvhdrq_size,
+				  rcd->rcvhdrq, rcd->rcvhdrq_phys);
+		rcd->rcvhdrq = NULL;
+		if (rcd->rcvhdrtail_kvaddr) {
+			dma_free_coherent(&dd->pcidev->dev, PAGE_SIZE,
+					  (void *)rcd->rcvhdrtail_kvaddr,
+					  rcd->rcvhdrqtailaddr_phys);
+			rcd->rcvhdrtail_kvaddr = NULL;
+		}
+	}
+
+	/* all the RcvArray entries should have been cleared by now */
+	kfree(rcd->egrbufs.rcvtids);
+
+	for (e = 0; e < rcd->egrbufs.alloced; e++) {
+		if (rcd->egrbufs.buffers[e].phys)
+			dma_free_coherent(&dd->pcidev->dev,
+					  rcd->egrbufs.buffers[e].len,
+					  rcd->egrbufs.buffers[e].addr,
+					  rcd->egrbufs.buffers[e].phys);
+	}
+	kfree(rcd->egrbufs.buffers);
+
+	sc_free(rcd->sc);
+	vfree(rcd->physshadow);
+	vfree(rcd->tid_pg_list);
+	vfree(rcd->user_event_mask);
+	vfree(rcd->subctxt_uregbase);
+	vfree(rcd->subctxt_rcvegrbuf);
+	vfree(rcd->subctxt_rcvhdr_base);
+	kfree(rcd->tidusemap);
+	kfree(rcd->opstats);
+	kfree(rcd);
+}
+
+void hfi1_free_devdata(struct hfi1_devdata *dd)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&hfi1_devs_lock, flags);
+	idr_remove(&hfi1_unit_table, dd->unit);
+	list_del(&dd->list);
+	spin_unlock_irqrestore(&hfi1_devs_lock, flags);
+	hfi1_dbg_ibdev_exit(&dd->verbs_dev);
+	rcu_barrier(); /* wait for rcu callbacks to complete */
+	free_percpu(dd->int_counter);
+	free_percpu(dd->rcv_limit);
+	ib_dealloc_device(&dd->verbs_dev.ibdev);
+}
+
+/*
+ * Allocate our primary per-unit data structure.  Must be done via verbs
+ * allocator, because the verbs cleanup process both does cleanup and
+ * free of the data structure.
+ * "extra" is for chip-specific data.
+ *
+ * Use the idr mechanism to get a unit number for this unit.
+ */
+struct hfi1_devdata *hfi1_alloc_devdata(struct pci_dev *pdev, size_t extra)
+{
+	unsigned long flags;
+	struct hfi1_devdata *dd;
+	int ret;
+
+	dd = (struct hfi1_devdata *)ib_alloc_device(sizeof(*dd) + extra);
+	if (!dd)
+		return ERR_PTR(-ENOMEM);
+	/* extra is * number of ports */
+	dd->num_pports = extra / sizeof(struct hfi1_pportdata);
+	dd->pport = (struct hfi1_pportdata *)(dd + 1);
+
+	INIT_LIST_HEAD(&dd->list);
+	dd->node = dev_to_node(&pdev->dev);
+	if (dd->node < 0)
+		dd->node = 0;
+	idr_preload(GFP_KERNEL);
+	spin_lock_irqsave(&hfi1_devs_lock, flags);
+
+	ret = idr_alloc(&hfi1_unit_table, dd, 0, 0, GFP_NOWAIT);
+	if (ret >= 0) {
+		dd->unit = ret;
+		list_add(&dd->list, &hfi1_dev_list);
+	}
+
+	spin_unlock_irqrestore(&hfi1_devs_lock, flags);
+	idr_preload_end();
+
+	if (ret < 0) {
+		hfi1_early_err(&pdev->dev,
+			       "Could not allocate unit ID: error %d\n", -ret);
+		goto bail;
+	}
+	/*
+	 * Initialize all locks for the device. This needs to be as early as
+	 * possible so locks are usable.
+	 */
+	spin_lock_init(&dd->sc_lock);
+	spin_lock_init(&dd->sendctrl_lock);
+	spin_lock_init(&dd->rcvctrl_lock);
+	spin_lock_init(&dd->uctxt_lock);
+	spin_lock_init(&dd->hfi1_diag_trans_lock);
+	spin_lock_init(&dd->sc_init_lock);
+	spin_lock_init(&dd->dc8051_lock);
+	spin_lock_init(&dd->dc8051_memlock);
+	mutex_init(&dd->qsfp_i2c_mutex);
+	seqlock_init(&dd->sc2vl_lock);
+	spin_lock_init(&dd->sde_map_lock);
+	init_waitqueue_head(&dd->event_queue);
+
+	dd->int_counter = alloc_percpu(u64);
+	if (!dd->int_counter) {
+		ret = -ENOMEM;
+		hfi1_early_err(&pdev->dev,
+			       "Could not allocate per-cpu int_counter\n");
+		goto bail;
+	}
+
+	dd->rcv_limit = alloc_percpu(u64);
+	if (!dd->rcv_limit) {
+		ret = -ENOMEM;
+		hfi1_early_err(&pdev->dev,
+			       "Could not allocate per-cpu rcv_limit\n");
+		goto bail;
+	}
+
+	if (!hfi1_cpulist_count) {
+		u32 count = num_online_cpus();
+
+		hfi1_cpulist = kzalloc(BITS_TO_LONGS(count) *
+				      sizeof(long), GFP_KERNEL);
+		if (hfi1_cpulist)
+			hfi1_cpulist_count = count;
+		else
+			hfi1_early_err(
+			&pdev->dev,
+			"Could not alloc cpulist info, cpu affinity might be wrong\n");
+	}
+	hfi1_dbg_ibdev_init(&dd->verbs_dev);
+	return dd;
+
+bail:
+	if (!list_empty(&dd->list))
+		list_del_init(&dd->list);
+	ib_dealloc_device(&dd->verbs_dev.ibdev);
+	return ERR_PTR(ret);
+}
+
+/*
+ * Called from freeze mode handlers, and from PCI error
+ * reporting code.  Should be paranoid about state of
+ * system and data structures.
+ */
+void hfi1_disable_after_error(struct hfi1_devdata *dd)
+{
+	if (dd->flags & HFI1_INITTED) {
+		u32 pidx;
+
+		dd->flags &= ~HFI1_INITTED;
+		if (dd->pport)
+			for (pidx = 0; pidx < dd->num_pports; ++pidx) {
+				struct hfi1_pportdata *ppd;
+
+				ppd = dd->pport + pidx;
+				if (dd->flags & HFI1_PRESENT)
+					set_link_state(ppd, HLS_DN_DISABLE);
+
+				if (ppd->statusp)
+					*ppd->statusp &= ~HFI1_STATUS_IB_READY;
+			}
+	}
+
+	/*
+	 * Mark as having had an error for driver, and also
+	 * for /sys and status word mapped to user programs.
+	 * This marks unit as not usable, until reset.
+	 */
+	if (dd->status)
+		dd->status->dev |= HFI1_STATUS_HWERROR;
+}
+
+static void remove_one(struct pci_dev *);
+static int init_one(struct pci_dev *, const struct pci_device_id *);
+
+#define DRIVER_LOAD_MSG "Intel " DRIVER_NAME " loaded: "
+#define PFX DRIVER_NAME ": "
+
+static const struct pci_device_id hfi1_pci_tbl[] = {
+	{ PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL0) },
+	{ PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL1) },
+	{ 0, }
+};
+
+MODULE_DEVICE_TABLE(pci, hfi1_pci_tbl);
+
+static struct pci_driver hfi1_pci_driver = {
+	.name = DRIVER_NAME,
+	.probe = init_one,
+	.remove = remove_one,
+	.id_table = hfi1_pci_tbl,
+	.err_handler = &hfi1_pci_err_handler,
+};
+
+static void __init compute_krcvqs(void)
+{
+	int i;
+
+	for (i = 0; i < krcvqsset; i++)
+		n_krcvqs += krcvqs[i];
+}
+
+/*
+ * Do all the generic driver unit- and chip-independent memory
+ * allocation and initialization.
+ */
+static int __init hfi1_mod_init(void)
+{
+	int ret;
+
+	ret = dev_init();
+	if (ret)
+		goto bail;
+
+	/* validate max MTU before any devices start */
+	if (!valid_opa_max_mtu(hfi1_max_mtu)) {
+		pr_err("Invalid max_mtu 0x%x, using 0x%x instead\n",
+		       hfi1_max_mtu, HFI1_DEFAULT_MAX_MTU);
+		hfi1_max_mtu = HFI1_DEFAULT_MAX_MTU;
+	}
+	/* valid CUs run from 1-128 in powers of 2 */
+	if (hfi1_cu > 128 || !is_power_of_2(hfi1_cu))
+		hfi1_cu = 1;
+	/* valid credit return threshold is 0-100, variable is unsigned */
+	if (user_credit_return_threshold > 100)
+		user_credit_return_threshold = 100;
+
+	compute_krcvqs();
+	/* sanitize receive interrupt count, time must wait until after
+	   the hardware type is known */
+	if (rcv_intr_count > RCV_HDR_HEAD_COUNTER_MASK)
+		rcv_intr_count = RCV_HDR_HEAD_COUNTER_MASK;
+	/* reject invalid combinations */
+	if (rcv_intr_count == 0 && rcv_intr_timeout == 0) {
+		pr_err("Invalid mode: both receive interrupt count and available timeout are zero - setting interrupt count to 1\n");
+		rcv_intr_count = 1;
+	}
+	if (rcv_intr_count > 1 && rcv_intr_timeout == 0) {
+		/*
+		 * Avoid indefinite packet delivery by requiring a timeout
+		 * if count is > 1.
+		 */
+		pr_err("Invalid mode: receive interrupt count greater than 1 and available timeout is zero - setting available timeout to 1\n");
+		rcv_intr_timeout = 1;
+	}
+	if (rcv_intr_dynamic && !(rcv_intr_count > 1 && rcv_intr_timeout > 0)) {
+		/*
+		 * The dynamic algorithm expects a non-zero timeout
+		 * and a count > 1.
+		 */
+		pr_err("Invalid mode: dynamic receive interrupt mitigation with invalid count and timeout - turning dynamic off\n");
+		rcv_intr_dynamic = 0;
+	}
+
+	/* sanitize link CRC options */
+	link_crc_mask &= SUPPORTED_CRCS;
+
+	/*
+	 * These must be called before the driver is registered with
+	 * the PCI subsystem.
+	 */
+	idr_init(&hfi1_unit_table);
+
+	hfi1_dbg_init();
+	ret = pci_register_driver(&hfi1_pci_driver);
+	if (ret < 0) {
+		pr_err("Unable to register driver: error %d\n", -ret);
+		goto bail_dev;
+	}
+	goto bail; /* all OK */
+
+bail_dev:
+	hfi1_dbg_exit();
+	idr_destroy(&hfi1_unit_table);
+	dev_cleanup();
+bail:
+	return ret;
+}
+
+module_init(hfi1_mod_init);
+
+/*
+ * Do the non-unit driver cleanup, memory free, etc. at unload.
+ */
+static void __exit hfi1_mod_cleanup(void)
+{
+	pci_unregister_driver(&hfi1_pci_driver);
+	hfi1_dbg_exit();
+	hfi1_cpulist_count = 0;
+	kfree(hfi1_cpulist);
+
+	idr_destroy(&hfi1_unit_table);
+	dispose_firmware();	/* asymmetric with obtain_firmware() */
+	dev_cleanup();
+}
+
+module_exit(hfi1_mod_cleanup);
+
+/* this can only be called after a successful initialization */
+static void cleanup_device_data(struct hfi1_devdata *dd)
+{
+	int ctxt;
+	int pidx;
+	struct hfi1_ctxtdata **tmp;
+	unsigned long flags;
+
+	/* users can't do anything more with chip */
+	for (pidx = 0; pidx < dd->num_pports; ++pidx) {
+		struct hfi1_pportdata *ppd = &dd->pport[pidx];
+		struct cc_state *cc_state;
+		int i;
+
+		if (ppd->statusp)
+			*ppd->statusp &= ~HFI1_STATUS_CHIP_PRESENT;
+
+		for (i = 0; i < OPA_MAX_SLS; i++)
+			hrtimer_cancel(&ppd->cca_timer[i].hrtimer);
+
+		spin_lock(&ppd->cc_state_lock);
+		cc_state = get_cc_state(ppd);
+		rcu_assign_pointer(ppd->cc_state, NULL);
+		spin_unlock(&ppd->cc_state_lock);
+
+		if (cc_state)
+			call_rcu(&cc_state->rcu, cc_state_reclaim);
+	}
+
+	free_credit_return(dd);
+
+	/*
+	 * Free any resources still in use (usually just kernel contexts)
+	 * at unload; we do for ctxtcnt, because that's what we allocate.
+	 * We acquire lock to be really paranoid that rcd isn't being
+	 * accessed from some interrupt-related code (that should not happen,
+	 * but best to be sure).
+	 */
+	spin_lock_irqsave(&dd->uctxt_lock, flags);
+	tmp = dd->rcd;
+	dd->rcd = NULL;
+	spin_unlock_irqrestore(&dd->uctxt_lock, flags);
+	for (ctxt = 0; tmp && ctxt < dd->num_rcv_contexts; ctxt++) {
+		struct hfi1_ctxtdata *rcd = tmp[ctxt];
+
+		tmp[ctxt] = NULL; /* debugging paranoia */
+		if (rcd) {
+			hfi1_clear_tids(rcd);
+			hfi1_free_ctxtdata(dd, rcd);
+		}
+	}
+	kfree(tmp);
+	/* must follow rcv context free - need to remove rcv's hooks */
+	for (ctxt = 0; ctxt < dd->num_send_contexts; ctxt++)
+		sc_free(dd->send_contexts[ctxt].sc);
+	dd->num_send_contexts = 0;
+	kfree(dd->send_contexts);
+	dd->send_contexts = NULL;
+	kfree(dd->boardname);
+	vfree(dd->events);
+	vfree(dd->status);
+	hfi1_cq_exit(dd);
+}
+
+/*
+ * Clean up on unit shutdown, or error during unit load after
+ * successful initialization.
+ */
+static void postinit_cleanup(struct hfi1_devdata *dd)
+{
+	hfi1_start_cleanup(dd);
+
+	hfi1_pcie_ddcleanup(dd);
+	hfi1_pcie_cleanup(dd->pcidev);
+
+	cleanup_device_data(dd);
+
+	hfi1_free_devdata(dd);
+}
+
+static int init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
+{
+	int ret = 0, j, pidx, initfail;
+	struct hfi1_devdata *dd = NULL;
+
+	/* First, lock the non-writable module parameters */
+	HFI1_CAP_LOCK();
+
+	/* Validate some global module parameters */
+	if (rcvhdrcnt <= HFI1_MIN_HDRQ_EGRBUF_CNT) {
+		hfi1_early_err(&pdev->dev, "Header queue  count too small\n");
+		ret = -EINVAL;
+		goto bail;
+	}
+	/* use the encoding function as a sanitization check */
+	if (!encode_rcv_header_entry_size(hfi1_hdrq_entsize)) {
+		hfi1_early_err(&pdev->dev, "Invalid HdrQ Entry size %u\n",
+			       hfi1_hdrq_entsize);
+		goto bail;
+	}
+
+	/* The receive eager buffer size must be set before the receive
+	 * contexts are created.
+	 *
+	 * Set the eager buffer size.  Validate that it falls in a range
+	 * allowed by the hardware - all powers of 2 between the min and
+	 * max.  The maximum valid MTU is within the eager buffer range
+	 * so we do not need to cap the max_mtu by an eager buffer size
+	 * setting.
+	 */
+	if (eager_buffer_size) {
+		if (!is_power_of_2(eager_buffer_size))
+			eager_buffer_size =
+				roundup_pow_of_two(eager_buffer_size);
+		eager_buffer_size =
+			clamp_val(eager_buffer_size,
+				  MIN_EAGER_BUFFER * 8,
+				  MAX_EAGER_BUFFER_TOTAL);
+		hfi1_early_info(&pdev->dev, "Eager buffer size %u\n",
+				eager_buffer_size);
+	} else {
+		hfi1_early_err(&pdev->dev, "Invalid Eager buffer size of 0\n");
+		ret = -EINVAL;
+		goto bail;
+	}
+
+	/* restrict value of hfi1_rcvarr_split */
+	hfi1_rcvarr_split = clamp_val(hfi1_rcvarr_split, 0, 100);
+
+	ret = hfi1_pcie_init(pdev, ent);
+	if (ret)
+		goto bail;
+
+	/*
+	 * Do device-specific initialization, function table setup, dd
+	 * allocation, etc.
+	 */
+	switch (ent->device) {
+	case PCI_DEVICE_ID_INTEL0:
+	case PCI_DEVICE_ID_INTEL1:
+		dd = hfi1_init_dd(pdev, ent);
+		break;
+	default:
+		hfi1_early_err(&pdev->dev,
+			       "Failing on unknown Intel deviceid 0x%x\n",
+			       ent->device);
+		ret = -ENODEV;
+	}
+
+	if (IS_ERR(dd))
+		ret = PTR_ERR(dd);
+	if (ret)
+		goto clean_bail; /* error already printed */
+
+	ret = create_workqueues(dd);
+	if (ret)
+		goto clean_bail;
+
+	/* do the generic initialization */
+	initfail = hfi1_init(dd, 0);
+
+	ret = hfi1_register_ib_device(dd);
+
+	/*
+	 * Now ready for use.  this should be cleared whenever we
+	 * detect a reset, or initiate one.  If earlier failure,
+	 * we still create devices, so diags, etc. can be used
+	 * to determine cause of problem.
+	 */
+	if (!initfail && !ret)
+		dd->flags |= HFI1_INITTED;
+
+	j = hfi1_device_create(dd);
+	if (j)
+		dd_dev_err(dd, "Failed to create /dev devices: %d\n", -j);
+
+	if (initfail || ret) {
+		stop_timers(dd);
+		flush_workqueue(ib_wq);
+		for (pidx = 0; pidx < dd->num_pports; ++pidx)
+			hfi1_quiet_serdes(dd->pport + pidx);
+		if (!j)
+			hfi1_device_remove(dd);
+		if (!ret)
+			hfi1_unregister_ib_device(dd);
+		postinit_cleanup(dd);
+		if (initfail)
+			ret = initfail;
+		goto bail;	/* everything already cleaned */
+	}
+
+	sdma_start(dd);
+
+	return 0;
+
+clean_bail:
+	hfi1_pcie_cleanup(pdev);
+bail:
+	return ret;
+}
+
+static void remove_one(struct pci_dev *pdev)
+{
+	struct hfi1_devdata *dd = pci_get_drvdata(pdev);
+
+	/* unregister from IB core */
+	hfi1_unregister_ib_device(dd);
+
+	/*
+	 * Disable the IB link, disable interrupts on the device,
+	 * clear dma engines, etc.
+	 */
+	shutdown_device(dd);
+
+	stop_timers(dd);
+
+	/* wait until all of our (qsfp) queue_work() calls complete */
+	flush_workqueue(ib_wq);
+
+	hfi1_device_remove(dd);
+
+	postinit_cleanup(dd);
+}
+
+/**
+ * hfi1_create_rcvhdrq - create a receive header queue
+ * @dd: the hfi1_ib device
+ * @rcd: the context data
+ *
+ * This must be contiguous memory (from an i/o perspective), and must be
+ * DMA'able (which means for some systems, it will go through an IOMMU,
+ * or be forced into a low address range).
+ */
+int hfi1_create_rcvhdrq(struct hfi1_devdata *dd, struct hfi1_ctxtdata *rcd)
+{
+	unsigned amt;
+	u64 reg;
+
+	if (!rcd->rcvhdrq) {
+		dma_addr_t phys_hdrqtail;
+		gfp_t gfp_flags;
+
+		/*
+		 * rcvhdrqentsize is in DWs, so we have to convert to bytes
+		 * (* sizeof(u32)).
+		 */
+		amt = ALIGN(rcd->rcvhdrq_cnt * rcd->rcvhdrqentsize *
+			    sizeof(u32), PAGE_SIZE);
+
+		gfp_flags = (rcd->ctxt >= dd->first_user_ctxt) ?
+			GFP_USER : GFP_KERNEL;
+		rcd->rcvhdrq = dma_zalloc_coherent(
+			&dd->pcidev->dev, amt, &rcd->rcvhdrq_phys,
+			gfp_flags | __GFP_COMP);
+
+		if (!rcd->rcvhdrq) {
+			dd_dev_err(dd,
+				"attempt to allocate %d bytes for ctxt %u rcvhdrq failed\n",
+				amt, rcd->ctxt);
+			goto bail;
+		}
+
+		/* Event mask is per device now and is in hfi1_devdata */
+		/*if (rcd->ctxt >= dd->first_user_ctxt) {
+			rcd->user_event_mask = vmalloc_user(PAGE_SIZE);
+			if (!rcd->user_event_mask)
+				goto bail_free_hdrq;
+				}*/
+
+		if (HFI1_CAP_KGET_MASK(rcd->flags, DMA_RTAIL)) {
+			rcd->rcvhdrtail_kvaddr = dma_zalloc_coherent(
+				&dd->pcidev->dev, PAGE_SIZE, &phys_hdrqtail,
+				gfp_flags);
+			if (!rcd->rcvhdrtail_kvaddr)
+				goto bail_free;
+			rcd->rcvhdrqtailaddr_phys = phys_hdrqtail;
+		}
+
+		rcd->rcvhdrq_size = amt;
+	}
+	/*
+	 * These values are per-context:
+	 *	RcvHdrCnt
+	 *	RcvHdrEntSize
+	 *	RcvHdrSize
+	 */
+	reg = ((u64)(rcd->rcvhdrq_cnt >> HDRQ_SIZE_SHIFT)
+			& RCV_HDR_CNT_CNT_MASK)
+		<< RCV_HDR_CNT_CNT_SHIFT;
+	write_kctxt_csr(dd, rcd->ctxt, RCV_HDR_CNT, reg);
+	reg = (encode_rcv_header_entry_size(rcd->rcvhdrqentsize)
+			& RCV_HDR_ENT_SIZE_ENT_SIZE_MASK)
+		<< RCV_HDR_ENT_SIZE_ENT_SIZE_SHIFT;
+	write_kctxt_csr(dd, rcd->ctxt, RCV_HDR_ENT_SIZE, reg);
+	reg = (dd->rcvhdrsize & RCV_HDR_SIZE_HDR_SIZE_MASK)
+		<< RCV_HDR_SIZE_HDR_SIZE_SHIFT;
+	write_kctxt_csr(dd, rcd->ctxt, RCV_HDR_SIZE, reg);
+	return 0;
+
+bail_free:
+	dd_dev_err(dd,
+		"attempt to allocate 1 page for ctxt %u rcvhdrqtailaddr failed\n",
+		rcd->ctxt);
+	vfree(rcd->user_event_mask);
+	rcd->user_event_mask = NULL;
+	dma_free_coherent(&dd->pcidev->dev, amt, rcd->rcvhdrq,
+			  rcd->rcvhdrq_phys);
+	rcd->rcvhdrq = NULL;
+bail:
+	return -ENOMEM;
+}
+
+/**
+ * allocate eager buffers, both kernel and user contexts.
+ * @rcd: the context we are setting up.
+ *
+ * Allocate the eager TID buffers and program them into hip.
+ * They are no longer completely contiguous, we do multiple allocation
+ * calls.  Otherwise we get the OOM code involved, by asking for too
+ * much per call, with disastrous results on some kernels.
+ */
+int hfi1_setup_eagerbufs(struct hfi1_ctxtdata *rcd)
+{
+	struct hfi1_devdata *dd = rcd->dd;
+	u32 max_entries, egrtop, alloced_bytes = 0, idx = 0;
+	gfp_t gfp_flags;
+	u16 order;
+	int ret = 0;
+	u16 round_mtu = roundup_pow_of_two(hfi1_max_mtu);
+
+	/*
+	 * GFP_USER, but without GFP_FS, so buffer cache can be
+	 * coalesced (we hope); otherwise, even at order 4,
+	 * heavy filesystem activity makes these fail, and we can
+	 * use compound pages.
+	 */
+	gfp_flags = __GFP_WAIT | __GFP_IO | __GFP_COMP;
+
+	/*
+	 * The minimum size of the eager buffers is a groups of MTU-sized
+	 * buffers.
+	 * The global eager_buffer_size parameter is checked against the
+	 * theoretical lower limit of the value. Here, we check against the
+	 * MTU.
+	 */
+	if (rcd->egrbufs.size < (round_mtu * dd->rcv_entries.group_size))
+		rcd->egrbufs.size = round_mtu * dd->rcv_entries.group_size;
+	/*
+	 * If using one-pkt-per-egr-buffer, lower the eager buffer
+	 * size to the max MTU (page-aligned).
+	 */
+	if (!HFI1_CAP_KGET_MASK(rcd->flags, MULTI_PKT_EGR))
+		rcd->egrbufs.rcvtid_size = round_mtu;
+
+	/*
+	 * Eager buffers sizes of 1MB or less require smaller TID sizes
+	 * to satisfy the "multiple of 8 RcvArray entries" requirement.
+	 */
+	if (rcd->egrbufs.size <= (1 << 20))
+		rcd->egrbufs.rcvtid_size = max((unsigned long)round_mtu,
+			rounddown_pow_of_two(rcd->egrbufs.size / 8));
+
+	while (alloced_bytes < rcd->egrbufs.size &&
+	       rcd->egrbufs.alloced < rcd->egrbufs.count) {
+		rcd->egrbufs.buffers[idx].addr =
+			dma_zalloc_coherent(&dd->pcidev->dev,
+					    rcd->egrbufs.rcvtid_size,
+					    &rcd->egrbufs.buffers[idx].phys,
+					    gfp_flags);
+		if (rcd->egrbufs.buffers[idx].addr) {
+			rcd->egrbufs.buffers[idx].len =
+				rcd->egrbufs.rcvtid_size;
+			rcd->egrbufs.rcvtids[rcd->egrbufs.alloced].addr =
+				rcd->egrbufs.buffers[idx].addr;
+			rcd->egrbufs.rcvtids[rcd->egrbufs.alloced].phys =
+				rcd->egrbufs.buffers[idx].phys;
+			rcd->egrbufs.alloced++;
+			alloced_bytes += rcd->egrbufs.rcvtid_size;
+			idx++;
+		} else {
+			u32 new_size, i, j;
+			u64 offset = 0;
+
+			/*
+			 * Fail the eager buffer allocation if:
+			 *   - we are already using the lowest acceptable size
+			 *   - we are using one-pkt-per-egr-buffer (this implies
+			 *     that we are accepting only one size)
+			 */
+			if (rcd->egrbufs.rcvtid_size == round_mtu ||
+			    !HFI1_CAP_KGET_MASK(rcd->flags, MULTI_PKT_EGR)) {
+				dd_dev_err(dd, "ctxt%u: Failed to allocate eager buffers\n",
+					rcd->ctxt);
+				goto bail_rcvegrbuf_phys;
+			}
+
+			new_size = rcd->egrbufs.rcvtid_size / 2;
+
+			/*
+			 * If the first attempt to allocate memory failed, don't
+			 * fail everything but continue with the next lower
+			 * size.
+			 */
+			if (idx == 0) {
+				rcd->egrbufs.rcvtid_size = new_size;
+				continue;
+			}
+
+			/*
+			 * Re-partition already allocated buffers to a smaller
+			 * size.
+			 */
+			rcd->egrbufs.alloced = 0;
+			for (i = 0, j = 0, offset = 0; j < idx; i++) {
+				if (i >= rcd->egrbufs.count)
+					break;
+				rcd->egrbufs.rcvtids[i].phys =
+					rcd->egrbufs.buffers[j].phys + offset;
+				rcd->egrbufs.rcvtids[i].addr =
+					rcd->egrbufs.buffers[j].addr + offset;
+				rcd->egrbufs.alloced++;
+				if ((rcd->egrbufs.buffers[j].phys + offset +
+				     new_size) ==
+				    (rcd->egrbufs.buffers[j].phys +
+				     rcd->egrbufs.buffers[j].len)) {
+					j++;
+					offset = 0;
+				} else
+					offset += new_size;
+			}
+			rcd->egrbufs.rcvtid_size = new_size;
+		}
+	}
+	rcd->egrbufs.numbufs = idx;
+	rcd->egrbufs.size = alloced_bytes;
+
+	dd_dev_info(dd, "ctxt%u: Alloced %u rcv tid entries @ %uKB, total %zuKB\n",
+		rcd->ctxt, rcd->egrbufs.alloced, rcd->egrbufs.rcvtid_size,
+		rcd->egrbufs.size);
+
+	/*
+	 * Set the contexts rcv array head update threshold to the closest
+	 * power of 2 (so we can use a mask instead of modulo) below half
+	 * the allocated entries.
+	 */
+	rcd->egrbufs.threshold =
+		rounddown_pow_of_two(rcd->egrbufs.alloced / 2);
+	/*
+	 * Compute the expected RcvArray entry base. This is done after
+	 * allocating the eager buffers in order to maximize the
+	 * expected RcvArray entries for the context.
+	 */
+	max_entries = rcd->rcv_array_groups * dd->rcv_entries.group_size;
+	egrtop = roundup(rcd->egrbufs.alloced, dd->rcv_entries.group_size);
+	rcd->expected_count = max_entries - egrtop;
+	if (rcd->expected_count > MAX_TID_PAIR_ENTRIES * 2)
+		rcd->expected_count = MAX_TID_PAIR_ENTRIES * 2;
+
+	rcd->expected_base = rcd->eager_base + egrtop;
+	dd_dev_info(dd, "ctxt%u: eager:%u, exp:%u, egrbase:%u, expbase:%u\n",
+		    rcd->ctxt, rcd->egrbufs.alloced, rcd->expected_count,
+		    rcd->eager_base, rcd->expected_base);
+
+	if (!hfi1_rcvbuf_validate(rcd->egrbufs.rcvtid_size, PT_EAGER, &order)) {
+		dd_dev_err(dd, "ctxt%u: current Eager buffer size is invalid %u\n",
+			   rcd->ctxt, rcd->egrbufs.rcvtid_size);
+		ret = -EINVAL;
+		goto bail;
+	}
+
+	for (idx = 0; idx < rcd->egrbufs.alloced; idx++) {
+		hfi1_put_tid(dd, rcd->eager_base + idx, PT_EAGER,
+			      rcd->egrbufs.rcvtids[idx].phys, order);
+		cond_resched();
+	}
+	goto bail;
+
+bail_rcvegrbuf_phys:
+	for (idx = 0; idx < rcd->egrbufs.alloced &&
+		     rcd->egrbufs.buffers[idx].addr;
+	     idx++) {
+		dma_free_coherent(&dd->pcidev->dev,
+				  rcd->egrbufs.buffers[idx].len,
+				  rcd->egrbufs.buffers[idx].addr,
+				  rcd->egrbufs.buffers[idx].phys);
+		rcd->egrbufs.buffers[idx].addr = NULL;
+		rcd->egrbufs.buffers[idx].phys = 0;
+		rcd->egrbufs.buffers[idx].len = 0;
+	}
+bail:
+	return ret;
+}
diff --git a/drivers/staging/rdma/hfi1/intr.c b/drivers/staging/rdma/hfi1/intr.c
new file mode 100644
index 000000000000..426582b9ab65
--- /dev/null
+++ b/drivers/staging/rdma/hfi1/intr.c
@@ -0,0 +1,207 @@
+/*
+ *
+ * This file is provided under a dual BSD/GPLv2 license.  When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * GPL LICENSE SUMMARY
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * BSD LICENSE
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ *  - Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ *  - Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in
+ *    the documentation and/or other materials provided with the
+ *    distribution.
+ *  - Neither the name of Intel Corporation nor the names of its
+ *    contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ */
+
+#include <linux/pci.h>
+#include <linux/delay.h>
+
+#include "hfi.h"
+#include "common.h"
+#include "sdma.h"
+
+/**
+ * format_hwmsg - format a single hwerror message
+ * @msg message buffer
+ * @msgl length of message buffer
+ * @hwmsg message to add to message buffer
+ */
+static void format_hwmsg(char *msg, size_t msgl, const char *hwmsg)
+{
+	strlcat(msg, "[", msgl);
+	strlcat(msg, hwmsg, msgl);
+	strlcat(msg, "]", msgl);
+}
+
+/**
+ * hfi1_format_hwerrors - format hardware error messages for display
+ * @hwerrs hardware errors bit vector
+ * @hwerrmsgs hardware error descriptions
+ * @nhwerrmsgs number of hwerrmsgs
+ * @msg message buffer
+ * @msgl message buffer length
+ */
+void hfi1_format_hwerrors(u64 hwerrs, const struct hfi1_hwerror_msgs *hwerrmsgs,
+			  size_t nhwerrmsgs, char *msg, size_t msgl)
+{
+	int i;
+
+	for (i = 0; i < nhwerrmsgs; i++)
+		if (hwerrs & hwerrmsgs[i].mask)
+			format_hwmsg(msg, msgl, hwerrmsgs[i].msg);
+}
+
+static void signal_ib_event(struct hfi1_pportdata *ppd, enum ib_event_type ev)
+{
+	struct ib_event event;
+	struct hfi1_devdata *dd = ppd->dd;
+
+	/*
+	 * Only call ib_dispatch_event() if the IB device has been
+	 * registered.  HFI1_INITED is set iff the driver has successfully
+	 * registered with the IB core.
+	 */
+	if (!(dd->flags & HFI1_INITTED))
+		return;
+	event.device = &dd->verbs_dev.ibdev;
+	event.element.port_num = ppd->port;
+	event.event = ev;
+	ib_dispatch_event(&event);
+}
+
+/*
+ * Handle a linkup or link down notification.
+ * This is called outside an interrupt.
+ */
+void handle_linkup_change(struct hfi1_devdata *dd, u32 linkup)
+{
+	struct hfi1_pportdata *ppd = &dd->pport[0];
+	enum ib_event_type ev;
+
+	if (!(ppd->linkup ^ !!linkup))
+		return;	/* no change, nothing to do */
+
+	if (linkup) {
+		/*
+		 * Quick linkup and all link up on the simulator does not
+		 * trigger or implement:
+		 *	- VerifyCap interrupt
+		 *	- VerifyCap frames
+		 * But rather moves directly to LinkUp.
+		 *
+		 * Do the work of the VerifyCap interrupt handler,
+		 * handle_verify_cap(), but do not try moving the state to
+		 * LinkUp as we are already there.
+		 *
+		 * NOTE: This uses this device's vAU, vCU, and vl15_init for
+		 * the remote values.  Both sides must be using the values.
+		 */
+		if (quick_linkup
+			    || dd->icode == ICODE_FUNCTIONAL_SIMULATOR) {
+			set_up_vl15(dd, dd->vau, dd->vl15_init);
+			assign_remote_cm_au_table(dd, dd->vcu);
+			ppd->neighbor_guid =
+				read_csr(dd,
+					DC_DC8051_STS_REMOTE_GUID);
+			ppd->neighbor_type =
+				read_csr(dd, DC_DC8051_STS_REMOTE_NODE_TYPE) &
+					DC_DC8051_STS_REMOTE_NODE_TYPE_VAL_MASK;
+			ppd->neighbor_port_number =
+				read_csr(dd, DC_DC8051_STS_REMOTE_PORT_NO) &
+					DC_DC8051_STS_REMOTE_PORT_NO_VAL_SMASK;
+			dd_dev_info(dd,
+				"Neighbor GUID: %llx Neighbor type %d\n",
+				ppd->neighbor_guid,
+				ppd->neighbor_type);
+		}
+
+		/* physical link went up */
+		ppd->linkup = 1;
+		ppd->offline_disabled_reason = OPA_LINKDOWN_REASON_NONE;
+
+		/* link widths are not available until the link is fully up */
+		get_linkup_link_widths(ppd);
+
+	} else {
+		/* physical link went down */
+		ppd->linkup = 0;
+
+		/* clear HW details of the previous connection */
+		reset_link_credits(dd);
+
+		/* freeze after a link down to guarantee a clean egress */
+		start_freeze_handling(ppd, FREEZE_SELF|FREEZE_LINK_DOWN);
+
+		ev = IB_EVENT_PORT_ERR;
+
+		hfi1_set_uevent_bits(ppd, _HFI1_EVENT_LINKDOWN_BIT);
+
+		/* if we are down, the neighbor is down */
+		ppd->neighbor_normal = 0;
+
+		/* notify IB of the link change */
+		signal_ib_event(ppd, ev);
+	}
+
+
+}
+
+/*
+ * Handle receive or urgent interrupts for user contexts.  This means a user
+ * process was waiting for a packet to arrive, and didn't want to poll.
+ */
+void handle_user_interrupt(struct hfi1_ctxtdata *rcd)
+{
+	struct hfi1_devdata *dd = rcd->dd;
+	unsigned long flags;
+
+	spin_lock_irqsave(&dd->uctxt_lock, flags);
+	if (!rcd->cnt)
+		goto done;
+
+	if (test_and_clear_bit(HFI1_CTXT_WAITING_RCV, &rcd->event_flags)) {
+		wake_up_interruptible(&rcd->wait);
+		hfi1_rcvctrl(dd, HFI1_RCVCTRL_INTRAVAIL_DIS, rcd->ctxt);
+	} else if (test_and_clear_bit(HFI1_CTXT_WAITING_URG,
+							&rcd->event_flags)) {
+		rcd->urgent++;
+		wake_up_interruptible(&rcd->wait);
+	}
+done:
+	spin_unlock_irqrestore(&dd->uctxt_lock, flags);
+}
diff --git a/drivers/staging/rdma/hfi1/iowait.h b/drivers/staging/rdma/hfi1/iowait.h
new file mode 100644
index 000000000000..fa361b405851
--- /dev/null
+++ b/drivers/staging/rdma/hfi1/iowait.h
@@ -0,0 +1,186 @@
+#ifndef _HFI1_IOWAIT_H
+#define _HFI1_IOWAIT_H
+/*
+ *
+ * This file is provided under a dual BSD/GPLv2 license.  When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * GPL LICENSE SUMMARY
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * BSD LICENSE
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ *  - Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ *  - Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in
+ *    the documentation and/or other materials provided with the
+ *    distribution.
+ *  - Neither the name of Intel Corporation nor the names of its
+ *    contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ */
+
+#include <linux/list.h>
+#include <linux/workqueue.h>
+#include <linux/sched.h>
+
+/*
+ * typedef (*restart_t)() - restart callback
+ * @work: pointer to work structure
+ */
+typedef void (*restart_t)(struct work_struct *work);
+
+struct sdma_txreq;
+struct sdma_engine;
+/**
+ * struct iowait - linkage for delayed progress/waiting
+ * @list: used to add/insert into QP/PQ wait lists
+ * @tx_head: overflow list of sdma_txreq's
+ * @sleep: no space callback
+ * @wakeup: space callback
+ * @iowork: workqueue overhead
+ * @wait_dma: wait for sdma_busy == 0
+ * @sdma_busy: # of packets in flight
+ * @count: total number of descriptors in tx_head'ed list
+ * @tx_limit: limit for overflow queuing
+ * @tx_count: number of tx entry's in tx_head'ed list
+ *
+ * This is to be embedded in user's state structure
+ * (QP or PQ).
+ *
+ * The sleep and wakeup members are a
+ * bit misnamed.   They do not strictly
+ * speaking sleep or wake up, but they
+ * are callbacks for the ULP to implement
+ * what ever queuing/dequeuing of
+ * the embedded iowait and its containing struct
+ * when a resource shortage like SDMA ring space is seen.
+ *
+ * Both potentially have locks help
+ * so sleeping is not allowed.
+ *
+ * The wait_dma member along with the iow
+ */
+
+struct iowait {
+	struct list_head list;
+	struct list_head tx_head;
+	int (*sleep)(
+		struct sdma_engine *sde,
+		struct iowait *wait,
+		struct sdma_txreq *tx,
+		unsigned seq);
+	void (*wakeup)(struct iowait *wait, int reason);
+	struct work_struct iowork;
+	wait_queue_head_t wait_dma;
+	atomic_t sdma_busy;
+	u32 count;
+	u32 tx_limit;
+	u32 tx_count;
+};
+
+#define SDMA_AVAIL_REASON 0
+
+/**
+ * iowait_init() - initialize wait structure
+ * @wait: wait struct to initialize
+ * @tx_limit: limit for overflow queuing
+ * @func: restart function for workqueue
+ * @sleep: sleep function for no space
+ * @wakeup: wakeup function for no space
+ *
+ * This function initializes the iowait
+ * structure embedded in the QP or PQ.
+ *
+ */
+
+static inline void iowait_init(
+	struct iowait *wait,
+	u32 tx_limit,
+	void (*func)(struct work_struct *work),
+	int (*sleep)(
+		struct sdma_engine *sde,
+		struct iowait *wait,
+		struct sdma_txreq *tx,
+		unsigned seq),
+	void (*wakeup)(struct iowait *wait, int reason))
+{
+	wait->count = 0;
+	INIT_LIST_HEAD(&wait->list);
+	INIT_LIST_HEAD(&wait->tx_head);
+	INIT_WORK(&wait->iowork, func);
+	init_waitqueue_head(&wait->wait_dma);
+	atomic_set(&wait->sdma_busy, 0);
+	wait->tx_limit = tx_limit;
+	wait->sleep = sleep;
+	wait->wakeup = wakeup;
+}
+
+/**
+ * iowait_schedule() - initialize wait structure
+ * @wait: wait struct to schedule
+ * @wq: workqueue for schedule
+ */
+static inline void iowait_schedule(
+	struct iowait *wait,
+	struct workqueue_struct *wq)
+{
+	queue_work(wq, &wait->iowork);
+}
+
+/**
+ * iowait_sdma_drain() - wait for DMAs to drain
+ *
+ * @wait: iowait structure
+ *
+ * This will delay until the iowait sdmas have
+ * completed.
+ */
+static inline void iowait_sdma_drain(struct iowait *wait)
+{
+	wait_event(wait->wait_dma, !atomic_read(&wait->sdma_busy));
+}
+
+/**
+ * iowait_drain_wakeup() - trigger iowait_drain() waiter
+ *
+ * @wait: iowait structure
+ *
+ * This will trigger any waiters.
+ */
+static inline void iowait_drain_wakeup(struct iowait *wait)
+{
+	wake_up(&wait->wait_dma);
+}
+
+#endif
diff --git a/drivers/staging/rdma/hfi1/keys.c b/drivers/staging/rdma/hfi1/keys.c
new file mode 100644
index 000000000000..f6eff177ace1
--- /dev/null
+++ b/drivers/staging/rdma/hfi1/keys.c
@@ -0,0 +1,411 @@
+/*
+ *
+ * This file is provided under a dual BSD/GPLv2 license.  When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * GPL LICENSE SUMMARY
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * BSD LICENSE
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ *  - Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ *  - Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in
+ *    the documentation and/or other materials provided with the
+ *    distribution.
+ *  - Neither the name of Intel Corporation nor the names of its
+ *    contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ */
+
+#include "hfi.h"
+
+/**
+ * hfi1_alloc_lkey - allocate an lkey
+ * @mr: memory region that this lkey protects
+ * @dma_region: 0->normal key, 1->restricted DMA key
+ *
+ * Returns 0 if successful, otherwise returns -errno.
+ *
+ * Increments mr reference count as required.
+ *
+ * Sets the lkey field mr for non-dma regions.
+ *
+ */
+
+int hfi1_alloc_lkey(struct hfi1_mregion *mr, int dma_region)
+{
+	unsigned long flags;
+	u32 r;
+	u32 n;
+	int ret = 0;
+	struct hfi1_ibdev *dev = to_idev(mr->pd->device);
+	struct hfi1_lkey_table *rkt = &dev->lk_table;
+
+	hfi1_get_mr(mr);
+	spin_lock_irqsave(&rkt->lock, flags);
+
+	/* special case for dma_mr lkey == 0 */
+	if (dma_region) {
+		struct hfi1_mregion *tmr;
+
+		tmr = rcu_access_pointer(dev->dma_mr);
+		if (!tmr) {
+			rcu_assign_pointer(dev->dma_mr, mr);
+			mr->lkey_published = 1;
+		} else {
+			hfi1_put_mr(mr);
+		}
+		goto success;
+	}
+
+	/* Find the next available LKEY */
+	r = rkt->next;
+	n = r;
+	for (;;) {
+		if (!rcu_access_pointer(rkt->table[r]))
+			break;
+		r = (r + 1) & (rkt->max - 1);
+		if (r == n)
+			goto bail;
+	}
+	rkt->next = (r + 1) & (rkt->max - 1);
+	/*
+	 * Make sure lkey is never zero which is reserved to indicate an
+	 * unrestricted LKEY.
+	 */
+	rkt->gen++;
+	/*
+	 * bits are capped in verbs.c to ensure enough bits for
+	 * generation number
+	 */
+	mr->lkey = (r << (32 - hfi1_lkey_table_size)) |
+		((((1 << (24 - hfi1_lkey_table_size)) - 1) & rkt->gen)
+		 << 8);
+	if (mr->lkey == 0) {
+		mr->lkey |= 1 << 8;
+		rkt->gen++;
+	}
+	rcu_assign_pointer(rkt->table[r], mr);
+	mr->lkey_published = 1;
+success:
+	spin_unlock_irqrestore(&rkt->lock, flags);
+out:
+	return ret;
+bail:
+	hfi1_put_mr(mr);
+	spin_unlock_irqrestore(&rkt->lock, flags);
+	ret = -ENOMEM;
+	goto out;
+}
+
+/**
+ * hfi1_free_lkey - free an lkey
+ * @mr: mr to free from tables
+ */
+void hfi1_free_lkey(struct hfi1_mregion *mr)
+{
+	unsigned long flags;
+	u32 lkey = mr->lkey;
+	u32 r;
+	struct hfi1_ibdev *dev = to_idev(mr->pd->device);
+	struct hfi1_lkey_table *rkt = &dev->lk_table;
+	int freed = 0;
+
+	spin_lock_irqsave(&rkt->lock, flags);
+	if (!mr->lkey_published)
+		goto out;
+	if (lkey == 0)
+		RCU_INIT_POINTER(dev->dma_mr, NULL);
+	else {
+		r = lkey >> (32 - hfi1_lkey_table_size);
+		RCU_INIT_POINTER(rkt->table[r], NULL);
+	}
+	mr->lkey_published = 0;
+	freed++;
+out:
+	spin_unlock_irqrestore(&rkt->lock, flags);
+	if (freed) {
+		synchronize_rcu();
+		hfi1_put_mr(mr);
+	}
+}
+
+/**
+ * hfi1_lkey_ok - check IB SGE for validity and initialize
+ * @rkt: table containing lkey to check SGE against
+ * @pd: protection domain
+ * @isge: outgoing internal SGE
+ * @sge: SGE to check
+ * @acc: access flags
+ *
+ * Return 1 if valid and successful, otherwise returns 0.
+ *
+ * increments the reference count upon success
+ *
+ * Check the IB SGE for validity and initialize our internal version
+ * of it.
+ */
+int hfi1_lkey_ok(struct hfi1_lkey_table *rkt, struct hfi1_pd *pd,
+		 struct hfi1_sge *isge, struct ib_sge *sge, int acc)
+{
+	struct hfi1_mregion *mr;
+	unsigned n, m;
+	size_t off;
+
+	/*
+	 * We use LKEY == zero for kernel virtual addresses
+	 * (see hfi1_get_dma_mr and dma.c).
+	 */
+	rcu_read_lock();
+	if (sge->lkey == 0) {
+		struct hfi1_ibdev *dev = to_idev(pd->ibpd.device);
+
+		if (pd->user)
+			goto bail;
+		mr = rcu_dereference(dev->dma_mr);
+		if (!mr)
+			goto bail;
+		atomic_inc(&mr->refcount);
+		rcu_read_unlock();
+
+		isge->mr = mr;
+		isge->vaddr = (void *) sge->addr;
+		isge->length = sge->length;
+		isge->sge_length = sge->length;
+		isge->m = 0;
+		isge->n = 0;
+		goto ok;
+	}
+	mr = rcu_dereference(
+		rkt->table[(sge->lkey >> (32 - hfi1_lkey_table_size))]);
+	if (unlikely(!mr || mr->lkey != sge->lkey || mr->pd != &pd->ibpd))
+		goto bail;
+
+	off = sge->addr - mr->user_base;
+	if (unlikely(sge->addr < mr->user_base ||
+		     off + sge->length > mr->length ||
+		     (mr->access_flags & acc) != acc))
+		goto bail;
+	atomic_inc(&mr->refcount);
+	rcu_read_unlock();
+
+	off += mr->offset;
+	if (mr->page_shift) {
+		/*
+		page sizes are uniform power of 2 so no loop is necessary
+		entries_spanned_by_off is the number of times the loop below
+		would have executed.
+		*/
+		size_t entries_spanned_by_off;
+
+		entries_spanned_by_off = off >> mr->page_shift;
+		off -= (entries_spanned_by_off << mr->page_shift);
+		m = entries_spanned_by_off / HFI1_SEGSZ;
+		n = entries_spanned_by_off % HFI1_SEGSZ;
+	} else {
+		m = 0;
+		n = 0;
+		while (off >= mr->map[m]->segs[n].length) {
+			off -= mr->map[m]->segs[n].length;
+			n++;
+			if (n >= HFI1_SEGSZ) {
+				m++;
+				n = 0;
+			}
+		}
+	}
+	isge->mr = mr;
+	isge->vaddr = mr->map[m]->segs[n].vaddr + off;
+	isge->length = mr->map[m]->segs[n].length - off;
+	isge->sge_length = sge->length;
+	isge->m = m;
+	isge->n = n;
+ok:
+	return 1;
+bail:
+	rcu_read_unlock();
+	return 0;
+}
+
+/**
+ * hfi1_rkey_ok - check the IB virtual address, length, and RKEY
+ * @qp: qp for validation
+ * @sge: SGE state
+ * @len: length of data
+ * @vaddr: virtual address to place data
+ * @rkey: rkey to check
+ * @acc: access flags
+ *
+ * Return 1 if successful, otherwise 0.
+ *
+ * increments the reference count upon success
+ */
+int hfi1_rkey_ok(struct hfi1_qp *qp, struct hfi1_sge *sge,
+		 u32 len, u64 vaddr, u32 rkey, int acc)
+{
+	struct hfi1_lkey_table *rkt = &to_idev(qp->ibqp.device)->lk_table;
+	struct hfi1_mregion *mr;
+	unsigned n, m;
+	size_t off;
+
+	/*
+	 * We use RKEY == zero for kernel virtual addresses
+	 * (see hfi1_get_dma_mr and dma.c).
+	 */
+	rcu_read_lock();
+	if (rkey == 0) {
+		struct hfi1_pd *pd = to_ipd(qp->ibqp.pd);
+		struct hfi1_ibdev *dev = to_idev(pd->ibpd.device);
+
+		if (pd->user)
+			goto bail;
+		mr = rcu_dereference(dev->dma_mr);
+		if (!mr)
+			goto bail;
+		atomic_inc(&mr->refcount);
+		rcu_read_unlock();
+
+		sge->mr = mr;
+		sge->vaddr = (void *) vaddr;
+		sge->length = len;
+		sge->sge_length = len;
+		sge->m = 0;
+		sge->n = 0;
+		goto ok;
+	}
+
+	mr = rcu_dereference(
+		rkt->table[(rkey >> (32 - hfi1_lkey_table_size))]);
+	if (unlikely(!mr || mr->lkey != rkey || qp->ibqp.pd != mr->pd))
+		goto bail;
+
+	off = vaddr - mr->iova;
+	if (unlikely(vaddr < mr->iova || off + len > mr->length ||
+		     (mr->access_flags & acc) == 0))
+		goto bail;
+	atomic_inc(&mr->refcount);
+	rcu_read_unlock();
+
+	off += mr->offset;
+	if (mr->page_shift) {
+		/*
+		page sizes are uniform power of 2 so no loop is necessary
+		entries_spanned_by_off is the number of times the loop below
+		would have executed.
+		*/
+		size_t entries_spanned_by_off;
+
+		entries_spanned_by_off = off >> mr->page_shift;
+		off -= (entries_spanned_by_off << mr->page_shift);
+		m = entries_spanned_by_off / HFI1_SEGSZ;
+		n = entries_spanned_by_off % HFI1_SEGSZ;
+	} else {
+		m = 0;
+		n = 0;
+		while (off >= mr->map[m]->segs[n].length) {
+			off -= mr->map[m]->segs[n].length;
+			n++;
+			if (n >= HFI1_SEGSZ) {
+				m++;
+				n = 0;
+			}
+		}
+	}
+	sge->mr = mr;
+	sge->vaddr = mr->map[m]->segs[n].vaddr + off;
+	sge->length = mr->map[m]->segs[n].length - off;
+	sge->sge_length = len;
+	sge->m = m;
+	sge->n = n;
+ok:
+	return 1;
+bail:
+	rcu_read_unlock();
+	return 0;
+}
+
+/*
+ * Initialize the memory region specified by the work request.
+ */
+int hfi1_fast_reg_mr(struct hfi1_qp *qp, struct ib_send_wr *wr)
+{
+	struct hfi1_lkey_table *rkt = &to_idev(qp->ibqp.device)->lk_table;
+	struct hfi1_pd *pd = to_ipd(qp->ibqp.pd);
+	struct hfi1_mregion *mr;
+	u32 rkey = wr->wr.fast_reg.rkey;
+	unsigned i, n, m;
+	int ret = -EINVAL;
+	unsigned long flags;
+	u64 *page_list;
+	size_t ps;
+
+	spin_lock_irqsave(&rkt->lock, flags);
+	if (pd->user || rkey == 0)
+		goto bail;
+
+	mr = rcu_dereference_protected(
+		rkt->table[(rkey >> (32 - hfi1_lkey_table_size))],
+		lockdep_is_held(&rkt->lock));
+	if (unlikely(mr == NULL || qp->ibqp.pd != mr->pd))
+		goto bail;
+
+	if (wr->wr.fast_reg.page_list_len > mr->max_segs)
+		goto bail;
+
+	ps = 1UL << wr->wr.fast_reg.page_shift;
+	if (wr->wr.fast_reg.length > ps * wr->wr.fast_reg.page_list_len)
+		goto bail;
+
+	mr->user_base = wr->wr.fast_reg.iova_start;
+	mr->iova = wr->wr.fast_reg.iova_start;
+	mr->lkey = rkey;
+	mr->length = wr->wr.fast_reg.length;
+	mr->access_flags = wr->wr.fast_reg.access_flags;
+	page_list = wr->wr.fast_reg.page_list->page_list;
+	m = 0;
+	n = 0;
+	for (i = 0; i < wr->wr.fast_reg.page_list_len; i++) {
+		mr->map[m]->segs[n].vaddr = (void *) page_list[i];
+		mr->map[m]->segs[n].length = ps;
+		if (++n == HFI1_SEGSZ) {
+			m++;
+			n = 0;
+		}
+	}
+
+	ret = 0;
+bail:
+	spin_unlock_irqrestore(&rkt->lock, flags);
+	return ret;
+}
diff --git a/drivers/staging/rdma/hfi1/mad.c b/drivers/staging/rdma/hfi1/mad.c
new file mode 100644
index 000000000000..0a18fee46432
--- /dev/null
+++ b/drivers/staging/rdma/hfi1/mad.c
@@ -0,0 +1,4257 @@
+/*
+ *
+ * This file is provided under a dual BSD/GPLv2 license.  When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * GPL LICENSE SUMMARY
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * BSD LICENSE
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ *  - Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ *  - Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in
+ *    the documentation and/or other materials provided with the
+ *    distribution.
+ *  - Neither the name of Intel Corporation nor the names of its
+ *    contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ */
+
+#include <linux/net.h>
+#define OPA_NUM_PKEY_BLOCKS_PER_SMP (OPA_SMP_DR_DATA_SIZE \
+			/ (OPA_PARTITION_TABLE_BLK_SIZE * sizeof(u16)))
+
+#include "hfi.h"
+#include "mad.h"
+#include "trace.h"
+
+/* the reset value from the FM is supposed to be 0xffff, handle both */
+#define OPA_LINK_WIDTH_RESET_OLD 0x0fff
+#define OPA_LINK_WIDTH_RESET 0xffff
+
+static int reply(struct ib_mad_hdr *smp)
+{
+	/*
+	 * The verbs framework will handle the directed/LID route
+	 * packet changes.
+	 */
+	smp->method = IB_MGMT_METHOD_GET_RESP;
+	if (smp->mgmt_class == IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE)
+		smp->status |= IB_SMP_DIRECTION;
+	return IB_MAD_RESULT_SUCCESS | IB_MAD_RESULT_REPLY;
+}
+
+static inline void clear_opa_smp_data(struct opa_smp *smp)
+{
+	void *data = opa_get_smp_data(smp);
+	size_t size = opa_get_smp_data_size(smp);
+
+	memset(data, 0, size);
+}
+
+static void send_trap(struct hfi1_ibport *ibp, void *data, unsigned len)
+{
+	struct ib_mad_send_buf *send_buf;
+	struct ib_mad_agent *agent;
+	struct ib_smp *smp;
+	int ret;
+	unsigned long flags;
+	unsigned long timeout;
+	int pkey_idx;
+	u32 qpn = ppd_from_ibp(ibp)->sm_trap_qp;
+
+	agent = ibp->send_agent;
+	if (!agent)
+		return;
+
+	/* o14-3.2.1 */
+	if (ppd_from_ibp(ibp)->lstate != IB_PORT_ACTIVE)
+		return;
+
+	/* o14-2 */
+	if (ibp->trap_timeout && time_before(jiffies, ibp->trap_timeout))
+		return;
+
+	pkey_idx = hfi1_lookup_pkey_idx(ibp, LIM_MGMT_P_KEY);
+	if (pkey_idx < 0) {
+		pr_warn("%s: failed to find limited mgmt pkey, defaulting 0x%x\n",
+			__func__, hfi1_get_pkey(ibp, 1));
+		pkey_idx = 1;
+	}
+
+	send_buf = ib_create_send_mad(agent, qpn, pkey_idx, 0,
+				      IB_MGMT_MAD_HDR, IB_MGMT_MAD_DATA,
+				      GFP_ATOMIC, IB_MGMT_BASE_VERSION);
+	if (IS_ERR(send_buf))
+		return;
+
+	smp = send_buf->mad;
+	smp->base_version = IB_MGMT_BASE_VERSION;
+	smp->mgmt_class = IB_MGMT_CLASS_SUBN_LID_ROUTED;
+	smp->class_version = 1;
+	smp->method = IB_MGMT_METHOD_TRAP;
+	ibp->tid++;
+	smp->tid = cpu_to_be64(ibp->tid);
+	smp->attr_id = IB_SMP_ATTR_NOTICE;
+	/* o14-1: smp->mkey = 0; */
+	memcpy(smp->data, data, len);
+
+	spin_lock_irqsave(&ibp->lock, flags);
+	if (!ibp->sm_ah) {
+		if (ibp->sm_lid != be16_to_cpu(IB_LID_PERMISSIVE)) {
+			struct ib_ah *ah;
+
+			ah = hfi1_create_qp0_ah(ibp, ibp->sm_lid);
+			if (IS_ERR(ah))
+				ret = PTR_ERR(ah);
+			else {
+				send_buf->ah = ah;
+				ibp->sm_ah = to_iah(ah);
+				ret = 0;
+			}
+		} else
+			ret = -EINVAL;
+	} else {
+		send_buf->ah = &ibp->sm_ah->ibah;
+		ret = 0;
+	}
+	spin_unlock_irqrestore(&ibp->lock, flags);
+
+	if (!ret)
+		ret = ib_post_send_mad(send_buf, NULL);
+	if (!ret) {
+		/* 4.096 usec. */
+		timeout = (4096 * (1UL << ibp->subnet_timeout)) / 1000;
+		ibp->trap_timeout = jiffies + usecs_to_jiffies(timeout);
+	} else {
+		ib_free_send_mad(send_buf);
+		ibp->trap_timeout = 0;
+	}
+}
+
+/*
+ * Send a bad [PQ]_Key trap (ch. 14.3.8).
+ */
+void hfi1_bad_pqkey(struct hfi1_ibport *ibp, __be16 trap_num, u32 key, u32 sl,
+		    u32 qp1, u32 qp2, __be16 lid1, __be16 lid2)
+{
+	struct ib_mad_notice_attr data;
+
+	if (trap_num == IB_NOTICE_TRAP_BAD_PKEY)
+		ibp->pkey_violations++;
+	else
+		ibp->qkey_violations++;
+	ibp->n_pkt_drops++;
+
+	/* Send violation trap */
+	data.generic_type = IB_NOTICE_TYPE_SECURITY;
+	data.prod_type_msb = 0;
+	data.prod_type_lsb = IB_NOTICE_PROD_CA;
+	data.trap_num = trap_num;
+	data.issuer_lid = cpu_to_be16(ppd_from_ibp(ibp)->lid);
+	data.toggle_count = 0;
+	memset(&data.details, 0, sizeof(data.details));
+	data.details.ntc_257_258.lid1 = lid1;
+	data.details.ntc_257_258.lid2 = lid2;
+	data.details.ntc_257_258.key = cpu_to_be32(key);
+	data.details.ntc_257_258.sl_qp1 = cpu_to_be32((sl << 28) | qp1);
+	data.details.ntc_257_258.qp2 = cpu_to_be32(qp2);
+
+	send_trap(ibp, &data, sizeof(data));
+}
+
+/*
+ * Send a bad M_Key trap (ch. 14.3.9).
+ */
+static void bad_mkey(struct hfi1_ibport *ibp, struct ib_mad_hdr *mad,
+		     __be64 mkey, __be32 dr_slid, u8 return_path[], u8 hop_cnt)
+{
+	struct ib_mad_notice_attr data;
+
+	/* Send violation trap */
+	data.generic_type = IB_NOTICE_TYPE_SECURITY;
+	data.prod_type_msb = 0;
+	data.prod_type_lsb = IB_NOTICE_PROD_CA;
+	data.trap_num = IB_NOTICE_TRAP_BAD_MKEY;
+	data.issuer_lid = cpu_to_be16(ppd_from_ibp(ibp)->lid);
+	data.toggle_count = 0;
+	memset(&data.details, 0, sizeof(data.details));
+	data.details.ntc_256.lid = data.issuer_lid;
+	data.details.ntc_256.method = mad->method;
+	data.details.ntc_256.attr_id = mad->attr_id;
+	data.details.ntc_256.attr_mod = mad->attr_mod;
+	data.details.ntc_256.mkey = mkey;
+	if (mad->mgmt_class == IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE) {
+
+		data.details.ntc_256.dr_slid = (__force __be16)dr_slid;
+		data.details.ntc_256.dr_trunc_hop = IB_NOTICE_TRAP_DR_NOTICE;
+		if (hop_cnt > ARRAY_SIZE(data.details.ntc_256.dr_rtn_path)) {
+			data.details.ntc_256.dr_trunc_hop |=
+				IB_NOTICE_TRAP_DR_TRUNC;
+			hop_cnt = ARRAY_SIZE(data.details.ntc_256.dr_rtn_path);
+		}
+		data.details.ntc_256.dr_trunc_hop |= hop_cnt;
+		memcpy(data.details.ntc_256.dr_rtn_path, return_path,
+		       hop_cnt);
+	}
+
+	send_trap(ibp, &data, sizeof(data));
+}
+
+/*
+ * Send a Port Capability Mask Changed trap (ch. 14.3.11).
+ */
+void hfi1_cap_mask_chg(struct hfi1_ibport *ibp)
+{
+	struct ib_mad_notice_attr data;
+
+	data.generic_type = IB_NOTICE_TYPE_INFO;
+	data.prod_type_msb = 0;
+	data.prod_type_lsb = IB_NOTICE_PROD_CA;
+	data.trap_num = IB_NOTICE_TRAP_CAP_MASK_CHG;
+	data.issuer_lid = cpu_to_be16(ppd_from_ibp(ibp)->lid);
+	data.toggle_count = 0;
+	memset(&data.details, 0, sizeof(data.details));
+	data.details.ntc_144.lid = data.issuer_lid;
+	data.details.ntc_144.new_cap_mask = cpu_to_be32(ibp->port_cap_flags);
+
+	send_trap(ibp, &data, sizeof(data));
+}
+
+/*
+ * Send a System Image GUID Changed trap (ch. 14.3.12).
+ */
+void hfi1_sys_guid_chg(struct hfi1_ibport *ibp)
+{
+	struct ib_mad_notice_attr data;
+
+	data.generic_type = IB_NOTICE_TYPE_INFO;
+	data.prod_type_msb = 0;
+	data.prod_type_lsb = IB_NOTICE_PROD_CA;
+	data.trap_num = IB_NOTICE_TRAP_SYS_GUID_CHG;
+	data.issuer_lid = cpu_to_be16(ppd_from_ibp(ibp)->lid);
+	data.toggle_count = 0;
+	memset(&data.details, 0, sizeof(data.details));
+	data.details.ntc_145.lid = data.issuer_lid;
+	data.details.ntc_145.new_sys_guid = ib_hfi1_sys_image_guid;
+
+	send_trap(ibp, &data, sizeof(data));
+}
+
+/*
+ * Send a Node Description Changed trap (ch. 14.3.13).
+ */
+void hfi1_node_desc_chg(struct hfi1_ibport *ibp)
+{
+	struct ib_mad_notice_attr data;
+
+	data.generic_type = IB_NOTICE_TYPE_INFO;
+	data.prod_type_msb = 0;
+	data.prod_type_lsb = IB_NOTICE_PROD_CA;
+	data.trap_num = IB_NOTICE_TRAP_CAP_MASK_CHG;
+	data.issuer_lid = cpu_to_be16(ppd_from_ibp(ibp)->lid);
+	data.toggle_count = 0;
+	memset(&data.details, 0, sizeof(data.details));
+	data.details.ntc_144.lid = data.issuer_lid;
+	data.details.ntc_144.local_changes = 1;
+	data.details.ntc_144.change_flags = IB_NOTICE_TRAP_NODE_DESC_CHG;
+
+	send_trap(ibp, &data, sizeof(data));
+}
+
+static int __subn_get_opa_nodedesc(struct opa_smp *smp, u32 am,
+				   u8 *data, struct ib_device *ibdev,
+				   u8 port, u32 *resp_len)
+{
+	struct opa_node_description *nd;
+
+	if (am) {
+		smp->status |= IB_SMP_INVALID_FIELD;
+		return reply((struct ib_mad_hdr *)smp);
+	}
+
+	nd = (struct opa_node_description *)data;
+
+	memcpy(nd->data, ibdev->node_desc, sizeof(nd->data));
+
+	if (resp_len)
+		*resp_len += sizeof(*nd);
+
+	return reply((struct ib_mad_hdr *)smp);
+}
+
+static int __subn_get_opa_nodeinfo(struct opa_smp *smp, u32 am, u8 *data,
+				   struct ib_device *ibdev, u8 port,
+				   u32 *resp_len)
+{
+	struct opa_node_info *ni;
+	struct hfi1_devdata *dd = dd_from_ibdev(ibdev);
+	unsigned pidx = port - 1; /* IB number port from 1, hw from 0 */
+
+	ni = (struct opa_node_info *)data;
+
+	/* GUID 0 is illegal */
+	if (am || pidx >= dd->num_pports || dd->pport[pidx].guid == 0) {
+		smp->status |= IB_SMP_INVALID_FIELD;
+		return reply((struct ib_mad_hdr *)smp);
+	}
+
+	ni->port_guid = cpu_to_be64(dd->pport[pidx].guid);
+	ni->base_version = OPA_MGMT_BASE_VERSION;
+	ni->class_version = OPA_SMI_CLASS_VERSION;
+	ni->node_type = 1;     /* channel adapter */
+	ni->num_ports = ibdev->phys_port_cnt;
+	/* This is already in network order */
+	ni->system_image_guid = ib_hfi1_sys_image_guid;
+	/* Use first-port GUID as node */
+	ni->node_guid = cpu_to_be64(dd->pport->guid);
+	ni->partition_cap = cpu_to_be16(hfi1_get_npkeys(dd));
+	ni->device_id = cpu_to_be16(dd->pcidev->device);
+	ni->revision = cpu_to_be32(dd->minrev);
+	ni->local_port_num = port;
+	ni->vendor_id[0] = dd->oui1;
+	ni->vendor_id[1] = dd->oui2;
+	ni->vendor_id[2] = dd->oui3;
+
+	if (resp_len)
+		*resp_len += sizeof(*ni);
+
+	return reply((struct ib_mad_hdr *)smp);
+}
+
+static int subn_get_nodeinfo(struct ib_smp *smp, struct ib_device *ibdev,
+			     u8 port)
+{
+	struct ib_node_info *nip = (struct ib_node_info *)&smp->data;
+	struct hfi1_devdata *dd = dd_from_ibdev(ibdev);
+	unsigned pidx = port - 1; /* IB number port from 1, hw from 0 */
+
+	/* GUID 0 is illegal */
+	if (smp->attr_mod || pidx >= dd->num_pports ||
+	    dd->pport[pidx].guid == 0)
+		smp->status |= IB_SMP_INVALID_FIELD;
+	else
+		nip->port_guid = cpu_to_be64(dd->pport[pidx].guid);
+
+	nip->base_version = OPA_MGMT_BASE_VERSION;
+	nip->class_version = OPA_SMI_CLASS_VERSION;
+	nip->node_type = 1;     /* channel adapter */
+	nip->num_ports = ibdev->phys_port_cnt;
+	/* This is already in network order */
+	nip->sys_guid = ib_hfi1_sys_image_guid;
+	 /* Use first-port GUID as node */
+	nip->node_guid = cpu_to_be64(dd->pport->guid);
+	nip->partition_cap = cpu_to_be16(hfi1_get_npkeys(dd));
+	nip->device_id = cpu_to_be16(dd->pcidev->device);
+	nip->revision = cpu_to_be32(dd->minrev);
+	nip->local_port_num = port;
+	nip->vendor_id[0] = dd->oui1;
+	nip->vendor_id[1] = dd->oui2;
+	nip->vendor_id[2] = dd->oui3;
+
+	return reply((struct ib_mad_hdr *)smp);
+}
+
+static void set_link_width_enabled(struct hfi1_pportdata *ppd, u32 w)
+{
+	(void)hfi1_set_ib_cfg(ppd, HFI1_IB_CFG_LWID_ENB, w);
+}
+
+static void set_link_width_downgrade_enabled(struct hfi1_pportdata *ppd, u32 w)
+{
+	(void)hfi1_set_ib_cfg(ppd, HFI1_IB_CFG_LWID_DG_ENB, w);
+}
+
+static void set_link_speed_enabled(struct hfi1_pportdata *ppd, u32 s)
+{
+	(void)hfi1_set_ib_cfg(ppd, HFI1_IB_CFG_SPD_ENB, s);
+}
+
+static int check_mkey(struct hfi1_ibport *ibp, struct ib_mad_hdr *mad,
+		      int mad_flags, __be64 mkey, __be32 dr_slid,
+		      u8 return_path[], u8 hop_cnt)
+{
+	int valid_mkey = 0;
+	int ret = 0;
+
+	/* Is the mkey in the process of expiring? */
+	if (ibp->mkey_lease_timeout &&
+	    time_after_eq(jiffies, ibp->mkey_lease_timeout)) {
+		/* Clear timeout and mkey protection field. */
+		ibp->mkey_lease_timeout = 0;
+		ibp->mkeyprot = 0;
+	}
+
+	if ((mad_flags & IB_MAD_IGNORE_MKEY) ||  ibp->mkey == 0 ||
+	    ibp->mkey == mkey)
+		valid_mkey = 1;
+
+	/* Unset lease timeout on any valid Get/Set/TrapRepress */
+	if (valid_mkey && ibp->mkey_lease_timeout &&
+	    (mad->method == IB_MGMT_METHOD_GET ||
+	     mad->method == IB_MGMT_METHOD_SET ||
+	     mad->method == IB_MGMT_METHOD_TRAP_REPRESS))
+		ibp->mkey_lease_timeout = 0;
+
+	if (!valid_mkey) {
+		switch (mad->method) {
+		case IB_MGMT_METHOD_GET:
+			/* Bad mkey not a violation below level 2 */
+			if (ibp->mkeyprot < 2)
+				break;
+		case IB_MGMT_METHOD_SET:
+		case IB_MGMT_METHOD_TRAP_REPRESS:
+			if (ibp->mkey_violations != 0xFFFF)
+				++ibp->mkey_violations;
+			if (!ibp->mkey_lease_timeout && ibp->mkey_lease_period)
+				ibp->mkey_lease_timeout = jiffies +
+					ibp->mkey_lease_period * HZ;
+			/* Generate a trap notice. */
+			bad_mkey(ibp, mad, mkey, dr_slid, return_path,
+				 hop_cnt);
+			ret = 1;
+		}
+	}
+
+	return ret;
+}
+
+/*
+ * The SMA caches reads from LCB registers in case the LCB is unavailable.
+ * (The LCB is unavailable in certain link states, for example.)
+ */
+struct lcb_datum {
+	u32 off;
+	u64 val;
+};
+
+static struct lcb_datum lcb_cache[] = {
+	{ DC_LCB_STS_ROUND_TRIP_LTP_CNT, 0 },
+};
+
+static int write_lcb_cache(u32 off, u64 val)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(lcb_cache); i++) {
+		if (lcb_cache[i].off == off) {
+			lcb_cache[i].val = val;
+			return 0;
+		}
+	}
+
+	pr_warn("%s bad offset 0x%x\n", __func__, off);
+	return -1;
+}
+
+static int read_lcb_cache(u32 off, u64 *val)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(lcb_cache); i++) {
+		if (lcb_cache[i].off == off) {
+			*val = lcb_cache[i].val;
+			return 0;
+		}
+	}
+
+	pr_warn("%s bad offset 0x%x\n", __func__, off);
+	return -1;
+}
+
+void read_ltp_rtt(struct hfi1_devdata *dd)
+{
+	u64 reg;
+
+	if (read_lcb_csr(dd, DC_LCB_STS_ROUND_TRIP_LTP_CNT, &reg))
+		dd_dev_err(dd, "%s: unable to read LTP RTT\n", __func__);
+	else
+		write_lcb_cache(DC_LCB_STS_ROUND_TRIP_LTP_CNT, reg);
+}
+
+static u8 __opa_porttype(struct hfi1_pportdata *ppd)
+{
+	if (qsfp_mod_present(ppd)) {
+		if (ppd->qsfp_info.cache_valid)
+			return OPA_PORT_TYPE_STANDARD;
+		return OPA_PORT_TYPE_DISCONNECTED;
+	}
+	return OPA_PORT_TYPE_UNKNOWN;
+}
+
+static int __subn_get_opa_portinfo(struct opa_smp *smp, u32 am, u8 *data,
+				   struct ib_device *ibdev, u8 port,
+				   u32 *resp_len)
+{
+	int i;
+	struct hfi1_devdata *dd;
+	struct hfi1_pportdata *ppd;
+	struct hfi1_ibport *ibp;
+	struct opa_port_info *pi = (struct opa_port_info *)data;
+	u8 mtu;
+	u8 credit_rate;
+	u32 state;
+	u32 num_ports = OPA_AM_NPORT(am);
+	u32 start_of_sm_config = OPA_AM_START_SM_CFG(am);
+	u32 buffer_units;
+	u64 tmp = 0;
+
+	if (num_ports != 1) {
+		smp->status |= IB_SMP_INVALID_FIELD;
+		return reply((struct ib_mad_hdr *)smp);
+	}
+
+	dd = dd_from_ibdev(ibdev);
+	/* IB numbers ports from 1, hw from 0 */
+	ppd = dd->pport + (port - 1);
+	ibp = &ppd->ibport_data;
+
+	if (ppd->vls_supported/2 > ARRAY_SIZE(pi->neigh_mtu.pvlx_to_mtu) ||
+		ppd->vls_supported > ARRAY_SIZE(dd->vld)) {
+		smp->status |= IB_SMP_INVALID_FIELD;
+		return reply((struct ib_mad_hdr *)smp);
+	}
+
+	pi->lid = cpu_to_be32(ppd->lid);
+
+	/* Only return the mkey if the protection field allows it. */
+	if (!(smp->method == IB_MGMT_METHOD_GET &&
+	      ibp->mkey != smp->mkey &&
+	      ibp->mkeyprot == 1))
+		pi->mkey = ibp->mkey;
+
+	pi->subnet_prefix = ibp->gid_prefix;
+	pi->sm_lid = cpu_to_be32(ibp->sm_lid);
+	pi->ib_cap_mask = cpu_to_be32(ibp->port_cap_flags);
+	pi->mkey_lease_period = cpu_to_be16(ibp->mkey_lease_period);
+	pi->sm_trap_qp = cpu_to_be32(ppd->sm_trap_qp);
+	pi->sa_qp = cpu_to_be32(ppd->sa_qp);
+
+	pi->link_width.enabled = cpu_to_be16(ppd->link_width_enabled);
+	pi->link_width.supported = cpu_to_be16(ppd->link_width_supported);
+	pi->link_width.active = cpu_to_be16(ppd->link_width_active);
+
+	pi->link_width_downgrade.supported =
+			cpu_to_be16(ppd->link_width_downgrade_supported);
+	pi->link_width_downgrade.enabled =
+			cpu_to_be16(ppd->link_width_downgrade_enabled);
+	pi->link_width_downgrade.tx_active =
+			cpu_to_be16(ppd->link_width_downgrade_tx_active);
+	pi->link_width_downgrade.rx_active =
+			cpu_to_be16(ppd->link_width_downgrade_rx_active);
+
+	pi->link_speed.supported = cpu_to_be16(ppd->link_speed_supported);
+	pi->link_speed.active = cpu_to_be16(ppd->link_speed_active);
+	pi->link_speed.enabled = cpu_to_be16(ppd->link_speed_enabled);
+
+	state = driver_lstate(ppd);
+
+	if (start_of_sm_config && (state == IB_PORT_INIT))
+		ppd->is_sm_config_started = 1;
+
+	pi->port_phys_conf = __opa_porttype(ppd) & 0xf;
+
+#if PI_LED_ENABLE_SUP
+	pi->port_states.ledenable_offlinereason = ppd->neighbor_normal << 4;
+	pi->port_states.ledenable_offlinereason |=
+		ppd->is_sm_config_started << 5;
+	pi->port_states.ledenable_offlinereason |=
+		ppd->offline_disabled_reason & OPA_PI_MASK_OFFLINE_REASON;
+#else
+	pi->port_states.offline_reason = ppd->neighbor_normal << 4;
+	pi->port_states.offline_reason |= ppd->is_sm_config_started << 5;
+	pi->port_states.offline_reason |= ppd->offline_disabled_reason &
+						OPA_PI_MASK_OFFLINE_REASON;
+#endif /* PI_LED_ENABLE_SUP */
+
+	pi->port_states.portphysstate_portstate =
+		(hfi1_ibphys_portstate(ppd) << 4) | state;
+
+	pi->mkeyprotect_lmc = (ibp->mkeyprot << 6) | ppd->lmc;
+
+	memset(pi->neigh_mtu.pvlx_to_mtu, 0, sizeof(pi->neigh_mtu.pvlx_to_mtu));
+	for (i = 0; i < ppd->vls_supported; i++) {
+		mtu = mtu_to_enum(dd->vld[i].mtu, HFI1_DEFAULT_ACTIVE_MTU);
+		if ((i % 2) == 0)
+			pi->neigh_mtu.pvlx_to_mtu[i/2] |= (mtu << 4);
+		else
+			pi->neigh_mtu.pvlx_to_mtu[i/2] |= mtu;
+	}
+	/* don't forget VL 15 */
+	mtu = mtu_to_enum(dd->vld[15].mtu, 2048);
+	pi->neigh_mtu.pvlx_to_mtu[15/2] |= mtu;
+	pi->smsl = ibp->sm_sl & OPA_PI_MASK_SMSL;
+	pi->operational_vls = hfi1_get_ib_cfg(ppd, HFI1_IB_CFG_OP_VLS);
+	pi->partenforce_filterraw |=
+		(ppd->linkinit_reason & OPA_PI_MASK_LINKINIT_REASON);
+	if (ppd->part_enforce & HFI1_PART_ENFORCE_IN)
+		pi->partenforce_filterraw |= OPA_PI_MASK_PARTITION_ENFORCE_IN;
+	if (ppd->part_enforce & HFI1_PART_ENFORCE_OUT)
+		pi->partenforce_filterraw |= OPA_PI_MASK_PARTITION_ENFORCE_OUT;
+	pi->mkey_violations = cpu_to_be16(ibp->mkey_violations);
+	/* P_KeyViolations are counted by hardware. */
+	pi->pkey_violations = cpu_to_be16(ibp->pkey_violations);
+	pi->qkey_violations = cpu_to_be16(ibp->qkey_violations);
+
+	pi->vl.cap = ppd->vls_supported;
+	pi->vl.high_limit = cpu_to_be16(ibp->vl_high_limit);
+	pi->vl.arb_high_cap = (u8)hfi1_get_ib_cfg(ppd, HFI1_IB_CFG_VL_HIGH_CAP);
+	pi->vl.arb_low_cap = (u8)hfi1_get_ib_cfg(ppd, HFI1_IB_CFG_VL_LOW_CAP);
+
+	pi->clientrereg_subnettimeout = ibp->subnet_timeout;
+
+	pi->port_link_mode  = cpu_to_be16(OPA_PORT_LINK_MODE_OPA << 10 |
+					  OPA_PORT_LINK_MODE_OPA << 5 |
+					  OPA_PORT_LINK_MODE_OPA);
+
+	pi->port_ltp_crc_mode = cpu_to_be16(ppd->port_ltp_crc_mode);
+
+	pi->port_mode = cpu_to_be16(
+				ppd->is_active_optimize_enabled ?
+					OPA_PI_MASK_PORT_ACTIVE_OPTOMIZE : 0);
+
+	pi->port_packet_format.supported =
+		cpu_to_be16(OPA_PORT_PACKET_FORMAT_9B);
+	pi->port_packet_format.enabled =
+		cpu_to_be16(OPA_PORT_PACKET_FORMAT_9B);
+
+	/* flit_control.interleave is (OPA V1, version .76):
+	 * bits		use
+	 * ----		---
+	 * 2		res
+	 * 2		DistanceSupported
+	 * 2		DistanceEnabled
+	 * 5		MaxNextLevelTxEnabled
+	 * 5		MaxNestLevelRxSupported
+	 *
+	 * HFI supports only "distance mode 1" (see OPA V1, version .76,
+	 * section 9.6.2), so set DistanceSupported, DistanceEnabled
+	 * to 0x1.
+	 */
+	pi->flit_control.interleave = cpu_to_be16(0x1400);
+
+	pi->link_down_reason = ppd->local_link_down_reason.sma;
+	pi->neigh_link_down_reason = ppd->neigh_link_down_reason.sma;
+	pi->port_error_action = cpu_to_be32(ppd->port_error_action);
+	pi->mtucap = mtu_to_enum(hfi1_max_mtu, IB_MTU_4096);
+
+	/* 32.768 usec. response time (guessing) */
+	pi->resptimevalue = 3;
+
+	pi->local_port_num = port;
+
+	/* buffer info for FM */
+	pi->overall_buffer_space = cpu_to_be16(dd->link_credits);
+
+	pi->neigh_node_guid = cpu_to_be64(ppd->neighbor_guid);
+	pi->neigh_port_num = ppd->neighbor_port_number;
+	pi->port_neigh_mode =
+		(ppd->neighbor_type & OPA_PI_MASK_NEIGH_NODE_TYPE) |
+		(ppd->mgmt_allowed ? OPA_PI_MASK_NEIGH_MGMT_ALLOWED : 0) |
+		(ppd->neighbor_fm_security ?
+			OPA_PI_MASK_NEIGH_FW_AUTH_BYPASS : 0);
+
+	/* HFIs shall always return VL15 credits to their
+	 * neighbor in a timely manner, without any credit return pacing.
+	 */
+	credit_rate = 0;
+	buffer_units  = (dd->vau) & OPA_PI_MASK_BUF_UNIT_BUF_ALLOC;
+	buffer_units |= (dd->vcu << 3) & OPA_PI_MASK_BUF_UNIT_CREDIT_ACK;
+	buffer_units |= (credit_rate << 6) &
+				OPA_PI_MASK_BUF_UNIT_VL15_CREDIT_RATE;
+	buffer_units |= (dd->vl15_init << 11) & OPA_PI_MASK_BUF_UNIT_VL15_INIT;
+	pi->buffer_units = cpu_to_be32(buffer_units);
+
+	pi->opa_cap_mask = cpu_to_be16(OPA_CAP_MASK3_IsSharedSpaceSupported);
+
+	/* HFI supports a replay buffer 128 LTPs in size */
+	pi->replay_depth.buffer = 0x80;
+	/* read the cached value of DC_LCB_STS_ROUND_TRIP_LTP_CNT */
+	read_lcb_cache(DC_LCB_STS_ROUND_TRIP_LTP_CNT, &tmp);
+
+	/* this counter is 16 bits wide, but the replay_depth.wire
+	 * variable is only 8 bits */
+	if (tmp > 0xff)
+		tmp = 0xff;
+	pi->replay_depth.wire = tmp;
+
+	if (resp_len)
+		*resp_len += sizeof(struct opa_port_info);
+
+	return reply((struct ib_mad_hdr *)smp);
+}
+
+/**
+ * get_pkeys - return the PKEY table
+ * @dd: the hfi1_ib device
+ * @port: the IB port number
+ * @pkeys: the pkey table is placed here
+ */
+static int get_pkeys(struct hfi1_devdata *dd, u8 port, u16 *pkeys)
+{
+	struct hfi1_pportdata *ppd = dd->pport + port - 1;
+
+	memcpy(pkeys, ppd->pkeys, sizeof(ppd->pkeys));
+
+	return 0;
+}
+
+static int __subn_get_opa_pkeytable(struct opa_smp *smp, u32 am, u8 *data,
+				    struct ib_device *ibdev, u8 port,
+				    u32 *resp_len)
+{
+	struct hfi1_devdata *dd = dd_from_ibdev(ibdev);
+	u32 n_blocks_req = OPA_AM_NBLK(am);
+	u32 start_block = am & 0x7ff;
+	__be16 *p;
+	u16 *q;
+	int i;
+	u16 n_blocks_avail;
+	unsigned npkeys = hfi1_get_npkeys(dd);
+	size_t size;
+
+	if (n_blocks_req == 0) {
+		pr_warn("OPA Get PKey AM Invalid : P = %d; B = 0x%x; N = 0x%x\n",
+			port, start_block, n_blocks_req);
+		smp->status |= IB_SMP_INVALID_FIELD;
+		return reply((struct ib_mad_hdr *)smp);
+	}
+
+	n_blocks_avail = (u16) (npkeys/OPA_PARTITION_TABLE_BLK_SIZE) + 1;
+
+	size = (n_blocks_req * OPA_PARTITION_TABLE_BLK_SIZE) * sizeof(u16);
+
+	if (start_block + n_blocks_req > n_blocks_avail ||
+	    n_blocks_req > OPA_NUM_PKEY_BLOCKS_PER_SMP) {
+		pr_warn("OPA Get PKey AM Invalid : s 0x%x; req 0x%x; "
+			"avail 0x%x; blk/smp 0x%lx\n",
+			start_block, n_blocks_req, n_blocks_avail,
+			OPA_NUM_PKEY_BLOCKS_PER_SMP);
+		smp->status |= IB_SMP_INVALID_FIELD;
+		return reply((struct ib_mad_hdr *)smp);
+	}
+
+	p = (__be16 *) data;
+	q = (u16 *)data;
+	/* get the real pkeys if we are requesting the first block */
+	if (start_block == 0) {
+		get_pkeys(dd, port, q);
+		for (i = 0; i < npkeys; i++)
+			p[i] = cpu_to_be16(q[i]);
+		if (resp_len)
+			*resp_len += size;
+	} else
+		smp->status |= IB_SMP_INVALID_FIELD;
+
+	return reply((struct ib_mad_hdr *)smp);
+}
+
+enum {
+	HFI_TRANSITION_DISALLOWED,
+	HFI_TRANSITION_IGNORED,
+	HFI_TRANSITION_ALLOWED,
+	HFI_TRANSITION_UNDEFINED,
+};
+
+/*
+ * Use shortened names to improve readability of
+ * {logical,physical}_state_transitions
+ */
+enum {
+	__D = HFI_TRANSITION_DISALLOWED,
+	__I = HFI_TRANSITION_IGNORED,
+	__A = HFI_TRANSITION_ALLOWED,
+	__U = HFI_TRANSITION_UNDEFINED,
+};
+
+/*
+ * IB_PORTPHYSSTATE_POLLING (2) through OPA_PORTPHYSSTATE_MAX (11) are
+ * represented in physical_state_transitions.
+ */
+#define __N_PHYSTATES (OPA_PORTPHYSSTATE_MAX - IB_PORTPHYSSTATE_POLLING + 1)
+
+/*
+ * Within physical_state_transitions, rows represent "old" states,
+ * columns "new" states, and physical_state_transitions.allowed[old][new]
+ * indicates if the transition from old state to new state is legal (see
+ * OPAg1v1, Table 6-4).
+ */
+static const struct {
+	u8 allowed[__N_PHYSTATES][__N_PHYSTATES];
+} physical_state_transitions = {
+	{
+		/* 2    3    4    5    6    7    8    9   10   11 */
+	/* 2 */	{ __A, __A, __D, __D, __D, __D, __D, __D, __D, __D },
+	/* 3 */	{ __A, __I, __D, __D, __D, __D, __D, __D, __D, __A },
+	/* 4 */	{ __U, __U, __U, __U, __U, __U, __U, __U, __U, __U },
+	/* 5 */	{ __A, __A, __D, __I, __D, __D, __D, __D, __D, __D },
+	/* 6 */	{ __U, __U, __U, __U, __U, __U, __U, __U, __U, __U },
+	/* 7 */	{ __D, __A, __D, __D, __D, __I, __D, __D, __D, __D },
+	/* 8 */	{ __U, __U, __U, __U, __U, __U, __U, __U, __U, __U },
+	/* 9 */	{ __I, __A, __D, __D, __D, __D, __D, __I, __D, __D },
+	/*10 */	{ __U, __U, __U, __U, __U, __U, __U, __U, __U, __U },
+	/*11 */	{ __D, __A, __D, __D, __D, __D, __D, __D, __D, __I },
+	}
+};
+
+/*
+ * IB_PORT_DOWN (1) through IB_PORT_ACTIVE_DEFER (5) are represented
+ * logical_state_transitions
+ */
+
+#define __N_LOGICAL_STATES (IB_PORT_ACTIVE_DEFER - IB_PORT_DOWN + 1)
+
+/*
+ * Within logical_state_transitions rows represent "old" states,
+ * columns "new" states, and logical_state_transitions.allowed[old][new]
+ * indicates if the transition from old state to new state is legal (see
+ * OPAg1v1, Table 9-12).
+ */
+static const struct {
+	u8 allowed[__N_LOGICAL_STATES][__N_LOGICAL_STATES];
+} logical_state_transitions = {
+	{
+		/* 1    2    3    4    5 */
+	/* 1 */	{ __I, __D, __D, __D, __U},
+	/* 2 */	{ __D, __I, __A, __D, __U},
+	/* 3 */	{ __D, __D, __I, __A, __U},
+	/* 4 */	{ __D, __D, __I, __I, __U},
+	/* 5 */	{ __U, __U, __U, __U, __U},
+	}
+};
+
+static int logical_transition_allowed(int old, int new)
+{
+	if (old < IB_PORT_NOP || old > IB_PORT_ACTIVE_DEFER ||
+	    new < IB_PORT_NOP || new > IB_PORT_ACTIVE_DEFER) {
+		pr_warn("invalid logical state(s) (old %d new %d)\n",
+			old, new);
+		return HFI_TRANSITION_UNDEFINED;
+	}
+
+	if (new == IB_PORT_NOP)
+		return HFI_TRANSITION_ALLOWED; /* always allowed */
+
+	/* adjust states for indexing into logical_state_transitions */
+	old -= IB_PORT_DOWN;
+	new -= IB_PORT_DOWN;
+
+	if (old < 0 || new < 0)
+		return HFI_TRANSITION_UNDEFINED;
+	return logical_state_transitions.allowed[old][new];
+}
+
+static int physical_transition_allowed(int old, int new)
+{
+	if (old < IB_PORTPHYSSTATE_NOP || old > OPA_PORTPHYSSTATE_MAX ||
+	    new < IB_PORTPHYSSTATE_NOP || new > OPA_PORTPHYSSTATE_MAX) {
+		pr_warn("invalid physical state(s) (old %d new %d)\n",
+			old, new);
+		return HFI_TRANSITION_UNDEFINED;
+	}
+
+	if (new == IB_PORTPHYSSTATE_NOP)
+		return HFI_TRANSITION_ALLOWED; /* always allowed */
+
+	/* adjust states for indexing into physical_state_transitions */
+	old -= IB_PORTPHYSSTATE_POLLING;
+	new -= IB_PORTPHYSSTATE_POLLING;
+
+	if (old < 0 || new < 0)
+		return HFI_TRANSITION_UNDEFINED;
+	return physical_state_transitions.allowed[old][new];
+}
+
+static int port_states_transition_allowed(struct hfi1_pportdata *ppd,
+					  u32 logical_new, u32 physical_new)
+{
+	u32 physical_old = driver_physical_state(ppd);
+	u32 logical_old = driver_logical_state(ppd);
+	int ret, logical_allowed, physical_allowed;
+
+	logical_allowed = ret =
+		logical_transition_allowed(logical_old, logical_new);
+
+	if (ret == HFI_TRANSITION_DISALLOWED ||
+	    ret == HFI_TRANSITION_UNDEFINED) {
+		pr_warn("invalid logical state transition %s -> %s\n",
+			opa_lstate_name(logical_old),
+			opa_lstate_name(logical_new));
+		return ret;
+	}
+
+	physical_allowed = ret =
+		physical_transition_allowed(physical_old, physical_new);
+
+	if (ret == HFI_TRANSITION_DISALLOWED ||
+	    ret == HFI_TRANSITION_UNDEFINED) {
+		pr_warn("invalid physical state transition %s -> %s\n",
+			opa_pstate_name(physical_old),
+			opa_pstate_name(physical_new));
+		return ret;
+	}
+
+	if (logical_allowed == HFI_TRANSITION_IGNORED &&
+	    physical_allowed == HFI_TRANSITION_IGNORED)
+		return HFI_TRANSITION_IGNORED;
+
+	/*
+	 * Either physical_allowed or logical_allowed is
+	 * HFI_TRANSITION_ALLOWED.
+	 */
+	return HFI_TRANSITION_ALLOWED;
+}
+
+static int set_port_states(struct hfi1_pportdata *ppd, struct opa_smp *smp,
+			   u32 logical_state, u32 phys_state,
+			   int suppress_idle_sma)
+{
+	struct hfi1_devdata *dd = ppd->dd;
+	u32 link_state;
+	int ret;
+
+	ret = port_states_transition_allowed(ppd, logical_state, phys_state);
+	if (ret == HFI_TRANSITION_DISALLOWED ||
+	    ret == HFI_TRANSITION_UNDEFINED) {
+		/* error message emitted above */
+		smp->status |= IB_SMP_INVALID_FIELD;
+		return 0;
+	}
+
+	if (ret == HFI_TRANSITION_IGNORED)
+		return 0;
+
+	if ((phys_state != IB_PORTPHYSSTATE_NOP) &&
+	    !(logical_state == IB_PORT_DOWN ||
+	      logical_state == IB_PORT_NOP)){
+		pr_warn("SubnSet(OPA_PortInfo) port state invalid: logical_state 0x%x physical_state 0x%x\n",
+			logical_state, phys_state);
+		smp->status |= IB_SMP_INVALID_FIELD;
+	}
+
+	/*
+	 * Logical state changes are summarized in OPAv1g1 spec.,
+	 * Table 9-12; physical state changes are summarized in
+	 * OPAv1g1 spec., Table 6.4.
+	 */
+	switch (logical_state) {
+	case IB_PORT_NOP:
+		if (phys_state == IB_PORTPHYSSTATE_NOP)
+			break;
+		/* FALLTHROUGH */
+	case IB_PORT_DOWN:
+		if (phys_state == IB_PORTPHYSSTATE_NOP)
+			link_state = HLS_DN_DOWNDEF;
+		else if (phys_state == IB_PORTPHYSSTATE_POLLING) {
+			link_state = HLS_DN_POLL;
+			set_link_down_reason(ppd,
+			     OPA_LINKDOWN_REASON_FM_BOUNCE, 0,
+			     OPA_LINKDOWN_REASON_FM_BOUNCE);
+		} else if (phys_state == IB_PORTPHYSSTATE_DISABLED)
+			link_state = HLS_DN_DISABLE;
+		else {
+			pr_warn("SubnSet(OPA_PortInfo) invalid physical state 0x%x\n",
+				phys_state);
+			smp->status |= IB_SMP_INVALID_FIELD;
+			break;
+		}
+
+		set_link_state(ppd, link_state);
+		if (link_state == HLS_DN_DISABLE &&
+		    (ppd->offline_disabled_reason >
+		     OPA_LINKDOWN_REASON_SMA_DISABLED ||
+		     ppd->offline_disabled_reason ==
+		     OPA_LINKDOWN_REASON_NONE))
+			ppd->offline_disabled_reason =
+			OPA_LINKDOWN_REASON_SMA_DISABLED;
+		/*
+		 * Don't send a reply if the response would be sent
+		 * through the disabled port.
+		 */
+		if (link_state == HLS_DN_DISABLE && smp->hop_cnt)
+			return IB_MAD_RESULT_SUCCESS | IB_MAD_RESULT_CONSUMED;
+		break;
+	case IB_PORT_ARMED:
+		ret = set_link_state(ppd, HLS_UP_ARMED);
+		if ((ret == 0) && (suppress_idle_sma == 0))
+			send_idle_sma(dd, SMA_IDLE_ARM);
+		break;
+	case IB_PORT_ACTIVE:
+		if (ppd->neighbor_normal) {
+			ret = set_link_state(ppd, HLS_UP_ACTIVE);
+			if (ret == 0)
+				send_idle_sma(dd, SMA_IDLE_ACTIVE);
+		} else {
+			pr_warn("SubnSet(OPA_PortInfo) Cannot move to Active with NeighborNormal 0\n");
+			smp->status |= IB_SMP_INVALID_FIELD;
+		}
+		break;
+	default:
+		pr_warn("SubnSet(OPA_PortInfo) invalid logical state 0x%x\n",
+			logical_state);
+		smp->status |= IB_SMP_INVALID_FIELD;
+	}
+
+	return 0;
+}
+
+/**
+ * subn_set_opa_portinfo - set port information
+ * @smp: the incoming SM packet
+ * @ibdev: the infiniband device
+ * @port: the port on the device
+ *
+ */
+static int __subn_set_opa_portinfo(struct opa_smp *smp, u32 am, u8 *data,
+				   struct ib_device *ibdev, u8 port,
+				   u32 *resp_len)
+{
+	struct opa_port_info *pi = (struct opa_port_info *)data;
+	struct ib_event event;
+	struct hfi1_devdata *dd;
+	struct hfi1_pportdata *ppd;
+	struct hfi1_ibport *ibp;
+	u8 clientrereg;
+	unsigned long flags;
+	u32 smlid, opa_lid; /* tmp vars to hold LID values */
+	u16 lid;
+	u8 ls_old, ls_new, ps_new;
+	u8 vls;
+	u8 msl;
+	u8 crc_enabled;
+	u16 lse, lwe, mtu;
+	u32 num_ports = OPA_AM_NPORT(am);
+	u32 start_of_sm_config = OPA_AM_START_SM_CFG(am);
+	int ret, i, invalid = 0, call_set_mtu = 0;
+	int call_link_downgrade_policy = 0;
+
+	if (num_ports != 1) {
+		smp->status |= IB_SMP_INVALID_FIELD;
+		return reply((struct ib_mad_hdr *)smp);
+	}
+
+	opa_lid = be32_to_cpu(pi->lid);
+	if (opa_lid & 0xFFFF0000) {
+		pr_warn("OPA_PortInfo lid out of range: %X\n", opa_lid);
+		smp->status |= IB_SMP_INVALID_FIELD;
+		goto get_only;
+	}
+
+	lid = (u16)(opa_lid & 0x0000FFFF);
+
+	smlid = be32_to_cpu(pi->sm_lid);
+	if (smlid & 0xFFFF0000) {
+		pr_warn("OPA_PortInfo SM lid out of range: %X\n", smlid);
+		smp->status |= IB_SMP_INVALID_FIELD;
+		goto get_only;
+	}
+	smlid &= 0x0000FFFF;
+
+	clientrereg = (pi->clientrereg_subnettimeout &
+			OPA_PI_MASK_CLIENT_REREGISTER);
+
+	dd = dd_from_ibdev(ibdev);
+	/* IB numbers ports from 1, hw from 0 */
+	ppd = dd->pport + (port - 1);
+	ibp = &ppd->ibport_data;
+	event.device = ibdev;
+	event.element.port_num = port;
+
+	ls_old = driver_lstate(ppd);
+
+	ibp->mkey = pi->mkey;
+	ibp->gid_prefix = pi->subnet_prefix;
+	ibp->mkey_lease_period = be16_to_cpu(pi->mkey_lease_period);
+
+	/* Must be a valid unicast LID address. */
+	if ((lid == 0 && ls_old > IB_PORT_INIT) ||
+	     lid >= HFI1_MULTICAST_LID_BASE) {
+		smp->status |= IB_SMP_INVALID_FIELD;
+		pr_warn("SubnSet(OPA_PortInfo) lid invalid 0x%x\n",
+			lid);
+	} else if (ppd->lid != lid ||
+		 ppd->lmc != (pi->mkeyprotect_lmc & OPA_PI_MASK_LMC)) {
+		if (ppd->lid != lid)
+			hfi1_set_uevent_bits(ppd, _HFI1_EVENT_LID_CHANGE_BIT);
+		if (ppd->lmc != (pi->mkeyprotect_lmc & OPA_PI_MASK_LMC))
+			hfi1_set_uevent_bits(ppd, _HFI1_EVENT_LMC_CHANGE_BIT);
+		hfi1_set_lid(ppd, lid, pi->mkeyprotect_lmc & OPA_PI_MASK_LMC);
+		event.event = IB_EVENT_LID_CHANGE;
+		ib_dispatch_event(&event);
+	}
+
+	msl = pi->smsl & OPA_PI_MASK_SMSL;
+	if (pi->partenforce_filterraw & OPA_PI_MASK_LINKINIT_REASON)
+		ppd->linkinit_reason =
+			(pi->partenforce_filterraw &
+			 OPA_PI_MASK_LINKINIT_REASON);
+	/* enable/disable SW pkey checking as per FM control */
+	if (pi->partenforce_filterraw & OPA_PI_MASK_PARTITION_ENFORCE_IN)
+		ppd->part_enforce |= HFI1_PART_ENFORCE_IN;
+	else
+		ppd->part_enforce &= ~HFI1_PART_ENFORCE_IN;
+
+	if (pi->partenforce_filterraw & OPA_PI_MASK_PARTITION_ENFORCE_OUT)
+		ppd->part_enforce |= HFI1_PART_ENFORCE_OUT;
+	else
+		ppd->part_enforce &= ~HFI1_PART_ENFORCE_OUT;
+
+	/* Must be a valid unicast LID address. */
+	if ((smlid == 0 && ls_old > IB_PORT_INIT) ||
+	     smlid >= HFI1_MULTICAST_LID_BASE) {
+		smp->status |= IB_SMP_INVALID_FIELD;
+		pr_warn("SubnSet(OPA_PortInfo) smlid invalid 0x%x\n", smlid);
+	} else if (smlid != ibp->sm_lid || msl != ibp->sm_sl) {
+		pr_warn("SubnSet(OPA_PortInfo) smlid 0x%x\n", smlid);
+		spin_lock_irqsave(&ibp->lock, flags);
+		if (ibp->sm_ah) {
+			if (smlid != ibp->sm_lid)
+				ibp->sm_ah->attr.dlid = smlid;
+			if (msl != ibp->sm_sl)
+				ibp->sm_ah->attr.sl = msl;
+		}
+		spin_unlock_irqrestore(&ibp->lock, flags);
+		if (smlid != ibp->sm_lid)
+			ibp->sm_lid = smlid;
+		if (msl != ibp->sm_sl)
+			ibp->sm_sl = msl;
+		event.event = IB_EVENT_SM_CHANGE;
+		ib_dispatch_event(&event);
+	}
+
+	if (pi->link_down_reason == 0) {
+		ppd->local_link_down_reason.sma = 0;
+		ppd->local_link_down_reason.latest = 0;
+	}
+
+	if (pi->neigh_link_down_reason == 0) {
+		ppd->neigh_link_down_reason.sma = 0;
+		ppd->neigh_link_down_reason.latest = 0;
+	}
+
+	ppd->sm_trap_qp = be32_to_cpu(pi->sm_trap_qp);
+	ppd->sa_qp = be32_to_cpu(pi->sa_qp);
+
+	ppd->port_error_action = be32_to_cpu(pi->port_error_action);
+	lwe = be16_to_cpu(pi->link_width.enabled);
+	if (lwe) {
+		if (lwe == OPA_LINK_WIDTH_RESET
+				|| lwe == OPA_LINK_WIDTH_RESET_OLD)
+			set_link_width_enabled(ppd, ppd->link_width_supported);
+		else if ((lwe & ~ppd->link_width_supported) == 0)
+			set_link_width_enabled(ppd, lwe);
+		else
+			smp->status |= IB_SMP_INVALID_FIELD;
+	}
+	lwe = be16_to_cpu(pi->link_width_downgrade.enabled);
+	/* LWD.E is always applied - 0 means "disabled" */
+	if (lwe == OPA_LINK_WIDTH_RESET
+			|| lwe == OPA_LINK_WIDTH_RESET_OLD) {
+		set_link_width_downgrade_enabled(ppd,
+				ppd->link_width_downgrade_supported);
+	} else if ((lwe & ~ppd->link_width_downgrade_supported) == 0) {
+		/* only set and apply if something changed */
+		if (lwe != ppd->link_width_downgrade_enabled) {
+			set_link_width_downgrade_enabled(ppd, lwe);
+			call_link_downgrade_policy = 1;
+		}
+	} else
+		smp->status |= IB_SMP_INVALID_FIELD;
+
+	lse = be16_to_cpu(pi->link_speed.enabled);
+	if (lse) {
+		if (lse & be16_to_cpu(pi->link_speed.supported))
+			set_link_speed_enabled(ppd, lse);
+		else
+			smp->status |= IB_SMP_INVALID_FIELD;
+	}
+
+	ibp->mkeyprot = (pi->mkeyprotect_lmc & OPA_PI_MASK_MKEY_PROT_BIT) >> 6;
+	ibp->vl_high_limit = be16_to_cpu(pi->vl.high_limit) & 0xFF;
+	(void)hfi1_set_ib_cfg(ppd, HFI1_IB_CFG_VL_HIGH_LIMIT,
+				    ibp->vl_high_limit);
+
+	if (ppd->vls_supported/2 > ARRAY_SIZE(pi->neigh_mtu.pvlx_to_mtu) ||
+		ppd->vls_supported > ARRAY_SIZE(dd->vld)) {
+		smp->status |= IB_SMP_INVALID_FIELD;
+		return reply((struct ib_mad_hdr *)smp);
+	}
+	for (i = 0; i < ppd->vls_supported; i++) {
+		if ((i % 2) == 0)
+			mtu = enum_to_mtu((pi->neigh_mtu.pvlx_to_mtu[i/2] >> 4)
+					  & 0xF);
+		else
+			mtu = enum_to_mtu(pi->neigh_mtu.pvlx_to_mtu[i/2] & 0xF);
+		if (mtu == 0xffff) {
+			pr_warn("SubnSet(OPA_PortInfo) mtu invalid %d (0x%x)\n",
+				mtu,
+				(pi->neigh_mtu.pvlx_to_mtu[0] >> 4) & 0xF);
+			smp->status |= IB_SMP_INVALID_FIELD;
+			mtu = hfi1_max_mtu; /* use a valid MTU */
+		}
+		if (dd->vld[i].mtu != mtu) {
+			dd_dev_info(dd,
+				"MTU change on vl %d from %d to %d\n",
+				i, dd->vld[i].mtu, mtu);
+			dd->vld[i].mtu = mtu;
+			call_set_mtu++;
+		}
+	}
+	/* As per OPAV1 spec: VL15 must support and be configured
+	 * for operation with a 2048 or larger MTU.
+	 */
+	mtu = enum_to_mtu(pi->neigh_mtu.pvlx_to_mtu[15/2] & 0xF);
+	if (mtu < 2048 || mtu == 0xffff)
+		mtu = 2048;
+	if (dd->vld[15].mtu != mtu) {
+		dd_dev_info(dd,
+			"MTU change on vl 15 from %d to %d\n",
+			dd->vld[15].mtu, mtu);
+		dd->vld[15].mtu = mtu;
+		call_set_mtu++;
+	}
+	if (call_set_mtu)
+		set_mtu(ppd);
+
+	/* Set operational VLs */
+	vls = pi->operational_vls & OPA_PI_MASK_OPERATIONAL_VL;
+	if (vls) {
+		if (vls > ppd->vls_supported) {
+			pr_warn("SubnSet(OPA_PortInfo) VL's supported invalid %d\n",
+				pi->operational_vls);
+			smp->status |= IB_SMP_INVALID_FIELD;
+		} else {
+			if (hfi1_set_ib_cfg(ppd, HFI1_IB_CFG_OP_VLS,
+						vls) == -EINVAL)
+				smp->status |= IB_SMP_INVALID_FIELD;
+		}
+	}
+
+	if (pi->mkey_violations == 0)
+		ibp->mkey_violations = 0;
+
+	if (pi->pkey_violations == 0)
+		ibp->pkey_violations = 0;
+
+	if (pi->qkey_violations == 0)
+		ibp->qkey_violations = 0;
+
+	ibp->subnet_timeout =
+		pi->clientrereg_subnettimeout & OPA_PI_MASK_SUBNET_TIMEOUT;
+
+	crc_enabled = be16_to_cpu(pi->port_ltp_crc_mode);
+	crc_enabled >>= 4;
+	crc_enabled &= 0xf;
+
+	if (crc_enabled != 0)
+		ppd->port_crc_mode_enabled = port_ltp_to_cap(crc_enabled);
+
+	ppd->is_active_optimize_enabled =
+			!!(be16_to_cpu(pi->port_mode)
+					& OPA_PI_MASK_PORT_ACTIVE_OPTOMIZE);
+
+	ls_new = pi->port_states.portphysstate_portstate &
+			OPA_PI_MASK_PORT_STATE;
+	ps_new = (pi->port_states.portphysstate_portstate &
+			OPA_PI_MASK_PORT_PHYSICAL_STATE) >> 4;
+
+	if (ls_old == IB_PORT_INIT) {
+		if (start_of_sm_config) {
+			if (ls_new == ls_old || (ls_new == IB_PORT_ARMED))
+				ppd->is_sm_config_started = 1;
+		} else if (ls_new == IB_PORT_ARMED) {
+			if (ppd->is_sm_config_started == 0)
+				invalid = 1;
+		}
+	}
+
+	/* Handle CLIENT_REREGISTER event b/c SM asked us for it */
+	if (clientrereg) {
+		event.event = IB_EVENT_CLIENT_REREGISTER;
+		ib_dispatch_event(&event);
+	}
+
+	/*
+	 * Do the port state change now that the other link parameters
+	 * have been set.
+	 * Changing the port physical state only makes sense if the link
+	 * is down or is being set to down.
+	 */
+
+	ret = set_port_states(ppd, smp, ls_new, ps_new, invalid);
+	if (ret)
+		return ret;
+
+	ret = __subn_get_opa_portinfo(smp, am, data, ibdev, port, resp_len);
+
+	/* restore re-reg bit per o14-12.2.1 */
+	pi->clientrereg_subnettimeout |= clientrereg;
+
+	/*
+	 * Apply the new link downgrade policy.  This may result in a link
+	 * bounce.  Do this after everything else so things are settled.
+	 * Possible problem: if setting the port state above fails, then
+	 * the policy change is not applied.
+	 */
+	if (call_link_downgrade_policy)
+		apply_link_downgrade_policy(ppd, 0);
+
+	return ret;
+
+get_only:
+	return __subn_get_opa_portinfo(smp, am, data, ibdev, port, resp_len);
+}
+
+/**
+ * set_pkeys - set the PKEY table for ctxt 0
+ * @dd: the hfi1_ib device
+ * @port: the IB port number
+ * @pkeys: the PKEY table
+ */
+static int set_pkeys(struct hfi1_devdata *dd, u8 port, u16 *pkeys)
+{
+	struct hfi1_pportdata *ppd;
+	int i;
+	int changed = 0;
+	int update_includes_mgmt_partition = 0;
+
+	/*
+	 * IB port one/two always maps to context zero/one,
+	 * always a kernel context, no locking needed
+	 * If we get here with ppd setup, no need to check
+	 * that rcd is valid.
+	 */
+	ppd = dd->pport + (port - 1);
+	/*
+	 * If the update does not include the management pkey, don't do it.
+	 */
+	for (i = 0; i < ARRAY_SIZE(ppd->pkeys); i++) {
+		if (pkeys[i] == LIM_MGMT_P_KEY) {
+			update_includes_mgmt_partition = 1;
+			break;
+		}
+	}
+
+	if (!update_includes_mgmt_partition)
+		return 1;
+
+	for (i = 0; i < ARRAY_SIZE(ppd->pkeys); i++) {
+		u16 key = pkeys[i];
+		u16 okey = ppd->pkeys[i];
+
+		if (key == okey)
+			continue;
+		/*
+		 * The SM gives us the complete PKey table. We have
+		 * to ensure that we put the PKeys in the matching
+		 * slots.
+		 */
+		ppd->pkeys[i] = key;
+		changed = 1;
+	}
+
+	if (changed) {
+		struct ib_event event;
+
+		(void)hfi1_set_ib_cfg(ppd, HFI1_IB_CFG_PKEYS, 0);
+
+		event.event = IB_EVENT_PKEY_CHANGE;
+		event.device = &dd->verbs_dev.ibdev;
+		event.element.port_num = port;
+		ib_dispatch_event(&event);
+	}
+	return 0;
+}
+
+static int __subn_set_opa_pkeytable(struct opa_smp *smp, u32 am, u8 *data,
+				    struct ib_device *ibdev, u8 port,
+				    u32 *resp_len)
+{
+	struct hfi1_devdata *dd = dd_from_ibdev(ibdev);
+	u32 n_blocks_sent = OPA_AM_NBLK(am);
+	u32 start_block = am & 0x7ff;
+	u16 *p = (u16 *) data;
+	__be16 *q = (__be16 *)data;
+	int i;
+	u16 n_blocks_avail;
+	unsigned npkeys = hfi1_get_npkeys(dd);
+
+	if (n_blocks_sent == 0) {
+		pr_warn("OPA Get PKey AM Invalid : P = %d; B = 0x%x; N = 0x%x\n",
+			port, start_block, n_blocks_sent);
+		smp->status |= IB_SMP_INVALID_FIELD;
+		return reply((struct ib_mad_hdr *)smp);
+	}
+
+	n_blocks_avail = (u16)(npkeys/OPA_PARTITION_TABLE_BLK_SIZE) + 1;
+
+	if (start_block + n_blocks_sent > n_blocks_avail ||
+	    n_blocks_sent > OPA_NUM_PKEY_BLOCKS_PER_SMP) {
+		pr_warn("OPA Set PKey AM Invalid : s 0x%x; req 0x%x; avail 0x%x; blk/smp 0x%lx\n",
+			start_block, n_blocks_sent, n_blocks_avail,
+			OPA_NUM_PKEY_BLOCKS_PER_SMP);
+		smp->status |= IB_SMP_INVALID_FIELD;
+		return reply((struct ib_mad_hdr *)smp);
+	}
+
+	for (i = 0; i < n_blocks_sent * OPA_PARTITION_TABLE_BLK_SIZE; i++)
+		p[i] = be16_to_cpu(q[i]);
+
+	if (start_block == 0 && set_pkeys(dd, port, p) != 0) {
+		smp->status |= IB_SMP_INVALID_FIELD;
+		return reply((struct ib_mad_hdr *)smp);
+	}
+
+	return __subn_get_opa_pkeytable(smp, am, data, ibdev, port, resp_len);
+}
+
+static int get_sc2vlt_tables(struct hfi1_devdata *dd, void *data)
+{
+	u64 *val = (u64 *)data;
+
+	*val++ = read_csr(dd, SEND_SC2VLT0);
+	*val++ = read_csr(dd, SEND_SC2VLT1);
+	*val++ = read_csr(dd, SEND_SC2VLT2);
+	*val++ = read_csr(dd, SEND_SC2VLT3);
+	return 0;
+}
+
+#define ILLEGAL_VL 12
+/*
+ * filter_sc2vlt changes mappings to VL15 to ILLEGAL_VL (except
+ * for SC15, which must map to VL15). If we don't remap things this
+ * way it is possible for VL15 counters to increment when we try to
+ * send on a SC which is mapped to an invalid VL.
+ */
+static void filter_sc2vlt(void *data)
+{
+	int i;
+	u8 *pd = (u8 *)data;
+
+	for (i = 0; i < OPA_MAX_SCS; i++) {
+		if (i == 15)
+			continue;
+		if ((pd[i] & 0x1f) == 0xf)
+			pd[i] = ILLEGAL_VL;
+	}
+}
+
+static int set_sc2vlt_tables(struct hfi1_devdata *dd, void *data)
+{
+	u64 *val = (u64 *)data;
+
+	filter_sc2vlt(data);
+
+	write_csr(dd, SEND_SC2VLT0, *val++);
+	write_csr(dd, SEND_SC2VLT1, *val++);
+	write_csr(dd, SEND_SC2VLT2, *val++);
+	write_csr(dd, SEND_SC2VLT3, *val++);
+	write_seqlock_irq(&dd->sc2vl_lock);
+	memcpy(dd->sc2vl, (u64 *)data, sizeof(dd->sc2vl));
+	write_sequnlock_irq(&dd->sc2vl_lock);
+	return 0;
+}
+
+static int __subn_get_opa_sl_to_sc(struct opa_smp *smp, u32 am, u8 *data,
+				   struct ib_device *ibdev, u8 port,
+				   u32 *resp_len)
+{
+	struct hfi1_ibport *ibp = to_iport(ibdev, port);
+	u8 *p = (u8 *)data;
+	size_t size = ARRAY_SIZE(ibp->sl_to_sc); /* == 32 */
+	unsigned i;
+
+	if (am) {
+		smp->status |= IB_SMP_INVALID_FIELD;
+		return reply((struct ib_mad_hdr *)smp);
+	}
+
+	for (i = 0; i < ARRAY_SIZE(ibp->sl_to_sc); i++)
+		*p++ = ibp->sl_to_sc[i];
+
+	if (resp_len)
+		*resp_len += size;
+
+	return reply((struct ib_mad_hdr *)smp);
+}
+
+static int __subn_set_opa_sl_to_sc(struct opa_smp *smp, u32 am, u8 *data,
+				   struct ib_device *ibdev, u8 port,
+				   u32 *resp_len)
+{
+	struct hfi1_ibport *ibp = to_iport(ibdev, port);
+	u8 *p = (u8 *)data;
+	int i;
+
+	if (am) {
+		smp->status |= IB_SMP_INVALID_FIELD;
+		return reply((struct ib_mad_hdr *)smp);
+	}
+
+	for (i = 0; i <  ARRAY_SIZE(ibp->sl_to_sc); i++)
+		ibp->sl_to_sc[i] = *p++;
+
+	return __subn_get_opa_sl_to_sc(smp, am, data, ibdev, port, resp_len);
+}
+
+static int __subn_get_opa_sc_to_sl(struct opa_smp *smp, u32 am, u8 *data,
+				   struct ib_device *ibdev, u8 port,
+				   u32 *resp_len)
+{
+	struct hfi1_ibport *ibp = to_iport(ibdev, port);
+	u8 *p = (u8 *)data;
+	size_t size = ARRAY_SIZE(ibp->sc_to_sl); /* == 32 */
+	unsigned i;
+
+	if (am) {
+		smp->status |= IB_SMP_INVALID_FIELD;
+		return reply((struct ib_mad_hdr *)smp);
+	}
+
+	for (i = 0; i < ARRAY_SIZE(ibp->sc_to_sl); i++)
+		*p++ = ibp->sc_to_sl[i];
+
+	if (resp_len)
+		*resp_len += size;
+
+	return reply((struct ib_mad_hdr *)smp);
+}
+
+static int __subn_set_opa_sc_to_sl(struct opa_smp *smp, u32 am, u8 *data,
+				   struct ib_device *ibdev, u8 port,
+				   u32 *resp_len)
+{
+	struct hfi1_ibport *ibp = to_iport(ibdev, port);
+	u8 *p = (u8 *)data;
+	int i;
+
+	if (am) {
+		smp->status |= IB_SMP_INVALID_FIELD;
+		return reply((struct ib_mad_hdr *)smp);
+	}
+
+	for (i = 0; i < ARRAY_SIZE(ibp->sc_to_sl); i++)
+		ibp->sc_to_sl[i] = *p++;
+
+	return __subn_get_opa_sc_to_sl(smp, am, data, ibdev, port, resp_len);
+}
+
+static int __subn_get_opa_sc_to_vlt(struct opa_smp *smp, u32 am, u8 *data,
+				    struct ib_device *ibdev, u8 port,
+				    u32 *resp_len)
+{
+	u32 n_blocks = OPA_AM_NBLK(am);
+	struct hfi1_devdata *dd = dd_from_ibdev(ibdev);
+	void *vp = (void *) data;
+	size_t size = 4 * sizeof(u64);
+
+	if (n_blocks != 1) {
+		smp->status |= IB_SMP_INVALID_FIELD;
+		return reply((struct ib_mad_hdr *)smp);
+	}
+
+	get_sc2vlt_tables(dd, vp);
+
+	if (resp_len)
+		*resp_len += size;
+
+	return reply((struct ib_mad_hdr *)smp);
+}
+
+static int __subn_set_opa_sc_to_vlt(struct opa_smp *smp, u32 am, u8 *data,
+				    struct ib_device *ibdev, u8 port,
+				    u32 *resp_len)
+{
+	u32 n_blocks = OPA_AM_NBLK(am);
+	int async_update = OPA_AM_ASYNC(am);
+	struct hfi1_devdata *dd = dd_from_ibdev(ibdev);
+	void *vp = (void *) data;
+	struct hfi1_pportdata *ppd;
+	int lstate;
+
+	if (n_blocks != 1 || async_update) {
+		smp->status |= IB_SMP_INVALID_FIELD;
+		return reply((struct ib_mad_hdr *)smp);
+	}
+
+	/* IB numbers ports from 1, hw from 0 */
+	ppd = dd->pport + (port - 1);
+	lstate = driver_lstate(ppd);
+	/* it's known that async_update is 0 by this point, but include
+	 * the explicit check for clarity */
+	if (!async_update &&
+	    (lstate == IB_PORT_ARMED || lstate == IB_PORT_ACTIVE)) {
+		smp->status |= IB_SMP_INVALID_FIELD;
+		return reply((struct ib_mad_hdr *)smp);
+	}
+
+	set_sc2vlt_tables(dd, vp);
+
+	return __subn_get_opa_sc_to_vlt(smp, am, data, ibdev, port, resp_len);
+}
+
+static int __subn_get_opa_sc_to_vlnt(struct opa_smp *smp, u32 am, u8 *data,
+				     struct ib_device *ibdev, u8 port,
+				     u32 *resp_len)
+{
+	u32 n_blocks = OPA_AM_NPORT(am);
+	struct hfi1_devdata *dd = dd_from_ibdev(ibdev);
+	struct hfi1_pportdata *ppd;
+	void *vp = (void *) data;
+	int size;
+
+	if (n_blocks != 1) {
+		smp->status |= IB_SMP_INVALID_FIELD;
+		return reply((struct ib_mad_hdr *)smp);
+	}
+
+	ppd = dd->pport + (port - 1);
+
+	size = fm_get_table(ppd, FM_TBL_SC2VLNT, vp);
+
+	if (resp_len)
+		*resp_len += size;
+
+	return reply((struct ib_mad_hdr *)smp);
+}
+
+static int __subn_set_opa_sc_to_vlnt(struct opa_smp *smp, u32 am, u8 *data,
+				     struct ib_device *ibdev, u8 port,
+				     u32 *resp_len)
+{
+	u32 n_blocks = OPA_AM_NPORT(am);
+	struct hfi1_devdata *dd = dd_from_ibdev(ibdev);
+	struct hfi1_pportdata *ppd;
+	void *vp = (void *) data;
+	int lstate;
+
+	if (n_blocks != 1) {
+		smp->status |= IB_SMP_INVALID_FIELD;
+		return reply((struct ib_mad_hdr *)smp);
+	}
+
+	/* IB numbers ports from 1, hw from 0 */
+	ppd = dd->pport + (port - 1);
+	lstate = driver_lstate(ppd);
+	if (lstate == IB_PORT_ARMED || lstate == IB_PORT_ACTIVE) {
+		smp->status |= IB_SMP_INVALID_FIELD;
+		return reply((struct ib_mad_hdr *)smp);
+	}
+
+	ppd = dd->pport + (port - 1);
+
+	fm_set_table(ppd, FM_TBL_SC2VLNT, vp);
+
+	return __subn_get_opa_sc_to_vlnt(smp, am, data, ibdev, port,
+					 resp_len);
+}
+
+static int __subn_get_opa_psi(struct opa_smp *smp, u32 am, u8 *data,
+			      struct ib_device *ibdev, u8 port,
+			      u32 *resp_len)
+{
+	u32 nports = OPA_AM_NPORT(am);
+	u32 start_of_sm_config = OPA_AM_START_SM_CFG(am);
+	u32 lstate;
+	struct hfi1_ibport *ibp;
+	struct hfi1_pportdata *ppd;
+	struct opa_port_state_info *psi = (struct opa_port_state_info *) data;
+
+	if (nports != 1) {
+		smp->status |= IB_SMP_INVALID_FIELD;
+		return reply((struct ib_mad_hdr *)smp);
+	}
+
+	ibp = to_iport(ibdev, port);
+	ppd = ppd_from_ibp(ibp);
+
+	lstate = driver_lstate(ppd);
+
+	if (start_of_sm_config && (lstate == IB_PORT_INIT))
+		ppd->is_sm_config_started = 1;
+
+#if PI_LED_ENABLE_SUP
+	psi->port_states.ledenable_offlinereason = ppd->neighbor_normal << 4;
+	psi->port_states.ledenable_offlinereason |=
+		ppd->is_sm_config_started << 5;
+	psi->port_states.ledenable_offlinereason |=
+		ppd->offline_disabled_reason & OPA_PI_MASK_OFFLINE_REASON;
+#else
+	psi->port_states.offline_reason = ppd->neighbor_normal << 4;
+	psi->port_states.offline_reason |= ppd->is_sm_config_started << 5;
+	psi->port_states.offline_reason |= ppd->offline_disabled_reason &
+				OPA_PI_MASK_OFFLINE_REASON;
+#endif /* PI_LED_ENABLE_SUP */
+
+	psi->port_states.portphysstate_portstate =
+		(hfi1_ibphys_portstate(ppd) << 4) | (lstate & 0xf);
+	psi->link_width_downgrade_tx_active =
+	  ppd->link_width_downgrade_tx_active;
+	psi->link_width_downgrade_rx_active =
+	  ppd->link_width_downgrade_rx_active;
+	if (resp_len)
+		*resp_len += sizeof(struct opa_port_state_info);
+
+	return reply((struct ib_mad_hdr *)smp);
+}
+
+static int __subn_set_opa_psi(struct opa_smp *smp, u32 am, u8 *data,
+			      struct ib_device *ibdev, u8 port,
+			      u32 *resp_len)
+{
+	u32 nports = OPA_AM_NPORT(am);
+	u32 start_of_sm_config = OPA_AM_START_SM_CFG(am);
+	u32 ls_old;
+	u8 ls_new, ps_new;
+	struct hfi1_ibport *ibp;
+	struct hfi1_pportdata *ppd;
+	struct opa_port_state_info *psi = (struct opa_port_state_info *) data;
+	int ret, invalid = 0;
+
+	if (nports != 1) {
+		smp->status |= IB_SMP_INVALID_FIELD;
+		return reply((struct ib_mad_hdr *)smp);
+	}
+
+	ibp = to_iport(ibdev, port);
+	ppd = ppd_from_ibp(ibp);
+
+	ls_old = driver_lstate(ppd);
+
+	ls_new = port_states_to_logical_state(&psi->port_states);
+	ps_new = port_states_to_phys_state(&psi->port_states);
+
+	if (ls_old == IB_PORT_INIT) {
+		if (start_of_sm_config) {
+			if (ls_new == ls_old || (ls_new == IB_PORT_ARMED))
+				ppd->is_sm_config_started = 1;
+		} else if (ls_new == IB_PORT_ARMED) {
+			if (ppd->is_sm_config_started == 0)
+				invalid = 1;
+		}
+	}
+
+	ret = set_port_states(ppd, smp, ls_new, ps_new, invalid);
+	if (ret)
+		return ret;
+
+	if (invalid)
+		smp->status |= IB_SMP_INVALID_FIELD;
+
+	return __subn_get_opa_psi(smp, am, data, ibdev, port, resp_len);
+}
+
+static int __subn_get_opa_cable_info(struct opa_smp *smp, u32 am, u8 *data,
+				     struct ib_device *ibdev, u8 port,
+				     u32 *resp_len)
+{
+	struct hfi1_devdata *dd = dd_from_ibdev(ibdev);
+	u32 addr = OPA_AM_CI_ADDR(am);
+	u32 len = OPA_AM_CI_LEN(am) + 1;
+	int ret;
+
+#define __CI_PAGE_SIZE (1 << 7) /* 128 bytes */
+#define __CI_PAGE_MASK ~(__CI_PAGE_SIZE - 1)
+#define __CI_PAGE_NUM(a) ((a) & __CI_PAGE_MASK)
+
+	/* check that addr is within spec, and
+	 * addr and (addr + len - 1) are on the same "page" */
+	if (addr >= 4096 ||
+		(__CI_PAGE_NUM(addr) != __CI_PAGE_NUM(addr + len - 1))) {
+		smp->status |= IB_SMP_INVALID_FIELD;
+		return reply((struct ib_mad_hdr *)smp);
+	}
+
+	ret = get_cable_info(dd, port, addr, len, data);
+
+	if (ret == -ENODEV) {
+		smp->status |= IB_SMP_UNSUP_METH_ATTR;
+		return reply((struct ib_mad_hdr *)smp);
+	}
+
+	/* The address range for the CableInfo SMA query is wider than the
+	 * memory available on the QSFP cable. We want to return a valid
+	 * response, albeit zeroed out, for address ranges beyond available
+	 * memory but that are within the CableInfo query spec
+	 */
+	if (ret < 0 && ret != -ERANGE) {
+		smp->status |= IB_SMP_INVALID_FIELD;
+		return reply((struct ib_mad_hdr *)smp);
+	}
+
+	if (resp_len)
+		*resp_len += len;
+
+	return reply((struct ib_mad_hdr *)smp);
+}
+
+static int __subn_get_opa_bct(struct opa_smp *smp, u32 am, u8 *data,
+			      struct ib_device *ibdev, u8 port, u32 *resp_len)
+{
+	u32 num_ports = OPA_AM_NPORT(am);
+	struct hfi1_devdata *dd = dd_from_ibdev(ibdev);
+	struct hfi1_pportdata *ppd;
+	struct buffer_control *p = (struct buffer_control *) data;
+	int size;
+
+	if (num_ports != 1) {
+		smp->status |= IB_SMP_INVALID_FIELD;
+		return reply((struct ib_mad_hdr *)smp);
+	}
+
+	ppd = dd->pport + (port - 1);
+	size = fm_get_table(ppd, FM_TBL_BUFFER_CONTROL, p);
+	trace_bct_get(dd, p);
+	if (resp_len)
+		*resp_len += size;
+
+	return reply((struct ib_mad_hdr *)smp);
+}
+
+static int __subn_set_opa_bct(struct opa_smp *smp, u32 am, u8 *data,
+			      struct ib_device *ibdev, u8 port, u32 *resp_len)
+{
+	u32 num_ports = OPA_AM_NPORT(am);
+	struct hfi1_devdata *dd = dd_from_ibdev(ibdev);
+	struct hfi1_pportdata *ppd;
+	struct buffer_control *p = (struct buffer_control *) data;
+
+	if (num_ports != 1) {
+		smp->status |= IB_SMP_INVALID_FIELD;
+		return reply((struct ib_mad_hdr *)smp);
+	}
+	ppd = dd->pport + (port - 1);
+	trace_bct_set(dd, p);
+	if (fm_set_table(ppd, FM_TBL_BUFFER_CONTROL, p) < 0) {
+		smp->status |= IB_SMP_INVALID_FIELD;
+		return reply((struct ib_mad_hdr *)smp);
+	}
+
+	return __subn_get_opa_bct(smp, am, data, ibdev, port, resp_len);
+}
+
+static int __subn_get_opa_vl_arb(struct opa_smp *smp, u32 am, u8 *data,
+				 struct ib_device *ibdev, u8 port,
+				 u32 *resp_len)
+{
+	struct hfi1_pportdata *ppd = ppd_from_ibp(to_iport(ibdev, port));
+	u32 num_ports = OPA_AM_NPORT(am);
+	u8 section = (am & 0x00ff0000) >> 16;
+	u8 *p = data;
+	int size = 0;
+
+	if (num_ports != 1) {
+		smp->status |= IB_SMP_INVALID_FIELD;
+		return reply((struct ib_mad_hdr *)smp);
+	}
+
+	switch (section) {
+	case OPA_VLARB_LOW_ELEMENTS:
+		size = fm_get_table(ppd, FM_TBL_VL_LOW_ARB, p);
+		break;
+	case OPA_VLARB_HIGH_ELEMENTS:
+		size = fm_get_table(ppd, FM_TBL_VL_HIGH_ARB, p);
+		break;
+	case OPA_VLARB_PREEMPT_ELEMENTS:
+		size = fm_get_table(ppd, FM_TBL_VL_PREEMPT_ELEMS, p);
+		break;
+	case OPA_VLARB_PREEMPT_MATRIX:
+		size = fm_get_table(ppd, FM_TBL_VL_PREEMPT_MATRIX, p);
+		break;
+	default:
+		pr_warn("OPA SubnGet(VL Arb) AM Invalid : 0x%x\n",
+			be32_to_cpu(smp->attr_mod));
+		smp->status |= IB_SMP_INVALID_FIELD;
+		break;
+	}
+
+	if (size > 0 && resp_len)
+		*resp_len += size;
+
+	return reply((struct ib_mad_hdr *)smp);
+}
+
+static int __subn_set_opa_vl_arb(struct opa_smp *smp, u32 am, u8 *data,
+				 struct ib_device *ibdev, u8 port,
+				 u32 *resp_len)
+{
+	struct hfi1_pportdata *ppd = ppd_from_ibp(to_iport(ibdev, port));
+	u32 num_ports = OPA_AM_NPORT(am);
+	u8 section = (am & 0x00ff0000) >> 16;
+	u8 *p = data;
+
+	if (num_ports != 1) {
+		smp->status |= IB_SMP_INVALID_FIELD;
+		return reply((struct ib_mad_hdr *)smp);
+	}
+
+	switch (section) {
+	case OPA_VLARB_LOW_ELEMENTS:
+		(void) fm_set_table(ppd, FM_TBL_VL_LOW_ARB, p);
+		break;
+	case OPA_VLARB_HIGH_ELEMENTS:
+		(void) fm_set_table(ppd, FM_TBL_VL_HIGH_ARB, p);
+		break;
+	/* neither OPA_VLARB_PREEMPT_ELEMENTS, or OPA_VLARB_PREEMPT_MATRIX
+	 * can be changed from the default values */
+	case OPA_VLARB_PREEMPT_ELEMENTS:
+		/* FALLTHROUGH */
+	case OPA_VLARB_PREEMPT_MATRIX:
+		smp->status |= IB_SMP_UNSUP_METH_ATTR;
+		break;
+	default:
+		pr_warn("OPA SubnSet(VL Arb) AM Invalid : 0x%x\n",
+			be32_to_cpu(smp->attr_mod));
+		smp->status |= IB_SMP_INVALID_FIELD;
+		break;
+	}
+
+	return __subn_get_opa_vl_arb(smp, am, data, ibdev, port, resp_len);
+}
+
+struct opa_pma_mad {
+	struct ib_mad_hdr mad_hdr;
+	u8 data[2024];
+} __packed;
+
+struct opa_class_port_info {
+	u8 base_version;
+	u8 class_version;
+	__be16 cap_mask;
+	__be32 cap_mask2_resp_time;
+
+	u8 redirect_gid[16];
+	__be32 redirect_tc_fl;
+	__be32 redirect_lid;
+	__be32 redirect_sl_qp;
+	__be32 redirect_qkey;
+
+	u8 trap_gid[16];
+	__be32 trap_tc_fl;
+	__be32 trap_lid;
+	__be32 trap_hl_qp;
+	__be32 trap_qkey;
+
+	__be16 trap_pkey;
+	__be16 redirect_pkey;
+
+	u8 trap_sl_rsvd;
+	u8 reserved[3];
+} __packed;
+
+struct opa_port_status_req {
+	__u8 port_num;
+	__u8 reserved[3];
+	__be32 vl_select_mask;
+};
+
+#define VL_MASK_ALL		0x000080ff
+
+struct opa_port_status_rsp {
+	__u8 port_num;
+	__u8 reserved[3];
+	__be32  vl_select_mask;
+
+	/* Data counters */
+	__be64 port_xmit_data;
+	__be64 port_rcv_data;
+	__be64 port_xmit_pkts;
+	__be64 port_rcv_pkts;
+	__be64 port_multicast_xmit_pkts;
+	__be64 port_multicast_rcv_pkts;
+	__be64 port_xmit_wait;
+	__be64 sw_port_congestion;
+	__be64 port_rcv_fecn;
+	__be64 port_rcv_becn;
+	__be64 port_xmit_time_cong;
+	__be64 port_xmit_wasted_bw;
+	__be64 port_xmit_wait_data;
+	__be64 port_rcv_bubble;
+	__be64 port_mark_fecn;
+	/* Error counters */
+	__be64 port_rcv_constraint_errors;
+	__be64 port_rcv_switch_relay_errors;
+	__be64 port_xmit_discards;
+	__be64 port_xmit_constraint_errors;
+	__be64 port_rcv_remote_physical_errors;
+	__be64 local_link_integrity_errors;
+	__be64 port_rcv_errors;
+	__be64 excessive_buffer_overruns;
+	__be64 fm_config_errors;
+	__be32 link_error_recovery;
+	__be32 link_downed;
+	u8 uncorrectable_errors;
+
+	u8 link_quality_indicator; /* 5res, 3bit */
+	u8 res2[6];
+	struct _vls_pctrs {
+		/* per-VL Data counters */
+		__be64 port_vl_xmit_data;
+		__be64 port_vl_rcv_data;
+		__be64 port_vl_xmit_pkts;
+		__be64 port_vl_rcv_pkts;
+		__be64 port_vl_xmit_wait;
+		__be64 sw_port_vl_congestion;
+		__be64 port_vl_rcv_fecn;
+		__be64 port_vl_rcv_becn;
+		__be64 port_xmit_time_cong;
+		__be64 port_vl_xmit_wasted_bw;
+		__be64 port_vl_xmit_wait_data;
+		__be64 port_vl_rcv_bubble;
+		__be64 port_vl_mark_fecn;
+		__be64 port_vl_xmit_discards;
+	} vls[0]; /* real array size defined by # bits set in vl_select_mask */
+};
+
+enum counter_selects {
+	CS_PORT_XMIT_DATA			= (1 << 31),
+	CS_PORT_RCV_DATA			= (1 << 30),
+	CS_PORT_XMIT_PKTS			= (1 << 29),
+	CS_PORT_RCV_PKTS			= (1 << 28),
+	CS_PORT_MCAST_XMIT_PKTS			= (1 << 27),
+	CS_PORT_MCAST_RCV_PKTS			= (1 << 26),
+	CS_PORT_XMIT_WAIT			= (1 << 25),
+	CS_SW_PORT_CONGESTION			= (1 << 24),
+	CS_PORT_RCV_FECN			= (1 << 23),
+	CS_PORT_RCV_BECN			= (1 << 22),
+	CS_PORT_XMIT_TIME_CONG			= (1 << 21),
+	CS_PORT_XMIT_WASTED_BW			= (1 << 20),
+	CS_PORT_XMIT_WAIT_DATA			= (1 << 19),
+	CS_PORT_RCV_BUBBLE			= (1 << 18),
+	CS_PORT_MARK_FECN			= (1 << 17),
+	CS_PORT_RCV_CONSTRAINT_ERRORS		= (1 << 16),
+	CS_PORT_RCV_SWITCH_RELAY_ERRORS		= (1 << 15),
+	CS_PORT_XMIT_DISCARDS			= (1 << 14),
+	CS_PORT_XMIT_CONSTRAINT_ERRORS		= (1 << 13),
+	CS_PORT_RCV_REMOTE_PHYSICAL_ERRORS	= (1 << 12),
+	CS_LOCAL_LINK_INTEGRITY_ERRORS		= (1 << 11),
+	CS_PORT_RCV_ERRORS			= (1 << 10),
+	CS_EXCESSIVE_BUFFER_OVERRUNS		= (1 << 9),
+	CS_FM_CONFIG_ERRORS			= (1 << 8),
+	CS_LINK_ERROR_RECOVERY			= (1 << 7),
+	CS_LINK_DOWNED				= (1 << 6),
+	CS_UNCORRECTABLE_ERRORS			= (1 << 5),
+};
+
+struct opa_clear_port_status {
+	__be64 port_select_mask[4];
+	__be32 counter_select_mask;
+};
+
+struct opa_aggregate {
+	__be16 attr_id;
+	__be16 err_reqlength;	/* 1 bit, 8 res, 7 bit */
+	__be32 attr_mod;
+	u8 data[0];
+};
+
+/* Request contains first two fields, response contains those plus the rest */
+struct opa_port_data_counters_msg {
+	__be64 port_select_mask[4];
+	__be32 vl_select_mask;
+
+	/* Response fields follow */
+	__be32 reserved1;
+	struct _port_dctrs {
+		u8 port_number;
+		u8 reserved2[3];
+		__be32 link_quality_indicator; /* 29res, 3bit */
+
+		/* Data counters */
+		__be64 port_xmit_data;
+		__be64 port_rcv_data;
+		__be64 port_xmit_pkts;
+		__be64 port_rcv_pkts;
+		__be64 port_multicast_xmit_pkts;
+		__be64 port_multicast_rcv_pkts;
+		__be64 port_xmit_wait;
+		__be64 sw_port_congestion;
+		__be64 port_rcv_fecn;
+		__be64 port_rcv_becn;
+		__be64 port_xmit_time_cong;
+		__be64 port_xmit_wasted_bw;
+		__be64 port_xmit_wait_data;
+		__be64 port_rcv_bubble;
+		__be64 port_mark_fecn;
+
+		__be64 port_error_counter_summary;
+		/* Sum of error counts/port */
+
+		struct _vls_dctrs {
+			/* per-VL Data counters */
+			__be64 port_vl_xmit_data;
+			__be64 port_vl_rcv_data;
+			__be64 port_vl_xmit_pkts;
+			__be64 port_vl_rcv_pkts;
+			__be64 port_vl_xmit_wait;
+			__be64 sw_port_vl_congestion;
+			__be64 port_vl_rcv_fecn;
+			__be64 port_vl_rcv_becn;
+			__be64 port_xmit_time_cong;
+			__be64 port_vl_xmit_wasted_bw;
+			__be64 port_vl_xmit_wait_data;
+			__be64 port_vl_rcv_bubble;
+			__be64 port_vl_mark_fecn;
+		} vls[0];
+		/* array size defined by #bits set in vl_select_mask*/
+	} port[1]; /* array size defined by  #ports in attribute modifier */
+};
+
+struct opa_port_error_counters64_msg {
+	/* Request contains first two fields, response contains the
+	 * whole magilla */
+	__be64 port_select_mask[4];
+	__be32 vl_select_mask;
+
+	/* Response-only fields follow */
+	__be32 reserved1;
+	struct _port_ectrs {
+		u8 port_number;
+		u8 reserved2[7];
+		__be64 port_rcv_constraint_errors;
+		__be64 port_rcv_switch_relay_errors;
+		__be64 port_xmit_discards;
+		__be64 port_xmit_constraint_errors;
+		__be64 port_rcv_remote_physical_errors;
+		__be64 local_link_integrity_errors;
+		__be64 port_rcv_errors;
+		__be64 excessive_buffer_overruns;
+		__be64 fm_config_errors;
+		__be32 link_error_recovery;
+		__be32 link_downed;
+		u8 uncorrectable_errors;
+		u8 reserved3[7];
+		struct _vls_ectrs {
+			__be64 port_vl_xmit_discards;
+		} vls[0];
+		/* array size defined by #bits set in vl_select_mask */
+	} port[1]; /* array size defined by #ports in attribute modifier */
+};
+
+struct opa_port_error_info_msg {
+	__be64 port_select_mask[4];
+	__be32 error_info_select_mask;
+	__be32 reserved1;
+	struct _port_ei {
+
+		u8 port_number;
+		u8 reserved2[7];
+
+		/* PortRcvErrorInfo */
+		struct {
+			u8 status_and_code;
+			union {
+				u8 raw[17];
+				struct {
+					/* EI1to12 format */
+					u8 packet_flit1[8];
+					u8 packet_flit2[8];
+					u8 remaining_flit_bits12;
+				} ei1to12;
+				struct {
+					u8 packet_bytes[8];
+					u8 remaining_flit_bits;
+				} ei13;
+			} ei;
+			u8 reserved3[6];
+		} __packed port_rcv_ei;
+
+		/* ExcessiveBufferOverrunInfo */
+		struct {
+			u8 status_and_sc;
+			u8 reserved4[7];
+		} __packed excessive_buffer_overrun_ei;
+
+		/* PortXmitConstraintErrorInfo */
+		struct {
+			u8 status;
+			u8 reserved5;
+			__be16 pkey;
+			__be32 slid;
+		} __packed port_xmit_constraint_ei;
+
+		/* PortRcvConstraintErrorInfo */
+		struct {
+			u8 status;
+			u8 reserved6;
+			__be16 pkey;
+			__be32 slid;
+		} __packed port_rcv_constraint_ei;
+
+		/* PortRcvSwitchRelayErrorInfo */
+		struct {
+			u8 status_and_code;
+			u8 reserved7[3];
+			__u32 error_info;
+		} __packed port_rcv_switch_relay_ei;
+
+		/* UncorrectableErrorInfo */
+		struct {
+			u8 status_and_code;
+			u8 reserved8;
+		} __packed uncorrectable_ei;
+
+		/* FMConfigErrorInfo */
+		struct {
+			u8 status_and_code;
+			u8 error_info;
+		} __packed fm_config_ei;
+		__u32 reserved9;
+	} port[1]; /* actual array size defined by #ports in attr modifier */
+};
+
+/* opa_port_error_info_msg error_info_select_mask bit definitions */
+enum error_info_selects {
+	ES_PORT_RCV_ERROR_INFO			= (1 << 31),
+	ES_EXCESSIVE_BUFFER_OVERRUN_INFO	= (1 << 30),
+	ES_PORT_XMIT_CONSTRAINT_ERROR_INFO	= (1 << 29),
+	ES_PORT_RCV_CONSTRAINT_ERROR_INFO	= (1 << 28),
+	ES_PORT_RCV_SWITCH_RELAY_ERROR_INFO	= (1 << 27),
+	ES_UNCORRECTABLE_ERROR_INFO		= (1 << 26),
+	ES_FM_CONFIG_ERROR_INFO			= (1 << 25)
+};
+
+static int pma_get_opa_classportinfo(struct opa_pma_mad *pmp,
+				struct ib_device *ibdev, u32 *resp_len)
+{
+	struct opa_class_port_info *p =
+		(struct opa_class_port_info *)pmp->data;
+
+	memset(pmp->data, 0, sizeof(pmp->data));
+
+	if (pmp->mad_hdr.attr_mod != 0)
+		pmp->mad_hdr.status |= IB_SMP_INVALID_FIELD;
+
+	p->base_version = OPA_MGMT_BASE_VERSION;
+	p->class_version = OPA_SMI_CLASS_VERSION;
+	/*
+	 * Expected response time is 4.096 usec. * 2^18 == 1.073741824 sec.
+	 */
+	p->cap_mask2_resp_time = cpu_to_be32(18);
+
+	if (resp_len)
+		*resp_len += sizeof(*p);
+
+	return reply((struct ib_mad_hdr *)pmp);
+}
+
+static void a0_portstatus(struct hfi1_pportdata *ppd,
+			  struct opa_port_status_rsp *rsp, u32 vl_select_mask)
+{
+	if (!is_bx(ppd->dd)) {
+		unsigned long vl;
+		int vfi = 0;
+		u64 max_vl_xmit_wait = 0, tmp;
+		u32 vl_all_mask = VL_MASK_ALL;
+		u64 rcv_data, rcv_bubble;
+
+		rcv_data = be64_to_cpu(rsp->port_rcv_data);
+		rcv_bubble = be64_to_cpu(rsp->port_rcv_bubble);
+		/* In the measured time period, calculate the total number
+		 * of flits that were received. Subtract out one false
+		 * rcv_bubble increment for every 32 received flits but
+		 * don't let the number go negative.
+		 */
+		if (rcv_bubble >= (rcv_data>>5)) {
+			rcv_bubble -= (rcv_data>>5);
+			rsp->port_rcv_bubble = cpu_to_be64(rcv_bubble);
+		}
+		for_each_set_bit(vl, (unsigned long *)&(vl_select_mask),
+				 8 * sizeof(vl_select_mask)) {
+			rcv_data = be64_to_cpu(rsp->vls[vfi].port_vl_rcv_data);
+			rcv_bubble =
+				be64_to_cpu(rsp->vls[vfi].port_vl_rcv_bubble);
+			if (rcv_bubble >= (rcv_data>>5)) {
+				rcv_bubble -= (rcv_data>>5);
+				rsp->vls[vfi].port_vl_rcv_bubble =
+							cpu_to_be64(rcv_bubble);
+			}
+			vfi++;
+		}
+
+		for_each_set_bit(vl, (unsigned long *)&(vl_all_mask),
+				 8 * sizeof(vl_all_mask)) {
+			tmp = read_port_cntr(ppd, C_TX_WAIT_VL,
+					     idx_from_vl(vl));
+			if (tmp > max_vl_xmit_wait)
+				max_vl_xmit_wait = tmp;
+		}
+		rsp->port_xmit_wait = cpu_to_be64(max_vl_xmit_wait);
+	}
+}
+
+
+static int pma_get_opa_portstatus(struct opa_pma_mad *pmp,
+			struct ib_device *ibdev, u8 port, u32 *resp_len)
+{
+	struct opa_port_status_req *req =
+		(struct opa_port_status_req *)pmp->data;
+	struct hfi1_devdata *dd = dd_from_ibdev(ibdev);
+	struct opa_port_status_rsp *rsp;
+	u32 vl_select_mask = be32_to_cpu(req->vl_select_mask);
+	unsigned long vl;
+	size_t response_data_size;
+	u32 nports = be32_to_cpu(pmp->mad_hdr.attr_mod) >> 24;
+	u8 port_num = req->port_num;
+	u8 num_vls = hweight32(vl_select_mask);
+	struct _vls_pctrs *vlinfo;
+	struct hfi1_ibport *ibp = to_iport(ibdev, port);
+	struct hfi1_pportdata *ppd = ppd_from_ibp(ibp);
+	int vfi;
+	u64 tmp, tmp2;
+
+	response_data_size = sizeof(struct opa_port_status_rsp) +
+				num_vls * sizeof(struct _vls_pctrs);
+	if (response_data_size > sizeof(pmp->data)) {
+		pmp->mad_hdr.status |= OPA_PM_STATUS_REQUEST_TOO_LARGE;
+		return reply((struct ib_mad_hdr *)pmp);
+	}
+
+	if (nports != 1 || (port_num && port_num != port)
+	    || num_vls > OPA_MAX_VLS || (vl_select_mask & ~VL_MASK_ALL)) {
+		pmp->mad_hdr.status |= IB_SMP_INVALID_FIELD;
+		return reply((struct ib_mad_hdr *)pmp);
+	}
+
+	memset(pmp->data, 0, sizeof(pmp->data));
+
+	rsp = (struct opa_port_status_rsp *)pmp->data;
+	if (port_num)
+		rsp->port_num = port_num;
+	else
+		rsp->port_num = port;
+
+	rsp->port_rcv_constraint_errors =
+		cpu_to_be64(read_port_cntr(ppd, C_SW_RCV_CSTR_ERR,
+					   CNTR_INVALID_VL));
+
+	hfi1_read_link_quality(dd, &rsp->link_quality_indicator);
+
+	rsp->vl_select_mask = cpu_to_be32(vl_select_mask);
+	rsp->port_xmit_data = cpu_to_be64(read_dev_cntr(dd, C_DC_XMIT_FLITS,
+					  CNTR_INVALID_VL));
+	rsp->port_rcv_data = cpu_to_be64(read_dev_cntr(dd, C_DC_RCV_FLITS,
+					 CNTR_INVALID_VL));
+	rsp->port_rcv_bubble =
+		cpu_to_be64(read_dev_cntr(dd, C_DC_RCV_BBL, CNTR_INVALID_VL));
+	rsp->port_xmit_pkts = cpu_to_be64(read_dev_cntr(dd, C_DC_XMIT_PKTS,
+					  CNTR_INVALID_VL));
+	rsp->port_rcv_pkts = cpu_to_be64(read_dev_cntr(dd, C_DC_RCV_PKTS,
+					 CNTR_INVALID_VL));
+	rsp->port_multicast_xmit_pkts =
+		cpu_to_be64(read_dev_cntr(dd, C_DC_MC_XMIT_PKTS,
+					CNTR_INVALID_VL));
+	rsp->port_multicast_rcv_pkts =
+		cpu_to_be64(read_dev_cntr(dd, C_DC_MC_RCV_PKTS,
+					  CNTR_INVALID_VL));
+	rsp->port_xmit_wait =
+		cpu_to_be64(read_port_cntr(ppd, C_TX_WAIT, CNTR_INVALID_VL));
+	rsp->port_rcv_fecn =
+		cpu_to_be64(read_dev_cntr(dd, C_DC_RCV_FCN, CNTR_INVALID_VL));
+	rsp->port_rcv_becn =
+		cpu_to_be64(read_dev_cntr(dd, C_DC_RCV_BCN, CNTR_INVALID_VL));
+	rsp->port_xmit_discards =
+		cpu_to_be64(read_port_cntr(ppd, C_SW_XMIT_DSCD,
+					   CNTR_INVALID_VL));
+	rsp->port_xmit_constraint_errors =
+		cpu_to_be64(read_port_cntr(ppd, C_SW_XMIT_CSTR_ERR,
+					   CNTR_INVALID_VL));
+	rsp->port_rcv_remote_physical_errors =
+		cpu_to_be64(read_dev_cntr(dd, C_DC_RMT_PHY_ERR,
+					  CNTR_INVALID_VL));
+	tmp = read_dev_cntr(dd, C_DC_RX_REPLAY, CNTR_INVALID_VL);
+	tmp2 = tmp + read_dev_cntr(dd, C_DC_TX_REPLAY, CNTR_INVALID_VL);
+	if (tmp2 < tmp) {
+		/* overflow/wrapped */
+		rsp->local_link_integrity_errors = cpu_to_be64(~0);
+	} else {
+		rsp->local_link_integrity_errors = cpu_to_be64(tmp2);
+	}
+	tmp = read_dev_cntr(dd, C_DC_SEQ_CRC_CNT, CNTR_INVALID_VL);
+	tmp2 = tmp + read_dev_cntr(dd, C_DC_REINIT_FROM_PEER_CNT,
+					CNTR_INVALID_VL);
+	if (tmp2 > (u32)UINT_MAX || tmp2 < tmp) {
+		/* overflow/wrapped */
+		rsp->link_error_recovery = cpu_to_be32(~0);
+	} else {
+		rsp->link_error_recovery = cpu_to_be32(tmp2);
+	}
+	rsp->port_rcv_errors =
+		cpu_to_be64(read_dev_cntr(dd, C_DC_RCV_ERR, CNTR_INVALID_VL));
+	rsp->excessive_buffer_overruns =
+		cpu_to_be64(read_dev_cntr(dd, C_RCV_OVF, CNTR_INVALID_VL));
+	rsp->fm_config_errors =
+		cpu_to_be64(read_dev_cntr(dd, C_DC_FM_CFG_ERR,
+					  CNTR_INVALID_VL));
+	rsp->link_downed = cpu_to_be32(read_port_cntr(ppd, C_SW_LINK_DOWN,
+					  CNTR_INVALID_VL));
+
+	/* rsp->uncorrectable_errors is 8 bits wide, and it pegs at 0xff */
+	tmp = read_dev_cntr(dd, C_DC_UNC_ERR, CNTR_INVALID_VL);
+	rsp->uncorrectable_errors = tmp < 0x100 ? (tmp & 0xff) : 0xff;
+
+	vlinfo = &(rsp->vls[0]);
+	vfi = 0;
+	/* The vl_select_mask has been checked above, and we know
+	 * that it contains only entries which represent valid VLs.
+	 * So in the for_each_set_bit() loop below, we don't need
+	 * any additional checks for vl.
+	 */
+	for_each_set_bit(vl, (unsigned long *)&(vl_select_mask),
+			 8 * sizeof(vl_select_mask)) {
+		memset(vlinfo, 0, sizeof(*vlinfo));
+
+		tmp = read_dev_cntr(dd, C_DC_RX_FLIT_VL, idx_from_vl(vl));
+		rsp->vls[vfi].port_vl_rcv_data = cpu_to_be64(tmp);
+		rsp->vls[vfi].port_vl_rcv_bubble =
+			cpu_to_be64(read_dev_cntr(dd, C_DC_RCV_BBL_VL,
+					idx_from_vl(vl)));
+
+		rsp->vls[vfi].port_vl_rcv_pkts =
+			cpu_to_be64(read_dev_cntr(dd, C_DC_RX_PKT_VL,
+					idx_from_vl(vl)));
+
+		rsp->vls[vfi].port_vl_xmit_data =
+			cpu_to_be64(read_port_cntr(ppd, C_TX_FLIT_VL,
+					idx_from_vl(vl)));
+
+		rsp->vls[vfi].port_vl_xmit_pkts =
+			cpu_to_be64(read_port_cntr(ppd, C_TX_PKT_VL,
+					idx_from_vl(vl)));
+
+		rsp->vls[vfi].port_vl_xmit_wait =
+			cpu_to_be64(read_port_cntr(ppd, C_TX_WAIT_VL,
+					idx_from_vl(vl)));
+
+		rsp->vls[vfi].port_vl_rcv_fecn =
+			cpu_to_be64(read_dev_cntr(dd, C_DC_RCV_FCN_VL,
+					idx_from_vl(vl)));
+
+		rsp->vls[vfi].port_vl_rcv_becn =
+			cpu_to_be64(read_dev_cntr(dd, C_DC_RCV_BCN_VL,
+					idx_from_vl(vl)));
+
+		vlinfo++;
+		vfi++;
+	}
+
+	a0_portstatus(ppd, rsp, vl_select_mask);
+
+	if (resp_len)
+		*resp_len += response_data_size;
+
+	return reply((struct ib_mad_hdr *)pmp);
+}
+
+static u64 get_error_counter_summary(struct ib_device *ibdev, u8 port)
+{
+	struct hfi1_devdata *dd = dd_from_ibdev(ibdev);
+	struct hfi1_ibport *ibp = to_iport(ibdev, port);
+	struct hfi1_pportdata *ppd = ppd_from_ibp(ibp);
+	u64 error_counter_summary = 0, tmp;
+
+	error_counter_summary += read_port_cntr(ppd, C_SW_RCV_CSTR_ERR,
+						CNTR_INVALID_VL);
+	/* port_rcv_switch_relay_errors is 0 for HFIs */
+	error_counter_summary += read_port_cntr(ppd, C_SW_XMIT_DSCD,
+						CNTR_INVALID_VL);
+	error_counter_summary += read_port_cntr(ppd, C_SW_XMIT_CSTR_ERR,
+						CNTR_INVALID_VL);
+	error_counter_summary += read_dev_cntr(dd, C_DC_RMT_PHY_ERR,
+						CNTR_INVALID_VL);
+	error_counter_summary += read_dev_cntr(dd, C_DC_TX_REPLAY,
+						CNTR_INVALID_VL);
+	error_counter_summary += read_dev_cntr(dd, C_DC_RX_REPLAY,
+						CNTR_INVALID_VL);
+	error_counter_summary += read_dev_cntr(dd, C_DC_SEQ_CRC_CNT,
+						CNTR_INVALID_VL);
+	error_counter_summary += read_dev_cntr(dd, C_DC_REINIT_FROM_PEER_CNT,
+						CNTR_INVALID_VL);
+	error_counter_summary += read_dev_cntr(dd, C_DC_RCV_ERR,
+						CNTR_INVALID_VL);
+	error_counter_summary += read_dev_cntr(dd, C_RCV_OVF, CNTR_INVALID_VL);
+	error_counter_summary += read_dev_cntr(dd, C_DC_FM_CFG_ERR,
+						CNTR_INVALID_VL);
+	/* ppd->link_downed is a 32-bit value */
+	error_counter_summary += read_port_cntr(ppd, C_SW_LINK_DOWN,
+						CNTR_INVALID_VL);
+	tmp = read_dev_cntr(dd, C_DC_UNC_ERR, CNTR_INVALID_VL);
+	/* this is an 8-bit quantity */
+	error_counter_summary += tmp < 0x100 ? (tmp & 0xff) : 0xff;
+
+	return error_counter_summary;
+}
+
+static void a0_datacounters(struct hfi1_devdata *dd, struct _port_dctrs *rsp,
+			    u32 vl_select_mask)
+{
+	if (!is_bx(dd)) {
+		unsigned long vl;
+		int vfi = 0;
+		u64 rcv_data, rcv_bubble, sum_vl_xmit_wait = 0;
+
+		rcv_data = be64_to_cpu(rsp->port_rcv_data);
+		rcv_bubble = be64_to_cpu(rsp->port_rcv_bubble);
+		/* In the measured time period, calculate the total number
+		 * of flits that were received. Subtract out one false
+		 * rcv_bubble increment for every 32 received flits but
+		 * don't let the number go negative.
+		 */
+		if (rcv_bubble >= (rcv_data>>5)) {
+			rcv_bubble -= (rcv_data>>5);
+			rsp->port_rcv_bubble = cpu_to_be64(rcv_bubble);
+		}
+		for_each_set_bit(vl, (unsigned long *)&(vl_select_mask),
+				8 * sizeof(vl_select_mask)) {
+			rcv_data = be64_to_cpu(rsp->vls[vfi].port_vl_rcv_data);
+			rcv_bubble =
+				be64_to_cpu(rsp->vls[vfi].port_vl_rcv_bubble);
+			if (rcv_bubble >= (rcv_data>>5)) {
+				rcv_bubble -= (rcv_data>>5);
+				rsp->vls[vfi].port_vl_rcv_bubble =
+							cpu_to_be64(rcv_bubble);
+			}
+			vfi++;
+		}
+		vfi = 0;
+		for_each_set_bit(vl, (unsigned long *)&(vl_select_mask),
+				8 * sizeof(vl_select_mask)) {
+			u64 tmp = sum_vl_xmit_wait +
+				be64_to_cpu(rsp->vls[vfi++].port_vl_xmit_wait);
+			if (tmp < sum_vl_xmit_wait) {
+				/* we wrapped */
+				sum_vl_xmit_wait = (u64) ~0;
+				break;
+			}
+			sum_vl_xmit_wait = tmp;
+		}
+		if (be64_to_cpu(rsp->port_xmit_wait) > sum_vl_xmit_wait)
+			rsp->port_xmit_wait = cpu_to_be64(sum_vl_xmit_wait);
+	}
+}
+
+static int pma_get_opa_datacounters(struct opa_pma_mad *pmp,
+			struct ib_device *ibdev, u8 port, u32 *resp_len)
+{
+	struct opa_port_data_counters_msg *req =
+		(struct opa_port_data_counters_msg *)pmp->data;
+	struct hfi1_devdata *dd = dd_from_ibdev(ibdev);
+	struct hfi1_ibport *ibp = to_iport(ibdev, port);
+	struct hfi1_pportdata *ppd = ppd_from_ibp(ibp);
+	struct _port_dctrs *rsp;
+	struct _vls_dctrs *vlinfo;
+	size_t response_data_size;
+	u32 num_ports;
+	u8 num_pslm;
+	u8 lq, num_vls;
+	u64 port_mask;
+	unsigned long port_num;
+	unsigned long vl;
+	u32 vl_select_mask;
+	int vfi;
+
+	num_ports = be32_to_cpu(pmp->mad_hdr.attr_mod) >> 24;
+	num_pslm = hweight64(be64_to_cpu(req->port_select_mask[3]));
+	num_vls = hweight32(be32_to_cpu(req->vl_select_mask));
+	vl_select_mask = be32_to_cpu(req->vl_select_mask);
+
+	if (num_ports != 1 || (vl_select_mask & ~VL_MASK_ALL)) {
+		pmp->mad_hdr.status |= IB_SMP_INVALID_FIELD;
+		return reply((struct ib_mad_hdr *)pmp);
+	}
+
+	/* Sanity check */
+	response_data_size = sizeof(struct opa_port_data_counters_msg) +
+				num_vls * sizeof(struct _vls_dctrs);
+
+	if (response_data_size > sizeof(pmp->data)) {
+		pmp->mad_hdr.status |= IB_SMP_INVALID_FIELD;
+		return reply((struct ib_mad_hdr *)pmp);
+	}
+
+	/*
+	 * The bit set in the mask needs to be consistent with the
+	 * port the request came in on.
+	 */
+	port_mask = be64_to_cpu(req->port_select_mask[3]);
+	port_num = find_first_bit((unsigned long *)&port_mask,
+				  sizeof(port_mask));
+
+	if ((u8)port_num != port) {
+		pmp->mad_hdr.status |= IB_SMP_INVALID_FIELD;
+		return reply((struct ib_mad_hdr *)pmp);
+	}
+
+	rsp = (struct _port_dctrs *)&(req->port[0]);
+	memset(rsp, 0, sizeof(*rsp));
+
+	rsp->port_number = port;
+	/*
+	 * Note that link_quality_indicator is a 32 bit quantity in
+	 * 'datacounters' queries (as opposed to 'portinfo' queries,
+	 * where it's a byte).
+	 */
+	hfi1_read_link_quality(dd, &lq);
+	rsp->link_quality_indicator = cpu_to_be32((u32)lq);
+
+	/* rsp->sw_port_congestion is 0 for HFIs */
+	/* rsp->port_xmit_time_cong is 0 for HFIs */
+	/* rsp->port_xmit_wasted_bw ??? */
+	/* rsp->port_xmit_wait_data ??? */
+	/* rsp->port_mark_fecn is 0 for HFIs */
+
+	rsp->port_xmit_data = cpu_to_be64(read_dev_cntr(dd, C_DC_XMIT_FLITS,
+						CNTR_INVALID_VL));
+	rsp->port_rcv_data = cpu_to_be64(read_dev_cntr(dd, C_DC_RCV_FLITS,
+						CNTR_INVALID_VL));
+	rsp->port_rcv_bubble =
+		cpu_to_be64(read_dev_cntr(dd, C_DC_RCV_BBL, CNTR_INVALID_VL));
+	rsp->port_xmit_pkts = cpu_to_be64(read_dev_cntr(dd, C_DC_XMIT_PKTS,
+						CNTR_INVALID_VL));
+	rsp->port_rcv_pkts = cpu_to_be64(read_dev_cntr(dd, C_DC_RCV_PKTS,
+						CNTR_INVALID_VL));
+	rsp->port_multicast_xmit_pkts =
+		cpu_to_be64(read_dev_cntr(dd, C_DC_MC_XMIT_PKTS,
+						CNTR_INVALID_VL));
+	rsp->port_multicast_rcv_pkts =
+		cpu_to_be64(read_dev_cntr(dd, C_DC_MC_RCV_PKTS,
+						CNTR_INVALID_VL));
+	rsp->port_xmit_wait =
+		cpu_to_be64(read_port_cntr(ppd, C_TX_WAIT, CNTR_INVALID_VL));
+	rsp->port_rcv_fecn =
+		cpu_to_be64(read_dev_cntr(dd, C_DC_RCV_FCN, CNTR_INVALID_VL));
+	rsp->port_rcv_becn =
+		cpu_to_be64(read_dev_cntr(dd, C_DC_RCV_BCN, CNTR_INVALID_VL));
+
+	rsp->port_error_counter_summary =
+		cpu_to_be64(get_error_counter_summary(ibdev, port));
+
+	vlinfo = &(rsp->vls[0]);
+	vfi = 0;
+	/* The vl_select_mask has been checked above, and we know
+	 * that it contains only entries which represent valid VLs.
+	 * So in the for_each_set_bit() loop below, we don't need
+	 * any additional checks for vl.
+	 */
+	for_each_set_bit(vl, (unsigned long *)&(vl_select_mask),
+		 8 * sizeof(req->vl_select_mask)) {
+		memset(vlinfo, 0, sizeof(*vlinfo));
+
+		rsp->vls[vfi].port_vl_xmit_data =
+			cpu_to_be64(read_port_cntr(ppd, C_TX_FLIT_VL,
+							idx_from_vl(vl)));
+
+		rsp->vls[vfi].port_vl_rcv_data =
+			cpu_to_be64(read_dev_cntr(dd, C_DC_RX_FLIT_VL,
+							idx_from_vl(vl)));
+		rsp->vls[vfi].port_vl_rcv_bubble =
+			cpu_to_be64(read_dev_cntr(dd, C_DC_RCV_BBL_VL,
+					idx_from_vl(vl)));
+
+		rsp->vls[vfi].port_vl_xmit_pkts =
+			cpu_to_be64(read_port_cntr(ppd, C_TX_PKT_VL,
+							idx_from_vl(vl)));
+
+		rsp->vls[vfi].port_vl_rcv_pkts =
+			cpu_to_be64(read_dev_cntr(dd, C_DC_RX_PKT_VL,
+							idx_from_vl(vl)));
+
+		rsp->vls[vfi].port_vl_xmit_wait =
+			cpu_to_be64(read_port_cntr(ppd, C_TX_WAIT_VL,
+							idx_from_vl(vl)));
+
+		rsp->vls[vfi].port_vl_rcv_fecn =
+			cpu_to_be64(read_dev_cntr(dd, C_DC_RCV_FCN_VL,
+							idx_from_vl(vl)));
+		rsp->vls[vfi].port_vl_rcv_becn =
+			cpu_to_be64(read_dev_cntr(dd, C_DC_RCV_BCN_VL,
+							idx_from_vl(vl)));
+
+		/* rsp->port_vl_xmit_time_cong is 0 for HFIs */
+		/* rsp->port_vl_xmit_wasted_bw ??? */
+		/* port_vl_xmit_wait_data - TXE (table 13-9 HFI spec) ???
+		 * does this differ from rsp->vls[vfi].port_vl_xmit_wait */
+		/*rsp->vls[vfi].port_vl_mark_fecn =
+			cpu_to_be64(read_csr(dd, DCC_PRF_PORT_VL_MARK_FECN_CNT
+				+ offset));
+		*/
+		vlinfo++;
+		vfi++;
+	}
+
+	a0_datacounters(dd, rsp, vl_select_mask);
+
+	if (resp_len)
+		*resp_len += response_data_size;
+
+	return reply((struct ib_mad_hdr *)pmp);
+}
+
+static int pma_get_opa_porterrors(struct opa_pma_mad *pmp,
+			struct ib_device *ibdev, u8 port, u32 *resp_len)
+{
+	size_t response_data_size;
+	struct _port_ectrs *rsp;
+	unsigned long port_num;
+	struct opa_port_error_counters64_msg *req;
+	struct hfi1_devdata *dd = dd_from_ibdev(ibdev);
+	u32 num_ports;
+	u8 num_pslm;
+	u8 num_vls;
+	struct hfi1_ibport *ibp;
+	struct hfi1_pportdata *ppd;
+	struct _vls_ectrs *vlinfo;
+	unsigned long vl;
+	u64 port_mask, tmp, tmp2;
+	u32 vl_select_mask;
+	int vfi;
+
+	req = (struct opa_port_error_counters64_msg *)pmp->data;
+
+	num_ports = be32_to_cpu(pmp->mad_hdr.attr_mod) >> 24;
+
+	num_pslm = hweight64(be64_to_cpu(req->port_select_mask[3]));
+	num_vls = hweight32(be32_to_cpu(req->vl_select_mask));
+
+	if (num_ports != 1 || num_ports != num_pslm) {
+		pmp->mad_hdr.status |= IB_SMP_INVALID_FIELD;
+		return reply((struct ib_mad_hdr *)pmp);
+	}
+
+	response_data_size = sizeof(struct opa_port_error_counters64_msg) +
+				num_vls * sizeof(struct _vls_ectrs);
+
+	if (response_data_size > sizeof(pmp->data)) {
+		pmp->mad_hdr.status |= IB_SMP_INVALID_FIELD;
+		return reply((struct ib_mad_hdr *)pmp);
+	}
+	/*
+	 * The bit set in the mask needs to be consistent with the
+	 * port the request came in on.
+	 */
+	port_mask = be64_to_cpu(req->port_select_mask[3]);
+	port_num = find_first_bit((unsigned long *)&port_mask,
+					sizeof(port_mask));
+
+	if ((u8)port_num != port) {
+		pmp->mad_hdr.status |= IB_SMP_INVALID_FIELD;
+		return reply((struct ib_mad_hdr *)pmp);
+	}
+
+	rsp = (struct _port_ectrs *)&(req->port[0]);
+
+	ibp = to_iport(ibdev, port_num);
+	ppd = ppd_from_ibp(ibp);
+
+	memset(rsp, 0, sizeof(*rsp));
+	rsp->port_number = (u8)port_num;
+
+	rsp->port_rcv_constraint_errors =
+		cpu_to_be64(read_port_cntr(ppd, C_SW_RCV_CSTR_ERR,
+					   CNTR_INVALID_VL));
+	/* port_rcv_switch_relay_errors is 0 for HFIs */
+	rsp->port_xmit_discards =
+		cpu_to_be64(read_port_cntr(ppd, C_SW_XMIT_DSCD,
+						CNTR_INVALID_VL));
+	rsp->port_rcv_remote_physical_errors =
+		cpu_to_be64(read_dev_cntr(dd, C_DC_RMT_PHY_ERR,
+						CNTR_INVALID_VL));
+	tmp = read_dev_cntr(dd, C_DC_RX_REPLAY, CNTR_INVALID_VL);
+	tmp2 = tmp + read_dev_cntr(dd, C_DC_TX_REPLAY, CNTR_INVALID_VL);
+	if (tmp2 < tmp) {
+		/* overflow/wrapped */
+		rsp->local_link_integrity_errors = cpu_to_be64(~0);
+	} else {
+		rsp->local_link_integrity_errors = cpu_to_be64(tmp2);
+	}
+	tmp = read_dev_cntr(dd, C_DC_SEQ_CRC_CNT, CNTR_INVALID_VL);
+	tmp2 = tmp + read_dev_cntr(dd, C_DC_REINIT_FROM_PEER_CNT,
+					CNTR_INVALID_VL);
+	if (tmp2 > (u32)UINT_MAX || tmp2 < tmp) {
+		/* overflow/wrapped */
+		rsp->link_error_recovery = cpu_to_be32(~0);
+	} else {
+		rsp->link_error_recovery = cpu_to_be32(tmp2);
+	}
+	rsp->port_xmit_constraint_errors =
+		cpu_to_be64(read_port_cntr(ppd, C_SW_XMIT_CSTR_ERR,
+					   CNTR_INVALID_VL));
+	rsp->excessive_buffer_overruns =
+		cpu_to_be64(read_dev_cntr(dd, C_RCV_OVF, CNTR_INVALID_VL));
+	rsp->fm_config_errors =
+		cpu_to_be64(read_dev_cntr(dd, C_DC_FM_CFG_ERR,
+						CNTR_INVALID_VL));
+	rsp->link_downed = cpu_to_be32(read_port_cntr(ppd, C_SW_LINK_DOWN,
+						CNTR_INVALID_VL));
+	tmp = read_dev_cntr(dd, C_DC_UNC_ERR, CNTR_INVALID_VL);
+	rsp->uncorrectable_errors = tmp < 0x100 ? (tmp & 0xff) : 0xff;
+
+	vlinfo = (struct _vls_ectrs *)&(rsp->vls[0]);
+	vfi = 0;
+	vl_select_mask = be32_to_cpu(req->vl_select_mask);
+	for_each_set_bit(vl, (unsigned long *)&(vl_select_mask),
+			 8 * sizeof(req->vl_select_mask)) {
+		memset(vlinfo, 0, sizeof(*vlinfo));
+		/* vlinfo->vls[vfi].port_vl_xmit_discards ??? */
+		vlinfo += 1;
+		vfi++;
+	}
+
+	if (resp_len)
+		*resp_len += response_data_size;
+
+	return reply((struct ib_mad_hdr *)pmp);
+}
+
+static int pma_get_opa_errorinfo(struct opa_pma_mad *pmp,
+			struct ib_device *ibdev, u8 port, u32 *resp_len)
+{
+	size_t response_data_size;
+	struct _port_ei *rsp;
+	struct opa_port_error_info_msg *req;
+	struct hfi1_devdata *dd = dd_from_ibdev(ibdev);
+	u64 port_mask;
+	u32 num_ports;
+	unsigned long port_num;
+	u8 num_pslm;
+	u64 reg;
+
+	req = (struct opa_port_error_info_msg *)pmp->data;
+	rsp = (struct _port_ei *)&(req->port[0]);
+
+	num_ports = OPA_AM_NPORT(be32_to_cpu(pmp->mad_hdr.attr_mod));
+	num_pslm = hweight64(be64_to_cpu(req->port_select_mask[3]));
+
+	memset(rsp, 0, sizeof(*rsp));
+
+	if (num_ports != 1 || num_ports != num_pslm) {
+		pmp->mad_hdr.status |= IB_SMP_INVALID_FIELD;
+		return reply((struct ib_mad_hdr *)pmp);
+	}
+
+	/* Sanity check */
+	response_data_size = sizeof(struct opa_port_error_info_msg);
+
+	if (response_data_size > sizeof(pmp->data)) {
+		pmp->mad_hdr.status |= IB_SMP_INVALID_FIELD;
+		return reply((struct ib_mad_hdr *)pmp);
+	}
+
+	/*
+	 * The bit set in the mask needs to be consistent with the port
+	 * the request came in on.
+	 */
+	port_mask = be64_to_cpu(req->port_select_mask[3]);
+	port_num = find_first_bit((unsigned long *)&port_mask,
+				  sizeof(port_mask));
+
+	if ((u8)port_num != port) {
+		pmp->mad_hdr.status |= IB_SMP_INVALID_FIELD;
+		return reply((struct ib_mad_hdr *)pmp);
+	}
+
+	/* PortRcvErrorInfo */
+	rsp->port_rcv_ei.status_and_code =
+		dd->err_info_rcvport.status_and_code;
+	memcpy(&rsp->port_rcv_ei.ei.ei1to12.packet_flit1,
+		&dd->err_info_rcvport.packet_flit1, sizeof(u64));
+	memcpy(&rsp->port_rcv_ei.ei.ei1to12.packet_flit2,
+		&dd->err_info_rcvport.packet_flit2, sizeof(u64));
+
+	/* ExcessiverBufferOverrunInfo */
+	reg = read_csr(dd, RCV_ERR_INFO);
+	if (reg & RCV_ERR_INFO_RCV_EXCESS_BUFFER_OVERRUN_SMASK) {
+		/* if the RcvExcessBufferOverrun bit is set, save SC of
+		 * first pkt that encountered an excess buffer overrun */
+		u8 tmp = (u8)reg;
+
+		tmp &=  RCV_ERR_INFO_RCV_EXCESS_BUFFER_OVERRUN_SC_SMASK;
+		tmp <<= 2;
+		rsp->excessive_buffer_overrun_ei.status_and_sc = tmp;
+		/* set the status bit */
+		rsp->excessive_buffer_overrun_ei.status_and_sc |= 0x80;
+	}
+
+	rsp->port_xmit_constraint_ei.status =
+		dd->err_info_xmit_constraint.status;
+	rsp->port_xmit_constraint_ei.pkey =
+		cpu_to_be16(dd->err_info_xmit_constraint.pkey);
+	rsp->port_xmit_constraint_ei.slid =
+		cpu_to_be32(dd->err_info_xmit_constraint.slid);
+
+	rsp->port_rcv_constraint_ei.status =
+		dd->err_info_rcv_constraint.status;
+	rsp->port_rcv_constraint_ei.pkey =
+		cpu_to_be16(dd->err_info_rcv_constraint.pkey);
+	rsp->port_rcv_constraint_ei.slid =
+		cpu_to_be32(dd->err_info_rcv_constraint.slid);
+
+	/* UncorrectableErrorInfo */
+	rsp->uncorrectable_ei.status_and_code = dd->err_info_uncorrectable;
+
+	/* FMConfigErrorInfo */
+	rsp->fm_config_ei.status_and_code = dd->err_info_fmconfig;
+
+	if (resp_len)
+		*resp_len += response_data_size;
+
+	return reply((struct ib_mad_hdr *)pmp);
+}
+
+static int pma_set_opa_portstatus(struct opa_pma_mad *pmp,
+			struct ib_device *ibdev, u8 port, u32 *resp_len)
+{
+	struct opa_clear_port_status *req =
+		(struct opa_clear_port_status *)pmp->data;
+	struct hfi1_devdata *dd = dd_from_ibdev(ibdev);
+	struct hfi1_ibport *ibp = to_iport(ibdev, port);
+	struct hfi1_pportdata *ppd = ppd_from_ibp(ibp);
+	u32 nports = be32_to_cpu(pmp->mad_hdr.attr_mod) >> 24;
+	u64 portn = be64_to_cpu(req->port_select_mask[3]);
+	u32 counter_select = be32_to_cpu(req->counter_select_mask);
+	u32 vl_select_mask = VL_MASK_ALL; /* clear all per-vl cnts */
+	unsigned long vl;
+
+	if ((nports != 1) || (portn != 1 << port)) {
+		pmp->mad_hdr.status |= IB_SMP_INVALID_FIELD;
+		return reply((struct ib_mad_hdr *)pmp);
+	}
+	/*
+	 * only counters returned by pma_get_opa_portstatus() are
+	 * handled, so when pma_get_opa_portstatus() gets a fix,
+	 * the corresponding change should be made here as well.
+	 */
+
+	if (counter_select & CS_PORT_XMIT_DATA)
+		write_dev_cntr(dd, C_DC_XMIT_FLITS, CNTR_INVALID_VL, 0);
+
+	if (counter_select & CS_PORT_RCV_DATA)
+		write_dev_cntr(dd, C_DC_RCV_FLITS, CNTR_INVALID_VL, 0);
+
+	if (counter_select & CS_PORT_XMIT_PKTS)
+		write_dev_cntr(dd, C_DC_XMIT_PKTS, CNTR_INVALID_VL, 0);
+
+	if (counter_select & CS_PORT_RCV_PKTS)
+		write_dev_cntr(dd, C_DC_RCV_PKTS, CNTR_INVALID_VL, 0);
+
+	if (counter_select & CS_PORT_MCAST_XMIT_PKTS)
+		write_dev_cntr(dd, C_DC_MC_XMIT_PKTS, CNTR_INVALID_VL, 0);
+
+	if (counter_select & CS_PORT_MCAST_RCV_PKTS)
+		write_dev_cntr(dd, C_DC_MC_RCV_PKTS, CNTR_INVALID_VL, 0);
+
+	if (counter_select & CS_PORT_XMIT_WAIT)
+		write_port_cntr(ppd, C_TX_WAIT, CNTR_INVALID_VL, 0);
+
+	/* ignore cs_sw_portCongestion for HFIs */
+
+	if (counter_select & CS_PORT_RCV_FECN)
+		write_dev_cntr(dd, C_DC_RCV_FCN, CNTR_INVALID_VL, 0);
+
+	if (counter_select & CS_PORT_RCV_BECN)
+		write_dev_cntr(dd, C_DC_RCV_BCN, CNTR_INVALID_VL, 0);
+
+	/* ignore cs_port_xmit_time_cong for HFIs */
+	/* ignore cs_port_xmit_wasted_bw for now */
+	/* ignore cs_port_xmit_wait_data for now */
+	if (counter_select & CS_PORT_RCV_BUBBLE)
+		write_dev_cntr(dd, C_DC_RCV_BBL, CNTR_INVALID_VL, 0);
+
+	/* Only applicable for switch */
+	/*if (counter_select & CS_PORT_MARK_FECN)
+		write_csr(dd, DCC_PRF_PORT_MARK_FECN_CNT, 0);*/
+
+	if (counter_select & CS_PORT_RCV_CONSTRAINT_ERRORS)
+		write_port_cntr(ppd, C_SW_RCV_CSTR_ERR, CNTR_INVALID_VL, 0);
+
+	/* ignore cs_port_rcv_switch_relay_errors for HFIs */
+	if (counter_select & CS_PORT_XMIT_DISCARDS)
+		write_port_cntr(ppd, C_SW_XMIT_DSCD, CNTR_INVALID_VL, 0);
+
+	if (counter_select & CS_PORT_XMIT_CONSTRAINT_ERRORS)
+		write_port_cntr(ppd, C_SW_XMIT_CSTR_ERR, CNTR_INVALID_VL, 0);
+
+	if (counter_select & CS_PORT_RCV_REMOTE_PHYSICAL_ERRORS)
+		write_dev_cntr(dd, C_DC_RMT_PHY_ERR, CNTR_INVALID_VL, 0);
+
+	if (counter_select & CS_LOCAL_LINK_INTEGRITY_ERRORS) {
+		write_dev_cntr(dd, C_DC_TX_REPLAY, CNTR_INVALID_VL, 0);
+		write_dev_cntr(dd, C_DC_RX_REPLAY, CNTR_INVALID_VL, 0);
+	}
+
+	if (counter_select & CS_LINK_ERROR_RECOVERY) {
+		write_dev_cntr(dd, C_DC_SEQ_CRC_CNT, CNTR_INVALID_VL, 0);
+		write_dev_cntr(dd, C_DC_REINIT_FROM_PEER_CNT,
+						CNTR_INVALID_VL, 0);
+	}
+
+	if (counter_select & CS_PORT_RCV_ERRORS)
+		write_dev_cntr(dd, C_DC_RCV_ERR, CNTR_INVALID_VL, 0);
+
+	if (counter_select & CS_EXCESSIVE_BUFFER_OVERRUNS) {
+		write_dev_cntr(dd, C_RCV_OVF, CNTR_INVALID_VL, 0);
+		dd->rcv_ovfl_cnt = 0;
+	}
+
+	if (counter_select & CS_FM_CONFIG_ERRORS)
+		write_dev_cntr(dd, C_DC_FM_CFG_ERR, CNTR_INVALID_VL, 0);
+
+	if (counter_select & CS_LINK_DOWNED)
+		write_port_cntr(ppd, C_SW_LINK_DOWN, CNTR_INVALID_VL, 0);
+
+	if (counter_select & CS_UNCORRECTABLE_ERRORS)
+		write_dev_cntr(dd, C_DC_UNC_ERR, CNTR_INVALID_VL, 0);
+
+	for_each_set_bit(vl, (unsigned long *)&(vl_select_mask),
+			 8 * sizeof(vl_select_mask)) {
+
+		if (counter_select & CS_PORT_XMIT_DATA)
+			write_port_cntr(ppd, C_TX_FLIT_VL, idx_from_vl(vl), 0);
+
+		if (counter_select & CS_PORT_RCV_DATA)
+			write_dev_cntr(dd, C_DC_RX_FLIT_VL, idx_from_vl(vl), 0);
+
+		if (counter_select & CS_PORT_XMIT_PKTS)
+			write_port_cntr(ppd, C_TX_PKT_VL, idx_from_vl(vl), 0);
+
+		if (counter_select & CS_PORT_RCV_PKTS)
+			write_dev_cntr(dd, C_DC_RX_PKT_VL, idx_from_vl(vl), 0);
+
+		if (counter_select & CS_PORT_XMIT_WAIT)
+			write_port_cntr(ppd, C_TX_WAIT_VL, idx_from_vl(vl), 0);
+
+		/* sw_port_vl_congestion is 0 for HFIs */
+		if (counter_select & CS_PORT_RCV_FECN)
+			write_dev_cntr(dd, C_DC_RCV_FCN_VL, idx_from_vl(vl), 0);
+
+		if (counter_select & CS_PORT_RCV_BECN)
+			write_dev_cntr(dd, C_DC_RCV_BCN_VL, idx_from_vl(vl), 0);
+
+		/* port_vl_xmit_time_cong is 0 for HFIs */
+		/* port_vl_xmit_wasted_bw ??? */
+		/* port_vl_xmit_wait_data - TXE (table 13-9 HFI spec) ??? */
+		if (counter_select & CS_PORT_RCV_BUBBLE)
+			write_dev_cntr(dd, C_DC_RCV_BBL_VL, idx_from_vl(vl), 0);
+
+		/*if (counter_select & CS_PORT_MARK_FECN)
+		     write_csr(dd, DCC_PRF_PORT_VL_MARK_FECN_CNT + offset, 0);
+		*/
+		/* port_vl_xmit_discards ??? */
+	}
+
+	if (resp_len)
+		*resp_len += sizeof(*req);
+
+	return reply((struct ib_mad_hdr *)pmp);
+}
+
+static int pma_set_opa_errorinfo(struct opa_pma_mad *pmp,
+			struct ib_device *ibdev, u8 port, u32 *resp_len)
+{
+	struct _port_ei *rsp;
+	struct opa_port_error_info_msg *req;
+	struct hfi1_devdata *dd = dd_from_ibdev(ibdev);
+	u64 port_mask;
+	u32 num_ports;
+	unsigned long port_num;
+	u8 num_pslm;
+	u32 error_info_select;
+
+	req = (struct opa_port_error_info_msg *)pmp->data;
+	rsp = (struct _port_ei *)&(req->port[0]);
+
+	num_ports = OPA_AM_NPORT(be32_to_cpu(pmp->mad_hdr.attr_mod));
+	num_pslm = hweight64(be64_to_cpu(req->port_select_mask[3]));
+
+	memset(rsp, 0, sizeof(*rsp));
+
+	if (num_ports != 1 || num_ports != num_pslm) {
+		pmp->mad_hdr.status |= IB_SMP_INVALID_FIELD;
+		return reply((struct ib_mad_hdr *)pmp);
+	}
+
+	/*
+	 * The bit set in the mask needs to be consistent with the port
+	 * the request came in on.
+	 */
+	port_mask = be64_to_cpu(req->port_select_mask[3]);
+	port_num = find_first_bit((unsigned long *)&port_mask,
+				  sizeof(port_mask));
+
+	if ((u8)port_num != port) {
+		pmp->mad_hdr.status |= IB_SMP_INVALID_FIELD;
+		return reply((struct ib_mad_hdr *)pmp);
+	}
+
+	error_info_select = be32_to_cpu(req->error_info_select_mask);
+
+	/* PortRcvErrorInfo */
+	if (error_info_select & ES_PORT_RCV_ERROR_INFO)
+		/* turn off status bit */
+		dd->err_info_rcvport.status_and_code &= ~OPA_EI_STATUS_SMASK;
+
+	/* ExcessiverBufferOverrunInfo */
+	if (error_info_select & ES_EXCESSIVE_BUFFER_OVERRUN_INFO)
+		/* status bit is essentially kept in the h/w - bit 5 of
+		 * RCV_ERR_INFO */
+		write_csr(dd, RCV_ERR_INFO,
+			  RCV_ERR_INFO_RCV_EXCESS_BUFFER_OVERRUN_SMASK);
+
+	if (error_info_select & ES_PORT_XMIT_CONSTRAINT_ERROR_INFO)
+		dd->err_info_xmit_constraint.status &= ~OPA_EI_STATUS_SMASK;
+
+	if (error_info_select & ES_PORT_RCV_CONSTRAINT_ERROR_INFO)
+		dd->err_info_rcv_constraint.status &= ~OPA_EI_STATUS_SMASK;
+
+	/* UncorrectableErrorInfo */
+	if (error_info_select & ES_UNCORRECTABLE_ERROR_INFO)
+		/* turn off status bit */
+		dd->err_info_uncorrectable &= ~OPA_EI_STATUS_SMASK;
+
+	/* FMConfigErrorInfo */
+	if (error_info_select & ES_FM_CONFIG_ERROR_INFO)
+		/* turn off status bit */
+		dd->err_info_fmconfig &= ~OPA_EI_STATUS_SMASK;
+
+	if (resp_len)
+		*resp_len += sizeof(*req);
+
+	return reply((struct ib_mad_hdr *)pmp);
+}
+
+struct opa_congestion_info_attr {
+	__be16 congestion_info;
+	u8 control_table_cap;	/* Multiple of 64 entry unit CCTs */
+	u8 congestion_log_length;
+} __packed;
+
+static int __subn_get_opa_cong_info(struct opa_smp *smp, u32 am, u8 *data,
+				    struct ib_device *ibdev, u8 port,
+				    u32 *resp_len)
+{
+	struct opa_congestion_info_attr *p =
+		(struct opa_congestion_info_attr *)data;
+	struct hfi1_ibport *ibp = to_iport(ibdev, port);
+	struct hfi1_pportdata *ppd = ppd_from_ibp(ibp);
+
+	p->congestion_info = 0;
+	p->control_table_cap = ppd->cc_max_table_entries;
+	p->congestion_log_length = OPA_CONG_LOG_ELEMS;
+
+	if (resp_len)
+		*resp_len += sizeof(*p);
+
+	return reply((struct ib_mad_hdr *)smp);
+}
+
+static int __subn_get_opa_cong_setting(struct opa_smp *smp, u32 am,
+					     u8 *data,
+					     struct ib_device *ibdev,
+					     u8 port, u32 *resp_len)
+{
+	int i;
+	struct opa_congestion_setting_attr *p =
+		(struct opa_congestion_setting_attr *) data;
+	struct hfi1_ibport *ibp = to_iport(ibdev, port);
+	struct hfi1_pportdata *ppd = ppd_from_ibp(ibp);
+	struct opa_congestion_setting_entry_shadow *entries;
+	struct cc_state *cc_state;
+
+	rcu_read_lock();
+
+	cc_state = get_cc_state(ppd);
+
+	if (cc_state == NULL) {
+		rcu_read_unlock();
+		return reply((struct ib_mad_hdr *)smp);
+	}
+
+	entries = cc_state->cong_setting.entries;
+	p->port_control = cpu_to_be16(cc_state->cong_setting.port_control);
+	p->control_map = cpu_to_be32(cc_state->cong_setting.control_map);
+	for (i = 0; i < OPA_MAX_SLS; i++) {
+		p->entries[i].ccti_increase = entries[i].ccti_increase;
+		p->entries[i].ccti_timer = cpu_to_be16(entries[i].ccti_timer);
+		p->entries[i].trigger_threshold =
+			entries[i].trigger_threshold;
+		p->entries[i].ccti_min = entries[i].ccti_min;
+	}
+
+	rcu_read_unlock();
+
+	if (resp_len)
+		*resp_len += sizeof(*p);
+
+	return reply((struct ib_mad_hdr *)smp);
+}
+
+static int __subn_set_opa_cong_setting(struct opa_smp *smp, u32 am, u8 *data,
+				       struct ib_device *ibdev, u8 port,
+				       u32 *resp_len)
+{
+	struct opa_congestion_setting_attr *p =
+		(struct opa_congestion_setting_attr *) data;
+	struct hfi1_ibport *ibp = to_iport(ibdev, port);
+	struct hfi1_pportdata *ppd = ppd_from_ibp(ibp);
+	struct opa_congestion_setting_entry_shadow *entries;
+	int i;
+
+	ppd->cc_sl_control_map = be32_to_cpu(p->control_map);
+
+	entries = ppd->congestion_entries;
+	for (i = 0; i < OPA_MAX_SLS; i++) {
+		entries[i].ccti_increase = p->entries[i].ccti_increase;
+		entries[i].ccti_timer = be16_to_cpu(p->entries[i].ccti_timer);
+		entries[i].trigger_threshold =
+			p->entries[i].trigger_threshold;
+		entries[i].ccti_min = p->entries[i].ccti_min;
+	}
+
+	return __subn_get_opa_cong_setting(smp, am, data, ibdev, port,
+					   resp_len);
+}
+
+static int __subn_get_opa_hfi1_cong_log(struct opa_smp *smp, u32 am,
+					u8 *data, struct ib_device *ibdev,
+					u8 port, u32 *resp_len)
+{
+	struct hfi1_ibport *ibp = to_iport(ibdev, port);
+	struct hfi1_pportdata *ppd = ppd_from_ibp(ibp);
+	struct opa_hfi1_cong_log *cong_log = (struct opa_hfi1_cong_log *)data;
+	s64 ts;
+	int i;
+
+	if (am != 0) {
+		smp->status |= IB_SMP_INVALID_FIELD;
+		return reply((struct ib_mad_hdr *)smp);
+	}
+
+	spin_lock(&ppd->cc_log_lock);
+
+	cong_log->log_type = OPA_CC_LOG_TYPE_HFI;
+	cong_log->congestion_flags = 0;
+	cong_log->threshold_event_counter =
+		cpu_to_be16(ppd->threshold_event_counter);
+	memcpy(cong_log->threshold_cong_event_map,
+	       ppd->threshold_cong_event_map,
+	       sizeof(cong_log->threshold_cong_event_map));
+	/* keep timestamp in units of 1.024 usec */
+	ts = ktime_to_ns(ktime_get()) / 1024;
+	cong_log->current_time_stamp = cpu_to_be32(ts);
+	for (i = 0; i < OPA_CONG_LOG_ELEMS; i++) {
+		struct opa_hfi1_cong_log_event_internal *cce =
+			&ppd->cc_events[ppd->cc_mad_idx++];
+		if (ppd->cc_mad_idx == OPA_CONG_LOG_ELEMS)
+			ppd->cc_mad_idx = 0;
+		/*
+		 * Entries which are older than twice the time
+		 * required to wrap the counter are supposed to
+		 * be zeroed (CA10-49 IBTA, release 1.2.1, V1).
+		 */
+		if ((u64)(ts - cce->timestamp) > (2 * UINT_MAX))
+			continue;
+		memcpy(cong_log->events[i].local_qp_cn_entry, &cce->lqpn, 3);
+		memcpy(cong_log->events[i].remote_qp_number_cn_entry,
+			&cce->rqpn, 3);
+		cong_log->events[i].sl_svc_type_cn_entry =
+			((cce->sl & 0x1f) << 3) | (cce->svc_type & 0x7);
+		cong_log->events[i].remote_lid_cn_entry =
+			cpu_to_be32(cce->rlid);
+		cong_log->events[i].timestamp_cn_entry =
+			cpu_to_be32(cce->timestamp);
+	}
+
+	/*
+	 * Reset threshold_cong_event_map, and threshold_event_counter
+	 * to 0 when log is read.
+	 */
+	memset(ppd->threshold_cong_event_map, 0x0,
+	       sizeof(ppd->threshold_cong_event_map));
+	ppd->threshold_event_counter = 0;
+
+	spin_unlock(&ppd->cc_log_lock);
+
+	if (resp_len)
+		*resp_len += sizeof(struct opa_hfi1_cong_log);
+
+	return reply((struct ib_mad_hdr *)smp);
+}
+
+static int __subn_get_opa_cc_table(struct opa_smp *smp, u32 am, u8 *data,
+				   struct ib_device *ibdev, u8 port,
+				   u32 *resp_len)
+{
+	struct ib_cc_table_attr *cc_table_attr =
+		(struct ib_cc_table_attr *) data;
+	struct hfi1_ibport *ibp = to_iport(ibdev, port);
+	struct hfi1_pportdata *ppd = ppd_from_ibp(ibp);
+	u32 start_block = OPA_AM_START_BLK(am);
+	u32 n_blocks = OPA_AM_NBLK(am);
+	struct ib_cc_table_entry_shadow *entries;
+	int i, j;
+	u32 sentry, eentry;
+	struct cc_state *cc_state;
+
+	/* sanity check n_blocks, start_block */
+	if (n_blocks == 0 ||
+	    start_block + n_blocks > ppd->cc_max_table_entries) {
+		smp->status |= IB_SMP_INVALID_FIELD;
+		return reply((struct ib_mad_hdr *)smp);
+	}
+
+	rcu_read_lock();
+
+	cc_state = get_cc_state(ppd);
+
+	if (cc_state == NULL) {
+		rcu_read_unlock();
+		return reply((struct ib_mad_hdr *)smp);
+	}
+
+	sentry = start_block * IB_CCT_ENTRIES;
+	eentry = sentry + (IB_CCT_ENTRIES * n_blocks);
+
+	cc_table_attr->ccti_limit = cpu_to_be16(cc_state->cct.ccti_limit);
+
+	entries = cc_state->cct.entries;
+
+	/* return n_blocks, though the last block may not be full */
+	for (j = 0, i = sentry; i < eentry; j++, i++)
+		cc_table_attr->ccti_entries[j].entry =
+			cpu_to_be16(entries[i].entry);
+
+	rcu_read_unlock();
+
+	if (resp_len)
+		*resp_len += sizeof(u16)*(IB_CCT_ENTRIES * n_blocks + 1);
+
+	return reply((struct ib_mad_hdr *)smp);
+}
+
+void cc_state_reclaim(struct rcu_head *rcu)
+{
+	struct cc_state *cc_state = container_of(rcu, struct cc_state, rcu);
+
+	kfree(cc_state);
+}
+
+static int __subn_set_opa_cc_table(struct opa_smp *smp, u32 am, u8 *data,
+				   struct ib_device *ibdev, u8 port,
+				   u32 *resp_len)
+{
+	struct ib_cc_table_attr *p = (struct ib_cc_table_attr *) data;
+	struct hfi1_ibport *ibp = to_iport(ibdev, port);
+	struct hfi1_pportdata *ppd = ppd_from_ibp(ibp);
+	u32 start_block = OPA_AM_START_BLK(am);
+	u32 n_blocks = OPA_AM_NBLK(am);
+	struct ib_cc_table_entry_shadow *entries;
+	int i, j;
+	u32 sentry, eentry;
+	u16 ccti_limit;
+	struct cc_state *old_cc_state, *new_cc_state;
+
+	/* sanity check n_blocks, start_block */
+	if (n_blocks == 0 ||
+	    start_block + n_blocks > ppd->cc_max_table_entries) {
+		smp->status |= IB_SMP_INVALID_FIELD;
+		return reply((struct ib_mad_hdr *)smp);
+	}
+
+	sentry = start_block * IB_CCT_ENTRIES;
+	eentry = sentry + ((n_blocks - 1) * IB_CCT_ENTRIES) +
+		 (be16_to_cpu(p->ccti_limit)) % IB_CCT_ENTRIES + 1;
+
+	/* sanity check ccti_limit */
+	ccti_limit = be16_to_cpu(p->ccti_limit);
+	if (ccti_limit + 1 > eentry) {
+		smp->status |= IB_SMP_INVALID_FIELD;
+		return reply((struct ib_mad_hdr *)smp);
+	}
+
+	new_cc_state = kzalloc(sizeof(*new_cc_state), GFP_KERNEL);
+	if (new_cc_state == NULL)
+		goto getit;
+
+	spin_lock(&ppd->cc_state_lock);
+
+	old_cc_state = get_cc_state(ppd);
+
+	if (old_cc_state == NULL) {
+		spin_unlock(&ppd->cc_state_lock);
+		kfree(new_cc_state);
+		return reply((struct ib_mad_hdr *)smp);
+	}
+
+	*new_cc_state = *old_cc_state;
+
+	new_cc_state->cct.ccti_limit = ccti_limit;
+
+	entries = ppd->ccti_entries;
+	ppd->total_cct_entry = ccti_limit + 1;
+
+	for (j = 0, i = sentry; i < eentry; j++, i++)
+		entries[i].entry = be16_to_cpu(p->ccti_entries[j].entry);
+
+	memcpy(new_cc_state->cct.entries, entries,
+	       eentry * sizeof(struct ib_cc_table_entry));
+
+	new_cc_state->cong_setting.port_control = IB_CC_CCS_PC_SL_BASED;
+	new_cc_state->cong_setting.control_map = ppd->cc_sl_control_map;
+	memcpy(new_cc_state->cong_setting.entries, ppd->congestion_entries,
+	       OPA_MAX_SLS * sizeof(struct opa_congestion_setting_entry));
+
+	rcu_assign_pointer(ppd->cc_state, new_cc_state);
+
+	spin_unlock(&ppd->cc_state_lock);
+
+	call_rcu(&old_cc_state->rcu, cc_state_reclaim);
+
+getit:
+	return __subn_get_opa_cc_table(smp, am, data, ibdev, port, resp_len);
+}
+
+struct opa_led_info {
+	__be32 rsvd_led_mask;
+	__be32 rsvd;
+};
+
+#define OPA_LED_SHIFT	31
+#define OPA_LED_MASK	(1 << OPA_LED_SHIFT)
+
+static int __subn_get_opa_led_info(struct opa_smp *smp, u32 am, u8 *data,
+				   struct ib_device *ibdev, u8 port,
+				   u32 *resp_len)
+{
+	struct hfi1_devdata *dd = dd_from_ibdev(ibdev);
+	struct opa_led_info *p = (struct opa_led_info *) data;
+	u32 nport = OPA_AM_NPORT(am);
+	u64 reg;
+
+	if (nport != 1 || OPA_AM_PORTNUM(am)) {
+		smp->status |= IB_SMP_INVALID_FIELD;
+		return reply((struct ib_mad_hdr *)smp);
+	}
+
+	reg = read_csr(dd, DCC_CFG_LED_CNTRL);
+	if ((reg & DCC_CFG_LED_CNTRL_LED_CNTRL_SMASK) &&
+		((reg & DCC_CFG_LED_CNTRL_LED_SW_BLINK_RATE_SMASK) == 0xf))
+			p->rsvd_led_mask = cpu_to_be32(OPA_LED_MASK);
+
+	if (resp_len)
+		*resp_len += sizeof(struct opa_led_info);
+
+	return reply((struct ib_mad_hdr *)smp);
+}
+
+static int __subn_set_opa_led_info(struct opa_smp *smp, u32 am, u8 *data,
+				   struct ib_device *ibdev, u8 port,
+				   u32 *resp_len)
+{
+	struct hfi1_devdata *dd = dd_from_ibdev(ibdev);
+	struct opa_led_info *p = (struct opa_led_info *) data;
+	u32 nport = OPA_AM_NPORT(am);
+	int on = !!(be32_to_cpu(p->rsvd_led_mask) & OPA_LED_MASK);
+
+	if (nport != 1 || OPA_AM_PORTNUM(am)) {
+		smp->status |= IB_SMP_INVALID_FIELD;
+		return reply((struct ib_mad_hdr *)smp);
+	}
+
+	setextled(dd, on);
+
+	return __subn_get_opa_led_info(smp, am, data, ibdev, port, resp_len);
+}
+
+static int subn_get_opa_sma(__be16 attr_id, struct opa_smp *smp, u32 am,
+			    u8 *data, struct ib_device *ibdev, u8 port,
+			    u32 *resp_len)
+{
+	int ret;
+	struct hfi1_ibport *ibp = to_iport(ibdev, port);
+
+	switch (attr_id) {
+	case IB_SMP_ATTR_NODE_DESC:
+		ret = __subn_get_opa_nodedesc(smp, am, data, ibdev, port,
+					      resp_len);
+		break;
+	case IB_SMP_ATTR_NODE_INFO:
+		ret = __subn_get_opa_nodeinfo(smp, am, data, ibdev, port,
+					      resp_len);
+		break;
+	case IB_SMP_ATTR_PORT_INFO:
+		ret = __subn_get_opa_portinfo(smp, am, data, ibdev, port,
+					      resp_len);
+		break;
+	case IB_SMP_ATTR_PKEY_TABLE:
+		ret = __subn_get_opa_pkeytable(smp, am, data, ibdev, port,
+					       resp_len);
+		break;
+	case OPA_ATTRIB_ID_SL_TO_SC_MAP:
+		ret = __subn_get_opa_sl_to_sc(smp, am, data, ibdev, port,
+					      resp_len);
+		break;
+	case OPA_ATTRIB_ID_SC_TO_SL_MAP:
+		ret = __subn_get_opa_sc_to_sl(smp, am, data, ibdev, port,
+					      resp_len);
+		break;
+	case OPA_ATTRIB_ID_SC_TO_VLT_MAP:
+		ret = __subn_get_opa_sc_to_vlt(smp, am, data, ibdev, port,
+					       resp_len);
+		break;
+	case OPA_ATTRIB_ID_SC_TO_VLNT_MAP:
+		ret = __subn_get_opa_sc_to_vlnt(smp, am, data, ibdev, port,
+					       resp_len);
+		break;
+	case OPA_ATTRIB_ID_PORT_STATE_INFO:
+		ret = __subn_get_opa_psi(smp, am, data, ibdev, port,
+					 resp_len);
+		break;
+	case OPA_ATTRIB_ID_BUFFER_CONTROL_TABLE:
+		ret = __subn_get_opa_bct(smp, am, data, ibdev, port,
+					 resp_len);
+		break;
+	case OPA_ATTRIB_ID_CABLE_INFO:
+		ret = __subn_get_opa_cable_info(smp, am, data, ibdev, port,
+						resp_len);
+		break;
+	case IB_SMP_ATTR_VL_ARB_TABLE:
+		ret = __subn_get_opa_vl_arb(smp, am, data, ibdev, port,
+					    resp_len);
+		break;
+	case OPA_ATTRIB_ID_CONGESTION_INFO:
+		ret = __subn_get_opa_cong_info(smp, am, data, ibdev, port,
+					       resp_len);
+		break;
+	case OPA_ATTRIB_ID_HFI_CONGESTION_SETTING:
+		ret = __subn_get_opa_cong_setting(smp, am, data, ibdev,
+						  port, resp_len);
+		break;
+	case OPA_ATTRIB_ID_HFI_CONGESTION_LOG:
+		ret = __subn_get_opa_hfi1_cong_log(smp, am, data, ibdev,
+						   port, resp_len);
+		break;
+	case OPA_ATTRIB_ID_CONGESTION_CONTROL_TABLE:
+		ret = __subn_get_opa_cc_table(smp, am, data, ibdev, port,
+					      resp_len);
+		break;
+	case IB_SMP_ATTR_LED_INFO:
+		ret = __subn_get_opa_led_info(smp, am, data, ibdev, port,
+					      resp_len);
+		break;
+	case IB_SMP_ATTR_SM_INFO:
+		if (ibp->port_cap_flags & IB_PORT_SM_DISABLED)
+			return IB_MAD_RESULT_SUCCESS | IB_MAD_RESULT_CONSUMED;
+		if (ibp->port_cap_flags & IB_PORT_SM)
+			return IB_MAD_RESULT_SUCCESS;
+		/* FALLTHROUGH */
+	default:
+		smp->status |= IB_SMP_UNSUP_METH_ATTR;
+		ret = reply((struct ib_mad_hdr *)smp);
+		break;
+	}
+	return ret;
+}
+
+static int subn_set_opa_sma(__be16 attr_id, struct opa_smp *smp, u32 am,
+			    u8 *data, struct ib_device *ibdev, u8 port,
+			    u32 *resp_len)
+{
+	int ret;
+	struct hfi1_ibport *ibp = to_iport(ibdev, port);
+
+	switch (attr_id) {
+	case IB_SMP_ATTR_PORT_INFO:
+		ret = __subn_set_opa_portinfo(smp, am, data, ibdev, port,
+					      resp_len);
+		break;
+	case IB_SMP_ATTR_PKEY_TABLE:
+		ret = __subn_set_opa_pkeytable(smp, am, data, ibdev, port,
+					       resp_len);
+		break;
+	case OPA_ATTRIB_ID_SL_TO_SC_MAP:
+		ret = __subn_set_opa_sl_to_sc(smp, am, data, ibdev, port,
+					      resp_len);
+		break;
+	case OPA_ATTRIB_ID_SC_TO_SL_MAP:
+		ret = __subn_set_opa_sc_to_sl(smp, am, data, ibdev, port,
+					      resp_len);
+		break;
+	case OPA_ATTRIB_ID_SC_TO_VLT_MAP:
+		ret = __subn_set_opa_sc_to_vlt(smp, am, data, ibdev, port,
+					       resp_len);
+		break;
+	case OPA_ATTRIB_ID_SC_TO_VLNT_MAP:
+		ret = __subn_set_opa_sc_to_vlnt(smp, am, data, ibdev, port,
+					       resp_len);
+		break;
+	case OPA_ATTRIB_ID_PORT_STATE_INFO:
+		ret = __subn_set_opa_psi(smp, am, data, ibdev, port,
+					 resp_len);
+		break;
+	case OPA_ATTRIB_ID_BUFFER_CONTROL_TABLE:
+		ret = __subn_set_opa_bct(smp, am, data, ibdev, port,
+					 resp_len);
+		break;
+	case IB_SMP_ATTR_VL_ARB_TABLE:
+		ret = __subn_set_opa_vl_arb(smp, am, data, ibdev, port,
+					    resp_len);
+		break;
+	case OPA_ATTRIB_ID_HFI_CONGESTION_SETTING:
+		ret = __subn_set_opa_cong_setting(smp, am, data, ibdev,
+						  port, resp_len);
+		break;
+	case OPA_ATTRIB_ID_CONGESTION_CONTROL_TABLE:
+		ret = __subn_set_opa_cc_table(smp, am, data, ibdev, port,
+					      resp_len);
+		break;
+	case IB_SMP_ATTR_LED_INFO:
+		ret = __subn_set_opa_led_info(smp, am, data, ibdev, port,
+					      resp_len);
+		break;
+	case IB_SMP_ATTR_SM_INFO:
+		if (ibp->port_cap_flags & IB_PORT_SM_DISABLED)
+			return IB_MAD_RESULT_SUCCESS | IB_MAD_RESULT_CONSUMED;
+		if (ibp->port_cap_flags & IB_PORT_SM)
+			return IB_MAD_RESULT_SUCCESS;
+		/* FALLTHROUGH */
+	default:
+		smp->status |= IB_SMP_UNSUP_METH_ATTR;
+		ret = reply((struct ib_mad_hdr *)smp);
+		break;
+	}
+	return ret;
+}
+
+static inline void set_aggr_error(struct opa_aggregate *ag)
+{
+	ag->err_reqlength |= cpu_to_be16(0x8000);
+}
+
+static int subn_get_opa_aggregate(struct opa_smp *smp,
+				  struct ib_device *ibdev, u8 port,
+				  u32 *resp_len)
+{
+	int i;
+	u32 num_attr = be32_to_cpu(smp->attr_mod) & 0x000000ff;
+	u8 *next_smp = opa_get_smp_data(smp);
+
+	if (num_attr < 1 || num_attr > 117) {
+		smp->status |= IB_SMP_INVALID_FIELD;
+		return reply((struct ib_mad_hdr *)smp);
+	}
+
+	for (i = 0; i < num_attr; i++) {
+		struct opa_aggregate *agg;
+		size_t agg_data_len;
+		size_t agg_size;
+		u32 am;
+
+		agg = (struct opa_aggregate *)next_smp;
+		agg_data_len = (be16_to_cpu(agg->err_reqlength) & 0x007f) * 8;
+		agg_size = sizeof(*agg) + agg_data_len;
+		am = be32_to_cpu(agg->attr_mod);
+
+		*resp_len += agg_size;
+
+		if (next_smp + agg_size > ((u8 *)smp) + sizeof(*smp)) {
+			smp->status |= IB_SMP_INVALID_FIELD;
+			return reply((struct ib_mad_hdr *)smp);
+		}
+
+		/* zero the payload for this segment */
+		memset(next_smp + sizeof(*agg), 0, agg_data_len);
+
+		(void) subn_get_opa_sma(agg->attr_id, smp, am, agg->data,
+					ibdev, port, NULL);
+		if (smp->status & ~IB_SMP_DIRECTION) {
+			set_aggr_error(agg);
+			return reply((struct ib_mad_hdr *)smp);
+		}
+		next_smp += agg_size;
+
+	}
+
+	return reply((struct ib_mad_hdr *)smp);
+}
+
+static int subn_set_opa_aggregate(struct opa_smp *smp,
+				  struct ib_device *ibdev, u8 port,
+				  u32 *resp_len)
+{
+	int i;
+	u32 num_attr = be32_to_cpu(smp->attr_mod) & 0x000000ff;
+	u8 *next_smp = opa_get_smp_data(smp);
+
+	if (num_attr < 1 || num_attr > 117) {
+		smp->status |= IB_SMP_INVALID_FIELD;
+		return reply((struct ib_mad_hdr *)smp);
+	}
+
+	for (i = 0; i < num_attr; i++) {
+		struct opa_aggregate *agg;
+		size_t agg_data_len;
+		size_t agg_size;
+		u32 am;
+
+		agg = (struct opa_aggregate *)next_smp;
+		agg_data_len = (be16_to_cpu(agg->err_reqlength) & 0x007f) * 8;
+		agg_size = sizeof(*agg) + agg_data_len;
+		am = be32_to_cpu(agg->attr_mod);
+
+		*resp_len += agg_size;
+
+		if (next_smp + agg_size > ((u8 *)smp) + sizeof(*smp)) {
+			smp->status |= IB_SMP_INVALID_FIELD;
+			return reply((struct ib_mad_hdr *)smp);
+		}
+
+		(void) subn_set_opa_sma(agg->attr_id, smp, am, agg->data,
+					ibdev, port, NULL);
+		if (smp->status & ~IB_SMP_DIRECTION) {
+			set_aggr_error(agg);
+			return reply((struct ib_mad_hdr *)smp);
+		}
+		next_smp += agg_size;
+
+	}
+
+	return reply((struct ib_mad_hdr *)smp);
+}
+
+/*
+ * OPAv1 specifies that, on the transition to link up, these counters
+ * are cleared:
+ *   PortRcvErrors [*]
+ *   LinkErrorRecovery
+ *   LocalLinkIntegrityErrors
+ *   ExcessiveBufferOverruns [*]
+ *
+ * [*] Error info associated with these counters is retained, but the
+ * error info status is reset to 0.
+ */
+void clear_linkup_counters(struct hfi1_devdata *dd)
+{
+	/* PortRcvErrors */
+	write_dev_cntr(dd, C_DC_RCV_ERR, CNTR_INVALID_VL, 0);
+	dd->err_info_rcvport.status_and_code &= ~OPA_EI_STATUS_SMASK;
+	/* LinkErrorRecovery */
+	write_dev_cntr(dd, C_DC_SEQ_CRC_CNT, CNTR_INVALID_VL, 0);
+	write_dev_cntr(dd, C_DC_REINIT_FROM_PEER_CNT, CNTR_INVALID_VL, 0);
+	/* LocalLinkIntegrityErrors */
+	write_dev_cntr(dd, C_DC_TX_REPLAY, CNTR_INVALID_VL, 0);
+	write_dev_cntr(dd, C_DC_RX_REPLAY, CNTR_INVALID_VL, 0);
+	/* ExcessiveBufferOverruns */
+	write_dev_cntr(dd, C_RCV_OVF, CNTR_INVALID_VL, 0);
+	dd->rcv_ovfl_cnt = 0;
+	dd->err_info_xmit_constraint.status &= ~OPA_EI_STATUS_SMASK;
+}
+
+/*
+ * is_local_mad() returns 1 if 'mad' is sent from, and destined to the
+ * local node, 0 otherwise.
+ */
+static int is_local_mad(struct hfi1_ibport *ibp, const struct opa_mad *mad,
+			const struct ib_wc *in_wc)
+{
+	struct hfi1_pportdata *ppd = ppd_from_ibp(ibp);
+	const struct opa_smp *smp = (const struct opa_smp *)mad;
+
+	if (smp->mgmt_class == IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE) {
+		return (smp->hop_cnt == 0 &&
+			smp->route.dr.dr_slid == OPA_LID_PERMISSIVE &&
+			smp->route.dr.dr_dlid == OPA_LID_PERMISSIVE);
+	}
+
+	return (in_wc->slid == ppd->lid);
+}
+
+/*
+ * opa_local_smp_check() should only be called on MADs for which
+ * is_local_mad() returns true. It applies the SMP checks that are
+ * specific to SMPs which are sent from, and destined to this node.
+ * opa_local_smp_check() returns 0 if the SMP passes its checks, 1
+ * otherwise.
+ *
+ * SMPs which arrive from other nodes are instead checked by
+ * opa_smp_check().
+ */
+static int opa_local_smp_check(struct hfi1_ibport *ibp,
+			       const struct ib_wc *in_wc)
+{
+	struct hfi1_pportdata *ppd = ppd_from_ibp(ibp);
+	u16 slid = in_wc->slid;
+	u16 pkey;
+
+	if (in_wc->pkey_index >= ARRAY_SIZE(ppd->pkeys))
+		return 1;
+
+	pkey = ppd->pkeys[in_wc->pkey_index];
+	/*
+	 * We need to do the "node-local" checks specified in OPAv1,
+	 * rev 0.90, section 9.10.26, which are:
+	 *   - pkey is 0x7fff, or 0xffff
+	 *   - Source QPN == 0 || Destination QPN == 0
+	 *   - the MAD header's management class is either
+	 *     IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE or
+	 *     IB_MGMT_CLASS_SUBN_LID_ROUTED
+	 *   - SLID != 0
+	 *
+	 * However, we know (and so don't need to check again) that,
+	 * for local SMPs, the MAD stack passes MADs with:
+	 *   - Source QPN of 0
+	 *   - MAD mgmt_class is IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE
+	 *   - SLID is either: OPA_LID_PERMISSIVE (0xFFFFFFFF), or
+	 *     our own port's lid
+	 *
+	 */
+	if (pkey == LIM_MGMT_P_KEY || pkey == FULL_MGMT_P_KEY)
+		return 0;
+	ingress_pkey_table_fail(ppd, pkey, slid);
+	return 1;
+}
+
+static int process_subn_opa(struct ib_device *ibdev, int mad_flags,
+			    u8 port, const struct opa_mad *in_mad,
+			    struct opa_mad *out_mad,
+			    u32 *resp_len)
+{
+	struct opa_smp *smp = (struct opa_smp *)out_mad;
+	struct hfi1_ibport *ibp = to_iport(ibdev, port);
+	u8 *data;
+	u32 am;
+	__be16 attr_id;
+	int ret;
+
+	*out_mad = *in_mad;
+	data = opa_get_smp_data(smp);
+
+	am = be32_to_cpu(smp->attr_mod);
+	attr_id = smp->attr_id;
+	if (smp->class_version != OPA_SMI_CLASS_VERSION) {
+		smp->status |= IB_SMP_UNSUP_VERSION;
+		ret = reply((struct ib_mad_hdr *)smp);
+		goto bail;
+	}
+	ret = check_mkey(ibp, (struct ib_mad_hdr *)smp, mad_flags, smp->mkey,
+			 smp->route.dr.dr_slid, smp->route.dr.return_path,
+			 smp->hop_cnt);
+	if (ret) {
+		u32 port_num = be32_to_cpu(smp->attr_mod);
+
+		/*
+		 * If this is a get/set portinfo, we already check the
+		 * M_Key if the MAD is for another port and the M_Key
+		 * is OK on the receiving port. This check is needed
+		 * to increment the error counters when the M_Key
+		 * fails to match on *both* ports.
+		 */
+		if (attr_id == IB_SMP_ATTR_PORT_INFO &&
+		    (smp->method == IB_MGMT_METHOD_GET ||
+		     smp->method == IB_MGMT_METHOD_SET) &&
+		    port_num && port_num <= ibdev->phys_port_cnt &&
+		    port != port_num)
+			(void) check_mkey(to_iport(ibdev, port_num),
+					  (struct ib_mad_hdr *)smp, 0,
+					  smp->mkey, smp->route.dr.dr_slid,
+					  smp->route.dr.return_path,
+					  smp->hop_cnt);
+		ret = IB_MAD_RESULT_FAILURE;
+		goto bail;
+	}
+
+	*resp_len = opa_get_smp_header_size(smp);
+
+	switch (smp->method) {
+	case IB_MGMT_METHOD_GET:
+		switch (attr_id) {
+		default:
+			clear_opa_smp_data(smp);
+			ret = subn_get_opa_sma(attr_id, smp, am, data,
+					       ibdev, port, resp_len);
+			goto bail;
+		case OPA_ATTRIB_ID_AGGREGATE:
+			ret = subn_get_opa_aggregate(smp, ibdev, port,
+						     resp_len);
+			goto bail;
+		}
+	case IB_MGMT_METHOD_SET:
+		switch (attr_id) {
+		default:
+			ret = subn_set_opa_sma(attr_id, smp, am, data,
+					       ibdev, port, resp_len);
+			goto bail;
+		case OPA_ATTRIB_ID_AGGREGATE:
+			ret = subn_set_opa_aggregate(smp, ibdev, port,
+						     resp_len);
+			goto bail;
+		}
+	case IB_MGMT_METHOD_TRAP:
+	case IB_MGMT_METHOD_REPORT:
+	case IB_MGMT_METHOD_REPORT_RESP:
+	case IB_MGMT_METHOD_GET_RESP:
+		/*
+		 * The ib_mad module will call us to process responses
+		 * before checking for other consumers.
+		 * Just tell the caller to process it normally.
+		 */
+		ret = IB_MAD_RESULT_SUCCESS;
+		goto bail;
+	default:
+		smp->status |= IB_SMP_UNSUP_METHOD;
+		ret = reply((struct ib_mad_hdr *)smp);
+	}
+
+bail:
+	return ret;
+}
+
+static int process_subn(struct ib_device *ibdev, int mad_flags,
+			u8 port, const struct ib_mad *in_mad,
+			struct ib_mad *out_mad)
+{
+	struct ib_smp *smp = (struct ib_smp *)out_mad;
+	struct hfi1_ibport *ibp = to_iport(ibdev, port);
+	int ret;
+
+	*out_mad = *in_mad;
+	if (smp->class_version != 1) {
+		smp->status |= IB_SMP_UNSUP_VERSION;
+		ret = reply((struct ib_mad_hdr *)smp);
+		goto bail;
+	}
+
+	ret = check_mkey(ibp, (struct ib_mad_hdr *)smp, mad_flags,
+			 smp->mkey, (__force __be32)smp->dr_slid,
+			 smp->return_path, smp->hop_cnt);
+	if (ret) {
+		u32 port_num = be32_to_cpu(smp->attr_mod);
+
+		/*
+		 * If this is a get/set portinfo, we already check the
+		 * M_Key if the MAD is for another port and the M_Key
+		 * is OK on the receiving port. This check is needed
+		 * to increment the error counters when the M_Key
+		 * fails to match on *both* ports.
+		 */
+		if (in_mad->mad_hdr.attr_id == IB_SMP_ATTR_PORT_INFO &&
+		    (smp->method == IB_MGMT_METHOD_GET ||
+		     smp->method == IB_MGMT_METHOD_SET) &&
+		    port_num && port_num <= ibdev->phys_port_cnt &&
+		    port != port_num)
+			(void) check_mkey(to_iport(ibdev, port_num),
+					  (struct ib_mad_hdr *)smp, 0,
+					  smp->mkey,
+					  (__force __be32)smp->dr_slid,
+					  smp->return_path, smp->hop_cnt);
+		ret = IB_MAD_RESULT_FAILURE;
+		goto bail;
+	}
+
+	switch (smp->method) {
+	case IB_MGMT_METHOD_GET:
+		switch (smp->attr_id) {
+		case IB_SMP_ATTR_NODE_INFO:
+			ret = subn_get_nodeinfo(smp, ibdev, port);
+			goto bail;
+		default:
+			smp->status |= IB_SMP_UNSUP_METH_ATTR;
+			ret = reply((struct ib_mad_hdr *)smp);
+			goto bail;
+		}
+	}
+
+bail:
+	return ret;
+}
+
+static int process_perf_opa(struct ib_device *ibdev, u8 port,
+			    const struct opa_mad *in_mad,
+			    struct opa_mad *out_mad, u32 *resp_len)
+{
+	struct opa_pma_mad *pmp = (struct opa_pma_mad *)out_mad;
+	int ret;
+
+	*out_mad = *in_mad;
+
+	if (pmp->mad_hdr.class_version != OPA_SMI_CLASS_VERSION) {
+		pmp->mad_hdr.status |= IB_SMP_UNSUP_VERSION;
+		return reply((struct ib_mad_hdr *)pmp);
+	}
+
+	*resp_len = sizeof(pmp->mad_hdr);
+
+	switch (pmp->mad_hdr.method) {
+	case IB_MGMT_METHOD_GET:
+		switch (pmp->mad_hdr.attr_id) {
+		case IB_PMA_CLASS_PORT_INFO:
+			ret = pma_get_opa_classportinfo(pmp, ibdev, resp_len);
+			goto bail;
+		case OPA_PM_ATTRIB_ID_PORT_STATUS:
+			ret = pma_get_opa_portstatus(pmp, ibdev, port,
+								resp_len);
+			goto bail;
+		case OPA_PM_ATTRIB_ID_DATA_PORT_COUNTERS:
+			ret = pma_get_opa_datacounters(pmp, ibdev, port,
+								resp_len);
+			goto bail;
+		case OPA_PM_ATTRIB_ID_ERROR_PORT_COUNTERS:
+			ret = pma_get_opa_porterrors(pmp, ibdev, port,
+								resp_len);
+			goto bail;
+		case OPA_PM_ATTRIB_ID_ERROR_INFO:
+			ret = pma_get_opa_errorinfo(pmp, ibdev, port,
+								resp_len);
+			goto bail;
+		default:
+			pmp->mad_hdr.status |= IB_SMP_UNSUP_METH_ATTR;
+			ret = reply((struct ib_mad_hdr *)pmp);
+			goto bail;
+		}
+
+	case IB_MGMT_METHOD_SET:
+		switch (pmp->mad_hdr.attr_id) {
+		case OPA_PM_ATTRIB_ID_CLEAR_PORT_STATUS:
+			ret = pma_set_opa_portstatus(pmp, ibdev, port,
+								resp_len);
+			goto bail;
+		case OPA_PM_ATTRIB_ID_ERROR_INFO:
+			ret = pma_set_opa_errorinfo(pmp, ibdev, port,
+								resp_len);
+			goto bail;
+		default:
+			pmp->mad_hdr.status |= IB_SMP_UNSUP_METH_ATTR;
+			ret = reply((struct ib_mad_hdr *)pmp);
+			goto bail;
+		}
+
+	case IB_MGMT_METHOD_TRAP:
+	case IB_MGMT_METHOD_GET_RESP:
+		/*
+		 * The ib_mad module will call us to process responses
+		 * before checking for other consumers.
+		 * Just tell the caller to process it normally.
+		 */
+		ret = IB_MAD_RESULT_SUCCESS;
+		goto bail;
+
+	default:
+		pmp->mad_hdr.status |= IB_SMP_UNSUP_METHOD;
+		ret = reply((struct ib_mad_hdr *)pmp);
+	}
+
+bail:
+	return ret;
+}
+
+static int hfi1_process_opa_mad(struct ib_device *ibdev, int mad_flags,
+			        u8 port, const struct ib_wc *in_wc,
+			        const struct ib_grh *in_grh,
+			        const struct opa_mad *in_mad,
+			        struct opa_mad *out_mad, size_t *out_mad_size,
+			        u16 *out_mad_pkey_index)
+{
+	int ret;
+	int pkey_idx;
+	u32 resp_len = 0;
+	struct hfi1_ibport *ibp = to_iport(ibdev, port);
+
+	pkey_idx = hfi1_lookup_pkey_idx(ibp, LIM_MGMT_P_KEY);
+	if (pkey_idx < 0) {
+		pr_warn("failed to find limited mgmt pkey, defaulting 0x%x\n",
+			hfi1_get_pkey(ibp, 1));
+		pkey_idx = 1;
+	}
+	*out_mad_pkey_index = (u16)pkey_idx;
+
+	switch (in_mad->mad_hdr.mgmt_class) {
+	case IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE:
+	case IB_MGMT_CLASS_SUBN_LID_ROUTED:
+		if (is_local_mad(ibp, in_mad, in_wc)) {
+			ret = opa_local_smp_check(ibp, in_wc);
+			if (ret)
+				return IB_MAD_RESULT_FAILURE;
+		}
+		ret = process_subn_opa(ibdev, mad_flags, port, in_mad,
+				       out_mad, &resp_len);
+		goto bail;
+	case IB_MGMT_CLASS_PERF_MGMT:
+		ret = process_perf_opa(ibdev, port, in_mad, out_mad,
+				       &resp_len);
+		goto bail;
+
+	default:
+		ret = IB_MAD_RESULT_SUCCESS;
+	}
+
+bail:
+	if (ret & IB_MAD_RESULT_REPLY)
+		*out_mad_size = round_up(resp_len, 8);
+	else if (ret & IB_MAD_RESULT_SUCCESS)
+		*out_mad_size = in_wc->byte_len - sizeof(struct ib_grh);
+
+	return ret;
+}
+
+static int hfi1_process_ib_mad(struct ib_device *ibdev, int mad_flags, u8 port,
+			       const struct ib_wc *in_wc,
+			       const struct ib_grh *in_grh,
+			       const struct ib_mad *in_mad,
+			       struct ib_mad *out_mad)
+{
+	int ret;
+
+	switch (in_mad->mad_hdr.mgmt_class) {
+	case IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE:
+	case IB_MGMT_CLASS_SUBN_LID_ROUTED:
+		ret = process_subn(ibdev, mad_flags, port, in_mad, out_mad);
+		goto bail;
+	default:
+		ret = IB_MAD_RESULT_SUCCESS;
+	}
+
+bail:
+	return ret;
+}
+
+/**
+ * hfi1_process_mad - process an incoming MAD packet
+ * @ibdev: the infiniband device this packet came in on
+ * @mad_flags: MAD flags
+ * @port: the port number this packet came in on
+ * @in_wc: the work completion entry for this packet
+ * @in_grh: the global route header for this packet
+ * @in_mad: the incoming MAD
+ * @out_mad: any outgoing MAD reply
+ *
+ * Returns IB_MAD_RESULT_SUCCESS if this is a MAD that we are not
+ * interested in processing.
+ *
+ * Note that the verbs framework has already done the MAD sanity checks,
+ * and hop count/pointer updating for IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE
+ * MADs.
+ *
+ * This is called by the ib_mad module.
+ */
+int hfi1_process_mad(struct ib_device *ibdev, int mad_flags, u8 port,
+		     const struct ib_wc *in_wc, const struct ib_grh *in_grh,
+		     const struct ib_mad_hdr *in_mad, size_t in_mad_size,
+		     struct ib_mad_hdr *out_mad, size_t *out_mad_size,
+		     u16 *out_mad_pkey_index)
+{
+	switch (in_mad->base_version) {
+	case OPA_MGMT_BASE_VERSION:
+		if (unlikely(in_mad_size != sizeof(struct opa_mad))) {
+			dev_err(ibdev->dma_device, "invalid in_mad_size\n");
+			return IB_MAD_RESULT_FAILURE;
+		}
+		return hfi1_process_opa_mad(ibdev, mad_flags, port,
+					    in_wc, in_grh,
+					    (struct opa_mad *)in_mad,
+					    (struct opa_mad *)out_mad,
+					    out_mad_size,
+					    out_mad_pkey_index);
+	case IB_MGMT_BASE_VERSION:
+		return hfi1_process_ib_mad(ibdev, mad_flags, port,
+					  in_wc, in_grh,
+					  (const struct ib_mad *)in_mad,
+					  (struct ib_mad *)out_mad);
+	default:
+		break;
+	}
+
+	return IB_MAD_RESULT_FAILURE;
+}
+
+static void send_handler(struct ib_mad_agent *agent,
+			 struct ib_mad_send_wc *mad_send_wc)
+{
+	ib_free_send_mad(mad_send_wc->send_buf);
+}
+
+int hfi1_create_agents(struct hfi1_ibdev *dev)
+{
+	struct hfi1_devdata *dd = dd_from_dev(dev);
+	struct ib_mad_agent *agent;
+	struct hfi1_ibport *ibp;
+	int p;
+	int ret;
+
+	for (p = 0; p < dd->num_pports; p++) {
+		ibp = &dd->pport[p].ibport_data;
+		agent = ib_register_mad_agent(&dev->ibdev, p + 1, IB_QPT_SMI,
+					      NULL, 0, send_handler,
+					      NULL, NULL, 0);
+		if (IS_ERR(agent)) {
+			ret = PTR_ERR(agent);
+			goto err;
+		}
+
+		ibp->send_agent = agent;
+	}
+
+	return 0;
+
+err:
+	for (p = 0; p < dd->num_pports; p++) {
+		ibp = &dd->pport[p].ibport_data;
+		if (ibp->send_agent) {
+			agent = ibp->send_agent;
+			ibp->send_agent = NULL;
+			ib_unregister_mad_agent(agent);
+		}
+	}
+
+	return ret;
+}
+
+void hfi1_free_agents(struct hfi1_ibdev *dev)
+{
+	struct hfi1_devdata *dd = dd_from_dev(dev);
+	struct ib_mad_agent *agent;
+	struct hfi1_ibport *ibp;
+	int p;
+
+	for (p = 0; p < dd->num_pports; p++) {
+		ibp = &dd->pport[p].ibport_data;
+		if (ibp->send_agent) {
+			agent = ibp->send_agent;
+			ibp->send_agent = NULL;
+			ib_unregister_mad_agent(agent);
+		}
+		if (ibp->sm_ah) {
+			ib_destroy_ah(&ibp->sm_ah->ibah);
+			ibp->sm_ah = NULL;
+		}
+	}
+}
diff --git a/drivers/staging/rdma/hfi1/mad.h b/drivers/staging/rdma/hfi1/mad.h
new file mode 100644
index 000000000000..47457501c044
--- /dev/null
+++ b/drivers/staging/rdma/hfi1/mad.h
@@ -0,0 +1,325 @@
+/*
+ *
+ * This file is provided under a dual BSD/GPLv2 license.  When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * GPL LICENSE SUMMARY
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * BSD LICENSE
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ *  - Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ *  - Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in
+ *    the documentation and/or other materials provided with the
+ *    distribution.
+ *  - Neither the name of Intel Corporation nor the names of its
+ *    contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ */
+#ifndef _HFI1_MAD_H
+#define _HFI1_MAD_H
+
+#include <rdma/ib_pma.h>
+#define USE_PI_LED_ENABLE	1 /* use led enabled bit in struct
+				   * opa_port_states, if available */
+#include <rdma/opa_smi.h>
+#include <rdma/opa_port_info.h>
+#ifndef PI_LED_ENABLE_SUP
+#define PI_LED_ENABLE_SUP 0
+#endif
+#include "opa_compat.h"
+
+
+
+#define IB_VLARB_LOWPRI_0_31    1
+#define IB_VLARB_LOWPRI_32_63   2
+#define IB_VLARB_HIGHPRI_0_31   3
+#define IB_VLARB_HIGHPRI_32_63  4
+
+#define OPA_MAX_PREEMPT_CAP         32
+#define OPA_VLARB_LOW_ELEMENTS       0
+#define OPA_VLARB_HIGH_ELEMENTS      1
+#define OPA_VLARB_PREEMPT_ELEMENTS   2
+#define OPA_VLARB_PREEMPT_MATRIX     3
+
+#define IB_PMA_PORT_COUNTERS_CONG       cpu_to_be16(0xFF00)
+
+struct ib_pma_portcounters_cong {
+	u8 reserved;
+	u8 reserved1;
+	__be16 port_check_rate;
+	__be16 symbol_error_counter;
+	u8 link_error_recovery_counter;
+	u8 link_downed_counter;
+	__be16 port_rcv_errors;
+	__be16 port_rcv_remphys_errors;
+	__be16 port_rcv_switch_relay_errors;
+	__be16 port_xmit_discards;
+	u8 port_xmit_constraint_errors;
+	u8 port_rcv_constraint_errors;
+	u8 reserved2;
+	u8 link_overrun_errors; /* LocalLink: 7:4, BufferOverrun: 3:0 */
+	__be16 reserved3;
+	__be16 vl15_dropped;
+	__be64 port_xmit_data;
+	__be64 port_rcv_data;
+	__be64 port_xmit_packets;
+	__be64 port_rcv_packets;
+	__be64 port_xmit_wait;
+	__be64 port_adr_events;
+} __packed;
+
+#define IB_SMP_UNSUP_VERSION    cpu_to_be16(0x0004)
+#define IB_SMP_UNSUP_METHOD     cpu_to_be16(0x0008)
+#define IB_SMP_UNSUP_METH_ATTR  cpu_to_be16(0x000C)
+#define IB_SMP_INVALID_FIELD    cpu_to_be16(0x001C)
+
+#define OPA_MAX_PREEMPT_CAP         32
+#define OPA_VLARB_LOW_ELEMENTS       0
+#define OPA_VLARB_HIGH_ELEMENTS      1
+#define OPA_VLARB_PREEMPT_ELEMENTS   2
+#define OPA_VLARB_PREEMPT_MATRIX     3
+
+#define HFI1_XMIT_RATE_UNSUPPORTED               0x0
+#define HFI1_XMIT_RATE_PICO                      0x7
+/* number of 4nsec cycles equaling 2secs */
+#define HFI1_CONG_TIMER_PSINTERVAL               0x1DCD64EC
+
+#define IB_CC_SVCTYPE_RC 0x0
+#define IB_CC_SVCTYPE_UC 0x1
+#define IB_CC_SVCTYPE_RD 0x2
+#define IB_CC_SVCTYPE_UD 0x3
+
+
+/*
+ * There should be an equivalent IB #define for the following, but
+ * I cannot find it.
+ */
+#define OPA_CC_LOG_TYPE_HFI	2
+
+struct opa_hfi1_cong_log_event_internal {
+	u32 lqpn;
+	u32 rqpn;
+	u8 sl;
+	u8 svc_type;
+	u32 rlid;
+	s64 timestamp; /* wider than 32 bits to detect 32 bit rollover */
+};
+
+struct opa_hfi1_cong_log_event {
+	u8 local_qp_cn_entry[3];
+	u8 remote_qp_number_cn_entry[3];
+	u8 sl_svc_type_cn_entry; /* 5 bits SL, 3 bits svc type */
+	u8 reserved;
+	__be32 remote_lid_cn_entry;
+	__be32 timestamp_cn_entry;
+} __packed;
+
+#define OPA_CONG_LOG_ELEMS	96
+
+struct opa_hfi1_cong_log {
+	u8 log_type;
+	u8 congestion_flags;
+	__be16 threshold_event_counter;
+	__be32 current_time_stamp;
+	u8 threshold_cong_event_map[OPA_MAX_SLS/8];
+	struct opa_hfi1_cong_log_event events[OPA_CONG_LOG_ELEMS];
+} __packed;
+
+#define IB_CC_TABLE_CAP_DEFAULT 31
+
+/* Port control flags */
+#define IB_CC_CCS_PC_SL_BASED 0x01
+
+struct opa_congestion_setting_entry {
+	u8 ccti_increase;
+	u8 reserved;
+	__be16 ccti_timer;
+	u8 trigger_threshold;
+	u8 ccti_min; /* min CCTI for cc table */
+} __packed;
+
+struct opa_congestion_setting_entry_shadow {
+	u8 ccti_increase;
+	u8 reserved;
+	u16 ccti_timer;
+	u8 trigger_threshold;
+	u8 ccti_min; /* min CCTI for cc table */
+} __packed;
+
+struct opa_congestion_setting_attr {
+	__be32 control_map;
+	__be16 port_control;
+	struct opa_congestion_setting_entry entries[OPA_MAX_SLS];
+} __packed;
+
+struct opa_congestion_setting_attr_shadow {
+	u32 control_map;
+	u16 port_control;
+	struct opa_congestion_setting_entry_shadow entries[OPA_MAX_SLS];
+} __packed;
+
+#define IB_CC_TABLE_ENTRY_INCREASE_DEFAULT 1
+#define IB_CC_TABLE_ENTRY_TIMER_DEFAULT 1
+
+/* 64 Congestion Control table entries in a single MAD */
+#define IB_CCT_ENTRIES 64
+#define IB_CCT_MIN_ENTRIES (IB_CCT_ENTRIES * 2)
+
+struct ib_cc_table_entry {
+	__be16 entry; /* shift:2, multiplier:14 */
+};
+
+struct ib_cc_table_entry_shadow {
+	u16 entry; /* shift:2, multiplier:14 */
+};
+
+struct ib_cc_table_attr {
+	__be16 ccti_limit; /* max CCTI for cc table */
+	struct ib_cc_table_entry ccti_entries[IB_CCT_ENTRIES];
+} __packed;
+
+struct ib_cc_table_attr_shadow {
+	u16 ccti_limit; /* max CCTI for cc table */
+	struct ib_cc_table_entry_shadow ccti_entries[IB_CCT_ENTRIES];
+} __packed;
+
+#define CC_TABLE_SHADOW_MAX \
+	(IB_CC_TABLE_CAP_DEFAULT * IB_CCT_ENTRIES)
+
+struct cc_table_shadow {
+	u16 ccti_limit; /* max CCTI for cc table */
+	struct ib_cc_table_entry_shadow entries[CC_TABLE_SHADOW_MAX];
+} __packed;
+
+/*
+ * struct cc_state combines the (active) per-port congestion control
+ * table, and the (active) per-SL congestion settings. cc_state data
+ * may need to be read in code paths that we want to be fast, so it
+ * is an RCU protected structure.
+ */
+struct cc_state {
+	struct rcu_head rcu;
+	struct cc_table_shadow cct;
+	struct opa_congestion_setting_attr_shadow cong_setting;
+};
+
+/*
+ * OPA BufferControl MAD
+ */
+
+/* attribute modifier macros */
+#define OPA_AM_NPORT_SHIFT	24
+#define OPA_AM_NPORT_MASK	0xff
+#define OPA_AM_NPORT_SMASK	(OPA_AM_NPORT_MASK << OPA_AM_NPORT_SHIFT)
+#define OPA_AM_NPORT(am)	(((am) >> OPA_AM_NPORT_SHIFT) & \
+					OPA_AM_NPORT_MASK)
+
+#define OPA_AM_NBLK_SHIFT	24
+#define OPA_AM_NBLK_MASK	0xff
+#define OPA_AM_NBLK_SMASK	(OPA_AM_NBLK_MASK << OPA_AM_NBLK_SHIFT)
+#define OPA_AM_NBLK(am)		(((am) >> OPA_AM_NBLK_SHIFT) & \
+					OPA_AM_NBLK_MASK)
+
+#define OPA_AM_START_BLK_SHIFT	0
+#define OPA_AM_START_BLK_MASK	0xff
+#define OPA_AM_START_BLK_SMASK	(OPA_AM_START_BLK_MASK << \
+					OPA_AM_START_BLK_SHIFT)
+#define OPA_AM_START_BLK(am)	(((am) >> OPA_AM_START_BLK_SHIFT) & \
+					OPA_AM_START_BLK_MASK)
+
+#define OPA_AM_PORTNUM_SHIFT	0
+#define OPA_AM_PORTNUM_MASK	0xff
+#define OPA_AM_PORTNUM_SMASK	(OPA_AM_PORTNUM_MASK << OPA_AM_PORTNUM_SHIFT)
+#define OPA_AM_PORTNUM(am)	(((am) >> OPA_AM_PORTNUM_SHIFT) & \
+					OPA_AM_PORTNUM_MASK)
+
+#define OPA_AM_ASYNC_SHIFT	12
+#define OPA_AM_ASYNC_MASK	0x1
+#define OPA_AM_ASYNC_SMASK	(OPA_AM_ASYNC_MASK << OPA_AM_ASYNC_SHIFT)
+#define OPA_AM_ASYNC(am)	(((am) >> OPA_AM_ASYNC_SHIFT) & \
+					OPA_AM_ASYNC_MASK)
+
+#define OPA_AM_START_SM_CFG_SHIFT	9
+#define OPA_AM_START_SM_CFG_MASK	0x1
+#define OPA_AM_START_SM_CFG_SMASK	(OPA_AM_START_SM_CFG_MASK << \
+						OPA_AM_START_SM_CFG_SHIFT)
+#define OPA_AM_START_SM_CFG(am)		(((am) >> OPA_AM_START_SM_CFG_SHIFT) \
+						& OPA_AM_START_SM_CFG_MASK)
+
+#define OPA_AM_CI_ADDR_SHIFT	19
+#define OPA_AM_CI_ADDR_MASK	0xfff
+#define OPA_AM_CI_ADDR_SMASK	(OPA_AM_CI_ADDR_MASK << OPA_CI_ADDR_SHIFT)
+#define OPA_AM_CI_ADDR(am)	(((am) >> OPA_AM_CI_ADDR_SHIFT) & \
+					OPA_AM_CI_ADDR_MASK)
+
+#define OPA_AM_CI_LEN_SHIFT	13
+#define OPA_AM_CI_LEN_MASK	0x3f
+#define OPA_AM_CI_LEN_SMASK	(OPA_AM_CI_LEN_MASK << OPA_CI_LEN_SHIFT)
+#define OPA_AM_CI_LEN(am)	(((am) >> OPA_AM_CI_LEN_SHIFT) & \
+					OPA_AM_CI_LEN_MASK)
+
+/* error info macros */
+#define OPA_EI_STATUS_SMASK	0x80
+#define OPA_EI_CODE_SMASK	0x0f
+
+struct vl_limit {
+	__be16 dedicated;
+	__be16 shared;
+};
+
+struct buffer_control {
+	__be16 reserved;
+	__be16 overall_shared_limit;
+	struct vl_limit vl[OPA_MAX_VLS];
+};
+
+struct sc2vlnt {
+	u8 vlnt[32]; /* 5 bit VL, 3 bits reserved */
+};
+
+/*
+ * The PortSamplesControl.CounterMasks field is an array of 3 bit fields
+ * which specify the N'th counter's capabilities. See ch. 16.1.3.2.
+ * We support 5 counters which only count the mandatory quantities.
+ */
+#define COUNTER_MASK(q, n) (q << ((9 - n) * 3))
+#define COUNTER_MASK0_9 \
+	cpu_to_be32(COUNTER_MASK(1, 0) | \
+		    COUNTER_MASK(1, 1) | \
+		    COUNTER_MASK(1, 2) | \
+		    COUNTER_MASK(1, 3) | \
+		    COUNTER_MASK(1, 4))
+
+#endif				/* _HFI1_MAD_H */
diff --git a/drivers/staging/rdma/hfi1/mmap.c b/drivers/staging/rdma/hfi1/mmap.c
new file mode 100644
index 000000000000..5173b1c60b3d
--- /dev/null
+++ b/drivers/staging/rdma/hfi1/mmap.c
@@ -0,0 +1,192 @@
+/*
+ *
+ * This file is provided under a dual BSD/GPLv2 license.  When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * GPL LICENSE SUMMARY
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * BSD LICENSE
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ *  - Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ *  - Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in
+ *    the documentation and/or other materials provided with the
+ *    distribution.
+ *  - Neither the name of Intel Corporation nor the names of its
+ *    contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ */
+
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/vmalloc.h>
+#include <linux/mm.h>
+#include <linux/errno.h>
+#include <asm/pgtable.h>
+
+#include "verbs.h"
+
+/**
+ * hfi1_release_mmap_info - free mmap info structure
+ * @ref: a pointer to the kref within struct hfi1_mmap_info
+ */
+void hfi1_release_mmap_info(struct kref *ref)
+{
+	struct hfi1_mmap_info *ip =
+		container_of(ref, struct hfi1_mmap_info, ref);
+	struct hfi1_ibdev *dev = to_idev(ip->context->device);
+
+	spin_lock_irq(&dev->pending_lock);
+	list_del(&ip->pending_mmaps);
+	spin_unlock_irq(&dev->pending_lock);
+
+	vfree(ip->obj);
+	kfree(ip);
+}
+
+/*
+ * open and close keep track of how many times the CQ is mapped,
+ * to avoid releasing it.
+ */
+static void hfi1_vma_open(struct vm_area_struct *vma)
+{
+	struct hfi1_mmap_info *ip = vma->vm_private_data;
+
+	kref_get(&ip->ref);
+}
+
+static void hfi1_vma_close(struct vm_area_struct *vma)
+{
+	struct hfi1_mmap_info *ip = vma->vm_private_data;
+
+	kref_put(&ip->ref, hfi1_release_mmap_info);
+}
+
+static struct vm_operations_struct hfi1_vm_ops = {
+	.open =     hfi1_vma_open,
+	.close =    hfi1_vma_close,
+};
+
+/**
+ * hfi1_mmap - create a new mmap region
+ * @context: the IB user context of the process making the mmap() call
+ * @vma: the VMA to be initialized
+ * Return zero if the mmap is OK. Otherwise, return an errno.
+ */
+int hfi1_mmap(struct ib_ucontext *context, struct vm_area_struct *vma)
+{
+	struct hfi1_ibdev *dev = to_idev(context->device);
+	unsigned long offset = vma->vm_pgoff << PAGE_SHIFT;
+	unsigned long size = vma->vm_end - vma->vm_start;
+	struct hfi1_mmap_info *ip, *pp;
+	int ret = -EINVAL;
+
+	/*
+	 * Search the device's list of objects waiting for a mmap call.
+	 * Normally, this list is very short since a call to create a
+	 * CQ, QP, or SRQ is soon followed by a call to mmap().
+	 */
+	spin_lock_irq(&dev->pending_lock);
+	list_for_each_entry_safe(ip, pp, &dev->pending_mmaps,
+				 pending_mmaps) {
+		/* Only the creator is allowed to mmap the object */
+		if (context != ip->context || (__u64) offset != ip->offset)
+			continue;
+		/* Don't allow a mmap larger than the object. */
+		if (size > ip->size)
+			break;
+
+		list_del_init(&ip->pending_mmaps);
+		spin_unlock_irq(&dev->pending_lock);
+
+		ret = remap_vmalloc_range(vma, ip->obj, 0);
+		if (ret)
+			goto done;
+		vma->vm_ops = &hfi1_vm_ops;
+		vma->vm_private_data = ip;
+		hfi1_vma_open(vma);
+		goto done;
+	}
+	spin_unlock_irq(&dev->pending_lock);
+done:
+	return ret;
+}
+
+/*
+ * Allocate information for hfi1_mmap
+ */
+struct hfi1_mmap_info *hfi1_create_mmap_info(struct hfi1_ibdev *dev,
+					     u32 size,
+					     struct ib_ucontext *context,
+					     void *obj) {
+	struct hfi1_mmap_info *ip;
+
+	ip = kmalloc(sizeof(*ip), GFP_KERNEL);
+	if (!ip)
+		goto bail;
+
+	size = PAGE_ALIGN(size);
+
+	spin_lock_irq(&dev->mmap_offset_lock);
+	if (dev->mmap_offset == 0)
+		dev->mmap_offset = PAGE_SIZE;
+	ip->offset = dev->mmap_offset;
+	dev->mmap_offset += size;
+	spin_unlock_irq(&dev->mmap_offset_lock);
+
+	INIT_LIST_HEAD(&ip->pending_mmaps);
+	ip->size = size;
+	ip->context = context;
+	ip->obj = obj;
+	kref_init(&ip->ref);
+
+bail:
+	return ip;
+}
+
+void hfi1_update_mmap_info(struct hfi1_ibdev *dev, struct hfi1_mmap_info *ip,
+			   u32 size, void *obj)
+{
+	size = PAGE_ALIGN(size);
+
+	spin_lock_irq(&dev->mmap_offset_lock);
+	if (dev->mmap_offset == 0)
+		dev->mmap_offset = PAGE_SIZE;
+	ip->offset = dev->mmap_offset;
+	dev->mmap_offset += size;
+	spin_unlock_irq(&dev->mmap_offset_lock);
+
+	ip->size = size;
+	ip->obj = obj;
+}
diff --git a/drivers/staging/rdma/hfi1/mr.c b/drivers/staging/rdma/hfi1/mr.c
new file mode 100644
index 000000000000..23567f83c872
--- /dev/null
+++ b/drivers/staging/rdma/hfi1/mr.c
@@ -0,0 +1,546 @@
+/*
+ *
+ * This file is provided under a dual BSD/GPLv2 license.  When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * GPL LICENSE SUMMARY
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * BSD LICENSE
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ *  - Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ *  - Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in
+ *    the documentation and/or other materials provided with the
+ *    distribution.
+ *  - Neither the name of Intel Corporation nor the names of its
+ *    contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ */
+
+#include <rdma/ib_umem.h>
+#include <rdma/ib_smi.h>
+
+#include "hfi.h"
+
+/* Fast memory region */
+struct hfi1_fmr {
+	struct ib_fmr ibfmr;
+	struct hfi1_mregion mr;        /* must be last */
+};
+
+static inline struct hfi1_fmr *to_ifmr(struct ib_fmr *ibfmr)
+{
+	return container_of(ibfmr, struct hfi1_fmr, ibfmr);
+}
+
+static int init_mregion(struct hfi1_mregion *mr, struct ib_pd *pd,
+			int count)
+{
+	int m, i = 0;
+	int rval = 0;
+
+	m = (count + HFI1_SEGSZ - 1) / HFI1_SEGSZ;
+	for (; i < m; i++) {
+		mr->map[i] = kzalloc(sizeof(*mr->map[0]), GFP_KERNEL);
+		if (!mr->map[i])
+			goto bail;
+	}
+	mr->mapsz = m;
+	init_completion(&mr->comp);
+	/* count returning the ptr to user */
+	atomic_set(&mr->refcount, 1);
+	mr->pd = pd;
+	mr->max_segs = count;
+out:
+	return rval;
+bail:
+	while (i)
+		kfree(mr->map[--i]);
+	rval = -ENOMEM;
+	goto out;
+}
+
+static void deinit_mregion(struct hfi1_mregion *mr)
+{
+	int i = mr->mapsz;
+
+	mr->mapsz = 0;
+	while (i)
+		kfree(mr->map[--i]);
+}
+
+
+/**
+ * hfi1_get_dma_mr - get a DMA memory region
+ * @pd: protection domain for this memory region
+ * @acc: access flags
+ *
+ * Returns the memory region on success, otherwise returns an errno.
+ * Note that all DMA addresses should be created via the
+ * struct ib_dma_mapping_ops functions (see dma.c).
+ */
+struct ib_mr *hfi1_get_dma_mr(struct ib_pd *pd, int acc)
+{
+	struct hfi1_mr *mr = NULL;
+	struct ib_mr *ret;
+	int rval;
+
+	if (to_ipd(pd)->user) {
+		ret = ERR_PTR(-EPERM);
+		goto bail;
+	}
+
+	mr = kzalloc(sizeof(*mr), GFP_KERNEL);
+	if (!mr) {
+		ret = ERR_PTR(-ENOMEM);
+		goto bail;
+	}
+
+	rval = init_mregion(&mr->mr, pd, 0);
+	if (rval) {
+		ret = ERR_PTR(rval);
+		goto bail;
+	}
+
+
+	rval = hfi1_alloc_lkey(&mr->mr, 1);
+	if (rval) {
+		ret = ERR_PTR(rval);
+		goto bail_mregion;
+	}
+
+	mr->mr.access_flags = acc;
+	ret = &mr->ibmr;
+done:
+	return ret;
+
+bail_mregion:
+	deinit_mregion(&mr->mr);
+bail:
+	kfree(mr);
+	goto done;
+}
+
+static struct hfi1_mr *alloc_mr(int count, struct ib_pd *pd)
+{
+	struct hfi1_mr *mr;
+	int rval = -ENOMEM;
+	int m;
+
+	/* Allocate struct plus pointers to first level page tables. */
+	m = (count + HFI1_SEGSZ - 1) / HFI1_SEGSZ;
+	mr = kzalloc(sizeof(*mr) + m * sizeof(mr->mr.map[0]), GFP_KERNEL);
+	if (!mr)
+		goto bail;
+
+	rval = init_mregion(&mr->mr, pd, count);
+	if (rval)
+		goto bail;
+	/*
+	 * ib_reg_phys_mr() will initialize mr->ibmr except for
+	 * lkey and rkey.
+	 */
+	rval = hfi1_alloc_lkey(&mr->mr, 0);
+	if (rval)
+		goto bail_mregion;
+	mr->ibmr.lkey = mr->mr.lkey;
+	mr->ibmr.rkey = mr->mr.lkey;
+done:
+	return mr;
+
+bail_mregion:
+	deinit_mregion(&mr->mr);
+bail:
+	kfree(mr);
+	mr = ERR_PTR(rval);
+	goto done;
+}
+
+/**
+ * hfi1_reg_phys_mr - register a physical memory region
+ * @pd: protection domain for this memory region
+ * @buffer_list: pointer to the list of physical buffers to register
+ * @num_phys_buf: the number of physical buffers to register
+ * @iova_start: the starting address passed over IB which maps to this MR
+ *
+ * Returns the memory region on success, otherwise returns an errno.
+ */
+struct ib_mr *hfi1_reg_phys_mr(struct ib_pd *pd,
+			       struct ib_phys_buf *buffer_list,
+			       int num_phys_buf, int acc, u64 *iova_start)
+{
+	struct hfi1_mr *mr;
+	int n, m, i;
+	struct ib_mr *ret;
+
+	mr = alloc_mr(num_phys_buf, pd);
+	if (IS_ERR(mr)) {
+		ret = (struct ib_mr *)mr;
+		goto bail;
+	}
+
+	mr->mr.user_base = *iova_start;
+	mr->mr.iova = *iova_start;
+	mr->mr.access_flags = acc;
+
+	m = 0;
+	n = 0;
+	for (i = 0; i < num_phys_buf; i++) {
+		mr->mr.map[m]->segs[n].vaddr = (void *) buffer_list[i].addr;
+		mr->mr.map[m]->segs[n].length = buffer_list[i].size;
+		mr->mr.length += buffer_list[i].size;
+		n++;
+		if (n == HFI1_SEGSZ) {
+			m++;
+			n = 0;
+		}
+	}
+
+	ret = &mr->ibmr;
+
+bail:
+	return ret;
+}
+
+/**
+ * hfi1_reg_user_mr - register a userspace memory region
+ * @pd: protection domain for this memory region
+ * @start: starting userspace address
+ * @length: length of region to register
+ * @mr_access_flags: access flags for this memory region
+ * @udata: unused by the driver
+ *
+ * Returns the memory region on success, otherwise returns an errno.
+ */
+struct ib_mr *hfi1_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
+			       u64 virt_addr, int mr_access_flags,
+			       struct ib_udata *udata)
+{
+	struct hfi1_mr *mr;
+	struct ib_umem *umem;
+	struct scatterlist *sg;
+	int n, m, entry;
+	struct ib_mr *ret;
+
+	if (length == 0) {
+		ret = ERR_PTR(-EINVAL);
+		goto bail;
+	}
+
+	umem = ib_umem_get(pd->uobject->context, start, length,
+			   mr_access_flags, 0);
+	if (IS_ERR(umem))
+		return (void *) umem;
+
+	n = umem->nmap;
+
+	mr = alloc_mr(n, pd);
+	if (IS_ERR(mr)) {
+		ret = (struct ib_mr *)mr;
+		ib_umem_release(umem);
+		goto bail;
+	}
+
+	mr->mr.user_base = start;
+	mr->mr.iova = virt_addr;
+	mr->mr.length = length;
+	mr->mr.offset = ib_umem_offset(umem);
+	mr->mr.access_flags = mr_access_flags;
+	mr->umem = umem;
+
+	if (is_power_of_2(umem->page_size))
+		mr->mr.page_shift = ilog2(umem->page_size);
+	m = 0;
+	n = 0;
+	for_each_sg(umem->sg_head.sgl, sg, umem->nmap, entry) {
+			void *vaddr;
+
+			vaddr = page_address(sg_page(sg));
+			if (!vaddr) {
+				ret = ERR_PTR(-EINVAL);
+				goto bail;
+			}
+			mr->mr.map[m]->segs[n].vaddr = vaddr;
+			mr->mr.map[m]->segs[n].length = umem->page_size;
+			n++;
+			if (n == HFI1_SEGSZ) {
+				m++;
+				n = 0;
+			}
+	}
+	ret = &mr->ibmr;
+
+bail:
+	return ret;
+}
+
+/**
+ * hfi1_dereg_mr - unregister and free a memory region
+ * @ibmr: the memory region to free
+ *
+ * Returns 0 on success.
+ *
+ * Note that this is called to free MRs created by hfi1_get_dma_mr()
+ * or hfi1_reg_user_mr().
+ */
+int hfi1_dereg_mr(struct ib_mr *ibmr)
+{
+	struct hfi1_mr *mr = to_imr(ibmr);
+	int ret = 0;
+	unsigned long timeout;
+
+	hfi1_free_lkey(&mr->mr);
+
+	hfi1_put_mr(&mr->mr); /* will set completion if last */
+	timeout = wait_for_completion_timeout(&mr->mr.comp,
+		5 * HZ);
+	if (!timeout) {
+		dd_dev_err(
+			dd_from_ibdev(mr->mr.pd->device),
+			"hfi1_dereg_mr timeout mr %p pd %p refcount %u\n",
+			mr, mr->mr.pd, atomic_read(&mr->mr.refcount));
+		hfi1_get_mr(&mr->mr);
+		ret = -EBUSY;
+		goto out;
+	}
+	deinit_mregion(&mr->mr);
+	if (mr->umem)
+		ib_umem_release(mr->umem);
+	kfree(mr);
+out:
+	return ret;
+}
+
+/*
+ * Allocate a memory region usable with the
+ * IB_WR_FAST_REG_MR send work request.
+ *
+ * Return the memory region on success, otherwise return an errno.
+ */
+struct ib_mr *hfi1_alloc_fast_reg_mr(struct ib_pd *pd, int max_page_list_len)
+{
+	struct hfi1_mr *mr;
+
+	mr = alloc_mr(max_page_list_len, pd);
+	if (IS_ERR(mr))
+		return (struct ib_mr *)mr;
+
+	return &mr->ibmr;
+}
+
+struct ib_fast_reg_page_list *
+hfi1_alloc_fast_reg_page_list(struct ib_device *ibdev, int page_list_len)
+{
+	unsigned size = page_list_len * sizeof(u64);
+	struct ib_fast_reg_page_list *pl;
+
+	if (size > PAGE_SIZE)
+		return ERR_PTR(-EINVAL);
+
+	pl = kzalloc(sizeof(*pl), GFP_KERNEL);
+	if (!pl)
+		return ERR_PTR(-ENOMEM);
+
+	pl->page_list = kzalloc(size, GFP_KERNEL);
+	if (!pl->page_list)
+		goto err_free;
+
+	return pl;
+
+err_free:
+	kfree(pl);
+	return ERR_PTR(-ENOMEM);
+}
+
+void hfi1_free_fast_reg_page_list(struct ib_fast_reg_page_list *pl)
+{
+	kfree(pl->page_list);
+	kfree(pl);
+}
+
+/**
+ * hfi1_alloc_fmr - allocate a fast memory region
+ * @pd: the protection domain for this memory region
+ * @mr_access_flags: access flags for this memory region
+ * @fmr_attr: fast memory region attributes
+ *
+ * Returns the memory region on success, otherwise returns an errno.
+ */
+struct ib_fmr *hfi1_alloc_fmr(struct ib_pd *pd, int mr_access_flags,
+			      struct ib_fmr_attr *fmr_attr)
+{
+	struct hfi1_fmr *fmr;
+	int m;
+	struct ib_fmr *ret;
+	int rval = -ENOMEM;
+
+	/* Allocate struct plus pointers to first level page tables. */
+	m = (fmr_attr->max_pages + HFI1_SEGSZ - 1) / HFI1_SEGSZ;
+	fmr = kzalloc(sizeof(*fmr) + m * sizeof(fmr->mr.map[0]), GFP_KERNEL);
+	if (!fmr)
+		goto bail;
+
+	rval = init_mregion(&fmr->mr, pd, fmr_attr->max_pages);
+	if (rval)
+		goto bail;
+
+	/*
+	 * ib_alloc_fmr() will initialize fmr->ibfmr except for lkey &
+	 * rkey.
+	 */
+	rval = hfi1_alloc_lkey(&fmr->mr, 0);
+	if (rval)
+		goto bail_mregion;
+	fmr->ibfmr.rkey = fmr->mr.lkey;
+	fmr->ibfmr.lkey = fmr->mr.lkey;
+	/*
+	 * Resources are allocated but no valid mapping (RKEY can't be
+	 * used).
+	 */
+	fmr->mr.access_flags = mr_access_flags;
+	fmr->mr.max_segs = fmr_attr->max_pages;
+	fmr->mr.page_shift = fmr_attr->page_shift;
+
+	ret = &fmr->ibfmr;
+done:
+	return ret;
+
+bail_mregion:
+	deinit_mregion(&fmr->mr);
+bail:
+	kfree(fmr);
+	ret = ERR_PTR(rval);
+	goto done;
+}
+
+/**
+ * hfi1_map_phys_fmr - set up a fast memory region
+ * @ibmfr: the fast memory region to set up
+ * @page_list: the list of pages to associate with the fast memory region
+ * @list_len: the number of pages to associate with the fast memory region
+ * @iova: the virtual address of the start of the fast memory region
+ *
+ * This may be called from interrupt context.
+ */
+
+int hfi1_map_phys_fmr(struct ib_fmr *ibfmr, u64 *page_list,
+		      int list_len, u64 iova)
+{
+	struct hfi1_fmr *fmr = to_ifmr(ibfmr);
+	struct hfi1_lkey_table *rkt;
+	unsigned long flags;
+	int m, n, i;
+	u32 ps;
+	int ret;
+
+	i = atomic_read(&fmr->mr.refcount);
+	if (i > 2)
+		return -EBUSY;
+
+	if (list_len > fmr->mr.max_segs) {
+		ret = -EINVAL;
+		goto bail;
+	}
+	rkt = &to_idev(ibfmr->device)->lk_table;
+	spin_lock_irqsave(&rkt->lock, flags);
+	fmr->mr.user_base = iova;
+	fmr->mr.iova = iova;
+	ps = 1 << fmr->mr.page_shift;
+	fmr->mr.length = list_len * ps;
+	m = 0;
+	n = 0;
+	for (i = 0; i < list_len; i++) {
+		fmr->mr.map[m]->segs[n].vaddr = (void *) page_list[i];
+		fmr->mr.map[m]->segs[n].length = ps;
+		if (++n == HFI1_SEGSZ) {
+			m++;
+			n = 0;
+		}
+	}
+	spin_unlock_irqrestore(&rkt->lock, flags);
+	ret = 0;
+
+bail:
+	return ret;
+}
+
+/**
+ * hfi1_unmap_fmr - unmap fast memory regions
+ * @fmr_list: the list of fast memory regions to unmap
+ *
+ * Returns 0 on success.
+ */
+int hfi1_unmap_fmr(struct list_head *fmr_list)
+{
+	struct hfi1_fmr *fmr;
+	struct hfi1_lkey_table *rkt;
+	unsigned long flags;
+
+	list_for_each_entry(fmr, fmr_list, ibfmr.list) {
+		rkt = &to_idev(fmr->ibfmr.device)->lk_table;
+		spin_lock_irqsave(&rkt->lock, flags);
+		fmr->mr.user_base = 0;
+		fmr->mr.iova = 0;
+		fmr->mr.length = 0;
+		spin_unlock_irqrestore(&rkt->lock, flags);
+	}
+	return 0;
+}
+
+/**
+ * hfi1_dealloc_fmr - deallocate a fast memory region
+ * @ibfmr: the fast memory region to deallocate
+ *
+ * Returns 0 on success.
+ */
+int hfi1_dealloc_fmr(struct ib_fmr *ibfmr)
+{
+	struct hfi1_fmr *fmr = to_ifmr(ibfmr);
+	int ret = 0;
+	unsigned long timeout;
+
+	hfi1_free_lkey(&fmr->mr);
+	hfi1_put_mr(&fmr->mr); /* will set completion if last */
+	timeout = wait_for_completion_timeout(&fmr->mr.comp,
+		5 * HZ);
+	if (!timeout) {
+		hfi1_get_mr(&fmr->mr);
+		ret = -EBUSY;
+		goto out;
+	}
+	deinit_mregion(&fmr->mr);
+	kfree(fmr);
+out:
+	return ret;
+}
diff --git a/drivers/staging/rdma/hfi1/opa_compat.h b/drivers/staging/rdma/hfi1/opa_compat.h
new file mode 100644
index 000000000000..f64eec1c2951
--- /dev/null
+++ b/drivers/staging/rdma/hfi1/opa_compat.h
@@ -0,0 +1,129 @@
+#ifndef _LINUX_H
+#define _LINUX_H
+/*
+ *
+ * This file is provided under a dual BSD/GPLv2 license.  When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * GPL LICENSE SUMMARY
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * BSD LICENSE
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ *  - Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ *  - Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in
+ *    the documentation and/or other materials provided with the
+ *    distribution.
+ *  - Neither the name of Intel Corporation nor the names of its
+ *    contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ */
+
+/*
+ * This header file is for OPA-specific definitions which are
+ * required by the HFI driver, and which aren't yet in the Linux
+ * IB core. We'll collect these all here, then merge them into
+ * the kernel when that's convenient.
+ */
+
+/* OPA SMA attribute IDs */
+#define OPA_ATTRIB_ID_CONGESTION_INFO		cpu_to_be16(0x008b)
+#define OPA_ATTRIB_ID_HFI_CONGESTION_LOG	cpu_to_be16(0x008f)
+#define OPA_ATTRIB_ID_HFI_CONGESTION_SETTING	cpu_to_be16(0x0090)
+#define OPA_ATTRIB_ID_CONGESTION_CONTROL_TABLE	cpu_to_be16(0x0091)
+
+/* OPA PMA attribute IDs */
+#define OPA_PM_ATTRIB_ID_PORT_STATUS		cpu_to_be16(0x0040)
+#define OPA_PM_ATTRIB_ID_CLEAR_PORT_STATUS	cpu_to_be16(0x0041)
+#define OPA_PM_ATTRIB_ID_DATA_PORT_COUNTERS	cpu_to_be16(0x0042)
+#define OPA_PM_ATTRIB_ID_ERROR_PORT_COUNTERS	cpu_to_be16(0x0043)
+#define OPA_PM_ATTRIB_ID_ERROR_INFO		cpu_to_be16(0x0044)
+
+/* OPA status codes */
+#define OPA_PM_STATUS_REQUEST_TOO_LARGE		cpu_to_be16(0x100)
+
+static inline u8 port_states_to_logical_state(struct opa_port_states *ps)
+{
+	return ps->portphysstate_portstate & OPA_PI_MASK_PORT_STATE;
+}
+
+static inline u8 port_states_to_phys_state(struct opa_port_states *ps)
+{
+	return ((ps->portphysstate_portstate &
+		  OPA_PI_MASK_PORT_PHYSICAL_STATE) >> 4) & 0xf;
+}
+
+/*
+ * OPA port physical states
+ * IB Volume 1, Table 146 PortInfo/IB Volume 2 Section 5.4.2(1) PortPhysState
+ * values.
+ *
+ * When writing, only values 0-3 are valid, other values are ignored.
+ * When reading, 0 is reserved.
+ *
+ * Returned by the ibphys_portstate() routine.
+ */
+enum opa_port_phys_state {
+	IB_PORTPHYSSTATE_NOP = 0,
+	/* 1 is reserved */
+	IB_PORTPHYSSTATE_POLLING = 2,
+	IB_PORTPHYSSTATE_DISABLED = 3,
+	IB_PORTPHYSSTATE_TRAINING = 4,
+	IB_PORTPHYSSTATE_LINKUP = 5,
+	IB_PORTPHYSSTATE_LINK_ERROR_RECOVERY = 6,
+	IB_PORTPHYSSTATE_PHY_TEST = 7,
+	/* 8 is reserved */
+	OPA_PORTPHYSSTATE_OFFLINE = 9,
+	OPA_PORTPHYSSTATE_GANGED = 10,
+	OPA_PORTPHYSSTATE_TEST = 11,
+	OPA_PORTPHYSSTATE_MAX = 11,
+	/* values 12-15 are reserved/ignored */
+};
+
+/* OPA_PORT_TYPE_* definitions - these belong in opa_port_info.h */
+#define OPA_PORT_TYPE_UNKNOWN          0
+#define OPA_PORT_TYPE_DISCONNECTED     1
+/* port is not currently usable, CableInfo not available */
+#define OPA_PORT_TYPE_FIXED            2
+/* A fixed backplane port in a director class switch. All OPA ASICS */
+#define OPA_PORT_TYPE_VARIABLE         3
+/* A backplane port in a blade system, possibly mixed configuration */
+#define OPA_PORT_TYPE_STANDARD         4
+/* implies a SFF-8636 defined format for CableInfo (QSFP) */
+#define OPA_PORT_TYPE_SI_PHOTONICS      5
+/* A silicon photonics module implies TBD defined format for CableInfo
+ * as defined by Intel SFO group */
+/* 6 - 15 are reserved */
+
+#endif /* _LINUX_H */
diff --git a/drivers/staging/rdma/hfi1/pcie.c b/drivers/staging/rdma/hfi1/pcie.c
new file mode 100644
index 000000000000..ac5653c0f65e
--- /dev/null
+++ b/drivers/staging/rdma/hfi1/pcie.c
@@ -0,0 +1,1253 @@
+/*
+ *
+ * This file is provided under a dual BSD/GPLv2 license.  When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * GPL LICENSE SUMMARY
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * BSD LICENSE
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ *  - Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ *  - Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in
+ *    the documentation and/or other materials provided with the
+ *    distribution.
+ *  - Neither the name of Intel Corporation nor the names of its
+ *    contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ */
+
+#include <linux/pci.h>
+#include <linux/io.h>
+#include <linux/delay.h>
+#include <linux/vmalloc.h>
+#include <linux/aer.h>
+#include <linux/module.h>
+
+#include "hfi.h"
+#include "chip_registers.h"
+
+/* link speed vector for Gen3 speed - not in Linux headers */
+#define GEN1_SPEED_VECTOR 0x1
+#define GEN2_SPEED_VECTOR 0x2
+#define GEN3_SPEED_VECTOR 0x3
+
+/*
+ * This file contains PCIe utility routines.
+ */
+
+/*
+ * Code to adjust PCIe capabilities.
+ */
+static void tune_pcie_caps(struct hfi1_devdata *);
+
+/*
+ * Do all the common PCIe setup and initialization.
+ * devdata is not yet allocated, and is not allocated until after this
+ * routine returns success.  Therefore dd_dev_err() can't be used for error
+ * printing.
+ */
+int hfi1_pcie_init(struct pci_dev *pdev, const struct pci_device_id *ent)
+{
+	int ret;
+
+	ret = pci_enable_device(pdev);
+	if (ret) {
+		/*
+		 * This can happen (in theory) iff:
+		 * We did a chip reset, and then failed to reprogram the
+		 * BAR, or the chip reset due to an internal error.  We then
+		 * unloaded the driver and reloaded it.
+		 *
+		 * Both reset cases set the BAR back to initial state.  For
+		 * the latter case, the AER sticky error bit at offset 0x718
+		 * should be set, but the Linux kernel doesn't yet know
+		 * about that, it appears.  If the original BAR was retained
+		 * in the kernel data structures, this may be OK.
+		 */
+		hfi1_early_err(&pdev->dev, "pci enable failed: error %d\n",
+			       -ret);
+		goto done;
+	}
+
+	ret = pci_request_regions(pdev, DRIVER_NAME);
+	if (ret) {
+		hfi1_early_err(&pdev->dev,
+			       "pci_request_regions fails: err %d\n", -ret);
+		goto bail;
+	}
+
+	ret = pci_set_dma_mask(pdev, DMA_BIT_MASK(64));
+	if (ret) {
+		/*
+		 * If the 64 bit setup fails, try 32 bit.  Some systems
+		 * do not setup 64 bit maps on systems with 2GB or less
+		 * memory installed.
+		 */
+		ret = pci_set_dma_mask(pdev, DMA_BIT_MASK(32));
+		if (ret) {
+			hfi1_early_err(&pdev->dev,
+				       "Unable to set DMA mask: %d\n", ret);
+			goto bail;
+		}
+		ret = pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(32));
+	} else
+		ret = pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(64));
+	if (ret) {
+		hfi1_early_err(&pdev->dev,
+			       "Unable to set DMA consistent mask: %d\n", ret);
+		goto bail;
+	}
+
+	pci_set_master(pdev);
+	ret = pci_enable_pcie_error_reporting(pdev);
+	if (ret) {
+		hfi1_early_err(&pdev->dev,
+			       "Unable to enable pcie error reporting: %d\n",
+			      ret);
+		ret = 0;
+	}
+	goto done;
+
+bail:
+	hfi1_pcie_cleanup(pdev);
+done:
+	return ret;
+}
+
+/*
+ * Clean what was done in hfi1_pcie_init()
+ */
+void hfi1_pcie_cleanup(struct pci_dev *pdev)
+{
+	pci_disable_device(pdev);
+	/*
+	 * Release regions should be called after the disable. OK to
+	 * call if request regions has not been called or failed.
+	 */
+	pci_release_regions(pdev);
+}
+
+/*
+ * Do remaining PCIe setup, once dd is allocated, and save away
+ * fields required to re-initialize after a chip reset, or for
+ * various other purposes
+ */
+int hfi1_pcie_ddinit(struct hfi1_devdata *dd, struct pci_dev *pdev,
+		     const struct pci_device_id *ent)
+{
+	unsigned long len;
+	resource_size_t addr;
+
+	dd->pcidev = pdev;
+	pci_set_drvdata(pdev, dd);
+
+	addr = pci_resource_start(pdev, 0);
+	len = pci_resource_len(pdev, 0);
+
+	/*
+	 * The TXE PIO buffers are at the tail end of the chip space.
+	 * Cut them off and map them separately.
+	 */
+
+	/* sanity check vs expectations */
+	if (len != TXE_PIO_SEND + TXE_PIO_SIZE) {
+		dd_dev_err(dd, "chip PIO range does not match\n");
+		return -EINVAL;
+	}
+
+	dd->kregbase = ioremap_nocache(addr, TXE_PIO_SEND);
+	if (!dd->kregbase)
+		return -ENOMEM;
+
+	dd->piobase = ioremap_wc(addr + TXE_PIO_SEND, TXE_PIO_SIZE);
+	if (!dd->piobase) {
+		iounmap(dd->kregbase);
+		return -ENOMEM;
+	}
+
+	dd->flags |= HFI1_PRESENT;	/* now register routines work */
+
+	dd->kregend = dd->kregbase + TXE_PIO_SEND;
+	dd->physaddr = addr;        /* used for io_remap, etc. */
+
+	/*
+	 * Re-map the chip's RcvArray as write-combining to allow us
+	 * to write an entire cacheline worth of entries in one shot.
+	 * If this re-map fails, just continue - the RcvArray programming
+	 * function will handle both cases.
+	 */
+	dd->chip_rcv_array_count = read_csr(dd, RCV_ARRAY_CNT);
+	dd->rcvarray_wc = ioremap_wc(addr + RCV_ARRAY,
+				     dd->chip_rcv_array_count * 8);
+	dd_dev_info(dd, "WC Remapped RcvArray: %p\n", dd->rcvarray_wc);
+	/*
+	 * Save BARs and command to rewrite after device reset.
+	 */
+	dd->pcibar0 = addr;
+	dd->pcibar1 = addr >> 32;
+	pci_read_config_dword(dd->pcidev, PCI_ROM_ADDRESS, &dd->pci_rom);
+	pci_read_config_word(dd->pcidev, PCI_COMMAND, &dd->pci_command);
+	pcie_capability_read_word(dd->pcidev, PCI_EXP_DEVCTL, &dd->pcie_devctl);
+	pcie_capability_read_word(dd->pcidev, PCI_EXP_LNKCTL, &dd->pcie_lnkctl);
+	pcie_capability_read_word(dd->pcidev, PCI_EXP_DEVCTL2,
+							&dd->pcie_devctl2);
+	pci_read_config_dword(dd->pcidev, PCI_CFG_MSIX0, &dd->pci_msix0);
+	pci_read_config_dword(dd->pcidev, PCIE_CFG_SPCIE1,
+							&dd->pci_lnkctl3);
+	pci_read_config_dword(dd->pcidev, PCIE_CFG_TPH2, &dd->pci_tph2);
+
+	return 0;
+}
+
+/*
+ * Do PCIe cleanup related to dd, after chip-specific cleanup, etc.  Just prior
+ * to releasing the dd memory.
+ * Void because all of the core pcie cleanup functions are void.
+ */
+void hfi1_pcie_ddcleanup(struct hfi1_devdata *dd)
+{
+	u64 __iomem *base = (void __iomem *) dd->kregbase;
+
+	dd->flags &= ~HFI1_PRESENT;
+	dd->kregbase = NULL;
+	iounmap(base);
+	if (dd->rcvarray_wc)
+		iounmap(dd->rcvarray_wc);
+	if (dd->piobase)
+		iounmap(dd->piobase);
+
+	pci_set_drvdata(dd->pcidev, NULL);
+}
+
+/*
+ * Do a Function Level Reset (FLR) on the device.
+ * Based on static function drivers/pci/pci.c:pcie_flr().
+ */
+void hfi1_pcie_flr(struct hfi1_devdata *dd)
+{
+	int i;
+	u16 status;
+
+	/* no need to check for the capability - we know the device has it */
+
+	/* wait for Transaction Pending bit to clear, at most a few ms */
+	for (i = 0; i < 4; i++) {
+		if (i)
+			msleep((1 << (i - 1)) * 100);
+
+		pcie_capability_read_word(dd->pcidev, PCI_EXP_DEVSTA, &status);
+		if (!(status & PCI_EXP_DEVSTA_TRPND))
+			goto clear;
+	}
+
+	dd_dev_err(dd, "Transaction Pending bit is not clearing, proceeding with reset anyway\n");
+
+clear:
+	pcie_capability_set_word(dd->pcidev, PCI_EXP_DEVCTL,
+						PCI_EXP_DEVCTL_BCR_FLR);
+	/* PCIe spec requires the function to be back within 100ms */
+	msleep(100);
+}
+
+static void msix_setup(struct hfi1_devdata *dd, int pos, u32 *msixcnt,
+		       struct hfi1_msix_entry *hfi1_msix_entry)
+{
+	int ret;
+	int nvec = *msixcnt;
+	struct msix_entry *msix_entry;
+	int i;
+
+	/* We can't pass hfi1_msix_entry array to msix_setup
+	 * so use a dummy msix_entry array and copy the allocated
+	 * irq back to the hfi1_msix_entry array. */
+	msix_entry = kmalloc_array(nvec, sizeof(*msix_entry), GFP_KERNEL);
+	if (!msix_entry) {
+		ret = -ENOMEM;
+		goto do_intx;
+	}
+
+	for (i = 0; i < nvec; i++)
+		msix_entry[i] = hfi1_msix_entry[i].msix;
+
+	ret = pci_enable_msix_range(dd->pcidev, msix_entry, 1, nvec);
+	if (ret < 0)
+		goto free_msix_entry;
+	nvec = ret;
+
+	for (i = 0; i < nvec; i++)
+		hfi1_msix_entry[i].msix = msix_entry[i];
+
+	kfree(msix_entry);
+	*msixcnt = nvec;
+	return;
+
+free_msix_entry:
+	kfree(msix_entry);
+
+do_intx:
+	dd_dev_err(dd, "pci_enable_msix_range %d vectors failed: %d, falling back to INTx\n",
+		   nvec, ret);
+	*msixcnt = 0;
+	hfi1_enable_intx(dd->pcidev);
+
+}
+
+/* return the PCIe link speed from the given link status */
+static u32 extract_speed(u16 linkstat)
+{
+	u32 speed;
+
+	switch (linkstat & PCI_EXP_LNKSTA_CLS) {
+	default: /* not defined, assume Gen1 */
+	case PCI_EXP_LNKSTA_CLS_2_5GB:
+		speed = 2500; /* Gen 1, 2.5GHz */
+		break;
+	case PCI_EXP_LNKSTA_CLS_5_0GB:
+		speed = 5000; /* Gen 2, 5GHz */
+		break;
+	case GEN3_SPEED_VECTOR:
+		speed = 8000; /* Gen 3, 8GHz */
+		break;
+	}
+	return speed;
+}
+
+/* return the PCIe link speed from the given link status */
+static u32 extract_width(u16 linkstat)
+{
+	return (linkstat & PCI_EXP_LNKSTA_NLW) >> PCI_EXP_LNKSTA_NLW_SHIFT;
+}
+
+/* read the link status and set dd->{lbus_width,lbus_speed,lbus_info} */
+static void update_lbus_info(struct hfi1_devdata *dd)
+{
+	u16 linkstat;
+
+	pcie_capability_read_word(dd->pcidev, PCI_EXP_LNKSTA, &linkstat);
+	dd->lbus_width = extract_width(linkstat);
+	dd->lbus_speed = extract_speed(linkstat);
+	snprintf(dd->lbus_info, sizeof(dd->lbus_info),
+		 "PCIe,%uMHz,x%u", dd->lbus_speed, dd->lbus_width);
+}
+
+/*
+ * Read in the current PCIe link width and speed.  Find if the link is
+ * Gen3 capable.
+ */
+int pcie_speeds(struct hfi1_devdata *dd)
+{
+	u32 linkcap;
+
+	if (!pci_is_pcie(dd->pcidev)) {
+		dd_dev_err(dd, "Can't find PCI Express capability!\n");
+		return -EINVAL;
+	}
+
+	/* find if our max speed is Gen3 and parent supports Gen3 speeds */
+	dd->link_gen3_capable = 1;
+
+	pcie_capability_read_dword(dd->pcidev, PCI_EXP_LNKCAP, &linkcap);
+	if ((linkcap & PCI_EXP_LNKCAP_SLS) != GEN3_SPEED_VECTOR) {
+		dd_dev_info(dd,
+			"This HFI is not Gen3 capable, max speed 0x%x, need 0x3\n",
+			linkcap & PCI_EXP_LNKCAP_SLS);
+		dd->link_gen3_capable = 0;
+	}
+
+	/*
+	 * bus->max_bus_speed is set from the bridge's linkcap Max Link Speed
+	 */
+	if (dd->pcidev->bus->max_bus_speed != PCIE_SPEED_8_0GT) {
+		dd_dev_info(dd, "Parent PCIe bridge does not support Gen3\n");
+		dd->link_gen3_capable = 0;
+	}
+
+	/* obtain the link width and current speed */
+	update_lbus_info(dd);
+
+	/* check against expected pcie width and complain if "wrong" */
+	if (dd->lbus_width < 16)
+		dd_dev_err(dd, "PCIe width %u (x16 HFI)\n", dd->lbus_width);
+
+	return 0;
+}
+
+/*
+ * Returns in *nent:
+ *	- actual number of interrupts allocated
+ *	- 0 if fell back to INTx.
+ */
+void request_msix(struct hfi1_devdata *dd, u32 *nent,
+		  struct hfi1_msix_entry *entry)
+{
+	int pos;
+
+	pos = dd->pcidev->msix_cap;
+	if (*nent && pos) {
+		msix_setup(dd, pos, nent, entry);
+		/* did it, either MSI-X or INTx */
+	} else {
+		*nent = 0;
+		hfi1_enable_intx(dd->pcidev);
+	}
+
+	tune_pcie_caps(dd);
+}
+
+/*
+ * Disable MSI-X.
+ */
+void hfi1_nomsix(struct hfi1_devdata *dd)
+{
+	pci_disable_msix(dd->pcidev);
+}
+
+void hfi1_enable_intx(struct pci_dev *pdev)
+{
+	/* first, turn on INTx */
+	pci_intx(pdev, 1);
+	/* then turn off MSI-X */
+	pci_disable_msix(pdev);
+}
+
+/* restore command and BARs after a reset has wiped them out */
+void restore_pci_variables(struct hfi1_devdata *dd)
+{
+	pci_write_config_word(dd->pcidev, PCI_COMMAND, dd->pci_command);
+	pci_write_config_dword(dd->pcidev,
+				PCI_BASE_ADDRESS_0, dd->pcibar0);
+	pci_write_config_dword(dd->pcidev,
+				PCI_BASE_ADDRESS_1, dd->pcibar1);
+	pci_write_config_dword(dd->pcidev,
+				PCI_ROM_ADDRESS, dd->pci_rom);
+	pcie_capability_write_word(dd->pcidev, PCI_EXP_DEVCTL, dd->pcie_devctl);
+	pcie_capability_write_word(dd->pcidev, PCI_EXP_LNKCTL, dd->pcie_lnkctl);
+	pcie_capability_write_word(dd->pcidev, PCI_EXP_DEVCTL2,
+							dd->pcie_devctl2);
+	pci_write_config_dword(dd->pcidev, PCI_CFG_MSIX0, dd->pci_msix0);
+	pci_write_config_dword(dd->pcidev, PCIE_CFG_SPCIE1,
+							dd->pci_lnkctl3);
+	pci_write_config_dword(dd->pcidev, PCIE_CFG_TPH2, dd->pci_tph2);
+}
+
+
+/*
+ * BIOS may not set PCIe bus-utilization parameters for best performance.
+ * Check and optionally adjust them to maximize our throughput.
+ */
+static int hfi1_pcie_caps;
+module_param_named(pcie_caps, hfi1_pcie_caps, int, S_IRUGO);
+MODULE_PARM_DESC(pcie_caps, "Max PCIe tuning: Payload (0..3), ReadReq (4..7)");
+
+static void tune_pcie_caps(struct hfi1_devdata *dd)
+{
+	struct pci_dev *parent;
+	u16 rc_mpss, rc_mps, ep_mpss, ep_mps;
+	u16 rc_mrrs, ep_mrrs, max_mrrs;
+
+	/* Find out supported and configured values for parent (root) */
+	parent = dd->pcidev->bus->self;
+	if (!pci_is_root_bus(parent->bus)) {
+		dd_dev_info(dd, "Parent not root\n");
+		return;
+	}
+
+	if (!pci_is_pcie(parent) || !pci_is_pcie(dd->pcidev))
+		return;
+	rc_mpss = parent->pcie_mpss;
+	rc_mps = ffs(pcie_get_mps(parent)) - 8;
+	/* Find out supported and configured values for endpoint (us) */
+	ep_mpss = dd->pcidev->pcie_mpss;
+	ep_mps = ffs(pcie_get_mps(dd->pcidev)) - 8;
+
+	/* Find max payload supported by root, endpoint */
+	if (rc_mpss > ep_mpss)
+		rc_mpss = ep_mpss;
+
+	/* If Supported greater than limit in module param, limit it */
+	if (rc_mpss > (hfi1_pcie_caps & 7))
+		rc_mpss = hfi1_pcie_caps & 7;
+	/* If less than (allowed, supported), bump root payload */
+	if (rc_mpss > rc_mps) {
+		rc_mps = rc_mpss;
+		pcie_set_mps(parent, 128 << rc_mps);
+	}
+	/* If less than (allowed, supported), bump endpoint payload */
+	if (rc_mpss > ep_mps) {
+		ep_mps = rc_mpss;
+		pcie_set_mps(dd->pcidev, 128 << ep_mps);
+	}
+
+	/*
+	 * Now the Read Request size.
+	 * No field for max supported, but PCIe spec limits it to 4096,
+	 * which is code '5' (log2(4096) - 7)
+	 */
+	max_mrrs = 5;
+	if (max_mrrs > ((hfi1_pcie_caps >> 4) & 7))
+		max_mrrs = (hfi1_pcie_caps >> 4) & 7;
+
+	max_mrrs = 128 << max_mrrs;
+	rc_mrrs = pcie_get_readrq(parent);
+	ep_mrrs = pcie_get_readrq(dd->pcidev);
+
+	if (max_mrrs > rc_mrrs) {
+		rc_mrrs = max_mrrs;
+		pcie_set_readrq(parent, rc_mrrs);
+	}
+	if (max_mrrs > ep_mrrs) {
+		ep_mrrs = max_mrrs;
+		pcie_set_readrq(dd->pcidev, ep_mrrs);
+	}
+}
+/* End of PCIe capability tuning */
+
+/*
+ * From here through hfi1_pci_err_handler definition is invoked via
+ * PCI error infrastructure, registered via pci
+ */
+static pci_ers_result_t
+pci_error_detected(struct pci_dev *pdev, pci_channel_state_t state)
+{
+	struct hfi1_devdata *dd = pci_get_drvdata(pdev);
+	pci_ers_result_t ret = PCI_ERS_RESULT_RECOVERED;
+
+	switch (state) {
+	case pci_channel_io_normal:
+		dd_dev_info(dd, "State Normal, ignoring\n");
+		break;
+
+	case pci_channel_io_frozen:
+		dd_dev_info(dd, "State Frozen, requesting reset\n");
+		pci_disable_device(pdev);
+		ret = PCI_ERS_RESULT_NEED_RESET;
+		break;
+
+	case pci_channel_io_perm_failure:
+		if (dd) {
+			dd_dev_info(dd, "State Permanent Failure, disabling\n");
+			/* no more register accesses! */
+			dd->flags &= ~HFI1_PRESENT;
+			hfi1_disable_after_error(dd);
+		}
+		 /* else early, or other problem */
+		ret =  PCI_ERS_RESULT_DISCONNECT;
+		break;
+
+	default: /* shouldn't happen */
+		dd_dev_info(dd, "HFI1 PCI errors detected (state %d)\n",
+			    state);
+		break;
+	}
+	return ret;
+}
+
+static pci_ers_result_t
+pci_mmio_enabled(struct pci_dev *pdev)
+{
+	u64 words = 0U;
+	struct hfi1_devdata *dd = pci_get_drvdata(pdev);
+	pci_ers_result_t ret = PCI_ERS_RESULT_RECOVERED;
+
+	if (dd && dd->pport) {
+		words = read_port_cntr(dd->pport, C_RX_WORDS, CNTR_INVALID_VL);
+		if (words == ~0ULL)
+			ret = PCI_ERS_RESULT_NEED_RESET;
+		dd_dev_info(dd,
+			    "HFI1 mmio_enabled function called, read wordscntr %Lx, returning %d\n",
+			    words, ret);
+	}
+	return  ret;
+}
+
+static pci_ers_result_t
+pci_slot_reset(struct pci_dev *pdev)
+{
+	struct hfi1_devdata *dd = pci_get_drvdata(pdev);
+
+	dd_dev_info(dd, "HFI1 slot_reset function called, ignored\n");
+	return PCI_ERS_RESULT_CAN_RECOVER;
+}
+
+static pci_ers_result_t
+pci_link_reset(struct pci_dev *pdev)
+{
+	struct hfi1_devdata *dd = pci_get_drvdata(pdev);
+
+	dd_dev_info(dd, "HFI1 link_reset function called, ignored\n");
+	return PCI_ERS_RESULT_CAN_RECOVER;
+}
+
+static void
+pci_resume(struct pci_dev *pdev)
+{
+	struct hfi1_devdata *dd = pci_get_drvdata(pdev);
+
+	dd_dev_info(dd, "HFI1 resume function called\n");
+	pci_cleanup_aer_uncorrect_error_status(pdev);
+	/*
+	 * Running jobs will fail, since it's asynchronous
+	 * unlike sysfs-requested reset.   Better than
+	 * doing nothing.
+	 */
+	hfi1_init(dd, 1); /* same as re-init after reset */
+}
+
+const struct pci_error_handlers hfi1_pci_err_handler = {
+	.error_detected = pci_error_detected,
+	.mmio_enabled = pci_mmio_enabled,
+	.link_reset = pci_link_reset,
+	.slot_reset = pci_slot_reset,
+	.resume = pci_resume,
+};
+
+/*============================================================================*/
+/* PCIe Gen3 support */
+
+/*
+ * This code is separated out because it is expected to be removed in the
+ * final shipping product.  If not, then it will be revisited and items
+ * will be moved to more standard locations.
+ */
+
+/* ASIC_PCI_SD_HOST_STATUS.FW_DNLD_STS field values */
+#define DL_STATUS_HFI0 0x1	/* hfi0 firmware download complete */
+#define DL_STATUS_HFI1 0x2	/* hfi1 firmware download complete */
+#define DL_STATUS_BOTH 0x3	/* hfi0 and hfi1 firmware download complete */
+
+/* ASIC_PCI_SD_HOST_STATUS.FW_DNLD_ERR field values */
+#define DL_ERR_NONE		0x0	/* no error */
+#define DL_ERR_SWAP_PARITY	0x1	/* parity error in SerDes interrupt */
+					/*   or response data */
+#define DL_ERR_DISABLED	0x2	/* hfi disabled */
+#define DL_ERR_SECURITY	0x3	/* security check failed */
+#define DL_ERR_SBUS		0x4	/* SBus status error */
+#define DL_ERR_XFR_PARITY	0x5	/* parity error during ROM transfer*/
+
+/* gasket block secondary bus reset delay */
+#define SBR_DELAY_US 200000	/* 200ms */
+
+/* mask for PCIe capability register lnkctl2 target link speed */
+#define LNKCTL2_TARGET_LINK_SPEED_MASK 0xf
+
+static uint pcie_target = 3;
+module_param(pcie_target, uint, S_IRUGO);
+MODULE_PARM_DESC(pcie_target, "PCIe target speed (0 skip, 1-3 Gen1-3)");
+
+static uint pcie_force;
+module_param(pcie_force, uint, S_IRUGO);
+MODULE_PARM_DESC(pcie_force, "Force driver to do a PCIe firmware download even if already at target speed");
+
+static uint pcie_retry = 5;
+module_param(pcie_retry, uint, S_IRUGO);
+MODULE_PARM_DESC(pcie_retry, "Driver will try this many times to reach requested speed");
+
+#define UNSET_PSET 255
+#define DEFAULT_DISCRETE_PSET 2	/* discrete HFI */
+#define DEFAULT_MCP_PSET 4	/* MCP HFI */
+static uint pcie_pset = UNSET_PSET;
+module_param(pcie_pset, uint, S_IRUGO);
+MODULE_PARM_DESC(pcie_pset, "PCIe Eq Pset value to use, range is 0-10");
+
+/* equalization columns */
+#define PREC 0
+#define ATTN 1
+#define POST 2
+
+/* discrete silicon preliminary equalization values */
+static const u8 discrete_preliminary_eq[11][3] = {
+	/* prec   attn   post */
+	{  0x00,  0x00,  0x12 },	/* p0 */
+	{  0x00,  0x00,  0x0c },	/* p1 */
+	{  0x00,  0x00,  0x0f },	/* p2 */
+	{  0x00,  0x00,  0x09 },	/* p3 */
+	{  0x00,  0x00,  0x00 },	/* p4 */
+	{  0x06,  0x00,  0x00 },	/* p5 */
+	{  0x09,  0x00,  0x00 },	/* p6 */
+	{  0x06,  0x00,  0x0f },	/* p7 */
+	{  0x09,  0x00,  0x09 },	/* p8 */
+	{  0x0c,  0x00,  0x00 },	/* p9 */
+	{  0x00,  0x00,  0x18 },	/* p10 */
+};
+
+/* integrated silicon preliminary equalization values */
+static const u8 integrated_preliminary_eq[11][3] = {
+	/* prec   attn   post */
+	{  0x00,  0x1e,  0x07 },	/* p0 */
+	{  0x00,  0x1e,  0x05 },	/* p1 */
+	{  0x00,  0x1e,  0x06 },	/* p2 */
+	{  0x00,  0x1e,  0x04 },	/* p3 */
+	{  0x00,  0x1e,  0x00 },	/* p4 */
+	{  0x03,  0x1e,  0x00 },	/* p5 */
+	{  0x04,  0x1e,  0x00 },	/* p6 */
+	{  0x03,  0x1e,  0x06 },	/* p7 */
+	{  0x03,  0x1e,  0x04 },	/* p8 */
+	{  0x05,  0x1e,  0x00 },	/* p9 */
+	{  0x00,  0x1e,  0x0a },	/* p10 */
+};
+
+/* helper to format the value to write to hardware */
+#define eq_value(pre, curr, post) \
+	((((u32)(pre)) << \
+			PCIE_CFG_REG_PL102_GEN3_EQ_PRE_CURSOR_PSET_SHIFT) \
+	| (((u32)(curr)) << PCIE_CFG_REG_PL102_GEN3_EQ_CURSOR_PSET_SHIFT) \
+	| (((u32)(post)) << \
+		PCIE_CFG_REG_PL102_GEN3_EQ_POST_CURSOR_PSET_SHIFT))
+
+/*
+ * Load the given EQ preset table into the PCIe hardware.
+ */
+static int load_eq_table(struct hfi1_devdata *dd, const u8 eq[11][3], u8 fs,
+			 u8 div)
+{
+	struct pci_dev *pdev = dd->pcidev;
+	u32 hit_error = 0;
+	u32 violation;
+	u32 i;
+	u8 c_minus1, c0, c_plus1;
+
+	for (i = 0; i < 11; i++) {
+		/* set index */
+		pci_write_config_dword(pdev, PCIE_CFG_REG_PL103, i);
+		/* write the value */
+		c_minus1 = eq[i][PREC] / div;
+		c0 = fs - (eq[i][PREC] / div) - (eq[i][POST] / div);
+		c_plus1 = eq[i][POST] / div;
+		pci_write_config_dword(pdev, PCIE_CFG_REG_PL102,
+			eq_value(c_minus1, c0, c_plus1));
+		/* check if these coefficients violate EQ rules */
+		pci_read_config_dword(dd->pcidev, PCIE_CFG_REG_PL105,
+								&violation);
+		if (violation
+		    & PCIE_CFG_REG_PL105_GEN3_EQ_VIOLATE_COEF_RULES_SMASK){
+			if (hit_error == 0) {
+				dd_dev_err(dd,
+					"Gen3 EQ Table Coefficient rule violations\n");
+				dd_dev_err(dd, "         prec   attn   post\n");
+			}
+			dd_dev_err(dd, "   p%02d:   %02x     %02x     %02x\n",
+				i, (u32)eq[i][0], (u32)eq[i][1], (u32)eq[i][2]);
+			dd_dev_err(dd, "            %02x     %02x     %02x\n",
+				(u32)c_minus1, (u32)c0, (u32)c_plus1);
+			hit_error = 1;
+		}
+	}
+	if (hit_error)
+		return -EINVAL;
+	return 0;
+}
+
+/*
+ * Steps to be done after the PCIe firmware is downloaded and
+ * before the SBR for the Pcie Gen3.
+ * The hardware mutex is already being held.
+ */
+static void pcie_post_steps(struct hfi1_devdata *dd)
+{
+	int i;
+
+	set_sbus_fast_mode(dd);
+	/*
+	 * Write to the PCIe PCSes to set the G3_LOCKED_NEXT bits to 1.
+	 * This avoids a spurious framing error that can otherwise be
+	 * generated by the MAC layer.
+	 *
+	 * Use individual addresses since no broadcast is set up.
+	 */
+	for (i = 0; i < NUM_PCIE_SERDES; i++) {
+		sbus_request(dd, pcie_pcs_addrs[dd->hfi1_id][i],
+			     0x03, WRITE_SBUS_RECEIVER, 0x00022132);
+	}
+
+	clear_sbus_fast_mode(dd);
+}
+
+/*
+ * Trigger a secondary bus reset (SBR) on ourselves using our parent.
+ *
+ * Based on pci_parent_bus_reset() which is not exported by the
+ * kernel core.
+ */
+static int trigger_sbr(struct hfi1_devdata *dd)
+{
+	struct pci_dev *dev = dd->pcidev;
+	struct pci_dev *pdev;
+
+	/* need a parent */
+	if (!dev->bus->self) {
+		dd_dev_err(dd, "%s: no parent device\n", __func__);
+		return -ENOTTY;
+	}
+
+	/* should not be anyone else on the bus */
+	list_for_each_entry(pdev, &dev->bus->devices, bus_list)
+		if (pdev != dev) {
+			dd_dev_err(dd,
+				"%s: another device is on the same bus\n",
+				__func__);
+			return -ENOTTY;
+		}
+
+	/*
+	 * A secondary bus reset (SBR) issues a hot reset to our device.
+	 * The following routine does a 1s wait after the reset is dropped
+	 * per PCI Trhfa (recovery time).  PCIe 3.0 section 6.6.1 -
+	 * Conventional Reset, paragraph 3, line 35 also says that a 1s
+	 * delay after a reset is required.  Per spec requirements,
+	 * the link is either working or not after that point.
+	 */
+	pci_reset_bridge_secondary_bus(dev->bus->self);
+
+	return 0;
+}
+
+/*
+ * Write the given gasket interrupt register.
+ */
+static void write_gasket_interrupt(struct hfi1_devdata *dd, int index,
+				   u16 code, u16 data)
+{
+	write_csr(dd, ASIC_PCIE_SD_INTRPT_LIST + (index * 8),
+	    (((u64)code << ASIC_PCIE_SD_INTRPT_LIST_INTRPT_CODE_SHIFT)
+	    |((u64)data << ASIC_PCIE_SD_INTRPT_LIST_INTRPT_DATA_SHIFT)));
+}
+
+/*
+ * Tell the gasket logic how to react to the reset.
+ */
+static void arm_gasket_logic(struct hfi1_devdata *dd)
+{
+	u64 reg;
+
+	reg = (((u64)1 << dd->hfi1_id)
+			<< ASIC_PCIE_SD_HOST_CMD_INTRPT_CMD_SHIFT)
+		| ((u64)pcie_serdes_broadcast[dd->hfi1_id]
+			<< ASIC_PCIE_SD_HOST_CMD_SBUS_RCVR_ADDR_SHIFT
+		| ASIC_PCIE_SD_HOST_CMD_SBR_MODE_SMASK
+		| ((u64)SBR_DELAY_US & ASIC_PCIE_SD_HOST_CMD_TIMER_MASK)
+			<< ASIC_PCIE_SD_HOST_CMD_TIMER_SHIFT
+		);
+	write_csr(dd, ASIC_PCIE_SD_HOST_CMD, reg);
+	/* read back to push the write */
+	read_csr(dd, ASIC_PCIE_SD_HOST_CMD);
+}
+
+/*
+ * Do all the steps needed to transition the PCIe link to Gen3 speed.
+ */
+int do_pcie_gen3_transition(struct hfi1_devdata *dd)
+{
+	struct pci_dev *parent;
+	u64 fw_ctrl;
+	u64 reg, therm;
+	u32 reg32, fs, lf;
+	u32 status, err;
+	int ret;
+	int do_retry, retry_count = 0;
+	uint default_pset;
+	u16 target_vector, target_speed;
+	u16 lnkctl, lnkctl2, vendor;
+	u8 nsbr = 1;
+	u8 div;
+	const u8 (*eq)[3];
+	int return_error = 0;
+
+	/* PCIe Gen3 is for the ASIC only */
+	if (dd->icode != ICODE_RTL_SILICON)
+		return 0;
+
+	if (pcie_target == 1) {			/* target Gen1 */
+		target_vector = GEN1_SPEED_VECTOR;
+		target_speed = 2500;
+	} else if (pcie_target == 2) {		/* target Gen2 */
+		target_vector = GEN2_SPEED_VECTOR;
+		target_speed = 5000;
+	} else if (pcie_target == 3) {		/* target Gen3 */
+		target_vector = GEN3_SPEED_VECTOR;
+		target_speed = 8000;
+	} else {
+		/* off or invalid target - skip */
+		dd_dev_info(dd, "%s: Skipping PCIe transition\n", __func__);
+		return 0;
+	}
+
+	/* if already at target speed, done (unless forced) */
+	if (dd->lbus_speed == target_speed) {
+		dd_dev_info(dd, "%s: PCIe already at gen%d, %s\n", __func__,
+			pcie_target,
+			pcie_force ? "re-doing anyway" : "skipping");
+		if (!pcie_force)
+			return 0;
+	}
+
+	/*
+	 * A0 needs an additional SBR
+	 */
+	if (is_a0(dd))
+		nsbr++;
+
+	/*
+	 * Do the Gen3 transition.  Steps are those of the PCIe Gen3
+	 * recipe.
+	 */
+
+	/* step 1: pcie link working in gen1/gen2 */
+
+	/* step 2: if either side is not capable of Gen3, done */
+	if (pcie_target == 3 && !dd->link_gen3_capable) {
+		dd_dev_err(dd, "The PCIe link is not Gen3 capable\n");
+		ret = -ENOSYS;
+		goto done_no_mutex;
+	}
+
+	/* hold the HW mutex across the firmware download and SBR */
+	ret = acquire_hw_mutex(dd);
+	if (ret)
+		return ret;
+
+	/* make sure thermal polling is not causing interrupts */
+	therm = read_csr(dd, ASIC_CFG_THERM_POLL_EN);
+	if (therm) {
+		write_csr(dd, ASIC_CFG_THERM_POLL_EN, 0x0);
+		msleep(100);
+		dd_dev_info(dd, "%s: Disabled therm polling\n",
+			    __func__);
+	}
+
+	/* step 3: download SBus Master firmware */
+	/* step 4: download PCIe Gen3 SerDes firmware */
+retry:
+	dd_dev_info(dd, "%s: downloading firmware\n", __func__);
+	ret = load_pcie_firmware(dd);
+	if (ret)
+		goto done;
+
+	/* step 5: set up device parameter settings */
+	dd_dev_info(dd, "%s: setting PCIe registers\n", __func__);
+
+	/*
+	 * PcieCfgSpcie1 - Link Control 3
+	 * Leave at reset value.  No need to set PerfEq - link equalization
+	 * will be performed automatically after the SBR when the target
+	 * speed is 8GT/s.
+	 */
+
+	/* clear all 16 per-lane error bits (PCIe: Lane Error Status) */
+	pci_write_config_dword(dd->pcidev, PCIE_CFG_SPCIE2, 0xffff);
+
+	/* step 5a: Set Synopsys Port Logic registers */
+
+	/*
+	 * PcieCfgRegPl2 - Port Force Link
+	 *
+	 * Set the low power field to 0x10 to avoid unnecessary power
+	 * management messages.  All other fields are zero.
+	 */
+	reg32 = 0x10ul << PCIE_CFG_REG_PL2_LOW_PWR_ENT_CNT_SHIFT;
+	pci_write_config_dword(dd->pcidev, PCIE_CFG_REG_PL2, reg32);
+
+	/*
+	 * PcieCfgRegPl100 - Gen3 Control
+	 *
+	 * turn off PcieCfgRegPl100.Gen3ZRxDcNonCompl
+	 * turn on PcieCfgRegPl100.EqEieosCnt (erratum)
+	 * Everything else zero.
+	 */
+	reg32 = PCIE_CFG_REG_PL100_EQ_EIEOS_CNT_SMASK;
+	pci_write_config_dword(dd->pcidev, PCIE_CFG_REG_PL100, reg32);
+
+	/*
+	 * PcieCfgRegPl101 - Gen3 EQ FS and LF
+	 * PcieCfgRegPl102 - Gen3 EQ Presets to Coefficients Mapping
+	 * PcieCfgRegPl103 - Gen3 EQ Preset Index
+	 * PcieCfgRegPl105 - Gen3 EQ Status
+	 *
+	 * Give initial EQ settings.
+	 */
+	if (dd->pcidev->device == PCI_DEVICE_ID_INTEL0) { /* discrete */
+		/* 1000mV, FS=24, LF = 8 */
+		fs = 24;
+		lf = 8;
+		div = 3;
+		eq = discrete_preliminary_eq;
+		default_pset = DEFAULT_DISCRETE_PSET;
+	} else {
+		/* 400mV, FS=29, LF = 9 */
+		fs = 29;
+		lf = 9;
+		div = 1;
+		eq = integrated_preliminary_eq;
+		default_pset = DEFAULT_MCP_PSET;
+	}
+	pci_write_config_dword(dd->pcidev, PCIE_CFG_REG_PL101,
+		(fs << PCIE_CFG_REG_PL101_GEN3_EQ_LOCAL_FS_SHIFT)
+		| (lf << PCIE_CFG_REG_PL101_GEN3_EQ_LOCAL_LF_SHIFT));
+	ret = load_eq_table(dd, eq, fs, div);
+	if (ret)
+		goto done;
+
+	/*
+	 * PcieCfgRegPl106 - Gen3 EQ Control
+	 *
+	 * Set Gen3EqPsetReqVec, leave other fields 0.
+	 */
+	if (pcie_pset == UNSET_PSET)
+		pcie_pset = default_pset;
+	if (pcie_pset > 10) {	/* valid range is 0-10, inclusive */
+		dd_dev_err(dd, "%s: Invalid Eq Pset %u, setting to %d\n",
+			__func__, pcie_pset, default_pset);
+		pcie_pset = default_pset;
+	}
+	dd_dev_info(dd, "%s: using EQ Pset %u\n", __func__, pcie_pset);
+	pci_write_config_dword(dd->pcidev, PCIE_CFG_REG_PL106,
+		((1 << pcie_pset)
+			<< PCIE_CFG_REG_PL106_GEN3_EQ_PSET_REQ_VEC_SHIFT)
+		| PCIE_CFG_REG_PL106_GEN3_EQ_EVAL2MS_DISABLE_SMASK
+		| PCIE_CFG_REG_PL106_GEN3_EQ_PHASE23_EXIT_MODE_SMASK);
+
+	/*
+	 * step 5b: Do post firmware download steps via SBus
+	 */
+	dd_dev_info(dd, "%s: doing pcie post steps\n", __func__);
+	pcie_post_steps(dd);
+
+	/*
+	 * step 5c: Program gasket interrupts
+	 */
+	/* set the Rx Bit Rate to REFCLK ratio */
+	write_gasket_interrupt(dd, 0, 0x0006, 0x0050);
+	/* disable pCal for PCIe Gen3 RX equalization */
+	write_gasket_interrupt(dd, 1, 0x0026, 0x5b01);
+	/*
+	 * Enable iCal for PCIe Gen3 RX equalization, and set which
+	 * evaluation of RX_EQ_EVAL will launch the iCal procedure.
+	 */
+	write_gasket_interrupt(dd, 2, 0x0026, 0x5202);
+	/* terminate list */
+	write_gasket_interrupt(dd, 3, 0x0000, 0x0000);
+
+	/*
+	 * step 5d: program XMT margin
+	 * Right now, leave the default alone.  To change, do a
+	 * read-modify-write of:
+	 *	CcePcieCtrl.XmtMargin
+	 *	CcePcieCtrl.XmitMarginOverwriteEnable
+	 */
+
+	/* step 5e: disable active state power management (ASPM) */
+	dd_dev_info(dd, "%s: clearing ASPM\n", __func__);
+	pcie_capability_read_word(dd->pcidev, PCI_EXP_LNKCTL, &lnkctl);
+	lnkctl &= ~PCI_EXP_LNKCTL_ASPMC;
+	pcie_capability_write_word(dd->pcidev, PCI_EXP_LNKCTL, lnkctl);
+
+	/*
+	 * step 5f: clear DirectSpeedChange
+	 * PcieCfgRegPl67.DirectSpeedChange must be zero to prevent the
+	 * change in the speed target from starting before we are ready.
+	 * This field defaults to 0 and we are not changing it, so nothing
+	 * needs to be done.
+	 */
+
+	/* step 5g: Set target link speed */
+	/*
+	 * Set target link speed to be target on both device and parent.
+	 * On setting the parent: Some system BIOSs "helpfully" set the
+	 * parent target speed to Gen2 to match the ASIC's initial speed.
+	 * We can set the target Gen3 because we have already checked
+	 * that it is Gen3 capable earlier.
+	 */
+	dd_dev_info(dd, "%s: setting parent target link speed\n", __func__);
+	parent = dd->pcidev->bus->self;
+	pcie_capability_read_word(parent, PCI_EXP_LNKCTL2, &lnkctl2);
+	dd_dev_info(dd, "%s: ..old link control2: 0x%x\n", __func__,
+		(u32)lnkctl2);
+	/* only write to parent if target is not as high as ours */
+	if ((lnkctl2 & LNKCTL2_TARGET_LINK_SPEED_MASK) < target_vector) {
+		lnkctl2 &= ~LNKCTL2_TARGET_LINK_SPEED_MASK;
+		lnkctl2 |= target_vector;
+		dd_dev_info(dd, "%s: ..new link control2: 0x%x\n", __func__,
+			(u32)lnkctl2);
+		pcie_capability_write_word(parent, PCI_EXP_LNKCTL2, lnkctl2);
+	} else {
+		dd_dev_info(dd, "%s: ..target speed is OK\n", __func__);
+	}
+
+	dd_dev_info(dd, "%s: setting target link speed\n", __func__);
+	pcie_capability_read_word(dd->pcidev, PCI_EXP_LNKCTL2, &lnkctl2);
+	dd_dev_info(dd, "%s: ..old link control2: 0x%x\n", __func__,
+		(u32)lnkctl2);
+	lnkctl2 &= ~LNKCTL2_TARGET_LINK_SPEED_MASK;
+	lnkctl2 |= target_vector;
+	dd_dev_info(dd, "%s: ..new link control2: 0x%x\n", __func__,
+		(u32)lnkctl2);
+	pcie_capability_write_word(dd->pcidev, PCI_EXP_LNKCTL2, lnkctl2);
+
+	/* step 5h: arm gasket logic */
+	/* hold DC in reset across the SBR */
+	write_csr(dd, CCE_DC_CTRL, CCE_DC_CTRL_DC_RESET_SMASK);
+	(void) read_csr(dd, CCE_DC_CTRL); /* DC reset hold */
+	/* save firmware control across the SBR */
+	fw_ctrl = read_csr(dd, MISC_CFG_FW_CTRL);
+
+	dd_dev_info(dd, "%s: arming gasket logic\n", __func__);
+	arm_gasket_logic(dd);
+
+	/*
+	 * step 6: quiesce PCIe link
+	 * The chip has already been reset, so there will be no traffic
+	 * from the chip.  Linux has no easy way to enforce that it will
+	 * not try to access the device, so we just need to hope it doesn't
+	 * do it while we are doing the reset.
+	 */
+
+	/*
+	 * step 7: initiate the secondary bus reset (SBR)
+	 * step 8: hardware brings the links back up
+	 * step 9: wait for link speed transition to be complete
+	 */
+	dd_dev_info(dd, "%s: calling trigger_sbr\n", __func__);
+	ret = trigger_sbr(dd);
+	if (ret)
+		goto done;
+
+	/* step 10: decide what to do next */
+
+	/* check if we can read PCI space */
+	ret = pci_read_config_word(dd->pcidev, PCI_VENDOR_ID, &vendor);
+	if (ret) {
+		dd_dev_info(dd,
+			"%s: read of VendorID failed after SBR, err %d\n",
+			__func__, ret);
+		return_error = 1;
+		goto done;
+	}
+	if (vendor == 0xffff) {
+		dd_dev_info(dd, "%s: VendorID is all 1s after SBR\n", __func__);
+		return_error = 1;
+		ret = -EIO;
+		goto done;
+	}
+
+	/* restore PCI space registers we know were reset */
+	dd_dev_info(dd, "%s: calling restore_pci_variables\n", __func__);
+	restore_pci_variables(dd);
+	/* restore firmware control */
+	write_csr(dd, MISC_CFG_FW_CTRL, fw_ctrl);
+
+	/*
+	 * Check the gasket block status.
+	 *
+	 * This is the first CSR read after the SBR.  If the read returns
+	 * all 1s (fails), the link did not make it back.
+	 *
+	 * Once we're sure we can read and write, clear the DC reset after
+	 * the SBR.  Then check for any per-lane errors. Then look over
+	 * the status.
+	 */
+	reg = read_csr(dd, ASIC_PCIE_SD_HOST_STATUS);
+	dd_dev_info(dd, "%s: gasket block status: 0x%llx\n", __func__, reg);
+	if (reg == ~0ull) {	/* PCIe read failed/timeout */
+		dd_dev_err(dd, "SBR failed - unable to read from device\n");
+		return_error = 1;
+		ret = -ENOSYS;
+		goto done;
+	}
+
+	/* clear the DC reset */
+	write_csr(dd, CCE_DC_CTRL, 0);
+	/* Set the LED off */
+	if (is_a0(dd))
+		setextled(dd, 0);
+
+	/* check for any per-lane errors */
+	pci_read_config_dword(dd->pcidev, PCIE_CFG_SPCIE2, &reg32);
+	dd_dev_info(dd, "%s: per-lane errors: 0x%x\n", __func__, reg32);
+
+	/* extract status, look for our HFI */
+	status = (reg >> ASIC_PCIE_SD_HOST_STATUS_FW_DNLD_STS_SHIFT)
+			& ASIC_PCIE_SD_HOST_STATUS_FW_DNLD_STS_MASK;
+	if ((status & (1 << dd->hfi1_id)) == 0) {
+		dd_dev_err(dd,
+			"%s: gasket status 0x%x, expecting 0x%x\n",
+			__func__, status, 1 << dd->hfi1_id);
+		ret = -EIO;
+		goto done;
+	}
+
+	/* extract error */
+	err = (reg >> ASIC_PCIE_SD_HOST_STATUS_FW_DNLD_ERR_SHIFT)
+		& ASIC_PCIE_SD_HOST_STATUS_FW_DNLD_ERR_MASK;
+	if (err) {
+		dd_dev_err(dd, "%s: gasket error %d\n", __func__, err);
+		ret = -EIO;
+		goto done;
+	}
+
+	/* update our link information cache */
+	update_lbus_info(dd);
+	dd_dev_info(dd, "%s: new speed and width: %s\n", __func__,
+		dd->lbus_info);
+
+	if (dd->lbus_speed != target_speed) { /* not target */
+		/* maybe retry */
+		do_retry = retry_count < pcie_retry;
+		dd_dev_err(dd, "PCIe link speed did not switch to Gen%d%s\n",
+			pcie_target, do_retry ? ", retrying" : "");
+		retry_count++;
+		if (do_retry) {
+			msleep(100); /* allow time to settle */
+			goto retry;
+		}
+		ret = -EIO;
+	}
+
+done:
+	if (therm) {
+		write_csr(dd, ASIC_CFG_THERM_POLL_EN, 0x1);
+		msleep(100);
+		dd_dev_info(dd, "%s: Re-enable therm polling\n",
+			    __func__);
+	}
+	release_hw_mutex(dd);
+done_no_mutex:
+	/* return no error if it is OK to be at current speed */
+	if (ret && !return_error) {
+		dd_dev_err(dd, "Proceeding at current speed PCIe speed\n");
+		ret = 0;
+	}
+
+	dd_dev_info(dd, "%s: done\n", __func__);
+	return ret;
+}
diff --git a/drivers/staging/rdma/hfi1/pio.c b/drivers/staging/rdma/hfi1/pio.c
new file mode 100644
index 000000000000..9991814a8f05
--- /dev/null
+++ b/drivers/staging/rdma/hfi1/pio.c
@@ -0,0 +1,1771 @@
+/*
+ *
+ * This file is provided under a dual BSD/GPLv2 license.  When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * GPL LICENSE SUMMARY
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * BSD LICENSE
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ *  - Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ *  - Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in
+ *    the documentation and/or other materials provided with the
+ *    distribution.
+ *  - Neither the name of Intel Corporation nor the names of its
+ *    contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ */
+
+#include <linux/delay.h>
+#include "hfi.h"
+#include "qp.h"
+#include "trace.h"
+
+#define SC_CTXT_PACKET_EGRESS_TIMEOUT 350 /* in chip cycles */
+
+#define SC(name) SEND_CTXT_##name
+/*
+ * Send Context functions
+ */
+static void sc_wait_for_packet_egress(struct send_context *sc, int pause);
+
+/*
+ * Set the CM reset bit and wait for it to clear.  Use the provided
+ * sendctrl register.  This routine has no locking.
+ */
+void __cm_reset(struct hfi1_devdata *dd, u64 sendctrl)
+{
+	write_csr(dd, SEND_CTRL, sendctrl | SEND_CTRL_CM_RESET_SMASK);
+	while (1) {
+		udelay(1);
+		sendctrl = read_csr(dd, SEND_CTRL);
+		if ((sendctrl & SEND_CTRL_CM_RESET_SMASK) == 0)
+			break;
+	}
+}
+
+/* defined in header release 48 and higher */
+#ifndef SEND_CTRL_UNSUPPORTED_VL_SHIFT
+#define SEND_CTRL_UNSUPPORTED_VL_SHIFT 3
+#define SEND_CTRL_UNSUPPORTED_VL_MASK 0xffull
+#define SEND_CTRL_UNSUPPORTED_VL_SMASK (SEND_CTRL_UNSUPPORTED_VL_MASK \
+		<< SEND_CTRL_UNSUPPORTED_VL_SHIFT)
+#endif
+
+/* global control of PIO send */
+void pio_send_control(struct hfi1_devdata *dd, int op)
+{
+	u64 reg, mask;
+	unsigned long flags;
+	int write = 1;	/* write sendctrl back */
+	int flush = 0;	/* re-read sendctrl to make sure it is flushed */
+
+	spin_lock_irqsave(&dd->sendctrl_lock, flags);
+
+	reg = read_csr(dd, SEND_CTRL);
+	switch (op) {
+	case PSC_GLOBAL_ENABLE:
+		reg |= SEND_CTRL_SEND_ENABLE_SMASK;
+	/* Fall through */
+	case PSC_DATA_VL_ENABLE:
+		/* Disallow sending on VLs not enabled */
+		mask = (((~0ull)<<num_vls) & SEND_CTRL_UNSUPPORTED_VL_MASK)<<
+				SEND_CTRL_UNSUPPORTED_VL_SHIFT;
+		reg = (reg & ~SEND_CTRL_UNSUPPORTED_VL_SMASK) | mask;
+		break;
+	case PSC_GLOBAL_DISABLE:
+		reg &= ~SEND_CTRL_SEND_ENABLE_SMASK;
+		break;
+	case PSC_GLOBAL_VLARB_ENABLE:
+		reg |= SEND_CTRL_VL_ARBITER_ENABLE_SMASK;
+		break;
+	case PSC_GLOBAL_VLARB_DISABLE:
+		reg &= ~SEND_CTRL_VL_ARBITER_ENABLE_SMASK;
+		break;
+	case PSC_CM_RESET:
+		__cm_reset(dd, reg);
+		write = 0; /* CSR already written (and flushed) */
+		break;
+	case PSC_DATA_VL_DISABLE:
+		reg |= SEND_CTRL_UNSUPPORTED_VL_SMASK;
+		flush = 1;
+		break;
+	default:
+		dd_dev_err(dd, "%s: invalid control %d\n", __func__, op);
+		break;
+	}
+
+	if (write) {
+		write_csr(dd, SEND_CTRL, reg);
+		if (flush)
+			(void) read_csr(dd, SEND_CTRL); /* flush write */
+	}
+
+	spin_unlock_irqrestore(&dd->sendctrl_lock, flags);
+}
+
+/* number of send context memory pools */
+#define NUM_SC_POOLS 2
+
+/* Send Context Size (SCS) wildcards */
+#define SCS_POOL_0 -1
+#define SCS_POOL_1 -2
+/* Send Context Count (SCC) wildcards */
+#define SCC_PER_VL -1
+#define SCC_PER_CPU  -2
+
+#define SCC_PER_KRCVQ  -3
+#define SCC_ACK_CREDITS  32
+
+#define PIO_WAIT_BATCH_SIZE 5
+
+/* default send context sizes */
+static struct sc_config_sizes sc_config_sizes[SC_MAX] = {
+	[SC_KERNEL] = { .size  = SCS_POOL_0,	/* even divide, pool 0 */
+			.count = SCC_PER_VL },/* one per NUMA */
+	[SC_ACK]    = { .size  = SCC_ACK_CREDITS,
+			.count = SCC_PER_KRCVQ },
+	[SC_USER]   = { .size  = SCS_POOL_0,	/* even divide, pool 0 */
+			.count = SCC_PER_CPU },	/* one per CPU */
+
+};
+
+/* send context memory pool configuration */
+struct mem_pool_config {
+	int centipercent;	/* % of memory, in 100ths of 1% */
+	int absolute_blocks;	/* absolute block count */
+};
+
+/* default memory pool configuration: 100% in pool 0 */
+static struct mem_pool_config sc_mem_pool_config[NUM_SC_POOLS] = {
+	/* centi%, abs blocks */
+	{  10000,     -1 },		/* pool 0 */
+	{      0,     -1 },		/* pool 1 */
+};
+
+/* memory pool information, used when calculating final sizes */
+struct mem_pool_info {
+	int centipercent;	/* 100th of 1% of memory to use, -1 if blocks
+				   already set */
+	int count;		/* count of contexts in the pool */
+	int blocks;		/* block size of the pool */
+	int size;		/* context size, in blocks */
+};
+
+/*
+ * Convert a pool wildcard to a valid pool index.  The wildcards
+ * start at -1 and increase negatively.  Map them as:
+ *	-1 => 0
+ *	-2 => 1
+ *	etc.
+ *
+ * Return -1 on non-wildcard input, otherwise convert to a pool number.
+ */
+static int wildcard_to_pool(int wc)
+{
+	if (wc >= 0)
+		return -1;	/* non-wildcard */
+	return -wc - 1;
+}
+
+static const char *sc_type_names[SC_MAX] = {
+	"kernel",
+	"ack",
+	"user"
+};
+
+static const char *sc_type_name(int index)
+{
+	if (index < 0 || index >= SC_MAX)
+		return "unknown";
+	return sc_type_names[index];
+}
+
+/*
+ * Read the send context memory pool configuration and send context
+ * size configuration.  Replace any wildcards and come up with final
+ * counts and sizes for the send context types.
+ */
+int init_sc_pools_and_sizes(struct hfi1_devdata *dd)
+{
+	struct mem_pool_info mem_pool_info[NUM_SC_POOLS] = { { 0 } };
+	int total_blocks = (dd->chip_pio_mem_size / PIO_BLOCK_SIZE) - 1;
+	int total_contexts = 0;
+	int fixed_blocks;
+	int pool_blocks;
+	int used_blocks;
+	int cp_total;		/* centipercent total */
+	int ab_total;		/* absolute block total */
+	int extra;
+	int i;
+
+	/*
+	 * Step 0:
+	 *	- copy the centipercents/absolute sizes from the pool config
+	 *	- sanity check these values
+	 *	- add up centipercents, then later check for full value
+	 *	- add up absolute blocks, then later check for over-commit
+	 */
+	cp_total = 0;
+	ab_total = 0;
+	for (i = 0; i < NUM_SC_POOLS; i++) {
+		int cp = sc_mem_pool_config[i].centipercent;
+		int ab = sc_mem_pool_config[i].absolute_blocks;
+
+		/*
+		 * A negative value is "unused" or "invalid".  Both *can*
+		 * be valid, but centipercent wins, so check that first
+		 */
+		if (cp >= 0) {			/* centipercent valid */
+			cp_total += cp;
+		} else if (ab >= 0) {		/* absolute blocks valid */
+			ab_total += ab;
+		} else {			/* neither valid */
+			dd_dev_err(
+				dd,
+				"Send context memory pool %d: both the block count and centipercent are invalid\n",
+				i);
+			return -EINVAL;
+		}
+
+		mem_pool_info[i].centipercent = cp;
+		mem_pool_info[i].blocks = ab;
+	}
+
+	/* do not use both % and absolute blocks for different pools */
+	if (cp_total != 0 && ab_total != 0) {
+		dd_dev_err(
+			dd,
+			"All send context memory pools must be described as either centipercent or blocks, no mixing between pools\n");
+		return -EINVAL;
+	}
+
+	/* if any percentages are present, they must add up to 100% x 100 */
+	if (cp_total != 0 && cp_total != 10000) {
+		dd_dev_err(
+			dd,
+			"Send context memory pool centipercent is %d, expecting 10000\n",
+			cp_total);
+		return -EINVAL;
+	}
+
+	/* the absolute pool total cannot be more than the mem total */
+	if (ab_total > total_blocks) {
+		dd_dev_err(
+			dd,
+			"Send context memory pool absolute block count %d is larger than the memory size %d\n",
+			ab_total, total_blocks);
+		return -EINVAL;
+	}
+
+	/*
+	 * Step 2:
+	 *	- copy from the context size config
+	 *	- replace context type wildcard counts with real values
+	 *	- add up non-memory pool block sizes
+	 *	- add up memory pool user counts
+	 */
+	fixed_blocks = 0;
+	for (i = 0; i < SC_MAX; i++) {
+		int count = sc_config_sizes[i].count;
+		int size = sc_config_sizes[i].size;
+		int pool;
+
+		/*
+		 * Sanity check count: Either a positive value or
+		 * one of the expected wildcards is valid.  The positive
+		 * value is checked later when we compare against total
+		 * memory available.
+		 */
+		if (i == SC_ACK) {
+			count = dd->n_krcv_queues;
+		} else if (i == SC_KERNEL) {
+			count = num_vls + 1 /* VL15 */;
+		} else if (count == SCC_PER_CPU) {
+			count = dd->num_rcv_contexts - dd->n_krcv_queues;
+		} else if (count < 0) {
+			dd_dev_err(
+				dd,
+				"%s send context invalid count wildcard %d\n",
+				sc_type_name(i), count);
+			return -EINVAL;
+		}
+		if (total_contexts + count > dd->chip_send_contexts)
+			count = dd->chip_send_contexts - total_contexts;
+
+		total_contexts += count;
+
+		/*
+		 * Sanity check pool: The conversion will return a pool
+		 * number or -1 if a fixed (non-negative) value.  The fixed
+		 * value is checked later when we compare against
+		 * total memory available.
+		 */
+		pool = wildcard_to_pool(size);
+		if (pool == -1) {			/* non-wildcard */
+			fixed_blocks += size * count;
+		} else if (pool < NUM_SC_POOLS) {	/* valid wildcard */
+			mem_pool_info[pool].count += count;
+		} else {				/* invalid wildcard */
+			dd_dev_err(
+				dd,
+				"%s send context invalid pool wildcard %d\n",
+				sc_type_name(i), size);
+			return -EINVAL;
+		}
+
+		dd->sc_sizes[i].count = count;
+		dd->sc_sizes[i].size = size;
+	}
+	if (fixed_blocks > total_blocks) {
+		dd_dev_err(
+			dd,
+			"Send context fixed block count, %u, larger than total block count %u\n",
+			fixed_blocks, total_blocks);
+		return -EINVAL;
+	}
+
+	/* step 3: calculate the blocks in the pools, and pool context sizes */
+	pool_blocks = total_blocks - fixed_blocks;
+	if (ab_total > pool_blocks) {
+		dd_dev_err(
+			dd,
+			"Send context fixed pool sizes, %u, larger than pool block count %u\n",
+			ab_total, pool_blocks);
+		return -EINVAL;
+	}
+	/* subtract off the fixed pool blocks */
+	pool_blocks -= ab_total;
+
+	for (i = 0; i < NUM_SC_POOLS; i++) {
+		struct mem_pool_info *pi = &mem_pool_info[i];
+
+		/* % beats absolute blocks */
+		if (pi->centipercent >= 0)
+			pi->blocks = (pool_blocks * pi->centipercent) / 10000;
+
+		if (pi->blocks == 0 && pi->count != 0) {
+			dd_dev_err(
+				dd,
+				"Send context memory pool %d has %u contexts, but no blocks\n",
+				i, pi->count);
+			return -EINVAL;
+		}
+		if (pi->count == 0) {
+			/* warn about wasted blocks */
+			if (pi->blocks != 0)
+				dd_dev_err(
+					dd,
+					"Send context memory pool %d has %u blocks, but zero contexts\n",
+					i, pi->blocks);
+			pi->size = 0;
+		} else {
+			pi->size = pi->blocks / pi->count;
+		}
+	}
+
+	/* step 4: fill in the context type sizes from the pool sizes */
+	used_blocks = 0;
+	for (i = 0; i < SC_MAX; i++) {
+		if (dd->sc_sizes[i].size < 0) {
+			unsigned pool = wildcard_to_pool(dd->sc_sizes[i].size);
+
+			WARN_ON_ONCE(pool >= NUM_SC_POOLS);
+			dd->sc_sizes[i].size = mem_pool_info[pool].size;
+		}
+		/* make sure we are not larger than what is allowed by the HW */
+#define PIO_MAX_BLOCKS 1024
+		if (dd->sc_sizes[i].size > PIO_MAX_BLOCKS)
+			dd->sc_sizes[i].size = PIO_MAX_BLOCKS;
+
+		/* calculate our total usage */
+		used_blocks += dd->sc_sizes[i].size * dd->sc_sizes[i].count;
+	}
+	extra = total_blocks - used_blocks;
+	if (extra != 0)
+		dd_dev_info(dd, "unused send context blocks: %d\n", extra);
+
+	return total_contexts;
+}
+
+int init_send_contexts(struct hfi1_devdata *dd)
+{
+	u16 base;
+	int ret, i, j, context;
+
+	ret = init_credit_return(dd);
+	if (ret)
+		return ret;
+
+	dd->hw_to_sw = kmalloc_array(TXE_NUM_CONTEXTS, sizeof(u8),
+					GFP_KERNEL);
+	dd->send_contexts = kcalloc(dd->num_send_contexts,
+					sizeof(struct send_context_info),
+					GFP_KERNEL);
+	if (!dd->send_contexts || !dd->hw_to_sw) {
+		dd_dev_err(dd, "Unable to allocate send context arrays\n");
+		kfree(dd->hw_to_sw);
+		kfree(dd->send_contexts);
+		free_credit_return(dd);
+		return -ENOMEM;
+	}
+
+	/* hardware context map starts with invalid send context indices */
+	for (i = 0; i < TXE_NUM_CONTEXTS; i++)
+		dd->hw_to_sw[i] = INVALID_SCI;
+
+	/*
+	 * All send contexts have their credit sizes.  Allocate credits
+	 * for each context one after another from the global space.
+	 */
+	context = 0;
+	base = 1;
+	for (i = 0; i < SC_MAX; i++) {
+		struct sc_config_sizes *scs = &dd->sc_sizes[i];
+
+		for (j = 0; j < scs->count; j++) {
+			struct send_context_info *sci =
+						&dd->send_contexts[context];
+			sci->type = i;
+			sci->base = base;
+			sci->credits = scs->size;
+
+			context++;
+			base += scs->size;
+		}
+	}
+
+	return 0;
+}
+
+/*
+ * Allocate a software index and hardware context of the given type.
+ *
+ * Must be called with dd->sc_lock held.
+ */
+static int sc_hw_alloc(struct hfi1_devdata *dd, int type, u32 *sw_index,
+		       u32 *hw_context)
+{
+	struct send_context_info *sci;
+	u32 index;
+	u32 context;
+
+	for (index = 0, sci = &dd->send_contexts[0];
+			index < dd->num_send_contexts; index++, sci++) {
+		if (sci->type == type && sci->allocated == 0) {
+			sci->allocated = 1;
+			/* use a 1:1 mapping, but make them non-equal */
+			context = dd->chip_send_contexts - index - 1;
+			dd->hw_to_sw[context] = index;
+			*sw_index = index;
+			*hw_context = context;
+			return 0; /* success */
+		}
+	}
+	dd_dev_err(dd, "Unable to locate a free type %d send context\n", type);
+	return -ENOSPC;
+}
+
+/*
+ * Free the send context given by its software index.
+ *
+ * Must be called with dd->sc_lock held.
+ */
+static void sc_hw_free(struct hfi1_devdata *dd, u32 sw_index, u32 hw_context)
+{
+	struct send_context_info *sci;
+
+	sci = &dd->send_contexts[sw_index];
+	if (!sci->allocated) {
+		dd_dev_err(dd, "%s: sw_index %u not allocated? hw_context %u\n",
+			__func__, sw_index, hw_context);
+	}
+	sci->allocated = 0;
+	dd->hw_to_sw[hw_context] = INVALID_SCI;
+}
+
+/* return the base context of a context in a group */
+static inline u32 group_context(u32 context, u32 group)
+{
+	return (context >> group) << group;
+}
+
+/* return the size of a group */
+static inline u32 group_size(u32 group)
+{
+	return 1 << group;
+}
+
+/*
+ * Obtain the credit return addresses, kernel virtual and physical, for the
+ * given sc.
+ *
+ * To understand this routine:
+ * o va and pa are arrays of struct credit_return.  One for each physical
+ *   send context, per NUMA.
+ * o Each send context always looks in its relative location in a struct
+ *   credit_return for its credit return.
+ * o Each send context in a group must have its return address CSR programmed
+ *   with the same value.  Use the address of the first send context in the
+ *   group.
+ */
+static void cr_group_addresses(struct send_context *sc, dma_addr_t *pa)
+{
+	u32 gc = group_context(sc->hw_context, sc->group);
+	u32 index = sc->hw_context & 0x7;
+
+	sc->hw_free = &sc->dd->cr_base[sc->node].va[gc].cr[index];
+	*pa = (unsigned long)
+	       &((struct credit_return *)sc->dd->cr_base[sc->node].pa)[gc];
+}
+
+/*
+ * Work queue function triggered in error interrupt routine for
+ * kernel contexts.
+ */
+static void sc_halted(struct work_struct *work)
+{
+	struct send_context *sc;
+
+	sc = container_of(work, struct send_context, halt_work);
+	sc_restart(sc);
+}
+
+/*
+ * Calculate PIO block threshold for this send context using the given MTU.
+ * Trigger a return when one MTU plus optional header of credits remain.
+ *
+ * Parameter mtu is in bytes.
+ * Parameter hdrqentsize is in DWORDs.
+ *
+ * Return value is what to write into the CSR: trigger return when
+ * unreturned credits pass this count.
+ */
+u32 sc_mtu_to_threshold(struct send_context *sc, u32 mtu, u32 hdrqentsize)
+{
+	u32 release_credits;
+	u32 threshold;
+
+	/* add in the header size, then divide by the PIO block size */
+	mtu += hdrqentsize << 2;
+	release_credits = DIV_ROUND_UP(mtu, PIO_BLOCK_SIZE);
+
+	/* check against this context's credits */
+	if (sc->credits <= release_credits)
+		threshold = 1;
+	else
+		threshold = sc->credits - release_credits;
+
+	return threshold;
+}
+
+/*
+ * Calculate credit threshold in terms of percent of the allocated credits.
+ * Trigger when unreturned credits equal or exceed the percentage of the whole.
+ *
+ * Return value is what to write into the CSR: trigger return when
+ * unreturned credits pass this count.
+ */
+static u32 sc_percent_to_threshold(struct send_context *sc, u32 percent)
+{
+	return (sc->credits * percent) / 100;
+}
+
+/*
+ * Set the credit return threshold.
+ */
+void sc_set_cr_threshold(struct send_context *sc, u32 new_threshold)
+{
+	unsigned long flags;
+	u32 old_threshold;
+	int force_return = 0;
+
+	spin_lock_irqsave(&sc->credit_ctrl_lock, flags);
+
+	old_threshold = (sc->credit_ctrl >>
+				SC(CREDIT_CTRL_THRESHOLD_SHIFT))
+			 & SC(CREDIT_CTRL_THRESHOLD_MASK);
+
+	if (new_threshold != old_threshold) {
+		sc->credit_ctrl =
+			(sc->credit_ctrl
+				& ~SC(CREDIT_CTRL_THRESHOLD_SMASK))
+			| ((new_threshold
+				& SC(CREDIT_CTRL_THRESHOLD_MASK))
+			   << SC(CREDIT_CTRL_THRESHOLD_SHIFT));
+		write_kctxt_csr(sc->dd, sc->hw_context,
+			SC(CREDIT_CTRL), sc->credit_ctrl);
+
+		/* force a credit return on change to avoid a possible stall */
+		force_return = 1;
+	}
+
+	spin_unlock_irqrestore(&sc->credit_ctrl_lock, flags);
+
+	if (force_return)
+		sc_return_credits(sc);
+}
+
+/*
+ * set_pio_integrity
+ *
+ * Set the CHECK_ENABLE register for the send context 'sc'.
+ */
+void set_pio_integrity(struct send_context *sc)
+{
+	struct hfi1_devdata *dd = sc->dd;
+	u64 reg = 0;
+	u32 hw_context = sc->hw_context;
+	int type = sc->type;
+
+	/*
+	 * No integrity checks if HFI1_CAP_NO_INTEGRITY is set, or if
+	 * we're snooping.
+	 */
+	if (likely(!HFI1_CAP_IS_KSET(NO_INTEGRITY)) &&
+	    dd->hfi1_snoop.mode_flag != HFI1_PORT_SNOOP_MODE)
+		reg = hfi1_pkt_default_send_ctxt_mask(dd, type);
+
+	write_kctxt_csr(dd, hw_context, SC(CHECK_ENABLE), reg);
+}
+
+/*
+ * Allocate a NUMA relative send context structure of the given type along
+ * with a HW context.
+ */
+struct send_context *sc_alloc(struct hfi1_devdata *dd, int type,
+			      uint hdrqentsize, int numa)
+{
+	struct send_context_info *sci;
+	struct send_context *sc;
+	dma_addr_t pa;
+	unsigned long flags;
+	u64 reg;
+	u32 thresh;
+	u32 sw_index;
+	u32 hw_context;
+	int ret;
+	u8 opval, opmask;
+
+	/* do not allocate while frozen */
+	if (dd->flags & HFI1_FROZEN)
+		return NULL;
+
+	sc = kzalloc_node(sizeof(struct send_context), GFP_KERNEL, numa);
+	if (!sc) {
+		dd_dev_err(dd, "Cannot allocate send context structure\n");
+		return NULL;
+	}
+
+	spin_lock_irqsave(&dd->sc_lock, flags);
+	ret = sc_hw_alloc(dd, type, &sw_index, &hw_context);
+	if (ret) {
+		spin_unlock_irqrestore(&dd->sc_lock, flags);
+		kfree(sc);
+		return NULL;
+	}
+
+	sci = &dd->send_contexts[sw_index];
+	sci->sc = sc;
+
+	sc->dd = dd;
+	sc->node = numa;
+	sc->type = type;
+	spin_lock_init(&sc->alloc_lock);
+	spin_lock_init(&sc->release_lock);
+	spin_lock_init(&sc->credit_ctrl_lock);
+	INIT_LIST_HEAD(&sc->piowait);
+	INIT_WORK(&sc->halt_work, sc_halted);
+	atomic_set(&sc->buffers_allocated, 0);
+	init_waitqueue_head(&sc->halt_wait);
+
+	/* grouping is always single context for now */
+	sc->group = 0;
+
+	sc->sw_index = sw_index;
+	sc->hw_context = hw_context;
+	cr_group_addresses(sc, &pa);
+	sc->credits = sci->credits;
+
+/* PIO Send Memory Address details */
+#define PIO_ADDR_CONTEXT_MASK 0xfful
+#define PIO_ADDR_CONTEXT_SHIFT 16
+	sc->base_addr = dd->piobase + ((hw_context & PIO_ADDR_CONTEXT_MASK)
+					<< PIO_ADDR_CONTEXT_SHIFT);
+
+	/* set base and credits */
+	reg = ((sci->credits & SC(CTRL_CTXT_DEPTH_MASK))
+					<< SC(CTRL_CTXT_DEPTH_SHIFT))
+		| ((sci->base & SC(CTRL_CTXT_BASE_MASK))
+					<< SC(CTRL_CTXT_BASE_SHIFT));
+	write_kctxt_csr(dd, hw_context, SC(CTRL), reg);
+
+	set_pio_integrity(sc);
+
+	/* unmask all errors */
+	write_kctxt_csr(dd, hw_context, SC(ERR_MASK), (u64)-1);
+
+	/* set the default partition key */
+	write_kctxt_csr(dd, hw_context, SC(CHECK_PARTITION_KEY),
+		(DEFAULT_PKEY &
+			SC(CHECK_PARTITION_KEY_VALUE_MASK))
+		    << SC(CHECK_PARTITION_KEY_VALUE_SHIFT));
+
+	/* per context type checks */
+	if (type == SC_USER) {
+		opval = USER_OPCODE_CHECK_VAL;
+		opmask = USER_OPCODE_CHECK_MASK;
+	} else {
+		opval = OPCODE_CHECK_VAL_DISABLED;
+		opmask = OPCODE_CHECK_MASK_DISABLED;
+	}
+
+	/* set the send context check opcode mask and value */
+	write_kctxt_csr(dd, hw_context, SC(CHECK_OPCODE),
+		((u64)opmask << SC(CHECK_OPCODE_MASK_SHIFT)) |
+		((u64)opval << SC(CHECK_OPCODE_VALUE_SHIFT)));
+
+	/* set up credit return */
+	reg = pa & SC(CREDIT_RETURN_ADDR_ADDRESS_SMASK);
+	write_kctxt_csr(dd, hw_context, SC(CREDIT_RETURN_ADDR), reg);
+
+	/*
+	 * Calculate the initial credit return threshold.
+	 *
+	 * For Ack contexts, set a threshold for half the credits.
+	 * For User contexts use the given percentage.  This has been
+	 * sanitized on driver start-up.
+	 * For Kernel contexts, use the default MTU plus a header.
+	 */
+	if (type == SC_ACK) {
+		thresh = sc_percent_to_threshold(sc, 50);
+	} else if (type == SC_USER) {
+		thresh = sc_percent_to_threshold(sc,
+				user_credit_return_threshold);
+	} else { /* kernel */
+		thresh = sc_mtu_to_threshold(sc, hfi1_max_mtu, hdrqentsize);
+	}
+	reg = thresh << SC(CREDIT_CTRL_THRESHOLD_SHIFT);
+	/* add in early return */
+	if (type == SC_USER && HFI1_CAP_IS_USET(EARLY_CREDIT_RETURN))
+		reg |= SC(CREDIT_CTRL_EARLY_RETURN_SMASK);
+	else if (HFI1_CAP_IS_KSET(EARLY_CREDIT_RETURN)) /* kernel, ack */
+		reg |= SC(CREDIT_CTRL_EARLY_RETURN_SMASK);
+
+	/* set up write-through credit_ctrl */
+	sc->credit_ctrl = reg;
+	write_kctxt_csr(dd, hw_context, SC(CREDIT_CTRL), reg);
+
+	/* User send contexts should not allow sending on VL15 */
+	if (type == SC_USER) {
+		reg = 1ULL << 15;
+		write_kctxt_csr(dd, hw_context, SC(CHECK_VL), reg);
+	}
+
+	spin_unlock_irqrestore(&dd->sc_lock, flags);
+
+	/*
+	 * Allocate shadow ring to track outstanding PIO buffers _after_
+	 * unlocking.  We don't know the size until the lock is held and
+	 * we can't allocate while the lock is held.  No one is using
+	 * the context yet, so allocate it now.
+	 *
+	 * User contexts do not get a shadow ring.
+	 */
+	if (type != SC_USER) {
+		/*
+		 * Size the shadow ring 1 larger than the number of credits
+		 * so head == tail can mean empty.
+		 */
+		sc->sr_size = sci->credits + 1;
+		sc->sr = kzalloc_node(sizeof(union pio_shadow_ring) *
+				sc->sr_size, GFP_KERNEL, numa);
+		if (!sc->sr) {
+			dd_dev_err(dd,
+				"Cannot allocate send context shadow ring structure\n");
+			sc_free(sc);
+			return NULL;
+		}
+	}
+
+	dd_dev_info(dd,
+		"Send context %u(%u) %s group %u credits %u credit_ctrl 0x%llx threshold %u\n",
+		sw_index,
+		hw_context,
+		sc_type_name(type),
+		sc->group,
+		sc->credits,
+		sc->credit_ctrl,
+		thresh);
+
+	return sc;
+}
+
+/* free a per-NUMA send context structure */
+void sc_free(struct send_context *sc)
+{
+	struct hfi1_devdata *dd;
+	unsigned long flags;
+	u32 sw_index;
+	u32 hw_context;
+
+	if (!sc)
+		return;
+
+	sc->flags |= SCF_IN_FREE;	/* ensure no restarts */
+	dd = sc->dd;
+	if (!list_empty(&sc->piowait))
+		dd_dev_err(dd, "piowait list not empty!\n");
+	sw_index = sc->sw_index;
+	hw_context = sc->hw_context;
+	sc_disable(sc);	/* make sure the HW is disabled */
+	flush_work(&sc->halt_work);
+
+	spin_lock_irqsave(&dd->sc_lock, flags);
+	dd->send_contexts[sw_index].sc = NULL;
+
+	/* clear/disable all registers set in sc_alloc */
+	write_kctxt_csr(dd, hw_context, SC(CTRL), 0);
+	write_kctxt_csr(dd, hw_context, SC(CHECK_ENABLE), 0);
+	write_kctxt_csr(dd, hw_context, SC(ERR_MASK), 0);
+	write_kctxt_csr(dd, hw_context, SC(CHECK_PARTITION_KEY), 0);
+	write_kctxt_csr(dd, hw_context, SC(CHECK_OPCODE), 0);
+	write_kctxt_csr(dd, hw_context, SC(CREDIT_RETURN_ADDR), 0);
+	write_kctxt_csr(dd, hw_context, SC(CREDIT_CTRL), 0);
+
+	/* release the index and context for re-use */
+	sc_hw_free(dd, sw_index, hw_context);
+	spin_unlock_irqrestore(&dd->sc_lock, flags);
+
+	kfree(sc->sr);
+	kfree(sc);
+}
+
+/* disable the context */
+void sc_disable(struct send_context *sc)
+{
+	u64 reg;
+	unsigned long flags;
+	struct pio_buf *pbuf;
+
+	if (!sc)
+		return;
+
+	/* do all steps, even if already disabled */
+	spin_lock_irqsave(&sc->alloc_lock, flags);
+	reg = read_kctxt_csr(sc->dd, sc->hw_context, SC(CTRL));
+	reg &= ~SC(CTRL_CTXT_ENABLE_SMASK);
+	sc->flags &= ~SCF_ENABLED;
+	sc_wait_for_packet_egress(sc, 1);
+	write_kctxt_csr(sc->dd, sc->hw_context, SC(CTRL), reg);
+	spin_unlock_irqrestore(&sc->alloc_lock, flags);
+
+	/*
+	 * Flush any waiters.  Once the context is disabled,
+	 * credit return interrupts are stopped (although there
+	 * could be one in-process when the context is disabled).
+	 * Wait one microsecond for any lingering interrupts, then
+	 * proceed with the flush.
+	 */
+	udelay(1);
+	spin_lock_irqsave(&sc->release_lock, flags);
+	if (sc->sr) {	/* this context has a shadow ring */
+		while (sc->sr_tail != sc->sr_head) {
+			pbuf = &sc->sr[sc->sr_tail].pbuf;
+			if (pbuf->cb)
+				(*pbuf->cb)(pbuf->arg, PRC_SC_DISABLE);
+			sc->sr_tail++;
+			if (sc->sr_tail >= sc->sr_size)
+				sc->sr_tail = 0;
+		}
+	}
+	spin_unlock_irqrestore(&sc->release_lock, flags);
+}
+
+/* return SendEgressCtxtStatus.PacketOccupancy */
+#define packet_occupancy(r) \
+	(((r) & SEND_EGRESS_CTXT_STATUS_CTXT_EGRESS_PACKET_OCCUPANCY_SMASK)\
+	>> SEND_EGRESS_CTXT_STATUS_CTXT_EGRESS_PACKET_OCCUPANCY_SHIFT)
+
+/* is egress halted on the context? */
+#define egress_halted(r) \
+	((r) & SEND_EGRESS_CTXT_STATUS_CTXT_EGRESS_HALT_STATUS_SMASK)
+
+/* wait for packet egress, optionally pause for credit return  */
+static void sc_wait_for_packet_egress(struct send_context *sc, int pause)
+{
+	struct hfi1_devdata *dd = sc->dd;
+	u64 reg;
+	u32 loop = 0;
+
+	while (1) {
+		reg = read_csr(dd, sc->hw_context * 8 +
+			       SEND_EGRESS_CTXT_STATUS);
+		/* done if egress is stopped */
+		if (egress_halted(reg))
+			break;
+		reg = packet_occupancy(reg);
+		if (reg == 0)
+			break;
+		if (loop > 100) {
+			dd_dev_err(dd,
+				"%s: context %u(%u) timeout waiting for packets to egress, remaining count %u\n",
+				__func__, sc->sw_index,
+				sc->hw_context, (u32)reg);
+			break;
+		}
+		loop++;
+		udelay(1);
+	}
+
+	if (pause)
+		/* Add additional delay to ensure chip returns all credits */
+		pause_for_credit_return(dd);
+}
+
+void sc_wait(struct hfi1_devdata *dd)
+{
+	int i;
+
+	for (i = 0; i < dd->num_send_contexts; i++) {
+		struct send_context *sc = dd->send_contexts[i].sc;
+
+		if (!sc)
+			continue;
+		sc_wait_for_packet_egress(sc, 0);
+	}
+}
+
+/*
+ * Restart a context after it has been halted due to error.
+ *
+ * If the first step fails - wait for the halt to be asserted, return early.
+ * Otherwise complain about timeouts but keep going.
+ *
+ * It is expected that allocations (enabled flag bit) have been shut off
+ * already (only applies to kernel contexts).
+ */
+int sc_restart(struct send_context *sc)
+{
+	struct hfi1_devdata *dd = sc->dd;
+	u64 reg;
+	u32 loop;
+	int count;
+
+	/* bounce off if not halted, or being free'd */
+	if (!(sc->flags & SCF_HALTED) || (sc->flags & SCF_IN_FREE))
+		return -EINVAL;
+
+	dd_dev_info(dd, "restarting send context %u(%u)\n", sc->sw_index,
+		sc->hw_context);
+
+	/*
+	 * Step 1: Wait for the context to actually halt.
+	 *
+	 * The error interrupt is asynchronous to actually setting halt
+	 * on the context.
+	 */
+	loop = 0;
+	while (1) {
+		reg = read_kctxt_csr(dd, sc->hw_context, SC(STATUS));
+		if (reg & SC(STATUS_CTXT_HALTED_SMASK))
+			break;
+		if (loop > 100) {
+			dd_dev_err(dd, "%s: context %u(%u) not halting, skipping\n",
+				__func__, sc->sw_index, sc->hw_context);
+			return -ETIME;
+		}
+		loop++;
+		udelay(1);
+	}
+
+	/*
+	 * Step 2: Ensure no users are still trying to write to PIO.
+	 *
+	 * For kernel contexts, we have already turned off buffer allocation.
+	 * Now wait for the buffer count to go to zero.
+	 *
+	 * For user contexts, the user handling code has cut off write access
+	 * to the context's PIO pages before calling this routine and will
+	 * restore write access after this routine returns.
+	 */
+	if (sc->type != SC_USER) {
+		/* kernel context */
+		loop = 0;
+		while (1) {
+			count = atomic_read(&sc->buffers_allocated);
+			if (count == 0)
+				break;
+			if (loop > 100) {
+				dd_dev_err(dd,
+					"%s: context %u(%u) timeout waiting for PIO buffers to zero, remaining %d\n",
+					__func__, sc->sw_index,
+					sc->hw_context, count);
+			}
+			loop++;
+			udelay(1);
+		}
+	}
+
+	/*
+	 * Step 3: Wait for all packets to egress.
+	 * This is done while disabling the send context
+	 *
+	 * Step 4: Disable the context
+	 *
+	 * This is a superset of the halt.  After the disable, the
+	 * errors can be cleared.
+	 */
+	sc_disable(sc);
+
+	/*
+	 * Step 5: Enable the context
+	 *
+	 * This enable will clear the halted flag and per-send context
+	 * error flags.
+	 */
+	return sc_enable(sc);
+}
+
+/*
+ * PIO freeze processing.  To be called after the TXE block is fully frozen.
+ * Go through all frozen send contexts and disable them.  The contexts are
+ * already stopped by the freeze.
+ */
+void pio_freeze(struct hfi1_devdata *dd)
+{
+	struct send_context *sc;
+	int i;
+
+	for (i = 0; i < dd->num_send_contexts; i++) {
+		sc = dd->send_contexts[i].sc;
+		/*
+		 * Don't disable unallocated, unfrozen, or user send contexts.
+		 * User send contexts will be disabled when the process
+		 * calls into the driver to reset its context.
+		 */
+		if (!sc || !(sc->flags & SCF_FROZEN) || sc->type == SC_USER)
+			continue;
+
+		/* only need to disable, the context is already stopped */
+		sc_disable(sc);
+	}
+}
+
+/*
+ * Unfreeze PIO for kernel send contexts.  The precondition for calling this
+ * is that all PIO send contexts have been disabled and the SPC freeze has
+ * been cleared.  Now perform the last step and re-enable each kernel context.
+ * User (PSM) processing will occur when PSM calls into the kernel to
+ * acknowledge the freeze.
+ */
+void pio_kernel_unfreeze(struct hfi1_devdata *dd)
+{
+	struct send_context *sc;
+	int i;
+
+	for (i = 0; i < dd->num_send_contexts; i++) {
+		sc = dd->send_contexts[i].sc;
+		if (!sc || !(sc->flags & SCF_FROZEN) || sc->type == SC_USER)
+			continue;
+
+		sc_enable(sc);	/* will clear the sc frozen flag */
+	}
+}
+
+/*
+ * Wait for the SendPioInitCtxt.PioInitInProgress bit to clear.
+ * Returns:
+ *	-ETIMEDOUT - if we wait too long
+ *	-EIO	   - if there was an error
+ */
+static int pio_init_wait_progress(struct hfi1_devdata *dd)
+{
+	u64 reg;
+	int max, count = 0;
+
+	/* max is the longest possible HW init time / delay */
+	max = (dd->icode == ICODE_FPGA_EMULATION) ? 120 : 5;
+	while (1) {
+		reg = read_csr(dd, SEND_PIO_INIT_CTXT);
+		if (!(reg & SEND_PIO_INIT_CTXT_PIO_INIT_IN_PROGRESS_SMASK))
+			break;
+		if (count >= max)
+			return -ETIMEDOUT;
+		udelay(5);
+		count++;
+	}
+
+	return reg & SEND_PIO_INIT_CTXT_PIO_INIT_ERR_SMASK ? -EIO : 0;
+}
+
+/*
+ * Reset all of the send contexts to their power-on state.  Used
+ * only during manual init - no lock against sc_enable needed.
+ */
+void pio_reset_all(struct hfi1_devdata *dd)
+{
+	int ret;
+
+	/* make sure the init engine is not busy */
+	ret = pio_init_wait_progress(dd);
+	/* ignore any timeout */
+	if (ret == -EIO) {
+		/* clear the error */
+		write_csr(dd, SEND_PIO_ERR_CLEAR,
+			SEND_PIO_ERR_CLEAR_PIO_INIT_SM_IN_ERR_SMASK);
+	}
+
+	/* reset init all */
+	write_csr(dd, SEND_PIO_INIT_CTXT,
+			SEND_PIO_INIT_CTXT_PIO_ALL_CTXT_INIT_SMASK);
+	udelay(2);
+	ret = pio_init_wait_progress(dd);
+	if (ret < 0) {
+		dd_dev_err(dd,
+			"PIO send context init %s while initializing all PIO blocks\n",
+			ret == -ETIMEDOUT ? "is stuck" : "had an error");
+	}
+}
+
+/* enable the context */
+int sc_enable(struct send_context *sc)
+{
+	u64 sc_ctrl, reg, pio;
+	struct hfi1_devdata *dd;
+	unsigned long flags;
+	int ret = 0;
+
+	if (!sc)
+		return -EINVAL;
+	dd = sc->dd;
+
+	/*
+	 * Obtain the allocator lock to guard against any allocation
+	 * attempts (which should not happen prior to context being
+	 * enabled). On the release/disable side we don't need to
+	 * worry about locking since the releaser will not do anything
+	 * if the context accounting values have not changed.
+	 */
+	spin_lock_irqsave(&sc->alloc_lock, flags);
+	sc_ctrl = read_kctxt_csr(dd, sc->hw_context, SC(CTRL));
+	if ((sc_ctrl & SC(CTRL_CTXT_ENABLE_SMASK)))
+		goto unlock; /* already enabled */
+
+	/* IMPORTANT: only clear free and fill if transitioning 0 -> 1 */
+
+	*sc->hw_free = 0;
+	sc->free = 0;
+	sc->alloc_free = 0;
+	sc->fill = 0;
+	sc->sr_head = 0;
+	sc->sr_tail = 0;
+	sc->flags = 0;
+	atomic_set(&sc->buffers_allocated, 0);
+
+	/*
+	 * Clear all per-context errors.  Some of these will be set when
+	 * we are re-enabling after a context halt.  Now that the context
+	 * is disabled, the halt will not clear until after the PIO init
+	 * engine runs below.
+	 */
+	reg = read_kctxt_csr(dd, sc->hw_context, SC(ERR_STATUS));
+	if (reg)
+		write_kctxt_csr(dd, sc->hw_context, SC(ERR_CLEAR),
+			reg);
+
+	/*
+	 * The HW PIO initialization engine can handle only one init
+	 * request at a time. Serialize access to each device's engine.
+	 */
+	spin_lock(&dd->sc_init_lock);
+	/*
+	 * Since access to this code block is serialized and
+	 * each access waits for the initialization to complete
+	 * before releasing the lock, the PIO initialization engine
+	 * should not be in use, so we don't have to wait for the
+	 * InProgress bit to go down.
+	 */
+	pio = ((sc->hw_context & SEND_PIO_INIT_CTXT_PIO_CTXT_NUM_MASK) <<
+	       SEND_PIO_INIT_CTXT_PIO_CTXT_NUM_SHIFT) |
+		SEND_PIO_INIT_CTXT_PIO_SINGLE_CTXT_INIT_SMASK;
+	write_csr(dd, SEND_PIO_INIT_CTXT, pio);
+	/*
+	 * Wait until the engine is done.  Give the chip the required time
+	 * so, hopefully, we read the register just once.
+	 */
+	udelay(2);
+	ret = pio_init_wait_progress(dd);
+	spin_unlock(&dd->sc_init_lock);
+	if (ret) {
+		dd_dev_err(dd,
+			   "sctxt%u(%u): Context not enabled due to init failure %d\n",
+			   sc->sw_index, sc->hw_context, ret);
+		goto unlock;
+	}
+
+	/*
+	 * All is well. Enable the context.
+	 */
+	sc_ctrl |= SC(CTRL_CTXT_ENABLE_SMASK);
+	write_kctxt_csr(dd, sc->hw_context, SC(CTRL), sc_ctrl);
+	/*
+	 * Read SendCtxtCtrl to force the write out and prevent a timing
+	 * hazard where a PIO write may reach the context before the enable.
+	 */
+	read_kctxt_csr(dd, sc->hw_context, SC(CTRL));
+	sc->flags |= SCF_ENABLED;
+
+unlock:
+	spin_unlock_irqrestore(&sc->alloc_lock, flags);
+
+	return ret;
+}
+
+/* force a credit return on the context */
+void sc_return_credits(struct send_context *sc)
+{
+	if (!sc)
+		return;
+
+	/* a 0->1 transition schedules a credit return */
+	write_kctxt_csr(sc->dd, sc->hw_context, SC(CREDIT_FORCE),
+		SC(CREDIT_FORCE_FORCE_RETURN_SMASK));
+	/*
+	 * Ensure that the write is flushed and the credit return is
+	 * scheduled. We care more about the 0 -> 1 transition.
+	 */
+	read_kctxt_csr(sc->dd, sc->hw_context, SC(CREDIT_FORCE));
+	/* set back to 0 for next time */
+	write_kctxt_csr(sc->dd, sc->hw_context, SC(CREDIT_FORCE), 0);
+}
+
+/* allow all in-flight packets to drain on the context */
+void sc_flush(struct send_context *sc)
+{
+	if (!sc)
+		return;
+
+	sc_wait_for_packet_egress(sc, 1);
+}
+
+/* drop all packets on the context, no waiting until they are sent */
+void sc_drop(struct send_context *sc)
+{
+	if (!sc)
+		return;
+
+	dd_dev_info(sc->dd, "%s: context %u(%u) - not implemented\n",
+			__func__, sc->sw_index, sc->hw_context);
+}
+
+/*
+ * Start the software reaction to a context halt or SPC freeze:
+ *	- mark the context as halted or frozen
+ *	- stop buffer allocations
+ *
+ * Called from the error interrupt.  Other work is deferred until
+ * out of the interrupt.
+ */
+void sc_stop(struct send_context *sc, int flag)
+{
+	unsigned long flags;
+
+	/* mark the context */
+	sc->flags |= flag;
+
+	/* stop buffer allocations */
+	spin_lock_irqsave(&sc->alloc_lock, flags);
+	sc->flags &= ~SCF_ENABLED;
+	spin_unlock_irqrestore(&sc->alloc_lock, flags);
+	wake_up(&sc->halt_wait);
+}
+
+#define BLOCK_DWORDS (PIO_BLOCK_SIZE/sizeof(u32))
+#define dwords_to_blocks(x) DIV_ROUND_UP(x, BLOCK_DWORDS)
+
+/*
+ * The send context buffer "allocator".
+ *
+ * @sc: the PIO send context we are allocating from
+ * @len: length of whole packet - including PBC - in dwords
+ * @cb: optional callback to call when the buffer is finished sending
+ * @arg: argument for cb
+ *
+ * Return a pointer to a PIO buffer if successful, NULL if not enough room.
+ */
+struct pio_buf *sc_buffer_alloc(struct send_context *sc, u32 dw_len,
+				pio_release_cb cb, void *arg)
+{
+	struct pio_buf *pbuf = NULL;
+	unsigned long flags;
+	unsigned long avail;
+	unsigned long blocks = dwords_to_blocks(dw_len);
+	unsigned long start_fill;
+	int trycount = 0;
+	u32 head, next;
+
+	spin_lock_irqsave(&sc->alloc_lock, flags);
+	if (!(sc->flags & SCF_ENABLED)) {
+		spin_unlock_irqrestore(&sc->alloc_lock, flags);
+		goto done;
+	}
+
+retry:
+	avail = (unsigned long)sc->credits - (sc->fill - sc->alloc_free);
+	if (blocks > avail) {
+		/* not enough room */
+		if (unlikely(trycount))	{ /* already tried to get more room */
+			spin_unlock_irqrestore(&sc->alloc_lock, flags);
+			goto done;
+		}
+		/* copy from receiver cache line and recalculate */
+		sc->alloc_free = ACCESS_ONCE(sc->free);
+		avail =
+			(unsigned long)sc->credits -
+			(sc->fill - sc->alloc_free);
+		if (blocks > avail) {
+			/* still no room, actively update */
+			spin_unlock_irqrestore(&sc->alloc_lock, flags);
+			sc_release_update(sc);
+			spin_lock_irqsave(&sc->alloc_lock, flags);
+			sc->alloc_free = ACCESS_ONCE(sc->free);
+			trycount++;
+			goto retry;
+		}
+	}
+
+	/* there is enough room */
+
+	atomic_inc(&sc->buffers_allocated);
+
+	/* read this once */
+	head = sc->sr_head;
+
+	/* "allocate" the buffer */
+	start_fill = sc->fill;
+	sc->fill += blocks;
+
+	/*
+	 * Fill the parts that the releaser looks at before moving the head.
+	 * The only necessary piece is the sent_at field.  The credits
+	 * we have just allocated cannot have been returned yet, so the
+	 * cb and arg will not be looked at for a "while".  Put them
+	 * on this side of the memory barrier anyway.
+	 */
+	pbuf = &sc->sr[head].pbuf;
+	pbuf->sent_at = sc->fill;
+	pbuf->cb = cb;
+	pbuf->arg = arg;
+	pbuf->sc = sc;	/* could be filled in at sc->sr init time */
+	/* make sure this is in memory before updating the head */
+
+	/* calculate next head index, do not store */
+	next = head + 1;
+	if (next >= sc->sr_size)
+		next = 0;
+	/* update the head - must be last! - the releaser can look at fields
+	   in pbuf once we move the head */
+	smp_wmb();
+	sc->sr_head = next;
+	spin_unlock_irqrestore(&sc->alloc_lock, flags);
+
+	/* finish filling in the buffer outside the lock */
+	pbuf->start = sc->base_addr + ((start_fill % sc->credits)
+							* PIO_BLOCK_SIZE);
+	pbuf->size = sc->credits * PIO_BLOCK_SIZE;
+	pbuf->end = sc->base_addr + pbuf->size;
+	pbuf->block_count = blocks;
+	pbuf->qw_written = 0;
+	pbuf->carry_bytes = 0;
+	pbuf->carry.val64 = 0;
+done:
+	return pbuf;
+}
+
+/*
+ * There are at least two entities that can turn on credit return
+ * interrupts and they can overlap.  Avoid problems by implementing
+ * a count scheme that is enforced by a lock.  The lock is needed because
+ * the count and CSR write must be paired.
+ */
+
+/*
+ * Start credit return interrupts.  This is managed by a count.  If already
+ * on, just increment the count.
+ */
+void sc_add_credit_return_intr(struct send_context *sc)
+{
+	unsigned long flags;
+
+	/* lock must surround both the count change and the CSR update */
+	spin_lock_irqsave(&sc->credit_ctrl_lock, flags);
+	if (sc->credit_intr_count == 0) {
+		sc->credit_ctrl |= SC(CREDIT_CTRL_CREDIT_INTR_SMASK);
+		write_kctxt_csr(sc->dd, sc->hw_context,
+			SC(CREDIT_CTRL), sc->credit_ctrl);
+	}
+	sc->credit_intr_count++;
+	spin_unlock_irqrestore(&sc->credit_ctrl_lock, flags);
+}
+
+/*
+ * Stop credit return interrupts.  This is managed by a count.  Decrement the
+ * count, if the last user, then turn the credit interrupts off.
+ */
+void sc_del_credit_return_intr(struct send_context *sc)
+{
+	unsigned long flags;
+
+	WARN_ON(sc->credit_intr_count == 0);
+
+	/* lock must surround both the count change and the CSR update */
+	spin_lock_irqsave(&sc->credit_ctrl_lock, flags);
+	sc->credit_intr_count--;
+	if (sc->credit_intr_count == 0) {
+		sc->credit_ctrl &= ~SC(CREDIT_CTRL_CREDIT_INTR_SMASK);
+		write_kctxt_csr(sc->dd, sc->hw_context,
+			SC(CREDIT_CTRL), sc->credit_ctrl);
+	}
+	spin_unlock_irqrestore(&sc->credit_ctrl_lock, flags);
+}
+
+/*
+ * The caller must be careful when calling this.  All needint calls
+ * must be paired with !needint.
+ */
+void hfi1_sc_wantpiobuf_intr(struct send_context *sc, u32 needint)
+{
+	if (needint)
+		sc_add_credit_return_intr(sc);
+	else
+		sc_del_credit_return_intr(sc);
+	trace_hfi1_wantpiointr(sc, needint, sc->credit_ctrl);
+	if (needint) {
+		mmiowb();
+		sc_return_credits(sc);
+	}
+}
+
+/**
+ * sc_piobufavail - callback when a PIO buffer is available
+ * @sc: the send context
+ *
+ * This is called from the interrupt handler when a PIO buffer is
+ * available after hfi1_verbs_send() returned an error that no buffers were
+ * available. Disable the interrupt if there are no more QPs waiting.
+ */
+static void sc_piobufavail(struct send_context *sc)
+{
+	struct hfi1_devdata *dd = sc->dd;
+	struct hfi1_ibdev *dev = &dd->verbs_dev;
+	struct list_head *list;
+	struct hfi1_qp *qps[PIO_WAIT_BATCH_SIZE];
+	struct hfi1_qp *qp;
+	unsigned long flags;
+	unsigned i, n = 0;
+
+	if (dd->send_contexts[sc->sw_index].type != SC_KERNEL)
+		return;
+	list = &sc->piowait;
+	/*
+	 * Note: checking that the piowait list is empty and clearing
+	 * the buffer available interrupt needs to be atomic or we
+	 * could end up with QPs on the wait list with the interrupt
+	 * disabled.
+	 */
+	write_seqlock_irqsave(&dev->iowait_lock, flags);
+	while (!list_empty(list)) {
+		struct iowait *wait;
+
+		if (n == ARRAY_SIZE(qps))
+			goto full;
+		wait = list_first_entry(list, struct iowait, list);
+		qp = container_of(wait, struct hfi1_qp, s_iowait);
+		list_del_init(&qp->s_iowait.list);
+		/* refcount held until actual wake up */
+		qps[n++] = qp;
+	}
+	/*
+	 * Counting: only call wantpiobuf_intr() if there were waiters and they
+	 * are now all gone.
+	 */
+	if (n)
+		hfi1_sc_wantpiobuf_intr(sc, 0);
+full:
+	write_sequnlock_irqrestore(&dev->iowait_lock, flags);
+
+	for (i = 0; i < n; i++)
+		hfi1_qp_wakeup(qps[i], HFI1_S_WAIT_PIO);
+}
+
+/* translate a send credit update to a bit code of reasons */
+static inline int fill_code(u64 hw_free)
+{
+	int code = 0;
+
+	if (hw_free & CR_STATUS_SMASK)
+		code |= PRC_STATUS_ERR;
+	if (hw_free & CR_CREDIT_RETURN_DUE_TO_PBC_SMASK)
+		code |= PRC_PBC;
+	if (hw_free & CR_CREDIT_RETURN_DUE_TO_THRESHOLD_SMASK)
+		code |= PRC_THRESHOLD;
+	if (hw_free & CR_CREDIT_RETURN_DUE_TO_ERR_SMASK)
+		code |= PRC_FILL_ERR;
+	if (hw_free & CR_CREDIT_RETURN_DUE_TO_FORCE_SMASK)
+		code |= PRC_SC_DISABLE;
+	return code;
+}
+
+/* use the jiffies compare to get the wrap right */
+#define sent_before(a, b) time_before(a, b)	/* a < b */
+
+/*
+ * The send context buffer "releaser".
+ */
+void sc_release_update(struct send_context *sc)
+{
+	struct pio_buf *pbuf;
+	u64 hw_free;
+	u32 head, tail;
+	unsigned long old_free;
+	unsigned long extra;
+	unsigned long flags;
+	int code;
+
+	if (!sc)
+		return;
+
+	spin_lock_irqsave(&sc->release_lock, flags);
+	/* update free */
+	hw_free = le64_to_cpu(*sc->hw_free);		/* volatile read */
+	old_free = sc->free;
+	extra = (((hw_free & CR_COUNTER_SMASK) >> CR_COUNTER_SHIFT)
+			- (old_free & CR_COUNTER_MASK))
+				& CR_COUNTER_MASK;
+	sc->free = old_free + extra;
+	trace_hfi1_piofree(sc, extra);
+
+	/* call sent buffer callbacks */
+	code = -1;				/* code not yet set */
+	head = ACCESS_ONCE(sc->sr_head);	/* snapshot the head */
+	tail = sc->sr_tail;
+	while (head != tail) {
+		pbuf = &sc->sr[tail].pbuf;
+
+		if (sent_before(sc->free, pbuf->sent_at)) {
+			/* not sent yet */
+			break;
+		}
+		if (pbuf->cb) {
+			if (code < 0) /* fill in code on first user */
+				code = fill_code(hw_free);
+			(*pbuf->cb)(pbuf->arg, code);
+		}
+
+		tail++;
+		if (tail >= sc->sr_size)
+			tail = 0;
+	}
+	/* update tail, in case we moved it */
+	sc->sr_tail = tail;
+	spin_unlock_irqrestore(&sc->release_lock, flags);
+	sc_piobufavail(sc);
+}
+
+/*
+ * Send context group releaser.  Argument is the send context that caused
+ * the interrupt.  Called from the send context interrupt handler.
+ *
+ * Call release on all contexts in the group.
+ *
+ * This routine takes the sc_lock without an irqsave because it is only
+ * called from an interrupt handler.  Adjust if that changes.
+ */
+void sc_group_release_update(struct hfi1_devdata *dd, u32 hw_context)
+{
+	struct send_context *sc;
+	u32 sw_index;
+	u32 gc, gc_end;
+
+	spin_lock(&dd->sc_lock);
+	sw_index = dd->hw_to_sw[hw_context];
+	if (unlikely(sw_index >= dd->num_send_contexts)) {
+		dd_dev_err(dd, "%s: invalid hw (%u) to sw (%u) mapping\n",
+			__func__, hw_context, sw_index);
+		goto done;
+	}
+	sc = dd->send_contexts[sw_index].sc;
+	if (unlikely(!sc))
+		goto done;
+
+	gc = group_context(hw_context, sc->group);
+	gc_end = gc + group_size(sc->group);
+	for (; gc < gc_end; gc++) {
+		sw_index = dd->hw_to_sw[gc];
+		if (unlikely(sw_index >= dd->num_send_contexts)) {
+			dd_dev_err(dd,
+				"%s: invalid hw (%u) to sw (%u) mapping\n",
+				__func__, hw_context, sw_index);
+			continue;
+		}
+		sc_release_update(dd->send_contexts[sw_index].sc);
+	}
+done:
+	spin_unlock(&dd->sc_lock);
+}
+
+int init_pervl_scs(struct hfi1_devdata *dd)
+{
+	int i;
+	u64 mask, all_vl_mask = (u64) 0x80ff; /* VLs 0-7, 15 */
+	u32 ctxt;
+
+	dd->vld[15].sc = sc_alloc(dd, SC_KERNEL,
+				  dd->rcd[0]->rcvhdrqentsize, dd->node);
+	if (!dd->vld[15].sc)
+		goto nomem;
+	hfi1_init_ctxt(dd->vld[15].sc);
+	dd->vld[15].mtu = enum_to_mtu(OPA_MTU_2048);
+	for (i = 0; i < num_vls; i++) {
+		/*
+		 * Since this function does not deal with a specific
+		 * receive context but we need the RcvHdrQ entry size,
+		 * use the size from rcd[0]. It is guaranteed to be
+		 * valid at this point and will remain the same for all
+		 * receive contexts.
+		 */
+		dd->vld[i].sc = sc_alloc(dd, SC_KERNEL,
+					 dd->rcd[0]->rcvhdrqentsize, dd->node);
+		if (!dd->vld[i].sc)
+			goto nomem;
+
+		hfi1_init_ctxt(dd->vld[i].sc);
+
+		/* non VL15 start with the max MTU */
+		dd->vld[i].mtu = hfi1_max_mtu;
+	}
+	sc_enable(dd->vld[15].sc);
+	ctxt = dd->vld[15].sc->hw_context;
+	mask = all_vl_mask & ~(1LL << 15);
+	write_kctxt_csr(dd, ctxt, SC(CHECK_VL), mask);
+	dd_dev_info(dd,
+		    "Using send context %u(%u) for VL15\n",
+		    dd->vld[15].sc->sw_index, ctxt);
+	for (i = 0; i < num_vls; i++) {
+		sc_enable(dd->vld[i].sc);
+		ctxt = dd->vld[i].sc->hw_context;
+		mask = all_vl_mask & ~(1LL << i);
+		write_kctxt_csr(dd, ctxt, SC(CHECK_VL), mask);
+	}
+	return 0;
+nomem:
+	sc_free(dd->vld[15].sc);
+	for (i = 0; i < num_vls; i++)
+		sc_free(dd->vld[i].sc);
+	return -ENOMEM;
+}
+
+int init_credit_return(struct hfi1_devdata *dd)
+{
+	int ret;
+	int num_numa;
+	int i;
+
+	num_numa = num_online_nodes();
+	/* enforce the expectation that the numas are compact */
+	for (i = 0; i < num_numa; i++) {
+		if (!node_online(i)) {
+			dd_dev_err(dd, "NUMA nodes are not compact\n");
+			ret = -EINVAL;
+			goto done;
+		}
+	}
+
+	dd->cr_base = kcalloc(
+		num_numa,
+		sizeof(struct credit_return_base),
+		GFP_KERNEL);
+	if (!dd->cr_base) {
+		dd_dev_err(dd, "Unable to allocate credit return base\n");
+		ret = -ENOMEM;
+		goto done;
+	}
+	for (i = 0; i < num_numa; i++) {
+		int bytes = TXE_NUM_CONTEXTS * sizeof(struct credit_return);
+
+		set_dev_node(&dd->pcidev->dev, i);
+		dd->cr_base[i].va = dma_zalloc_coherent(
+					&dd->pcidev->dev,
+					bytes,
+					&dd->cr_base[i].pa,
+					GFP_KERNEL);
+		if (dd->cr_base[i].va == NULL) {
+			set_dev_node(&dd->pcidev->dev, dd->node);
+			dd_dev_err(dd,
+				"Unable to allocate credit return DMA range for NUMA %d\n",
+				i);
+			ret = -ENOMEM;
+			goto done;
+		}
+	}
+	set_dev_node(&dd->pcidev->dev, dd->node);
+
+	ret = 0;
+done:
+	return ret;
+}
+
+void free_credit_return(struct hfi1_devdata *dd)
+{
+	int num_numa;
+	int i;
+
+	if (!dd->cr_base)
+		return;
+
+	num_numa = num_online_nodes();
+	for (i = 0; i < num_numa; i++) {
+		if (dd->cr_base[i].va) {
+			dma_free_coherent(&dd->pcidev->dev,
+				TXE_NUM_CONTEXTS
+					* sizeof(struct credit_return),
+				dd->cr_base[i].va,
+				dd->cr_base[i].pa);
+		}
+	}
+	kfree(dd->cr_base);
+	dd->cr_base = NULL;
+}
diff --git a/drivers/staging/rdma/hfi1/pio.h b/drivers/staging/rdma/hfi1/pio.h
new file mode 100644
index 000000000000..0bb885ca3cfb
--- /dev/null
+++ b/drivers/staging/rdma/hfi1/pio.h
@@ -0,0 +1,224 @@
+#ifndef _PIO_H
+#define _PIO_H
+/*
+ *
+ * This file is provided under a dual BSD/GPLv2 license.  When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * GPL LICENSE SUMMARY
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * BSD LICENSE
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ *  - Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ *  - Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in
+ *    the documentation and/or other materials provided with the
+ *    distribution.
+ *  - Neither the name of Intel Corporation nor the names of its
+ *    contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ */
+
+
+/* send context types */
+#define SC_KERNEL 0
+#define SC_ACK    1
+#define SC_USER   2
+#define SC_MAX    3
+
+/* invalid send context index */
+#define INVALID_SCI 0xff
+
+/* PIO buffer release callback function */
+typedef void (*pio_release_cb)(void *arg, int code);
+
+/* PIO release codes - in bits, as there could more than one that apply */
+#define PRC_OK		0	/* no known error */
+#define PRC_STATUS_ERR	0x01	/* credit return due to status error */
+#define PRC_PBC		0x02	/* credit return due to PBC */
+#define PRC_THRESHOLD	0x04	/* credit return due to threshold */
+#define PRC_FILL_ERR	0x08	/* credit return due fill error */
+#define PRC_FORCE	0x10	/* credit return due credit force */
+#define PRC_SC_DISABLE	0x20	/* clean-up after a context disable */
+
+/* byte helper */
+union mix {
+	u64 val64;
+	u32 val32[2];
+	u8  val8[8];
+};
+
+/* an allocated PIO buffer */
+struct pio_buf {
+	struct send_context *sc;/* back pointer to owning send context */
+	pio_release_cb cb;	/* called when the buffer is released */
+	void *arg;		/* argument for cb */
+	void __iomem *start;	/* buffer start address */
+	void __iomem *end;	/* context end address */
+	unsigned long size;	/* context size, in bytes */
+	unsigned long sent_at;	/* buffer is sent when <= free */
+	u32 block_count;	/* size of buffer, in blocks */
+	u32 qw_written;		/* QW written so far */
+	u32 carry_bytes;	/* number of valid bytes in carry */
+	union mix carry;	/* pending unwritten bytes */
+};
+
+/* cache line aligned pio buffer array */
+union pio_shadow_ring {
+	struct pio_buf pbuf;
+	u64 unused[16];		/* cache line spacer */
+} ____cacheline_aligned;
+
+/* per-NUMA send context */
+struct send_context {
+	/* read-only after init */
+	struct hfi1_devdata *dd;		/* device */
+	void __iomem *base_addr;	/* start of PIO memory */
+	union pio_shadow_ring *sr;	/* shadow ring */
+	volatile __le64 *hw_free;	/* HW free counter */
+	struct work_struct halt_work;	/* halted context work queue entry */
+	unsigned long flags;		/* flags */
+	int node;			/* context home node */
+	int type;			/* context type */
+	u32 sw_index;			/* software index number */
+	u32 hw_context;			/* hardware context number */
+	u32 credits;			/* number of blocks in context */
+	u32 sr_size;			/* size of the shadow ring */
+	u32 group;			/* credit return group */
+	/* allocator fields */
+	spinlock_t alloc_lock ____cacheline_aligned_in_smp;
+	unsigned long fill;		/* official alloc count */
+	unsigned long alloc_free;	/* copy of free (less cache thrash) */
+	u32 sr_head;			/* shadow ring head */
+	/* releaser fields */
+	spinlock_t release_lock ____cacheline_aligned_in_smp;
+	unsigned long free;		/* official free count */
+	u32 sr_tail;			/* shadow ring tail */
+	/* list for PIO waiters */
+	struct list_head piowait  ____cacheline_aligned_in_smp;
+	spinlock_t credit_ctrl_lock ____cacheline_aligned_in_smp;
+	u64 credit_ctrl;		/* cache for credit control */
+	u32 credit_intr_count;		/* count of credit intr users */
+	atomic_t buffers_allocated;	/* count of buffers allocated */
+	wait_queue_head_t halt_wait;    /* wait until kernel sees interrupt */
+};
+
+/* send context flags */
+#define SCF_ENABLED 0x01
+#define SCF_IN_FREE 0x02
+#define SCF_HALTED  0x04
+#define SCF_FROZEN  0x08
+
+struct send_context_info {
+	struct send_context *sc;	/* allocated working context */
+	u16 allocated;			/* has this been allocated? */
+	u16 type;			/* context type */
+	u16 base;			/* base in PIO array */
+	u16 credits;			/* size in PIO array */
+};
+
+/* DMA credit return, index is always (context & 0x7) */
+struct credit_return {
+	volatile __le64 cr[8];
+};
+
+/* NUMA indexed credit return array */
+struct credit_return_base {
+	struct credit_return *va;
+	dma_addr_t pa;
+};
+
+/* send context configuration sizes (one per type) */
+struct sc_config_sizes {
+	short int size;
+	short int count;
+};
+
+/* send context functions */
+int init_credit_return(struct hfi1_devdata *dd);
+void free_credit_return(struct hfi1_devdata *dd);
+int init_sc_pools_and_sizes(struct hfi1_devdata *dd);
+int init_send_contexts(struct hfi1_devdata *dd);
+int init_credit_return(struct hfi1_devdata *dd);
+int init_pervl_scs(struct hfi1_devdata *dd);
+struct send_context *sc_alloc(struct hfi1_devdata *dd, int type,
+			      uint hdrqentsize, int numa);
+void sc_free(struct send_context *sc);
+int sc_enable(struct send_context *sc);
+void sc_disable(struct send_context *sc);
+int sc_restart(struct send_context *sc);
+void sc_return_credits(struct send_context *sc);
+void sc_flush(struct send_context *sc);
+void sc_drop(struct send_context *sc);
+void sc_stop(struct send_context *sc, int bit);
+struct pio_buf *sc_buffer_alloc(struct send_context *sc, u32 dw_len,
+			pio_release_cb cb, void *arg);
+void sc_release_update(struct send_context *sc);
+void sc_return_credits(struct send_context *sc);
+void sc_group_release_update(struct hfi1_devdata *dd, u32 hw_context);
+void sc_add_credit_return_intr(struct send_context *sc);
+void sc_del_credit_return_intr(struct send_context *sc);
+void sc_set_cr_threshold(struct send_context *sc, u32 new_threshold);
+u32 sc_mtu_to_threshold(struct send_context *sc, u32 mtu, u32 hdrqentsize);
+void hfi1_sc_wantpiobuf_intr(struct send_context *sc, u32 needint);
+void sc_wait(struct hfi1_devdata *dd);
+void set_pio_integrity(struct send_context *sc);
+
+/* support functions */
+void pio_reset_all(struct hfi1_devdata *dd);
+void pio_freeze(struct hfi1_devdata *dd);
+void pio_kernel_unfreeze(struct hfi1_devdata *dd);
+
+/* global PIO send control operations */
+#define PSC_GLOBAL_ENABLE 0
+#define PSC_GLOBAL_DISABLE 1
+#define PSC_GLOBAL_VLARB_ENABLE 2
+#define PSC_GLOBAL_VLARB_DISABLE 3
+#define PSC_CM_RESET 4
+#define PSC_DATA_VL_ENABLE 5
+#define PSC_DATA_VL_DISABLE 6
+
+void __cm_reset(struct hfi1_devdata *dd, u64 sendctrl);
+void pio_send_control(struct hfi1_devdata *dd, int op);
+
+
+/* PIO copy routines */
+void pio_copy(struct hfi1_devdata *dd, struct pio_buf *pbuf, u64 pbc,
+	      const void *from, size_t count);
+void seg_pio_copy_start(struct pio_buf *pbuf, u64 pbc,
+					const void *from, size_t nbytes);
+void seg_pio_copy_mid(struct pio_buf *pbuf, const void *from, size_t nbytes);
+void seg_pio_copy_end(struct pio_buf *pbuf);
+
+#endif /* _PIO_H */
diff --git a/drivers/staging/rdma/hfi1/pio_copy.c b/drivers/staging/rdma/hfi1/pio_copy.c
new file mode 100644
index 000000000000..8972bbc02038
--- /dev/null
+++ b/drivers/staging/rdma/hfi1/pio_copy.c
@@ -0,0 +1,858 @@
+/*
+ *
+ * This file is provided under a dual BSD/GPLv2 license.  When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * GPL LICENSE SUMMARY
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * BSD LICENSE
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ *  - Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ *  - Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in
+ *    the documentation and/or other materials provided with the
+ *    distribution.
+ *  - Neither the name of Intel Corporation nor the names of its
+ *    contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ */
+
+#include "hfi.h"
+
+/* additive distance between non-SOP and SOP space */
+#define SOP_DISTANCE (TXE_PIO_SIZE / 2)
+#define PIO_BLOCK_MASK (PIO_BLOCK_SIZE-1)
+/* number of QUADWORDs in a block */
+#define PIO_BLOCK_QWS (PIO_BLOCK_SIZE/sizeof(u64))
+
+/**
+ * pio_copy - copy data block to MMIO space
+ * @pbuf: a number of blocks allocated within a PIO send context
+ * @pbc: PBC to send
+ * @from: source, must be 8 byte aligned
+ * @count: number of DWORD (32-bit) quantities to copy from source
+ *
+ * Copy data from source to PIO Send Buffer memory, 8 bytes at a time.
+ * Must always write full BLOCK_SIZE bytes blocks.  The first block must
+ * be written to the corresponding SOP=1 address.
+ *
+ * Known:
+ * o pbuf->start always starts on a block boundary
+ * o pbuf can wrap only at a block boundary
+ */
+void pio_copy(struct hfi1_devdata *dd, struct pio_buf *pbuf, u64 pbc,
+	      const void *from, size_t count)
+{
+	void __iomem *dest = pbuf->start + SOP_DISTANCE;
+	void __iomem *send = dest + PIO_BLOCK_SIZE;
+	void __iomem *dend;			/* 8-byte data end */
+
+	/* write the PBC */
+	writeq(pbc, dest);
+	dest += sizeof(u64);
+
+	/* calculate where the QWORD data ends - in SOP=1 space */
+	dend = dest + ((count>>1) * sizeof(u64));
+
+	if (dend < send) {
+		/* all QWORD data is within the SOP block, does *not*
+		   reach the end of the SOP block */
+
+		while (dest < dend) {
+			writeq(*(u64 *)from, dest);
+			from += sizeof(u64);
+			dest += sizeof(u64);
+		}
+		/*
+		 * No boundary checks are needed here:
+		 * 0. We're not on the SOP block boundary
+		 * 1. The possible DWORD dangle will still be within
+		 *    the SOP block
+		 * 2. We cannot wrap except on a block boundary.
+		 */
+	} else {
+		/* QWORD data extends _to_ or beyond the SOP block */
+
+		/* write 8-byte SOP chunk data */
+		while (dest < send) {
+			writeq(*(u64 *)from, dest);
+			from += sizeof(u64);
+			dest += sizeof(u64);
+		}
+		/* drop out of the SOP range */
+		dest -= SOP_DISTANCE;
+		dend -= SOP_DISTANCE;
+
+		/*
+		 * If the wrap comes before or matches the data end,
+		 * copy until until the wrap, then wrap.
+		 *
+		 * If the data ends at the end of the SOP above and
+		 * the buffer wraps, then pbuf->end == dend == dest
+		 * and nothing will get written, but we will wrap in
+		 * case there is a dangling DWORD.
+		 */
+		if (pbuf->end <= dend) {
+			while (dest < pbuf->end) {
+				writeq(*(u64 *)from, dest);
+				from += sizeof(u64);
+				dest += sizeof(u64);
+			}
+
+			dest -= pbuf->size;
+			dend -= pbuf->size;
+		}
+
+		/* write 8-byte non-SOP, non-wrap chunk data */
+		while (dest < dend) {
+			writeq(*(u64 *)from, dest);
+			from += sizeof(u64);
+			dest += sizeof(u64);
+		}
+	}
+	/* at this point we have wrapped if we are going to wrap */
+
+	/* write dangling u32, if any */
+	if (count & 1) {
+		union mix val;
+
+		val.val64 = 0;
+		val.val32[0] = *(u32 *)from;
+		writeq(val.val64, dest);
+		dest += sizeof(u64);
+	}
+	/* fill in rest of block, no need to check pbuf->end
+	   as we only wrap on a block boundary */
+	while (((unsigned long)dest & PIO_BLOCK_MASK) != 0) {
+		writeq(0, dest);
+		dest += sizeof(u64);
+	}
+
+	/* finished with this buffer */
+	atomic_dec(&pbuf->sc->buffers_allocated);
+}
+
+/* USE_SHIFTS is faster in user-space tests on a Xeon X5570 @ 2.93GHz */
+#define USE_SHIFTS 1
+#ifdef USE_SHIFTS
+/*
+ * Handle carry bytes using shifts and masks.
+ *
+ * NOTE: the value the unused portion of carry is expected to always be zero.
+ */
+
+/*
+ * "zero" shift - bit shift used to zero out upper bytes.  Input is
+ * the count of LSB bytes to preserve.
+ */
+#define zshift(x) (8 * (8-(x)))
+
+/*
+ * "merge" shift - bit shift used to merge with carry bytes.  Input is
+ * the LSB byte count to move beyond.
+ */
+#define mshift(x) (8 * (x))
+
+/*
+ * Read nbytes bytes from "from" and return them in the LSB bytes
+ * of pbuf->carry.  Other bytes are zeroed.  Any previous value
+ * pbuf->carry is lost.
+ *
+ * NOTES:
+ * o do not read from from if nbytes is zero
+ * o from may _not_ be u64 aligned
+ * o nbytes must not span a QW boundary
+ */
+static inline void read_low_bytes(struct pio_buf *pbuf, const void *from,
+							unsigned int nbytes)
+{
+	unsigned long off;
+
+	if (nbytes == 0) {
+		pbuf->carry.val64 = 0;
+	} else {
+		/* align our pointer */
+		off = (unsigned long)from & 0x7;
+		from = (void *)((unsigned long)from & ~0x7l);
+		pbuf->carry.val64 = ((*(u64 *)from)
+				<< zshift(nbytes + off))/* zero upper bytes */
+				>> zshift(nbytes);	/* place at bottom */
+	}
+	pbuf->carry_bytes = nbytes;
+}
+
+/*
+ * Read nbytes bytes from "from" and put them at the next significant bytes
+ * of pbuf->carry.  Unused bytes are zeroed.  It is expected that the extra
+ * read does not overfill carry.
+ *
+ * NOTES:
+ * o from may _not_ be u64 aligned
+ * o nbytes may span a QW boundary
+ */
+static inline void read_extra_bytes(struct pio_buf *pbuf,
+					const void *from, unsigned int nbytes)
+{
+	unsigned long off = (unsigned long)from & 0x7;
+	unsigned int room, xbytes;
+
+	/* align our pointer */
+	from = (void *)((unsigned long)from & ~0x7l);
+
+	/* check count first - don't read anything if count is zero */
+	while (nbytes) {
+		/* find the number of bytes in this u64 */
+		room = 8 - off;	/* this u64 has room for this many bytes */
+		xbytes = nbytes > room ? room : nbytes;
+
+		/*
+		 * shift down to zero lower bytes, shift up to zero upper
+		 * bytes, shift back down to move into place
+		 */
+		pbuf->carry.val64 |= (((*(u64 *)from)
+					>> mshift(off))
+					<< zshift(xbytes))
+					>> zshift(xbytes+pbuf->carry_bytes);
+		off = 0;
+		pbuf->carry_bytes += xbytes;
+		nbytes -= xbytes;
+		from += sizeof(u64);
+	}
+}
+
+/*
+ * Zero extra bytes from the end of pbuf->carry.
+ *
+ * NOTES:
+ * o zbytes <= old_bytes
+ */
+static inline void zero_extra_bytes(struct pio_buf *pbuf, unsigned int zbytes)
+{
+	unsigned int remaining;
+
+	if (zbytes == 0)	/* nothing to do */
+		return;
+
+	remaining = pbuf->carry_bytes - zbytes;	/* remaining bytes */
+
+	/* NOTE: zshift only guaranteed to work if remaining != 0 */
+	if (remaining)
+		pbuf->carry.val64 = (pbuf->carry.val64 << zshift(remaining))
+					>> zshift(remaining);
+	else
+		pbuf->carry.val64 = 0;
+	pbuf->carry_bytes = remaining;
+}
+
+/*
+ * Write a quad word using parts of pbuf->carry and the next 8 bytes of src.
+ * Put the unused part of the next 8 bytes of src into the LSB bytes of
+ * pbuf->carry with the upper bytes zeroed..
+ *
+ * NOTES:
+ * o result must keep unused bytes zeroed
+ * o src must be u64 aligned
+ */
+static inline void merge_write8(
+	struct pio_buf *pbuf,
+	void __iomem *dest,
+	const void *src)
+{
+	u64 new, temp;
+
+	new = *(u64 *)src;
+	temp = pbuf->carry.val64 | (new << mshift(pbuf->carry_bytes));
+	writeq(temp, dest);
+	pbuf->carry.val64 = new >> zshift(pbuf->carry_bytes);
+}
+
+/*
+ * Write a quad word using all bytes of carry.
+ */
+static inline void carry8_write8(union mix carry, void __iomem *dest)
+{
+	writeq(carry.val64, dest);
+}
+
+/*
+ * Write a quad word using all the valid bytes of carry.  If carry
+ * has zero valid bytes, nothing is written.
+ * Returns 0 on nothing written, non-zero on quad word written.
+ */
+static inline int carry_write8(struct pio_buf *pbuf, void __iomem *dest)
+{
+	if (pbuf->carry_bytes) {
+		/* unused bytes are always kept zeroed, so just write */
+		writeq(pbuf->carry.val64, dest);
+		return 1;
+	}
+
+	return 0;
+}
+
+#else /* USE_SHIFTS */
+/*
+ * Handle carry bytes using byte copies.
+ *
+ * NOTE: the value the unused portion of carry is left uninitialized.
+ */
+
+/*
+ * Jump copy - no-loop copy for < 8 bytes.
+ */
+static inline void jcopy(u8 *dest, const u8 *src, u32 n)
+{
+	switch (n) {
+	case 7:
+		*dest++ = *src++;
+	case 6:
+		*dest++ = *src++;
+	case 5:
+		*dest++ = *src++;
+	case 4:
+		*dest++ = *src++;
+	case 3:
+		*dest++ = *src++;
+	case 2:
+		*dest++ = *src++;
+	case 1:
+		*dest++ = *src++;
+	}
+}
+
+/*
+ * Read nbytes from "from" and and place them in the low bytes
+ * of pbuf->carry.  Other bytes are left as-is.  Any previous
+ * value in pbuf->carry is lost.
+ *
+ * NOTES:
+ * o do not read from from if nbytes is zero
+ * o from may _not_ be u64 aligned.
+ */
+static inline void read_low_bytes(struct pio_buf *pbuf, const void *from,
+							unsigned int nbytes)
+{
+	jcopy(&pbuf->carry.val8[0], from, nbytes);
+	pbuf->carry_bytes = nbytes;
+}
+
+/*
+ * Read nbytes bytes from "from" and put them at the end of pbuf->carry.
+ * It is expected that the extra read does not overfill carry.
+ *
+ * NOTES:
+ * o from may _not_ be u64 aligned
+ * o nbytes may span a QW boundary
+ */
+static inline void read_extra_bytes(struct pio_buf *pbuf,
+					const void *from, unsigned int nbytes)
+{
+	jcopy(&pbuf->carry.val8[pbuf->carry_bytes], from, nbytes);
+	pbuf->carry_bytes += nbytes;
+}
+
+/*
+ * Zero extra bytes from the end of pbuf->carry.
+ *
+ * We do not care about the value of unused bytes in carry, so just
+ * reduce the byte count.
+ *
+ * NOTES:
+ * o zbytes <= old_bytes
+ */
+static inline void zero_extra_bytes(struct pio_buf *pbuf, unsigned int zbytes)
+{
+	pbuf->carry_bytes -= zbytes;
+}
+
+/*
+ * Write a quad word using parts of pbuf->carry and the next 8 bytes of src.
+ * Put the unused part of the next 8 bytes of src into the low bytes of
+ * pbuf->carry.
+ */
+static inline void merge_write8(
+	struct pio_buf *pbuf,
+	void *dest,
+	const void *src)
+{
+	u32 remainder = 8 - pbuf->carry_bytes;
+
+	jcopy(&pbuf->carry.val8[pbuf->carry_bytes], src, remainder);
+	writeq(pbuf->carry.val64, dest);
+	jcopy(&pbuf->carry.val8[0], src+remainder, pbuf->carry_bytes);
+}
+
+/*
+ * Write a quad word using all bytes of carry.
+ */
+static inline void carry8_write8(union mix carry, void *dest)
+{
+	writeq(carry.val64, dest);
+}
+
+/*
+ * Write a quad word using all the valid bytes of carry.  If carry
+ * has zero valid bytes, nothing is written.
+ * Returns 0 on nothing written, non-zero on quad word written.
+ */
+static inline int carry_write8(struct pio_buf *pbuf, void *dest)
+{
+	if (pbuf->carry_bytes) {
+		u64 zero = 0;
+
+		jcopy(&pbuf->carry.val8[pbuf->carry_bytes], (u8 *)&zero,
+						8 - pbuf->carry_bytes);
+		writeq(pbuf->carry.val64, dest);
+		return 1;
+	}
+
+	return 0;
+}
+#endif /* USE_SHIFTS */
+
+/*
+ * Segmented PIO Copy - start
+ *
+ * Start a PIO copy.
+ *
+ * @pbuf: destination buffer
+ * @pbc: the PBC for the PIO buffer
+ * @from: data source, QWORD aligned
+ * @nbytes: bytes to copy
+ */
+void seg_pio_copy_start(struct pio_buf *pbuf, u64 pbc,
+				const void *from, size_t nbytes)
+{
+	void __iomem *dest = pbuf->start + SOP_DISTANCE;
+	void __iomem *send = dest + PIO_BLOCK_SIZE;
+	void __iomem *dend;			/* 8-byte data end */
+
+	writeq(pbc, dest);
+	dest += sizeof(u64);
+
+	/* calculate where the QWORD data ends - in SOP=1 space */
+	dend = dest + ((nbytes>>3) * sizeof(u64));
+
+	if (dend < send) {
+		/* all QWORD data is within the SOP block, does *not*
+		   reach the end of the SOP block */
+
+		while (dest < dend) {
+			writeq(*(u64 *)from, dest);
+			from += sizeof(u64);
+			dest += sizeof(u64);
+		}
+		/*
+		 * No boundary checks are needed here:
+		 * 0. We're not on the SOP block boundary
+		 * 1. The possible DWORD dangle will still be within
+		 *    the SOP block
+		 * 2. We cannot wrap except on a block boundary.
+		 */
+	} else {
+		/* QWORD data extends _to_ or beyond the SOP block */
+
+		/* write 8-byte SOP chunk data */
+		while (dest < send) {
+			writeq(*(u64 *)from, dest);
+			from += sizeof(u64);
+			dest += sizeof(u64);
+		}
+		/* drop out of the SOP range */
+		dest -= SOP_DISTANCE;
+		dend -= SOP_DISTANCE;
+
+		/*
+		 * If the wrap comes before or matches the data end,
+		 * copy until until the wrap, then wrap.
+		 *
+		 * If the data ends at the end of the SOP above and
+		 * the buffer wraps, then pbuf->end == dend == dest
+		 * and nothing will get written, but we will wrap in
+		 * case there is a dangling DWORD.
+		 */
+		if (pbuf->end <= dend) {
+			while (dest < pbuf->end) {
+				writeq(*(u64 *)from, dest);
+				from += sizeof(u64);
+				dest += sizeof(u64);
+			}
+
+			dest -= pbuf->size;
+			dend -= pbuf->size;
+		}
+
+		/* write 8-byte non-SOP, non-wrap chunk data */
+		while (dest < dend) {
+			writeq(*(u64 *)from, dest);
+			from += sizeof(u64);
+			dest += sizeof(u64);
+		}
+	}
+	/* at this point we have wrapped if we are going to wrap */
+
+	/* ...but it doesn't matter as we're done writing */
+
+	/* save dangling bytes, if any */
+	read_low_bytes(pbuf, from, nbytes & 0x7);
+
+	pbuf->qw_written = 1 /*PBC*/ + (nbytes >> 3);
+}
+
+/*
+ * Mid copy helper, "mixed case" - source is 64-bit aligned but carry
+ * bytes are non-zero.
+ *
+ * Whole u64s must be written to the chip, so bytes must be manually merged.
+ *
+ * @pbuf: destination buffer
+ * @from: data source, is QWORD aligned.
+ * @nbytes: bytes to copy
+ *
+ * Must handle nbytes < 8.
+ */
+static void mid_copy_mix(struct pio_buf *pbuf, const void *from, size_t nbytes)
+{
+	void __iomem *dest = pbuf->start + (pbuf->qw_written * sizeof(u64));
+	void __iomem *dend;			/* 8-byte data end */
+	unsigned long qw_to_write = (pbuf->carry_bytes + nbytes) >> 3;
+	unsigned long bytes_left = (pbuf->carry_bytes + nbytes) & 0x7;
+
+	/* calculate 8-byte data end */
+	dend = dest + (qw_to_write * sizeof(u64));
+
+	if (pbuf->qw_written < PIO_BLOCK_QWS) {
+		/*
+		 * Still within SOP block.  We don't need to check for
+		 * wrap because we are still in the first block and
+		 * can only wrap on block boundaries.
+		 */
+		void __iomem *send;		/* SOP end */
+		void __iomem *xend;
+
+		/* calculate the end of data or end of block, whichever
+		   comes first */
+		send = pbuf->start + PIO_BLOCK_SIZE;
+		xend = send < dend ? send : dend;
+
+		/* shift up to SOP=1 space */
+		dest += SOP_DISTANCE;
+		xend += SOP_DISTANCE;
+
+		/* write 8-byte chunk data */
+		while (dest < xend) {
+			merge_write8(pbuf, dest, from);
+			from += sizeof(u64);
+			dest += sizeof(u64);
+		}
+
+		/* shift down to SOP=0 space */
+		dest -= SOP_DISTANCE;
+	}
+	/*
+	 * At this point dest could be (either, both, or neither):
+	 * - at dend
+	 * - at the wrap
+	 */
+
+	/*
+	 * If the wrap comes before or matches the data end,
+	 * copy until until the wrap, then wrap.
+	 *
+	 * If dest is at the wrap, we will fall into the if,
+	 * not do the loop, when wrap.
+	 *
+	 * If the data ends at the end of the SOP above and
+	 * the buffer wraps, then pbuf->end == dend == dest
+	 * and nothing will get written.
+	 */
+	if (pbuf->end <= dend) {
+		while (dest < pbuf->end) {
+			merge_write8(pbuf, dest, from);
+			from += sizeof(u64);
+			dest += sizeof(u64);
+		}
+
+		dest -= pbuf->size;
+		dend -= pbuf->size;
+	}
+
+	/* write 8-byte non-SOP, non-wrap chunk data */
+	while (dest < dend) {
+		merge_write8(pbuf, dest, from);
+		from += sizeof(u64);
+		dest += sizeof(u64);
+	}
+
+	/* adjust carry */
+	if (pbuf->carry_bytes < bytes_left) {
+		/* need to read more */
+		read_extra_bytes(pbuf, from, bytes_left - pbuf->carry_bytes);
+	} else {
+		/* remove invalid bytes */
+		zero_extra_bytes(pbuf, pbuf->carry_bytes - bytes_left);
+	}
+
+	pbuf->qw_written += qw_to_write;
+}
+
+/*
+ * Mid copy helper, "straight case" - source pointer is 64-bit aligned
+ * with no carry bytes.
+ *
+ * @pbuf: destination buffer
+ * @from: data source, is QWORD aligned
+ * @nbytes: bytes to copy
+ *
+ * Must handle nbytes < 8.
+ */
+static void mid_copy_straight(struct pio_buf *pbuf,
+						const void *from, size_t nbytes)
+{
+	void __iomem *dest = pbuf->start + (pbuf->qw_written * sizeof(u64));
+	void __iomem *dend;			/* 8-byte data end */
+
+	/* calculate 8-byte data end */
+	dend = dest + ((nbytes>>3) * sizeof(u64));
+
+	if (pbuf->qw_written < PIO_BLOCK_QWS) {
+		/*
+		 * Still within SOP block.  We don't need to check for
+		 * wrap because we are still in the first block and
+		 * can only wrap on block boundaries.
+		 */
+		void __iomem *send;		/* SOP end */
+		void __iomem *xend;
+
+		/* calculate the end of data or end of block, whichever
+		   comes first */
+		send = pbuf->start + PIO_BLOCK_SIZE;
+		xend = send < dend ? send : dend;
+
+		/* shift up to SOP=1 space */
+		dest += SOP_DISTANCE;
+		xend += SOP_DISTANCE;
+
+		/* write 8-byte chunk data */
+		while (dest < xend) {
+			writeq(*(u64 *)from, dest);
+			from += sizeof(u64);
+			dest += sizeof(u64);
+		}
+
+		/* shift down to SOP=0 space */
+		dest -= SOP_DISTANCE;
+	}
+	/*
+	 * At this point dest could be (either, both, or neither):
+	 * - at dend
+	 * - at the wrap
+	 */
+
+	/*
+	 * If the wrap comes before or matches the data end,
+	 * copy until until the wrap, then wrap.
+	 *
+	 * If dest is at the wrap, we will fall into the if,
+	 * not do the loop, when wrap.
+	 *
+	 * If the data ends at the end of the SOP above and
+	 * the buffer wraps, then pbuf->end == dend == dest
+	 * and nothing will get written.
+	 */
+	if (pbuf->end <= dend) {
+		while (dest < pbuf->end) {
+			writeq(*(u64 *)from, dest);
+			from += sizeof(u64);
+			dest += sizeof(u64);
+		}
+
+		dest -= pbuf->size;
+		dend -= pbuf->size;
+	}
+
+	/* write 8-byte non-SOP, non-wrap chunk data */
+	while (dest < dend) {
+		writeq(*(u64 *)from, dest);
+		from += sizeof(u64);
+		dest += sizeof(u64);
+	}
+
+	/* we know carry_bytes was zero on entry to this routine */
+	read_low_bytes(pbuf, from, nbytes & 0x7);
+
+	pbuf->qw_written += nbytes>>3;
+}
+
+/*
+ * Segmented PIO Copy - middle
+ *
+ * Must handle any aligned tail and any aligned source with any byte count.
+ *
+ * @pbuf: a number of blocks allocated within a PIO send context
+ * @from: data source
+ * @nbytes: number of bytes to copy
+ */
+void seg_pio_copy_mid(struct pio_buf *pbuf, const void *from, size_t nbytes)
+{
+	unsigned long from_align = (unsigned long)from & 0x7;
+
+	if (pbuf->carry_bytes + nbytes < 8) {
+		/* not enough bytes to fill a QW */
+		read_extra_bytes(pbuf, from, nbytes);
+		return;
+	}
+
+	if (from_align) {
+		/* misaligned source pointer - align it */
+		unsigned long to_align;
+
+		/* bytes to read to align "from" */
+		to_align = 8 - from_align;
+
+		/*
+		 * In the advance-to-alignment logic below, we do not need
+		 * to check if we are using more than nbytes.  This is because
+		 * if we are here, we already know that carry+nbytes will
+		 * fill at least one QW.
+		 */
+		if (pbuf->carry_bytes + to_align < 8) {
+			/* not enough align bytes to fill a QW */
+			read_extra_bytes(pbuf, from, to_align);
+			from += to_align;
+			nbytes -= to_align;
+		} else {
+			/* bytes to fill carry */
+			unsigned long to_fill = 8 - pbuf->carry_bytes;
+			/* bytes left over to be read */
+			unsigned long extra = to_align - to_fill;
+			void __iomem *dest;
+
+			/* fill carry... */
+			read_extra_bytes(pbuf, from, to_fill);
+			from += to_fill;
+			nbytes -= to_fill;
+
+			/* ...now write carry */
+			dest = pbuf->start + (pbuf->qw_written * sizeof(u64));
+
+			/*
+			 * The two checks immediately below cannot both be
+			 * true, hence the else.  If we have wrapped, we
+			 * cannot still be within the first block.
+			 * Conversely, if we are still in the first block, we
+			 * cannot have wrapped.  We do the wrap check first
+			 * as that is more likely.
+			 */
+			/* adjust if we've wrapped */
+			if (dest >= pbuf->end)
+				dest -= pbuf->size;
+			/* jump to SOP range if within the first block */
+			else if (pbuf->qw_written < PIO_BLOCK_QWS)
+				dest += SOP_DISTANCE;
+
+			carry8_write8(pbuf->carry, dest);
+			pbuf->qw_written++;
+
+			/* read any extra bytes to do final alignment */
+			/* this will overwrite anything in pbuf->carry */
+			read_low_bytes(pbuf, from, extra);
+			from += extra;
+			nbytes -= extra;
+		}
+
+		/* at this point, from is QW aligned */
+	}
+
+	if (pbuf->carry_bytes)
+		mid_copy_mix(pbuf, from, nbytes);
+	else
+		mid_copy_straight(pbuf, from, nbytes);
+}
+
+/*
+ * Segmented PIO Copy - end
+ *
+ * Write any remainder (in pbuf->carry) and finish writing the whole block.
+ *
+ * @pbuf: a number of blocks allocated within a PIO send context
+ */
+void seg_pio_copy_end(struct pio_buf *pbuf)
+{
+	void __iomem *dest = pbuf->start + (pbuf->qw_written * sizeof(u64));
+
+	/*
+	 * The two checks immediately below cannot both be true, hence the
+	 * else.  If we have wrapped, we cannot still be within the first
+	 * block.  Conversely, if we are still in the first block, we
+	 * cannot have wrapped.  We do the wrap check first as that is
+	 * more likely.
+	 */
+	/* adjust if we have wrapped */
+	if (dest >= pbuf->end)
+		dest -= pbuf->size;
+	/* jump to the SOP range if within the first block */
+	else if (pbuf->qw_written < PIO_BLOCK_QWS)
+		dest += SOP_DISTANCE;
+
+	/* write final bytes, if any */
+	if (carry_write8(pbuf, dest)) {
+		dest += sizeof(u64);
+		/*
+		 * NOTE: We do not need to recalculate whether dest needs
+		 * SOP_DISTANCE or not.
+		 *
+		 * If we are in the first block and the dangle write
+		 * keeps us in the same block, dest will need
+		 * to retain SOP_DISTANCE in the loop below.
+		 *
+		 * If we are in the first block and the dangle write pushes
+		 * us to the next block, then loop below will not run
+		 * and dest is not used.  Hence we do not need to update
+		 * it.
+		 *
+		 * If we are past the first block, then SOP_DISTANCE
+		 * was never added, so there is nothing to do.
+		 */
+	}
+
+	/* fill in rest of block */
+	while (((unsigned long)dest & PIO_BLOCK_MASK) != 0) {
+		writeq(0, dest);
+		dest += sizeof(u64);
+	}
+
+	/* finished with this buffer */
+	atomic_dec(&pbuf->sc->buffers_allocated);
+}
diff --git a/drivers/staging/rdma/hfi1/platform_config.h b/drivers/staging/rdma/hfi1/platform_config.h
new file mode 100644
index 000000000000..8a94a8342052
--- /dev/null
+++ b/drivers/staging/rdma/hfi1/platform_config.h
@@ -0,0 +1,286 @@
+/*
+ *
+ * This file is provided under a dual BSD/GPLv2 license.  When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * GPL LICENSE SUMMARY
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * BSD LICENSE
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ *  - Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ *  - Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in
+ *    the documentation and/or other materials provided with the
+ *    distribution.
+ *  - Neither the name of Intel Corporation nor the names of its
+ *    contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ */
+#ifndef __PLATFORM_CONFIG_H
+#define __PLATFORM_CONFIG_H
+
+#define METADATA_TABLE_FIELD_START_SHIFT		0
+#define METADATA_TABLE_FIELD_START_LEN_BITS		15
+#define METADATA_TABLE_FIELD_LEN_SHIFT			16
+#define METADATA_TABLE_FIELD_LEN_LEN_BITS		16
+
+/* Header structure */
+#define PLATFORM_CONFIG_HEADER_RECORD_IDX_SHIFT			0
+#define PLATFORM_CONFIG_HEADER_RECORD_IDX_LEN_BITS		6
+#define PLATFORM_CONFIG_HEADER_TABLE_LENGTH_SHIFT		16
+#define PLATFORM_CONFIG_HEADER_TABLE_LENGTH_LEN_BITS		12
+#define PLATFORM_CONFIG_HEADER_TABLE_TYPE_SHIFT			28
+#define PLATFORM_CONFIG_HEADER_TABLE_TYPE_LEN_BITS		4
+
+enum platform_config_table_type_encoding {
+	PLATFORM_CONFIG_TABLE_RESERVED,
+	PLATFORM_CONFIG_SYSTEM_TABLE,
+	PLATFORM_CONFIG_PORT_TABLE,
+	PLATFORM_CONFIG_RX_PRESET_TABLE,
+	PLATFORM_CONFIG_TX_PRESET_TABLE,
+	PLATFORM_CONFIG_QSFP_ATTEN_TABLE,
+	PLATFORM_CONFIG_VARIABLE_SETTINGS_TABLE,
+	PLATFORM_CONFIG_TABLE_MAX
+};
+
+enum platform_config_system_table_fields {
+	SYSTEM_TABLE_RESERVED,
+	SYSTEM_TABLE_NODE_STRING,
+	SYSTEM_TABLE_SYSTEM_IMAGE_GUID,
+	SYSTEM_TABLE_NODE_GUID,
+	SYSTEM_TABLE_REVISION,
+	SYSTEM_TABLE_VENDOR_OUI,
+	SYSTEM_TABLE_META_VERSION,
+	SYSTEM_TABLE_DEVICE_ID,
+	SYSTEM_TABLE_PARTITION_ENFORCEMENT_CAP,
+	SYSTEM_TABLE_QSFP_POWER_CLASS_MAX,
+	SYSTEM_TABLE_QSFP_ATTENUATION_DEFAULT_12G,
+	SYSTEM_TABLE_QSFP_ATTENUATION_DEFAULT_25G,
+	SYSTEM_TABLE_VARIABLE_TABLE_ENTRIES_PER_PORT,
+	SYSTEM_TABLE_MAX
+};
+
+enum platform_config_port_table_fields {
+	PORT_TABLE_RESERVED,
+	PORT_TABLE_PORT_TYPE,
+	PORT_TABLE_ATTENUATION_12G,
+	PORT_TABLE_ATTENUATION_25G,
+	PORT_TABLE_LINK_SPEED_SUPPORTED,
+	PORT_TABLE_LINK_WIDTH_SUPPORTED,
+	PORT_TABLE_VL_CAP,
+	PORT_TABLE_MTU_CAP,
+	PORT_TABLE_TX_LANE_ENABLE_MASK,
+	PORT_TABLE_LOCAL_MAX_TIMEOUT,
+	PORT_TABLE_AUTO_LANE_SHEDDING_ENABLED,
+	PORT_TABLE_EXTERNAL_LOOPBACK_ALLOWED,
+	PORT_TABLE_TX_PRESET_IDX_PASSIVE_CU,
+	PORT_TABLE_TX_PRESET_IDX_ACTIVE_NO_EQ,
+	PORT_TABLE_TX_PRESET_IDX_ACTIVE_EQ,
+	PORT_TABLE_RX_PRESET_IDX,
+	PORT_TABLE_CABLE_REACH_CLASS,
+	PORT_TABLE_MAX
+};
+
+enum platform_config_rx_preset_table_fields {
+	RX_PRESET_TABLE_RESERVED,
+	RX_PRESET_TABLE_QSFP_RX_CDR_APPLY,
+	RX_PRESET_TABLE_QSFP_RX_EQ_APPLY,
+	RX_PRESET_TABLE_QSFP_RX_AMP_APPLY,
+	RX_PRESET_TABLE_QSFP_RX_CDR,
+	RX_PRESET_TABLE_QSFP_RX_EQ,
+	RX_PRESET_TABLE_QSFP_RX_AMP,
+	RX_PRESET_TABLE_MAX
+};
+
+enum platform_config_tx_preset_table_fields {
+	TX_PRESET_TABLE_RESERVED,
+	TX_PRESET_TABLE_PRECUR,
+	TX_PRESET_TABLE_ATTN,
+	TX_PRESET_TABLE_POSTCUR,
+	TX_PRESET_TABLE_QSFP_TX_CDR_APPLY,
+	TX_PRESET_TABLE_QSFP_TX_EQ_APPLY,
+	TX_PRESET_TABLE_QSFP_TX_CDR,
+	TX_PRESET_TABLE_QSFP_TX_EQ,
+	TX_PRESET_TABLE_MAX
+};
+
+enum platform_config_qsfp_attn_table_fields {
+	QSFP_ATTEN_TABLE_RESERVED,
+	QSFP_ATTEN_TABLE_TX_PRESET_IDX,
+	QSFP_ATTEN_TABLE_RX_PRESET_IDX,
+	QSFP_ATTEN_TABLE_MAX
+};
+
+enum platform_config_variable_settings_table_fields {
+	VARIABLE_SETTINGS_TABLE_RESERVED,
+	VARIABLE_SETTINGS_TABLE_TX_PRESET_IDX,
+	VARIABLE_SETTINGS_TABLE_RX_PRESET_IDX,
+	VARIABLE_SETTINGS_TABLE_MAX
+};
+
+struct platform_config_data {
+	u32 *table;
+	u32 *table_metadata;
+	u32 num_table;
+};
+
+/*
+ * This struct acts as a quick reference into the platform_data binary image
+ * and is populated by parse_platform_config(...) depending on the specific
+ * META_VERSION
+ */
+struct platform_config_cache {
+	u8  cache_valid;
+	struct platform_config_data config_tables[PLATFORM_CONFIG_TABLE_MAX];
+};
+
+static const u32 platform_config_table_limits[PLATFORM_CONFIG_TABLE_MAX] = {
+	0,
+	SYSTEM_TABLE_MAX,
+	PORT_TABLE_MAX,
+	RX_PRESET_TABLE_MAX,
+	TX_PRESET_TABLE_MAX,
+	QSFP_ATTEN_TABLE_MAX,
+	VARIABLE_SETTINGS_TABLE_MAX
+};
+
+/* This section defines default values and encodings for the
+ * fields defined for each table above
+ */
+
+/*=====================================================
+ *  System table encodings
+ *====================================================*/
+#define PLATFORM_CONFIG_MAGIC_NUM		0x3d4f5041
+#define PLATFORM_CONFIG_MAGIC_NUMBER_LEN	4
+
+/*
+ * These power classes are the same as defined in SFF 8636 spec rev 2.4
+ * describing byte 129 in table 6-16, except enumerated in a different order
+ */
+enum platform_config_qsfp_power_class_encoding {
+	QSFP_POWER_CLASS_1 = 1,
+	QSFP_POWER_CLASS_2,
+	QSFP_POWER_CLASS_3,
+	QSFP_POWER_CLASS_4,
+	QSFP_POWER_CLASS_5,
+	QSFP_POWER_CLASS_6,
+	QSFP_POWER_CLASS_7
+};
+
+
+/*=====================================================
+ *  Port table encodings
+ *==================================================== */
+enum platform_config_port_type_encoding {
+	PORT_TYPE_RESERVED,
+	PORT_TYPE_DISCONNECTED,
+	PORT_TYPE_FIXED,
+	PORT_TYPE_VARIABLE,
+	PORT_TYPE_QSFP,
+	PORT_TYPE_MAX
+};
+
+enum platform_config_link_speed_supported_encoding {
+	LINK_SPEED_SUPP_12G = 1,
+	LINK_SPEED_SUPP_25G,
+	LINK_SPEED_SUPP_12G_25G,
+	LINK_SPEED_SUPP_MAX
+};
+
+/*
+ * This is a subset (not strict) of the link downgrades
+ * supported. The link downgrades supported are expected
+ * to be supplied to the driver by another entity such as
+ * the fabric manager
+ */
+enum platform_config_link_width_supported_encoding {
+	LINK_WIDTH_SUPP_1X = 1,
+	LINK_WIDTH_SUPP_2X,
+	LINK_WIDTH_SUPP_2X_1X,
+	LINK_WIDTH_SUPP_3X,
+	LINK_WIDTH_SUPP_3X_1X,
+	LINK_WIDTH_SUPP_3X_2X,
+	LINK_WIDTH_SUPP_3X_2X_1X,
+	LINK_WIDTH_SUPP_4X,
+	LINK_WIDTH_SUPP_4X_1X,
+	LINK_WIDTH_SUPP_4X_2X,
+	LINK_WIDTH_SUPP_4X_2X_1X,
+	LINK_WIDTH_SUPP_4X_3X,
+	LINK_WIDTH_SUPP_4X_3X_1X,
+	LINK_WIDTH_SUPP_4X_3X_2X,
+	LINK_WIDTH_SUPP_4X_3X_2X_1X,
+	LINK_WIDTH_SUPP_MAX
+};
+
+enum platform_config_virtual_lane_capability_encoding {
+	VL_CAP_VL0 = 1,
+	VL_CAP_VL0_1,
+	VL_CAP_VL0_2,
+	VL_CAP_VL0_3,
+	VL_CAP_VL0_4,
+	VL_CAP_VL0_5,
+	VL_CAP_VL0_6,
+	VL_CAP_VL0_7,
+	VL_CAP_VL0_8,
+	VL_CAP_VL0_9,
+	VL_CAP_VL0_10,
+	VL_CAP_VL0_11,
+	VL_CAP_VL0_12,
+	VL_CAP_VL0_13,
+	VL_CAP_VL0_14,
+	VL_CAP_MAX
+};
+
+/* Max MTU */
+enum platform_config_mtu_capability_encoding {
+	MTU_CAP_256   = 1,
+	MTU_CAP_512   = 2,
+	MTU_CAP_1024  = 3,
+	MTU_CAP_2048  = 4,
+	MTU_CAP_4096  = 5,
+	MTU_CAP_8192  = 6,
+	MTU_CAP_10240 = 7
+};
+
+enum platform_config_local_max_timeout_encoding {
+	LOCAL_MAX_TIMEOUT_10_MS = 1,
+	LOCAL_MAX_TIMEOUT_100_MS,
+	LOCAL_MAX_TIMEOUT_1_S,
+	LOCAL_MAX_TIMEOUT_10_S,
+	LOCAL_MAX_TIMEOUT_100_S,
+	LOCAL_MAX_TIMEOUT_1000_S
+};
+
+#endif			/*__PLATFORM_CONFIG_H*/
diff --git a/drivers/staging/rdma/hfi1/qp.c b/drivers/staging/rdma/hfi1/qp.c
new file mode 100644
index 000000000000..df1fa56eaf85
--- /dev/null
+++ b/drivers/staging/rdma/hfi1/qp.c
@@ -0,0 +1,1687 @@
+/*
+ *
+ * This file is provided under a dual BSD/GPLv2 license.  When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * GPL LICENSE SUMMARY
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * BSD LICENSE
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ *  - Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ *  - Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in
+ *    the documentation and/or other materials provided with the
+ *    distribution.
+ *  - Neither the name of Intel Corporation nor the names of its
+ *    contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ */
+
+#include <linux/err.h>
+#include <linux/vmalloc.h>
+#include <linux/hash.h>
+#include <linux/module.h>
+#include <linux/random.h>
+#include <linux/seq_file.h>
+
+#include "hfi.h"
+#include "qp.h"
+#include "trace.h"
+#include "sdma.h"
+
+#define BITS_PER_PAGE           (PAGE_SIZE*BITS_PER_BYTE)
+#define BITS_PER_PAGE_MASK      (BITS_PER_PAGE-1)
+
+static unsigned int hfi1_qp_table_size = 256;
+module_param_named(qp_table_size, hfi1_qp_table_size, uint, S_IRUGO);
+MODULE_PARM_DESC(qp_table_size, "QP table size");
+
+static void flush_tx_list(struct hfi1_qp *qp);
+static int iowait_sleep(
+	struct sdma_engine *sde,
+	struct iowait *wait,
+	struct sdma_txreq *stx,
+	unsigned seq);
+static void iowait_wakeup(struct iowait *wait, int reason);
+
+static inline unsigned mk_qpn(struct hfi1_qpn_table *qpt,
+			      struct qpn_map *map, unsigned off)
+{
+	return (map - qpt->map) * BITS_PER_PAGE + off;
+}
+
+/*
+ * Convert the AETH credit code into the number of credits.
+ */
+static const u16 credit_table[31] = {
+	0,                      /* 0 */
+	1,                      /* 1 */
+	2,                      /* 2 */
+	3,                      /* 3 */
+	4,                      /* 4 */
+	6,                      /* 5 */
+	8,                      /* 6 */
+	12,                     /* 7 */
+	16,                     /* 8 */
+	24,                     /* 9 */
+	32,                     /* A */
+	48,                     /* B */
+	64,                     /* C */
+	96,                     /* D */
+	128,                    /* E */
+	192,                    /* F */
+	256,                    /* 10 */
+	384,                    /* 11 */
+	512,                    /* 12 */
+	768,                    /* 13 */
+	1024,                   /* 14 */
+	1536,                   /* 15 */
+	2048,                   /* 16 */
+	3072,                   /* 17 */
+	4096,                   /* 18 */
+	6144,                   /* 19 */
+	8192,                   /* 1A */
+	12288,                  /* 1B */
+	16384,                  /* 1C */
+	24576,                  /* 1D */
+	32768                   /* 1E */
+};
+
+static void get_map_page(struct hfi1_qpn_table *qpt, struct qpn_map *map)
+{
+	unsigned long page = get_zeroed_page(GFP_KERNEL);
+
+	/*
+	 * Free the page if someone raced with us installing it.
+	 */
+
+	spin_lock(&qpt->lock);
+	if (map->page)
+		free_page(page);
+	else
+		map->page = (void *)page;
+	spin_unlock(&qpt->lock);
+}
+
+/*
+ * Allocate the next available QPN or
+ * zero/one for QP type IB_QPT_SMI/IB_QPT_GSI.
+ */
+static int alloc_qpn(struct hfi1_devdata *dd, struct hfi1_qpn_table *qpt,
+		     enum ib_qp_type type, u8 port)
+{
+	u32 i, offset, max_scan, qpn;
+	struct qpn_map *map;
+	u32 ret;
+
+	if (type == IB_QPT_SMI || type == IB_QPT_GSI) {
+		unsigned n;
+
+		ret = type == IB_QPT_GSI;
+		n = 1 << (ret + 2 * (port - 1));
+		spin_lock(&qpt->lock);
+		if (qpt->flags & n)
+			ret = -EINVAL;
+		else
+			qpt->flags |= n;
+		spin_unlock(&qpt->lock);
+		goto bail;
+	}
+
+	qpn = qpt->last + qpt->incr;
+	if (qpn >= QPN_MAX)
+		qpn = qpt->incr | ((qpt->last & 1) ^ 1);
+	/* offset carries bit 0 */
+	offset = qpn & BITS_PER_PAGE_MASK;
+	map = &qpt->map[qpn / BITS_PER_PAGE];
+	max_scan = qpt->nmaps - !offset;
+	for (i = 0;;) {
+		if (unlikely(!map->page)) {
+			get_map_page(qpt, map);
+			if (unlikely(!map->page))
+				break;
+		}
+		do {
+			if (!test_and_set_bit(offset, map->page)) {
+				qpt->last = qpn;
+				ret = qpn;
+				goto bail;
+			}
+			offset += qpt->incr;
+			/*
+			 * This qpn might be bogus if offset >= BITS_PER_PAGE.
+			 * That is OK.   It gets re-assigned below
+			 */
+			qpn = mk_qpn(qpt, map, offset);
+		} while (offset < BITS_PER_PAGE && qpn < QPN_MAX);
+		/*
+		 * In order to keep the number of pages allocated to a
+		 * minimum, we scan the all existing pages before increasing
+		 * the size of the bitmap table.
+		 */
+		if (++i > max_scan) {
+			if (qpt->nmaps == QPNMAP_ENTRIES)
+				break;
+			map = &qpt->map[qpt->nmaps++];
+			/* start at incr with current bit 0 */
+			offset = qpt->incr | (offset & 1);
+		} else if (map < &qpt->map[qpt->nmaps]) {
+			++map;
+			/* start at incr with current bit 0 */
+			offset = qpt->incr | (offset & 1);
+		} else {
+			map = &qpt->map[0];
+			/* wrap to first map page, invert bit 0 */
+			offset = qpt->incr | ((offset & 1) ^ 1);
+		}
+		/* there can be no bits at shift and below */
+		WARN_ON(offset & (dd->qos_shift - 1));
+		qpn = mk_qpn(qpt, map, offset);
+	}
+
+	ret = -ENOMEM;
+
+bail:
+	return ret;
+}
+
+static void free_qpn(struct hfi1_qpn_table *qpt, u32 qpn)
+{
+	struct qpn_map *map;
+
+	map = qpt->map + qpn / BITS_PER_PAGE;
+	if (map->page)
+		clear_bit(qpn & BITS_PER_PAGE_MASK, map->page);
+}
+
+/*
+ * Put the QP into the hash table.
+ * The hash table holds a reference to the QP.
+ */
+static void insert_qp(struct hfi1_ibdev *dev, struct hfi1_qp *qp)
+{
+	struct hfi1_ibport *ibp = to_iport(qp->ibqp.device, qp->port_num);
+	unsigned long flags;
+
+	atomic_inc(&qp->refcount);
+	spin_lock_irqsave(&dev->qp_dev->qpt_lock, flags);
+
+	if (qp->ibqp.qp_num <= 1) {
+		rcu_assign_pointer(ibp->qp[qp->ibqp.qp_num], qp);
+	} else {
+		u32 n = qpn_hash(dev->qp_dev, qp->ibqp.qp_num);
+
+		qp->next = dev->qp_dev->qp_table[n];
+		rcu_assign_pointer(dev->qp_dev->qp_table[n], qp);
+		trace_hfi1_qpinsert(qp, n);
+	}
+
+	spin_unlock_irqrestore(&dev->qp_dev->qpt_lock, flags);
+}
+
+/*
+ * Remove the QP from the table so it can't be found asynchronously by
+ * the receive interrupt routine.
+ */
+static void remove_qp(struct hfi1_ibdev *dev, struct hfi1_qp *qp)
+{
+	struct hfi1_ibport *ibp = to_iport(qp->ibqp.device, qp->port_num);
+	u32 n = qpn_hash(dev->qp_dev, qp->ibqp.qp_num);
+	unsigned long flags;
+	int removed = 1;
+
+	spin_lock_irqsave(&dev->qp_dev->qpt_lock, flags);
+
+	if (rcu_dereference_protected(ibp->qp[0],
+			lockdep_is_held(&dev->qp_dev->qpt_lock)) == qp) {
+		RCU_INIT_POINTER(ibp->qp[0], NULL);
+	} else if (rcu_dereference_protected(ibp->qp[1],
+			lockdep_is_held(&dev->qp_dev->qpt_lock)) == qp) {
+		RCU_INIT_POINTER(ibp->qp[1], NULL);
+	} else {
+		struct hfi1_qp *q;
+		struct hfi1_qp __rcu **qpp;
+
+		removed = 0;
+		qpp = &dev->qp_dev->qp_table[n];
+		for (; (q = rcu_dereference_protected(*qpp,
+				lockdep_is_held(&dev->qp_dev->qpt_lock)))
+					!= NULL;
+				qpp = &q->next)
+			if (q == qp) {
+				RCU_INIT_POINTER(*qpp,
+				 rcu_dereference_protected(qp->next,
+				 lockdep_is_held(&dev->qp_dev->qpt_lock)));
+				removed = 1;
+				trace_hfi1_qpremove(qp, n);
+				break;
+			}
+	}
+
+	spin_unlock_irqrestore(&dev->qp_dev->qpt_lock, flags);
+	if (removed) {
+		synchronize_rcu();
+		if (atomic_dec_and_test(&qp->refcount))
+			wake_up(&qp->wait);
+	}
+}
+
+/**
+ * free_all_qps - check for QPs still in use
+ * @qpt: the QP table to empty
+ *
+ * There should not be any QPs still in use.
+ * Free memory for table.
+ */
+static unsigned free_all_qps(struct hfi1_devdata *dd)
+{
+	struct hfi1_ibdev *dev = &dd->verbs_dev;
+	unsigned long flags;
+	struct hfi1_qp *qp;
+	unsigned n, qp_inuse = 0;
+
+	for (n = 0; n < dd->num_pports; n++) {
+		struct hfi1_ibport *ibp = &dd->pport[n].ibport_data;
+
+		if (!hfi1_mcast_tree_empty(ibp))
+			qp_inuse++;
+		rcu_read_lock();
+		if (rcu_dereference(ibp->qp[0]))
+			qp_inuse++;
+		if (rcu_dereference(ibp->qp[1]))
+			qp_inuse++;
+		rcu_read_unlock();
+	}
+
+	if (!dev->qp_dev)
+		goto bail;
+	spin_lock_irqsave(&dev->qp_dev->qpt_lock, flags);
+	for (n = 0; n < dev->qp_dev->qp_table_size; n++) {
+		qp = rcu_dereference_protected(dev->qp_dev->qp_table[n],
+			lockdep_is_held(&dev->qp_dev->qpt_lock));
+		RCU_INIT_POINTER(dev->qp_dev->qp_table[n], NULL);
+
+		for (; qp; qp = rcu_dereference_protected(qp->next,
+				lockdep_is_held(&dev->qp_dev->qpt_lock)))
+			qp_inuse++;
+	}
+	spin_unlock_irqrestore(&dev->qp_dev->qpt_lock, flags);
+	synchronize_rcu();
+bail:
+	return qp_inuse;
+}
+
+/**
+ * reset_qp - initialize the QP state to the reset state
+ * @qp: the QP to reset
+ * @type: the QP type
+ */
+static void reset_qp(struct hfi1_qp *qp, enum ib_qp_type type)
+{
+	qp->remote_qpn = 0;
+	qp->qkey = 0;
+	qp->qp_access_flags = 0;
+	iowait_init(
+		&qp->s_iowait,
+		1,
+		hfi1_do_send,
+		iowait_sleep,
+		iowait_wakeup);
+	qp->s_flags &= HFI1_S_SIGNAL_REQ_WR;
+	qp->s_hdrwords = 0;
+	qp->s_wqe = NULL;
+	qp->s_draining = 0;
+	qp->s_next_psn = 0;
+	qp->s_last_psn = 0;
+	qp->s_sending_psn = 0;
+	qp->s_sending_hpsn = 0;
+	qp->s_psn = 0;
+	qp->r_psn = 0;
+	qp->r_msn = 0;
+	if (type == IB_QPT_RC) {
+		qp->s_state = IB_OPCODE_RC_SEND_LAST;
+		qp->r_state = IB_OPCODE_RC_SEND_LAST;
+	} else {
+		qp->s_state = IB_OPCODE_UC_SEND_LAST;
+		qp->r_state = IB_OPCODE_UC_SEND_LAST;
+	}
+	qp->s_ack_state = IB_OPCODE_RC_ACKNOWLEDGE;
+	qp->r_nak_state = 0;
+	qp->r_aflags = 0;
+	qp->r_flags = 0;
+	qp->s_head = 0;
+	qp->s_tail = 0;
+	qp->s_cur = 0;
+	qp->s_acked = 0;
+	qp->s_last = 0;
+	qp->s_ssn = 1;
+	qp->s_lsn = 0;
+	clear_ahg(qp);
+	qp->s_mig_state = IB_MIG_MIGRATED;
+	memset(qp->s_ack_queue, 0, sizeof(qp->s_ack_queue));
+	qp->r_head_ack_queue = 0;
+	qp->s_tail_ack_queue = 0;
+	qp->s_num_rd_atomic = 0;
+	if (qp->r_rq.wq) {
+		qp->r_rq.wq->head = 0;
+		qp->r_rq.wq->tail = 0;
+	}
+	qp->r_sge.num_sge = 0;
+}
+
+static void clear_mr_refs(struct hfi1_qp *qp, int clr_sends)
+{
+	unsigned n;
+
+	if (test_and_clear_bit(HFI1_R_REWIND_SGE, &qp->r_aflags))
+		hfi1_put_ss(&qp->s_rdma_read_sge);
+
+	hfi1_put_ss(&qp->r_sge);
+
+	if (clr_sends) {
+		while (qp->s_last != qp->s_head) {
+			struct hfi1_swqe *wqe = get_swqe_ptr(qp, qp->s_last);
+			unsigned i;
+
+			for (i = 0; i < wqe->wr.num_sge; i++) {
+				struct hfi1_sge *sge = &wqe->sg_list[i];
+
+				hfi1_put_mr(sge->mr);
+			}
+			if (qp->ibqp.qp_type == IB_QPT_UD ||
+			    qp->ibqp.qp_type == IB_QPT_SMI ||
+			    qp->ibqp.qp_type == IB_QPT_GSI)
+				atomic_dec(&to_iah(wqe->wr.wr.ud.ah)->refcount);
+			if (++qp->s_last >= qp->s_size)
+				qp->s_last = 0;
+		}
+		if (qp->s_rdma_mr) {
+			hfi1_put_mr(qp->s_rdma_mr);
+			qp->s_rdma_mr = NULL;
+		}
+	}
+
+	if (qp->ibqp.qp_type != IB_QPT_RC)
+		return;
+
+	for (n = 0; n < ARRAY_SIZE(qp->s_ack_queue); n++) {
+		struct hfi1_ack_entry *e = &qp->s_ack_queue[n];
+
+		if (e->opcode == IB_OPCODE_RC_RDMA_READ_REQUEST &&
+		    e->rdma_sge.mr) {
+			hfi1_put_mr(e->rdma_sge.mr);
+			e->rdma_sge.mr = NULL;
+		}
+	}
+}
+
+/**
+ * hfi1_error_qp - put a QP into the error state
+ * @qp: the QP to put into the error state
+ * @err: the receive completion error to signal if a RWQE is active
+ *
+ * Flushes both send and receive work queues.
+ * Returns true if last WQE event should be generated.
+ * The QP r_lock and s_lock should be held and interrupts disabled.
+ * If we are already in error state, just return.
+ */
+int hfi1_error_qp(struct hfi1_qp *qp, enum ib_wc_status err)
+{
+	struct hfi1_ibdev *dev = to_idev(qp->ibqp.device);
+	struct ib_wc wc;
+	int ret = 0;
+
+	if (qp->state == IB_QPS_ERR || qp->state == IB_QPS_RESET)
+		goto bail;
+
+	qp->state = IB_QPS_ERR;
+
+	if (qp->s_flags & (HFI1_S_TIMER | HFI1_S_WAIT_RNR)) {
+		qp->s_flags &= ~(HFI1_S_TIMER | HFI1_S_WAIT_RNR);
+		del_timer(&qp->s_timer);
+	}
+
+	if (qp->s_flags & HFI1_S_ANY_WAIT_SEND)
+		qp->s_flags &= ~HFI1_S_ANY_WAIT_SEND;
+
+	write_seqlock(&dev->iowait_lock);
+	if (!list_empty(&qp->s_iowait.list) && !(qp->s_flags & HFI1_S_BUSY)) {
+		qp->s_flags &= ~HFI1_S_ANY_WAIT_IO;
+		list_del_init(&qp->s_iowait.list);
+		if (atomic_dec_and_test(&qp->refcount))
+			wake_up(&qp->wait);
+	}
+	write_sequnlock(&dev->iowait_lock);
+
+	if (!(qp->s_flags & HFI1_S_BUSY)) {
+		qp->s_hdrwords = 0;
+		if (qp->s_rdma_mr) {
+			hfi1_put_mr(qp->s_rdma_mr);
+			qp->s_rdma_mr = NULL;
+		}
+		flush_tx_list(qp);
+	}
+
+	/* Schedule the sending tasklet to drain the send work queue. */
+	if (qp->s_last != qp->s_head)
+		hfi1_schedule_send(qp);
+
+	clear_mr_refs(qp, 0);
+
+	memset(&wc, 0, sizeof(wc));
+	wc.qp = &qp->ibqp;
+	wc.opcode = IB_WC_RECV;
+
+	if (test_and_clear_bit(HFI1_R_WRID_VALID, &qp->r_aflags)) {
+		wc.wr_id = qp->r_wr_id;
+		wc.status = err;
+		hfi1_cq_enter(to_icq(qp->ibqp.recv_cq), &wc, 1);
+	}
+	wc.status = IB_WC_WR_FLUSH_ERR;
+
+	if (qp->r_rq.wq) {
+		struct hfi1_rwq *wq;
+		u32 head;
+		u32 tail;
+
+		spin_lock(&qp->r_rq.lock);
+
+		/* sanity check pointers before trusting them */
+		wq = qp->r_rq.wq;
+		head = wq->head;
+		if (head >= qp->r_rq.size)
+			head = 0;
+		tail = wq->tail;
+		if (tail >= qp->r_rq.size)
+			tail = 0;
+		while (tail != head) {
+			wc.wr_id = get_rwqe_ptr(&qp->r_rq, tail)->wr_id;
+			if (++tail >= qp->r_rq.size)
+				tail = 0;
+			hfi1_cq_enter(to_icq(qp->ibqp.recv_cq), &wc, 1);
+		}
+		wq->tail = tail;
+
+		spin_unlock(&qp->r_rq.lock);
+	} else if (qp->ibqp.event_handler)
+		ret = 1;
+
+bail:
+	return ret;
+}
+
+static void flush_tx_list(struct hfi1_qp *qp)
+{
+	while (!list_empty(&qp->s_iowait.tx_head)) {
+		struct sdma_txreq *tx;
+
+		tx = list_first_entry(
+			&qp->s_iowait.tx_head,
+			struct sdma_txreq,
+			list);
+		list_del_init(&tx->list);
+		hfi1_put_txreq(
+			container_of(tx, struct verbs_txreq, txreq));
+	}
+}
+
+static void flush_iowait(struct hfi1_qp *qp)
+{
+	struct hfi1_ibdev *dev = to_idev(qp->ibqp.device);
+	unsigned long flags;
+
+	write_seqlock_irqsave(&dev->iowait_lock, flags);
+	if (!list_empty(&qp->s_iowait.list)) {
+		list_del_init(&qp->s_iowait.list);
+		if (atomic_dec_and_test(&qp->refcount))
+			wake_up(&qp->wait);
+	}
+	write_sequnlock_irqrestore(&dev->iowait_lock, flags);
+}
+
+static inline int opa_mtu_enum_to_int(int mtu)
+{
+	switch (mtu) {
+	case OPA_MTU_8192:  return 8192;
+	case OPA_MTU_10240: return 10240;
+	default:            return -1;
+	}
+}
+
+/**
+ * This function is what we would push to the core layer if we wanted to be a
+ * "first class citizen".  Instead we hide this here and rely on Verbs ULPs
+ * to blindly pass the MTU enum value from the PathRecord to us.
+ *
+ * The actual flag used to determine "8k MTU" will change and is currently
+ * unknown.
+ */
+static inline int verbs_mtu_enum_to_int(struct ib_device *dev, enum ib_mtu mtu)
+{
+	int val = opa_mtu_enum_to_int((int)mtu);
+
+	if (val > 0)
+		return val;
+	return ib_mtu_enum_to_int(mtu);
+}
+
+
+/**
+ * hfi1_modify_qp - modify the attributes of a queue pair
+ * @ibqp: the queue pair who's attributes we're modifying
+ * @attr: the new attributes
+ * @attr_mask: the mask of attributes to modify
+ * @udata: user data for libibverbs.so
+ *
+ * Returns 0 on success, otherwise returns an errno.
+ */
+int hfi1_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr,
+		   int attr_mask, struct ib_udata *udata)
+{
+	struct hfi1_ibdev *dev = to_idev(ibqp->device);
+	struct hfi1_qp *qp = to_iqp(ibqp);
+	enum ib_qp_state cur_state, new_state;
+	struct ib_event ev;
+	int lastwqe = 0;
+	int mig = 0;
+	int ret;
+	u32 pmtu = 0; /* for gcc warning only */
+	struct hfi1_devdata *dd;
+
+	spin_lock_irq(&qp->r_lock);
+	spin_lock(&qp->s_lock);
+
+	cur_state = attr_mask & IB_QP_CUR_STATE ?
+		attr->cur_qp_state : qp->state;
+	new_state = attr_mask & IB_QP_STATE ? attr->qp_state : cur_state;
+
+	if (!ib_modify_qp_is_ok(cur_state, new_state, ibqp->qp_type,
+				attr_mask, IB_LINK_LAYER_UNSPECIFIED))
+		goto inval;
+
+	if (attr_mask & IB_QP_AV) {
+		if (attr->ah_attr.dlid >= HFI1_MULTICAST_LID_BASE)
+			goto inval;
+		if (hfi1_check_ah(qp->ibqp.device, &attr->ah_attr))
+			goto inval;
+	}
+
+	if (attr_mask & IB_QP_ALT_PATH) {
+		if (attr->alt_ah_attr.dlid >= HFI1_MULTICAST_LID_BASE)
+			goto inval;
+		if (hfi1_check_ah(qp->ibqp.device, &attr->alt_ah_attr))
+			goto inval;
+		if (attr->alt_pkey_index >= hfi1_get_npkeys(dd_from_dev(dev)))
+			goto inval;
+	}
+
+	if (attr_mask & IB_QP_PKEY_INDEX)
+		if (attr->pkey_index >= hfi1_get_npkeys(dd_from_dev(dev)))
+			goto inval;
+
+	if (attr_mask & IB_QP_MIN_RNR_TIMER)
+		if (attr->min_rnr_timer > 31)
+			goto inval;
+
+	if (attr_mask & IB_QP_PORT)
+		if (qp->ibqp.qp_type == IB_QPT_SMI ||
+		    qp->ibqp.qp_type == IB_QPT_GSI ||
+		    attr->port_num == 0 ||
+		    attr->port_num > ibqp->device->phys_port_cnt)
+			goto inval;
+
+	if (attr_mask & IB_QP_DEST_QPN)
+		if (attr->dest_qp_num > HFI1_QPN_MASK)
+			goto inval;
+
+	if (attr_mask & IB_QP_RETRY_CNT)
+		if (attr->retry_cnt > 7)
+			goto inval;
+
+	if (attr_mask & IB_QP_RNR_RETRY)
+		if (attr->rnr_retry > 7)
+			goto inval;
+
+	/*
+	 * Don't allow invalid path_mtu values.  OK to set greater
+	 * than the active mtu (or even the max_cap, if we have tuned
+	 * that to a small mtu.  We'll set qp->path_mtu
+	 * to the lesser of requested attribute mtu and active,
+	 * for packetizing messages.
+	 * Note that the QP port has to be set in INIT and MTU in RTR.
+	 */
+	if (attr_mask & IB_QP_PATH_MTU) {
+		int mtu, pidx = qp->port_num - 1;
+
+		dd = dd_from_dev(dev);
+		mtu = verbs_mtu_enum_to_int(ibqp->device, attr->path_mtu);
+		if (mtu == -1)
+			goto inval;
+
+		if (mtu > dd->pport[pidx].ibmtu)
+			pmtu = mtu_to_enum(dd->pport[pidx].ibmtu, IB_MTU_2048);
+		else
+			pmtu = attr->path_mtu;
+	}
+
+	if (attr_mask & IB_QP_PATH_MIG_STATE) {
+		if (attr->path_mig_state == IB_MIG_REARM) {
+			if (qp->s_mig_state == IB_MIG_ARMED)
+				goto inval;
+			if (new_state != IB_QPS_RTS)
+				goto inval;
+		} else if (attr->path_mig_state == IB_MIG_MIGRATED) {
+			if (qp->s_mig_state == IB_MIG_REARM)
+				goto inval;
+			if (new_state != IB_QPS_RTS && new_state != IB_QPS_SQD)
+				goto inval;
+			if (qp->s_mig_state == IB_MIG_ARMED)
+				mig = 1;
+		} else
+			goto inval;
+	}
+
+	if (attr_mask & IB_QP_MAX_DEST_RD_ATOMIC)
+		if (attr->max_dest_rd_atomic > HFI1_MAX_RDMA_ATOMIC)
+			goto inval;
+
+	switch (new_state) {
+	case IB_QPS_RESET:
+		if (qp->state != IB_QPS_RESET) {
+			qp->state = IB_QPS_RESET;
+			flush_iowait(qp);
+			qp->s_flags &= ~(HFI1_S_TIMER | HFI1_S_ANY_WAIT);
+			spin_unlock(&qp->s_lock);
+			spin_unlock_irq(&qp->r_lock);
+			/* Stop the sending work queue and retry timer */
+			cancel_work_sync(&qp->s_iowait.iowork);
+			del_timer_sync(&qp->s_timer);
+			iowait_sdma_drain(&qp->s_iowait);
+			flush_tx_list(qp);
+			remove_qp(dev, qp);
+			wait_event(qp->wait, !atomic_read(&qp->refcount));
+			spin_lock_irq(&qp->r_lock);
+			spin_lock(&qp->s_lock);
+			clear_mr_refs(qp, 1);
+			clear_ahg(qp);
+			reset_qp(qp, ibqp->qp_type);
+		}
+		break;
+
+	case IB_QPS_RTR:
+		/* Allow event to re-trigger if QP set to RTR more than once */
+		qp->r_flags &= ~HFI1_R_COMM_EST;
+		qp->state = new_state;
+		break;
+
+	case IB_QPS_SQD:
+		qp->s_draining = qp->s_last != qp->s_cur;
+		qp->state = new_state;
+		break;
+
+	case IB_QPS_SQE:
+		if (qp->ibqp.qp_type == IB_QPT_RC)
+			goto inval;
+		qp->state = new_state;
+		break;
+
+	case IB_QPS_ERR:
+		lastwqe = hfi1_error_qp(qp, IB_WC_WR_FLUSH_ERR);
+		break;
+
+	default:
+		qp->state = new_state;
+		break;
+	}
+
+	if (attr_mask & IB_QP_PKEY_INDEX)
+		qp->s_pkey_index = attr->pkey_index;
+
+	if (attr_mask & IB_QP_PORT)
+		qp->port_num = attr->port_num;
+
+	if (attr_mask & IB_QP_DEST_QPN)
+		qp->remote_qpn = attr->dest_qp_num;
+
+	if (attr_mask & IB_QP_SQ_PSN) {
+		qp->s_next_psn = attr->sq_psn & PSN_MODIFY_MASK;
+		qp->s_psn = qp->s_next_psn;
+		qp->s_sending_psn = qp->s_next_psn;
+		qp->s_last_psn = qp->s_next_psn - 1;
+		qp->s_sending_hpsn = qp->s_last_psn;
+	}
+
+	if (attr_mask & IB_QP_RQ_PSN)
+		qp->r_psn = attr->rq_psn & PSN_MODIFY_MASK;
+
+	if (attr_mask & IB_QP_ACCESS_FLAGS)
+		qp->qp_access_flags = attr->qp_access_flags;
+
+	if (attr_mask & IB_QP_AV) {
+		qp->remote_ah_attr = attr->ah_attr;
+		qp->s_srate = attr->ah_attr.static_rate;
+		qp->srate_mbps = ib_rate_to_mbps(qp->s_srate);
+	}
+
+	if (attr_mask & IB_QP_ALT_PATH) {
+		qp->alt_ah_attr = attr->alt_ah_attr;
+		qp->s_alt_pkey_index = attr->alt_pkey_index;
+	}
+
+	if (attr_mask & IB_QP_PATH_MIG_STATE) {
+		qp->s_mig_state = attr->path_mig_state;
+		if (mig) {
+			qp->remote_ah_attr = qp->alt_ah_attr;
+			qp->port_num = qp->alt_ah_attr.port_num;
+			qp->s_pkey_index = qp->s_alt_pkey_index;
+			qp->s_flags |= HFI1_S_AHG_CLEAR;
+		}
+	}
+
+	if (attr_mask & IB_QP_PATH_MTU) {
+		struct hfi1_ibport *ibp;
+		u8 sc, vl;
+		u32 mtu;
+
+		dd = dd_from_dev(dev);
+		ibp = &dd->pport[qp->port_num - 1].ibport_data;
+
+		sc = ibp->sl_to_sc[qp->remote_ah_attr.sl];
+		vl = sc_to_vlt(dd, sc);
+
+		mtu = verbs_mtu_enum_to_int(ibqp->device, pmtu);
+		if (vl < PER_VL_SEND_CONTEXTS)
+			mtu = min_t(u32, mtu, dd->vld[vl].mtu);
+		pmtu = mtu_to_enum(mtu, OPA_MTU_8192);
+
+		qp->path_mtu = pmtu;
+		qp->pmtu = mtu;
+	}
+
+	if (attr_mask & IB_QP_RETRY_CNT) {
+		qp->s_retry_cnt = attr->retry_cnt;
+		qp->s_retry = attr->retry_cnt;
+	}
+
+	if (attr_mask & IB_QP_RNR_RETRY) {
+		qp->s_rnr_retry_cnt = attr->rnr_retry;
+		qp->s_rnr_retry = attr->rnr_retry;
+	}
+
+	if (attr_mask & IB_QP_MIN_RNR_TIMER)
+		qp->r_min_rnr_timer = attr->min_rnr_timer;
+
+	if (attr_mask & IB_QP_TIMEOUT) {
+		qp->timeout = attr->timeout;
+		qp->timeout_jiffies =
+			usecs_to_jiffies((4096UL * (1UL << qp->timeout)) /
+				1000UL);
+	}
+
+	if (attr_mask & IB_QP_QKEY)
+		qp->qkey = attr->qkey;
+
+	if (attr_mask & IB_QP_MAX_DEST_RD_ATOMIC)
+		qp->r_max_rd_atomic = attr->max_dest_rd_atomic;
+
+	if (attr_mask & IB_QP_MAX_QP_RD_ATOMIC)
+		qp->s_max_rd_atomic = attr->max_rd_atomic;
+
+	spin_unlock(&qp->s_lock);
+	spin_unlock_irq(&qp->r_lock);
+
+	if (cur_state == IB_QPS_RESET && new_state == IB_QPS_INIT)
+		insert_qp(dev, qp);
+
+	if (lastwqe) {
+		ev.device = qp->ibqp.device;
+		ev.element.qp = &qp->ibqp;
+		ev.event = IB_EVENT_QP_LAST_WQE_REACHED;
+		qp->ibqp.event_handler(&ev, qp->ibqp.qp_context);
+	}
+	if (mig) {
+		ev.device = qp->ibqp.device;
+		ev.element.qp = &qp->ibqp;
+		ev.event = IB_EVENT_PATH_MIG;
+		qp->ibqp.event_handler(&ev, qp->ibqp.qp_context);
+	}
+	ret = 0;
+	goto bail;
+
+inval:
+	spin_unlock(&qp->s_lock);
+	spin_unlock_irq(&qp->r_lock);
+	ret = -EINVAL;
+
+bail:
+	return ret;
+}
+
+int hfi1_query_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr,
+		  int attr_mask, struct ib_qp_init_attr *init_attr)
+{
+	struct hfi1_qp *qp = to_iqp(ibqp);
+
+	attr->qp_state = qp->state;
+	attr->cur_qp_state = attr->qp_state;
+	attr->path_mtu = qp->path_mtu;
+	attr->path_mig_state = qp->s_mig_state;
+	attr->qkey = qp->qkey;
+	attr->rq_psn = mask_psn(qp->r_psn);
+	attr->sq_psn = mask_psn(qp->s_next_psn);
+	attr->dest_qp_num = qp->remote_qpn;
+	attr->qp_access_flags = qp->qp_access_flags;
+	attr->cap.max_send_wr = qp->s_size - 1;
+	attr->cap.max_recv_wr = qp->ibqp.srq ? 0 : qp->r_rq.size - 1;
+	attr->cap.max_send_sge = qp->s_max_sge;
+	attr->cap.max_recv_sge = qp->r_rq.max_sge;
+	attr->cap.max_inline_data = 0;
+	attr->ah_attr = qp->remote_ah_attr;
+	attr->alt_ah_attr = qp->alt_ah_attr;
+	attr->pkey_index = qp->s_pkey_index;
+	attr->alt_pkey_index = qp->s_alt_pkey_index;
+	attr->en_sqd_async_notify = 0;
+	attr->sq_draining = qp->s_draining;
+	attr->max_rd_atomic = qp->s_max_rd_atomic;
+	attr->max_dest_rd_atomic = qp->r_max_rd_atomic;
+	attr->min_rnr_timer = qp->r_min_rnr_timer;
+	attr->port_num = qp->port_num;
+	attr->timeout = qp->timeout;
+	attr->retry_cnt = qp->s_retry_cnt;
+	attr->rnr_retry = qp->s_rnr_retry_cnt;
+	attr->alt_port_num = qp->alt_ah_attr.port_num;
+	attr->alt_timeout = qp->alt_timeout;
+
+	init_attr->event_handler = qp->ibqp.event_handler;
+	init_attr->qp_context = qp->ibqp.qp_context;
+	init_attr->send_cq = qp->ibqp.send_cq;
+	init_attr->recv_cq = qp->ibqp.recv_cq;
+	init_attr->srq = qp->ibqp.srq;
+	init_attr->cap = attr->cap;
+	if (qp->s_flags & HFI1_S_SIGNAL_REQ_WR)
+		init_attr->sq_sig_type = IB_SIGNAL_REQ_WR;
+	else
+		init_attr->sq_sig_type = IB_SIGNAL_ALL_WR;
+	init_attr->qp_type = qp->ibqp.qp_type;
+	init_attr->port_num = qp->port_num;
+	return 0;
+}
+
+/**
+ * hfi1_compute_aeth - compute the AETH (syndrome + MSN)
+ * @qp: the queue pair to compute the AETH for
+ *
+ * Returns the AETH.
+ */
+__be32 hfi1_compute_aeth(struct hfi1_qp *qp)
+{
+	u32 aeth = qp->r_msn & HFI1_MSN_MASK;
+
+	if (qp->ibqp.srq) {
+		/*
+		 * Shared receive queues don't generate credits.
+		 * Set the credit field to the invalid value.
+		 */
+		aeth |= HFI1_AETH_CREDIT_INVAL << HFI1_AETH_CREDIT_SHIFT;
+	} else {
+		u32 min, max, x;
+		u32 credits;
+		struct hfi1_rwq *wq = qp->r_rq.wq;
+		u32 head;
+		u32 tail;
+
+		/* sanity check pointers before trusting them */
+		head = wq->head;
+		if (head >= qp->r_rq.size)
+			head = 0;
+		tail = wq->tail;
+		if (tail >= qp->r_rq.size)
+			tail = 0;
+		/*
+		 * Compute the number of credits available (RWQEs).
+		 * There is a small chance that the pair of reads are
+		 * not atomic, which is OK, since the fuzziness is
+		 * resolved as further ACKs go out.
+		 */
+		credits = head - tail;
+		if ((int)credits < 0)
+			credits += qp->r_rq.size;
+		/*
+		 * Binary search the credit table to find the code to
+		 * use.
+		 */
+		min = 0;
+		max = 31;
+		for (;;) {
+			x = (min + max) / 2;
+			if (credit_table[x] == credits)
+				break;
+			if (credit_table[x] > credits)
+				max = x;
+			else if (min == x)
+				break;
+			else
+				min = x;
+		}
+		aeth |= x << HFI1_AETH_CREDIT_SHIFT;
+	}
+	return cpu_to_be32(aeth);
+}
+
+/**
+ * hfi1_create_qp - create a queue pair for a device
+ * @ibpd: the protection domain who's device we create the queue pair for
+ * @init_attr: the attributes of the queue pair
+ * @udata: user data for libibverbs.so
+ *
+ * Returns the queue pair on success, otherwise returns an errno.
+ *
+ * Called by the ib_create_qp() core verbs function.
+ */
+struct ib_qp *hfi1_create_qp(struct ib_pd *ibpd,
+			     struct ib_qp_init_attr *init_attr,
+			     struct ib_udata *udata)
+{
+	struct hfi1_qp *qp;
+	int err;
+	struct hfi1_swqe *swq = NULL;
+	struct hfi1_ibdev *dev;
+	struct hfi1_devdata *dd;
+	size_t sz;
+	size_t sg_list_sz;
+	struct ib_qp *ret;
+
+	if (init_attr->cap.max_send_sge > hfi1_max_sges ||
+	    init_attr->cap.max_send_wr > hfi1_max_qp_wrs ||
+	    init_attr->create_flags) {
+		ret = ERR_PTR(-EINVAL);
+		goto bail;
+	}
+
+	/* Check receive queue parameters if no SRQ is specified. */
+	if (!init_attr->srq) {
+		if (init_attr->cap.max_recv_sge > hfi1_max_sges ||
+		    init_attr->cap.max_recv_wr > hfi1_max_qp_wrs) {
+			ret = ERR_PTR(-EINVAL);
+			goto bail;
+		}
+		if (init_attr->cap.max_send_sge +
+		    init_attr->cap.max_send_wr +
+		    init_attr->cap.max_recv_sge +
+		    init_attr->cap.max_recv_wr == 0) {
+			ret = ERR_PTR(-EINVAL);
+			goto bail;
+		}
+	}
+
+	switch (init_attr->qp_type) {
+	case IB_QPT_SMI:
+	case IB_QPT_GSI:
+		if (init_attr->port_num == 0 ||
+		    init_attr->port_num > ibpd->device->phys_port_cnt) {
+			ret = ERR_PTR(-EINVAL);
+			goto bail;
+		}
+	case IB_QPT_UC:
+	case IB_QPT_RC:
+	case IB_QPT_UD:
+		sz = sizeof(struct hfi1_sge) *
+			init_attr->cap.max_send_sge +
+			sizeof(struct hfi1_swqe);
+		swq = vmalloc((init_attr->cap.max_send_wr + 1) * sz);
+		if (swq == NULL) {
+			ret = ERR_PTR(-ENOMEM);
+			goto bail;
+		}
+		sz = sizeof(*qp);
+		sg_list_sz = 0;
+		if (init_attr->srq) {
+			struct hfi1_srq *srq = to_isrq(init_attr->srq);
+
+			if (srq->rq.max_sge > 1)
+				sg_list_sz = sizeof(*qp->r_sg_list) *
+					(srq->rq.max_sge - 1);
+		} else if (init_attr->cap.max_recv_sge > 1)
+			sg_list_sz = sizeof(*qp->r_sg_list) *
+				(init_attr->cap.max_recv_sge - 1);
+		qp = kzalloc(sz + sg_list_sz, GFP_KERNEL);
+		if (!qp) {
+			ret = ERR_PTR(-ENOMEM);
+			goto bail_swq;
+		}
+		RCU_INIT_POINTER(qp->next, NULL);
+		qp->s_hdr = kzalloc(sizeof(*qp->s_hdr), GFP_KERNEL);
+		if (!qp->s_hdr) {
+			ret = ERR_PTR(-ENOMEM);
+			goto bail_qp;
+		}
+		qp->timeout_jiffies =
+			usecs_to_jiffies((4096UL * (1UL << qp->timeout)) /
+				1000UL);
+		if (init_attr->srq)
+			sz = 0;
+		else {
+			qp->r_rq.size = init_attr->cap.max_recv_wr + 1;
+			qp->r_rq.max_sge = init_attr->cap.max_recv_sge;
+			sz = (sizeof(struct ib_sge) * qp->r_rq.max_sge) +
+				sizeof(struct hfi1_rwqe);
+			qp->r_rq.wq = vmalloc_user(sizeof(struct hfi1_rwq) +
+						   qp->r_rq.size * sz);
+			if (!qp->r_rq.wq) {
+				ret = ERR_PTR(-ENOMEM);
+				goto bail_qp;
+			}
+		}
+
+		/*
+		 * ib_create_qp() will initialize qp->ibqp
+		 * except for qp->ibqp.qp_num.
+		 */
+		spin_lock_init(&qp->r_lock);
+		spin_lock_init(&qp->s_lock);
+		spin_lock_init(&qp->r_rq.lock);
+		atomic_set(&qp->refcount, 0);
+		init_waitqueue_head(&qp->wait);
+		init_timer(&qp->s_timer);
+		qp->s_timer.data = (unsigned long)qp;
+		INIT_LIST_HEAD(&qp->rspwait);
+		qp->state = IB_QPS_RESET;
+		qp->s_wq = swq;
+		qp->s_size = init_attr->cap.max_send_wr + 1;
+		qp->s_max_sge = init_attr->cap.max_send_sge;
+		if (init_attr->sq_sig_type == IB_SIGNAL_REQ_WR)
+			qp->s_flags = HFI1_S_SIGNAL_REQ_WR;
+		dev = to_idev(ibpd->device);
+		dd = dd_from_dev(dev);
+		err = alloc_qpn(dd, &dev->qp_dev->qpn_table, init_attr->qp_type,
+				init_attr->port_num);
+		if (err < 0) {
+			ret = ERR_PTR(err);
+			vfree(qp->r_rq.wq);
+			goto bail_qp;
+		}
+		qp->ibqp.qp_num = err;
+		qp->port_num = init_attr->port_num;
+		reset_qp(qp, init_attr->qp_type);
+
+		break;
+
+	default:
+		/* Don't support raw QPs */
+		ret = ERR_PTR(-ENOSYS);
+		goto bail;
+	}
+
+	init_attr->cap.max_inline_data = 0;
+
+	/*
+	 * Return the address of the RWQ as the offset to mmap.
+	 * See hfi1_mmap() for details.
+	 */
+	if (udata && udata->outlen >= sizeof(__u64)) {
+		if (!qp->r_rq.wq) {
+			__u64 offset = 0;
+
+			err = ib_copy_to_udata(udata, &offset,
+					       sizeof(offset));
+			if (err) {
+				ret = ERR_PTR(err);
+				goto bail_ip;
+			}
+		} else {
+			u32 s = sizeof(struct hfi1_rwq) + qp->r_rq.size * sz;
+
+			qp->ip = hfi1_create_mmap_info(dev, s,
+						      ibpd->uobject->context,
+						      qp->r_rq.wq);
+			if (!qp->ip) {
+				ret = ERR_PTR(-ENOMEM);
+				goto bail_ip;
+			}
+
+			err = ib_copy_to_udata(udata, &(qp->ip->offset),
+					       sizeof(qp->ip->offset));
+			if (err) {
+				ret = ERR_PTR(err);
+				goto bail_ip;
+			}
+		}
+	}
+
+	spin_lock(&dev->n_qps_lock);
+	if (dev->n_qps_allocated == hfi1_max_qps) {
+		spin_unlock(&dev->n_qps_lock);
+		ret = ERR_PTR(-ENOMEM);
+		goto bail_ip;
+	}
+
+	dev->n_qps_allocated++;
+	spin_unlock(&dev->n_qps_lock);
+
+	if (qp->ip) {
+		spin_lock_irq(&dev->pending_lock);
+		list_add(&qp->ip->pending_mmaps, &dev->pending_mmaps);
+		spin_unlock_irq(&dev->pending_lock);
+	}
+
+	ret = &qp->ibqp;
+
+	/*
+	 * We have our QP and its good, now keep track of what types of opcodes
+	 * can be processed on this QP. We do this by keeping track of what the
+	 * 3 high order bits of the opcode are.
+	 */
+	switch (init_attr->qp_type) {
+	case IB_QPT_SMI:
+	case IB_QPT_GSI:
+	case IB_QPT_UD:
+		qp->allowed_ops = IB_OPCODE_UD_SEND_ONLY & OPCODE_QP_MASK;
+		break;
+	case IB_QPT_RC:
+		qp->allowed_ops = IB_OPCODE_RC_SEND_ONLY & OPCODE_QP_MASK;
+		break;
+	case IB_QPT_UC:
+		qp->allowed_ops = IB_OPCODE_UC_SEND_ONLY & OPCODE_QP_MASK;
+		break;
+	default:
+		ret = ERR_PTR(-EINVAL);
+		goto bail_ip;
+	}
+
+	goto bail;
+
+bail_ip:
+	if (qp->ip)
+		kref_put(&qp->ip->ref, hfi1_release_mmap_info);
+	else
+		vfree(qp->r_rq.wq);
+	free_qpn(&dev->qp_dev->qpn_table, qp->ibqp.qp_num);
+bail_qp:
+	kfree(qp->s_hdr);
+	kfree(qp);
+bail_swq:
+	vfree(swq);
+bail:
+	return ret;
+}
+
+/**
+ * hfi1_destroy_qp - destroy a queue pair
+ * @ibqp: the queue pair to destroy
+ *
+ * Returns 0 on success.
+ *
+ * Note that this can be called while the QP is actively sending or
+ * receiving!
+ */
+int hfi1_destroy_qp(struct ib_qp *ibqp)
+{
+	struct hfi1_qp *qp = to_iqp(ibqp);
+	struct hfi1_ibdev *dev = to_idev(ibqp->device);
+
+	/* Make sure HW and driver activity is stopped. */
+	spin_lock_irq(&qp->r_lock);
+	spin_lock(&qp->s_lock);
+	if (qp->state != IB_QPS_RESET) {
+		qp->state = IB_QPS_RESET;
+		flush_iowait(qp);
+		qp->s_flags &= ~(HFI1_S_TIMER | HFI1_S_ANY_WAIT);
+		spin_unlock(&qp->s_lock);
+		spin_unlock_irq(&qp->r_lock);
+		cancel_work_sync(&qp->s_iowait.iowork);
+		del_timer_sync(&qp->s_timer);
+		iowait_sdma_drain(&qp->s_iowait);
+		flush_tx_list(qp);
+		remove_qp(dev, qp);
+		wait_event(qp->wait, !atomic_read(&qp->refcount));
+		spin_lock_irq(&qp->r_lock);
+		spin_lock(&qp->s_lock);
+		clear_mr_refs(qp, 1);
+		clear_ahg(qp);
+	}
+	spin_unlock(&qp->s_lock);
+	spin_unlock_irq(&qp->r_lock);
+
+	/* all user's cleaned up, mark it available */
+	free_qpn(&dev->qp_dev->qpn_table, qp->ibqp.qp_num);
+	spin_lock(&dev->n_qps_lock);
+	dev->n_qps_allocated--;
+	spin_unlock(&dev->n_qps_lock);
+
+	if (qp->ip)
+		kref_put(&qp->ip->ref, hfi1_release_mmap_info);
+	else
+		vfree(qp->r_rq.wq);
+	vfree(qp->s_wq);
+	kfree(qp->s_hdr);
+	kfree(qp);
+	return 0;
+}
+
+/**
+ * init_qpn_table - initialize the QP number table for a device
+ * @qpt: the QPN table
+ */
+static int init_qpn_table(struct hfi1_devdata *dd, struct hfi1_qpn_table *qpt)
+{
+	u32 offset, qpn, i;
+	struct qpn_map *map;
+	int ret = 0;
+
+	spin_lock_init(&qpt->lock);
+
+	qpt->last = 0;
+	qpt->incr = 1 << dd->qos_shift;
+
+	/* insure we don't assign QPs from KDETH 64K window */
+	qpn = kdeth_qp << 16;
+	qpt->nmaps = qpn / BITS_PER_PAGE;
+	/* This should always be zero */
+	offset = qpn & BITS_PER_PAGE_MASK;
+	map = &qpt->map[qpt->nmaps];
+	dd_dev_info(dd, "Reserving QPNs for KDETH window from 0x%x to 0x%x\n",
+		qpn, qpn + 65535);
+	for (i = 0; i < 65536; i++) {
+		if (!map->page) {
+			get_map_page(qpt, map);
+			if (!map->page) {
+				ret = -ENOMEM;
+				break;
+			}
+		}
+		set_bit(offset, map->page);
+		offset++;
+		if (offset == BITS_PER_PAGE) {
+			/* next page */
+			qpt->nmaps++;
+			map++;
+			offset = 0;
+		}
+	}
+	return ret;
+}
+
+/**
+ * free_qpn_table - free the QP number table for a device
+ * @qpt: the QPN table
+ */
+static void free_qpn_table(struct hfi1_qpn_table *qpt)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(qpt->map); i++)
+		free_page((unsigned long) qpt->map[i].page);
+}
+
+/**
+ * hfi1_get_credit - flush the send work queue of a QP
+ * @qp: the qp who's send work queue to flush
+ * @aeth: the Acknowledge Extended Transport Header
+ *
+ * The QP s_lock should be held.
+ */
+void hfi1_get_credit(struct hfi1_qp *qp, u32 aeth)
+{
+	u32 credit = (aeth >> HFI1_AETH_CREDIT_SHIFT) & HFI1_AETH_CREDIT_MASK;
+
+	/*
+	 * If the credit is invalid, we can send
+	 * as many packets as we like.  Otherwise, we have to
+	 * honor the credit field.
+	 */
+	if (credit == HFI1_AETH_CREDIT_INVAL) {
+		if (!(qp->s_flags & HFI1_S_UNLIMITED_CREDIT)) {
+			qp->s_flags |= HFI1_S_UNLIMITED_CREDIT;
+			if (qp->s_flags & HFI1_S_WAIT_SSN_CREDIT) {
+				qp->s_flags &= ~HFI1_S_WAIT_SSN_CREDIT;
+				hfi1_schedule_send(qp);
+			}
+		}
+	} else if (!(qp->s_flags & HFI1_S_UNLIMITED_CREDIT)) {
+		/* Compute new LSN (i.e., MSN + credit) */
+		credit = (aeth + credit_table[credit]) & HFI1_MSN_MASK;
+		if (cmp_msn(credit, qp->s_lsn) > 0) {
+			qp->s_lsn = credit;
+			if (qp->s_flags & HFI1_S_WAIT_SSN_CREDIT) {
+				qp->s_flags &= ~HFI1_S_WAIT_SSN_CREDIT;
+				hfi1_schedule_send(qp);
+			}
+		}
+	}
+}
+
+void hfi1_qp_wakeup(struct hfi1_qp *qp, u32 flag)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&qp->s_lock, flags);
+	if (qp->s_flags & flag) {
+		qp->s_flags &= ~flag;
+		trace_hfi1_qpwakeup(qp, flag);
+		hfi1_schedule_send(qp);
+	}
+	spin_unlock_irqrestore(&qp->s_lock, flags);
+	/* Notify hfi1_destroy_qp() if it is waiting. */
+	if (atomic_dec_and_test(&qp->refcount))
+		wake_up(&qp->wait);
+}
+
+static int iowait_sleep(
+	struct sdma_engine *sde,
+	struct iowait *wait,
+	struct sdma_txreq *stx,
+	unsigned seq)
+{
+	struct verbs_txreq *tx = container_of(stx, struct verbs_txreq, txreq);
+	struct hfi1_qp *qp;
+	unsigned long flags;
+	int ret = 0;
+	struct hfi1_ibdev *dev;
+
+	qp = tx->qp;
+
+	spin_lock_irqsave(&qp->s_lock, flags);
+	if (ib_hfi1_state_ops[qp->state] & HFI1_PROCESS_RECV_OK) {
+
+		/*
+		 * If we couldn't queue the DMA request, save the info
+		 * and try again later rather than destroying the
+		 * buffer and undoing the side effects of the copy.
+		 */
+		/* Make a common routine? */
+		dev = &sde->dd->verbs_dev;
+		list_add_tail(&stx->list, &wait->tx_head);
+		write_seqlock(&dev->iowait_lock);
+		if (sdma_progress(sde, seq, stx))
+			goto eagain;
+		if (list_empty(&qp->s_iowait.list)) {
+			struct hfi1_ibport *ibp =
+				to_iport(qp->ibqp.device, qp->port_num);
+
+			ibp->n_dmawait++;
+			qp->s_flags |= HFI1_S_WAIT_DMA_DESC;
+			list_add_tail(&qp->s_iowait.list, &sde->dmawait);
+			trace_hfi1_qpsleep(qp, HFI1_S_WAIT_DMA_DESC);
+			atomic_inc(&qp->refcount);
+		}
+		write_sequnlock(&dev->iowait_lock);
+		qp->s_flags &= ~HFI1_S_BUSY;
+		spin_unlock_irqrestore(&qp->s_lock, flags);
+		ret = -EBUSY;
+	} else {
+		spin_unlock_irqrestore(&qp->s_lock, flags);
+		hfi1_put_txreq(tx);
+	}
+	return ret;
+eagain:
+	write_sequnlock(&dev->iowait_lock);
+	spin_unlock_irqrestore(&qp->s_lock, flags);
+	list_del_init(&stx->list);
+	return -EAGAIN;
+}
+
+static void iowait_wakeup(struct iowait *wait, int reason)
+{
+	struct hfi1_qp *qp = container_of(wait, struct hfi1_qp, s_iowait);
+
+	WARN_ON(reason != SDMA_AVAIL_REASON);
+	hfi1_qp_wakeup(qp, HFI1_S_WAIT_DMA_DESC);
+}
+
+int hfi1_qp_init(struct hfi1_ibdev *dev)
+{
+	struct hfi1_devdata *dd = dd_from_dev(dev);
+	int i;
+	int ret = -ENOMEM;
+
+	/* allocate parent object */
+	dev->qp_dev = kzalloc(sizeof(*dev->qp_dev), GFP_KERNEL);
+	if (!dev->qp_dev)
+		goto nomem;
+	/* allocate hash table */
+	dev->qp_dev->qp_table_size = hfi1_qp_table_size;
+	dev->qp_dev->qp_table_bits = ilog2(hfi1_qp_table_size);
+	dev->qp_dev->qp_table =
+		kmalloc(dev->qp_dev->qp_table_size *
+				sizeof(*dev->qp_dev->qp_table),
+			GFP_KERNEL);
+	if (!dev->qp_dev->qp_table)
+		goto nomem;
+	for (i = 0; i < dev->qp_dev->qp_table_size; i++)
+		RCU_INIT_POINTER(dev->qp_dev->qp_table[i], NULL);
+	spin_lock_init(&dev->qp_dev->qpt_lock);
+	/* initialize qpn map */
+	ret = init_qpn_table(dd, &dev->qp_dev->qpn_table);
+	if (ret)
+		goto nomem;
+	return ret;
+nomem:
+	if (dev->qp_dev) {
+		kfree(dev->qp_dev->qp_table);
+		free_qpn_table(&dev->qp_dev->qpn_table);
+		kfree(dev->qp_dev);
+	}
+	return ret;
+}
+
+void hfi1_qp_exit(struct hfi1_ibdev *dev)
+{
+	struct hfi1_devdata *dd = dd_from_dev(dev);
+	u32 qps_inuse;
+
+	qps_inuse = free_all_qps(dd);
+	if (qps_inuse)
+		dd_dev_err(dd, "QP memory leak! %u still in use\n",
+			   qps_inuse);
+	if (dev->qp_dev) {
+		kfree(dev->qp_dev->qp_table);
+		free_qpn_table(&dev->qp_dev->qpn_table);
+		kfree(dev->qp_dev);
+	}
+}
+
+/**
+ *
+ * qp_to_sdma_engine - map a qp to a send engine
+ * @qp: the QP
+ * @sc5: the 5 bit sc
+ *
+ * Return:
+ * A send engine for the qp or NULL for SMI type qp.
+ */
+struct sdma_engine *qp_to_sdma_engine(struct hfi1_qp *qp, u8 sc5)
+{
+	struct hfi1_devdata *dd = dd_from_ibdev(qp->ibqp.device);
+	struct sdma_engine *sde;
+
+	if (!(dd->flags & HFI1_HAS_SEND_DMA))
+		return NULL;
+	switch (qp->ibqp.qp_type) {
+	case IB_QPT_UC:
+	case IB_QPT_RC:
+		break;
+	case IB_QPT_SMI:
+		return NULL;
+	default:
+		break;
+	}
+	sde = sdma_select_engine_sc(dd, qp->ibqp.qp_num >> dd->qos_shift, sc5);
+	return sde;
+}
+
+struct qp_iter {
+	struct hfi1_ibdev *dev;
+	struct hfi1_qp *qp;
+	int specials;
+	int n;
+};
+
+struct qp_iter *qp_iter_init(struct hfi1_ibdev *dev)
+{
+	struct qp_iter *iter;
+
+	iter = kzalloc(sizeof(*iter), GFP_KERNEL);
+	if (!iter)
+		return NULL;
+
+	iter->dev = dev;
+	iter->specials = dev->ibdev.phys_port_cnt * 2;
+	if (qp_iter_next(iter)) {
+		kfree(iter);
+		return NULL;
+	}
+
+	return iter;
+}
+
+int qp_iter_next(struct qp_iter *iter)
+{
+	struct hfi1_ibdev *dev = iter->dev;
+	int n = iter->n;
+	int ret = 1;
+	struct hfi1_qp *pqp = iter->qp;
+	struct hfi1_qp *qp;
+
+	/*
+	 * The approach is to consider the special qps
+	 * as an additional table entries before the
+	 * real hash table.  Since the qp code sets
+	 * the qp->next hash link to NULL, this works just fine.
+	 *
+	 * iter->specials is 2 * # ports
+	 *
+	 * n = 0..iter->specials is the special qp indices
+	 *
+	 * n = iter->specials..dev->qp_dev->qp_table_size+iter->specials are
+	 * the potential hash bucket entries
+	 *
+	 */
+	for (; n <  dev->qp_dev->qp_table_size + iter->specials; n++) {
+		if (pqp) {
+			qp = rcu_dereference(pqp->next);
+		} else {
+			if (n < iter->specials) {
+				struct hfi1_pportdata *ppd;
+				struct hfi1_ibport *ibp;
+				int pidx;
+
+				pidx = n % dev->ibdev.phys_port_cnt;
+				ppd = &dd_from_dev(dev)->pport[pidx];
+				ibp = &ppd->ibport_data;
+
+				if (!(n & 1))
+					qp = rcu_dereference(ibp->qp[0]);
+				else
+					qp = rcu_dereference(ibp->qp[1]);
+			} else {
+				qp = rcu_dereference(
+					dev->qp_dev->qp_table[
+						(n - iter->specials)]);
+			}
+		}
+		pqp = qp;
+		if (qp) {
+			iter->qp = qp;
+			iter->n = n;
+			return 0;
+		}
+	}
+	return ret;
+}
+
+static const char * const qp_type_str[] = {
+	"SMI", "GSI", "RC", "UC", "UD",
+};
+
+static int qp_idle(struct hfi1_qp *qp)
+{
+	return
+		qp->s_last == qp->s_acked &&
+		qp->s_acked == qp->s_cur &&
+		qp->s_cur == qp->s_tail &&
+		qp->s_tail == qp->s_head;
+}
+
+void qp_iter_print(struct seq_file *s, struct qp_iter *iter)
+{
+	struct hfi1_swqe *wqe;
+	struct hfi1_qp *qp = iter->qp;
+	struct sdma_engine *sde;
+
+	sde = qp_to_sdma_engine(qp, qp->s_sc);
+	wqe = get_swqe_ptr(qp, qp->s_last);
+	seq_printf(s,
+		   "N %d %s QP%u R %u %s %u %u %u f=%x %u %u %u %u %u PSN %x %x %x %x %x (%u %u %u %u %u %u) QP%u LID %x SL %u MTU %d %u %u %u SDE %p,%u\n",
+		   iter->n,
+		   qp_idle(qp) ? "I" : "B",
+		   qp->ibqp.qp_num,
+		   atomic_read(&qp->refcount),
+		   qp_type_str[qp->ibqp.qp_type],
+		   qp->state,
+		   wqe ? wqe->wr.opcode : 0,
+		   qp->s_hdrwords,
+		   qp->s_flags,
+		   atomic_read(&qp->s_iowait.sdma_busy),
+		   !list_empty(&qp->s_iowait.list),
+		   qp->timeout,
+		   wqe ? wqe->ssn : 0,
+		   qp->s_lsn,
+		   qp->s_last_psn,
+		   qp->s_psn, qp->s_next_psn,
+		   qp->s_sending_psn, qp->s_sending_hpsn,
+		   qp->s_last, qp->s_acked, qp->s_cur,
+		   qp->s_tail, qp->s_head, qp->s_size,
+		   qp->remote_qpn,
+		   qp->remote_ah_attr.dlid,
+		   qp->remote_ah_attr.sl,
+		   qp->pmtu,
+		   qp->s_retry_cnt,
+		   qp->timeout,
+		   qp->s_rnr_retry_cnt,
+		   sde,
+		   sde ? sde->this_idx : 0);
+}
+
+void qp_comm_est(struct hfi1_qp *qp)
+{
+	qp->r_flags |= HFI1_R_COMM_EST;
+	if (qp->ibqp.event_handler) {
+		struct ib_event ev;
+
+		ev.device = qp->ibqp.device;
+		ev.element.qp = &qp->ibqp;
+		ev.event = IB_EVENT_COMM_EST;
+		qp->ibqp.event_handler(&ev, qp->ibqp.qp_context);
+	}
+}
diff --git a/drivers/staging/rdma/hfi1/qp.h b/drivers/staging/rdma/hfi1/qp.h
new file mode 100644
index 000000000000..6b505859b59c
--- /dev/null
+++ b/drivers/staging/rdma/hfi1/qp.h
@@ -0,0 +1,235 @@
+#ifndef _QP_H
+#define _QP_H
+/*
+ *
+ * This file is provided under a dual BSD/GPLv2 license.  When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * GPL LICENSE SUMMARY
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * BSD LICENSE
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ *  - Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ *  - Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in
+ *    the documentation and/or other materials provided with the
+ *    distribution.
+ *  - Neither the name of Intel Corporation nor the names of its
+ *    contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ */
+
+#include <linux/hash.h>
+#include "verbs.h"
+
+#define QPN_MAX                 (1 << 24)
+#define QPNMAP_ENTRIES          (QPN_MAX / PAGE_SIZE / BITS_PER_BYTE)
+
+/*
+ * QPN-map pages start out as NULL, they get allocated upon
+ * first use and are never deallocated. This way,
+ * large bitmaps are not allocated unless large numbers of QPs are used.
+ */
+struct qpn_map {
+	void *page;
+};
+
+struct hfi1_qpn_table {
+	spinlock_t lock; /* protect changes in this struct */
+	unsigned flags;         /* flags for QP0/1 allocated for each port */
+	u32 last;               /* last QP number allocated */
+	u32 nmaps;              /* size of the map table */
+	u16 limit;
+	u8  incr;
+	/* bit map of free QP numbers other than 0/1 */
+	struct qpn_map map[QPNMAP_ENTRIES];
+};
+
+struct hfi1_qp_ibdev {
+	u32 qp_table_size;
+	u32 qp_table_bits;
+	struct hfi1_qp __rcu **qp_table;
+	spinlock_t qpt_lock;
+	struct hfi1_qpn_table qpn_table;
+};
+
+static inline u32 qpn_hash(struct hfi1_qp_ibdev *dev, u32 qpn)
+{
+	return hash_32(qpn, dev->qp_table_bits);
+}
+
+/**
+ * hfi1_lookup_qpn - return the QP with the given QPN
+ * @ibp: the ibport
+ * @qpn: the QP number to look up
+ *
+ * The caller must hold the rcu_read_lock(), and keep the lock until
+ * the returned qp is no longer in use.
+ */
+static inline struct hfi1_qp *hfi1_lookup_qpn(struct hfi1_ibport *ibp,
+				u32 qpn) __must_hold(RCU)
+{
+	struct hfi1_qp *qp = NULL;
+
+	if (unlikely(qpn <= 1)) {
+		qp = rcu_dereference(ibp->qp[qpn]);
+	} else {
+		struct hfi1_ibdev *dev = &ppd_from_ibp(ibp)->dd->verbs_dev;
+		u32 n = qpn_hash(dev->qp_dev, qpn);
+
+		for (qp = rcu_dereference(dev->qp_dev->qp_table[n]); qp;
+			qp = rcu_dereference(qp->next))
+			if (qp->ibqp.qp_num == qpn)
+				break;
+	}
+	return qp;
+}
+
+/**
+ * hfi1_error_qp - put a QP into the error state
+ * @qp: the QP to put into the error state
+ * @err: the receive completion error to signal if a RWQE is active
+ *
+ * Flushes both send and receive work queues.
+ * Returns true if last WQE event should be generated.
+ * The QP r_lock and s_lock should be held and interrupts disabled.
+ * If we are already in error state, just return.
+ */
+int hfi1_error_qp(struct hfi1_qp *qp, enum ib_wc_status err);
+
+/**
+ * hfi1_modify_qp - modify the attributes of a queue pair
+ * @ibqp: the queue pair who's attributes we're modifying
+ * @attr: the new attributes
+ * @attr_mask: the mask of attributes to modify
+ * @udata: user data for libibverbs.so
+ *
+ * Returns 0 on success, otherwise returns an errno.
+ */
+int hfi1_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr,
+		   int attr_mask, struct ib_udata *udata);
+
+int hfi1_query_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr,
+		  int attr_mask, struct ib_qp_init_attr *init_attr);
+
+/**
+ * hfi1_compute_aeth - compute the AETH (syndrome + MSN)
+ * @qp: the queue pair to compute the AETH for
+ *
+ * Returns the AETH.
+ */
+__be32 hfi1_compute_aeth(struct hfi1_qp *qp);
+
+/**
+ * hfi1_create_qp - create a queue pair for a device
+ * @ibpd: the protection domain who's device we create the queue pair for
+ * @init_attr: the attributes of the queue pair
+ * @udata: user data for libibverbs.so
+ *
+ * Returns the queue pair on success, otherwise returns an errno.
+ *
+ * Called by the ib_create_qp() core verbs function.
+ */
+struct ib_qp *hfi1_create_qp(struct ib_pd *ibpd,
+			     struct ib_qp_init_attr *init_attr,
+			     struct ib_udata *udata);
+/**
+ * hfi1_destroy_qp - destroy a queue pair
+ * @ibqp: the queue pair to destroy
+ *
+ * Returns 0 on success.
+ *
+ * Note that this can be called while the QP is actively sending or
+ * receiving!
+ */
+int hfi1_destroy_qp(struct ib_qp *ibqp);
+
+/**
+ * hfi1_get_credit - flush the send work queue of a QP
+ * @qp: the qp who's send work queue to flush
+ * @aeth: the Acknowledge Extended Transport Header
+ *
+ * The QP s_lock should be held.
+ */
+void hfi1_get_credit(struct hfi1_qp *qp, u32 aeth);
+
+/**
+ * hfi1_qp_init - allocate QP tables
+ * @dev: a pointer to the hfi1_ibdev
+ */
+int hfi1_qp_init(struct hfi1_ibdev *dev);
+
+/**
+ * hfi1_qp_exit - free the QP related structures
+ * @dev: a pointer to the hfi1_ibdev
+ */
+void hfi1_qp_exit(struct hfi1_ibdev *dev);
+
+/**
+ * hfi1_qp_waitup - wake up on the indicated event
+ * @qp: the QP
+ * @flag: flag the qp on which the qp is stalled
+ */
+void hfi1_qp_wakeup(struct hfi1_qp *qp, u32 flag);
+
+struct sdma_engine *qp_to_sdma_engine(struct hfi1_qp *qp, u8 sc5);
+
+struct qp_iter;
+
+/**
+ * qp_iter_init - wake up on the indicated event
+ * @dev: the hfi1_ibdev
+ */
+struct qp_iter *qp_iter_init(struct hfi1_ibdev *dev);
+
+/**
+ * qp_iter_next - wakeup on the indicated event
+ * @iter: the iterator for the qp hash list
+ */
+int qp_iter_next(struct qp_iter *iter);
+
+/**
+ * qp_iter_next - wake up on the indicated event
+ * @s: the seq_file to emit the qp information on
+ * @iter: the iterator for the qp hash list
+ */
+void qp_iter_print(struct seq_file *s, struct qp_iter *iter);
+
+/**
+ * qp_comm_est - handle trap with QP established
+ * @qp: the QP
+ */
+void qp_comm_est(struct hfi1_qp *qp);
+
+#endif /* _QP_H */
diff --git a/drivers/staging/rdma/hfi1/qsfp.c b/drivers/staging/rdma/hfi1/qsfp.c
new file mode 100644
index 000000000000..3138936157db
--- /dev/null
+++ b/drivers/staging/rdma/hfi1/qsfp.c
@@ -0,0 +1,546 @@
+/*
+ *
+ * This file is provided under a dual BSD/GPLv2 license.  When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * GPL LICENSE SUMMARY
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * BSD LICENSE
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ *  - Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ *  - Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in
+ *    the documentation and/or other materials provided with the
+ *    distribution.
+ *  - Neither the name of Intel Corporation nor the names of its
+ *    contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ */
+
+#include <linux/delay.h>
+#include <linux/pci.h>
+#include <linux/vmalloc.h>
+
+#include "hfi.h"
+#include "twsi.h"
+
+/*
+ * QSFP support for hfi driver, using "Two Wire Serial Interface" driver
+ * in twsi.c
+ */
+#define I2C_MAX_RETRY 4
+
+/*
+ * Unlocked i2c write.  Must hold dd->qsfp_i2c_mutex.
+ */
+static int __i2c_write(struct hfi1_pportdata *ppd, u32 target, int i2c_addr,
+		       int offset, void *bp, int len)
+{
+	struct hfi1_devdata *dd = ppd->dd;
+	int ret, cnt;
+	u8 *buff = bp;
+
+	/* Make sure TWSI bus is in sane state. */
+	ret = hfi1_twsi_reset(dd, target);
+	if (ret) {
+		hfi1_dev_porterr(dd, ppd->port,
+				 "I2C interface Reset for write failed\n");
+		return -EIO;
+	}
+
+	cnt = 0;
+	while (cnt < len) {
+		int wlen = len - cnt;
+
+		ret = hfi1_twsi_blk_wr(dd, target, i2c_addr, offset,
+				       buff + cnt, wlen);
+		if (ret) {
+			/* hfi1_twsi_blk_wr() 1 for error, else 0 */
+			return -EIO;
+		}
+		offset += wlen;
+		cnt += wlen;
+	}
+
+	/* Must wait min 20us between qsfp i2c transactions */
+	udelay(20);
+
+	return cnt;
+}
+
+int i2c_write(struct hfi1_pportdata *ppd, u32 target, int i2c_addr, int offset,
+	      void *bp, int len)
+{
+	struct hfi1_devdata *dd = ppd->dd;
+	int ret;
+
+	ret = mutex_lock_interruptible(&dd->qsfp_i2c_mutex);
+	if (!ret) {
+		ret = __i2c_write(ppd, target, i2c_addr, offset, bp, len);
+		mutex_unlock(&dd->qsfp_i2c_mutex);
+	}
+
+	return ret;
+}
+
+/*
+ * Unlocked i2c read.  Must hold dd->qsfp_i2c_mutex.
+ */
+static int __i2c_read(struct hfi1_pportdata *ppd, u32 target, int i2c_addr,
+		      int offset, void *bp, int len)
+{
+	struct hfi1_devdata *dd = ppd->dd;
+	int ret, cnt, pass = 0;
+	int stuck = 0;
+	u8 *buff = bp;
+
+	/* Make sure TWSI bus is in sane state. */
+	ret = hfi1_twsi_reset(dd, target);
+	if (ret) {
+		hfi1_dev_porterr(dd, ppd->port,
+				 "I2C interface Reset for read failed\n");
+		ret = -EIO;
+		stuck = 1;
+		goto exit;
+	}
+
+	cnt = 0;
+	while (cnt < len) {
+		int rlen = len - cnt;
+
+		ret = hfi1_twsi_blk_rd(dd, target, i2c_addr, offset,
+				       buff + cnt, rlen);
+		/* Some QSFP's fail first try. Retry as experiment */
+		if (ret && cnt == 0 && ++pass < I2C_MAX_RETRY)
+			continue;
+		if (ret) {
+			/* hfi1_twsi_blk_rd() 1 for error, else 0 */
+			ret = -EIO;
+			goto exit;
+		}
+		offset += rlen;
+		cnt += rlen;
+	}
+
+	ret = cnt;
+
+exit:
+	if (stuck)
+		dd_dev_err(dd, "I2C interface bus stuck non-idle\n");
+
+	if (pass >= I2C_MAX_RETRY && ret)
+		hfi1_dev_porterr(dd, ppd->port,
+				 "I2C failed even retrying\n");
+	else if (pass)
+		hfi1_dev_porterr(dd, ppd->port, "I2C retries: %d\n", pass);
+
+	/* Must wait min 20us between qsfp i2c transactions */
+	udelay(20);
+
+	return ret;
+}
+
+int i2c_read(struct hfi1_pportdata *ppd, u32 target, int i2c_addr, int offset,
+	     void *bp, int len)
+{
+	struct hfi1_devdata *dd = ppd->dd;
+	int ret;
+
+	ret = mutex_lock_interruptible(&dd->qsfp_i2c_mutex);
+	if (!ret) {
+		ret = __i2c_read(ppd, target, i2c_addr, offset, bp, len);
+		mutex_unlock(&dd->qsfp_i2c_mutex);
+	}
+
+	return ret;
+}
+
+int qsfp_write(struct hfi1_pportdata *ppd, u32 target, int addr, void *bp,
+	       int len)
+{
+	int count = 0;
+	int offset;
+	int nwrite;
+	int ret;
+	u8 page;
+
+	ret = mutex_lock_interruptible(&ppd->dd->qsfp_i2c_mutex);
+	if (ret)
+		return ret;
+
+	while (count < len) {
+		/*
+		 * Set the qsfp page based on a zero-based addresss
+		 * and a page size of QSFP_PAGESIZE bytes.
+		 */
+		page = (u8)(addr / QSFP_PAGESIZE);
+
+		ret = __i2c_write(ppd, target, QSFP_DEV,
+					QSFP_PAGE_SELECT_BYTE_OFFS, &page, 1);
+		if (ret != 1) {
+			hfi1_dev_porterr(
+			ppd->dd,
+			ppd->port,
+			"can't write QSFP_PAGE_SELECT_BYTE: %d\n", ret);
+			ret = -EIO;
+			break;
+		}
+
+		/* truncate write to end of page if crossing page boundary */
+		offset = addr % QSFP_PAGESIZE;
+		nwrite = len - count;
+		if ((offset + nwrite) > QSFP_PAGESIZE)
+			nwrite = QSFP_PAGESIZE - offset;
+
+		ret = __i2c_write(ppd, target, QSFP_DEV, offset, bp + count,
+					nwrite);
+		if (ret <= 0)	/* stop on error or nothing read */
+			break;
+
+		count += ret;
+		addr += ret;
+	}
+
+	mutex_unlock(&ppd->dd->qsfp_i2c_mutex);
+
+	if (ret < 0)
+		return ret;
+	return count;
+}
+
+int qsfp_read(struct hfi1_pportdata *ppd, u32 target, int addr, void *bp,
+	      int len)
+{
+	int count = 0;
+	int offset;
+	int nread;
+	int ret;
+	u8 page;
+
+	ret = mutex_lock_interruptible(&ppd->dd->qsfp_i2c_mutex);
+	if (ret)
+		return ret;
+
+	while (count < len) {
+		/*
+		 * Set the qsfp page based on a zero-based address
+		 * and a page size of QSFP_PAGESIZE bytes.
+		 */
+		page = (u8)(addr / QSFP_PAGESIZE);
+		ret = __i2c_write(ppd, target, QSFP_DEV,
+					QSFP_PAGE_SELECT_BYTE_OFFS, &page, 1);
+		if (ret != 1) {
+			hfi1_dev_porterr(
+			ppd->dd,
+			ppd->port,
+			"can't write QSFP_PAGE_SELECT_BYTE: %d\n", ret);
+			ret = -EIO;
+			break;
+		}
+
+		/* truncate read to end of page if crossing page boundary */
+		offset = addr % QSFP_PAGESIZE;
+		nread = len - count;
+		if ((offset + nread) > QSFP_PAGESIZE)
+			nread = QSFP_PAGESIZE - offset;
+
+		ret = __i2c_read(ppd, target, QSFP_DEV, offset, bp + count,
+					nread);
+		if (ret <= 0)	/* stop on error or nothing read */
+			break;
+
+		count += ret;
+		addr += ret;
+	}
+
+	mutex_unlock(&ppd->dd->qsfp_i2c_mutex);
+
+	if (ret < 0)
+		return ret;
+	return count;
+}
+
+/*
+ * This function caches the QSFP memory range in 128 byte chunks.
+ * As an example, the next byte after address 255 is byte 128 from
+ * upper page 01H (if existing) rather than byte 0 from lower page 00H.
+ */
+int refresh_qsfp_cache(struct hfi1_pportdata *ppd, struct qsfp_data *cp)
+{
+	u32 target = ppd->dd->hfi1_id;
+	int ret;
+	unsigned long flags;
+	u8 *cache = &cp->cache[0];
+
+	/* ensure sane contents on invalid reads, for cable swaps */
+	memset(cache, 0, (QSFP_MAX_NUM_PAGES*128));
+	dd_dev_info(ppd->dd, "%s: called\n", __func__);
+	if (!qsfp_mod_present(ppd)) {
+		ret = -ENODEV;
+		goto bail;
+	}
+
+	ret = qsfp_read(ppd, target, 0, cache, 256);
+	if (ret != 256) {
+		dd_dev_info(ppd->dd,
+			"%s: Read of pages 00H failed, expected 256, got %d\n",
+			__func__, ret);
+		goto bail;
+	}
+
+	if (cache[0] != 0x0C && cache[0] != 0x0D)
+		goto bail;
+
+	/* Is paging enabled? */
+	if (!(cache[2] & 4)) {
+
+		/* Paging enabled, page 03 required */
+		if ((cache[195] & 0xC0) == 0xC0) {
+			/* all */
+			ret = qsfp_read(ppd, target, 384, cache + 256, 128);
+			if (ret <= 0 || ret != 128) {
+				dd_dev_info(ppd->dd, "%s: failed\n", __func__);
+				goto bail;
+			}
+			ret = qsfp_read(ppd, target, 640, cache + 384, 128);
+			if (ret <= 0 || ret != 128) {
+				dd_dev_info(ppd->dd, "%s: failed\n", __func__);
+				goto bail;
+			}
+			ret = qsfp_read(ppd, target, 896, cache + 512, 128);
+			if (ret <= 0 || ret != 128) {
+				dd_dev_info(ppd->dd, "%s: failed\n", __func__);
+				goto bail;
+			}
+		} else if ((cache[195] & 0x80) == 0x80) {
+			/* only page 2 and 3 */
+			ret = qsfp_read(ppd, target, 640, cache + 384, 128);
+			if (ret <= 0 || ret != 128) {
+				dd_dev_info(ppd->dd, "%s: failed\n", __func__);
+				goto bail;
+			}
+			ret = qsfp_read(ppd, target, 896, cache + 512, 128);
+			if (ret <= 0 || ret != 128) {
+				dd_dev_info(ppd->dd, "%s: failed\n", __func__);
+				goto bail;
+			}
+		} else if ((cache[195] & 0x40) == 0x40) {
+			/* only page 1 and 3 */
+			ret = qsfp_read(ppd, target, 384, cache + 256, 128);
+			if (ret <= 0 || ret != 128) {
+				dd_dev_info(ppd->dd, "%s: failed\n", __func__);
+				goto bail;
+			}
+			ret = qsfp_read(ppd, target, 896, cache + 512, 128);
+			if (ret <= 0 || ret != 128) {
+				dd_dev_info(ppd->dd, "%s: failed\n", __func__);
+				goto bail;
+			}
+		} else {
+			/* only page 3 */
+			ret = qsfp_read(ppd, target, 896, cache + 512, 128);
+			if (ret <= 0 || ret != 128) {
+				dd_dev_info(ppd->dd, "%s: failed\n", __func__);
+				goto bail;
+			}
+		}
+	}
+
+	spin_lock_irqsave(&ppd->qsfp_info.qsfp_lock, flags);
+	ppd->qsfp_info.cache_valid = 1;
+	ppd->qsfp_info.cache_refresh_required = 0;
+	spin_unlock_irqrestore(&ppd->qsfp_info.qsfp_lock, flags);
+
+	return 0;
+
+bail:
+	memset(cache, 0, (QSFP_MAX_NUM_PAGES*128));
+	return ret;
+}
+
+const char * const hfi1_qsfp_devtech[16] = {
+	"850nm VCSEL", "1310nm VCSEL", "1550nm VCSEL", "1310nm FP",
+	"1310nm DFB", "1550nm DFB", "1310nm EML", "1550nm EML",
+	"Cu Misc", "1490nm DFB", "Cu NoEq", "Cu Eq",
+	"Undef", "Cu Active BothEq", "Cu FarEq", "Cu NearEq"
+};
+
+#define QSFP_DUMP_CHUNK 16 /* Holds longest string */
+#define QSFP_DEFAULT_HDR_CNT 224
+
+static const char *pwr_codes = "1.5W2.0W2.5W3.5W";
+
+int qsfp_mod_present(struct hfi1_pportdata *ppd)
+{
+	if (HFI1_CAP_IS_KSET(QSFP_ENABLED)) {
+		struct hfi1_devdata *dd = ppd->dd;
+		u64 reg;
+
+		reg = read_csr(dd,
+			dd->hfi1_id ? ASIC_QSFP2_IN : ASIC_QSFP1_IN);
+		return !(reg & QSFP_HFI0_MODPRST_N);
+	}
+	/* always return cable present */
+	return 1;
+}
+
+/*
+ * This function maps QSFP memory addresses in 128 byte chunks in the following
+ * fashion per the CableInfo SMA query definition in the IBA 1.3 spec/OPA Gen 1
+ * spec
+ * For addr 000-127, lower page 00h
+ * For addr 128-255, upper page 00h
+ * For addr 256-383, upper page 01h
+ * For addr 384-511, upper page 02h
+ * For addr 512-639, upper page 03h
+ *
+ * For addresses beyond this range, it returns the invalid range of data buffer
+ * set to 0.
+ * For upper pages that are optional, if they are not valid, returns the
+ * particular range of bytes in the data buffer set to 0.
+ */
+int get_cable_info(struct hfi1_devdata *dd, u32 port_num, u32 addr, u32 len,
+		   u8 *data)
+{
+	struct hfi1_pportdata *ppd;
+	u32 excess_len = 0;
+	int ret = 0;
+
+	if (port_num > dd->num_pports || port_num < 1) {
+		dd_dev_info(dd, "%s: Invalid port number %d\n",
+				__func__, port_num);
+		ret = -EINVAL;
+		goto set_zeroes;
+	}
+
+	ppd = dd->pport + (port_num - 1);
+	if (!qsfp_mod_present(ppd)) {
+		ret = -ENODEV;
+		goto set_zeroes;
+	}
+
+	if (!ppd->qsfp_info.cache_valid) {
+		ret = -EINVAL;
+		goto set_zeroes;
+	}
+
+	if (addr >= (QSFP_MAX_NUM_PAGES * 128)) {
+		ret = -ERANGE;
+		goto set_zeroes;
+	}
+
+	if ((addr + len) > (QSFP_MAX_NUM_PAGES * 128)) {
+		excess_len = (addr + len) - (QSFP_MAX_NUM_PAGES * 128);
+		memcpy(data, &ppd->qsfp_info.cache[addr], (len - excess_len));
+		data += (len - excess_len);
+		goto set_zeroes;
+	}
+
+	memcpy(data, &ppd->qsfp_info.cache[addr], len);
+	return 0;
+
+set_zeroes:
+	memset(data, 0, excess_len);
+	return ret;
+}
+
+int qsfp_dump(struct hfi1_pportdata *ppd, char *buf, int len)
+{
+	u8 *cache = &ppd->qsfp_info.cache[0];
+	u8 bin_buff[QSFP_DUMP_CHUNK];
+	char lenstr[6];
+	int sofar, ret;
+	int bidx = 0;
+	u8 *atten = &cache[QSFP_ATTEN_OFFS];
+	u8 *vendor_oui = &cache[QSFP_VOUI_OFFS];
+
+	sofar = 0;
+	lenstr[0] = ' ';
+	lenstr[1] = '\0';
+
+	if (ppd->qsfp_info.cache_valid) {
+
+		if (QSFP_IS_CU(cache[QSFP_MOD_TECH_OFFS]))
+			sprintf(lenstr, "%dM ", cache[QSFP_MOD_LEN_OFFS]);
+
+		sofar += scnprintf(buf + sofar, len - sofar, "PWR:%.3sW\n",
+				pwr_codes +
+				(QSFP_PWR(cache[QSFP_MOD_PWR_OFFS]) * 4));
+
+		sofar += scnprintf(buf + sofar, len - sofar, "TECH:%s%s\n",
+				lenstr,
+			hfi1_qsfp_devtech[(cache[QSFP_MOD_TECH_OFFS]) >> 4]);
+
+		sofar += scnprintf(buf + sofar, len - sofar, "Vendor:%.*s\n",
+				   QSFP_VEND_LEN, &cache[QSFP_VEND_OFFS]);
+
+		sofar += scnprintf(buf + sofar, len - sofar, "OUI:%06X\n",
+				   QSFP_OUI(vendor_oui));
+
+		sofar += scnprintf(buf + sofar, len - sofar, "Part#:%.*s\n",
+				   QSFP_PN_LEN, &cache[QSFP_PN_OFFS]);
+
+		sofar += scnprintf(buf + sofar, len - sofar, "Rev:%.*s\n",
+				   QSFP_REV_LEN, &cache[QSFP_REV_OFFS]);
+
+		if (QSFP_IS_CU(cache[QSFP_MOD_TECH_OFFS]))
+			sofar += scnprintf(buf + sofar, len - sofar,
+				"Atten:%d, %d\n",
+				QSFP_ATTEN_SDR(atten),
+				QSFP_ATTEN_DDR(atten));
+
+		sofar += scnprintf(buf + sofar, len - sofar, "Serial:%.*s\n",
+				   QSFP_SN_LEN, &cache[QSFP_SN_OFFS]);
+
+		sofar += scnprintf(buf + sofar, len - sofar, "Date:%.*s\n",
+				   QSFP_DATE_LEN, &cache[QSFP_DATE_OFFS]);
+
+		sofar += scnprintf(buf + sofar, len - sofar, "Lot:%.*s\n",
+				   QSFP_LOT_LEN, &cache[QSFP_LOT_OFFS]);
+
+		while (bidx < QSFP_DEFAULT_HDR_CNT) {
+			int iidx;
+
+			memcpy(bin_buff, &cache[bidx], QSFP_DUMP_CHUNK);
+			for (iidx = 0; iidx < QSFP_DUMP_CHUNK; ++iidx) {
+				sofar += scnprintf(buf + sofar, len-sofar,
+					" %02X", bin_buff[iidx]);
+			}
+			sofar += scnprintf(buf + sofar, len - sofar, "\n");
+			bidx += QSFP_DUMP_CHUNK;
+		}
+	}
+	ret = sofar;
+	return ret;
+}
diff --git a/drivers/staging/rdma/hfi1/qsfp.h b/drivers/staging/rdma/hfi1/qsfp.h
new file mode 100644
index 000000000000..d30c2a6baa0b
--- /dev/null
+++ b/drivers/staging/rdma/hfi1/qsfp.h
@@ -0,0 +1,222 @@
+/*
+ *
+ * This file is provided under a dual BSD/GPLv2 license.  When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * GPL LICENSE SUMMARY
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * BSD LICENSE
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ *  - Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ *  - Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in
+ *    the documentation and/or other materials provided with the
+ *    distribution.
+ *  - Neither the name of Intel Corporation nor the names of its
+ *    contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ */
+/* QSFP support common definitions, for hfi driver */
+
+#define QSFP_DEV 0xA0
+#define QSFP_PWR_LAG_MSEC 2000
+#define QSFP_MODPRS_LAG_MSEC 20
+/* 128 byte pages, per SFF 8636 rev 2.4 */
+#define QSFP_MAX_NUM_PAGES	5
+
+/*
+ * Below are masks for QSFP pins.  Pins are the same for HFI0 and HFI1.
+ * _N means asserted low
+ */
+#define QSFP_HFI0_I2CCLK    (1 << 0)
+#define QSFP_HFI0_I2CDAT    (1 << 1)
+#define QSFP_HFI0_RESET_N   (1 << 2)
+#define QSFP_HFI0_INT_N	    (1 << 3)
+#define QSFP_HFI0_MODPRST_N (1 << 4)
+
+/* QSFP is paged at 256 bytes */
+#define QSFP_PAGESIZE 256
+
+/* Defined fields that Intel requires of qualified cables */
+/* Byte 0 is Identifier, not checked */
+/* Byte 1 is reserved "status MSB" */
+/* Byte 2 is "status LSB" We only care that D2 "Flat Mem" is set. */
+/*
+ * Rest of first 128 not used, although 127 is reserved for page select
+ * if module is not "Flat memory".
+ */
+#define QSFP_PAGE_SELECT_BYTE_OFFS 127
+/* Byte 128 is Identifier: must be 0x0c for QSFP, or 0x0d for QSFP+ */
+#define QSFP_MOD_ID_OFFS 128
+/*
+ * Byte 129 is "Extended Identifier". We only care about D7,D6: Power class
+ *  0:1.5W, 1:2.0W, 2:2.5W, 3:3.5W
+ */
+#define QSFP_MOD_PWR_OFFS 129
+/* Byte 130 is Connector type. Not Intel req'd */
+/* Bytes 131..138 are Transceiver types, bit maps for various tech, none IB */
+/* Byte 139 is encoding. code 0x01 is 8b10b. Not Intel req'd */
+/* byte 140 is nominal bit-rate, in units of 100Mbits/sec Not Intel req'd */
+/* Byte 141 is Extended Rate Select. Not Intel req'd */
+/* Bytes 142..145 are lengths for various fiber types. Not Intel req'd */
+/* Byte 146 is length for Copper. Units of 1 meter */
+#define QSFP_MOD_LEN_OFFS 146
+/*
+ * Byte 147 is Device technology. D0..3 not Intel req'd
+ * D4..7 select from 15 choices, translated by table:
+ */
+#define QSFP_MOD_TECH_OFFS 147
+extern const char *const hfi1_qsfp_devtech[16];
+/* Active Equalization includes fiber, copper full EQ, and copper near Eq */
+#define QSFP_IS_ACTIVE(tech) ((0xA2FF >> ((tech) >> 4)) & 1)
+/* Active Equalization includes fiber, copper full EQ, and copper far Eq */
+#define QSFP_IS_ACTIVE_FAR(tech) ((0x32FF >> ((tech) >> 4)) & 1)
+/* Attenuation should be valid for copper other than full/near Eq */
+#define QSFP_HAS_ATTEN(tech) ((0x4D00 >> ((tech) >> 4)) & 1)
+/* Length is only valid if technology is "copper" */
+#define QSFP_IS_CU(tech) ((0xED00 >> ((tech) >> 4)) & 1)
+#define QSFP_TECH_1490 9
+
+#define QSFP_OUI(oui) (((unsigned)oui[0] << 16) | ((unsigned)oui[1] << 8) | \
+			oui[2])
+#define QSFP_OUI_AMPHENOL 0x415048
+#define QSFP_OUI_FINISAR  0x009065
+#define QSFP_OUI_GORE     0x002177
+
+/* Bytes 148..163 are Vendor Name, Left-justified Blank-filled */
+#define QSFP_VEND_OFFS 148
+#define QSFP_VEND_LEN 16
+/* Byte 164 is IB Extended transceiver codes Bits D0..3 are SDR,DDR,QDR,EDR */
+#define QSFP_IBXCV_OFFS 164
+/* Bytes 165..167 are Vendor OUI number */
+#define QSFP_VOUI_OFFS 165
+#define QSFP_VOUI_LEN 3
+/* Bytes 168..183 are Vendor Part Number, string */
+#define QSFP_PN_OFFS 168
+#define QSFP_PN_LEN 16
+/* Bytes 184,185 are Vendor Rev. Left Justified, Blank-filled */
+#define QSFP_REV_OFFS 184
+#define QSFP_REV_LEN 2
+/*
+ * Bytes 186,187 are Wavelength, if Optical. Not Intel req'd
+ *  If copper, they are attenuation in dB:
+ * Byte 186 is at 2.5Gb/sec (SDR), Byte 187 at 5.0Gb/sec (DDR)
+ */
+#define QSFP_ATTEN_OFFS 186
+#define QSFP_ATTEN_LEN 2
+/* Bytes 188,189 are Wavelength tolerance, not Intel req'd */
+/* Byte 190 is Max Case Temp. Not Intel req'd */
+/* Byte 191 is LSB of sum of bytes 128..190. Not Intel req'd */
+#define QSFP_CC_OFFS 191
+/* Bytes 192..195 are Options implemented in qsfp. Not Intel req'd */
+/* Bytes 196..211 are Serial Number, String */
+#define QSFP_SN_OFFS 196
+#define QSFP_SN_LEN 16
+/* Bytes 212..219 are date-code YYMMDD (MM==1 for Jan) */
+#define QSFP_DATE_OFFS 212
+#define QSFP_DATE_LEN 6
+/* Bytes 218,219 are optional lot-code, string */
+#define QSFP_LOT_OFFS 218
+#define QSFP_LOT_LEN 2
+/* Bytes 220, 221 indicate monitoring options, Not Intel req'd */
+/* Byte 223 is LSB of sum of bytes 192..222 */
+#define QSFP_CC_EXT_OFFS 223
+
+/*
+ * Interrupt flag masks
+ */
+#define QSFP_DATA_NOT_READY		0x01
+
+#define QSFP_HIGH_TEMP_ALARM		0x80
+#define QSFP_LOW_TEMP_ALARM		0x40
+#define QSFP_HIGH_TEMP_WARNING		0x20
+#define QSFP_LOW_TEMP_WARNING		0x10
+
+#define QSFP_HIGH_VCC_ALARM		0x80
+#define QSFP_LOW_VCC_ALARM		0x40
+#define QSFP_HIGH_VCC_WARNING		0x20
+#define QSFP_LOW_VCC_WARNING		0x10
+
+#define QSFP_HIGH_POWER_ALARM		0x88
+#define QSFP_LOW_POWER_ALARM		0x44
+#define QSFP_HIGH_POWER_WARNING		0x22
+#define QSFP_LOW_POWER_WARNING		0x11
+
+#define QSFP_HIGH_BIAS_ALARM		0x88
+#define QSFP_LOW_BIAS_ALARM		0x44
+#define QSFP_HIGH_BIAS_WARNING		0x22
+#define QSFP_LOW_BIAS_WARNING		0x11
+
+/*
+ * struct qsfp_data encapsulates state of QSFP device for one port.
+ * it will be part of port-specific data if a board supports QSFP.
+ *
+ * Since multiple board-types use QSFP, and their pport_data structs
+ * differ (in the chip-specific section), we need a pointer to its head.
+ *
+ * Avoiding premature optimization, we will have one work_struct per port,
+ * and let the qsfp_lock arbitrate access to common resources.
+ *
+ */
+
+#define QSFP_PWR(pbyte) (((pbyte) >> 6) & 3)
+#define QSFP_ATTEN_SDR(attenarray) (attenarray[0])
+#define QSFP_ATTEN_DDR(attenarray) (attenarray[1])
+
+struct qsfp_data {
+	/* Helps to find our way */
+	struct hfi1_pportdata *ppd;
+	struct work_struct qsfp_work;
+	u8 cache[QSFP_MAX_NUM_PAGES*128];
+	spinlock_t qsfp_lock;
+	u8 check_interrupt_flags;
+	u8 qsfp_interrupt_functional;
+	u8 cache_valid;
+	u8 cache_refresh_required;
+};
+
+int refresh_qsfp_cache(struct hfi1_pportdata *ppd,
+		       struct qsfp_data *cp);
+int qsfp_mod_present(struct hfi1_pportdata *ppd);
+int get_cable_info(struct hfi1_devdata *dd, u32 port_num, u32 addr,
+		   u32 len, u8 *data);
+
+int i2c_write(struct hfi1_pportdata *ppd, u32 target, int i2c_addr,
+	      int offset, void *bp, int len);
+int i2c_read(struct hfi1_pportdata *ppd, u32 target, int i2c_addr,
+	     int offset, void *bp, int len);
+int qsfp_write(struct hfi1_pportdata *ppd, u32 target, int addr, void *bp,
+	       int len);
+int qsfp_read(struct hfi1_pportdata *ppd, u32 target, int addr, void *bp,
+	      int len);
diff --git a/drivers/staging/rdma/hfi1/rc.c b/drivers/staging/rdma/hfi1/rc.c
new file mode 100644
index 000000000000..632dd5ba7dfd
--- /dev/null
+++ b/drivers/staging/rdma/hfi1/rc.c
@@ -0,0 +1,2426 @@
+/*
+ *
+ * This file is provided under a dual BSD/GPLv2 license.  When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * GPL LICENSE SUMMARY
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * BSD LICENSE
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ *  - Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ *  - Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in
+ *    the documentation and/or other materials provided with the
+ *    distribution.
+ *  - Neither the name of Intel Corporation nor the names of its
+ *    contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ */
+
+#include <linux/io.h>
+
+#include "hfi.h"
+#include "qp.h"
+#include "sdma.h"
+#include "trace.h"
+
+/* cut down ridiculously long IB macro names */
+#define OP(x) IB_OPCODE_RC_##x
+
+static void rc_timeout(unsigned long arg);
+
+static u32 restart_sge(struct hfi1_sge_state *ss, struct hfi1_swqe *wqe,
+		       u32 psn, u32 pmtu)
+{
+	u32 len;
+
+	len = delta_psn(psn, wqe->psn) * pmtu;
+	ss->sge = wqe->sg_list[0];
+	ss->sg_list = wqe->sg_list + 1;
+	ss->num_sge = wqe->wr.num_sge;
+	ss->total_len = wqe->length;
+	hfi1_skip_sge(ss, len, 0);
+	return wqe->length - len;
+}
+
+static void start_timer(struct hfi1_qp *qp)
+{
+	qp->s_flags |= HFI1_S_TIMER;
+	qp->s_timer.function = rc_timeout;
+	/* 4.096 usec. * (1 << qp->timeout) */
+	qp->s_timer.expires = jiffies + qp->timeout_jiffies;
+	add_timer(&qp->s_timer);
+}
+
+/**
+ * make_rc_ack - construct a response packet (ACK, NAK, or RDMA read)
+ * @dev: the device for this QP
+ * @qp: a pointer to the QP
+ * @ohdr: a pointer to the IB header being constructed
+ * @pmtu: the path MTU
+ *
+ * Return 1 if constructed; otherwise, return 0.
+ * Note that we are in the responder's side of the QP context.
+ * Note the QP s_lock must be held.
+ */
+static int make_rc_ack(struct hfi1_ibdev *dev, struct hfi1_qp *qp,
+		       struct hfi1_other_headers *ohdr, u32 pmtu)
+{
+	struct hfi1_ack_entry *e;
+	u32 hwords;
+	u32 len;
+	u32 bth0;
+	u32 bth2;
+	int middle = 0;
+
+	/* Don't send an ACK if we aren't supposed to. */
+	if (!(ib_hfi1_state_ops[qp->state] & HFI1_PROCESS_RECV_OK))
+		goto bail;
+
+	/* header size in 32-bit words LRH+BTH = (8+12)/4. */
+	hwords = 5;
+
+	switch (qp->s_ack_state) {
+	case OP(RDMA_READ_RESPONSE_LAST):
+	case OP(RDMA_READ_RESPONSE_ONLY):
+		e = &qp->s_ack_queue[qp->s_tail_ack_queue];
+		if (e->rdma_sge.mr) {
+			hfi1_put_mr(e->rdma_sge.mr);
+			e->rdma_sge.mr = NULL;
+		}
+		/* FALLTHROUGH */
+	case OP(ATOMIC_ACKNOWLEDGE):
+		/*
+		 * We can increment the tail pointer now that the last
+		 * response has been sent instead of only being
+		 * constructed.
+		 */
+		if (++qp->s_tail_ack_queue > HFI1_MAX_RDMA_ATOMIC)
+			qp->s_tail_ack_queue = 0;
+		/* FALLTHROUGH */
+	case OP(SEND_ONLY):
+	case OP(ACKNOWLEDGE):
+		/* Check for no next entry in the queue. */
+		if (qp->r_head_ack_queue == qp->s_tail_ack_queue) {
+			if (qp->s_flags & HFI1_S_ACK_PENDING)
+				goto normal;
+			goto bail;
+		}
+
+		e = &qp->s_ack_queue[qp->s_tail_ack_queue];
+		if (e->opcode == OP(RDMA_READ_REQUEST)) {
+			/*
+			 * If a RDMA read response is being resent and
+			 * we haven't seen the duplicate request yet,
+			 * then stop sending the remaining responses the
+			 * responder has seen until the requester re-sends it.
+			 */
+			len = e->rdma_sge.sge_length;
+			if (len && !e->rdma_sge.mr) {
+				qp->s_tail_ack_queue = qp->r_head_ack_queue;
+				goto bail;
+			}
+			/* Copy SGE state in case we need to resend */
+			qp->s_rdma_mr = e->rdma_sge.mr;
+			if (qp->s_rdma_mr)
+				hfi1_get_mr(qp->s_rdma_mr);
+			qp->s_ack_rdma_sge.sge = e->rdma_sge;
+			qp->s_ack_rdma_sge.num_sge = 1;
+			qp->s_cur_sge = &qp->s_ack_rdma_sge;
+			if (len > pmtu) {
+				len = pmtu;
+				qp->s_ack_state = OP(RDMA_READ_RESPONSE_FIRST);
+			} else {
+				qp->s_ack_state = OP(RDMA_READ_RESPONSE_ONLY);
+				e->sent = 1;
+			}
+			ohdr->u.aeth = hfi1_compute_aeth(qp);
+			hwords++;
+			qp->s_ack_rdma_psn = e->psn;
+			bth2 = mask_psn(qp->s_ack_rdma_psn++);
+		} else {
+			/* COMPARE_SWAP or FETCH_ADD */
+			qp->s_cur_sge = NULL;
+			len = 0;
+			qp->s_ack_state = OP(ATOMIC_ACKNOWLEDGE);
+			ohdr->u.at.aeth = hfi1_compute_aeth(qp);
+			ohdr->u.at.atomic_ack_eth[0] =
+				cpu_to_be32(e->atomic_data >> 32);
+			ohdr->u.at.atomic_ack_eth[1] =
+				cpu_to_be32(e->atomic_data);
+			hwords += sizeof(ohdr->u.at) / sizeof(u32);
+			bth2 = mask_psn(e->psn);
+			e->sent = 1;
+		}
+		bth0 = qp->s_ack_state << 24;
+		break;
+
+	case OP(RDMA_READ_RESPONSE_FIRST):
+		qp->s_ack_state = OP(RDMA_READ_RESPONSE_MIDDLE);
+		/* FALLTHROUGH */
+	case OP(RDMA_READ_RESPONSE_MIDDLE):
+		qp->s_cur_sge = &qp->s_ack_rdma_sge;
+		qp->s_rdma_mr = qp->s_ack_rdma_sge.sge.mr;
+		if (qp->s_rdma_mr)
+			hfi1_get_mr(qp->s_rdma_mr);
+		len = qp->s_ack_rdma_sge.sge.sge_length;
+		if (len > pmtu) {
+			len = pmtu;
+			middle = HFI1_CAP_IS_KSET(SDMA_AHG);
+		} else {
+			ohdr->u.aeth = hfi1_compute_aeth(qp);
+			hwords++;
+			qp->s_ack_state = OP(RDMA_READ_RESPONSE_LAST);
+			e = &qp->s_ack_queue[qp->s_tail_ack_queue];
+			e->sent = 1;
+		}
+		bth0 = qp->s_ack_state << 24;
+		bth2 = mask_psn(qp->s_ack_rdma_psn++);
+		break;
+
+	default:
+normal:
+		/*
+		 * Send a regular ACK.
+		 * Set the s_ack_state so we wait until after sending
+		 * the ACK before setting s_ack_state to ACKNOWLEDGE
+		 * (see above).
+		 */
+		qp->s_ack_state = OP(SEND_ONLY);
+		qp->s_flags &= ~HFI1_S_ACK_PENDING;
+		qp->s_cur_sge = NULL;
+		if (qp->s_nak_state)
+			ohdr->u.aeth =
+				cpu_to_be32((qp->r_msn & HFI1_MSN_MASK) |
+					    (qp->s_nak_state <<
+					     HFI1_AETH_CREDIT_SHIFT));
+		else
+			ohdr->u.aeth = hfi1_compute_aeth(qp);
+		hwords++;
+		len = 0;
+		bth0 = OP(ACKNOWLEDGE) << 24;
+		bth2 = mask_psn(qp->s_ack_psn);
+	}
+	qp->s_rdma_ack_cnt++;
+	qp->s_hdrwords = hwords;
+	qp->s_cur_size = len;
+	hfi1_make_ruc_header(qp, ohdr, bth0, bth2, middle);
+	return 1;
+
+bail:
+	qp->s_ack_state = OP(ACKNOWLEDGE);
+	/*
+	 * Ensure s_rdma_ack_cnt changes are committed prior to resetting
+	 * HFI1_S_RESP_PENDING
+	 */
+	smp_wmb();
+	qp->s_flags &= ~(HFI1_S_RESP_PENDING
+				| HFI1_S_ACK_PENDING
+				| HFI1_S_AHG_VALID);
+	return 0;
+}
+
+/**
+ * hfi1_make_rc_req - construct a request packet (SEND, RDMA r/w, ATOMIC)
+ * @qp: a pointer to the QP
+ *
+ * Return 1 if constructed; otherwise, return 0.
+ */
+int hfi1_make_rc_req(struct hfi1_qp *qp)
+{
+	struct hfi1_ibdev *dev = to_idev(qp->ibqp.device);
+	struct hfi1_other_headers *ohdr;
+	struct hfi1_sge_state *ss;
+	struct hfi1_swqe *wqe;
+	/* header size in 32-bit words LRH+BTH = (8+12)/4. */
+	u32 hwords = 5;
+	u32 len;
+	u32 bth0 = 0;
+	u32 bth2;
+	u32 pmtu = qp->pmtu;
+	char newreq;
+	unsigned long flags;
+	int ret = 0;
+	int middle = 0;
+	int delta;
+
+	ohdr = &qp->s_hdr->ibh.u.oth;
+	if (qp->remote_ah_attr.ah_flags & IB_AH_GRH)
+		ohdr = &qp->s_hdr->ibh.u.l.oth;
+
+	/*
+	 * The lock is needed to synchronize between the sending tasklet,
+	 * the receive interrupt handler, and timeout re-sends.
+	 */
+	spin_lock_irqsave(&qp->s_lock, flags);
+
+	/* Sending responses has higher priority over sending requests. */
+	if ((qp->s_flags & HFI1_S_RESP_PENDING) &&
+	    make_rc_ack(dev, qp, ohdr, pmtu))
+		goto done;
+
+	if (!(ib_hfi1_state_ops[qp->state] & HFI1_PROCESS_SEND_OK)) {
+		if (!(ib_hfi1_state_ops[qp->state] & HFI1_FLUSH_SEND))
+			goto bail;
+		/* We are in the error state, flush the work request. */
+		if (qp->s_last == qp->s_head)
+			goto bail;
+		/* If DMAs are in progress, we can't flush immediately. */
+		if (atomic_read(&qp->s_iowait.sdma_busy)) {
+			qp->s_flags |= HFI1_S_WAIT_DMA;
+			goto bail;
+		}
+		clear_ahg(qp);
+		wqe = get_swqe_ptr(qp, qp->s_last);
+		hfi1_send_complete(qp, wqe, qp->s_last != qp->s_acked ?
+			IB_WC_SUCCESS : IB_WC_WR_FLUSH_ERR);
+		/* will get called again */
+		goto done;
+	}
+
+	if (qp->s_flags & (HFI1_S_WAIT_RNR | HFI1_S_WAIT_ACK))
+		goto bail;
+
+	if (cmp_psn(qp->s_psn, qp->s_sending_hpsn) <= 0) {
+		if (cmp_psn(qp->s_sending_psn, qp->s_sending_hpsn) <= 0) {
+			qp->s_flags |= HFI1_S_WAIT_PSN;
+			goto bail;
+		}
+		qp->s_sending_psn = qp->s_psn;
+		qp->s_sending_hpsn = qp->s_psn - 1;
+	}
+
+	/* Send a request. */
+	wqe = get_swqe_ptr(qp, qp->s_cur);
+	switch (qp->s_state) {
+	default:
+		if (!(ib_hfi1_state_ops[qp->state] & HFI1_PROCESS_NEXT_SEND_OK))
+			goto bail;
+		/*
+		 * Resend an old request or start a new one.
+		 *
+		 * We keep track of the current SWQE so that
+		 * we don't reset the "furthest progress" state
+		 * if we need to back up.
+		 */
+		newreq = 0;
+		if (qp->s_cur == qp->s_tail) {
+			/* Check if send work queue is empty. */
+			if (qp->s_tail == qp->s_head) {
+				clear_ahg(qp);
+				goto bail;
+			}
+			/*
+			 * If a fence is requested, wait for previous
+			 * RDMA read and atomic operations to finish.
+			 */
+			if ((wqe->wr.send_flags & IB_SEND_FENCE) &&
+			    qp->s_num_rd_atomic) {
+				qp->s_flags |= HFI1_S_WAIT_FENCE;
+				goto bail;
+			}
+			wqe->psn = qp->s_next_psn;
+			newreq = 1;
+		}
+		/*
+		 * Note that we have to be careful not to modify the
+		 * original work request since we may need to resend
+		 * it.
+		 */
+		len = wqe->length;
+		ss = &qp->s_sge;
+		bth2 = mask_psn(qp->s_psn);
+		switch (wqe->wr.opcode) {
+		case IB_WR_SEND:
+		case IB_WR_SEND_WITH_IMM:
+			/* If no credit, return. */
+			if (!(qp->s_flags & HFI1_S_UNLIMITED_CREDIT) &&
+			    cmp_msn(wqe->ssn, qp->s_lsn + 1) > 0) {
+				qp->s_flags |= HFI1_S_WAIT_SSN_CREDIT;
+				goto bail;
+			}
+			wqe->lpsn = wqe->psn;
+			if (len > pmtu) {
+				wqe->lpsn += (len - 1) / pmtu;
+				qp->s_state = OP(SEND_FIRST);
+				len = pmtu;
+				break;
+			}
+			if (wqe->wr.opcode == IB_WR_SEND)
+				qp->s_state = OP(SEND_ONLY);
+			else {
+				qp->s_state = OP(SEND_ONLY_WITH_IMMEDIATE);
+				/* Immediate data comes after the BTH */
+				ohdr->u.imm_data = wqe->wr.ex.imm_data;
+				hwords += 1;
+			}
+			if (wqe->wr.send_flags & IB_SEND_SOLICITED)
+				bth0 |= IB_BTH_SOLICITED;
+			bth2 |= IB_BTH_REQ_ACK;
+			if (++qp->s_cur == qp->s_size)
+				qp->s_cur = 0;
+			break;
+
+		case IB_WR_RDMA_WRITE:
+			if (newreq && !(qp->s_flags & HFI1_S_UNLIMITED_CREDIT))
+				qp->s_lsn++;
+			/* FALLTHROUGH */
+		case IB_WR_RDMA_WRITE_WITH_IMM:
+			/* If no credit, return. */
+			if (!(qp->s_flags & HFI1_S_UNLIMITED_CREDIT) &&
+			    cmp_msn(wqe->ssn, qp->s_lsn + 1) > 0) {
+				qp->s_flags |= HFI1_S_WAIT_SSN_CREDIT;
+				goto bail;
+			}
+			ohdr->u.rc.reth.vaddr =
+				cpu_to_be64(wqe->wr.wr.rdma.remote_addr);
+			ohdr->u.rc.reth.rkey =
+				cpu_to_be32(wqe->wr.wr.rdma.rkey);
+			ohdr->u.rc.reth.length = cpu_to_be32(len);
+			hwords += sizeof(struct ib_reth) / sizeof(u32);
+			wqe->lpsn = wqe->psn;
+			if (len > pmtu) {
+				wqe->lpsn += (len - 1) / pmtu;
+				qp->s_state = OP(RDMA_WRITE_FIRST);
+				len = pmtu;
+				break;
+			}
+			if (wqe->wr.opcode == IB_WR_RDMA_WRITE)
+				qp->s_state = OP(RDMA_WRITE_ONLY);
+			else {
+				qp->s_state =
+					OP(RDMA_WRITE_ONLY_WITH_IMMEDIATE);
+				/* Immediate data comes after RETH */
+				ohdr->u.rc.imm_data = wqe->wr.ex.imm_data;
+				hwords += 1;
+				if (wqe->wr.send_flags & IB_SEND_SOLICITED)
+					bth0 |= IB_BTH_SOLICITED;
+			}
+			bth2 |= IB_BTH_REQ_ACK;
+			if (++qp->s_cur == qp->s_size)
+				qp->s_cur = 0;
+			break;
+
+		case IB_WR_RDMA_READ:
+			/*
+			 * Don't allow more operations to be started
+			 * than the QP limits allow.
+			 */
+			if (newreq) {
+				if (qp->s_num_rd_atomic >=
+				    qp->s_max_rd_atomic) {
+					qp->s_flags |= HFI1_S_WAIT_RDMAR;
+					goto bail;
+				}
+				qp->s_num_rd_atomic++;
+				if (!(qp->s_flags & HFI1_S_UNLIMITED_CREDIT))
+					qp->s_lsn++;
+				/*
+				 * Adjust s_next_psn to count the
+				 * expected number of responses.
+				 */
+				if (len > pmtu)
+					qp->s_next_psn += (len - 1) / pmtu;
+				wqe->lpsn = qp->s_next_psn++;
+			}
+			ohdr->u.rc.reth.vaddr =
+				cpu_to_be64(wqe->wr.wr.rdma.remote_addr);
+			ohdr->u.rc.reth.rkey =
+				cpu_to_be32(wqe->wr.wr.rdma.rkey);
+			ohdr->u.rc.reth.length = cpu_to_be32(len);
+			qp->s_state = OP(RDMA_READ_REQUEST);
+			hwords += sizeof(ohdr->u.rc.reth) / sizeof(u32);
+			ss = NULL;
+			len = 0;
+			bth2 |= IB_BTH_REQ_ACK;
+			if (++qp->s_cur == qp->s_size)
+				qp->s_cur = 0;
+			break;
+
+		case IB_WR_ATOMIC_CMP_AND_SWP:
+		case IB_WR_ATOMIC_FETCH_AND_ADD:
+			/*
+			 * Don't allow more operations to be started
+			 * than the QP limits allow.
+			 */
+			if (newreq) {
+				if (qp->s_num_rd_atomic >=
+				    qp->s_max_rd_atomic) {
+					qp->s_flags |= HFI1_S_WAIT_RDMAR;
+					goto bail;
+				}
+				qp->s_num_rd_atomic++;
+				if (!(qp->s_flags & HFI1_S_UNLIMITED_CREDIT))
+					qp->s_lsn++;
+				wqe->lpsn = wqe->psn;
+			}
+			if (wqe->wr.opcode == IB_WR_ATOMIC_CMP_AND_SWP) {
+				qp->s_state = OP(COMPARE_SWAP);
+				ohdr->u.atomic_eth.swap_data = cpu_to_be64(
+					wqe->wr.wr.atomic.swap);
+				ohdr->u.atomic_eth.compare_data = cpu_to_be64(
+					wqe->wr.wr.atomic.compare_add);
+			} else {
+				qp->s_state = OP(FETCH_ADD);
+				ohdr->u.atomic_eth.swap_data = cpu_to_be64(
+					wqe->wr.wr.atomic.compare_add);
+				ohdr->u.atomic_eth.compare_data = 0;
+			}
+			ohdr->u.atomic_eth.vaddr[0] = cpu_to_be32(
+				wqe->wr.wr.atomic.remote_addr >> 32);
+			ohdr->u.atomic_eth.vaddr[1] = cpu_to_be32(
+				wqe->wr.wr.atomic.remote_addr);
+			ohdr->u.atomic_eth.rkey = cpu_to_be32(
+				wqe->wr.wr.atomic.rkey);
+			hwords += sizeof(struct ib_atomic_eth) / sizeof(u32);
+			ss = NULL;
+			len = 0;
+			bth2 |= IB_BTH_REQ_ACK;
+			if (++qp->s_cur == qp->s_size)
+				qp->s_cur = 0;
+			break;
+
+		default:
+			goto bail;
+		}
+		qp->s_sge.sge = wqe->sg_list[0];
+		qp->s_sge.sg_list = wqe->sg_list + 1;
+		qp->s_sge.num_sge = wqe->wr.num_sge;
+		qp->s_sge.total_len = wqe->length;
+		qp->s_len = wqe->length;
+		if (newreq) {
+			qp->s_tail++;
+			if (qp->s_tail >= qp->s_size)
+				qp->s_tail = 0;
+		}
+		if (wqe->wr.opcode == IB_WR_RDMA_READ)
+			qp->s_psn = wqe->lpsn + 1;
+		else {
+			qp->s_psn++;
+			if (cmp_psn(qp->s_psn, qp->s_next_psn) > 0)
+				qp->s_next_psn = qp->s_psn;
+		}
+		break;
+
+	case OP(RDMA_READ_RESPONSE_FIRST):
+		/*
+		 * qp->s_state is normally set to the opcode of the
+		 * last packet constructed for new requests and therefore
+		 * is never set to RDMA read response.
+		 * RDMA_READ_RESPONSE_FIRST is used by the ACK processing
+		 * thread to indicate a SEND needs to be restarted from an
+		 * earlier PSN without interfering with the sending thread.
+		 * See restart_rc().
+		 */
+		qp->s_len = restart_sge(&qp->s_sge, wqe, qp->s_psn, pmtu);
+		/* FALLTHROUGH */
+	case OP(SEND_FIRST):
+		qp->s_state = OP(SEND_MIDDLE);
+		/* FALLTHROUGH */
+	case OP(SEND_MIDDLE):
+		bth2 = mask_psn(qp->s_psn++);
+		if (cmp_psn(qp->s_psn, qp->s_next_psn) > 0)
+			qp->s_next_psn = qp->s_psn;
+		ss = &qp->s_sge;
+		len = qp->s_len;
+		if (len > pmtu) {
+			len = pmtu;
+			middle = HFI1_CAP_IS_KSET(SDMA_AHG);
+			break;
+		}
+		if (wqe->wr.opcode == IB_WR_SEND)
+			qp->s_state = OP(SEND_LAST);
+		else {
+			qp->s_state = OP(SEND_LAST_WITH_IMMEDIATE);
+			/* Immediate data comes after the BTH */
+			ohdr->u.imm_data = wqe->wr.ex.imm_data;
+			hwords += 1;
+		}
+		if (wqe->wr.send_flags & IB_SEND_SOLICITED)
+			bth0 |= IB_BTH_SOLICITED;
+		bth2 |= IB_BTH_REQ_ACK;
+		qp->s_cur++;
+		if (qp->s_cur >= qp->s_size)
+			qp->s_cur = 0;
+		break;
+
+	case OP(RDMA_READ_RESPONSE_LAST):
+		/*
+		 * qp->s_state is normally set to the opcode of the
+		 * last packet constructed for new requests and therefore
+		 * is never set to RDMA read response.
+		 * RDMA_READ_RESPONSE_LAST is used by the ACK processing
+		 * thread to indicate a RDMA write needs to be restarted from
+		 * an earlier PSN without interfering with the sending thread.
+		 * See restart_rc().
+		 */
+		qp->s_len = restart_sge(&qp->s_sge, wqe, qp->s_psn, pmtu);
+		/* FALLTHROUGH */
+	case OP(RDMA_WRITE_FIRST):
+		qp->s_state = OP(RDMA_WRITE_MIDDLE);
+		/* FALLTHROUGH */
+	case OP(RDMA_WRITE_MIDDLE):
+		bth2 = mask_psn(qp->s_psn++);
+		if (cmp_psn(qp->s_psn, qp->s_next_psn) > 0)
+			qp->s_next_psn = qp->s_psn;
+		ss = &qp->s_sge;
+		len = qp->s_len;
+		if (len > pmtu) {
+			len = pmtu;
+			middle = HFI1_CAP_IS_KSET(SDMA_AHG);
+			break;
+		}
+		if (wqe->wr.opcode == IB_WR_RDMA_WRITE)
+			qp->s_state = OP(RDMA_WRITE_LAST);
+		else {
+			qp->s_state = OP(RDMA_WRITE_LAST_WITH_IMMEDIATE);
+			/* Immediate data comes after the BTH */
+			ohdr->u.imm_data = wqe->wr.ex.imm_data;
+			hwords += 1;
+			if (wqe->wr.send_flags & IB_SEND_SOLICITED)
+				bth0 |= IB_BTH_SOLICITED;
+		}
+		bth2 |= IB_BTH_REQ_ACK;
+		qp->s_cur++;
+		if (qp->s_cur >= qp->s_size)
+			qp->s_cur = 0;
+		break;
+
+	case OP(RDMA_READ_RESPONSE_MIDDLE):
+		/*
+		 * qp->s_state is normally set to the opcode of the
+		 * last packet constructed for new requests and therefore
+		 * is never set to RDMA read response.
+		 * RDMA_READ_RESPONSE_MIDDLE is used by the ACK processing
+		 * thread to indicate a RDMA read needs to be restarted from
+		 * an earlier PSN without interfering with the sending thread.
+		 * See restart_rc().
+		 */
+		len = (delta_psn(qp->s_psn, wqe->psn)) * pmtu;
+		ohdr->u.rc.reth.vaddr =
+			cpu_to_be64(wqe->wr.wr.rdma.remote_addr + len);
+		ohdr->u.rc.reth.rkey =
+			cpu_to_be32(wqe->wr.wr.rdma.rkey);
+		ohdr->u.rc.reth.length = cpu_to_be32(wqe->length - len);
+		qp->s_state = OP(RDMA_READ_REQUEST);
+		hwords += sizeof(ohdr->u.rc.reth) / sizeof(u32);
+		bth2 = mask_psn(qp->s_psn) | IB_BTH_REQ_ACK;
+		qp->s_psn = wqe->lpsn + 1;
+		ss = NULL;
+		len = 0;
+		qp->s_cur++;
+		if (qp->s_cur == qp->s_size)
+			qp->s_cur = 0;
+		break;
+	}
+	qp->s_sending_hpsn = bth2;
+	delta = delta_psn(bth2, wqe->psn);
+	if (delta && delta % HFI1_PSN_CREDIT == 0)
+		bth2 |= IB_BTH_REQ_ACK;
+	if (qp->s_flags & HFI1_S_SEND_ONE) {
+		qp->s_flags &= ~HFI1_S_SEND_ONE;
+		qp->s_flags |= HFI1_S_WAIT_ACK;
+		bth2 |= IB_BTH_REQ_ACK;
+	}
+	qp->s_len -= len;
+	qp->s_hdrwords = hwords;
+	qp->s_cur_sge = ss;
+	qp->s_cur_size = len;
+	hfi1_make_ruc_header(
+		qp,
+		ohdr,
+		bth0 | (qp->s_state << 24),
+		bth2,
+		middle);
+done:
+	ret = 1;
+	goto unlock;
+
+bail:
+	qp->s_flags &= ~HFI1_S_BUSY;
+unlock:
+	spin_unlock_irqrestore(&qp->s_lock, flags);
+	return ret;
+}
+
+/**
+ * hfi1_send_rc_ack - Construct an ACK packet and send it
+ * @qp: a pointer to the QP
+ *
+ * This is called from hfi1_rc_rcv() and handle_receive_interrupt().
+ * Note that RDMA reads and atomics are handled in the
+ * send side QP state and tasklet.
+ */
+void hfi1_send_rc_ack(struct hfi1_ctxtdata *rcd, struct hfi1_qp *qp,
+		      int is_fecn)
+{
+	struct hfi1_ibport *ibp = to_iport(qp->ibqp.device, qp->port_num);
+	struct hfi1_pportdata *ppd = ppd_from_ibp(ibp);
+	u64 pbc, pbc_flags = 0;
+	u16 lrh0;
+	u16 sc5;
+	u32 bth0;
+	u32 hwords;
+	u32 vl, plen;
+	struct send_context *sc;
+	struct pio_buf *pbuf;
+	struct hfi1_ib_header hdr;
+	struct hfi1_other_headers *ohdr;
+
+	/* Don't send ACK or NAK if a RDMA read or atomic is pending. */
+	if (qp->s_flags & HFI1_S_RESP_PENDING)
+		goto queue_ack;
+
+	/* Ensure s_rdma_ack_cnt changes are committed */
+	smp_read_barrier_depends();
+	if (qp->s_rdma_ack_cnt)
+		goto queue_ack;
+
+	/* Construct the header */
+	/* header size in 32-bit words LRH+BTH+AETH = (8+12+4)/4 */
+	hwords = 6;
+	if (unlikely(qp->remote_ah_attr.ah_flags & IB_AH_GRH)) {
+		hwords += hfi1_make_grh(ibp, &hdr.u.l.grh,
+				       &qp->remote_ah_attr.grh, hwords, 0);
+		ohdr = &hdr.u.l.oth;
+		lrh0 = HFI1_LRH_GRH;
+	} else {
+		ohdr = &hdr.u.oth;
+		lrh0 = HFI1_LRH_BTH;
+	}
+	/* read pkey_index w/o lock (its atomic) */
+	bth0 = hfi1_get_pkey(ibp, qp->s_pkey_index) | (OP(ACKNOWLEDGE) << 24);
+	if (qp->s_mig_state == IB_MIG_MIGRATED)
+		bth0 |= IB_BTH_MIG_REQ;
+	if (qp->r_nak_state)
+		ohdr->u.aeth = cpu_to_be32((qp->r_msn & HFI1_MSN_MASK) |
+					    (qp->r_nak_state <<
+					     HFI1_AETH_CREDIT_SHIFT));
+	else
+		ohdr->u.aeth = hfi1_compute_aeth(qp);
+	sc5 = ibp->sl_to_sc[qp->remote_ah_attr.sl];
+	/* set PBC_DC_INFO bit (aka SC[4]) in pbc_flags */
+	pbc_flags |= ((!!(sc5 & 0x10)) << PBC_DC_INFO_SHIFT);
+	lrh0 |= (sc5 & 0xf) << 12 | (qp->remote_ah_attr.sl & 0xf) << 4;
+	hdr.lrh[0] = cpu_to_be16(lrh0);
+	hdr.lrh[1] = cpu_to_be16(qp->remote_ah_attr.dlid);
+	hdr.lrh[2] = cpu_to_be16(hwords + SIZE_OF_CRC);
+	hdr.lrh[3] = cpu_to_be16(ppd->lid | qp->remote_ah_attr.src_path_bits);
+	ohdr->bth[0] = cpu_to_be32(bth0);
+	ohdr->bth[1] = cpu_to_be32(qp->remote_qpn);
+	ohdr->bth[1] |= cpu_to_be32((!!is_fecn) << HFI1_BECN_SHIFT);
+	ohdr->bth[2] = cpu_to_be32(mask_psn(qp->r_ack_psn));
+
+	/* Don't try to send ACKs if the link isn't ACTIVE */
+	if (driver_lstate(ppd) != IB_PORT_ACTIVE)
+		return;
+
+	sc = rcd->sc;
+	plen = 2 /* PBC */ + hwords;
+	vl = sc_to_vlt(ppd->dd, sc5);
+	pbc = create_pbc(ppd, pbc_flags, qp->srate_mbps, vl, plen);
+
+	pbuf = sc_buffer_alloc(sc, plen, NULL, NULL);
+	if (!pbuf) {
+		/*
+		 * We have no room to send at the moment.  Pass
+		 * responsibility for sending the ACK to the send tasklet
+		 * so that when enough buffer space becomes available,
+		 * the ACK is sent ahead of other outgoing packets.
+		 */
+		goto queue_ack;
+	}
+
+	trace_output_ibhdr(dd_from_ibdev(qp->ibqp.device), &hdr);
+
+	/* write the pbc and data */
+	ppd->dd->pio_inline_send(ppd->dd, pbuf, pbc, &hdr, hwords);
+
+	return;
+
+queue_ack:
+	this_cpu_inc(*ibp->rc_qacks);
+	spin_lock(&qp->s_lock);
+	qp->s_flags |= HFI1_S_ACK_PENDING | HFI1_S_RESP_PENDING;
+	qp->s_nak_state = qp->r_nak_state;
+	qp->s_ack_psn = qp->r_ack_psn;
+	if (is_fecn)
+		qp->s_flags |= HFI1_S_ECN;
+
+	/* Schedule the send tasklet. */
+	hfi1_schedule_send(qp);
+	spin_unlock(&qp->s_lock);
+}
+
+/**
+ * reset_psn - reset the QP state to send starting from PSN
+ * @qp: the QP
+ * @psn: the packet sequence number to restart at
+ *
+ * This is called from hfi1_rc_rcv() to process an incoming RC ACK
+ * for the given QP.
+ * Called at interrupt level with the QP s_lock held.
+ */
+static void reset_psn(struct hfi1_qp *qp, u32 psn)
+{
+	u32 n = qp->s_acked;
+	struct hfi1_swqe *wqe = get_swqe_ptr(qp, n);
+	u32 opcode;
+
+	qp->s_cur = n;
+
+	/*
+	 * If we are starting the request from the beginning,
+	 * let the normal send code handle initialization.
+	 */
+	if (cmp_psn(psn, wqe->psn) <= 0) {
+		qp->s_state = OP(SEND_LAST);
+		goto done;
+	}
+
+	/* Find the work request opcode corresponding to the given PSN. */
+	opcode = wqe->wr.opcode;
+	for (;;) {
+		int diff;
+
+		if (++n == qp->s_size)
+			n = 0;
+		if (n == qp->s_tail)
+			break;
+		wqe = get_swqe_ptr(qp, n);
+		diff = cmp_psn(psn, wqe->psn);
+		if (diff < 0)
+			break;
+		qp->s_cur = n;
+		/*
+		 * If we are starting the request from the beginning,
+		 * let the normal send code handle initialization.
+		 */
+		if (diff == 0) {
+			qp->s_state = OP(SEND_LAST);
+			goto done;
+		}
+		opcode = wqe->wr.opcode;
+	}
+
+	/*
+	 * Set the state to restart in the middle of a request.
+	 * Don't change the s_sge, s_cur_sge, or s_cur_size.
+	 * See hfi1_make_rc_req().
+	 */
+	switch (opcode) {
+	case IB_WR_SEND:
+	case IB_WR_SEND_WITH_IMM:
+		qp->s_state = OP(RDMA_READ_RESPONSE_FIRST);
+		break;
+
+	case IB_WR_RDMA_WRITE:
+	case IB_WR_RDMA_WRITE_WITH_IMM:
+		qp->s_state = OP(RDMA_READ_RESPONSE_LAST);
+		break;
+
+	case IB_WR_RDMA_READ:
+		qp->s_state = OP(RDMA_READ_RESPONSE_MIDDLE);
+		break;
+
+	default:
+		/*
+		 * This case shouldn't happen since its only
+		 * one PSN per req.
+		 */
+		qp->s_state = OP(SEND_LAST);
+	}
+done:
+	qp->s_psn = psn;
+	/*
+	 * Set HFI1_S_WAIT_PSN as rc_complete() may start the timer
+	 * asynchronously before the send tasklet can get scheduled.
+	 * Doing it in hfi1_make_rc_req() is too late.
+	 */
+	if ((cmp_psn(qp->s_psn, qp->s_sending_hpsn) <= 0) &&
+	    (cmp_psn(qp->s_sending_psn, qp->s_sending_hpsn) <= 0))
+		qp->s_flags |= HFI1_S_WAIT_PSN;
+	qp->s_flags &= ~HFI1_S_AHG_VALID;
+}
+
+/*
+ * Back up requester to resend the last un-ACKed request.
+ * The QP r_lock and s_lock should be held and interrupts disabled.
+ */
+static void restart_rc(struct hfi1_qp *qp, u32 psn, int wait)
+{
+	struct hfi1_swqe *wqe = get_swqe_ptr(qp, qp->s_acked);
+	struct hfi1_ibport *ibp;
+
+	if (qp->s_retry == 0) {
+		if (qp->s_mig_state == IB_MIG_ARMED) {
+			hfi1_migrate_qp(qp);
+			qp->s_retry = qp->s_retry_cnt;
+		} else if (qp->s_last == qp->s_acked) {
+			hfi1_send_complete(qp, wqe, IB_WC_RETRY_EXC_ERR);
+			hfi1_error_qp(qp, IB_WC_WR_FLUSH_ERR);
+			return;
+		} else /* need to handle delayed completion */
+			return;
+	} else
+		qp->s_retry--;
+
+	ibp = to_iport(qp->ibqp.device, qp->port_num);
+	if (wqe->wr.opcode == IB_WR_RDMA_READ)
+		ibp->n_rc_resends++;
+	else
+		ibp->n_rc_resends += delta_psn(qp->s_psn, psn);
+
+	qp->s_flags &= ~(HFI1_S_WAIT_FENCE | HFI1_S_WAIT_RDMAR |
+			 HFI1_S_WAIT_SSN_CREDIT | HFI1_S_WAIT_PSN |
+			 HFI1_S_WAIT_ACK);
+	if (wait)
+		qp->s_flags |= HFI1_S_SEND_ONE;
+	reset_psn(qp, psn);
+}
+
+/*
+ * This is called from s_timer for missing responses.
+ */
+static void rc_timeout(unsigned long arg)
+{
+	struct hfi1_qp *qp = (struct hfi1_qp *)arg;
+	struct hfi1_ibport *ibp;
+	unsigned long flags;
+
+	spin_lock_irqsave(&qp->r_lock, flags);
+	spin_lock(&qp->s_lock);
+	if (qp->s_flags & HFI1_S_TIMER) {
+		ibp = to_iport(qp->ibqp.device, qp->port_num);
+		ibp->n_rc_timeouts++;
+		qp->s_flags &= ~HFI1_S_TIMER;
+		del_timer(&qp->s_timer);
+		restart_rc(qp, qp->s_last_psn + 1, 1);
+		hfi1_schedule_send(qp);
+	}
+	spin_unlock(&qp->s_lock);
+	spin_unlock_irqrestore(&qp->r_lock, flags);
+}
+
+/*
+ * This is called from s_timer for RNR timeouts.
+ */
+void hfi1_rc_rnr_retry(unsigned long arg)
+{
+	struct hfi1_qp *qp = (struct hfi1_qp *)arg;
+	unsigned long flags;
+
+	spin_lock_irqsave(&qp->s_lock, flags);
+	if (qp->s_flags & HFI1_S_WAIT_RNR) {
+		qp->s_flags &= ~HFI1_S_WAIT_RNR;
+		del_timer(&qp->s_timer);
+		hfi1_schedule_send(qp);
+	}
+	spin_unlock_irqrestore(&qp->s_lock, flags);
+}
+
+/*
+ * Set qp->s_sending_psn to the next PSN after the given one.
+ * This would be psn+1 except when RDMA reads are present.
+ */
+static void reset_sending_psn(struct hfi1_qp *qp, u32 psn)
+{
+	struct hfi1_swqe *wqe;
+	u32 n = qp->s_last;
+
+	/* Find the work request corresponding to the given PSN. */
+	for (;;) {
+		wqe = get_swqe_ptr(qp, n);
+		if (cmp_psn(psn, wqe->lpsn) <= 0) {
+			if (wqe->wr.opcode == IB_WR_RDMA_READ)
+				qp->s_sending_psn = wqe->lpsn + 1;
+			else
+				qp->s_sending_psn = psn + 1;
+			break;
+		}
+		if (++n == qp->s_size)
+			n = 0;
+		if (n == qp->s_tail)
+			break;
+	}
+}
+
+/*
+ * This should be called with the QP s_lock held and interrupts disabled.
+ */
+void hfi1_rc_send_complete(struct hfi1_qp *qp, struct hfi1_ib_header *hdr)
+{
+	struct hfi1_other_headers *ohdr;
+	struct hfi1_swqe *wqe;
+	struct ib_wc wc;
+	unsigned i;
+	u32 opcode;
+	u32 psn;
+
+	if (!(ib_hfi1_state_ops[qp->state] & HFI1_PROCESS_OR_FLUSH_SEND))
+		return;
+
+	/* Find out where the BTH is */
+	if ((be16_to_cpu(hdr->lrh[0]) & 3) == HFI1_LRH_BTH)
+		ohdr = &hdr->u.oth;
+	else
+		ohdr = &hdr->u.l.oth;
+
+	opcode = be32_to_cpu(ohdr->bth[0]) >> 24;
+	if (opcode >= OP(RDMA_READ_RESPONSE_FIRST) &&
+	    opcode <= OP(ATOMIC_ACKNOWLEDGE)) {
+		WARN_ON(!qp->s_rdma_ack_cnt);
+		qp->s_rdma_ack_cnt--;
+		return;
+	}
+
+	psn = be32_to_cpu(ohdr->bth[2]);
+	reset_sending_psn(qp, psn);
+
+	/*
+	 * Start timer after a packet requesting an ACK has been sent and
+	 * there are still requests that haven't been acked.
+	 */
+	if ((psn & IB_BTH_REQ_ACK) && qp->s_acked != qp->s_tail &&
+	    !(qp->s_flags &
+		(HFI1_S_TIMER | HFI1_S_WAIT_RNR | HFI1_S_WAIT_PSN)) &&
+		(ib_hfi1_state_ops[qp->state] & HFI1_PROCESS_RECV_OK))
+		start_timer(qp);
+
+	while (qp->s_last != qp->s_acked) {
+		wqe = get_swqe_ptr(qp, qp->s_last);
+		if (cmp_psn(wqe->lpsn, qp->s_sending_psn) >= 0 &&
+		    cmp_psn(qp->s_sending_psn, qp->s_sending_hpsn) <= 0)
+			break;
+		for (i = 0; i < wqe->wr.num_sge; i++) {
+			struct hfi1_sge *sge = &wqe->sg_list[i];
+
+			hfi1_put_mr(sge->mr);
+		}
+		/* Post a send completion queue entry if requested. */
+		if (!(qp->s_flags & HFI1_S_SIGNAL_REQ_WR) ||
+		    (wqe->wr.send_flags & IB_SEND_SIGNALED)) {
+			memset(&wc, 0, sizeof(wc));
+			wc.wr_id = wqe->wr.wr_id;
+			wc.status = IB_WC_SUCCESS;
+			wc.opcode = ib_hfi1_wc_opcode[wqe->wr.opcode];
+			wc.byte_len = wqe->length;
+			wc.qp = &qp->ibqp;
+			hfi1_cq_enter(to_icq(qp->ibqp.send_cq), &wc, 0);
+		}
+		if (++qp->s_last >= qp->s_size)
+			qp->s_last = 0;
+	}
+	/*
+	 * If we were waiting for sends to complete before re-sending,
+	 * and they are now complete, restart sending.
+	 */
+	trace_hfi1_rc_sendcomplete(qp, psn);
+	if (qp->s_flags & HFI1_S_WAIT_PSN &&
+	    cmp_psn(qp->s_sending_psn, qp->s_sending_hpsn) > 0) {
+		qp->s_flags &= ~HFI1_S_WAIT_PSN;
+		qp->s_sending_psn = qp->s_psn;
+		qp->s_sending_hpsn = qp->s_psn - 1;
+		hfi1_schedule_send(qp);
+	}
+}
+
+static inline void update_last_psn(struct hfi1_qp *qp, u32 psn)
+{
+	qp->s_last_psn = psn;
+}
+
+/*
+ * Generate a SWQE completion.
+ * This is similar to hfi1_send_complete but has to check to be sure
+ * that the SGEs are not being referenced if the SWQE is being resent.
+ */
+static struct hfi1_swqe *do_rc_completion(struct hfi1_qp *qp,
+					  struct hfi1_swqe *wqe,
+					  struct hfi1_ibport *ibp)
+{
+	struct ib_wc wc;
+	unsigned i;
+
+	/*
+	 * Don't decrement refcount and don't generate a
+	 * completion if the SWQE is being resent until the send
+	 * is finished.
+	 */
+	if (cmp_psn(wqe->lpsn, qp->s_sending_psn) < 0 ||
+	    cmp_psn(qp->s_sending_psn, qp->s_sending_hpsn) > 0) {
+		for (i = 0; i < wqe->wr.num_sge; i++) {
+			struct hfi1_sge *sge = &wqe->sg_list[i];
+
+			hfi1_put_mr(sge->mr);
+		}
+		/* Post a send completion queue entry if requested. */
+		if (!(qp->s_flags & HFI1_S_SIGNAL_REQ_WR) ||
+		    (wqe->wr.send_flags & IB_SEND_SIGNALED)) {
+			memset(&wc, 0, sizeof(wc));
+			wc.wr_id = wqe->wr.wr_id;
+			wc.status = IB_WC_SUCCESS;
+			wc.opcode = ib_hfi1_wc_opcode[wqe->wr.opcode];
+			wc.byte_len = wqe->length;
+			wc.qp = &qp->ibqp;
+			hfi1_cq_enter(to_icq(qp->ibqp.send_cq), &wc, 0);
+		}
+		if (++qp->s_last >= qp->s_size)
+			qp->s_last = 0;
+	} else {
+		struct hfi1_pportdata *ppd = ppd_from_ibp(ibp);
+
+		this_cpu_inc(*ibp->rc_delayed_comp);
+		/*
+		 * If send progress not running attempt to progress
+		 * SDMA queue.
+		 */
+		if (ppd->dd->flags & HFI1_HAS_SEND_DMA) {
+			struct sdma_engine *engine;
+			u8 sc5;
+
+			/* For now use sc to find engine */
+			sc5 = ibp->sl_to_sc[qp->remote_ah_attr.sl];
+			engine = qp_to_sdma_engine(qp, sc5);
+			sdma_engine_progress_schedule(engine);
+		}
+	}
+
+	qp->s_retry = qp->s_retry_cnt;
+	update_last_psn(qp, wqe->lpsn);
+
+	/*
+	 * If we are completing a request which is in the process of
+	 * being resent, we can stop re-sending it since we know the
+	 * responder has already seen it.
+	 */
+	if (qp->s_acked == qp->s_cur) {
+		if (++qp->s_cur >= qp->s_size)
+			qp->s_cur = 0;
+		qp->s_acked = qp->s_cur;
+		wqe = get_swqe_ptr(qp, qp->s_cur);
+		if (qp->s_acked != qp->s_tail) {
+			qp->s_state = OP(SEND_LAST);
+			qp->s_psn = wqe->psn;
+		}
+	} else {
+		if (++qp->s_acked >= qp->s_size)
+			qp->s_acked = 0;
+		if (qp->state == IB_QPS_SQD && qp->s_acked == qp->s_cur)
+			qp->s_draining = 0;
+		wqe = get_swqe_ptr(qp, qp->s_acked);
+	}
+	return wqe;
+}
+
+/**
+ * do_rc_ack - process an incoming RC ACK
+ * @qp: the QP the ACK came in on
+ * @psn: the packet sequence number of the ACK
+ * @opcode: the opcode of the request that resulted in the ACK
+ *
+ * This is called from rc_rcv_resp() to process an incoming RC ACK
+ * for the given QP.
+ * Called at interrupt level with the QP s_lock held.
+ * Returns 1 if OK, 0 if current operation should be aborted (NAK).
+ */
+static int do_rc_ack(struct hfi1_qp *qp, u32 aeth, u32 psn, int opcode,
+		     u64 val, struct hfi1_ctxtdata *rcd)
+{
+	struct hfi1_ibport *ibp;
+	enum ib_wc_status status;
+	struct hfi1_swqe *wqe;
+	int ret = 0;
+	u32 ack_psn;
+	int diff;
+
+	/* Remove QP from retry timer */
+	if (qp->s_flags & (HFI1_S_TIMER | HFI1_S_WAIT_RNR)) {
+		qp->s_flags &= ~(HFI1_S_TIMER | HFI1_S_WAIT_RNR);
+		del_timer(&qp->s_timer);
+	}
+
+	/*
+	 * Note that NAKs implicitly ACK outstanding SEND and RDMA write
+	 * requests and implicitly NAK RDMA read and atomic requests issued
+	 * before the NAK'ed request.  The MSN won't include the NAK'ed
+	 * request but will include an ACK'ed request(s).
+	 */
+	ack_psn = psn;
+	if (aeth >> 29)
+		ack_psn--;
+	wqe = get_swqe_ptr(qp, qp->s_acked);
+	ibp = to_iport(qp->ibqp.device, qp->port_num);
+
+	/*
+	 * The MSN might be for a later WQE than the PSN indicates so
+	 * only complete WQEs that the PSN finishes.
+	 */
+	while ((diff = delta_psn(ack_psn, wqe->lpsn)) >= 0) {
+		/*
+		 * RDMA_READ_RESPONSE_ONLY is a special case since
+		 * we want to generate completion events for everything
+		 * before the RDMA read, copy the data, then generate
+		 * the completion for the read.
+		 */
+		if (wqe->wr.opcode == IB_WR_RDMA_READ &&
+		    opcode == OP(RDMA_READ_RESPONSE_ONLY) &&
+		    diff == 0) {
+			ret = 1;
+			goto bail;
+		}
+		/*
+		 * If this request is a RDMA read or atomic, and the ACK is
+		 * for a later operation, this ACK NAKs the RDMA read or
+		 * atomic.  In other words, only a RDMA_READ_LAST or ONLY
+		 * can ACK a RDMA read and likewise for atomic ops.  Note
+		 * that the NAK case can only happen if relaxed ordering is
+		 * used and requests are sent after an RDMA read or atomic
+		 * is sent but before the response is received.
+		 */
+		if ((wqe->wr.opcode == IB_WR_RDMA_READ &&
+		     (opcode != OP(RDMA_READ_RESPONSE_LAST) || diff != 0)) ||
+		    ((wqe->wr.opcode == IB_WR_ATOMIC_CMP_AND_SWP ||
+		      wqe->wr.opcode == IB_WR_ATOMIC_FETCH_AND_ADD) &&
+		     (opcode != OP(ATOMIC_ACKNOWLEDGE) || diff != 0))) {
+			/* Retry this request. */
+			if (!(qp->r_flags & HFI1_R_RDMAR_SEQ)) {
+				qp->r_flags |= HFI1_R_RDMAR_SEQ;
+				restart_rc(qp, qp->s_last_psn + 1, 0);
+				if (list_empty(&qp->rspwait)) {
+					qp->r_flags |= HFI1_R_RSP_SEND;
+					atomic_inc(&qp->refcount);
+					list_add_tail(&qp->rspwait,
+						      &rcd->qp_wait_list);
+				}
+			}
+			/*
+			 * No need to process the ACK/NAK since we are
+			 * restarting an earlier request.
+			 */
+			goto bail;
+		}
+		if (wqe->wr.opcode == IB_WR_ATOMIC_CMP_AND_SWP ||
+		    wqe->wr.opcode == IB_WR_ATOMIC_FETCH_AND_ADD) {
+			u64 *vaddr = wqe->sg_list[0].vaddr;
+			*vaddr = val;
+		}
+		if (qp->s_num_rd_atomic &&
+		    (wqe->wr.opcode == IB_WR_RDMA_READ ||
+		     wqe->wr.opcode == IB_WR_ATOMIC_CMP_AND_SWP ||
+		     wqe->wr.opcode == IB_WR_ATOMIC_FETCH_AND_ADD)) {
+			qp->s_num_rd_atomic--;
+			/* Restart sending task if fence is complete */
+			if ((qp->s_flags & HFI1_S_WAIT_FENCE) &&
+			    !qp->s_num_rd_atomic) {
+				qp->s_flags &= ~(HFI1_S_WAIT_FENCE |
+						 HFI1_S_WAIT_ACK);
+				hfi1_schedule_send(qp);
+			} else if (qp->s_flags & HFI1_S_WAIT_RDMAR) {
+				qp->s_flags &= ~(HFI1_S_WAIT_RDMAR |
+						 HFI1_S_WAIT_ACK);
+				hfi1_schedule_send(qp);
+			}
+		}
+		wqe = do_rc_completion(qp, wqe, ibp);
+		if (qp->s_acked == qp->s_tail)
+			break;
+	}
+
+	switch (aeth >> 29) {
+	case 0:         /* ACK */
+		this_cpu_inc(*ibp->rc_acks);
+		if (qp->s_acked != qp->s_tail) {
+			/*
+			 * We are expecting more ACKs so
+			 * reset the re-transmit timer.
+			 */
+			start_timer(qp);
+			/*
+			 * We can stop re-sending the earlier packets and
+			 * continue with the next packet the receiver wants.
+			 */
+			if (cmp_psn(qp->s_psn, psn) <= 0)
+				reset_psn(qp, psn + 1);
+		} else if (cmp_psn(qp->s_psn, psn) <= 0) {
+			qp->s_state = OP(SEND_LAST);
+			qp->s_psn = psn + 1;
+		}
+		if (qp->s_flags & HFI1_S_WAIT_ACK) {
+			qp->s_flags &= ~HFI1_S_WAIT_ACK;
+			hfi1_schedule_send(qp);
+		}
+		hfi1_get_credit(qp, aeth);
+		qp->s_rnr_retry = qp->s_rnr_retry_cnt;
+		qp->s_retry = qp->s_retry_cnt;
+		update_last_psn(qp, psn);
+		ret = 1;
+		goto bail;
+
+	case 1:         /* RNR NAK */
+		ibp->n_rnr_naks++;
+		if (qp->s_acked == qp->s_tail)
+			goto bail;
+		if (qp->s_flags & HFI1_S_WAIT_RNR)
+			goto bail;
+		if (qp->s_rnr_retry == 0) {
+			status = IB_WC_RNR_RETRY_EXC_ERR;
+			goto class_b;
+		}
+		if (qp->s_rnr_retry_cnt < 7)
+			qp->s_rnr_retry--;
+
+		/* The last valid PSN is the previous PSN. */
+		update_last_psn(qp, psn - 1);
+
+		ibp->n_rc_resends += delta_psn(qp->s_psn, psn);
+
+		reset_psn(qp, psn);
+
+		qp->s_flags &= ~(HFI1_S_WAIT_SSN_CREDIT | HFI1_S_WAIT_ACK);
+		qp->s_flags |= HFI1_S_WAIT_RNR;
+		qp->s_timer.function = hfi1_rc_rnr_retry;
+		qp->s_timer.expires = jiffies + usecs_to_jiffies(
+			ib_hfi1_rnr_table[(aeth >> HFI1_AETH_CREDIT_SHIFT) &
+					   HFI1_AETH_CREDIT_MASK]);
+		add_timer(&qp->s_timer);
+		goto bail;
+
+	case 3:         /* NAK */
+		if (qp->s_acked == qp->s_tail)
+			goto bail;
+		/* The last valid PSN is the previous PSN. */
+		update_last_psn(qp, psn - 1);
+		switch ((aeth >> HFI1_AETH_CREDIT_SHIFT) &
+			HFI1_AETH_CREDIT_MASK) {
+		case 0: /* PSN sequence error */
+			ibp->n_seq_naks++;
+			/*
+			 * Back up to the responder's expected PSN.
+			 * Note that we might get a NAK in the middle of an
+			 * RDMA READ response which terminates the RDMA
+			 * READ.
+			 */
+			restart_rc(qp, psn, 0);
+			hfi1_schedule_send(qp);
+			break;
+
+		case 1: /* Invalid Request */
+			status = IB_WC_REM_INV_REQ_ERR;
+			ibp->n_other_naks++;
+			goto class_b;
+
+		case 2: /* Remote Access Error */
+			status = IB_WC_REM_ACCESS_ERR;
+			ibp->n_other_naks++;
+			goto class_b;
+
+		case 3: /* Remote Operation Error */
+			status = IB_WC_REM_OP_ERR;
+			ibp->n_other_naks++;
+class_b:
+			if (qp->s_last == qp->s_acked) {
+				hfi1_send_complete(qp, wqe, status);
+				hfi1_error_qp(qp, IB_WC_WR_FLUSH_ERR);
+			}
+			break;
+
+		default:
+			/* Ignore other reserved NAK error codes */
+			goto reserved;
+		}
+		qp->s_retry = qp->s_retry_cnt;
+		qp->s_rnr_retry = qp->s_rnr_retry_cnt;
+		goto bail;
+
+	default:                /* 2: reserved */
+reserved:
+		/* Ignore reserved NAK codes. */
+		goto bail;
+	}
+
+bail:
+	return ret;
+}
+
+/*
+ * We have seen an out of sequence RDMA read middle or last packet.
+ * This ACKs SENDs and RDMA writes up to the first RDMA read or atomic SWQE.
+ */
+static void rdma_seq_err(struct hfi1_qp *qp, struct hfi1_ibport *ibp, u32 psn,
+			 struct hfi1_ctxtdata *rcd)
+{
+	struct hfi1_swqe *wqe;
+
+	/* Remove QP from retry timer */
+	if (qp->s_flags & (HFI1_S_TIMER | HFI1_S_WAIT_RNR)) {
+		qp->s_flags &= ~(HFI1_S_TIMER | HFI1_S_WAIT_RNR);
+		del_timer(&qp->s_timer);
+	}
+
+	wqe = get_swqe_ptr(qp, qp->s_acked);
+
+	while (cmp_psn(psn, wqe->lpsn) > 0) {
+		if (wqe->wr.opcode == IB_WR_RDMA_READ ||
+		    wqe->wr.opcode == IB_WR_ATOMIC_CMP_AND_SWP ||
+		    wqe->wr.opcode == IB_WR_ATOMIC_FETCH_AND_ADD)
+			break;
+		wqe = do_rc_completion(qp, wqe, ibp);
+	}
+
+	ibp->n_rdma_seq++;
+	qp->r_flags |= HFI1_R_RDMAR_SEQ;
+	restart_rc(qp, qp->s_last_psn + 1, 0);
+	if (list_empty(&qp->rspwait)) {
+		qp->r_flags |= HFI1_R_RSP_SEND;
+		atomic_inc(&qp->refcount);
+		list_add_tail(&qp->rspwait, &rcd->qp_wait_list);
+	}
+}
+
+/**
+ * rc_rcv_resp - process an incoming RC response packet
+ * @ibp: the port this packet came in on
+ * @ohdr: the other headers for this packet
+ * @data: the packet data
+ * @tlen: the packet length
+ * @qp: the QP for this packet
+ * @opcode: the opcode for this packet
+ * @psn: the packet sequence number for this packet
+ * @hdrsize: the header length
+ * @pmtu: the path MTU
+ *
+ * This is called from hfi1_rc_rcv() to process an incoming RC response
+ * packet for the given QP.
+ * Called at interrupt level.
+ */
+static void rc_rcv_resp(struct hfi1_ibport *ibp,
+			struct hfi1_other_headers *ohdr,
+			void *data, u32 tlen, struct hfi1_qp *qp,
+			u32 opcode, u32 psn, u32 hdrsize, u32 pmtu,
+			struct hfi1_ctxtdata *rcd)
+{
+	struct hfi1_swqe *wqe;
+	enum ib_wc_status status;
+	unsigned long flags;
+	int diff;
+	u32 pad;
+	u32 aeth;
+	u64 val;
+
+	spin_lock_irqsave(&qp->s_lock, flags);
+
+	/* Ignore invalid responses. */
+	if (cmp_psn(psn, qp->s_next_psn) >= 0)
+		goto ack_done;
+
+	/* Ignore duplicate responses. */
+	diff = cmp_psn(psn, qp->s_last_psn);
+	if (unlikely(diff <= 0)) {
+		/* Update credits for "ghost" ACKs */
+		if (diff == 0 && opcode == OP(ACKNOWLEDGE)) {
+			aeth = be32_to_cpu(ohdr->u.aeth);
+			if ((aeth >> 29) == 0)
+				hfi1_get_credit(qp, aeth);
+		}
+		goto ack_done;
+	}
+
+	/*
+	 * Skip everything other than the PSN we expect, if we are waiting
+	 * for a reply to a restarted RDMA read or atomic op.
+	 */
+	if (qp->r_flags & HFI1_R_RDMAR_SEQ) {
+		if (cmp_psn(psn, qp->s_last_psn + 1) != 0)
+			goto ack_done;
+		qp->r_flags &= ~HFI1_R_RDMAR_SEQ;
+	}
+
+	if (unlikely(qp->s_acked == qp->s_tail))
+		goto ack_done;
+	wqe = get_swqe_ptr(qp, qp->s_acked);
+	status = IB_WC_SUCCESS;
+
+	switch (opcode) {
+	case OP(ACKNOWLEDGE):
+	case OP(ATOMIC_ACKNOWLEDGE):
+	case OP(RDMA_READ_RESPONSE_FIRST):
+		aeth = be32_to_cpu(ohdr->u.aeth);
+		if (opcode == OP(ATOMIC_ACKNOWLEDGE)) {
+			__be32 *p = ohdr->u.at.atomic_ack_eth;
+
+			val = ((u64) be32_to_cpu(p[0]) << 32) |
+				be32_to_cpu(p[1]);
+		} else
+			val = 0;
+		if (!do_rc_ack(qp, aeth, psn, opcode, val, rcd) ||
+		    opcode != OP(RDMA_READ_RESPONSE_FIRST))
+			goto ack_done;
+		wqe = get_swqe_ptr(qp, qp->s_acked);
+		if (unlikely(wqe->wr.opcode != IB_WR_RDMA_READ))
+			goto ack_op_err;
+		/*
+		 * If this is a response to a resent RDMA read, we
+		 * have to be careful to copy the data to the right
+		 * location.
+		 */
+		qp->s_rdma_read_len = restart_sge(&qp->s_rdma_read_sge,
+						  wqe, psn, pmtu);
+		goto read_middle;
+
+	case OP(RDMA_READ_RESPONSE_MIDDLE):
+		/* no AETH, no ACK */
+		if (unlikely(cmp_psn(psn, qp->s_last_psn + 1)))
+			goto ack_seq_err;
+		if (unlikely(wqe->wr.opcode != IB_WR_RDMA_READ))
+			goto ack_op_err;
+read_middle:
+		if (unlikely(tlen != (hdrsize + pmtu + 4)))
+			goto ack_len_err;
+		if (unlikely(pmtu >= qp->s_rdma_read_len))
+			goto ack_len_err;
+
+		/*
+		 * We got a response so update the timeout.
+		 * 4.096 usec. * (1 << qp->timeout)
+		 */
+		qp->s_flags |= HFI1_S_TIMER;
+		mod_timer(&qp->s_timer, jiffies + qp->timeout_jiffies);
+		if (qp->s_flags & HFI1_S_WAIT_ACK) {
+			qp->s_flags &= ~HFI1_S_WAIT_ACK;
+			hfi1_schedule_send(qp);
+		}
+
+		if (opcode == OP(RDMA_READ_RESPONSE_MIDDLE))
+			qp->s_retry = qp->s_retry_cnt;
+
+		/*
+		 * Update the RDMA receive state but do the copy w/o
+		 * holding the locks and blocking interrupts.
+		 */
+		qp->s_rdma_read_len -= pmtu;
+		update_last_psn(qp, psn);
+		spin_unlock_irqrestore(&qp->s_lock, flags);
+		hfi1_copy_sge(&qp->s_rdma_read_sge, data, pmtu, 0);
+		goto bail;
+
+	case OP(RDMA_READ_RESPONSE_ONLY):
+		aeth = be32_to_cpu(ohdr->u.aeth);
+		if (!do_rc_ack(qp, aeth, psn, opcode, 0, rcd))
+			goto ack_done;
+		/* Get the number of bytes the message was padded by. */
+		pad = (be32_to_cpu(ohdr->bth[0]) >> 20) & 3;
+		/*
+		 * Check that the data size is >= 0 && <= pmtu.
+		 * Remember to account for ICRC (4).
+		 */
+		if (unlikely(tlen < (hdrsize + pad + 4)))
+			goto ack_len_err;
+		/*
+		 * If this is a response to a resent RDMA read, we
+		 * have to be careful to copy the data to the right
+		 * location.
+		 */
+		wqe = get_swqe_ptr(qp, qp->s_acked);
+		qp->s_rdma_read_len = restart_sge(&qp->s_rdma_read_sge,
+						  wqe, psn, pmtu);
+		goto read_last;
+
+	case OP(RDMA_READ_RESPONSE_LAST):
+		/* ACKs READ req. */
+		if (unlikely(cmp_psn(psn, qp->s_last_psn + 1)))
+			goto ack_seq_err;
+		if (unlikely(wqe->wr.opcode != IB_WR_RDMA_READ))
+			goto ack_op_err;
+		/* Get the number of bytes the message was padded by. */
+		pad = (be32_to_cpu(ohdr->bth[0]) >> 20) & 3;
+		/*
+		 * Check that the data size is >= 1 && <= pmtu.
+		 * Remember to account for ICRC (4).
+		 */
+		if (unlikely(tlen <= (hdrsize + pad + 4)))
+			goto ack_len_err;
+read_last:
+		tlen -= hdrsize + pad + 4;
+		if (unlikely(tlen != qp->s_rdma_read_len))
+			goto ack_len_err;
+		aeth = be32_to_cpu(ohdr->u.aeth);
+		hfi1_copy_sge(&qp->s_rdma_read_sge, data, tlen, 0);
+		WARN_ON(qp->s_rdma_read_sge.num_sge);
+		(void) do_rc_ack(qp, aeth, psn,
+				 OP(RDMA_READ_RESPONSE_LAST), 0, rcd);
+		goto ack_done;
+	}
+
+ack_op_err:
+	status = IB_WC_LOC_QP_OP_ERR;
+	goto ack_err;
+
+ack_seq_err:
+	rdma_seq_err(qp, ibp, psn, rcd);
+	goto ack_done;
+
+ack_len_err:
+	status = IB_WC_LOC_LEN_ERR;
+ack_err:
+	if (qp->s_last == qp->s_acked) {
+		hfi1_send_complete(qp, wqe, status);
+		hfi1_error_qp(qp, IB_WC_WR_FLUSH_ERR);
+	}
+ack_done:
+	spin_unlock_irqrestore(&qp->s_lock, flags);
+bail:
+	return;
+}
+
+/**
+ * rc_rcv_error - process an incoming duplicate or error RC packet
+ * @ohdr: the other headers for this packet
+ * @data: the packet data
+ * @qp: the QP for this packet
+ * @opcode: the opcode for this packet
+ * @psn: the packet sequence number for this packet
+ * @diff: the difference between the PSN and the expected PSN
+ *
+ * This is called from hfi1_rc_rcv() to process an unexpected
+ * incoming RC packet for the given QP.
+ * Called at interrupt level.
+ * Return 1 if no more processing is needed; otherwise return 0 to
+ * schedule a response to be sent.
+ */
+static noinline int rc_rcv_error(struct hfi1_other_headers *ohdr, void *data,
+			struct hfi1_qp *qp, u32 opcode, u32 psn, int diff,
+			struct hfi1_ctxtdata *rcd)
+{
+	struct hfi1_ibport *ibp = to_iport(qp->ibqp.device, qp->port_num);
+	struct hfi1_ack_entry *e;
+	unsigned long flags;
+	u8 i, prev;
+	int old_req;
+
+	if (diff > 0) {
+		/*
+		 * Packet sequence error.
+		 * A NAK will ACK earlier sends and RDMA writes.
+		 * Don't queue the NAK if we already sent one.
+		 */
+		if (!qp->r_nak_state) {
+			ibp->n_rc_seqnak++;
+			qp->r_nak_state = IB_NAK_PSN_ERROR;
+			/* Use the expected PSN. */
+			qp->r_ack_psn = qp->r_psn;
+			/*
+			 * Wait to send the sequence NAK until all packets
+			 * in the receive queue have been processed.
+			 * Otherwise, we end up propagating congestion.
+			 */
+			if (list_empty(&qp->rspwait)) {
+				qp->r_flags |= HFI1_R_RSP_NAK;
+				atomic_inc(&qp->refcount);
+				list_add_tail(&qp->rspwait, &rcd->qp_wait_list);
+			}
+		}
+		goto done;
+	}
+
+	/*
+	 * Handle a duplicate request.  Don't re-execute SEND, RDMA
+	 * write or atomic op.  Don't NAK errors, just silently drop
+	 * the duplicate request.  Note that r_sge, r_len, and
+	 * r_rcv_len may be in use so don't modify them.
+	 *
+	 * We are supposed to ACK the earliest duplicate PSN but we
+	 * can coalesce an outstanding duplicate ACK.  We have to
+	 * send the earliest so that RDMA reads can be restarted at
+	 * the requester's expected PSN.
+	 *
+	 * First, find where this duplicate PSN falls within the
+	 * ACKs previously sent.
+	 * old_req is true if there is an older response that is scheduled
+	 * to be sent before sending this one.
+	 */
+	e = NULL;
+	old_req = 1;
+	ibp->n_rc_dupreq++;
+
+	spin_lock_irqsave(&qp->s_lock, flags);
+
+	for (i = qp->r_head_ack_queue; ; i = prev) {
+		if (i == qp->s_tail_ack_queue)
+			old_req = 0;
+		if (i)
+			prev = i - 1;
+		else
+			prev = HFI1_MAX_RDMA_ATOMIC;
+		if (prev == qp->r_head_ack_queue) {
+			e = NULL;
+			break;
+		}
+		e = &qp->s_ack_queue[prev];
+		if (!e->opcode) {
+			e = NULL;
+			break;
+		}
+		if (cmp_psn(psn, e->psn) >= 0) {
+			if (prev == qp->s_tail_ack_queue &&
+			    cmp_psn(psn, e->lpsn) <= 0)
+				old_req = 0;
+			break;
+		}
+	}
+	switch (opcode) {
+	case OP(RDMA_READ_REQUEST): {
+		struct ib_reth *reth;
+		u32 offset;
+		u32 len;
+
+		/*
+		 * If we didn't find the RDMA read request in the ack queue,
+		 * we can ignore this request.
+		 */
+		if (!e || e->opcode != OP(RDMA_READ_REQUEST))
+			goto unlock_done;
+		/* RETH comes after BTH */
+		reth = &ohdr->u.rc.reth;
+		/*
+		 * Address range must be a subset of the original
+		 * request and start on pmtu boundaries.
+		 * We reuse the old ack_queue slot since the requester
+		 * should not back up and request an earlier PSN for the
+		 * same request.
+		 */
+		offset = delta_psn(psn, e->psn) * qp->pmtu;
+		len = be32_to_cpu(reth->length);
+		if (unlikely(offset + len != e->rdma_sge.sge_length))
+			goto unlock_done;
+		if (e->rdma_sge.mr) {
+			hfi1_put_mr(e->rdma_sge.mr);
+			e->rdma_sge.mr = NULL;
+		}
+		if (len != 0) {
+			u32 rkey = be32_to_cpu(reth->rkey);
+			u64 vaddr = be64_to_cpu(reth->vaddr);
+			int ok;
+
+			ok = hfi1_rkey_ok(qp, &e->rdma_sge, len, vaddr, rkey,
+					  IB_ACCESS_REMOTE_READ);
+			if (unlikely(!ok))
+				goto unlock_done;
+		} else {
+			e->rdma_sge.vaddr = NULL;
+			e->rdma_sge.length = 0;
+			e->rdma_sge.sge_length = 0;
+		}
+		e->psn = psn;
+		if (old_req)
+			goto unlock_done;
+		qp->s_tail_ack_queue = prev;
+		break;
+	}
+
+	case OP(COMPARE_SWAP):
+	case OP(FETCH_ADD): {
+		/*
+		 * If we didn't find the atomic request in the ack queue
+		 * or the send tasklet is already backed up to send an
+		 * earlier entry, we can ignore this request.
+		 */
+		if (!e || e->opcode != (u8) opcode || old_req)
+			goto unlock_done;
+		qp->s_tail_ack_queue = prev;
+		break;
+	}
+
+	default:
+		/*
+		 * Ignore this operation if it doesn't request an ACK
+		 * or an earlier RDMA read or atomic is going to be resent.
+		 */
+		if (!(psn & IB_BTH_REQ_ACK) || old_req)
+			goto unlock_done;
+		/*
+		 * Resend the most recent ACK if this request is
+		 * after all the previous RDMA reads and atomics.
+		 */
+		if (i == qp->r_head_ack_queue) {
+			spin_unlock_irqrestore(&qp->s_lock, flags);
+			qp->r_nak_state = 0;
+			qp->r_ack_psn = qp->r_psn - 1;
+			goto send_ack;
+		}
+
+		/*
+		 * Resend the RDMA read or atomic op which
+		 * ACKs this duplicate request.
+		 */
+		qp->s_tail_ack_queue = i;
+		break;
+	}
+	qp->s_ack_state = OP(ACKNOWLEDGE);
+	qp->s_flags |= HFI1_S_RESP_PENDING;
+	qp->r_nak_state = 0;
+	hfi1_schedule_send(qp);
+
+unlock_done:
+	spin_unlock_irqrestore(&qp->s_lock, flags);
+done:
+	return 1;
+
+send_ack:
+	return 0;
+}
+
+void hfi1_rc_error(struct hfi1_qp *qp, enum ib_wc_status err)
+{
+	unsigned long flags;
+	int lastwqe;
+
+	spin_lock_irqsave(&qp->s_lock, flags);
+	lastwqe = hfi1_error_qp(qp, err);
+	spin_unlock_irqrestore(&qp->s_lock, flags);
+
+	if (lastwqe) {
+		struct ib_event ev;
+
+		ev.device = qp->ibqp.device;
+		ev.element.qp = &qp->ibqp;
+		ev.event = IB_EVENT_QP_LAST_WQE_REACHED;
+		qp->ibqp.event_handler(&ev, qp->ibqp.qp_context);
+	}
+}
+
+static inline void update_ack_queue(struct hfi1_qp *qp, unsigned n)
+{
+	unsigned next;
+
+	next = n + 1;
+	if (next > HFI1_MAX_RDMA_ATOMIC)
+		next = 0;
+	qp->s_tail_ack_queue = next;
+	qp->s_ack_state = OP(ACKNOWLEDGE);
+}
+
+static void log_cca_event(struct hfi1_pportdata *ppd, u8 sl, u32 rlid,
+			  u32 lqpn, u32 rqpn, u8 svc_type)
+{
+	struct opa_hfi1_cong_log_event_internal *cc_event;
+
+	if (sl >= OPA_MAX_SLS)
+		return;
+
+	spin_lock(&ppd->cc_log_lock);
+
+	ppd->threshold_cong_event_map[sl/8] |= 1 << (sl % 8);
+	ppd->threshold_event_counter++;
+
+	cc_event = &ppd->cc_events[ppd->cc_log_idx++];
+	if (ppd->cc_log_idx == OPA_CONG_LOG_ELEMS)
+		ppd->cc_log_idx = 0;
+	cc_event->lqpn = lqpn & HFI1_QPN_MASK;
+	cc_event->rqpn = rqpn & HFI1_QPN_MASK;
+	cc_event->sl = sl;
+	cc_event->svc_type = svc_type;
+	cc_event->rlid = rlid;
+	/* keep timestamp in units of 1.024 usec */
+	cc_event->timestamp = ktime_to_ns(ktime_get()) / 1024;
+
+	spin_unlock(&ppd->cc_log_lock);
+}
+
+void process_becn(struct hfi1_pportdata *ppd, u8 sl, u16 rlid, u32 lqpn,
+		  u32 rqpn, u8 svc_type)
+{
+	struct cca_timer *cca_timer;
+	u16 ccti, ccti_incr, ccti_timer, ccti_limit;
+	u8 trigger_threshold;
+	struct cc_state *cc_state;
+
+	if (sl >= OPA_MAX_SLS)
+		return;
+
+	cca_timer = &ppd->cca_timer[sl];
+
+	cc_state = get_cc_state(ppd);
+
+	if (cc_state == NULL)
+		return;
+
+	/*
+	 * 1) increase CCTI (for this SL)
+	 * 2) select IPG (i.e., call set_link_ipg())
+	 * 3) start timer
+	 */
+	ccti_limit = cc_state->cct.ccti_limit;
+	ccti_incr = cc_state->cong_setting.entries[sl].ccti_increase;
+	ccti_timer = cc_state->cong_setting.entries[sl].ccti_timer;
+	trigger_threshold =
+		cc_state->cong_setting.entries[sl].trigger_threshold;
+
+	spin_lock(&ppd->cca_timer_lock);
+
+	if (cca_timer->ccti < ccti_limit) {
+		if (cca_timer->ccti + ccti_incr <= ccti_limit)
+			cca_timer->ccti += ccti_incr;
+		else
+			cca_timer->ccti = ccti_limit;
+		set_link_ipg(ppd);
+	}
+
+	spin_unlock(&ppd->cca_timer_lock);
+
+	ccti = cca_timer->ccti;
+
+	if (!hrtimer_active(&cca_timer->hrtimer)) {
+		/* ccti_timer is in units of 1.024 usec */
+		unsigned long nsec = 1024 * ccti_timer;
+
+		hrtimer_start(&cca_timer->hrtimer, ns_to_ktime(nsec),
+			      HRTIMER_MODE_REL);
+	}
+
+	if ((trigger_threshold != 0) && (ccti >= trigger_threshold))
+		log_cca_event(ppd, sl, rlid, lqpn, rqpn, svc_type);
+}
+
+/**
+ * hfi1_rc_rcv - process an incoming RC packet
+ * @rcd: the context pointer
+ * @hdr: the header of this packet
+ * @rcv_flags: flags relevant to rcv processing
+ * @data: the packet data
+ * @tlen: the packet length
+ * @qp: the QP for this packet
+ *
+ * This is called from qp_rcv() to process an incoming RC packet
+ * for the given QP.
+ * Called at interrupt level.
+ */
+void hfi1_rc_rcv(struct hfi1_packet *packet)
+{
+	struct hfi1_ctxtdata *rcd = packet->rcd;
+	struct hfi1_ib_header *hdr = packet->hdr;
+	u32 rcv_flags = packet->rcv_flags;
+	void *data = packet->ebuf;
+	u32 tlen = packet->tlen;
+	struct hfi1_qp *qp = packet->qp;
+	struct hfi1_ibport *ibp = to_iport(qp->ibqp.device, qp->port_num);
+	struct hfi1_pportdata *ppd = ppd_from_ibp(ibp);
+	struct hfi1_other_headers *ohdr = packet->ohdr;
+	u32 bth0, opcode;
+	u32 hdrsize = packet->hlen;
+	u32 psn;
+	u32 pad;
+	struct ib_wc wc;
+	u32 pmtu = qp->pmtu;
+	int diff;
+	struct ib_reth *reth;
+	unsigned long flags;
+	u32 bth1;
+	int ret, is_fecn = 0;
+
+	bth0 = be32_to_cpu(ohdr->bth[0]);
+	if (hfi1_ruc_check_hdr(ibp, hdr, rcv_flags & HFI1_HAS_GRH, qp, bth0))
+		return;
+
+	bth1 = be32_to_cpu(ohdr->bth[1]);
+	if (unlikely(bth1 & (HFI1_BECN_SMASK | HFI1_FECN_SMASK))) {
+		if (bth1 & HFI1_BECN_SMASK) {
+			u16 rlid = qp->remote_ah_attr.dlid;
+			u32 lqpn, rqpn;
+
+			lqpn = qp->ibqp.qp_num;
+			rqpn = qp->remote_qpn;
+			process_becn(
+				ppd,
+				qp->remote_ah_attr.sl,
+				rlid, lqpn, rqpn,
+				IB_CC_SVCTYPE_RC);
+		}
+		is_fecn = bth1 & HFI1_FECN_SMASK;
+	}
+
+	psn = be32_to_cpu(ohdr->bth[2]);
+	opcode = bth0 >> 24;
+
+	/*
+	 * Process responses (ACKs) before anything else.  Note that the
+	 * packet sequence number will be for something in the send work
+	 * queue rather than the expected receive packet sequence number.
+	 * In other words, this QP is the requester.
+	 */
+	if (opcode >= OP(RDMA_READ_RESPONSE_FIRST) &&
+	    opcode <= OP(ATOMIC_ACKNOWLEDGE)) {
+		rc_rcv_resp(ibp, ohdr, data, tlen, qp, opcode, psn,
+			    hdrsize, pmtu, rcd);
+		if (is_fecn)
+			goto send_ack;
+		return;
+	}
+
+	/* Compute 24 bits worth of difference. */
+	diff = delta_psn(psn, qp->r_psn);
+	if (unlikely(diff)) {
+		if (rc_rcv_error(ohdr, data, qp, opcode, psn, diff, rcd))
+			return;
+		goto send_ack;
+	}
+
+	/* Check for opcode sequence errors. */
+	switch (qp->r_state) {
+	case OP(SEND_FIRST):
+	case OP(SEND_MIDDLE):
+		if (opcode == OP(SEND_MIDDLE) ||
+		    opcode == OP(SEND_LAST) ||
+		    opcode == OP(SEND_LAST_WITH_IMMEDIATE))
+			break;
+		goto nack_inv;
+
+	case OP(RDMA_WRITE_FIRST):
+	case OP(RDMA_WRITE_MIDDLE):
+		if (opcode == OP(RDMA_WRITE_MIDDLE) ||
+		    opcode == OP(RDMA_WRITE_LAST) ||
+		    opcode == OP(RDMA_WRITE_LAST_WITH_IMMEDIATE))
+			break;
+		goto nack_inv;
+
+	default:
+		if (opcode == OP(SEND_MIDDLE) ||
+		    opcode == OP(SEND_LAST) ||
+		    opcode == OP(SEND_LAST_WITH_IMMEDIATE) ||
+		    opcode == OP(RDMA_WRITE_MIDDLE) ||
+		    opcode == OP(RDMA_WRITE_LAST) ||
+		    opcode == OP(RDMA_WRITE_LAST_WITH_IMMEDIATE))
+			goto nack_inv;
+		/*
+		 * Note that it is up to the requester to not send a new
+		 * RDMA read or atomic operation before receiving an ACK
+		 * for the previous operation.
+		 */
+		break;
+	}
+
+	if (qp->state == IB_QPS_RTR && !(qp->r_flags & HFI1_R_COMM_EST))
+		qp_comm_est(qp);
+
+	/* OK, process the packet. */
+	switch (opcode) {
+	case OP(SEND_FIRST):
+		ret = hfi1_get_rwqe(qp, 0);
+		if (ret < 0)
+			goto nack_op_err;
+		if (!ret)
+			goto rnr_nak;
+		qp->r_rcv_len = 0;
+		/* FALLTHROUGH */
+	case OP(SEND_MIDDLE):
+	case OP(RDMA_WRITE_MIDDLE):
+send_middle:
+		/* Check for invalid length PMTU or posted rwqe len. */
+		if (unlikely(tlen != (hdrsize + pmtu + 4)))
+			goto nack_inv;
+		qp->r_rcv_len += pmtu;
+		if (unlikely(qp->r_rcv_len > qp->r_len))
+			goto nack_inv;
+		hfi1_copy_sge(&qp->r_sge, data, pmtu, 1);
+		break;
+
+	case OP(RDMA_WRITE_LAST_WITH_IMMEDIATE):
+		/* consume RWQE */
+		ret = hfi1_get_rwqe(qp, 1);
+		if (ret < 0)
+			goto nack_op_err;
+		if (!ret)
+			goto rnr_nak;
+		goto send_last_imm;
+
+	case OP(SEND_ONLY):
+	case OP(SEND_ONLY_WITH_IMMEDIATE):
+		ret = hfi1_get_rwqe(qp, 0);
+		if (ret < 0)
+			goto nack_op_err;
+		if (!ret)
+			goto rnr_nak;
+		qp->r_rcv_len = 0;
+		if (opcode == OP(SEND_ONLY))
+			goto no_immediate_data;
+		/* FALLTHROUGH for SEND_ONLY_WITH_IMMEDIATE */
+	case OP(SEND_LAST_WITH_IMMEDIATE):
+send_last_imm:
+		wc.ex.imm_data = ohdr->u.imm_data;
+		wc.wc_flags = IB_WC_WITH_IMM;
+		goto send_last;
+	case OP(SEND_LAST):
+	case OP(RDMA_WRITE_LAST):
+no_immediate_data:
+		wc.wc_flags = 0;
+		wc.ex.imm_data = 0;
+send_last:
+		/* Get the number of bytes the message was padded by. */
+		pad = (bth0 >> 20) & 3;
+		/* Check for invalid length. */
+		/* LAST len should be >= 1 */
+		if (unlikely(tlen < (hdrsize + pad + 4)))
+			goto nack_inv;
+		/* Don't count the CRC. */
+		tlen -= (hdrsize + pad + 4);
+		wc.byte_len = tlen + qp->r_rcv_len;
+		if (unlikely(wc.byte_len > qp->r_len))
+			goto nack_inv;
+		hfi1_copy_sge(&qp->r_sge, data, tlen, 1);
+		hfi1_put_ss(&qp->r_sge);
+		qp->r_msn++;
+		if (!test_and_clear_bit(HFI1_R_WRID_VALID, &qp->r_aflags))
+			break;
+		wc.wr_id = qp->r_wr_id;
+		wc.status = IB_WC_SUCCESS;
+		if (opcode == OP(RDMA_WRITE_LAST_WITH_IMMEDIATE) ||
+		    opcode == OP(RDMA_WRITE_ONLY_WITH_IMMEDIATE))
+			wc.opcode = IB_WC_RECV_RDMA_WITH_IMM;
+		else
+			wc.opcode = IB_WC_RECV;
+		wc.qp = &qp->ibqp;
+		wc.src_qp = qp->remote_qpn;
+		wc.slid = qp->remote_ah_attr.dlid;
+		/*
+		 * It seems that IB mandates the presence of an SL in a
+		 * work completion only for the UD transport (see section
+		 * 11.4.2 of IBTA Vol. 1).
+		 *
+		 * However, the way the SL is chosen below is consistent
+		 * with the way that IB/qib works and is trying avoid
+		 * introducing incompatibilities.
+		 *
+		 * See also OPA Vol. 1, section 9.7.6, and table 9-17.
+		 */
+		wc.sl = qp->remote_ah_attr.sl;
+		/* zero fields that are N/A */
+		wc.vendor_err = 0;
+		wc.pkey_index = 0;
+		wc.dlid_path_bits = 0;
+		wc.port_num = 0;
+		/* Signal completion event if the solicited bit is set. */
+		hfi1_cq_enter(to_icq(qp->ibqp.recv_cq), &wc,
+			      (bth0 & IB_BTH_SOLICITED) != 0);
+		break;
+
+	case OP(RDMA_WRITE_FIRST):
+	case OP(RDMA_WRITE_ONLY):
+	case OP(RDMA_WRITE_ONLY_WITH_IMMEDIATE):
+		if (unlikely(!(qp->qp_access_flags & IB_ACCESS_REMOTE_WRITE)))
+			goto nack_inv;
+		/* consume RWQE */
+		reth = &ohdr->u.rc.reth;
+		qp->r_len = be32_to_cpu(reth->length);
+		qp->r_rcv_len = 0;
+		qp->r_sge.sg_list = NULL;
+		if (qp->r_len != 0) {
+			u32 rkey = be32_to_cpu(reth->rkey);
+			u64 vaddr = be64_to_cpu(reth->vaddr);
+			int ok;
+
+			/* Check rkey & NAK */
+			ok = hfi1_rkey_ok(qp, &qp->r_sge.sge, qp->r_len, vaddr,
+					  rkey, IB_ACCESS_REMOTE_WRITE);
+			if (unlikely(!ok))
+				goto nack_acc;
+			qp->r_sge.num_sge = 1;
+		} else {
+			qp->r_sge.num_sge = 0;
+			qp->r_sge.sge.mr = NULL;
+			qp->r_sge.sge.vaddr = NULL;
+			qp->r_sge.sge.length = 0;
+			qp->r_sge.sge.sge_length = 0;
+		}
+		if (opcode == OP(RDMA_WRITE_FIRST))
+			goto send_middle;
+		else if (opcode == OP(RDMA_WRITE_ONLY))
+			goto no_immediate_data;
+		ret = hfi1_get_rwqe(qp, 1);
+		if (ret < 0)
+			goto nack_op_err;
+		if (!ret)
+			goto rnr_nak;
+		wc.ex.imm_data = ohdr->u.rc.imm_data;
+		wc.wc_flags = IB_WC_WITH_IMM;
+		goto send_last;
+
+	case OP(RDMA_READ_REQUEST): {
+		struct hfi1_ack_entry *e;
+		u32 len;
+		u8 next;
+
+		if (unlikely(!(qp->qp_access_flags & IB_ACCESS_REMOTE_READ)))
+			goto nack_inv;
+		next = qp->r_head_ack_queue + 1;
+		/* s_ack_queue is size HFI1_MAX_RDMA_ATOMIC+1 so use > not >= */
+		if (next > HFI1_MAX_RDMA_ATOMIC)
+			next = 0;
+		spin_lock_irqsave(&qp->s_lock, flags);
+		if (unlikely(next == qp->s_tail_ack_queue)) {
+			if (!qp->s_ack_queue[next].sent)
+				goto nack_inv_unlck;
+			update_ack_queue(qp, next);
+		}
+		e = &qp->s_ack_queue[qp->r_head_ack_queue];
+		if (e->opcode == OP(RDMA_READ_REQUEST) && e->rdma_sge.mr) {
+			hfi1_put_mr(e->rdma_sge.mr);
+			e->rdma_sge.mr = NULL;
+		}
+		reth = &ohdr->u.rc.reth;
+		len = be32_to_cpu(reth->length);
+		if (len) {
+			u32 rkey = be32_to_cpu(reth->rkey);
+			u64 vaddr = be64_to_cpu(reth->vaddr);
+			int ok;
+
+			/* Check rkey & NAK */
+			ok = hfi1_rkey_ok(qp, &e->rdma_sge, len, vaddr,
+					  rkey, IB_ACCESS_REMOTE_READ);
+			if (unlikely(!ok))
+				goto nack_acc_unlck;
+			/*
+			 * Update the next expected PSN.  We add 1 later
+			 * below, so only add the remainder here.
+			 */
+			if (len > pmtu)
+				qp->r_psn += (len - 1) / pmtu;
+		} else {
+			e->rdma_sge.mr = NULL;
+			e->rdma_sge.vaddr = NULL;
+			e->rdma_sge.length = 0;
+			e->rdma_sge.sge_length = 0;
+		}
+		e->opcode = opcode;
+		e->sent = 0;
+		e->psn = psn;
+		e->lpsn = qp->r_psn;
+		/*
+		 * We need to increment the MSN here instead of when we
+		 * finish sending the result since a duplicate request would
+		 * increment it more than once.
+		 */
+		qp->r_msn++;
+		qp->r_psn++;
+		qp->r_state = opcode;
+		qp->r_nak_state = 0;
+		qp->r_head_ack_queue = next;
+
+		/* Schedule the send tasklet. */
+		qp->s_flags |= HFI1_S_RESP_PENDING;
+		hfi1_schedule_send(qp);
+
+		spin_unlock_irqrestore(&qp->s_lock, flags);
+		if (is_fecn)
+			goto send_ack;
+		return;
+	}
+
+	case OP(COMPARE_SWAP):
+	case OP(FETCH_ADD): {
+		struct ib_atomic_eth *ateth;
+		struct hfi1_ack_entry *e;
+		u64 vaddr;
+		atomic64_t *maddr;
+		u64 sdata;
+		u32 rkey;
+		u8 next;
+
+		if (unlikely(!(qp->qp_access_flags & IB_ACCESS_REMOTE_ATOMIC)))
+			goto nack_inv;
+		next = qp->r_head_ack_queue + 1;
+		if (next > HFI1_MAX_RDMA_ATOMIC)
+			next = 0;
+		spin_lock_irqsave(&qp->s_lock, flags);
+		if (unlikely(next == qp->s_tail_ack_queue)) {
+			if (!qp->s_ack_queue[next].sent)
+				goto nack_inv_unlck;
+			update_ack_queue(qp, next);
+		}
+		e = &qp->s_ack_queue[qp->r_head_ack_queue];
+		if (e->opcode == OP(RDMA_READ_REQUEST) && e->rdma_sge.mr) {
+			hfi1_put_mr(e->rdma_sge.mr);
+			e->rdma_sge.mr = NULL;
+		}
+		ateth = &ohdr->u.atomic_eth;
+		vaddr = ((u64) be32_to_cpu(ateth->vaddr[0]) << 32) |
+			be32_to_cpu(ateth->vaddr[1]);
+		if (unlikely(vaddr & (sizeof(u64) - 1)))
+			goto nack_inv_unlck;
+		rkey = be32_to_cpu(ateth->rkey);
+		/* Check rkey & NAK */
+		if (unlikely(!hfi1_rkey_ok(qp, &qp->r_sge.sge, sizeof(u64),
+					   vaddr, rkey,
+					   IB_ACCESS_REMOTE_ATOMIC)))
+			goto nack_acc_unlck;
+		/* Perform atomic OP and save result. */
+		maddr = (atomic64_t *) qp->r_sge.sge.vaddr;
+		sdata = be64_to_cpu(ateth->swap_data);
+		e->atomic_data = (opcode == OP(FETCH_ADD)) ?
+			(u64) atomic64_add_return(sdata, maddr) - sdata :
+			(u64) cmpxchg((u64 *) qp->r_sge.sge.vaddr,
+				      be64_to_cpu(ateth->compare_data),
+				      sdata);
+		hfi1_put_mr(qp->r_sge.sge.mr);
+		qp->r_sge.num_sge = 0;
+		e->opcode = opcode;
+		e->sent = 0;
+		e->psn = psn;
+		e->lpsn = psn;
+		qp->r_msn++;
+		qp->r_psn++;
+		qp->r_state = opcode;
+		qp->r_nak_state = 0;
+		qp->r_head_ack_queue = next;
+
+		/* Schedule the send tasklet. */
+		qp->s_flags |= HFI1_S_RESP_PENDING;
+		hfi1_schedule_send(qp);
+
+		spin_unlock_irqrestore(&qp->s_lock, flags);
+		if (is_fecn)
+			goto send_ack;
+		return;
+	}
+
+	default:
+		/* NAK unknown opcodes. */
+		goto nack_inv;
+	}
+	qp->r_psn++;
+	qp->r_state = opcode;
+	qp->r_ack_psn = psn;
+	qp->r_nak_state = 0;
+	/* Send an ACK if requested or required. */
+	if (psn & (1 << 31))
+		goto send_ack;
+	return;
+
+rnr_nak:
+	qp->r_nak_state = IB_RNR_NAK | qp->r_min_rnr_timer;
+	qp->r_ack_psn = qp->r_psn;
+	/* Queue RNR NAK for later */
+	if (list_empty(&qp->rspwait)) {
+		qp->r_flags |= HFI1_R_RSP_NAK;
+		atomic_inc(&qp->refcount);
+		list_add_tail(&qp->rspwait, &rcd->qp_wait_list);
+	}
+	return;
+
+nack_op_err:
+	hfi1_rc_error(qp, IB_WC_LOC_QP_OP_ERR);
+	qp->r_nak_state = IB_NAK_REMOTE_OPERATIONAL_ERROR;
+	qp->r_ack_psn = qp->r_psn;
+	/* Queue NAK for later */
+	if (list_empty(&qp->rspwait)) {
+		qp->r_flags |= HFI1_R_RSP_NAK;
+		atomic_inc(&qp->refcount);
+		list_add_tail(&qp->rspwait, &rcd->qp_wait_list);
+	}
+	return;
+
+nack_inv_unlck:
+	spin_unlock_irqrestore(&qp->s_lock, flags);
+nack_inv:
+	hfi1_rc_error(qp, IB_WC_LOC_QP_OP_ERR);
+	qp->r_nak_state = IB_NAK_INVALID_REQUEST;
+	qp->r_ack_psn = qp->r_psn;
+	/* Queue NAK for later */
+	if (list_empty(&qp->rspwait)) {
+		qp->r_flags |= HFI1_R_RSP_NAK;
+		atomic_inc(&qp->refcount);
+		list_add_tail(&qp->rspwait, &rcd->qp_wait_list);
+	}
+	return;
+
+nack_acc_unlck:
+	spin_unlock_irqrestore(&qp->s_lock, flags);
+nack_acc:
+	hfi1_rc_error(qp, IB_WC_LOC_PROT_ERR);
+	qp->r_nak_state = IB_NAK_REMOTE_ACCESS_ERROR;
+	qp->r_ack_psn = qp->r_psn;
+send_ack:
+	hfi1_send_rc_ack(rcd, qp, is_fecn);
+}
+
+void hfi1_rc_hdrerr(
+	struct hfi1_ctxtdata *rcd,
+	struct hfi1_ib_header *hdr,
+	u32 rcv_flags,
+	struct hfi1_qp *qp)
+{
+	int has_grh = rcv_flags & HFI1_HAS_GRH;
+	struct hfi1_other_headers *ohdr;
+	struct hfi1_ibport *ibp = to_iport(qp->ibqp.device, qp->port_num);
+	int diff;
+	u8 opcode;
+	u32 psn;
+
+	/* Check for GRH */
+	ohdr = &hdr->u.oth;
+	if (has_grh)
+		ohdr = &hdr->u.l.oth;
+
+	opcode = be32_to_cpu(ohdr->bth[0]);
+	if (hfi1_ruc_check_hdr(ibp, hdr, has_grh, qp, opcode))
+		return;
+
+	psn = be32_to_cpu(ohdr->bth[2]);
+	opcode >>= 24;
+
+	/* Only deal with RDMA Writes for now */
+	if (opcode < IB_OPCODE_RC_RDMA_READ_RESPONSE_FIRST) {
+		diff = delta_psn(psn, qp->r_psn);
+		if (!qp->r_nak_state && diff >= 0) {
+			ibp->n_rc_seqnak++;
+			qp->r_nak_state = IB_NAK_PSN_ERROR;
+			/* Use the expected PSN. */
+			qp->r_ack_psn = qp->r_psn;
+			/*
+			 * Wait to send the sequence
+			 * NAK until all packets
+			 * in the receive queue have
+			 * been processed.
+			 * Otherwise, we end up
+			 * propagating congestion.
+			 */
+			if (list_empty(&qp->rspwait)) {
+				qp->r_flags |= HFI1_R_RSP_NAK;
+				atomic_inc(&qp->refcount);
+				list_add_tail(
+					&qp->rspwait,
+					&rcd->qp_wait_list);
+				}
+		} /* Out of sequence NAK */
+	} /* QP Request NAKs */
+}
diff --git a/drivers/staging/rdma/hfi1/ruc.c b/drivers/staging/rdma/hfi1/ruc.c
new file mode 100644
index 000000000000..a4115288db66
--- /dev/null
+++ b/drivers/staging/rdma/hfi1/ruc.c
@@ -0,0 +1,948 @@
+/*
+ *
+ * This file is provided under a dual BSD/GPLv2 license.  When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * GPL LICENSE SUMMARY
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * BSD LICENSE
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ *  - Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ *  - Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in
+ *    the documentation and/or other materials provided with the
+ *    distribution.
+ *  - Neither the name of Intel Corporation nor the names of its
+ *    contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ */
+
+#include <linux/spinlock.h>
+
+#include "hfi.h"
+#include "mad.h"
+#include "qp.h"
+#include "sdma.h"
+
+/*
+ * Convert the AETH RNR timeout code into the number of microseconds.
+ */
+const u32 ib_hfi1_rnr_table[32] = {
+	655360,	/* 00: 655.36 */
+	10,	/* 01:    .01 */
+	20,	/* 02     .02 */
+	30,	/* 03:    .03 */
+	40,	/* 04:    .04 */
+	60,	/* 05:    .06 */
+	80,	/* 06:    .08 */
+	120,	/* 07:    .12 */
+	160,	/* 08:    .16 */
+	240,	/* 09:    .24 */
+	320,	/* 0A:    .32 */
+	480,	/* 0B:    .48 */
+	640,	/* 0C:    .64 */
+	960,	/* 0D:    .96 */
+	1280,	/* 0E:   1.28 */
+	1920,	/* 0F:   1.92 */
+	2560,	/* 10:   2.56 */
+	3840,	/* 11:   3.84 */
+	5120,	/* 12:   5.12 */
+	7680,	/* 13:   7.68 */
+	10240,	/* 14:  10.24 */
+	15360,	/* 15:  15.36 */
+	20480,	/* 16:  20.48 */
+	30720,	/* 17:  30.72 */
+	40960,	/* 18:  40.96 */
+	61440,	/* 19:  61.44 */
+	81920,	/* 1A:  81.92 */
+	122880,	/* 1B: 122.88 */
+	163840,	/* 1C: 163.84 */
+	245760,	/* 1D: 245.76 */
+	327680,	/* 1E: 327.68 */
+	491520	/* 1F: 491.52 */
+};
+
+/*
+ * Validate a RWQE and fill in the SGE state.
+ * Return 1 if OK.
+ */
+static int init_sge(struct hfi1_qp *qp, struct hfi1_rwqe *wqe)
+{
+	int i, j, ret;
+	struct ib_wc wc;
+	struct hfi1_lkey_table *rkt;
+	struct hfi1_pd *pd;
+	struct hfi1_sge_state *ss;
+
+	rkt = &to_idev(qp->ibqp.device)->lk_table;
+	pd = to_ipd(qp->ibqp.srq ? qp->ibqp.srq->pd : qp->ibqp.pd);
+	ss = &qp->r_sge;
+	ss->sg_list = qp->r_sg_list;
+	qp->r_len = 0;
+	for (i = j = 0; i < wqe->num_sge; i++) {
+		if (wqe->sg_list[i].length == 0)
+			continue;
+		/* Check LKEY */
+		if (!hfi1_lkey_ok(rkt, pd, j ? &ss->sg_list[j - 1] : &ss->sge,
+				  &wqe->sg_list[i], IB_ACCESS_LOCAL_WRITE))
+			goto bad_lkey;
+		qp->r_len += wqe->sg_list[i].length;
+		j++;
+	}
+	ss->num_sge = j;
+	ss->total_len = qp->r_len;
+	ret = 1;
+	goto bail;
+
+bad_lkey:
+	while (j) {
+		struct hfi1_sge *sge = --j ? &ss->sg_list[j - 1] : &ss->sge;
+
+		hfi1_put_mr(sge->mr);
+	}
+	ss->num_sge = 0;
+	memset(&wc, 0, sizeof(wc));
+	wc.wr_id = wqe->wr_id;
+	wc.status = IB_WC_LOC_PROT_ERR;
+	wc.opcode = IB_WC_RECV;
+	wc.qp = &qp->ibqp;
+	/* Signal solicited completion event. */
+	hfi1_cq_enter(to_icq(qp->ibqp.recv_cq), &wc, 1);
+	ret = 0;
+bail:
+	return ret;
+}
+
+/**
+ * hfi1_get_rwqe - copy the next RWQE into the QP's RWQE
+ * @qp: the QP
+ * @wr_id_only: update qp->r_wr_id only, not qp->r_sge
+ *
+ * Return -1 if there is a local error, 0 if no RWQE is available,
+ * otherwise return 1.
+ *
+ * Can be called from interrupt level.
+ */
+int hfi1_get_rwqe(struct hfi1_qp *qp, int wr_id_only)
+{
+	unsigned long flags;
+	struct hfi1_rq *rq;
+	struct hfi1_rwq *wq;
+	struct hfi1_srq *srq;
+	struct hfi1_rwqe *wqe;
+	void (*handler)(struct ib_event *, void *);
+	u32 tail;
+	int ret;
+
+	if (qp->ibqp.srq) {
+		srq = to_isrq(qp->ibqp.srq);
+		handler = srq->ibsrq.event_handler;
+		rq = &srq->rq;
+	} else {
+		srq = NULL;
+		handler = NULL;
+		rq = &qp->r_rq;
+	}
+
+	spin_lock_irqsave(&rq->lock, flags);
+	if (!(ib_hfi1_state_ops[qp->state] & HFI1_PROCESS_RECV_OK)) {
+		ret = 0;
+		goto unlock;
+	}
+
+	wq = rq->wq;
+	tail = wq->tail;
+	/* Validate tail before using it since it is user writable. */
+	if (tail >= rq->size)
+		tail = 0;
+	if (unlikely(tail == wq->head)) {
+		ret = 0;
+		goto unlock;
+	}
+	/* Make sure entry is read after head index is read. */
+	smp_rmb();
+	wqe = get_rwqe_ptr(rq, tail);
+	/*
+	 * Even though we update the tail index in memory, the verbs
+	 * consumer is not supposed to post more entries until a
+	 * completion is generated.
+	 */
+	if (++tail >= rq->size)
+		tail = 0;
+	wq->tail = tail;
+	if (!wr_id_only && !init_sge(qp, wqe)) {
+		ret = -1;
+		goto unlock;
+	}
+	qp->r_wr_id = wqe->wr_id;
+
+	ret = 1;
+	set_bit(HFI1_R_WRID_VALID, &qp->r_aflags);
+	if (handler) {
+		u32 n;
+
+		/*
+		 * Validate head pointer value and compute
+		 * the number of remaining WQEs.
+		 */
+		n = wq->head;
+		if (n >= rq->size)
+			n = 0;
+		if (n < tail)
+			n += rq->size - tail;
+		else
+			n -= tail;
+		if (n < srq->limit) {
+			struct ib_event ev;
+
+			srq->limit = 0;
+			spin_unlock_irqrestore(&rq->lock, flags);
+			ev.device = qp->ibqp.device;
+			ev.element.srq = qp->ibqp.srq;
+			ev.event = IB_EVENT_SRQ_LIMIT_REACHED;
+			handler(&ev, srq->ibsrq.srq_context);
+			goto bail;
+		}
+	}
+unlock:
+	spin_unlock_irqrestore(&rq->lock, flags);
+bail:
+	return ret;
+}
+
+/*
+ * Switch to alternate path.
+ * The QP s_lock should be held and interrupts disabled.
+ */
+void hfi1_migrate_qp(struct hfi1_qp *qp)
+{
+	struct ib_event ev;
+
+	qp->s_mig_state = IB_MIG_MIGRATED;
+	qp->remote_ah_attr = qp->alt_ah_attr;
+	qp->port_num = qp->alt_ah_attr.port_num;
+	qp->s_pkey_index = qp->s_alt_pkey_index;
+	qp->s_flags |= HFI1_S_AHG_CLEAR;
+
+	ev.device = qp->ibqp.device;
+	ev.element.qp = &qp->ibqp;
+	ev.event = IB_EVENT_PATH_MIG;
+	qp->ibqp.event_handler(&ev, qp->ibqp.qp_context);
+}
+
+static __be64 get_sguid(struct hfi1_ibport *ibp, unsigned index)
+{
+	if (!index) {
+		struct hfi1_pportdata *ppd = ppd_from_ibp(ibp);
+
+		return cpu_to_be64(ppd->guid);
+	}
+	return ibp->guids[index - 1];
+}
+
+static int gid_ok(union ib_gid *gid, __be64 gid_prefix, __be64 id)
+{
+	return (gid->global.interface_id == id &&
+		(gid->global.subnet_prefix == gid_prefix ||
+		 gid->global.subnet_prefix == IB_DEFAULT_GID_PREFIX));
+}
+
+/*
+ *
+ * This should be called with the QP r_lock held.
+ *
+ * The s_lock will be acquired around the hfi1_migrate_qp() call.
+ */
+int hfi1_ruc_check_hdr(struct hfi1_ibport *ibp, struct hfi1_ib_header *hdr,
+		       int has_grh, struct hfi1_qp *qp, u32 bth0)
+{
+	__be64 guid;
+	unsigned long flags;
+	u8 sc5 = ibp->sl_to_sc[qp->remote_ah_attr.sl];
+
+	if (qp->s_mig_state == IB_MIG_ARMED && (bth0 & IB_BTH_MIG_REQ)) {
+		if (!has_grh) {
+			if (qp->alt_ah_attr.ah_flags & IB_AH_GRH)
+				goto err;
+		} else {
+			if (!(qp->alt_ah_attr.ah_flags & IB_AH_GRH))
+				goto err;
+			guid = get_sguid(ibp, qp->alt_ah_attr.grh.sgid_index);
+			if (!gid_ok(&hdr->u.l.grh.dgid, ibp->gid_prefix, guid))
+				goto err;
+			if (!gid_ok(&hdr->u.l.grh.sgid,
+			    qp->alt_ah_attr.grh.dgid.global.subnet_prefix,
+			    qp->alt_ah_attr.grh.dgid.global.interface_id))
+				goto err;
+		}
+		if (unlikely(rcv_pkey_check(ppd_from_ibp(ibp), (u16)bth0,
+					    sc5, be16_to_cpu(hdr->lrh[3])))) {
+			hfi1_bad_pqkey(ibp, IB_NOTICE_TRAP_BAD_PKEY,
+				       (u16)bth0,
+				       (be16_to_cpu(hdr->lrh[0]) >> 4) & 0xF,
+				       0, qp->ibqp.qp_num,
+				       hdr->lrh[3], hdr->lrh[1]);
+			goto err;
+		}
+		/* Validate the SLID. See Ch. 9.6.1.5 and 17.2.8 */
+		if (be16_to_cpu(hdr->lrh[3]) != qp->alt_ah_attr.dlid ||
+		    ppd_from_ibp(ibp)->port != qp->alt_ah_attr.port_num)
+			goto err;
+		spin_lock_irqsave(&qp->s_lock, flags);
+		hfi1_migrate_qp(qp);
+		spin_unlock_irqrestore(&qp->s_lock, flags);
+	} else {
+		if (!has_grh) {
+			if (qp->remote_ah_attr.ah_flags & IB_AH_GRH)
+				goto err;
+		} else {
+			if (!(qp->remote_ah_attr.ah_flags & IB_AH_GRH))
+				goto err;
+			guid = get_sguid(ibp,
+					 qp->remote_ah_attr.grh.sgid_index);
+			if (!gid_ok(&hdr->u.l.grh.dgid, ibp->gid_prefix, guid))
+				goto err;
+			if (!gid_ok(&hdr->u.l.grh.sgid,
+			    qp->remote_ah_attr.grh.dgid.global.subnet_prefix,
+			    qp->remote_ah_attr.grh.dgid.global.interface_id))
+				goto err;
+		}
+		if (unlikely(rcv_pkey_check(ppd_from_ibp(ibp), (u16)bth0,
+					    sc5, be16_to_cpu(hdr->lrh[3])))) {
+			hfi1_bad_pqkey(ibp, IB_NOTICE_TRAP_BAD_PKEY,
+				       (u16)bth0,
+				       (be16_to_cpu(hdr->lrh[0]) >> 4) & 0xF,
+				       0, qp->ibqp.qp_num,
+				       hdr->lrh[3], hdr->lrh[1]);
+			goto err;
+		}
+		/* Validate the SLID. See Ch. 9.6.1.5 */
+		if (be16_to_cpu(hdr->lrh[3]) != qp->remote_ah_attr.dlid ||
+		    ppd_from_ibp(ibp)->port != qp->port_num)
+			goto err;
+		if (qp->s_mig_state == IB_MIG_REARM &&
+		    !(bth0 & IB_BTH_MIG_REQ))
+			qp->s_mig_state = IB_MIG_ARMED;
+	}
+
+	return 0;
+
+err:
+	return 1;
+}
+
+/**
+ * ruc_loopback - handle UC and RC loopback requests
+ * @sqp: the sending QP
+ *
+ * This is called from hfi1_do_send() to
+ * forward a WQE addressed to the same HFI.
+ * Note that although we are single threaded due to the tasklet, we still
+ * have to protect against post_send().  We don't have to worry about
+ * receive interrupts since this is a connected protocol and all packets
+ * will pass through here.
+ */
+static void ruc_loopback(struct hfi1_qp *sqp)
+{
+	struct hfi1_ibport *ibp = to_iport(sqp->ibqp.device, sqp->port_num);
+	struct hfi1_qp *qp;
+	struct hfi1_swqe *wqe;
+	struct hfi1_sge *sge;
+	unsigned long flags;
+	struct ib_wc wc;
+	u64 sdata;
+	atomic64_t *maddr;
+	enum ib_wc_status send_status;
+	int release;
+	int ret;
+
+	rcu_read_lock();
+
+	/*
+	 * Note that we check the responder QP state after
+	 * checking the requester's state.
+	 */
+	qp = hfi1_lookup_qpn(ibp, sqp->remote_qpn);
+
+	spin_lock_irqsave(&sqp->s_lock, flags);
+
+	/* Return if we are already busy processing a work request. */
+	if ((sqp->s_flags & (HFI1_S_BUSY | HFI1_S_ANY_WAIT)) ||
+	    !(ib_hfi1_state_ops[sqp->state] & HFI1_PROCESS_OR_FLUSH_SEND))
+		goto unlock;
+
+	sqp->s_flags |= HFI1_S_BUSY;
+
+again:
+	if (sqp->s_last == sqp->s_head)
+		goto clr_busy;
+	wqe = get_swqe_ptr(sqp, sqp->s_last);
+
+	/* Return if it is not OK to start a new work request. */
+	if (!(ib_hfi1_state_ops[sqp->state] & HFI1_PROCESS_NEXT_SEND_OK)) {
+		if (!(ib_hfi1_state_ops[sqp->state] & HFI1_FLUSH_SEND))
+			goto clr_busy;
+		/* We are in the error state, flush the work request. */
+		send_status = IB_WC_WR_FLUSH_ERR;
+		goto flush_send;
+	}
+
+	/*
+	 * We can rely on the entry not changing without the s_lock
+	 * being held until we update s_last.
+	 * We increment s_cur to indicate s_last is in progress.
+	 */
+	if (sqp->s_last == sqp->s_cur) {
+		if (++sqp->s_cur >= sqp->s_size)
+			sqp->s_cur = 0;
+	}
+	spin_unlock_irqrestore(&sqp->s_lock, flags);
+
+	if (!qp || !(ib_hfi1_state_ops[qp->state] & HFI1_PROCESS_RECV_OK) ||
+	    qp->ibqp.qp_type != sqp->ibqp.qp_type) {
+		ibp->n_pkt_drops++;
+		/*
+		 * For RC, the requester would timeout and retry so
+		 * shortcut the timeouts and just signal too many retries.
+		 */
+		if (sqp->ibqp.qp_type == IB_QPT_RC)
+			send_status = IB_WC_RETRY_EXC_ERR;
+		else
+			send_status = IB_WC_SUCCESS;
+		goto serr;
+	}
+
+	memset(&wc, 0, sizeof(wc));
+	send_status = IB_WC_SUCCESS;
+
+	release = 1;
+	sqp->s_sge.sge = wqe->sg_list[0];
+	sqp->s_sge.sg_list = wqe->sg_list + 1;
+	sqp->s_sge.num_sge = wqe->wr.num_sge;
+	sqp->s_len = wqe->length;
+	switch (wqe->wr.opcode) {
+	case IB_WR_SEND_WITH_IMM:
+		wc.wc_flags = IB_WC_WITH_IMM;
+		wc.ex.imm_data = wqe->wr.ex.imm_data;
+		/* FALLTHROUGH */
+	case IB_WR_SEND:
+		ret = hfi1_get_rwqe(qp, 0);
+		if (ret < 0)
+			goto op_err;
+		if (!ret)
+			goto rnr_nak;
+		break;
+
+	case IB_WR_RDMA_WRITE_WITH_IMM:
+		if (unlikely(!(qp->qp_access_flags & IB_ACCESS_REMOTE_WRITE)))
+			goto inv_err;
+		wc.wc_flags = IB_WC_WITH_IMM;
+		wc.ex.imm_data = wqe->wr.ex.imm_data;
+		ret = hfi1_get_rwqe(qp, 1);
+		if (ret < 0)
+			goto op_err;
+		if (!ret)
+			goto rnr_nak;
+		/* FALLTHROUGH */
+	case IB_WR_RDMA_WRITE:
+		if (unlikely(!(qp->qp_access_flags & IB_ACCESS_REMOTE_WRITE)))
+			goto inv_err;
+		if (wqe->length == 0)
+			break;
+		if (unlikely(!hfi1_rkey_ok(qp, &qp->r_sge.sge, wqe->length,
+					   wqe->wr.wr.rdma.remote_addr,
+					   wqe->wr.wr.rdma.rkey,
+					   IB_ACCESS_REMOTE_WRITE)))
+			goto acc_err;
+		qp->r_sge.sg_list = NULL;
+		qp->r_sge.num_sge = 1;
+		qp->r_sge.total_len = wqe->length;
+		break;
+
+	case IB_WR_RDMA_READ:
+		if (unlikely(!(qp->qp_access_flags & IB_ACCESS_REMOTE_READ)))
+			goto inv_err;
+		if (unlikely(!hfi1_rkey_ok(qp, &sqp->s_sge.sge, wqe->length,
+					   wqe->wr.wr.rdma.remote_addr,
+					   wqe->wr.wr.rdma.rkey,
+					   IB_ACCESS_REMOTE_READ)))
+			goto acc_err;
+		release = 0;
+		sqp->s_sge.sg_list = NULL;
+		sqp->s_sge.num_sge = 1;
+		qp->r_sge.sge = wqe->sg_list[0];
+		qp->r_sge.sg_list = wqe->sg_list + 1;
+		qp->r_sge.num_sge = wqe->wr.num_sge;
+		qp->r_sge.total_len = wqe->length;
+		break;
+
+	case IB_WR_ATOMIC_CMP_AND_SWP:
+	case IB_WR_ATOMIC_FETCH_AND_ADD:
+		if (unlikely(!(qp->qp_access_flags & IB_ACCESS_REMOTE_ATOMIC)))
+			goto inv_err;
+		if (unlikely(!hfi1_rkey_ok(qp, &qp->r_sge.sge, sizeof(u64),
+					   wqe->wr.wr.atomic.remote_addr,
+					   wqe->wr.wr.atomic.rkey,
+					   IB_ACCESS_REMOTE_ATOMIC)))
+			goto acc_err;
+		/* Perform atomic OP and save result. */
+		maddr = (atomic64_t *) qp->r_sge.sge.vaddr;
+		sdata = wqe->wr.wr.atomic.compare_add;
+		*(u64 *) sqp->s_sge.sge.vaddr =
+			(wqe->wr.opcode == IB_WR_ATOMIC_FETCH_AND_ADD) ?
+			(u64) atomic64_add_return(sdata, maddr) - sdata :
+			(u64) cmpxchg((u64 *) qp->r_sge.sge.vaddr,
+				      sdata, wqe->wr.wr.atomic.swap);
+		hfi1_put_mr(qp->r_sge.sge.mr);
+		qp->r_sge.num_sge = 0;
+		goto send_comp;
+
+	default:
+		send_status = IB_WC_LOC_QP_OP_ERR;
+		goto serr;
+	}
+
+	sge = &sqp->s_sge.sge;
+	while (sqp->s_len) {
+		u32 len = sqp->s_len;
+
+		if (len > sge->length)
+			len = sge->length;
+		if (len > sge->sge_length)
+			len = sge->sge_length;
+		WARN_ON_ONCE(len == 0);
+		hfi1_copy_sge(&qp->r_sge, sge->vaddr, len, release);
+		sge->vaddr += len;
+		sge->length -= len;
+		sge->sge_length -= len;
+		if (sge->sge_length == 0) {
+			if (!release)
+				hfi1_put_mr(sge->mr);
+			if (--sqp->s_sge.num_sge)
+				*sge = *sqp->s_sge.sg_list++;
+		} else if (sge->length == 0 && sge->mr->lkey) {
+			if (++sge->n >= HFI1_SEGSZ) {
+				if (++sge->m >= sge->mr->mapsz)
+					break;
+				sge->n = 0;
+			}
+			sge->vaddr =
+				sge->mr->map[sge->m]->segs[sge->n].vaddr;
+			sge->length =
+				sge->mr->map[sge->m]->segs[sge->n].length;
+		}
+		sqp->s_len -= len;
+	}
+	if (release)
+		hfi1_put_ss(&qp->r_sge);
+
+	if (!test_and_clear_bit(HFI1_R_WRID_VALID, &qp->r_aflags))
+		goto send_comp;
+
+	if (wqe->wr.opcode == IB_WR_RDMA_WRITE_WITH_IMM)
+		wc.opcode = IB_WC_RECV_RDMA_WITH_IMM;
+	else
+		wc.opcode = IB_WC_RECV;
+	wc.wr_id = qp->r_wr_id;
+	wc.status = IB_WC_SUCCESS;
+	wc.byte_len = wqe->length;
+	wc.qp = &qp->ibqp;
+	wc.src_qp = qp->remote_qpn;
+	wc.slid = qp->remote_ah_attr.dlid;
+	wc.sl = qp->remote_ah_attr.sl;
+	wc.port_num = 1;
+	/* Signal completion event if the solicited bit is set. */
+	hfi1_cq_enter(to_icq(qp->ibqp.recv_cq), &wc,
+		      wqe->wr.send_flags & IB_SEND_SOLICITED);
+
+send_comp:
+	spin_lock_irqsave(&sqp->s_lock, flags);
+	ibp->n_loop_pkts++;
+flush_send:
+	sqp->s_rnr_retry = sqp->s_rnr_retry_cnt;
+	hfi1_send_complete(sqp, wqe, send_status);
+	goto again;
+
+rnr_nak:
+	/* Handle RNR NAK */
+	if (qp->ibqp.qp_type == IB_QPT_UC)
+		goto send_comp;
+	ibp->n_rnr_naks++;
+	/*
+	 * Note: we don't need the s_lock held since the BUSY flag
+	 * makes this single threaded.
+	 */
+	if (sqp->s_rnr_retry == 0) {
+		send_status = IB_WC_RNR_RETRY_EXC_ERR;
+		goto serr;
+	}
+	if (sqp->s_rnr_retry_cnt < 7)
+		sqp->s_rnr_retry--;
+	spin_lock_irqsave(&sqp->s_lock, flags);
+	if (!(ib_hfi1_state_ops[sqp->state] & HFI1_PROCESS_RECV_OK))
+		goto clr_busy;
+	sqp->s_flags |= HFI1_S_WAIT_RNR;
+	sqp->s_timer.function = hfi1_rc_rnr_retry;
+	sqp->s_timer.expires = jiffies +
+		usecs_to_jiffies(ib_hfi1_rnr_table[qp->r_min_rnr_timer]);
+	add_timer(&sqp->s_timer);
+	goto clr_busy;
+
+op_err:
+	send_status = IB_WC_REM_OP_ERR;
+	wc.status = IB_WC_LOC_QP_OP_ERR;
+	goto err;
+
+inv_err:
+	send_status = IB_WC_REM_INV_REQ_ERR;
+	wc.status = IB_WC_LOC_QP_OP_ERR;
+	goto err;
+
+acc_err:
+	send_status = IB_WC_REM_ACCESS_ERR;
+	wc.status = IB_WC_LOC_PROT_ERR;
+err:
+	/* responder goes to error state */
+	hfi1_rc_error(qp, wc.status);
+
+serr:
+	spin_lock_irqsave(&sqp->s_lock, flags);
+	hfi1_send_complete(sqp, wqe, send_status);
+	if (sqp->ibqp.qp_type == IB_QPT_RC) {
+		int lastwqe = hfi1_error_qp(sqp, IB_WC_WR_FLUSH_ERR);
+
+		sqp->s_flags &= ~HFI1_S_BUSY;
+		spin_unlock_irqrestore(&sqp->s_lock, flags);
+		if (lastwqe) {
+			struct ib_event ev;
+
+			ev.device = sqp->ibqp.device;
+			ev.element.qp = &sqp->ibqp;
+			ev.event = IB_EVENT_QP_LAST_WQE_REACHED;
+			sqp->ibqp.event_handler(&ev, sqp->ibqp.qp_context);
+		}
+		goto done;
+	}
+clr_busy:
+	sqp->s_flags &= ~HFI1_S_BUSY;
+unlock:
+	spin_unlock_irqrestore(&sqp->s_lock, flags);
+done:
+	rcu_read_unlock();
+}
+
+/**
+ * hfi1_make_grh - construct a GRH header
+ * @ibp: a pointer to the IB port
+ * @hdr: a pointer to the GRH header being constructed
+ * @grh: the global route address to send to
+ * @hwords: the number of 32 bit words of header being sent
+ * @nwords: the number of 32 bit words of data being sent
+ *
+ * Return the size of the header in 32 bit words.
+ */
+u32 hfi1_make_grh(struct hfi1_ibport *ibp, struct ib_grh *hdr,
+		  struct ib_global_route *grh, u32 hwords, u32 nwords)
+{
+	hdr->version_tclass_flow =
+		cpu_to_be32((IB_GRH_VERSION << IB_GRH_VERSION_SHIFT) |
+			    (grh->traffic_class << IB_GRH_TCLASS_SHIFT) |
+			    (grh->flow_label << IB_GRH_FLOW_SHIFT));
+	hdr->paylen = cpu_to_be16((hwords - 2 + nwords + SIZE_OF_CRC) << 2);
+	/* next_hdr is defined by C8-7 in ch. 8.4.1 */
+	hdr->next_hdr = IB_GRH_NEXT_HDR;
+	hdr->hop_limit = grh->hop_limit;
+	/* The SGID is 32-bit aligned. */
+	hdr->sgid.global.subnet_prefix = ibp->gid_prefix;
+	hdr->sgid.global.interface_id =
+		grh->sgid_index && grh->sgid_index < ARRAY_SIZE(ibp->guids) ?
+		ibp->guids[grh->sgid_index - 1] :
+			cpu_to_be64(ppd_from_ibp(ibp)->guid);
+	hdr->dgid = grh->dgid;
+
+	/* GRH header size in 32-bit words. */
+	return sizeof(struct ib_grh) / sizeof(u32);
+}
+
+/*
+ * free_ahg - clear ahg from QP
+ */
+void clear_ahg(struct hfi1_qp *qp)
+{
+	qp->s_hdr->ahgcount = 0;
+	qp->s_flags &= ~(HFI1_S_AHG_VALID | HFI1_S_AHG_CLEAR);
+	if (qp->s_sde)
+		sdma_ahg_free(qp->s_sde, qp->s_ahgidx);
+	qp->s_ahgidx = -1;
+	qp->s_sde = NULL;
+}
+
+#define BTH2_OFFSET (offsetof(struct hfi1_pio_header, hdr.u.oth.bth[2]) / 4)
+
+/**
+ * build_ahg - create ahg in s_hdr
+ * @qp: a pointer to QP
+ * @npsn: the next PSN for the request/response
+ *
+ * This routine handles the AHG by allocating an ahg entry and causing the
+ * copy of the first middle.
+ *
+ * Subsequent middles use the copied entry, editing the
+ * PSN with 1 or 2 edits.
+ */
+static inline void build_ahg(struct hfi1_qp *qp, u32 npsn)
+{
+	if (unlikely(qp->s_flags & HFI1_S_AHG_CLEAR))
+		clear_ahg(qp);
+	if (!(qp->s_flags & HFI1_S_AHG_VALID)) {
+		/* first middle that needs copy  */
+		if (qp->s_ahgidx < 0) {
+			if (!qp->s_sde)
+				qp->s_sde = qp_to_sdma_engine(qp, qp->s_sc);
+			qp->s_ahgidx = sdma_ahg_alloc(qp->s_sde);
+		}
+		if (qp->s_ahgidx >= 0) {
+			qp->s_ahgpsn = npsn;
+			qp->s_hdr->tx_flags |= SDMA_TXREQ_F_AHG_COPY;
+			/* save to protect a change in another thread */
+			qp->s_hdr->sde = qp->s_sde;
+			qp->s_hdr->ahgidx = qp->s_ahgidx;
+			qp->s_flags |= HFI1_S_AHG_VALID;
+		}
+	} else {
+		/* subsequent middle after valid */
+		if (qp->s_ahgidx >= 0) {
+			qp->s_hdr->tx_flags |= SDMA_TXREQ_F_USE_AHG;
+			qp->s_hdr->ahgidx = qp->s_ahgidx;
+			qp->s_hdr->ahgcount++;
+			qp->s_hdr->ahgdesc[0] =
+				sdma_build_ahg_descriptor(
+					(__force u16)cpu_to_be16((u16)npsn),
+					BTH2_OFFSET,
+					16,
+					16);
+			if ((npsn & 0xffff0000) !=
+					(qp->s_ahgpsn & 0xffff0000)) {
+				qp->s_hdr->ahgcount++;
+				qp->s_hdr->ahgdesc[1] =
+					sdma_build_ahg_descriptor(
+						(__force u16)cpu_to_be16(
+							(u16)(npsn >> 16)),
+						BTH2_OFFSET,
+						0,
+						16);
+			}
+		}
+	}
+}
+
+void hfi1_make_ruc_header(struct hfi1_qp *qp, struct hfi1_other_headers *ohdr,
+			  u32 bth0, u32 bth2, int middle)
+{
+	struct hfi1_ibport *ibp = to_iport(qp->ibqp.device, qp->port_num);
+	u16 lrh0;
+	u32 nwords;
+	u32 extra_bytes;
+	u8 sc5;
+	u32 bth1;
+
+	/* Construct the header. */
+	extra_bytes = -qp->s_cur_size & 3;
+	nwords = (qp->s_cur_size + extra_bytes) >> 2;
+	lrh0 = HFI1_LRH_BTH;
+	if (unlikely(qp->remote_ah_attr.ah_flags & IB_AH_GRH)) {
+		qp->s_hdrwords += hfi1_make_grh(ibp, &qp->s_hdr->ibh.u.l.grh,
+					       &qp->remote_ah_attr.grh,
+					       qp->s_hdrwords, nwords);
+		lrh0 = HFI1_LRH_GRH;
+		middle = 0;
+	}
+	sc5 = ibp->sl_to_sc[qp->remote_ah_attr.sl];
+	lrh0 |= (sc5 & 0xf) << 12 | (qp->remote_ah_attr.sl & 0xf) << 4;
+	qp->s_sc = sc5;
+	/*
+	 * reset s_hdr/AHG fields
+	 *
+	 * This insures that the ahgentry/ahgcount
+	 * are at a non-AHG default to protect
+	 * build_verbs_tx_desc() from using
+	 * an include ahgidx.
+	 *
+	 * build_ahg() will modify as appropriate
+	 * to use the AHG feature.
+	 */
+	qp->s_hdr->tx_flags = 0;
+	qp->s_hdr->ahgcount = 0;
+	qp->s_hdr->ahgidx = 0;
+	qp->s_hdr->sde = NULL;
+	if (qp->s_mig_state == IB_MIG_MIGRATED)
+		bth0 |= IB_BTH_MIG_REQ;
+	else
+		middle = 0;
+	if (middle)
+		build_ahg(qp, bth2);
+	else
+		qp->s_flags &= ~HFI1_S_AHG_VALID;
+	qp->s_hdr->ibh.lrh[0] = cpu_to_be16(lrh0);
+	qp->s_hdr->ibh.lrh[1] = cpu_to_be16(qp->remote_ah_attr.dlid);
+	qp->s_hdr->ibh.lrh[2] =
+		cpu_to_be16(qp->s_hdrwords + nwords + SIZE_OF_CRC);
+	qp->s_hdr->ibh.lrh[3] = cpu_to_be16(ppd_from_ibp(ibp)->lid |
+				       qp->remote_ah_attr.src_path_bits);
+	bth0 |= hfi1_get_pkey(ibp, qp->s_pkey_index);
+	bth0 |= extra_bytes << 20;
+	ohdr->bth[0] = cpu_to_be32(bth0);
+	bth1 = qp->remote_qpn;
+	if (qp->s_flags & HFI1_S_ECN) {
+		qp->s_flags &= ~HFI1_S_ECN;
+		/* we recently received a FECN, so return a BECN */
+		bth1 |= (HFI1_BECN_MASK << HFI1_BECN_SHIFT);
+	}
+	ohdr->bth[1] = cpu_to_be32(bth1);
+	ohdr->bth[2] = cpu_to_be32(bth2);
+}
+
+/**
+ * hfi1_do_send - perform a send on a QP
+ * @work: contains a pointer to the QP
+ *
+ * Process entries in the send work queue until credit or queue is
+ * exhausted.  Only allow one CPU to send a packet per QP (tasklet).
+ * Otherwise, two threads could send packets out of order.
+ */
+void hfi1_do_send(struct work_struct *work)
+{
+	struct iowait *wait = container_of(work, struct iowait, iowork);
+	struct hfi1_qp *qp = container_of(wait, struct hfi1_qp, s_iowait);
+	struct hfi1_ibport *ibp = to_iport(qp->ibqp.device, qp->port_num);
+	struct hfi1_pportdata *ppd = ppd_from_ibp(ibp);
+	int (*make_req)(struct hfi1_qp *qp);
+	unsigned long flags;
+
+	if ((qp->ibqp.qp_type == IB_QPT_RC ||
+	     qp->ibqp.qp_type == IB_QPT_UC) &&
+	    !loopback &&
+	    (qp->remote_ah_attr.dlid & ~((1 << ppd->lmc) - 1)) == ppd->lid) {
+		ruc_loopback(qp);
+		return;
+	}
+
+	if (qp->ibqp.qp_type == IB_QPT_RC)
+		make_req = hfi1_make_rc_req;
+	else if (qp->ibqp.qp_type == IB_QPT_UC)
+		make_req = hfi1_make_uc_req;
+	else
+		make_req = hfi1_make_ud_req;
+
+	spin_lock_irqsave(&qp->s_lock, flags);
+
+	/* Return if we are already busy processing a work request. */
+	if (!hfi1_send_ok(qp)) {
+		spin_unlock_irqrestore(&qp->s_lock, flags);
+		return;
+	}
+
+	qp->s_flags |= HFI1_S_BUSY;
+
+	spin_unlock_irqrestore(&qp->s_lock, flags);
+
+	do {
+		/* Check for a constructed packet to be sent. */
+		if (qp->s_hdrwords != 0) {
+			/*
+			 * If the packet cannot be sent now, return and
+			 * the send tasklet will be woken up later.
+			 */
+			if (hfi1_verbs_send(qp, qp->s_hdr, qp->s_hdrwords,
+					    qp->s_cur_sge, qp->s_cur_size))
+				break;
+			/* Record that s_hdr is empty. */
+			qp->s_hdrwords = 0;
+		}
+	} while (make_req(qp));
+}
+
+/*
+ * This should be called with s_lock held.
+ */
+void hfi1_send_complete(struct hfi1_qp *qp, struct hfi1_swqe *wqe,
+			enum ib_wc_status status)
+{
+	u32 old_last, last;
+	unsigned i;
+
+	if (!(ib_hfi1_state_ops[qp->state] & HFI1_PROCESS_OR_FLUSH_SEND))
+		return;
+
+	for (i = 0; i < wqe->wr.num_sge; i++) {
+		struct hfi1_sge *sge = &wqe->sg_list[i];
+
+		hfi1_put_mr(sge->mr);
+	}
+	if (qp->ibqp.qp_type == IB_QPT_UD ||
+	    qp->ibqp.qp_type == IB_QPT_SMI ||
+	    qp->ibqp.qp_type == IB_QPT_GSI)
+		atomic_dec(&to_iah(wqe->wr.wr.ud.ah)->refcount);
+
+	/* See ch. 11.2.4.1 and 10.7.3.1 */
+	if (!(qp->s_flags & HFI1_S_SIGNAL_REQ_WR) ||
+	    (wqe->wr.send_flags & IB_SEND_SIGNALED) ||
+	    status != IB_WC_SUCCESS) {
+		struct ib_wc wc;
+
+		memset(&wc, 0, sizeof(wc));
+		wc.wr_id = wqe->wr.wr_id;
+		wc.status = status;
+		wc.opcode = ib_hfi1_wc_opcode[wqe->wr.opcode];
+		wc.qp = &qp->ibqp;
+		if (status == IB_WC_SUCCESS)
+			wc.byte_len = wqe->length;
+		hfi1_cq_enter(to_icq(qp->ibqp.send_cq), &wc,
+			      status != IB_WC_SUCCESS);
+	}
+
+	last = qp->s_last;
+	old_last = last;
+	if (++last >= qp->s_size)
+		last = 0;
+	qp->s_last = last;
+	if (qp->s_acked == old_last)
+		qp->s_acked = last;
+	if (qp->s_cur == old_last)
+		qp->s_cur = last;
+	if (qp->s_tail == old_last)
+		qp->s_tail = last;
+	if (qp->state == IB_QPS_SQD && last == qp->s_cur)
+		qp->s_draining = 0;
+}
diff --git a/drivers/staging/rdma/hfi1/sdma.c b/drivers/staging/rdma/hfi1/sdma.c
new file mode 100644
index 000000000000..37bd767d6bc0
--- /dev/null
+++ b/drivers/staging/rdma/hfi1/sdma.c
@@ -0,0 +1,2962 @@
+/*
+ *
+ * This file is provided under a dual BSD/GPLv2 license.  When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * GPL LICENSE SUMMARY
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * BSD LICENSE
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ *  - Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ *  - Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in
+ *    the documentation and/or other materials provided with the
+ *    distribution.
+ *  - Neither the name of Intel Corporation nor the names of its
+ *    contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ */
+
+#include <linux/spinlock.h>
+#include <linux/seqlock.h>
+#include <linux/netdevice.h>
+#include <linux/moduleparam.h>
+#include <linux/bitops.h>
+#include <linux/timer.h>
+#include <linux/vmalloc.h>
+
+#include "hfi.h"
+#include "common.h"
+#include "qp.h"
+#include "sdma.h"
+#include "iowait.h"
+#include "trace.h"
+
+/* must be a power of 2 >= 64 <= 32768 */
+#define SDMA_DESCQ_CNT 1024
+#define INVALID_TAIL 0xffff
+
+static uint sdma_descq_cnt = SDMA_DESCQ_CNT;
+module_param(sdma_descq_cnt, uint, S_IRUGO);
+MODULE_PARM_DESC(sdma_descq_cnt, "Number of SDMA descq entries");
+
+static uint sdma_idle_cnt = 250;
+module_param(sdma_idle_cnt, uint, S_IRUGO);
+MODULE_PARM_DESC(sdma_idle_cnt, "sdma interrupt idle delay (ns,default 250)");
+
+uint mod_num_sdma;
+module_param_named(num_sdma, mod_num_sdma, uint, S_IRUGO);
+MODULE_PARM_DESC(num_sdma, "Set max number SDMA engines to use");
+
+#define SDMA_WAIT_BATCH_SIZE 20
+/* max wait time for a SDMA engine to indicate it has halted */
+#define SDMA_ERR_HALT_TIMEOUT 10 /* ms */
+/* all SDMA engine errors that cause a halt */
+
+#define SD(name) SEND_DMA_##name
+#define ALL_SDMA_ENG_HALT_ERRS \
+	(SD(ENG_ERR_STATUS_SDMA_WRONG_DW_ERR_SMASK) \
+	| SD(ENG_ERR_STATUS_SDMA_GEN_MISMATCH_ERR_SMASK) \
+	| SD(ENG_ERR_STATUS_SDMA_TOO_LONG_ERR_SMASK) \
+	| SD(ENG_ERR_STATUS_SDMA_TAIL_OUT_OF_BOUNDS_ERR_SMASK) \
+	| SD(ENG_ERR_STATUS_SDMA_FIRST_DESC_ERR_SMASK) \
+	| SD(ENG_ERR_STATUS_SDMA_MEM_READ_ERR_SMASK) \
+	| SD(ENG_ERR_STATUS_SDMA_HALT_ERR_SMASK) \
+	| SD(ENG_ERR_STATUS_SDMA_LENGTH_MISMATCH_ERR_SMASK) \
+	| SD(ENG_ERR_STATUS_SDMA_PACKET_DESC_OVERFLOW_ERR_SMASK) \
+	| SD(ENG_ERR_STATUS_SDMA_HEADER_SELECT_ERR_SMASK) \
+	| SD(ENG_ERR_STATUS_SDMA_HEADER_ADDRESS_ERR_SMASK) \
+	| SD(ENG_ERR_STATUS_SDMA_HEADER_LENGTH_ERR_SMASK) \
+	| SD(ENG_ERR_STATUS_SDMA_TIMEOUT_ERR_SMASK) \
+	| SD(ENG_ERR_STATUS_SDMA_DESC_TABLE_UNC_ERR_SMASK) \
+	| SD(ENG_ERR_STATUS_SDMA_ASSEMBLY_UNC_ERR_SMASK) \
+	| SD(ENG_ERR_STATUS_SDMA_PACKET_TRACKING_UNC_ERR_SMASK) \
+	| SD(ENG_ERR_STATUS_SDMA_HEADER_STORAGE_UNC_ERR_SMASK) \
+	| SD(ENG_ERR_STATUS_SDMA_HEADER_REQUEST_FIFO_UNC_ERR_SMASK))
+
+/* sdma_sendctrl operations */
+#define SDMA_SENDCTRL_OP_ENABLE    (1U << 0)
+#define SDMA_SENDCTRL_OP_INTENABLE (1U << 1)
+#define SDMA_SENDCTRL_OP_HALT      (1U << 2)
+#define SDMA_SENDCTRL_OP_CLEANUP   (1U << 3)
+
+/* handle long defines */
+#define SDMA_EGRESS_PACKET_OCCUPANCY_SMASK \
+SEND_EGRESS_SEND_DMA_STATUS_SDMA_EGRESS_PACKET_OCCUPANCY_SMASK
+#define SDMA_EGRESS_PACKET_OCCUPANCY_SHIFT \
+SEND_EGRESS_SEND_DMA_STATUS_SDMA_EGRESS_PACKET_OCCUPANCY_SHIFT
+
+static const char * const sdma_state_names[] = {
+	[sdma_state_s00_hw_down]                = "s00_HwDown",
+	[sdma_state_s10_hw_start_up_halt_wait]  = "s10_HwStartUpHaltWait",
+	[sdma_state_s15_hw_start_up_clean_wait] = "s15_HwStartUpCleanWait",
+	[sdma_state_s20_idle]                   = "s20_Idle",
+	[sdma_state_s30_sw_clean_up_wait]       = "s30_SwCleanUpWait",
+	[sdma_state_s40_hw_clean_up_wait]       = "s40_HwCleanUpWait",
+	[sdma_state_s50_hw_halt_wait]           = "s50_HwHaltWait",
+	[sdma_state_s60_idle_halt_wait]         = "s60_IdleHaltWait",
+	[sdma_state_s80_hw_freeze]		= "s80_HwFreeze",
+	[sdma_state_s82_freeze_sw_clean]	= "s82_FreezeSwClean",
+	[sdma_state_s99_running]                = "s99_Running",
+};
+
+static const char * const sdma_event_names[] = {
+	[sdma_event_e00_go_hw_down]   = "e00_GoHwDown",
+	[sdma_event_e10_go_hw_start]  = "e10_GoHwStart",
+	[sdma_event_e15_hw_halt_done] = "e15_HwHaltDone",
+	[sdma_event_e25_hw_clean_up_done] = "e25_HwCleanUpDone",
+	[sdma_event_e30_go_running]   = "e30_GoRunning",
+	[sdma_event_e40_sw_cleaned]   = "e40_SwCleaned",
+	[sdma_event_e50_hw_cleaned]   = "e50_HwCleaned",
+	[sdma_event_e60_hw_halted]    = "e60_HwHalted",
+	[sdma_event_e70_go_idle]      = "e70_GoIdle",
+	[sdma_event_e80_hw_freeze]    = "e80_HwFreeze",
+	[sdma_event_e81_hw_frozen]    = "e81_HwFrozen",
+	[sdma_event_e82_hw_unfreeze]  = "e82_HwUnfreeze",
+	[sdma_event_e85_link_down]    = "e85_LinkDown",
+	[sdma_event_e90_sw_halted]    = "e90_SwHalted",
+};
+
+static const struct sdma_set_state_action sdma_action_table[] = {
+	[sdma_state_s00_hw_down] = {
+		.go_s99_running_tofalse = 1,
+		.op_enable = 0,
+		.op_intenable = 0,
+		.op_halt = 0,
+		.op_cleanup = 0,
+	},
+	[sdma_state_s10_hw_start_up_halt_wait] = {
+		.op_enable = 0,
+		.op_intenable = 0,
+		.op_halt = 1,
+		.op_cleanup = 0,
+	},
+	[sdma_state_s15_hw_start_up_clean_wait] = {
+		.op_enable = 0,
+		.op_intenable = 1,
+		.op_halt = 0,
+		.op_cleanup = 1,
+	},
+	[sdma_state_s20_idle] = {
+		.op_enable = 0,
+		.op_intenable = 1,
+		.op_halt = 0,
+		.op_cleanup = 0,
+	},
+	[sdma_state_s30_sw_clean_up_wait] = {
+		.op_enable = 0,
+		.op_intenable = 0,
+		.op_halt = 0,
+		.op_cleanup = 0,
+	},
+	[sdma_state_s40_hw_clean_up_wait] = {
+		.op_enable = 0,
+		.op_intenable = 0,
+		.op_halt = 0,
+		.op_cleanup = 1,
+	},
+	[sdma_state_s50_hw_halt_wait] = {
+		.op_enable = 0,
+		.op_intenable = 0,
+		.op_halt = 0,
+		.op_cleanup = 0,
+	},
+	[sdma_state_s60_idle_halt_wait] = {
+		.go_s99_running_tofalse = 1,
+		.op_enable = 0,
+		.op_intenable = 0,
+		.op_halt = 1,
+		.op_cleanup = 0,
+	},
+	[sdma_state_s80_hw_freeze] = {
+		.op_enable = 0,
+		.op_intenable = 0,
+		.op_halt = 0,
+		.op_cleanup = 0,
+	},
+	[sdma_state_s82_freeze_sw_clean] = {
+		.op_enable = 0,
+		.op_intenable = 0,
+		.op_halt = 0,
+		.op_cleanup = 0,
+	},
+	[sdma_state_s99_running] = {
+		.op_enable = 1,
+		.op_intenable = 1,
+		.op_halt = 0,
+		.op_cleanup = 0,
+		.go_s99_running_totrue = 1,
+	},
+};
+
+#define SDMA_TAIL_UPDATE_THRESH 0x1F
+
+/* declare all statics here rather than keep sorting */
+static void sdma_complete(struct kref *);
+static void sdma_finalput(struct sdma_state *);
+static void sdma_get(struct sdma_state *);
+static void sdma_hw_clean_up_task(unsigned long);
+static void sdma_put(struct sdma_state *);
+static void sdma_set_state(struct sdma_engine *, enum sdma_states);
+static void sdma_start_hw_clean_up(struct sdma_engine *);
+static void sdma_start_sw_clean_up(struct sdma_engine *);
+static void sdma_sw_clean_up_task(unsigned long);
+static void sdma_sendctrl(struct sdma_engine *, unsigned);
+static void init_sdma_regs(struct sdma_engine *, u32, uint);
+static void sdma_process_event(
+	struct sdma_engine *sde,
+	enum sdma_events event);
+static void __sdma_process_event(
+	struct sdma_engine *sde,
+	enum sdma_events event);
+static void dump_sdma_state(struct sdma_engine *sde);
+static void sdma_make_progress(struct sdma_engine *sde, u64 status);
+static void sdma_desc_avail(struct sdma_engine *sde, unsigned avail);
+static void sdma_flush_descq(struct sdma_engine *sde);
+
+/**
+ * sdma_state_name() - return state string from enum
+ * @state: state
+ */
+static const char *sdma_state_name(enum sdma_states state)
+{
+	return sdma_state_names[state];
+}
+
+static void sdma_get(struct sdma_state *ss)
+{
+	kref_get(&ss->kref);
+}
+
+static void sdma_complete(struct kref *kref)
+{
+	struct sdma_state *ss =
+		container_of(kref, struct sdma_state, kref);
+
+	complete(&ss->comp);
+}
+
+static void sdma_put(struct sdma_state *ss)
+{
+	kref_put(&ss->kref, sdma_complete);
+}
+
+static void sdma_finalput(struct sdma_state *ss)
+{
+	sdma_put(ss);
+	wait_for_completion(&ss->comp);
+}
+
+static inline void write_sde_csr(
+	struct sdma_engine *sde,
+	u32 offset0,
+	u64 value)
+{
+	write_kctxt_csr(sde->dd, sde->this_idx, offset0, value);
+}
+
+static inline u64 read_sde_csr(
+	struct sdma_engine *sde,
+	u32 offset0)
+{
+	return read_kctxt_csr(sde->dd, sde->this_idx, offset0);
+}
+
+/*
+ * sdma_wait_for_packet_egress() - wait for the VL FIFO occupancy for
+ * sdma engine 'sde' to drop to 0.
+ */
+static void sdma_wait_for_packet_egress(struct sdma_engine *sde,
+					int pause)
+{
+	u64 off = 8 * sde->this_idx;
+	struct hfi1_devdata *dd = sde->dd;
+	int lcnt = 0;
+
+	while (1) {
+		u64 reg = read_csr(dd, off + SEND_EGRESS_SEND_DMA_STATUS);
+
+		reg &= SDMA_EGRESS_PACKET_OCCUPANCY_SMASK;
+		reg >>= SDMA_EGRESS_PACKET_OCCUPANCY_SHIFT;
+		if (reg == 0)
+			break;
+		if (lcnt++ > 100) {
+			dd_dev_err(dd, "%s: engine %u timeout waiting for packets to egress, remaining count %u\n",
+				  __func__, sde->this_idx, (u32)reg);
+			break;
+		}
+		udelay(1);
+	}
+}
+
+/*
+ * sdma_wait() - wait for packet egress to complete for all SDMA engines,
+ * and pause for credit return.
+ */
+void sdma_wait(struct hfi1_devdata *dd)
+{
+	int i;
+
+	for (i = 0; i < dd->num_sdma; i++) {
+		struct sdma_engine *sde = &dd->per_sdma[i];
+
+		sdma_wait_for_packet_egress(sde, 0);
+	}
+}
+
+static inline void sdma_set_desc_cnt(struct sdma_engine *sde, unsigned cnt)
+{
+	u64 reg;
+
+	if (!(sde->dd->flags & HFI1_HAS_SDMA_TIMEOUT))
+		return;
+	reg = cnt;
+	reg &= SD(DESC_CNT_CNT_MASK);
+	reg <<= SD(DESC_CNT_CNT_SHIFT);
+	write_sde_csr(sde, SD(DESC_CNT), reg);
+}
+
+/*
+ * Complete all the sdma requests with a SDMA_TXREQ_S_ABORTED status
+ *
+ * Depending on timing there can be txreqs in two places:
+ * - in the descq ring
+ * - in the flush list
+ *
+ * To avoid ordering issues the descq ring needs to be flushed
+ * first followed by the flush list.
+ *
+ * This routine is called from two places
+ * - From a work queue item
+ * - Directly from the state machine just before setting the
+ *   state to running
+ *
+ * Must be called with head_lock held
+ *
+ */
+static void sdma_flush(struct sdma_engine *sde)
+{
+	struct sdma_txreq *txp, *txp_next;
+	LIST_HEAD(flushlist);
+
+	/* flush from head to tail */
+	sdma_flush_descq(sde);
+	spin_lock(&sde->flushlist_lock);
+	/* copy flush list */
+	list_for_each_entry_safe(txp, txp_next, &sde->flushlist, list) {
+		list_del_init(&txp->list);
+		list_add_tail(&txp->list, &flushlist);
+	}
+	spin_unlock(&sde->flushlist_lock);
+	/* flush from flush list */
+	list_for_each_entry_safe(txp, txp_next, &flushlist, list) {
+		int drained = 0;
+		/* protect against complete modifying */
+		struct iowait *wait = txp->wait;
+
+		list_del_init(&txp->list);
+#ifdef CONFIG_HFI1_DEBUG_SDMA_ORDER
+		trace_hfi1_sdma_out_sn(sde, txp->sn);
+		if (WARN_ON_ONCE(sde->head_sn != txp->sn))
+			dd_dev_err(sde->dd, "expected %llu got %llu\n",
+				sde->head_sn, txp->sn);
+		sde->head_sn++;
+#endif
+		sdma_txclean(sde->dd, txp);
+		if (wait)
+			drained = atomic_dec_and_test(&wait->sdma_busy);
+		if (txp->complete)
+			(*txp->complete)(txp, SDMA_TXREQ_S_ABORTED, drained);
+		if (wait && drained)
+			iowait_drain_wakeup(wait);
+	}
+}
+
+/*
+ * Fields a work request for flushing the descq ring
+ * and the flush list
+ *
+ * If the engine has been brought to running during
+ * the scheduling delay, the flush is ignored, assuming
+ * that the process of bringing the engine to running
+ * would have done this flush prior to going to running.
+ *
+ */
+static void sdma_field_flush(struct work_struct *work)
+{
+	unsigned long flags;
+	struct sdma_engine *sde =
+		container_of(work, struct sdma_engine, flush_worker);
+
+	write_seqlock_irqsave(&sde->head_lock, flags);
+	if (!__sdma_running(sde))
+		sdma_flush(sde);
+	write_sequnlock_irqrestore(&sde->head_lock, flags);
+}
+
+static void sdma_err_halt_wait(struct work_struct *work)
+{
+	struct sdma_engine *sde = container_of(work, struct sdma_engine,
+						err_halt_worker);
+	u64 statuscsr;
+	unsigned long timeout;
+
+	timeout = jiffies + msecs_to_jiffies(SDMA_ERR_HALT_TIMEOUT);
+	while (1) {
+		statuscsr = read_sde_csr(sde, SD(STATUS));
+		statuscsr &= SD(STATUS_ENG_HALTED_SMASK);
+		if (statuscsr)
+			break;
+		if (time_after(jiffies, timeout)) {
+			dd_dev_err(sde->dd,
+				"SDMA engine %d - timeout waiting for engine to halt\n",
+				sde->this_idx);
+			/*
+			 * Continue anyway.  This could happen if there was
+			 * an uncorrectable error in the wrong spot.
+			 */
+			break;
+		}
+		usleep_range(80, 120);
+	}
+
+	sdma_process_event(sde, sdma_event_e15_hw_halt_done);
+}
+
+static void sdma_start_err_halt_wait(struct sdma_engine *sde)
+{
+	schedule_work(&sde->err_halt_worker);
+}
+
+
+static void sdma_err_progress_check_schedule(struct sdma_engine *sde)
+{
+	if (!is_bx(sde->dd) && HFI1_CAP_IS_KSET(SDMA_AHG)) {
+
+		unsigned index;
+		struct hfi1_devdata *dd = sde->dd;
+
+		for (index = 0; index < dd->num_sdma; index++) {
+			struct sdma_engine *curr_sdma = &dd->per_sdma[index];
+
+			if (curr_sdma != sde)
+				curr_sdma->progress_check_head =
+							curr_sdma->descq_head;
+		}
+		dd_dev_err(sde->dd,
+			   "SDMA engine %d - check scheduled\n",
+				sde->this_idx);
+		mod_timer(&sde->err_progress_check_timer, jiffies + 10);
+	}
+}
+
+static void sdma_err_progress_check(unsigned long data)
+{
+	unsigned index;
+	struct sdma_engine *sde = (struct sdma_engine *)data;
+
+	dd_dev_err(sde->dd, "SDE progress check event\n");
+	for (index = 0; index < sde->dd->num_sdma; index++) {
+		struct sdma_engine *curr_sde = &sde->dd->per_sdma[index];
+		unsigned long flags;
+
+		/* check progress on each engine except the current one */
+		if (curr_sde == sde)
+			continue;
+		/*
+		 * We must lock interrupts when acquiring sde->lock,
+		 * to avoid a deadlock if interrupt triggers and spins on
+		 * the same lock on same CPU
+		 */
+		spin_lock_irqsave(&curr_sde->tail_lock, flags);
+		write_seqlock(&curr_sde->head_lock);
+
+		/* skip non-running queues */
+		if (curr_sde->state.current_state != sdma_state_s99_running) {
+			write_sequnlock(&curr_sde->head_lock);
+			spin_unlock_irqrestore(&curr_sde->tail_lock, flags);
+			continue;
+		}
+
+		if ((curr_sde->descq_head != curr_sde->descq_tail) &&
+		    (curr_sde->descq_head ==
+				curr_sde->progress_check_head))
+			__sdma_process_event(curr_sde,
+					     sdma_event_e90_sw_halted);
+		write_sequnlock(&curr_sde->head_lock);
+		spin_unlock_irqrestore(&curr_sde->tail_lock, flags);
+	}
+	schedule_work(&sde->err_halt_worker);
+}
+
+static void sdma_hw_clean_up_task(unsigned long opaque)
+{
+	struct sdma_engine *sde = (struct sdma_engine *) opaque;
+	u64 statuscsr;
+
+	while (1) {
+#ifdef CONFIG_SDMA_VERBOSITY
+		dd_dev_err(sde->dd, "CONFIG SDMA(%u) %s:%d %s()\n",
+			   sde->this_idx, slashstrip(__FILE__), __LINE__,
+			__func__);
+#endif
+		statuscsr = read_sde_csr(sde, SD(STATUS));
+		statuscsr &= SD(STATUS_ENG_CLEANED_UP_SMASK);
+		if (statuscsr)
+			break;
+		udelay(10);
+	}
+
+	sdma_process_event(sde, sdma_event_e25_hw_clean_up_done);
+}
+
+static inline struct sdma_txreq *get_txhead(struct sdma_engine *sde)
+{
+	smp_read_barrier_depends(); /* see sdma_update_tail() */
+	return sde->tx_ring[sde->tx_head & sde->sdma_mask];
+}
+
+/*
+ * flush ring for recovery
+ */
+static void sdma_flush_descq(struct sdma_engine *sde)
+{
+	u16 head, tail;
+	int progress = 0;
+	struct sdma_txreq *txp = get_txhead(sde);
+
+	/* The reason for some of the complexity of this code is that
+	 * not all descriptors have corresponding txps.  So, we have to
+	 * be able to skip over descs until we wander into the range of
+	 * the next txp on the list.
+	 */
+	head = sde->descq_head & sde->sdma_mask;
+	tail = sde->descq_tail & sde->sdma_mask;
+	while (head != tail) {
+		/* advance head, wrap if needed */
+		head = ++sde->descq_head & sde->sdma_mask;
+		/* if now past this txp's descs, do the callback */
+		if (txp && txp->next_descq_idx == head) {
+			int drained = 0;
+			/* protect against complete modifying */
+			struct iowait *wait = txp->wait;
+
+			/* remove from list */
+			sde->tx_ring[sde->tx_head++ & sde->sdma_mask] = NULL;
+			if (wait)
+				drained = atomic_dec_and_test(&wait->sdma_busy);
+#ifdef CONFIG_HFI1_DEBUG_SDMA_ORDER
+			trace_hfi1_sdma_out_sn(sde, txp->sn);
+			if (WARN_ON_ONCE(sde->head_sn != txp->sn))
+				dd_dev_err(sde->dd, "expected %llu got %llu\n",
+					sde->head_sn, txp->sn);
+			sde->head_sn++;
+#endif
+			sdma_txclean(sde->dd, txp);
+			trace_hfi1_sdma_progress(sde, head, tail, txp);
+			if (txp->complete)
+				(*txp->complete)(
+					txp,
+					SDMA_TXREQ_S_ABORTED,
+					drained);
+			if (wait && drained)
+				iowait_drain_wakeup(wait);
+			/* see if there is another txp */
+			txp = get_txhead(sde);
+		}
+		progress++;
+	}
+	if (progress)
+		sdma_desc_avail(sde, sdma_descq_freecnt(sde));
+}
+
+static void sdma_sw_clean_up_task(unsigned long opaque)
+{
+	struct sdma_engine *sde = (struct sdma_engine *) opaque;
+	unsigned long flags;
+
+	spin_lock_irqsave(&sde->tail_lock, flags);
+	write_seqlock(&sde->head_lock);
+
+	/*
+	 * At this point, the following should always be true:
+	 * - We are halted, so no more descriptors are getting retired.
+	 * - We are not running, so no one is submitting new work.
+	 * - Only we can send the e40_sw_cleaned, so we can't start
+	 *   running again until we say so.  So, the active list and
+	 *   descq are ours to play with.
+	 */
+
+
+	/*
+	 * In the error clean up sequence, software clean must be called
+	 * before the hardware clean so we can use the hardware head in
+	 * the progress routine.  A hardware clean or SPC unfreeze will
+	 * reset the hardware head.
+	 *
+	 * Process all retired requests. The progress routine will use the
+	 * latest physical hardware head - we are not running so speed does
+	 * not matter.
+	 */
+	sdma_make_progress(sde, 0);
+
+	sdma_flush(sde);
+
+	/*
+	 * Reset our notion of head and tail.
+	 * Note that the HW registers have been reset via an earlier
+	 * clean up.
+	 */
+	sde->descq_tail = 0;
+	sde->descq_head = 0;
+	sde->desc_avail = sdma_descq_freecnt(sde);
+	*sde->head_dma = 0;
+
+	__sdma_process_event(sde, sdma_event_e40_sw_cleaned);
+
+	write_sequnlock(&sde->head_lock);
+	spin_unlock_irqrestore(&sde->tail_lock, flags);
+}
+
+static void sdma_sw_tear_down(struct sdma_engine *sde)
+{
+	struct sdma_state *ss = &sde->state;
+
+	/* Releasing this reference means the state machine has stopped. */
+	sdma_put(ss);
+
+	/* stop waiting for all unfreeze events to complete */
+	atomic_set(&sde->dd->sdma_unfreeze_count, -1);
+	wake_up_interruptible(&sde->dd->sdma_unfreeze_wq);
+}
+
+static void sdma_start_hw_clean_up(struct sdma_engine *sde)
+{
+	tasklet_hi_schedule(&sde->sdma_hw_clean_up_task);
+}
+
+static void sdma_start_sw_clean_up(struct sdma_engine *sde)
+{
+	tasklet_hi_schedule(&sde->sdma_sw_clean_up_task);
+}
+
+static void sdma_set_state(struct sdma_engine *sde,
+	enum sdma_states next_state)
+{
+	struct sdma_state *ss = &sde->state;
+	const struct sdma_set_state_action *action = sdma_action_table;
+	unsigned op = 0;
+
+	trace_hfi1_sdma_state(
+		sde,
+		sdma_state_names[ss->current_state],
+		sdma_state_names[next_state]);
+
+	/* debugging bookkeeping */
+	ss->previous_state = ss->current_state;
+	ss->previous_op = ss->current_op;
+	ss->current_state = next_state;
+
+	if (ss->previous_state != sdma_state_s99_running
+		&& next_state == sdma_state_s99_running)
+		sdma_flush(sde);
+
+	if (action[next_state].op_enable)
+		op |= SDMA_SENDCTRL_OP_ENABLE;
+
+	if (action[next_state].op_intenable)
+		op |= SDMA_SENDCTRL_OP_INTENABLE;
+
+	if (action[next_state].op_halt)
+		op |= SDMA_SENDCTRL_OP_HALT;
+
+	if (action[next_state].op_cleanup)
+		op |= SDMA_SENDCTRL_OP_CLEANUP;
+
+	if (action[next_state].go_s99_running_tofalse)
+		ss->go_s99_running = 0;
+
+	if (action[next_state].go_s99_running_totrue)
+		ss->go_s99_running = 1;
+
+	ss->current_op = op;
+	sdma_sendctrl(sde, ss->current_op);
+}
+
+/**
+ * sdma_get_descq_cnt() - called when device probed
+ *
+ * Return a validated descq count.
+ *
+ * This is currently only used in the verbs initialization to build the tx
+ * list.
+ *
+ * This will probably be deleted in favor of a more scalable approach to
+ * alloc tx's.
+ *
+ */
+u16 sdma_get_descq_cnt(void)
+{
+	u16 count = sdma_descq_cnt;
+
+	if (!count)
+		return SDMA_DESCQ_CNT;
+	/* count must be a power of 2 greater than 64 and less than
+	 * 32768.   Otherwise return default.
+	 */
+	if (!is_power_of_2(count))
+		return SDMA_DESCQ_CNT;
+	if (count < 64 && count > 32768)
+		return SDMA_DESCQ_CNT;
+	return count;
+}
+/**
+ * sdma_select_engine_vl() - select sdma engine
+ * @dd: devdata
+ * @selector: a spreading factor
+ * @vl: this vl
+ *
+ *
+ * This function returns an engine based on the selector and a vl.  The
+ * mapping fields are protected by RCU.
+ */
+struct sdma_engine *sdma_select_engine_vl(
+	struct hfi1_devdata *dd,
+	u32 selector,
+	u8 vl)
+{
+	struct sdma_vl_map *m;
+	struct sdma_map_elem *e;
+	struct sdma_engine *rval;
+
+	if (WARN_ON(vl > 8))
+		return NULL;
+
+	rcu_read_lock();
+	m = rcu_dereference(dd->sdma_map);
+	if (unlikely(!m)) {
+		rcu_read_unlock();
+		return NULL;
+	}
+	e = m->map[vl & m->mask];
+	rval = e->sde[selector & e->mask];
+	rcu_read_unlock();
+
+	trace_hfi1_sdma_engine_select(dd, selector, vl, rval->this_idx);
+	return rval;
+}
+
+/**
+ * sdma_select_engine_sc() - select sdma engine
+ * @dd: devdata
+ * @selector: a spreading factor
+ * @sc5: the 5 bit sc
+ *
+ *
+ * This function returns an engine based on the selector and an sc.
+ */
+struct sdma_engine *sdma_select_engine_sc(
+	struct hfi1_devdata *dd,
+	u32 selector,
+	u8 sc5)
+{
+	u8 vl = sc_to_vlt(dd, sc5);
+
+	return sdma_select_engine_vl(dd, selector, vl);
+}
+
+/*
+ * Free the indicated map struct
+ */
+static void sdma_map_free(struct sdma_vl_map *m)
+{
+	int i;
+
+	for (i = 0; m && i < m->actual_vls; i++)
+		kfree(m->map[i]);
+	kfree(m);
+}
+
+/*
+ * Handle RCU callback
+ */
+static void sdma_map_rcu_callback(struct rcu_head *list)
+{
+	struct sdma_vl_map *m = container_of(list, struct sdma_vl_map, list);
+
+	sdma_map_free(m);
+}
+
+/**
+ * sdma_map_init - called when # vls change
+ * @dd: hfi1_devdata
+ * @port: port number
+ * @num_vls: number of vls
+ * @vl_engines: per vl engine mapping (optional)
+ *
+ * This routine changes the mapping based on the number of vls.
+ *
+ * vl_engines is used to specify a non-uniform vl/engine loading. NULL
+ * implies auto computing the loading and giving each VLs a uniform
+ * distribution of engines per VL.
+ *
+ * The auto algorithm computes the sde_per_vl and the number of extra
+ * engines.  Any extra engines are added from the last VL on down.
+ *
+ * rcu locking is used here to control access to the mapping fields.
+ *
+ * If either the num_vls or num_sdma are non-power of 2, the array sizes
+ * in the struct sdma_vl_map and the struct sdma_map_elem are rounded
+ * up to the next highest power of 2 and the first entry is reused
+ * in a round robin fashion.
+ *
+ * If an error occurs the map change is not done and the mapping is
+ * not changed.
+ *
+ */
+int sdma_map_init(struct hfi1_devdata *dd, u8 port, u8 num_vls, u8 *vl_engines)
+{
+	int i, j;
+	int extra, sde_per_vl;
+	int engine = 0;
+	u8 lvl_engines[OPA_MAX_VLS];
+	struct sdma_vl_map *oldmap, *newmap;
+
+	if (!(dd->flags & HFI1_HAS_SEND_DMA))
+		return 0;
+
+	if (!vl_engines) {
+		/* truncate divide */
+		sde_per_vl = dd->num_sdma / num_vls;
+		/* extras */
+		extra = dd->num_sdma % num_vls;
+		vl_engines = lvl_engines;
+		/* add extras from last vl down */
+		for (i = num_vls - 1; i >= 0; i--, extra--)
+			vl_engines[i] = sde_per_vl + (extra > 0 ? 1 : 0);
+	}
+	/* build new map */
+	newmap = kzalloc(
+		sizeof(struct sdma_vl_map) +
+			roundup_pow_of_two(num_vls) *
+			sizeof(struct sdma_map_elem *),
+		GFP_KERNEL);
+	if (!newmap)
+		goto bail;
+	newmap->actual_vls = num_vls;
+	newmap->vls = roundup_pow_of_two(num_vls);
+	newmap->mask = (1 << ilog2(newmap->vls)) - 1;
+	for (i = 0; i < newmap->vls; i++) {
+		/* save for wrap around */
+		int first_engine = engine;
+
+		if (i < newmap->actual_vls) {
+			int sz = roundup_pow_of_two(vl_engines[i]);
+
+			/* only allocate once */
+			newmap->map[i] = kzalloc(
+				sizeof(struct sdma_map_elem) +
+					sz * sizeof(struct sdma_engine *),
+				GFP_KERNEL);
+			if (!newmap->map[i])
+				goto bail;
+			newmap->map[i]->mask = (1 << ilog2(sz)) - 1;
+			/* assign engines */
+			for (j = 0; j < sz; j++) {
+				newmap->map[i]->sde[j] =
+					&dd->per_sdma[engine];
+				if (++engine >= first_engine + vl_engines[i])
+					/* wrap back to first engine */
+					engine = first_engine;
+			}
+		} else {
+			/* just re-use entry without allocating */
+			newmap->map[i] = newmap->map[i % num_vls];
+		}
+		engine = first_engine + vl_engines[i];
+	}
+	/* newmap in hand, save old map */
+	spin_lock_irq(&dd->sde_map_lock);
+	oldmap = rcu_dereference_protected(dd->sdma_map,
+			lockdep_is_held(&dd->sde_map_lock));
+
+	/* publish newmap */
+	rcu_assign_pointer(dd->sdma_map, newmap);
+
+	spin_unlock_irq(&dd->sde_map_lock);
+	/* success, free any old map after grace period */
+	if (oldmap)
+		call_rcu(&oldmap->list, sdma_map_rcu_callback);
+	return 0;
+bail:
+	/* free any partial allocation */
+	sdma_map_free(newmap);
+	return -ENOMEM;
+}
+
+/*
+ * Clean up allocated memory.
+ *
+ * This routine is can be called regardless of the success of sdma_init()
+ *
+ */
+static void sdma_clean(struct hfi1_devdata *dd, size_t num_engines)
+{
+	size_t i;
+	struct sdma_engine *sde;
+
+	if (dd->sdma_pad_dma) {
+		dma_free_coherent(&dd->pcidev->dev, 4,
+				  (void *)dd->sdma_pad_dma,
+				  dd->sdma_pad_phys);
+		dd->sdma_pad_dma = NULL;
+		dd->sdma_pad_phys = 0;
+	}
+	if (dd->sdma_heads_dma) {
+		dma_free_coherent(&dd->pcidev->dev, dd->sdma_heads_size,
+				  (void *)dd->sdma_heads_dma,
+				  dd->sdma_heads_phys);
+		dd->sdma_heads_dma = NULL;
+		dd->sdma_heads_phys = 0;
+	}
+	for (i = 0; dd->per_sdma && i < num_engines; ++i) {
+		sde = &dd->per_sdma[i];
+
+		sde->head_dma = NULL;
+		sde->head_phys = 0;
+
+		if (sde->descq) {
+			dma_free_coherent(
+				&dd->pcidev->dev,
+				sde->descq_cnt * sizeof(u64[2]),
+				sde->descq,
+				sde->descq_phys
+			);
+			sde->descq = NULL;
+			sde->descq_phys = 0;
+		}
+		if (is_vmalloc_addr(sde->tx_ring))
+			vfree(sde->tx_ring);
+		else
+			kfree(sde->tx_ring);
+		sde->tx_ring = NULL;
+	}
+	spin_lock_irq(&dd->sde_map_lock);
+	kfree(rcu_access_pointer(dd->sdma_map));
+	RCU_INIT_POINTER(dd->sdma_map, NULL);
+	spin_unlock_irq(&dd->sde_map_lock);
+	synchronize_rcu();
+	kfree(dd->per_sdma);
+	dd->per_sdma = NULL;
+}
+
+/**
+ * sdma_init() - called when device probed
+ * @dd: hfi1_devdata
+ * @port: port number (currently only zero)
+ *
+ * sdma_init initializes the specified number of engines.
+ *
+ * The code initializes each sde, its csrs.  Interrupts
+ * are not required to be enabled.
+ *
+ * Returns:
+ * 0 - success, -errno on failure
+ */
+int sdma_init(struct hfi1_devdata *dd, u8 port)
+{
+	unsigned this_idx;
+	struct sdma_engine *sde;
+	u16 descq_cnt;
+	void *curr_head;
+	struct hfi1_pportdata *ppd = dd->pport + port;
+	u32 per_sdma_credits;
+	uint idle_cnt = sdma_idle_cnt;
+	size_t num_engines = dd->chip_sdma_engines;
+
+	if (!HFI1_CAP_IS_KSET(SDMA)) {
+		HFI1_CAP_CLEAR(SDMA_AHG);
+		return 0;
+	}
+	if (mod_num_sdma &&
+		/* can't exceed chip support */
+		mod_num_sdma <= dd->chip_sdma_engines &&
+		/* count must be >= vls */
+		mod_num_sdma >= num_vls)
+		num_engines = mod_num_sdma;
+
+	dd_dev_info(dd, "SDMA mod_num_sdma: %u\n", mod_num_sdma);
+	dd_dev_info(dd, "SDMA chip_sdma_engines: %u\n", dd->chip_sdma_engines);
+	dd_dev_info(dd, "SDMA chip_sdma_mem_size: %u\n",
+		dd->chip_sdma_mem_size);
+
+	per_sdma_credits =
+		dd->chip_sdma_mem_size/(num_engines * SDMA_BLOCK_SIZE);
+
+	/* set up freeze waitqueue */
+	init_waitqueue_head(&dd->sdma_unfreeze_wq);
+	atomic_set(&dd->sdma_unfreeze_count, 0);
+
+	descq_cnt = sdma_get_descq_cnt();
+	dd_dev_info(dd, "SDMA engines %zu descq_cnt %u\n",
+		num_engines, descq_cnt);
+
+	/* alloc memory for array of send engines */
+	dd->per_sdma = kcalloc(num_engines, sizeof(*dd->per_sdma), GFP_KERNEL);
+	if (!dd->per_sdma)
+		return -ENOMEM;
+
+	idle_cnt = ns_to_cclock(dd, idle_cnt);
+	/* Allocate memory for SendDMA descriptor FIFOs */
+	for (this_idx = 0; this_idx < num_engines; ++this_idx) {
+		sde = &dd->per_sdma[this_idx];
+		sde->dd = dd;
+		sde->ppd = ppd;
+		sde->this_idx = this_idx;
+		sde->descq_cnt = descq_cnt;
+		sde->desc_avail = sdma_descq_freecnt(sde);
+		sde->sdma_shift = ilog2(descq_cnt);
+		sde->sdma_mask = (1 << sde->sdma_shift) - 1;
+		sde->descq_full_count = 0;
+
+		/* Create a mask for all 3 chip interrupt sources */
+		sde->imask = (u64)1 << (0*TXE_NUM_SDMA_ENGINES + this_idx)
+			| (u64)1 << (1*TXE_NUM_SDMA_ENGINES + this_idx)
+			| (u64)1 << (2*TXE_NUM_SDMA_ENGINES + this_idx);
+		/* Create a mask specifically for sdma_idle */
+		sde->idle_mask =
+			(u64)1 << (2*TXE_NUM_SDMA_ENGINES + this_idx);
+		/* Create a mask specifically for sdma_progress */
+		sde->progress_mask =
+			(u64)1 << (TXE_NUM_SDMA_ENGINES + this_idx);
+		spin_lock_init(&sde->tail_lock);
+		seqlock_init(&sde->head_lock);
+		spin_lock_init(&sde->senddmactrl_lock);
+		spin_lock_init(&sde->flushlist_lock);
+		/* insure there is always a zero bit */
+		sde->ahg_bits = 0xfffffffe00000000ULL;
+
+		sdma_set_state(sde, sdma_state_s00_hw_down);
+
+		/* set up reference counting */
+		kref_init(&sde->state.kref);
+		init_completion(&sde->state.comp);
+
+		INIT_LIST_HEAD(&sde->flushlist);
+		INIT_LIST_HEAD(&sde->dmawait);
+
+		sde->tail_csr =
+			get_kctxt_csr_addr(dd, this_idx, SD(TAIL));
+
+		if (idle_cnt)
+			dd->default_desc1 =
+				SDMA_DESC1_HEAD_TO_HOST_FLAG;
+		else
+			dd->default_desc1 =
+				SDMA_DESC1_INT_REQ_FLAG;
+
+		tasklet_init(&sde->sdma_hw_clean_up_task, sdma_hw_clean_up_task,
+			(unsigned long)sde);
+
+		tasklet_init(&sde->sdma_sw_clean_up_task, sdma_sw_clean_up_task,
+			(unsigned long)sde);
+		INIT_WORK(&sde->err_halt_worker, sdma_err_halt_wait);
+		INIT_WORK(&sde->flush_worker, sdma_field_flush);
+
+		sde->progress_check_head = 0;
+
+		init_timer(&sde->err_progress_check_timer);
+		sde->err_progress_check_timer.function =
+						sdma_err_progress_check;
+		sde->err_progress_check_timer.data = (unsigned long)sde;
+
+		sde->descq = dma_zalloc_coherent(
+			&dd->pcidev->dev,
+			descq_cnt * sizeof(u64[2]),
+			&sde->descq_phys,
+			GFP_KERNEL
+		);
+		if (!sde->descq)
+			goto bail;
+		sde->tx_ring =
+			kcalloc(descq_cnt, sizeof(struct sdma_txreq *),
+				GFP_KERNEL);
+		if (!sde->tx_ring)
+			sde->tx_ring =
+				vzalloc(
+					sizeof(struct sdma_txreq *) *
+					descq_cnt);
+		if (!sde->tx_ring)
+			goto bail;
+	}
+
+	dd->sdma_heads_size = L1_CACHE_BYTES * num_engines;
+	/* Allocate memory for DMA of head registers to memory */
+	dd->sdma_heads_dma = dma_zalloc_coherent(
+		&dd->pcidev->dev,
+		dd->sdma_heads_size,
+		&dd->sdma_heads_phys,
+		GFP_KERNEL
+	);
+	if (!dd->sdma_heads_dma) {
+		dd_dev_err(dd, "failed to allocate SendDMA head memory\n");
+		goto bail;
+	}
+
+	/* Allocate memory for pad */
+	dd->sdma_pad_dma = dma_zalloc_coherent(
+		&dd->pcidev->dev,
+		sizeof(u32),
+		&dd->sdma_pad_phys,
+		GFP_KERNEL
+	);
+	if (!dd->sdma_pad_dma) {
+		dd_dev_err(dd, "failed to allocate SendDMA pad memory\n");
+		goto bail;
+	}
+
+	/* assign each engine to different cacheline and init registers */
+	curr_head = (void *)dd->sdma_heads_dma;
+	for (this_idx = 0; this_idx < num_engines; ++this_idx) {
+		unsigned long phys_offset;
+
+		sde = &dd->per_sdma[this_idx];
+
+		sde->head_dma = curr_head;
+		curr_head += L1_CACHE_BYTES;
+		phys_offset = (unsigned long)sde->head_dma -
+			      (unsigned long)dd->sdma_heads_dma;
+		sde->head_phys = dd->sdma_heads_phys + phys_offset;
+		init_sdma_regs(sde, per_sdma_credits, idle_cnt);
+	}
+	dd->flags |= HFI1_HAS_SEND_DMA;
+	dd->flags |= idle_cnt ? HFI1_HAS_SDMA_TIMEOUT : 0;
+	dd->num_sdma = num_engines;
+	if (sdma_map_init(dd, port, ppd->vls_operational, NULL))
+		goto bail;
+	dd_dev_info(dd, "SDMA num_sdma: %u\n", dd->num_sdma);
+	return 0;
+
+bail:
+	sdma_clean(dd, num_engines);
+	return -ENOMEM;
+}
+
+/**
+ * sdma_all_running() - called when the link goes up
+ * @dd: hfi1_devdata
+ *
+ * This routine moves all engines to the running state.
+ */
+void sdma_all_running(struct hfi1_devdata *dd)
+{
+	struct sdma_engine *sde;
+	unsigned int i;
+
+	/* move all engines to running */
+	for (i = 0; i < dd->num_sdma; ++i) {
+		sde = &dd->per_sdma[i];
+		sdma_process_event(sde, sdma_event_e30_go_running);
+	}
+}
+
+/**
+ * sdma_all_idle() - called when the link goes down
+ * @dd: hfi1_devdata
+ *
+ * This routine moves all engines to the idle state.
+ */
+void sdma_all_idle(struct hfi1_devdata *dd)
+{
+	struct sdma_engine *sde;
+	unsigned int i;
+
+	/* idle all engines */
+	for (i = 0; i < dd->num_sdma; ++i) {
+		sde = &dd->per_sdma[i];
+		sdma_process_event(sde, sdma_event_e70_go_idle);
+	}
+}
+
+/**
+ * sdma_start() - called to kick off state processing for all engines
+ * @dd: hfi1_devdata
+ *
+ * This routine is for kicking off the state processing for all required
+ * sdma engines.  Interrupts need to be working at this point.
+ *
+ */
+void sdma_start(struct hfi1_devdata *dd)
+{
+	unsigned i;
+	struct sdma_engine *sde;
+
+	/* kick off the engines state processing */
+	for (i = 0; i < dd->num_sdma; ++i) {
+		sde = &dd->per_sdma[i];
+		sdma_process_event(sde, sdma_event_e10_go_hw_start);
+	}
+}
+
+/**
+ * sdma_exit() - used when module is removed
+ * @dd: hfi1_devdata
+ */
+void sdma_exit(struct hfi1_devdata *dd)
+{
+	unsigned this_idx;
+	struct sdma_engine *sde;
+
+	for (this_idx = 0; dd->per_sdma && this_idx < dd->num_sdma;
+			++this_idx) {
+
+		sde = &dd->per_sdma[this_idx];
+		if (!list_empty(&sde->dmawait))
+			dd_dev_err(dd, "sde %u: dmawait list not empty!\n",
+				sde->this_idx);
+		sdma_process_event(sde, sdma_event_e00_go_hw_down);
+
+		del_timer_sync(&sde->err_progress_check_timer);
+
+		/*
+		 * This waits for the state machine to exit so it is not
+		 * necessary to kill the sdma_sw_clean_up_task to make sure
+		 * it is not running.
+		 */
+		sdma_finalput(&sde->state);
+	}
+	sdma_clean(dd, dd->num_sdma);
+}
+
+/*
+ * unmap the indicated descriptor
+ */
+static inline void sdma_unmap_desc(
+	struct hfi1_devdata *dd,
+	struct sdma_desc *descp)
+{
+	switch (sdma_mapping_type(descp)) {
+	case SDMA_MAP_SINGLE:
+		dma_unmap_single(
+			&dd->pcidev->dev,
+			sdma_mapping_addr(descp),
+			sdma_mapping_len(descp),
+			DMA_TO_DEVICE);
+		break;
+	case SDMA_MAP_PAGE:
+		dma_unmap_page(
+			&dd->pcidev->dev,
+			sdma_mapping_addr(descp),
+			sdma_mapping_len(descp),
+			DMA_TO_DEVICE);
+		break;
+	}
+}
+
+/*
+ * return the mode as indicated by the first
+ * descriptor in the tx.
+ */
+static inline u8 ahg_mode(struct sdma_txreq *tx)
+{
+	return (tx->descp[0].qw[1] & SDMA_DESC1_HEADER_MODE_SMASK)
+		>> SDMA_DESC1_HEADER_MODE_SHIFT;
+}
+
+/**
+ * sdma_txclean() - clean tx of mappings, descp *kmalloc's
+ * @dd: hfi1_devdata for unmapping
+ * @tx: tx request to clean
+ *
+ * This is used in the progress routine to clean the tx or
+ * by the ULP to toss an in-process tx build.
+ *
+ * The code can be called multiple times without issue.
+ *
+ */
+void sdma_txclean(
+	struct hfi1_devdata *dd,
+	struct sdma_txreq *tx)
+{
+	u16 i;
+
+	if (tx->num_desc) {
+		u8 skip = 0, mode = ahg_mode(tx);
+
+		/* unmap first */
+		sdma_unmap_desc(dd, &tx->descp[0]);
+		/* determine number of AHG descriptors to skip */
+		if (mode > SDMA_AHG_APPLY_UPDATE1)
+			skip = mode >> 1;
+		for (i = 1 + skip; i < tx->num_desc; i++)
+			sdma_unmap_desc(dd, &tx->descp[i]);
+		tx->num_desc = 0;
+	}
+	kfree(tx->coalesce_buf);
+	tx->coalesce_buf = NULL;
+	/* kmalloc'ed descp */
+	if (unlikely(tx->desc_limit > ARRAY_SIZE(tx->descs))) {
+		tx->desc_limit = ARRAY_SIZE(tx->descs);
+		kfree(tx->descp);
+	}
+}
+
+static inline u16 sdma_gethead(struct sdma_engine *sde)
+{
+	struct hfi1_devdata *dd = sde->dd;
+	int use_dmahead;
+	u16 hwhead;
+
+#ifdef CONFIG_SDMA_VERBOSITY
+	dd_dev_err(sde->dd, "CONFIG SDMA(%u) %s:%d %s()\n",
+		   sde->this_idx, slashstrip(__FILE__), __LINE__, __func__);
+#endif
+
+retry:
+	use_dmahead = HFI1_CAP_IS_KSET(USE_SDMA_HEAD) && __sdma_running(sde) &&
+					(dd->flags & HFI1_HAS_SDMA_TIMEOUT);
+	hwhead = use_dmahead ?
+		(u16) le64_to_cpu(*sde->head_dma) :
+		(u16) read_sde_csr(sde, SD(HEAD));
+
+	if (unlikely(HFI1_CAP_IS_KSET(SDMA_HEAD_CHECK))) {
+		u16 cnt;
+		u16 swtail;
+		u16 swhead;
+		int sane;
+
+		swhead = sde->descq_head & sde->sdma_mask;
+		/* this code is really bad for cache line trading */
+		swtail = ACCESS_ONCE(sde->descq_tail) & sde->sdma_mask;
+		cnt = sde->descq_cnt;
+
+		if (swhead < swtail)
+			/* not wrapped */
+			sane = (hwhead >= swhead) & (hwhead <= swtail);
+		else if (swhead > swtail)
+			/* wrapped around */
+			sane = ((hwhead >= swhead) && (hwhead < cnt)) ||
+				(hwhead <= swtail);
+		else
+			/* empty */
+			sane = (hwhead == swhead);
+
+		if (unlikely(!sane)) {
+			dd_dev_err(dd, "SDMA(%u) bad head (%s) hwhd=%hu swhd=%hu swtl=%hu cnt=%hu\n",
+				sde->this_idx,
+				use_dmahead ? "dma" : "kreg",
+				hwhead, swhead, swtail, cnt);
+			if (use_dmahead) {
+				/* try one more time, using csr */
+				use_dmahead = 0;
+				goto retry;
+			}
+			/* proceed as if no progress */
+			hwhead = swhead;
+		}
+	}
+	return hwhead;
+}
+
+/*
+ * This is called when there are send DMA descriptors that might be
+ * available.
+ *
+ * This is called with head_lock held.
+ */
+static void sdma_desc_avail(struct sdma_engine *sde, unsigned avail)
+{
+	struct iowait *wait, *nw;
+	struct iowait *waits[SDMA_WAIT_BATCH_SIZE];
+	unsigned i, n = 0, seq;
+	struct sdma_txreq *stx;
+	struct hfi1_ibdev *dev = &sde->dd->verbs_dev;
+
+#ifdef CONFIG_SDMA_VERBOSITY
+	dd_dev_err(sde->dd, "CONFIG SDMA(%u) %s:%d %s()\n", sde->this_idx,
+		   slashstrip(__FILE__), __LINE__, __func__);
+	dd_dev_err(sde->dd, "avail: %u\n", avail);
+#endif
+
+	do {
+		seq = read_seqbegin(&dev->iowait_lock);
+		if (!list_empty(&sde->dmawait)) {
+			/* at least one item */
+			write_seqlock(&dev->iowait_lock);
+			/* Harvest waiters wanting DMA descriptors */
+			list_for_each_entry_safe(
+					wait,
+					nw,
+					&sde->dmawait,
+					list) {
+				u16 num_desc = 0;
+
+				if (!wait->wakeup)
+					continue;
+				if (n == ARRAY_SIZE(waits))
+					break;
+				if (!list_empty(&wait->tx_head)) {
+					stx = list_first_entry(
+						&wait->tx_head,
+						struct sdma_txreq,
+						list);
+					num_desc = stx->num_desc;
+				}
+				if (num_desc > avail)
+					break;
+				avail -= num_desc;
+				list_del_init(&wait->list);
+				waits[n++] = wait;
+			}
+			write_sequnlock(&dev->iowait_lock);
+			break;
+		}
+	} while (read_seqretry(&dev->iowait_lock, seq));
+
+	for (i = 0; i < n; i++)
+		waits[i]->wakeup(waits[i], SDMA_AVAIL_REASON);
+}
+
+/* head_lock must be held */
+static void sdma_make_progress(struct sdma_engine *sde, u64 status)
+{
+	struct sdma_txreq *txp = NULL;
+	int progress = 0;
+	u16 hwhead, swhead, swtail;
+	int idle_check_done = 0;
+
+	hwhead = sdma_gethead(sde);
+
+	/* The reason for some of the complexity of this code is that
+	 * not all descriptors have corresponding txps.  So, we have to
+	 * be able to skip over descs until we wander into the range of
+	 * the next txp on the list.
+	 */
+
+retry:
+	txp = get_txhead(sde);
+	swhead = sde->descq_head & sde->sdma_mask;
+	trace_hfi1_sdma_progress(sde, hwhead, swhead, txp);
+	while (swhead != hwhead) {
+		/* advance head, wrap if needed */
+		swhead = ++sde->descq_head & sde->sdma_mask;
+
+		/* if now past this txp's descs, do the callback */
+		if (txp && txp->next_descq_idx == swhead) {
+			int drained = 0;
+			/* protect against complete modifying */
+			struct iowait *wait = txp->wait;
+
+			/* remove from list */
+			sde->tx_ring[sde->tx_head++ & sde->sdma_mask] = NULL;
+			if (wait)
+				drained = atomic_dec_and_test(&wait->sdma_busy);
+#ifdef CONFIG_HFI1_DEBUG_SDMA_ORDER
+			trace_hfi1_sdma_out_sn(sde, txp->sn);
+			if (WARN_ON_ONCE(sde->head_sn != txp->sn))
+				dd_dev_err(sde->dd, "expected %llu got %llu\n",
+					sde->head_sn, txp->sn);
+			sde->head_sn++;
+#endif
+			sdma_txclean(sde->dd, txp);
+			if (txp->complete)
+				(*txp->complete)(
+					txp,
+					SDMA_TXREQ_S_OK,
+					drained);
+			if (wait && drained)
+				iowait_drain_wakeup(wait);
+			/* see if there is another txp */
+			txp = get_txhead(sde);
+		}
+		trace_hfi1_sdma_progress(sde, hwhead, swhead, txp);
+		progress++;
+	}
+
+	/*
+	 * The SDMA idle interrupt is not guaranteed to be ordered with respect
+	 * to updates to the the dma_head location in host memory. The head
+	 * value read might not be fully up to date. If there are pending
+	 * descriptors and the SDMA idle interrupt fired then read from the
+	 * CSR SDMA head instead to get the latest value from the hardware.
+	 * The hardware SDMA head should be read at most once in this invocation
+	 * of sdma_make_progress(..) which is ensured by idle_check_done flag
+	 */
+	if ((status & sde->idle_mask) && !idle_check_done) {
+		swtail = ACCESS_ONCE(sde->descq_tail) & sde->sdma_mask;
+		if (swtail != hwhead) {
+			hwhead = (u16)read_sde_csr(sde, SD(HEAD));
+			idle_check_done = 1;
+			goto retry;
+		}
+	}
+
+	sde->last_status = status;
+	if (progress)
+		sdma_desc_avail(sde, sdma_descq_freecnt(sde));
+}
+
+/*
+ * sdma_engine_interrupt() - interrupt handler for engine
+ * @sde: sdma engine
+ * @status: sdma interrupt reason
+ *
+ * Status is a mask of the 3 possible interrupts for this engine.  It will
+ * contain bits _only_ for this SDMA engine.  It will contain at least one
+ * bit, it may contain more.
+ */
+void sdma_engine_interrupt(struct sdma_engine *sde, u64 status)
+{
+	trace_hfi1_sdma_engine_interrupt(sde, status);
+	write_seqlock(&sde->head_lock);
+	sdma_set_desc_cnt(sde, sde->descq_cnt / 2);
+	sdma_make_progress(sde, status);
+	write_sequnlock(&sde->head_lock);
+}
+
+/**
+ * sdma_engine_error() - error handler for engine
+ * @sde: sdma engine
+ * @status: sdma interrupt reason
+ */
+void sdma_engine_error(struct sdma_engine *sde, u64 status)
+{
+	unsigned long flags;
+
+#ifdef CONFIG_SDMA_VERBOSITY
+	dd_dev_err(sde->dd, "CONFIG SDMA(%u) error status 0x%llx state %s\n",
+		   sde->this_idx,
+		   (unsigned long long)status,
+		   sdma_state_names[sde->state.current_state]);
+#endif
+	spin_lock_irqsave(&sde->tail_lock, flags);
+	write_seqlock(&sde->head_lock);
+	if (status & ALL_SDMA_ENG_HALT_ERRS)
+		__sdma_process_event(sde, sdma_event_e60_hw_halted);
+	if (status & ~SD(ENG_ERR_STATUS_SDMA_HALT_ERR_SMASK)) {
+		dd_dev_err(sde->dd,
+			"SDMA (%u) engine error: 0x%llx state %s\n",
+			sde->this_idx,
+			(unsigned long long)status,
+			sdma_state_names[sde->state.current_state]);
+		dump_sdma_state(sde);
+	}
+	write_sequnlock(&sde->head_lock);
+	spin_unlock_irqrestore(&sde->tail_lock, flags);
+}
+
+static void sdma_sendctrl(struct sdma_engine *sde, unsigned op)
+{
+	u64 set_senddmactrl = 0;
+	u64 clr_senddmactrl = 0;
+	unsigned long flags;
+
+#ifdef CONFIG_SDMA_VERBOSITY
+	dd_dev_err(sde->dd, "CONFIG SDMA(%u) senddmactrl E=%d I=%d H=%d C=%d\n",
+		   sde->this_idx,
+		   (op & SDMA_SENDCTRL_OP_ENABLE) ? 1 : 0,
+		   (op & SDMA_SENDCTRL_OP_INTENABLE) ? 1 : 0,
+		   (op & SDMA_SENDCTRL_OP_HALT) ? 1 : 0,
+		   (op & SDMA_SENDCTRL_OP_CLEANUP) ? 1 : 0);
+#endif
+
+	if (op & SDMA_SENDCTRL_OP_ENABLE)
+		set_senddmactrl |= SD(CTRL_SDMA_ENABLE_SMASK);
+	else
+		clr_senddmactrl |= SD(CTRL_SDMA_ENABLE_SMASK);
+
+	if (op & SDMA_SENDCTRL_OP_INTENABLE)
+		set_senddmactrl |= SD(CTRL_SDMA_INT_ENABLE_SMASK);
+	else
+		clr_senddmactrl |= SD(CTRL_SDMA_INT_ENABLE_SMASK);
+
+	if (op & SDMA_SENDCTRL_OP_HALT)
+		set_senddmactrl |= SD(CTRL_SDMA_HALT_SMASK);
+	else
+		clr_senddmactrl |= SD(CTRL_SDMA_HALT_SMASK);
+
+	spin_lock_irqsave(&sde->senddmactrl_lock, flags);
+
+	sde->p_senddmactrl |= set_senddmactrl;
+	sde->p_senddmactrl &= ~clr_senddmactrl;
+
+	if (op & SDMA_SENDCTRL_OP_CLEANUP)
+		write_sde_csr(sde, SD(CTRL),
+			sde->p_senddmactrl |
+			SD(CTRL_SDMA_CLEANUP_SMASK));
+	else
+		write_sde_csr(sde, SD(CTRL), sde->p_senddmactrl);
+
+	spin_unlock_irqrestore(&sde->senddmactrl_lock, flags);
+
+#ifdef CONFIG_SDMA_VERBOSITY
+	sdma_dumpstate(sde);
+#endif
+}
+
+static void sdma_setlengen(struct sdma_engine *sde)
+{
+#ifdef CONFIG_SDMA_VERBOSITY
+	dd_dev_err(sde->dd, "CONFIG SDMA(%u) %s:%d %s()\n",
+		   sde->this_idx, slashstrip(__FILE__), __LINE__, __func__);
+#endif
+
+	/*
+	 * Set SendDmaLenGen and clear-then-set the MSB of the generation
+	 * count to enable generation checking and load the internal
+	 * generation counter.
+	 */
+	write_sde_csr(sde, SD(LEN_GEN),
+		(sde->descq_cnt/64) << SD(LEN_GEN_LENGTH_SHIFT)
+	);
+	write_sde_csr(sde, SD(LEN_GEN),
+		((sde->descq_cnt/64) << SD(LEN_GEN_LENGTH_SHIFT))
+		| (4ULL << SD(LEN_GEN_GENERATION_SHIFT))
+	);
+}
+
+static inline void sdma_update_tail(struct sdma_engine *sde, u16 tail)
+{
+	/* Commit writes to memory and advance the tail on the chip */
+	smp_wmb(); /* see get_txhead() */
+	writeq(tail, sde->tail_csr);
+}
+
+/*
+ * This is called when changing to state s10_hw_start_up_halt_wait as
+ * a result of send buffer errors or send DMA descriptor errors.
+ */
+static void sdma_hw_start_up(struct sdma_engine *sde)
+{
+	u64 reg;
+
+#ifdef CONFIG_SDMA_VERBOSITY
+	dd_dev_err(sde->dd, "CONFIG SDMA(%u) %s:%d %s()\n",
+		   sde->this_idx, slashstrip(__FILE__), __LINE__, __func__);
+#endif
+
+	sdma_setlengen(sde);
+	sdma_update_tail(sde, 0); /* Set SendDmaTail */
+	*sde->head_dma = 0;
+
+	reg = SD(ENG_ERR_CLEAR_SDMA_HEADER_REQUEST_FIFO_UNC_ERR_MASK) <<
+	      SD(ENG_ERR_CLEAR_SDMA_HEADER_REQUEST_FIFO_UNC_ERR_SHIFT);
+	write_sde_csr(sde, SD(ENG_ERR_CLEAR), reg);
+}
+
+#define CLEAR_STATIC_RATE_CONTROL_SMASK(r) \
+(r &= ~SEND_DMA_CHECK_ENABLE_DISALLOW_PBC_STATIC_RATE_CONTROL_SMASK)
+
+#define SET_STATIC_RATE_CONTROL_SMASK(r) \
+(r |= SEND_DMA_CHECK_ENABLE_DISALLOW_PBC_STATIC_RATE_CONTROL_SMASK)
+/*
+ * set_sdma_integrity
+ *
+ * Set the SEND_DMA_CHECK_ENABLE register for send DMA engine 'sde'.
+ */
+static void set_sdma_integrity(struct sdma_engine *sde)
+{
+	struct hfi1_devdata *dd = sde->dd;
+	u64 reg;
+
+	if (unlikely(HFI1_CAP_IS_KSET(NO_INTEGRITY)))
+		return;
+
+	reg = hfi1_pkt_base_sdma_integrity(dd);
+
+	if (HFI1_CAP_IS_KSET(STATIC_RATE_CTRL))
+		CLEAR_STATIC_RATE_CONTROL_SMASK(reg);
+	else
+		SET_STATIC_RATE_CONTROL_SMASK(reg);
+
+	write_sde_csr(sde, SD(CHECK_ENABLE), reg);
+}
+
+
+static void init_sdma_regs(
+	struct sdma_engine *sde,
+	u32 credits,
+	uint idle_cnt)
+{
+	u8 opval, opmask;
+#ifdef CONFIG_SDMA_VERBOSITY
+	struct hfi1_devdata *dd = sde->dd;
+
+	dd_dev_err(dd, "CONFIG SDMA(%u) %s:%d %s()\n",
+		   sde->this_idx, slashstrip(__FILE__), __LINE__, __func__);
+#endif
+
+	write_sde_csr(sde, SD(BASE_ADDR), sde->descq_phys);
+	sdma_setlengen(sde);
+	sdma_update_tail(sde, 0); /* Set SendDmaTail */
+	write_sde_csr(sde, SD(RELOAD_CNT), idle_cnt);
+	write_sde_csr(sde, SD(DESC_CNT), 0);
+	write_sde_csr(sde, SD(HEAD_ADDR), sde->head_phys);
+	write_sde_csr(sde, SD(MEMORY),
+		((u64)credits <<
+			SD(MEMORY_SDMA_MEMORY_CNT_SHIFT)) |
+		((u64)(credits * sde->this_idx) <<
+			SD(MEMORY_SDMA_MEMORY_INDEX_SHIFT)));
+	write_sde_csr(sde, SD(ENG_ERR_MASK), ~0ull);
+	set_sdma_integrity(sde);
+	opmask = OPCODE_CHECK_MASK_DISABLED;
+	opval = OPCODE_CHECK_VAL_DISABLED;
+	write_sde_csr(sde, SD(CHECK_OPCODE),
+		(opmask << SEND_CTXT_CHECK_OPCODE_MASK_SHIFT) |
+		(opval << SEND_CTXT_CHECK_OPCODE_VALUE_SHIFT));
+}
+
+#ifdef CONFIG_SDMA_VERBOSITY
+
+#define sdma_dumpstate_helper0(reg) do { \
+		csr = read_csr(sde->dd, reg); \
+		dd_dev_err(sde->dd, "%36s     0x%016llx\n", #reg, csr); \
+	} while (0)
+
+#define sdma_dumpstate_helper(reg) do { \
+		csr = read_sde_csr(sde, reg); \
+		dd_dev_err(sde->dd, "%36s[%02u] 0x%016llx\n", \
+			#reg, sde->this_idx, csr); \
+	} while (0)
+
+#define sdma_dumpstate_helper2(reg) do { \
+		csr = read_csr(sde->dd, reg + (8 * i)); \
+		dd_dev_err(sde->dd, "%33s_%02u     0x%016llx\n", \
+				#reg, i, csr); \
+	} while (0)
+
+void sdma_dumpstate(struct sdma_engine *sde)
+{
+	u64 csr;
+	unsigned i;
+
+	sdma_dumpstate_helper(SD(CTRL));
+	sdma_dumpstate_helper(SD(STATUS));
+	sdma_dumpstate_helper0(SD(ERR_STATUS));
+	sdma_dumpstate_helper0(SD(ERR_MASK));
+	sdma_dumpstate_helper(SD(ENG_ERR_STATUS));
+	sdma_dumpstate_helper(SD(ENG_ERR_MASK));
+
+	for (i = 0; i < CCE_NUM_INT_CSRS; ++i) {
+		sdma_dumpstate_helper2(CCE_INT_STATUS));
+		sdma_dumpstate_helper2(CCE_INT_MASK);
+		sdma_dumpstate_helper2(CCE_INT_BLOCKED);
+	}
+
+	sdma_dumpstate_helper(SD(TAIL));
+	sdma_dumpstate_helper(SD(HEAD));
+	sdma_dumpstate_helper(SD(PRIORITY_THLD));
+	sdma_dumpstate_helper(SD(IDLE_CNT);
+	sdma_dumpstate_helper(SD(RELOAD_CNT));
+	sdma_dumpstate_helper(SD(DESC_CNT));
+	sdma_dumpstate_helper(SD(DESC_FETCHED_CNT));
+	sdma_dumpstate_helper(SD(MEMORY));
+	sdma_dumpstate_helper0(SD(ENGINES));
+	sdma_dumpstate_helper0(SD(MEM_SIZE));
+	/* sdma_dumpstate_helper(SEND_EGRESS_SEND_DMA_STATUS);  */
+	sdma_dumpstate_helper(SD(BASE_ADDR));
+	sdma_dumpstate_helper(SD(LEN_GEN));
+	sdma_dumpstate_helper(SD(HEAD_ADDR));
+	sdma_dumpstate_helper(SD(CHECK_ENABLE));
+	sdma_dumpstate_helper(SD(CHECK_VL));
+	sdma_dumpstate_helper(SD(CHECK_JOB_KEY));
+	sdma_dumpstate_helper(SD(CHECK_PARTITION_KEY));
+	sdma_dumpstate_helper(SD(CHECK_SLID));
+	sdma_dumpstate_helper(SD(CHECK_OPCODE));
+}
+#endif
+
+static void dump_sdma_state(struct sdma_engine *sde)
+{
+	struct hw_sdma_desc *descq;
+	struct hw_sdma_desc *descqp;
+	u64 desc[2];
+	u64 addr;
+	u8 gen;
+	u16 len;
+	u16 head, tail, cnt;
+
+	head = sde->descq_head & sde->sdma_mask;
+	tail = sde->descq_tail & sde->sdma_mask;
+	cnt = sdma_descq_freecnt(sde);
+	descq = sde->descq;
+
+	dd_dev_err(sde->dd,
+		"SDMA (%u) descq_head: %u descq_tail: %u freecnt: %u FLE %d\n",
+		sde->this_idx,
+		head,
+		tail,
+		cnt,
+		!list_empty(&sde->flushlist));
+
+	/* print info for each entry in the descriptor queue */
+	while (head != tail) {
+		char flags[6] = { 'x', 'x', 'x', 'x', 0 };
+
+		descqp = &sde->descq[head];
+		desc[0] = le64_to_cpu(descqp->qw[0]);
+		desc[1] = le64_to_cpu(descqp->qw[1]);
+		flags[0] = (desc[1] & SDMA_DESC1_INT_REQ_FLAG) ? 'I' : '-';
+		flags[1] = (desc[1] & SDMA_DESC1_HEAD_TO_HOST_FLAG) ?
+				'H' : '-';
+		flags[2] = (desc[0] & SDMA_DESC0_FIRST_DESC_FLAG) ? 'F' : '-';
+		flags[3] = (desc[0] & SDMA_DESC0_LAST_DESC_FLAG) ? 'L' : '-';
+		addr = (desc[0] >> SDMA_DESC0_PHY_ADDR_SHIFT)
+			& SDMA_DESC0_PHY_ADDR_MASK;
+		gen = (desc[1] >> SDMA_DESC1_GENERATION_SHIFT)
+			& SDMA_DESC1_GENERATION_MASK;
+		len = (desc[0] >> SDMA_DESC0_BYTE_COUNT_SHIFT)
+			& SDMA_DESC0_BYTE_COUNT_MASK;
+		dd_dev_err(sde->dd,
+			"SDMA sdmadesc[%u]: flags:%s addr:0x%016llx gen:%u len:%u bytes\n",
+			 head, flags, addr, gen, len);
+		dd_dev_err(sde->dd,
+			"\tdesc0:0x%016llx desc1 0x%016llx\n",
+			 desc[0], desc[1]);
+		if (desc[0] & SDMA_DESC0_FIRST_DESC_FLAG)
+			dd_dev_err(sde->dd,
+				"\taidx: %u amode: %u alen: %u\n",
+				(u8)((desc[1] & SDMA_DESC1_HEADER_INDEX_SMASK)
+					>> SDMA_DESC1_HEADER_INDEX_MASK),
+				(u8)((desc[1] & SDMA_DESC1_HEADER_MODE_SMASK)
+					>> SDMA_DESC1_HEADER_MODE_SHIFT),
+				(u8)((desc[1] & SDMA_DESC1_HEADER_DWS_SMASK)
+					>> SDMA_DESC1_HEADER_DWS_SHIFT));
+		head++;
+		head &= sde->sdma_mask;
+	}
+}
+
+#define SDE_FMT \
+	"SDE %u STE %s C 0x%llx S 0x%016llx E 0x%llx T(HW) 0x%llx T(SW) 0x%x H(HW) 0x%llx H(SW) 0x%x H(D) 0x%llx DM 0x%llx GL 0x%llx R 0x%llx LIS 0x%llx AHGI 0x%llx TXT %u TXH %u DT %u DH %u FLNE %d DQF %u SLC 0x%llx\n"
+/**
+ * sdma_seqfile_dump_sde() - debugfs dump of sde
+ * @s: seq file
+ * @sde: send dma engine to dump
+ *
+ * This routine dumps the sde to the indicated seq file.
+ */
+void sdma_seqfile_dump_sde(struct seq_file *s, struct sdma_engine *sde)
+{
+	u16 head, tail;
+	struct hw_sdma_desc *descqp;
+	u64 desc[2];
+	u64 addr;
+	u8 gen;
+	u16 len;
+
+	head = sde->descq_head & sde->sdma_mask;
+	tail = ACCESS_ONCE(sde->descq_tail) & sde->sdma_mask;
+	seq_printf(s, SDE_FMT, sde->this_idx,
+		sdma_state_name(sde->state.current_state),
+		(unsigned long long)read_sde_csr(sde, SD(CTRL)),
+		(unsigned long long)read_sde_csr(sde, SD(STATUS)),
+		(unsigned long long)read_sde_csr(sde,
+			SD(ENG_ERR_STATUS)),
+		(unsigned long long)read_sde_csr(sde, SD(TAIL)),
+		tail,
+		(unsigned long long)read_sde_csr(sde, SD(HEAD)),
+		head,
+		(unsigned long long)le64_to_cpu(*sde->head_dma),
+		(unsigned long long)read_sde_csr(sde, SD(MEMORY)),
+		(unsigned long long)read_sde_csr(sde, SD(LEN_GEN)),
+		(unsigned long long)read_sde_csr(sde, SD(RELOAD_CNT)),
+		(unsigned long long)sde->last_status,
+		(unsigned long long)sde->ahg_bits,
+		sde->tx_tail,
+		sde->tx_head,
+		sde->descq_tail,
+		sde->descq_head,
+		   !list_empty(&sde->flushlist),
+		sde->descq_full_count,
+		(unsigned long long)read_sde_csr(sde, SEND_DMA_CHECK_SLID));
+
+	/* print info for each entry in the descriptor queue */
+	while (head != tail) {
+		char flags[6] = { 'x', 'x', 'x', 'x', 0 };
+
+		descqp = &sde->descq[head];
+		desc[0] = le64_to_cpu(descqp->qw[0]);
+		desc[1] = le64_to_cpu(descqp->qw[1]);
+		flags[0] = (desc[1] & SDMA_DESC1_INT_REQ_FLAG) ? 'I' : '-';
+		flags[1] = (desc[1] & SDMA_DESC1_HEAD_TO_HOST_FLAG) ?
+				'H' : '-';
+		flags[2] = (desc[0] & SDMA_DESC0_FIRST_DESC_FLAG) ? 'F' : '-';
+		flags[3] = (desc[0] & SDMA_DESC0_LAST_DESC_FLAG) ? 'L' : '-';
+		addr = (desc[0] >> SDMA_DESC0_PHY_ADDR_SHIFT)
+			& SDMA_DESC0_PHY_ADDR_MASK;
+		gen = (desc[1] >> SDMA_DESC1_GENERATION_SHIFT)
+			& SDMA_DESC1_GENERATION_MASK;
+		len = (desc[0] >> SDMA_DESC0_BYTE_COUNT_SHIFT)
+			& SDMA_DESC0_BYTE_COUNT_MASK;
+		seq_printf(s,
+			"\tdesc[%u]: flags:%s addr:0x%016llx gen:%u len:%u bytes\n",
+			head, flags, addr, gen, len);
+		if (desc[0] & SDMA_DESC0_FIRST_DESC_FLAG)
+			seq_printf(s, "\t\tahgidx: %u ahgmode: %u\n",
+				(u8)((desc[1] & SDMA_DESC1_HEADER_INDEX_SMASK)
+					>> SDMA_DESC1_HEADER_INDEX_MASK),
+				(u8)((desc[1] & SDMA_DESC1_HEADER_MODE_SMASK)
+					>> SDMA_DESC1_HEADER_MODE_SHIFT));
+		head = (head + 1) & sde->sdma_mask;
+	}
+}
+
+/*
+ * add the generation number into
+ * the qw1 and return
+ */
+static inline u64 add_gen(struct sdma_engine *sde, u64 qw1)
+{
+	u8 generation = (sde->descq_tail >> sde->sdma_shift) & 3;
+
+	qw1 &= ~SDMA_DESC1_GENERATION_SMASK;
+	qw1 |= ((u64)generation & SDMA_DESC1_GENERATION_MASK)
+			<< SDMA_DESC1_GENERATION_SHIFT;
+	return qw1;
+}
+
+/*
+ * This routine submits the indicated tx
+ *
+ * Space has already been guaranteed and
+ * tail side of ring is locked.
+ *
+ * The hardware tail update is done
+ * in the caller and that is facilitated
+ * by returning the new tail.
+ *
+ * There is special case logic for ahg
+ * to not add the generation number for
+ * up to 2 descriptors that follow the
+ * first descriptor.
+ *
+ */
+static inline u16 submit_tx(struct sdma_engine *sde, struct sdma_txreq *tx)
+{
+	int i;
+	u16 tail;
+	struct sdma_desc *descp = tx->descp;
+	u8 skip = 0, mode = ahg_mode(tx);
+
+	tail = sde->descq_tail & sde->sdma_mask;
+	sde->descq[tail].qw[0] = cpu_to_le64(descp->qw[0]);
+	sde->descq[tail].qw[1] = cpu_to_le64(add_gen(sde, descp->qw[1]));
+	trace_hfi1_sdma_descriptor(sde, descp->qw[0], descp->qw[1],
+				   tail, &sde->descq[tail]);
+	tail = ++sde->descq_tail & sde->sdma_mask;
+	descp++;
+	if (mode > SDMA_AHG_APPLY_UPDATE1)
+		skip = mode >> 1;
+	for (i = 1; i < tx->num_desc; i++, descp++) {
+		u64 qw1;
+
+		sde->descq[tail].qw[0] = cpu_to_le64(descp->qw[0]);
+		if (skip) {
+			/* edits don't have generation */
+			qw1 = descp->qw[1];
+			skip--;
+		} else {
+			/* replace generation with real one for non-edits */
+			qw1 = add_gen(sde, descp->qw[1]);
+		}
+		sde->descq[tail].qw[1] = cpu_to_le64(qw1);
+		trace_hfi1_sdma_descriptor(sde, descp->qw[0], qw1,
+					   tail, &sde->descq[tail]);
+		tail = ++sde->descq_tail & sde->sdma_mask;
+	}
+	tx->next_descq_idx = tail;
+#ifdef CONFIG_HFI1_DEBUG_SDMA_ORDER
+	tx->sn = sde->tail_sn++;
+	trace_hfi1_sdma_in_sn(sde, tx->sn);
+	WARN_ON_ONCE(sde->tx_ring[sde->tx_tail & sde->sdma_mask]);
+#endif
+	sde->tx_ring[sde->tx_tail++ & sde->sdma_mask] = tx;
+	sde->desc_avail -= tx->num_desc;
+	return tail;
+}
+
+/*
+ * Check for progress
+ */
+static int sdma_check_progress(
+	struct sdma_engine *sde,
+	struct iowait *wait,
+	struct sdma_txreq *tx)
+{
+	int ret;
+
+	sde->desc_avail = sdma_descq_freecnt(sde);
+	if (tx->num_desc <= sde->desc_avail)
+		return -EAGAIN;
+	/* pulse the head_lock */
+	if (wait && wait->sleep) {
+		unsigned seq;
+
+		seq = raw_seqcount_begin(
+			(const seqcount_t *)&sde->head_lock.seqcount);
+		ret = wait->sleep(sde, wait, tx, seq);
+		if (ret == -EAGAIN)
+			sde->desc_avail = sdma_descq_freecnt(sde);
+	} else
+		ret = -EBUSY;
+	return ret;
+}
+
+/**
+ * sdma_send_txreq() - submit a tx req to ring
+ * @sde: sdma engine to use
+ * @wait: wait structure to use when full (may be NULL)
+ * @tx: sdma_txreq to submit
+ *
+ * The call submits the tx into the ring.  If a iowait structure is non-NULL
+ * the packet will be queued to the list in wait.
+ *
+ * Return:
+ * 0 - Success, -EINVAL - sdma_txreq incomplete, -EBUSY - no space in
+ * ring (wait == NULL)
+ * -EIOCBQUEUED - tx queued to iowait, -ECOMM bad sdma state
+ */
+int sdma_send_txreq(struct sdma_engine *sde,
+		    struct iowait *wait,
+		    struct sdma_txreq *tx)
+{
+	int ret = 0;
+	u16 tail;
+	unsigned long flags;
+
+	/* user should have supplied entire packet */
+	if (unlikely(tx->tlen))
+		return -EINVAL;
+	tx->wait = wait;
+	spin_lock_irqsave(&sde->tail_lock, flags);
+retry:
+	if (unlikely(!__sdma_running(sde)))
+		goto unlock_noconn;
+	if (unlikely(tx->num_desc > sde->desc_avail))
+		goto nodesc;
+	tail = submit_tx(sde, tx);
+	if (wait)
+		atomic_inc(&wait->sdma_busy);
+	sdma_update_tail(sde, tail);
+unlock:
+	spin_unlock_irqrestore(&sde->tail_lock, flags);
+	return ret;
+unlock_noconn:
+	if (wait)
+		atomic_inc(&wait->sdma_busy);
+	tx->next_descq_idx = 0;
+#ifdef CONFIG_HFI1_DEBUG_SDMA_ORDER
+	tx->sn = sde->tail_sn++;
+	trace_hfi1_sdma_in_sn(sde, tx->sn);
+#endif
+	spin_lock(&sde->flushlist_lock);
+	list_add_tail(&tx->list, &sde->flushlist);
+	spin_unlock(&sde->flushlist_lock);
+	if (wait) {
+		wait->tx_count++;
+		wait->count += tx->num_desc;
+	}
+	schedule_work(&sde->flush_worker);
+	ret = -ECOMM;
+	goto unlock;
+nodesc:
+	ret = sdma_check_progress(sde, wait, tx);
+	if (ret == -EAGAIN) {
+		ret = 0;
+		goto retry;
+	}
+	sde->descq_full_count++;
+	goto unlock;
+}
+
+/**
+ * sdma_send_txlist() - submit a list of tx req to ring
+ * @sde: sdma engine to use
+ * @wait: wait structure to use when full (may be NULL)
+ * @tx_list: list of sdma_txreqs to submit
+ *
+ * The call submits the list into the ring.
+ *
+ * If the iowait structure is non-NULL and not equal to the iowait list
+ * the unprocessed part of the list  will be appended to the list in wait.
+ *
+ * In all cases, the tx_list will be updated so the head of the tx_list is
+ * the list of descriptors that have yet to be transmitted.
+ *
+ * The intent of this call is to provide a more efficient
+ * way of submitting multiple packets to SDMA while holding the tail
+ * side locking.
+ *
+ * Return:
+ * 0 - Success, -EINVAL - sdma_txreq incomplete, -EBUSY - no space in ring
+ * (wait == NULL)
+ * -EIOCBQUEUED - tx queued to iowait, -ECOMM bad sdma state
+ */
+int sdma_send_txlist(struct sdma_engine *sde,
+		    struct iowait *wait,
+		    struct list_head *tx_list)
+{
+	struct sdma_txreq *tx, *tx_next;
+	int ret = 0;
+	unsigned long flags;
+	u16 tail = INVALID_TAIL;
+	int count = 0;
+
+	spin_lock_irqsave(&sde->tail_lock, flags);
+retry:
+	list_for_each_entry_safe(tx, tx_next, tx_list, list) {
+		tx->wait = wait;
+		if (unlikely(!__sdma_running(sde)))
+			goto unlock_noconn;
+		if (unlikely(tx->num_desc > sde->desc_avail))
+			goto nodesc;
+		if (unlikely(tx->tlen)) {
+			ret = -EINVAL;
+			goto update_tail;
+		}
+		list_del_init(&tx->list);
+		tail = submit_tx(sde, tx);
+		count++;
+		if (tail != INVALID_TAIL &&
+		    (count & SDMA_TAIL_UPDATE_THRESH) == 0) {
+			sdma_update_tail(sde, tail);
+			tail = INVALID_TAIL;
+		}
+	}
+update_tail:
+	if (wait)
+		atomic_add(count, &wait->sdma_busy);
+	if (tail != INVALID_TAIL)
+		sdma_update_tail(sde, tail);
+	spin_unlock_irqrestore(&sde->tail_lock, flags);
+	return ret;
+unlock_noconn:
+	spin_lock(&sde->flushlist_lock);
+	list_for_each_entry_safe(tx, tx_next, tx_list, list) {
+		tx->wait = wait;
+		list_del_init(&tx->list);
+		if (wait)
+			atomic_inc(&wait->sdma_busy);
+		tx->next_descq_idx = 0;
+#ifdef CONFIG_HFI1_DEBUG_SDMA_ORDER
+		tx->sn = sde->tail_sn++;
+		trace_hfi1_sdma_in_sn(sde, tx->sn);
+#endif
+		list_add_tail(&tx->list, &sde->flushlist);
+		if (wait) {
+			wait->tx_count++;
+			wait->count += tx->num_desc;
+		}
+	}
+	spin_unlock(&sde->flushlist_lock);
+	schedule_work(&sde->flush_worker);
+	ret = -ECOMM;
+	goto update_tail;
+nodesc:
+	ret = sdma_check_progress(sde, wait, tx);
+	if (ret == -EAGAIN) {
+		ret = 0;
+		goto retry;
+	}
+	sde->descq_full_count++;
+	goto update_tail;
+}
+
+static void sdma_process_event(struct sdma_engine *sde,
+	enum sdma_events event)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&sde->tail_lock, flags);
+	write_seqlock(&sde->head_lock);
+
+	__sdma_process_event(sde, event);
+
+	if (sde->state.current_state == sdma_state_s99_running)
+		sdma_desc_avail(sde, sdma_descq_freecnt(sde));
+
+	write_sequnlock(&sde->head_lock);
+	spin_unlock_irqrestore(&sde->tail_lock, flags);
+}
+
+static void __sdma_process_event(struct sdma_engine *sde,
+	enum sdma_events event)
+{
+	struct sdma_state *ss = &sde->state;
+	int need_progress = 0;
+
+	/* CONFIG SDMA temporary */
+#ifdef CONFIG_SDMA_VERBOSITY
+	dd_dev_err(sde->dd, "CONFIG SDMA(%u) [%s] %s\n", sde->this_idx,
+		   sdma_state_names[ss->current_state],
+		   sdma_event_names[event]);
+#endif
+
+	switch (ss->current_state) {
+	case sdma_state_s00_hw_down:
+		switch (event) {
+		case sdma_event_e00_go_hw_down:
+			break;
+		case sdma_event_e30_go_running:
+			/*
+			 * If down, but running requested (usually result
+			 * of link up, then we need to start up.
+			 * This can happen when hw down is requested while
+			 * bringing the link up with traffic active on
+			 * 7220, e.g. */
+			ss->go_s99_running = 1;
+			/* fall through and start dma engine */
+		case sdma_event_e10_go_hw_start:
+			/* This reference means the state machine is started */
+			sdma_get(&sde->state);
+			sdma_set_state(sde,
+				sdma_state_s10_hw_start_up_halt_wait);
+			break;
+		case sdma_event_e15_hw_halt_done:
+			break;
+		case sdma_event_e25_hw_clean_up_done:
+			break;
+		case sdma_event_e40_sw_cleaned:
+			sdma_sw_tear_down(sde);
+			break;
+		case sdma_event_e50_hw_cleaned:
+			break;
+		case sdma_event_e60_hw_halted:
+			break;
+		case sdma_event_e70_go_idle:
+			break;
+		case sdma_event_e80_hw_freeze:
+			break;
+		case sdma_event_e81_hw_frozen:
+			break;
+		case sdma_event_e82_hw_unfreeze:
+			break;
+		case sdma_event_e85_link_down:
+			break;
+		case sdma_event_e90_sw_halted:
+			break;
+		}
+		break;
+
+	case sdma_state_s10_hw_start_up_halt_wait:
+		switch (event) {
+		case sdma_event_e00_go_hw_down:
+			sdma_set_state(sde, sdma_state_s00_hw_down);
+			sdma_sw_tear_down(sde);
+			break;
+		case sdma_event_e10_go_hw_start:
+			break;
+		case sdma_event_e15_hw_halt_done:
+			sdma_set_state(sde,
+				sdma_state_s15_hw_start_up_clean_wait);
+			sdma_start_hw_clean_up(sde);
+			break;
+		case sdma_event_e25_hw_clean_up_done:
+			break;
+		case sdma_event_e30_go_running:
+			ss->go_s99_running = 1;
+			break;
+		case sdma_event_e40_sw_cleaned:
+			break;
+		case sdma_event_e50_hw_cleaned:
+			break;
+		case sdma_event_e60_hw_halted:
+			sdma_start_err_halt_wait(sde);
+			break;
+		case sdma_event_e70_go_idle:
+			ss->go_s99_running = 0;
+			break;
+		case sdma_event_e80_hw_freeze:
+			break;
+		case sdma_event_e81_hw_frozen:
+			break;
+		case sdma_event_e82_hw_unfreeze:
+			break;
+		case sdma_event_e85_link_down:
+			break;
+		case sdma_event_e90_sw_halted:
+			break;
+		}
+		break;
+
+	case sdma_state_s15_hw_start_up_clean_wait:
+		switch (event) {
+		case sdma_event_e00_go_hw_down:
+			sdma_set_state(sde, sdma_state_s00_hw_down);
+			sdma_sw_tear_down(sde);
+			break;
+		case sdma_event_e10_go_hw_start:
+			break;
+		case sdma_event_e15_hw_halt_done:
+			break;
+		case sdma_event_e25_hw_clean_up_done:
+			sdma_hw_start_up(sde);
+			sdma_set_state(sde, ss->go_s99_running ?
+				       sdma_state_s99_running :
+				       sdma_state_s20_idle);
+			break;
+		case sdma_event_e30_go_running:
+			ss->go_s99_running = 1;
+			break;
+		case sdma_event_e40_sw_cleaned:
+			break;
+		case sdma_event_e50_hw_cleaned:
+			break;
+		case sdma_event_e60_hw_halted:
+			break;
+		case sdma_event_e70_go_idle:
+			ss->go_s99_running = 0;
+			break;
+		case sdma_event_e80_hw_freeze:
+			break;
+		case sdma_event_e81_hw_frozen:
+			break;
+		case sdma_event_e82_hw_unfreeze:
+			break;
+		case sdma_event_e85_link_down:
+			break;
+		case sdma_event_e90_sw_halted:
+			break;
+		}
+		break;
+
+	case sdma_state_s20_idle:
+		switch (event) {
+		case sdma_event_e00_go_hw_down:
+			sdma_set_state(sde, sdma_state_s00_hw_down);
+			sdma_sw_tear_down(sde);
+			break;
+		case sdma_event_e10_go_hw_start:
+			break;
+		case sdma_event_e15_hw_halt_done:
+			break;
+		case sdma_event_e25_hw_clean_up_done:
+			break;
+		case sdma_event_e30_go_running:
+			sdma_set_state(sde, sdma_state_s99_running);
+			ss->go_s99_running = 1;
+			break;
+		case sdma_event_e40_sw_cleaned:
+			break;
+		case sdma_event_e50_hw_cleaned:
+			break;
+		case sdma_event_e60_hw_halted:
+			sdma_set_state(sde, sdma_state_s50_hw_halt_wait);
+			sdma_start_err_halt_wait(sde);
+			break;
+		case sdma_event_e70_go_idle:
+			break;
+		case sdma_event_e85_link_down:
+			/* fall through */
+		case sdma_event_e80_hw_freeze:
+			sdma_set_state(sde, sdma_state_s80_hw_freeze);
+			atomic_dec(&sde->dd->sdma_unfreeze_count);
+			wake_up_interruptible(&sde->dd->sdma_unfreeze_wq);
+			break;
+		case sdma_event_e81_hw_frozen:
+			break;
+		case sdma_event_e82_hw_unfreeze:
+			break;
+		case sdma_event_e90_sw_halted:
+			break;
+		}
+		break;
+
+	case sdma_state_s30_sw_clean_up_wait:
+		switch (event) {
+		case sdma_event_e00_go_hw_down:
+			sdma_set_state(sde, sdma_state_s00_hw_down);
+			break;
+		case sdma_event_e10_go_hw_start:
+			break;
+		case sdma_event_e15_hw_halt_done:
+			break;
+		case sdma_event_e25_hw_clean_up_done:
+			break;
+		case sdma_event_e30_go_running:
+			ss->go_s99_running = 1;
+			break;
+		case sdma_event_e40_sw_cleaned:
+			sdma_set_state(sde, sdma_state_s40_hw_clean_up_wait);
+			sdma_start_hw_clean_up(sde);
+			break;
+		case sdma_event_e50_hw_cleaned:
+			break;
+		case sdma_event_e60_hw_halted:
+			break;
+		case sdma_event_e70_go_idle:
+			ss->go_s99_running = 0;
+			break;
+		case sdma_event_e80_hw_freeze:
+			break;
+		case sdma_event_e81_hw_frozen:
+			break;
+		case sdma_event_e82_hw_unfreeze:
+			break;
+		case sdma_event_e85_link_down:
+			ss->go_s99_running = 0;
+			break;
+		case sdma_event_e90_sw_halted:
+			break;
+		}
+		break;
+
+	case sdma_state_s40_hw_clean_up_wait:
+		switch (event) {
+		case sdma_event_e00_go_hw_down:
+			sdma_set_state(sde, sdma_state_s00_hw_down);
+			sdma_start_sw_clean_up(sde);
+			break;
+		case sdma_event_e10_go_hw_start:
+			break;
+		case sdma_event_e15_hw_halt_done:
+			break;
+		case sdma_event_e25_hw_clean_up_done:
+			sdma_hw_start_up(sde);
+			sdma_set_state(sde, ss->go_s99_running ?
+				       sdma_state_s99_running :
+				       sdma_state_s20_idle);
+			break;
+		case sdma_event_e30_go_running:
+			ss->go_s99_running = 1;
+			break;
+		case sdma_event_e40_sw_cleaned:
+			break;
+		case sdma_event_e50_hw_cleaned:
+			break;
+		case sdma_event_e60_hw_halted:
+			break;
+		case sdma_event_e70_go_idle:
+			ss->go_s99_running = 0;
+			break;
+		case sdma_event_e80_hw_freeze:
+			break;
+		case sdma_event_e81_hw_frozen:
+			break;
+		case sdma_event_e82_hw_unfreeze:
+			break;
+		case sdma_event_e85_link_down:
+			ss->go_s99_running = 0;
+			break;
+		case sdma_event_e90_sw_halted:
+			break;
+		}
+		break;
+
+	case sdma_state_s50_hw_halt_wait:
+		switch (event) {
+		case sdma_event_e00_go_hw_down:
+			sdma_set_state(sde, sdma_state_s00_hw_down);
+			sdma_start_sw_clean_up(sde);
+			break;
+		case sdma_event_e10_go_hw_start:
+			break;
+		case sdma_event_e15_hw_halt_done:
+			sdma_set_state(sde, sdma_state_s30_sw_clean_up_wait);
+			sdma_start_sw_clean_up(sde);
+			break;
+		case sdma_event_e25_hw_clean_up_done:
+			break;
+		case sdma_event_e30_go_running:
+			ss->go_s99_running = 1;
+			break;
+		case sdma_event_e40_sw_cleaned:
+			break;
+		case sdma_event_e50_hw_cleaned:
+			break;
+		case sdma_event_e60_hw_halted:
+			sdma_start_err_halt_wait(sde);
+			break;
+		case sdma_event_e70_go_idle:
+			ss->go_s99_running = 0;
+			break;
+		case sdma_event_e80_hw_freeze:
+			break;
+		case sdma_event_e81_hw_frozen:
+			break;
+		case sdma_event_e82_hw_unfreeze:
+			break;
+		case sdma_event_e85_link_down:
+			ss->go_s99_running = 0;
+			break;
+		case sdma_event_e90_sw_halted:
+			break;
+		}
+		break;
+
+	case sdma_state_s60_idle_halt_wait:
+		switch (event) {
+		case sdma_event_e00_go_hw_down:
+			sdma_set_state(sde, sdma_state_s00_hw_down);
+			sdma_start_sw_clean_up(sde);
+			break;
+		case sdma_event_e10_go_hw_start:
+			break;
+		case sdma_event_e15_hw_halt_done:
+			sdma_set_state(sde, sdma_state_s30_sw_clean_up_wait);
+			sdma_start_sw_clean_up(sde);
+			break;
+		case sdma_event_e25_hw_clean_up_done:
+			break;
+		case sdma_event_e30_go_running:
+			ss->go_s99_running = 1;
+			break;
+		case sdma_event_e40_sw_cleaned:
+			break;
+		case sdma_event_e50_hw_cleaned:
+			break;
+		case sdma_event_e60_hw_halted:
+			sdma_start_err_halt_wait(sde);
+			break;
+		case sdma_event_e70_go_idle:
+			ss->go_s99_running = 0;
+			break;
+		case sdma_event_e80_hw_freeze:
+			break;
+		case sdma_event_e81_hw_frozen:
+			break;
+		case sdma_event_e82_hw_unfreeze:
+			break;
+		case sdma_event_e85_link_down:
+			break;
+		case sdma_event_e90_sw_halted:
+			break;
+		}
+		break;
+
+	case sdma_state_s80_hw_freeze:
+		switch (event) {
+		case sdma_event_e00_go_hw_down:
+			sdma_set_state(sde, sdma_state_s00_hw_down);
+			sdma_start_sw_clean_up(sde);
+			break;
+		case sdma_event_e10_go_hw_start:
+			break;
+		case sdma_event_e15_hw_halt_done:
+			break;
+		case sdma_event_e25_hw_clean_up_done:
+			break;
+		case sdma_event_e30_go_running:
+			ss->go_s99_running = 1;
+			break;
+		case sdma_event_e40_sw_cleaned:
+			break;
+		case sdma_event_e50_hw_cleaned:
+			break;
+		case sdma_event_e60_hw_halted:
+			break;
+		case sdma_event_e70_go_idle:
+			ss->go_s99_running = 0;
+			break;
+		case sdma_event_e80_hw_freeze:
+			break;
+		case sdma_event_e81_hw_frozen:
+			sdma_set_state(sde, sdma_state_s82_freeze_sw_clean);
+			sdma_start_sw_clean_up(sde);
+			break;
+		case sdma_event_e82_hw_unfreeze:
+			break;
+		case sdma_event_e85_link_down:
+			break;
+		case sdma_event_e90_sw_halted:
+			break;
+		}
+		break;
+
+	case sdma_state_s82_freeze_sw_clean:
+		switch (event) {
+		case sdma_event_e00_go_hw_down:
+			sdma_set_state(sde, sdma_state_s00_hw_down);
+			sdma_start_sw_clean_up(sde);
+			break;
+		case sdma_event_e10_go_hw_start:
+			break;
+		case sdma_event_e15_hw_halt_done:
+			break;
+		case sdma_event_e25_hw_clean_up_done:
+			break;
+		case sdma_event_e30_go_running:
+			ss->go_s99_running = 1;
+			break;
+		case sdma_event_e40_sw_cleaned:
+			/* notify caller this engine is done cleaning */
+			atomic_dec(&sde->dd->sdma_unfreeze_count);
+			wake_up_interruptible(&sde->dd->sdma_unfreeze_wq);
+			break;
+		case sdma_event_e50_hw_cleaned:
+			break;
+		case sdma_event_e60_hw_halted:
+			break;
+		case sdma_event_e70_go_idle:
+			ss->go_s99_running = 0;
+			break;
+		case sdma_event_e80_hw_freeze:
+			break;
+		case sdma_event_e81_hw_frozen:
+			break;
+		case sdma_event_e82_hw_unfreeze:
+			sdma_hw_start_up(sde);
+			sdma_set_state(sde, ss->go_s99_running ?
+				       sdma_state_s99_running :
+				       sdma_state_s20_idle);
+			break;
+		case sdma_event_e85_link_down:
+			break;
+		case sdma_event_e90_sw_halted:
+			break;
+		}
+		break;
+
+	case sdma_state_s99_running:
+		switch (event) {
+		case sdma_event_e00_go_hw_down:
+			sdma_set_state(sde, sdma_state_s00_hw_down);
+			sdma_start_sw_clean_up(sde);
+			break;
+		case sdma_event_e10_go_hw_start:
+			break;
+		case sdma_event_e15_hw_halt_done:
+			break;
+		case sdma_event_e25_hw_clean_up_done:
+			break;
+		case sdma_event_e30_go_running:
+			break;
+		case sdma_event_e40_sw_cleaned:
+			break;
+		case sdma_event_e50_hw_cleaned:
+			break;
+		case sdma_event_e60_hw_halted:
+			need_progress = 1;
+			sdma_err_progress_check_schedule(sde);
+		case sdma_event_e90_sw_halted:
+			/*
+			* SW initiated halt does not perform engines
+			* progress check
+			*/
+			sdma_set_state(sde, sdma_state_s50_hw_halt_wait);
+			sdma_start_err_halt_wait(sde);
+			break;
+		case sdma_event_e70_go_idle:
+			sdma_set_state(sde, sdma_state_s60_idle_halt_wait);
+			break;
+		case sdma_event_e85_link_down:
+			ss->go_s99_running = 0;
+			/* fall through */
+		case sdma_event_e80_hw_freeze:
+			sdma_set_state(sde, sdma_state_s80_hw_freeze);
+			atomic_dec(&sde->dd->sdma_unfreeze_count);
+			wake_up_interruptible(&sde->dd->sdma_unfreeze_wq);
+			break;
+		case sdma_event_e81_hw_frozen:
+			break;
+		case sdma_event_e82_hw_unfreeze:
+			break;
+		}
+		break;
+	}
+
+	ss->last_event = event;
+	if (need_progress)
+		sdma_make_progress(sde, 0);
+}
+
+/*
+ * _extend_sdma_tx_descs() - helper to extend txreq
+ *
+ * This is called once the initial nominal allocation
+ * of descriptors in the sdma_txreq is exhausted.
+ *
+ * The code will bump the allocation up to the max
+ * of MAX_DESC (64) descriptors.  There doesn't seem
+ * much point in an interim step.
+ *
+ */
+int _extend_sdma_tx_descs(struct hfi1_devdata *dd, struct sdma_txreq *tx)
+{
+	int i;
+
+	tx->descp = kmalloc_array(
+			MAX_DESC,
+			sizeof(struct sdma_desc),
+			GFP_ATOMIC);
+	if (!tx->descp)
+		return -ENOMEM;
+	tx->desc_limit = MAX_DESC;
+	/* copy ones already built */
+	for (i = 0; i < tx->num_desc; i++)
+		tx->descp[i] = tx->descs[i];
+	return 0;
+}
+
+/* Update sdes when the lmc changes */
+void sdma_update_lmc(struct hfi1_devdata *dd, u64 mask, u32 lid)
+{
+	struct sdma_engine *sde;
+	int i;
+	u64 sreg;
+
+	sreg = ((mask & SD(CHECK_SLID_MASK_MASK)) <<
+		SD(CHECK_SLID_MASK_SHIFT)) |
+		(((lid & mask) & SD(CHECK_SLID_VALUE_MASK)) <<
+		SD(CHECK_SLID_VALUE_SHIFT));
+
+	for (i = 0; i < dd->num_sdma; i++) {
+		hfi1_cdbg(LINKVERB, "SendDmaEngine[%d].SLID_CHECK = 0x%x",
+			  i, (u32)sreg);
+		sde = &dd->per_sdma[i];
+		write_sde_csr(sde, SD(CHECK_SLID), sreg);
+	}
+}
+
+/* tx not dword sized - pad */
+int _pad_sdma_tx_descs(struct hfi1_devdata *dd, struct sdma_txreq *tx)
+{
+	int rval = 0;
+
+	if ((unlikely(tx->num_desc == tx->desc_limit))) {
+		rval = _extend_sdma_tx_descs(dd, tx);
+		if (rval)
+			return rval;
+	}
+	/* finish the one just added  */
+	tx->num_desc++;
+	make_tx_sdma_desc(
+		tx,
+		SDMA_MAP_NONE,
+		dd->sdma_pad_phys,
+		sizeof(u32) - (tx->packet_len & (sizeof(u32) - 1)));
+	_sdma_close_tx(dd, tx);
+	return rval;
+}
+
+/*
+ * Add ahg to the sdma_txreq
+ *
+ * The logic will consume up to 3
+ * descriptors at the beginning of
+ * sdma_txreq.
+ */
+void _sdma_txreq_ahgadd(
+	struct sdma_txreq *tx,
+	u8 num_ahg,
+	u8 ahg_entry,
+	u32 *ahg,
+	u8 ahg_hlen)
+{
+	u32 i, shift = 0, desc = 0;
+	u8 mode;
+
+	WARN_ON_ONCE(num_ahg > 9 || (ahg_hlen & 3) || ahg_hlen == 4);
+	/* compute mode */
+	if (num_ahg == 1)
+		mode = SDMA_AHG_APPLY_UPDATE1;
+	else if (num_ahg <= 5)
+		mode = SDMA_AHG_APPLY_UPDATE2;
+	else
+		mode = SDMA_AHG_APPLY_UPDATE3;
+	tx->num_desc++;
+	/* initialize to consumed descriptors to zero */
+	switch (mode) {
+	case SDMA_AHG_APPLY_UPDATE3:
+		tx->num_desc++;
+		tx->descs[2].qw[0] = 0;
+		tx->descs[2].qw[1] = 0;
+		/* FALLTHROUGH */
+	case SDMA_AHG_APPLY_UPDATE2:
+		tx->num_desc++;
+		tx->descs[1].qw[0] = 0;
+		tx->descs[1].qw[1] = 0;
+		break;
+	}
+	ahg_hlen >>= 2;
+	tx->descs[0].qw[1] |=
+		(((u64)ahg_entry & SDMA_DESC1_HEADER_INDEX_MASK)
+			<< SDMA_DESC1_HEADER_INDEX_SHIFT) |
+		(((u64)ahg_hlen & SDMA_DESC1_HEADER_DWS_MASK)
+			<< SDMA_DESC1_HEADER_DWS_SHIFT) |
+		(((u64)mode & SDMA_DESC1_HEADER_MODE_MASK)
+			<< SDMA_DESC1_HEADER_MODE_SHIFT) |
+		(((u64)ahg[0] & SDMA_DESC1_HEADER_UPDATE1_MASK)
+			<< SDMA_DESC1_HEADER_UPDATE1_SHIFT);
+	for (i = 0; i < (num_ahg - 1); i++) {
+		if (!shift && !(i & 2))
+			desc++;
+		tx->descs[desc].qw[!!(i & 2)] |=
+			(((u64)ahg[i + 1])
+				<< shift);
+		shift = (shift + 32) & 63;
+	}
+}
+
+/**
+ * sdma_ahg_alloc - allocate an AHG entry
+ * @sde: engine to allocate from
+ *
+ * Return:
+ * 0-31 when successful, -EOPNOTSUPP if AHG is not enabled,
+ * -ENOSPC if an entry is not available
+ */
+int sdma_ahg_alloc(struct sdma_engine *sde)
+{
+	int nr;
+	int oldbit;
+
+	if (!sde) {
+		trace_hfi1_ahg_allocate(sde, -EINVAL);
+		return -EINVAL;
+	}
+	while (1) {
+		nr = ffz(ACCESS_ONCE(sde->ahg_bits));
+		if (nr > 31) {
+			trace_hfi1_ahg_allocate(sde, -ENOSPC);
+			return -ENOSPC;
+		}
+		oldbit = test_and_set_bit(nr, &sde->ahg_bits);
+		if (!oldbit)
+			break;
+		cpu_relax();
+	}
+	trace_hfi1_ahg_allocate(sde, nr);
+	return nr;
+}
+
+/**
+ * sdma_ahg_free - free an AHG entry
+ * @sde: engine to return AHG entry
+ * @ahg_index: index to free
+ *
+ * This routine frees the indicate AHG entry.
+ */
+void sdma_ahg_free(struct sdma_engine *sde, int ahg_index)
+{
+	if (!sde)
+		return;
+	trace_hfi1_ahg_deallocate(sde, ahg_index);
+	if (ahg_index < 0 || ahg_index > 31)
+		return;
+	clear_bit(ahg_index, &sde->ahg_bits);
+}
+
+/*
+ * SPC freeze handling for SDMA engines.  Called when the driver knows
+ * the SPC is going into a freeze but before the freeze is fully
+ * settled.  Generally an error interrupt.
+ *
+ * This event will pull the engine out of running so no more entries can be
+ * added to the engine's queue.
+ */
+void sdma_freeze_notify(struct hfi1_devdata *dd, int link_down)
+{
+	int i;
+	enum sdma_events event = link_down ? sdma_event_e85_link_down :
+					     sdma_event_e80_hw_freeze;
+
+	/* set up the wait but do not wait here */
+	atomic_set(&dd->sdma_unfreeze_count, dd->num_sdma);
+
+	/* tell all engines to stop running and wait */
+	for (i = 0; i < dd->num_sdma; i++)
+		sdma_process_event(&dd->per_sdma[i], event);
+
+	/* sdma_freeze() will wait for all engines to have stopped */
+}
+
+/*
+ * SPC freeze handling for SDMA engines.  Called when the driver knows
+ * the SPC is fully frozen.
+ */
+void sdma_freeze(struct hfi1_devdata *dd)
+{
+	int i;
+	int ret;
+
+	/*
+	 * Make sure all engines have moved out of the running state before
+	 * continuing.
+	 */
+	ret = wait_event_interruptible(dd->sdma_unfreeze_wq,
+				atomic_read(&dd->sdma_unfreeze_count) <= 0);
+	/* interrupted or count is negative, then unloading - just exit */
+	if (ret || atomic_read(&dd->sdma_unfreeze_count) < 0)
+		return;
+
+	/* set up the count for the next wait */
+	atomic_set(&dd->sdma_unfreeze_count, dd->num_sdma);
+
+	/* tell all engines that the SPC is frozen, they can start cleaning */
+	for (i = 0; i < dd->num_sdma; i++)
+		sdma_process_event(&dd->per_sdma[i], sdma_event_e81_hw_frozen);
+
+	/*
+	 * Wait for everyone to finish software clean before exiting.  The
+	 * software clean will read engine CSRs, so must be completed before
+	 * the next step, which will clear the engine CSRs.
+	 */
+	(void) wait_event_interruptible(dd->sdma_unfreeze_wq,
+				atomic_read(&dd->sdma_unfreeze_count) <= 0);
+	/* no need to check results - done no matter what */
+}
+
+/*
+ * SPC freeze handling for the SDMA engines.  Called after the SPC is unfrozen.
+ *
+ * The SPC freeze acts like a SDMA halt and a hardware clean combined.  All
+ * that is left is a software clean.  We could do it after the SPC is fully
+ * frozen, but then we'd have to add another state to wait for the unfreeze.
+ * Instead, just defer the software clean until the unfreeze step.
+ */
+void sdma_unfreeze(struct hfi1_devdata *dd)
+{
+	int i;
+
+	/* tell all engines start freeze clean up */
+	for (i = 0; i < dd->num_sdma; i++)
+		sdma_process_event(&dd->per_sdma[i],
+					sdma_event_e82_hw_unfreeze);
+}
+
+/**
+ * _sdma_engine_progress_schedule() - schedule progress on engine
+ * @sde: sdma_engine to schedule progress
+ *
+ */
+void _sdma_engine_progress_schedule(
+	struct sdma_engine *sde)
+{
+	trace_hfi1_sdma_engine_progress(sde, sde->progress_mask);
+	/* assume we have selected a good cpu */
+	write_csr(sde->dd,
+		  CCE_INT_FORCE + (8*(IS_SDMA_START/64)), sde->progress_mask);
+}
diff --git a/drivers/staging/rdma/hfi1/sdma.h b/drivers/staging/rdma/hfi1/sdma.h
new file mode 100644
index 000000000000..1e613fcd8f4c
--- /dev/null
+++ b/drivers/staging/rdma/hfi1/sdma.h
@@ -0,0 +1,1123 @@
+#ifndef _HFI1_SDMA_H
+#define _HFI1_SDMA_H
+/*
+ *
+ * This file is provided under a dual BSD/GPLv2 license.  When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * GPL LICENSE SUMMARY
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * BSD LICENSE
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ *  - Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ *  - Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in
+ *    the documentation and/or other materials provided with the
+ *    distribution.
+ *  - Neither the name of Intel Corporation nor the names of its
+ *    contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ */
+
+#include <linux/types.h>
+#include <linux/list.h>
+#include <asm/byteorder.h>
+#include <linux/workqueue.h>
+#include <linux/rculist.h>
+
+#include "hfi.h"
+#include "verbs.h"
+
+/* increased for AHG */
+#define NUM_DESC 6
+/* Hardware limit */
+#define MAX_DESC 64
+/* Hardware limit for SDMA packet size */
+#define MAX_SDMA_PKT_SIZE ((16 * 1024) - 1)
+
+
+#define SDMA_TXREQ_S_OK        0
+#define SDMA_TXREQ_S_SENDERROR 1
+#define SDMA_TXREQ_S_ABORTED   2
+#define SDMA_TXREQ_S_SHUTDOWN  3
+
+/* flags bits */
+#define SDMA_TXREQ_F_URGENT       0x0001
+#define SDMA_TXREQ_F_AHG_COPY     0x0002
+#define SDMA_TXREQ_F_USE_AHG      0x0004
+
+#define SDMA_MAP_NONE          0
+#define SDMA_MAP_SINGLE        1
+#define SDMA_MAP_PAGE          2
+
+#define SDMA_AHG_VALUE_MASK          0xffff
+#define SDMA_AHG_VALUE_SHIFT         0
+#define SDMA_AHG_INDEX_MASK          0xf
+#define SDMA_AHG_INDEX_SHIFT         16
+#define SDMA_AHG_FIELD_LEN_MASK      0xf
+#define SDMA_AHG_FIELD_LEN_SHIFT     20
+#define SDMA_AHG_FIELD_START_MASK    0x1f
+#define SDMA_AHG_FIELD_START_SHIFT   24
+#define SDMA_AHG_UPDATE_ENABLE_MASK  0x1
+#define SDMA_AHG_UPDATE_ENABLE_SHIFT 31
+
+/* AHG modes */
+
+/*
+ * Be aware the ordering and values
+ * for SDMA_AHG_APPLY_UPDATE[123]
+ * are assumed in generating a skip
+ * count in submit_tx() in sdma.c
+ */
+#define SDMA_AHG_NO_AHG              0
+#define SDMA_AHG_COPY                1
+#define SDMA_AHG_APPLY_UPDATE1       2
+#define SDMA_AHG_APPLY_UPDATE2       3
+#define SDMA_AHG_APPLY_UPDATE3       4
+
+/*
+ * Bits defined in the send DMA descriptor.
+ */
+#define SDMA_DESC0_FIRST_DESC_FLAG      (1ULL<<63)
+#define SDMA_DESC0_LAST_DESC_FLAG       (1ULL<<62)
+#define SDMA_DESC0_BYTE_COUNT_SHIFT     48
+#define SDMA_DESC0_BYTE_COUNT_WIDTH     14
+#define SDMA_DESC0_BYTE_COUNT_MASK \
+	((1ULL<<SDMA_DESC0_BYTE_COUNT_WIDTH)-1ULL)
+#define SDMA_DESC0_BYTE_COUNT_SMASK \
+	(SDMA_DESC0_BYTE_COUNT_MASK<<SDMA_DESC0_BYTE_COUNT_SHIFT)
+#define SDMA_DESC0_PHY_ADDR_SHIFT       0
+#define SDMA_DESC0_PHY_ADDR_WIDTH       48
+#define SDMA_DESC0_PHY_ADDR_MASK \
+	((1ULL<<SDMA_DESC0_PHY_ADDR_WIDTH)-1ULL)
+#define SDMA_DESC0_PHY_ADDR_SMASK \
+	(SDMA_DESC0_PHY_ADDR_MASK<<SDMA_DESC0_PHY_ADDR_SHIFT)
+
+#define SDMA_DESC1_HEADER_UPDATE1_SHIFT 32
+#define SDMA_DESC1_HEADER_UPDATE1_WIDTH 32
+#define SDMA_DESC1_HEADER_UPDATE1_MASK \
+	((1ULL<<SDMA_DESC1_HEADER_UPDATE1_WIDTH)-1ULL)
+#define SDMA_DESC1_HEADER_UPDATE1_SMASK \
+	(SDMA_DESC1_HEADER_UPDATE1_MASK<<SDMA_DESC1_HEADER_UPDATE1_SHIFT)
+#define SDMA_DESC1_HEADER_MODE_SHIFT    13
+#define SDMA_DESC1_HEADER_MODE_WIDTH    3
+#define SDMA_DESC1_HEADER_MODE_MASK \
+	((1ULL<<SDMA_DESC1_HEADER_MODE_WIDTH)-1ULL)
+#define SDMA_DESC1_HEADER_MODE_SMASK \
+	(SDMA_DESC1_HEADER_MODE_MASK<<SDMA_DESC1_HEADER_MODE_SHIFT)
+#define SDMA_DESC1_HEADER_INDEX_SHIFT   8
+#define SDMA_DESC1_HEADER_INDEX_WIDTH   5
+#define SDMA_DESC1_HEADER_INDEX_MASK \
+	((1ULL<<SDMA_DESC1_HEADER_INDEX_WIDTH)-1ULL)
+#define SDMA_DESC1_HEADER_INDEX_SMASK \
+	(SDMA_DESC1_HEADER_INDEX_MASK<<SDMA_DESC1_HEADER_INDEX_SHIFT)
+#define SDMA_DESC1_HEADER_DWS_SHIFT     4
+#define SDMA_DESC1_HEADER_DWS_WIDTH     4
+#define SDMA_DESC1_HEADER_DWS_MASK \
+	((1ULL<<SDMA_DESC1_HEADER_DWS_WIDTH)-1ULL)
+#define SDMA_DESC1_HEADER_DWS_SMASK \
+	(SDMA_DESC1_HEADER_DWS_MASK<<SDMA_DESC1_HEADER_DWS_SHIFT)
+#define SDMA_DESC1_GENERATION_SHIFT     2
+#define SDMA_DESC1_GENERATION_WIDTH     2
+#define SDMA_DESC1_GENERATION_MASK \
+	((1ULL<<SDMA_DESC1_GENERATION_WIDTH)-1ULL)
+#define SDMA_DESC1_GENERATION_SMASK \
+	(SDMA_DESC1_GENERATION_MASK<<SDMA_DESC1_GENERATION_SHIFT)
+#define SDMA_DESC1_INT_REQ_FLAG         (1ULL<<1)
+#define SDMA_DESC1_HEAD_TO_HOST_FLAG    (1ULL<<0)
+
+enum sdma_states {
+	sdma_state_s00_hw_down,
+	sdma_state_s10_hw_start_up_halt_wait,
+	sdma_state_s15_hw_start_up_clean_wait,
+	sdma_state_s20_idle,
+	sdma_state_s30_sw_clean_up_wait,
+	sdma_state_s40_hw_clean_up_wait,
+	sdma_state_s50_hw_halt_wait,
+	sdma_state_s60_idle_halt_wait,
+	sdma_state_s80_hw_freeze,
+	sdma_state_s82_freeze_sw_clean,
+	sdma_state_s99_running,
+};
+
+enum sdma_events {
+	sdma_event_e00_go_hw_down,
+	sdma_event_e10_go_hw_start,
+	sdma_event_e15_hw_halt_done,
+	sdma_event_e25_hw_clean_up_done,
+	sdma_event_e30_go_running,
+	sdma_event_e40_sw_cleaned,
+	sdma_event_e50_hw_cleaned,
+	sdma_event_e60_hw_halted,
+	sdma_event_e70_go_idle,
+	sdma_event_e80_hw_freeze,
+	sdma_event_e81_hw_frozen,
+	sdma_event_e82_hw_unfreeze,
+	sdma_event_e85_link_down,
+	sdma_event_e90_sw_halted,
+};
+
+struct sdma_set_state_action {
+	unsigned op_enable:1;
+	unsigned op_intenable:1;
+	unsigned op_halt:1;
+	unsigned op_cleanup:1;
+	unsigned go_s99_running_tofalse:1;
+	unsigned go_s99_running_totrue:1;
+};
+
+struct sdma_state {
+	struct kref          kref;
+	struct completion    comp;
+	enum sdma_states current_state;
+	unsigned             current_op;
+	unsigned             go_s99_running;
+	/* debugging/development */
+	enum sdma_states previous_state;
+	unsigned             previous_op;
+	enum sdma_events last_event;
+};
+
+/**
+ * DOC: sdma exported routines
+ *
+ * These sdma routines fit into three categories:
+ * - The SDMA API for building and submitting packets
+ *   to the ring
+ *
+ * - Initialization and tear down routines to buildup
+ *   and tear down SDMA
+ *
+ * - ISR entrances to handle interrupts, state changes
+ *   and errors
+ */
+
+/**
+ * DOC: sdma PSM/verbs API
+ *
+ * The sdma API is designed to be used by both PSM
+ * and verbs to supply packets to the SDMA ring.
+ *
+ * The usage of the API is as follows:
+ *
+ * Embed a struct iowait in the QP or
+ * PQ.  The iowait should be initialized with a
+ * call to iowait_init().
+ *
+ * The user of the API should create an allocation method
+ * for their version of the txreq. slabs, pre-allocated lists,
+ * and dma pools can be used.  Once the user's overload of
+ * the sdma_txreq has been allocated, the sdma_txreq member
+ * must be initialized with sdma_txinit() or sdma_txinit_ahg().
+ *
+ * The txreq must be declared with the sdma_txreq first.
+ *
+ * The tx request, once initialized,  is manipulated with calls to
+ * sdma_txadd_daddr(), sdma_txadd_page(), or sdma_txadd_kvaddr()
+ * for each disjoint memory location.  It is the user's responsibility
+ * to understand the packet boundaries and page boundaries to do the
+ * appropriate number of sdma_txadd_* calls..  The user
+ * must be prepared to deal with failures from these routines due to
+ * either memory allocation or dma_mapping failures.
+ *
+ * The mapping specifics for each memory location are recorded
+ * in the tx. Memory locations added with sdma_txadd_page()
+ * and sdma_txadd_kvaddr() are automatically mapped when added
+ * to the tx and nmapped as part of the progress processing in the
+ * SDMA interrupt handling.
+ *
+ * sdma_txadd_daddr() is used to add an dma_addr_t memory to the
+ * tx.   An example of a use case would be a pre-allocated
+ * set of headers allocated via dma_pool_alloc() or
+ * dma_alloc_coherent().  For these memory locations, it
+ * is the responsibility of the user to handle that unmapping.
+ * (This would usually be at an unload or job termination.)
+ *
+ * The routine sdma_send_txreq() is used to submit
+ * a tx to the ring after the appropriate number of
+ * sdma_txadd_* have been done.
+ *
+ * If it is desired to send a burst of sdma_txreqs, sdma_send_txlist()
+ * can be used to submit a list of packets.
+ *
+ * The user is free to use the link overhead in the struct sdma_txreq as
+ * long as the tx isn't in flight.
+ *
+ * The extreme degenerate case of the number of descriptors
+ * exceeding the ring size is automatically handled as
+ * memory locations are added.  An overflow of the descriptor
+ * array that is part of the sdma_txreq is also automatically
+ * handled.
+ *
+ */
+
+/**
+ * DOC: Infrastructure calls
+ *
+ * sdma_init() is used to initialize data structures and
+ * CSRs for the desired number of SDMA engines.
+ *
+ * sdma_start() is used to kick the SDMA engines initialized
+ * with sdma_init().   Interrupts must be enabled at this
+ * point since aspects of the state machine are interrupt
+ * driven.
+ *
+ * sdma_engine_error() and sdma_engine_interrupt() are
+ * entrances for interrupts.
+ *
+ * sdma_map_init() is for the management of the mapping
+ * table when the number of vls is changed.
+ *
+ */
+
+/*
+ * struct hw_sdma_desc - raw 128 bit SDMA descriptor
+ *
+ * This is the raw descriptor in the SDMA ring
+ */
+struct hw_sdma_desc {
+	/* private:  don't use directly */
+	__le64 qw[2];
+};
+
+/*
+ * struct sdma_desc - canonical fragment descriptor
+ *
+ * This is the descriptor carried in the tx request
+ * corresponding to each fragment.
+ *
+ */
+struct sdma_desc {
+	/* private:  don't use directly */
+	u64 qw[2];
+};
+
+struct sdma_txreq;
+typedef void (*callback_t)(struct sdma_txreq *, int, int);
+
+/**
+ * struct sdma_txreq - the sdma_txreq structure (one per packet)
+ * @list: for use by user and by queuing for wait
+ *
+ * This is the representation of a packet which consists of some
+ * number of fragments.   Storage is provided to within the structure.
+ * for all fragments.
+ *
+ * The storage for the descriptors are automatically extended as needed
+ * when the currently allocation is exceeded.
+ *
+ * The user (Verbs or PSM) may overload this structure with fields
+ * specific to their use by putting this struct first in their struct.
+ * The method of allocation of the overloaded structure is user dependent
+ *
+ * The list is the only public field in the structure.
+ *
+ */
+
+struct sdma_txreq {
+	struct list_head list;
+	/* private: */
+	struct sdma_desc *descp;
+	/* private: */
+	void *coalesce_buf;
+	/* private: */
+	struct iowait *wait;
+	/* private: */
+	callback_t                  complete;
+#ifdef CONFIG_HFI1_DEBUG_SDMA_ORDER
+	u64 sn;
+#endif
+	/* private: - used in coalesce/pad processing */
+	u16                         packet_len;
+	/* private: - down-counted to trigger last */
+	u16                         tlen;
+	/* private: flags */
+	u16                         flags;
+	/* private: */
+	u16                         num_desc;
+	/* private: */
+	u16                         desc_limit;
+	/* private: */
+	u16                         next_descq_idx;
+	/* private: */
+	struct sdma_desc descs[NUM_DESC];
+};
+
+struct verbs_txreq {
+	struct hfi1_pio_header	phdr;
+	struct sdma_txreq       txreq;
+	struct hfi1_qp           *qp;
+	struct hfi1_swqe         *wqe;
+	struct hfi1_mregion	*mr;
+	struct hfi1_sge_state    *ss;
+	struct sdma_engine     *sde;
+	u16                     hdr_dwords;
+	u16                     hdr_inx;
+};
+
+/**
+ * struct sdma_engine - Data pertaining to each SDMA engine.
+ * @dd: a back-pointer to the device data
+ * @ppd: per port back-pointer
+ * @imask: mask for irq manipulation
+ * @idle_mask: mask for determining if an interrupt is due to sdma_idle
+ *
+ * This structure has the state for each sdma_engine.
+ *
+ * Accessing to non public fields are not supported
+ * since the private members are subject to change.
+ */
+struct sdma_engine {
+	/* read mostly */
+	struct hfi1_devdata *dd;
+	struct hfi1_pportdata *ppd;
+	/* private: */
+	void __iomem *tail_csr;
+	u64 imask;			/* clear interrupt mask */
+	u64 idle_mask;
+	u64 progress_mask;
+	/* private: */
+	struct workqueue_struct *wq;
+	/* private: */
+	volatile __le64      *head_dma; /* DMA'ed by chip */
+	/* private: */
+	dma_addr_t            head_phys;
+	/* private: */
+	struct hw_sdma_desc *descq;
+	/* private: */
+	unsigned descq_full_count;
+	struct sdma_txreq **tx_ring;
+	/* private: */
+	dma_addr_t            descq_phys;
+	/* private */
+	u32 sdma_mask;
+	/* private */
+	struct sdma_state state;
+	/* private: */
+	u8 sdma_shift;
+	/* private: */
+	u8 this_idx; /* zero relative engine */
+	/* protect changes to senddmactrl shadow */
+	spinlock_t senddmactrl_lock;
+	/* private: */
+	u64 p_senddmactrl;		/* shadow per-engine SendDmaCtrl */
+
+	/* read/write using tail_lock */
+	spinlock_t            tail_lock ____cacheline_aligned_in_smp;
+#ifdef CONFIG_HFI1_DEBUG_SDMA_ORDER
+	/* private: */
+	u64                   tail_sn;
+#endif
+	/* private: */
+	u32                   descq_tail;
+	/* private: */
+	unsigned long         ahg_bits;
+	/* private: */
+	u16                   desc_avail;
+	/* private: */
+	u16                   tx_tail;
+	/* private: */
+	u16 descq_cnt;
+
+	/* read/write using head_lock */
+	/* private: */
+	seqlock_t            head_lock ____cacheline_aligned_in_smp;
+#ifdef CONFIG_HFI1_DEBUG_SDMA_ORDER
+	/* private: */
+	u64                   head_sn;
+#endif
+	/* private: */
+	u32                   descq_head;
+	/* private: */
+	u16                   tx_head;
+	/* private: */
+	u64                   last_status;
+
+	/* private: */
+	struct list_head      dmawait;
+
+	/* CONFIG SDMA for now, just blindly duplicate */
+	/* private: */
+	struct tasklet_struct sdma_hw_clean_up_task
+		____cacheline_aligned_in_smp;
+
+	/* private: */
+	struct tasklet_struct sdma_sw_clean_up_task
+		____cacheline_aligned_in_smp;
+	/* private: */
+	struct work_struct err_halt_worker;
+	/* private */
+	struct timer_list     err_progress_check_timer;
+	u32                   progress_check_head;
+	/* private: */
+	struct work_struct flush_worker;
+	spinlock_t flushlist_lock;
+	/* private: */
+	struct list_head flushlist;
+};
+
+
+int sdma_init(struct hfi1_devdata *dd, u8 port);
+void sdma_start(struct hfi1_devdata *dd);
+void sdma_exit(struct hfi1_devdata *dd);
+void sdma_all_running(struct hfi1_devdata *dd);
+void sdma_all_idle(struct hfi1_devdata *dd);
+void sdma_freeze_notify(struct hfi1_devdata *dd, int go_idle);
+void sdma_freeze(struct hfi1_devdata *dd);
+void sdma_unfreeze(struct hfi1_devdata *dd);
+void sdma_wait(struct hfi1_devdata *dd);
+
+/**
+ * sdma_empty() - idle engine test
+ * @engine: sdma engine
+ *
+ * Currently used by verbs as a latency optimization.
+ *
+ * Return:
+ * 1 - empty, 0 - non-empty
+ */
+static inline int sdma_empty(struct sdma_engine *sde)
+{
+	return sde->descq_tail == sde->descq_head;
+}
+
+static inline u16 sdma_descq_freecnt(struct sdma_engine *sde)
+{
+	return sde->descq_cnt -
+		(sde->descq_tail -
+		 ACCESS_ONCE(sde->descq_head)) - 1;
+}
+
+static inline u16 sdma_descq_inprocess(struct sdma_engine *sde)
+{
+	return sde->descq_cnt - sdma_descq_freecnt(sde);
+}
+
+/*
+ * Either head_lock or tail lock required to see
+ * a steady state.
+ */
+static inline int __sdma_running(struct sdma_engine *engine)
+{
+	return engine->state.current_state == sdma_state_s99_running;
+}
+
+
+/**
+ * sdma_running() - state suitability test
+ * @engine: sdma engine
+ *
+ * sdma_running probes the internal state to determine if it is suitable
+ * for submitting packets.
+ *
+ * Return:
+ * 1 - ok to submit, 0 - not ok to submit
+ *
+ */
+static inline int sdma_running(struct sdma_engine *engine)
+{
+	unsigned long flags;
+	int ret;
+
+	spin_lock_irqsave(&engine->tail_lock, flags);
+	ret = __sdma_running(engine);
+	spin_unlock_irqrestore(&engine->tail_lock, flags);
+	return ret;
+}
+
+void _sdma_txreq_ahgadd(
+	struct sdma_txreq *tx,
+	u8 num_ahg,
+	u8 ahg_entry,
+	u32 *ahg,
+	u8 ahg_hlen);
+
+
+/**
+ * sdma_txinit_ahg() - initialize an sdma_txreq struct with AHG
+ * @tx: tx request to initialize
+ * @flags: flags to key last descriptor additions
+ * @tlen: total packet length (pbc + headers + data)
+ * @ahg_entry: ahg entry to use  (0 - 31)
+ * @num_ahg: ahg descriptor for first descriptor (0 - 9)
+ * @ahg: array of AHG descriptors (up to 9 entries)
+ * @ahg_hlen: number of bytes from ASIC entry to use
+ * @cb: callback
+ *
+ * The allocation of the sdma_txreq and it enclosing structure is user
+ * dependent.  This routine must be called to initialize the user independent
+ * fields.
+ *
+ * The currently supported flags are SDMA_TXREQ_F_URGENT,
+ * SDMA_TXREQ_F_AHG_COPY, and SDMA_TXREQ_F_USE_AHG.
+ *
+ * SDMA_TXREQ_F_URGENT is used for latency sensitive situations where the
+ * completion is desired as soon as possible.
+ *
+ * SDMA_TXREQ_F_AHG_COPY causes the header in the first descriptor to be
+ * copied to chip entry. SDMA_TXREQ_F_USE_AHG causes the code to add in
+ * the AHG descriptors into the first 1 to 3 descriptors.
+ *
+ * Completions of submitted requests can be gotten on selected
+ * txreqs by giving a completion routine callback to sdma_txinit() or
+ * sdma_txinit_ahg().  The environment in which the callback runs
+ * can be from an ISR, a tasklet, or a thread, so no sleeping
+ * kernel routines can be used.   Aspects of the sdma ring may
+ * be locked so care should be taken with locking.
+ *
+ * The callback pointer can be NULL to avoid any callback for the packet
+ * being submitted. The callback will be provided this tx, a status, and a flag.
+ *
+ * The status will be one of SDMA_TXREQ_S_OK, SDMA_TXREQ_S_SENDERROR,
+ * SDMA_TXREQ_S_ABORTED, or SDMA_TXREQ_S_SHUTDOWN.
+ *
+ * The flag, if the is the iowait had been used, indicates the iowait
+ * sdma_busy count has reached zero.
+ *
+ * user data portion of tlen should be precise.   The sdma_txadd_* entrances
+ * will pad with a descriptor references 1 - 3 bytes when the number of bytes
+ * specified in tlen have been supplied to the sdma_txreq.
+ *
+ * ahg_hlen is used to determine the number of on-chip entry bytes to
+ * use as the header.   This is for cases where the stored header is
+ * larger than the header to be used in a packet.  This is typical
+ * for verbs where an RDMA_WRITE_FIRST is larger than the packet in
+ * and RDMA_WRITE_MIDDLE.
+ *
+ */
+static inline int sdma_txinit_ahg(
+	struct sdma_txreq *tx,
+	u16 flags,
+	u16 tlen,
+	u8 ahg_entry,
+	u8 num_ahg,
+	u32 *ahg,
+	u8 ahg_hlen,
+	void (*cb)(struct sdma_txreq *, int, int))
+{
+	if (tlen == 0)
+		return -ENODATA;
+	if (tlen > MAX_SDMA_PKT_SIZE)
+		return -EMSGSIZE;
+	tx->desc_limit = ARRAY_SIZE(tx->descs);
+	tx->descp = &tx->descs[0];
+	INIT_LIST_HEAD(&tx->list);
+	tx->num_desc = 0;
+	tx->flags = flags;
+	tx->complete = cb;
+	tx->coalesce_buf = NULL;
+	tx->wait = NULL;
+	tx->tlen = tx->packet_len = tlen;
+	tx->descs[0].qw[0] = SDMA_DESC0_FIRST_DESC_FLAG;
+	tx->descs[0].qw[1] = 0;
+	if (flags & SDMA_TXREQ_F_AHG_COPY)
+		tx->descs[0].qw[1] |=
+			(((u64)ahg_entry & SDMA_DESC1_HEADER_INDEX_MASK)
+				<< SDMA_DESC1_HEADER_INDEX_SHIFT) |
+			(((u64)SDMA_AHG_COPY & SDMA_DESC1_HEADER_MODE_MASK)
+				<< SDMA_DESC1_HEADER_MODE_SHIFT);
+	else if (flags & SDMA_TXREQ_F_USE_AHG && num_ahg)
+		_sdma_txreq_ahgadd(tx, num_ahg, ahg_entry, ahg, ahg_hlen);
+	return 0;
+}
+
+/**
+ * sdma_txinit() - initialize an sdma_txreq struct (no AHG)
+ * @tx: tx request to initialize
+ * @flags: flags to key last descriptor additions
+ * @tlen: total packet length (pbc + headers + data)
+ * @cb: callback pointer
+ *
+ * The allocation of the sdma_txreq and it enclosing structure is user
+ * dependent.  This routine must be called to initialize the user
+ * independent fields.
+ *
+ * The currently supported flags is SDMA_TXREQ_F_URGENT.
+ *
+ * SDMA_TXREQ_F_URGENT is used for latency sensitive situations where the
+ * completion is desired as soon as possible.
+ *
+ * Completions of submitted requests can be gotten on selected
+ * txreqs by giving a completion routine callback to sdma_txinit() or
+ * sdma_txinit_ahg().  The environment in which the callback runs
+ * can be from an ISR, a tasklet, or a thread, so no sleeping
+ * kernel routines can be used.   The head size of the sdma ring may
+ * be locked so care should be taken with locking.
+ *
+ * The callback pointer can be NULL to avoid any callback for the packet
+ * being submitted.
+ *
+ * The callback, if non-NULL,  will be provided this tx and a status.  The
+ * status will be one of SDMA_TXREQ_S_OK, SDMA_TXREQ_S_SENDERROR,
+ * SDMA_TXREQ_S_ABORTED, or SDMA_TXREQ_S_SHUTDOWN.
+ *
+ */
+static inline int sdma_txinit(
+	struct sdma_txreq *tx,
+	u16 flags,
+	u16 tlen,
+	void (*cb)(struct sdma_txreq *, int, int))
+{
+	return sdma_txinit_ahg(tx, flags, tlen, 0, 0, NULL, 0, cb);
+}
+
+/* helpers - don't use */
+static inline int sdma_mapping_type(struct sdma_desc *d)
+{
+	return (d->qw[1] & SDMA_DESC1_GENERATION_SMASK)
+		>> SDMA_DESC1_GENERATION_SHIFT;
+}
+
+static inline size_t sdma_mapping_len(struct sdma_desc *d)
+{
+	return (d->qw[0] & SDMA_DESC0_BYTE_COUNT_SMASK)
+		>> SDMA_DESC0_BYTE_COUNT_SHIFT;
+}
+
+static inline dma_addr_t sdma_mapping_addr(struct sdma_desc *d)
+{
+	return (d->qw[0] & SDMA_DESC0_PHY_ADDR_SMASK)
+		>> SDMA_DESC0_PHY_ADDR_SHIFT;
+}
+
+static inline void make_tx_sdma_desc(
+	struct sdma_txreq *tx,
+	int type,
+	dma_addr_t addr,
+	size_t len)
+{
+	struct sdma_desc *desc = &tx->descp[tx->num_desc];
+
+	if (!tx->num_desc) {
+		/* qw[0] zero; qw[1] first, ahg mode already in from init */
+		desc->qw[1] |= ((u64)type & SDMA_DESC1_GENERATION_MASK)
+				<< SDMA_DESC1_GENERATION_SHIFT;
+	} else {
+		desc->qw[0] = 0;
+		desc->qw[1] = ((u64)type & SDMA_DESC1_GENERATION_MASK)
+				<< SDMA_DESC1_GENERATION_SHIFT;
+	}
+	desc->qw[0] |= (((u64)addr & SDMA_DESC0_PHY_ADDR_MASK)
+				<< SDMA_DESC0_PHY_ADDR_SHIFT) |
+			(((u64)len & SDMA_DESC0_BYTE_COUNT_MASK)
+				<< SDMA_DESC0_BYTE_COUNT_SHIFT);
+}
+
+/* helper to extend txreq */
+int _extend_sdma_tx_descs(struct hfi1_devdata *, struct sdma_txreq *);
+int _pad_sdma_tx_descs(struct hfi1_devdata *, struct sdma_txreq *);
+void sdma_txclean(struct hfi1_devdata *, struct sdma_txreq *);
+
+/* helpers used by public routines */
+static inline void _sdma_close_tx(struct hfi1_devdata *dd,
+				  struct sdma_txreq *tx)
+{
+	tx->descp[tx->num_desc].qw[0] |=
+		SDMA_DESC0_LAST_DESC_FLAG;
+	tx->descp[tx->num_desc].qw[1] |=
+		dd->default_desc1;
+	if (tx->flags & SDMA_TXREQ_F_URGENT)
+		tx->descp[tx->num_desc].qw[1] |=
+			(SDMA_DESC1_HEAD_TO_HOST_FLAG|
+			 SDMA_DESC1_INT_REQ_FLAG);
+}
+
+static inline int _sdma_txadd_daddr(
+	struct hfi1_devdata *dd,
+	int type,
+	struct sdma_txreq *tx,
+	dma_addr_t addr,
+	u16 len)
+{
+	int rval = 0;
+
+	if ((unlikely(tx->num_desc == tx->desc_limit))) {
+		rval = _extend_sdma_tx_descs(dd, tx);
+		if (rval)
+			return rval;
+	}
+	make_tx_sdma_desc(
+		tx,
+		type,
+		addr, len);
+	WARN_ON(len > tx->tlen);
+	tx->tlen -= len;
+	/* special cases for last */
+	if (!tx->tlen) {
+		if (tx->packet_len & (sizeof(u32) - 1))
+			rval = _pad_sdma_tx_descs(dd, tx);
+		else
+			_sdma_close_tx(dd, tx);
+	}
+	tx->num_desc++;
+	return rval;
+}
+
+/**
+ * sdma_txadd_page() - add a page to the sdma_txreq
+ * @dd: the device to use for mapping
+ * @tx: tx request to which the page is added
+ * @page: page to map
+ * @offset: offset within the page
+ * @len: length in bytes
+ *
+ * This is used to add a page/offset/length descriptor.
+ *
+ * The mapping/unmapping of the page/offset/len is automatically handled.
+ *
+ * Return:
+ * 0 - success, -ENOSPC - mapping fail, -ENOMEM - couldn't
+ * extend descriptor array or couldn't allocate coalesce
+ * buffer.
+ *
+ */
+static inline int sdma_txadd_page(
+	struct hfi1_devdata *dd,
+	struct sdma_txreq *tx,
+	struct page *page,
+	unsigned long offset,
+	u16 len)
+{
+	dma_addr_t addr =
+		dma_map_page(
+			&dd->pcidev->dev,
+			page,
+			offset,
+			len,
+			DMA_TO_DEVICE);
+	if (unlikely(dma_mapping_error(&dd->pcidev->dev, addr))) {
+		sdma_txclean(dd, tx);
+		return -ENOSPC;
+	}
+	return _sdma_txadd_daddr(
+			dd, SDMA_MAP_PAGE, tx, addr, len);
+}
+
+/**
+ * sdma_txadd_daddr() - add a dma address to the sdma_txreq
+ * @dd: the device to use for mapping
+ * @tx: sdma_txreq to which the page is added
+ * @addr: dma address mapped by caller
+ * @len: length in bytes
+ *
+ * This is used to add a descriptor for memory that is already dma mapped.
+ *
+ * In this case, there is no unmapping as part of the progress processing for
+ * this memory location.
+ *
+ * Return:
+ * 0 - success, -ENOMEM - couldn't extend descriptor array
+ */
+
+static inline int sdma_txadd_daddr(
+	struct hfi1_devdata *dd,
+	struct sdma_txreq *tx,
+	dma_addr_t addr,
+	u16 len)
+{
+	return _sdma_txadd_daddr(dd, SDMA_MAP_NONE, tx, addr, len);
+}
+
+/**
+ * sdma_txadd_kvaddr() - add a kernel virtual address to sdma_txreq
+ * @dd: the device to use for mapping
+ * @tx: sdma_txreq to which the page is added
+ * @kvaddr: the kernel virtual address
+ * @len: length in bytes
+ *
+ * This is used to add a descriptor referenced by the indicated kvaddr and
+ * len.
+ *
+ * The mapping/unmapping of the kvaddr and len is automatically handled.
+ *
+ * Return:
+ * 0 - success, -ENOSPC - mapping fail, -ENOMEM - couldn't extend
+ * descriptor array
+ */
+static inline int sdma_txadd_kvaddr(
+	struct hfi1_devdata *dd,
+	struct sdma_txreq *tx,
+	void *kvaddr,
+	u16 len)
+{
+	dma_addr_t addr =
+		dma_map_single(
+			&dd->pcidev->dev,
+			kvaddr,
+			len,
+			DMA_TO_DEVICE);
+	if (unlikely(dma_mapping_error(&dd->pcidev->dev, addr))) {
+		sdma_txclean(dd, tx);
+		return -ENOSPC;
+	}
+	return _sdma_txadd_daddr(
+			dd, SDMA_MAP_SINGLE, tx, addr, len);
+}
+
+struct iowait;
+
+int sdma_send_txreq(struct sdma_engine *sde,
+		    struct iowait *wait,
+		    struct sdma_txreq *tx);
+int sdma_send_txlist(struct sdma_engine *sde,
+		     struct iowait *wait,
+		     struct list_head *tx_list);
+
+int sdma_ahg_alloc(struct sdma_engine *sde);
+void sdma_ahg_free(struct sdma_engine *sde, int ahg_index);
+
+/**
+ * sdma_build_ahg - build ahg descriptor
+ * @data
+ * @dwindex
+ * @startbit
+ * @bits
+ *
+ * Build and return a 32 bit descriptor.
+ */
+static inline u32 sdma_build_ahg_descriptor(
+	u16 data,
+	u8 dwindex,
+	u8 startbit,
+	u8 bits)
+{
+	return (u32)(1UL << SDMA_AHG_UPDATE_ENABLE_SHIFT |
+		((startbit & SDMA_AHG_FIELD_START_MASK) <<
+		SDMA_AHG_FIELD_START_SHIFT) |
+		((bits & SDMA_AHG_FIELD_LEN_MASK) <<
+		SDMA_AHG_FIELD_LEN_SHIFT) |
+		((dwindex & SDMA_AHG_INDEX_MASK) <<
+		SDMA_AHG_INDEX_SHIFT) |
+		((data & SDMA_AHG_VALUE_MASK) <<
+		SDMA_AHG_VALUE_SHIFT));
+}
+
+/**
+ * sdma_progress - use seq number of detect head progress
+ * @sde: sdma_engine to check
+ * @seq: base seq count
+ * @tx: txreq for which we need to check descriptor availability
+ *
+ * This is used in the appropriate spot in the sleep routine
+ * to check for potential ring progress.  This routine gets the
+ * seqcount before queuing the iowait structure for progress.
+ *
+ * If the seqcount indicates that progress needs to be checked,
+ * re-submission is detected by checking whether the descriptor
+ * queue has enough descriptor for the txreq.
+ */
+static inline unsigned sdma_progress(struct sdma_engine *sde, unsigned seq,
+				     struct sdma_txreq *tx)
+{
+	if (read_seqretry(&sde->head_lock, seq)) {
+		sde->desc_avail = sdma_descq_freecnt(sde);
+		if (tx->num_desc > sde->desc_avail)
+			return 0;
+		return 1;
+	}
+	return 0;
+}
+
+/**
+ * sdma_iowait_schedule() - initialize wait structure
+ * @sde: sdma_engine to schedule
+ * @wait: wait struct to schedule
+ *
+ * This function initializes the iowait
+ * structure embedded in the QP or PQ.
+ *
+ */
+static inline void sdma_iowait_schedule(
+	struct sdma_engine *sde,
+	struct iowait *wait)
+{
+	iowait_schedule(wait, sde->wq);
+}
+
+/* for use by interrupt handling */
+void sdma_engine_error(struct sdma_engine *sde, u64 status);
+void sdma_engine_interrupt(struct sdma_engine *sde, u64 status);
+
+/*
+ *
+ * The diagram below details the relationship of the mapping structures
+ *
+ * Since the mapping now allows for non-uniform engines per vl, the
+ * number of engines for a vl is either the vl_engines[vl] or
+ * a computation based on num_sdma/num_vls:
+ *
+ * For example:
+ * nactual = vl_engines ? vl_engines[vl] : num_sdma/num_vls
+ *
+ * n = roundup to next highest power of 2 using nactual
+ *
+ * In the case where there are num_sdma/num_vls doesn't divide
+ * evenly, the extras are added from the last vl downward.
+ *
+ * For the case where n > nactual, the engines are assigned
+ * in a round robin fashion wrapping back to the first engine
+ * for a particular vl.
+ *
+ *               dd->sdma_map
+ *                    |                                   sdma_map_elem[0]
+ *                    |                                +--------------------+
+ *                    v                                |       mask         |
+ *               sdma_vl_map                           |--------------------|
+ *      +--------------------------+                   | sde[0] -> eng 1    |
+ *      |    list (RCU)            |                   |--------------------|
+ *      |--------------------------|                 ->| sde[1] -> eng 2    |
+ *      |    mask                  |              --/  |--------------------|
+ *      |--------------------------|            -/     |        *           |
+ *      |    actual_vls (max 8)    |          -/       |--------------------|
+ *      |--------------------------|       --/         | sde[n] -> eng n    |
+ *      |    vls (max 8)           |     -/            +--------------------+
+ *      |--------------------------|  --/
+ *      |    map[0]                |-/
+ *      |--------------------------|                   +--------------------+
+ *      |    map[1]                |---                |       mask         |
+ *      |--------------------------|   \----           |--------------------|
+ *      |           *              |        \--        | sde[0] -> eng 1+n  |
+ *      |           *              |           \----   |--------------------|
+ *      |           *              |                \->| sde[1] -> eng 2+n  |
+ *      |--------------------------|                   |--------------------|
+ *      |   map[vls - 1]           |-                  |         *          |
+ *      +--------------------------+ \-                |--------------------|
+ *                                     \-              | sde[m] -> eng m+n  |
+ *                                       \             +--------------------+
+ *                                        \-
+ *                                          \
+ *                                           \-        +--------------------+
+ *                                             \-      |       mask         |
+ *                                               \     |--------------------|
+ *                                                \-   | sde[0] -> eng 1+m+n|
+ *                                                  \- |--------------------|
+ *                                                    >| sde[1] -> eng 2+m+n|
+ *                                                     |--------------------|
+ *                                                     |         *          |
+ *                                                     |--------------------|
+ *                                                     | sde[o] -> eng o+m+n|
+ *                                                     +--------------------+
+ *
+ */
+
+/**
+ * struct sdma_map_elem - mapping for a vl
+ * @mask - selector mask
+ * @sde - array of engines for this vl
+ *
+ * The mask is used to "mod" the selector
+ * to produce index into the trailing
+ * array of sdes.
+ */
+struct sdma_map_elem {
+	u32 mask;
+	struct sdma_engine *sde[0];
+};
+
+/**
+ * struct sdma_map_el - mapping for a vl
+ * @list - rcu head for free callback
+ * @mask - vl mask to "mod" the vl to produce an index to map array
+ * @actual_vls - number of vls
+ * @vls - number of vls rounded to next power of 2
+ * @map - array of sdma_map_elem entries
+ *
+ * This is the parent mapping structure.  The trailing
+ * members of the struct point to sdma_map_elem entries, which
+ * in turn point to an array of sde's for that vl.
+ */
+struct sdma_vl_map {
+	struct rcu_head list;
+	u32 mask;
+	u8 actual_vls;
+	u8 vls;
+	struct sdma_map_elem *map[0];
+};
+
+int sdma_map_init(
+	struct hfi1_devdata *dd,
+	u8 port,
+	u8 num_vls,
+	u8 *vl_engines);
+
+/* slow path */
+void _sdma_engine_progress_schedule(struct sdma_engine *sde);
+
+/**
+ * sdma_engine_progress_schedule() - schedule progress on engine
+ * @sde: sdma_engine to schedule progress
+ *
+ * This is the fast path.
+ *
+ */
+static inline void sdma_engine_progress_schedule(
+	struct sdma_engine *sde)
+{
+	if (!sde || sdma_descq_inprocess(sde) < (sde->descq_cnt / 8))
+		return;
+	_sdma_engine_progress_schedule(sde);
+}
+
+struct sdma_engine *sdma_select_engine_sc(
+	struct hfi1_devdata *dd,
+	u32 selector,
+	u8 sc5);
+
+struct sdma_engine *sdma_select_engine_vl(
+	struct hfi1_devdata *dd,
+	u32 selector,
+	u8 vl);
+
+void sdma_seqfile_dump_sde(struct seq_file *s, struct sdma_engine *);
+
+#ifdef CONFIG_SDMA_VERBOSITY
+void sdma_dumpstate(struct sdma_engine *);
+#endif
+static inline char *slashstrip(char *s)
+{
+	char *r = s;
+
+	while (*s)
+		if (*s++ == '/')
+			r = s;
+	return r;
+}
+
+u16 sdma_get_descq_cnt(void);
+
+extern uint mod_num_sdma;
+
+void sdma_update_lmc(struct hfi1_devdata *dd, u64 mask, u32 lid);
+
+#endif
diff --git a/drivers/staging/rdma/hfi1/srq.c b/drivers/staging/rdma/hfi1/srq.c
new file mode 100644
index 000000000000..67786d417493
--- /dev/null
+++ b/drivers/staging/rdma/hfi1/srq.c
@@ -0,0 +1,397 @@
+/*
+ *
+ * This file is provided under a dual BSD/GPLv2 license.  When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * GPL LICENSE SUMMARY
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * BSD LICENSE
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ *  - Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ *  - Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in
+ *    the documentation and/or other materials provided with the
+ *    distribution.
+ *  - Neither the name of Intel Corporation nor the names of its
+ *    contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ */
+
+#include <linux/err.h>
+#include <linux/slab.h>
+#include <linux/vmalloc.h>
+
+#include "verbs.h"
+
+/**
+ * hfi1_post_srq_receive - post a receive on a shared receive queue
+ * @ibsrq: the SRQ to post the receive on
+ * @wr: the list of work requests to post
+ * @bad_wr: A pointer to the first WR to cause a problem is put here
+ *
+ * This may be called from interrupt context.
+ */
+int hfi1_post_srq_receive(struct ib_srq *ibsrq, struct ib_recv_wr *wr,
+			  struct ib_recv_wr **bad_wr)
+{
+	struct hfi1_srq *srq = to_isrq(ibsrq);
+	struct hfi1_rwq *wq;
+	unsigned long flags;
+	int ret;
+
+	for (; wr; wr = wr->next) {
+		struct hfi1_rwqe *wqe;
+		u32 next;
+		int i;
+
+		if ((unsigned) wr->num_sge > srq->rq.max_sge) {
+			*bad_wr = wr;
+			ret = -EINVAL;
+			goto bail;
+		}
+
+		spin_lock_irqsave(&srq->rq.lock, flags);
+		wq = srq->rq.wq;
+		next = wq->head + 1;
+		if (next >= srq->rq.size)
+			next = 0;
+		if (next == wq->tail) {
+			spin_unlock_irqrestore(&srq->rq.lock, flags);
+			*bad_wr = wr;
+			ret = -ENOMEM;
+			goto bail;
+		}
+
+		wqe = get_rwqe_ptr(&srq->rq, wq->head);
+		wqe->wr_id = wr->wr_id;
+		wqe->num_sge = wr->num_sge;
+		for (i = 0; i < wr->num_sge; i++)
+			wqe->sg_list[i] = wr->sg_list[i];
+		/* Make sure queue entry is written before the head index. */
+		smp_wmb();
+		wq->head = next;
+		spin_unlock_irqrestore(&srq->rq.lock, flags);
+	}
+	ret = 0;
+
+bail:
+	return ret;
+}
+
+/**
+ * hfi1_create_srq - create a shared receive queue
+ * @ibpd: the protection domain of the SRQ to create
+ * @srq_init_attr: the attributes of the SRQ
+ * @udata: data from libibverbs when creating a user SRQ
+ */
+struct ib_srq *hfi1_create_srq(struct ib_pd *ibpd,
+			       struct ib_srq_init_attr *srq_init_attr,
+			       struct ib_udata *udata)
+{
+	struct hfi1_ibdev *dev = to_idev(ibpd->device);
+	struct hfi1_srq *srq;
+	u32 sz;
+	struct ib_srq *ret;
+
+	if (srq_init_attr->srq_type != IB_SRQT_BASIC) {
+		ret = ERR_PTR(-ENOSYS);
+		goto done;
+	}
+
+	if (srq_init_attr->attr.max_sge == 0 ||
+	    srq_init_attr->attr.max_sge > hfi1_max_srq_sges ||
+	    srq_init_attr->attr.max_wr == 0 ||
+	    srq_init_attr->attr.max_wr > hfi1_max_srq_wrs) {
+		ret = ERR_PTR(-EINVAL);
+		goto done;
+	}
+
+	srq = kmalloc(sizeof(*srq), GFP_KERNEL);
+	if (!srq) {
+		ret = ERR_PTR(-ENOMEM);
+		goto done;
+	}
+
+	/*
+	 * Need to use vmalloc() if we want to support large #s of entries.
+	 */
+	srq->rq.size = srq_init_attr->attr.max_wr + 1;
+	srq->rq.max_sge = srq_init_attr->attr.max_sge;
+	sz = sizeof(struct ib_sge) * srq->rq.max_sge +
+		sizeof(struct hfi1_rwqe);
+	srq->rq.wq = vmalloc_user(sizeof(struct hfi1_rwq) + srq->rq.size * sz);
+	if (!srq->rq.wq) {
+		ret = ERR_PTR(-ENOMEM);
+		goto bail_srq;
+	}
+
+	/*
+	 * Return the address of the RWQ as the offset to mmap.
+	 * See hfi1_mmap() for details.
+	 */
+	if (udata && udata->outlen >= sizeof(__u64)) {
+		int err;
+		u32 s = sizeof(struct hfi1_rwq) + srq->rq.size * sz;
+
+		srq->ip =
+		    hfi1_create_mmap_info(dev, s, ibpd->uobject->context,
+					  srq->rq.wq);
+		if (!srq->ip) {
+			ret = ERR_PTR(-ENOMEM);
+			goto bail_wq;
+		}
+
+		err = ib_copy_to_udata(udata, &srq->ip->offset,
+				       sizeof(srq->ip->offset));
+		if (err) {
+			ret = ERR_PTR(err);
+			goto bail_ip;
+		}
+	} else
+		srq->ip = NULL;
+
+	/*
+	 * ib_create_srq() will initialize srq->ibsrq.
+	 */
+	spin_lock_init(&srq->rq.lock);
+	srq->rq.wq->head = 0;
+	srq->rq.wq->tail = 0;
+	srq->limit = srq_init_attr->attr.srq_limit;
+
+	spin_lock(&dev->n_srqs_lock);
+	if (dev->n_srqs_allocated == hfi1_max_srqs) {
+		spin_unlock(&dev->n_srqs_lock);
+		ret = ERR_PTR(-ENOMEM);
+		goto bail_ip;
+	}
+
+	dev->n_srqs_allocated++;
+	spin_unlock(&dev->n_srqs_lock);
+
+	if (srq->ip) {
+		spin_lock_irq(&dev->pending_lock);
+		list_add(&srq->ip->pending_mmaps, &dev->pending_mmaps);
+		spin_unlock_irq(&dev->pending_lock);
+	}
+
+	ret = &srq->ibsrq;
+	goto done;
+
+bail_ip:
+	kfree(srq->ip);
+bail_wq:
+	vfree(srq->rq.wq);
+bail_srq:
+	kfree(srq);
+done:
+	return ret;
+}
+
+/**
+ * hfi1_modify_srq - modify a shared receive queue
+ * @ibsrq: the SRQ to modify
+ * @attr: the new attributes of the SRQ
+ * @attr_mask: indicates which attributes to modify
+ * @udata: user data for libibverbs.so
+ */
+int hfi1_modify_srq(struct ib_srq *ibsrq, struct ib_srq_attr *attr,
+		    enum ib_srq_attr_mask attr_mask,
+		    struct ib_udata *udata)
+{
+	struct hfi1_srq *srq = to_isrq(ibsrq);
+	struct hfi1_rwq *wq;
+	int ret = 0;
+
+	if (attr_mask & IB_SRQ_MAX_WR) {
+		struct hfi1_rwq *owq;
+		struct hfi1_rwqe *p;
+		u32 sz, size, n, head, tail;
+
+		/* Check that the requested sizes are below the limits. */
+		if ((attr->max_wr > hfi1_max_srq_wrs) ||
+		    ((attr_mask & IB_SRQ_LIMIT) ?
+		     attr->srq_limit : srq->limit) > attr->max_wr) {
+			ret = -EINVAL;
+			goto bail;
+		}
+
+		sz = sizeof(struct hfi1_rwqe) +
+			srq->rq.max_sge * sizeof(struct ib_sge);
+		size = attr->max_wr + 1;
+		wq = vmalloc_user(sizeof(struct hfi1_rwq) + size * sz);
+		if (!wq) {
+			ret = -ENOMEM;
+			goto bail;
+		}
+
+		/* Check that we can write the offset to mmap. */
+		if (udata && udata->inlen >= sizeof(__u64)) {
+			__u64 offset_addr;
+			__u64 offset = 0;
+
+			ret = ib_copy_from_udata(&offset_addr, udata,
+						 sizeof(offset_addr));
+			if (ret)
+				goto bail_free;
+			udata->outbuf =
+				(void __user *) (unsigned long) offset_addr;
+			ret = ib_copy_to_udata(udata, &offset,
+					       sizeof(offset));
+			if (ret)
+				goto bail_free;
+		}
+
+		spin_lock_irq(&srq->rq.lock);
+		/*
+		 * validate head and tail pointer values and compute
+		 * the number of remaining WQEs.
+		 */
+		owq = srq->rq.wq;
+		head = owq->head;
+		tail = owq->tail;
+		if (head >= srq->rq.size || tail >= srq->rq.size) {
+			ret = -EINVAL;
+			goto bail_unlock;
+		}
+		n = head;
+		if (n < tail)
+			n += srq->rq.size - tail;
+		else
+			n -= tail;
+		if (size <= n) {
+			ret = -EINVAL;
+			goto bail_unlock;
+		}
+		n = 0;
+		p = wq->wq;
+		while (tail != head) {
+			struct hfi1_rwqe *wqe;
+			int i;
+
+			wqe = get_rwqe_ptr(&srq->rq, tail);
+			p->wr_id = wqe->wr_id;
+			p->num_sge = wqe->num_sge;
+			for (i = 0; i < wqe->num_sge; i++)
+				p->sg_list[i] = wqe->sg_list[i];
+			n++;
+			p = (struct hfi1_rwqe *)((char *)p + sz);
+			if (++tail >= srq->rq.size)
+				tail = 0;
+		}
+		srq->rq.wq = wq;
+		srq->rq.size = size;
+		wq->head = n;
+		wq->tail = 0;
+		if (attr_mask & IB_SRQ_LIMIT)
+			srq->limit = attr->srq_limit;
+		spin_unlock_irq(&srq->rq.lock);
+
+		vfree(owq);
+
+		if (srq->ip) {
+			struct hfi1_mmap_info *ip = srq->ip;
+			struct hfi1_ibdev *dev = to_idev(srq->ibsrq.device);
+			u32 s = sizeof(struct hfi1_rwq) + size * sz;
+
+			hfi1_update_mmap_info(dev, ip, s, wq);
+
+			/*
+			 * Return the offset to mmap.
+			 * See hfi1_mmap() for details.
+			 */
+			if (udata && udata->inlen >= sizeof(__u64)) {
+				ret = ib_copy_to_udata(udata, &ip->offset,
+						       sizeof(ip->offset));
+				if (ret)
+					goto bail;
+			}
+
+			/*
+			 * Put user mapping info onto the pending list
+			 * unless it already is on the list.
+			 */
+			spin_lock_irq(&dev->pending_lock);
+			if (list_empty(&ip->pending_mmaps))
+				list_add(&ip->pending_mmaps,
+					 &dev->pending_mmaps);
+			spin_unlock_irq(&dev->pending_lock);
+		}
+	} else if (attr_mask & IB_SRQ_LIMIT) {
+		spin_lock_irq(&srq->rq.lock);
+		if (attr->srq_limit >= srq->rq.size)
+			ret = -EINVAL;
+		else
+			srq->limit = attr->srq_limit;
+		spin_unlock_irq(&srq->rq.lock);
+	}
+	goto bail;
+
+bail_unlock:
+	spin_unlock_irq(&srq->rq.lock);
+bail_free:
+	vfree(wq);
+bail:
+	return ret;
+}
+
+int hfi1_query_srq(struct ib_srq *ibsrq, struct ib_srq_attr *attr)
+{
+	struct hfi1_srq *srq = to_isrq(ibsrq);
+
+	attr->max_wr = srq->rq.size - 1;
+	attr->max_sge = srq->rq.max_sge;
+	attr->srq_limit = srq->limit;
+	return 0;
+}
+
+/**
+ * hfi1_destroy_srq - destroy a shared receive queue
+ * @ibsrq: the SRQ to destroy
+ */
+int hfi1_destroy_srq(struct ib_srq *ibsrq)
+{
+	struct hfi1_srq *srq = to_isrq(ibsrq);
+	struct hfi1_ibdev *dev = to_idev(ibsrq->device);
+
+	spin_lock(&dev->n_srqs_lock);
+	dev->n_srqs_allocated--;
+	spin_unlock(&dev->n_srqs_lock);
+	if (srq->ip)
+		kref_put(&srq->ip->ref, hfi1_release_mmap_info);
+	else
+		vfree(srq->rq.wq);
+	kfree(srq);
+
+	return 0;
+}
diff --git a/drivers/staging/rdma/hfi1/sysfs.c b/drivers/staging/rdma/hfi1/sysfs.c
new file mode 100644
index 000000000000..b78c72861ef9
--- /dev/null
+++ b/drivers/staging/rdma/hfi1/sysfs.c
@@ -0,0 +1,739 @@
+/*
+ *
+ * This file is provided under a dual BSD/GPLv2 license.  When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * GPL LICENSE SUMMARY
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * BSD LICENSE
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ *  - Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ *  - Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in
+ *    the documentation and/or other materials provided with the
+ *    distribution.
+ *  - Neither the name of Intel Corporation nor the names of its
+ *    contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ */
+#include <linux/ctype.h>
+
+#include "hfi.h"
+#include "mad.h"
+#include "trace.h"
+
+
+/*
+ * Start of per-port congestion control structures and support code
+ */
+
+/*
+ * Congestion control table size followed by table entries
+ */
+static ssize_t read_cc_table_bin(struct file *filp, struct kobject *kobj,
+		struct bin_attribute *bin_attr,
+		char *buf, loff_t pos, size_t count)
+{
+	int ret;
+	struct hfi1_pportdata *ppd =
+		container_of(kobj, struct hfi1_pportdata, pport_cc_kobj);
+	struct cc_state *cc_state;
+
+	ret = ppd->total_cct_entry * sizeof(struct ib_cc_table_entry_shadow)
+		 + sizeof(__be16);
+
+	if (pos > ret)
+		return -EINVAL;
+
+	if (count > ret - pos)
+		count = ret - pos;
+
+	if (!count)
+		return count;
+
+	rcu_read_lock();
+	cc_state = get_cc_state(ppd);
+	if (cc_state == NULL) {
+		rcu_read_unlock();
+		return -EINVAL;
+	}
+	memcpy(buf, &cc_state->cct, count);
+	rcu_read_unlock();
+
+	return count;
+}
+
+static void port_release(struct kobject *kobj)
+{
+	/* nothing to do since memory is freed by hfi1_free_devdata() */
+}
+
+static struct kobj_type port_cc_ktype = {
+	.release = port_release,
+};
+
+static struct bin_attribute cc_table_bin_attr = {
+	.attr = {.name = "cc_table_bin", .mode = 0444},
+	.read = read_cc_table_bin,
+	.size = PAGE_SIZE,
+};
+
+/*
+ * Congestion settings: port control, control map and an array of 16
+ * entries for the congestion entries - increase, timer, event log
+ * trigger threshold and the minimum injection rate delay.
+ */
+static ssize_t read_cc_setting_bin(struct file *filp, struct kobject *kobj,
+		struct bin_attribute *bin_attr,
+		char *buf, loff_t pos, size_t count)
+{
+	int ret;
+	struct hfi1_pportdata *ppd =
+		container_of(kobj, struct hfi1_pportdata, pport_cc_kobj);
+	struct cc_state *cc_state;
+
+	ret = sizeof(struct opa_congestion_setting_attr_shadow);
+
+	if (pos > ret)
+		return -EINVAL;
+	if (count > ret - pos)
+		count = ret - pos;
+
+	if (!count)
+		return count;
+
+	rcu_read_lock();
+	cc_state = get_cc_state(ppd);
+	if (cc_state == NULL) {
+		rcu_read_unlock();
+		return -EINVAL;
+	}
+	memcpy(buf, &cc_state->cong_setting, count);
+	rcu_read_unlock();
+
+	return count;
+}
+
+static struct bin_attribute cc_setting_bin_attr = {
+	.attr = {.name = "cc_settings_bin", .mode = 0444},
+	.read = read_cc_setting_bin,
+	.size = PAGE_SIZE,
+};
+
+/* Start sc2vl */
+#define HFI1_SC2VL_ATTR(N)				    \
+	static struct hfi1_sc2vl_attr hfi1_sc2vl_attr_##N = { \
+		.attr = { .name = __stringify(N), .mode = 0444 }, \
+		.sc = N \
+	}
+
+struct hfi1_sc2vl_attr {
+	struct attribute attr;
+	int sc;
+};
+
+HFI1_SC2VL_ATTR(0);
+HFI1_SC2VL_ATTR(1);
+HFI1_SC2VL_ATTR(2);
+HFI1_SC2VL_ATTR(3);
+HFI1_SC2VL_ATTR(4);
+HFI1_SC2VL_ATTR(5);
+HFI1_SC2VL_ATTR(6);
+HFI1_SC2VL_ATTR(7);
+HFI1_SC2VL_ATTR(8);
+HFI1_SC2VL_ATTR(9);
+HFI1_SC2VL_ATTR(10);
+HFI1_SC2VL_ATTR(11);
+HFI1_SC2VL_ATTR(12);
+HFI1_SC2VL_ATTR(13);
+HFI1_SC2VL_ATTR(14);
+HFI1_SC2VL_ATTR(15);
+HFI1_SC2VL_ATTR(16);
+HFI1_SC2VL_ATTR(17);
+HFI1_SC2VL_ATTR(18);
+HFI1_SC2VL_ATTR(19);
+HFI1_SC2VL_ATTR(20);
+HFI1_SC2VL_ATTR(21);
+HFI1_SC2VL_ATTR(22);
+HFI1_SC2VL_ATTR(23);
+HFI1_SC2VL_ATTR(24);
+HFI1_SC2VL_ATTR(25);
+HFI1_SC2VL_ATTR(26);
+HFI1_SC2VL_ATTR(27);
+HFI1_SC2VL_ATTR(28);
+HFI1_SC2VL_ATTR(29);
+HFI1_SC2VL_ATTR(30);
+HFI1_SC2VL_ATTR(31);
+
+
+static struct attribute *sc2vl_default_attributes[] = {
+	&hfi1_sc2vl_attr_0.attr,
+	&hfi1_sc2vl_attr_1.attr,
+	&hfi1_sc2vl_attr_2.attr,
+	&hfi1_sc2vl_attr_3.attr,
+	&hfi1_sc2vl_attr_4.attr,
+	&hfi1_sc2vl_attr_5.attr,
+	&hfi1_sc2vl_attr_6.attr,
+	&hfi1_sc2vl_attr_7.attr,
+	&hfi1_sc2vl_attr_8.attr,
+	&hfi1_sc2vl_attr_9.attr,
+	&hfi1_sc2vl_attr_10.attr,
+	&hfi1_sc2vl_attr_11.attr,
+	&hfi1_sc2vl_attr_12.attr,
+	&hfi1_sc2vl_attr_13.attr,
+	&hfi1_sc2vl_attr_14.attr,
+	&hfi1_sc2vl_attr_15.attr,
+	&hfi1_sc2vl_attr_16.attr,
+	&hfi1_sc2vl_attr_17.attr,
+	&hfi1_sc2vl_attr_18.attr,
+	&hfi1_sc2vl_attr_19.attr,
+	&hfi1_sc2vl_attr_20.attr,
+	&hfi1_sc2vl_attr_21.attr,
+	&hfi1_sc2vl_attr_22.attr,
+	&hfi1_sc2vl_attr_23.attr,
+	&hfi1_sc2vl_attr_24.attr,
+	&hfi1_sc2vl_attr_25.attr,
+	&hfi1_sc2vl_attr_26.attr,
+	&hfi1_sc2vl_attr_27.attr,
+	&hfi1_sc2vl_attr_28.attr,
+	&hfi1_sc2vl_attr_29.attr,
+	&hfi1_sc2vl_attr_30.attr,
+	&hfi1_sc2vl_attr_31.attr,
+	NULL
+};
+
+static ssize_t sc2vl_attr_show(struct kobject *kobj, struct attribute *attr,
+			       char *buf)
+{
+	struct hfi1_sc2vl_attr *sattr =
+		container_of(attr, struct hfi1_sc2vl_attr, attr);
+	struct hfi1_pportdata *ppd =
+		container_of(kobj, struct hfi1_pportdata, sc2vl_kobj);
+	struct hfi1_devdata *dd = ppd->dd;
+
+	return sprintf(buf, "%u\n", *((u8 *)dd->sc2vl + sattr->sc));
+}
+
+static const struct sysfs_ops hfi1_sc2vl_ops = {
+	.show = sc2vl_attr_show,
+};
+
+static struct kobj_type hfi1_sc2vl_ktype = {
+	.release = port_release,
+	.sysfs_ops = &hfi1_sc2vl_ops,
+	.default_attrs = sc2vl_default_attributes
+};
+
+/* End sc2vl */
+
+/* Start sl2sc */
+#define HFI1_SL2SC_ATTR(N)				    \
+	static struct hfi1_sl2sc_attr hfi1_sl2sc_attr_##N = {	  \
+		.attr = { .name = __stringify(N), .mode = 0444 }, \
+		.sl = N						  \
+	}
+
+struct hfi1_sl2sc_attr {
+	struct attribute attr;
+	int sl;
+};
+
+HFI1_SL2SC_ATTR(0);
+HFI1_SL2SC_ATTR(1);
+HFI1_SL2SC_ATTR(2);
+HFI1_SL2SC_ATTR(3);
+HFI1_SL2SC_ATTR(4);
+HFI1_SL2SC_ATTR(5);
+HFI1_SL2SC_ATTR(6);
+HFI1_SL2SC_ATTR(7);
+HFI1_SL2SC_ATTR(8);
+HFI1_SL2SC_ATTR(9);
+HFI1_SL2SC_ATTR(10);
+HFI1_SL2SC_ATTR(11);
+HFI1_SL2SC_ATTR(12);
+HFI1_SL2SC_ATTR(13);
+HFI1_SL2SC_ATTR(14);
+HFI1_SL2SC_ATTR(15);
+HFI1_SL2SC_ATTR(16);
+HFI1_SL2SC_ATTR(17);
+HFI1_SL2SC_ATTR(18);
+HFI1_SL2SC_ATTR(19);
+HFI1_SL2SC_ATTR(20);
+HFI1_SL2SC_ATTR(21);
+HFI1_SL2SC_ATTR(22);
+HFI1_SL2SC_ATTR(23);
+HFI1_SL2SC_ATTR(24);
+HFI1_SL2SC_ATTR(25);
+HFI1_SL2SC_ATTR(26);
+HFI1_SL2SC_ATTR(27);
+HFI1_SL2SC_ATTR(28);
+HFI1_SL2SC_ATTR(29);
+HFI1_SL2SC_ATTR(30);
+HFI1_SL2SC_ATTR(31);
+
+
+static struct attribute *sl2sc_default_attributes[] = {
+	&hfi1_sl2sc_attr_0.attr,
+	&hfi1_sl2sc_attr_1.attr,
+	&hfi1_sl2sc_attr_2.attr,
+	&hfi1_sl2sc_attr_3.attr,
+	&hfi1_sl2sc_attr_4.attr,
+	&hfi1_sl2sc_attr_5.attr,
+	&hfi1_sl2sc_attr_6.attr,
+	&hfi1_sl2sc_attr_7.attr,
+	&hfi1_sl2sc_attr_8.attr,
+	&hfi1_sl2sc_attr_9.attr,
+	&hfi1_sl2sc_attr_10.attr,
+	&hfi1_sl2sc_attr_11.attr,
+	&hfi1_sl2sc_attr_12.attr,
+	&hfi1_sl2sc_attr_13.attr,
+	&hfi1_sl2sc_attr_14.attr,
+	&hfi1_sl2sc_attr_15.attr,
+	&hfi1_sl2sc_attr_16.attr,
+	&hfi1_sl2sc_attr_17.attr,
+	&hfi1_sl2sc_attr_18.attr,
+	&hfi1_sl2sc_attr_19.attr,
+	&hfi1_sl2sc_attr_20.attr,
+	&hfi1_sl2sc_attr_21.attr,
+	&hfi1_sl2sc_attr_22.attr,
+	&hfi1_sl2sc_attr_23.attr,
+	&hfi1_sl2sc_attr_24.attr,
+	&hfi1_sl2sc_attr_25.attr,
+	&hfi1_sl2sc_attr_26.attr,
+	&hfi1_sl2sc_attr_27.attr,
+	&hfi1_sl2sc_attr_28.attr,
+	&hfi1_sl2sc_attr_29.attr,
+	&hfi1_sl2sc_attr_30.attr,
+	&hfi1_sl2sc_attr_31.attr,
+	NULL
+};
+
+static ssize_t sl2sc_attr_show(struct kobject *kobj, struct attribute *attr,
+			       char *buf)
+{
+	struct hfi1_sl2sc_attr *sattr =
+		container_of(attr, struct hfi1_sl2sc_attr, attr);
+	struct hfi1_pportdata *ppd =
+		container_of(kobj, struct hfi1_pportdata, sl2sc_kobj);
+	struct hfi1_ibport *ibp = &ppd->ibport_data;
+
+	return sprintf(buf, "%u\n", ibp->sl_to_sc[sattr->sl]);
+}
+
+static const struct sysfs_ops hfi1_sl2sc_ops = {
+	.show = sl2sc_attr_show,
+};
+
+static struct kobj_type hfi1_sl2sc_ktype = {
+	.release = port_release,
+	.sysfs_ops = &hfi1_sl2sc_ops,
+	.default_attrs = sl2sc_default_attributes
+};
+
+/* End sl2sc */
+
+/* Start vl2mtu */
+
+#define HFI1_VL2MTU_ATTR(N) \
+	static struct hfi1_vl2mtu_attr hfi1_vl2mtu_attr_##N = { \
+		.attr = { .name = __stringify(N), .mode = 0444 }, \
+		.vl = N						  \
+	}
+
+struct hfi1_vl2mtu_attr {
+	struct attribute attr;
+	int vl;
+};
+
+HFI1_VL2MTU_ATTR(0);
+HFI1_VL2MTU_ATTR(1);
+HFI1_VL2MTU_ATTR(2);
+HFI1_VL2MTU_ATTR(3);
+HFI1_VL2MTU_ATTR(4);
+HFI1_VL2MTU_ATTR(5);
+HFI1_VL2MTU_ATTR(6);
+HFI1_VL2MTU_ATTR(7);
+HFI1_VL2MTU_ATTR(8);
+HFI1_VL2MTU_ATTR(9);
+HFI1_VL2MTU_ATTR(10);
+HFI1_VL2MTU_ATTR(11);
+HFI1_VL2MTU_ATTR(12);
+HFI1_VL2MTU_ATTR(13);
+HFI1_VL2MTU_ATTR(14);
+HFI1_VL2MTU_ATTR(15);
+
+static struct attribute *vl2mtu_default_attributes[] = {
+	&hfi1_vl2mtu_attr_0.attr,
+	&hfi1_vl2mtu_attr_1.attr,
+	&hfi1_vl2mtu_attr_2.attr,
+	&hfi1_vl2mtu_attr_3.attr,
+	&hfi1_vl2mtu_attr_4.attr,
+	&hfi1_vl2mtu_attr_5.attr,
+	&hfi1_vl2mtu_attr_6.attr,
+	&hfi1_vl2mtu_attr_7.attr,
+	&hfi1_vl2mtu_attr_8.attr,
+	&hfi1_vl2mtu_attr_9.attr,
+	&hfi1_vl2mtu_attr_10.attr,
+	&hfi1_vl2mtu_attr_11.attr,
+	&hfi1_vl2mtu_attr_12.attr,
+	&hfi1_vl2mtu_attr_13.attr,
+	&hfi1_vl2mtu_attr_14.attr,
+	&hfi1_vl2mtu_attr_15.attr,
+	NULL
+};
+
+static ssize_t vl2mtu_attr_show(struct kobject *kobj, struct attribute *attr,
+				char *buf)
+{
+	struct hfi1_vl2mtu_attr *vlattr =
+		container_of(attr, struct hfi1_vl2mtu_attr, attr);
+	struct hfi1_pportdata *ppd =
+		container_of(kobj, struct hfi1_pportdata, vl2mtu_kobj);
+	struct hfi1_devdata *dd = ppd->dd;
+
+	return sprintf(buf, "%u\n", dd->vld[vlattr->vl].mtu);
+}
+
+static const struct sysfs_ops hfi1_vl2mtu_ops = {
+	.show = vl2mtu_attr_show,
+};
+
+static struct kobj_type hfi1_vl2mtu_ktype = {
+	.release = port_release,
+	.sysfs_ops = &hfi1_vl2mtu_ops,
+	.default_attrs = vl2mtu_default_attributes
+};
+
+
+/* end of per-port file structures and support code */
+
+/*
+ * Start of per-unit (or driver, in some cases, but replicated
+ * per unit) functions (these get a device *)
+ */
+static ssize_t show_rev(struct device *device, struct device_attribute *attr,
+			char *buf)
+{
+	struct hfi1_ibdev *dev =
+		container_of(device, struct hfi1_ibdev, ibdev.dev);
+
+	return sprintf(buf, "%x\n", dd_from_dev(dev)->minrev);
+}
+
+static ssize_t show_hfi(struct device *device, struct device_attribute *attr,
+			char *buf)
+{
+	struct hfi1_ibdev *dev =
+		container_of(device, struct hfi1_ibdev, ibdev.dev);
+	struct hfi1_devdata *dd = dd_from_dev(dev);
+	int ret;
+
+	if (!dd->boardname)
+		ret = -EINVAL;
+	else
+		ret = scnprintf(buf, PAGE_SIZE, "%s\n", dd->boardname);
+	return ret;
+}
+
+static ssize_t show_boardversion(struct device *device,
+				 struct device_attribute *attr, char *buf)
+{
+	struct hfi1_ibdev *dev =
+		container_of(device, struct hfi1_ibdev, ibdev.dev);
+	struct hfi1_devdata *dd = dd_from_dev(dev);
+
+	/* The string printed here is already newline-terminated. */
+	return scnprintf(buf, PAGE_SIZE, "%s", dd->boardversion);
+}
+
+
+static ssize_t show_nctxts(struct device *device,
+			   struct device_attribute *attr, char *buf)
+{
+	struct hfi1_ibdev *dev =
+		container_of(device, struct hfi1_ibdev, ibdev.dev);
+	struct hfi1_devdata *dd = dd_from_dev(dev);
+
+	/*
+	 * Return the smaller of send and receive contexts.
+	 * Normally, user level applications would require both a send
+	 * and a receive context, so returning the smaller of the two counts
+	 * give a more accurate picture of total contexts available.
+	 */
+	return scnprintf(buf, PAGE_SIZE, "%u\n",
+			 min(dd->num_rcv_contexts - dd->first_user_ctxt,
+			     (u32)dd->sc_sizes[SC_USER].count));
+}
+
+static ssize_t show_nfreectxts(struct device *device,
+			   struct device_attribute *attr, char *buf)
+{
+	struct hfi1_ibdev *dev =
+		container_of(device, struct hfi1_ibdev, ibdev.dev);
+	struct hfi1_devdata *dd = dd_from_dev(dev);
+
+	/* Return the number of free user ports (contexts) available. */
+	return scnprintf(buf, PAGE_SIZE, "%u\n", dd->freectxts);
+}
+
+static ssize_t show_serial(struct device *device,
+			   struct device_attribute *attr, char *buf)
+{
+	struct hfi1_ibdev *dev =
+		container_of(device, struct hfi1_ibdev, ibdev.dev);
+	struct hfi1_devdata *dd = dd_from_dev(dev);
+
+	return scnprintf(buf, PAGE_SIZE, "%s", dd->serial);
+
+}
+
+static ssize_t store_chip_reset(struct device *device,
+				struct device_attribute *attr, const char *buf,
+				size_t count)
+{
+	struct hfi1_ibdev *dev =
+		container_of(device, struct hfi1_ibdev, ibdev.dev);
+	struct hfi1_devdata *dd = dd_from_dev(dev);
+	int ret;
+
+	if (count < 5 || memcmp(buf, "reset", 5) || !dd->diag_client) {
+		ret = -EINVAL;
+		goto bail;
+	}
+
+	ret = hfi1_reset_device(dd->unit);
+bail:
+	return ret < 0 ? ret : count;
+}
+
+/*
+ * Convert the reported temperature from an integer (reported in
+ * units of 0.25C) to a floating point number.
+ */
+#define temp2str(temp, buf, size, idx)					\
+	scnprintf((buf) + (idx), (size) - (idx), "%u.%02u ",		\
+			      ((temp) >> 2), ((temp) & 0x3) * 25)
+
+/*
+ * Dump tempsense values, in decimal, to ease shell-scripts.
+ */
+static ssize_t show_tempsense(struct device *device,
+			      struct device_attribute *attr, char *buf)
+{
+	struct hfi1_ibdev *dev =
+		container_of(device, struct hfi1_ibdev, ibdev.dev);
+	struct hfi1_devdata *dd = dd_from_dev(dev);
+	struct hfi1_temp temp;
+	int ret = -ENXIO;
+
+	ret = hfi1_tempsense_rd(dd, &temp);
+	if (!ret) {
+		int idx = 0;
+
+		idx += temp2str(temp.curr, buf, PAGE_SIZE, idx);
+		idx += temp2str(temp.lo_lim, buf, PAGE_SIZE, idx);
+		idx += temp2str(temp.hi_lim, buf, PAGE_SIZE, idx);
+		idx += temp2str(temp.crit_lim, buf, PAGE_SIZE, idx);
+		idx += scnprintf(buf + idx, PAGE_SIZE - idx,
+				"%u %u %u\n", temp.triggers & 0x1,
+				temp.triggers & 0x2, temp.triggers & 0x4);
+		ret = idx;
+	}
+	return ret;
+}
+
+/*
+ * end of per-unit (or driver, in some cases, but replicated
+ * per unit) functions
+ */
+
+/* start of per-unit file structures and support code */
+static DEVICE_ATTR(hw_rev, S_IRUGO, show_rev, NULL);
+static DEVICE_ATTR(board_id, S_IRUGO, show_hfi, NULL);
+static DEVICE_ATTR(nctxts, S_IRUGO, show_nctxts, NULL);
+static DEVICE_ATTR(nfreectxts, S_IRUGO, show_nfreectxts, NULL);
+static DEVICE_ATTR(serial, S_IRUGO, show_serial, NULL);
+static DEVICE_ATTR(boardversion, S_IRUGO, show_boardversion, NULL);
+static DEVICE_ATTR(tempsense, S_IRUGO, show_tempsense, NULL);
+static DEVICE_ATTR(chip_reset, S_IWUSR, NULL, store_chip_reset);
+
+static struct device_attribute *hfi1_attributes[] = {
+	&dev_attr_hw_rev,
+	&dev_attr_board_id,
+	&dev_attr_nctxts,
+	&dev_attr_nfreectxts,
+	&dev_attr_serial,
+	&dev_attr_boardversion,
+	&dev_attr_tempsense,
+	&dev_attr_chip_reset,
+};
+
+int hfi1_create_port_files(struct ib_device *ibdev, u8 port_num,
+			   struct kobject *kobj)
+{
+	struct hfi1_pportdata *ppd;
+	struct hfi1_devdata *dd = dd_from_ibdev(ibdev);
+	int ret;
+
+	if (!port_num || port_num > dd->num_pports) {
+		dd_dev_err(dd,
+			"Skipping infiniband class with invalid port %u\n",
+			port_num);
+		return -ENODEV;
+	}
+	ppd = &dd->pport[port_num - 1];
+
+	ret = kobject_init_and_add(&ppd->sc2vl_kobj, &hfi1_sc2vl_ktype, kobj,
+				   "sc2vl");
+	if (ret) {
+		dd_dev_err(dd,
+			   "Skipping sc2vl sysfs info, (err %d) port %u\n",
+			   ret, port_num);
+		goto bail;
+	}
+	kobject_uevent(&ppd->sc2vl_kobj, KOBJ_ADD);
+
+	ret = kobject_init_and_add(&ppd->sl2sc_kobj, &hfi1_sl2sc_ktype, kobj,
+				   "sl2sc");
+	if (ret) {
+		dd_dev_err(dd,
+			   "Skipping sl2sc sysfs info, (err %d) port %u\n",
+			   ret, port_num);
+		goto bail_sc2vl;
+	}
+	kobject_uevent(&ppd->sl2sc_kobj, KOBJ_ADD);
+
+	ret = kobject_init_and_add(&ppd->vl2mtu_kobj, &hfi1_vl2mtu_ktype, kobj,
+				   "vl2mtu");
+	if (ret) {
+		dd_dev_err(dd,
+			   "Skipping vl2mtu sysfs info, (err %d) port %u\n",
+			   ret, port_num);
+		goto bail_sl2sc;
+	}
+	kobject_uevent(&ppd->vl2mtu_kobj, KOBJ_ADD);
+
+
+	ret = kobject_init_and_add(&ppd->pport_cc_kobj, &port_cc_ktype,
+				   kobj, "CCMgtA");
+	if (ret) {
+		dd_dev_err(dd,
+		 "Skipping Congestion Control sysfs info, (err %d) port %u\n",
+		 ret, port_num);
+		goto bail_vl2mtu;
+	}
+
+	kobject_uevent(&ppd->pport_cc_kobj, KOBJ_ADD);
+
+	ret = sysfs_create_bin_file(&ppd->pport_cc_kobj,
+				&cc_setting_bin_attr);
+	if (ret) {
+		dd_dev_err(dd,
+		 "Skipping Congestion Control setting sysfs info, (err %d) port %u\n",
+		 ret, port_num);
+		goto bail_cc;
+	}
+
+	ret = sysfs_create_bin_file(&ppd->pport_cc_kobj,
+				&cc_table_bin_attr);
+	if (ret) {
+		dd_dev_err(dd,
+		 "Skipping Congestion Control table sysfs info, (err %d) port %u\n",
+		 ret, port_num);
+		goto bail_cc_entry_bin;
+	}
+
+	dd_dev_info(dd,
+		"IB%u: Congestion Control Agent enabled for port %d\n",
+		dd->unit, port_num);
+
+	return 0;
+
+bail_cc_entry_bin:
+	sysfs_remove_bin_file(&ppd->pport_cc_kobj,
+			      &cc_setting_bin_attr);
+bail_cc:
+	kobject_put(&ppd->pport_cc_kobj);
+bail_vl2mtu:
+	kobject_put(&ppd->vl2mtu_kobj);
+bail_sl2sc:
+	kobject_put(&ppd->sl2sc_kobj);
+bail_sc2vl:
+	kobject_put(&ppd->sc2vl_kobj);
+bail:
+	return ret;
+}
+
+/*
+ * Register and create our files in /sys/class/infiniband.
+ */
+int hfi1_verbs_register_sysfs(struct hfi1_devdata *dd)
+{
+	struct ib_device *dev = &dd->verbs_dev.ibdev;
+	int i, ret;
+
+	for (i = 0; i < ARRAY_SIZE(hfi1_attributes); ++i) {
+		ret = device_create_file(&dev->dev, hfi1_attributes[i]);
+		if (ret)
+			goto bail;
+	}
+
+	return 0;
+bail:
+	for (i = 0; i < ARRAY_SIZE(hfi1_attributes); ++i)
+		device_remove_file(&dev->dev, hfi1_attributes[i]);
+	return ret;
+}
+
+/*
+ * Unregister and remove our files in /sys/class/infiniband.
+ */
+void hfi1_verbs_unregister_sysfs(struct hfi1_devdata *dd)
+{
+	struct hfi1_pportdata *ppd;
+	int i;
+
+	for (i = 0; i < dd->num_pports; i++) {
+		ppd = &dd->pport[i];
+
+		sysfs_remove_bin_file(&ppd->pport_cc_kobj,
+				      &cc_setting_bin_attr);
+		sysfs_remove_bin_file(&ppd->pport_cc_kobj,
+				      &cc_table_bin_attr);
+		kobject_put(&ppd->pport_cc_kobj);
+		kobject_put(&ppd->vl2mtu_kobj);
+		kobject_put(&ppd->sl2sc_kobj);
+		kobject_put(&ppd->sc2vl_kobj);
+	}
+}
diff --git a/drivers/staging/rdma/hfi1/trace.c b/drivers/staging/rdma/hfi1/trace.c
new file mode 100644
index 000000000000..70ad7b9fc1ce
--- /dev/null
+++ b/drivers/staging/rdma/hfi1/trace.c
@@ -0,0 +1,221 @@
+/*
+ *
+ * This file is provided under a dual BSD/GPLv2 license.  When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * GPL LICENSE SUMMARY
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * BSD LICENSE
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ *  - Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ *  - Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in
+ *    the documentation and/or other materials provided with the
+ *    distribution.
+ *  - Neither the name of Intel Corporation nor the names of its
+ *    contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ */
+#define CREATE_TRACE_POINTS
+#include "trace.h"
+
+u8 ibhdr_exhdr_len(struct hfi1_ib_header *hdr)
+{
+	struct hfi1_other_headers *ohdr;
+	u8 opcode;
+	u8 lnh = (u8)(be16_to_cpu(hdr->lrh[0]) & 3);
+
+	if (lnh == HFI1_LRH_BTH)
+		ohdr = &hdr->u.oth;
+	else
+		ohdr = &hdr->u.l.oth;
+	opcode = be32_to_cpu(ohdr->bth[0]) >> 24;
+	return hdr_len_by_opcode[opcode] == 0 ?
+	       0 : hdr_len_by_opcode[opcode] - (12 + 8);
+}
+
+#define IMM_PRN  "imm %d"
+#define RETH_PRN "reth vaddr 0x%.16llx rkey 0x%.8x dlen 0x%.8x"
+#define AETH_PRN "aeth syn 0x%.2x msn 0x%.8x"
+#define DETH_PRN "deth qkey 0x%.8x sqpn 0x%.6x"
+#define ATOMICACKETH_PRN "origdata %lld"
+#define ATOMICETH_PRN "vaddr 0x%llx rkey 0x%.8x sdata %lld cdata %lld"
+
+#define OP(transport, op) IB_OPCODE_## transport ## _ ## op
+
+static u64 ib_u64_get(__be32 *p)
+{
+	return ((u64)be32_to_cpu(p[0]) << 32) | be32_to_cpu(p[1]);
+}
+
+const char *parse_everbs_hdrs(
+	struct trace_seq *p,
+	u8 opcode,
+	void *ehdrs)
+{
+	union ib_ehdrs *eh = ehdrs;
+	const char *ret = trace_seq_buffer_ptr(p);
+
+	switch (opcode) {
+	/* imm */
+	case OP(RC, SEND_LAST_WITH_IMMEDIATE):
+	case OP(UC, SEND_LAST_WITH_IMMEDIATE):
+	case OP(RC, SEND_ONLY_WITH_IMMEDIATE):
+	case OP(UC, SEND_ONLY_WITH_IMMEDIATE):
+	case OP(RC, RDMA_WRITE_LAST_WITH_IMMEDIATE):
+	case OP(UC, RDMA_WRITE_LAST_WITH_IMMEDIATE):
+		trace_seq_printf(p, IMM_PRN,
+			be32_to_cpu(eh->imm_data));
+		break;
+	/* reth + imm */
+	case OP(RC, RDMA_WRITE_ONLY_WITH_IMMEDIATE):
+	case OP(UC, RDMA_WRITE_ONLY_WITH_IMMEDIATE):
+		trace_seq_printf(p, RETH_PRN " " IMM_PRN,
+			(unsigned long long)ib_u64_get(
+				(__be32 *)&eh->rc.reth.vaddr),
+			be32_to_cpu(eh->rc.reth.rkey),
+			be32_to_cpu(eh->rc.reth.length),
+			be32_to_cpu(eh->rc.imm_data));
+		break;
+	/* reth */
+	case OP(RC, RDMA_READ_REQUEST):
+	case OP(RC, RDMA_WRITE_FIRST):
+	case OP(UC, RDMA_WRITE_FIRST):
+	case OP(RC, RDMA_WRITE_ONLY):
+	case OP(UC, RDMA_WRITE_ONLY):
+		trace_seq_printf(p, RETH_PRN,
+			(unsigned long long)ib_u64_get(
+				(__be32 *)&eh->rc.reth.vaddr),
+			be32_to_cpu(eh->rc.reth.rkey),
+			be32_to_cpu(eh->rc.reth.length));
+		break;
+	case OP(RC, RDMA_READ_RESPONSE_FIRST):
+	case OP(RC, RDMA_READ_RESPONSE_LAST):
+	case OP(RC, RDMA_READ_RESPONSE_ONLY):
+	case OP(RC, ACKNOWLEDGE):
+		trace_seq_printf(p, AETH_PRN,
+			be32_to_cpu(eh->aeth) >> 24,
+			be32_to_cpu(eh->aeth) & HFI1_QPN_MASK);
+		break;
+	/* aeth + atomicacketh */
+	case OP(RC, ATOMIC_ACKNOWLEDGE):
+		trace_seq_printf(p, AETH_PRN " " ATOMICACKETH_PRN,
+			(be32_to_cpu(eh->at.aeth) >> 24) & 0xff,
+			be32_to_cpu(eh->at.aeth) & HFI1_QPN_MASK,
+			(unsigned long long)ib_u64_get(eh->at.atomic_ack_eth));
+		break;
+	/* atomiceth */
+	case OP(RC, COMPARE_SWAP):
+	case OP(RC, FETCH_ADD):
+		trace_seq_printf(p, ATOMICETH_PRN,
+			(unsigned long long)ib_u64_get(eh->atomic_eth.vaddr),
+			eh->atomic_eth.rkey,
+			(unsigned long long)ib_u64_get(
+				(__be32 *)&eh->atomic_eth.swap_data),
+			(unsigned long long) ib_u64_get(
+				 (__be32 *)&eh->atomic_eth.compare_data));
+		break;
+	/* deth */
+	case OP(UD, SEND_ONLY):
+	case OP(UD, SEND_ONLY_WITH_IMMEDIATE):
+		trace_seq_printf(p, DETH_PRN,
+			be32_to_cpu(eh->ud.deth[0]),
+			be32_to_cpu(eh->ud.deth[1]) & HFI1_QPN_MASK);
+		break;
+	}
+	trace_seq_putc(p, 0);
+	return ret;
+}
+
+const char *parse_sdma_flags(
+	struct trace_seq *p,
+	u64 desc0, u64 desc1)
+{
+	const char *ret = trace_seq_buffer_ptr(p);
+	char flags[5] = { 'x', 'x', 'x', 'x', 0 };
+
+	flags[0] = (desc1 & SDMA_DESC1_INT_REQ_FLAG) ? 'I' : '-';
+	flags[1] = (desc1 & SDMA_DESC1_HEAD_TO_HOST_FLAG) ?  'H' : '-';
+	flags[2] = (desc0 & SDMA_DESC0_FIRST_DESC_FLAG) ? 'F' : '-';
+	flags[3] = (desc0 & SDMA_DESC0_LAST_DESC_FLAG) ? 'L' : '-';
+	trace_seq_printf(p, "%s", flags);
+	if (desc0 & SDMA_DESC0_FIRST_DESC_FLAG)
+		trace_seq_printf(p, " amode:%u aidx:%u alen:%u",
+			(u8)((desc1 >> SDMA_DESC1_HEADER_MODE_SHIFT)
+				& SDMA_DESC1_HEADER_MODE_MASK),
+			(u8)((desc1 >> SDMA_DESC1_HEADER_INDEX_SHIFT)
+				& SDMA_DESC1_HEADER_INDEX_MASK),
+			(u8)((desc1 >> SDMA_DESC1_HEADER_DWS_SHIFT)
+				& SDMA_DESC1_HEADER_DWS_MASK));
+	return ret;
+}
+
+const char *print_u32_array(
+	struct trace_seq *p,
+	u32 *arr, int len)
+{
+	int i;
+	const char *ret = trace_seq_buffer_ptr(p);
+
+	for (i = 0; i < len ; i++)
+		trace_seq_printf(p, "%s%#x", i == 0 ? "" : " ", arr[i]);
+	trace_seq_putc(p, 0);
+	return ret;
+}
+
+const char *print_u64_array(
+	struct trace_seq *p,
+	u64 *arr, int len)
+{
+	int i;
+	const char *ret = trace_seq_buffer_ptr(p);
+
+	for (i = 0; i < len; i++)
+		trace_seq_printf(p, "%s0x%016llx", i == 0 ? "" : " ", arr[i]);
+	trace_seq_putc(p, 0);
+	return ret;
+}
+
+__hfi1_trace_fn(PKT);
+__hfi1_trace_fn(PROC);
+__hfi1_trace_fn(SDMA);
+__hfi1_trace_fn(LINKVERB);
+__hfi1_trace_fn(DEBUG);
+__hfi1_trace_fn(SNOOP);
+__hfi1_trace_fn(CNTR);
+__hfi1_trace_fn(PIO);
+__hfi1_trace_fn(DC8051);
+__hfi1_trace_fn(FIRMWARE);
+__hfi1_trace_fn(RCVCTRL);
+__hfi1_trace_fn(TID);
diff --git a/drivers/staging/rdma/hfi1/trace.h b/drivers/staging/rdma/hfi1/trace.h
new file mode 100644
index 000000000000..d7851c0a0171
--- /dev/null
+++ b/drivers/staging/rdma/hfi1/trace.h
@@ -0,0 +1,1409 @@
+/*
+ *
+ * This file is provided under a dual BSD/GPLv2 license.  When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * GPL LICENSE SUMMARY
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * BSD LICENSE
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ *  - Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ *  - Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in
+ *    the documentation and/or other materials provided with the
+ *    distribution.
+ *  - Neither the name of Intel Corporation nor the names of its
+ *    contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ */
+#undef TRACE_SYSTEM_VAR
+#define TRACE_SYSTEM_VAR hfi1
+
+#if !defined(__HFI1_TRACE_H) || defined(TRACE_HEADER_MULTI_READ)
+#define __HFI1_TRACE_H
+
+#include <linux/tracepoint.h>
+#include <linux/trace_seq.h>
+
+#include "hfi.h"
+#include "mad.h"
+#include "sdma.h"
+
+#define DD_DEV_ENTRY(dd)       __string(dev, dev_name(&(dd)->pcidev->dev))
+#define DD_DEV_ASSIGN(dd)      __assign_str(dev, dev_name(&(dd)->pcidev->dev))
+
+#define packettype_name(etype) { RHF_RCV_TYPE_##etype, #etype }
+#define show_packettype(etype)                  \
+__print_symbolic(etype,                         \
+	packettype_name(EXPECTED),              \
+	packettype_name(EAGER),                 \
+	packettype_name(IB),                    \
+	packettype_name(ERROR),                 \
+	packettype_name(BYPASS))
+
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM hfi1_rx
+
+TRACE_EVENT(hfi1_rcvhdr,
+	TP_PROTO(struct hfi1_devdata *dd,
+		 u64 eflags,
+		 u32 ctxt,
+		 u32 etype,
+		 u32 hlen,
+		 u32 tlen,
+		 u32 updegr,
+		 u32 etail),
+	TP_ARGS(dd, ctxt, eflags, etype, hlen, tlen, updegr, etail),
+	TP_STRUCT__entry(
+		DD_DEV_ENTRY(dd)
+		__field(u64, eflags)
+		__field(u32, ctxt)
+		__field(u32, etype)
+		__field(u32, hlen)
+		__field(u32, tlen)
+		__field(u32, updegr)
+		__field(u32, etail)
+	),
+	TP_fast_assign(
+		DD_DEV_ASSIGN(dd);
+		__entry->eflags = eflags;
+		__entry->ctxt = ctxt;
+		__entry->etype = etype;
+		__entry->hlen = hlen;
+		__entry->tlen = tlen;
+		__entry->updegr = updegr;
+		__entry->etail = etail;
+	),
+	TP_printk(
+"[%s] ctxt %d eflags 0x%llx etype %d,%s hlen %d tlen %d updegr %d etail %d",
+		__get_str(dev),
+		__entry->ctxt,
+		__entry->eflags,
+		__entry->etype, show_packettype(__entry->etype),
+		__entry->hlen,
+		__entry->tlen,
+		__entry->updegr,
+		__entry->etail
+	)
+);
+
+TRACE_EVENT(hfi1_receive_interrupt,
+	TP_PROTO(struct hfi1_devdata *dd, u32 ctxt),
+	TP_ARGS(dd, ctxt),
+	TP_STRUCT__entry(
+		DD_DEV_ENTRY(dd)
+		__field(u32, ctxt)
+		__field(u8, slow_path)
+		__field(u8, dma_rtail)
+	),
+	TP_fast_assign(
+		DD_DEV_ASSIGN(dd);
+		__entry->ctxt = ctxt;
+		if (dd->rcd[ctxt]->do_interrupt ==
+		    &handle_receive_interrupt) {
+			__entry->slow_path = 1;
+			__entry->dma_rtail = 0xFF;
+		} else if (dd->rcd[ctxt]->do_interrupt ==
+			&handle_receive_interrupt_dma_rtail){
+			__entry->dma_rtail = 1;
+			__entry->slow_path = 0;
+		} else if (dd->rcd[ctxt]->do_interrupt ==
+			 &handle_receive_interrupt_nodma_rtail) {
+			__entry->dma_rtail = 0;
+			__entry->slow_path = 0;
+		}
+	),
+	TP_printk(
+		"[%s] ctxt %d SlowPath: %d DmaRtail: %d",
+		__get_str(dev),
+		__entry->ctxt,
+		__entry->slow_path,
+		__entry->dma_rtail
+	)
+);
+
+const char *print_u64_array(struct trace_seq *, u64 *, int);
+
+TRACE_EVENT(hfi1_exp_tid_map,
+	    TP_PROTO(unsigned ctxt, u16 subctxt, int dir,
+		     unsigned long *maps, u16 count),
+	    TP_ARGS(ctxt, subctxt, dir, maps, count),
+	    TP_STRUCT__entry(
+		    __field(unsigned, ctxt)
+		    __field(u16, subctxt)
+		    __field(int, dir)
+		    __field(u16, count)
+		    __dynamic_array(unsigned long, maps, sizeof(*maps) * count)
+		    ),
+	    TP_fast_assign(
+		    __entry->ctxt = ctxt;
+		    __entry->subctxt = subctxt;
+		    __entry->dir = dir;
+		    __entry->count = count;
+		    memcpy(__get_dynamic_array(maps), maps,
+			   sizeof(*maps) * count);
+		    ),
+	    TP_printk("[%3u:%02u] %s tidmaps %s",
+		      __entry->ctxt,
+		      __entry->subctxt,
+		      (__entry->dir ? ">" : "<"),
+		      print_u64_array(p, __get_dynamic_array(maps),
+				      __entry->count)
+		    )
+	);
+
+TRACE_EVENT(hfi1_exp_rcv_set,
+	    TP_PROTO(unsigned ctxt, u16 subctxt, u32 tid,
+		     unsigned long vaddr, u64 phys_addr, void *page),
+	    TP_ARGS(ctxt, subctxt, tid, vaddr, phys_addr, page),
+	    TP_STRUCT__entry(
+		    __field(unsigned, ctxt)
+		    __field(u16, subctxt)
+		    __field(u32, tid)
+		    __field(unsigned long, vaddr)
+		    __field(u64, phys_addr)
+		    __field(void *, page)
+		    ),
+	    TP_fast_assign(
+		    __entry->ctxt = ctxt;
+		    __entry->subctxt = subctxt;
+		    __entry->tid = tid;
+		    __entry->vaddr = vaddr;
+		    __entry->phys_addr = phys_addr;
+		    __entry->page = page;
+		    ),
+	    TP_printk("[%u:%u] TID %u, vaddrs 0x%lx, physaddr 0x%llx, pgp %p",
+		      __entry->ctxt,
+		      __entry->subctxt,
+		      __entry->tid,
+		      __entry->vaddr,
+		      __entry->phys_addr,
+		      __entry->page
+		    )
+	);
+
+TRACE_EVENT(hfi1_exp_rcv_free,
+	    TP_PROTO(unsigned ctxt, u16 subctxt, u32 tid,
+		     unsigned long phys, void *page),
+	    TP_ARGS(ctxt, subctxt, tid, phys, page),
+	    TP_STRUCT__entry(
+		    __field(unsigned, ctxt)
+		    __field(u16, subctxt)
+		    __field(u32, tid)
+		    __field(unsigned long, phys)
+		    __field(void *, page)
+		    ),
+	    TP_fast_assign(
+		    __entry->ctxt = ctxt;
+		    __entry->subctxt = subctxt;
+		    __entry->tid = tid;
+		    __entry->phys = phys;
+		    __entry->page = page;
+		    ),
+	    TP_printk("[%u:%u] freeing TID %u, 0x%lx, pgp %p",
+		      __entry->ctxt,
+		      __entry->subctxt,
+		      __entry->tid,
+		      __entry->phys,
+		      __entry->page
+		    )
+	);
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM hfi1_tx
+
+TRACE_EVENT(hfi1_piofree,
+	TP_PROTO(struct send_context *sc, int extra),
+	TP_ARGS(sc, extra),
+	TP_STRUCT__entry(
+		DD_DEV_ENTRY(sc->dd)
+		__field(u32, sw_index)
+		__field(u32, hw_context)
+		__field(int, extra)
+	),
+	TP_fast_assign(
+		DD_DEV_ASSIGN(sc->dd);
+		__entry->sw_index = sc->sw_index;
+		__entry->hw_context = sc->hw_context;
+		__entry->extra = extra;
+	),
+	TP_printk(
+		"[%s] ctxt %u(%u) extra %d",
+		__get_str(dev),
+		__entry->sw_index,
+		__entry->hw_context,
+		__entry->extra
+	)
+);
+
+TRACE_EVENT(hfi1_wantpiointr,
+	TP_PROTO(struct send_context *sc, u32 needint, u64 credit_ctrl),
+	TP_ARGS(sc, needint, credit_ctrl),
+	TP_STRUCT__entry(
+		DD_DEV_ENTRY(sc->dd)
+		__field(u32, sw_index)
+		__field(u32, hw_context)
+		__field(u32, needint)
+		__field(u64, credit_ctrl)
+	),
+	TP_fast_assign(
+		DD_DEV_ASSIGN(sc->dd);
+		__entry->sw_index = sc->sw_index;
+		__entry->hw_context = sc->hw_context;
+		__entry->needint = needint;
+		__entry->credit_ctrl = credit_ctrl;
+	),
+	TP_printk(
+		"[%s] ctxt %u(%u) on %d credit_ctrl 0x%llx",
+		__get_str(dev),
+		__entry->sw_index,
+		__entry->hw_context,
+		__entry->needint,
+		(unsigned long long)__entry->credit_ctrl
+	)
+);
+
+DECLARE_EVENT_CLASS(hfi1_qpsleepwakeup_template,
+	TP_PROTO(struct hfi1_qp *qp, u32 flags),
+	TP_ARGS(qp, flags),
+	TP_STRUCT__entry(
+		DD_DEV_ENTRY(dd_from_ibdev(qp->ibqp.device))
+		__field(u32, qpn)
+		__field(u32, flags)
+		__field(u32, s_flags)
+	),
+	TP_fast_assign(
+		DD_DEV_ASSIGN(dd_from_ibdev(qp->ibqp.device))
+		__entry->flags = flags;
+		__entry->qpn = qp->ibqp.qp_num;
+		__entry->s_flags = qp->s_flags;
+	),
+	TP_printk(
+		"[%s] qpn 0x%x flags 0x%x s_flags 0x%x",
+		__get_str(dev),
+		__entry->qpn,
+		__entry->flags,
+		__entry->s_flags
+	)
+);
+
+DEFINE_EVENT(hfi1_qpsleepwakeup_template, hfi1_qpwakeup,
+	     TP_PROTO(struct hfi1_qp *qp, u32 flags),
+	     TP_ARGS(qp, flags));
+
+DEFINE_EVENT(hfi1_qpsleepwakeup_template, hfi1_qpsleep,
+	     TP_PROTO(struct hfi1_qp *qp, u32 flags),
+	     TP_ARGS(qp, flags));
+
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM hfi1_qphash
+DECLARE_EVENT_CLASS(hfi1_qphash_template,
+	TP_PROTO(struct hfi1_qp *qp, u32 bucket),
+	TP_ARGS(qp, bucket),
+	TP_STRUCT__entry(
+		DD_DEV_ENTRY(dd_from_ibdev(qp->ibqp.device))
+		__field(u32, qpn)
+		__field(u32, bucket)
+	),
+	TP_fast_assign(
+		DD_DEV_ASSIGN(dd_from_ibdev(qp->ibqp.device))
+		__entry->qpn = qp->ibqp.qp_num;
+		__entry->bucket = bucket;
+	),
+	TP_printk(
+		"[%s] qpn 0x%x bucket %u",
+		__get_str(dev),
+		__entry->qpn,
+		__entry->bucket
+	)
+);
+
+DEFINE_EVENT(hfi1_qphash_template, hfi1_qpinsert,
+	TP_PROTO(struct hfi1_qp *qp, u32 bucket),
+	TP_ARGS(qp, bucket));
+
+DEFINE_EVENT(hfi1_qphash_template, hfi1_qpremove,
+	TP_PROTO(struct hfi1_qp *qp, u32 bucket),
+	TP_ARGS(qp, bucket));
+
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM hfi1_ibhdrs
+
+u8 ibhdr_exhdr_len(struct hfi1_ib_header *hdr);
+const char *parse_everbs_hdrs(
+	struct trace_seq *p,
+	u8 opcode,
+	void *ehdrs);
+
+#define __parse_ib_ehdrs(op, ehdrs) parse_everbs_hdrs(p, op, ehdrs)
+
+const char *parse_sdma_flags(
+	struct trace_seq *p,
+	u64 desc0, u64 desc1);
+
+#define __parse_sdma_flags(desc0, desc1) parse_sdma_flags(p, desc0, desc1)
+
+
+#define lrh_name(lrh) { HFI1_##lrh, #lrh }
+#define show_lnh(lrh)                    \
+__print_symbolic(lrh,                    \
+	lrh_name(LRH_BTH),               \
+	lrh_name(LRH_GRH))
+
+#define ib_opcode_name(opcode) { IB_OPCODE_##opcode, #opcode  }
+#define show_ib_opcode(opcode)                             \
+__print_symbolic(opcode,                                   \
+	ib_opcode_name(RC_SEND_FIRST),                     \
+	ib_opcode_name(RC_SEND_MIDDLE),                    \
+	ib_opcode_name(RC_SEND_LAST),                      \
+	ib_opcode_name(RC_SEND_LAST_WITH_IMMEDIATE),       \
+	ib_opcode_name(RC_SEND_ONLY),                      \
+	ib_opcode_name(RC_SEND_ONLY_WITH_IMMEDIATE),       \
+	ib_opcode_name(RC_RDMA_WRITE_FIRST),               \
+	ib_opcode_name(RC_RDMA_WRITE_MIDDLE),              \
+	ib_opcode_name(RC_RDMA_WRITE_LAST),                \
+	ib_opcode_name(RC_RDMA_WRITE_LAST_WITH_IMMEDIATE), \
+	ib_opcode_name(RC_RDMA_WRITE_ONLY),                \
+	ib_opcode_name(RC_RDMA_WRITE_ONLY_WITH_IMMEDIATE), \
+	ib_opcode_name(RC_RDMA_READ_REQUEST),              \
+	ib_opcode_name(RC_RDMA_READ_RESPONSE_FIRST),       \
+	ib_opcode_name(RC_RDMA_READ_RESPONSE_MIDDLE),      \
+	ib_opcode_name(RC_RDMA_READ_RESPONSE_LAST),        \
+	ib_opcode_name(RC_RDMA_READ_RESPONSE_ONLY),        \
+	ib_opcode_name(RC_ACKNOWLEDGE),                    \
+	ib_opcode_name(RC_ATOMIC_ACKNOWLEDGE),             \
+	ib_opcode_name(RC_COMPARE_SWAP),                   \
+	ib_opcode_name(RC_FETCH_ADD),                      \
+	ib_opcode_name(UC_SEND_FIRST),                     \
+	ib_opcode_name(UC_SEND_MIDDLE),                    \
+	ib_opcode_name(UC_SEND_LAST),                      \
+	ib_opcode_name(UC_SEND_LAST_WITH_IMMEDIATE),       \
+	ib_opcode_name(UC_SEND_ONLY),                      \
+	ib_opcode_name(UC_SEND_ONLY_WITH_IMMEDIATE),       \
+	ib_opcode_name(UC_RDMA_WRITE_FIRST),               \
+	ib_opcode_name(UC_RDMA_WRITE_MIDDLE),              \
+	ib_opcode_name(UC_RDMA_WRITE_LAST),                \
+	ib_opcode_name(UC_RDMA_WRITE_LAST_WITH_IMMEDIATE), \
+	ib_opcode_name(UC_RDMA_WRITE_ONLY),                \
+	ib_opcode_name(UC_RDMA_WRITE_ONLY_WITH_IMMEDIATE), \
+	ib_opcode_name(UD_SEND_ONLY),                      \
+	ib_opcode_name(UD_SEND_ONLY_WITH_IMMEDIATE))
+
+
+#define LRH_PRN "vl %d lver %d sl %d lnh %d,%s dlid %.4x len %d slid %.4x"
+#define BTH_PRN \
+	"op 0x%.2x,%s se %d m %d pad %d tver %d pkey 0x%.4x " \
+	"f %d b %d qpn 0x%.6x a %d psn 0x%.8x"
+#define EHDR_PRN "%s"
+
+DECLARE_EVENT_CLASS(hfi1_ibhdr_template,
+	TP_PROTO(struct hfi1_devdata *dd,
+		 struct hfi1_ib_header *hdr),
+	TP_ARGS(dd, hdr),
+	TP_STRUCT__entry(
+		DD_DEV_ENTRY(dd)
+		/* LRH */
+		__field(u8, vl)
+		__field(u8, lver)
+		__field(u8, sl)
+		__field(u8, lnh)
+		__field(u16, dlid)
+		__field(u16, len)
+		__field(u16, slid)
+		/* BTH */
+		__field(u8, opcode)
+		__field(u8, se)
+		__field(u8, m)
+		__field(u8, pad)
+		__field(u8, tver)
+		__field(u16, pkey)
+		__field(u8, f)
+		__field(u8, b)
+		__field(u32, qpn)
+		__field(u8, a)
+		__field(u32, psn)
+		/* extended headers */
+		__dynamic_array(u8, ehdrs, ibhdr_exhdr_len(hdr))
+	),
+	TP_fast_assign(
+		struct hfi1_other_headers *ohdr;
+
+		DD_DEV_ASSIGN(dd);
+		/* LRH */
+		__entry->vl =
+			(u8)(be16_to_cpu(hdr->lrh[0]) >> 12);
+		__entry->lver =
+			(u8)(be16_to_cpu(hdr->lrh[0]) >> 8) & 0xf;
+		__entry->sl =
+			(u8)(be16_to_cpu(hdr->lrh[0]) >> 4) & 0xf;
+		__entry->lnh =
+			(u8)(be16_to_cpu(hdr->lrh[0]) & 3);
+		__entry->dlid =
+			be16_to_cpu(hdr->lrh[1]);
+		/* allow for larger len */
+		__entry->len =
+			be16_to_cpu(hdr->lrh[2]);
+		__entry->slid =
+			be16_to_cpu(hdr->lrh[3]);
+		/* BTH */
+		if (__entry->lnh == HFI1_LRH_BTH)
+			ohdr = &hdr->u.oth;
+		else
+			ohdr = &hdr->u.l.oth;
+		__entry->opcode =
+			(be32_to_cpu(ohdr->bth[0]) >> 24) & 0xff;
+		__entry->se =
+			(be32_to_cpu(ohdr->bth[0]) >> 23) & 1;
+		__entry->m =
+			 (be32_to_cpu(ohdr->bth[0]) >> 22) & 1;
+		__entry->pad =
+			(be32_to_cpu(ohdr->bth[0]) >> 20) & 3;
+		__entry->tver =
+			(be32_to_cpu(ohdr->bth[0]) >> 16) & 0xf;
+		__entry->pkey =
+			be32_to_cpu(ohdr->bth[0]) & 0xffff;
+		__entry->f =
+			(be32_to_cpu(ohdr->bth[1]) >> HFI1_FECN_SHIFT)
+			& HFI1_FECN_MASK;
+		__entry->b =
+			(be32_to_cpu(ohdr->bth[1]) >> HFI1_BECN_SHIFT)
+			& HFI1_BECN_MASK;
+		__entry->qpn =
+			be32_to_cpu(ohdr->bth[1]) & HFI1_QPN_MASK;
+		__entry->a =
+			(be32_to_cpu(ohdr->bth[2]) >> 31) & 1;
+		/* allow for larger PSN */
+		__entry->psn =
+			be32_to_cpu(ohdr->bth[2]) & 0x7fffffff;
+		/* extended headers */
+		 memcpy(
+			__get_dynamic_array(ehdrs),
+			&ohdr->u,
+			ibhdr_exhdr_len(hdr));
+	),
+	TP_printk("[%s] " LRH_PRN " " BTH_PRN " " EHDR_PRN,
+		__get_str(dev),
+		/* LRH */
+		__entry->vl,
+		__entry->lver,
+		__entry->sl,
+		__entry->lnh, show_lnh(__entry->lnh),
+		__entry->dlid,
+		__entry->len,
+		__entry->slid,
+		/* BTH */
+		__entry->opcode, show_ib_opcode(__entry->opcode),
+		__entry->se,
+		__entry->m,
+		__entry->pad,
+		__entry->tver,
+		__entry->pkey,
+		__entry->f,
+		__entry->b,
+		__entry->qpn,
+		__entry->a,
+		__entry->psn,
+		/* extended headers */
+		__parse_ib_ehdrs(
+			__entry->opcode,
+			(void *)__get_dynamic_array(ehdrs))
+	)
+);
+
+DEFINE_EVENT(hfi1_ibhdr_template, input_ibhdr,
+	     TP_PROTO(struct hfi1_devdata *dd, struct hfi1_ib_header *hdr),
+	     TP_ARGS(dd, hdr));
+
+DEFINE_EVENT(hfi1_ibhdr_template, output_ibhdr,
+	     TP_PROTO(struct hfi1_devdata *dd, struct hfi1_ib_header *hdr),
+	     TP_ARGS(dd, hdr));
+
+#define SNOOP_PRN \
+	"slid %.4x dlid %.4x qpn 0x%.6x opcode 0x%.2x,%s " \
+	"svc lvl %d pkey 0x%.4x [header = %d bytes] [data = %d bytes]"
+
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM hfi1_snoop
+
+
+TRACE_EVENT(snoop_capture,
+	TP_PROTO(struct hfi1_devdata *dd,
+		 int hdr_len,
+		 struct hfi1_ib_header *hdr,
+		 int data_len,
+		 void *data),
+	TP_ARGS(dd, hdr_len, hdr, data_len, data),
+	TP_STRUCT__entry(
+		DD_DEV_ENTRY(dd)
+		__field(u16, slid)
+		__field(u16, dlid)
+		__field(u32, qpn)
+		__field(u8, opcode)
+		__field(u8, sl)
+		__field(u16, pkey)
+		__field(u32, hdr_len)
+		__field(u32, data_len)
+		__field(u8, lnh)
+		__dynamic_array(u8, raw_hdr, hdr_len)
+		__dynamic_array(u8, raw_pkt, data_len)
+	),
+	TP_fast_assign(
+		struct hfi1_other_headers *ohdr;
+
+		__entry->lnh = (u8)(be16_to_cpu(hdr->lrh[0]) & 3);
+		if (__entry->lnh == HFI1_LRH_BTH)
+			ohdr = &hdr->u.oth;
+		else
+			ohdr = &hdr->u.l.oth;
+		DD_DEV_ASSIGN(dd);
+		__entry->slid = be16_to_cpu(hdr->lrh[3]);
+		__entry->dlid = be16_to_cpu(hdr->lrh[1]);
+		__entry->qpn = be32_to_cpu(ohdr->bth[1]) & HFI1_QPN_MASK;
+		__entry->opcode = (be32_to_cpu(ohdr->bth[0]) >> 24) & 0xff;
+		__entry->sl = (u8)(be16_to_cpu(hdr->lrh[0]) >> 4) & 0xf;
+		__entry->pkey =	be32_to_cpu(ohdr->bth[0]) & 0xffff;
+		__entry->hdr_len = hdr_len;
+		__entry->data_len = data_len;
+		memcpy(__get_dynamic_array(raw_hdr), hdr, hdr_len);
+		memcpy(__get_dynamic_array(raw_pkt), data, data_len);
+	),
+	TP_printk("[%s] " SNOOP_PRN,
+		__get_str(dev),
+		__entry->slid,
+		__entry->dlid,
+		__entry->qpn,
+		__entry->opcode,
+		show_ib_opcode(__entry->opcode),
+		__entry->sl,
+		__entry->pkey,
+		__entry->hdr_len,
+		__entry->data_len
+	)
+);
+
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM hfi1_ctxts
+
+#define UCTXT_FMT \
+	"cred:%u, credaddr:0x%llx, piobase:0x%llx, rcvhdr_cnt:%u, "	\
+	"rcvbase:0x%llx, rcvegrc:%u, rcvegrb:0x%llx"
+TRACE_EVENT(hfi1_uctxtdata,
+	    TP_PROTO(struct hfi1_devdata *dd, struct hfi1_ctxtdata *uctxt),
+	    TP_ARGS(dd, uctxt),
+	    TP_STRUCT__entry(
+		    DD_DEV_ENTRY(dd)
+		    __field(unsigned, ctxt)
+		    __field(u32, credits)
+		    __field(u64, hw_free)
+		    __field(u64, piobase)
+		    __field(u16, rcvhdrq_cnt)
+		    __field(u64, rcvhdrq_phys)
+		    __field(u32, eager_cnt)
+		    __field(u64, rcvegr_phys)
+		    ),
+	    TP_fast_assign(
+		    DD_DEV_ASSIGN(dd);
+		    __entry->ctxt = uctxt->ctxt;
+		    __entry->credits = uctxt->sc->credits;
+		    __entry->hw_free = (u64)uctxt->sc->hw_free;
+		    __entry->piobase = (u64)uctxt->sc->base_addr;
+		    __entry->rcvhdrq_cnt = uctxt->rcvhdrq_cnt;
+		    __entry->rcvhdrq_phys = uctxt->rcvhdrq_phys;
+		    __entry->eager_cnt = uctxt->egrbufs.alloced;
+		    __entry->rcvegr_phys = uctxt->egrbufs.rcvtids[0].phys;
+		    ),
+	    TP_printk(
+		    "[%s] ctxt %u " UCTXT_FMT,
+		    __get_str(dev),
+		    __entry->ctxt,
+		    __entry->credits,
+		    __entry->hw_free,
+		    __entry->piobase,
+		    __entry->rcvhdrq_cnt,
+		    __entry->rcvhdrq_phys,
+		    __entry->eager_cnt,
+		    __entry->rcvegr_phys
+		    )
+	);
+
+#define CINFO_FMT \
+	"egrtids:%u, egr_size:%u, hdrq_cnt:%u, hdrq_size:%u, sdma_ring_size:%u"
+TRACE_EVENT(hfi1_ctxt_info,
+	    TP_PROTO(struct hfi1_devdata *dd, unsigned ctxt, unsigned subctxt,
+		     struct hfi1_ctxt_info cinfo),
+	    TP_ARGS(dd, ctxt, subctxt, cinfo),
+	    TP_STRUCT__entry(
+		    DD_DEV_ENTRY(dd)
+		    __field(unsigned, ctxt)
+		    __field(unsigned, subctxt)
+		    __field(u16, egrtids)
+		    __field(u16, rcvhdrq_cnt)
+		    __field(u16, rcvhdrq_size)
+		    __field(u16, sdma_ring_size)
+		    __field(u32, rcvegr_size)
+		    ),
+	    TP_fast_assign(
+		    DD_DEV_ASSIGN(dd);
+		    __entry->ctxt = ctxt;
+		    __entry->subctxt = subctxt;
+		    __entry->egrtids = cinfo.egrtids;
+		    __entry->rcvhdrq_cnt = cinfo.rcvhdrq_cnt;
+		    __entry->rcvhdrq_size = cinfo.rcvhdrq_entsize;
+		    __entry->sdma_ring_size = cinfo.sdma_ring_size;
+		    __entry->rcvegr_size = cinfo.rcvegr_size;
+		    ),
+	    TP_printk(
+		    "[%s] ctxt %u:%u " CINFO_FMT,
+		    __get_str(dev),
+		    __entry->ctxt,
+		    __entry->subctxt,
+		    __entry->egrtids,
+		    __entry->rcvegr_size,
+		    __entry->rcvhdrq_cnt,
+		    __entry->rcvhdrq_size,
+		    __entry->sdma_ring_size
+		    )
+	);
+
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM hfi1_sma
+
+#define BCT_FORMAT \
+	"shared_limit %x vls 0-7 [%x,%x][%x,%x][%x,%x][%x,%x][%x,%x][%x,%x][%x,%x][%x,%x] 15 [%x,%x]"
+
+#define BCT(field) \
+	be16_to_cpu( \
+		((struct buffer_control *)__get_dynamic_array(bct))->field \
+	)
+
+DECLARE_EVENT_CLASS(hfi1_bct_template,
+	TP_PROTO(struct hfi1_devdata *dd, struct buffer_control *bc),
+	TP_ARGS(dd, bc),
+	TP_STRUCT__entry(
+		DD_DEV_ENTRY(dd)
+		__dynamic_array(u8, bct, sizeof(*bc))
+	),
+	TP_fast_assign(
+		DD_DEV_ASSIGN(dd);
+		memcpy(
+			__get_dynamic_array(bct),
+			bc,
+			sizeof(*bc));
+	),
+	TP_printk(BCT_FORMAT,
+		BCT(overall_shared_limit),
+
+		BCT(vl[0].dedicated),
+		BCT(vl[0].shared),
+
+		BCT(vl[1].dedicated),
+		BCT(vl[1].shared),
+
+		BCT(vl[2].dedicated),
+		BCT(vl[2].shared),
+
+		BCT(vl[3].dedicated),
+		BCT(vl[3].shared),
+
+		BCT(vl[4].dedicated),
+		BCT(vl[4].shared),
+
+		BCT(vl[5].dedicated),
+		BCT(vl[5].shared),
+
+		BCT(vl[6].dedicated),
+		BCT(vl[6].shared),
+
+		BCT(vl[7].dedicated),
+		BCT(vl[7].shared),
+
+		BCT(vl[15].dedicated),
+		BCT(vl[15].shared)
+	)
+);
+
+
+DEFINE_EVENT(hfi1_bct_template, bct_set,
+	     TP_PROTO(struct hfi1_devdata *dd, struct buffer_control *bc),
+	     TP_ARGS(dd, bc));
+
+DEFINE_EVENT(hfi1_bct_template, bct_get,
+	     TP_PROTO(struct hfi1_devdata *dd, struct buffer_control *bc),
+	     TP_ARGS(dd, bc));
+
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM hfi1_sdma
+
+TRACE_EVENT(hfi1_sdma_descriptor,
+	TP_PROTO(
+		struct sdma_engine *sde,
+		u64 desc0,
+		u64 desc1,
+		u16 e,
+		void *descp),
+	TP_ARGS(sde, desc0, desc1, e, descp),
+	TP_STRUCT__entry(
+		DD_DEV_ENTRY(sde->dd)
+		__field(void *, descp)
+		__field(u64, desc0)
+		__field(u64, desc1)
+		__field(u16, e)
+		__field(u8, idx)
+	),
+	TP_fast_assign(
+		DD_DEV_ASSIGN(sde->dd);
+		__entry->desc0 = desc0;
+		__entry->desc1 = desc1;
+		__entry->idx = sde->this_idx;
+		__entry->descp = descp;
+		__entry->e = e;
+	),
+	TP_printk(
+		"[%s] SDE(%u) flags:%s addr:0x%016llx gen:%u len:%u d0:%016llx d1:%016llx to %p,%u",
+		__get_str(dev),
+		__entry->idx,
+		__parse_sdma_flags(__entry->desc0, __entry->desc1),
+		(__entry->desc0 >> SDMA_DESC0_PHY_ADDR_SHIFT)
+			& SDMA_DESC0_PHY_ADDR_MASK,
+		(u8)((__entry->desc1 >> SDMA_DESC1_GENERATION_SHIFT)
+			& SDMA_DESC1_GENERATION_MASK),
+		(u16)((__entry->desc0 >> SDMA_DESC0_BYTE_COUNT_SHIFT)
+			& SDMA_DESC0_BYTE_COUNT_MASK),
+		__entry->desc0,
+		__entry->desc1,
+		__entry->descp,
+		__entry->e
+	)
+);
+
+TRACE_EVENT(hfi1_sdma_engine_select,
+	TP_PROTO(struct hfi1_devdata *dd, u32 sel, u8 vl, u8 idx),
+	TP_ARGS(dd, sel, vl, idx),
+	TP_STRUCT__entry(
+		DD_DEV_ENTRY(dd)
+		__field(u32, sel)
+		__field(u8, vl)
+		__field(u8, idx)
+	),
+	TP_fast_assign(
+		DD_DEV_ASSIGN(dd);
+		__entry->sel = sel;
+		__entry->vl = vl;
+		__entry->idx = idx;
+	),
+	TP_printk(
+		"[%s] selecting SDE %u sel 0x%x vl %u",
+		__get_str(dev),
+		__entry->idx,
+		__entry->sel,
+		__entry->vl
+	)
+);
+
+DECLARE_EVENT_CLASS(hfi1_sdma_engine_class,
+	TP_PROTO(
+		struct sdma_engine *sde,
+		u64 status
+	),
+	TP_ARGS(sde, status),
+	TP_STRUCT__entry(
+		DD_DEV_ENTRY(sde->dd)
+		__field(u64, status)
+		__field(u8, idx)
+	),
+	TP_fast_assign(
+		DD_DEV_ASSIGN(sde->dd);
+		__entry->status = status;
+		__entry->idx = sde->this_idx;
+	),
+	TP_printk(
+		"[%s] SDE(%u) status %llx",
+		__get_str(dev),
+		__entry->idx,
+		(unsigned long long)__entry->status
+	)
+);
+
+DEFINE_EVENT(hfi1_sdma_engine_class, hfi1_sdma_engine_interrupt,
+	TP_PROTO(
+		struct sdma_engine *sde,
+		u64 status
+	),
+	TP_ARGS(sde, status)
+);
+
+DEFINE_EVENT(hfi1_sdma_engine_class, hfi1_sdma_engine_progress,
+	TP_PROTO(
+		struct sdma_engine *sde,
+		u64 status
+	),
+	TP_ARGS(sde, status)
+);
+
+DECLARE_EVENT_CLASS(hfi1_sdma_ahg_ad,
+	TP_PROTO(
+		struct sdma_engine *sde,
+		int aidx
+	),
+	TP_ARGS(sde, aidx),
+	TP_STRUCT__entry(
+		DD_DEV_ENTRY(sde->dd)
+		__field(int, aidx)
+		__field(u8, idx)
+	),
+	TP_fast_assign(
+		DD_DEV_ASSIGN(sde->dd);
+		__entry->idx = sde->this_idx;
+		__entry->aidx = aidx;
+	),
+	TP_printk(
+		"[%s] SDE(%u) aidx %d",
+		__get_str(dev),
+		__entry->idx,
+		__entry->aidx
+	)
+);
+
+DEFINE_EVENT(hfi1_sdma_ahg_ad, hfi1_ahg_allocate,
+	     TP_PROTO(
+		struct sdma_engine *sde,
+		int aidx
+	     ),
+	     TP_ARGS(sde, aidx));
+
+DEFINE_EVENT(hfi1_sdma_ahg_ad, hfi1_ahg_deallocate,
+	     TP_PROTO(
+		struct sdma_engine *sde,
+		int aidx
+	     ),
+	     TP_ARGS(sde, aidx));
+
+#ifdef CONFIG_HFI1_DEBUG_SDMA_ORDER
+TRACE_EVENT(hfi1_sdma_progress,
+	TP_PROTO(
+		struct sdma_engine *sde,
+		u16 hwhead,
+		u16 swhead,
+		struct sdma_txreq *txp
+	),
+	TP_ARGS(sde, hwhead, swhead, txp),
+	TP_STRUCT__entry(
+		DD_DEV_ENTRY(sde->dd)
+		__field(u64, sn)
+		__field(u16, hwhead)
+		__field(u16, swhead)
+		__field(u16, txnext)
+		__field(u16, tx_tail)
+		__field(u16, tx_head)
+		__field(u8, idx)
+	),
+	TP_fast_assign(
+		DD_DEV_ASSIGN(sde->dd);
+		__entry->hwhead = hwhead;
+		__entry->swhead = swhead;
+		__entry->tx_tail = sde->tx_tail;
+		__entry->tx_head = sde->tx_head;
+		__entry->txnext = txp ? txp->next_descq_idx : ~0;
+		__entry->idx = sde->this_idx;
+		__entry->sn = txp ? txp->sn : ~0;
+	),
+	TP_printk(
+		"[%s] SDE(%u) sn %llu hwhead %u swhead %u next_descq_idx %u tx_head %u tx_tail %u",
+		__get_str(dev),
+		__entry->idx,
+		__entry->sn,
+		__entry->hwhead,
+		__entry->swhead,
+		__entry->txnext,
+		__entry->tx_head,
+		__entry->tx_tail
+	)
+);
+#else
+TRACE_EVENT(hfi1_sdma_progress,
+	    TP_PROTO(
+		struct sdma_engine *sde,
+		u16 hwhead,
+		u16 swhead,
+		struct sdma_txreq *txp
+	    ),
+	TP_ARGS(sde, hwhead, swhead, txp),
+	TP_STRUCT__entry(
+		DD_DEV_ENTRY(sde->dd)
+		__field(u16, hwhead)
+		__field(u16, swhead)
+		__field(u16, txnext)
+		__field(u16, tx_tail)
+		__field(u16, tx_head)
+		__field(u8, idx)
+	),
+	TP_fast_assign(
+		DD_DEV_ASSIGN(sde->dd);
+		__entry->hwhead = hwhead;
+		__entry->swhead = swhead;
+		__entry->tx_tail = sde->tx_tail;
+		__entry->tx_head = sde->tx_head;
+		__entry->txnext = txp ? txp->next_descq_idx : ~0;
+		__entry->idx = sde->this_idx;
+	),
+	TP_printk(
+		"[%s] SDE(%u) hwhead %u swhead %u next_descq_idx %u tx_head %u tx_tail %u",
+		__get_str(dev),
+		__entry->idx,
+		__entry->hwhead,
+		__entry->swhead,
+		__entry->txnext,
+		__entry->tx_head,
+		__entry->tx_tail
+	)
+);
+#endif
+
+DECLARE_EVENT_CLASS(hfi1_sdma_sn,
+	TP_PROTO(
+		struct sdma_engine *sde,
+		u64 sn
+	),
+	TP_ARGS(sde, sn),
+	TP_STRUCT__entry(
+		DD_DEV_ENTRY(sde->dd)
+		__field(u64, sn)
+		__field(u8, idx)
+	),
+	TP_fast_assign(
+		DD_DEV_ASSIGN(sde->dd);
+		__entry->sn = sn;
+		__entry->idx = sde->this_idx;
+	),
+	TP_printk(
+		"[%s] SDE(%u) sn %llu",
+		__get_str(dev),
+		__entry->idx,
+		__entry->sn
+	)
+);
+
+DEFINE_EVENT(hfi1_sdma_sn, hfi1_sdma_out_sn,
+	     TP_PROTO(
+		struct sdma_engine *sde,
+		u64 sn
+	     ),
+	     TP_ARGS(sde, sn)
+);
+
+DEFINE_EVENT(hfi1_sdma_sn, hfi1_sdma_in_sn,
+	     TP_PROTO(
+		struct sdma_engine *sde,
+		u64 sn
+	     ),
+	     TP_ARGS(sde, sn)
+);
+
+#define USDMA_HDR_FORMAT \
+	"[%s:%u:%u:%u] PBC=(0x%x 0x%x) LRH=(0x%x 0x%x) BTH=(0x%x 0x%x 0x%x) KDETH=(0x%x 0x%x 0x%x 0x%x 0x%x 0x%x 0x%x 0x%x 0x%x) TIDVal=0x%x"
+
+TRACE_EVENT(hfi1_sdma_user_header,
+	    TP_PROTO(struct hfi1_devdata *dd, u16 ctxt, u8 subctxt, u16 req,
+		     struct hfi1_pkt_header *hdr, u32 tidval),
+	    TP_ARGS(dd, ctxt, subctxt, req, hdr, tidval),
+	    TP_STRUCT__entry(
+		    DD_DEV_ENTRY(dd)
+		    __field(u16, ctxt)
+		    __field(u8, subctxt)
+		    __field(u16, req)
+		    __field(__le32, pbc0)
+		    __field(__le32, pbc1)
+		    __field(__be32, lrh0)
+		    __field(__be32, lrh1)
+		    __field(__be32, bth0)
+		    __field(__be32, bth1)
+		    __field(__be32, bth2)
+		    __field(__le32, kdeth0)
+		    __field(__le32, kdeth1)
+		    __field(__le32, kdeth2)
+		    __field(__le32, kdeth3)
+		    __field(__le32, kdeth4)
+		    __field(__le32, kdeth5)
+		    __field(__le32, kdeth6)
+		    __field(__le32, kdeth7)
+		    __field(__le32, kdeth8)
+		    __field(u32, tidval)
+		    ),
+	    TP_fast_assign(
+		    __le32 *pbc = (__le32 *)hdr->pbc;
+		    __be32 *lrh = (__be32 *)hdr->lrh;
+		    __be32 *bth = (__be32 *)hdr->bth;
+		    __le32 *kdeth = (__le32 *)&hdr->kdeth;
+
+		    DD_DEV_ASSIGN(dd);
+		    __entry->ctxt = ctxt;
+		    __entry->subctxt = subctxt;
+		    __entry->req = req;
+		    __entry->pbc0 = pbc[0];
+		    __entry->pbc1 = pbc[1];
+		    __entry->lrh0 = be32_to_cpu(lrh[0]);
+		    __entry->lrh1 = be32_to_cpu(lrh[1]);
+		    __entry->bth0 = be32_to_cpu(bth[0]);
+		    __entry->bth1 = be32_to_cpu(bth[1]);
+		    __entry->bth2 = be32_to_cpu(bth[2]);
+		    __entry->kdeth0 = kdeth[0];
+		    __entry->kdeth1 = kdeth[1];
+		    __entry->kdeth2 = kdeth[2];
+		    __entry->kdeth3 = kdeth[3];
+		    __entry->kdeth4 = kdeth[4];
+		    __entry->kdeth5 = kdeth[5];
+		    __entry->kdeth6 = kdeth[6];
+		    __entry->kdeth7 = kdeth[7];
+		    __entry->kdeth8 = kdeth[8];
+		    __entry->tidval = tidval;
+		    ),
+	    TP_printk(USDMA_HDR_FORMAT,
+		      __get_str(dev),
+		      __entry->ctxt,
+		      __entry->subctxt,
+		      __entry->req,
+		      __entry->pbc1,
+		      __entry->pbc0,
+		      __entry->lrh0,
+		      __entry->lrh1,
+		      __entry->bth0,
+		      __entry->bth1,
+		      __entry->bth2,
+		      __entry->kdeth0,
+		      __entry->kdeth1,
+		      __entry->kdeth2,
+		      __entry->kdeth3,
+		      __entry->kdeth4,
+		      __entry->kdeth5,
+		      __entry->kdeth6,
+		      __entry->kdeth7,
+		      __entry->kdeth8,
+		      __entry->tidval
+		    )
+	);
+
+#define SDMA_UREQ_FMT \
+	"[%s:%u:%u] ver/op=0x%x, iovcnt=%u, npkts=%u, frag=%u, idx=%u"
+TRACE_EVENT(hfi1_sdma_user_reqinfo,
+	    TP_PROTO(struct hfi1_devdata *dd, u16 ctxt, u8 subctxt, u16 *i),
+	    TP_ARGS(dd, ctxt, subctxt, i),
+	    TP_STRUCT__entry(
+		    DD_DEV_ENTRY(dd);
+		    __field(u16, ctxt)
+		    __field(u8, subctxt)
+		    __field(u8, ver_opcode)
+		    __field(u8, iovcnt)
+		    __field(u16, npkts)
+		    __field(u16, fragsize)
+		    __field(u16, comp_idx)
+		    ),
+	    TP_fast_assign(
+		    DD_DEV_ASSIGN(dd);
+		    __entry->ctxt = ctxt;
+		    __entry->subctxt = subctxt;
+		    __entry->ver_opcode = i[0] & 0xff;
+		    __entry->iovcnt = (i[0] >> 8) & 0xff;
+		    __entry->npkts = i[1];
+		    __entry->fragsize = i[2];
+		    __entry->comp_idx = i[3];
+		    ),
+	    TP_printk(SDMA_UREQ_FMT,
+		      __get_str(dev),
+		      __entry->ctxt,
+		      __entry->subctxt,
+		      __entry->ver_opcode,
+		      __entry->iovcnt,
+		      __entry->npkts,
+		      __entry->fragsize,
+		      __entry->comp_idx
+		    )
+	);
+
+#define usdma_complete_name(st) { st, #st }
+#define show_usdma_complete_state(st)			\
+	__print_symbolic(st,				\
+			 usdma_complete_name(FREE),	\
+			 usdma_complete_name(QUEUED),	\
+			 usdma_complete_name(COMPLETE), \
+			 usdma_complete_name(ERROR))
+
+TRACE_EVENT(hfi1_sdma_user_completion,
+	    TP_PROTO(struct hfi1_devdata *dd, u16 ctxt, u8 subctxt, u16 idx,
+		     u8 state, int code),
+	    TP_ARGS(dd, ctxt, subctxt, idx, state, code),
+	    TP_STRUCT__entry(
+		    DD_DEV_ENTRY(dd)
+		    __field(u16, ctxt)
+		    __field(u8, subctxt)
+		    __field(u16, idx)
+		    __field(u8, state)
+		    __field(int, code)
+		    ),
+	    TP_fast_assign(
+		    DD_DEV_ASSIGN(dd);
+		    __entry->ctxt = ctxt;
+		    __entry->subctxt = subctxt;
+		    __entry->idx = idx;
+		    __entry->state = state;
+		    __entry->code = code;
+		    ),
+	    TP_printk("[%s:%u:%u:%u] SDMA completion state %s (%d)",
+		      __get_str(dev), __entry->ctxt, __entry->subctxt,
+		      __entry->idx, show_usdma_complete_state(__entry->state),
+		      __entry->code)
+	);
+
+const char *print_u32_array(struct trace_seq *, u32 *, int);
+#define __print_u32_hex(arr, len) print_u32_array(p, arr, len)
+
+TRACE_EVENT(hfi1_sdma_user_header_ahg,
+	    TP_PROTO(struct hfi1_devdata *dd, u16 ctxt, u8 subctxt, u16 req,
+		     u8 sde, u8 ahgidx, u32 *ahg, int len, u32 tidval),
+	    TP_ARGS(dd, ctxt, subctxt, req, sde, ahgidx, ahg, len, tidval),
+	    TP_STRUCT__entry(
+		    DD_DEV_ENTRY(dd)
+		    __field(u16, ctxt)
+		    __field(u8, subctxt)
+		    __field(u16, req)
+		    __field(u8, sde)
+		    __field(u8, idx)
+		    __field(int, len)
+		    __field(u32, tidval)
+		    __array(u32, ahg, 10)
+		    ),
+	    TP_fast_assign(
+		    DD_DEV_ASSIGN(dd);
+		    __entry->ctxt = ctxt;
+		    __entry->subctxt = subctxt;
+		    __entry->req = req;
+		    __entry->sde = sde;
+		    __entry->idx = ahgidx;
+		    __entry->len = len;
+		    __entry->tidval = tidval;
+		    memcpy(__entry->ahg, ahg, len * sizeof(u32));
+		    ),
+	    TP_printk("[%s:%u:%u:%u] (SDE%u/AHG%u) ahg[0-%d]=(%s) TIDVal=0x%x",
+		      __get_str(dev),
+		      __entry->ctxt,
+		      __entry->subctxt,
+		      __entry->req,
+		      __entry->sde,
+		      __entry->idx,
+		      __entry->len - 1,
+		      __print_u32_hex(__entry->ahg, __entry->len),
+		      __entry->tidval
+		    )
+	);
+
+TRACE_EVENT(hfi1_sdma_state,
+	TP_PROTO(
+		struct sdma_engine *sde,
+		const char *cstate,
+		const char *nstate
+	),
+	TP_ARGS(sde, cstate, nstate),
+	TP_STRUCT__entry(
+		DD_DEV_ENTRY(sde->dd)
+		__string(curstate, cstate)
+		__string(newstate, nstate)
+	),
+	TP_fast_assign(
+		DD_DEV_ASSIGN(sde->dd);
+		__assign_str(curstate, cstate);
+		__assign_str(newstate, nstate);
+	),
+	TP_printk("[%s] current state %s new state %s",
+		__get_str(dev),
+		__get_str(curstate),
+		__get_str(newstate)
+	)
+);
+
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM hfi1_rc
+
+DECLARE_EVENT_CLASS(hfi1_sdma_rc,
+	TP_PROTO(struct hfi1_qp *qp, u32 psn),
+	TP_ARGS(qp, psn),
+	TP_STRUCT__entry(
+		DD_DEV_ENTRY(dd_from_ibdev(qp->ibqp.device))
+		__field(u32, qpn)
+		__field(u32, flags)
+		__field(u32, psn)
+		__field(u32, sending_psn)
+		__field(u32, sending_hpsn)
+	),
+	TP_fast_assign(
+		DD_DEV_ASSIGN(dd_from_ibdev(qp->ibqp.device))
+		__entry->qpn = qp->ibqp.qp_num;
+		__entry->flags = qp->s_flags;
+		__entry->psn = psn;
+		__entry->sending_psn = qp->s_sending_psn;
+		__entry->sending_hpsn = qp->s_sending_hpsn;
+	),
+	TP_printk(
+		"[%s] qpn 0x%x flags 0x%x psn 0x%x sending_psn 0x%x sending_hpsn 0x%x",
+		__get_str(dev),
+		__entry->qpn,
+		__entry->flags,
+		__entry->psn,
+		__entry->sending_psn,
+		__entry->sending_psn
+	)
+);
+
+DEFINE_EVENT(hfi1_sdma_rc, hfi1_rc_sendcomplete,
+	     TP_PROTO(struct hfi1_qp *qp, u32 psn),
+	     TP_ARGS(qp, psn)
+);
+
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM hfi1_misc
+
+TRACE_EVENT(hfi1_interrupt,
+	TP_PROTO(struct hfi1_devdata *dd, const struct is_table *is_entry,
+		 int src),
+	TP_ARGS(dd, is_entry, src),
+	TP_STRUCT__entry(
+		DD_DEV_ENTRY(dd)
+		__array(char, buf, 64)
+		__field(int, src)
+	),
+	TP_fast_assign(
+		DD_DEV_ASSIGN(dd)
+		is_entry->is_name(__entry->buf, 64, src - is_entry->start);
+		__entry->src = src;
+	),
+	TP_printk("[%s] source: %s [%d]", __get_str(dev), __entry->buf,
+		  __entry->src)
+);
+
+/*
+ * Note:
+ * This produces a REALLY ugly trace in the console output when the string is
+ * too long.
+ */
+
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM hfi1_trace
+
+#define MAX_MSG_LEN 512
+
+DECLARE_EVENT_CLASS(hfi1_trace_template,
+	TP_PROTO(const char *function, struct va_format *vaf),
+	TP_ARGS(function, vaf),
+	TP_STRUCT__entry(
+		__string(function, function)
+		__dynamic_array(char, msg, MAX_MSG_LEN)
+	),
+	TP_fast_assign(
+		__assign_str(function, function);
+		WARN_ON_ONCE(vsnprintf(__get_dynamic_array(msg),
+		     MAX_MSG_LEN, vaf->fmt,
+		     *vaf->va) >= MAX_MSG_LEN);
+	),
+	TP_printk("(%s) %s",
+		  __get_str(function),
+		  __get_str(msg))
+);
+
+/*
+ * It may be nice to macroize the __hfi1_trace but the va_* stuff requires an
+ * actual function to work and can not be in a macro.
+ */
+#define __hfi1_trace_def(lvl) \
+void __hfi1_trace_##lvl(const char *funct, char *fmt, ...);		\
+									\
+DEFINE_EVENT(hfi1_trace_template, hfi1_ ##lvl,				\
+	TP_PROTO(const char *function, struct va_format *vaf),		\
+	TP_ARGS(function, vaf))
+
+#define __hfi1_trace_fn(lvl) \
+void __hfi1_trace_##lvl(const char *func, char *fmt, ...)		\
+{									\
+	struct va_format vaf = {					\
+		.fmt = fmt,						\
+	};								\
+	va_list args;							\
+									\
+	va_start(args, fmt);						\
+	vaf.va = &args;							\
+	trace_hfi1_ ##lvl(func, &vaf);					\
+	va_end(args);							\
+	return;								\
+}
+
+/*
+ * To create a new trace level simply define it below and as a __hfi1_trace_fn
+ * in trace.c. This will create all the hooks for calling
+ * hfi1_cdbg(LVL, fmt, ...); as well as take care of all
+ * the debugfs stuff.
+ */
+__hfi1_trace_def(PKT);
+__hfi1_trace_def(PROC);
+__hfi1_trace_def(SDMA);
+__hfi1_trace_def(LINKVERB);
+__hfi1_trace_def(DEBUG);
+__hfi1_trace_def(SNOOP);
+__hfi1_trace_def(CNTR);
+__hfi1_trace_def(PIO);
+__hfi1_trace_def(DC8051);
+__hfi1_trace_def(FIRMWARE);
+__hfi1_trace_def(RCVCTRL);
+__hfi1_trace_def(TID);
+
+#define hfi1_cdbg(which, fmt, ...) \
+	__hfi1_trace_##which(__func__, fmt, ##__VA_ARGS__)
+
+#define hfi1_dbg(fmt, ...) \
+	hfi1_cdbg(DEBUG, fmt, ##__VA_ARGS__)
+
+/*
+ * Define HFI1_EARLY_DBG at compile time or here to enable early trace
+ * messages. Do not check in an enablement for this.
+ */
+
+#ifdef HFI1_EARLY_DBG
+#define hfi1_dbg_early(fmt, ...) \
+	trace_printk(fmt, ##__VA_ARGS__)
+#else
+#define hfi1_dbg_early(fmt, ...)
+#endif
+
+#endif /* __HFI1_TRACE_H */
+
+#undef TRACE_INCLUDE_PATH
+#undef TRACE_INCLUDE_FILE
+#define TRACE_INCLUDE_PATH .
+#define TRACE_INCLUDE_FILE trace
+#include <trace/define_trace.h>
diff --git a/drivers/staging/rdma/hfi1/twsi.c b/drivers/staging/rdma/hfi1/twsi.c
new file mode 100644
index 000000000000..ea54fd2700ad
--- /dev/null
+++ b/drivers/staging/rdma/hfi1/twsi.c
@@ -0,0 +1,518 @@
+/*
+ *
+ * This file is provided under a dual BSD/GPLv2 license.  When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * GPL LICENSE SUMMARY
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * BSD LICENSE
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ *  - Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ *  - Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in
+ *    the documentation and/or other materials provided with the
+ *    distribution.
+ *  - Neither the name of Intel Corporation nor the names of its
+ *    contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ */
+
+#include <linux/delay.h>
+#include <linux/pci.h>
+#include <linux/vmalloc.h>
+
+#include "hfi.h"
+#include "twsi.h"
+
+/*
+ * "Two Wire Serial Interface" support.
+ *
+ * Originally written for a not-quite-i2c serial eeprom, which is
+ * still used on some supported boards. Later boards have added a
+ * variety of other uses, most board-specific, so the bit-boffing
+ * part has been split off to this file, while the other parts
+ * have been moved to chip-specific files.
+ *
+ * We have also dropped all pretense of fully generic (e.g. pretend
+ * we don't know whether '1' is the higher voltage) interface, as
+ * the restrictions of the generic i2c interface (e.g. no access from
+ * driver itself) make it unsuitable for this use.
+ */
+
+#define READ_CMD 1
+#define WRITE_CMD 0
+
+/**
+ * i2c_wait_for_writes - wait for a write
+ * @dd: the hfi1_ib device
+ *
+ * We use this instead of udelay directly, so we can make sure
+ * that previous register writes have been flushed all the way
+ * to the chip.  Since we are delaying anyway, the cost doesn't
+ * hurt, and makes the bit twiddling more regular
+ */
+static void i2c_wait_for_writes(struct hfi1_devdata *dd, u32 target)
+{
+	/*
+	 * implicit read of EXTStatus is as good as explicit
+	 * read of scratch, if all we want to do is flush
+	 * writes.
+	 */
+	hfi1_gpio_mod(dd, target, 0, 0, 0);
+	rmb(); /* inlined, so prevent compiler reordering */
+}
+
+/*
+ * QSFP modules are allowed to hold SCL low for 500uSec. Allow twice that
+ * for "almost compliant" modules
+ */
+#define SCL_WAIT_USEC 1000
+
+/* BUF_WAIT is time bus must be free between STOP or ACK and to next START.
+ * Should be 20, but some chips need more.
+ */
+#define TWSI_BUF_WAIT_USEC 60
+
+static void scl_out(struct hfi1_devdata *dd, u32 target, u8 bit)
+{
+	u32 mask;
+
+	udelay(1);
+
+	mask = QSFP_HFI0_I2CCLK;
+
+	/* SCL is meant to be bare-drain, so never set "OUT", just DIR */
+	hfi1_gpio_mod(dd, target, 0, bit ? 0 : mask, mask);
+
+	/*
+	 * Allow for slow slaves by simple
+	 * delay for falling edge, sampling on rise.
+	 */
+	if (!bit)
+		udelay(2);
+	else {
+		int rise_usec;
+
+		for (rise_usec = SCL_WAIT_USEC; rise_usec > 0; rise_usec -= 2) {
+			if (mask & hfi1_gpio_mod(dd, target, 0, 0, 0))
+				break;
+			udelay(2);
+		}
+		if (rise_usec <= 0)
+			dd_dev_err(dd, "SCL interface stuck low > %d uSec\n",
+				    SCL_WAIT_USEC);
+	}
+	i2c_wait_for_writes(dd, target);
+}
+
+static void sda_out(struct hfi1_devdata *dd, u32 target, u8 bit)
+{
+	u32 mask;
+
+	mask = QSFP_HFI0_I2CDAT;
+
+	/* SDA is meant to be bare-drain, so never set "OUT", just DIR */
+	hfi1_gpio_mod(dd, target, 0, bit ? 0 : mask, mask);
+
+	i2c_wait_for_writes(dd, target);
+	udelay(2);
+}
+
+static u8 sda_in(struct hfi1_devdata *dd, u32 target, int wait)
+{
+	u32 read_val, mask;
+
+	mask = QSFP_HFI0_I2CDAT;
+	/* SDA is meant to be bare-drain, so never set "OUT", just DIR */
+	hfi1_gpio_mod(dd, target, 0, 0, mask);
+	read_val = hfi1_gpio_mod(dd, target, 0, 0, 0);
+	if (wait)
+		i2c_wait_for_writes(dd, target);
+	return (read_val & mask) >> GPIO_SDA_NUM;
+}
+
+/**
+ * i2c_ackrcv - see if ack following write is true
+ * @dd: the hfi1_ib device
+ */
+static int i2c_ackrcv(struct hfi1_devdata *dd, u32 target)
+{
+	u8 ack_received;
+
+	/* AT ENTRY SCL = LOW */
+	/* change direction, ignore data */
+	ack_received = sda_in(dd, target, 1);
+	scl_out(dd, target, 1);
+	ack_received = sda_in(dd, target, 1) == 0;
+	scl_out(dd, target, 0);
+	return ack_received;
+}
+
+static void stop_cmd(struct hfi1_devdata *dd, u32 target);
+
+/**
+ * rd_byte - read a byte, sending STOP on last, else ACK
+ * @dd: the hfi1_ib device
+ *
+ * Returns byte shifted out of device
+ */
+static int rd_byte(struct hfi1_devdata *dd, u32 target, int last)
+{
+	int bit_cntr, data;
+
+	data = 0;
+
+	for (bit_cntr = 7; bit_cntr >= 0; --bit_cntr) {
+		data <<= 1;
+		scl_out(dd, target, 1);
+		data |= sda_in(dd, target, 0);
+		scl_out(dd, target, 0);
+	}
+	if (last) {
+		scl_out(dd, target, 1);
+		stop_cmd(dd, target);
+	} else {
+		sda_out(dd, target, 0);
+		scl_out(dd, target, 1);
+		scl_out(dd, target, 0);
+		sda_out(dd, target, 1);
+	}
+	return data;
+}
+
+/**
+ * wr_byte - write a byte, one bit at a time
+ * @dd: the hfi1_ib device
+ * @data: the byte to write
+ *
+ * Returns 0 if we got the following ack, otherwise 1
+ */
+static int wr_byte(struct hfi1_devdata *dd, u32 target, u8 data)
+{
+	int bit_cntr;
+	u8 bit;
+
+	for (bit_cntr = 7; bit_cntr >= 0; bit_cntr--) {
+		bit = (data >> bit_cntr) & 1;
+		sda_out(dd, target, bit);
+		scl_out(dd, target, 1);
+		scl_out(dd, target, 0);
+	}
+	return (!i2c_ackrcv(dd, target)) ? 1 : 0;
+}
+
+/*
+ * issue TWSI start sequence:
+ * (both clock/data high, clock high, data low while clock is high)
+ */
+static void start_seq(struct hfi1_devdata *dd, u32 target)
+{
+	sda_out(dd, target, 1);
+	scl_out(dd, target, 1);
+	sda_out(dd, target, 0);
+	udelay(1);
+	scl_out(dd, target, 0);
+}
+
+/**
+ * stop_seq - transmit the stop sequence
+ * @dd: the hfi1_ib device
+ *
+ * (both clock/data low, clock high, data high while clock is high)
+ */
+static void stop_seq(struct hfi1_devdata *dd, u32 target)
+{
+	scl_out(dd, target, 0);
+	sda_out(dd, target, 0);
+	scl_out(dd, target, 1);
+	sda_out(dd, target, 1);
+}
+
+/**
+ * stop_cmd - transmit the stop condition
+ * @dd: the hfi1_ib device
+ *
+ * (both clock/data low, clock high, data high while clock is high)
+ */
+static void stop_cmd(struct hfi1_devdata *dd, u32 target)
+{
+	stop_seq(dd, target);
+	udelay(TWSI_BUF_WAIT_USEC);
+}
+
+/**
+ * hfi1_twsi_reset - reset I2C communication
+ * @dd: the hfi1_ib device
+ */
+
+int hfi1_twsi_reset(struct hfi1_devdata *dd, u32 target)
+{
+	int clock_cycles_left = 9;
+	int was_high = 0;
+	u32 pins, mask;
+
+	/* Both SCL and SDA should be high. If not, there
+	 * is something wrong.
+	 */
+	mask = QSFP_HFI0_I2CCLK | QSFP_HFI0_I2CDAT;
+
+	/*
+	 * Force pins to desired innocuous state.
+	 * This is the default power-on state with out=0 and dir=0,
+	 * So tri-stated and should be floating high (barring HW problems)
+	 */
+	hfi1_gpio_mod(dd, target, 0, 0, mask);
+
+	/*
+	 * Clock nine times to get all listeners into a sane state.
+	 * If SDA does not go high at any point, we are wedged.
+	 * One vendor recommends then issuing START followed by STOP.
+	 * we cannot use our "normal" functions to do that, because
+	 * if SCL drops between them, another vendor's part will
+	 * wedge, dropping SDA and keeping it low forever, at the end of
+	 * the next transaction (even if it was not the device addressed).
+	 * So our START and STOP take place with SCL held high.
+	 */
+	while (clock_cycles_left--) {
+		scl_out(dd, target, 0);
+		scl_out(dd, target, 1);
+		/* Note if SDA is high, but keep clocking to sync slave */
+		was_high |= sda_in(dd, target, 0);
+	}
+
+	if (was_high) {
+		/*
+		 * We saw a high, which we hope means the slave is sync'd.
+		 * Issue START, STOP, pause for T_BUF.
+		 */
+
+		pins = hfi1_gpio_mod(dd, target, 0, 0, 0);
+		if ((pins & mask) != mask)
+			dd_dev_err(dd, "GPIO pins not at rest: %d\n",
+				    pins & mask);
+		/* Drop SDA to issue START */
+		udelay(1); /* Guarantee .6 uSec setup */
+		sda_out(dd, target, 0);
+		udelay(1); /* Guarantee .6 uSec hold */
+		/* At this point, SCL is high, SDA low. Raise SDA for STOP */
+		sda_out(dd, target, 1);
+		udelay(TWSI_BUF_WAIT_USEC);
+	}
+
+	return !was_high;
+}
+
+#define HFI1_TWSI_START 0x100
+#define HFI1_TWSI_STOP 0x200
+
+/* Write byte to TWSI, optionally prefixed with START or suffixed with
+ * STOP.
+ * returns 0 if OK (ACK received), else != 0
+ */
+static int twsi_wr(struct hfi1_devdata *dd, u32 target, int data, int flags)
+{
+	int ret = 1;
+
+	if (flags & HFI1_TWSI_START)
+		start_seq(dd, target);
+
+	/* Leaves SCL low (from i2c_ackrcv()) */
+	ret = wr_byte(dd, target, data);
+
+	if (flags & HFI1_TWSI_STOP)
+		stop_cmd(dd, target);
+	return ret;
+}
+
+/* Added functionality for IBA7220-based cards */
+#define HFI1_TEMP_DEV 0x98
+
+/*
+ * hfi1_twsi_blk_rd
+ * General interface for data transfer from twsi devices.
+ * One vestige of its former role is that it recognizes a device
+ * HFI1_TWSI_NO_DEV and does the correct operation for the legacy part,
+ * which responded to all TWSI device codes, interpreting them as
+ * address within device. On all other devices found on board handled by
+ * this driver, the device is followed by a one-byte "address" which selects
+ * the "register" or "offset" within the device from which data should
+ * be read.
+ */
+int hfi1_twsi_blk_rd(struct hfi1_devdata *dd, u32 target, int dev, int addr,
+		     void *buffer, int len)
+{
+	int ret;
+	u8 *bp = buffer;
+
+	ret = 1;
+
+	if (dev == HFI1_TWSI_NO_DEV) {
+		/* legacy not-really-I2C */
+		addr = (addr << 1) | READ_CMD;
+		ret = twsi_wr(dd, target, addr, HFI1_TWSI_START);
+	} else {
+		/* Actual I2C */
+		ret = twsi_wr(dd, target, dev | WRITE_CMD, HFI1_TWSI_START);
+		if (ret) {
+			stop_cmd(dd, target);
+			ret = 1;
+			goto bail;
+		}
+		/*
+		 * SFF spec claims we do _not_ stop after the addr
+		 * but simply issue a start with the "read" dev-addr.
+		 * Since we are implicitly waiting for ACK here,
+		 * we need t_buf (nominally 20uSec) before that start,
+		 * and cannot rely on the delay built in to the STOP
+		 */
+		ret = twsi_wr(dd, target, addr, 0);
+		udelay(TWSI_BUF_WAIT_USEC);
+
+		if (ret) {
+			dd_dev_err(dd,
+				"Failed to write interface read addr %02X\n",
+				addr);
+			ret = 1;
+			goto bail;
+		}
+		ret = twsi_wr(dd, target, dev | READ_CMD, HFI1_TWSI_START);
+	}
+	if (ret) {
+		stop_cmd(dd, target);
+		ret = 1;
+		goto bail;
+	}
+
+	/*
+	 * block devices keeps clocking data out as long as we ack,
+	 * automatically incrementing the address. Some have "pages"
+	 * whose boundaries will not be crossed, but the handling
+	 * of these is left to the caller, who is in a better
+	 * position to know.
+	 */
+	while (len-- > 0) {
+		/*
+		 * Get and store data, sending ACK if length remaining,
+		 * else STOP
+		 */
+		*bp++ = rd_byte(dd, target, !len);
+	}
+
+	ret = 0;
+
+bail:
+	return ret;
+}
+
+/*
+ * hfi1_twsi_blk_wr
+ * General interface for data transfer to twsi devices.
+ * One vestige of its former role is that it recognizes a device
+ * HFI1_TWSI_NO_DEV and does the correct operation for the legacy part,
+ * which responded to all TWSI device codes, interpreting them as
+ * address within device. On all other devices found on board handled by
+ * this driver, the device is followed by a one-byte "address" which selects
+ * the "register" or "offset" within the device to which data should
+ * be written.
+ */
+int hfi1_twsi_blk_wr(struct hfi1_devdata *dd, u32 target, int dev, int addr,
+		     const void *buffer, int len)
+{
+	int sub_len;
+	const u8 *bp = buffer;
+	int max_wait_time, i;
+	int ret = 1;
+
+	while (len > 0) {
+		if (dev == HFI1_TWSI_NO_DEV) {
+			if (twsi_wr(dd, target, (addr << 1) | WRITE_CMD,
+				    HFI1_TWSI_START)) {
+				goto failed_write;
+			}
+		} else {
+			/* Real I2C */
+			if (twsi_wr(dd, target,
+				    dev | WRITE_CMD, HFI1_TWSI_START))
+				goto failed_write;
+			ret = twsi_wr(dd, target, addr, 0);
+			if (ret) {
+				dd_dev_err(dd,
+					"Failed to write interface write addr %02X\n",
+					addr);
+				goto failed_write;
+			}
+		}
+
+		sub_len = min(len, 4);
+		addr += sub_len;
+		len -= sub_len;
+
+		for (i = 0; i < sub_len; i++)
+			if (twsi_wr(dd, target, *bp++, 0))
+				goto failed_write;
+
+		stop_cmd(dd, target);
+
+		/*
+		 * Wait for write complete by waiting for a successful
+		 * read (the chip replies with a zero after the write
+		 * cmd completes, and before it writes to the eeprom.
+		 * The startcmd for the read will fail the ack until
+		 * the writes have completed.   We do this inline to avoid
+		 * the debug prints that are in the real read routine
+		 * if the startcmd fails.
+		 * We also use the proper device address, so it doesn't matter
+		 * whether we have real eeprom_dev. Legacy likes any address.
+		 */
+		max_wait_time = 100;
+		while (twsi_wr(dd, target,
+			       dev | READ_CMD, HFI1_TWSI_START)) {
+			stop_cmd(dd, target);
+			if (!--max_wait_time)
+				goto failed_write;
+		}
+		/* now read (and ignore) the resulting byte */
+		rd_byte(dd, target, 1);
+	}
+
+	ret = 0;
+	goto bail;
+
+failed_write:
+	stop_cmd(dd, target);
+	ret = 1;
+
+bail:
+	return ret;
+}
diff --git a/drivers/staging/rdma/hfi1/twsi.h b/drivers/staging/rdma/hfi1/twsi.h
new file mode 100644
index 000000000000..5907e029613d
--- /dev/null
+++ b/drivers/staging/rdma/hfi1/twsi.h
@@ -0,0 +1,68 @@
+#ifndef _TWSI_H
+#define _TWSI_H
+/*
+ *
+ * This file is provided under a dual BSD/GPLv2 license.  When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * GPL LICENSE SUMMARY
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * BSD LICENSE
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ *  - Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ *  - Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in
+ *    the documentation and/or other materials provided with the
+ *    distribution.
+ *  - Neither the name of Intel Corporation nor the names of its
+ *    contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ */
+
+#define HFI1_TWSI_NO_DEV 0xFF
+
+struct hfi1_devdata;
+
+/* Bit position of SDA pin in ASIC_QSFP* registers  */
+#define  GPIO_SDA_NUM 1
+
+/* these functions must be called with qsfp_lock held */
+int hfi1_twsi_reset(struct hfi1_devdata *dd, u32 target);
+int hfi1_twsi_blk_rd(struct hfi1_devdata *dd, u32 target, int dev, int addr,
+		     void *buffer, int len);
+int hfi1_twsi_blk_wr(struct hfi1_devdata *dd, u32 target, int dev, int addr,
+		     const void *buffer, int len);
+
+
+#endif /* _TWSI_H */
diff --git a/drivers/staging/rdma/hfi1/uc.c b/drivers/staging/rdma/hfi1/uc.c
new file mode 100644
index 000000000000..b536f397737c
--- /dev/null
+++ b/drivers/staging/rdma/hfi1/uc.c
@@ -0,0 +1,585 @@
+/*
+ *
+ * This file is provided under a dual BSD/GPLv2 license.  When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * GPL LICENSE SUMMARY
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * BSD LICENSE
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ *  - Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ *  - Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in
+ *    the documentation and/or other materials provided with the
+ *    distribution.
+ *  - Neither the name of Intel Corporation nor the names of its
+ *    contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ */
+
+#include "hfi.h"
+#include "sdma.h"
+#include "qp.h"
+
+/* cut down ridiculously long IB macro names */
+#define OP(x) IB_OPCODE_UC_##x
+
+/**
+ * hfi1_make_uc_req - construct a request packet (SEND, RDMA write)
+ * @qp: a pointer to the QP
+ *
+ * Return 1 if constructed; otherwise, return 0.
+ */
+int hfi1_make_uc_req(struct hfi1_qp *qp)
+{
+	struct hfi1_other_headers *ohdr;
+	struct hfi1_swqe *wqe;
+	unsigned long flags;
+	u32 hwords = 5;
+	u32 bth0 = 0;
+	u32 len;
+	u32 pmtu = qp->pmtu;
+	int ret = 0;
+	int middle = 0;
+
+	spin_lock_irqsave(&qp->s_lock, flags);
+
+	if (!(ib_hfi1_state_ops[qp->state] & HFI1_PROCESS_SEND_OK)) {
+		if (!(ib_hfi1_state_ops[qp->state] & HFI1_FLUSH_SEND))
+			goto bail;
+		/* We are in the error state, flush the work request. */
+		if (qp->s_last == qp->s_head)
+			goto bail;
+		/* If DMAs are in progress, we can't flush immediately. */
+		if (atomic_read(&qp->s_iowait.sdma_busy)) {
+			qp->s_flags |= HFI1_S_WAIT_DMA;
+			goto bail;
+		}
+		clear_ahg(qp);
+		wqe = get_swqe_ptr(qp, qp->s_last);
+		hfi1_send_complete(qp, wqe, IB_WC_WR_FLUSH_ERR);
+		goto done;
+	}
+
+	ohdr = &qp->s_hdr->ibh.u.oth;
+	if (qp->remote_ah_attr.ah_flags & IB_AH_GRH)
+		ohdr = &qp->s_hdr->ibh.u.l.oth;
+
+	/* Get the next send request. */
+	wqe = get_swqe_ptr(qp, qp->s_cur);
+	qp->s_wqe = NULL;
+	switch (qp->s_state) {
+	default:
+		if (!(ib_hfi1_state_ops[qp->state] &
+		    HFI1_PROCESS_NEXT_SEND_OK))
+			goto bail;
+		/* Check if send work queue is empty. */
+		if (qp->s_cur == qp->s_head) {
+			clear_ahg(qp);
+			goto bail;
+		}
+		/*
+		 * Start a new request.
+		 */
+		wqe->psn = qp->s_next_psn;
+		qp->s_psn = qp->s_next_psn;
+		qp->s_sge.sge = wqe->sg_list[0];
+		qp->s_sge.sg_list = wqe->sg_list + 1;
+		qp->s_sge.num_sge = wqe->wr.num_sge;
+		qp->s_sge.total_len = wqe->length;
+		len = wqe->length;
+		qp->s_len = len;
+		switch (wqe->wr.opcode) {
+		case IB_WR_SEND:
+		case IB_WR_SEND_WITH_IMM:
+			if (len > pmtu) {
+				qp->s_state = OP(SEND_FIRST);
+				len = pmtu;
+				break;
+			}
+			if (wqe->wr.opcode == IB_WR_SEND)
+				qp->s_state = OP(SEND_ONLY);
+			else {
+				qp->s_state =
+					OP(SEND_ONLY_WITH_IMMEDIATE);
+				/* Immediate data comes after the BTH */
+				ohdr->u.imm_data = wqe->wr.ex.imm_data;
+				hwords += 1;
+			}
+			if (wqe->wr.send_flags & IB_SEND_SOLICITED)
+				bth0 |= IB_BTH_SOLICITED;
+			qp->s_wqe = wqe;
+			if (++qp->s_cur >= qp->s_size)
+				qp->s_cur = 0;
+			break;
+
+		case IB_WR_RDMA_WRITE:
+		case IB_WR_RDMA_WRITE_WITH_IMM:
+			ohdr->u.rc.reth.vaddr =
+				cpu_to_be64(wqe->wr.wr.rdma.remote_addr);
+			ohdr->u.rc.reth.rkey =
+				cpu_to_be32(wqe->wr.wr.rdma.rkey);
+			ohdr->u.rc.reth.length = cpu_to_be32(len);
+			hwords += sizeof(struct ib_reth) / 4;
+			if (len > pmtu) {
+				qp->s_state = OP(RDMA_WRITE_FIRST);
+				len = pmtu;
+				break;
+			}
+			if (wqe->wr.opcode == IB_WR_RDMA_WRITE)
+				qp->s_state = OP(RDMA_WRITE_ONLY);
+			else {
+				qp->s_state =
+					OP(RDMA_WRITE_ONLY_WITH_IMMEDIATE);
+				/* Immediate data comes after the RETH */
+				ohdr->u.rc.imm_data = wqe->wr.ex.imm_data;
+				hwords += 1;
+				if (wqe->wr.send_flags & IB_SEND_SOLICITED)
+					bth0 |= IB_BTH_SOLICITED;
+			}
+			qp->s_wqe = wqe;
+			if (++qp->s_cur >= qp->s_size)
+				qp->s_cur = 0;
+			break;
+
+		default:
+			goto bail;
+		}
+		break;
+
+	case OP(SEND_FIRST):
+		qp->s_state = OP(SEND_MIDDLE);
+		/* FALLTHROUGH */
+	case OP(SEND_MIDDLE):
+		len = qp->s_len;
+		if (len > pmtu) {
+			len = pmtu;
+			middle = HFI1_CAP_IS_KSET(SDMA_AHG);
+			break;
+		}
+		if (wqe->wr.opcode == IB_WR_SEND)
+			qp->s_state = OP(SEND_LAST);
+		else {
+			qp->s_state = OP(SEND_LAST_WITH_IMMEDIATE);
+			/* Immediate data comes after the BTH */
+			ohdr->u.imm_data = wqe->wr.ex.imm_data;
+			hwords += 1;
+		}
+		if (wqe->wr.send_flags & IB_SEND_SOLICITED)
+			bth0 |= IB_BTH_SOLICITED;
+		qp->s_wqe = wqe;
+		if (++qp->s_cur >= qp->s_size)
+			qp->s_cur = 0;
+		break;
+
+	case OP(RDMA_WRITE_FIRST):
+		qp->s_state = OP(RDMA_WRITE_MIDDLE);
+		/* FALLTHROUGH */
+	case OP(RDMA_WRITE_MIDDLE):
+		len = qp->s_len;
+		if (len > pmtu) {
+			len = pmtu;
+			middle = HFI1_CAP_IS_KSET(SDMA_AHG);
+			break;
+		}
+		if (wqe->wr.opcode == IB_WR_RDMA_WRITE)
+			qp->s_state = OP(RDMA_WRITE_LAST);
+		else {
+			qp->s_state =
+				OP(RDMA_WRITE_LAST_WITH_IMMEDIATE);
+			/* Immediate data comes after the BTH */
+			ohdr->u.imm_data = wqe->wr.ex.imm_data;
+			hwords += 1;
+			if (wqe->wr.send_flags & IB_SEND_SOLICITED)
+				bth0 |= IB_BTH_SOLICITED;
+		}
+		qp->s_wqe = wqe;
+		if (++qp->s_cur >= qp->s_size)
+			qp->s_cur = 0;
+		break;
+	}
+	qp->s_len -= len;
+	qp->s_hdrwords = hwords;
+	qp->s_cur_sge = &qp->s_sge;
+	qp->s_cur_size = len;
+	hfi1_make_ruc_header(qp, ohdr, bth0 | (qp->s_state << 24),
+			     mask_psn(qp->s_next_psn++), middle);
+done:
+	ret = 1;
+	goto unlock;
+
+bail:
+	qp->s_flags &= ~HFI1_S_BUSY;
+unlock:
+	spin_unlock_irqrestore(&qp->s_lock, flags);
+	return ret;
+}
+
+/**
+ * hfi1_uc_rcv - handle an incoming UC packet
+ * @ibp: the port the packet came in on
+ * @hdr: the header of the packet
+ * @rcv_flags: flags relevant to rcv processing
+ * @data: the packet data
+ * @tlen: the length of the packet
+ * @qp: the QP for this packet.
+ *
+ * This is called from qp_rcv() to process an incoming UC packet
+ * for the given QP.
+ * Called at interrupt level.
+ */
+void hfi1_uc_rcv(struct hfi1_packet *packet)
+{
+	struct hfi1_ibport *ibp = &packet->rcd->ppd->ibport_data;
+	struct hfi1_ib_header *hdr = packet->hdr;
+	u32 rcv_flags = packet->rcv_flags;
+	void *data = packet->ebuf;
+	u32 tlen = packet->tlen;
+	struct hfi1_qp *qp = packet->qp;
+	struct hfi1_other_headers *ohdr = packet->ohdr;
+	u32 opcode;
+	u32 hdrsize = packet->hlen;
+	u32 psn;
+	u32 pad;
+	struct ib_wc wc;
+	u32 pmtu = qp->pmtu;
+	struct ib_reth *reth;
+	int has_grh = rcv_flags & HFI1_HAS_GRH;
+	int ret;
+	u32 bth1;
+	struct ib_grh *grh = NULL;
+
+	opcode = be32_to_cpu(ohdr->bth[0]);
+	if (hfi1_ruc_check_hdr(ibp, hdr, has_grh, qp, opcode))
+		return;
+
+	bth1 = be32_to_cpu(ohdr->bth[1]);
+	if (unlikely(bth1 & (HFI1_BECN_SMASK | HFI1_FECN_SMASK))) {
+		if (bth1 & HFI1_BECN_SMASK) {
+			struct hfi1_pportdata *ppd = ppd_from_ibp(ibp);
+			u32 rqpn, lqpn;
+			u16 rlid = be16_to_cpu(hdr->lrh[3]);
+			u8 sl, sc5;
+
+			lqpn = bth1 & HFI1_QPN_MASK;
+			rqpn = qp->remote_qpn;
+
+			sc5 = ibp->sl_to_sc[qp->remote_ah_attr.sl];
+			sl = ibp->sc_to_sl[sc5];
+
+			process_becn(ppd, sl, rlid, lqpn, rqpn,
+					IB_CC_SVCTYPE_UC);
+		}
+
+		if (bth1 & HFI1_FECN_SMASK) {
+			u16 pkey = (u16)be32_to_cpu(ohdr->bth[0]);
+			u16 slid = be16_to_cpu(hdr->lrh[3]);
+			u16 dlid = be16_to_cpu(hdr->lrh[1]);
+			u32 src_qp = qp->remote_qpn;
+			u8 sc5;
+
+			sc5 = ibp->sl_to_sc[qp->remote_ah_attr.sl];
+
+			return_cnp(ibp, qp, src_qp, pkey, dlid, slid, sc5, grh);
+		}
+	}
+
+	psn = be32_to_cpu(ohdr->bth[2]);
+	opcode >>= 24;
+
+	/* Compare the PSN verses the expected PSN. */
+	if (unlikely(cmp_psn(psn, qp->r_psn) != 0)) {
+		/*
+		 * Handle a sequence error.
+		 * Silently drop any current message.
+		 */
+		qp->r_psn = psn;
+inv:
+		if (qp->r_state == OP(SEND_FIRST) ||
+		    qp->r_state == OP(SEND_MIDDLE)) {
+			set_bit(HFI1_R_REWIND_SGE, &qp->r_aflags);
+			qp->r_sge.num_sge = 0;
+		} else
+			hfi1_put_ss(&qp->r_sge);
+		qp->r_state = OP(SEND_LAST);
+		switch (opcode) {
+		case OP(SEND_FIRST):
+		case OP(SEND_ONLY):
+		case OP(SEND_ONLY_WITH_IMMEDIATE):
+			goto send_first;
+
+		case OP(RDMA_WRITE_FIRST):
+		case OP(RDMA_WRITE_ONLY):
+		case OP(RDMA_WRITE_ONLY_WITH_IMMEDIATE):
+			goto rdma_first;
+
+		default:
+			goto drop;
+		}
+	}
+
+	/* Check for opcode sequence errors. */
+	switch (qp->r_state) {
+	case OP(SEND_FIRST):
+	case OP(SEND_MIDDLE):
+		if (opcode == OP(SEND_MIDDLE) ||
+		    opcode == OP(SEND_LAST) ||
+		    opcode == OP(SEND_LAST_WITH_IMMEDIATE))
+			break;
+		goto inv;
+
+	case OP(RDMA_WRITE_FIRST):
+	case OP(RDMA_WRITE_MIDDLE):
+		if (opcode == OP(RDMA_WRITE_MIDDLE) ||
+		    opcode == OP(RDMA_WRITE_LAST) ||
+		    opcode == OP(RDMA_WRITE_LAST_WITH_IMMEDIATE))
+			break;
+		goto inv;
+
+	default:
+		if (opcode == OP(SEND_FIRST) ||
+		    opcode == OP(SEND_ONLY) ||
+		    opcode == OP(SEND_ONLY_WITH_IMMEDIATE) ||
+		    opcode == OP(RDMA_WRITE_FIRST) ||
+		    opcode == OP(RDMA_WRITE_ONLY) ||
+		    opcode == OP(RDMA_WRITE_ONLY_WITH_IMMEDIATE))
+			break;
+		goto inv;
+	}
+
+	if (qp->state == IB_QPS_RTR && !(qp->r_flags & HFI1_R_COMM_EST))
+		qp_comm_est(qp);
+
+	/* OK, process the packet. */
+	switch (opcode) {
+	case OP(SEND_FIRST):
+	case OP(SEND_ONLY):
+	case OP(SEND_ONLY_WITH_IMMEDIATE):
+send_first:
+		if (test_and_clear_bit(HFI1_R_REWIND_SGE, &qp->r_aflags))
+			qp->r_sge = qp->s_rdma_read_sge;
+		else {
+			ret = hfi1_get_rwqe(qp, 0);
+			if (ret < 0)
+				goto op_err;
+			if (!ret)
+				goto drop;
+			/*
+			 * qp->s_rdma_read_sge will be the owner
+			 * of the mr references.
+			 */
+			qp->s_rdma_read_sge = qp->r_sge;
+		}
+		qp->r_rcv_len = 0;
+		if (opcode == OP(SEND_ONLY))
+			goto no_immediate_data;
+		else if (opcode == OP(SEND_ONLY_WITH_IMMEDIATE))
+			goto send_last_imm;
+		/* FALLTHROUGH */
+	case OP(SEND_MIDDLE):
+		/* Check for invalid length PMTU or posted rwqe len. */
+		if (unlikely(tlen != (hdrsize + pmtu + 4)))
+			goto rewind;
+		qp->r_rcv_len += pmtu;
+		if (unlikely(qp->r_rcv_len > qp->r_len))
+			goto rewind;
+		hfi1_copy_sge(&qp->r_sge, data, pmtu, 0);
+		break;
+
+	case OP(SEND_LAST_WITH_IMMEDIATE):
+send_last_imm:
+		wc.ex.imm_data = ohdr->u.imm_data;
+		wc.wc_flags = IB_WC_WITH_IMM;
+		goto send_last;
+	case OP(SEND_LAST):
+no_immediate_data:
+		wc.ex.imm_data = 0;
+		wc.wc_flags = 0;
+send_last:
+		/* Get the number of bytes the message was padded by. */
+		pad = (be32_to_cpu(ohdr->bth[0]) >> 20) & 3;
+		/* Check for invalid length. */
+		/* LAST len should be >= 1 */
+		if (unlikely(tlen < (hdrsize + pad + 4)))
+			goto rewind;
+		/* Don't count the CRC. */
+		tlen -= (hdrsize + pad + 4);
+		wc.byte_len = tlen + qp->r_rcv_len;
+		if (unlikely(wc.byte_len > qp->r_len))
+			goto rewind;
+		wc.opcode = IB_WC_RECV;
+		hfi1_copy_sge(&qp->r_sge, data, tlen, 0);
+		hfi1_put_ss(&qp->s_rdma_read_sge);
+last_imm:
+		wc.wr_id = qp->r_wr_id;
+		wc.status = IB_WC_SUCCESS;
+		wc.qp = &qp->ibqp;
+		wc.src_qp = qp->remote_qpn;
+		wc.slid = qp->remote_ah_attr.dlid;
+		/*
+		 * It seems that IB mandates the presence of an SL in a
+		 * work completion only for the UD transport (see section
+		 * 11.4.2 of IBTA Vol. 1).
+		 *
+		 * However, the way the SL is chosen below is consistent
+		 * with the way that IB/qib works and is trying avoid
+		 * introducing incompatibilities.
+		 *
+		 * See also OPA Vol. 1, section 9.7.6, and table 9-17.
+		 */
+		wc.sl = qp->remote_ah_attr.sl;
+		/* zero fields that are N/A */
+		wc.vendor_err = 0;
+		wc.pkey_index = 0;
+		wc.dlid_path_bits = 0;
+		wc.port_num = 0;
+		/* Signal completion event if the solicited bit is set. */
+		hfi1_cq_enter(to_icq(qp->ibqp.recv_cq), &wc,
+			      (ohdr->bth[0] &
+				cpu_to_be32(IB_BTH_SOLICITED)) != 0);
+		break;
+
+	case OP(RDMA_WRITE_FIRST):
+	case OP(RDMA_WRITE_ONLY):
+	case OP(RDMA_WRITE_ONLY_WITH_IMMEDIATE): /* consume RWQE */
+rdma_first:
+		if (unlikely(!(qp->qp_access_flags &
+			       IB_ACCESS_REMOTE_WRITE))) {
+			goto drop;
+		}
+		reth = &ohdr->u.rc.reth;
+		qp->r_len = be32_to_cpu(reth->length);
+		qp->r_rcv_len = 0;
+		qp->r_sge.sg_list = NULL;
+		if (qp->r_len != 0) {
+			u32 rkey = be32_to_cpu(reth->rkey);
+			u64 vaddr = be64_to_cpu(reth->vaddr);
+			int ok;
+
+			/* Check rkey */
+			ok = hfi1_rkey_ok(qp, &qp->r_sge.sge, qp->r_len,
+					  vaddr, rkey, IB_ACCESS_REMOTE_WRITE);
+			if (unlikely(!ok))
+				goto drop;
+			qp->r_sge.num_sge = 1;
+		} else {
+			qp->r_sge.num_sge = 0;
+			qp->r_sge.sge.mr = NULL;
+			qp->r_sge.sge.vaddr = NULL;
+			qp->r_sge.sge.length = 0;
+			qp->r_sge.sge.sge_length = 0;
+		}
+		if (opcode == OP(RDMA_WRITE_ONLY))
+			goto rdma_last;
+		else if (opcode == OP(RDMA_WRITE_ONLY_WITH_IMMEDIATE)) {
+			wc.ex.imm_data = ohdr->u.rc.imm_data;
+			goto rdma_last_imm;
+		}
+		/* FALLTHROUGH */
+	case OP(RDMA_WRITE_MIDDLE):
+		/* Check for invalid length PMTU or posted rwqe len. */
+		if (unlikely(tlen != (hdrsize + pmtu + 4)))
+			goto drop;
+		qp->r_rcv_len += pmtu;
+		if (unlikely(qp->r_rcv_len > qp->r_len))
+			goto drop;
+		hfi1_copy_sge(&qp->r_sge, data, pmtu, 1);
+		break;
+
+	case OP(RDMA_WRITE_LAST_WITH_IMMEDIATE):
+		wc.ex.imm_data = ohdr->u.imm_data;
+rdma_last_imm:
+		wc.wc_flags = IB_WC_WITH_IMM;
+
+		/* Get the number of bytes the message was padded by. */
+		pad = (be32_to_cpu(ohdr->bth[0]) >> 20) & 3;
+		/* Check for invalid length. */
+		/* LAST len should be >= 1 */
+		if (unlikely(tlen < (hdrsize + pad + 4)))
+			goto drop;
+		/* Don't count the CRC. */
+		tlen -= (hdrsize + pad + 4);
+		if (unlikely(tlen + qp->r_rcv_len != qp->r_len))
+			goto drop;
+		if (test_and_clear_bit(HFI1_R_REWIND_SGE, &qp->r_aflags))
+			hfi1_put_ss(&qp->s_rdma_read_sge);
+		else {
+			ret = hfi1_get_rwqe(qp, 1);
+			if (ret < 0)
+				goto op_err;
+			if (!ret)
+				goto drop;
+		}
+		wc.byte_len = qp->r_len;
+		wc.opcode = IB_WC_RECV_RDMA_WITH_IMM;
+		hfi1_copy_sge(&qp->r_sge, data, tlen, 1);
+		hfi1_put_ss(&qp->r_sge);
+		goto last_imm;
+
+	case OP(RDMA_WRITE_LAST):
+rdma_last:
+		/* Get the number of bytes the message was padded by. */
+		pad = (be32_to_cpu(ohdr->bth[0]) >> 20) & 3;
+		/* Check for invalid length. */
+		/* LAST len should be >= 1 */
+		if (unlikely(tlen < (hdrsize + pad + 4)))
+			goto drop;
+		/* Don't count the CRC. */
+		tlen -= (hdrsize + pad + 4);
+		if (unlikely(tlen + qp->r_rcv_len != qp->r_len))
+			goto drop;
+		hfi1_copy_sge(&qp->r_sge, data, tlen, 1);
+		hfi1_put_ss(&qp->r_sge);
+		break;
+
+	default:
+		/* Drop packet for unknown opcodes. */
+		goto drop;
+	}
+	qp->r_psn++;
+	qp->r_state = opcode;
+	return;
+
+rewind:
+	set_bit(HFI1_R_REWIND_SGE, &qp->r_aflags);
+	qp->r_sge.num_sge = 0;
+drop:
+	ibp->n_pkt_drops++;
+	return;
+
+op_err:
+	hfi1_rc_error(qp, IB_WC_LOC_QP_OP_ERR);
+	return;
+
+}
diff --git a/drivers/staging/rdma/hfi1/ud.c b/drivers/staging/rdma/hfi1/ud.c
new file mode 100644
index 000000000000..d40d1a1e10aa
--- /dev/null
+++ b/drivers/staging/rdma/hfi1/ud.c
@@ -0,0 +1,885 @@
+/*
+ *
+ * This file is provided under a dual BSD/GPLv2 license.  When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * GPL LICENSE SUMMARY
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * BSD LICENSE
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ *  - Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ *  - Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in
+ *    the documentation and/or other materials provided with the
+ *    distribution.
+ *  - Neither the name of Intel Corporation nor the names of its
+ *    contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ */
+
+#include <linux/net.h>
+#include <rdma/ib_smi.h>
+
+#include "hfi.h"
+#include "mad.h"
+#include "qp.h"
+
+/**
+ * ud_loopback - handle send on loopback QPs
+ * @sqp: the sending QP
+ * @swqe: the send work request
+ *
+ * This is called from hfi1_make_ud_req() to forward a WQE addressed
+ * to the same HFI.
+ * Note that the receive interrupt handler may be calling hfi1_ud_rcv()
+ * while this is being called.
+ */
+static void ud_loopback(struct hfi1_qp *sqp, struct hfi1_swqe *swqe)
+{
+	struct hfi1_ibport *ibp = to_iport(sqp->ibqp.device, sqp->port_num);
+	struct hfi1_pportdata *ppd;
+	struct hfi1_qp *qp;
+	struct ib_ah_attr *ah_attr;
+	unsigned long flags;
+	struct hfi1_sge_state ssge;
+	struct hfi1_sge *sge;
+	struct ib_wc wc;
+	u32 length;
+	enum ib_qp_type sqptype, dqptype;
+
+	rcu_read_lock();
+
+	qp = hfi1_lookup_qpn(ibp, swqe->wr.wr.ud.remote_qpn);
+	if (!qp) {
+		ibp->n_pkt_drops++;
+		rcu_read_unlock();
+		return;
+	}
+
+	sqptype = sqp->ibqp.qp_type == IB_QPT_GSI ?
+			IB_QPT_UD : sqp->ibqp.qp_type;
+	dqptype = qp->ibqp.qp_type == IB_QPT_GSI ?
+			IB_QPT_UD : qp->ibqp.qp_type;
+
+	if (dqptype != sqptype ||
+	    !(ib_hfi1_state_ops[qp->state] & HFI1_PROCESS_RECV_OK)) {
+		ibp->n_pkt_drops++;
+		goto drop;
+	}
+
+	ah_attr = &to_iah(swqe->wr.wr.ud.ah)->attr;
+	ppd = ppd_from_ibp(ibp);
+
+	if (qp->ibqp.qp_num > 1) {
+		u16 pkey;
+		u16 slid;
+		u8 sc5 = ibp->sl_to_sc[ah_attr->sl];
+
+		pkey = hfi1_get_pkey(ibp, sqp->s_pkey_index);
+		slid = ppd->lid | (ah_attr->src_path_bits &
+				   ((1 << ppd->lmc) - 1));
+		if (unlikely(ingress_pkey_check(ppd, pkey, sc5,
+						qp->s_pkey_index, slid))) {
+			hfi1_bad_pqkey(ibp, IB_NOTICE_TRAP_BAD_PKEY, pkey,
+				       ah_attr->sl,
+				       sqp->ibqp.qp_num, qp->ibqp.qp_num,
+				       cpu_to_be16(slid),
+				       cpu_to_be16(ah_attr->dlid));
+			goto drop;
+		}
+	}
+
+	/*
+	 * Check that the qkey matches (except for QP0, see 9.6.1.4.1).
+	 * Qkeys with the high order bit set mean use the
+	 * qkey from the QP context instead of the WR (see 10.2.5).
+	 */
+	if (qp->ibqp.qp_num) {
+		u32 qkey;
+
+		qkey = (int)swqe->wr.wr.ud.remote_qkey < 0 ?
+			sqp->qkey : swqe->wr.wr.ud.remote_qkey;
+		if (unlikely(qkey != qp->qkey)) {
+			u16 lid;
+
+			lid = ppd->lid | (ah_attr->src_path_bits &
+					  ((1 << ppd->lmc) - 1));
+			hfi1_bad_pqkey(ibp, IB_NOTICE_TRAP_BAD_QKEY, qkey,
+				       ah_attr->sl,
+				       sqp->ibqp.qp_num, qp->ibqp.qp_num,
+				       cpu_to_be16(lid),
+				       cpu_to_be16(ah_attr->dlid));
+			goto drop;
+		}
+	}
+
+	/*
+	 * A GRH is expected to precede the data even if not
+	 * present on the wire.
+	 */
+	length = swqe->length;
+	memset(&wc, 0, sizeof(wc));
+	wc.byte_len = length + sizeof(struct ib_grh);
+
+	if (swqe->wr.opcode == IB_WR_SEND_WITH_IMM) {
+		wc.wc_flags = IB_WC_WITH_IMM;
+		wc.ex.imm_data = swqe->wr.ex.imm_data;
+	}
+
+	spin_lock_irqsave(&qp->r_lock, flags);
+
+	/*
+	 * Get the next work request entry to find where to put the data.
+	 */
+	if (qp->r_flags & HFI1_R_REUSE_SGE)
+		qp->r_flags &= ~HFI1_R_REUSE_SGE;
+	else {
+		int ret;
+
+		ret = hfi1_get_rwqe(qp, 0);
+		if (ret < 0) {
+			hfi1_rc_error(qp, IB_WC_LOC_QP_OP_ERR);
+			goto bail_unlock;
+		}
+		if (!ret) {
+			if (qp->ibqp.qp_num == 0)
+				ibp->n_vl15_dropped++;
+			goto bail_unlock;
+		}
+	}
+	/* Silently drop packets which are too big. */
+	if (unlikely(wc.byte_len > qp->r_len)) {
+		qp->r_flags |= HFI1_R_REUSE_SGE;
+		ibp->n_pkt_drops++;
+		goto bail_unlock;
+	}
+
+	if (ah_attr->ah_flags & IB_AH_GRH) {
+		hfi1_copy_sge(&qp->r_sge, &ah_attr->grh,
+			      sizeof(struct ib_grh), 1);
+		wc.wc_flags |= IB_WC_GRH;
+	} else
+		hfi1_skip_sge(&qp->r_sge, sizeof(struct ib_grh), 1);
+	ssge.sg_list = swqe->sg_list + 1;
+	ssge.sge = *swqe->sg_list;
+	ssge.num_sge = swqe->wr.num_sge;
+	sge = &ssge.sge;
+	while (length) {
+		u32 len = sge->length;
+
+		if (len > length)
+			len = length;
+		if (len > sge->sge_length)
+			len = sge->sge_length;
+		WARN_ON_ONCE(len == 0);
+		hfi1_copy_sge(&qp->r_sge, sge->vaddr, len, 1);
+		sge->vaddr += len;
+		sge->length -= len;
+		sge->sge_length -= len;
+		if (sge->sge_length == 0) {
+			if (--ssge.num_sge)
+				*sge = *ssge.sg_list++;
+		} else if (sge->length == 0 && sge->mr->lkey) {
+			if (++sge->n >= HFI1_SEGSZ) {
+				if (++sge->m >= sge->mr->mapsz)
+					break;
+				sge->n = 0;
+			}
+			sge->vaddr =
+				sge->mr->map[sge->m]->segs[sge->n].vaddr;
+			sge->length =
+				sge->mr->map[sge->m]->segs[sge->n].length;
+		}
+		length -= len;
+	}
+	hfi1_put_ss(&qp->r_sge);
+	if (!test_and_clear_bit(HFI1_R_WRID_VALID, &qp->r_aflags))
+		goto bail_unlock;
+	wc.wr_id = qp->r_wr_id;
+	wc.status = IB_WC_SUCCESS;
+	wc.opcode = IB_WC_RECV;
+	wc.qp = &qp->ibqp;
+	wc.src_qp = sqp->ibqp.qp_num;
+	if (qp->ibqp.qp_type == IB_QPT_GSI || qp->ibqp.qp_type == IB_QPT_SMI) {
+		if (sqp->ibqp.qp_type == IB_QPT_GSI ||
+		    sqp->ibqp.qp_type == IB_QPT_SMI)
+			wc.pkey_index = swqe->wr.wr.ud.pkey_index;
+		else
+			wc.pkey_index = sqp->s_pkey_index;
+	} else {
+		wc.pkey_index = 0;
+	}
+	wc.slid = ppd->lid | (ah_attr->src_path_bits & ((1 << ppd->lmc) - 1));
+	/* Check for loopback when the port lid is not set */
+	if (wc.slid == 0 && sqp->ibqp.qp_type == IB_QPT_GSI)
+		wc.slid = HFI1_PERMISSIVE_LID;
+	wc.sl = ah_attr->sl;
+	wc.dlid_path_bits = ah_attr->dlid & ((1 << ppd->lmc) - 1);
+	wc.port_num = qp->port_num;
+	/* Signal completion event if the solicited bit is set. */
+	hfi1_cq_enter(to_icq(qp->ibqp.recv_cq), &wc,
+		      swqe->wr.send_flags & IB_SEND_SOLICITED);
+	ibp->n_loop_pkts++;
+bail_unlock:
+	spin_unlock_irqrestore(&qp->r_lock, flags);
+drop:
+	rcu_read_unlock();
+}
+
+/**
+ * hfi1_make_ud_req - construct a UD request packet
+ * @qp: the QP
+ *
+ * Return 1 if constructed; otherwise, return 0.
+ */
+int hfi1_make_ud_req(struct hfi1_qp *qp)
+{
+	struct hfi1_other_headers *ohdr;
+	struct ib_ah_attr *ah_attr;
+	struct hfi1_pportdata *ppd;
+	struct hfi1_ibport *ibp;
+	struct hfi1_swqe *wqe;
+	unsigned long flags;
+	u32 nwords;
+	u32 extra_bytes;
+	u32 bth0;
+	u16 lrh0;
+	u16 lid;
+	int ret = 0;
+	int next_cur;
+	u8 sc5;
+
+	spin_lock_irqsave(&qp->s_lock, flags);
+
+	if (!(ib_hfi1_state_ops[qp->state] & HFI1_PROCESS_NEXT_SEND_OK)) {
+		if (!(ib_hfi1_state_ops[qp->state] & HFI1_FLUSH_SEND))
+			goto bail;
+		/* We are in the error state, flush the work request. */
+		if (qp->s_last == qp->s_head)
+			goto bail;
+		/* If DMAs are in progress, we can't flush immediately. */
+		if (atomic_read(&qp->s_iowait.sdma_busy)) {
+			qp->s_flags |= HFI1_S_WAIT_DMA;
+			goto bail;
+		}
+		wqe = get_swqe_ptr(qp, qp->s_last);
+		hfi1_send_complete(qp, wqe, IB_WC_WR_FLUSH_ERR);
+		goto done;
+	}
+
+	if (qp->s_cur == qp->s_head)
+		goto bail;
+
+	wqe = get_swqe_ptr(qp, qp->s_cur);
+	next_cur = qp->s_cur + 1;
+	if (next_cur >= qp->s_size)
+		next_cur = 0;
+
+	/* Construct the header. */
+	ibp = to_iport(qp->ibqp.device, qp->port_num);
+	ppd = ppd_from_ibp(ibp);
+	ah_attr = &to_iah(wqe->wr.wr.ud.ah)->attr;
+	if (ah_attr->dlid < HFI1_MULTICAST_LID_BASE ||
+	    ah_attr->dlid == HFI1_PERMISSIVE_LID) {
+		lid = ah_attr->dlid & ~((1 << ppd->lmc) - 1);
+		if (unlikely(!loopback && (lid == ppd->lid ||
+		    (lid == HFI1_PERMISSIVE_LID &&
+		     qp->ibqp.qp_type == IB_QPT_GSI)))) {
+			/*
+			 * If DMAs are in progress, we can't generate
+			 * a completion for the loopback packet since
+			 * it would be out of order.
+			 * Instead of waiting, we could queue a
+			 * zero length descriptor so we get a callback.
+			 */
+			if (atomic_read(&qp->s_iowait.sdma_busy)) {
+				qp->s_flags |= HFI1_S_WAIT_DMA;
+				goto bail;
+			}
+			qp->s_cur = next_cur;
+			spin_unlock_irqrestore(&qp->s_lock, flags);
+			ud_loopback(qp, wqe);
+			spin_lock_irqsave(&qp->s_lock, flags);
+			hfi1_send_complete(qp, wqe, IB_WC_SUCCESS);
+			goto done;
+		}
+	}
+
+	qp->s_cur = next_cur;
+	extra_bytes = -wqe->length & 3;
+	nwords = (wqe->length + extra_bytes) >> 2;
+
+	/* header size in 32-bit words LRH+BTH+DETH = (8+12+8)/4. */
+	qp->s_hdrwords = 7;
+	qp->s_cur_size = wqe->length;
+	qp->s_cur_sge = &qp->s_sge;
+	qp->s_srate = ah_attr->static_rate;
+	qp->srate_mbps = ib_rate_to_mbps(qp->s_srate);
+	qp->s_wqe = wqe;
+	qp->s_sge.sge = wqe->sg_list[0];
+	qp->s_sge.sg_list = wqe->sg_list + 1;
+	qp->s_sge.num_sge = wqe->wr.num_sge;
+	qp->s_sge.total_len = wqe->length;
+
+	if (ah_attr->ah_flags & IB_AH_GRH) {
+		/* Header size in 32-bit words. */
+		qp->s_hdrwords += hfi1_make_grh(ibp, &qp->s_hdr->ibh.u.l.grh,
+					       &ah_attr->grh,
+					       qp->s_hdrwords, nwords);
+		lrh0 = HFI1_LRH_GRH;
+		ohdr = &qp->s_hdr->ibh.u.l.oth;
+		/*
+		 * Don't worry about sending to locally attached multicast
+		 * QPs.  It is unspecified by the spec. what happens.
+		 */
+	} else {
+		/* Header size in 32-bit words. */
+		lrh0 = HFI1_LRH_BTH;
+		ohdr = &qp->s_hdr->ibh.u.oth;
+	}
+	if (wqe->wr.opcode == IB_WR_SEND_WITH_IMM) {
+		qp->s_hdrwords++;
+		ohdr->u.ud.imm_data = wqe->wr.ex.imm_data;
+		bth0 = IB_OPCODE_UD_SEND_ONLY_WITH_IMMEDIATE << 24;
+	} else
+		bth0 = IB_OPCODE_UD_SEND_ONLY << 24;
+	sc5 = ibp->sl_to_sc[ah_attr->sl];
+	lrh0 |= (ah_attr->sl & 0xf) << 4;
+	if (qp->ibqp.qp_type == IB_QPT_SMI) {
+		lrh0 |= 0xF000; /* Set VL (see ch. 13.5.3.1) */
+		qp->s_sc = 0xf;
+	} else {
+		lrh0 |= (sc5 & 0xf) << 12;
+		qp->s_sc = sc5;
+	}
+	qp->s_hdr->ibh.lrh[0] = cpu_to_be16(lrh0);
+	qp->s_hdr->ibh.lrh[1] = cpu_to_be16(ah_attr->dlid);  /* DEST LID */
+	qp->s_hdr->ibh.lrh[2] =
+		cpu_to_be16(qp->s_hdrwords + nwords + SIZE_OF_CRC);
+	if (ah_attr->dlid == be16_to_cpu(IB_LID_PERMISSIVE))
+		qp->s_hdr->ibh.lrh[3] = IB_LID_PERMISSIVE;
+	else {
+		lid = ppd->lid;
+		if (lid) {
+			lid |= ah_attr->src_path_bits & ((1 << ppd->lmc) - 1);
+			qp->s_hdr->ibh.lrh[3] = cpu_to_be16(lid);
+		} else
+			qp->s_hdr->ibh.lrh[3] = IB_LID_PERMISSIVE;
+	}
+	if (wqe->wr.send_flags & IB_SEND_SOLICITED)
+		bth0 |= IB_BTH_SOLICITED;
+	bth0 |= extra_bytes << 20;
+	if (qp->ibqp.qp_type == IB_QPT_GSI || qp->ibqp.qp_type == IB_QPT_SMI)
+		bth0 |= hfi1_get_pkey(ibp, wqe->wr.wr.ud.pkey_index);
+	else
+		bth0 |= hfi1_get_pkey(ibp, qp->s_pkey_index);
+	ohdr->bth[0] = cpu_to_be32(bth0);
+	ohdr->bth[1] = cpu_to_be32(wqe->wr.wr.ud.remote_qpn);
+	ohdr->bth[2] = cpu_to_be32(mask_psn(qp->s_next_psn++));
+	/*
+	 * Qkeys with the high order bit set mean use the
+	 * qkey from the QP context instead of the WR (see 10.2.5).
+	 */
+	ohdr->u.ud.deth[0] = cpu_to_be32((int)wqe->wr.wr.ud.remote_qkey < 0 ?
+					 qp->qkey : wqe->wr.wr.ud.remote_qkey);
+	ohdr->u.ud.deth[1] = cpu_to_be32(qp->ibqp.qp_num);
+	/* disarm any ahg */
+	qp->s_hdr->ahgcount = 0;
+	qp->s_hdr->ahgidx = 0;
+	qp->s_hdr->tx_flags = 0;
+	qp->s_hdr->sde = NULL;
+
+done:
+	ret = 1;
+	goto unlock;
+
+bail:
+	qp->s_flags &= ~HFI1_S_BUSY;
+unlock:
+	spin_unlock_irqrestore(&qp->s_lock, flags);
+	return ret;
+}
+
+/*
+ * Hardware can't check this so we do it here.
+ *
+ * This is a slightly different algorithm than the standard pkey check.  It
+ * special cases the management keys and allows for 0x7fff and 0xffff to be in
+ * the table at the same time.
+ *
+ * @returns the index found or -1 if not found
+ */
+int hfi1_lookup_pkey_idx(struct hfi1_ibport *ibp, u16 pkey)
+{
+	struct hfi1_pportdata *ppd = ppd_from_ibp(ibp);
+	unsigned i;
+
+	if (pkey == FULL_MGMT_P_KEY || pkey == LIM_MGMT_P_KEY) {
+		unsigned lim_idx = -1;
+
+		for (i = 0; i < ARRAY_SIZE(ppd->pkeys); ++i) {
+			/* here we look for an exact match */
+			if (ppd->pkeys[i] == pkey)
+				return i;
+			if (ppd->pkeys[i] == LIM_MGMT_P_KEY)
+				lim_idx = i;
+		}
+
+		/* did not find 0xffff return 0x7fff idx if found */
+		if (pkey == FULL_MGMT_P_KEY)
+			return lim_idx;
+
+		/* no match...  */
+		return -1;
+	}
+
+	pkey &= 0x7fff; /* remove limited/full membership bit */
+
+	for (i = 0; i < ARRAY_SIZE(ppd->pkeys); ++i)
+		if ((ppd->pkeys[i] & 0x7fff) == pkey)
+			return i;
+
+	/*
+	 * Should not get here, this means hardware failed to validate pkeys.
+	 */
+	return -1;
+}
+
+void return_cnp(struct hfi1_ibport *ibp, struct hfi1_qp *qp, u32 remote_qpn,
+		u32 pkey, u32 slid, u32 dlid, u8 sc5,
+		const struct ib_grh *old_grh)
+{
+	u64 pbc, pbc_flags = 0;
+	u32 bth0, plen, vl, hwords = 5;
+	u16 lrh0;
+	u8 sl = ibp->sc_to_sl[sc5];
+	struct hfi1_ib_header hdr;
+	struct hfi1_other_headers *ohdr;
+	struct pio_buf *pbuf;
+	struct send_context *ctxt = qp_to_send_context(qp, sc5);
+	struct hfi1_pportdata *ppd = ppd_from_ibp(ibp);
+
+	if (old_grh) {
+		struct ib_grh *grh = &hdr.u.l.grh;
+
+		grh->version_tclass_flow = old_grh->version_tclass_flow;
+		grh->paylen = cpu_to_be16((hwords - 2 + SIZE_OF_CRC) << 2);
+		grh->hop_limit = 0xff;
+		grh->sgid = old_grh->dgid;
+		grh->dgid = old_grh->sgid;
+		ohdr = &hdr.u.l.oth;
+		lrh0 = HFI1_LRH_GRH;
+		hwords += sizeof(struct ib_grh) / sizeof(u32);
+	} else {
+		ohdr = &hdr.u.oth;
+		lrh0 = HFI1_LRH_BTH;
+	}
+
+	lrh0 |= (sc5 & 0xf) << 12 | sl << 4;
+
+	bth0 = pkey | (IB_OPCODE_CNP << 24);
+	ohdr->bth[0] = cpu_to_be32(bth0);
+
+	ohdr->bth[1] = cpu_to_be32(remote_qpn | (1 << HFI1_BECN_SHIFT));
+	ohdr->bth[2] = 0; /* PSN 0 */
+
+	hdr.lrh[0] = cpu_to_be16(lrh0);
+	hdr.lrh[1] = cpu_to_be16(dlid);
+	hdr.lrh[2] = cpu_to_be16(hwords + SIZE_OF_CRC);
+	hdr.lrh[3] = cpu_to_be16(slid);
+
+	plen = 2 /* PBC */ + hwords;
+	pbc_flags |= (!!(sc5 & 0x10)) << PBC_DC_INFO_SHIFT;
+	vl = sc_to_vlt(ppd->dd, sc5);
+	pbc = create_pbc(ppd, pbc_flags, qp->srate_mbps, vl, plen);
+	if (ctxt) {
+		pbuf = sc_buffer_alloc(ctxt, plen, NULL, NULL);
+		if (pbuf)
+			ppd->dd->pio_inline_send(ppd->dd, pbuf, pbc,
+						 &hdr, hwords);
+	}
+}
+
+/*
+ * opa_smp_check() - Do the regular pkey checking, and the additional
+ * checks for SMPs specified in OPAv1 rev 0.90, section 9.10.26
+ * ("SMA Packet Checks").
+ *
+ * Note that:
+ *   - Checks are done using the pkey directly from the packet's BTH,
+ *     and specifically _not_ the pkey that we attach to the completion,
+ *     which may be different.
+ *   - These checks are specifically for "non-local" SMPs (i.e., SMPs
+ *     which originated on another node). SMPs which are sent from, and
+ *     destined to this node are checked in opa_local_smp_check().
+ *
+ * At the point where opa_smp_check() is called, we know:
+ *   - destination QP is QP0
+ *
+ * opa_smp_check() returns 0 if all checks succeed, 1 otherwise.
+ */
+static int opa_smp_check(struct hfi1_ibport *ibp, u16 pkey, u8 sc5,
+			 struct hfi1_qp *qp, u16 slid, struct opa_smp *smp)
+{
+	struct hfi1_pportdata *ppd = ppd_from_ibp(ibp);
+
+	/*
+	 * I don't think it's possible for us to get here with sc != 0xf,
+	 * but check it to be certain.
+	 */
+	if (sc5 != 0xf)
+		return 1;
+
+	if (rcv_pkey_check(ppd, pkey, sc5, slid))
+		return 1;
+
+	/*
+	 * At this point we know (and so don't need to check again) that
+	 * the pkey is either LIM_MGMT_P_KEY, or FULL_MGMT_P_KEY
+	 * (see ingress_pkey_check).
+	 */
+	if (smp->mgmt_class != IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE &&
+	    smp->mgmt_class != IB_MGMT_CLASS_SUBN_LID_ROUTED) {
+		ingress_pkey_table_fail(ppd, pkey, slid);
+		return 1;
+	}
+
+	/*
+	 * SMPs fall into one of four (disjoint) categories:
+	 * SMA request, SMA response, trap, or trap repress.
+	 * Our response depends, in part, on which type of
+	 * SMP we're processing.
+	 *
+	 * If this is not an SMA request, or trap repress:
+	 *   - accept MAD if the port is running an SM
+	 *   - pkey == FULL_MGMT_P_KEY =>
+	 *       reply with unsupported method (i.e., just mark
+	 *       the smp's status field here, and let it be
+	 *       processed normally)
+	 *   - pkey != LIM_MGMT_P_KEY =>
+	 *       increment port recv constraint errors, drop MAD
+	 * If this is an SMA request or trap repress:
+	 *   - pkey != FULL_MGMT_P_KEY =>
+	 *       increment port recv constraint errors, drop MAD
+	 */
+	switch (smp->method) {
+	case IB_MGMT_METHOD_GET:
+	case IB_MGMT_METHOD_SET:
+	case IB_MGMT_METHOD_REPORT:
+	case IB_MGMT_METHOD_TRAP_REPRESS:
+		if (pkey != FULL_MGMT_P_KEY) {
+			ingress_pkey_table_fail(ppd, pkey, slid);
+			return 1;
+		}
+		break;
+	case IB_MGMT_METHOD_SEND:
+	case IB_MGMT_METHOD_TRAP:
+	case IB_MGMT_METHOD_GET_RESP:
+	case IB_MGMT_METHOD_REPORT_RESP:
+		if (ibp->port_cap_flags & IB_PORT_SM)
+			return 0;
+		if (pkey == FULL_MGMT_P_KEY) {
+			smp->status |= IB_SMP_UNSUP_METHOD;
+			return 0;
+		}
+		if (pkey != LIM_MGMT_P_KEY) {
+			ingress_pkey_table_fail(ppd, pkey, slid);
+			return 1;
+		}
+		break;
+	default:
+		break;
+	}
+	return 0;
+}
+
+
+/**
+ * hfi1_ud_rcv - receive an incoming UD packet
+ * @ibp: the port the packet came in on
+ * @hdr: the packet header
+ * @rcv_flags: flags relevant to rcv processing
+ * @data: the packet data
+ * @tlen: the packet length
+ * @qp: the QP the packet came on
+ *
+ * This is called from qp_rcv() to process an incoming UD packet
+ * for the given QP.
+ * Called at interrupt level.
+ */
+void hfi1_ud_rcv(struct hfi1_packet *packet)
+{
+	struct hfi1_other_headers *ohdr = packet->ohdr;
+	int opcode;
+	u32 hdrsize = packet->hlen;
+	u32 pad;
+	struct ib_wc wc;
+	u32 qkey;
+	u32 src_qp;
+	u16 dlid, pkey;
+	int mgmt_pkey_idx = -1;
+	struct hfi1_ibport *ibp = &packet->rcd->ppd->ibport_data;
+	struct hfi1_ib_header *hdr = packet->hdr;
+	u32 rcv_flags = packet->rcv_flags;
+	void *data = packet->ebuf;
+	u32 tlen = packet->tlen;
+	struct hfi1_qp *qp = packet->qp;
+	bool has_grh = rcv_flags & HFI1_HAS_GRH;
+	bool sc4_bit = has_sc4_bit(packet);
+	u8 sc;
+	u32 bth1;
+	int is_mcast;
+	struct ib_grh *grh = NULL;
+
+	qkey = be32_to_cpu(ohdr->u.ud.deth[0]);
+	src_qp = be32_to_cpu(ohdr->u.ud.deth[1]) & HFI1_QPN_MASK;
+	dlid = be16_to_cpu(hdr->lrh[1]);
+	is_mcast = (dlid > HFI1_MULTICAST_LID_BASE) &&
+			(dlid != HFI1_PERMISSIVE_LID);
+	bth1 = be32_to_cpu(ohdr->bth[1]);
+	if (unlikely(bth1 & HFI1_BECN_SMASK)) {
+		/*
+		 * In pre-B0 h/w the CNP_OPCODE is handled via an
+		 * error path (errata 291394).
+		 */
+		struct hfi1_pportdata *ppd = ppd_from_ibp(ibp);
+		u32 lqpn =  be32_to_cpu(ohdr->bth[1]) & HFI1_QPN_MASK;
+		u8 sl, sc5;
+
+		sc5 = (be16_to_cpu(hdr->lrh[0]) >> 12) & 0xf;
+		sc5 |= sc4_bit;
+		sl = ibp->sc_to_sl[sc5];
+
+		process_becn(ppd, sl, 0, lqpn, 0, IB_CC_SVCTYPE_UD);
+	}
+
+	/*
+	 * The opcode is in the low byte when its in network order
+	 * (top byte when in host order).
+	 */
+	opcode = be32_to_cpu(ohdr->bth[0]) >> 24;
+	opcode &= 0xff;
+
+	pkey = (u16)be32_to_cpu(ohdr->bth[0]);
+
+	if (!is_mcast && (opcode != IB_OPCODE_CNP) && bth1 & HFI1_FECN_SMASK) {
+		u16 slid = be16_to_cpu(hdr->lrh[3]);
+		u8 sc5;
+
+		sc5 = (be16_to_cpu(hdr->lrh[0]) >> 12) & 0xf;
+		sc5 |= sc4_bit;
+
+		return_cnp(ibp, qp, src_qp, pkey, dlid, slid, sc5, grh);
+	}
+	/*
+	 * Get the number of bytes the message was padded by
+	 * and drop incomplete packets.
+	 */
+	pad = (be32_to_cpu(ohdr->bth[0]) >> 20) & 3;
+	if (unlikely(tlen < (hdrsize + pad + 4)))
+		goto drop;
+
+	tlen -= hdrsize + pad + 4;
+
+	/*
+	 * Check that the permissive LID is only used on QP0
+	 * and the QKEY matches (see 9.6.1.4.1 and 9.6.1.5.1).
+	 */
+	if (qp->ibqp.qp_num) {
+		if (unlikely(hdr->lrh[1] == IB_LID_PERMISSIVE ||
+			     hdr->lrh[3] == IB_LID_PERMISSIVE))
+			goto drop;
+		if (qp->ibqp.qp_num > 1) {
+			struct hfi1_pportdata *ppd = ppd_from_ibp(ibp);
+			u16 slid;
+			u8 sc5;
+
+			sc5 = (be16_to_cpu(hdr->lrh[0]) >> 12) & 0xf;
+			sc5 |= sc4_bit;
+
+			slid = be16_to_cpu(hdr->lrh[3]);
+			if (unlikely(rcv_pkey_check(ppd, pkey, sc5, slid))) {
+				/*
+				 * Traps will not be sent for packets dropped
+				 * by the HW. This is fine, as sending trap
+				 * for invalid pkeys is optional according to
+				 * IB spec (release 1.3, section 10.9.4)
+				 */
+				hfi1_bad_pqkey(ibp, IB_NOTICE_TRAP_BAD_PKEY,
+					       pkey,
+					       (be16_to_cpu(hdr->lrh[0]) >> 4) &
+						0xF,
+					       src_qp, qp->ibqp.qp_num,
+					       hdr->lrh[3], hdr->lrh[1]);
+				return;
+			}
+		} else {
+			/* GSI packet */
+			mgmt_pkey_idx = hfi1_lookup_pkey_idx(ibp, pkey);
+			if (mgmt_pkey_idx < 0)
+				goto drop;
+
+		}
+		if (unlikely(qkey != qp->qkey)) {
+			hfi1_bad_pqkey(ibp, IB_NOTICE_TRAP_BAD_QKEY, qkey,
+				       (be16_to_cpu(hdr->lrh[0]) >> 4) & 0xF,
+				       src_qp, qp->ibqp.qp_num,
+				       hdr->lrh[3], hdr->lrh[1]);
+			return;
+		}
+		/* Drop invalid MAD packets (see 13.5.3.1). */
+		if (unlikely(qp->ibqp.qp_num == 1 &&
+			     (tlen > 2048 ||
+			      (be16_to_cpu(hdr->lrh[0]) >> 12) == 15)))
+			goto drop;
+	} else {
+		/* Received on QP0, and so by definition, this is an SMP */
+		struct opa_smp *smp = (struct opa_smp *)data;
+		u16 slid = be16_to_cpu(hdr->lrh[3]);
+		u8 sc5;
+
+		sc5 = (be16_to_cpu(hdr->lrh[0]) >> 12) & 0xf;
+		sc5 |= sc4_bit;
+
+		if (opa_smp_check(ibp, pkey, sc5, qp, slid, smp))
+			goto drop;
+
+		if (tlen > 2048)
+			goto drop;
+		if ((hdr->lrh[1] == IB_LID_PERMISSIVE ||
+		     hdr->lrh[3] == IB_LID_PERMISSIVE) &&
+		    smp->mgmt_class != IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE)
+			goto drop;
+
+		/* look up SMI pkey */
+		mgmt_pkey_idx = hfi1_lookup_pkey_idx(ibp, pkey);
+		if (mgmt_pkey_idx < 0)
+			goto drop;
+
+	}
+
+	if (qp->ibqp.qp_num > 1 &&
+	    opcode == IB_OPCODE_UD_SEND_ONLY_WITH_IMMEDIATE) {
+		wc.ex.imm_data = ohdr->u.ud.imm_data;
+		wc.wc_flags = IB_WC_WITH_IMM;
+		tlen -= sizeof(u32);
+	} else if (opcode == IB_OPCODE_UD_SEND_ONLY) {
+		wc.ex.imm_data = 0;
+		wc.wc_flags = 0;
+	} else
+		goto drop;
+
+	/*
+	 * A GRH is expected to precede the data even if not
+	 * present on the wire.
+	 */
+	wc.byte_len = tlen + sizeof(struct ib_grh);
+
+	/*
+	 * Get the next work request entry to find where to put the data.
+	 */
+	if (qp->r_flags & HFI1_R_REUSE_SGE)
+		qp->r_flags &= ~HFI1_R_REUSE_SGE;
+	else {
+		int ret;
+
+		ret = hfi1_get_rwqe(qp, 0);
+		if (ret < 0) {
+			hfi1_rc_error(qp, IB_WC_LOC_QP_OP_ERR);
+			return;
+		}
+		if (!ret) {
+			if (qp->ibqp.qp_num == 0)
+				ibp->n_vl15_dropped++;
+			return;
+		}
+	}
+	/* Silently drop packets which are too big. */
+	if (unlikely(wc.byte_len > qp->r_len)) {
+		qp->r_flags |= HFI1_R_REUSE_SGE;
+		goto drop;
+	}
+	if (has_grh) {
+		hfi1_copy_sge(&qp->r_sge, &hdr->u.l.grh,
+			      sizeof(struct ib_grh), 1);
+		wc.wc_flags |= IB_WC_GRH;
+	} else
+		hfi1_skip_sge(&qp->r_sge, sizeof(struct ib_grh), 1);
+	hfi1_copy_sge(&qp->r_sge, data, wc.byte_len - sizeof(struct ib_grh), 1);
+	hfi1_put_ss(&qp->r_sge);
+	if (!test_and_clear_bit(HFI1_R_WRID_VALID, &qp->r_aflags))
+		return;
+	wc.wr_id = qp->r_wr_id;
+	wc.status = IB_WC_SUCCESS;
+	wc.opcode = IB_WC_RECV;
+	wc.vendor_err = 0;
+	wc.qp = &qp->ibqp;
+	wc.src_qp = src_qp;
+
+	if (qp->ibqp.qp_type == IB_QPT_GSI ||
+	    qp->ibqp.qp_type == IB_QPT_SMI) {
+		if (mgmt_pkey_idx < 0) {
+			if (net_ratelimit()) {
+				struct hfi1_pportdata *ppd = ppd_from_ibp(ibp);
+				struct hfi1_devdata *dd = ppd->dd;
+
+				dd_dev_err(dd, "QP type %d mgmt_pkey_idx < 0 and packet not dropped???\n",
+					   qp->ibqp.qp_type);
+				mgmt_pkey_idx = 0;
+			}
+		}
+		wc.pkey_index = (unsigned)mgmt_pkey_idx;
+	} else
+		wc.pkey_index = 0;
+
+	wc.slid = be16_to_cpu(hdr->lrh[3]);
+	sc = (be16_to_cpu(hdr->lrh[0]) >> 12) & 0xf;
+	sc |= sc4_bit;
+	wc.sl = ibp->sc_to_sl[sc];
+
+	/*
+	 * Save the LMC lower bits if the destination LID is a unicast LID.
+	 */
+	wc.dlid_path_bits = dlid >= HFI1_MULTICAST_LID_BASE ? 0 :
+		dlid & ((1 << ppd_from_ibp(ibp)->lmc) - 1);
+	wc.port_num = qp->port_num;
+	/* Signal completion event if the solicited bit is set. */
+	hfi1_cq_enter(to_icq(qp->ibqp.recv_cq), &wc,
+		      (ohdr->bth[0] &
+			cpu_to_be32(IB_BTH_SOLICITED)) != 0);
+	return;
+
+drop:
+	ibp->n_pkt_drops++;
+}
diff --git a/drivers/staging/rdma/hfi1/user_pages.c b/drivers/staging/rdma/hfi1/user_pages.c
new file mode 100644
index 000000000000..9071afbd7bf4
--- /dev/null
+++ b/drivers/staging/rdma/hfi1/user_pages.c
@@ -0,0 +1,156 @@
+/*
+ *
+ * This file is provided under a dual BSD/GPLv2 license.  When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * GPL LICENSE SUMMARY
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * BSD LICENSE
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ *  - Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ *  - Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in
+ *    the documentation and/or other materials provided with the
+ *    distribution.
+ *  - Neither the name of Intel Corporation nor the names of its
+ *    contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ */
+
+#include <linux/mm.h>
+#include <linux/device.h>
+
+#include "hfi.h"
+
+static void __hfi1_release_user_pages(struct page **p, size_t num_pages,
+				      int dirty)
+{
+	size_t i;
+
+	for (i = 0; i < num_pages; i++) {
+		if (dirty)
+			set_page_dirty_lock(p[i]);
+		put_page(p[i]);
+	}
+}
+
+/*
+ * Call with current->mm->mmap_sem held.
+ */
+static int __hfi1_get_user_pages(unsigned long start_page, size_t num_pages,
+				 struct page **p)
+{
+	unsigned long lock_limit;
+	size_t got;
+	int ret;
+
+	lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
+
+	if (num_pages > lock_limit && !capable(CAP_IPC_LOCK)) {
+		ret = -ENOMEM;
+		goto bail;
+	}
+
+	for (got = 0; got < num_pages; got += ret) {
+		ret = get_user_pages(current, current->mm,
+				     start_page + got * PAGE_SIZE,
+				     num_pages - got, 1, 1,
+				     p + got, NULL);
+		if (ret < 0)
+			goto bail_release;
+	}
+
+	current->mm->pinned_vm += num_pages;
+
+	ret = 0;
+	goto bail;
+
+bail_release:
+	__hfi1_release_user_pages(p, got, 0);
+bail:
+	return ret;
+}
+
+/**
+ * hfi1_map_page - a safety wrapper around pci_map_page()
+ *
+ */
+dma_addr_t hfi1_map_page(struct pci_dev *hwdev, struct page *page,
+			 unsigned long offset, size_t size, int direction)
+{
+	dma_addr_t phys;
+
+	phys = pci_map_page(hwdev, page, offset, size, direction);
+
+	return phys;
+}
+
+/**
+ * hfi1_get_user_pages - lock user pages into memory
+ * @start_page: the start page
+ * @num_pages: the number of pages
+ * @p: the output page structures
+ *
+ * This function takes a given start page (page aligned user virtual
+ * address) and pins it and the following specified number of pages.  For
+ * now, num_pages is always 1, but that will probably change at some point
+ * (because caller is doing expected sends on a single virtually contiguous
+ * buffer, so we can do all pages at once).
+ */
+int hfi1_get_user_pages(unsigned long start_page, size_t num_pages,
+			struct page **p)
+{
+	int ret;
+
+	down_write(&current->mm->mmap_sem);
+
+	ret = __hfi1_get_user_pages(start_page, num_pages, p);
+
+	up_write(&current->mm->mmap_sem);
+
+	return ret;
+}
+
+void hfi1_release_user_pages(struct page **p, size_t num_pages)
+{
+	if (current->mm) /* during close after signal, mm can be NULL */
+		down_write(&current->mm->mmap_sem);
+
+	__hfi1_release_user_pages(p, num_pages, 1);
+
+	if (current->mm) {
+		current->mm->pinned_vm -= num_pages;
+		up_write(&current->mm->mmap_sem);
+	}
+}
diff --git a/drivers/staging/rdma/hfi1/user_sdma.c b/drivers/staging/rdma/hfi1/user_sdma.c
new file mode 100644
index 000000000000..55526613a522
--- /dev/null
+++ b/drivers/staging/rdma/hfi1/user_sdma.c
@@ -0,0 +1,1444 @@
+/*
+ *
+ * This file is provided under a dual BSD/GPLv2 license.  When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * GPL LICENSE SUMMARY
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * BSD LICENSE
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ *  - Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ *  - Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in
+ *    the documentation and/or other materials provided with the
+ *    distribution.
+ *  - Neither the name of Intel Corporation nor the names of its
+ *    contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ */
+#include <linux/mm.h>
+#include <linux/types.h>
+#include <linux/device.h>
+#include <linux/dmapool.h>
+#include <linux/slab.h>
+#include <linux/list.h>
+#include <linux/highmem.h>
+#include <linux/io.h>
+#include <linux/uio.h>
+#include <linux/rbtree.h>
+#include <linux/spinlock.h>
+#include <linux/delay.h>
+#include <linux/kthread.h>
+#include <linux/mmu_context.h>
+#include <linux/module.h>
+#include <linux/vmalloc.h>
+
+#include "hfi.h"
+#include "sdma.h"
+#include "user_sdma.h"
+#include "sdma.h"
+#include "verbs.h"  /* for the headers */
+#include "common.h" /* for struct hfi1_tid_info */
+#include "trace.h"
+
+static uint hfi1_sdma_comp_ring_size = 128;
+module_param_named(sdma_comp_size, hfi1_sdma_comp_ring_size, uint, S_IRUGO);
+MODULE_PARM_DESC(sdma_comp_size, "Size of User SDMA completion ring. Default: 128");
+
+/* The maximum number of Data io vectors per message/request */
+#define MAX_VECTORS_PER_REQ 8
+/*
+ * Maximum number of packet to send from each message/request
+ * before moving to the next one.
+ */
+#define MAX_PKTS_PER_QUEUE 16
+
+#define num_pages(x) (1 + ((((x) - 1) & PAGE_MASK) >> PAGE_SHIFT))
+
+#define req_opcode(x) \
+	(((x) >> HFI1_SDMA_REQ_OPCODE_SHIFT) & HFI1_SDMA_REQ_OPCODE_MASK)
+#define req_version(x) \
+	(((x) >> HFI1_SDMA_REQ_VERSION_SHIFT) & HFI1_SDMA_REQ_OPCODE_MASK)
+#define req_iovcnt(x) \
+	(((x) >> HFI1_SDMA_REQ_IOVCNT_SHIFT) & HFI1_SDMA_REQ_IOVCNT_MASK)
+
+/* Number of BTH.PSN bits used for sequence number in expected rcvs */
+#define BTH_SEQ_MASK 0x7ffull
+
+/*
+ * Define fields in the KDETH header so we can update the header
+ * template.
+ */
+#define KDETH_OFFSET_SHIFT        0
+#define KDETH_OFFSET_MASK         0x7fff
+#define KDETH_OM_SHIFT            15
+#define KDETH_OM_MASK             0x1
+#define KDETH_TID_SHIFT           16
+#define KDETH_TID_MASK            0x3ff
+#define KDETH_TIDCTRL_SHIFT       26
+#define KDETH_TIDCTRL_MASK        0x3
+#define KDETH_INTR_SHIFT          28
+#define KDETH_INTR_MASK           0x1
+#define KDETH_SH_SHIFT            29
+#define KDETH_SH_MASK             0x1
+#define KDETH_HCRC_UPPER_SHIFT    16
+#define KDETH_HCRC_UPPER_MASK     0xff
+#define KDETH_HCRC_LOWER_SHIFT    24
+#define KDETH_HCRC_LOWER_MASK     0xff
+
+#define PBC2LRH(x) ((((x) & 0xfff) << 2) - 4)
+#define LRH2PBC(x) ((((x) >> 2) + 1) & 0xfff)
+
+#define KDETH_GET(val, field)						\
+	(((le32_to_cpu((val))) >> KDETH_##field##_SHIFT) & KDETH_##field##_MASK)
+#define KDETH_SET(dw, field, val) do {					\
+		u32 dwval = le32_to_cpu(dw);				\
+		dwval &= ~(KDETH_##field##_MASK << KDETH_##field##_SHIFT); \
+		dwval |= (((val) & KDETH_##field##_MASK) << \
+			  KDETH_##field##_SHIFT);			\
+		dw = cpu_to_le32(dwval);				\
+	} while (0)
+
+#define AHG_HEADER_SET(arr, idx, dw, bit, width, value)			\
+	do {								\
+		if ((idx) < ARRAY_SIZE((arr)))				\
+			(arr)[(idx++)] = sdma_build_ahg_descriptor(	\
+				(__force u16)(value), (dw), (bit),	\
+							(width));	\
+		else							\
+			return -ERANGE;					\
+	} while (0)
+
+/* KDETH OM multipliers and switch over point */
+#define KDETH_OM_SMALL     4
+#define KDETH_OM_LARGE     64
+#define KDETH_OM_MAX_SIZE  (1 << ((KDETH_OM_LARGE / KDETH_OM_SMALL) + 1))
+
+/* Last packet in the request */
+#define USER_SDMA_TXREQ_FLAGS_LAST_PKT   (1 << 0)
+
+#define SDMA_REQ_IN_USE     0
+#define SDMA_REQ_FOR_THREAD 1
+#define SDMA_REQ_SEND_DONE  2
+#define SDMA_REQ_HAVE_AHG   3
+#define SDMA_REQ_HAS_ERROR  4
+#define SDMA_REQ_DONE_ERROR 5
+
+#define SDMA_PKT_Q_INACTIVE (1 << 0)
+#define SDMA_PKT_Q_ACTIVE   (1 << 1)
+#define SDMA_PKT_Q_DEFERRED (1 << 2)
+
+/*
+ * Maximum retry attempts to submit a TX request
+ * before putting the process to sleep.
+ */
+#define MAX_DEFER_RETRY_COUNT 1
+
+static unsigned initial_pkt_count = 8;
+
+#define SDMA_IOWAIT_TIMEOUT 1000 /* in milliseconds */
+
+struct user_sdma_iovec {
+	struct iovec iov;
+	/* number of pages in this vector */
+	unsigned npages;
+	/* array of pinned pages for this vector */
+	struct page **pages;
+	/* offset into the virtual address space of the vector at
+	 * which we last left off. */
+	u64 offset;
+};
+
+struct user_sdma_request {
+	struct sdma_req_info info;
+	struct hfi1_user_sdma_pkt_q *pq;
+	struct hfi1_user_sdma_comp_q *cq;
+	/* This is the original header from user space */
+	struct hfi1_pkt_header hdr;
+	/*
+	 * Pointer to the SDMA engine for this request.
+	 * Since different request could be on different VLs,
+	 * each request will need it's own engine pointer.
+	 */
+	struct sdma_engine *sde;
+	u8 ahg_idx;
+	u32 ahg[9];
+	/*
+	 * KDETH.Offset (Eager) field
+	 * We need to remember the initial value so the headers
+	 * can be updated properly.
+	 */
+	u32 koffset;
+	/*
+	 * KDETH.OFFSET (TID) field
+	 * The offset can cover multiple packets, depending on the
+	 * size of the TID entry.
+	 */
+	u32 tidoffset;
+	/*
+	 * KDETH.OM
+	 * Remember this because the header template always sets it
+	 * to 0.
+	 */
+	u8 omfactor;
+	/*
+	 * pointer to the user's task_struct. We are going to
+	 * get a reference to it so we can process io vectors
+	 * at a later time.
+	 */
+	struct task_struct *user_proc;
+	/*
+	 * pointer to the user's mm_struct. We are going to
+	 * get a reference to it so it doesn't get freed
+	 * since we might not be in process context when we
+	 * are processing the iov's.
+	 * Using this mm_struct, we can get vma based on the
+	 * iov's address (find_vma()).
+	 */
+	struct mm_struct *user_mm;
+	/*
+	 * We copy the iovs for this request (based on
+	 * info.iovcnt). These are only the data vectors
+	 */
+	unsigned data_iovs;
+	/* total length of the data in the request */
+	u32 data_len;
+	/* progress index moving along the iovs array */
+	unsigned iov_idx;
+	struct user_sdma_iovec iovs[MAX_VECTORS_PER_REQ];
+	/* number of elements copied to the tids array */
+	u16 n_tids;
+	/* TID array values copied from the tid_iov vector */
+	u32 *tids;
+	u16 tididx;
+	u32 sent;
+	u64 seqnum;
+	spinlock_t list_lock;
+	struct list_head txps;
+	unsigned long flags;
+};
+
+struct user_sdma_txreq {
+	/* Packet header for the txreq */
+	struct hfi1_pkt_header hdr;
+	struct sdma_txreq txreq;
+	struct user_sdma_request *req;
+	struct user_sdma_iovec *iovec1;
+	struct user_sdma_iovec *iovec2;
+	u16 flags;
+	unsigned busycount;
+	u64 seqnum;
+};
+
+#define SDMA_DBG(req, fmt, ...)				     \
+	hfi1_cdbg(SDMA, "[%u:%u:%u:%u] " fmt, (req)->pq->dd->unit, \
+		 (req)->pq->ctxt, (req)->pq->subctxt, (req)->info.comp_idx, \
+		 ##__VA_ARGS__)
+#define SDMA_Q_DBG(pq, fmt, ...)			 \
+	hfi1_cdbg(SDMA, "[%u:%u:%u] " fmt, (pq)->dd->unit, (pq)->ctxt, \
+		 (pq)->subctxt, ##__VA_ARGS__)
+
+static int user_sdma_send_pkts(struct user_sdma_request *, unsigned);
+static int num_user_pages(const struct iovec *);
+static void user_sdma_txreq_cb(struct sdma_txreq *, int, int);
+static void user_sdma_free_request(struct user_sdma_request *);
+static int pin_vector_pages(struct user_sdma_request *,
+			    struct user_sdma_iovec *);
+static void unpin_vector_pages(struct user_sdma_iovec *);
+static int check_header_template(struct user_sdma_request *,
+				 struct hfi1_pkt_header *, u32, u32);
+static int set_txreq_header(struct user_sdma_request *,
+			    struct user_sdma_txreq *, u32);
+static int set_txreq_header_ahg(struct user_sdma_request *,
+				struct user_sdma_txreq *, u32);
+static inline void set_comp_state(struct user_sdma_request *,
+					enum hfi1_sdma_comp_state, int);
+static inline u32 set_pkt_bth_psn(__be32, u8, u32);
+static inline u32 get_lrh_len(struct hfi1_pkt_header, u32 len);
+
+static int defer_packet_queue(
+	struct sdma_engine *,
+	struct iowait *,
+	struct sdma_txreq *,
+	unsigned seq);
+static void activate_packet_queue(struct iowait *, int);
+
+static inline int iovec_may_free(struct user_sdma_iovec *iovec,
+				       void (*free)(struct user_sdma_iovec *))
+{
+	if (ACCESS_ONCE(iovec->offset) == iovec->iov.iov_len) {
+		free(iovec);
+		return 1;
+	}
+	return 0;
+}
+
+static inline void iovec_set_complete(struct user_sdma_iovec *iovec)
+{
+	iovec->offset = iovec->iov.iov_len;
+}
+
+static int defer_packet_queue(
+	struct sdma_engine *sde,
+	struct iowait *wait,
+	struct sdma_txreq *txreq,
+	unsigned seq)
+{
+	struct hfi1_user_sdma_pkt_q *pq =
+		container_of(wait, struct hfi1_user_sdma_pkt_q, busy);
+	struct hfi1_ibdev *dev = &pq->dd->verbs_dev;
+	struct user_sdma_txreq *tx =
+		container_of(txreq, struct user_sdma_txreq, txreq);
+
+	if (sdma_progress(sde, seq, txreq)) {
+		if (tx->busycount++ < MAX_DEFER_RETRY_COUNT)
+			goto eagain;
+	}
+	/*
+	 * We are assuming that if the list is enqueued somewhere, it
+	 * is to the dmawait list since that is the only place where
+	 * it is supposed to be enqueued.
+	 */
+	xchg(&pq->state, SDMA_PKT_Q_DEFERRED);
+	write_seqlock(&dev->iowait_lock);
+	if (list_empty(&pq->busy.list))
+		list_add_tail(&pq->busy.list, &sde->dmawait);
+	write_sequnlock(&dev->iowait_lock);
+	return -EBUSY;
+eagain:
+	return -EAGAIN;
+}
+
+static void activate_packet_queue(struct iowait *wait, int reason)
+{
+	struct hfi1_user_sdma_pkt_q *pq =
+		container_of(wait, struct hfi1_user_sdma_pkt_q, busy);
+	xchg(&pq->state, SDMA_PKT_Q_ACTIVE);
+	wake_up(&wait->wait_dma);
+};
+
+static void sdma_kmem_cache_ctor(void *obj)
+{
+	struct user_sdma_txreq *tx = (struct user_sdma_txreq *)obj;
+
+	memset(tx, 0, sizeof(*tx));
+}
+
+int hfi1_user_sdma_alloc_queues(struct hfi1_ctxtdata *uctxt, struct file *fp)
+{
+	int ret = 0;
+	unsigned memsize;
+	char buf[64];
+	struct hfi1_devdata *dd;
+	struct hfi1_user_sdma_comp_q *cq;
+	struct hfi1_user_sdma_pkt_q *pq;
+	unsigned long flags;
+
+	if (!uctxt || !fp) {
+		ret = -EBADF;
+		goto done;
+	}
+
+	if (!hfi1_sdma_comp_ring_size) {
+		ret = -EINVAL;
+		goto done;
+	}
+
+	dd = uctxt->dd;
+
+	pq = kzalloc(sizeof(*pq), GFP_KERNEL);
+	if (!pq) {
+		dd_dev_err(dd,
+			   "[%u:%u] Failed to allocate SDMA request struct\n",
+			   uctxt->ctxt, subctxt_fp(fp));
+		goto pq_nomem;
+	}
+	memsize = sizeof(*pq->reqs) * hfi1_sdma_comp_ring_size;
+	pq->reqs = kmalloc(memsize, GFP_KERNEL);
+	if (!pq->reqs) {
+		dd_dev_err(dd,
+			   "[%u:%u] Failed to allocate SDMA request queue (%u)\n",
+			   uctxt->ctxt, subctxt_fp(fp), memsize);
+		goto pq_reqs_nomem;
+	}
+	INIT_LIST_HEAD(&pq->list);
+	pq->dd = dd;
+	pq->ctxt = uctxt->ctxt;
+	pq->subctxt = subctxt_fp(fp);
+	pq->n_max_reqs = hfi1_sdma_comp_ring_size;
+	pq->state = SDMA_PKT_Q_INACTIVE;
+	atomic_set(&pq->n_reqs, 0);
+
+	iowait_init(&pq->busy, 0, NULL, defer_packet_queue,
+		    activate_packet_queue);
+	pq->reqidx = 0;
+	snprintf(buf, 64, "txreq-kmem-cache-%u-%u-%u", dd->unit, uctxt->ctxt,
+		 subctxt_fp(fp));
+	pq->txreq_cache = kmem_cache_create(buf,
+			       sizeof(struct user_sdma_txreq),
+					    L1_CACHE_BYTES,
+					    SLAB_HWCACHE_ALIGN,
+					    sdma_kmem_cache_ctor);
+	if (!pq->txreq_cache) {
+		dd_dev_err(dd, "[%u] Failed to allocate TxReq cache\n",
+			   uctxt->ctxt);
+		goto pq_txreq_nomem;
+	}
+	user_sdma_pkt_fp(fp) = pq;
+	cq = kzalloc(sizeof(*cq), GFP_KERNEL);
+	if (!cq) {
+		dd_dev_err(dd,
+			   "[%u:%u] Failed to allocate SDMA completion queue\n",
+			   uctxt->ctxt, subctxt_fp(fp));
+		goto cq_nomem;
+	}
+
+	memsize = ALIGN(sizeof(*cq->comps) * hfi1_sdma_comp_ring_size,
+			PAGE_SIZE);
+	cq->comps = vmalloc_user(memsize);
+	if (!cq->comps) {
+		dd_dev_err(dd,
+		      "[%u:%u] Failed to allocate SDMA completion queue entries\n",
+		      uctxt->ctxt, subctxt_fp(fp));
+		goto cq_comps_nomem;
+	}
+	cq->nentries = hfi1_sdma_comp_ring_size;
+	user_sdma_comp_fp(fp) = cq;
+
+	spin_lock_irqsave(&uctxt->sdma_qlock, flags);
+	list_add(&pq->list, &uctxt->sdma_queues);
+	spin_unlock_irqrestore(&uctxt->sdma_qlock, flags);
+	goto done;
+
+cq_comps_nomem:
+	kfree(cq);
+cq_nomem:
+	kmem_cache_destroy(pq->txreq_cache);
+pq_txreq_nomem:
+	kfree(pq->reqs);
+pq_reqs_nomem:
+	kfree(pq);
+	user_sdma_pkt_fp(fp) = NULL;
+pq_nomem:
+	ret = -ENOMEM;
+done:
+	return ret;
+}
+
+int hfi1_user_sdma_free_queues(struct hfi1_filedata *fd)
+{
+	struct hfi1_ctxtdata *uctxt = fd->uctxt;
+	struct hfi1_user_sdma_pkt_q *pq;
+	unsigned long flags;
+
+	hfi1_cdbg(SDMA, "[%u:%u:%u] Freeing user SDMA queues", uctxt->dd->unit,
+		  uctxt->ctxt, fd->subctxt);
+	pq = fd->pq;
+	if (pq) {
+		u16 i, j;
+
+		spin_lock_irqsave(&uctxt->sdma_qlock, flags);
+		if (!list_empty(&pq->list))
+			list_del_init(&pq->list);
+		spin_unlock_irqrestore(&uctxt->sdma_qlock, flags);
+		iowait_sdma_drain(&pq->busy);
+		if (pq->reqs) {
+			for (i = 0, j = 0; i < atomic_read(&pq->n_reqs) &&
+				     j < pq->n_max_reqs; j++) {
+				struct user_sdma_request *req = &pq->reqs[j];
+
+				if (test_bit(SDMA_REQ_IN_USE, &req->flags)) {
+					set_comp_state(req, ERROR, -ECOMM);
+					user_sdma_free_request(req);
+					i++;
+				}
+			}
+			kfree(pq->reqs);
+		}
+		if (pq->txreq_cache)
+			kmem_cache_destroy(pq->txreq_cache);
+		kfree(pq);
+		fd->pq = NULL;
+	}
+	if (fd->cq) {
+		if (fd->cq->comps)
+			vfree(fd->cq->comps);
+		kfree(fd->cq);
+		fd->cq = NULL;
+	}
+	return 0;
+}
+
+int hfi1_user_sdma_process_request(struct file *fp, struct iovec *iovec,
+				   unsigned long dim, unsigned long *count)
+{
+	int ret = 0, i = 0, sent;
+	struct hfi1_ctxtdata *uctxt = ctxt_fp(fp);
+	struct hfi1_user_sdma_pkt_q *pq = user_sdma_pkt_fp(fp);
+	struct hfi1_user_sdma_comp_q *cq = user_sdma_comp_fp(fp);
+	struct hfi1_devdata *dd = pq->dd;
+	unsigned long idx = 0;
+	u8 pcount = initial_pkt_count;
+	struct sdma_req_info info;
+	struct user_sdma_request *req;
+	u8 opcode, sc, vl;
+
+	if (iovec[idx].iov_len < sizeof(info) + sizeof(req->hdr)) {
+		hfi1_cdbg(
+		   SDMA,
+		   "[%u:%u:%u] First vector not big enough for header %lu/%lu",
+		   dd->unit, uctxt->ctxt, subctxt_fp(fp),
+		   iovec[idx].iov_len, sizeof(info) + sizeof(req->hdr));
+		ret = -EINVAL;
+		goto done;
+	}
+	ret = copy_from_user(&info, iovec[idx].iov_base, sizeof(info));
+	if (ret) {
+		hfi1_cdbg(SDMA, "[%u:%u:%u] Failed to copy info QW (%d)",
+			  dd->unit, uctxt->ctxt, subctxt_fp(fp), ret);
+		ret = -EFAULT;
+		goto done;
+	}
+	trace_hfi1_sdma_user_reqinfo(dd, uctxt->ctxt, subctxt_fp(fp),
+				     (u16 *)&info);
+	if (cq->comps[info.comp_idx].status == QUEUED) {
+		hfi1_cdbg(SDMA, "[%u:%u:%u] Entry %u is in QUEUED state",
+			  dd->unit, uctxt->ctxt, subctxt_fp(fp),
+			  info.comp_idx);
+		ret = -EBADSLT;
+		goto done;
+	}
+	if (!info.fragsize) {
+		hfi1_cdbg(SDMA,
+			  "[%u:%u:%u:%u] Request does not specify fragsize",
+			  dd->unit, uctxt->ctxt, subctxt_fp(fp), info.comp_idx);
+		ret = -EINVAL;
+		goto done;
+	}
+	/*
+	 * We've done all the safety checks that we can up to this point,
+	 * "allocate" the request entry.
+	 */
+	hfi1_cdbg(SDMA, "[%u:%u:%u] Using req/comp entry %u\n", dd->unit,
+		  uctxt->ctxt, subctxt_fp(fp), info.comp_idx);
+	req = pq->reqs + info.comp_idx;
+	memset(req, 0, sizeof(*req));
+	/* Mark the request as IN_USE before we start filling it in. */
+	set_bit(SDMA_REQ_IN_USE, &req->flags);
+	req->data_iovs = req_iovcnt(info.ctrl) - 1;
+	req->pq = pq;
+	req->cq = cq;
+	INIT_LIST_HEAD(&req->txps);
+	spin_lock_init(&req->list_lock);
+	memcpy(&req->info, &info, sizeof(info));
+
+	if (req_opcode(info.ctrl) == EXPECTED)
+		req->data_iovs--;
+
+	if (!info.npkts || req->data_iovs > MAX_VECTORS_PER_REQ) {
+		SDMA_DBG(req, "Too many vectors (%u/%u)", req->data_iovs,
+			 MAX_VECTORS_PER_REQ);
+		ret = -EINVAL;
+		goto done;
+	}
+	/* Copy the header from the user buffer */
+	ret = copy_from_user(&req->hdr, iovec[idx].iov_base + sizeof(info),
+			     sizeof(req->hdr));
+	if (ret) {
+		SDMA_DBG(req, "Failed to copy header template (%d)", ret);
+		ret = -EFAULT;
+		goto free_req;
+	}
+
+	/* If Static rate control is not enabled, sanitize the header. */
+	if (!HFI1_CAP_IS_USET(STATIC_RATE_CTRL))
+		req->hdr.pbc[2] = 0;
+
+	/* Validate the opcode. Do not trust packets from user space blindly. */
+	opcode = (be32_to_cpu(req->hdr.bth[0]) >> 24) & 0xff;
+	if ((opcode & USER_OPCODE_CHECK_MASK) !=
+	     USER_OPCODE_CHECK_VAL) {
+		SDMA_DBG(req, "Invalid opcode (%d)", opcode);
+		ret = -EINVAL;
+		goto free_req;
+	}
+	/*
+	 * Validate the vl. Do not trust packets from user space blindly.
+	 * VL comes from PBC, SC comes from LRH, and the VL needs to
+	 * match the SC look up.
+	 */
+	vl = (le16_to_cpu(req->hdr.pbc[0]) >> 12) & 0xF;
+	sc = (((be16_to_cpu(req->hdr.lrh[0]) >> 12) & 0xF) |
+	      (((le16_to_cpu(req->hdr.pbc[1]) >> 14) & 0x1) << 4));
+	if (vl >= dd->pport->vls_operational ||
+	    vl != sc_to_vlt(dd, sc)) {
+		SDMA_DBG(req, "Invalid SC(%u)/VL(%u)", sc, vl);
+		ret = -EINVAL;
+		goto free_req;
+	}
+
+	/*
+	 * Also should check the BTH.lnh. If it says the next header is GRH then
+	 * the RXE parsing will be off and will land in the middle of the KDETH
+	 * or miss it entirely.
+	 */
+	if ((be16_to_cpu(req->hdr.lrh[0]) & 0x3) == HFI1_LRH_GRH) {
+		SDMA_DBG(req, "User tried to pass in a GRH");
+		ret = -EINVAL;
+		goto free_req;
+	}
+
+	req->koffset = le32_to_cpu(req->hdr.kdeth.swdata[6]);
+	/* Calculate the initial TID offset based on the values of
+	   KDETH.OFFSET and KDETH.OM that are passed in. */
+	req->tidoffset = KDETH_GET(req->hdr.kdeth.ver_tid_offset, OFFSET) *
+		(KDETH_GET(req->hdr.kdeth.ver_tid_offset, OM) ?
+		 KDETH_OM_LARGE : KDETH_OM_SMALL);
+	SDMA_DBG(req, "Initial TID offset %u", req->tidoffset);
+	idx++;
+
+	/* Save all the IO vector structures */
+	while (i < req->data_iovs) {
+		memcpy(&req->iovs[i].iov, iovec + idx++, sizeof(struct iovec));
+		req->iovs[i].offset = 0;
+		req->data_len += req->iovs[i++].iov.iov_len;
+	}
+	SDMA_DBG(req, "total data length %u", req->data_len);
+
+	if (pcount > req->info.npkts)
+		pcount = req->info.npkts;
+	/*
+	 * Copy any TID info
+	 * User space will provide the TID info only when the
+	 * request type is EXPECTED. This is true even if there is
+	 * only one packet in the request and the header is already
+	 * setup. The reason for the singular TID case is that the
+	 * driver needs to perform safety checks.
+	 */
+	if (req_opcode(req->info.ctrl) == EXPECTED) {
+		u16 ntids = iovec[idx].iov_len / sizeof(*req->tids);
+
+		if (!ntids || ntids > MAX_TID_PAIR_ENTRIES) {
+			ret = -EINVAL;
+			goto free_req;
+		}
+		req->tids = kcalloc(ntids, sizeof(*req->tids), GFP_KERNEL);
+		if (!req->tids) {
+			ret = -ENOMEM;
+			goto free_req;
+		}
+		/*
+		 * We have to copy all of the tids because they may vary
+		 * in size and, therefore, the TID count might not be
+		 * equal to the pkt count. However, there is no way to
+		 * tell at this point.
+		 */
+		ret = copy_from_user(req->tids, iovec[idx].iov_base,
+				     ntids * sizeof(*req->tids));
+		if (ret) {
+			SDMA_DBG(req, "Failed to copy %d TIDs (%d)",
+				 ntids, ret);
+			ret = -EFAULT;
+			goto free_req;
+		}
+		req->n_tids = ntids;
+		idx++;
+	}
+
+	/* Have to select the engine */
+	req->sde = sdma_select_engine_vl(dd,
+					 (u32)(uctxt->ctxt + subctxt_fp(fp)),
+					 vl);
+	if (!req->sde || !sdma_running(req->sde)) {
+		ret = -ECOMM;
+		goto free_req;
+	}
+
+	/* We don't need an AHG entry if the request contains only one packet */
+	if (req->info.npkts > 1 && HFI1_CAP_IS_USET(SDMA_AHG)) {
+		int ahg = sdma_ahg_alloc(req->sde);
+
+		if (likely(ahg >= 0)) {
+			req->ahg_idx = (u8)ahg;
+			set_bit(SDMA_REQ_HAVE_AHG, &req->flags);
+		}
+	}
+
+	set_comp_state(req, QUEUED, 0);
+	/* Send the first N packets in the request to buy us some time */
+	sent = user_sdma_send_pkts(req, pcount);
+	if (unlikely(sent < 0)) {
+		if (sent != -EBUSY) {
+			ret = sent;
+			goto send_err;
+		} else
+			sent = 0;
+	}
+	atomic_inc(&pq->n_reqs);
+
+	if (sent < req->info.npkts) {
+		/* Take the references to the user's task and mm_struct */
+		get_task_struct(current);
+		req->user_proc = current;
+
+		/*
+		 * This is a somewhat blocking send implementation.
+		 * The driver will block the caller until all packets of the
+		 * request have been submitted to the SDMA engine. However, it
+		 * will not wait for send completions.
+		 */
+		while (!test_bit(SDMA_REQ_SEND_DONE, &req->flags)) {
+			ret = user_sdma_send_pkts(req, pcount);
+			if (ret < 0) {
+				if (ret != -EBUSY)
+					goto send_err;
+				wait_event_interruptible_timeout(
+					pq->busy.wait_dma,
+					(pq->state == SDMA_PKT_Q_ACTIVE),
+					msecs_to_jiffies(
+						SDMA_IOWAIT_TIMEOUT));
+			}
+		}
+
+	}
+	ret = 0;
+	*count += idx;
+	goto done;
+send_err:
+	set_comp_state(req, ERROR, ret);
+free_req:
+	user_sdma_free_request(req);
+done:
+	return ret;
+}
+
+static inline u32 compute_data_length(struct user_sdma_request *req,
+					    struct user_sdma_txreq *tx)
+{
+	/*
+	 * Determine the proper size of the packet data.
+	 * The size of the data of the first packet is in the header
+	 * template. However, it includes the header and ICRC, which need
+	 * to be subtracted.
+	 * The size of the remaining packets is the minimum of the frag
+	 * size (MTU) or remaining data in the request.
+	 */
+	u32 len;
+
+	if (!req->seqnum) {
+		len = ((be16_to_cpu(req->hdr.lrh[2]) << 2) -
+		       (sizeof(tx->hdr) - 4));
+	} else if (req_opcode(req->info.ctrl) == EXPECTED) {
+		u32 tidlen = EXP_TID_GET(req->tids[req->tididx], LEN) *
+			PAGE_SIZE;
+		/* Get the data length based on the remaining space in the
+		 * TID pair. */
+		len = min(tidlen - req->tidoffset, (u32)req->info.fragsize);
+		/* If we've filled up the TID pair, move to the next one. */
+		if (unlikely(!len) && ++req->tididx < req->n_tids &&
+		    req->tids[req->tididx]) {
+			tidlen = EXP_TID_GET(req->tids[req->tididx],
+					     LEN) * PAGE_SIZE;
+			req->tidoffset = 0;
+			len = min_t(u32, tidlen, req->info.fragsize);
+		}
+		/* Since the TID pairs map entire pages, make sure that we
+		 * are not going to try to send more data that we have
+		 * remaining. */
+		len = min(len, req->data_len - req->sent);
+	} else
+		len = min(req->data_len - req->sent, (u32)req->info.fragsize);
+	SDMA_DBG(req, "Data Length = %u", len);
+	return len;
+}
+
+static inline u32 get_lrh_len(struct hfi1_pkt_header hdr, u32 len)
+{
+	/* (Size of complete header - size of PBC) + 4B ICRC + data length */
+	return ((sizeof(hdr) - sizeof(hdr.pbc)) + 4 + len);
+}
+
+static int user_sdma_send_pkts(struct user_sdma_request *req, unsigned maxpkts)
+{
+	int ret = 0;
+	unsigned npkts = 0;
+	struct user_sdma_txreq *tx = NULL;
+	struct hfi1_user_sdma_pkt_q *pq = NULL;
+	struct user_sdma_iovec *iovec = NULL;
+
+	if (!req->pq) {
+		ret = -EINVAL;
+		goto done;
+	}
+
+	pq = req->pq;
+
+	/*
+	 * Check if we might have sent the entire request already
+	 */
+	if (unlikely(req->seqnum == req->info.npkts)) {
+		if (!list_empty(&req->txps))
+			goto dosend;
+		goto done;
+	}
+
+	if (!maxpkts || maxpkts > req->info.npkts - req->seqnum)
+		maxpkts = req->info.npkts - req->seqnum;
+
+	while (npkts < maxpkts) {
+		u32 datalen = 0, queued = 0, data_sent = 0;
+		u64 iov_offset = 0;
+
+		/*
+		 * Check whether any of the completions have come back
+		 * with errors. If so, we are not going to process any
+		 * more packets from this request.
+		 */
+		if (test_bit(SDMA_REQ_HAS_ERROR, &req->flags)) {
+			set_bit(SDMA_REQ_DONE_ERROR, &req->flags);
+			ret = -EFAULT;
+			goto done;
+		}
+
+		tx = kmem_cache_alloc(pq->txreq_cache, GFP_KERNEL);
+		if (!tx) {
+			ret = -ENOMEM;
+			goto done;
+		}
+		tx->flags = 0;
+		tx->req = req;
+		tx->busycount = 0;
+		tx->iovec1 = NULL;
+		tx->iovec2 = NULL;
+
+		if (req->seqnum == req->info.npkts - 1)
+			tx->flags |= USER_SDMA_TXREQ_FLAGS_LAST_PKT;
+
+		/*
+		 * Calculate the payload size - this is min of the fragment
+		 * (MTU) size or the remaining bytes in the request but only
+		 * if we have payload data.
+		 */
+		if (req->data_len) {
+			iovec = &req->iovs[req->iov_idx];
+			if (ACCESS_ONCE(iovec->offset) == iovec->iov.iov_len) {
+				if (++req->iov_idx == req->data_iovs) {
+					ret = -EFAULT;
+					goto free_txreq;
+				}
+				iovec = &req->iovs[req->iov_idx];
+				WARN_ON(iovec->offset);
+			}
+
+			/*
+			 * This request might include only a header and no user
+			 * data, so pin pages only if there is data and it the
+			 * pages have not been pinned already.
+			 */
+			if (unlikely(!iovec->pages && iovec->iov.iov_len)) {
+				ret = pin_vector_pages(req, iovec);
+				if (ret)
+					goto free_tx;
+			}
+
+			tx->iovec1 = iovec;
+			datalen = compute_data_length(req, tx);
+			if (!datalen) {
+				SDMA_DBG(req,
+					 "Request has data but pkt len is 0");
+				ret = -EFAULT;
+				goto free_tx;
+			}
+		}
+
+		if (test_bit(SDMA_REQ_HAVE_AHG, &req->flags)) {
+			if (!req->seqnum) {
+				u16 pbclen = le16_to_cpu(req->hdr.pbc[0]);
+				u32 lrhlen = get_lrh_len(req->hdr, datalen);
+				/*
+				 * Copy the request header into the tx header
+				 * because the HW needs a cacheline-aligned
+				 * address.
+				 * This copy can be optimized out if the hdr
+				 * member of user_sdma_request were also
+				 * cacheline aligned.
+				 */
+				memcpy(&tx->hdr, &req->hdr, sizeof(tx->hdr));
+				if (PBC2LRH(pbclen) != lrhlen) {
+					pbclen = (pbclen & 0xf000) |
+						LRH2PBC(lrhlen);
+					tx->hdr.pbc[0] = cpu_to_le16(pbclen);
+				}
+				ret = sdma_txinit_ahg(&tx->txreq,
+						      SDMA_TXREQ_F_AHG_COPY,
+						      sizeof(tx->hdr) + datalen,
+						      req->ahg_idx, 0, NULL, 0,
+						      user_sdma_txreq_cb);
+				if (ret)
+					goto free_tx;
+				ret = sdma_txadd_kvaddr(pq->dd, &tx->txreq,
+							&tx->hdr,
+							sizeof(tx->hdr));
+				if (ret)
+					goto free_txreq;
+			} else {
+				int changes;
+
+				changes = set_txreq_header_ahg(req, tx,
+							       datalen);
+				if (changes < 0)
+					goto free_tx;
+				sdma_txinit_ahg(&tx->txreq,
+						SDMA_TXREQ_F_USE_AHG,
+						datalen, req->ahg_idx, changes,
+						req->ahg, sizeof(req->hdr),
+						user_sdma_txreq_cb);
+			}
+		} else {
+			ret = sdma_txinit(&tx->txreq, 0, sizeof(req->hdr) +
+					  datalen, user_sdma_txreq_cb);
+			if (ret)
+				goto free_tx;
+			/*
+			 * Modify the header for this packet. This only needs
+			 * to be done if we are not going to use AHG. Otherwise,
+			 * the HW will do it based on the changes we gave it
+			 * during sdma_txinit_ahg().
+			 */
+			ret = set_txreq_header(req, tx, datalen);
+			if (ret)
+				goto free_txreq;
+		}
+
+		/*
+		 * If the request contains any data vectors, add up to
+		 * fragsize bytes to the descriptor.
+		 */
+		while (queued < datalen &&
+		       (req->sent + data_sent) < req->data_len) {
+			unsigned long base, offset;
+			unsigned pageidx, len;
+
+			base = (unsigned long)iovec->iov.iov_base;
+			offset = ((base + iovec->offset + iov_offset) &
+				  ~PAGE_MASK);
+			pageidx = (((iovec->offset + iov_offset +
+				     base) - (base & PAGE_MASK)) >> PAGE_SHIFT);
+			len = offset + req->info.fragsize > PAGE_SIZE ?
+				PAGE_SIZE - offset : req->info.fragsize;
+			len = min((datalen - queued), len);
+			ret = sdma_txadd_page(pq->dd, &tx->txreq,
+					      iovec->pages[pageidx],
+					      offset, len);
+			if (ret) {
+				dd_dev_err(pq->dd,
+					   "SDMA txreq add page failed %d\n",
+					   ret);
+				iovec_set_complete(iovec);
+				goto free_txreq;
+			}
+			iov_offset += len;
+			queued += len;
+			data_sent += len;
+			if (unlikely(queued < datalen &&
+				     pageidx == iovec->npages &&
+				     req->iov_idx < req->data_iovs - 1)) {
+				iovec->offset += iov_offset;
+				iovec = &req->iovs[++req->iov_idx];
+				if (!iovec->pages) {
+					ret = pin_vector_pages(req, iovec);
+					if (ret)
+						goto free_txreq;
+				}
+				iov_offset = 0;
+				tx->iovec2 = iovec;
+
+			}
+		}
+		/*
+		 * The txreq was submitted successfully so we can update
+		 * the counters.
+		 */
+		req->koffset += datalen;
+		if (req_opcode(req->info.ctrl) == EXPECTED)
+			req->tidoffset += datalen;
+		req->sent += data_sent;
+		if (req->data_len) {
+			if (tx->iovec1 && !tx->iovec2)
+				tx->iovec1->offset += iov_offset;
+			else if (tx->iovec2)
+				tx->iovec2->offset += iov_offset;
+		}
+		/*
+		 * It is important to increment this here as it is used to
+		 * generate the BTH.PSN and, therefore, can't be bulk-updated
+		 * outside of the loop.
+		 */
+		tx->seqnum = req->seqnum++;
+		list_add_tail(&tx->txreq.list, &req->txps);
+		npkts++;
+	}
+dosend:
+	ret = sdma_send_txlist(req->sde, &pq->busy, &req->txps);
+	if (list_empty(&req->txps))
+		if (req->seqnum == req->info.npkts) {
+			set_bit(SDMA_REQ_SEND_DONE, &req->flags);
+			/*
+			 * The txreq has already been submitted to the HW queue
+			 * so we can free the AHG entry now. Corruption will not
+			 * happen due to the sequential manner in which
+			 * descriptors are processed.
+			 */
+			if (test_bit(SDMA_REQ_HAVE_AHG, &req->flags))
+				sdma_ahg_free(req->sde, req->ahg_idx);
+		}
+	goto done;
+free_txreq:
+	sdma_txclean(pq->dd, &tx->txreq);
+free_tx:
+	kmem_cache_free(pq->txreq_cache, tx);
+done:
+	return ret;
+}
+
+/*
+ * How many pages in this iovec element?
+ */
+static inline int num_user_pages(const struct iovec *iov)
+{
+	const unsigned long addr  = (unsigned long) iov->iov_base;
+	const unsigned long len   = iov->iov_len;
+	const unsigned long spage = addr & PAGE_MASK;
+	const unsigned long epage = (addr + len - 1) & PAGE_MASK;
+
+	return 1 + ((epage - spage) >> PAGE_SHIFT);
+}
+
+static int pin_vector_pages(struct user_sdma_request *req,
+			    struct user_sdma_iovec *iovec) {
+	int ret = 0;
+	unsigned pinned;
+
+	iovec->npages = num_user_pages(&iovec->iov);
+	iovec->pages = kzalloc(sizeof(*iovec->pages) *
+			       iovec->npages, GFP_KERNEL);
+	if (!iovec->pages) {
+		SDMA_DBG(req, "Failed page array alloc");
+		ret = -ENOMEM;
+		goto done;
+	}
+	/* If called by the kernel thread, use the user's mm */
+	if (current->flags & PF_KTHREAD)
+		use_mm(req->user_proc->mm);
+	pinned = get_user_pages_fast(
+		(unsigned long)iovec->iov.iov_base,
+		iovec->npages, 0, iovec->pages);
+	/* If called by the kernel thread, unuse the user's mm */
+	if (current->flags & PF_KTHREAD)
+		unuse_mm(req->user_proc->mm);
+	if (pinned != iovec->npages) {
+		SDMA_DBG(req, "Failed to pin pages (%u/%u)", pinned,
+			 iovec->npages);
+		ret = -EFAULT;
+		goto pfree;
+	}
+	goto done;
+pfree:
+	unpin_vector_pages(iovec);
+done:
+	return ret;
+}
+
+static void unpin_vector_pages(struct user_sdma_iovec *iovec)
+{
+	unsigned i;
+
+	if (ACCESS_ONCE(iovec->offset) != iovec->iov.iov_len) {
+		hfi1_cdbg(SDMA,
+			  "the complete vector has not been sent yet %llu %zu",
+			  iovec->offset, iovec->iov.iov_len);
+		return;
+	}
+	for (i = 0; i < iovec->npages; i++)
+		if (iovec->pages[i])
+			put_page(iovec->pages[i]);
+	kfree(iovec->pages);
+	iovec->pages = NULL;
+	iovec->npages = 0;
+	iovec->offset = 0;
+}
+
+static int check_header_template(struct user_sdma_request *req,
+				 struct hfi1_pkt_header *hdr, u32 lrhlen,
+				 u32 datalen)
+{
+	/*
+	 * Perform safety checks for any type of packet:
+	 *    - transfer size is multiple of 64bytes
+	 *    - packet length is multiple of 4bytes
+	 *    - entire request length is multiple of 4bytes
+	 *    - packet length is not larger than MTU size
+	 *
+	 * These checks are only done for the first packet of the
+	 * transfer since the header is "given" to us by user space.
+	 * For the remainder of the packets we compute the values.
+	 */
+	if (req->info.fragsize % PIO_BLOCK_SIZE ||
+	    lrhlen & 0x3 || req->data_len & 0x3  ||
+	    lrhlen > get_lrh_len(*hdr, req->info.fragsize))
+		return -EINVAL;
+
+	if (req_opcode(req->info.ctrl) == EXPECTED) {
+		/*
+		 * The header is checked only on the first packet. Furthermore,
+		 * we ensure that at least one TID entry is copied when the
+		 * request is submitted. Therefore, we don't have to verify that
+		 * tididx points to something sane.
+		 */
+		u32 tidval = req->tids[req->tididx],
+			tidlen = EXP_TID_GET(tidval, LEN) * PAGE_SIZE,
+			tididx = EXP_TID_GET(tidval, IDX),
+			tidctrl = EXP_TID_GET(tidval, CTRL),
+			tidoff;
+		__le32 kval = hdr->kdeth.ver_tid_offset;
+
+		tidoff = KDETH_GET(kval, OFFSET) *
+			  (KDETH_GET(req->hdr.kdeth.ver_tid_offset, OM) ?
+			   KDETH_OM_LARGE : KDETH_OM_SMALL);
+		/*
+		 * Expected receive packets have the following
+		 * additional checks:
+		 *     - offset is not larger than the TID size
+		 *     - TIDCtrl values match between header and TID array
+		 *     - TID indexes match between header and TID array
+		 */
+		if ((tidoff + datalen > tidlen) ||
+		    KDETH_GET(kval, TIDCTRL) != tidctrl ||
+		    KDETH_GET(kval, TID) != tididx)
+			return -EINVAL;
+	}
+	return 0;
+}
+
+/*
+ * Correctly set the BTH.PSN field based on type of
+ * transfer - eager packets can just increment the PSN but
+ * expected packets encode generation and sequence in the
+ * BTH.PSN field so just incrementing will result in errors.
+ */
+static inline u32 set_pkt_bth_psn(__be32 bthpsn, u8 expct, u32 frags)
+{
+	u32 val = be32_to_cpu(bthpsn),
+		mask = (HFI1_CAP_IS_KSET(EXTENDED_PSN) ? 0x7fffffffull :
+			0xffffffull),
+		psn = val & mask;
+	if (expct)
+		psn = (psn & ~BTH_SEQ_MASK) | ((psn + frags) & BTH_SEQ_MASK);
+	else
+		psn = psn + frags;
+	return psn & mask;
+}
+
+static int set_txreq_header(struct user_sdma_request *req,
+			    struct user_sdma_txreq *tx, u32 datalen)
+{
+	struct hfi1_user_sdma_pkt_q *pq = req->pq;
+	struct hfi1_pkt_header *hdr = &tx->hdr;
+	u16 pbclen;
+	int ret;
+	u32 tidval = 0, lrhlen = get_lrh_len(*hdr, datalen);
+
+	/* Copy the header template to the request before modification */
+	memcpy(hdr, &req->hdr, sizeof(*hdr));
+
+	/*
+	 * Check if the PBC and LRH length are mismatched. If so
+	 * adjust both in the header.
+	 */
+	pbclen = le16_to_cpu(hdr->pbc[0]);
+	if (PBC2LRH(pbclen) != lrhlen) {
+		pbclen = (pbclen & 0xf000) | LRH2PBC(lrhlen);
+		hdr->pbc[0] = cpu_to_le16(pbclen);
+		hdr->lrh[2] = cpu_to_be16(lrhlen >> 2);
+		/*
+		 * Third packet
+		 * This is the first packet in the sequence that has
+		 * a "static" size that can be used for the rest of
+		 * the packets (besides the last one).
+		 */
+		if (unlikely(req->seqnum == 2)) {
+			/*
+			 * From this point on the lengths in both the
+			 * PBC and LRH are the same until the last
+			 * packet.
+			 * Adjust the template so we don't have to update
+			 * every packet
+			 */
+			req->hdr.pbc[0] = hdr->pbc[0];
+			req->hdr.lrh[2] = hdr->lrh[2];
+		}
+	}
+	/*
+	 * We only have to modify the header if this is not the
+	 * first packet in the request. Otherwise, we use the
+	 * header given to us.
+	 */
+	if (unlikely(!req->seqnum)) {
+		ret = check_header_template(req, hdr, lrhlen, datalen);
+		if (ret)
+			return ret;
+		goto done;
+
+	}
+
+	hdr->bth[2] = cpu_to_be32(
+		set_pkt_bth_psn(hdr->bth[2],
+				(req_opcode(req->info.ctrl) == EXPECTED),
+				req->seqnum));
+
+	/* Set ACK request on last packet */
+	if (unlikely(tx->flags & USER_SDMA_TXREQ_FLAGS_LAST_PKT))
+		hdr->bth[2] |= cpu_to_be32(1UL<<31);
+
+	/* Set the new offset */
+	hdr->kdeth.swdata[6] = cpu_to_le32(req->koffset);
+	/* Expected packets have to fill in the new TID information */
+	if (req_opcode(req->info.ctrl) == EXPECTED) {
+		tidval = req->tids[req->tididx];
+		/*
+		 * If the offset puts us at the end of the current TID,
+		 * advance everything.
+		 */
+		if ((req->tidoffset) == (EXP_TID_GET(tidval, LEN) *
+					 PAGE_SIZE)) {
+			req->tidoffset = 0;
+			/* Since we don't copy all the TIDs, all at once,
+			 * we have to check again. */
+			if (++req->tididx > req->n_tids - 1 ||
+			    !req->tids[req->tididx]) {
+				return -EINVAL;
+			}
+			tidval = req->tids[req->tididx];
+		}
+		req->omfactor = EXP_TID_GET(tidval, LEN) * PAGE_SIZE >=
+			KDETH_OM_MAX_SIZE ? KDETH_OM_LARGE : KDETH_OM_SMALL;
+		/* Set KDETH.TIDCtrl based on value for this TID. */
+		KDETH_SET(hdr->kdeth.ver_tid_offset, TIDCTRL,
+			  EXP_TID_GET(tidval, CTRL));
+		/* Set KDETH.TID based on value for this TID */
+		KDETH_SET(hdr->kdeth.ver_tid_offset, TID,
+			  EXP_TID_GET(tidval, IDX));
+		/* Clear KDETH.SH only on the last packet */
+		if (unlikely(tx->flags & USER_SDMA_TXREQ_FLAGS_LAST_PKT))
+			KDETH_SET(hdr->kdeth.ver_tid_offset, SH, 0);
+		/*
+		 * Set the KDETH.OFFSET and KDETH.OM based on size of
+		 * transfer.
+		 */
+		SDMA_DBG(req, "TID offset %ubytes %uunits om%u",
+			 req->tidoffset, req->tidoffset / req->omfactor,
+			 !!(req->omfactor - KDETH_OM_SMALL));
+		KDETH_SET(hdr->kdeth.ver_tid_offset, OFFSET,
+			  req->tidoffset / req->omfactor);
+		KDETH_SET(hdr->kdeth.ver_tid_offset, OM,
+			  !!(req->omfactor - KDETH_OM_SMALL));
+	}
+done:
+	trace_hfi1_sdma_user_header(pq->dd, pq->ctxt, pq->subctxt,
+				    req->info.comp_idx, hdr, tidval);
+	return sdma_txadd_kvaddr(pq->dd, &tx->txreq, hdr, sizeof(*hdr));
+}
+
+static int set_txreq_header_ahg(struct user_sdma_request *req,
+				struct user_sdma_txreq *tx, u32 len)
+{
+	int diff = 0;
+	struct hfi1_user_sdma_pkt_q *pq = req->pq;
+	struct hfi1_pkt_header *hdr = &req->hdr;
+	u16 pbclen = le16_to_cpu(hdr->pbc[0]);
+	u32 val32, tidval = 0, lrhlen = get_lrh_len(*hdr, len);
+
+	if (PBC2LRH(pbclen) != lrhlen) {
+		/* PBC.PbcLengthDWs */
+		AHG_HEADER_SET(req->ahg, diff, 0, 0, 12,
+			       cpu_to_le16(LRH2PBC(lrhlen)));
+		/* LRH.PktLen (we need the full 16 bits due to byte swap) */
+		AHG_HEADER_SET(req->ahg, diff, 3, 0, 16,
+			       cpu_to_be16(lrhlen >> 2));
+	}
+
+	/*
+	 * Do the common updates
+	 */
+	/* BTH.PSN and BTH.A */
+	val32 = (be32_to_cpu(hdr->bth[2]) + req->seqnum) &
+		(HFI1_CAP_IS_KSET(EXTENDED_PSN) ? 0x7fffffff : 0xffffff);
+	if (unlikely(tx->flags & USER_SDMA_TXREQ_FLAGS_LAST_PKT))
+		val32 |= 1UL << 31;
+	AHG_HEADER_SET(req->ahg, diff, 6, 0, 16, cpu_to_be16(val32 >> 16));
+	AHG_HEADER_SET(req->ahg, diff, 6, 16, 16, cpu_to_be16(val32 & 0xffff));
+	/* KDETH.Offset */
+	AHG_HEADER_SET(req->ahg, diff, 15, 0, 16,
+		       cpu_to_le16(req->koffset & 0xffff));
+	AHG_HEADER_SET(req->ahg, diff, 15, 16, 16,
+		       cpu_to_le16(req->koffset >> 16));
+	if (req_opcode(req->info.ctrl) == EXPECTED) {
+		__le16 val;
+
+		tidval = req->tids[req->tididx];
+
+		/*
+		 * If the offset puts us at the end of the current TID,
+		 * advance everything.
+		 */
+		if ((req->tidoffset) == (EXP_TID_GET(tidval, LEN) *
+					 PAGE_SIZE)) {
+			req->tidoffset = 0;
+			/* Since we don't copy all the TIDs, all at once,
+			 * we have to check again. */
+			if (++req->tididx > req->n_tids - 1 ||
+			    !req->tids[req->tididx]) {
+				return -EINVAL;
+			}
+			tidval = req->tids[req->tididx];
+		}
+		req->omfactor = ((EXP_TID_GET(tidval, LEN) *
+				  PAGE_SIZE) >=
+				 KDETH_OM_MAX_SIZE) ? KDETH_OM_LARGE :
+			KDETH_OM_SMALL;
+		/* KDETH.OM and KDETH.OFFSET (TID) */
+		AHG_HEADER_SET(req->ahg, diff, 7, 0, 16,
+			       ((!!(req->omfactor - KDETH_OM_SMALL)) << 15 |
+				((req->tidoffset / req->omfactor) & 0x7fff)));
+		/* KDETH.TIDCtrl, KDETH.TID */
+		val = cpu_to_le16(((EXP_TID_GET(tidval, CTRL) & 0x3) << 10) |
+					(EXP_TID_GET(tidval, IDX) & 0x3ff));
+		/* Clear KDETH.SH on last packet */
+		if (unlikely(tx->flags & USER_SDMA_TXREQ_FLAGS_LAST_PKT)) {
+			val |= cpu_to_le16(KDETH_GET(hdr->kdeth.ver_tid_offset,
+								INTR) >> 16);
+			val &= cpu_to_le16(~(1U << 13));
+			AHG_HEADER_SET(req->ahg, diff, 7, 16, 14, val);
+		} else
+			AHG_HEADER_SET(req->ahg, diff, 7, 16, 12, val);
+	}
+
+	trace_hfi1_sdma_user_header_ahg(pq->dd, pq->ctxt, pq->subctxt,
+					req->info.comp_idx, req->sde->this_idx,
+					req->ahg_idx, req->ahg, diff, tidval);
+	return diff;
+}
+
+static void user_sdma_txreq_cb(struct sdma_txreq *txreq, int status,
+			       int drain)
+{
+	struct user_sdma_txreq *tx =
+		container_of(txreq, struct user_sdma_txreq, txreq);
+	struct user_sdma_request *req = tx->req;
+	struct hfi1_user_sdma_pkt_q *pq = req ? req->pq : NULL;
+	u64 tx_seqnum;
+
+	if (unlikely(!req || !pq))
+		return;
+
+	if (tx->iovec1)
+		iovec_may_free(tx->iovec1, unpin_vector_pages);
+	if (tx->iovec2)
+		iovec_may_free(tx->iovec2, unpin_vector_pages);
+
+	tx_seqnum = tx->seqnum;
+	kmem_cache_free(pq->txreq_cache, tx);
+
+	if (status != SDMA_TXREQ_S_OK) {
+		dd_dev_err(pq->dd, "SDMA completion with error %d", status);
+		set_comp_state(req, ERROR, status);
+		set_bit(SDMA_REQ_HAS_ERROR, &req->flags);
+		/* Do not free the request until the sender loop has ack'ed
+		 * the error and we've seen all txreqs. */
+		if (tx_seqnum == ACCESS_ONCE(req->seqnum) &&
+		    test_bit(SDMA_REQ_DONE_ERROR, &req->flags)) {
+			atomic_dec(&pq->n_reqs);
+			user_sdma_free_request(req);
+		}
+	} else {
+		if (tx_seqnum == req->info.npkts - 1) {
+			/* We've sent and completed all packets in this
+			 * request. Signal completion to the user */
+			atomic_dec(&pq->n_reqs);
+			set_comp_state(req, COMPLETE, 0);
+			user_sdma_free_request(req);
+		}
+	}
+	if (!atomic_read(&pq->n_reqs))
+		xchg(&pq->state, SDMA_PKT_Q_INACTIVE);
+}
+
+static void user_sdma_free_request(struct user_sdma_request *req)
+{
+	if (!list_empty(&req->txps)) {
+		struct sdma_txreq *t, *p;
+
+		list_for_each_entry_safe(t, p, &req->txps, list) {
+			struct user_sdma_txreq *tx =
+				container_of(t, struct user_sdma_txreq, txreq);
+			list_del_init(&t->list);
+			sdma_txclean(req->pq->dd, t);
+			kmem_cache_free(req->pq->txreq_cache, tx);
+		}
+	}
+	if (req->data_iovs) {
+		int i;
+
+		for (i = 0; i < req->data_iovs; i++)
+			if (req->iovs[i].npages && req->iovs[i].pages)
+				unpin_vector_pages(&req->iovs[i]);
+	}
+	if (req->user_proc)
+		put_task_struct(req->user_proc);
+	kfree(req->tids);
+	clear_bit(SDMA_REQ_IN_USE, &req->flags);
+}
+
+static inline void set_comp_state(struct user_sdma_request *req,
+					enum hfi1_sdma_comp_state state,
+					int ret)
+{
+	SDMA_DBG(req, "Setting completion status %u %d", state, ret);
+	req->cq->comps[req->info.comp_idx].status = state;
+	if (state == ERROR)
+		req->cq->comps[req->info.comp_idx].errcode = -ret;
+	trace_hfi1_sdma_user_completion(req->pq->dd, req->pq->ctxt,
+					req->pq->subctxt, req->info.comp_idx,
+					state, ret);
+}
diff --git a/drivers/staging/rdma/hfi1/user_sdma.h b/drivers/staging/rdma/hfi1/user_sdma.h
new file mode 100644
index 000000000000..fa4422553e23
--- /dev/null
+++ b/drivers/staging/rdma/hfi1/user_sdma.h
@@ -0,0 +1,89 @@
+/*
+ *
+ * This file is provided under a dual BSD/GPLv2 license.  When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * GPL LICENSE SUMMARY
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * BSD LICENSE
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ *  - Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ *  - Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in
+ *    the documentation and/or other materials provided with the
+ *    distribution.
+ *  - Neither the name of Intel Corporation nor the names of its
+ *    contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ */
+#include <linux/device.h>
+#include <linux/wait.h>
+
+#include "common.h"
+#include "iowait.h"
+
+#define EXP_TID_TIDLEN_MASK   0x7FFULL
+#define EXP_TID_TIDLEN_SHIFT  0
+#define EXP_TID_TIDCTRL_MASK  0x3ULL
+#define EXP_TID_TIDCTRL_SHIFT 20
+#define EXP_TID_TIDIDX_MASK   0x7FFULL
+#define EXP_TID_TIDIDX_SHIFT  22
+#define EXP_TID_GET(tid, field)	\
+	(((tid) >> EXP_TID_TID##field##_SHIFT) & EXP_TID_TID##field##_MASK)
+
+extern uint extended_psn;
+
+struct hfi1_user_sdma_pkt_q {
+	struct list_head list;
+	unsigned ctxt;
+	unsigned subctxt;
+	u16 n_max_reqs;
+	atomic_t n_reqs;
+	u16 reqidx;
+	struct hfi1_devdata *dd;
+	struct kmem_cache *txreq_cache;
+	struct user_sdma_request *reqs;
+	struct iowait busy;
+	unsigned state;
+};
+
+struct hfi1_user_sdma_comp_q {
+	u16 nentries;
+	struct hfi1_sdma_comp_entry *comps;
+};
+
+int hfi1_user_sdma_alloc_queues(struct hfi1_ctxtdata *, struct file *);
+int hfi1_user_sdma_free_queues(struct hfi1_filedata *);
+int hfi1_user_sdma_process_request(struct file *, struct iovec *, unsigned long,
+				   unsigned long *);
diff --git a/drivers/staging/rdma/hfi1/verbs.c b/drivers/staging/rdma/hfi1/verbs.c
new file mode 100644
index 000000000000..5f4b6617677a
--- /dev/null
+++ b/drivers/staging/rdma/hfi1/verbs.c
@@ -0,0 +1,2142 @@
+/*
+ *
+ * This file is provided under a dual BSD/GPLv2 license.  When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * GPL LICENSE SUMMARY
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * BSD LICENSE
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ *  - Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ *  - Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in
+ *    the documentation and/or other materials provided with the
+ *    distribution.
+ *  - Neither the name of Intel Corporation nor the names of its
+ *    contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ */
+
+#include <rdma/ib_mad.h>
+#include <rdma/ib_user_verbs.h>
+#include <linux/io.h>
+#include <linux/module.h>
+#include <linux/utsname.h>
+#include <linux/rculist.h>
+#include <linux/mm.h>
+#include <linux/random.h>
+#include <linux/vmalloc.h>
+
+#include "hfi.h"
+#include "common.h"
+#include "device.h"
+#include "trace.h"
+#include "qp.h"
+#include "sdma.h"
+
+unsigned int hfi1_lkey_table_size = 16;
+module_param_named(lkey_table_size, hfi1_lkey_table_size, uint,
+		   S_IRUGO);
+MODULE_PARM_DESC(lkey_table_size,
+		 "LKEY table size in bits (2^n, 1 <= n <= 23)");
+
+static unsigned int hfi1_max_pds = 0xFFFF;
+module_param_named(max_pds, hfi1_max_pds, uint, S_IRUGO);
+MODULE_PARM_DESC(max_pds,
+		 "Maximum number of protection domains to support");
+
+static unsigned int hfi1_max_ahs = 0xFFFF;
+module_param_named(max_ahs, hfi1_max_ahs, uint, S_IRUGO);
+MODULE_PARM_DESC(max_ahs, "Maximum number of address handles to support");
+
+unsigned int hfi1_max_cqes = 0x2FFFF;
+module_param_named(max_cqes, hfi1_max_cqes, uint, S_IRUGO);
+MODULE_PARM_DESC(max_cqes,
+		 "Maximum number of completion queue entries to support");
+
+unsigned int hfi1_max_cqs = 0x1FFFF;
+module_param_named(max_cqs, hfi1_max_cqs, uint, S_IRUGO);
+MODULE_PARM_DESC(max_cqs, "Maximum number of completion queues to support");
+
+unsigned int hfi1_max_qp_wrs = 0x3FFF;
+module_param_named(max_qp_wrs, hfi1_max_qp_wrs, uint, S_IRUGO);
+MODULE_PARM_DESC(max_qp_wrs, "Maximum number of QP WRs to support");
+
+unsigned int hfi1_max_qps = 16384;
+module_param_named(max_qps, hfi1_max_qps, uint, S_IRUGO);
+MODULE_PARM_DESC(max_qps, "Maximum number of QPs to support");
+
+unsigned int hfi1_max_sges = 0x60;
+module_param_named(max_sges, hfi1_max_sges, uint, S_IRUGO);
+MODULE_PARM_DESC(max_sges, "Maximum number of SGEs to support");
+
+unsigned int hfi1_max_mcast_grps = 16384;
+module_param_named(max_mcast_grps, hfi1_max_mcast_grps, uint, S_IRUGO);
+MODULE_PARM_DESC(max_mcast_grps,
+		 "Maximum number of multicast groups to support");
+
+unsigned int hfi1_max_mcast_qp_attached = 16;
+module_param_named(max_mcast_qp_attached, hfi1_max_mcast_qp_attached,
+		   uint, S_IRUGO);
+MODULE_PARM_DESC(max_mcast_qp_attached,
+		 "Maximum number of attached QPs to support");
+
+unsigned int hfi1_max_srqs = 1024;
+module_param_named(max_srqs, hfi1_max_srqs, uint, S_IRUGO);
+MODULE_PARM_DESC(max_srqs, "Maximum number of SRQs to support");
+
+unsigned int hfi1_max_srq_sges = 128;
+module_param_named(max_srq_sges, hfi1_max_srq_sges, uint, S_IRUGO);
+MODULE_PARM_DESC(max_srq_sges, "Maximum number of SRQ SGEs to support");
+
+unsigned int hfi1_max_srq_wrs = 0x1FFFF;
+module_param_named(max_srq_wrs, hfi1_max_srq_wrs, uint, S_IRUGO);
+MODULE_PARM_DESC(max_srq_wrs, "Maximum number of SRQ WRs support");
+
+static void verbs_sdma_complete(
+	struct sdma_txreq *cookie,
+	int status,
+	int drained);
+
+/*
+ * Note that it is OK to post send work requests in the SQE and ERR
+ * states; hfi1_do_send() will process them and generate error
+ * completions as per IB 1.2 C10-96.
+ */
+const int ib_hfi1_state_ops[IB_QPS_ERR + 1] = {
+	[IB_QPS_RESET] = 0,
+	[IB_QPS_INIT] = HFI1_POST_RECV_OK,
+	[IB_QPS_RTR] = HFI1_POST_RECV_OK | HFI1_PROCESS_RECV_OK,
+	[IB_QPS_RTS] = HFI1_POST_RECV_OK | HFI1_PROCESS_RECV_OK |
+	    HFI1_POST_SEND_OK | HFI1_PROCESS_SEND_OK |
+	    HFI1_PROCESS_NEXT_SEND_OK,
+	[IB_QPS_SQD] = HFI1_POST_RECV_OK | HFI1_PROCESS_RECV_OK |
+	    HFI1_POST_SEND_OK | HFI1_PROCESS_SEND_OK,
+	[IB_QPS_SQE] = HFI1_POST_RECV_OK | HFI1_PROCESS_RECV_OK |
+	    HFI1_POST_SEND_OK | HFI1_FLUSH_SEND,
+	[IB_QPS_ERR] = HFI1_POST_RECV_OK | HFI1_FLUSH_RECV |
+	    HFI1_POST_SEND_OK | HFI1_FLUSH_SEND,
+};
+
+struct hfi1_ucontext {
+	struct ib_ucontext ibucontext;
+};
+
+static inline struct hfi1_ucontext *to_iucontext(struct ib_ucontext
+						  *ibucontext)
+{
+	return container_of(ibucontext, struct hfi1_ucontext, ibucontext);
+}
+
+/*
+ * Translate ib_wr_opcode into ib_wc_opcode.
+ */
+const enum ib_wc_opcode ib_hfi1_wc_opcode[] = {
+	[IB_WR_RDMA_WRITE] = IB_WC_RDMA_WRITE,
+	[IB_WR_RDMA_WRITE_WITH_IMM] = IB_WC_RDMA_WRITE,
+	[IB_WR_SEND] = IB_WC_SEND,
+	[IB_WR_SEND_WITH_IMM] = IB_WC_SEND,
+	[IB_WR_RDMA_READ] = IB_WC_RDMA_READ,
+	[IB_WR_ATOMIC_CMP_AND_SWP] = IB_WC_COMP_SWAP,
+	[IB_WR_ATOMIC_FETCH_AND_ADD] = IB_WC_FETCH_ADD
+};
+
+/*
+ * Length of header by opcode, 0 --> not supported
+ */
+const u8 hdr_len_by_opcode[256] = {
+	/* RC */
+	[IB_OPCODE_RC_SEND_FIRST]                     = 12 + 8,
+	[IB_OPCODE_RC_SEND_MIDDLE]                    = 12 + 8,
+	[IB_OPCODE_RC_SEND_LAST]                      = 12 + 8,
+	[IB_OPCODE_RC_SEND_LAST_WITH_IMMEDIATE]       = 12 + 8 + 4,
+	[IB_OPCODE_RC_SEND_ONLY]                      = 12 + 8,
+	[IB_OPCODE_RC_SEND_ONLY_WITH_IMMEDIATE]       = 12 + 8 + 4,
+	[IB_OPCODE_RC_RDMA_WRITE_FIRST]               = 12 + 8 + 16,
+	[IB_OPCODE_RC_RDMA_WRITE_MIDDLE]              = 12 + 8,
+	[IB_OPCODE_RC_RDMA_WRITE_LAST]                = 12 + 8,
+	[IB_OPCODE_RC_RDMA_WRITE_LAST_WITH_IMMEDIATE] = 12 + 8 + 4,
+	[IB_OPCODE_RC_RDMA_WRITE_ONLY]                = 12 + 8 + 16,
+	[IB_OPCODE_RC_RDMA_WRITE_ONLY_WITH_IMMEDIATE] = 12 + 8 + 20,
+	[IB_OPCODE_RC_RDMA_READ_REQUEST]              = 12 + 8 + 16,
+	[IB_OPCODE_RC_RDMA_READ_RESPONSE_FIRST]       = 12 + 8 + 4,
+	[IB_OPCODE_RC_RDMA_READ_RESPONSE_MIDDLE]      = 12 + 8,
+	[IB_OPCODE_RC_RDMA_READ_RESPONSE_LAST]        = 12 + 8 + 4,
+	[IB_OPCODE_RC_RDMA_READ_RESPONSE_ONLY]        = 12 + 8 + 4,
+	[IB_OPCODE_RC_ACKNOWLEDGE]                    = 12 + 8 + 4,
+	[IB_OPCODE_RC_ATOMIC_ACKNOWLEDGE]             = 12 + 8 + 4,
+	[IB_OPCODE_RC_COMPARE_SWAP]                   = 12 + 8 + 28,
+	[IB_OPCODE_RC_FETCH_ADD]                      = 12 + 8 + 28,
+	/* UC */
+	[IB_OPCODE_UC_SEND_FIRST]                     = 12 + 8,
+	[IB_OPCODE_UC_SEND_MIDDLE]                    = 12 + 8,
+	[IB_OPCODE_UC_SEND_LAST]                      = 12 + 8,
+	[IB_OPCODE_UC_SEND_LAST_WITH_IMMEDIATE]       = 12 + 8 + 4,
+	[IB_OPCODE_UC_SEND_ONLY]                      = 12 + 8,
+	[IB_OPCODE_UC_SEND_ONLY_WITH_IMMEDIATE]       = 12 + 8 + 4,
+	[IB_OPCODE_UC_RDMA_WRITE_FIRST]               = 12 + 8 + 16,
+	[IB_OPCODE_UC_RDMA_WRITE_MIDDLE]              = 12 + 8,
+	[IB_OPCODE_UC_RDMA_WRITE_LAST]                = 12 + 8,
+	[IB_OPCODE_UC_RDMA_WRITE_LAST_WITH_IMMEDIATE] = 12 + 8 + 4,
+	[IB_OPCODE_UC_RDMA_WRITE_ONLY]                = 12 + 8 + 16,
+	[IB_OPCODE_UC_RDMA_WRITE_ONLY_WITH_IMMEDIATE] = 12 + 8 + 20,
+	/* UD */
+	[IB_OPCODE_UD_SEND_ONLY]                      = 12 + 8 + 8,
+	[IB_OPCODE_UD_SEND_ONLY_WITH_IMMEDIATE]       = 12 + 8 + 12
+};
+
+static const opcode_handler opcode_handler_tbl[256] = {
+	/* RC */
+	[IB_OPCODE_RC_SEND_FIRST]                     = &hfi1_rc_rcv,
+	[IB_OPCODE_RC_SEND_MIDDLE]                    = &hfi1_rc_rcv,
+	[IB_OPCODE_RC_SEND_LAST]                      = &hfi1_rc_rcv,
+	[IB_OPCODE_RC_SEND_LAST_WITH_IMMEDIATE]       = &hfi1_rc_rcv,
+	[IB_OPCODE_RC_SEND_ONLY]                      = &hfi1_rc_rcv,
+	[IB_OPCODE_RC_SEND_ONLY_WITH_IMMEDIATE]       = &hfi1_rc_rcv,
+	[IB_OPCODE_RC_RDMA_WRITE_FIRST]               = &hfi1_rc_rcv,
+	[IB_OPCODE_RC_RDMA_WRITE_MIDDLE]              = &hfi1_rc_rcv,
+	[IB_OPCODE_RC_RDMA_WRITE_LAST]                = &hfi1_rc_rcv,
+	[IB_OPCODE_RC_RDMA_WRITE_LAST_WITH_IMMEDIATE] = &hfi1_rc_rcv,
+	[IB_OPCODE_RC_RDMA_WRITE_ONLY]                = &hfi1_rc_rcv,
+	[IB_OPCODE_RC_RDMA_WRITE_ONLY_WITH_IMMEDIATE] = &hfi1_rc_rcv,
+	[IB_OPCODE_RC_RDMA_READ_REQUEST]              = &hfi1_rc_rcv,
+	[IB_OPCODE_RC_RDMA_READ_RESPONSE_FIRST]       = &hfi1_rc_rcv,
+	[IB_OPCODE_RC_RDMA_READ_RESPONSE_MIDDLE]      = &hfi1_rc_rcv,
+	[IB_OPCODE_RC_RDMA_READ_RESPONSE_LAST]        = &hfi1_rc_rcv,
+	[IB_OPCODE_RC_RDMA_READ_RESPONSE_ONLY]        = &hfi1_rc_rcv,
+	[IB_OPCODE_RC_ACKNOWLEDGE]                    = &hfi1_rc_rcv,
+	[IB_OPCODE_RC_ATOMIC_ACKNOWLEDGE]             = &hfi1_rc_rcv,
+	[IB_OPCODE_RC_COMPARE_SWAP]                   = &hfi1_rc_rcv,
+	[IB_OPCODE_RC_FETCH_ADD]                      = &hfi1_rc_rcv,
+	/* UC */
+	[IB_OPCODE_UC_SEND_FIRST]                     = &hfi1_uc_rcv,
+	[IB_OPCODE_UC_SEND_MIDDLE]                    = &hfi1_uc_rcv,
+	[IB_OPCODE_UC_SEND_LAST]                      = &hfi1_uc_rcv,
+	[IB_OPCODE_UC_SEND_LAST_WITH_IMMEDIATE]       = &hfi1_uc_rcv,
+	[IB_OPCODE_UC_SEND_ONLY]                      = &hfi1_uc_rcv,
+	[IB_OPCODE_UC_SEND_ONLY_WITH_IMMEDIATE]       = &hfi1_uc_rcv,
+	[IB_OPCODE_UC_RDMA_WRITE_FIRST]               = &hfi1_uc_rcv,
+	[IB_OPCODE_UC_RDMA_WRITE_MIDDLE]              = &hfi1_uc_rcv,
+	[IB_OPCODE_UC_RDMA_WRITE_LAST]                = &hfi1_uc_rcv,
+	[IB_OPCODE_UC_RDMA_WRITE_LAST_WITH_IMMEDIATE] = &hfi1_uc_rcv,
+	[IB_OPCODE_UC_RDMA_WRITE_ONLY]                = &hfi1_uc_rcv,
+	[IB_OPCODE_UC_RDMA_WRITE_ONLY_WITH_IMMEDIATE] = &hfi1_uc_rcv,
+	/* UD */
+	[IB_OPCODE_UD_SEND_ONLY]                      = &hfi1_ud_rcv,
+	[IB_OPCODE_UD_SEND_ONLY_WITH_IMMEDIATE]       = &hfi1_ud_rcv,
+	/* CNP */
+	[IB_OPCODE_CNP]				      = &hfi1_cnp_rcv
+};
+
+/*
+ * System image GUID.
+ */
+__be64 ib_hfi1_sys_image_guid;
+
+/**
+ * hfi1_copy_sge - copy data to SGE memory
+ * @ss: the SGE state
+ * @data: the data to copy
+ * @length: the length of the data
+ */
+void hfi1_copy_sge(
+	struct hfi1_sge_state *ss,
+	void *data, u32 length,
+	int release)
+{
+	struct hfi1_sge *sge = &ss->sge;
+
+	while (length) {
+		u32 len = sge->length;
+
+		if (len > length)
+			len = length;
+		if (len > sge->sge_length)
+			len = sge->sge_length;
+		WARN_ON_ONCE(len == 0);
+		memcpy(sge->vaddr, data, len);
+		sge->vaddr += len;
+		sge->length -= len;
+		sge->sge_length -= len;
+		if (sge->sge_length == 0) {
+			if (release)
+				hfi1_put_mr(sge->mr);
+			if (--ss->num_sge)
+				*sge = *ss->sg_list++;
+		} else if (sge->length == 0 && sge->mr->lkey) {
+			if (++sge->n >= HFI1_SEGSZ) {
+				if (++sge->m >= sge->mr->mapsz)
+					break;
+				sge->n = 0;
+			}
+			sge->vaddr =
+				sge->mr->map[sge->m]->segs[sge->n].vaddr;
+			sge->length =
+				sge->mr->map[sge->m]->segs[sge->n].length;
+		}
+		data += len;
+		length -= len;
+	}
+}
+
+/**
+ * hfi1_skip_sge - skip over SGE memory
+ * @ss: the SGE state
+ * @length: the number of bytes to skip
+ */
+void hfi1_skip_sge(struct hfi1_sge_state *ss, u32 length, int release)
+{
+	struct hfi1_sge *sge = &ss->sge;
+
+	while (length) {
+		u32 len = sge->length;
+
+		if (len > length)
+			len = length;
+		if (len > sge->sge_length)
+			len = sge->sge_length;
+		WARN_ON_ONCE(len == 0);
+		sge->vaddr += len;
+		sge->length -= len;
+		sge->sge_length -= len;
+		if (sge->sge_length == 0) {
+			if (release)
+				hfi1_put_mr(sge->mr);
+			if (--ss->num_sge)
+				*sge = *ss->sg_list++;
+		} else if (sge->length == 0 && sge->mr->lkey) {
+			if (++sge->n >= HFI1_SEGSZ) {
+				if (++sge->m >= sge->mr->mapsz)
+					break;
+				sge->n = 0;
+			}
+			sge->vaddr =
+				sge->mr->map[sge->m]->segs[sge->n].vaddr;
+			sge->length =
+				sge->mr->map[sge->m]->segs[sge->n].length;
+		}
+		length -= len;
+	}
+}
+
+/**
+ * post_one_send - post one RC, UC, or UD send work request
+ * @qp: the QP to post on
+ * @wr: the work request to send
+ */
+static int post_one_send(struct hfi1_qp *qp, struct ib_send_wr *wr)
+{
+	struct hfi1_swqe *wqe;
+	u32 next;
+	int i;
+	int j;
+	int acc;
+	struct hfi1_lkey_table *rkt;
+	struct hfi1_pd *pd;
+	struct hfi1_devdata *dd = dd_from_ibdev(qp->ibqp.device);
+	struct hfi1_pportdata *ppd;
+	struct hfi1_ibport *ibp;
+
+	/* IB spec says that num_sge == 0 is OK. */
+	if (unlikely(wr->num_sge > qp->s_max_sge))
+		return -EINVAL;
+
+	ppd = &dd->pport[qp->port_num - 1];
+	ibp = &ppd->ibport_data;
+
+	/*
+	 * Don't allow RDMA reads or atomic operations on UC or
+	 * undefined operations.
+	 * Make sure buffer is large enough to hold the result for atomics.
+	 */
+	if (wr->opcode == IB_WR_FAST_REG_MR) {
+		return -EINVAL;
+	} else if (qp->ibqp.qp_type == IB_QPT_UC) {
+		if ((unsigned) wr->opcode >= IB_WR_RDMA_READ)
+			return -EINVAL;
+	} else if (qp->ibqp.qp_type != IB_QPT_RC) {
+		/* Check IB_QPT_SMI, IB_QPT_GSI, IB_QPT_UD opcode */
+		if (wr->opcode != IB_WR_SEND &&
+		    wr->opcode != IB_WR_SEND_WITH_IMM)
+			return -EINVAL;
+		/* Check UD destination address PD */
+		if (qp->ibqp.pd != wr->wr.ud.ah->pd)
+			return -EINVAL;
+	} else if ((unsigned) wr->opcode > IB_WR_ATOMIC_FETCH_AND_ADD)
+		return -EINVAL;
+	else if (wr->opcode >= IB_WR_ATOMIC_CMP_AND_SWP &&
+		   (wr->num_sge == 0 ||
+		    wr->sg_list[0].length < sizeof(u64) ||
+		    wr->sg_list[0].addr & (sizeof(u64) - 1)))
+		return -EINVAL;
+	else if (wr->opcode >= IB_WR_RDMA_READ && !qp->s_max_rd_atomic)
+		return -EINVAL;
+
+	next = qp->s_head + 1;
+	if (next >= qp->s_size)
+		next = 0;
+	if (next == qp->s_last)
+		return -ENOMEM;
+
+	rkt = &to_idev(qp->ibqp.device)->lk_table;
+	pd = to_ipd(qp->ibqp.pd);
+	wqe = get_swqe_ptr(qp, qp->s_head);
+	wqe->wr = *wr;
+	wqe->length = 0;
+	j = 0;
+	if (wr->num_sge) {
+		acc = wr->opcode >= IB_WR_RDMA_READ ?
+			IB_ACCESS_LOCAL_WRITE : 0;
+		for (i = 0; i < wr->num_sge; i++) {
+			u32 length = wr->sg_list[i].length;
+			int ok;
+
+			if (length == 0)
+				continue;
+			ok = hfi1_lkey_ok(rkt, pd, &wqe->sg_list[j],
+					  &wr->sg_list[i], acc);
+			if (!ok)
+				goto bail_inval_free;
+			wqe->length += length;
+			j++;
+		}
+		wqe->wr.num_sge = j;
+	}
+	if (qp->ibqp.qp_type == IB_QPT_UC ||
+	    qp->ibqp.qp_type == IB_QPT_RC) {
+		if (wqe->length > 0x80000000U)
+			goto bail_inval_free;
+	} else {
+		struct hfi1_ah *ah = to_iah(wr->wr.ud.ah);
+
+		atomic_inc(&ah->refcount);
+	}
+	wqe->ssn = qp->s_ssn++;
+	qp->s_head = next;
+
+	return 0;
+
+bail_inval_free:
+	/* release mr holds */
+	while (j) {
+		struct hfi1_sge *sge = &wqe->sg_list[--j];
+
+		hfi1_put_mr(sge->mr);
+	}
+	return -EINVAL;
+}
+
+/**
+ * post_send - post a send on a QP
+ * @ibqp: the QP to post the send on
+ * @wr: the list of work requests to post
+ * @bad_wr: the first bad WR is put here
+ *
+ * This may be called from interrupt context.
+ */
+static int post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
+		     struct ib_send_wr **bad_wr)
+{
+	struct hfi1_qp *qp = to_iqp(ibqp);
+	int err = 0;
+	int call_send;
+	unsigned long flags;
+	unsigned nreq = 0;
+
+	spin_lock_irqsave(&qp->s_lock, flags);
+
+	/* Check that state is OK to post send. */
+	if (unlikely(!(ib_hfi1_state_ops[qp->state] & HFI1_POST_SEND_OK))) {
+		spin_unlock_irqrestore(&qp->s_lock, flags);
+		return -EINVAL;
+	}
+
+	/* sq empty and not list -> call send */
+	call_send = qp->s_head == qp->s_last && !wr->next;
+
+	for (; wr; wr = wr->next) {
+		err = post_one_send(qp, wr);
+		if (unlikely(err)) {
+			*bad_wr = wr;
+			goto bail;
+		}
+		nreq++;
+	}
+bail:
+	if (nreq && !call_send)
+		hfi1_schedule_send(qp);
+	spin_unlock_irqrestore(&qp->s_lock, flags);
+	if (nreq && call_send)
+		hfi1_do_send(&qp->s_iowait.iowork);
+	return err;
+}
+
+/**
+ * post_receive - post a receive on a QP
+ * @ibqp: the QP to post the receive on
+ * @wr: the WR to post
+ * @bad_wr: the first bad WR is put here
+ *
+ * This may be called from interrupt context.
+ */
+static int post_receive(struct ib_qp *ibqp, struct ib_recv_wr *wr,
+			struct ib_recv_wr **bad_wr)
+{
+	struct hfi1_qp *qp = to_iqp(ibqp);
+	struct hfi1_rwq *wq = qp->r_rq.wq;
+	unsigned long flags;
+	int ret;
+
+	/* Check that state is OK to post receive. */
+	if (!(ib_hfi1_state_ops[qp->state] & HFI1_POST_RECV_OK) || !wq) {
+		*bad_wr = wr;
+		ret = -EINVAL;
+		goto bail;
+	}
+
+	for (; wr; wr = wr->next) {
+		struct hfi1_rwqe *wqe;
+		u32 next;
+		int i;
+
+		if ((unsigned) wr->num_sge > qp->r_rq.max_sge) {
+			*bad_wr = wr;
+			ret = -EINVAL;
+			goto bail;
+		}
+
+		spin_lock_irqsave(&qp->r_rq.lock, flags);
+		next = wq->head + 1;
+		if (next >= qp->r_rq.size)
+			next = 0;
+		if (next == wq->tail) {
+			spin_unlock_irqrestore(&qp->r_rq.lock, flags);
+			*bad_wr = wr;
+			ret = -ENOMEM;
+			goto bail;
+		}
+
+		wqe = get_rwqe_ptr(&qp->r_rq, wq->head);
+		wqe->wr_id = wr->wr_id;
+		wqe->num_sge = wr->num_sge;
+		for (i = 0; i < wr->num_sge; i++)
+			wqe->sg_list[i] = wr->sg_list[i];
+		/* Make sure queue entry is written before the head index. */
+		smp_wmb();
+		wq->head = next;
+		spin_unlock_irqrestore(&qp->r_rq.lock, flags);
+	}
+	ret = 0;
+
+bail:
+	return ret;
+}
+
+/*
+ * Make sure the QP is ready and able to accept the given opcode.
+ */
+static inline int qp_ok(int opcode, struct hfi1_packet *packet)
+{
+	struct hfi1_ibport *ibp;
+
+	if (!(ib_hfi1_state_ops[packet->qp->state] & HFI1_PROCESS_RECV_OK))
+		goto dropit;
+	if (((opcode & OPCODE_QP_MASK) == packet->qp->allowed_ops) ||
+	    (opcode == IB_OPCODE_CNP))
+		return 1;
+dropit:
+	ibp = &packet->rcd->ppd->ibport_data;
+	ibp->n_pkt_drops++;
+	return 0;
+}
+
+
+/**
+ * hfi1_ib_rcv - process an incoming packet
+ * @packet: data packet information
+ *
+ * This is called to process an incoming packet at interrupt level.
+ *
+ * Tlen is the length of the header + data + CRC in bytes.
+ */
+void hfi1_ib_rcv(struct hfi1_packet *packet)
+{
+	struct hfi1_ctxtdata *rcd = packet->rcd;
+	struct hfi1_ib_header *hdr = packet->hdr;
+	u32 tlen = packet->tlen;
+	struct hfi1_pportdata *ppd = rcd->ppd;
+	struct hfi1_ibport *ibp = &ppd->ibport_data;
+	u32 qp_num;
+	int lnh;
+	u8 opcode;
+	u16 lid;
+
+	/* Check for GRH */
+	lnh = be16_to_cpu(hdr->lrh[0]) & 3;
+	if (lnh == HFI1_LRH_BTH)
+		packet->ohdr = &hdr->u.oth;
+	else if (lnh == HFI1_LRH_GRH) {
+		u32 vtf;
+
+		packet->ohdr = &hdr->u.l.oth;
+		if (hdr->u.l.grh.next_hdr != IB_GRH_NEXT_HDR)
+			goto drop;
+		vtf = be32_to_cpu(hdr->u.l.grh.version_tclass_flow);
+		if ((vtf >> IB_GRH_VERSION_SHIFT) != IB_GRH_VERSION)
+			goto drop;
+		packet->rcv_flags |= HFI1_HAS_GRH;
+	} else
+		goto drop;
+
+	trace_input_ibhdr(rcd->dd, hdr);
+
+	opcode = (be32_to_cpu(packet->ohdr->bth[0]) >> 24);
+	inc_opstats(tlen, &rcd->opstats->stats[opcode]);
+
+	/* Get the destination QP number. */
+	qp_num = be32_to_cpu(packet->ohdr->bth[1]) & HFI1_QPN_MASK;
+	lid = be16_to_cpu(hdr->lrh[1]);
+	if (unlikely((lid >= HFI1_MULTICAST_LID_BASE) &&
+	    (lid != HFI1_PERMISSIVE_LID))) {
+		struct hfi1_mcast *mcast;
+		struct hfi1_mcast_qp *p;
+
+		if (lnh != HFI1_LRH_GRH)
+			goto drop;
+		mcast = hfi1_mcast_find(ibp, &hdr->u.l.grh.dgid);
+		if (mcast == NULL)
+			goto drop;
+		list_for_each_entry_rcu(p, &mcast->qp_list, list) {
+			packet->qp = p->qp;
+			spin_lock(&packet->qp->r_lock);
+			if (likely((qp_ok(opcode, packet))))
+				opcode_handler_tbl[opcode](packet);
+			spin_unlock(&packet->qp->r_lock);
+		}
+		/*
+		 * Notify hfi1_multicast_detach() if it is waiting for us
+		 * to finish.
+		 */
+		if (atomic_dec_return(&mcast->refcount) <= 1)
+			wake_up(&mcast->wait);
+	} else {
+		rcu_read_lock();
+		packet->qp = hfi1_lookup_qpn(ibp, qp_num);
+		if (!packet->qp) {
+			rcu_read_unlock();
+			goto drop;
+		}
+		spin_lock(&packet->qp->r_lock);
+		if (likely((qp_ok(opcode, packet))))
+			opcode_handler_tbl[opcode](packet);
+		spin_unlock(&packet->qp->r_lock);
+		rcu_read_unlock();
+	}
+	return;
+
+drop:
+	ibp->n_pkt_drops++;
+}
+
+/*
+ * This is called from a timer to check for QPs
+ * which need kernel memory in order to send a packet.
+ */
+static void mem_timer(unsigned long data)
+{
+	struct hfi1_ibdev *dev = (struct hfi1_ibdev *)data;
+	struct list_head *list = &dev->memwait;
+	struct hfi1_qp *qp = NULL;
+	struct iowait *wait;
+	unsigned long flags;
+
+	write_seqlock_irqsave(&dev->iowait_lock, flags);
+	if (!list_empty(list)) {
+		wait = list_first_entry(list, struct iowait, list);
+		qp = container_of(wait, struct hfi1_qp, s_iowait);
+		list_del_init(&qp->s_iowait.list);
+		/* refcount held until actual wake up */
+		if (!list_empty(list))
+			mod_timer(&dev->mem_timer, jiffies + 1);
+	}
+	write_sequnlock_irqrestore(&dev->iowait_lock, flags);
+
+	if (qp)
+		hfi1_qp_wakeup(qp, HFI1_S_WAIT_KMEM);
+}
+
+void update_sge(struct hfi1_sge_state *ss, u32 length)
+{
+	struct hfi1_sge *sge = &ss->sge;
+
+	sge->vaddr += length;
+	sge->length -= length;
+	sge->sge_length -= length;
+	if (sge->sge_length == 0) {
+		if (--ss->num_sge)
+			*sge = *ss->sg_list++;
+	} else if (sge->length == 0 && sge->mr->lkey) {
+		if (++sge->n >= HFI1_SEGSZ) {
+			if (++sge->m >= sge->mr->mapsz)
+				return;
+			sge->n = 0;
+		}
+		sge->vaddr = sge->mr->map[sge->m]->segs[sge->n].vaddr;
+		sge->length = sge->mr->map[sge->m]->segs[sge->n].length;
+	}
+}
+
+static noinline struct verbs_txreq *__get_txreq(struct hfi1_ibdev *dev,
+						struct hfi1_qp *qp)
+{
+	struct verbs_txreq *tx;
+	unsigned long flags;
+
+	tx = kmem_cache_alloc(dev->verbs_txreq_cache, GFP_ATOMIC);
+	if (!tx) {
+		spin_lock_irqsave(&qp->s_lock, flags);
+		write_seqlock(&dev->iowait_lock);
+		if (ib_hfi1_state_ops[qp->state] & HFI1_PROCESS_RECV_OK &&
+		    list_empty(&qp->s_iowait.list)) {
+			dev->n_txwait++;
+			qp->s_flags |= HFI1_S_WAIT_TX;
+			list_add_tail(&qp->s_iowait.list, &dev->txwait);
+			trace_hfi1_qpsleep(qp, HFI1_S_WAIT_TX);
+			atomic_inc(&qp->refcount);
+		}
+		qp->s_flags &= ~HFI1_S_BUSY;
+		write_sequnlock(&dev->iowait_lock);
+		spin_unlock_irqrestore(&qp->s_lock, flags);
+		tx = ERR_PTR(-EBUSY);
+	}
+	return tx;
+}
+
+static inline struct verbs_txreq *get_txreq(struct hfi1_ibdev *dev,
+					    struct hfi1_qp *qp)
+{
+	struct verbs_txreq *tx;
+
+	tx = kmem_cache_alloc(dev->verbs_txreq_cache, GFP_ATOMIC);
+	if (!tx)
+		/* call slow path to get the lock */
+		tx =  __get_txreq(dev, qp);
+	if (tx)
+		tx->qp = qp;
+	return tx;
+}
+
+void hfi1_put_txreq(struct verbs_txreq *tx)
+{
+	struct hfi1_ibdev *dev;
+	struct hfi1_qp *qp;
+	unsigned long flags;
+	unsigned int seq;
+
+	qp = tx->qp;
+	dev = to_idev(qp->ibqp.device);
+
+	if (tx->mr) {
+		hfi1_put_mr(tx->mr);
+		tx->mr = NULL;
+	}
+	sdma_txclean(dd_from_dev(dev), &tx->txreq);
+
+	/* Free verbs_txreq and return to slab cache */
+	kmem_cache_free(dev->verbs_txreq_cache, tx);
+
+	do {
+		seq = read_seqbegin(&dev->iowait_lock);
+		if (!list_empty(&dev->txwait)) {
+			struct iowait *wait;
+
+			write_seqlock_irqsave(&dev->iowait_lock, flags);
+			/* Wake up first QP wanting a free struct */
+			wait = list_first_entry(&dev->txwait, struct iowait,
+						list);
+			qp = container_of(wait, struct hfi1_qp, s_iowait);
+			list_del_init(&qp->s_iowait.list);
+			/* refcount held until actual wake up */
+			write_sequnlock_irqrestore(&dev->iowait_lock, flags);
+			hfi1_qp_wakeup(qp, HFI1_S_WAIT_TX);
+			break;
+		}
+	} while (read_seqretry(&dev->iowait_lock, seq));
+}
+
+/*
+ * This is called with progress side lock held.
+ */
+/* New API */
+static void verbs_sdma_complete(
+	struct sdma_txreq *cookie,
+	int status,
+	int drained)
+{
+	struct verbs_txreq *tx =
+		container_of(cookie, struct verbs_txreq, txreq);
+	struct hfi1_qp *qp = tx->qp;
+
+	spin_lock(&qp->s_lock);
+	if (tx->wqe)
+		hfi1_send_complete(qp, tx->wqe, IB_WC_SUCCESS);
+	else if (qp->ibqp.qp_type == IB_QPT_RC) {
+		struct hfi1_ib_header *hdr;
+
+		hdr = &tx->phdr.hdr;
+		hfi1_rc_send_complete(qp, hdr);
+	}
+	if (drained) {
+		/*
+		 * This happens when the send engine notes
+		 * a QP in the error state and cannot
+		 * do the flush work until that QP's
+		 * sdma work has finished.
+		 */
+		if (qp->s_flags & HFI1_S_WAIT_DMA) {
+			qp->s_flags &= ~HFI1_S_WAIT_DMA;
+			hfi1_schedule_send(qp);
+		}
+	}
+	spin_unlock(&qp->s_lock);
+
+	hfi1_put_txreq(tx);
+}
+
+static int wait_kmem(struct hfi1_ibdev *dev, struct hfi1_qp *qp)
+{
+	unsigned long flags;
+	int ret = 0;
+
+	spin_lock_irqsave(&qp->s_lock, flags);
+	if (ib_hfi1_state_ops[qp->state] & HFI1_PROCESS_RECV_OK) {
+		write_seqlock(&dev->iowait_lock);
+		if (list_empty(&qp->s_iowait.list)) {
+			if (list_empty(&dev->memwait))
+				mod_timer(&dev->mem_timer, jiffies + 1);
+			qp->s_flags |= HFI1_S_WAIT_KMEM;
+			list_add_tail(&qp->s_iowait.list, &dev->memwait);
+			trace_hfi1_qpsleep(qp, HFI1_S_WAIT_KMEM);
+			atomic_inc(&qp->refcount);
+		}
+		write_sequnlock(&dev->iowait_lock);
+		qp->s_flags &= ~HFI1_S_BUSY;
+		ret = -EBUSY;
+	}
+	spin_unlock_irqrestore(&qp->s_lock, flags);
+
+	return ret;
+}
+
+/*
+ * This routine calls txadds for each sg entry.
+ *
+ * Add failures will revert the sge cursor
+ */
+static int build_verbs_ulp_payload(
+	struct sdma_engine *sde,
+	struct hfi1_sge_state *ss,
+	u32 length,
+	struct verbs_txreq *tx)
+{
+	struct hfi1_sge *sg_list = ss->sg_list;
+	struct hfi1_sge sge = ss->sge;
+	u8 num_sge = ss->num_sge;
+	u32 len;
+	int ret = 0;
+
+	while (length) {
+		len = ss->sge.length;
+		if (len > length)
+			len = length;
+		if (len > ss->sge.sge_length)
+			len = ss->sge.sge_length;
+		WARN_ON_ONCE(len == 0);
+		ret = sdma_txadd_kvaddr(
+			sde->dd,
+			&tx->txreq,
+			ss->sge.vaddr,
+			len);
+		if (ret)
+			goto bail_txadd;
+		update_sge(ss, len);
+		length -= len;
+	}
+	return ret;
+bail_txadd:
+	/* unwind cursor */
+	ss->sge = sge;
+	ss->num_sge = num_sge;
+	ss->sg_list = sg_list;
+	return ret;
+}
+
+/*
+ * Build the number of DMA descriptors needed to send length bytes of data.
+ *
+ * NOTE: DMA mapping is held in the tx until completed in the ring or
+ *       the tx desc is freed without having been submitted to the ring
+ *
+ * This routine insures the following all the helper routine
+ * calls succeed.
+ */
+/* New API */
+static int build_verbs_tx_desc(
+	struct sdma_engine *sde,
+	struct hfi1_sge_state *ss,
+	u32 length,
+	struct verbs_txreq *tx,
+	struct ahg_ib_header *ahdr,
+	u64 pbc)
+{
+	int ret = 0;
+	struct hfi1_pio_header *phdr;
+	u16 hdrbytes = tx->hdr_dwords << 2;
+
+	phdr = &tx->phdr;
+	if (!ahdr->ahgcount) {
+		ret = sdma_txinit_ahg(
+			&tx->txreq,
+			ahdr->tx_flags,
+			hdrbytes + length,
+			ahdr->ahgidx,
+			0,
+			NULL,
+			0,
+			verbs_sdma_complete);
+		if (ret)
+			goto bail_txadd;
+		phdr->pbc = cpu_to_le64(pbc);
+		memcpy(&phdr->hdr, &ahdr->ibh, hdrbytes - sizeof(phdr->pbc));
+		/* add the header */
+		ret = sdma_txadd_kvaddr(
+			sde->dd,
+			&tx->txreq,
+			&tx->phdr,
+			tx->hdr_dwords << 2);
+		if (ret)
+			goto bail_txadd;
+	} else {
+		struct hfi1_other_headers *sohdr = &ahdr->ibh.u.oth;
+		struct hfi1_other_headers *dohdr = &phdr->hdr.u.oth;
+
+		/* needed in rc_send_complete() */
+		phdr->hdr.lrh[0] = ahdr->ibh.lrh[0];
+		if ((be16_to_cpu(phdr->hdr.lrh[0]) & 3) == HFI1_LRH_GRH) {
+			sohdr = &ahdr->ibh.u.l.oth;
+			dohdr = &phdr->hdr.u.l.oth;
+		}
+		/* opcode */
+		dohdr->bth[0] = sohdr->bth[0];
+		/* PSN/ACK  */
+		dohdr->bth[2] = sohdr->bth[2];
+		ret = sdma_txinit_ahg(
+			&tx->txreq,
+			ahdr->tx_flags,
+			length,
+			ahdr->ahgidx,
+			ahdr->ahgcount,
+			ahdr->ahgdesc,
+			hdrbytes,
+			verbs_sdma_complete);
+		if (ret)
+			goto bail_txadd;
+	}
+
+	/* add the ulp payload - if any.  ss can be NULL for acks */
+	if (ss)
+		ret = build_verbs_ulp_payload(sde, ss, length, tx);
+bail_txadd:
+	return ret;
+}
+
+int hfi1_verbs_send_dma(struct hfi1_qp *qp, struct ahg_ib_header *ahdr,
+			u32 hdrwords, struct hfi1_sge_state *ss, u32 len,
+			u32 plen, u32 dwords, u64 pbc)
+{
+	struct hfi1_ibdev *dev = to_idev(qp->ibqp.device);
+	struct hfi1_ibport *ibp = to_iport(qp->ibqp.device, qp->port_num);
+	struct hfi1_pportdata *ppd = ppd_from_ibp(ibp);
+	struct verbs_txreq *tx;
+	struct sdma_txreq *stx;
+	u64 pbc_flags = 0;
+	struct sdma_engine *sde;
+	u8 sc5 = qp->s_sc;
+	int ret;
+
+	if (!list_empty(&qp->s_iowait.tx_head)) {
+		stx = list_first_entry(
+			&qp->s_iowait.tx_head,
+			struct sdma_txreq,
+			list);
+		list_del_init(&stx->list);
+		tx = container_of(stx, struct verbs_txreq, txreq);
+		ret = sdma_send_txreq(tx->sde, &qp->s_iowait, stx);
+		if (unlikely(ret == -ECOMM))
+			goto bail_ecomm;
+		return ret;
+	}
+
+	tx = get_txreq(dev, qp);
+	if (IS_ERR(tx))
+		goto bail_tx;
+
+	if (!qp->s_hdr->sde) {
+		tx->sde = sde = qp_to_sdma_engine(qp, sc5);
+		if (!sde)
+			goto bail_no_sde;
+	} else
+		tx->sde = sde = qp->s_hdr->sde;
+
+	if (likely(pbc == 0)) {
+		u32 vl = sc_to_vlt(dd_from_ibdev(qp->ibqp.device), sc5);
+		/* No vl15 here */
+		/* set PBC_DC_INFO bit (aka SC[4]) in pbc_flags */
+		pbc_flags |= (!!(sc5 & 0x10)) << PBC_DC_INFO_SHIFT;
+
+		pbc = create_pbc(ppd, pbc_flags, qp->srate_mbps, vl, plen);
+	}
+	tx->wqe = qp->s_wqe;
+	tx->mr = qp->s_rdma_mr;
+	if (qp->s_rdma_mr)
+		qp->s_rdma_mr = NULL;
+	tx->hdr_dwords = hdrwords + 2;
+	ret = build_verbs_tx_desc(sde, ss, len, tx, ahdr, pbc);
+	if (unlikely(ret))
+		goto bail_build;
+	trace_output_ibhdr(dd_from_ibdev(qp->ibqp.device), &ahdr->ibh);
+	ret =  sdma_send_txreq(sde, &qp->s_iowait, &tx->txreq);
+	if (unlikely(ret == -ECOMM))
+		goto bail_ecomm;
+	return ret;
+
+bail_no_sde:
+	hfi1_put_txreq(tx);
+bail_ecomm:
+	/* The current one got "sent" */
+	return 0;
+bail_build:
+	/* kmalloc or mapping fail */
+	hfi1_put_txreq(tx);
+	return wait_kmem(dev, qp);
+bail_tx:
+	return PTR_ERR(tx);
+}
+
+/*
+ * If we are now in the error state, return zero to flush the
+ * send work request.
+ */
+static int no_bufs_available(struct hfi1_qp *qp, struct send_context *sc)
+{
+	struct hfi1_devdata *dd = sc->dd;
+	struct hfi1_ibdev *dev = &dd->verbs_dev;
+	unsigned long flags;
+	int ret = 0;
+
+	/*
+	 * Note that as soon as want_buffer() is called and
+	 * possibly before it returns, sc_piobufavail()
+	 * could be called. Therefore, put QP on the I/O wait list before
+	 * enabling the PIO avail interrupt.
+	 */
+	spin_lock_irqsave(&qp->s_lock, flags);
+	if (ib_hfi1_state_ops[qp->state] & HFI1_PROCESS_RECV_OK) {
+		write_seqlock(&dev->iowait_lock);
+		if (list_empty(&qp->s_iowait.list)) {
+			struct hfi1_ibdev *dev = &dd->verbs_dev;
+			int was_empty;
+
+			dev->n_piowait++;
+			qp->s_flags |= HFI1_S_WAIT_PIO;
+			was_empty = list_empty(&sc->piowait);
+			list_add_tail(&qp->s_iowait.list, &sc->piowait);
+			trace_hfi1_qpsleep(qp, HFI1_S_WAIT_PIO);
+			atomic_inc(&qp->refcount);
+			/* counting: only call wantpiobuf_intr if first user */
+			if (was_empty)
+				hfi1_sc_wantpiobuf_intr(sc, 1);
+		}
+		write_sequnlock(&dev->iowait_lock);
+		qp->s_flags &= ~HFI1_S_BUSY;
+		ret = -EBUSY;
+	}
+	spin_unlock_irqrestore(&qp->s_lock, flags);
+	return ret;
+}
+
+struct send_context *qp_to_send_context(struct hfi1_qp *qp, u8 sc5)
+{
+	struct hfi1_devdata *dd = dd_from_ibdev(qp->ibqp.device);
+	struct hfi1_pportdata *ppd = dd->pport + (qp->port_num - 1);
+	u8 vl;
+
+	vl = sc_to_vlt(dd, sc5);
+	if (vl >= ppd->vls_supported && vl != 15)
+		return NULL;
+	return dd->vld[vl].sc;
+}
+
+int hfi1_verbs_send_pio(struct hfi1_qp *qp, struct ahg_ib_header *ahdr,
+			u32 hdrwords, struct hfi1_sge_state *ss, u32 len,
+			u32 plen, u32 dwords, u64 pbc)
+{
+	struct hfi1_ibport *ibp = to_iport(qp->ibqp.device, qp->port_num);
+	struct hfi1_pportdata *ppd = ppd_from_ibp(ibp);
+	u32 *hdr = (u32 *)&ahdr->ibh;
+	u64 pbc_flags = 0;
+	u32 sc5;
+	unsigned long flags = 0;
+	struct send_context *sc;
+	struct pio_buf *pbuf;
+	int wc_status = IB_WC_SUCCESS;
+
+	/* vl15 special case taken care of in ud.c */
+	sc5 = qp->s_sc;
+	sc = qp_to_send_context(qp, sc5);
+
+	if (!sc)
+		return -EINVAL;
+	if (likely(pbc == 0)) {
+		u32 vl = sc_to_vlt(dd_from_ibdev(qp->ibqp.device), sc5);
+		/* set PBC_DC_INFO bit (aka SC[4]) in pbc_flags */
+		pbc_flags |= (!!(sc5 & 0x10)) << PBC_DC_INFO_SHIFT;
+		pbc = create_pbc(ppd, pbc_flags, qp->srate_mbps, vl, plen);
+	}
+	pbuf = sc_buffer_alloc(sc, plen, NULL, NULL);
+	if (unlikely(pbuf == NULL)) {
+		if (ppd->host_link_state != HLS_UP_ACTIVE) {
+			/*
+			 * If we have filled the PIO buffers to capacity and are
+			 * not in an active state this request is not going to
+			 * go out to so just complete it with an error or else a
+			 * ULP or the core may be stuck waiting.
+			 */
+			hfi1_cdbg(
+				PIO,
+				"alloc failed. state not active, completing");
+			wc_status = IB_WC_GENERAL_ERR;
+			goto pio_bail;
+		} else {
+			/*
+			 * This is a normal occurrence. The PIO buffs are full
+			 * up but we are still happily sending, well we could be
+			 * so lets continue to queue the request.
+			 */
+			hfi1_cdbg(PIO, "alloc failed. state active, queuing");
+			return no_bufs_available(qp, sc);
+		}
+	}
+
+	if (len == 0) {
+		pio_copy(ppd->dd, pbuf, pbc, hdr, hdrwords);
+	} else {
+		if (ss) {
+			seg_pio_copy_start(pbuf, pbc, hdr, hdrwords*4);
+			while (len) {
+				void *addr = ss->sge.vaddr;
+				u32 slen = ss->sge.length;
+
+				if (slen > len)
+					slen = len;
+				update_sge(ss, slen);
+				seg_pio_copy_mid(pbuf, addr, slen);
+				len -= slen;
+			}
+			seg_pio_copy_end(pbuf);
+		}
+	}
+
+	trace_output_ibhdr(dd_from_ibdev(qp->ibqp.device), &ahdr->ibh);
+
+	if (qp->s_rdma_mr) {
+		hfi1_put_mr(qp->s_rdma_mr);
+		qp->s_rdma_mr = NULL;
+	}
+
+pio_bail:
+	if (qp->s_wqe) {
+		spin_lock_irqsave(&qp->s_lock, flags);
+		hfi1_send_complete(qp, qp->s_wqe, wc_status);
+		spin_unlock_irqrestore(&qp->s_lock, flags);
+	} else if (qp->ibqp.qp_type == IB_QPT_RC) {
+		spin_lock_irqsave(&qp->s_lock, flags);
+		hfi1_rc_send_complete(qp, &ahdr->ibh);
+		spin_unlock_irqrestore(&qp->s_lock, flags);
+	}
+	return 0;
+}
+/*
+ * egress_pkey_matches_entry - return 1 if the pkey matches ent (ent
+ * being an entry from the ingress partition key table), return 0
+ * otherwise. Use the matching criteria for egress partition keys
+ * specified in the OPAv1 spec., section 9.1l.7.
+ */
+static inline int egress_pkey_matches_entry(u16 pkey, u16 ent)
+{
+	u16 mkey = pkey & PKEY_LOW_15_MASK;
+	u16 ment = ent & PKEY_LOW_15_MASK;
+
+	if (mkey == ment) {
+		/*
+		 * If pkey[15] is set (full partition member),
+		 * is bit 15 in the corresponding table element
+		 * clear (limited member)?
+		 */
+		if (pkey & PKEY_MEMBER_MASK)
+			return !!(ent & PKEY_MEMBER_MASK);
+		return 1;
+	}
+	return 0;
+}
+
+/*
+ * egress_pkey_check - return 0 if hdr's pkey matches according to the
+ * criteria in the OPAv1 spec., section 9.11.7.
+ */
+static inline int egress_pkey_check(struct hfi1_pportdata *ppd,
+				    struct hfi1_ib_header *hdr,
+				    struct hfi1_qp *qp)
+{
+	struct hfi1_other_headers *ohdr;
+	struct hfi1_devdata *dd;
+	int i = 0;
+	u16 pkey;
+	u8 lnh, sc5 = qp->s_sc;
+
+	if (!(ppd->part_enforce & HFI1_PART_ENFORCE_OUT))
+		return 0;
+
+	/* locate the pkey within the headers */
+	lnh = be16_to_cpu(hdr->lrh[0]) & 3;
+	if (lnh == HFI1_LRH_GRH)
+		ohdr = &hdr->u.l.oth;
+	else
+		ohdr = &hdr->u.oth;
+
+	pkey = (u16)be32_to_cpu(ohdr->bth[0]);
+
+	/* If SC15, pkey[0:14] must be 0x7fff */
+	if ((sc5 == 0xf) && ((pkey & PKEY_LOW_15_MASK) != PKEY_LOW_15_MASK))
+		goto bad;
+
+
+	/* Is the pkey = 0x0, or 0x8000? */
+	if ((pkey & PKEY_LOW_15_MASK) == 0)
+		goto bad;
+
+	/* The most likely matching pkey has index qp->s_pkey_index */
+	if (unlikely(!egress_pkey_matches_entry(pkey,
+					ppd->pkeys[qp->s_pkey_index]))) {
+		/* no match - try the entire table */
+		for (; i < MAX_PKEY_VALUES; i++) {
+			if (egress_pkey_matches_entry(pkey, ppd->pkeys[i]))
+				break;
+		}
+	}
+
+	if (i < MAX_PKEY_VALUES)
+		return 0;
+bad:
+	incr_cntr64(&ppd->port_xmit_constraint_errors);
+	dd = ppd->dd;
+	if (!(dd->err_info_xmit_constraint.status & OPA_EI_STATUS_SMASK)) {
+		u16 slid = be16_to_cpu(hdr->lrh[3]);
+
+		dd->err_info_xmit_constraint.status |= OPA_EI_STATUS_SMASK;
+		dd->err_info_xmit_constraint.slid = slid;
+		dd->err_info_xmit_constraint.pkey = pkey;
+	}
+	return 1;
+}
+
+/**
+ * hfi1_verbs_send - send a packet
+ * @qp: the QP to send on
+ * @ahdr: the packet header
+ * @hdrwords: the number of 32-bit words in the header
+ * @ss: the SGE to send
+ * @len: the length of the packet in bytes
+ *
+ * Return zero if packet is sent or queued OK.
+ * Return non-zero and clear qp->s_flags HFI1_S_BUSY otherwise.
+ */
+int hfi1_verbs_send(struct hfi1_qp *qp, struct ahg_ib_header *ahdr,
+		    u32 hdrwords, struct hfi1_sge_state *ss, u32 len)
+{
+	struct hfi1_devdata *dd = dd_from_ibdev(qp->ibqp.device);
+	u32 plen;
+	int ret;
+	int pio = 0;
+	unsigned long flags = 0;
+	u32 dwords = (len + 3) >> 2;
+
+	/*
+	 * VL15 packets (IB_QPT_SMI) will always use PIO, so we
+	 * can defer SDMA restart until link goes ACTIVE without
+	 * worrying about just how we got there.
+	 */
+	if ((qp->ibqp.qp_type == IB_QPT_SMI) ||
+	    !(dd->flags & HFI1_HAS_SEND_DMA))
+		pio = 1;
+
+	ret = egress_pkey_check(dd->pport, &ahdr->ibh, qp);
+	if (unlikely(ret)) {
+		/*
+		 * The value we are returning here does not get propagated to
+		 * the verbs caller. Thus we need to complete the request with
+		 * error otherwise the caller could be sitting waiting on the
+		 * completion event. Only do this for PIO. SDMA has its own
+		 * mechanism for handling the errors. So for SDMA we can just
+		 * return.
+		 */
+		if (pio) {
+			hfi1_cdbg(PIO, "%s() Failed. Completing with err",
+				  __func__);
+			spin_lock_irqsave(&qp->s_lock, flags);
+			hfi1_send_complete(qp, qp->s_wqe, IB_WC_GENERAL_ERR);
+			spin_unlock_irqrestore(&qp->s_lock, flags);
+		}
+		return -EINVAL;
+	}
+
+	/*
+	 * Calculate the send buffer trigger address.
+	 * The +2 counts for the pbc control qword
+	 */
+	plen = hdrwords + dwords + 2;
+
+	if (pio) {
+		ret = dd->process_pio_send(
+			qp, ahdr, hdrwords, ss, len, plen, dwords, 0);
+	} else {
+#ifdef CONFIG_SDMA_VERBOSITY
+		dd_dev_err(dd, "CONFIG SDMA %s:%d %s()\n",
+			   slashstrip(__FILE__), __LINE__, __func__);
+		dd_dev_err(dd, "SDMA hdrwords = %u, len = %u\n", hdrwords, len);
+#endif
+		ret = dd->process_dma_send(
+			qp, ahdr, hdrwords, ss, len, plen, dwords, 0);
+	}
+
+	return ret;
+}
+
+static int query_device(struct ib_device *ibdev,
+			struct ib_device_attr *props,
+			struct ib_udata *uhw)
+{
+	struct hfi1_devdata *dd = dd_from_ibdev(ibdev);
+	struct hfi1_ibdev *dev = to_idev(ibdev);
+
+	if (uhw->inlen || uhw->outlen)
+		return -EINVAL;
+	memset(props, 0, sizeof(*props));
+
+	props->device_cap_flags = IB_DEVICE_BAD_PKEY_CNTR |
+		IB_DEVICE_BAD_QKEY_CNTR | IB_DEVICE_SHUTDOWN_PORT |
+		IB_DEVICE_SYS_IMAGE_GUID | IB_DEVICE_RC_RNR_NAK_GEN |
+		IB_DEVICE_PORT_ACTIVE_EVENT | IB_DEVICE_SRQ_RESIZE;
+
+	props->page_size_cap = PAGE_SIZE;
+	props->vendor_id =
+		dd->oui1 << 16 | dd->oui2 << 8 | dd->oui3;
+	props->vendor_part_id = dd->pcidev->device;
+	props->hw_ver = dd->minrev;
+	props->sys_image_guid = ib_hfi1_sys_image_guid;
+	props->max_mr_size = ~0ULL;
+	props->max_qp = hfi1_max_qps;
+	props->max_qp_wr = hfi1_max_qp_wrs;
+	props->max_sge = hfi1_max_sges;
+	props->max_sge_rd = hfi1_max_sges;
+	props->max_cq = hfi1_max_cqs;
+	props->max_ah = hfi1_max_ahs;
+	props->max_cqe = hfi1_max_cqes;
+	props->max_mr = dev->lk_table.max;
+	props->max_fmr = dev->lk_table.max;
+	props->max_map_per_fmr = 32767;
+	props->max_pd = hfi1_max_pds;
+	props->max_qp_rd_atom = HFI1_MAX_RDMA_ATOMIC;
+	props->max_qp_init_rd_atom = 255;
+	/* props->max_res_rd_atom */
+	props->max_srq = hfi1_max_srqs;
+	props->max_srq_wr = hfi1_max_srq_wrs;
+	props->max_srq_sge = hfi1_max_srq_sges;
+	/* props->local_ca_ack_delay */
+	props->atomic_cap = IB_ATOMIC_GLOB;
+	props->max_pkeys = hfi1_get_npkeys(dd);
+	props->max_mcast_grp = hfi1_max_mcast_grps;
+	props->max_mcast_qp_attach = hfi1_max_mcast_qp_attached;
+	props->max_total_mcast_qp_attach = props->max_mcast_qp_attach *
+		props->max_mcast_grp;
+
+	return 0;
+}
+
+static inline u16 opa_speed_to_ib(u16 in)
+{
+	u16 out = 0;
+
+	if (in & OPA_LINK_SPEED_25G)
+		out |= IB_SPEED_EDR;
+	if (in & OPA_LINK_SPEED_12_5G)
+		out |= IB_SPEED_FDR;
+
+	return out;
+}
+
+/*
+ * Convert a single OPA link width (no multiple flags) to an IB value.
+ * A zero OPA link width means link down, which means the IB width value
+ * is a don't care.
+ */
+static inline u16 opa_width_to_ib(u16 in)
+{
+	switch (in) {
+	case OPA_LINK_WIDTH_1X:
+	/* map 2x and 3x to 1x as they don't exist in IB */
+	case OPA_LINK_WIDTH_2X:
+	case OPA_LINK_WIDTH_3X:
+		return IB_WIDTH_1X;
+	default: /* link down or unknown, return our largest width */
+	case OPA_LINK_WIDTH_4X:
+		return IB_WIDTH_4X;
+	}
+}
+
+static int query_port(struct ib_device *ibdev, u8 port,
+		      struct ib_port_attr *props)
+{
+	struct hfi1_devdata *dd = dd_from_ibdev(ibdev);
+	struct hfi1_ibport *ibp = to_iport(ibdev, port);
+	struct hfi1_pportdata *ppd = ppd_from_ibp(ibp);
+	u16 lid = ppd->lid;
+
+	memset(props, 0, sizeof(*props));
+	props->lid = lid ? lid : 0;
+	props->lmc = ppd->lmc;
+	props->sm_lid = ibp->sm_lid;
+	props->sm_sl = ibp->sm_sl;
+	/* OPA logical states match IB logical states */
+	props->state = driver_lstate(ppd);
+	props->phys_state = hfi1_ibphys_portstate(ppd);
+	props->port_cap_flags = ibp->port_cap_flags;
+	props->gid_tbl_len = HFI1_GUIDS_PER_PORT;
+	props->max_msg_sz = 0x80000000;
+	props->pkey_tbl_len = hfi1_get_npkeys(dd);
+	props->bad_pkey_cntr = ibp->pkey_violations;
+	props->qkey_viol_cntr = ibp->qkey_violations;
+	props->active_width = (u8)opa_width_to_ib(ppd->link_width_active);
+	/* see rate_show() in ib core/sysfs.c */
+	props->active_speed = (u8)opa_speed_to_ib(ppd->link_speed_active);
+	props->max_vl_num = ppd->vls_supported;
+	props->init_type_reply = 0;
+
+	/* Once we are a "first class" citizen and have added the OPA MTUs to
+	 * the core we can advertise the larger MTU enum to the ULPs, for now
+	 * advertise only 4K.
+	 *
+	 * Those applications which are either OPA aware or pass the MTU enum
+	 * from the Path Records to us will get the new 8k MTU.  Those that
+	 * attempt to process the MTU enum may fail in various ways.
+	 */
+	props->max_mtu = mtu_to_enum((!valid_ib_mtu(hfi1_max_mtu) ?
+				      4096 : hfi1_max_mtu), IB_MTU_4096);
+	props->active_mtu = !valid_ib_mtu(ppd->ibmtu) ? props->max_mtu :
+		mtu_to_enum(ppd->ibmtu, IB_MTU_2048);
+	props->subnet_timeout = ibp->subnet_timeout;
+
+	return 0;
+}
+
+static int port_immutable(struct ib_device *ibdev, u8 port_num,
+			  struct ib_port_immutable *immutable)
+{
+	struct ib_port_attr attr;
+	int err;
+
+	err = query_port(ibdev, port_num, &attr);
+	if (err)
+		return err;
+
+	memset(immutable, 0, sizeof(*immutable));
+
+	immutable->pkey_tbl_len = attr.pkey_tbl_len;
+	immutable->gid_tbl_len = attr.gid_tbl_len;
+	immutable->core_cap_flags = RDMA_CORE_PORT_INTEL_OPA;
+	immutable->max_mad_size = OPA_MGMT_MAD_SIZE;
+
+	return 0;
+}
+
+static int modify_device(struct ib_device *device,
+			 int device_modify_mask,
+			 struct ib_device_modify *device_modify)
+{
+	struct hfi1_devdata *dd = dd_from_ibdev(device);
+	unsigned i;
+	int ret;
+
+	if (device_modify_mask & ~(IB_DEVICE_MODIFY_SYS_IMAGE_GUID |
+				   IB_DEVICE_MODIFY_NODE_DESC)) {
+		ret = -EOPNOTSUPP;
+		goto bail;
+	}
+
+	if (device_modify_mask & IB_DEVICE_MODIFY_NODE_DESC) {
+		memcpy(device->node_desc, device_modify->node_desc, 64);
+		for (i = 0; i < dd->num_pports; i++) {
+			struct hfi1_ibport *ibp = &dd->pport[i].ibport_data;
+
+			hfi1_node_desc_chg(ibp);
+		}
+	}
+
+	if (device_modify_mask & IB_DEVICE_MODIFY_SYS_IMAGE_GUID) {
+		ib_hfi1_sys_image_guid =
+			cpu_to_be64(device_modify->sys_image_guid);
+		for (i = 0; i < dd->num_pports; i++) {
+			struct hfi1_ibport *ibp = &dd->pport[i].ibport_data;
+
+			hfi1_sys_guid_chg(ibp);
+		}
+	}
+
+	ret = 0;
+
+bail:
+	return ret;
+}
+
+static int modify_port(struct ib_device *ibdev, u8 port,
+		       int port_modify_mask, struct ib_port_modify *props)
+{
+	struct hfi1_ibport *ibp = to_iport(ibdev, port);
+	struct hfi1_pportdata *ppd = ppd_from_ibp(ibp);
+	int ret = 0;
+
+	ibp->port_cap_flags |= props->set_port_cap_mask;
+	ibp->port_cap_flags &= ~props->clr_port_cap_mask;
+	if (props->set_port_cap_mask || props->clr_port_cap_mask)
+		hfi1_cap_mask_chg(ibp);
+	if (port_modify_mask & IB_PORT_SHUTDOWN) {
+		set_link_down_reason(ppd, OPA_LINKDOWN_REASON_UNKNOWN, 0,
+		  OPA_LINKDOWN_REASON_UNKNOWN);
+		ret = set_link_state(ppd, HLS_DN_DOWNDEF);
+	}
+	if (port_modify_mask & IB_PORT_RESET_QKEY_CNTR)
+		ibp->qkey_violations = 0;
+	return ret;
+}
+
+static int query_gid(struct ib_device *ibdev, u8 port,
+		     int index, union ib_gid *gid)
+{
+	struct hfi1_devdata *dd = dd_from_ibdev(ibdev);
+	int ret = 0;
+
+	if (!port || port > dd->num_pports)
+		ret = -EINVAL;
+	else {
+		struct hfi1_ibport *ibp = to_iport(ibdev, port);
+		struct hfi1_pportdata *ppd = ppd_from_ibp(ibp);
+
+		gid->global.subnet_prefix = ibp->gid_prefix;
+		if (index == 0)
+			gid->global.interface_id = cpu_to_be64(ppd->guid);
+		else if (index < HFI1_GUIDS_PER_PORT)
+			gid->global.interface_id = ibp->guids[index - 1];
+		else
+			ret = -EINVAL;
+	}
+
+	return ret;
+}
+
+static struct ib_pd *alloc_pd(struct ib_device *ibdev,
+			      struct ib_ucontext *context,
+			      struct ib_udata *udata)
+{
+	struct hfi1_ibdev *dev = to_idev(ibdev);
+	struct hfi1_pd *pd;
+	struct ib_pd *ret;
+
+	/*
+	 * This is actually totally arbitrary.  Some correctness tests
+	 * assume there's a maximum number of PDs that can be allocated.
+	 * We don't actually have this limit, but we fail the test if
+	 * we allow allocations of more than we report for this value.
+	 */
+
+	pd = kmalloc(sizeof(*pd), GFP_KERNEL);
+	if (!pd) {
+		ret = ERR_PTR(-ENOMEM);
+		goto bail;
+	}
+
+	spin_lock(&dev->n_pds_lock);
+	if (dev->n_pds_allocated == hfi1_max_pds) {
+		spin_unlock(&dev->n_pds_lock);
+		kfree(pd);
+		ret = ERR_PTR(-ENOMEM);
+		goto bail;
+	}
+
+	dev->n_pds_allocated++;
+	spin_unlock(&dev->n_pds_lock);
+
+	/* ib_alloc_pd() will initialize pd->ibpd. */
+	pd->user = udata != NULL;
+
+	ret = &pd->ibpd;
+
+bail:
+	return ret;
+}
+
+static int dealloc_pd(struct ib_pd *ibpd)
+{
+	struct hfi1_pd *pd = to_ipd(ibpd);
+	struct hfi1_ibdev *dev = to_idev(ibpd->device);
+
+	spin_lock(&dev->n_pds_lock);
+	dev->n_pds_allocated--;
+	spin_unlock(&dev->n_pds_lock);
+
+	kfree(pd);
+
+	return 0;
+}
+
+/*
+ * convert ah port,sl to sc
+ */
+u8 ah_to_sc(struct ib_device *ibdev, struct ib_ah_attr *ah)
+{
+	struct hfi1_ibport *ibp = to_iport(ibdev, ah->port_num);
+
+	return ibp->sl_to_sc[ah->sl];
+}
+
+int hfi1_check_ah(struct ib_device *ibdev, struct ib_ah_attr *ah_attr)
+{
+	struct hfi1_ibport *ibp;
+	struct hfi1_pportdata *ppd;
+	struct hfi1_devdata *dd;
+	u8 sc5;
+
+	/* A multicast address requires a GRH (see ch. 8.4.1). */
+	if (ah_attr->dlid >= HFI1_MULTICAST_LID_BASE &&
+	    ah_attr->dlid != HFI1_PERMISSIVE_LID &&
+	    !(ah_attr->ah_flags & IB_AH_GRH))
+		goto bail;
+	if ((ah_attr->ah_flags & IB_AH_GRH) &&
+	    ah_attr->grh.sgid_index >= HFI1_GUIDS_PER_PORT)
+		goto bail;
+	if (ah_attr->dlid == 0)
+		goto bail;
+	if (ah_attr->port_num < 1 ||
+	    ah_attr->port_num > ibdev->phys_port_cnt)
+		goto bail;
+	if (ah_attr->static_rate != IB_RATE_PORT_CURRENT &&
+	    ib_rate_to_mbps(ah_attr->static_rate) < 0)
+		goto bail;
+	if (ah_attr->sl >= OPA_MAX_SLS)
+		goto bail;
+	/* test the mapping for validity */
+	ibp = to_iport(ibdev, ah_attr->port_num);
+	ppd = ppd_from_ibp(ibp);
+	sc5 = ibp->sl_to_sc[ah_attr->sl];
+	dd = dd_from_ppd(ppd);
+	if (sc_to_vlt(dd, sc5) > num_vls && sc_to_vlt(dd, sc5) != 0xf)
+		goto bail;
+	return 0;
+bail:
+	return -EINVAL;
+}
+
+/**
+ * create_ah - create an address handle
+ * @pd: the protection domain
+ * @ah_attr: the attributes of the AH
+ *
+ * This may be called from interrupt context.
+ */
+static struct ib_ah *create_ah(struct ib_pd *pd,
+			       struct ib_ah_attr *ah_attr)
+{
+	struct hfi1_ah *ah;
+	struct ib_ah *ret;
+	struct hfi1_ibdev *dev = to_idev(pd->device);
+	unsigned long flags;
+
+	if (hfi1_check_ah(pd->device, ah_attr)) {
+		ret = ERR_PTR(-EINVAL);
+		goto bail;
+	}
+
+	ah = kmalloc(sizeof(*ah), GFP_ATOMIC);
+	if (!ah) {
+		ret = ERR_PTR(-ENOMEM);
+		goto bail;
+	}
+
+	spin_lock_irqsave(&dev->n_ahs_lock, flags);
+	if (dev->n_ahs_allocated == hfi1_max_ahs) {
+		spin_unlock_irqrestore(&dev->n_ahs_lock, flags);
+		kfree(ah);
+		ret = ERR_PTR(-ENOMEM);
+		goto bail;
+	}
+
+	dev->n_ahs_allocated++;
+	spin_unlock_irqrestore(&dev->n_ahs_lock, flags);
+
+	/* ib_create_ah() will initialize ah->ibah. */
+	ah->attr = *ah_attr;
+	atomic_set(&ah->refcount, 0);
+
+	ret = &ah->ibah;
+
+bail:
+	return ret;
+}
+
+struct ib_ah *hfi1_create_qp0_ah(struct hfi1_ibport *ibp, u16 dlid)
+{
+	struct ib_ah_attr attr;
+	struct ib_ah *ah = ERR_PTR(-EINVAL);
+	struct hfi1_qp *qp0;
+
+	memset(&attr, 0, sizeof(attr));
+	attr.dlid = dlid;
+	attr.port_num = ppd_from_ibp(ibp)->port;
+	rcu_read_lock();
+	qp0 = rcu_dereference(ibp->qp[0]);
+	if (qp0)
+		ah = ib_create_ah(qp0->ibqp.pd, &attr);
+	rcu_read_unlock();
+	return ah;
+}
+
+/**
+ * destroy_ah - destroy an address handle
+ * @ibah: the AH to destroy
+ *
+ * This may be called from interrupt context.
+ */
+static int destroy_ah(struct ib_ah *ibah)
+{
+	struct hfi1_ibdev *dev = to_idev(ibah->device);
+	struct hfi1_ah *ah = to_iah(ibah);
+	unsigned long flags;
+
+	if (atomic_read(&ah->refcount) != 0)
+		return -EBUSY;
+
+	spin_lock_irqsave(&dev->n_ahs_lock, flags);
+	dev->n_ahs_allocated--;
+	spin_unlock_irqrestore(&dev->n_ahs_lock, flags);
+
+	kfree(ah);
+
+	return 0;
+}
+
+static int modify_ah(struct ib_ah *ibah, struct ib_ah_attr *ah_attr)
+{
+	struct hfi1_ah *ah = to_iah(ibah);
+
+	if (hfi1_check_ah(ibah->device, ah_attr))
+		return -EINVAL;
+
+	ah->attr = *ah_attr;
+
+	return 0;
+}
+
+static int query_ah(struct ib_ah *ibah, struct ib_ah_attr *ah_attr)
+{
+	struct hfi1_ah *ah = to_iah(ibah);
+
+	*ah_attr = ah->attr;
+
+	return 0;
+}
+
+/**
+ * hfi1_get_npkeys - return the size of the PKEY table for context 0
+ * @dd: the hfi1_ib device
+ */
+unsigned hfi1_get_npkeys(struct hfi1_devdata *dd)
+{
+	return ARRAY_SIZE(dd->pport[0].pkeys);
+}
+
+static int query_pkey(struct ib_device *ibdev, u8 port, u16 index,
+		      u16 *pkey)
+{
+	struct hfi1_devdata *dd = dd_from_ibdev(ibdev);
+	int ret;
+
+	if (index >= hfi1_get_npkeys(dd)) {
+		ret = -EINVAL;
+		goto bail;
+	}
+
+	*pkey = hfi1_get_pkey(to_iport(ibdev, port), index);
+	ret = 0;
+
+bail:
+	return ret;
+}
+
+/**
+ * alloc_ucontext - allocate a ucontest
+ * @ibdev: the infiniband device
+ * @udata: not used by the driver
+ */
+
+static struct ib_ucontext *alloc_ucontext(struct ib_device *ibdev,
+					  struct ib_udata *udata)
+{
+	struct hfi1_ucontext *context;
+	struct ib_ucontext *ret;
+
+	context = kmalloc(sizeof(*context), GFP_KERNEL);
+	if (!context) {
+		ret = ERR_PTR(-ENOMEM);
+		goto bail;
+	}
+
+	ret = &context->ibucontext;
+
+bail:
+	return ret;
+}
+
+static int dealloc_ucontext(struct ib_ucontext *context)
+{
+	kfree(to_iucontext(context));
+	return 0;
+}
+
+static void init_ibport(struct hfi1_pportdata *ppd)
+{
+	struct hfi1_ibport *ibp = &ppd->ibport_data;
+	size_t sz = ARRAY_SIZE(ibp->sl_to_sc);
+	int i;
+
+	for (i = 0; i < sz; i++) {
+		ibp->sl_to_sc[i] = i;
+		ibp->sc_to_sl[i] = i;
+	}
+
+	spin_lock_init(&ibp->lock);
+	/* Set the prefix to the default value (see ch. 4.1.1) */
+	ibp->gid_prefix = IB_DEFAULT_GID_PREFIX;
+	ibp->sm_lid = 0;
+	/* Below should only set bits defined in OPA PortInfo.CapabilityMask */
+	ibp->port_cap_flags = IB_PORT_AUTO_MIGR_SUP |
+		IB_PORT_CAP_MASK_NOTICE_SUP;
+	ibp->pma_counter_select[0] = IB_PMA_PORT_XMIT_DATA;
+	ibp->pma_counter_select[1] = IB_PMA_PORT_RCV_DATA;
+	ibp->pma_counter_select[2] = IB_PMA_PORT_XMIT_PKTS;
+	ibp->pma_counter_select[3] = IB_PMA_PORT_RCV_PKTS;
+	ibp->pma_counter_select[4] = IB_PMA_PORT_XMIT_WAIT;
+
+	RCU_INIT_POINTER(ibp->qp[0], NULL);
+	RCU_INIT_POINTER(ibp->qp[1], NULL);
+}
+
+static void verbs_txreq_kmem_cache_ctor(void *obj)
+{
+	struct verbs_txreq *tx = (struct verbs_txreq *)obj;
+
+	memset(tx, 0, sizeof(*tx));
+}
+
+/**
+ * hfi1_register_ib_device - register our device with the infiniband core
+ * @dd: the device data structure
+ * Return 0 if successful, errno if unsuccessful.
+ */
+int hfi1_register_ib_device(struct hfi1_devdata *dd)
+{
+	struct hfi1_ibdev *dev = &dd->verbs_dev;
+	struct ib_device *ibdev = &dev->ibdev;
+	struct hfi1_pportdata *ppd = dd->pport;
+	unsigned i, lk_tab_size;
+	int ret;
+	size_t lcpysz = IB_DEVICE_NAME_MAX;
+	u16 descq_cnt;
+
+	ret = hfi1_qp_init(dev);
+	if (ret)
+		goto err_qp_init;
+
+
+	for (i = 0; i < dd->num_pports; i++)
+		init_ibport(ppd + i);
+
+	/* Only need to initialize non-zero fields. */
+	spin_lock_init(&dev->n_pds_lock);
+	spin_lock_init(&dev->n_ahs_lock);
+	spin_lock_init(&dev->n_cqs_lock);
+	spin_lock_init(&dev->n_qps_lock);
+	spin_lock_init(&dev->n_srqs_lock);
+	spin_lock_init(&dev->n_mcast_grps_lock);
+	init_timer(&dev->mem_timer);
+	dev->mem_timer.function = mem_timer;
+	dev->mem_timer.data = (unsigned long) dev;
+
+	/*
+	 * The top hfi1_lkey_table_size bits are used to index the
+	 * table.  The lower 8 bits can be owned by the user (copied from
+	 * the LKEY).  The remaining bits act as a generation number or tag.
+	 */
+	spin_lock_init(&dev->lk_table.lock);
+	dev->lk_table.max = 1 << hfi1_lkey_table_size;
+	/* ensure generation is at least 4 bits (keys.c) */
+	if (hfi1_lkey_table_size > MAX_LKEY_TABLE_BITS) {
+		dd_dev_warn(dd, "lkey bits %u too large, reduced to %u\n",
+			      hfi1_lkey_table_size, MAX_LKEY_TABLE_BITS);
+		hfi1_lkey_table_size = MAX_LKEY_TABLE_BITS;
+	}
+	lk_tab_size = dev->lk_table.max * sizeof(*dev->lk_table.table);
+	dev->lk_table.table = (struct hfi1_mregion __rcu **)
+		vmalloc(lk_tab_size);
+	if (dev->lk_table.table == NULL) {
+		ret = -ENOMEM;
+		goto err_lk;
+	}
+	RCU_INIT_POINTER(dev->dma_mr, NULL);
+	for (i = 0; i < dev->lk_table.max; i++)
+		RCU_INIT_POINTER(dev->lk_table.table[i], NULL);
+	INIT_LIST_HEAD(&dev->pending_mmaps);
+	spin_lock_init(&dev->pending_lock);
+	seqlock_init(&dev->iowait_lock);
+	dev->mmap_offset = PAGE_SIZE;
+	spin_lock_init(&dev->mmap_offset_lock);
+	INIT_LIST_HEAD(&dev->txwait);
+	INIT_LIST_HEAD(&dev->memwait);
+
+	descq_cnt = sdma_get_descq_cnt();
+
+	/* SLAB_HWCACHE_ALIGN for AHG */
+	dev->verbs_txreq_cache = kmem_cache_create("hfi1_vtxreq_cache",
+						   sizeof(struct verbs_txreq),
+						   0, SLAB_HWCACHE_ALIGN,
+						   verbs_txreq_kmem_cache_ctor);
+	if (!dev->verbs_txreq_cache) {
+		ret = -ENOMEM;
+		goto err_verbs_txreq;
+	}
+
+	/*
+	 * The system image GUID is supposed to be the same for all
+	 * HFIs in a single system but since there can be other
+	 * device types in the system, we can't be sure this is unique.
+	 */
+	if (!ib_hfi1_sys_image_guid)
+		ib_hfi1_sys_image_guid = cpu_to_be64(ppd->guid);
+	lcpysz = strlcpy(ibdev->name, class_name(), lcpysz);
+	strlcpy(ibdev->name + lcpysz, "_%d", IB_DEVICE_NAME_MAX - lcpysz);
+	ibdev->owner = THIS_MODULE;
+	ibdev->node_guid = cpu_to_be64(ppd->guid);
+	ibdev->uverbs_abi_ver = HFI1_UVERBS_ABI_VERSION;
+	ibdev->uverbs_cmd_mask =
+		(1ull << IB_USER_VERBS_CMD_GET_CONTEXT)         |
+		(1ull << IB_USER_VERBS_CMD_QUERY_DEVICE)        |
+		(1ull << IB_USER_VERBS_CMD_QUERY_PORT)          |
+		(1ull << IB_USER_VERBS_CMD_ALLOC_PD)            |
+		(1ull << IB_USER_VERBS_CMD_DEALLOC_PD)          |
+		(1ull << IB_USER_VERBS_CMD_CREATE_AH)           |
+		(1ull << IB_USER_VERBS_CMD_MODIFY_AH)           |
+		(1ull << IB_USER_VERBS_CMD_QUERY_AH)            |
+		(1ull << IB_USER_VERBS_CMD_DESTROY_AH)          |
+		(1ull << IB_USER_VERBS_CMD_REG_MR)              |
+		(1ull << IB_USER_VERBS_CMD_DEREG_MR)            |
+		(1ull << IB_USER_VERBS_CMD_CREATE_COMP_CHANNEL) |
+		(1ull << IB_USER_VERBS_CMD_CREATE_CQ)           |
+		(1ull << IB_USER_VERBS_CMD_RESIZE_CQ)           |
+		(1ull << IB_USER_VERBS_CMD_DESTROY_CQ)          |
+		(1ull << IB_USER_VERBS_CMD_POLL_CQ)             |
+		(1ull << IB_USER_VERBS_CMD_REQ_NOTIFY_CQ)       |
+		(1ull << IB_USER_VERBS_CMD_CREATE_QP)           |
+		(1ull << IB_USER_VERBS_CMD_QUERY_QP)            |
+		(1ull << IB_USER_VERBS_CMD_MODIFY_QP)           |
+		(1ull << IB_USER_VERBS_CMD_DESTROY_QP)          |
+		(1ull << IB_USER_VERBS_CMD_POST_SEND)           |
+		(1ull << IB_USER_VERBS_CMD_POST_RECV)           |
+		(1ull << IB_USER_VERBS_CMD_ATTACH_MCAST)        |
+		(1ull << IB_USER_VERBS_CMD_DETACH_MCAST)        |
+		(1ull << IB_USER_VERBS_CMD_CREATE_SRQ)          |
+		(1ull << IB_USER_VERBS_CMD_MODIFY_SRQ)          |
+		(1ull << IB_USER_VERBS_CMD_QUERY_SRQ)           |
+		(1ull << IB_USER_VERBS_CMD_DESTROY_SRQ)         |
+		(1ull << IB_USER_VERBS_CMD_POST_SRQ_RECV);
+	ibdev->node_type = RDMA_NODE_IB_CA;
+	ibdev->phys_port_cnt = dd->num_pports;
+	ibdev->num_comp_vectors = 1;
+	ibdev->dma_device = &dd->pcidev->dev;
+	ibdev->query_device = query_device;
+	ibdev->modify_device = modify_device;
+	ibdev->query_port = query_port;
+	ibdev->modify_port = modify_port;
+	ibdev->query_pkey = query_pkey;
+	ibdev->query_gid = query_gid;
+	ibdev->alloc_ucontext = alloc_ucontext;
+	ibdev->dealloc_ucontext = dealloc_ucontext;
+	ibdev->alloc_pd = alloc_pd;
+	ibdev->dealloc_pd = dealloc_pd;
+	ibdev->create_ah = create_ah;
+	ibdev->destroy_ah = destroy_ah;
+	ibdev->modify_ah = modify_ah;
+	ibdev->query_ah = query_ah;
+	ibdev->create_srq = hfi1_create_srq;
+	ibdev->modify_srq = hfi1_modify_srq;
+	ibdev->query_srq = hfi1_query_srq;
+	ibdev->destroy_srq = hfi1_destroy_srq;
+	ibdev->create_qp = hfi1_create_qp;
+	ibdev->modify_qp = hfi1_modify_qp;
+	ibdev->query_qp = hfi1_query_qp;
+	ibdev->destroy_qp = hfi1_destroy_qp;
+	ibdev->post_send = post_send;
+	ibdev->post_recv = post_receive;
+	ibdev->post_srq_recv = hfi1_post_srq_receive;
+	ibdev->create_cq = hfi1_create_cq;
+	ibdev->destroy_cq = hfi1_destroy_cq;
+	ibdev->resize_cq = hfi1_resize_cq;
+	ibdev->poll_cq = hfi1_poll_cq;
+	ibdev->req_notify_cq = hfi1_req_notify_cq;
+	ibdev->get_dma_mr = hfi1_get_dma_mr;
+	ibdev->reg_phys_mr = hfi1_reg_phys_mr;
+	ibdev->reg_user_mr = hfi1_reg_user_mr;
+	ibdev->dereg_mr = hfi1_dereg_mr;
+	ibdev->alloc_fast_reg_page_list = hfi1_alloc_fast_reg_page_list;
+	ibdev->free_fast_reg_page_list = hfi1_free_fast_reg_page_list;
+	ibdev->alloc_fmr = hfi1_alloc_fmr;
+	ibdev->map_phys_fmr = hfi1_map_phys_fmr;
+	ibdev->unmap_fmr = hfi1_unmap_fmr;
+	ibdev->dealloc_fmr = hfi1_dealloc_fmr;
+	ibdev->attach_mcast = hfi1_multicast_attach;
+	ibdev->detach_mcast = hfi1_multicast_detach;
+	ibdev->process_mad = hfi1_process_mad;
+	ibdev->mmap = hfi1_mmap;
+	ibdev->dma_ops = &hfi1_dma_mapping_ops;
+	ibdev->get_port_immutable = port_immutable;
+
+	strncpy(ibdev->node_desc, init_utsname()->nodename,
+		sizeof(ibdev->node_desc));
+
+	ret = ib_register_device(ibdev, hfi1_create_port_files);
+	if (ret)
+		goto err_reg;
+
+	ret = hfi1_create_agents(dev);
+	if (ret)
+		goto err_agents;
+
+	ret = hfi1_verbs_register_sysfs(dd);
+	if (ret)
+		goto err_class;
+
+	goto bail;
+
+err_class:
+	hfi1_free_agents(dev);
+err_agents:
+	ib_unregister_device(ibdev);
+err_reg:
+err_verbs_txreq:
+	kmem_cache_destroy(dev->verbs_txreq_cache);
+	vfree(dev->lk_table.table);
+err_lk:
+	hfi1_qp_exit(dev);
+err_qp_init:
+	dd_dev_err(dd, "cannot register verbs: %d!\n", -ret);
+bail:
+	return ret;
+}
+
+void hfi1_unregister_ib_device(struct hfi1_devdata *dd)
+{
+	struct hfi1_ibdev *dev = &dd->verbs_dev;
+	struct ib_device *ibdev = &dev->ibdev;
+
+	hfi1_verbs_unregister_sysfs(dd);
+
+	hfi1_free_agents(dev);
+
+	ib_unregister_device(ibdev);
+
+	if (!list_empty(&dev->txwait))
+		dd_dev_err(dd, "txwait list not empty!\n");
+	if (!list_empty(&dev->memwait))
+		dd_dev_err(dd, "memwait list not empty!\n");
+	if (dev->dma_mr)
+		dd_dev_err(dd, "DMA MR not NULL!\n");
+
+	hfi1_qp_exit(dev);
+	del_timer_sync(&dev->mem_timer);
+	kmem_cache_destroy(dev->verbs_txreq_cache);
+	vfree(dev->lk_table.table);
+}
+
+/*
+ * This must be called with s_lock held.
+ */
+void hfi1_schedule_send(struct hfi1_qp *qp)
+{
+	if (hfi1_send_ok(qp)) {
+		struct hfi1_ibport *ibp =
+			to_iport(qp->ibqp.device, qp->port_num);
+		struct hfi1_pportdata *ppd = ppd_from_ibp(ibp);
+
+		iowait_schedule(&qp->s_iowait, ppd->hfi1_wq);
+	}
+}
+
+void hfi1_cnp_rcv(struct hfi1_packet *packet)
+{
+	struct hfi1_ibport *ibp = &packet->rcd->ppd->ibport_data;
+
+	if (packet->qp->ibqp.qp_type == IB_QPT_UC)
+		hfi1_uc_rcv(packet);
+	else if (packet->qp->ibqp.qp_type == IB_QPT_UD)
+		hfi1_ud_rcv(packet);
+	else
+		ibp->n_pkt_drops++;
+}
diff --git a/drivers/staging/rdma/hfi1/verbs.h b/drivers/staging/rdma/hfi1/verbs.h
new file mode 100644
index 000000000000..812536194190
--- /dev/null
+++ b/drivers/staging/rdma/hfi1/verbs.h
@@ -0,0 +1,1149 @@
+/*
+ *
+ * This file is provided under a dual BSD/GPLv2 license.  When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * GPL LICENSE SUMMARY
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * BSD LICENSE
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ *  - Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ *  - Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in
+ *    the documentation and/or other materials provided with the
+ *    distribution.
+ *  - Neither the name of Intel Corporation nor the names of its
+ *    contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ */
+
+#ifndef HFI1_VERBS_H
+#define HFI1_VERBS_H
+
+#include <linux/types.h>
+#include <linux/seqlock.h>
+#include <linux/kernel.h>
+#include <linux/interrupt.h>
+#include <linux/kref.h>
+#include <linux/workqueue.h>
+#include <linux/kthread.h>
+#include <linux/completion.h>
+#include <rdma/ib_pack.h>
+#include <rdma/ib_user_verbs.h>
+#include <rdma/ib_mad.h>
+
+struct hfi1_ctxtdata;
+struct hfi1_pportdata;
+struct hfi1_devdata;
+struct hfi1_packet;
+
+#include "iowait.h"
+
+#define HFI1_MAX_RDMA_ATOMIC     16
+#define HFI1_GUIDS_PER_PORT	5
+
+/*
+ * Increment this value if any changes that break userspace ABI
+ * compatibility are made.
+ */
+#define HFI1_UVERBS_ABI_VERSION       2
+
+/*
+ * Define an ib_cq_notify value that is not valid so we know when CQ
+ * notifications are armed.
+ */
+#define IB_CQ_NONE      (IB_CQ_NEXT_COMP + 1)
+
+#define IB_SEQ_NAK	(3 << 29)
+
+/* AETH NAK opcode values */
+#define IB_RNR_NAK                      0x20
+#define IB_NAK_PSN_ERROR                0x60
+#define IB_NAK_INVALID_REQUEST          0x61
+#define IB_NAK_REMOTE_ACCESS_ERROR      0x62
+#define IB_NAK_REMOTE_OPERATIONAL_ERROR 0x63
+#define IB_NAK_INVALID_RD_REQUEST       0x64
+
+/* Flags for checking QP state (see ib_hfi1_state_ops[]) */
+#define HFI1_POST_SEND_OK                0x01
+#define HFI1_POST_RECV_OK                0x02
+#define HFI1_PROCESS_RECV_OK             0x04
+#define HFI1_PROCESS_SEND_OK             0x08
+#define HFI1_PROCESS_NEXT_SEND_OK        0x10
+#define HFI1_FLUSH_SEND			0x20
+#define HFI1_FLUSH_RECV			0x40
+#define HFI1_PROCESS_OR_FLUSH_SEND \
+	(HFI1_PROCESS_SEND_OK | HFI1_FLUSH_SEND)
+
+/* IB Performance Manager status values */
+#define IB_PMA_SAMPLE_STATUS_DONE       0x00
+#define IB_PMA_SAMPLE_STATUS_STARTED    0x01
+#define IB_PMA_SAMPLE_STATUS_RUNNING    0x02
+
+/* Mandatory IB performance counter select values. */
+#define IB_PMA_PORT_XMIT_DATA   cpu_to_be16(0x0001)
+#define IB_PMA_PORT_RCV_DATA    cpu_to_be16(0x0002)
+#define IB_PMA_PORT_XMIT_PKTS   cpu_to_be16(0x0003)
+#define IB_PMA_PORT_RCV_PKTS    cpu_to_be16(0x0004)
+#define IB_PMA_PORT_XMIT_WAIT   cpu_to_be16(0x0005)
+
+#define HFI1_VENDOR_IPG		cpu_to_be16(0xFFA0)
+
+#define IB_BTH_REQ_ACK		(1 << 31)
+#define IB_BTH_SOLICITED	(1 << 23)
+#define IB_BTH_MIG_REQ		(1 << 22)
+
+#define IB_GRH_VERSION		6
+#define IB_GRH_VERSION_MASK	0xF
+#define IB_GRH_VERSION_SHIFT	28
+#define IB_GRH_TCLASS_MASK	0xFF
+#define IB_GRH_TCLASS_SHIFT	20
+#define IB_GRH_FLOW_MASK	0xFFFFF
+#define IB_GRH_FLOW_SHIFT	0
+#define IB_GRH_NEXT_HDR		0x1B
+
+#define IB_DEFAULT_GID_PREFIX	cpu_to_be64(0xfe80000000000000ULL)
+
+/* flags passed by hfi1_ib_rcv() */
+enum {
+	HFI1_HAS_GRH = (1 << 0),
+};
+
+struct ib_reth {
+	__be64 vaddr;
+	__be32 rkey;
+	__be32 length;
+} __packed;
+
+struct ib_atomic_eth {
+	__be32 vaddr[2];        /* unaligned so access as 2 32-bit words */
+	__be32 rkey;
+	__be64 swap_data;
+	__be64 compare_data;
+} __packed;
+
+union ib_ehdrs {
+	struct {
+		__be32 deth[2];
+		__be32 imm_data;
+	} ud;
+	struct {
+		struct ib_reth reth;
+		__be32 imm_data;
+	} rc;
+	struct {
+		__be32 aeth;
+		__be32 atomic_ack_eth[2];
+	} at;
+	__be32 imm_data;
+	__be32 aeth;
+	struct ib_atomic_eth atomic_eth;
+}  __packed;
+
+struct hfi1_other_headers {
+	__be32 bth[3];
+	union ib_ehdrs u;
+} __packed;
+
+/*
+ * Note that UD packets with a GRH header are 8+40+12+8 = 68 bytes
+ * long (72 w/ imm_data).  Only the first 56 bytes of the IB header
+ * will be in the eager header buffer.  The remaining 12 or 16 bytes
+ * are in the data buffer.
+ */
+struct hfi1_ib_header {
+	__be16 lrh[4];
+	union {
+		struct {
+			struct ib_grh grh;
+			struct hfi1_other_headers oth;
+		} l;
+		struct hfi1_other_headers oth;
+	} u;
+} __packed;
+
+struct ahg_ib_header {
+	struct sdma_engine *sde;
+	u32 ahgdesc[2];
+	u16 tx_flags;
+	u8 ahgcount;
+	u8 ahgidx;
+	struct hfi1_ib_header ibh;
+};
+
+struct hfi1_pio_header {
+	__le64 pbc;
+	struct hfi1_ib_header hdr;
+} __packed;
+
+/*
+ * used for force cacheline alignment for AHG
+ */
+struct tx_pio_header {
+	struct hfi1_pio_header phdr;
+} ____cacheline_aligned;
+
+/*
+ * There is one struct hfi1_mcast for each multicast GID.
+ * All attached QPs are then stored as a list of
+ * struct hfi1_mcast_qp.
+ */
+struct hfi1_mcast_qp {
+	struct list_head list;
+	struct hfi1_qp *qp;
+};
+
+struct hfi1_mcast {
+	struct rb_node rb_node;
+	union ib_gid mgid;
+	struct list_head qp_list;
+	wait_queue_head_t wait;
+	atomic_t refcount;
+	int n_attached;
+};
+
+/* Protection domain */
+struct hfi1_pd {
+	struct ib_pd ibpd;
+	int user;               /* non-zero if created from user space */
+};
+
+/* Address Handle */
+struct hfi1_ah {
+	struct ib_ah ibah;
+	struct ib_ah_attr attr;
+	atomic_t refcount;
+};
+
+/*
+ * This structure is used by hfi1_mmap() to validate an offset
+ * when an mmap() request is made.  The vm_area_struct then uses
+ * this as its vm_private_data.
+ */
+struct hfi1_mmap_info {
+	struct list_head pending_mmaps;
+	struct ib_ucontext *context;
+	void *obj;
+	__u64 offset;
+	struct kref ref;
+	unsigned size;
+};
+
+/*
+ * This structure is used to contain the head pointer, tail pointer,
+ * and completion queue entries as a single memory allocation so
+ * it can be mmap'ed into user space.
+ */
+struct hfi1_cq_wc {
+	u32 head;               /* index of next entry to fill */
+	u32 tail;               /* index of next ib_poll_cq() entry */
+	union {
+		/* these are actually size ibcq.cqe + 1 */
+		struct ib_uverbs_wc uqueue[0];
+		struct ib_wc kqueue[0];
+	};
+};
+
+/*
+ * The completion queue structure.
+ */
+struct hfi1_cq {
+	struct ib_cq ibcq;
+	struct kthread_work comptask;
+	struct hfi1_devdata *dd;
+	spinlock_t lock; /* protect changes in this struct */
+	u8 notify;
+	u8 triggered;
+	struct hfi1_cq_wc *queue;
+	struct hfi1_mmap_info *ip;
+};
+
+/*
+ * A segment is a linear region of low physical memory.
+ * Used by the verbs layer.
+ */
+struct hfi1_seg {
+	void *vaddr;
+	size_t length;
+};
+
+/* The number of hfi1_segs that fit in a page. */
+#define HFI1_SEGSZ     (PAGE_SIZE / sizeof(struct hfi1_seg))
+
+struct hfi1_segarray {
+	struct hfi1_seg segs[HFI1_SEGSZ];
+};
+
+struct hfi1_mregion {
+	struct ib_pd *pd;       /* shares refcnt of ibmr.pd */
+	u64 user_base;          /* User's address for this region */
+	u64 iova;               /* IB start address of this region */
+	size_t length;
+	u32 lkey;
+	u32 offset;             /* offset (bytes) to start of region */
+	int access_flags;
+	u32 max_segs;           /* number of hfi1_segs in all the arrays */
+	u32 mapsz;              /* size of the map array */
+	u8  page_shift;         /* 0 - non unform/non powerof2 sizes */
+	u8  lkey_published;     /* in global table */
+	struct completion comp; /* complete when refcount goes to zero */
+	atomic_t refcount;
+	struct hfi1_segarray *map[0];    /* the segments */
+};
+
+/*
+ * These keep track of the copy progress within a memory region.
+ * Used by the verbs layer.
+ */
+struct hfi1_sge {
+	struct hfi1_mregion *mr;
+	void *vaddr;            /* kernel virtual address of segment */
+	u32 sge_length;         /* length of the SGE */
+	u32 length;             /* remaining length of the segment */
+	u16 m;                  /* current index: mr->map[m] */
+	u16 n;                  /* current index: mr->map[m]->segs[n] */
+};
+
+/* Memory region */
+struct hfi1_mr {
+	struct ib_mr ibmr;
+	struct ib_umem *umem;
+	struct hfi1_mregion mr;  /* must be last */
+};
+
+/*
+ * Send work request queue entry.
+ * The size of the sg_list is determined when the QP is created and stored
+ * in qp->s_max_sge.
+ */
+struct hfi1_swqe {
+	struct ib_send_wr wr;   /* don't use wr.sg_list */
+	u32 psn;                /* first packet sequence number */
+	u32 lpsn;               /* last packet sequence number */
+	u32 ssn;                /* send sequence number */
+	u32 length;             /* total length of data in sg_list */
+	struct hfi1_sge sg_list[0];
+};
+
+/*
+ * Receive work request queue entry.
+ * The size of the sg_list is determined when the QP (or SRQ) is created
+ * and stored in qp->r_rq.max_sge (or srq->rq.max_sge).
+ */
+struct hfi1_rwqe {
+	u64 wr_id;
+	u8 num_sge;
+	struct ib_sge sg_list[0];
+};
+
+/*
+ * This structure is used to contain the head pointer, tail pointer,
+ * and receive work queue entries as a single memory allocation so
+ * it can be mmap'ed into user space.
+ * Note that the wq array elements are variable size so you can't
+ * just index into the array to get the N'th element;
+ * use get_rwqe_ptr() instead.
+ */
+struct hfi1_rwq {
+	u32 head;               /* new work requests posted to the head */
+	u32 tail;               /* receives pull requests from here. */
+	struct hfi1_rwqe wq[0];
+};
+
+struct hfi1_rq {
+	struct hfi1_rwq *wq;
+	u32 size;               /* size of RWQE array */
+	u8 max_sge;
+	/* protect changes in this struct */
+	spinlock_t lock ____cacheline_aligned_in_smp;
+};
+
+struct hfi1_srq {
+	struct ib_srq ibsrq;
+	struct hfi1_rq rq;
+	struct hfi1_mmap_info *ip;
+	/* send signal when number of RWQEs < limit */
+	u32 limit;
+};
+
+struct hfi1_sge_state {
+	struct hfi1_sge *sg_list;      /* next SGE to be used if any */
+	struct hfi1_sge sge;   /* progress state for the current SGE */
+	u32 total_len;
+	u8 num_sge;
+};
+
+/*
+ * This structure holds the information that the send tasklet needs
+ * to send a RDMA read response or atomic operation.
+ */
+struct hfi1_ack_entry {
+	u8 opcode;
+	u8 sent;
+	u32 psn;
+	u32 lpsn;
+	union {
+		struct hfi1_sge rdma_sge;
+		u64 atomic_data;
+	};
+};
+
+/*
+ * Variables prefixed with s_ are for the requester (sender).
+ * Variables prefixed with r_ are for the responder (receiver).
+ * Variables prefixed with ack_ are for responder replies.
+ *
+ * Common variables are protected by both r_rq.lock and s_lock in that order
+ * which only happens in modify_qp() or changing the QP 'state'.
+ */
+struct hfi1_qp {
+	struct ib_qp ibqp;
+	/* read mostly fields above and below */
+	struct ib_ah_attr remote_ah_attr;
+	struct ib_ah_attr alt_ah_attr;
+	struct hfi1_qp __rcu *next;           /* link list for QPN hash table */
+	struct hfi1_swqe *s_wq;  /* send work queue */
+	struct hfi1_mmap_info *ip;
+	struct ahg_ib_header *s_hdr;     /* next packet header to send */
+	u8 s_sc;			/* SC[0..4] for next packet */
+	unsigned long timeout_jiffies;  /* computed from timeout */
+
+	enum ib_mtu path_mtu;
+	int srate_mbps;		/* s_srate (below) converted to Mbit/s */
+	u32 remote_qpn;
+	u32 pmtu;		/* decoded from path_mtu */
+	u32 qkey;               /* QKEY for this QP (for UD or RD) */
+	u32 s_size;             /* send work queue size */
+	u32 s_rnr_timeout;      /* number of milliseconds for RNR timeout */
+	u32 s_ahgpsn;           /* set to the psn in the copy of the header */
+
+	u8 state;               /* QP state */
+	u8 allowed_ops;		/* high order bits of allowed opcodes */
+	u8 qp_access_flags;
+	u8 alt_timeout;         /* Alternate path timeout for this QP */
+	u8 timeout;             /* Timeout for this QP */
+	u8 s_srate;
+	u8 s_mig_state;
+	u8 port_num;
+	u8 s_pkey_index;        /* PKEY index to use */
+	u8 s_alt_pkey_index;    /* Alternate path PKEY index to use */
+	u8 r_max_rd_atomic;     /* max number of RDMA read/atomic to receive */
+	u8 s_max_rd_atomic;     /* max number of RDMA read/atomic to send */
+	u8 s_retry_cnt;         /* number of times to retry */
+	u8 s_rnr_retry_cnt;
+	u8 r_min_rnr_timer;     /* retry timeout value for RNR NAKs */
+	u8 s_max_sge;           /* size of s_wq->sg_list */
+	u8 s_draining;
+
+	/* start of read/write fields */
+	atomic_t refcount ____cacheline_aligned_in_smp;
+	wait_queue_head_t wait;
+
+
+	struct hfi1_ack_entry s_ack_queue[HFI1_MAX_RDMA_ATOMIC + 1]
+		____cacheline_aligned_in_smp;
+	struct hfi1_sge_state s_rdma_read_sge;
+
+	spinlock_t r_lock ____cacheline_aligned_in_smp;      /* used for APM */
+	unsigned long r_aflags;
+	u64 r_wr_id;            /* ID for current receive WQE */
+	u32 r_ack_psn;          /* PSN for next ACK or atomic ACK */
+	u32 r_len;              /* total length of r_sge */
+	u32 r_rcv_len;          /* receive data len processed */
+	u32 r_psn;              /* expected rcv packet sequence number */
+	u32 r_msn;              /* message sequence number */
+
+	u8 r_state;             /* opcode of last packet received */
+	u8 r_flags;
+	u8 r_head_ack_queue;    /* index into s_ack_queue[] */
+
+	struct list_head rspwait;       /* link for waiting to respond */
+
+	struct hfi1_sge_state r_sge;     /* current receive data */
+	struct hfi1_rq r_rq;             /* receive work queue */
+
+	spinlock_t s_lock ____cacheline_aligned_in_smp;
+	struct hfi1_sge_state *s_cur_sge;
+	u32 s_flags;
+	struct hfi1_swqe *s_wqe;
+	struct hfi1_sge_state s_sge;     /* current send request data */
+	struct hfi1_mregion *s_rdma_mr;
+	struct sdma_engine *s_sde; /* current sde */
+	u32 s_cur_size;         /* size of send packet in bytes */
+	u32 s_len;              /* total length of s_sge */
+	u32 s_rdma_read_len;    /* total length of s_rdma_read_sge */
+	u32 s_next_psn;         /* PSN for next request */
+	u32 s_last_psn;         /* last response PSN processed */
+	u32 s_sending_psn;      /* lowest PSN that is being sent */
+	u32 s_sending_hpsn;     /* highest PSN that is being sent */
+	u32 s_psn;              /* current packet sequence number */
+	u32 s_ack_rdma_psn;     /* PSN for sending RDMA read responses */
+	u32 s_ack_psn;          /* PSN for acking sends and RDMA writes */
+	u32 s_head;             /* new entries added here */
+	u32 s_tail;             /* next entry to process */
+	u32 s_cur;              /* current work queue entry */
+	u32 s_acked;            /* last un-ACK'ed entry */
+	u32 s_last;             /* last completed entry */
+	u32 s_ssn;              /* SSN of tail entry */
+	u32 s_lsn;              /* limit sequence number (credit) */
+	u16 s_hdrwords;         /* size of s_hdr in 32 bit words */
+	u16 s_rdma_ack_cnt;
+	s8 s_ahgidx;
+	u8 s_state;             /* opcode of last packet sent */
+	u8 s_ack_state;         /* opcode of packet to ACK */
+	u8 s_nak_state;         /* non-zero if NAK is pending */
+	u8 r_nak_state;         /* non-zero if NAK is pending */
+	u8 s_retry;             /* requester retry counter */
+	u8 s_rnr_retry;         /* requester RNR retry counter */
+	u8 s_num_rd_atomic;     /* number of RDMA read/atomic pending */
+	u8 s_tail_ack_queue;    /* index into s_ack_queue[] */
+
+	struct hfi1_sge_state s_ack_rdma_sge;
+	struct timer_list s_timer;
+
+	struct iowait s_iowait;
+
+	struct hfi1_sge r_sg_list[0] /* verified SGEs */
+		____cacheline_aligned_in_smp;
+};
+
+/*
+ * Atomic bit definitions for r_aflags.
+ */
+#define HFI1_R_WRID_VALID        0
+#define HFI1_R_REWIND_SGE        1
+
+/*
+ * Bit definitions for r_flags.
+ */
+#define HFI1_R_REUSE_SGE 0x01
+#define HFI1_R_RDMAR_SEQ 0x02
+#define HFI1_R_RSP_NAK   0x04
+#define HFI1_R_RSP_SEND  0x08
+#define HFI1_R_COMM_EST  0x10
+
+/*
+ * Bit definitions for s_flags.
+ *
+ * HFI1_S_SIGNAL_REQ_WR - set if QP send WRs contain completion signaled
+ * HFI1_S_BUSY - send tasklet is processing the QP
+ * HFI1_S_TIMER - the RC retry timer is active
+ * HFI1_S_ACK_PENDING - an ACK is waiting to be sent after RDMA read/atomics
+ * HFI1_S_WAIT_FENCE - waiting for all prior RDMA read or atomic SWQEs
+ *                         before processing the next SWQE
+ * HFI1_S_WAIT_RDMAR - waiting for a RDMA read or atomic SWQE to complete
+ *                         before processing the next SWQE
+ * HFI1_S_WAIT_RNR - waiting for RNR timeout
+ * HFI1_S_WAIT_SSN_CREDIT - waiting for RC credits to process next SWQE
+ * HFI1_S_WAIT_DMA - waiting for send DMA queue to drain before generating
+ *                  next send completion entry not via send DMA
+ * HFI1_S_WAIT_PIO - waiting for a send buffer to be available
+ * HFI1_S_WAIT_TX - waiting for a struct verbs_txreq to be available
+ * HFI1_S_WAIT_DMA_DESC - waiting for DMA descriptors to be available
+ * HFI1_S_WAIT_KMEM - waiting for kernel memory to be available
+ * HFI1_S_WAIT_PSN - waiting for a packet to exit the send DMA queue
+ * HFI1_S_WAIT_ACK - waiting for an ACK packet before sending more requests
+ * HFI1_S_SEND_ONE - send one packet, request ACK, then wait for ACK
+ * HFI1_S_ECN - a BECN was queued to the send engine
+ */
+#define HFI1_S_SIGNAL_REQ_WR	0x0001
+#define HFI1_S_BUSY		0x0002
+#define HFI1_S_TIMER		0x0004
+#define HFI1_S_RESP_PENDING	0x0008
+#define HFI1_S_ACK_PENDING	0x0010
+#define HFI1_S_WAIT_FENCE	0x0020
+#define HFI1_S_WAIT_RDMAR	0x0040
+#define HFI1_S_WAIT_RNR		0x0080
+#define HFI1_S_WAIT_SSN_CREDIT	0x0100
+#define HFI1_S_WAIT_DMA		0x0200
+#define HFI1_S_WAIT_PIO		0x0400
+#define HFI1_S_WAIT_TX		0x0800
+#define HFI1_S_WAIT_DMA_DESC	0x1000
+#define HFI1_S_WAIT_KMEM		0x2000
+#define HFI1_S_WAIT_PSN		0x4000
+#define HFI1_S_WAIT_ACK		0x8000
+#define HFI1_S_SEND_ONE		0x10000
+#define HFI1_S_UNLIMITED_CREDIT	0x20000
+#define HFI1_S_AHG_VALID		0x40000
+#define HFI1_S_AHG_CLEAR		0x80000
+#define HFI1_S_ECN		0x100000
+
+/*
+ * Wait flags that would prevent any packet type from being sent.
+ */
+#define HFI1_S_ANY_WAIT_IO (HFI1_S_WAIT_PIO | HFI1_S_WAIT_TX | \
+	HFI1_S_WAIT_DMA_DESC | HFI1_S_WAIT_KMEM)
+
+/*
+ * Wait flags that would prevent send work requests from making progress.
+ */
+#define HFI1_S_ANY_WAIT_SEND (HFI1_S_WAIT_FENCE | HFI1_S_WAIT_RDMAR | \
+	HFI1_S_WAIT_RNR | HFI1_S_WAIT_SSN_CREDIT | HFI1_S_WAIT_DMA | \
+	HFI1_S_WAIT_PSN | HFI1_S_WAIT_ACK)
+
+#define HFI1_S_ANY_WAIT (HFI1_S_ANY_WAIT_IO | HFI1_S_ANY_WAIT_SEND)
+
+#define HFI1_PSN_CREDIT  16
+
+/*
+ * Since struct hfi1_swqe is not a fixed size, we can't simply index into
+ * struct hfi1_qp.s_wq.  This function does the array index computation.
+ */
+static inline struct hfi1_swqe *get_swqe_ptr(struct hfi1_qp *qp,
+					     unsigned n)
+{
+	return (struct hfi1_swqe *)((char *)qp->s_wq +
+				     (sizeof(struct hfi1_swqe) +
+				      qp->s_max_sge *
+				      sizeof(struct hfi1_sge)) * n);
+}
+
+/*
+ * Since struct hfi1_rwqe is not a fixed size, we can't simply index into
+ * struct hfi1_rwq.wq.  This function does the array index computation.
+ */
+static inline struct hfi1_rwqe *get_rwqe_ptr(struct hfi1_rq *rq, unsigned n)
+{
+	return (struct hfi1_rwqe *)
+		((char *) rq->wq->wq +
+		 (sizeof(struct hfi1_rwqe) +
+		  rq->max_sge * sizeof(struct ib_sge)) * n);
+}
+
+#define MAX_LKEY_TABLE_BITS 23
+
+struct hfi1_lkey_table {
+	spinlock_t lock; /* protect changes in this struct */
+	u32 next;               /* next unused index (speeds search) */
+	u32 gen;                /* generation count */
+	u32 max;                /* size of the table */
+	struct hfi1_mregion __rcu **table;
+};
+
+struct hfi1_opcode_stats {
+	u64 n_packets;          /* number of packets */
+	u64 n_bytes;            /* total number of bytes */
+};
+
+struct hfi1_opcode_stats_perctx {
+	struct hfi1_opcode_stats stats[256];
+};
+
+static inline void inc_opstats(
+	u32 tlen,
+	struct hfi1_opcode_stats *stats)
+{
+#ifdef CONFIG_DEBUG_FS
+	stats->n_bytes += tlen;
+	stats->n_packets++;
+#endif
+}
+
+struct hfi1_ibport {
+	struct hfi1_qp __rcu *qp[2];
+	struct ib_mad_agent *send_agent;	/* agent for SMI (traps) */
+	struct hfi1_ah *sm_ah;
+	struct hfi1_ah *smi_ah;
+	struct rb_root mcast_tree;
+	spinlock_t lock;		/* protect changes in this struct */
+
+	/* non-zero when timer is set */
+	unsigned long mkey_lease_timeout;
+	unsigned long trap_timeout;
+	__be64 gid_prefix;      /* in network order */
+	__be64 mkey;
+	__be64 guids[HFI1_GUIDS_PER_PORT	- 1];	/* writable GUIDs */
+	u64 tid;		/* TID for traps */
+	u64 n_rc_resends;
+	u64 n_seq_naks;
+	u64 n_rdma_seq;
+	u64 n_rnr_naks;
+	u64 n_other_naks;
+	u64 n_loop_pkts;
+	u64 n_pkt_drops;
+	u64 n_vl15_dropped;
+	u64 n_rc_timeouts;
+	u64 n_dmawait;
+	u64 n_unaligned;
+	u64 n_rc_dupreq;
+	u64 n_rc_seqnak;
+
+	/* Hot-path per CPU counters to avoid cacheline trading to update */
+	u64 z_rc_acks;
+	u64 z_rc_qacks;
+	u64 z_rc_delayed_comp;
+	u64 __percpu *rc_acks;
+	u64 __percpu *rc_qacks;
+	u64 __percpu *rc_delayed_comp;
+
+	u32 port_cap_flags;
+	u32 pma_sample_start;
+	u32 pma_sample_interval;
+	__be16 pma_counter_select[5];
+	u16 pma_tag;
+	u16 pkey_violations;
+	u16 qkey_violations;
+	u16 mkey_violations;
+	u16 mkey_lease_period;
+	u16 sm_lid;
+	u16 repress_traps;
+	u8 sm_sl;
+	u8 mkeyprot;
+	u8 subnet_timeout;
+	u8 vl_high_limit;
+	/* the first 16 entries are sl_to_vl for !OPA */
+	u8 sl_to_sc[32];
+	u8 sc_to_sl[32];
+};
+
+
+struct hfi1_qp_ibdev;
+struct hfi1_ibdev {
+	struct ib_device ibdev;
+	struct list_head pending_mmaps;
+	spinlock_t mmap_offset_lock; /* protect mmap_offset */
+	u32 mmap_offset;
+	struct hfi1_mregion __rcu *dma_mr;
+
+	struct hfi1_qp_ibdev *qp_dev;
+
+	/* QP numbers are shared by all IB ports */
+	struct hfi1_lkey_table lk_table;
+	/* protect wait lists */
+	seqlock_t iowait_lock;
+	struct list_head txwait;        /* list for wait verbs_txreq */
+	struct list_head memwait;       /* list for wait kernel memory */
+	struct list_head txreq_free;
+	struct kmem_cache *verbs_txreq_cache;
+	struct timer_list mem_timer;
+
+	/* other waiters */
+	spinlock_t pending_lock;
+
+	u64 n_piowait;
+	u64 n_txwait;
+	u64 n_kmem_wait;
+
+	u32 n_pds_allocated;    /* number of PDs allocated for device */
+	spinlock_t n_pds_lock;
+	u32 n_ahs_allocated;    /* number of AHs allocated for device */
+	spinlock_t n_ahs_lock;
+	u32 n_cqs_allocated;    /* number of CQs allocated for device */
+	spinlock_t n_cqs_lock;
+	u32 n_qps_allocated;    /* number of QPs allocated for device */
+	spinlock_t n_qps_lock;
+	u32 n_srqs_allocated;   /* number of SRQs allocated for device */
+	spinlock_t n_srqs_lock;
+	u32 n_mcast_grps_allocated; /* number of mcast groups allocated */
+	spinlock_t n_mcast_grps_lock;
+#ifdef CONFIG_DEBUG_FS
+	/* per HFI debugfs */
+	struct dentry *hfi1_ibdev_dbg;
+	/* per HFI symlinks to above */
+	struct dentry *hfi1_ibdev_link;
+#endif
+};
+
+struct hfi1_verbs_counters {
+	u64 symbol_error_counter;
+	u64 link_error_recovery_counter;
+	u64 link_downed_counter;
+	u64 port_rcv_errors;
+	u64 port_rcv_remphys_errors;
+	u64 port_xmit_discards;
+	u64 port_xmit_data;
+	u64 port_rcv_data;
+	u64 port_xmit_packets;
+	u64 port_rcv_packets;
+	u32 local_link_integrity_errors;
+	u32 excessive_buffer_overrun_errors;
+	u32 vl15_dropped;
+};
+
+static inline struct hfi1_mr *to_imr(struct ib_mr *ibmr)
+{
+	return container_of(ibmr, struct hfi1_mr, ibmr);
+}
+
+static inline struct hfi1_pd *to_ipd(struct ib_pd *ibpd)
+{
+	return container_of(ibpd, struct hfi1_pd, ibpd);
+}
+
+static inline struct hfi1_ah *to_iah(struct ib_ah *ibah)
+{
+	return container_of(ibah, struct hfi1_ah, ibah);
+}
+
+static inline struct hfi1_cq *to_icq(struct ib_cq *ibcq)
+{
+	return container_of(ibcq, struct hfi1_cq, ibcq);
+}
+
+static inline struct hfi1_srq *to_isrq(struct ib_srq *ibsrq)
+{
+	return container_of(ibsrq, struct hfi1_srq, ibsrq);
+}
+
+static inline struct hfi1_qp *to_iqp(struct ib_qp *ibqp)
+{
+	return container_of(ibqp, struct hfi1_qp, ibqp);
+}
+
+static inline struct hfi1_ibdev *to_idev(struct ib_device *ibdev)
+{
+	return container_of(ibdev, struct hfi1_ibdev, ibdev);
+}
+
+/*
+ * Send if not busy or waiting for I/O and either
+ * a RC response is pending or we can process send work requests.
+ */
+static inline int hfi1_send_ok(struct hfi1_qp *qp)
+{
+	return !(qp->s_flags & (HFI1_S_BUSY | HFI1_S_ANY_WAIT_IO)) &&
+		(qp->s_hdrwords || (qp->s_flags & HFI1_S_RESP_PENDING) ||
+		 !(qp->s_flags & HFI1_S_ANY_WAIT_SEND));
+}
+
+/*
+ * This must be called with s_lock held.
+ */
+void hfi1_schedule_send(struct hfi1_qp *qp);
+void hfi1_bad_pqkey(struct hfi1_ibport *ibp, __be16 trap_num, u32 key, u32 sl,
+		    u32 qp1, u32 qp2, __be16 lid1, __be16 lid2);
+void hfi1_cap_mask_chg(struct hfi1_ibport *ibp);
+void hfi1_sys_guid_chg(struct hfi1_ibport *ibp);
+void hfi1_node_desc_chg(struct hfi1_ibport *ibp);
+int hfi1_process_mad(struct ib_device *ibdev, int mad_flags, u8 port,
+		     const struct ib_wc *in_wc, const struct ib_grh *in_grh,
+		     const struct ib_mad_hdr *in_mad, size_t in_mad_size,
+		     struct ib_mad_hdr *out_mad, size_t *out_mad_size,
+		     u16 *out_mad_pkey_index);
+int hfi1_create_agents(struct hfi1_ibdev *dev);
+void hfi1_free_agents(struct hfi1_ibdev *dev);
+
+/*
+ * The PSN_MASK and PSN_SHIFT allow for
+ * 1) comparing two PSNs
+ * 2) returning the PSN with any upper bits masked
+ * 3) returning the difference between to PSNs
+ *
+ * The number of significant bits in the PSN must
+ * necessarily be at least one bit less than
+ * the container holding the PSN.
+ */
+#ifndef CONFIG_HFI1_VERBS_31BIT_PSN
+#define PSN_MASK 0xFFFFFF
+#define PSN_SHIFT 8
+#else
+#define PSN_MASK 0x7FFFFFFF
+#define PSN_SHIFT 1
+#endif
+#define PSN_MODIFY_MASK 0xFFFFFF
+
+/* Number of bits to pay attention to in the opcode for checking qp type */
+#define OPCODE_QP_MASK 0xE0
+
+/*
+ * Compare the lower 24 bits of the msn values.
+ * Returns an integer <, ==, or > than zero.
+ */
+static inline int cmp_msn(u32 a, u32 b)
+{
+	return (((int) a) - ((int) b)) << 8;
+}
+
+/*
+ * Compare two PSNs
+ * Returns an integer <, ==, or > than zero.
+ */
+static inline int cmp_psn(u32 a, u32 b)
+{
+	return (((int) a) - ((int) b)) << PSN_SHIFT;
+}
+
+/*
+ * Return masked PSN
+ */
+static inline u32 mask_psn(u32 a)
+{
+	return a & PSN_MASK;
+}
+
+/*
+ * Return delta between two PSNs
+ */
+static inline u32 delta_psn(u32 a, u32 b)
+{
+	return (((int)a - (int)b) << PSN_SHIFT) >> PSN_SHIFT;
+}
+
+struct hfi1_mcast *hfi1_mcast_find(struct hfi1_ibport *ibp, union ib_gid *mgid);
+
+int hfi1_multicast_attach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid);
+
+int hfi1_multicast_detach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid);
+
+int hfi1_mcast_tree_empty(struct hfi1_ibport *ibp);
+
+struct verbs_txreq;
+void hfi1_put_txreq(struct verbs_txreq *tx);
+
+int hfi1_verbs_send(struct hfi1_qp *qp, struct ahg_ib_header *ahdr,
+		    u32 hdrwords, struct hfi1_sge_state *ss, u32 len);
+
+void hfi1_copy_sge(struct hfi1_sge_state *ss, void *data, u32 length,
+		   int release);
+
+void hfi1_skip_sge(struct hfi1_sge_state *ss, u32 length, int release);
+
+void hfi1_cnp_rcv(struct hfi1_packet *packet);
+
+void hfi1_uc_rcv(struct hfi1_packet *packet);
+
+void hfi1_rc_rcv(struct hfi1_packet *packet);
+
+void hfi1_rc_hdrerr(
+	struct hfi1_ctxtdata *rcd,
+	struct hfi1_ib_header *hdr,
+	u32 rcv_flags,
+	struct hfi1_qp *qp);
+
+u8 ah_to_sc(struct ib_device *ibdev, struct ib_ah_attr *ah_attr);
+
+int hfi1_check_ah(struct ib_device *ibdev, struct ib_ah_attr *ah_attr);
+
+struct ib_ah *hfi1_create_qp0_ah(struct hfi1_ibport *ibp, u16 dlid);
+
+void hfi1_rc_rnr_retry(unsigned long arg);
+
+void hfi1_rc_send_complete(struct hfi1_qp *qp, struct hfi1_ib_header *hdr);
+
+void hfi1_rc_error(struct hfi1_qp *qp, enum ib_wc_status err);
+
+void hfi1_ud_rcv(struct hfi1_packet *packet);
+
+int hfi1_lookup_pkey_idx(struct hfi1_ibport *ibp, u16 pkey);
+
+int hfi1_alloc_lkey(struct hfi1_mregion *mr, int dma_region);
+
+void hfi1_free_lkey(struct hfi1_mregion *mr);
+
+int hfi1_lkey_ok(struct hfi1_lkey_table *rkt, struct hfi1_pd *pd,
+		 struct hfi1_sge *isge, struct ib_sge *sge, int acc);
+
+int hfi1_rkey_ok(struct hfi1_qp *qp, struct hfi1_sge *sge,
+		 u32 len, u64 vaddr, u32 rkey, int acc);
+
+int hfi1_post_srq_receive(struct ib_srq *ibsrq, struct ib_recv_wr *wr,
+			  struct ib_recv_wr **bad_wr);
+
+struct ib_srq *hfi1_create_srq(struct ib_pd *ibpd,
+			       struct ib_srq_init_attr *srq_init_attr,
+			       struct ib_udata *udata);
+
+int hfi1_modify_srq(struct ib_srq *ibsrq, struct ib_srq_attr *attr,
+		    enum ib_srq_attr_mask attr_mask,
+		    struct ib_udata *udata);
+
+int hfi1_query_srq(struct ib_srq *ibsrq, struct ib_srq_attr *attr);
+
+int hfi1_destroy_srq(struct ib_srq *ibsrq);
+
+int hfi1_cq_init(struct hfi1_devdata *dd);
+
+void hfi1_cq_exit(struct hfi1_devdata *dd);
+
+void hfi1_cq_enter(struct hfi1_cq *cq, struct ib_wc *entry, int sig);
+
+int hfi1_poll_cq(struct ib_cq *ibcq, int num_entries, struct ib_wc *entry);
+
+struct ib_cq *hfi1_create_cq(
+	struct ib_device *ibdev,
+	const struct ib_cq_init_attr *attr,
+	struct ib_ucontext *context,
+	struct ib_udata *udata);
+
+int hfi1_destroy_cq(struct ib_cq *ibcq);
+
+int hfi1_req_notify_cq(
+	struct ib_cq *ibcq,
+	enum ib_cq_notify_flags notify_flags);
+
+int hfi1_resize_cq(struct ib_cq *ibcq, int cqe, struct ib_udata *udata);
+
+struct ib_mr *hfi1_get_dma_mr(struct ib_pd *pd, int acc);
+
+struct ib_mr *hfi1_reg_phys_mr(struct ib_pd *pd,
+			       struct ib_phys_buf *buffer_list,
+			       int num_phys_buf, int acc, u64 *iova_start);
+
+struct ib_mr *hfi1_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
+			       u64 virt_addr, int mr_access_flags,
+			       struct ib_udata *udata);
+
+int hfi1_dereg_mr(struct ib_mr *ibmr);
+
+struct ib_mr *hfi1_alloc_fast_reg_mr(struct ib_pd *pd, int max_page_list_len);
+
+struct ib_fast_reg_page_list *hfi1_alloc_fast_reg_page_list(
+				struct ib_device *ibdev, int page_list_len);
+
+void hfi1_free_fast_reg_page_list(struct ib_fast_reg_page_list *pl);
+
+int hfi1_fast_reg_mr(struct hfi1_qp *qp, struct ib_send_wr *wr);
+
+struct ib_fmr *hfi1_alloc_fmr(struct ib_pd *pd, int mr_access_flags,
+			      struct ib_fmr_attr *fmr_attr);
+
+int hfi1_map_phys_fmr(struct ib_fmr *ibfmr, u64 *page_list,
+		      int list_len, u64 iova);
+
+int hfi1_unmap_fmr(struct list_head *fmr_list);
+
+int hfi1_dealloc_fmr(struct ib_fmr *ibfmr);
+
+static inline void hfi1_get_mr(struct hfi1_mregion *mr)
+{
+	atomic_inc(&mr->refcount);
+}
+
+static inline void hfi1_put_mr(struct hfi1_mregion *mr)
+{
+	if (unlikely(atomic_dec_and_test(&mr->refcount)))
+		complete(&mr->comp);
+}
+
+static inline void hfi1_put_ss(struct hfi1_sge_state *ss)
+{
+	while (ss->num_sge) {
+		hfi1_put_mr(ss->sge.mr);
+		if (--ss->num_sge)
+			ss->sge = *ss->sg_list++;
+	}
+}
+
+void hfi1_release_mmap_info(struct kref *ref);
+
+struct hfi1_mmap_info *hfi1_create_mmap_info(struct hfi1_ibdev *dev, u32 size,
+					     struct ib_ucontext *context,
+					     void *obj);
+
+void hfi1_update_mmap_info(struct hfi1_ibdev *dev, struct hfi1_mmap_info *ip,
+			   u32 size, void *obj);
+
+int hfi1_mmap(struct ib_ucontext *context, struct vm_area_struct *vma);
+
+int hfi1_get_rwqe(struct hfi1_qp *qp, int wr_id_only);
+
+void hfi1_migrate_qp(struct hfi1_qp *qp);
+
+int hfi1_ruc_check_hdr(struct hfi1_ibport *ibp, struct hfi1_ib_header *hdr,
+		       int has_grh, struct hfi1_qp *qp, u32 bth0);
+
+u32 hfi1_make_grh(struct hfi1_ibport *ibp, struct ib_grh *hdr,
+		  struct ib_global_route *grh, u32 hwords, u32 nwords);
+
+void clear_ahg(struct hfi1_qp *qp);
+
+void hfi1_make_ruc_header(struct hfi1_qp *qp, struct hfi1_other_headers *ohdr,
+			  u32 bth0, u32 bth2, int middle);
+
+void hfi1_do_send(struct work_struct *work);
+
+void hfi1_send_complete(struct hfi1_qp *qp, struct hfi1_swqe *wqe,
+			enum ib_wc_status status);
+
+void hfi1_send_rc_ack(struct hfi1_ctxtdata *, struct hfi1_qp *qp, int is_fecn);
+
+int hfi1_make_rc_req(struct hfi1_qp *qp);
+
+int hfi1_make_uc_req(struct hfi1_qp *qp);
+
+int hfi1_make_ud_req(struct hfi1_qp *qp);
+
+int hfi1_register_ib_device(struct hfi1_devdata *);
+
+void hfi1_unregister_ib_device(struct hfi1_devdata *);
+
+void hfi1_ib_rcv(struct hfi1_packet *packet);
+
+unsigned hfi1_get_npkeys(struct hfi1_devdata *);
+
+int hfi1_verbs_send_dma(struct hfi1_qp *qp, struct ahg_ib_header *hdr,
+			u32 hdrwords, struct hfi1_sge_state *ss, u32 len,
+			u32 plen, u32 dwords, u64 pbc);
+
+int hfi1_verbs_send_pio(struct hfi1_qp *qp, struct ahg_ib_header *hdr,
+			u32 hdrwords, struct hfi1_sge_state *ss, u32 len,
+			u32 plen, u32 dwords, u64 pbc);
+
+struct send_context *qp_to_send_context(struct hfi1_qp *qp, u8 sc5);
+
+extern const enum ib_wc_opcode ib_hfi1_wc_opcode[];
+
+extern const u8 hdr_len_by_opcode[];
+
+extern const int ib_hfi1_state_ops[];
+
+extern __be64 ib_hfi1_sys_image_guid;    /* in network order */
+
+extern unsigned int hfi1_lkey_table_size;
+
+extern unsigned int hfi1_max_cqes;
+
+extern unsigned int hfi1_max_cqs;
+
+extern unsigned int hfi1_max_qp_wrs;
+
+extern unsigned int hfi1_max_qps;
+
+extern unsigned int hfi1_max_sges;
+
+extern unsigned int hfi1_max_mcast_grps;
+
+extern unsigned int hfi1_max_mcast_qp_attached;
+
+extern unsigned int hfi1_max_srqs;
+
+extern unsigned int hfi1_max_srq_sges;
+
+extern unsigned int hfi1_max_srq_wrs;
+
+extern const u32 ib_hfi1_rnr_table[];
+
+extern struct ib_dma_mapping_ops hfi1_dma_mapping_ops;
+
+#endif                          /* HFI1_VERBS_H */
diff --git a/drivers/staging/rdma/hfi1/verbs_mcast.c b/drivers/staging/rdma/hfi1/verbs_mcast.c
new file mode 100644
index 000000000000..afc6b4c61a1d
--- /dev/null
+++ b/drivers/staging/rdma/hfi1/verbs_mcast.c
@@ -0,0 +1,385 @@
+/*
+ *
+ * This file is provided under a dual BSD/GPLv2 license.  When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * GPL LICENSE SUMMARY
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * BSD LICENSE
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ *  - Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ *  - Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in
+ *    the documentation and/or other materials provided with the
+ *    distribution.
+ *  - Neither the name of Intel Corporation nor the names of its
+ *    contributors may be used to endorse or promote products derived
+ *    from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ */
+
+#include <linux/rculist.h>
+
+#include "hfi.h"
+
+/**
+ * mcast_qp_alloc - alloc a struct to link a QP to mcast GID struct
+ * @qp: the QP to link
+ */
+static struct hfi1_mcast_qp *mcast_qp_alloc(struct hfi1_qp *qp)
+{
+	struct hfi1_mcast_qp *mqp;
+
+	mqp = kmalloc(sizeof(*mqp), GFP_KERNEL);
+	if (!mqp)
+		goto bail;
+
+	mqp->qp = qp;
+	atomic_inc(&qp->refcount);
+
+bail:
+	return mqp;
+}
+
+static void mcast_qp_free(struct hfi1_mcast_qp *mqp)
+{
+	struct hfi1_qp *qp = mqp->qp;
+
+	/* Notify hfi1_destroy_qp() if it is waiting. */
+	if (atomic_dec_and_test(&qp->refcount))
+		wake_up(&qp->wait);
+
+	kfree(mqp);
+}
+
+/**
+ * mcast_alloc - allocate the multicast GID structure
+ * @mgid: the multicast GID
+ *
+ * A list of QPs will be attached to this structure.
+ */
+static struct hfi1_mcast *mcast_alloc(union ib_gid *mgid)
+{
+	struct hfi1_mcast *mcast;
+
+	mcast = kmalloc(sizeof(*mcast), GFP_KERNEL);
+	if (!mcast)
+		goto bail;
+
+	mcast->mgid = *mgid;
+	INIT_LIST_HEAD(&mcast->qp_list);
+	init_waitqueue_head(&mcast->wait);
+	atomic_set(&mcast->refcount, 0);
+	mcast->n_attached = 0;
+
+bail:
+	return mcast;
+}
+
+static void mcast_free(struct hfi1_mcast *mcast)
+{
+	struct hfi1_mcast_qp *p, *tmp;
+
+	list_for_each_entry_safe(p, tmp, &mcast->qp_list, list)
+		mcast_qp_free(p);
+
+	kfree(mcast);
+}
+
+/**
+ * hfi1_mcast_find - search the global table for the given multicast GID
+ * @ibp: the IB port structure
+ * @mgid: the multicast GID to search for
+ *
+ * Returns NULL if not found.
+ *
+ * The caller is responsible for decrementing the reference count if found.
+ */
+struct hfi1_mcast *hfi1_mcast_find(struct hfi1_ibport *ibp, union ib_gid *mgid)
+{
+	struct rb_node *n;
+	unsigned long flags;
+	struct hfi1_mcast *mcast;
+
+	spin_lock_irqsave(&ibp->lock, flags);
+	n = ibp->mcast_tree.rb_node;
+	while (n) {
+		int ret;
+
+		mcast = rb_entry(n, struct hfi1_mcast, rb_node);
+
+		ret = memcmp(mgid->raw, mcast->mgid.raw,
+			     sizeof(union ib_gid));
+		if (ret < 0)
+			n = n->rb_left;
+		else if (ret > 0)
+			n = n->rb_right;
+		else {
+			atomic_inc(&mcast->refcount);
+			spin_unlock_irqrestore(&ibp->lock, flags);
+			goto bail;
+		}
+	}
+	spin_unlock_irqrestore(&ibp->lock, flags);
+
+	mcast = NULL;
+
+bail:
+	return mcast;
+}
+
+/**
+ * mcast_add - insert mcast GID into table and attach QP struct
+ * @mcast: the mcast GID table
+ * @mqp: the QP to attach
+ *
+ * Return zero if both were added.  Return EEXIST if the GID was already in
+ * the table but the QP was added.  Return ESRCH if the QP was already
+ * attached and neither structure was added.
+ */
+static int mcast_add(struct hfi1_ibdev *dev, struct hfi1_ibport *ibp,
+		     struct hfi1_mcast *mcast, struct hfi1_mcast_qp *mqp)
+{
+	struct rb_node **n = &ibp->mcast_tree.rb_node;
+	struct rb_node *pn = NULL;
+	int ret;
+
+	spin_lock_irq(&ibp->lock);
+
+	while (*n) {
+		struct hfi1_mcast *tmcast;
+		struct hfi1_mcast_qp *p;
+
+		pn = *n;
+		tmcast = rb_entry(pn, struct hfi1_mcast, rb_node);
+
+		ret = memcmp(mcast->mgid.raw, tmcast->mgid.raw,
+			     sizeof(union ib_gid));
+		if (ret < 0) {
+			n = &pn->rb_left;
+			continue;
+		}
+		if (ret > 0) {
+			n = &pn->rb_right;
+			continue;
+		}
+
+		/* Search the QP list to see if this is already there. */
+		list_for_each_entry_rcu(p, &tmcast->qp_list, list) {
+			if (p->qp == mqp->qp) {
+				ret = ESRCH;
+				goto bail;
+			}
+		}
+		if (tmcast->n_attached == hfi1_max_mcast_qp_attached) {
+			ret = ENOMEM;
+			goto bail;
+		}
+
+		tmcast->n_attached++;
+
+		list_add_tail_rcu(&mqp->list, &tmcast->qp_list);
+		ret = EEXIST;
+		goto bail;
+	}
+
+	spin_lock(&dev->n_mcast_grps_lock);
+	if (dev->n_mcast_grps_allocated == hfi1_max_mcast_grps) {
+		spin_unlock(&dev->n_mcast_grps_lock);
+		ret = ENOMEM;
+		goto bail;
+	}
+
+	dev->n_mcast_grps_allocated++;
+	spin_unlock(&dev->n_mcast_grps_lock);
+
+	mcast->n_attached++;
+
+	list_add_tail_rcu(&mqp->list, &mcast->qp_list);
+
+	atomic_inc(&mcast->refcount);
+	rb_link_node(&mcast->rb_node, pn, n);
+	rb_insert_color(&mcast->rb_node, &ibp->mcast_tree);
+
+	ret = 0;
+
+bail:
+	spin_unlock_irq(&ibp->lock);
+
+	return ret;
+}
+
+int hfi1_multicast_attach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid)
+{
+	struct hfi1_qp *qp = to_iqp(ibqp);
+	struct hfi1_ibdev *dev = to_idev(ibqp->device);
+	struct hfi1_ibport *ibp;
+	struct hfi1_mcast *mcast;
+	struct hfi1_mcast_qp *mqp;
+	int ret;
+
+	if (ibqp->qp_num <= 1 || qp->state == IB_QPS_RESET) {
+		ret = -EINVAL;
+		goto bail;
+	}
+
+	/*
+	 * Allocate data structures since its better to do this outside of
+	 * spin locks and it will most likely be needed.
+	 */
+	mcast = mcast_alloc(gid);
+	if (mcast == NULL) {
+		ret = -ENOMEM;
+		goto bail;
+	}
+	mqp = mcast_qp_alloc(qp);
+	if (mqp == NULL) {
+		mcast_free(mcast);
+		ret = -ENOMEM;
+		goto bail;
+	}
+	ibp = to_iport(ibqp->device, qp->port_num);
+	switch (mcast_add(dev, ibp, mcast, mqp)) {
+	case ESRCH:
+		/* Neither was used: OK to attach the same QP twice. */
+		mcast_qp_free(mqp);
+		mcast_free(mcast);
+		break;
+
+	case EEXIST:            /* The mcast wasn't used */
+		mcast_free(mcast);
+		break;
+
+	case ENOMEM:
+		/* Exceeded the maximum number of mcast groups. */
+		mcast_qp_free(mqp);
+		mcast_free(mcast);
+		ret = -ENOMEM;
+		goto bail;
+
+	default:
+		break;
+	}
+
+	ret = 0;
+
+bail:
+	return ret;
+}
+
+int hfi1_multicast_detach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid)
+{
+	struct hfi1_qp *qp = to_iqp(ibqp);
+	struct hfi1_ibdev *dev = to_idev(ibqp->device);
+	struct hfi1_ibport *ibp = to_iport(ibqp->device, qp->port_num);
+	struct hfi1_mcast *mcast = NULL;
+	struct hfi1_mcast_qp *p, *tmp;
+	struct rb_node *n;
+	int last = 0;
+	int ret;
+
+	if (ibqp->qp_num <= 1 || qp->state == IB_QPS_RESET) {
+		ret = -EINVAL;
+		goto bail;
+	}
+
+	spin_lock_irq(&ibp->lock);
+
+	/* Find the GID in the mcast table. */
+	n = ibp->mcast_tree.rb_node;
+	while (1) {
+		if (n == NULL) {
+			spin_unlock_irq(&ibp->lock);
+			ret = -EINVAL;
+			goto bail;
+		}
+
+		mcast = rb_entry(n, struct hfi1_mcast, rb_node);
+		ret = memcmp(gid->raw, mcast->mgid.raw,
+			     sizeof(union ib_gid));
+		if (ret < 0)
+			n = n->rb_left;
+		else if (ret > 0)
+			n = n->rb_right;
+		else
+			break;
+	}
+
+	/* Search the QP list. */
+	list_for_each_entry_safe(p, tmp, &mcast->qp_list, list) {
+		if (p->qp != qp)
+			continue;
+		/*
+		 * We found it, so remove it, but don't poison the forward
+		 * link until we are sure there are no list walkers.
+		 */
+		list_del_rcu(&p->list);
+		mcast->n_attached--;
+
+		/* If this was the last attached QP, remove the GID too. */
+		if (list_empty(&mcast->qp_list)) {
+			rb_erase(&mcast->rb_node, &ibp->mcast_tree);
+			last = 1;
+		}
+		break;
+	}
+
+	spin_unlock_irq(&ibp->lock);
+
+	if (p) {
+		/*
+		 * Wait for any list walkers to finish before freeing the
+		 * list element.
+		 */
+		wait_event(mcast->wait, atomic_read(&mcast->refcount) <= 1);
+		mcast_qp_free(p);
+	}
+	if (last) {
+		atomic_dec(&mcast->refcount);
+		wait_event(mcast->wait, !atomic_read(&mcast->refcount));
+		mcast_free(mcast);
+		spin_lock_irq(&dev->n_mcast_grps_lock);
+		dev->n_mcast_grps_allocated--;
+		spin_unlock_irq(&dev->n_mcast_grps_lock);
+	}
+
+	ret = 0;
+
+bail:
+	return ret;
+}
+
+int hfi1_mcast_tree_empty(struct hfi1_ibport *ibp)
+{
+	return ibp->mcast_tree.rb_node == NULL;
+}
-- 
cgit v1.2.3


From 3d4fe182003bcde778e29e84c14c0c4bb70a452e Mon Sep 17 00:00:00 2001
From: Leilk Liu <leilk.liu@mediatek.com>
Date: Mon, 31 Aug 2015 21:18:58 +0800
Subject: spi: Mediatek: Document devicetree bindings update for spi bus

This patch updates spi bindings, fixs clock usage description.

Signed-off-by: Leilk Liu <leilk.liu@mediatek.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
---
 Documentation/devicetree/bindings/spi/spi-mt65xx.txt | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/spi/spi-mt65xx.txt b/Documentation/devicetree/bindings/spi/spi-mt65xx.txt
index dcefc438272f..6160ffbcb3d3 100644
--- a/Documentation/devicetree/bindings/spi/spi-mt65xx.txt
+++ b/Documentation/devicetree/bindings/spi/spi-mt65xx.txt
@@ -15,17 +15,18 @@ Required properties:
 - interrupts: Should contain spi interrupt
 
 - clocks: phandles to input clocks.
-  The first should be <&topckgen CLK_TOP_SPI_SEL>.
-  The second should be one of the following.
+  The first should be one of the following. It's PLL.
    -  <&clk26m>: specify parent clock 26MHZ.
    -  <&topckgen CLK_TOP_SYSPLL3_D2>: specify parent clock 109MHZ.
 				      It's the default one.
    -  <&topckgen CLK_TOP_SYSPLL4_D2>: specify parent clock 78MHZ.
    -  <&topckgen CLK_TOP_UNIVPLL2_D4>: specify parent clock 104MHZ.
    -  <&topckgen CLK_TOP_UNIVPLL1_D8>: specify parent clock 78MHZ.
+  The second should be <&topckgen CLK_TOP_SPI_SEL>. It's clock mux.
+  The third is <&pericfg CLK_PERI_SPI0>. It's clock gate.
 
-- clock-names: shall be "spi-clk" for the controller clock, and
-  "parent-clk" for the parent clock.
+- clock-names: shall be "parent-clk" for the parent clock, "sel-clk" for the
+  muxes clock, and "spi-clk" for the clock gate.
 
 Optional properties:
 - mediatek,pad-select: specify which pins group(ck/mi/mo/cs) spi
@@ -44,8 +45,11 @@ spi: spi@1100a000 {
 	#size-cells = <0>;
 	reg = <0 0x1100a000 0 0x1000>;
 	interrupts = <GIC_SPI 110 IRQ_TYPE_LEVEL_LOW>;
-	clocks = <&topckgen CLK_TOP_SPI_SEL>, <&topckgen CLK_TOP_SYSPLL3_D2>;
-	clock-names = "spi-clk", "parent-clk";
+	clocks = <&topckgen CLK_TOP_SYSPLL3_D2>,
+		 <&topckgen CLK_TOP_SPI_SEL>,
+		 <&pericfg CLK_PERI_SPI0>;
+	clock-names = "parent-clk", "sel-clk", "spi-clk";
+
 	mediatek,pad-select = <0>;
 	status = "disabled";
 };
-- 
cgit v1.2.3


From 66099bb0ee6c20f91ace3fa5f82202fbceb67d8e Mon Sep 17 00:00:00 2001
From: Guoqing Jiang <gqjiang@suse.com>
Date: Fri, 10 Jul 2015 17:01:15 +0800
Subject: md-cluster: fix deadlock issue on message lock

There is problem with previous communication mechanism, and we got below
deadlock scenario with cluster which has 3 nodes.

	Sender                	    Receiver        		Receiver

	token(EX)
       message(EX)
      writes message
   downconverts message(CR)
      requests ack(EX)
		                  get message(CR)            gets message(CR)
                		  reads message                reads message
		               requests EX on message    requests EX on message

To fix this problem, we do the following changes:

1. the sender downconverts MESSAGE to CW rather than CR.
2. and the receiver request PR lock not EX lock on message.

And in case we failed to down-convert EX to CW on message, it is better to
unlock message otherthan still hold the lock.

Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Lidong Zhong <ldzhong@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
---
 Documentation/md-cluster.txt |  4 ++--
 drivers/md/md-cluster.c      | 14 +++++++-------
 2 files changed, 9 insertions(+), 9 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/md-cluster.txt b/Documentation/md-cluster.txt
index de1af7db3355..1b794369e03a 100644
--- a/Documentation/md-cluster.txt
+++ b/Documentation/md-cluster.txt
@@ -91,7 +91,7 @@ The algorithm is:
     this message inappropriate or redundant.
 
  3. sender write LVB.
-    sender down-convert MESSAGE from EX to CR
+    sender down-convert MESSAGE from EX to CW
     sender try to get EX of ACK
     [ wait until all receiver has *processed* the MESSAGE ]
 
@@ -112,7 +112,7 @@ The algorithm is:
     sender down-convert ACK from EX to CR
     sender release MESSAGE
     sender release TOKEN
-                               receiver upconvert to EX of MESSAGE
+                               receiver upconvert to PR of MESSAGE
                                receiver get CR of ACK
                                receiver release MESSAGE
 
diff --git a/drivers/md/md-cluster.c b/drivers/md/md-cluster.c
index 47199addae04..85b7836fb4b5 100644
--- a/drivers/md/md-cluster.c
+++ b/drivers/md/md-cluster.c
@@ -488,8 +488,8 @@ static void recv_daemon(struct md_thread *thread)
 
 	/*release CR on ack_lockres*/
 	dlm_unlock_sync(ack_lockres);
-	/*up-convert to EX on message_lockres*/
-	dlm_lock_sync(message_lockres, DLM_LOCK_EX);
+	/*up-convert to PR on message_lockres*/
+	dlm_lock_sync(message_lockres, DLM_LOCK_PR);
 	/*get CR on ack_lockres again*/
 	dlm_lock_sync(ack_lockres, DLM_LOCK_CR);
 	/*release CR on message_lockres*/
@@ -522,7 +522,7 @@ static void unlock_comm(struct md_cluster_info *cinfo)
  * The function:
  * 1. Grabs the message lockresource in EX mode
  * 2. Copies the message to the message LVB
- * 3. Downconverts message lockresource to CR
+ * 3. Downconverts message lockresource to CW
  * 4. Upconverts ack lock resource from CR to EX. This forces the BAST on other nodes
  *    and the other nodes read the message. The thread will wait here until all other
  *    nodes have released ack lock resource.
@@ -543,12 +543,12 @@ static int __sendmsg(struct md_cluster_info *cinfo, struct cluster_msg *cmsg)
 
 	memcpy(cinfo->message_lockres->lksb.sb_lvbptr, (void *)cmsg,
 			sizeof(struct cluster_msg));
-	/*down-convert EX to CR on Message*/
-	error = dlm_lock_sync(cinfo->message_lockres, DLM_LOCK_CR);
+	/*down-convert EX to CW on Message*/
+	error = dlm_lock_sync(cinfo->message_lockres, DLM_LOCK_CW);
 	if (error) {
-		pr_err("md-cluster: failed to convert EX to CR on MESSAGE(%d)\n",
+		pr_err("md-cluster: failed to convert EX to CW on MESSAGE(%d)\n",
 				error);
-		goto failed_message;
+		goto failed_ack;
 	}
 
 	/*up-convert CR to EX on Ack*/
-- 
cgit v1.2.3


From f15f4d720088c140cdf1fee6aeab3549dbdddc41 Mon Sep 17 00:00:00 2001
From: Heinz Mauelshagen <heinzm@redhat.com>
Date: Tue, 25 Aug 2015 17:15:41 +0200
Subject: dm raid: document RAID 4/5/6 discard support

For RAID 4/5/6 data integrity reasons 'discard_zeroes_data' must work
properly.

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 Documentation/device-mapper/dm-raid.txt | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/device-mapper/dm-raid.txt b/Documentation/device-mapper/dm-raid.txt
index cb12af3b51c2..df2d636b6088 100644
--- a/Documentation/device-mapper/dm-raid.txt
+++ b/Documentation/device-mapper/dm-raid.txt
@@ -209,6 +209,37 @@ include:
 	"repair" - Initiate a repair of the array.
 	"reshape"- Currently unsupported (-EINVAL).
 
+
+Discard Support
+---------------
+The implementation of discard support among hardware vendors varies.
+When a block is discarded, some storage devices will return zeroes when
+the block is read.  These devices set the 'discard_zeroes_data'
+attribute.  Other devices will return random data.  Confusingly, some
+devices that advertise 'discard_zeroes_data' will not reliably return
+zeroes when discarded blocks are read!  Since RAID 4/5/6 uses blocks
+from a number of devices to calculate parity blocks and (for performance
+reasons) relies on 'discard_zeroes_data' being reliable, it is important
+that the devices be consistent.  Blocks may be discarded in the middle
+of a RAID 4/5/6 stripe and if subsequent read results are not
+consistent, the parity blocks may be calculated differently at any time;
+making the parity blocks useless for redundancy.  It is important to
+understand how your hardware behaves with discards if you are going to
+enable discards with RAID 4/5/6.
+
+Since the behavior of storage devices is unreliable in this respect,
+even when reporting 'discard_zeroes_data', by default RAID 4/5/6
+discard support is disabled -- this ensures data integrity at the
+expense of losing some performance.
+
+Storage devices that properly support 'discard_zeroes_data' are
+increasingly whitelisted in the kernel and can thus be trusted.
+
+For trusted devices, the following dm-raid module parameter can be set
+to safely enable discard support for RAID 4/5/6:
+    'devices_handle_discards_safely'
+
+
 Version History
 ---------------
 1.0.0	Initial version.  Support for RAID 4/5/6
-- 
cgit v1.2.3


From 87583ebb9f6ea6dc7f8ef167b815656787e429fc Mon Sep 17 00:00:00 2001
From: Philip Downey <pdowney@brocade.com>
Date: Mon, 31 Aug 2015 11:30:38 +0100
Subject: IGMP: Document igmp_link_local_mcast_reports

Document the addition of a new sysctl variable which controls the
generation of IGMP reports for link local multicast groups in the
224.0.0.X range.

IGMP reports for local multicast groups can now be optionally
inhibited by setting the value to zero e.g.:
echo 0 > /proc/sys/net/ipv4/igmp_link_local_mcast_reports

To retain backwards compatibility the previous behaviour is retained
by default on system boot or reverted by setting the value back to
non-zero.

Signed-off-by: Philip Downey <pdowney@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 Documentation/networking/ip-sysctl.txt | 5 +++++
 1 file changed, 5 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index ac77a13d2ea2..ebe94f2cab98 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -1201,6 +1201,11 @@ xfrm4_gc_thresh - INTEGER
 	destination cache entries.  At twice this value the system will
 	refuse new allocations.
 
+igmp_link_local_mcast_reports - BOOLEAN
+	Enable IGMP reports for link local multicast groups in the
+	224.0.0.X range.
+	Default TRUE
+
 Alexey Kuznetsov.
 kuznet@ms2.inr.ac.ru
 
-- 
cgit v1.2.3


From a5597008dbc230876db2d344561d634f4d52ea4a Mon Sep 17 00:00:00 2001
From: Andrew Lunn <andrew@lunn.ch>
Date: Mon, 31 Aug 2015 15:56:53 +0200
Subject: phy: fixed_phy: Add gpio to determine link up/down.

An SFP module may have a link up/down status pin which can be
connection to a GPIO line of the host. Add support for reading such an
GPIO in the fixed_phy driver.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 .../devicetree/bindings/net/fixed-link.txt         | 14 +++++++++++-
 Documentation/networking/stmmac.txt                |  2 +-
 arch/m68k/coldfire/m5272.c                         |  2 +-
 arch/mips/ar7/platform.c                           |  5 +++--
 arch/mips/bcm47xx/setup.c                          |  2 +-
 drivers/net/ethernet/broadcom/genet/bcmmii.c       |  2 +-
 drivers/net/phy/fixed_phy.c                        | 26 +++++++++++++++++++---
 drivers/of/of_mdio.c                               | 13 ++++++++---
 include/linux/phy_fixed.h                          |  8 +++++--
 9 files changed, 59 insertions(+), 15 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/net/fixed-link.txt b/Documentation/devicetree/bindings/net/fixed-link.txt
index 82bf7e0f47b6..ec5d889fe3d8 100644
--- a/Documentation/devicetree/bindings/net/fixed-link.txt
+++ b/Documentation/devicetree/bindings/net/fixed-link.txt
@@ -17,6 +17,8 @@ properties:
   enabled.
 * 'asym-pause' (boolean, optional), to indicate that asym_pause should
   be enabled.
+* 'link-gpios' ('gpio-list', optional), to indicate if a gpio can be read
+  to determine if the link is up.
 
 Old, deprecated 'fixed-link' binding:
 
@@ -30,7 +32,7 @@ Old, deprecated 'fixed-link' binding:
   - e: asymmetric pause configuration: 0 for no asymmetric pause, 1 for
     asymmetric pause
 
-Example:
+Examples:
 
 ethernet@0 {
 	...
@@ -40,3 +42,13 @@ ethernet@0 {
 	};
 	...
 };
+
+ethernet@1 {
+	...
+	fixed-link {
+	      speed = <1000>;
+	      pause;
+	      link-gpios = <&gpio0 12 GPIO_ACTIVE_HIGH>;
+	};
+	...
+};
diff --git a/Documentation/networking/stmmac.txt b/Documentation/networking/stmmac.txt
index 2903b1cf4d70..d64a14714236 100644
--- a/Documentation/networking/stmmac.txt
+++ b/Documentation/networking/stmmac.txt
@@ -254,7 +254,7 @@ static struct fixed_phy_status stmmac0_fixed_phy_status = {
 
 During the board's device_init we can configure the first
 MAC for fixed_link by calling:
-  fixed_phy_add(PHY_POLL, 1, &stmmac0_fixed_phy_status));)
+  fixed_phy_add(PHY_POLL, 1, &stmmac0_fixed_phy_status, -1);
 and the second one, with a real PHY device attached to the bus,
 by using the stmmac_mdio_bus_data structure (to provide the id, the
 reset procedure etc).
diff --git a/arch/m68k/coldfire/m5272.c b/arch/m68k/coldfire/m5272.c
index b15219ed22bf..c525e4c08f84 100644
--- a/arch/m68k/coldfire/m5272.c
+++ b/arch/m68k/coldfire/m5272.c
@@ -126,7 +126,7 @@ static struct fixed_phy_status nettel_fixed_phy_status __initdata = {
 static int __init init_BSP(void)
 {
 	m5272_uarts_init();
-	fixed_phy_add(PHY_POLL, 0, &nettel_fixed_phy_status);
+	fixed_phy_add(PHY_POLL, 0, &nettel_fixed_phy_status, -1);
 	return 0;
 }
 
diff --git a/arch/mips/ar7/platform.c b/arch/mips/ar7/platform.c
index be9ff1673ded..298b97715d5f 100644
--- a/arch/mips/ar7/platform.c
+++ b/arch/mips/ar7/platform.c
@@ -679,7 +679,8 @@ static int __init ar7_register_devices(void)
 	}
 
 	if (ar7_has_high_cpmac()) {
-		res = fixed_phy_add(PHY_POLL, cpmac_high.id, &fixed_phy_status);
+		res = fixed_phy_add(PHY_POLL, cpmac_high.id,
+				    &fixed_phy_status, -1);
 		if (!res) {
 			cpmac_get_mac(1, cpmac_high_data.dev_addr);
 
@@ -692,7 +693,7 @@ static int __init ar7_register_devices(void)
 	} else
 		cpmac_low_data.phy_mask = 0xffffffff;
 
-	res = fixed_phy_add(PHY_POLL, cpmac_low.id, &fixed_phy_status);
+	res = fixed_phy_add(PHY_POLL, cpmac_low.id, &fixed_phy_status, -1);
 	if (!res) {
 		cpmac_get_mac(0, cpmac_low_data.dev_addr);
 		res = platform_device_register(&cpmac_low);
diff --git a/arch/mips/bcm47xx/setup.c b/arch/mips/bcm47xx/setup.c
index 98c075f81795..17503a05938e 100644
--- a/arch/mips/bcm47xx/setup.c
+++ b/arch/mips/bcm47xx/setup.c
@@ -263,7 +263,7 @@ static int __init bcm47xx_register_bus_complete(void)
 	bcm47xx_leds_register();
 	bcm47xx_workarounds();
 
-	fixed_phy_add(PHY_POLL, 0, &bcm47xx_fixed_phy_status);
+	fixed_phy_add(PHY_POLL, 0, &bcm47xx_fixed_phy_status, -1);
 	return 0;
 }
 device_initcall(bcm47xx_register_bus_complete);
diff --git a/drivers/net/ethernet/broadcom/genet/bcmmii.c b/drivers/net/ethernet/broadcom/genet/bcmmii.c
index b3679ad1c1c7..c8affad76f36 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmmii.c
+++ b/drivers/net/ethernet/broadcom/genet/bcmmii.c
@@ -585,7 +585,7 @@ static int bcmgenet_mii_pd_init(struct bcmgenet_priv *priv)
 			.asym_pause = 0,
 		};
 
-		phydev = fixed_phy_register(PHY_POLL, &fphy_status, NULL);
+		phydev = fixed_phy_register(PHY_POLL, &fphy_status, -1, NULL);
 		if (!phydev || IS_ERR(phydev)) {
 			dev_err(kdev, "failed to register fixed PHY device\n");
 			return -ENODEV;
diff --git a/drivers/net/phy/fixed_phy.c b/drivers/net/phy/fixed_phy.c
index 2f9457f05a2e..1bb70e3cc03e 100644
--- a/drivers/net/phy/fixed_phy.c
+++ b/drivers/net/phy/fixed_phy.c
@@ -22,6 +22,7 @@
 #include <linux/err.h>
 #include <linux/slab.h>
 #include <linux/of.h>
+#include <linux/gpio.h>
 
 #define MII_REGS_NUM 29
 
@@ -38,6 +39,7 @@ struct fixed_phy {
 	struct fixed_phy_status status;
 	int (*link_update)(struct net_device *, struct fixed_phy_status *);
 	struct list_head node;
+	int link_gpio;
 };
 
 static struct platform_device *pdev;
@@ -52,6 +54,9 @@ static int fixed_phy_update_regs(struct fixed_phy *fp)
 	u16 lpagb = 0;
 	u16 lpa = 0;
 
+	if (gpio_is_valid(fp->link_gpio))
+		fp->status.link = !!gpio_get_value_cansleep(fp->link_gpio);
+
 	if (!fp->status.link)
 		goto done;
 	bmsr |= BMSR_LSTATUS | BMSR_ANEGCOMPLETE;
@@ -215,7 +220,8 @@ int fixed_phy_update_state(struct phy_device *phydev,
 EXPORT_SYMBOL(fixed_phy_update_state);
 
 int fixed_phy_add(unsigned int irq, int phy_addr,
-		  struct fixed_phy_status *status)
+		  struct fixed_phy_status *status,
+		  int link_gpio)
 {
 	int ret;
 	struct fixed_mdio_bus *fmb = &platform_fmb;
@@ -231,15 +237,26 @@ int fixed_phy_add(unsigned int irq, int phy_addr,
 
 	fp->addr = phy_addr;
 	fp->status = *status;
+	fp->link_gpio = link_gpio;
+
+	if (gpio_is_valid(fp->link_gpio)) {
+		ret = gpio_request_one(fp->link_gpio, GPIOF_DIR_IN,
+				       "fixed-link-gpio-link");
+		if (ret)
+			goto err_regs;
+	}
 
 	ret = fixed_phy_update_regs(fp);
 	if (ret)
-		goto err_regs;
+		goto err_gpio;
 
 	list_add_tail(&fp->node, &fmb->phys);
 
 	return 0;
 
+err_gpio:
+	if (gpio_is_valid(fp->link_gpio))
+		gpio_free(fp->link_gpio);
 err_regs:
 	kfree(fp);
 	return ret;
@@ -254,6 +271,8 @@ void fixed_phy_del(int phy_addr)
 	list_for_each_entry_safe(fp, tmp, &fmb->phys, node) {
 		if (fp->addr == phy_addr) {
 			list_del(&fp->node);
+			if (gpio_is_valid(fp->link_gpio))
+				gpio_free(fp->link_gpio);
 			kfree(fp);
 			return;
 		}
@@ -266,6 +285,7 @@ static DEFINE_SPINLOCK(phy_fixed_addr_lock);
 
 struct phy_device *fixed_phy_register(unsigned int irq,
 				      struct fixed_phy_status *status,
+				      int link_gpio,
 				      struct device_node *np)
 {
 	struct fixed_mdio_bus *fmb = &platform_fmb;
@@ -282,7 +302,7 @@ struct phy_device *fixed_phy_register(unsigned int irq,
 	phy_addr = phy_fixed_addr++;
 	spin_unlock(&phy_fixed_addr_lock);
 
-	ret = fixed_phy_add(PHY_POLL, phy_addr, status);
+	ret = fixed_phy_add(PHY_POLL, phy_addr, status, link_gpio);
 	if (ret < 0)
 		return ERR_PTR(ret);
 
diff --git a/drivers/of/of_mdio.c b/drivers/of/of_mdio.c
index 7c8c23cc6896..1350fa25cdb0 100644
--- a/drivers/of/of_mdio.c
+++ b/drivers/of/of_mdio.c
@@ -16,6 +16,7 @@
 #include <linux/phy.h>
 #include <linux/phy_fixed.h>
 #include <linux/of.h>
+#include <linux/of_gpio.h>
 #include <linux/of_irq.h>
 #include <linux/of_mdio.h>
 #include <linux/module.h>
@@ -294,6 +295,7 @@ int of_phy_register_fixed_link(struct device_node *np)
 	struct fixed_phy_status status = {};
 	struct device_node *fixed_link_node;
 	const __be32 *fixed_link_prop;
+	int link_gpio;
 	int len, err;
 	struct phy_device *phy;
 	const char *managed;
@@ -302,7 +304,7 @@ int of_phy_register_fixed_link(struct device_node *np)
 	if (err == 0) {
 		if (strcmp(managed, "in-band-status") == 0) {
 			/* status is zeroed, namely its .link member */
-			phy = fixed_phy_register(PHY_POLL, &status, np);
+			phy = fixed_phy_register(PHY_POLL, &status, -1, np);
 			return IS_ERR(phy) ? PTR_ERR(phy) : 0;
 		}
 	}
@@ -318,8 +320,13 @@ int of_phy_register_fixed_link(struct device_node *np)
 		status.pause = of_property_read_bool(fixed_link_node, "pause");
 		status.asym_pause = of_property_read_bool(fixed_link_node,
 							  "asym-pause");
+		link_gpio = of_get_named_gpio_flags(fixed_link_node,
+						    "link-gpios", 0, NULL);
 		of_node_put(fixed_link_node);
-		phy = fixed_phy_register(PHY_POLL, &status, np);
+		if (link_gpio == -EPROBE_DEFER)
+			return -EPROBE_DEFER;
+
+		phy = fixed_phy_register(PHY_POLL, &status, link_gpio, np);
 		return IS_ERR(phy) ? PTR_ERR(phy) : 0;
 	}
 
@@ -331,7 +338,7 @@ int of_phy_register_fixed_link(struct device_node *np)
 		status.speed = be32_to_cpu(fixed_link_prop[2]);
 		status.pause = be32_to_cpu(fixed_link_prop[3]);
 		status.asym_pause = be32_to_cpu(fixed_link_prop[4]);
-		phy = fixed_phy_register(PHY_POLL, &status, np);
+		phy = fixed_phy_register(PHY_POLL, &status, -1, np);
 		return IS_ERR(phy) ? PTR_ERR(phy) : 0;
 	}
 
diff --git a/include/linux/phy_fixed.h b/include/linux/phy_fixed.h
index fe5732d53eda..2400d2ea4f34 100644
--- a/include/linux/phy_fixed.h
+++ b/include/linux/phy_fixed.h
@@ -13,9 +13,11 @@ struct device_node;
 
 #if IS_ENABLED(CONFIG_FIXED_PHY)
 extern int fixed_phy_add(unsigned int irq, int phy_id,
-			 struct fixed_phy_status *status);
+			 struct fixed_phy_status *status,
+			 int link_gpio);
 extern struct phy_device *fixed_phy_register(unsigned int irq,
 					     struct fixed_phy_status *status,
+					     int link_gpio,
 					     struct device_node *np);
 extern void fixed_phy_del(int phy_addr);
 extern int fixed_phy_set_link_update(struct phy_device *phydev,
@@ -26,12 +28,14 @@ extern int fixed_phy_update_state(struct phy_device *phydev,
 			   const struct fixed_phy_status *changed);
 #else
 static inline int fixed_phy_add(unsigned int irq, int phy_id,
-				struct fixed_phy_status *status)
+				struct fixed_phy_status *status,
+				int link_gpio)
 {
 	return -ENODEV;
 }
 static inline struct phy_device *fixed_phy_register(unsigned int irq,
 						struct fixed_phy_status *status,
+						int gpio_link,
 						struct device_node *np)
 {
 	return ERR_PTR(-ENODEV);
-- 
cgit v1.2.3


From 472204fe37083aa8f31aaf183b4072d432e7c5ea Mon Sep 17 00:00:00 2001
From: Mugunthan V N <mugunthanvnm@ti.com>
Date: Mon, 31 Aug 2015 11:51:29 +0530
Subject: Documentation: DT: cpsw: document missing compatible

CPSW driver has multiple compatibles for errata implementations but not
documented, add necessary documentation.

Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: Rob Herring <robh@kernel.org>
---
 Documentation/devicetree/bindings/net/cpsw.txt | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/net/cpsw.txt b/Documentation/devicetree/bindings/net/cpsw.txt
index 33fe8462edf4..a9df21aaa154 100644
--- a/Documentation/devicetree/bindings/net/cpsw.txt
+++ b/Documentation/devicetree/bindings/net/cpsw.txt
@@ -2,7 +2,11 @@ TI SoC Ethernet Switch Controller Device Tree Bindings
 ------------------------------------------------------
 
 Required properties:
-- compatible		: Should be "ti,cpsw"
+- compatible		: Should be one of the below:-
+			  "ti,cpsw" for backward compatible
+			  "ti,am335x-cpsw" for AM335x controllers
+			  "ti,am4372-cpsw" for AM437x controllers
+			  "ti,dra7-cpsw" for DRA7x controllers
 - reg			: physical base address and size of the cpsw
 			  registers map
 - interrupts		: property with a value describing the interrupt
-- 
cgit v1.2.3


From 9d6f85d9fe422aa42f3f337872050fb2a30cd430 Mon Sep 17 00:00:00 2001
From: Robert Jarzmik <robert.jarzmik@free.fr>
Date: Sun, 30 Aug 2015 21:44:12 +0200
Subject: mtd: nand: pxa3xx: add optional dma for pxa architecture

The PXA architecture provides a DMA to pump data from the nand
controller to memory and the other way around. Add it to the binding
description.

Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
Signed-off-by: Rob Herring <robh@kernel.org>
---
 Documentation/devicetree/bindings/mtd/pxa3xx-nand.txt | 3 +++
 1 file changed, 3 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/mtd/pxa3xx-nand.txt b/Documentation/devicetree/bindings/mtd/pxa3xx-nand.txt
index 4f833e3c4f51..d9b655f11048 100644
--- a/Documentation/devicetree/bindings/mtd/pxa3xx-nand.txt
+++ b/Documentation/devicetree/bindings/mtd/pxa3xx-nand.txt
@@ -11,6 +11,7 @@ Required properties:
 
 Optional properties:
 
+ - dmas:			dma data channel, see dma.txt binding doc
  - marvell,nand-enable-arbiter:	Set to enable the bus arbiter
  - marvell,nand-keep-config:	Set to keep the NAND controller config as set
 				by the bootloader
@@ -32,6 +33,8 @@ Example:
 		compatible = "marvell,pxa3xx-nand";
 		reg = <0x43100000 90>;
 		interrupts = <45>;
+		dmas = <&pdma 97 0>;
+		dma-names = "data";
 		#address-cells = <1>;
 
 		marvell,nand-enable-arbiter;
-- 
cgit v1.2.3


From c9c96ae2c57d91ea2b73ef447fdd44c760a96d97 Mon Sep 17 00:00:00 2001
From: Pi-Cheng Chen <pi-cheng.chen@linaro.org>
Date: Mon, 17 Aug 2015 17:24:23 +0800
Subject: dt-bindings: mediatek: Add MT8173 CPU DVFS clock bindings

This patch adds the clock and regulator consumer properties part of
document for CPU DVFS clocks on Mediatek MT8173 SoC.

Signed-off-by: Pi-Cheng Chen <pi-cheng.chen@linaro.org>
Acked-by: Michael Turquette <mturquette@baylibre.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 .../devicetree/bindings/clock/mt8173-cpu-dvfs.txt  | 83 ++++++++++++++++++++++
 1 file changed, 83 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/clock/mt8173-cpu-dvfs.txt

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/clock/mt8173-cpu-dvfs.txt b/Documentation/devicetree/bindings/clock/mt8173-cpu-dvfs.txt
new file mode 100644
index 000000000000..52b457c23eed
--- /dev/null
+++ b/Documentation/devicetree/bindings/clock/mt8173-cpu-dvfs.txt
@@ -0,0 +1,83 @@
+Device Tree Clock bindins for CPU DVFS of Mediatek MT8173 SoC
+
+Required properties:
+- clocks: A list of phandle + clock-specifier pairs for the clocks listed in clock names.
+- clock-names: Should contain the following:
+	"cpu"		- The multiplexer for clock input of CPU cluster.
+	"intermediate"	- A parent of "cpu" clock which is used as "intermediate" clock
+			  source (usually MAINPLL) when the original CPU PLL is under
+			  transition and not stable yet.
+	Please refer to Documentation/devicetree/bindings/clk/clock-bindings.txt for
+	generic clock consumer properties.
+- proc-supply: Regulator for Vproc of CPU cluster.
+
+Optional properties:
+- sram-supply: Regulator for Vsram of CPU cluster. When present, the cpufreq driver
+	       needs to do "voltage tracking" to step by step scale up/down Vproc and
+	       Vsram to fit SoC specific needs. When absent, the voltage scaling
+	       flow is handled by hardware, hence no software "voltage tracking" is
+	       needed.
+
+Example:
+--------
+	cpu0: cpu@0 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a53";
+		reg = <0x000>;
+		enable-method = "psci";
+		cpu-idle-states = <&CPU_SLEEP_0>;
+		clocks = <&infracfg CLK_INFRA_CA53SEL>,
+			 <&apmixedsys CLK_APMIXED_MAINPLL>;
+		clock-names = "cpu", "intermediate";
+	};
+
+	cpu1: cpu@1 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a53";
+		reg = <0x001>;
+		enable-method = "psci";
+		cpu-idle-states = <&CPU_SLEEP_0>;
+		clocks = <&infracfg CLK_INFRA_CA53SEL>,
+			 <&apmixedsys CLK_APMIXED_MAINPLL>;
+		clock-names = "cpu", "intermediate";
+	};
+
+	cpu2: cpu@100 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a57";
+		reg = <0x100>;
+		enable-method = "psci";
+		cpu-idle-states = <&CPU_SLEEP_0>;
+		clocks = <&infracfg CLK_INFRA_CA57SEL>,
+			 <&apmixedsys CLK_APMIXED_MAINPLL>;
+		clock-names = "cpu", "intermediate";
+	};
+
+	cpu3: cpu@101 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a57";
+		reg = <0x101>;
+		enable-method = "psci";
+		cpu-idle-states = <&CPU_SLEEP_0>;
+		clocks = <&infracfg CLK_INFRA_CA57SEL>,
+			 <&apmixedsys CLK_APMIXED_MAINPLL>;
+		clock-names = "cpu", "intermediate";
+	};
+
+	&cpu0 {
+		proc-supply = <&mt6397_vpca15_reg>;
+	};
+
+	&cpu1 {
+		proc-supply = <&mt6397_vpca15_reg>;
+	};
+
+	&cpu2 {
+		proc-supply = <&da9211_vcpu_reg>;
+		sram-supply = <&mt6397_vsramca7_reg>;
+	};
+
+	&cpu3 {
+		proc-supply = <&da9211_vcpu_reg>;
+		sram-supply = <&mt6397_vsramca7_reg>;
+	};
-- 
cgit v1.2.3


From 6bfb7c7434f75d29241413dc7e784295ba56de98 Mon Sep 17 00:00:00 2001
From: Viresh Kumar <viresh.kumar@linaro.org>
Date: Mon, 3 Aug 2015 08:36:14 +0530
Subject: cpufreq: remove redundant CPUFREQ_INCOMPATIBLE notifier event

What's being done from CPUFREQ_INCOMPATIBLE, can also be done with
CPUFREQ_ADJUST. There is nothing special with CPUFREQ_INCOMPATIBLE
notifier.

Kill CPUFREQ_INCOMPATIBLE and fix its usage sites.

This also updates the numbering of notifier events to remove holes.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 Documentation/cpu-freq/core.txt       | 7 ++-----
 drivers/acpi/processor_perflib.c      | 2 +-
 drivers/cpufreq/cpufreq.c             | 4 ----
 drivers/cpufreq/ppc_cbe_cpufreq_pmi.c | 4 ++--
 drivers/video/fbdev/pxafb.c           | 1 -
 drivers/video/fbdev/sa1100fb.c        | 1 -
 include/linux/cpufreq.h               | 9 ++++-----
 7 files changed, 9 insertions(+), 19 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/cpu-freq/core.txt b/Documentation/cpu-freq/core.txt
index 70933eadc308..ba78e7c2a069 100644
--- a/Documentation/cpu-freq/core.txt
+++ b/Documentation/cpu-freq/core.txt
@@ -55,16 +55,13 @@ transition notifiers.
 ----------------------------
 
 These are notified when a new policy is intended to be set. Each
-CPUFreq policy notifier is called three times for a policy transition:
+CPUFreq policy notifier is called twice for a policy transition:
 
 1.) During CPUFREQ_ADJUST all CPUFreq notifiers may change the limit if
     they see a need for this - may it be thermal considerations or
     hardware limitations.
 
-2.) During CPUFREQ_INCOMPATIBLE only changes may be done in order to avoid
-    hardware failure.
-
-3.) And during CPUFREQ_NOTIFY all notifiers are informed of the new policy
+2.) And during CPUFREQ_NOTIFY all notifiers are informed of the new policy
    - if two hardware drivers failed to agree on a new policy before this
    stage, the incompatible hardware shall be shut down, and the user
    informed of this.
diff --git a/drivers/acpi/processor_perflib.c b/drivers/acpi/processor_perflib.c
index 36b6da2918a6..91941f1cdecb 100644
--- a/drivers/acpi/processor_perflib.c
+++ b/drivers/acpi/processor_perflib.c
@@ -87,7 +87,7 @@ static int acpi_processor_ppc_notifier(struct notifier_block *nb,
 	if (ignore_ppc)
 		return 0;
 
-	if (event != CPUFREQ_INCOMPATIBLE)
+	if (event != CPUFREQ_ADJUST)
 		return 0;
 
 	mutex_lock(&performance_mutex);
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 76a26609d96b..293f47b814bf 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -2206,10 +2206,6 @@ static int cpufreq_set_policy(struct cpufreq_policy *policy,
 	blocking_notifier_call_chain(&cpufreq_policy_notifier_list,
 			CPUFREQ_ADJUST, new_policy);
 
-	/* adjust if necessary - hardware incompatibility*/
-	blocking_notifier_call_chain(&cpufreq_policy_notifier_list,
-			CPUFREQ_INCOMPATIBLE, new_policy);
-
 	/*
 	 * verify the cpu speed can be set within this limit, which might be
 	 * different to the first one
diff --git a/drivers/cpufreq/ppc_cbe_cpufreq_pmi.c b/drivers/cpufreq/ppc_cbe_cpufreq_pmi.c
index d29e8da396a0..7969f7690498 100644
--- a/drivers/cpufreq/ppc_cbe_cpufreq_pmi.c
+++ b/drivers/cpufreq/ppc_cbe_cpufreq_pmi.c
@@ -97,8 +97,8 @@ static int pmi_notifier(struct notifier_block *nb,
 	struct cpufreq_frequency_table *cbe_freqs;
 	u8 node;
 
-	/* Should this really be called for CPUFREQ_ADJUST, CPUFREQ_INCOMPATIBLE
-	 * and CPUFREQ_NOTIFY policy events?)
+	/* Should this really be called for CPUFREQ_ADJUST and CPUFREQ_NOTIFY
+	 * policy events?)
 	 */
 	if (event == CPUFREQ_START)
 		return 0;
diff --git a/drivers/video/fbdev/pxafb.c b/drivers/video/fbdev/pxafb.c
index 7245611ec963..94813af97f09 100644
--- a/drivers/video/fbdev/pxafb.c
+++ b/drivers/video/fbdev/pxafb.c
@@ -1668,7 +1668,6 @@ pxafb_freq_policy(struct notifier_block *nb, unsigned long val, void *data)
 
 	switch (val) {
 	case CPUFREQ_ADJUST:
-	case CPUFREQ_INCOMPATIBLE:
 		pr_debug("min dma period: %d ps, "
 			"new clock %d kHz\n", pxafb_display_dma_period(var),
 			policy->max);
diff --git a/drivers/video/fbdev/sa1100fb.c b/drivers/video/fbdev/sa1100fb.c
index 89dd7e02197f..dcf774c15889 100644
--- a/drivers/video/fbdev/sa1100fb.c
+++ b/drivers/video/fbdev/sa1100fb.c
@@ -1042,7 +1042,6 @@ sa1100fb_freq_policy(struct notifier_block *nb, unsigned long val,
 
 	switch (val) {
 	case CPUFREQ_ADJUST:
-	case CPUFREQ_INCOMPATIBLE:
 		dev_dbg(fbi->dev, "min dma period: %d ps, "
 			"new clock %d kHz\n", sa1100fb_min_dma_period(fbi),
 			policy->max);
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index bde1e567b3a9..bedcc90c0757 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -369,11 +369,10 @@ static inline void cpufreq_resume(void) {}
 
 /* Policy Notifiers  */
 #define CPUFREQ_ADJUST			(0)
-#define CPUFREQ_INCOMPATIBLE		(1)
-#define CPUFREQ_NOTIFY			(2)
-#define CPUFREQ_START			(3)
-#define CPUFREQ_CREATE_POLICY		(4)
-#define CPUFREQ_REMOVE_POLICY		(5)
+#define CPUFREQ_NOTIFY			(1)
+#define CPUFREQ_START			(2)
+#define CPUFREQ_CREATE_POLICY		(3)
+#define CPUFREQ_REMOVE_POLICY		(4)
 
 #ifdef CONFIG_CPU_FREQ
 int cpufreq_register_notifier(struct notifier_block *nb, unsigned int list);
-- 
cgit v1.2.3


From b9c93646fd5cb669d096fec5ad25a01f04cfde27 Mon Sep 17 00:00:00 2001
From: Kishon Vijay Abraham I <kishon@ti.com>
Date: Thu, 3 Sep 2015 12:20:37 +0530
Subject: regulator: pbias: program pbias register offset in pbias driver

Add separate compatible strings for every platform and populate the
pbias register offset in the driver data.
This helps avoid depending on the dt for pbias register offset.

Also update the dt binding documentation for the new compatible
strings.

Suggested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
Acked-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
---
 .../bindings/regulator/pbias-regulator.txt         |  7 ++-
 drivers/regulator/pbias-regulator.c                | 56 +++++++++++++++++++---
 2 files changed, 56 insertions(+), 7 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/regulator/pbias-regulator.txt b/Documentation/devicetree/bindings/regulator/pbias-regulator.txt
index 32aa26f1e434..acbcb452a69a 100644
--- a/Documentation/devicetree/bindings/regulator/pbias-regulator.txt
+++ b/Documentation/devicetree/bindings/regulator/pbias-regulator.txt
@@ -2,7 +2,12 @@ PBIAS internal regulator for SD card dual voltage i/o pads on OMAP SoCs.
 
 Required properties:
 - compatible:
-  - "ti,pbias-omap" for OMAP2, OMAP3, OMAP4, OMAP5, DRA7.
+  - should be "ti,pbias-dra7" for DRA7
+  - should be "ti,pbias-omap2" for OMAP2
+  - should be "ti,pbias-omap3" for OMAP3
+  - should be "ti,pbias-omap4" for OMAP4
+  - should be "ti,pbias-omap5" for OMAP5
+  - "ti,pbias-omap" is deprecated
 - reg: pbias register offset from syscon base and size of pbias register.
 - syscon : phandle of the system control module
 - regulator-name : should be
diff --git a/drivers/regulator/pbias-regulator.c b/drivers/regulator/pbias-regulator.c
index bd2b75c0d1d1..c21cedbdf451 100644
--- a/drivers/regulator/pbias-regulator.c
+++ b/drivers/regulator/pbias-regulator.c
@@ -44,6 +44,10 @@ struct pbias_regulator_data {
 	int voltage;
 };
 
+struct pbias_of_data {
+	unsigned int offset;
+};
+
 static const unsigned int pbias_volt_table[] = {
 	1800000,
 	3000000
@@ -98,8 +102,35 @@ static struct of_regulator_match pbias_matches[] = {
 };
 #define PBIAS_NUM_REGS	ARRAY_SIZE(pbias_matches)
 
+/* Offset from SCM general area (and syscon) base */
+
+static const struct pbias_of_data pbias_of_data_omap2 = {
+	.offset = 0x230,
+};
+
+static const struct pbias_of_data pbias_of_data_omap3 = {
+	.offset = 0x2b0,
+};
+
+static const struct pbias_of_data pbias_of_data_omap4 = {
+	.offset = 0x60,
+};
+
+static const struct pbias_of_data pbias_of_data_omap5 = {
+	.offset = 0x60,
+};
+
+static const struct pbias_of_data pbias_of_data_dra7 = {
+	.offset = 0xe00,
+};
+
 static const struct of_device_id pbias_of_match[] = {
 	{ .compatible = "ti,pbias-omap", },
+	{ .compatible = "ti,pbias-omap2", .data = &pbias_of_data_omap2, },
+	{ .compatible = "ti,pbias-omap3", .data = &pbias_of_data_omap3, },
+	{ .compatible = "ti,pbias-omap4", .data = &pbias_of_data_omap4, },
+	{ .compatible = "ti,pbias-omap5", .data = &pbias_of_data_omap5, },
+	{ .compatible = "ti,pbias-dra7", .data = &pbias_of_data_dra7, },
 	{},
 };
 MODULE_DEVICE_TABLE(of, pbias_of_match);
@@ -114,6 +145,9 @@ static int pbias_regulator_probe(struct platform_device *pdev)
 	const struct pbias_reg_info *info;
 	int ret = 0;
 	int count, idx, data_idx = 0;
+	const struct of_device_id *match;
+	const struct pbias_of_data *data;
+	unsigned int offset;
 
 	count = of_regulator_match(&pdev->dev, np, pbias_matches,
 						PBIAS_NUM_REGS);
@@ -129,6 +163,20 @@ static int pbias_regulator_probe(struct platform_device *pdev)
 	if (IS_ERR(syscon))
 		return PTR_ERR(syscon);
 
+	match = of_match_device(of_match_ptr(pbias_of_match), &pdev->dev);
+	if (match && match->data) {
+		data = match->data;
+		offset = data->offset;
+	} else {
+		res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+		if (!res)
+			return -EINVAL;
+
+		offset = res->start;
+		dev_WARN(&pdev->dev,
+			 "using legacy dt data for pbias offset\n");
+	}
+
 	cfg.regmap = syscon;
 	cfg.dev = &pdev->dev;
 
@@ -141,10 +189,6 @@ static int pbias_regulator_probe(struct platform_device *pdev)
 		if (!info)
 			return -ENODEV;
 
-		res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
-		if (!res)
-			return -EINVAL;
-
 		drvdata[data_idx].syscon = syscon;
 		drvdata[data_idx].info = info;
 		drvdata[data_idx].desc.name = info->name;
@@ -154,9 +198,9 @@ static int pbias_regulator_probe(struct platform_device *pdev)
 		drvdata[data_idx].desc.volt_table = pbias_volt_table;
 		drvdata[data_idx].desc.n_voltages = 2;
 		drvdata[data_idx].desc.enable_time = info->enable_time;
-		drvdata[data_idx].desc.vsel_reg = res->start;
+		drvdata[data_idx].desc.vsel_reg = offset;
 		drvdata[data_idx].desc.vsel_mask = info->vmode;
-		drvdata[data_idx].desc.enable_reg = res->start;
+		drvdata[data_idx].desc.enable_reg = offset;
 		drvdata[data_idx].desc.enable_mask = info->enable_mask;
 		drvdata[data_idx].desc.enable_val = info->enable;
 
-- 
cgit v1.2.3


From 0f6ce77538c3f0628acdeee30738e4c8fe08d7e2 Mon Sep 17 00:00:00 2001
From: James Hogan <james.hogan@imgtec.com>
Date: Wed, 15 Jul 2015 16:17:42 +0100
Subject: Documentation/sysrq.txt: Mention MIPS TLB dump (x)

Commit d1e9a4f54735 ("MIPS: Add SysRq operation to dump TLBs on all
CPUs") added the 'x' sysrq key for dumping MIPS TLB entries, but didn't
document it in Documentation/sysrq.txt.

Add mention of the MIPS use of the 'x' SysRq key.

Reported-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Acked-by: Jonathan Corbet <corbet@lwn.net>
Cc: Maciej W. Rozycki <macro@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: linux-doc@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/10720/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
---
 Documentation/sysrq.txt | 1 +
 1 file changed, 1 insertion(+)

(limited to 'Documentation')

diff --git a/Documentation/sysrq.txt b/Documentation/sysrq.txt
index 0e307c94809a..267f39386f99 100644
--- a/Documentation/sysrq.txt
+++ b/Documentation/sysrq.txt
@@ -119,6 +119,7 @@ On all -  write a character to /proc/sysrq-trigger.  e.g.:
 
 'x'	- Used by xmon interface on ppc/powerpc platforms.
           Show global PMU Registers on sparc64.
+          Dump all TLB entries on MIPS.
 
 'y'	- Show global CPU Registers [SPARC-64 specific]
 
-- 
cgit v1.2.3


From b7565cc3321987e46d1f65f5b5eb3fb48268fdc2 Mon Sep 17 00:00:00 2001
From: Ralf Baechle <ralf@linux-mips.org>
Date: Wed, 29 Jul 2015 23:17:00 +0200
Subject: Documentation: MIPS now supports uprobes.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
---
 Documentation/features/debug/uprobes/arch-support.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/features/debug/uprobes/arch-support.txt b/Documentation/features/debug/uprobes/arch-support.txt
index 4efe36c3ace9..d605c3fc38fd 100644
--- a/Documentation/features/debug/uprobes/arch-support.txt
+++ b/Documentation/features/debug/uprobes/arch-support.txt
@@ -22,7 +22,7 @@
     |        m68k: | TODO |
     |       metag: | TODO |
     |  microblaze: | TODO |
-    |        mips: | TODO |
+    |        mips: |  ok  |
     |     mn10300: | TODO |
     |       nios2: | TODO |
     |    openrisc: | TODO |
-- 
cgit v1.2.3


From d6ed2b9b60cb03091e0c84c270d145a475606297 Mon Sep 17 00:00:00 2001
From: Ezequiel Garcia <ezequiel.garcia@imgtec.com>
Date: Mon, 27 Jul 2015 15:02:33 +0100
Subject: Documentation: dt: Add Pistachio SoC general purpose timer binding
 document

Add a device-tree binding document for the clocksource driver provided
by Pistachio SoC general purpose timers.

Signed-off-by: Ezequiel Garcia <ezequiel.garcia@imgtec.com>
Reviewed-by: Andrew Bresticker <abrestic@chromium.org>
Cc: linux-kernel@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: devicetree@vger.kernel.org
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: James Hartley <James.Hartley@imgtec.com>
Cc: Govindraj Raja <Govindraj.Raja@imgtec.com>
Cc: Damien Horsley <Damien.Horsley@imgtec.com>
Cc: James Hogan <James.Hogan@imgtec.com>
Cc: Ezequiel Garcia <ezequiel@vanguardiasur.com.ar>
Patchwork: https://patchwork.linux-mips.org/patch/10783/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
---
 .../bindings/timer/img,pistachio-gptimer.txt       | 28 ++++++++++++++++++++++
 1 file changed, 28 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/timer/img,pistachio-gptimer.txt

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/timer/img,pistachio-gptimer.txt b/Documentation/devicetree/bindings/timer/img,pistachio-gptimer.txt
new file mode 100644
index 000000000000..7afce80bf6a0
--- /dev/null
+++ b/Documentation/devicetree/bindings/timer/img,pistachio-gptimer.txt
@@ -0,0 +1,28 @@
+* Pistachio general-purpose timer based clocksource
+
+Required properties:
+ - compatible: "img,pistachio-gptimer".
+ - reg: Address range of the timer registers.
+ - interrupts: An interrupt for each of the four timers
+ - clocks: Should contain a clock specifier for each entry in clock-names
+ - clock-names: Should contain the following entries:
+                "sys", interface clock
+                "slow", slow counter clock
+                "fast", fast counter clock
+ - img,cr-periph: Must contain a phandle to the peripheral control
+		  syscon node.
+
+Example:
+	timer: timer@18102000 {
+		compatible = "img,pistachio-gptimer";
+		reg = <0x18102000 0x100>;
+		interrupts = <GIC_SHARED 60 IRQ_TYPE_LEVEL_HIGH>,
+			     <GIC_SHARED 61 IRQ_TYPE_LEVEL_HIGH>,
+			     <GIC_SHARED 62 IRQ_TYPE_LEVEL_HIGH>,
+			     <GIC_SHARED 63 IRQ_TYPE_LEVEL_HIGH>;
+		clocks = <&clk_periph PERIPH_CLK_COUNTER_FAST>,
+		         <&clk_periph PERIPH_CLK_COUNTER_SLOW>,
+			 <&cr_periph SYS_CLK_TIMER>;
+		clock-names = "fast", "slow", "sys";
+		img,cr-periph = <&cr_periph>;
+	};
-- 
cgit v1.2.3


From 062683901ad5c29ac375e6b7c7bca2737d41e11a Mon Sep 17 00:00:00 2001
From: Masanari Iida <standby24x7@gmail.com>
Date: Mon, 17 Aug 2015 08:20:56 -0300
Subject: [media] DocBook media: Fix typo "the the" in xml files

This patch fix spelling typo "the the" found in controls.xml
and vidioc-g-param.xml.
These xml files are'nt generated from any source files, so I have
to fix these xml files directly.

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
---
 Documentation/DocBook/media/v4l/controls.xml      | 2 +-
 Documentation/DocBook/media/v4l/vidioc-g-parm.xml | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/DocBook/media/v4l/controls.xml b/Documentation/DocBook/media/v4l/controls.xml
index 6e1667b5f3eb..33aece541880 100644
--- a/Documentation/DocBook/media/v4l/controls.xml
+++ b/Documentation/DocBook/media/v4l/controls.xml
@@ -3414,7 +3414,7 @@ giving priority to the center of the metered area.</entry>
 		<row>
 		  <entry><constant>V4L2_EXPOSURE_METERING_MATRIX</constant>&nbsp;</entry>
 		  <entry>A multi-zone metering. The light intensity is measured
-in several points of the frame and the the results are combined. The
+in several points of the frame and the results are combined. The
 algorithm of the zones selection and their significance in calculating the
 final value is device dependent.</entry>
 		</row>
diff --git a/Documentation/DocBook/media/v4l/vidioc-g-parm.xml b/Documentation/DocBook/media/v4l/vidioc-g-parm.xml
index f4e28e7d4751..721728745407 100644
--- a/Documentation/DocBook/media/v4l/vidioc-g-parm.xml
+++ b/Documentation/DocBook/media/v4l/vidioc-g-parm.xml
@@ -267,7 +267,7 @@ is intended for still imaging applications. The idea is to get the
 best possible image quality that the hardware can deliver. It is not
 defined how the driver writer may achieve that; it will depend on the
 hardware and the ingenuity of the driver writer. High quality mode is
-a different mode from the the regular motion video capture modes. In
+a different mode from the regular motion video capture modes. In
 high quality mode:<itemizedlist>
 		  <listitem>
 		    <para>The driver may be able to capture higher
-- 
cgit v1.2.3


From 92e847212676bb3c5f9f7e317907367dbb8c504b Mon Sep 17 00:00:00 2001
From: Corey Minyard <cminyard@mvista.com>
Date: Thu, 3 Sep 2015 14:58:55 -0500
Subject: ipmi: Add device tree bindings information

Signed-off-by: Corey Minyard <cminyard@mvista.com>
---
 Documentation/devicetree/bindings/ipmi.txt | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/ipmi.txt

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/ipmi.txt b/Documentation/devicetree/bindings/ipmi.txt
new file mode 100644
index 000000000000..d5f1a877ed3e
--- /dev/null
+++ b/Documentation/devicetree/bindings/ipmi.txt
@@ -0,0 +1,25 @@
+IPMI device
+
+Required properties:
+- compatible: should be one of ipmi-kcs, ipmi-smic, or ipmi-bt
+- device_type: should be ipmi
+- reg: Address and length of the register set for the device
+
+Optional properties:
+- interrupts: The interrupt for the device.  Without this the interface
+	is polled.
+- reg-size - The size of the register.  Defaults to 1
+- reg-spacing - The number of bytes between register starts.  Defaults to 1
+- reg-shift - The amount to shift the registers to the right to get the data
+	into bit zero.
+
+Example:
+
+smic@fff3a000 {
+	compatible = "ipmi-smic";
+	device_type = "ipmi";
+	reg = <0xfff3a000 0x1000>;
+	interrupts = <0 24 4>;
+	reg-size = <4>;
+	reg-spacing = <4>;
+};
-- 
cgit v1.2.3


From 25edd8bffd0f7563f0c04c1d219eb89061ce9886 Mon Sep 17 00:00:00 2001
From: Andrea Arcangeli <aarcange@redhat.com>
Date: Fri, 4 Sep 2015 15:46:00 -0700
Subject: userfaultfd: linux/Documentation/vm/userfaultfd.txt

This is the latest userfaultfd patchset.  The postcopy live migration
feature on the qemu side is mostly ready to be merged and it entirely
depends on the userfaultfd syscall to be merged as well.  So it'd be great
if this patchset could be reviewed for merging in -mm.

Userfaults allow to implement on demand paging from userland and more
generally they allow userland to more efficiently take control of the
behavior of page faults than what was available before (PROT_NONE +
SIGSEGV trap).

The use cases are:

1) KVM postcopy live migration (one form of cloud memory
   externalization).

   KVM postcopy live migration is the primary driver of this work:

    http://blog.zhaw.ch/icclab/setting-up-post-copy-live-migration-in-openstack/
    http://lists.gnu.org/archive/html/qemu-devel/2015-02/msg04873.html

2) postcopy live migration of binaries inside linux containers:

    http://thread.gmane.org/gmane.linux.kernel.mm/132662

3) KVM postcopy live snapshotting (allowing to limit/throttle the
   memory usage, unlike fork would, plus the avoidance of fork
   overhead in the first place).

   While the wrprotect tracking is not implemented yet, the syscall API is
   already contemplating the wrprotect fault tracking and it's generic enough
   to allow its later implementation in a backwards compatible fashion.

4) KVM userfaults on shared memory. The UFFDIO_COPY lowlevel method
   should be extended to work also on tmpfs and then the
   uffdio_register.ioctls will notify userland that UFFDIO_COPY is
   available even when the registered virtual memory range is tmpfs
   backed.

5) alternate mechanism to notify web browsers or apps on embedded
   devices that volatile pages have been reclaimed. This basically
   avoids the need to run a syscall before the app can access with the
   CPU the virtual regions marked volatile. This depends on point 4)
   to be fulfilled first, as volatile pages happily apply to tmpfs.

Even though there wasn't a real use case requesting it yet, it also
allows to implement distributed shared memory in a way that readonly
shared mappings can exist simultaneously in different hosts and they
can be become exclusive at the first wrprotect fault.

This patch (of 22):

Add documentation.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Cc: Sanidhya Kashyap <sanidhya.gatech@gmail.com>
Cc: zhang.zhanghailiang@huawei.com
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Andres Lagar-Cavilla <andreslc@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Hugh Dickins <hughd@google.com>
Cc: Peter Feiner <pfeiner@google.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Huangpeng (Peter)" <peter.huangpeng@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 Documentation/vm/userfaultfd.txt | 142 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 142 insertions(+)
 create mode 100644 Documentation/vm/userfaultfd.txt

(limited to 'Documentation')

diff --git a/Documentation/vm/userfaultfd.txt b/Documentation/vm/userfaultfd.txt
new file mode 100644
index 000000000000..90912925425e
--- /dev/null
+++ b/Documentation/vm/userfaultfd.txt
@@ -0,0 +1,142 @@
+= Userfaultfd =
+
+== Objective ==
+
+Userfaults allow the implementation of on-demand paging from userland
+and more generally they allow userland to take control of various
+memory page faults, something otherwise only the kernel code could do.
+
+For example userfaults allows a proper and more optimal implementation
+of the PROT_NONE+SIGSEGV trick.
+
+== Design ==
+
+Userfaults are delivered and resolved through the userfaultfd syscall.
+
+The userfaultfd (aside from registering and unregistering virtual
+memory ranges) provides two primary functionalities:
+
+1) read/POLLIN protocol to notify a userland thread of the faults
+   happening
+
+2) various UFFDIO_* ioctls that can manage the virtual memory regions
+   registered in the userfaultfd that allows userland to efficiently
+   resolve the userfaults it receives via 1) or to manage the virtual
+   memory in the background
+
+The real advantage of userfaults if compared to regular virtual memory
+management of mremap/mprotect is that the userfaults in all their
+operations never involve heavyweight structures like vmas (in fact the
+userfaultfd runtime load never takes the mmap_sem for writing).
+
+Vmas are not suitable for page- (or hugepage) granular fault tracking
+when dealing with virtual address spaces that could span
+Terabytes. Too many vmas would be needed for that.
+
+The userfaultfd once opened by invoking the syscall, can also be
+passed using unix domain sockets to a manager process, so the same
+manager process could handle the userfaults of a multitude of
+different processes without them being aware about what is going on
+(well of course unless they later try to use the userfaultfd
+themselves on the same region the manager is already tracking, which
+is a corner case that would currently return -EBUSY).
+
+== API ==
+
+When first opened the userfaultfd must be enabled invoking the
+UFFDIO_API ioctl specifying a uffdio_api.api value set to UFFD_API (or
+a later API version) which will specify the read/POLLIN protocol
+userland intends to speak on the UFFD. The UFFDIO_API ioctl if
+successful (i.e. if the requested uffdio_api.api is spoken also by the
+running kernel), will return into uffdio_api.features and
+uffdio_api.ioctls two 64bit bitmasks of respectively the activated
+feature of the read(2) protocol and the generic ioctl available.
+
+Once the userfaultfd has been enabled the UFFDIO_REGISTER ioctl should
+be invoked (if present in the returned uffdio_api.ioctls bitmask) to
+register a memory range in the userfaultfd by setting the
+uffdio_register structure accordingly. The uffdio_register.mode
+bitmask will specify to the kernel which kind of faults to track for
+the range (UFFDIO_REGISTER_MODE_MISSING would track missing
+pages). The UFFDIO_REGISTER ioctl will return the
+uffdio_register.ioctls bitmask of ioctls that are suitable to resolve
+userfaults on the range registered. Not all ioctls will necessarily be
+supported for all memory types depending on the underlying virtual
+memory backend (anonymous memory vs tmpfs vs real filebacked
+mappings).
+
+Userland can use the uffdio_register.ioctls to manage the virtual
+address space in the background (to add or potentially also remove
+memory from the userfaultfd registered range). This means a userfault
+could be triggering just before userland maps in the background the
+user-faulted page.
+
+The primary ioctl to resolve userfaults is UFFDIO_COPY. That
+atomically copies a page into the userfault registered range and wakes
+up the blocked userfaults (unless uffdio_copy.mode &
+UFFDIO_COPY_MODE_DONTWAKE is set). Other ioctl works similarly to
+UFFDIO_COPY. They're atomic as in guaranteeing that nothing can see an
+half copied page since it'll keep userfaulting until the copy has
+finished.
+
+== QEMU/KVM ==
+
+QEMU/KVM is using the userfaultfd syscall to implement postcopy live
+migration. Postcopy live migration is one form of memory
+externalization consisting of a virtual machine running with part or
+all of its memory residing on a different node in the cloud. The
+userfaultfd abstraction is generic enough that not a single line of
+KVM kernel code had to be modified in order to add postcopy live
+migration to QEMU.
+
+Guest async page faults, FOLL_NOWAIT and all other GUP features work
+just fine in combination with userfaults. Userfaults trigger async
+page faults in the guest scheduler so those guest processes that
+aren't waiting for userfaults (i.e. network bound) can keep running in
+the guest vcpus.
+
+It is generally beneficial to run one pass of precopy live migration
+just before starting postcopy live migration, in order to avoid
+generating userfaults for readonly guest regions.
+
+The implementation of postcopy live migration currently uses one
+single bidirectional socket but in the future two different sockets
+will be used (to reduce the latency of the userfaults to the minimum
+possible without having to decrease /proc/sys/net/ipv4/tcp_wmem).
+
+The QEMU in the source node writes all pages that it knows are missing
+in the destination node, into the socket, and the migration thread of
+the QEMU running in the destination node runs UFFDIO_COPY|ZEROPAGE
+ioctls on the userfaultfd in order to map the received pages into the
+guest (UFFDIO_ZEROCOPY is used if the source page was a zero page).
+
+A different postcopy thread in the destination node listens with
+poll() to the userfaultfd in parallel. When a POLLIN event is
+generated after a userfault triggers, the postcopy thread read() from
+the userfaultfd and receives the fault address (or -EAGAIN in case the
+userfault was already resolved and waken by a UFFDIO_COPY|ZEROPAGE run
+by the parallel QEMU migration thread).
+
+After the QEMU postcopy thread (running in the destination node) gets
+the userfault address it writes the information about the missing page
+into the socket. The QEMU source node receives the information and
+roughly "seeks" to that page address and continues sending all
+remaining missing pages from that new page offset. Soon after that
+(just the time to flush the tcp_wmem queue through the network) the
+migration thread in the QEMU running in the destination node will
+receive the page that triggered the userfault and it'll map it as
+usual with the UFFDIO_COPY|ZEROPAGE (without actually knowing if it
+was spontaneously sent by the source or if it was an urgent page
+requested through an userfault).
+
+By the time the userfaults start, the QEMU in the destination node
+doesn't need to keep any per-page state bitmap relative to the live
+migration around and a single per-page bitmap has to be maintained in
+the QEMU running in the source node to know which pages are still
+missing in the destination node. The bitmap in the source node is
+checked to find which missing pages to send in round robin and we seek
+over it when receiving incoming userfaults. After sending each page of
+course the bitmap is updated accordingly. It's also useful to avoid
+sending the same page twice (in case the userfault is read by the
+postcopy thread just before UFFDIO_COPY|ZEROPAGE runs in the migration
+thread).
-- 
cgit v1.2.3


From 1038628d80e96e3a086189172d9be8eb85ecfabf Mon Sep 17 00:00:00 2001
From: Andrea Arcangeli <aarcange@redhat.com>
Date: Fri, 4 Sep 2015 15:46:04 -0700
Subject: userfaultfd: uAPI

Defines the uAPI of the userfaultfd, notably the ioctl numbers and protocol.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Cc: Sanidhya Kashyap <sanidhya.gatech@gmail.com>
Cc: zhang.zhanghailiang@huawei.com
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Andres Lagar-Cavilla <andreslc@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Hugh Dickins <hughd@google.com>
Cc: Peter Feiner <pfeiner@google.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Huangpeng (Peter)" <peter.huangpeng@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 Documentation/ioctl/ioctl-number.txt |  1 +
 include/uapi/linux/Kbuild            |  1 +
 include/uapi/linux/userfaultfd.h     | 83 ++++++++++++++++++++++++++++++++++++
 3 files changed, 85 insertions(+)
 create mode 100644 include/uapi/linux/userfaultfd.h

(limited to 'Documentation')

diff --git a/Documentation/ioctl/ioctl-number.txt b/Documentation/ioctl/ioctl-number.txt
index 64df08db4657..39ac6546d4a4 100644
--- a/Documentation/ioctl/ioctl-number.txt
+++ b/Documentation/ioctl/ioctl-number.txt
@@ -303,6 +303,7 @@ Code  Seq#(hex)	Include File		Comments
 0xA3	80-8F	Port ACL		in development:
 					<mailto:tlewis@mindspring.com>
 0xA3	90-9F	linux/dtlk.h
+0xAA	00-3F	linux/uapi/linux/userfaultfd.h
 0xAB	00-1F	linux/nbd.h
 0xAC	00-1F	linux/raw.h
 0xAD	00	Netfilter device	in development:
diff --git a/include/uapi/linux/Kbuild b/include/uapi/linux/Kbuild
index aafb9937b162..70ff1d9abf0d 100644
--- a/include/uapi/linux/Kbuild
+++ b/include/uapi/linux/Kbuild
@@ -456,3 +456,4 @@ header-y += xfrm.h
 header-y += xilinx-v4l2-controls.h
 header-y += zorro.h
 header-y += zorro_ids.h
+header-y += userfaultfd.h
diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
new file mode 100644
index 000000000000..09c2e2a8c9d6
--- /dev/null
+++ b/include/uapi/linux/userfaultfd.h
@@ -0,0 +1,83 @@
+/*
+ *  include/linux/userfaultfd.h
+ *
+ *  Copyright (C) 2007  Davide Libenzi <davidel@xmailserver.org>
+ *  Copyright (C) 2015  Red Hat, Inc.
+ *
+ */
+
+#ifndef _LINUX_USERFAULTFD_H
+#define _LINUX_USERFAULTFD_H
+
+#include <linux/types.h>
+
+#define UFFD_API ((__u64)0xAA)
+/* FIXME: add "|UFFD_BIT_WP" to UFFD_API_BITS after implementing it */
+#define UFFD_API_BITS (UFFD_BIT_WRITE)
+#define UFFD_API_IOCTLS				\
+	((__u64)1 << _UFFDIO_REGISTER |		\
+	 (__u64)1 << _UFFDIO_UNREGISTER |	\
+	 (__u64)1 << _UFFDIO_API)
+#define UFFD_API_RANGE_IOCTLS			\
+	((__u64)1 << _UFFDIO_WAKE)
+
+/*
+ * Valid ioctl command number range with this API is from 0x00 to
+ * 0x3F.  UFFDIO_API is the fixed number, everything else can be
+ * changed by implementing a different UFFD_API. If sticking to the
+ * same UFFD_API more ioctl can be added and userland will be aware of
+ * which ioctl the running kernel implements through the ioctl command
+ * bitmask written by the UFFDIO_API.
+ */
+#define _UFFDIO_REGISTER		(0x00)
+#define _UFFDIO_UNREGISTER		(0x01)
+#define _UFFDIO_WAKE			(0x02)
+#define _UFFDIO_API			(0x3F)
+
+/* userfaultfd ioctl ids */
+#define UFFDIO 0xAA
+#define UFFDIO_API		_IOWR(UFFDIO, _UFFDIO_API,	\
+				      struct uffdio_api)
+#define UFFDIO_REGISTER		_IOWR(UFFDIO, _UFFDIO_REGISTER, \
+				      struct uffdio_register)
+#define UFFDIO_UNREGISTER	_IOR(UFFDIO, _UFFDIO_UNREGISTER,	\
+				     struct uffdio_range)
+#define UFFDIO_WAKE		_IOR(UFFDIO, _UFFDIO_WAKE,	\
+				     struct uffdio_range)
+
+/*
+ * Valid bits below PAGE_SHIFT in the userfault address read through
+ * the read() syscall.
+ */
+#define UFFD_BIT_WRITE	(1<<0)	/* this was a write fault, MISSING or WP */
+#define UFFD_BIT_WP	(1<<1)	/* handle_userfault() reason VM_UFFD_WP */
+#define UFFD_BITS	2	/* two above bits used for UFFD_BIT_* mask */
+
+struct uffdio_api {
+	/* userland asks for an API number */
+	__u64 api;
+
+	/* kernel answers below with the available features for the API */
+	__u64 bits;
+	__u64 ioctls;
+};
+
+struct uffdio_range {
+	__u64 start;
+	__u64 len;
+};
+
+struct uffdio_register {
+	struct uffdio_range range;
+#define UFFDIO_REGISTER_MODE_MISSING	((__u64)1<<0)
+#define UFFDIO_REGISTER_MODE_WP		((__u64)1<<1)
+	__u64 mode;
+
+	/*
+	 * kernel answers which ioctl commands are available for the
+	 * range, keep at the end as the last 8 bytes aren't read.
+	 */
+	__u64 ioctls;
+};
+
+#endif /* _LINUX_USERFAULTFD_H */
-- 
cgit v1.2.3


From a9b85f9415fd9e529d03299e5335433f614ec1fb Mon Sep 17 00:00:00 2001
From: Andrea Arcangeli <aarcange@redhat.com>
Date: Fri, 4 Sep 2015 15:46:37 -0700
Subject: userfaultfd: change the read API to return a uffd_msg

I had requests to return the full address (not the page aligned one) to
userland.

It's not entirely clear how the page offset could be relevant because
userfaults aren't like SIGBUS that can sigjump to a different place and it
actually skip resolving the fault depending on a page offset.  There's
currently no real way to skip the fault especially because after a
UFFDIO_COPY|ZEROPAGE, the fault is optimized to be retried within the
kernel without having to return to userland first (not even self modifying
code replacing the .text that touched the faulting address would prevent
the fault to be repeated).  Userland cannot skip repeating the fault even
more so if the fault was triggered by a KVM secondary page fault or any
get_user_pages or any copy-user inside some syscall which will return to
kernel code.  The second time FAULT_FLAG_RETRY_NOWAIT won't be set leading
to a SIGBUS being raised because the userfault can't wait if it cannot
release the mmap_map first (and FAULT_FLAG_RETRY_NOWAIT is required for
that).

Still returning userland a proper structure during the read() on the uffd,
can allow to use the current UFFD_API for the future non-cooperative
extensions too and it looks cleaner as well.  Once we get additional
fields there's no point to return the fault address page aligned anymore
to reuse the bits below PAGE_SHIFT.

The only downside is that the read() syscall will read 32bytes instead of
8bytes but that's not going to be measurable overhead.

The total number of new events that can be extended or of new future bits
for already shipped events, is limited to 64 by the features field of the
uffdio_api structure.  If more will be needed a bump of UFFD_API will be
required.

[akpm@linux-foundation.org: use __packed]
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Cc: Sanidhya Kashyap <sanidhya.gatech@gmail.com>
Cc: zhang.zhanghailiang@huawei.com
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Andres Lagar-Cavilla <andreslc@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Hugh Dickins <hughd@google.com>
Cc: Peter Feiner <pfeiner@google.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Huangpeng (Peter)" <peter.huangpeng@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 Documentation/vm/userfaultfd.txt | 12 +++---
 fs/userfaultfd.c                 | 79 +++++++++++++++++++++++-----------------
 include/uapi/linux/userfaultfd.h | 70 +++++++++++++++++++++++++++--------
 3 files changed, 108 insertions(+), 53 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/vm/userfaultfd.txt b/Documentation/vm/userfaultfd.txt
index 90912925425e..70a3c94d1941 100644
--- a/Documentation/vm/userfaultfd.txt
+++ b/Documentation/vm/userfaultfd.txt
@@ -46,11 +46,13 @@ is a corner case that would currently return -EBUSY).
 When first opened the userfaultfd must be enabled invoking the
 UFFDIO_API ioctl specifying a uffdio_api.api value set to UFFD_API (or
 a later API version) which will specify the read/POLLIN protocol
-userland intends to speak on the UFFD. The UFFDIO_API ioctl if
-successful (i.e. if the requested uffdio_api.api is spoken also by the
-running kernel), will return into uffdio_api.features and
-uffdio_api.ioctls two 64bit bitmasks of respectively the activated
-feature of the read(2) protocol and the generic ioctl available.
+userland intends to speak on the UFFD and the uffdio_api.features
+userland requires. The UFFDIO_API ioctl if successful (i.e. if the
+requested uffdio_api.api is spoken also by the running kernel and the
+requested features are going to be enabled) will return into
+uffdio_api.features and uffdio_api.ioctls two 64bit bitmasks of
+respectively all the available features of the read(2) protocol and
+the generic ioctl available.
 
 Once the userfaultfd has been enabled the UFFDIO_REGISTER ioctl should
 be invoked (if present in the returned uffdio_api.ioctls bitmask) to
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 0756d97b0666..1f2ddaaf3c03 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -50,7 +50,7 @@ struct userfaultfd_ctx {
 };
 
 struct userfaultfd_wait_queue {
-	unsigned long address;
+	struct uffd_msg msg;
 	wait_queue_t wq;
 	bool pending;
 	struct userfaultfd_ctx *ctx;
@@ -77,7 +77,8 @@ static int userfaultfd_wake_function(wait_queue_t *wq, unsigned mode,
 	/* len == 0 means wake all */
 	start = range->start;
 	len = range->len;
-	if (len && (start > uwq->address || start + len <= uwq->address))
+	if (len && (start > uwq->msg.arg.pagefault.address ||
+		    start + len <= uwq->msg.arg.pagefault.address))
 		goto out;
 	ret = wake_up_state(wq->private, mode);
 	if (ret)
@@ -135,28 +136,43 @@ static void userfaultfd_ctx_put(struct userfaultfd_ctx *ctx)
 	}
 }
 
-static inline unsigned long userfault_address(unsigned long address,
-					      unsigned int flags,
-					      unsigned long reason)
+static inline void msg_init(struct uffd_msg *msg)
 {
-	BUILD_BUG_ON(PAGE_SHIFT < UFFD_BITS);
-	address &= PAGE_MASK;
+	BUILD_BUG_ON(sizeof(struct uffd_msg) != 32);
+	/*
+	 * Must use memset to zero out the paddings or kernel data is
+	 * leaked to userland.
+	 */
+	memset(msg, 0, sizeof(struct uffd_msg));
+}
+
+static inline struct uffd_msg userfault_msg(unsigned long address,
+					    unsigned int flags,
+					    unsigned long reason)
+{
+	struct uffd_msg msg;
+	msg_init(&msg);
+	msg.event = UFFD_EVENT_PAGEFAULT;
+	msg.arg.pagefault.address = address;
 	if (flags & FAULT_FLAG_WRITE)
 		/*
-		 * Encode "write" fault information in the LSB of the
-		 * address read by userland, without depending on
-		 * FAULT_FLAG_WRITE kernel internal value.
+		 * If UFFD_FEATURE_PAGEFAULT_FLAG_WRITE was set in the
+		 * uffdio_api.features and UFFD_PAGEFAULT_FLAG_WRITE
+		 * was not set in a UFFD_EVENT_PAGEFAULT, it means it
+		 * was a read fault, otherwise if set it means it's
+		 * a write fault.
 		 */
-		address |= UFFD_BIT_WRITE;
+		msg.arg.pagefault.flags |= UFFD_PAGEFAULT_FLAG_WRITE;
 	if (reason & VM_UFFD_WP)
 		/*
-		 * Encode "reason" fault information as bit number 1
-		 * in the address read by userland. If bit number 1 is
-		 * clear it means the reason is a VM_FAULT_MISSING
-		 * fault.
+		 * If UFFD_FEATURE_PAGEFAULT_FLAG_WP was set in the
+		 * uffdio_api.features and UFFD_PAGEFAULT_FLAG_WP was
+		 * not set in a UFFD_EVENT_PAGEFAULT, it means it was
+		 * a missing fault, otherwise if set it means it's a
+		 * write protect fault.
 		 */
-		address |= UFFD_BIT_WP;
-	return address;
+		msg.arg.pagefault.flags |= UFFD_PAGEFAULT_FLAG_WP;
+	return msg;
 }
 
 /*
@@ -242,7 +258,7 @@ int handle_userfault(struct vm_area_struct *vma, unsigned long address,
 
 	init_waitqueue_func_entry(&uwq.wq, userfaultfd_wake_function);
 	uwq.wq.private = current;
-	uwq.address = userfault_address(address, flags, reason);
+	uwq.msg = userfault_msg(address, flags, reason);
 	uwq.pending = true;
 	uwq.ctx = ctx;
 
@@ -398,7 +414,7 @@ static unsigned int userfaultfd_poll(struct file *file, poll_table *wait)
 }
 
 static ssize_t userfaultfd_ctx_read(struct userfaultfd_ctx *ctx, int no_wait,
-				    __u64 *addr)
+				    struct uffd_msg *msg)
 {
 	ssize_t ret;
 	DECLARE_WAITQUEUE(wait, current);
@@ -416,8 +432,8 @@ static ssize_t userfaultfd_ctx_read(struct userfaultfd_ctx *ctx, int no_wait,
 			 * disappear from under us.
 			 */
 			uwq->pending = false;
-			/* careful to always initialize addr if ret == 0 */
-			*addr = uwq->address;
+			/* careful to always initialize msg if ret == 0 */
+			*msg = uwq->msg;
 			spin_unlock(&ctx->fault_wqh.lock);
 			ret = 0;
 			break;
@@ -447,8 +463,7 @@ static ssize_t userfaultfd_read(struct file *file, char __user *buf,
 {
 	struct userfaultfd_ctx *ctx = file->private_data;
 	ssize_t _ret, ret = 0;
-	/* careful to always initialize addr if ret == 0 */
-	__u64 uninitialized_var(addr);
+	struct uffd_msg msg;
 	int no_wait = file->f_flags & O_NONBLOCK;
 
 	if (ctx->state == UFFD_STATE_WAIT_API)
@@ -456,16 +471,16 @@ static ssize_t userfaultfd_read(struct file *file, char __user *buf,
 	BUG_ON(ctx->state != UFFD_STATE_RUNNING);
 
 	for (;;) {
-		if (count < sizeof(addr))
+		if (count < sizeof(msg))
 			return ret ? ret : -EINVAL;
-		_ret = userfaultfd_ctx_read(ctx, no_wait, &addr);
+		_ret = userfaultfd_ctx_read(ctx, no_wait, &msg);
 		if (_ret < 0)
 			return ret ? ret : _ret;
-		if (put_user(addr, (__u64 __user *) buf))
+		if (copy_to_user((__u64 __user *) buf, &msg, sizeof(msg)))
 			return ret ? ret : -EFAULT;
-		ret += sizeof(addr);
-		buf += sizeof(addr);
-		count -= sizeof(addr);
+		ret += sizeof(msg);
+		buf += sizeof(msg);
+		count -= sizeof(msg);
 		/*
 		 * Allow to read more than one fault at time but only
 		 * block if waiting for the very first one.
@@ -873,17 +888,15 @@ static int userfaultfd_api(struct userfaultfd_ctx *ctx,
 	if (ctx->state != UFFD_STATE_WAIT_API)
 		goto out;
 	ret = -EFAULT;
-	if (copy_from_user(&uffdio_api, buf, sizeof(__u64)))
+	if (copy_from_user(&uffdio_api, buf, sizeof(uffdio_api)))
 		goto out;
-	if (uffdio_api.api != UFFD_API) {
-		/* careful not to leak info, we only read the first 8 bytes */
+	if (uffdio_api.api != UFFD_API || uffdio_api.features) {
 		memset(&uffdio_api, 0, sizeof(uffdio_api));
 		if (copy_to_user(buf, &uffdio_api, sizeof(uffdio_api)))
 			goto out;
 		ret = -EINVAL;
 		goto out;
 	}
-	/* careful not to leak info, we only read the first 8 bytes */
 	uffdio_api.features = UFFD_API_FEATURES;
 	uffdio_api.ioctls = UFFD_API_IOCTLS;
 	ret = -EFAULT;
diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
index 330206016249..a5f8825381ef 100644
--- a/include/uapi/linux/userfaultfd.h
+++ b/include/uapi/linux/userfaultfd.h
@@ -11,9 +11,15 @@
 
 #include <linux/types.h>
 
+#include <linux/compiler.h>
+
 #define UFFD_API ((__u64)0xAA)
-/* FIXME: add "|UFFD_FEATURE_WP" to UFFD_API_FEATURES after implementing it */
-#define UFFD_API_FEATURES (UFFD_FEATURE_WRITE_BIT)
+/*
+ * After implementing the respective features it will become:
+ * #define UFFD_API_FEATURES (UFFD_FEATURE_PAGEFAULT_FLAG_WP | \
+ *			      UFFD_FEATURE_EVENT_FORK)
+ */
+#define UFFD_API_FEATURES (0)
 #define UFFD_API_IOCTLS				\
 	((__u64)1 << _UFFDIO_REGISTER |		\
 	 (__u64)1 << _UFFDIO_UNREGISTER |	\
@@ -45,26 +51,60 @@
 #define UFFDIO_WAKE		_IOR(UFFDIO, _UFFDIO_WAKE,	\
 				     struct uffdio_range)
 
-/*
- * Valid bits below PAGE_SHIFT in the userfault address read through
- * the read() syscall.
- */
-#define UFFD_BIT_WRITE	(1<<0)	/* this was a write fault, MISSING or WP */
-#define UFFD_BIT_WP	(1<<1)	/* handle_userfault() reason VM_UFFD_WP */
-#define UFFD_BITS	2	/* two above bits used for UFFD_BIT_* mask */
+/* read() structure */
+struct uffd_msg {
+	__u8	event;
+
+	__u8	reserved1;
+	__u16	reserved2;
+	__u32	reserved3;
+
+	union {
+		struct {
+			__u64	flags;
+			__u64	address;
+		} pagefault;
+
+		struct {
+			/* unused reserved fields */
+			__u64	reserved1;
+			__u64	reserved2;
+			__u64	reserved3;
+		} reserved;
+	} arg;
+} __packed;
 
 /*
- * Features reported in uffdio_api.features field
+ * Start at 0x12 and not at 0 to be more strict against bugs.
  */
-#define UFFD_FEATURE_WRITE_BIT	(1<<0) /* Corresponds to UFFD_BIT_WRITE */
-#define UFFD_FEATURE_WP_BIT	(1<<1) /* Corresponds to UFFD_BIT_WP */
+#define UFFD_EVENT_PAGEFAULT	0x12
+#if 0 /* not available yet */
+#define UFFD_EVENT_FORK		0x13
+#endif
+
+/* flags for UFFD_EVENT_PAGEFAULT */
+#define UFFD_PAGEFAULT_FLAG_WRITE	(1<<0)	/* If this was a write fault */
+#define UFFD_PAGEFAULT_FLAG_WP		(1<<1)	/* If reason is VM_UFFD_WP */
 
 struct uffdio_api {
-	/* userland asks for an API number */
+	/* userland asks for an API number and the features to enable */
 	__u64 api;
-
-	/* kernel answers below with the available features for the API */
+	/*
+	 * Kernel answers below with the all available features for
+	 * the API, this notifies userland of which events and/or
+	 * which flags for each event are enabled in the current
+	 * kernel.
+	 *
+	 * Note: UFFD_EVENT_PAGEFAULT and UFFD_PAGEFAULT_FLAG_WRITE
+	 * are to be considered implicitly always enabled in all kernels as
+	 * long as the uffdio_api.api requested matches UFFD_API.
+	 */
+#if 0 /* not available yet */
+#define UFFD_FEATURE_PAGEFAULT_FLAG_WP		(1<<0)
+#define UFFD_FEATURE_EVENT_FORK			(1<<1)
+#endif
 	__u64 features;
+
 	__u64 ioctls;
 };
 
-- 
cgit v1.2.3


From c7e1e3ccfbd153c890240a391f258efaedfa94d0 Mon Sep 17 00:00:00 2001
From: Mel Gorman <mgorman@suse.de>
Date: Fri, 4 Sep 2015 15:47:38 -0700
Subject: Documentation/features/vm: add feature description and arch support
 status for batched TLB flush after unmap

Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 Documentation/features/vm/TLB/arch-support.txt | 40 ++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)
 create mode 100644 Documentation/features/vm/TLB/arch-support.txt

(limited to 'Documentation')

diff --git a/Documentation/features/vm/TLB/arch-support.txt b/Documentation/features/vm/TLB/arch-support.txt
new file mode 100644
index 000000000000..261b92e2fb1a
--- /dev/null
+++ b/Documentation/features/vm/TLB/arch-support.txt
@@ -0,0 +1,40 @@
+#
+# Feature name:          batch-unmap-tlb-flush
+#         Kconfig:       ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH
+#         description:   arch supports deferral of TLB flush until multiple pages are unmapped
+#
+    -----------------------
+    |         arch |status|
+    -----------------------
+    |       alpha: | TODO |
+    |         arc: | TODO |
+    |         arm: | TODO |
+    |       arm64: | TODO |
+    |       avr32: |  ..  |
+    |    blackfin: | TODO |
+    |         c6x: |  ..  |
+    |        cris: |  ..  |
+    |         frv: |  ..  |
+    |       h8300: |  ..  |
+    |     hexagon: | TODO |
+    |        ia64: | TODO |
+    |        m32r: | TODO |
+    |        m68k: |  ..  |
+    |       metag: | TODO |
+    |  microblaze: |  ..  |
+    |        mips: | TODO |
+    |     mn10300: | TODO |
+    |       nios2: |  ..  |
+    |    openrisc: |  ..  |
+    |      parisc: | TODO |
+    |     powerpc: | TODO |
+    |        s390: | TODO |
+    |       score: |  ..  |
+    |          sh: | TODO |
+    |       sparc: | TODO |
+    |        tile: | TODO |
+    |          um: |  ..  |
+    |   unicore32: |  ..  |
+    |         x86: |  ok  |
+    |      xtensa: | TODO |
+    -----------------------
-- 
cgit v1.2.3


From dcb9372b34c9de90672e4cf811d7c3a8519320aa Mon Sep 17 00:00:00 2001
From: Joachim Eastwood <manabian@gmail.com>
Date: Sat, 11 Jul 2015 19:28:50 +0200
Subject: doc: dt: add documentation for nxp,lpc1788-rtc

Document NXP LPC178x/18xx/408x/43xx bindings

Signed-off-by: Joachim Eastwood <manabian@gmail.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
---
 .../devicetree/bindings/rtc/nxp,lpc1788-rtc.txt     | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/rtc/nxp,lpc1788-rtc.txt

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/rtc/nxp,lpc1788-rtc.txt b/Documentation/devicetree/bindings/rtc/nxp,lpc1788-rtc.txt
new file mode 100644
index 000000000000..3c97bd180592
--- /dev/null
+++ b/Documentation/devicetree/bindings/rtc/nxp,lpc1788-rtc.txt
@@ -0,0 +1,21 @@
+NXP LPC1788 real-time clock
+
+The LPC1788 RTC provides calendar and clock functionality
+together with periodic tick and alarm interrupt support.
+
+Required properties:
+- compatible	: must contain "nxp,lpc1788-rtc"
+- reg		: Specifies base physical address and size of the registers.
+- interrupts	: A single interrupt specifier.
+- clocks	: Must contain clock specifiers for rtc and register clock
+- clock-names	: Must contain "rtc" and "reg"
+  See ../clocks/clock-bindings.txt for details.
+
+Example:
+rtc: rtc@40046000 {
+	compatible = "nxp,lpc1788-rtc";
+	reg = <0x40046000 0x1000>;
+	interrupts = <47>;
+	clocks = <&creg_clk 0>, <&ccu1 CLK_CPU_BUS>;
+	clock-names = "rtc", "reg";
+};
-- 
cgit v1.2.3


From 12ece40d9196e01961192fc25cfdaf22392520de Mon Sep 17 00:00:00 2001
From: Suneel Garapati <suneel.garapati@xilinx.com>
Date: Wed, 19 Aug 2015 15:23:21 +0530
Subject: devicetree: bindings: rtc: add bindings for xilinx zynqmp rtc

adds file for description on device node bindings for RTC
found on Xilinx Zynq Ultrascale+ MPSoC.

Signed-off-by: Suneel Garapati <suneel.garapati@xilinx.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
---
 Documentation/devicetree/bindings/rtc/xlnx-rtc.txt | 25 ++++++++++++++++++++++
 1 file changed, 25 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/rtc/xlnx-rtc.txt

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/rtc/xlnx-rtc.txt b/Documentation/devicetree/bindings/rtc/xlnx-rtc.txt
new file mode 100644
index 000000000000..0df6f016b1b7
--- /dev/null
+++ b/Documentation/devicetree/bindings/rtc/xlnx-rtc.txt
@@ -0,0 +1,25 @@
+* Xilinx Zynq Ultrascale+ MPSoC Real Time Clock
+
+RTC controller for the Xilinx Zynq MPSoC Real Time Clock
+Separate IRQ lines for seconds and alarm
+
+Required properties:
+- compatible: Should be "xlnx,zynqmp-rtc"
+- reg: Physical base address of the controller and length
+       of memory mapped region.
+- interrupts: IRQ lines for the RTC.
+- interrupt-names: interrupt line names eg. "sec" "alarm"
+
+Optional:
+- calibration: calibration value for 1 sec period which will
+		be programmed directly to calibration register
+
+Example:
+rtc: rtc@ffa60000 {
+	compatible = "xlnx,zynqmp-rtc";
+	reg = <0x0 0xffa60000 0x100>;
+	interrupt-parent = <&gic>;
+	interrupts = <0 26 4>, <0 27 4>;
+	interrupt-names = "alarm", "sec";
+	calibration = <0x198233>;
+};
-- 
cgit v1.2.3


From fff51e771eafc3b4fa6daf1372fd4a4023bb402b Mon Sep 17 00:00:00 2001
From: Keerthy <j-keerthy@ti.com>
Date: Tue, 18 Aug 2015 15:11:14 +0530
Subject: ARM: dts: AM437x: Add the internal and external clock nodes for rtc

rtc can either be supplied from internal 32k clock or external crystal
generated 32k clock. Internal clock is SOC specific and the external
clock is board dependent. Adding the corresponding nodes.

Signed-off-by: Keerthy <j-keerthy@ti.com>
Acked-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
---
 Documentation/devicetree/bindings/rtc/rtc-omap.txt |  4 ++++
 arch/arm/boot/dts/am4372.dtsi                      |  2 ++
 arch/arm/boot/dts/am437x-gp-evm.dts                | 13 +++++++++++++
 arch/arm/boot/dts/am437x-idk-evm.dts               |  9 +++++++++
 arch/arm/boot/dts/am437x-sk-evm.dts                |  9 +++++++++
 5 files changed, 37 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/rtc/rtc-omap.txt b/Documentation/devicetree/bindings/rtc/rtc-omap.txt
index 43a83668673a..bf7d11ae9bea 100644
--- a/Documentation/devicetree/bindings/rtc/rtc-omap.txt
+++ b/Documentation/devicetree/bindings/rtc/rtc-omap.txt
@@ -16,6 +16,8 @@ Required properties:
 Optional properties:
 - system-power-controller: whether the rtc is controlling the system power
   through pmic_power_en
+- clocks: Any internal or external clocks feeding in to rtc
+- clock-names: Corresponding names of the clocks
 
 Example:
 
@@ -26,4 +28,6 @@ rtc@1c23000 {
 		      19>;
 	interrupt-parent = <&intc>;
 	system-power-controller;
+	clocks = <&clk_32k_rtc>, <&clk_32768_ck>;
+	clock-names = "ext-clk", "int-clk";
 };
diff --git a/arch/arm/boot/dts/am4372.dtsi b/arch/arm/boot/dts/am4372.dtsi
index 564900b9fcce..0447c04a40cc 100644
--- a/arch/arm/boot/dts/am4372.dtsi
+++ b/arch/arm/boot/dts/am4372.dtsi
@@ -358,6 +358,8 @@
 			interrupts = <GIC_SPI 75 IRQ_TYPE_LEVEL_HIGH
 				      GIC_SPI 76 IRQ_TYPE_LEVEL_HIGH>;
 			ti,hwmods = "rtc";
+			clocks = <&clk_32768_ck>;
+			clock-names = "int-clk";
 			status = "disabled";
 		};
 
diff --git a/arch/arm/boot/dts/am437x-gp-evm.dts b/arch/arm/boot/dts/am437x-gp-evm.dts
index 215775dc6948..22038f21f228 100644
--- a/arch/arm/boot/dts/am437x-gp-evm.dts
+++ b/arch/arm/boot/dts/am437x-gp-evm.dts
@@ -112,6 +112,13 @@
 		clock-frequency = <12000000>;
 	};
 
+	/* fixed 32k external oscillator clock */
+	clk_32k_rtc: clk_32k_rtc {
+		#clock-cells = <0>;
+		compatible = "fixed-clock";
+		clock-frequency = <32768>;
+	};
+
 	sound0: sound@0 {
 		compatible = "simple-audio-card";
 		simple-audio-card,name = "AM437x-GP-EVM";
@@ -941,3 +948,9 @@
 	tx-num-evt = <32>;
 	rx-num-evt = <32>;
 };
+
+&rtc {
+	clocks = <&clk_32k_rtc>, <&clk_32768_ck>;
+	clock-names = "ext-clk", "int-clk";
+	status = "okay";
+};
diff --git a/arch/arm/boot/dts/am437x-idk-evm.dts b/arch/arm/boot/dts/am437x-idk-evm.dts
index 378344271746..af25801418b4 100644
--- a/arch/arm/boot/dts/am437x-idk-evm.dts
+++ b/arch/arm/boot/dts/am437x-idk-evm.dts
@@ -110,6 +110,13 @@
 			gpios = <&gpio4 2 GPIO_ACTIVE_LOW>;
 		};
 	};
+
+	/* fixed 32k external oscillator clock */
+	clk_32k_rtc: clk_32k_rtc {
+		#clock-cells = <0>;
+		compatible = "fixed-clock";
+		clock-frequency = <32768>;
+	};
 };
 
 &am43xx_pinmux {
@@ -394,6 +401,8 @@
 };
 
 &rtc {
+	clocks = <&clk_32k_rtc>, <&clk_32768_ck>;
+	clock-names = "ext-clk", "int-clk";
 	status = "okay";
 };
 
diff --git a/arch/arm/boot/dts/am437x-sk-evm.dts b/arch/arm/boot/dts/am437x-sk-evm.dts
index 22af44894c66..7da7c2da4af1 100644
--- a/arch/arm/boot/dts/am437x-sk-evm.dts
+++ b/arch/arm/boot/dts/am437x-sk-evm.dts
@@ -24,6 +24,13 @@
 		display0 = &lcd0;
 	};
 
+	/* fixed 32k external oscillator clock */
+	clk_32k_rtc: clk_32k_rtc {
+		#clock-cells = <0>;
+		compatible = "fixed-clock";
+		clock-frequency = <32768>;
+	};
+
 	backlight {
 		compatible = "pwm-backlight";
 		pwms = <&ecap0 0 50000 PWM_POLARITY_INVERTED>;
@@ -697,6 +704,8 @@
 };
 
 &rtc {
+	clocks = <&clk_32k_rtc>, <&clk_32768_ck>;
+	clock-names = "ext-clk", "int-clk";
 	status = "okay";
 };
 
-- 
cgit v1.2.3


From 48ead50c1dd8e5cdb7ead067558a834c1e895e6e Mon Sep 17 00:00:00 2001
From: Sanchayan Maity <maitysanchayan@gmail.com>
Date: Sat, 5 Sep 2015 10:32:09 -0700
Subject: Input: Add touchscreen support for Colibri VF50

The Colibri Vybrid VF50 module supports 4-wire touchscreens using
FETs and ADC inputs. This driver uses the IIO consumer interface
and relies on the vf610_adc driver based on the IIO framework.

Signed-off-by: Sanchayan Maity <maitysanchayan@gmail.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
---
 .../bindings/input/touchscreen/colibri-vf50-ts.txt |  36 ++
 drivers/input/touchscreen/Kconfig                  |  12 +
 drivers/input/touchscreen/Makefile                 |   1 +
 drivers/input/touchscreen/colibri-vf50-ts.c        | 386 +++++++++++++++++++++
 4 files changed, 435 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/input/touchscreen/colibri-vf50-ts.txt
 create mode 100644 drivers/input/touchscreen/colibri-vf50-ts.c

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/input/touchscreen/colibri-vf50-ts.txt b/Documentation/devicetree/bindings/input/touchscreen/colibri-vf50-ts.txt
new file mode 100644
index 000000000000..9d9e930f3251
--- /dev/null
+++ b/Documentation/devicetree/bindings/input/touchscreen/colibri-vf50-ts.txt
@@ -0,0 +1,36 @@
+* Toradex Colibri VF50 Touchscreen driver
+
+Required Properties:
+- compatible must be toradex,vf50-touchscreen
+- io-channels: adc channels being used by the Colibri VF50 module
+- xp-gpios: FET gate driver for input of X+
+- xm-gpios: FET gate driver for input of X-
+- yp-gpios: FET gate driver for input of Y+
+- ym-gpios: FET gate driver for input of Y-
+- interrupt-parent: phandle for the interrupt controller
+- interrupts: pen irq interrupt for touch detection
+- pinctrl-names: "idle", "default", "gpios"
+- pinctrl-0: pinctrl node for pen/touch detection state pinmux
+- pinctrl-1: pinctrl node for X/Y and pressure measurement (ADC) state pinmux
+- pinctrl-2: pinctrl node for gpios functioning as FET gate drivers
+- vf50-ts-min-pressure: pressure level at which to stop measuring X/Y values
+
+Example:
+
+	touchctrl: vf50_touchctrl {
+		compatible = "toradex,vf50-touchscreen";
+		io-channels = <&adc1 0>,<&adc0 0>,
+				<&adc0 1>,<&adc1 2>;
+		xp-gpios = <&gpio0 13 GPIO_ACTIVE_LOW>;
+		xm-gpios = <&gpio2 29 GPIO_ACTIVE_HIGH>;
+		yp-gpios = <&gpio0 12 GPIO_ACTIVE_LOW>;
+		ym-gpios = <&gpio0 4 GPIO_ACTIVE_HIGH>;
+		interrupt-parent = <&gpio0>;
+		interrupts = <8 IRQ_TYPE_LEVEL_LOW>;
+		pinctrl-names = "idle","default","gpios";
+		pinctrl-0 = <&pinctrl_touchctrl_idle>;
+		pinctrl-1 = <&pinctrl_touchctrl_default>;
+		pinctrl-2 = <&pinctrl_touchctrl_gpios>;
+		vf50-ts-min-pressure = <200>;
+		status = "disabled";
+	};
diff --git a/drivers/input/touchscreen/Kconfig b/drivers/input/touchscreen/Kconfig
index 059edeb7f04a..a6d7a4d8dbb7 100644
--- a/drivers/input/touchscreen/Kconfig
+++ b/drivers/input/touchscreen/Kconfig
@@ -1040,4 +1040,16 @@ config TOUCHSCREEN_ZFORCE
 	  To compile this driver as a module, choose M here: the
 	  module will be called zforce_ts.
 
+config TOUCHSCREEN_COLIBRI_VF50
+	tristate "Toradex Colibri on board touchscreen driver"
+	depends on GPIOLIB && IIO && VF610_ADC
+	help
+	  Say Y here if you have a Colibri VF50 and plan to use
+	  the on-board provided 4-wire touchscreen driver.
+
+	  If unsure, say N.
+
+	  To compile this driver as a module, choose M here: the
+	  module will be called colibri_vf50_ts.
+
 endif
diff --git a/drivers/input/touchscreen/Makefile b/drivers/input/touchscreen/Makefile
index c85aae23e7f8..fb27f7e36070 100644
--- a/drivers/input/touchscreen/Makefile
+++ b/drivers/input/touchscreen/Makefile
@@ -85,3 +85,4 @@ obj-$(CONFIG_TOUCHSCREEN_W90X900)	+= w90p910_ts.o
 obj-$(CONFIG_TOUCHSCREEN_SX8654)	+= sx8654.o
 obj-$(CONFIG_TOUCHSCREEN_TPS6507X)	+= tps6507x-ts.o
 obj-$(CONFIG_TOUCHSCREEN_ZFORCE)	+= zforce_ts.o
+obj-$(CONFIG_TOUCHSCREEN_COLIBRI_VF50)	+= colibri-vf50-ts.o
diff --git a/drivers/input/touchscreen/colibri-vf50-ts.c b/drivers/input/touchscreen/colibri-vf50-ts.c
new file mode 100644
index 000000000000..5d4903a402cc
--- /dev/null
+++ b/drivers/input/touchscreen/colibri-vf50-ts.c
@@ -0,0 +1,386 @@
+/*
+ * Toradex Colibri VF50 Touchscreen driver
+ *
+ * Copyright 2015 Toradex AG
+ *
+ * Originally authored by Stefan Agner for 3.0 kernel
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/delay.h>
+#include <linux/err.h>
+#include <linux/gpio.h>
+#include <linux/gpio/consumer.h>
+#include <linux/iio/consumer.h>
+#include <linux/iio/types.h>
+#include <linux/input.h>
+#include <linux/interrupt.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/pinctrl/consumer.h>
+#include <linux/platform_device.h>
+#include <linux/slab.h>
+#include <linux/types.h>
+
+#define DRIVER_NAME			"colibri-vf50-ts"
+#define DRV_VERSION			"1.0"
+
+#define VF_ADC_MAX			((1 << 12) - 1)
+
+#define COLI_TOUCH_MIN_DELAY_US		1000
+#define COLI_TOUCH_MAX_DELAY_US		2000
+#define COLI_PULLUP_MIN_DELAY_US	10000
+#define COLI_PULLUP_MAX_DELAY_US	11000
+#define COLI_TOUCH_NO_OF_AVGS		5
+#define COLI_TOUCH_REQ_ADC_CHAN		4
+
+struct vf50_touch_device {
+	struct platform_device *pdev;
+	struct input_dev *ts_input;
+	struct iio_channel *channels;
+	struct gpio_desc *gpio_xp;
+	struct gpio_desc *gpio_xm;
+	struct gpio_desc *gpio_yp;
+	struct gpio_desc *gpio_ym;
+	int pen_irq;
+	int min_pressure;
+	bool stop_touchscreen;
+};
+
+/*
+ * Enables given plates and measures touch parameters using ADC
+ */
+static int adc_ts_measure(struct iio_channel *channel,
+			  struct gpio_desc *plate_p, struct gpio_desc *plate_m)
+{
+	int i, value = 0, val = 0;
+	int error;
+
+	gpiod_set_value(plate_p, 1);
+	gpiod_set_value(plate_m, 1);
+
+	usleep_range(COLI_TOUCH_MIN_DELAY_US, COLI_TOUCH_MAX_DELAY_US);
+
+	for (i = 0; i < COLI_TOUCH_NO_OF_AVGS; i++) {
+		error = iio_read_channel_raw(channel, &val);
+		if (error < 0) {
+			value = error;
+			goto error_iio_read;
+		}
+
+		value += val;
+	}
+
+	value /= COLI_TOUCH_NO_OF_AVGS;
+
+error_iio_read:
+	gpiod_set_value(plate_p, 0);
+	gpiod_set_value(plate_m, 0);
+
+	return value;
+}
+
+/*
+ * Enable touch detection using falling edge detection on XM
+ */
+static void vf50_ts_enable_touch_detection(struct vf50_touch_device *vf50_ts)
+{
+	/* Enable plate YM (needs to be strong GND, high active) */
+	gpiod_set_value(vf50_ts->gpio_ym, 1);
+
+	/*
+	 * Let the platform mux to idle state in order to enable
+	 * Pull-Up on GPIO
+	 */
+	pinctrl_pm_select_idle_state(&vf50_ts->pdev->dev);
+
+	/* Wait for the pull-up to be stable on high */
+	usleep_range(COLI_PULLUP_MIN_DELAY_US, COLI_PULLUP_MAX_DELAY_US);
+}
+
+/*
+ * ADC touch screen sampling bottom half irq handler
+ */
+static irqreturn_t vf50_ts_irq_bh(int irq, void *private)
+{
+	struct vf50_touch_device *vf50_ts = private;
+	struct device *dev = &vf50_ts->pdev->dev;
+	int val_x, val_y, val_z1, val_z2, val_p = 0;
+	bool discard_val_on_start = true;
+
+	/* Disable the touch detection plates */
+	gpiod_set_value(vf50_ts->gpio_ym, 0);
+
+	/* Let the platform mux to default state in order to mux as ADC */
+	pinctrl_pm_select_default_state(dev);
+
+	while (!vf50_ts->stop_touchscreen) {
+		/* X-Direction */
+		val_x = adc_ts_measure(&vf50_ts->channels[0],
+				vf50_ts->gpio_xp, vf50_ts->gpio_xm);
+		if (val_x < 0)
+			break;
+
+		/* Y-Direction */
+		val_y = adc_ts_measure(&vf50_ts->channels[1],
+				vf50_ts->gpio_yp, vf50_ts->gpio_ym);
+		if (val_y < 0)
+			break;
+
+		/*
+		 * Touch pressure
+		 * Measure on XP/YM
+		 */
+		val_z1 = adc_ts_measure(&vf50_ts->channels[2],
+				vf50_ts->gpio_yp, vf50_ts->gpio_xm);
+		if (val_z1 < 0)
+			break;
+		val_z2 = adc_ts_measure(&vf50_ts->channels[3],
+				vf50_ts->gpio_yp, vf50_ts->gpio_xm);
+		if (val_z2 < 0)
+			break;
+
+		/* Validate signal (avoid calculation using noise) */
+		if (val_z1 > 64 && val_x > 64) {
+			/*
+			 * Calculate resistance between the plates
+			 * lower resistance means higher pressure
+			 */
+			int r_x = (1000 * val_x) / VF_ADC_MAX;
+
+			val_p = (r_x * val_z2) / val_z1 - r_x;
+
+		} else {
+			val_p = 2000;
+		}
+
+		val_p = 2000 - val_p;
+		dev_dbg(dev,
+			"Measured values: x: %d, y: %d, z1: %d, z2: %d, p: %d\n",
+			val_x, val_y, val_z1, val_z2, val_p);
+
+		/*
+		 * If touch pressure is too low, stop measuring and reenable
+		 * touch detection
+		 */
+		if (val_p < vf50_ts->min_pressure || val_p > 2000)
+			break;
+
+		/*
+		 * The pressure may not be enough for the first x and the
+		 * second y measurement, but, the pressure is ok when the
+		 * driver is doing the third and fourth measurement. To
+		 * take care of this, we drop the first measurement always.
+		 */
+		if (discard_val_on_start) {
+			discard_val_on_start = false;
+		} else {
+			/*
+			 * Report touch position and sleep for
+			 * the next measurement.
+			 */
+			input_report_abs(vf50_ts->ts_input,
+					ABS_X, VF_ADC_MAX - val_x);
+			input_report_abs(vf50_ts->ts_input,
+					ABS_Y, VF_ADC_MAX - val_y);
+			input_report_abs(vf50_ts->ts_input,
+					ABS_PRESSURE, val_p);
+			input_report_key(vf50_ts->ts_input, BTN_TOUCH, 1);
+			input_sync(vf50_ts->ts_input);
+		}
+
+		usleep_range(COLI_PULLUP_MIN_DELAY_US,
+			     COLI_PULLUP_MAX_DELAY_US);
+	}
+
+	/* Report no more touch, re-enable touch detection */
+	input_report_abs(vf50_ts->ts_input, ABS_PRESSURE, 0);
+	input_report_key(vf50_ts->ts_input, BTN_TOUCH, 0);
+	input_sync(vf50_ts->ts_input);
+
+	vf50_ts_enable_touch_detection(vf50_ts);
+
+	return IRQ_HANDLED;
+}
+
+static int vf50_ts_open(struct input_dev *dev_input)
+{
+	struct vf50_touch_device *touchdev = input_get_drvdata(dev_input);
+	struct device *dev = &touchdev->pdev->dev;
+
+	dev_dbg(dev, "Input device %s opened, starting touch detection\n",
+		dev_input->name);
+
+	touchdev->stop_touchscreen = false;
+
+	/* Mux detection before request IRQ, wait for pull-up to settle */
+	vf50_ts_enable_touch_detection(touchdev);
+
+	return 0;
+}
+
+static void vf50_ts_close(struct input_dev *dev_input)
+{
+	struct vf50_touch_device *touchdev = input_get_drvdata(dev_input);
+	struct device *dev = &touchdev->pdev->dev;
+
+	touchdev->stop_touchscreen = true;
+
+	/* Make sure IRQ is not running past close */
+	mb();
+	synchronize_irq(touchdev->pen_irq);
+
+	gpiod_set_value(touchdev->gpio_ym, 0);
+	pinctrl_pm_select_default_state(dev);
+
+	dev_dbg(dev, "Input device %s closed, disable touch detection\n",
+		dev_input->name);
+}
+
+static int vf50_ts_get_gpiod(struct device *dev, struct gpio_desc **gpio_d,
+			     const char *con_id, enum gpiod_flags flags)
+{
+	int error;
+
+	*gpio_d = devm_gpiod_get(dev, con_id, flags);
+	if (IS_ERR(*gpio_d)) {
+		error = PTR_ERR(*gpio_d);
+		dev_err(dev, "Could not get gpio_%s %d\n", con_id, error);
+		return error;
+	}
+
+	return 0;
+}
+
+static void vf50_ts_channel_release(void *data)
+{
+	struct iio_channel *channels = data;
+
+	iio_channel_release_all(channels);
+}
+
+static int vf50_ts_probe(struct platform_device *pdev)
+{
+	struct input_dev *input;
+	struct iio_channel *channels;
+	struct device *dev = &pdev->dev;
+	struct vf50_touch_device *touchdev;
+	int num_adc_channels;
+	int error;
+
+	channels = iio_channel_get_all(dev);
+	if (IS_ERR(channels))
+		return PTR_ERR(channels);
+
+	error = devm_add_action(dev, vf50_ts_channel_release, channels);
+	if (error) {
+		iio_channel_release_all(channels);
+		dev_err(dev, "Failed to register iio channel release action");
+		return error;
+	}
+
+	num_adc_channels = 0;
+	while (channels[num_adc_channels].indio_dev)
+		num_adc_channels++;
+
+	if (num_adc_channels != COLI_TOUCH_REQ_ADC_CHAN) {
+		dev_err(dev, "Inadequate ADC channels specified\n");
+		return -EINVAL;
+	}
+
+	touchdev = devm_kzalloc(dev, sizeof(*touchdev), GFP_KERNEL);
+	if (!touchdev)
+		return -ENOMEM;
+
+	touchdev->pdev = pdev;
+	touchdev->channels = channels;
+
+	error = of_property_read_u32(dev->of_node, "vf50-ts-min-pressure",
+				 &touchdev->min_pressure);
+	if (error)
+		return error;
+
+	input = devm_input_allocate_device(dev);
+	if (!input) {
+		dev_err(dev, "Failed to allocate TS input device\n");
+		return -ENOMEM;
+	}
+
+	platform_set_drvdata(pdev, touchdev);
+
+	input->name = DRIVER_NAME;
+	input->id.bustype = BUS_HOST;
+	input->dev.parent = dev;
+	input->open = vf50_ts_open;
+	input->close = vf50_ts_close;
+
+	input_set_capability(input, EV_KEY, BTN_TOUCH);
+	input_set_abs_params(input, ABS_X, 0, VF_ADC_MAX, 0, 0);
+	input_set_abs_params(input, ABS_Y, 0, VF_ADC_MAX, 0, 0);
+	input_set_abs_params(input, ABS_PRESSURE, 0, VF_ADC_MAX, 0, 0);
+
+	touchdev->ts_input = input;
+	input_set_drvdata(input, touchdev);
+
+	error = input_register_device(input);
+	if (error) {
+		dev_err(dev, "Failed to register input device\n");
+		return error;
+	}
+
+	error = vf50_ts_get_gpiod(dev, &touchdev->gpio_xp, "xp", GPIOD_OUT_LOW);
+	if (error)
+		return error;
+
+	error = vf50_ts_get_gpiod(dev, &touchdev->gpio_xm,
+				"xm", GPIOD_OUT_LOW);
+	if (error)
+		return error;
+
+	error = vf50_ts_get_gpiod(dev, &touchdev->gpio_yp, "yp", GPIOD_OUT_LOW);
+	if (error)
+		return error;
+
+	error = vf50_ts_get_gpiod(dev, &touchdev->gpio_ym, "ym", GPIOD_OUT_LOW);
+	if (error)
+		return error;
+
+	touchdev->pen_irq = platform_get_irq(pdev, 0);
+	if (touchdev->pen_irq < 0)
+		return touchdev->pen_irq;
+
+	error = devm_request_threaded_irq(dev, touchdev->pen_irq,
+					  NULL, vf50_ts_irq_bh, IRQF_ONESHOT,
+					  "vf50 touch", touchdev);
+	if (error) {
+		dev_err(dev, "Failed to request IRQ %d: %d\n",
+			touchdev->pen_irq, error);
+		return error;
+	}
+
+	return 0;
+}
+
+static const struct of_device_id vf50_touch_of_match[] = {
+	{ .compatible = "toradex,vf50-touchscreen", },
+	{ }
+};
+MODULE_DEVICE_TABLE(of, vf50_touch_of_match);
+
+static struct platform_driver vf50_touch_driver = {
+	.driver = {
+		.name = "toradex,vf50_touchctrl",
+		.of_match_table = vf50_touch_of_match,
+	},
+	.probe = vf50_ts_probe,
+};
+module_platform_driver(vf50_touch_driver);
+
+MODULE_AUTHOR("Sanchayan Maity");
+MODULE_DESCRIPTION("Colibri VF50 Touchscreen driver");
+MODULE_LICENSE("GPL");
+MODULE_VERSION(DRV_VERSION);
-- 
cgit v1.2.3


From 9a436d524d3533cd15ed5a189d2237ff1e4e5343 Mon Sep 17 00:00:00 2001
From: Haibo Chen <haibo.chen@freescale.com>
Date: Sat, 5 Sep 2015 11:31:21 -0700
Subject: Input: touchscreen - add imx6ul_tsc driver support

Freescale i.MX6UL contains a internal touchscreen controller,
this patch add a driver to support this controller.

Signed-off-by: Haibo Chen <haibo.chen@freescale.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
---
 .../bindings/input/touchscreen/imx6ul_tsc.txt      |  36 ++
 drivers/input/touchscreen/Kconfig                  |  12 +
 drivers/input/touchscreen/Makefile                 |   1 +
 drivers/input/touchscreen/imx6ul_tsc.c             | 523 +++++++++++++++++++++
 4 files changed, 572 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/input/touchscreen/imx6ul_tsc.txt
 create mode 100644 drivers/input/touchscreen/imx6ul_tsc.c

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/input/touchscreen/imx6ul_tsc.txt b/Documentation/devicetree/bindings/input/touchscreen/imx6ul_tsc.txt
new file mode 100644
index 000000000000..853dff96dd9f
--- /dev/null
+++ b/Documentation/devicetree/bindings/input/touchscreen/imx6ul_tsc.txt
@@ -0,0 +1,36 @@
+* Freescale i.MX6UL Touch Controller
+
+Required properties:
+- compatible: must be "fsl,imx6ul-tsc".
+- reg: this touch controller address and the ADC2 address.
+- interrupts: the interrupt of this touch controller and ADC2.
+- clocks: the root clock of touch controller and ADC2.
+- clock-names; must be "tsc" and "adc".
+- xnur-gpio: the X- gpio this controller connect to.
+  This xnur-gpio returns to low once the finger leave the touch screen (The
+  last touch event the touch controller capture).
+
+Optional properties:
+- measure-delay-time: the value of measure delay time.
+  Before X-axis or Y-axis measurement, the screen need some time before
+  even potential distribution ready.
+  This value depends on the touch screen.
+- pre-charge-time: the touch screen need some time to precharge.
+  This value depends on the touch screen.
+
+Example:
+	tsc: tsc@02040000 {
+		compatible = "fsl,imx6ul-tsc";
+		reg = <0x02040000 0x4000>, <0x0219c000 0x4000>;
+		interrupts = <GIC_SPI 3 IRQ_TYPE_LEVEL_HIGH>,
+			     <GIC_SPI 101 IRQ_TYPE_LEVEL_HIGH>;
+		clocks = <&clks IMX6UL_CLK_IPG>,
+			 <&clks IMX6UL_CLK_ADC2>;
+		clock-names = "tsc", "adc";
+		pinctrl-names = "default";
+		pinctrl-0 = <&pinctrl_tsc>;
+		xnur-gpio = <&gpio1 3 GPIO_ACTIVE_LOW>;
+		measure-delay-time = <0xfff>;
+		pre-charge-time = <0xffff>;
+		status = "okay";
+	};
diff --git a/drivers/input/touchscreen/Kconfig b/drivers/input/touchscreen/Kconfig
index a6d7a4d8dbb7..600dcceff542 100644
--- a/drivers/input/touchscreen/Kconfig
+++ b/drivers/input/touchscreen/Kconfig
@@ -479,6 +479,18 @@ config TOUCHSCREEN_MTOUCH
 	  To compile this driver as a module, choose M here: the
 	  module will be called mtouch.
 
+config TOUCHSCREEN_IMX6UL_TSC
+	tristate "Freescale i.MX6UL touchscreen controller"
+	depends on (OF && GPIOLIB) || COMPILE_TEST
+	help
+	  Say Y here if you have a Freescale i.MX6UL, and want to
+	  use the internal touchscreen controller.
+
+	  If unsure, say N.
+
+	  To compile this driver as a module, choose M here: the
+	  module will be called imx6ul_tsc.
+
 config TOUCHSCREEN_INEXIO
 	tristate "iNexio serial touchscreens"
 	select SERIO
diff --git a/drivers/input/touchscreen/Makefile b/drivers/input/touchscreen/Makefile
index fb27f7e36070..1b79cc09744a 100644
--- a/drivers/input/touchscreen/Makefile
+++ b/drivers/input/touchscreen/Makefile
@@ -38,6 +38,7 @@ obj-$(CONFIG_TOUCHSCREEN_EGALAX)	+= egalax_ts.o
 obj-$(CONFIG_TOUCHSCREEN_FUJITSU)	+= fujitsu_ts.o
 obj-$(CONFIG_TOUCHSCREEN_GOODIX)	+= goodix.o
 obj-$(CONFIG_TOUCHSCREEN_ILI210X)	+= ili210x.o
+obj-$(CONFIG_TOUCHSCREEN_IMX6UL_TSC)	+= imx6ul_tsc.o
 obj-$(CONFIG_TOUCHSCREEN_INEXIO)	+= inexio.o
 obj-$(CONFIG_TOUCHSCREEN_INTEL_MID)	+= intel-mid-touch.o
 obj-$(CONFIG_TOUCHSCREEN_IPROC)		+= bcm_iproc_tsc.o
diff --git a/drivers/input/touchscreen/imx6ul_tsc.c b/drivers/input/touchscreen/imx6ul_tsc.c
new file mode 100644
index 000000000000..ff0b75813daa
--- /dev/null
+++ b/drivers/input/touchscreen/imx6ul_tsc.c
@@ -0,0 +1,523 @@
+/*
+ * Freescale i.MX6UL touchscreen controller driver
+ *
+ * Copyright (C) 2015 Freescale Semiconductor, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/errno.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/gpio/consumer.h>
+#include <linux/input.h>
+#include <linux/slab.h>
+#include <linux/completion.h>
+#include <linux/delay.h>
+#include <linux/of.h>
+#include <linux/interrupt.h>
+#include <linux/platform_device.h>
+#include <linux/clk.h>
+#include <linux/io.h>
+
+/* ADC configuration registers field define */
+#define ADC_AIEN		(0x1 << 7)
+#define ADC_CONV_DISABLE	0x1F
+#define ADC_CAL			(0x1 << 7)
+#define ADC_CALF		0x2
+#define ADC_12BIT_MODE		(0x2 << 2)
+#define ADC_IPG_CLK		0x00
+#define ADC_CLK_DIV_8		(0x03 << 5)
+#define ADC_SHORT_SAMPLE_MODE	(0x0 << 4)
+#define ADC_HARDWARE_TRIGGER	(0x1 << 13)
+#define SELECT_CHANNEL_4	0x04
+#define SELECT_CHANNEL_1	0x01
+#define DISABLE_CONVERSION_INT	(0x0 << 7)
+
+/* ADC registers */
+#define REG_ADC_HC0		0x00
+#define REG_ADC_HC1		0x04
+#define REG_ADC_HC2		0x08
+#define REG_ADC_HC3		0x0C
+#define REG_ADC_HC4		0x10
+#define REG_ADC_HS		0x14
+#define REG_ADC_R0		0x18
+#define REG_ADC_CFG		0x2C
+#define REG_ADC_GC		0x30
+#define REG_ADC_GS		0x34
+
+#define ADC_TIMEOUT		msecs_to_jiffies(100)
+
+/* TSC registers */
+#define REG_TSC_BASIC_SETING	0x00
+#define REG_TSC_PRE_CHARGE_TIME	0x10
+#define REG_TSC_FLOW_CONTROL	0x20
+#define REG_TSC_MEASURE_VALUE	0x30
+#define REG_TSC_INT_EN		0x40
+#define REG_TSC_INT_SIG_EN	0x50
+#define REG_TSC_INT_STATUS	0x60
+#define REG_TSC_DEBUG_MODE	0x70
+#define REG_TSC_DEBUG_MODE2	0x80
+
+/* TSC configuration registers field define */
+#define DETECT_4_WIRE_MODE	(0x0 << 4)
+#define AUTO_MEASURE		0x1
+#define MEASURE_SIGNAL		0x1
+#define DETECT_SIGNAL		(0x1 << 4)
+#define VALID_SIGNAL		(0x1 << 8)
+#define MEASURE_INT_EN		0x1
+#define MEASURE_SIG_EN		0x1
+#define VALID_SIG_EN		(0x1 << 8)
+#define DE_GLITCH_2		(0x2 << 29)
+#define START_SENSE		(0x1 << 12)
+#define TSC_DISABLE		(0x1 << 16)
+#define DETECT_MODE		0x2
+
+struct imx6ul_tsc {
+	struct device *dev;
+	struct input_dev *input;
+	void __iomem *tsc_regs;
+	void __iomem *adc_regs;
+	struct clk *tsc_clk;
+	struct clk *adc_clk;
+	struct gpio_desc *xnur_gpio;
+
+	int measure_delay_time;
+	int pre_charge_time;
+
+	struct completion completion;
+};
+
+/*
+ * TSC module need ADC to get the measure value. So
+ * before config TSC, we should initialize ADC module.
+ */
+static void imx6ul_adc_init(struct imx6ul_tsc *tsc)
+{
+	int adc_hc = 0;
+	int adc_gc;
+	int adc_gs;
+	int adc_cfg;
+	int timeout;
+
+	reinit_completion(&tsc->completion);
+
+	adc_cfg = readl(tsc->adc_regs + REG_ADC_CFG);
+	adc_cfg |= ADC_12BIT_MODE | ADC_IPG_CLK;
+	adc_cfg |= ADC_CLK_DIV_8 | ADC_SHORT_SAMPLE_MODE;
+	adc_cfg &= ~ADC_HARDWARE_TRIGGER;
+	writel(adc_cfg, tsc->adc_regs + REG_ADC_CFG);
+
+	/* enable calibration interrupt */
+	adc_hc |= ADC_AIEN;
+	adc_hc |= ADC_CONV_DISABLE;
+	writel(adc_hc, tsc->adc_regs + REG_ADC_HC0);
+
+	/* start ADC calibration */
+	adc_gc = readl(tsc->adc_regs + REG_ADC_GC);
+	adc_gc |= ADC_CAL;
+	writel(adc_gc, tsc->adc_regs + REG_ADC_GC);
+
+	timeout = wait_for_completion_timeout
+			(&tsc->completion, ADC_TIMEOUT);
+	if (timeout == 0)
+		dev_err(tsc->dev, "Timeout for adc calibration\n");
+
+	adc_gs = readl(tsc->adc_regs + REG_ADC_GS);
+	if (adc_gs & ADC_CALF)
+		dev_err(tsc->dev, "ADC calibration failed\n");
+
+	/* TSC need the ADC work in hardware trigger */
+	adc_cfg = readl(tsc->adc_regs + REG_ADC_CFG);
+	adc_cfg |= ADC_HARDWARE_TRIGGER;
+	writel(adc_cfg, tsc->adc_regs + REG_ADC_CFG);
+}
+
+/*
+ * This is a TSC workaround. Currently TSC misconnect two
+ * ADC channels, this function remap channel configure for
+ * hardware trigger.
+ */
+static void imx6ul_tsc_channel_config(struct imx6ul_tsc *tsc)
+{
+	int adc_hc0, adc_hc1, adc_hc2, adc_hc3, adc_hc4;
+
+	adc_hc0 = DISABLE_CONVERSION_INT;
+	writel(adc_hc0, tsc->adc_regs + REG_ADC_HC0);
+
+	adc_hc1 = DISABLE_CONVERSION_INT | SELECT_CHANNEL_4;
+	writel(adc_hc1, tsc->adc_regs + REG_ADC_HC1);
+
+	adc_hc2 = DISABLE_CONVERSION_INT;
+	writel(adc_hc2, tsc->adc_regs + REG_ADC_HC2);
+
+	adc_hc3 = DISABLE_CONVERSION_INT | SELECT_CHANNEL_1;
+	writel(adc_hc3, tsc->adc_regs + REG_ADC_HC3);
+
+	adc_hc4 = DISABLE_CONVERSION_INT;
+	writel(adc_hc4, tsc->adc_regs + REG_ADC_HC4);
+}
+
+/*
+ * TSC setting, confige the pre-charge time and measure delay time.
+ * different touch screen may need different pre-charge time and
+ * measure delay time.
+ */
+static void imx6ul_tsc_set(struct imx6ul_tsc *tsc)
+{
+	int basic_setting = 0;
+	int start;
+
+	basic_setting |= tsc->measure_delay_time << 8;
+	basic_setting |= DETECT_4_WIRE_MODE | AUTO_MEASURE;
+	writel(basic_setting, tsc->tsc_regs + REG_TSC_BASIC_SETING);
+
+	writel(DE_GLITCH_2, tsc->tsc_regs + REG_TSC_DEBUG_MODE2);
+
+	writel(tsc->pre_charge_time, tsc->tsc_regs + REG_TSC_PRE_CHARGE_TIME);
+	writel(MEASURE_INT_EN, tsc->tsc_regs + REG_TSC_INT_EN);
+	writel(MEASURE_SIG_EN | VALID_SIG_EN,
+		tsc->tsc_regs + REG_TSC_INT_SIG_EN);
+
+	/* start sense detection */
+	start = readl(tsc->tsc_regs + REG_TSC_FLOW_CONTROL);
+	start |= START_SENSE;
+	start &= ~TSC_DISABLE;
+	writel(start, tsc->tsc_regs + REG_TSC_FLOW_CONTROL);
+}
+
+static void imx6ul_tsc_init(struct imx6ul_tsc *tsc)
+{
+	imx6ul_adc_init(tsc);
+	imx6ul_tsc_channel_config(tsc);
+	imx6ul_tsc_set(tsc);
+}
+
+static void imx6ul_tsc_disable(struct imx6ul_tsc *tsc)
+{
+	int tsc_flow;
+	int adc_cfg;
+
+	/* TSC controller enters to idle status */
+	tsc_flow = readl(tsc->tsc_regs + REG_TSC_FLOW_CONTROL);
+	tsc_flow |= TSC_DISABLE;
+	writel(tsc_flow, tsc->tsc_regs + REG_TSC_FLOW_CONTROL);
+
+	/* ADC controller enters to stop mode */
+	adc_cfg = readl(tsc->adc_regs + REG_ADC_HC0);
+	adc_cfg |= ADC_CONV_DISABLE;
+	writel(adc_cfg, tsc->adc_regs + REG_ADC_HC0);
+}
+
+/* Delay some time (max 2ms), wait the pre-charge done. */
+static bool tsc_wait_detect_mode(struct imx6ul_tsc *tsc)
+{
+	unsigned long timeout = jiffies + msecs_to_jiffies(2);
+	int state_machine;
+	int debug_mode2;
+
+	do {
+		if (time_after(jiffies, timeout))
+			return false;
+
+		usleep_range(200, 400);
+		debug_mode2 = readl(tsc->tsc_regs + REG_TSC_DEBUG_MODE2);
+		state_machine = (debug_mode2 >> 20) & 0x7;
+	} while (state_machine != DETECT_MODE);
+
+	usleep_range(200, 400);
+	return true;
+}
+
+static irqreturn_t tsc_irq_fn(int irq, void *dev_id)
+{
+	struct imx6ul_tsc *tsc = dev_id;
+	int status;
+	int value;
+	int x, y;
+	int start;
+
+	status = readl(tsc->tsc_regs + REG_TSC_INT_STATUS);
+
+	/* write 1 to clear the bit measure-signal */
+	writel(MEASURE_SIGNAL | DETECT_SIGNAL,
+		tsc->tsc_regs + REG_TSC_INT_STATUS);
+
+	/* It's a HW self-clean bit. Set this bit and start sense detection */
+	start = readl(tsc->tsc_regs + REG_TSC_FLOW_CONTROL);
+	start |= START_SENSE;
+	writel(start, tsc->tsc_regs + REG_TSC_FLOW_CONTROL);
+
+	if (status & MEASURE_SIGNAL) {
+		value = readl(tsc->tsc_regs + REG_TSC_MEASURE_VALUE);
+		x = (value >> 16) & 0x0fff;
+		y = value & 0x0fff;
+
+		/*
+		 * In detect mode, we can get the xnur gpio value,
+		 * otherwise assume contact is stiull active.
+		 */
+		if (!tsc_wait_detect_mode(tsc) ||
+		    gpiod_get_value_cansleep(tsc->xnur_gpio)) {
+			input_report_key(tsc->input, BTN_TOUCH, 1);
+			input_report_abs(tsc->input, ABS_X, x);
+			input_report_abs(tsc->input, ABS_Y, y);
+		} else {
+			input_report_key(tsc->input, BTN_TOUCH, 0);
+		}
+
+		input_sync(tsc->input);
+	}
+
+	return IRQ_HANDLED;
+}
+
+static irqreturn_t adc_irq_fn(int irq, void *dev_id)
+{
+	struct imx6ul_tsc *tsc = dev_id;
+	int coco;
+	int value;
+
+	coco = readl(tsc->adc_regs + REG_ADC_HS);
+	if (coco & 0x01) {
+		value = readl(tsc->adc_regs + REG_ADC_R0);
+		complete(&tsc->completion);
+	}
+
+	return IRQ_HANDLED;
+}
+
+static int imx6ul_tsc_open(struct input_dev *input_dev)
+{
+	struct imx6ul_tsc *tsc = input_get_drvdata(input_dev);
+	int err;
+
+	err = clk_prepare_enable(tsc->adc_clk);
+	if (err) {
+		dev_err(tsc->dev,
+			"Could not prepare or enable the adc clock: %d\n",
+			err);
+		return err;
+	}
+
+	err = clk_prepare_enable(tsc->tsc_clk);
+	if (err) {
+		dev_err(tsc->dev,
+			"Could not prepare or enable the tsc clock: %d\n",
+			err);
+		clk_disable_unprepare(tsc->adc_clk);
+		return err;
+	}
+
+	imx6ul_tsc_init(tsc);
+
+	return 0;
+}
+
+static void imx6ul_tsc_close(struct input_dev *input_dev)
+{
+	struct imx6ul_tsc *tsc = input_get_drvdata(input_dev);
+
+	imx6ul_tsc_disable(tsc);
+
+	clk_disable_unprepare(tsc->tsc_clk);
+	clk_disable_unprepare(tsc->adc_clk);
+}
+
+static int imx6ul_tsc_probe(struct platform_device *pdev)
+{
+	struct device_node *np = pdev->dev.of_node;
+	struct imx6ul_tsc *tsc;
+	struct input_dev *input_dev;
+	struct resource *tsc_mem;
+	struct resource *adc_mem;
+	int err;
+	int tsc_irq;
+	int adc_irq;
+
+	tsc = devm_kzalloc(&pdev->dev, sizeof(struct imx6ul_tsc), GFP_KERNEL);
+	if (!tsc)
+		return -ENOMEM;
+
+	input_dev = devm_input_allocate_device(&pdev->dev);
+	if (!input_dev)
+		return -ENOMEM;
+
+	input_dev->name = "iMX6UL TouchScreen Controller";
+	input_dev->id.bustype = BUS_HOST;
+
+	input_dev->open = imx6ul_tsc_open;
+	input_dev->close = imx6ul_tsc_close;
+
+	input_set_capability(input_dev, EV_KEY, BTN_TOUCH);
+	input_set_abs_params(input_dev, ABS_X, 0, 0xFFF, 0, 0);
+	input_set_abs_params(input_dev, ABS_Y, 0, 0xFFF, 0, 0);
+
+	input_set_drvdata(input_dev, tsc);
+
+	tsc->dev = &pdev->dev;
+	tsc->input = input_dev;
+	init_completion(&tsc->completion);
+
+	tsc->xnur_gpio = devm_gpiod_get(&pdev->dev, "xnur", GPIOD_IN);
+	if (IS_ERR(tsc->xnur_gpio)) {
+		err = PTR_ERR(tsc->xnur_gpio);
+		dev_err(&pdev->dev,
+			"failed to request GPIO tsc_X- (xnur): %d\n", err);
+		return err;
+	}
+
+	tsc_mem = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	tsc->tsc_regs = devm_ioremap_resource(&pdev->dev, tsc_mem);
+	if (IS_ERR(tsc->tsc_regs)) {
+		err = PTR_ERR(tsc->tsc_regs);
+		dev_err(&pdev->dev, "failed to remap tsc memory: %d\n", err);
+		return err;
+	}
+
+	adc_mem = platform_get_resource(pdev, IORESOURCE_MEM, 1);
+	tsc->adc_regs = devm_ioremap_resource(&pdev->dev, adc_mem);
+	if (IS_ERR(tsc->adc_regs)) {
+		err = PTR_ERR(tsc->adc_regs);
+		dev_err(&pdev->dev, "failed to remap adc memory: %d\n", err);
+		return err;
+	}
+
+	tsc->tsc_clk = devm_clk_get(&pdev->dev, "tsc");
+	if (IS_ERR(tsc->tsc_clk)) {
+		err = PTR_ERR(tsc->tsc_clk);
+		dev_err(&pdev->dev, "failed getting tsc clock: %d\n", err);
+		return err;
+	}
+
+	tsc->adc_clk = devm_clk_get(&pdev->dev, "adc");
+	if (IS_ERR(tsc->adc_clk)) {
+		err = PTR_ERR(tsc->adc_clk);
+		dev_err(&pdev->dev, "failed getting adc clock: %d\n", err);
+		return err;
+	}
+
+	tsc_irq = platform_get_irq(pdev, 0);
+	if (tsc_irq < 0) {
+		dev_err(&pdev->dev, "no tsc irq resource?\n");
+		return tsc_irq;
+	}
+
+	adc_irq = platform_get_irq(pdev, 1);
+	if (adc_irq <= 0) {
+		dev_err(&pdev->dev, "no adc irq resource?\n");
+		return adc_irq;
+	}
+
+	err = devm_request_threaded_irq(tsc->dev, tsc_irq,
+					NULL, tsc_irq_fn, IRQF_ONESHOT,
+					dev_name(&pdev->dev), tsc);
+	if (err) {
+		dev_err(&pdev->dev,
+			"failed requesting tsc irq %d: %d\n",
+			tsc_irq, err);
+		return err;
+	}
+
+	err = devm_request_irq(tsc->dev, adc_irq, adc_irq_fn, 0,
+				dev_name(&pdev->dev), tsc);
+	if (err) {
+		dev_err(&pdev->dev,
+			"failed requesting adc irq %d: %d\n",
+			adc_irq, err);
+		return err;
+	}
+
+	err = of_property_read_u32(np, "measure-delay-time",
+				   &tsc->measure_delay_time);
+	if (err)
+		tsc->measure_delay_time = 0xffff;
+
+	err = of_property_read_u32(np, "pre-charge-time",
+				   &tsc->pre_charge_time);
+	if (err)
+		tsc->pre_charge_time = 0xfff;
+
+	err = input_register_device(tsc->input);
+	if (err) {
+		dev_err(&pdev->dev,
+			"failed to register input device: %d\n", err);
+		return err;
+	}
+
+	platform_set_drvdata(pdev, tsc);
+	return 0;
+}
+
+static int __maybe_unused imx6ul_tsc_suspend(struct device *dev)
+{
+	struct platform_device *pdev = to_platform_device(dev);
+	struct imx6ul_tsc *tsc = platform_get_drvdata(pdev);
+	struct input_dev *input_dev = tsc->input;
+
+	mutex_lock(&input_dev->mutex);
+
+	if (input_dev->users) {
+		imx6ul_tsc_disable(tsc);
+
+		clk_disable_unprepare(tsc->tsc_clk);
+		clk_disable_unprepare(tsc->adc_clk);
+	}
+
+	mutex_unlock(&input_dev->mutex);
+
+	return 0;
+}
+
+static int __maybe_unused imx6ul_tsc_resume(struct device *dev)
+{
+	struct platform_device *pdev = to_platform_device(dev);
+	struct imx6ul_tsc *tsc = platform_get_drvdata(pdev);
+	struct input_dev *input_dev = tsc->input;
+	int retval = 0;
+
+	mutex_lock(&input_dev->mutex);
+
+	if (input_dev->users) {
+		retval = clk_prepare_enable(tsc->adc_clk);
+		if (retval)
+			goto out;
+
+		retval = clk_prepare_enable(tsc->tsc_clk);
+		if (retval) {
+			clk_disable_unprepare(tsc->adc_clk);
+			goto out;
+		}
+
+		imx6ul_tsc_init(tsc);
+	}
+
+out:
+	mutex_unlock(&input_dev->mutex);
+	return retval;
+}
+
+static SIMPLE_DEV_PM_OPS(imx6ul_tsc_pm_ops,
+			 imx6ul_tsc_suspend, imx6ul_tsc_resume);
+
+static const struct of_device_id imx6ul_tsc_match[] = {
+	{ .compatible = "fsl,imx6ul-tsc", },
+	{ /* sentinel */ }
+};
+MODULE_DEVICE_TABLE(of, imx6ul_tsc_match);
+
+static struct platform_driver imx6ul_tsc_driver = {
+	.driver		= {
+		.name	= "imx6ul-tsc",
+		.of_match_table	= imx6ul_tsc_match,
+		.pm	= &imx6ul_tsc_pm_ops,
+	},
+	.probe		= imx6ul_tsc_probe,
+};
+module_platform_driver(imx6ul_tsc_driver);
+
+MODULE_AUTHOR("Haibo Chen <haibo.chen@freescale.com>");
+MODULE_DESCRIPTION("Freescale i.MX6UL Touchscreen controller driver");
+MODULE_LICENSE("GPL v2");
-- 
cgit v1.2.3


From e7e98d76777ffba334bbf7a61939c5de48acc5a0 Mon Sep 17 00:00:00 2001
From: James Hogan <james.hogan@imgtec.com>
Date: Thu, 30 Jul 2015 13:31:42 +0100
Subject: Documentation/features/vm: Meta2 is capable of THP

Change metag Transparent Huge Pages (THP) support from .. to TODO. Meta2
has variable sized pages, between 4KB and 4MB, specified at the 1st
level page table level, and already supports hugetlbfs, so supporting
THP is theoretically possible too.

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: linux-metag@vger.kernel.org
Cc: linux-doc@vger.kernel.org
---
 Documentation/features/vm/THP/arch-support.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/features/vm/THP/arch-support.txt b/Documentation/features/vm/THP/arch-support.txt
index 972d02c2a74c..df384e3e845f 100644
--- a/Documentation/features/vm/THP/arch-support.txt
+++ b/Documentation/features/vm/THP/arch-support.txt
@@ -20,7 +20,7 @@
     |        ia64: | TODO |
     |        m32r: |  ..  |
     |        m68k: |  ..  |
-    |       metag: |  ..  |
+    |       metag: | TODO |
     |  microblaze: |  ..  |
     |        mips: |  ok  |
     |     mn10300: |  ..  |
-- 
cgit v1.2.3


From edcd591c77a48da753456f92daf8bb50fe9bac93 Mon Sep 17 00:00:00 2001
From: Jonathan Corbet <corbet@lwn.net>
Date: Mon, 7 Sep 2015 13:18:03 -0600
Subject: locking/static_keys: Fix a silly typo

Commit:

  412758cb2670 ("jump label, locking/static_keys: Update docs")

introduced a typo that might as well get fixed.

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150907131803.54c027e1@lwn.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 Documentation/static-keys.txt | 2 +-
 include/linux/jump_label.h    | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/static-keys.txt b/Documentation/static-keys.txt
index f4cb0b2d5cd7..ec911583f6c5 100644
--- a/Documentation/static-keys.txt
+++ b/Documentation/static-keys.txt
@@ -16,7 +16,7 @@ The updated API replacements are:
 DEFINE_STATIC_KEY_TRUE(key);
 DEFINE_STATIC_KEY_FALSE(key);
 static_key_likely()
-statick_key_unlikely()
+static_key_unlikely()
 
 0) Abstract
 
diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h
index 7f653e8f6690..0684bd3a48fc 100644
--- a/include/linux/jump_label.h
+++ b/include/linux/jump_label.h
@@ -22,7 +22,7 @@
  * DEFINE_STATIC_KEY_TRUE(key);
  * DEFINE_STATIC_KEY_FALSE(key);
  * static_key_likely()
- * statick_key_unlikely()
+ * static_key_unlikely()
  *
  * Jump labels provide an interface to generate dynamic branches using
  * self-modifying code. Assuming toolchain and architecture support, if we
-- 
cgit v1.2.3


From 96206e2922c114b13cadefa03b9f340b58fee13c Mon Sep 17 00:00:00 2001
From: Bob Paauwe <bob.j.paauwe@intel.com>
Date: Thu, 27 Aug 2015 10:04:13 -0700
Subject: dtrm/edid: Allow comma separated edid binaries. (v3)

Allow comma separated filenames in the edid_firmware parameter.

For example:

edid_firmware=eDP-1:edid/1280x480.bin,DP-2:edid/1920x1080.bin

v2: Use strsep() to simplify parsing of comma seperated string. (Matt)
    Move initial bail before strdup. (Matt)
v3: Changed conditionals after while loop to make more readable (Jani)
    Updated kernel-parameters.txt to reflect changes (Jani)

Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Bob Paauwe <bob.j.paauwe@intel.com>
[danvet: Flatten else control flow and appease checkpatch.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 Documentation/kernel-parameters.txt | 15 ++++++++------
 drivers/gpu/drm/drm_edid_load.c     | 41 +++++++++++++++++++++++++++++--------
 2 files changed, 42 insertions(+), 14 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 1d6f0459cd7b..caf0fd4cdecd 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -927,11 +927,11 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			The filter can be disabled or changed to another
 			driver later using sysfs.
 
-	drm_kms_helper.edid_firmware=[<connector>:]<file>
-			Broken monitors, graphic adapters and KVMs may
-			send no or incorrect EDID data sets. This parameter
-			allows to specify an EDID data set in the
-			/lib/firmware directory that is used instead.
+	drm_kms_helper.edid_firmware=[<connector>:]<file>[,[<connector>:]<file>]
+			Broken monitors, graphic adapters, KVMs and EDIDless
+			panels may send no or incorrect EDID data sets.
+			This parameter allows to specify an EDID data sets
+			in the /lib/firmware directory that are used instead.
 			Generic built-in EDID data sets are used, if one of
 			edid/1024x768.bin, edid/1280x1024.bin,
 			edid/1680x1050.bin, or edid/1920x1080.bin is given
@@ -940,7 +940,10 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			available in Documentation/EDID/HOWTO.txt. An EDID
 			data set will only be used for a particular connector,
 			if its name and a colon are prepended to the EDID
-			name.
+			name. Each connector may use a unique EDID data
+			set by separating the files with a comma.  An EDID
+			data set with no connector name will be used for
+			any connectors not explicitly specified.
 
 	dscc4.setup=	[NET]
 
diff --git a/drivers/gpu/drm/drm_edid_load.c b/drivers/gpu/drm/drm_edid_load.c
index c5605fe4907e..1f445e9bd768 100644
--- a/drivers/gpu/drm/drm_edid_load.c
+++ b/drivers/gpu/drm/drm_edid_load.c
@@ -264,20 +264,43 @@ out:
 int drm_load_edid_firmware(struct drm_connector *connector)
 {
 	const char *connector_name = connector->name;
-	char *edidname = edid_firmware, *last, *colon;
+	char *edidname, *last, *colon, *fwstr, *edidstr, *fallback = NULL;
 	int ret;
 	struct edid *edid;
 
-	if (*edidname == '\0')
+	if (edid_firmware[0] == '\0')
 		return 0;
 
-	colon = strchr(edidname, ':');
-	if (colon != NULL) {
-		if (strncmp(connector_name, edidname, colon - edidname))
-			return 0;
-		edidname = colon + 1;
-		if (*edidname == '\0')
+	/*
+	 * If there are multiple edid files specified and separated
+	 * by commas, search through the list looking for one that
+	 * matches the connector.
+	 *
+	 * If there's one or more that don't't specify a connector, keep
+	 * the last one found one as a fallback.
+	 */
+	fwstr = kstrdup(edid_firmware, GFP_KERNEL);
+	edidstr = fwstr;
+
+	while ((edidname = strsep(&edidstr, ","))) {
+		colon = strchr(edidname, ':');
+		if (colon != NULL) {
+			if (strncmp(connector_name, edidname, colon - edidname))
+				continue;
+			edidname = colon + 1;
+			break;
+		}
+
+		if (*edidname != '\0') /* corner case: multiple ',' */
+			fallback = edidname;
+	}
+
+	if (!edidname) {
+		if (!fallback) {
+			kfree(fwstr);
 			return 0;
+		}
+		edidname = fallback;
 	}
 
 	last = edidname + strlen(edidname) - 1;
@@ -285,6 +308,8 @@ int drm_load_edid_firmware(struct drm_connector *connector)
 		*last = '\0';
 
 	edid = edid_load(connector, edidname, connector_name);
+	kfree(fwstr);
+
 	if (IS_ERR_OR_NULL(edid))
 		return 0;
 
-- 
cgit v1.2.3


From 844f35db1088dd1a9de37b53d4d823626232bd19 Mon Sep 17 00:00:00 2001
From: Matthew Wilcox <willy@linux.intel.com>
Date: Tue, 8 Sep 2015 14:58:57 -0700
Subject: dax: add huge page fault support

This is the support code for DAX-enabled filesystems to allow them to
provide huge pages in response to faults.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 Documentation/filesystems/dax.txt |   7 +-
 fs/dax.c                          | 152 ++++++++++++++++++++++++++++++++++++++
 include/linux/dax.h               |  14 ++++
 3 files changed, 170 insertions(+), 3 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/filesystems/dax.txt b/Documentation/filesystems/dax.txt
index 7af2851d667c..7bde64014a89 100644
--- a/Documentation/filesystems/dax.txt
+++ b/Documentation/filesystems/dax.txt
@@ -60,9 +60,10 @@ Filesystem support consists of
 - implementing the direct_IO address space operation, and calling
   dax_do_io() instead of blockdev_direct_IO() if S_DAX is set
 - implementing an mmap file operation for DAX files which sets the
-  VM_MIXEDMAP flag on the VMA, and setting the vm_ops to include handlers
-  for fault and page_mkwrite (which should probably call dax_fault() and
-  dax_mkwrite(), passing the appropriate get_block() callback)
+  VM_MIXEDMAP and VM_HUGEPAGE flags on the VMA, and setting the vm_ops to
+  include handlers for fault, pmd_fault and page_mkwrite (which should
+  probably call dax_fault(), dax_pmd_fault() and dax_mkwrite(), passing the
+  appropriate get_block() callback)
 - calling dax_truncate_page() instead of block_truncate_page() for DAX files
 - calling dax_zero_page_range() instead of zero_user() for DAX files
 - ensuring that there is sufficient locking between reads, writes,
diff --git a/fs/dax.c b/fs/dax.c
index a7f77e1fa18c..15f8ffc13fa6 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -494,6 +494,158 @@ int dax_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
 }
 EXPORT_SYMBOL_GPL(dax_fault);
 
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+/*
+ * The 'colour' (ie low bits) within a PMD of a page offset.  This comes up
+ * more often than one might expect in the below function.
+ */
+#define PG_PMD_COLOUR	((PMD_SIZE >> PAGE_SHIFT) - 1)
+
+int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address,
+		pmd_t *pmd, unsigned int flags, get_block_t get_block,
+		dax_iodone_t complete_unwritten)
+{
+	struct file *file = vma->vm_file;
+	struct address_space *mapping = file->f_mapping;
+	struct inode *inode = mapping->host;
+	struct buffer_head bh;
+	unsigned blkbits = inode->i_blkbits;
+	unsigned long pmd_addr = address & PMD_MASK;
+	bool write = flags & FAULT_FLAG_WRITE;
+	long length;
+	void *kaddr;
+	pgoff_t size, pgoff;
+	sector_t block, sector;
+	unsigned long pfn;
+	int result = 0;
+
+	/* Fall back to PTEs if we're going to COW */
+	if (write && !(vma->vm_flags & VM_SHARED))
+		return VM_FAULT_FALLBACK;
+	/* If the PMD would extend outside the VMA */
+	if (pmd_addr < vma->vm_start)
+		return VM_FAULT_FALLBACK;
+	if ((pmd_addr + PMD_SIZE) > vma->vm_end)
+		return VM_FAULT_FALLBACK;
+
+	pgoff = ((pmd_addr - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
+	size = (i_size_read(inode) + PAGE_SIZE - 1) >> PAGE_SHIFT;
+	if (pgoff >= size)
+		return VM_FAULT_SIGBUS;
+	/* If the PMD would cover blocks out of the file */
+	if ((pgoff | PG_PMD_COLOUR) >= size)
+		return VM_FAULT_FALLBACK;
+
+	memset(&bh, 0, sizeof(bh));
+	block = (sector_t)pgoff << (PAGE_SHIFT - blkbits);
+
+	bh.b_size = PMD_SIZE;
+	length = get_block(inode, block, &bh, write);
+	if (length)
+		return VM_FAULT_SIGBUS;
+	i_mmap_lock_read(mapping);
+
+	/*
+	 * If the filesystem isn't willing to tell us the length of a hole,
+	 * just fall back to PTEs.  Calling get_block 512 times in a loop
+	 * would be silly.
+	 */
+	if (!buffer_size_valid(&bh) || bh.b_size < PMD_SIZE)
+		goto fallback;
+
+	/* Guard against a race with truncate */
+	size = (i_size_read(inode) + PAGE_SIZE - 1) >> PAGE_SHIFT;
+	if (pgoff >= size) {
+		result = VM_FAULT_SIGBUS;
+		goto out;
+	}
+	if ((pgoff | PG_PMD_COLOUR) >= size)
+		goto fallback;
+
+	if (is_huge_zero_pmd(*pmd))
+		unmap_mapping_range(mapping, pgoff << PAGE_SHIFT, PMD_SIZE, 0);
+
+	if (!write && !buffer_mapped(&bh) && buffer_uptodate(&bh)) {
+		bool set;
+		spinlock_t *ptl;
+		struct mm_struct *mm = vma->vm_mm;
+		struct page *zero_page = get_huge_zero_page();
+		if (unlikely(!zero_page))
+			goto fallback;
+
+		ptl = pmd_lock(mm, pmd);
+		set = set_huge_zero_page(NULL, mm, vma, pmd_addr, pmd,
+								zero_page);
+		spin_unlock(ptl);
+		result = VM_FAULT_NOPAGE;
+	} else {
+		sector = bh.b_blocknr << (blkbits - 9);
+		length = bdev_direct_access(bh.b_bdev, sector, &kaddr, &pfn,
+						bh.b_size);
+		if (length < 0) {
+			result = VM_FAULT_SIGBUS;
+			goto out;
+		}
+		if ((length < PMD_SIZE) || (pfn & PG_PMD_COLOUR))
+			goto fallback;
+
+		if (buffer_unwritten(&bh) || buffer_new(&bh)) {
+			int i;
+			for (i = 0; i < PTRS_PER_PMD; i++)
+				clear_page(kaddr + i * PAGE_SIZE);
+			count_vm_event(PGMAJFAULT);
+			mem_cgroup_count_vm_event(vma->vm_mm, PGMAJFAULT);
+			result |= VM_FAULT_MAJOR;
+		}
+
+		result |= vmf_insert_pfn_pmd(vma, address, pmd, pfn, write);
+	}
+
+ out:
+	i_mmap_unlock_read(mapping);
+
+	if (buffer_unwritten(&bh))
+		complete_unwritten(&bh, !(result & VM_FAULT_ERROR));
+
+	return result;
+
+ fallback:
+	count_vm_event(THP_FAULT_FALLBACK);
+	result = VM_FAULT_FALLBACK;
+	goto out;
+}
+EXPORT_SYMBOL_GPL(__dax_pmd_fault);
+
+/**
+ * dax_pmd_fault - handle a PMD fault on a DAX file
+ * @vma: The virtual memory area where the fault occurred
+ * @vmf: The description of the fault
+ * @get_block: The filesystem method used to translate file offsets to blocks
+ *
+ * When a page fault occurs, filesystems may call this helper in their
+ * pmd_fault handler for DAX files.
+ */
+int dax_pmd_fault(struct vm_area_struct *vma, unsigned long address,
+			pmd_t *pmd, unsigned int flags, get_block_t get_block,
+			dax_iodone_t complete_unwritten)
+{
+	int result;
+	struct super_block *sb = file_inode(vma->vm_file)->i_sb;
+
+	if (flags & FAULT_FLAG_WRITE) {
+		sb_start_pagefault(sb);
+		file_update_time(vma->vm_file);
+	}
+	result = __dax_pmd_fault(vma, address, pmd, flags, get_block,
+				complete_unwritten);
+	if (flags & FAULT_FLAG_WRITE)
+		sb_end_pagefault(sb);
+
+	return result;
+}
+EXPORT_SYMBOL_GPL(dax_pmd_fault);
+#endif /* CONFIG_TRANSPARENT_HUGEPAGES */
+
 /**
  * dax_pfn_mkwrite - handle first write to DAX page
  * @vma: The virtual memory area where the fault occurred
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 9b51f9d40ad9..b415e521528d 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -14,6 +14,20 @@ int dax_fault(struct vm_area_struct *, struct vm_fault *, get_block_t,
 		dax_iodone_t);
 int __dax_fault(struct vm_area_struct *, struct vm_fault *, get_block_t,
 		dax_iodone_t);
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+int dax_pmd_fault(struct vm_area_struct *, unsigned long addr, pmd_t *,
+				unsigned int flags, get_block_t, dax_iodone_t);
+int __dax_pmd_fault(struct vm_area_struct *, unsigned long addr, pmd_t *,
+				unsigned int flags, get_block_t, dax_iodone_t);
+#else
+static inline int dax_pmd_fault(struct vm_area_struct *vma, unsigned long addr,
+				pmd_t *pmd, unsigned int flags, get_block_t gb,
+				dax_iodone_t di)
+{
+	return VM_FAULT_FALLBACK;
+}
+#define __dax_pmd_fault dax_pmd_fault
+#endif
 int dax_pfn_mkwrite(struct vm_area_struct *, struct vm_fault *);
 #define dax_mkwrite(vma, vmf, gb, iod)		dax_fault(vma, vmf, gb, iod)
 #define __dax_mkwrite(vma, vmf, gb, iod)	__dax_fault(vma, vmf, gb, iod)
-- 
cgit v1.2.3


From 77bb499bb60f4b79cca7d139c8041662860fcf87 Mon Sep 17 00:00:00 2001
From: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Date: Tue, 8 Sep 2015 15:00:10 -0700
Subject: pagemap: add mmap-exclusive bit for marking pages mapped only here

This patch sets bit 56 in pagemap if this page is mapped only once.  It
allows to detect exclusively used pages without exposing PFN:

present file exclusive state
0       0    0         non-present
1       1    0         file page mapped somewhere else
1       1    1         file page mapped only here
1       0    0         anon non-CoWed page (shared with parent/child)
1       0    1         anon CoWed page (or never forked)

CoWed pages in (MAP_FILE | MAP_PRIVATE) areas are anon in this context.

MMap-exclusive bit doesn't reflect potential page-sharing via swapcache:
page could be mapped once but has several swap-ptes which point to it.
Application could detect that by swap bit in pagemap entry and touch that
pte via /proc/pid/mem to get real information.

See http://lkml.kernel.org/r/CAEVpBa+_RyACkhODZrRvQLs80iy0sqpdrd0AaP_-tgnX3Y9yNQ@mail.gmail.com

Requested by Mark Williamson.

[akpm@linux-foundation.org: fix spello]
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Reviewed-by: Mark Williamson <mwilliamson@undo-software.com>
Tested-by:  Mark Williamson <mwilliamson@undo-software.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 Documentation/vm/pagemap.txt |  3 ++-
 fs/proc/task_mmu.c           | 14 +++++++++++++-
 tools/vm/page-types.c        | 10 ++++++++++
 3 files changed, 25 insertions(+), 2 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/vm/pagemap.txt b/Documentation/vm/pagemap.txt
index 6bfbc172cdb9..56faec0f73f7 100644
--- a/Documentation/vm/pagemap.txt
+++ b/Documentation/vm/pagemap.txt
@@ -16,7 +16,8 @@ There are three components to pagemap:
     * Bits 0-4   swap type if swapped
     * Bits 5-54  swap offset if swapped
     * Bit  55    pte is soft-dirty (see Documentation/vm/soft-dirty.txt)
-    * Bits 56-60 zero
+    * Bit  56    page exclusively mapped
+    * Bits 57-60 zero
     * Bit  61    page is file-page or shared-anon
     * Bit  62    page swapped
     * Bit  63    page present
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index bc651644b1b2..67c76468a7be 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -949,6 +949,7 @@ struct pagemapread {
 #define PM_PFRAME_BITS		55
 #define PM_PFRAME_MASK		GENMASK_ULL(PM_PFRAME_BITS - 1, 0)
 #define PM_SOFT_DIRTY		BIT_ULL(55)
+#define PM_MMAP_EXCLUSIVE	BIT_ULL(56)
 #define PM_FILE			BIT_ULL(61)
 #define PM_SWAP			BIT_ULL(62)
 #define PM_PRESENT		BIT_ULL(63)
@@ -1036,6 +1037,8 @@ static pagemap_entry_t pte_to_pagemap_entry(struct pagemapread *pm,
 
 	if (page && !PageAnon(page))
 		flags |= PM_FILE;
+	if (page && page_mapcount(page) == 1)
+		flags |= PM_MMAP_EXCLUSIVE;
 	if (vma->vm_flags & VM_SOFTDIRTY)
 		flags |= PM_SOFT_DIRTY;
 
@@ -1066,6 +1069,11 @@ static int pagemap_pmd_range(pmd_t *pmdp, unsigned long addr, unsigned long end,
 		 * This if-check is just to prepare for future implementation.
 		 */
 		if (pmd_present(pmd)) {
+			struct page *page = pmd_page(pmd);
+
+			if (page_mapcount(page) == 1)
+				flags |= PM_MMAP_EXCLUSIVE;
+
 			flags |= PM_PRESENT;
 			if (pm->show_pfn)
 				frame = pmd_pfn(pmd) +
@@ -1131,6 +1139,9 @@ static int pagemap_hugetlb_range(pte_t *ptep, unsigned long hmask,
 		if (!PageAnon(page))
 			flags |= PM_FILE;
 
+		if (page_mapcount(page) == 1)
+			flags |= PM_MMAP_EXCLUSIVE;
+
 		flags |= PM_PRESENT;
 		if (pm->show_pfn)
 			frame = pte_pfn(pte) +
@@ -1163,7 +1174,8 @@ static int pagemap_hugetlb_range(pte_t *ptep, unsigned long hmask,
  * Bits 0-4   swap type if swapped
  * Bits 5-54  swap offset if swapped
  * Bit  55    pte is soft-dirty (see Documentation/vm/soft-dirty.txt)
- * Bits 56-60 zero
+ * Bit  56    page exclusively mapped
+ * Bits 57-60 zero
  * Bit  61    page is file-page or shared-anon
  * Bit  62    page swapped
  * Bit  63    page present
diff --git a/tools/vm/page-types.c b/tools/vm/page-types.c
index 603ec916716b..7f73fa32a590 100644
--- a/tools/vm/page-types.c
+++ b/tools/vm/page-types.c
@@ -62,6 +62,7 @@
 #define PM_PFRAME_MASK		((1LL << PM_PFRAME_BITS) - 1)
 #define PM_PFRAME(x)		((x) & PM_PFRAME_MASK)
 #define PM_SOFT_DIRTY		(1ULL << 55)
+#define PM_MMAP_EXCLUSIVE	(1ULL << 56)
 #define PM_FILE			(1ULL << 61)
 #define PM_SWAP			(1ULL << 62)
 #define PM_PRESENT		(1ULL << 63)
@@ -91,6 +92,8 @@
 #define KPF_SLOB_FREE		49
 #define KPF_SLUB_FROZEN		50
 #define KPF_SLUB_DEBUG		51
+#define KPF_FILE		62
+#define KPF_MMAP_EXCLUSIVE	63
 
 #define KPF_ALL_BITS		((uint64_t)~0ULL)
 #define KPF_HACKERS_BITS	(0xffffULL << 32)
@@ -140,6 +143,9 @@ static const char * const page_flag_names[] = {
 	[KPF_SLOB_FREE]		= "P:slob_free",
 	[KPF_SLUB_FROZEN]	= "A:slub_frozen",
 	[KPF_SLUB_DEBUG]	= "E:slub_debug",
+
+	[KPF_FILE]		= "F:file",
+	[KPF_MMAP_EXCLUSIVE]	= "1:mmap_exclusive",
 };
 
 
@@ -443,6 +449,10 @@ static uint64_t expand_overloaded_flags(uint64_t flags, uint64_t pme)
 
 	if (pme & PM_SOFT_DIRTY)
 		flags |= BIT(SOFTDIRTY);
+	if (pme & PM_FILE)
+		flags |= BIT(FILE);
+	if (pme & PM_MMAP_EXCLUSIVE)
+		flags |= BIT(MMAP_EXCLUSIVE);
 
 	return flags;
 }
-- 
cgit v1.2.3


From 83b4b0bb635eee2b8e075062e4e008d1bc110ed7 Mon Sep 17 00:00:00 2001
From: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Date: Tue, 8 Sep 2015 15:00:13 -0700
Subject: pagemap: update documentation

Notes about recent changes.

[akpm@linux-foundation.org: various tweaks]
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Mark Williamson <mwilliamson@undo-software.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 Documentation/vm/pagemap.txt | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/vm/pagemap.txt b/Documentation/vm/pagemap.txt
index 56faec0f73f7..3cd38438242a 100644
--- a/Documentation/vm/pagemap.txt
+++ b/Documentation/vm/pagemap.txt
@@ -16,12 +16,17 @@ There are three components to pagemap:
     * Bits 0-4   swap type if swapped
     * Bits 5-54  swap offset if swapped
     * Bit  55    pte is soft-dirty (see Documentation/vm/soft-dirty.txt)
-    * Bit  56    page exclusively mapped
+    * Bit  56    page exclusively mapped (since 4.2)
     * Bits 57-60 zero
-    * Bit  61    page is file-page or shared-anon
+    * Bit  61    page is file-page or shared-anon (since 3.5)
     * Bit  62    page swapped
     * Bit  63    page present
 
+   Since Linux 4.0 only users with the CAP_SYS_ADMIN capability can get PFNs.
+   In 4.0 and 4.1 opens by unprivileged fail with -EPERM.  Starting from
+   4.2 the PFN field is zeroed if the user does not have CAP_SYS_ADMIN.
+   Reason: information about PFNs helps in exploiting Rowhammer vulnerability.
+
    If the page is not present but in swap, then the PFN contains an
    encoding of the swap file number and the page's offset into the
    swap. Unmapped pages return a null PFN. This allows determining
@@ -160,3 +165,8 @@ Other notes:
 Reading from any of the files will return -EINVAL if you are not starting
 the read on an 8-byte boundary (e.g., if you sought an odd number of bytes
 into the file), or if the size of the read is not a multiple of 8 bytes.
+
+Before Linux 3.11 pagemap bits 55-60 were used for "page-shift" (which is
+always 12 at most architectures). Since Linux 3.11 their meaning changes
+after first clear of soft-dirty bits. Since Linux 4.2 they are used for
+flags unconditionally.
-- 
cgit v1.2.3


From 8334b96221ff0dcbde4873d31eb4d84774ed8ed4 Mon Sep 17 00:00:00 2001
From: Minchan Kim <minchan@kernel.org>
Date: Tue, 8 Sep 2015 15:00:24 -0700
Subject: mm: /proc/pid/smaps:: show proportional swap share of the mapping

We want to know per-process workingset size for smart memory management
on userland and we use swap(ex, zram) heavily to maximize memory
efficiency so workingset includes swap as well as RSS.

On such system, if there are lots of shared anonymous pages, it's really
hard to figure out exactly how many each process consumes memory(ie, rss
+ wap) if the system has lots of shared anonymous memory(e.g, android).

This patch introduces SwapPss field on /proc/<pid>/smaps so we can get
more exact workingset size per process.

Bongkyu tested it. Result is below.

1. 50M used swap
SwapTotal: 461976 kB
SwapFree: 411192 kB

$ adb shell cat /proc/*/smaps | grep "SwapPss:" | awk '{sum += $2} END {print sum}';
48236
$ adb shell cat /proc/*/smaps | grep "Swap:" | awk '{sum += $2} END {print sum}';
141184

2. 240M used swap
SwapTotal: 461976 kB
SwapFree: 216808 kB

$ adb shell cat /proc/*/smaps | grep "SwapPss:" | awk '{sum += $2} END {print sum}';
230315
$ adb shell cat /proc/*/smaps | grep "Swap:" | awk '{sum += $2} END {print sum}';
1387744

[akpm@linux-foundation.org: simplify kunmap_atomic() call]
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reported-by: Bongkyu Kim <bongkyu.kim@lge.com>
Tested-by: Bongkyu Kim <bongkyu.kim@lge.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 Documentation/filesystems/proc.txt | 18 +++++++++++-----
 fs/proc/task_mmu.c                 | 18 ++++++++++++++--
 include/linux/swap.h               |  6 ++++++
 mm/swapfile.c                      | 42 ++++++++++++++++++++++++++++++++++++++
 4 files changed, 77 insertions(+), 7 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index 6f7fafde0884..d411ca63c8b6 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -424,6 +424,7 @@ Private_Dirty:         0 kB
 Referenced:          892 kB
 Anonymous:             0 kB
 Swap:                  0 kB
+SwapPss:               0 kB
 KernelPageSize:        4 kB
 MMUPageSize:           4 kB
 Locked:              374 kB
@@ -433,16 +434,23 @@ the first of these lines shows the same information as is displayed for the
 mapping in /proc/PID/maps.  The remaining lines show the size of the mapping
 (size), the amount of the mapping that is currently resident in RAM (RSS), the
 process' proportional share of this mapping (PSS), the number of clean and
-dirty private pages in the mapping.  Note that even a page which is part of a
-MAP_SHARED mapping, but has only a single pte mapped, i.e.  is currently used
-by only one process, is accounted as private and not as shared.  "Referenced"
-indicates the amount of memory currently marked as referenced or accessed.
+dirty private pages in the mapping.
+
+The "proportional set size" (PSS) of a process is the count of pages it has
+in memory, where each page is divided by the number of processes sharing it.
+So if a process has 1000 pages all to itself, and 1000 shared with one other
+process, its PSS will be 1500.
+Note that even a page which is part of a MAP_SHARED mapping, but has only
+a single pte mapped, i.e.  is currently used by only one process, is accounted
+as private and not as shared.
+"Referenced" indicates the amount of memory currently marked as referenced or
+accessed.
 "Anonymous" shows the amount of memory that does not belong to any file.  Even
 a mapping associated with a file may contain anonymous pages: when MAP_PRIVATE
 and a page is modified, the file page is replaced by a private anonymous copy.
 "Swap" shows how much would-be-anonymous memory is also used, but out on
 swap.
-
+"SwapPss" shows proportional swap share of this mapping.
 "VmFlags" field deserves a separate description. This member represents the kernel
 flags associated with the particular virtual memory area in two letter encoded
 manner. The codes are the following:
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 67c76468a7be..41f1a50c10c9 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -446,6 +446,7 @@ struct mem_size_stats {
 	unsigned long anonymous_thp;
 	unsigned long swap;
 	u64 pss;
+	u64 swap_pss;
 };
 
 static void smaps_account(struct mem_size_stats *mss, struct page *page,
@@ -492,9 +493,20 @@ static void smaps_pte_entry(pte_t *pte, unsigned long addr,
 	} else if (is_swap_pte(*pte)) {
 		swp_entry_t swpent = pte_to_swp_entry(*pte);
 
-		if (!non_swap_entry(swpent))
+		if (!non_swap_entry(swpent)) {
+			int mapcount;
+
 			mss->swap += PAGE_SIZE;
-		else if (is_migration_entry(swpent))
+			mapcount = swp_swapcount(swpent);
+			if (mapcount >= 2) {
+				u64 pss_delta = (u64)PAGE_SIZE << PSS_SHIFT;
+
+				do_div(pss_delta, mapcount);
+				mss->swap_pss += pss_delta;
+			} else {
+				mss->swap_pss += (u64)PAGE_SIZE << PSS_SHIFT;
+			}
+		} else if (is_migration_entry(swpent))
 			page = migration_entry_to_page(swpent);
 	}
 
@@ -640,6 +652,7 @@ static int show_smap(struct seq_file *m, void *v, int is_pid)
 		   "Anonymous:      %8lu kB\n"
 		   "AnonHugePages:  %8lu kB\n"
 		   "Swap:           %8lu kB\n"
+		   "SwapPss:        %8lu kB\n"
 		   "KernelPageSize: %8lu kB\n"
 		   "MMUPageSize:    %8lu kB\n"
 		   "Locked:         %8lu kB\n",
@@ -654,6 +667,7 @@ static int show_smap(struct seq_file *m, void *v, int is_pid)
 		   mss.anonymous >> 10,
 		   mss.anonymous_thp >> 10,
 		   mss.swap >> 10,
+		   (unsigned long)(mss.swap_pss >> (10 + PSS_SHIFT)),
 		   vma_kernel_pagesize(vma) >> 10,
 		   vma_mmu_pagesize(vma) >> 10,
 		   (vma->vm_flags & VM_LOCKED) ?
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 31496d201fdc..6282f1eb3d6a 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -431,6 +431,7 @@ extern unsigned int count_swap_pages(int, int);
 extern sector_t map_swap_page(struct page *, struct block_device **);
 extern sector_t swapdev_block(int, pgoff_t);
 extern int page_swapcount(struct page *);
+extern int swp_swapcount(swp_entry_t entry);
 extern struct swap_info_struct *page_swap_info(struct page *);
 extern int reuse_swap_page(struct page *);
 extern int try_to_free_swap(struct page *);
@@ -522,6 +523,11 @@ static inline int page_swapcount(struct page *page)
 	return 0;
 }
 
+static inline int swp_swapcount(swp_entry_t entry)
+{
+	return 0;
+}
+
 #define reuse_swap_page(page)	(page_mapcount(page) == 1)
 
 static inline int try_to_free_swap(struct page *page)
diff --git a/mm/swapfile.c b/mm/swapfile.c
index aebc2dd6e649..58877312cf6b 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -874,6 +874,48 @@ int page_swapcount(struct page *page)
 	return count;
 }
 
+/*
+ * How many references to @entry are currently swapped out?
+ * This considers COUNT_CONTINUED so it returns exact answer.
+ */
+int swp_swapcount(swp_entry_t entry)
+{
+	int count, tmp_count, n;
+	struct swap_info_struct *p;
+	struct page *page;
+	pgoff_t offset;
+	unsigned char *map;
+
+	p = swap_info_get(entry);
+	if (!p)
+		return 0;
+
+	count = swap_count(p->swap_map[swp_offset(entry)]);
+	if (!(count & COUNT_CONTINUED))
+		goto out;
+
+	count &= ~COUNT_CONTINUED;
+	n = SWAP_MAP_MAX + 1;
+
+	offset = swp_offset(entry);
+	page = vmalloc_to_page(p->swap_map + offset);
+	offset &= ~PAGE_MASK;
+	VM_BUG_ON(page_private(page) != SWP_CONTINUED);
+
+	do {
+		page = list_entry(page->lru.next, struct page, lru);
+		map = kmap_atomic(page);
+		tmp_count = map[offset];
+		kunmap_atomic(map);
+
+		count += (tmp_count & ~COUNT_CONTINUED) * n;
+		n *= (SWAP_CONT_MAX + 1);
+	} while (tmp_count & COUNT_CONTINUED);
+out:
+	spin_unlock(&p->lock);
+	return count;
+}
+
 /*
  * We can write to an anon page without COW if there are no other references
  * to it.  And as a side-effect, free up its swap: because the old content
-- 
cgit v1.2.3


From 071a4befebb655d6b31bf5c6bacd5a6df035224d Mon Sep 17 00:00:00 2001
From: David Rientjes <rientjes@google.com>
Date: Tue, 8 Sep 2015 15:00:42 -0700
Subject: mm, oom: do not panic for oom kills triggered from sysrq

Sysrq+f is used to kill a process either for debug or when the VM is
otherwise unresponsive.

It is not intended to trigger a panic when no process may be killed.

Avoid panicking the system for sysrq+f when no processes are killed.

Signed-off-by: David Rientjes <rientjes@google.com>
Suggested-by: Michal Hocko <mhocko@suse.cz>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 Documentation/sysrq.txt | 3 ++-
 mm/oom_kill.c           | 7 +++++--
 2 files changed, 7 insertions(+), 3 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/sysrq.txt b/Documentation/sysrq.txt
index 267f39386f99..13f5619b2203 100644
--- a/Documentation/sysrq.txt
+++ b/Documentation/sysrq.txt
@@ -75,7 +75,8 @@ On all -  write a character to /proc/sysrq-trigger.  e.g.:
 
 'e'     - Send a SIGTERM to all processes, except for init.
 
-'f'	- Will call oom_kill to kill a memory hog process.
+'f'	- Will call the oom killer to kill a memory hog process, but do not
+	  panic if nothing can be killed.
 
 'g'	- Used by kgdb (kernel debugger)
 
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 77adc8e876aa..91dd59f63910 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -607,6 +607,9 @@ void check_panic_on_oom(struct oom_control *oc, enum oom_constraint constraint,
 		if (constraint != CONSTRAINT_NONE)
 			return;
 	}
+	/* Do not panic for oom kills triggered by sysrq */
+	if (oc->order == -1)
+		return;
 	dump_header(oc, NULL, memcg);
 	panic("Out of memory: %s panic_on_oom is enabled\n",
 		sysctl_panic_on_oom == 2 ? "compulsory" : "system-wide");
@@ -686,11 +689,11 @@ bool out_of_memory(struct oom_control *oc)
 
 	p = select_bad_process(oc, &points, totalpages);
 	/* Found nothing?!?! Either we hang forever, or we panic. */
-	if (!p) {
+	if (!p && oc->order != -1) {
 		dump_header(oc, NULL, NULL);
 		panic("Out of memory and no killable processes...\n");
 	}
-	if (p != (void *)-1UL) {
+	if (p && p != (void *)-1UL) {
 		oom_kill_process(oc, p, points, totalpages, NULL,
 				 "Out of memory");
 		killed = 1;
-- 
cgit v1.2.3


From ad82362b2defd4adad87d8538617b2f51a4bf9c3 Mon Sep 17 00:00:00 2001
From: "Sean O. Stalley" <sean.stalley@intel.com>
Date: Tue, 8 Sep 2015 15:02:27 -0700
Subject: mm: add dma_pool_zalloc() call to DMA API

Add a wrapper function for dma_pool_alloc() to get zeroed memory.

Signed-off-by: Sean O. Stalley <sean.stalley@intel.com>
Cc: Vinod Koul <vinod.koul@intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Gilles Muller <Gilles.Muller@lip6.fr>
Cc: Nicolas Palix <nicolas.palix@imag.fr>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 Documentation/DMA-API.txt | 7 +++++++
 include/linux/dmapool.h   | 6 ++++++
 2 files changed, 13 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/DMA-API.txt b/Documentation/DMA-API.txt
index 7eba542eff7c..edccacd4f048 100644
--- a/Documentation/DMA-API.txt
+++ b/Documentation/DMA-API.txt
@@ -104,6 +104,13 @@ crossing restrictions, pass 0 for alloc; passing 4096 says memory allocated
 from this pool must not cross 4KByte boundaries.
 
 
+	void *dma_pool_zalloc(struct dma_pool *pool, gfp_t mem_flags,
+			      dma_addr_t *handle)
+
+Wraps dma_pool_alloc() and also zeroes the returned memory if the
+allocation attempt succeeded.
+
+
 	void *dma_pool_alloc(struct dma_pool *pool, gfp_t gfp_flags,
 			dma_addr_t *dma_handle);
 
diff --git a/include/linux/dmapool.h b/include/linux/dmapool.h
index e1043f79122f..53ba737505df 100644
--- a/include/linux/dmapool.h
+++ b/include/linux/dmapool.h
@@ -24,6 +24,12 @@ void dma_pool_destroy(struct dma_pool *pool);
 void *dma_pool_alloc(struct dma_pool *pool, gfp_t mem_flags,
 		     dma_addr_t *handle);
 
+static inline void *dma_pool_zalloc(struct dma_pool *pool, gfp_t mem_flags,
+				    dma_addr_t *handle)
+{
+	return dma_pool_alloc(pool, mem_flags | __GFP_ZERO, handle);
+}
+
 void dma_pool_free(struct dma_pool *pool, void *vaddr, dma_addr_t addr);
 
 /*
-- 
cgit v1.2.3


From e6590740ceb83fd014fae7d571fe5a5d5886b7c8 Mon Sep 17 00:00:00 2001
From: Mike Kravetz <mike.kravetz@oracle.com>
Date: Tue, 8 Sep 2015 15:02:58 -0700
Subject: Documentation: update libhugetlbfs location and use for testing

The URL for libhugetlbfs has changed.  Also, put a stronger emphasis on
using libgugetlbfs for hugetlb regression testing.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Joern Engel <joern@logfs.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Rientjes <rientjes@google.com>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 Documentation/vm/hugetlbpage.txt | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/vm/hugetlbpage.txt b/Documentation/vm/hugetlbpage.txt
index 030977fb8d2d..54dd9b9c6c31 100644
--- a/Documentation/vm/hugetlbpage.txt
+++ b/Documentation/vm/hugetlbpage.txt
@@ -329,7 +329,14 @@ Examples
 
 3) hugepage-mmap:  see tools/testing/selftests/vm/hugepage-mmap.c
 
-4) The libhugetlbfs (http://libhugetlbfs.sourceforge.net) library provides a
-   wide range of userspace tools to help with huge page usability, environment
-   setup, and control. Furthermore it provides useful test cases that should be
-   used when modifying code to ensure no regressions are introduced.
+4) The libhugetlbfs (https://github.com/libhugetlbfs/libhugetlbfs) library
+   provides a wide range of userspace tools to help with huge page usability,
+   environment setup, and control.
+
+Kernel development regression testing
+=====================================
+
+The most complete set of hugetlb tests are in the libhugetlbfs repository.
+If you modify any hugetlb related code, use the libhugetlbfs test suite
+to check for regressions.  In addition, if you add any new hugetlb
+functionality, please add appropriate tests to libhugetlbfs.
-- 
cgit v1.2.3


From 013110a73dcf970cb28c5b0a79f9eee577ea6aa2 Mon Sep 17 00:00:00 2001
From: Yaowei Bai <bywxiaobai@163.com>
Date: Tue, 8 Sep 2015 15:04:10 -0700
Subject: mm/page_alloc.c: fix a misleading comment

The comment says that the per-cpu batchsize and zone watermarks are
determined by present_pages which is definitely wrong, they are both
calculated from managed_pages.  Fix it.

Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 Documentation/sysctl/vm.txt | 4 ++--
 mm/page_alloc.c             | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
index 9c3f2f8054b5..a4482fceacec 100644
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -349,7 +349,7 @@ zone[i]'s protection[j] is calculated by following expression.
 
 (i < j):
   zone[i]->protection[j]
-  = (total sums of present_pages from zone[i+1] to zone[j] on the node)
+  = (total sums of managed_pages from zone[i+1] to zone[j] on the node)
     / lowmem_reserve_ratio[i];
 (i = j):
    (should not be protected. = 0;
@@ -360,7 +360,7 @@ The default values of lowmem_reserve_ratio[i] are
     256 (if zone[i] means DMA or DMA32 zone)
     32  (others).
 As above expression, they are reciprocal number of ratio.
-256 means 1/256. # of protection pages becomes about "0.39%" of total present
+256 means 1/256. # of protection pages becomes about "0.39%" of total managed
 pages of higher zones on the node.
 
 If you would like to protect more pages, smaller values are effective.
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index bdaa0cf8fd41..59abb47b70ee 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6022,7 +6022,7 @@ void __init mem_init_print_info(const char *str)
  * set_dma_reserve - set the specified number of pages reserved in the first zone
  * @new_dma_reserve: The number of pages to mark reserved
  *
- * The per-cpu batchsize and zone watermarks are determined by present_pages.
+ * The per-cpu batchsize and zone watermarks are determined by managed_pages.
  * In the DMA zone, a significant percentage may be consumed by kernel image
  * and other unfreeable allocations which can skew the watermarks badly. This
  * function may optionally be used to account for unfreeable pages in the
-- 
cgit v1.2.3


From 860c707dca155a56dfa115ddd6c00959296144a6 Mon Sep 17 00:00:00 2001
From: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Date: Tue, 8 Sep 2015 15:04:38 -0700
Subject: zsmalloc: account the number of compacted pages

Compaction returns back to zram the number of migrated objects, which is
quite uninformative -- we have objects of different sizes so user space
cannot obtain any valuable data from that number.  Change compaction to
operate in terms of pages and return back to compaction issuer the
number of pages that were freed during compaction.  So from now on we
will export more meaningful value in zram<id>/mm_stat -- the number of
freed (compacted) pages.

This requires:
 (a) a rename of `num_migrated' to 'pages_compacted'
 (b) a internal API change -- return first_page's fullness_group from
     putback_zspage(), so we know when putback_zspage() did
     free_zspage().  It helps us to account compaction stats correctly.

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 Documentation/blockdev/zram.txt |  3 ++-
 drivers/block/zram/zram_drv.c   |  2 +-
 include/linux/zsmalloc.h        |  4 ++--
 mm/zsmalloc.c                   | 27 +++++++++++++++++----------
 4 files changed, 22 insertions(+), 14 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt
index c4de576093af..62435bb25266 100644
--- a/Documentation/blockdev/zram.txt
+++ b/Documentation/blockdev/zram.txt
@@ -144,7 +144,8 @@ mem_used_max      RW    the maximum amount memory zram have consumed to
                         store compressed data
 mem_limit         RW    the maximum amount of memory ZRAM can use to store
                         the compressed data
-num_migrated      RO    the number of objects migrated migrated by compaction
+pages_compacted   RO    the number of pages freed during compaction
+                        (available only via zram<id>/mm_stat node)
 compact           WO    trigger memory compaction
 
 WARNING
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index bcde5c321090..f1c4bb34e007 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -450,7 +450,7 @@ static ssize_t mm_stat_show(struct device *dev,
 			zram->limit_pages << PAGE_SHIFT,
 			max_used << PAGE_SHIFT,
 			(u64)atomic64_read(&zram->stats.zero_pages),
-			pool_stats.num_migrated);
+			pool_stats.pages_compacted);
 	up_read(&zram->init_lock);
 
 	return ret;
diff --git a/include/linux/zsmalloc.h b/include/linux/zsmalloc.h
index ad3d23239043..6398dfae53f1 100644
--- a/include/linux/zsmalloc.h
+++ b/include/linux/zsmalloc.h
@@ -35,8 +35,8 @@ enum zs_mapmode {
 };
 
 struct zs_pool_stats {
-	/* How many objects were migrated */
-	unsigned long num_migrated;
+	/* How many pages were migrated (freed) */
+	unsigned long pages_compacted;
 };
 
 struct zs_pool;
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 8f76d8875aca..b7b4a5612ec7 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -1579,8 +1579,6 @@ struct zs_compact_control {
 	 /* Starting object index within @s_page which used for live object
 	  * in the subpage. */
 	int index;
-	/* How many of objects were migrated */
-	int nr_migrated;
 };
 
 static int migrate_zspage(struct zs_pool *pool, struct size_class *class,
@@ -1617,7 +1615,6 @@ static int migrate_zspage(struct zs_pool *pool, struct size_class *class,
 		record_obj(handle, free_obj);
 		unpin_tag(handle);
 		obj_free(pool, class, used_obj);
-		cc->nr_migrated++;
 	}
 
 	/* Remember last position in this iteration */
@@ -1643,8 +1640,17 @@ static struct page *isolate_target_page(struct size_class *class)
 	return page;
 }
 
-static void putback_zspage(struct zs_pool *pool, struct size_class *class,
-				struct page *first_page)
+/*
+ * putback_zspage - add @first_page into right class's fullness list
+ * @pool: target pool
+ * @class: destination class
+ * @first_page: target page
+ *
+ * Return @fist_page's fullness_group
+ */
+static enum fullness_group putback_zspage(struct zs_pool *pool,
+			struct size_class *class,
+			struct page *first_page)
 {
 	enum fullness_group fullness;
 
@@ -1662,6 +1668,8 @@ static void putback_zspage(struct zs_pool *pool, struct size_class *class,
 
 		free_zspage(first_page);
 	}
+
+	return fullness;
 }
 
 static struct page *isolate_source_page(struct size_class *class)
@@ -1704,7 +1712,6 @@ static void __zs_compact(struct zs_pool *pool, struct size_class *class)
 	struct page *src_page;
 	struct page *dst_page = NULL;
 
-	cc.nr_migrated = 0;
 	spin_lock(&class->lock);
 	while ((src_page = isolate_source_page(class))) {
 
@@ -1733,7 +1740,9 @@ static void __zs_compact(struct zs_pool *pool, struct size_class *class)
 			break;
 
 		putback_zspage(pool, class, dst_page);
-		putback_zspage(pool, class, src_page);
+		if (putback_zspage(pool, class, src_page) == ZS_EMPTY)
+			pool->stats.pages_compacted +=
+				get_pages_per_zspage(class->size);
 		spin_unlock(&class->lock);
 		cond_resched();
 		spin_lock(&class->lock);
@@ -1742,8 +1751,6 @@ static void __zs_compact(struct zs_pool *pool, struct size_class *class)
 	if (src_page)
 		putback_zspage(pool, class, src_page);
 
-	pool->stats.num_migrated += cc.nr_migrated;
-
 	spin_unlock(&class->lock);
 }
 
@@ -1761,7 +1768,7 @@ unsigned long zs_compact(struct zs_pool *pool)
 		__zs_compact(pool, class);
 	}
 
-	return pool->stats.num_migrated;
+	return pool->stats.pages_compacted;
 }
 EXPORT_SYMBOL_GPL(zs_compact);
 
-- 
cgit v1.2.3


From b0dabcc6f8b7cb08d5c81ecbc642105d67a4c1d2 Mon Sep 17 00:00:00 2001
From: Ariel D'Alessandro <ariel@vanguardiasur.com.ar>
Date: Wed, 5 Aug 2015 23:31:47 -0300
Subject: pwm: Add NXP LPC18xx PWM/SCT DT binding documentation

Add the devicetree binding document for NXP LPC18xx PWM/SCT.

Signed-off-by: Ariel D'Alessandro <ariel@vanguardiasur.com.ar>
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
---
 .../devicetree/bindings/pwm/lpc1850-sct-pwm.txt      | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/pwm/lpc1850-sct-pwm.txt

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/pwm/lpc1850-sct-pwm.txt b/Documentation/devicetree/bindings/pwm/lpc1850-sct-pwm.txt
new file mode 100644
index 000000000000..36e49d4325cd
--- /dev/null
+++ b/Documentation/devicetree/bindings/pwm/lpc1850-sct-pwm.txt
@@ -0,0 +1,20 @@
+* NXP LPC18xx State Configurable Timer - Pulse Width Modulator driver
+
+Required properties:
+  - compatible: Should be "nxp,lpc1850-sct-pwm"
+  - reg: Should contain physical base address and length of pwm registers.
+  - clocks: Must contain an entry for each entry in clock-names.
+    See ../clock/clock-bindings.txt for details.
+  - clock-names: Must include the following entries.
+    - pwm: PWM operating clock.
+  - #pwm-cells: Should be 3. See pwm.txt in this directory for the description
+    of the cells format.
+
+Example:
+  pwm: pwm@40000000 {
+    compatible = "nxp,lpc1850-sct-pwm";
+    reg = <0x40000000 0x1000>;
+    clocks =<&ccu1 CLK_CPU_SCT>;
+    clock-names = "pwm";
+    #pwm-cells = <3>;
+  };
-- 
cgit v1.2.3


From f15d7114bbdd594f1d8b644a44f647d2b6e9772e Mon Sep 17 00:00:00 2001
From: Timur Tabi <timur@codeaurora.org>
Date: Mon, 29 Jun 2015 11:46:17 -0500
Subject: Documentation/watchdog: add timeout and ping rate control to
 watchdog-test.c

The watchdog test program is much more useful if it can configure the
timeout value and ping rate.  This will allow you to test actual timeouts.

Adds the -t parameter to set the timeout value (in seconds), and -p to set
the ping rate (number of seconds between pings).

Signed-off-by: Timur Tabi <timur@codeaurora.org>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
---
 Documentation/watchdog/src/watchdog-test.c | 22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/watchdog/src/watchdog-test.c b/Documentation/watchdog/src/watchdog-test.c
index 3da822967ee0..fcdde8fc98be 100644
--- a/Documentation/watchdog/src/watchdog-test.c
+++ b/Documentation/watchdog/src/watchdog-test.c
@@ -41,6 +41,7 @@ static void term(int sig)
 int main(int argc, char *argv[])
 {
     int flags;
+    unsigned int ping_rate = 1;
 
     fd = open("/dev/watchdog", O_WRONLY);
 
@@ -63,22 +64,33 @@ int main(int argc, char *argv[])
 	    fprintf(stderr, "Watchdog card enabled.\n");
 	    fflush(stderr);
 	    goto end;
+	} else if (!strncasecmp(argv[1], "-t", 2) && argv[2]) {
+	    flags = atoi(argv[2]);
+	    ioctl(fd, WDIOC_SETTIMEOUT, &flags);
+	    fprintf(stderr, "Watchdog timeout set to %u seconds.\n", flags);
+	    fflush(stderr);
+	    goto end;
+	} else if (!strncasecmp(argv[1], "-p", 2) && argv[2]) {
+	    ping_rate = strtoul(argv[2], NULL, 0);
+	    fprintf(stderr, "Watchdog ping rate set to %u seconds.\n", ping_rate);
+	    fflush(stderr);
 	} else {
-	    fprintf(stderr, "-d to disable, -e to enable.\n");
+	    fprintf(stderr, "-d to disable, -e to enable, -t <n> to set " \
+		"the timeout,\n-p <n> to set the ping rate, and \n");
 	    fprintf(stderr, "run by itself to tick the card.\n");
 	    fflush(stderr);
 	    goto end;
 	}
-    } else {
-	fprintf(stderr, "Watchdog Ticking Away!\n");
-	fflush(stderr);
     }
 
+    fprintf(stderr, "Watchdog Ticking Away!\n");
+    fflush(stderr);
+
     signal(SIGINT, term);
 
     while(1) {
 	keep_alive();
-	sleep(1);
+	sleep(ping_rate);
     }
 end:
     close(fd);
-- 
cgit v1.2.3


From ab54d7f017772e89964d4040937a83cd4468562a Mon Sep 17 00:00:00 2001
From: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Date: Fri, 31 Jul 2015 11:39:39 +0200
Subject: Documentation: watchdog: at91sam9_wdt: add clocks property

The watchdog has an input clock, the slow clock. It is required as it will
not function without it.

Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
---
 Documentation/devicetree/bindings/watchdog/atmel-wdt.txt | 2 ++
 1 file changed, 2 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/watchdog/atmel-wdt.txt b/Documentation/devicetree/bindings/watchdog/atmel-wdt.txt
index a4d869744f59..86fa6de1019b 100644
--- a/Documentation/devicetree/bindings/watchdog/atmel-wdt.txt
+++ b/Documentation/devicetree/bindings/watchdog/atmel-wdt.txt
@@ -6,6 +6,7 @@ Required properties:
 - compatible: must be "atmel,at91sam9260-wdt".
 - reg: physical base address of the controller and length of memory mapped
   region.
+- clocks: phandle to input clock.
 
 Optional properties:
 - timeout-sec: contains the watchdog timeout in seconds.
@@ -39,6 +40,7 @@ Example:
 		compatible = "atmel,at91sam9260-wdt";
 		reg = <0xfffffd40 0x10>;
 		interrupts = <1 IRQ_TYPE_LEVEL_HIGH 7>;
+		clocks = <&clk32k>;
 		timeout-sec = <15>;
 		atmel,watchdog-type = "hardware";
 		atmel,reset-type = "all";
-- 
cgit v1.2.3


From cfde37e1ec18dc68a52a1b882390c0f9c52b5f10 Mon Sep 17 00:00:00 2001
From: Ariel D'Alessandro <ariel@vanguardiasur.com.ar>
Date: Sat, 1 Aug 2015 15:37:17 -0300
Subject: DT: watchdog: Add NXP LPC18xx Watchdog Timer binding documentation

Add the devicetree binding document for NXP LPC18xx Watchdog Timer.

Signed-off-by: Ariel D'Alessandro <ariel@vanguardiasur.com.ar>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
---
 .../devicetree/bindings/watchdog/lpc18xx-wdt.txt      | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/watchdog/lpc18xx-wdt.txt

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/watchdog/lpc18xx-wdt.txt b/Documentation/devicetree/bindings/watchdog/lpc18xx-wdt.txt
new file mode 100644
index 000000000000..09f6b24969e0
--- /dev/null
+++ b/Documentation/devicetree/bindings/watchdog/lpc18xx-wdt.txt
@@ -0,0 +1,19 @@
+* NXP LPC18xx Watchdog Timer (WDT)
+
+Required properties:
+- compatible: Should be "nxp,lpc1850-wwdt"
+- reg: Should contain WDT registers location and length
+- clocks: Must contain an entry for each entry in clock-names.
+- clock-names: Should contain "wdtclk" and "reg"; the watchdog counter
+               clock and register interface clock respectively.
+- interrupts: Should contain WDT interrupt
+
+Examples:
+
+watchdog@40080000 {
+	compatible = "nxp,lpc1850-wwdt";
+	reg = <0x40080000 0x24>;
+	clocks = <&cgu BASE_SAFE_CLK>, <&ccu1 CLK_CPU_WWDT>;
+	clock-names = "wdtclk", "reg";
+	interrupts = <49>;
+};
-- 
cgit v1.2.3


From f4fff94e3e3a712ef062c44b64ecf8f552f48ea4 Mon Sep 17 00:00:00 2001
From: Wenyou Yang <wenyou.yang@atmel.com>
Date: Thu, 6 Aug 2015 18:17:05 +0800
Subject: Documentation: dt: binding: atmel-sama5d4-wdt: for SAMA5D4 watchdog
 driver

The compatible "atmel,sama5d4-wdt" supports the SAMA5D4 watchdog driver
and the watchdog's WDT_MR register can be written more than once.

Signed-off-by: Wenyou Yang <wenyou.yang@atmel.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
---
 .../bindings/watchdog/atmel-sama5d4-wdt.txt        | 35 ++++++++++++++++++++++
 1 file changed, 35 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/watchdog/atmel-sama5d4-wdt.txt

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/watchdog/atmel-sama5d4-wdt.txt b/Documentation/devicetree/bindings/watchdog/atmel-sama5d4-wdt.txt
new file mode 100644
index 000000000000..f7cc7c060910
--- /dev/null
+++ b/Documentation/devicetree/bindings/watchdog/atmel-sama5d4-wdt.txt
@@ -0,0 +1,35 @@
+* Atmel SAMA5D4 Watchdog Timer (WDT) Controller
+
+Required properties:
+- compatible: "atmel,sama5d4-wdt"
+- reg: base physical address and length of memory mapped region.
+
+Optional properties:
+- timeout-sec: watchdog timeout value (in seconds).
+- interrupts: interrupt number to the CPU.
+- atmel,watchdog-type: should be "hardware" or "software".
+	"hardware": enable watchdog fault reset. A watchdog fault triggers
+		    watchdog reset.
+	"software": enable watchdog fault interrupt. A watchdog fault asserts
+		    watchdog interrupt.
+- atmel,idle-halt: present if you want to stop the watchdog when the CPU is
+		   in idle state.
+	CAUTION: This property should be used with care, it actually makes the
+	watchdog not counting when the CPU is in idle state, therefore the
+	watchdog reset time depends on mean CPU usage and will not reset at all
+	if the CPU stop working while it is in idle state, which is probably
+	not what you want.
+- atmel,dbg-halt: present if you want to stop the watchdog when the CPU is
+		  in debug state.
+
+Example:
+	watchdog@fc068640 {
+		compatible = "atmel,sama5d4-wdt";
+		reg = <0xfc068640 0x10>;
+		interrupts = <4 IRQ_TYPE_LEVEL_HIGH 5>;
+		timeout-sec = <10>;
+		atmel,watchdog-type = "hardware";
+		atmel,dbg-halt;
+		atmel,idle-halt;
+		status = "okay";
+	};
-- 
cgit v1.2.3


From 93dbed9121cc8e0fcc93edd9fca901322bdfbd1a Mon Sep 17 00:00:00 2001
From: Andy Gross <agross@codeaurora.org>
Date: Wed, 26 Aug 2015 14:42:45 -0500
Subject: soc: qcom: smd: Use correct remote processor ID

This patch fixes SMEM addressing issues when remote processors need to use
secure SMEM partitions.

Signed-off-by: Andy Gross <agross@codeaurora.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@sonymobile.com>
---
 Documentation/devicetree/bindings/soc/qcom/qcom,smd.txt |  6 ++++++
 drivers/soc/qcom/smd.c                                  | 16 ++++++++++++----
 2 files changed, 18 insertions(+), 4 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/soc/qcom/qcom,smd.txt b/Documentation/devicetree/bindings/soc/qcom/qcom,smd.txt
index f65c76db9859..97d9b3e1bf39 100644
--- a/Documentation/devicetree/bindings/soc/qcom/qcom,smd.txt
+++ b/Documentation/devicetree/bindings/soc/qcom/qcom,smd.txt
@@ -37,6 +37,12 @@ The edge is described by the following properties:
 	Definition: the identifier of the remote processor in the smd channel
 		    allocation table
 
+- qcom,remote-pid:
+	Usage: optional
+	Value type: <u32>
+	Definition: the identifier for the remote processor as known by the rest
+		    of the system.
+
 = SMD DEVICES
 
 In turn, subnodes of the "edges" represent devices tied to SMD channels on that
diff --git a/drivers/soc/qcom/smd.c b/drivers/soc/qcom/smd.c
index 327adcf117c1..edd9d9a37238 100644
--- a/drivers/soc/qcom/smd.c
+++ b/drivers/soc/qcom/smd.c
@@ -96,6 +96,7 @@ static const struct {
  * @smd:		handle to qcom_smd
  * @of_node:		of_node handle for information related to this edge
  * @edge_id:		identifier of this edge
+ * @remote_pid:		identifier of remote processor
  * @irq:		interrupt for signals on this edge
  * @ipc_regmap:		regmap handle holding the outgoing ipc register
  * @ipc_offset:		offset within @ipc_regmap of the register for ipc
@@ -111,6 +112,7 @@ struct qcom_smd_edge {
 	struct qcom_smd *smd;
 	struct device_node *of_node;
 	unsigned edge_id;
+	unsigned remote_pid;
 
 	int irq;
 
@@ -572,7 +574,7 @@ static irqreturn_t qcom_smd_edge_intr(int irq, void *data)
 	 * have to scan if the amount of available space in smem have changed
 	 * since last scan.
 	 */
-	available = qcom_smem_get_free_space(edge->edge_id);
+	available = qcom_smem_get_free_space(edge->remote_pid);
 	if (available != edge->smem_available) {
 		edge->smem_available = available;
 		edge->need_rescan = true;
@@ -976,7 +978,8 @@ static struct qcom_smd_channel *qcom_smd_create_channel(struct qcom_smd_edge *ed
 	spin_lock_init(&channel->recv_lock);
 	init_waitqueue_head(&channel->fblockread_event);
 
-	ret = qcom_smem_get(edge->edge_id, smem_info_item, (void **)&info, &info_size);
+	ret = qcom_smem_get(edge->remote_pid, smem_info_item, (void **)&info,
+			    &info_size);
 	if (ret)
 		goto free_name_and_channel;
 
@@ -997,7 +1000,8 @@ static struct qcom_smd_channel *qcom_smd_create_channel(struct qcom_smd_edge *ed
 		goto free_name_and_channel;
 	}
 
-	ret = qcom_smem_get(edge->edge_id, smem_fifo_item, &fifo_base, &fifo_size);
+	ret = qcom_smem_get(edge->remote_pid, smem_fifo_item, &fifo_base,
+			    &fifo_size);
 	if (ret)
 		goto free_name_and_channel;
 
@@ -1041,7 +1045,7 @@ static void qcom_discover_channels(struct qcom_smd_edge *edge)
 	int i;
 
 	for (tbl = 0; tbl < SMD_ALLOC_TBL_COUNT; tbl++) {
-		ret = qcom_smem_get(edge->edge_id,
+		ret = qcom_smem_get(edge->remote_pid,
 				    smem_items[tbl].alloc_tbl_id,
 				    (void **)&alloc_tbl,
 				    NULL);
@@ -1184,6 +1188,10 @@ static int qcom_smd_parse_edge(struct device *dev,
 		return -EINVAL;
 	}
 
+	edge->remote_pid = QCOM_SMEM_HOST_ANY;
+	key = "qcom,remote-pid";
+	of_property_read_u32(node, key, &edge->remote_pid);
+
 	syscon_np = of_parse_phandle(node, "qcom,ipc", 0);
 	if (!syscon_np) {
 		dev_err(dev, "no qcom,ipc node\n");
-- 
cgit v1.2.3


From 9c4c5ef3760470cbf8bf408a173d1b2fdba065b1 Mon Sep 17 00:00:00 2001
From: Dan Streetman <ddstreet@ieee.org>
Date: Wed, 9 Sep 2015 15:35:25 -0700
Subject: zswap: update docs for runtime-changeable attributes

Change the Documentation/vm/zswap.txt doc to indicate that the "zpool" and
"compressor" params are now changeable at runtime.

Signed-off-by: Dan Streetman <ddstreet@ieee.org>
Cc: Seth Jennings <sjennings@variantweb.net>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 Documentation/vm/zswap.txt | 36 ++++++++++++++++++++++++++++--------
 1 file changed, 28 insertions(+), 8 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/vm/zswap.txt b/Documentation/vm/zswap.txt
index 8458c0861e4e..89fff7d611cc 100644
--- a/Documentation/vm/zswap.txt
+++ b/Documentation/vm/zswap.txt
@@ -32,7 +32,7 @@ can also be enabled and disabled at runtime using the sysfs interface.
 An example command to enable zswap at runtime, assuming sysfs is mounted
 at /sys, is:
 
-echo 1 > /sys/modules/zswap/parameters/enabled
+echo 1 > /sys/module/zswap/parameters/enabled
 
 When zswap is disabled at runtime it will stop storing pages that are
 being swapped out.  However, it will _not_ immediately write out or fault
@@ -49,14 +49,26 @@ Zswap receives pages for compression through the Frontswap API and is able to
 evict pages from its own compressed pool on an LRU basis and write them back to
 the backing swap device in the case that the compressed pool is full.
 
-Zswap makes use of zbud for the managing the compressed memory pool.  Each
-allocation in zbud is not directly accessible by address.  Rather, a handle is
+Zswap makes use of zpool for the managing the compressed memory pool.  Each
+allocation in zpool is not directly accessible by address.  Rather, a handle is
 returned by the allocation routine and that handle must be mapped before being
 accessed.  The compressed memory pool grows on demand and shrinks as compressed
-pages are freed.  The pool is not preallocated.
+pages are freed.  The pool is not preallocated.  By default, a zpool of type
+zbud is created, but it can be selected at boot time by setting the "zpool"
+attribute, e.g. zswap.zpool=zbud.  It can also be changed at runtime using the
+sysfs "zpool" attribute, e.g.
+
+echo zbud > /sys/module/zswap/parameters/zpool
+
+The zbud type zpool allocates exactly 1 page to store 2 compressed pages, which
+means the compression ratio will always be 2:1 or worse (because of half-full
+zbud pages).  The zsmalloc type zpool has a more complex compressed page
+storage method, and it can achieve greater storage densities.  However,
+zsmalloc does not implement compressed page eviction, so once zswap fills it
+cannot evict the oldest page, it can only reject new pages.
 
 When a swap page is passed from frontswap to zswap, zswap maintains a mapping
-of the swap entry, a combination of the swap type and swap offset, to the zbud
+of the swap entry, a combination of the swap type and swap offset, to the zpool
 handle that references that compressed swap page.  This mapping is achieved
 with a red-black tree per swap type.  The swap offset is the search key for the
 tree nodes.
@@ -74,9 +86,17 @@ controlled policy:
 * max_pool_percent - The maximum percentage of memory that the compressed
     pool can occupy.
 
-Zswap allows the compressor to be selected at kernel boot time by setting the
-“compressor” attribute.  The default compressor is lzo.  e.g.
-zswap.compressor=deflate
+The default compressor is lzo, but it can be selected at boot time by setting
+the “compressor” attribute, e.g. zswap.compressor=lzo.  It can also be changed
+at runtime using the sysfs "compressor" attribute, e.g.
+
+echo lzo > /sys/module/zswap/parameters/compressor
+
+When the zpool and/or compressor parameter is changed at runtime, any existing
+compressed pages are not modified; they are left in their own zpool.  When a
+request is made for a page in an old zpool, it is uncompressed using its
+original compressor.  Once all pages are removed from an old zpool, the zpool
+and its compressor are freed.
 
 A debugfs interface is provided for various statistic about pool size, number
 of pages stored, and various counters for the reasons pages are rejected.
-- 
cgit v1.2.3


From 80ae2fdceba8313b0433f899bdd9c6c463291a17 Mon Sep 17 00:00:00 2001
From: Vladimir Davydov <vdavydov@parallels.com>
Date: Wed, 9 Sep 2015 15:35:38 -0700
Subject: proc: add kpagecgroup file

/proc/kpagecgroup contains a 64-bit inode number of the memory cgroup each
page is charged to, indexed by PFN.  Having this information is useful for
estimating a cgroup working set size.

The file is present if CONFIG_PROC_PAGE_MONITOR && CONFIG_MEMCG.

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Reviewed-by: Andres Lagar-Cavilla <andreslc@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Greg Thelen <gthelen@google.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 Documentation/vm/pagemap.txt |  6 ++++-
 fs/proc/page.c               | 53 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 58 insertions(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/vm/pagemap.txt b/Documentation/vm/pagemap.txt
index 3cd38438242a..ce294b0aace4 100644
--- a/Documentation/vm/pagemap.txt
+++ b/Documentation/vm/pagemap.txt
@@ -5,7 +5,7 @@ pagemap is a new (as of 2.6.25) set of interfaces in the kernel that allow
 userspace programs to examine the page tables and related information by
 reading files in /proc.
 
-There are three components to pagemap:
+There are four components to pagemap:
 
  * /proc/pid/pagemap.  This file lets a userspace process find out which
    physical frame each virtual page is mapped to.  It contains one 64-bit
@@ -71,6 +71,10 @@ There are three components to pagemap:
     23. BALLOON
     24. ZERO_PAGE
 
+ * /proc/kpagecgroup.  This file contains a 64-bit inode number of the
+   memory cgroup each page is charged to, indexed by PFN. Only available when
+   CONFIG_MEMCG is set.
+
 Short descriptions to the page flags:
 
  0. LOCKED
diff --git a/fs/proc/page.c b/fs/proc/page.c
index 7eee2d8b97d9..70d23245dd43 100644
--- a/fs/proc/page.c
+++ b/fs/proc/page.c
@@ -9,6 +9,7 @@
 #include <linux/proc_fs.h>
 #include <linux/seq_file.h>
 #include <linux/hugetlb.h>
+#include <linux/memcontrol.h>
 #include <linux/kernel-page-flags.h>
 #include <asm/uaccess.h>
 #include "internal.h"
@@ -225,10 +226,62 @@ static const struct file_operations proc_kpageflags_operations = {
 	.read = kpageflags_read,
 };
 
+#ifdef CONFIG_MEMCG
+static ssize_t kpagecgroup_read(struct file *file, char __user *buf,
+				size_t count, loff_t *ppos)
+{
+	u64 __user *out = (u64 __user *)buf;
+	struct page *ppage;
+	unsigned long src = *ppos;
+	unsigned long pfn;
+	ssize_t ret = 0;
+	u64 ino;
+
+	pfn = src / KPMSIZE;
+	count = min_t(unsigned long, count, (max_pfn * KPMSIZE) - src);
+	if (src & KPMMASK || count & KPMMASK)
+		return -EINVAL;
+
+	while (count > 0) {
+		if (pfn_valid(pfn))
+			ppage = pfn_to_page(pfn);
+		else
+			ppage = NULL;
+
+		if (ppage)
+			ino = page_cgroup_ino(ppage);
+		else
+			ino = 0;
+
+		if (put_user(ino, out)) {
+			ret = -EFAULT;
+			break;
+		}
+
+		pfn++;
+		out++;
+		count -= KPMSIZE;
+	}
+
+	*ppos += (char __user *)out - buf;
+	if (!ret)
+		ret = (char __user *)out - buf;
+	return ret;
+}
+
+static const struct file_operations proc_kpagecgroup_operations = {
+	.llseek = mem_lseek,
+	.read = kpagecgroup_read,
+};
+#endif /* CONFIG_MEMCG */
+
 static int __init proc_page_init(void)
 {
 	proc_create("kpagecount", S_IRUSR, NULL, &proc_kpagecount_operations);
 	proc_create("kpageflags", S_IRUSR, NULL, &proc_kpageflags_operations);
+#ifdef CONFIG_MEMCG
+	proc_create("kpagecgroup", S_IRUSR, NULL, &proc_kpagecgroup_operations);
+#endif
 	return 0;
 }
 fs_initcall(proc_page_init);
-- 
cgit v1.2.3


From 33c3fc71c8cfa3cc3a98beaa901c069c177dc295 Mon Sep 17 00:00:00 2001
From: Vladimir Davydov <vdavydov@parallels.com>
Date: Wed, 9 Sep 2015 15:35:45 -0700
Subject: mm: introduce idle page tracking

Knowing the portion of memory that is not used by a certain application or
memory cgroup (idle memory) can be useful for partitioning the system
efficiently, e.g.  by setting memory cgroup limits appropriately.
Currently, the only means to estimate the amount of idle memory provided
by the kernel is /proc/PID/{clear_refs,smaps}: the user can clear the
access bit for all pages mapped to a particular process by writing 1 to
clear_refs, wait for some time, and then count smaps:Referenced.  However,
this method has two serious shortcomings:

 - it does not count unmapped file pages
 - it affects the reclaimer logic

To overcome these drawbacks, this patch introduces two new page flags,
Idle and Young, and a new sysfs file, /sys/kernel/mm/page_idle/bitmap.
A page's Idle flag can only be set from userspace by setting bit in
/sys/kernel/mm/page_idle/bitmap at the offset corresponding to the page,
and it is cleared whenever the page is accessed either through page tables
(it is cleared in page_referenced() in this case) or using the read(2)
system call (mark_page_accessed()). Thus by setting the Idle flag for
pages of a particular workload, which can be found e.g.  by reading
/proc/PID/pagemap, waiting for some time to let the workload access its
working set, and then reading the bitmap file, one can estimate the amount
of pages that are not used by the workload.

The Young page flag is used to avoid interference with the memory
reclaimer.  A page's Young flag is set whenever the Access bit of a page
table entry pointing to the page is cleared by writing to the bitmap file.
If page_referenced() is called on a Young page, it will add 1 to its
return value, therefore concealing the fact that the Access bit was
cleared.

Note, since there is no room for extra page flags on 32 bit, this feature
uses extended page flags when compiled on 32 bit.

[akpm@linux-foundation.org: fix build]
[akpm@linux-foundation.org: kpageidle requires an MMU]
[akpm@linux-foundation.org: decouple from page-flags rework]
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Reviewed-by: Andres Lagar-Cavilla <andreslc@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Greg Thelen <gthelen@google.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 Documentation/vm/00-INDEX               |   2 +
 Documentation/vm/idle_page_tracking.txt |  98 ++++++++++++++
 fs/proc/page.c                          |   3 +
 fs/proc/task_mmu.c                      |   5 +-
 include/linux/mmu_notifier.h            |   2 +
 include/linux/page-flags.h              |  11 ++
 include/linux/page_ext.h                |   4 +
 include/linux/page_idle.h               | 110 +++++++++++++++
 mm/Kconfig                              |  12 ++
 mm/Makefile                             |   1 +
 mm/debug.c                              |   4 +
 mm/huge_memory.c                        |  12 +-
 mm/migrate.c                            |   6 +
 mm/page_ext.c                           |   4 +
 mm/page_idle.c                          | 232 ++++++++++++++++++++++++++++++++
 mm/rmap.c                               |   6 +
 mm/swap.c                               |   3 +
 17 files changed, 512 insertions(+), 3 deletions(-)
 create mode 100644 Documentation/vm/idle_page_tracking.txt
 create mode 100644 include/linux/page_idle.h
 create mode 100644 mm/page_idle.c

(limited to 'Documentation')

diff --git a/Documentation/vm/00-INDEX b/Documentation/vm/00-INDEX
index 081c49777abb..6a5e2a102a45 100644
--- a/Documentation/vm/00-INDEX
+++ b/Documentation/vm/00-INDEX
@@ -14,6 +14,8 @@ hugetlbpage.txt
 	- a brief summary of hugetlbpage support in the Linux kernel.
 hwpoison.txt
 	- explains what hwpoison is
+idle_page_tracking.txt
+	- description of the idle page tracking feature.
 ksm.txt
 	- how to use the Kernel Samepage Merging feature.
 numa
diff --git a/Documentation/vm/idle_page_tracking.txt b/Documentation/vm/idle_page_tracking.txt
new file mode 100644
index 000000000000..85dcc3bb85dc
--- /dev/null
+++ b/Documentation/vm/idle_page_tracking.txt
@@ -0,0 +1,98 @@
+MOTIVATION
+
+The idle page tracking feature allows to track which memory pages are being
+accessed by a workload and which are idle. This information can be useful for
+estimating the workload's working set size, which, in turn, can be taken into
+account when configuring the workload parameters, setting memory cgroup limits,
+or deciding where to place the workload within a compute cluster.
+
+It is enabled by CONFIG_IDLE_PAGE_TRACKING=y.
+
+USER API
+
+The idle page tracking API is located at /sys/kernel/mm/page_idle. Currently,
+it consists of the only read-write file, /sys/kernel/mm/page_idle/bitmap.
+
+The file implements a bitmap where each bit corresponds to a memory page. The
+bitmap is represented by an array of 8-byte integers, and the page at PFN #i is
+mapped to bit #i%64 of array element #i/64, byte order is native. When a bit is
+set, the corresponding page is idle.
+
+A page is considered idle if it has not been accessed since it was marked idle
+(for more details on what "accessed" actually means see the IMPLEMENTATION
+DETAILS section). To mark a page idle one has to set the bit corresponding to
+the page by writing to the file. A value written to the file is OR-ed with the
+current bitmap value.
+
+Only accesses to user memory pages are tracked. These are pages mapped to a
+process address space, page cache and buffer pages, swap cache pages. For other
+page types (e.g. SLAB pages) an attempt to mark a page idle is silently ignored,
+and hence such pages are never reported idle.
+
+For huge pages the idle flag is set only on the head page, so one has to read
+/proc/kpageflags in order to correctly count idle huge pages.
+
+Reading from or writing to /sys/kernel/mm/page_idle/bitmap will return
+-EINVAL if you are not starting the read/write on an 8-byte boundary, or
+if the size of the read/write is not a multiple of 8 bytes. Writing to
+this file beyond max PFN will return -ENXIO.
+
+That said, in order to estimate the amount of pages that are not used by a
+workload one should:
+
+ 1. Mark all the workload's pages as idle by setting corresponding bits in
+    /sys/kernel/mm/page_idle/bitmap. The pages can be found by reading
+    /proc/pid/pagemap if the workload is represented by a process, or by
+    filtering out alien pages using /proc/kpagecgroup in case the workload is
+    placed in a memory cgroup.
+
+ 2. Wait until the workload accesses its working set.
+
+ 3. Read /sys/kernel/mm/page_idle/bitmap and count the number of bits set. If
+    one wants to ignore certain types of pages, e.g. mlocked pages since they
+    are not reclaimable, he or she can filter them out using /proc/kpageflags.
+
+See Documentation/vm/pagemap.txt for more information about /proc/pid/pagemap,
+/proc/kpageflags, and /proc/kpagecgroup.
+
+IMPLEMENTATION DETAILS
+
+The kernel internally keeps track of accesses to user memory pages in order to
+reclaim unreferenced pages first on memory shortage conditions. A page is
+considered referenced if it has been recently accessed via a process address
+space, in which case one or more PTEs it is mapped to will have the Accessed bit
+set, or marked accessed explicitly by the kernel (see mark_page_accessed()). The
+latter happens when:
+
+ - a userspace process reads or writes a page using a system call (e.g. read(2)
+   or write(2))
+
+ - a page that is used for storing filesystem buffers is read or written,
+   because a process needs filesystem metadata stored in it (e.g. lists a
+   directory tree)
+
+ - a page is accessed by a device driver using get_user_pages()
+
+When a dirty page is written to swap or disk as a result of memory reclaim or
+exceeding the dirty memory limit, it is not marked referenced.
+
+The idle memory tracking feature adds a new page flag, the Idle flag. This flag
+is set manually, by writing to /sys/kernel/mm/page_idle/bitmap (see the USER API
+section), and cleared automatically whenever a page is referenced as defined
+above.
+
+When a page is marked idle, the Accessed bit must be cleared in all PTEs it is
+mapped to, otherwise we will not be able to detect accesses to the page coming
+from a process address space. To avoid interference with the reclaimer, which,
+as noted above, uses the Accessed bit to promote actively referenced pages, one
+more page flag is introduced, the Young flag. When the PTE Accessed bit is
+cleared as a result of setting or updating a page's Idle flag, the Young flag
+is set on the page. The reclaimer treats the Young flag as an extra PTE
+Accessed bit and therefore will consider such a page as referenced.
+
+Since the idle memory tracking feature is based on the memory reclaimer logic,
+it only works with pages that are on an LRU list, other pages are silently
+ignored. That means it will ignore a user memory page if it is isolated, but
+since there are usually not many of them, it should not affect the overall
+result noticeably. In order not to stall scanning of the idle page bitmap,
+locked pages may be skipped too.
diff --git a/fs/proc/page.c b/fs/proc/page.c
index 70d23245dd43..c2d29edcaa6b 100644
--- a/fs/proc/page.c
+++ b/fs/proc/page.c
@@ -10,12 +10,15 @@
 #include <linux/seq_file.h>
 #include <linux/hugetlb.h>
 #include <linux/memcontrol.h>
+#include <linux/mmu_notifier.h>
+#include <linux/page_idle.h>
 #include <linux/kernel-page-flags.h>
 #include <asm/uaccess.h>
 #include "internal.h"
 
 #define KPMSIZE sizeof(u64)
 #define KPMMASK (KPMSIZE - 1)
+#define KPMBITS (KPMSIZE * BITS_PER_BYTE)
 
 /* /proc/kpagecount - an array exposing page counts
  *
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 41f1a50c10c9..e2d46adb54b4 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -13,6 +13,7 @@
 #include <linux/swap.h>
 #include <linux/swapops.h>
 #include <linux/mmu_notifier.h>
+#include <linux/page_idle.h>
 
 #include <asm/elf.h>
 #include <asm/uaccess.h>
@@ -459,7 +460,7 @@ static void smaps_account(struct mem_size_stats *mss, struct page *page,
 
 	mss->resident += size;
 	/* Accumulate the size in pages that have been accessed. */
-	if (young || PageReferenced(page))
+	if (young || page_is_young(page) || PageReferenced(page))
 		mss->referenced += size;
 	mapcount = page_mapcount(page);
 	if (mapcount >= 2) {
@@ -807,6 +808,7 @@ static int clear_refs_pte_range(pmd_t *pmd, unsigned long addr,
 
 		/* Clear accessed and referenced bits. */
 		pmdp_test_and_clear_young(vma, addr, pmd);
+		test_and_clear_page_young(page);
 		ClearPageReferenced(page);
 out:
 		spin_unlock(ptl);
@@ -834,6 +836,7 @@ out:
 
 		/* Clear accessed and referenced bits. */
 		ptep_test_and_clear_young(vma, addr, pte);
+		test_and_clear_page_young(page);
 		ClearPageReferenced(page);
 	}
 	pte_unmap_unlock(pte - 1, ptl);
diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
index a5b17137c683..a1a210d59961 100644
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -471,6 +471,8 @@ static inline void mmu_notifier_mm_destroy(struct mm_struct *mm)
 
 #define ptep_clear_flush_young_notify ptep_clear_flush_young
 #define pmdp_clear_flush_young_notify pmdp_clear_flush_young
+#define ptep_clear_young_notify ptep_test_and_clear_young
+#define pmdp_clear_young_notify pmdp_test_and_clear_young
 #define	ptep_clear_flush_notify ptep_clear_flush
 #define pmdp_huge_clear_flush_notify pmdp_huge_clear_flush
 #define pmdp_huge_get_and_clear_notify pmdp_huge_get_and_clear
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 41c93844fb1d..416509e26d6d 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -108,6 +108,10 @@ enum pageflags {
 #endif
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 	PG_compound_lock,
+#endif
+#if defined(CONFIG_IDLE_PAGE_TRACKING) && defined(CONFIG_64BIT)
+	PG_young,
+	PG_idle,
 #endif
 	__NR_PAGEFLAGS,
 
@@ -289,6 +293,13 @@ PAGEFLAG_FALSE(HWPoison)
 #define __PG_HWPOISON 0
 #endif
 
+#if defined(CONFIG_IDLE_PAGE_TRACKING) && defined(CONFIG_64BIT)
+TESTPAGEFLAG(Young, young)
+SETPAGEFLAG(Young, young)
+TESTCLEARFLAG(Young, young)
+PAGEFLAG(Idle, idle)
+#endif
+
 /*
  * On an anonymous page mapped into a user virtual memory area,
  * page->mapping points to its anon_vma, not to a struct address_space;
diff --git a/include/linux/page_ext.h b/include/linux/page_ext.h
index c42981cd99aa..17f118a82854 100644
--- a/include/linux/page_ext.h
+++ b/include/linux/page_ext.h
@@ -26,6 +26,10 @@ enum page_ext_flags {
 	PAGE_EXT_DEBUG_POISON,		/* Page is poisoned */
 	PAGE_EXT_DEBUG_GUARD,
 	PAGE_EXT_OWNER,
+#if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT)
+	PAGE_EXT_YOUNG,
+	PAGE_EXT_IDLE,
+#endif
 };
 
 /*
diff --git a/include/linux/page_idle.h b/include/linux/page_idle.h
new file mode 100644
index 000000000000..bf268fa92c5b
--- /dev/null
+++ b/include/linux/page_idle.h
@@ -0,0 +1,110 @@
+#ifndef _LINUX_MM_PAGE_IDLE_H
+#define _LINUX_MM_PAGE_IDLE_H
+
+#include <linux/bitops.h>
+#include <linux/page-flags.h>
+#include <linux/page_ext.h>
+
+#ifdef CONFIG_IDLE_PAGE_TRACKING
+
+#ifdef CONFIG_64BIT
+static inline bool page_is_young(struct page *page)
+{
+	return PageYoung(page);
+}
+
+static inline void set_page_young(struct page *page)
+{
+	SetPageYoung(page);
+}
+
+static inline bool test_and_clear_page_young(struct page *page)
+{
+	return TestClearPageYoung(page);
+}
+
+static inline bool page_is_idle(struct page *page)
+{
+	return PageIdle(page);
+}
+
+static inline void set_page_idle(struct page *page)
+{
+	SetPageIdle(page);
+}
+
+static inline void clear_page_idle(struct page *page)
+{
+	ClearPageIdle(page);
+}
+#else /* !CONFIG_64BIT */
+/*
+ * If there is not enough space to store Idle and Young bits in page flags, use
+ * page ext flags instead.
+ */
+extern struct page_ext_operations page_idle_ops;
+
+static inline bool page_is_young(struct page *page)
+{
+	return test_bit(PAGE_EXT_YOUNG, &lookup_page_ext(page)->flags);
+}
+
+static inline void set_page_young(struct page *page)
+{
+	set_bit(PAGE_EXT_YOUNG, &lookup_page_ext(page)->flags);
+}
+
+static inline bool test_and_clear_page_young(struct page *page)
+{
+	return test_and_clear_bit(PAGE_EXT_YOUNG,
+				  &lookup_page_ext(page)->flags);
+}
+
+static inline bool page_is_idle(struct page *page)
+{
+	return test_bit(PAGE_EXT_IDLE, &lookup_page_ext(page)->flags);
+}
+
+static inline void set_page_idle(struct page *page)
+{
+	set_bit(PAGE_EXT_IDLE, &lookup_page_ext(page)->flags);
+}
+
+static inline void clear_page_idle(struct page *page)
+{
+	clear_bit(PAGE_EXT_IDLE, &lookup_page_ext(page)->flags);
+}
+#endif /* CONFIG_64BIT */
+
+#else /* !CONFIG_IDLE_PAGE_TRACKING */
+
+static inline bool page_is_young(struct page *page)
+{
+	return false;
+}
+
+static inline void set_page_young(struct page *page)
+{
+}
+
+static inline bool test_and_clear_page_young(struct page *page)
+{
+	return false;
+}
+
+static inline bool page_is_idle(struct page *page)
+{
+	return false;
+}
+
+static inline void set_page_idle(struct page *page)
+{
+}
+
+static inline void clear_page_idle(struct page *page)
+{
+}
+
+#endif /* CONFIG_IDLE_PAGE_TRACKING */
+
+#endif /* _LINUX_MM_PAGE_IDLE_H */
diff --git a/mm/Kconfig b/mm/Kconfig
index 3a4070f5ab79..6413d027c0b2 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -649,6 +649,18 @@ config DEFERRED_STRUCT_PAGE_INIT
 	  processes running early in the lifetime of the systemm until kswapd
 	  finishes the initialisation.
 
+config IDLE_PAGE_TRACKING
+	bool "Enable idle page tracking"
+	depends on SYSFS && MMU
+	select PAGE_EXTENSION if !64BIT
+	help
+	  This feature allows to estimate the amount of user pages that have
+	  not been touched during a given period of time. This information can
+	  be useful to tune memory cgroup limits and/or for job placement
+	  within a compute cluster.
+
+	  See Documentation/vm/idle_page_tracking.txt for more details.
+
 config ZONE_DEVICE
 	bool "Device memory (pmem, etc...) hotplug support" if EXPERT
 	default !ZONE_DMA
diff --git a/mm/Makefile b/mm/Makefile
index b424d5e5b6ff..56f8eed73f1a 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -79,3 +79,4 @@ obj-$(CONFIG_MEMORY_BALLOON) += balloon_compaction.o
 obj-$(CONFIG_PAGE_EXTENSION) += page_ext.o
 obj-$(CONFIG_CMA_DEBUGFS) += cma_debug.o
 obj-$(CONFIG_USERFAULTFD) += userfaultfd.o
+obj-$(CONFIG_IDLE_PAGE_TRACKING) += page_idle.o
diff --git a/mm/debug.c b/mm/debug.c
index 76089ddf99ea..6c1b3ea61bfd 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -48,6 +48,10 @@ static const struct trace_print_flags pageflag_names[] = {
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 	{1UL << PG_compound_lock,	"compound_lock"	},
 #endif
+#if defined(CONFIG_IDLE_PAGE_TRACKING) && defined(CONFIG_64BIT)
+	{1UL << PG_young,		"young"		},
+	{1UL << PG_idle,		"idle"		},
+#endif
 };
 
 static void dump_flags(unsigned long flags,
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index b16279cbd91d..4b06b8db9df2 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -25,6 +25,7 @@
 #include <linux/migrate.h>
 #include <linux/hashtable.h>
 #include <linux/userfaultfd_k.h>
+#include <linux/page_idle.h>
 
 #include <asm/tlb.h>
 #include <asm/pgalloc.h>
@@ -1757,6 +1758,11 @@ static void __split_huge_page_refcount(struct page *page,
 		/* clear PageTail before overwriting first_page */
 		smp_wmb();
 
+		if (page_is_young(page))
+			set_page_young(page_tail);
+		if (page_is_idle(page))
+			set_page_idle(page_tail);
+
 		/*
 		 * __split_huge_page_splitting() already set the
 		 * splitting bit in all pmd that could map this
@@ -2262,7 +2268,8 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 		VM_BUG_ON_PAGE(PageLRU(page), page);
 
 		/* If there is no mapped pte young don't collapse the page */
-		if (pte_young(pteval) || PageReferenced(page) ||
+		if (pte_young(pteval) ||
+		    page_is_young(page) || PageReferenced(page) ||
 		    mmu_notifier_test_young(vma->vm_mm, address))
 			referenced = true;
 	}
@@ -2693,7 +2700,8 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
 		 */
 		if (page_count(page) != 1 + !!PageSwapCache(page))
 			goto out_unmap;
-		if (pte_young(pteval) || PageReferenced(page) ||
+		if (pte_young(pteval) ||
+		    page_is_young(page) || PageReferenced(page) ||
 		    mmu_notifier_test_young(vma->vm_mm, address))
 			referenced = true;
 	}
diff --git a/mm/migrate.c b/mm/migrate.c
index 02ce25df16c2..c3cb566af3e2 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -37,6 +37,7 @@
 #include <linux/gfp.h>
 #include <linux/balloon_compaction.h>
 #include <linux/mmu_notifier.h>
+#include <linux/page_idle.h>
 
 #include <asm/tlbflush.h>
 
@@ -524,6 +525,11 @@ void migrate_page_copy(struct page *newpage, struct page *page)
 			__set_page_dirty_nobuffers(newpage);
  	}
 
+	if (page_is_young(page))
+		set_page_young(newpage);
+	if (page_is_idle(page))
+		set_page_idle(newpage);
+
 	/*
 	 * Copy NUMA information to the new page, to prevent over-eager
 	 * future migrations of this same page.
diff --git a/mm/page_ext.c b/mm/page_ext.c
index d86fd2f5353f..292ca7b8debd 100644
--- a/mm/page_ext.c
+++ b/mm/page_ext.c
@@ -6,6 +6,7 @@
 #include <linux/vmalloc.h>
 #include <linux/kmemleak.h>
 #include <linux/page_owner.h>
+#include <linux/page_idle.h>
 
 /*
  * struct page extension
@@ -59,6 +60,9 @@ static struct page_ext_operations *page_ext_ops[] = {
 #ifdef CONFIG_PAGE_OWNER
 	&page_owner_ops,
 #endif
+#if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT)
+	&page_idle_ops,
+#endif
 };
 
 static unsigned long total_usage;
diff --git a/mm/page_idle.c b/mm/page_idle.c
new file mode 100644
index 000000000000..d5dd79041484
--- /dev/null
+++ b/mm/page_idle.c
@@ -0,0 +1,232 @@
+#include <linux/init.h>
+#include <linux/bootmem.h>
+#include <linux/fs.h>
+#include <linux/sysfs.h>
+#include <linux/kobject.h>
+#include <linux/mm.h>
+#include <linux/mmzone.h>
+#include <linux/pagemap.h>
+#include <linux/rmap.h>
+#include <linux/mmu_notifier.h>
+#include <linux/page_ext.h>
+#include <linux/page_idle.h>
+
+#define BITMAP_CHUNK_SIZE	sizeof(u64)
+#define BITMAP_CHUNK_BITS	(BITMAP_CHUNK_SIZE * BITS_PER_BYTE)
+
+/*
+ * Idle page tracking only considers user memory pages, for other types of
+ * pages the idle flag is always unset and an attempt to set it is silently
+ * ignored.
+ *
+ * We treat a page as a user memory page if it is on an LRU list, because it is
+ * always safe to pass such a page to rmap_walk(), which is essential for idle
+ * page tracking. With such an indicator of user pages we can skip isolated
+ * pages, but since there are not usually many of them, it will hardly affect
+ * the overall result.
+ *
+ * This function tries to get a user memory page by pfn as described above.
+ */
+static struct page *page_idle_get_page(unsigned long pfn)
+{
+	struct page *page;
+	struct zone *zone;
+
+	if (!pfn_valid(pfn))
+		return NULL;
+
+	page = pfn_to_page(pfn);
+	if (!page || !PageLRU(page) ||
+	    !get_page_unless_zero(page))
+		return NULL;
+
+	zone = page_zone(page);
+	spin_lock_irq(&zone->lru_lock);
+	if (unlikely(!PageLRU(page))) {
+		put_page(page);
+		page = NULL;
+	}
+	spin_unlock_irq(&zone->lru_lock);
+	return page;
+}
+
+static int page_idle_clear_pte_refs_one(struct page *page,
+					struct vm_area_struct *vma,
+					unsigned long addr, void *arg)
+{
+	struct mm_struct *mm = vma->vm_mm;
+	spinlock_t *ptl;
+	pmd_t *pmd;
+	pte_t *pte;
+	bool referenced = false;
+
+	if (unlikely(PageTransHuge(page))) {
+		pmd = page_check_address_pmd(page, mm, addr,
+					     PAGE_CHECK_ADDRESS_PMD_FLAG, &ptl);
+		if (pmd) {
+			referenced = pmdp_clear_young_notify(vma, addr, pmd);
+			spin_unlock(ptl);
+		}
+	} else {
+		pte = page_check_address(page, mm, addr, &ptl, 0);
+		if (pte) {
+			referenced = ptep_clear_young_notify(vma, addr, pte);
+			pte_unmap_unlock(pte, ptl);
+		}
+	}
+	if (referenced) {
+		clear_page_idle(page);
+		/*
+		 * We cleared the referenced bit in a mapping to this page. To
+		 * avoid interference with page reclaim, mark it young so that
+		 * page_referenced() will return > 0.
+		 */
+		set_page_young(page);
+	}
+	return SWAP_AGAIN;
+}
+
+static void page_idle_clear_pte_refs(struct page *page)
+{
+	/*
+	 * Since rwc.arg is unused, rwc is effectively immutable, so we
+	 * can make it static const to save some cycles and stack.
+	 */
+	static const struct rmap_walk_control rwc = {
+		.rmap_one = page_idle_clear_pte_refs_one,
+		.anon_lock = page_lock_anon_vma_read,
+	};
+	bool need_lock;
+
+	if (!page_mapped(page) ||
+	    !page_rmapping(page))
+		return;
+
+	need_lock = !PageAnon(page) || PageKsm(page);
+	if (need_lock && !trylock_page(page))
+		return;
+
+	rmap_walk(page, (struct rmap_walk_control *)&rwc);
+
+	if (need_lock)
+		unlock_page(page);
+}
+
+static ssize_t page_idle_bitmap_read(struct file *file, struct kobject *kobj,
+				     struct bin_attribute *attr, char *buf,
+				     loff_t pos, size_t count)
+{
+	u64 *out = (u64 *)buf;
+	struct page *page;
+	unsigned long pfn, end_pfn;
+	int bit;
+
+	if (pos % BITMAP_CHUNK_SIZE || count % BITMAP_CHUNK_SIZE)
+		return -EINVAL;
+
+	pfn = pos * BITS_PER_BYTE;
+	if (pfn >= max_pfn)
+		return 0;
+
+	end_pfn = pfn + count * BITS_PER_BYTE;
+	if (end_pfn > max_pfn)
+		end_pfn = ALIGN(max_pfn, BITMAP_CHUNK_BITS);
+
+	for (; pfn < end_pfn; pfn++) {
+		bit = pfn % BITMAP_CHUNK_BITS;
+		if (!bit)
+			*out = 0ULL;
+		page = page_idle_get_page(pfn);
+		if (page) {
+			if (page_is_idle(page)) {
+				/*
+				 * The page might have been referenced via a
+				 * pte, in which case it is not idle. Clear
+				 * refs and recheck.
+				 */
+				page_idle_clear_pte_refs(page);
+				if (page_is_idle(page))
+					*out |= 1ULL << bit;
+			}
+			put_page(page);
+		}
+		if (bit == BITMAP_CHUNK_BITS - 1)
+			out++;
+		cond_resched();
+	}
+	return (char *)out - buf;
+}
+
+static ssize_t page_idle_bitmap_write(struct file *file, struct kobject *kobj,
+				      struct bin_attribute *attr, char *buf,
+				      loff_t pos, size_t count)
+{
+	const u64 *in = (u64 *)buf;
+	struct page *page;
+	unsigned long pfn, end_pfn;
+	int bit;
+
+	if (pos % BITMAP_CHUNK_SIZE || count % BITMAP_CHUNK_SIZE)
+		return -EINVAL;
+
+	pfn = pos * BITS_PER_BYTE;
+	if (pfn >= max_pfn)
+		return -ENXIO;
+
+	end_pfn = pfn + count * BITS_PER_BYTE;
+	if (end_pfn > max_pfn)
+		end_pfn = ALIGN(max_pfn, BITMAP_CHUNK_BITS);
+
+	for (; pfn < end_pfn; pfn++) {
+		bit = pfn % BITMAP_CHUNK_BITS;
+		if ((*in >> bit) & 1) {
+			page = page_idle_get_page(pfn);
+			if (page) {
+				page_idle_clear_pte_refs(page);
+				set_page_idle(page);
+				put_page(page);
+			}
+		}
+		if (bit == BITMAP_CHUNK_BITS - 1)
+			in++;
+		cond_resched();
+	}
+	return (char *)in - buf;
+}
+
+static struct bin_attribute page_idle_bitmap_attr =
+		__BIN_ATTR(bitmap, S_IRUSR | S_IWUSR,
+			   page_idle_bitmap_read, page_idle_bitmap_write, 0);
+
+static struct bin_attribute *page_idle_bin_attrs[] = {
+	&page_idle_bitmap_attr,
+	NULL,
+};
+
+static struct attribute_group page_idle_attr_group = {
+	.bin_attrs = page_idle_bin_attrs,
+	.name = "page_idle",
+};
+
+#ifndef CONFIG_64BIT
+static bool need_page_idle(void)
+{
+	return true;
+}
+struct page_ext_operations page_idle_ops = {
+	.need = need_page_idle,
+};
+#endif
+
+static int __init page_idle_init(void)
+{
+	int err;
+
+	err = sysfs_create_group(mm_kobj, &page_idle_attr_group);
+	if (err) {
+		pr_err("page_idle: register sysfs failed\n");
+		return err;
+	}
+	return 0;
+}
+subsys_initcall(page_idle_init);
diff --git a/mm/rmap.c b/mm/rmap.c
index 0db38e7d0a72..f5b5c1f3dcd7 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -59,6 +59,7 @@
 #include <linux/migrate.h>
 #include <linux/hugetlb.h>
 #include <linux/backing-dev.h>
+#include <linux/page_idle.h>
 
 #include <asm/tlbflush.h>
 
@@ -886,6 +887,11 @@ static int page_referenced_one(struct page *page, struct vm_area_struct *vma,
 		pte_unmap_unlock(pte, ptl);
 	}
 
+	if (referenced)
+		clear_page_idle(page);
+	if (test_and_clear_page_young(page))
+		referenced++;
+
 	if (referenced) {
 		pra->referenced++;
 		pra->vm_flags |= vma->vm_flags;
diff --git a/mm/swap.c b/mm/swap.c
index a3a0a2f1f7c3..983f692a47fd 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -32,6 +32,7 @@
 #include <linux/gfp.h>
 #include <linux/uio.h>
 #include <linux/hugetlb.h>
+#include <linux/page_idle.h>
 
 #include "internal.h"
 
@@ -622,6 +623,8 @@ void mark_page_accessed(struct page *page)
 	} else if (!PageReferenced(page)) {
 		SetPageReferenced(page);
 	}
+	if (page_is_idle(page))
+		clear_page_idle(page);
 }
 EXPORT_SYMBOL(mark_page_accessed);
 
-- 
cgit v1.2.3


From f074a8f49eb87cde95ac9d040ad5e7ea4f029738 Mon Sep 17 00:00:00 2001
From: Vladimir Davydov <vdavydov@parallels.com>
Date: Wed, 9 Sep 2015 15:35:48 -0700
Subject: proc: export idle flag via kpageflags

As noted by Minchan, a benefit of reading idle flag from /proc/kpageflags
is that one can easily filter dirty and/or unevictable pages while
estimating the size of unused memory.

Note that idle flag read from /proc/kpageflags may be stale in case the
page was accessed via a PTE, because it would be too costly to iterate
over all page mappings on each /proc/kpageflags read to provide an
up-to-date value.  To make sure the flag is up-to-date one has to read
/sys/kernel/mm/page_idle/bitmap first.

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Reviewed-by: Andres Lagar-Cavilla <andreslc@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Greg Thelen <gthelen@google.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 Documentation/vm/pagemap.txt           | 7 +++++++
 fs/proc/page.c                         | 3 +++
 include/uapi/linux/kernel-page-flags.h | 1 +
 3 files changed, 11 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/vm/pagemap.txt b/Documentation/vm/pagemap.txt
index ce294b0aace4..0e1e55588b59 100644
--- a/Documentation/vm/pagemap.txt
+++ b/Documentation/vm/pagemap.txt
@@ -70,6 +70,7 @@ There are four components to pagemap:
     22. THP
     23. BALLOON
     24. ZERO_PAGE
+    25. IDLE
 
  * /proc/kpagecgroup.  This file contains a 64-bit inode number of the
    memory cgroup each page is charged to, indexed by PFN. Only available when
@@ -120,6 +121,12 @@ Short descriptions to the page flags:
 24. ZERO_PAGE
     zero page for pfn_zero or huge_zero page
 
+25. IDLE
+    page has not been accessed since it was marked idle (see
+    Documentation/vm/idle_page_tracking.txt). Note that this flag may be
+    stale in case the page was accessed via a PTE. To make sure the flag
+    is up-to-date one has to read /sys/kernel/mm/page_idle/bitmap first.
+
     [IO related page flags]
  1. ERROR     IO error occurred
  3. UPTODATE  page has up-to-date data
diff --git a/fs/proc/page.c b/fs/proc/page.c
index c2d29edcaa6b..0b8286450a93 100644
--- a/fs/proc/page.c
+++ b/fs/proc/page.c
@@ -150,6 +150,9 @@ u64 stable_page_flags(struct page *page)
 	if (PageBalloon(page))
 		u |= 1 << KPF_BALLOON;
 
+	if (page_is_idle(page))
+		u |= 1 << KPF_IDLE;
+
 	u |= kpf_copy_bit(k, KPF_LOCKED,	PG_locked);
 
 	u |= kpf_copy_bit(k, KPF_SLAB,		PG_slab);
diff --git a/include/uapi/linux/kernel-page-flags.h b/include/uapi/linux/kernel-page-flags.h
index a6c4962e5d46..5da5f8751ce7 100644
--- a/include/uapi/linux/kernel-page-flags.h
+++ b/include/uapi/linux/kernel-page-flags.h
@@ -33,6 +33,7 @@
 #define KPF_THP			22
 #define KPF_BALLOON		23
 #define KPF_ZERO_PAGE		24
+#define KPF_IDLE		25
 
 
 #endif /* _UAPILINUX_KERNEL_PAGE_FLAGS_H */
-- 
cgit v1.2.3


From cd1faefa66425c3fa338777773c5c017edea3439 Mon Sep 17 00:00:00 2001
From: Guenter Roeck <linux@roeck-us.net>
Date: Sun, 30 Aug 2015 19:45:19 -0700
Subject: hwmon: (nct6775) Add support for NCT6793D

NCT6793D is register compatible with NCT6792D.

Also move nct6775_sio_names[] closer to enum kinds to simplify
adding new chips.

Tested-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Jean Delvare <jdelvare@suse.de>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
---
 Documentation/hwmon/nct6775 |  4 ++++
 drivers/hwmon/Kconfig       |  4 ++--
 drivers/hwmon/nct6775.c     | 48 ++++++++++++++++++++++++++++++---------------
 3 files changed, 38 insertions(+), 18 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/hwmon/nct6775 b/Documentation/hwmon/nct6775
index f0dd3d2fec96..76add4c9cd68 100644
--- a/Documentation/hwmon/nct6775
+++ b/Documentation/hwmon/nct6775
@@ -32,6 +32,10 @@ Supported chips:
     Prefix: 'nct6792'
     Addresses scanned: ISA address retrieved from Super I/O registers
     Datasheet: Available from Nuvoton upon request
+  * Nuvoton NCT6793D
+    Prefix: 'nct6793'
+    Addresses scanned: ISA address retrieved from Super I/O registers
+    Datasheet: Available from Nuvoton upon request
 
 Authors:
         Guenter Roeck <linux@roeck-us.net>
diff --git a/drivers/hwmon/Kconfig b/drivers/hwmon/Kconfig
index 500b262b89bb..e13c902e8966 100644
--- a/drivers/hwmon/Kconfig
+++ b/drivers/hwmon/Kconfig
@@ -1140,8 +1140,8 @@ config SENSORS_NCT6775
 	help
 	  If you say yes here you get support for the hardware monitoring
 	  functionality of the Nuvoton NCT6106D, NCT6775F, NCT6776F, NCT6779D,
-	  NCT6791D, NCT6792D and compatible Super-I/O chips. This driver
-	  replaces the w83627ehf driver for NCT6775F and NCT6776F.
+	  NCT6791D, NCT6792D, NCT6793D, and compatible Super-I/O chips. This
+	  driver replaces the w83627ehf driver for NCT6775F and NCT6776F.
 
 	  This driver can also be built as a module.  If so, the module
 	  will be called nct6775.
diff --git a/drivers/hwmon/nct6775.c b/drivers/hwmon/nct6775.c
index 2aaedbe0b023..8b4fa55e46c6 100644
--- a/drivers/hwmon/nct6775.c
+++ b/drivers/hwmon/nct6775.c
@@ -39,6 +39,7 @@
  * nct6779d    15      5       5       2+6    0xc560 0xc1    0x5ca3
  * nct6791d    15      6       6       2+6    0xc800 0xc1    0x5ca3
  * nct6792d    15      6       6       2+6    0xc910 0xc1    0x5ca3
+ * nct6793d    15      6       6       2+6    0xd120 0xc1    0x5ca3
  *
  * #temp lists the number of monitored temperature sources (first value) plus
  * the number of directly connectable temperature sensors (second value).
@@ -63,7 +64,7 @@
 
 #define USE_ALTERNATE
 
-enum kinds { nct6106, nct6775, nct6776, nct6779, nct6791, nct6792 };
+enum kinds { nct6106, nct6775, nct6776, nct6779, nct6791, nct6792, nct6793 };
 
 /* used to set data->name = nct6775_device_names[data->sio_kind] */
 static const char * const nct6775_device_names[] = {
@@ -73,6 +74,17 @@ static const char * const nct6775_device_names[] = {
 	"nct6779",
 	"nct6791",
 	"nct6792",
+	"nct6793",
+};
+
+static const char * const nct6775_sio_names[] __initconst = {
+	"NCT6106D",
+	"NCT6775F",
+	"NCT6776D/F",
+	"NCT6779D",
+	"NCT6791D",
+	"NCT6792D",
+	"NCT6793D",
 };
 
 static unsigned short force_id;
@@ -104,6 +116,7 @@ MODULE_PARM_DESC(fan_debounce, "Enable debouncing for fan RPM signal");
 #define SIO_NCT6779_ID		0xc560
 #define SIO_NCT6791_ID		0xc800
 #define SIO_NCT6792_ID		0xc910
+#define SIO_NCT6793_ID		0xd120
 #define SIO_ID_MASK		0xFFF0
 
 enum pwm_enable { off, manual, thermal_cruise, speed_cruise, sf3, sf4 };
@@ -537,7 +550,7 @@ static const s8 NCT6791_ALARM_BITS[] = {
 	4, 5, 13, -1, -1, -1,		/* temp1..temp6 */
 	12, 9 };			/* intrusion0, intrusion1 */
 
-/* NCT6792 specific data */
+/* NCT6792/NCT6793 specific data */
 
 static const u16 NCT6792_REG_TEMP_MON[] = {
 	0x73, 0x75, 0x77, 0x79, 0x7b, 0x7d };
@@ -1060,6 +1073,7 @@ static bool is_word_sized(struct nct6775_data *data, u16 reg)
 	case nct6779:
 	case nct6791:
 	case nct6792:
+	case nct6793:
 		return reg == 0x150 || reg == 0x153 || reg == 0x155 ||
 		  ((reg & 0xfff0) == 0x4b0 && (reg & 0x000f) < 0x0b) ||
 		  reg == 0x402 ||
@@ -1411,6 +1425,7 @@ static void nct6775_update_pwm_limits(struct device *dev)
 		case nct6779:
 		case nct6791:
 		case nct6792:
+		case nct6793:
 			reg = nct6775_read_value(data,
 					data->REG_CRITICAL_PWM_ENABLE[i]);
 			if (reg & data->CRITICAL_PWM_ENABLE_MASK)
@@ -2826,6 +2841,7 @@ store_auto_pwm(struct device *dev, struct device_attribute *attr,
 		case nct6779:
 		case nct6791:
 		case nct6792:
+		case nct6793:
 			nct6775_write_value(data, data->REG_CRITICAL_PWM[nr],
 					    val);
 			reg = nct6775_read_value(data,
@@ -3260,7 +3276,7 @@ nct6775_check_fan_inputs(struct nct6775_data *data)
 		pwm4pin = false;
 		pwm5pin = false;
 		pwm6pin = false;
-	} else {	/* NCT6779D, NCT6791D, or NCT6792D */
+	} else {	/* NCT6779D, NCT6791D, NCT6792D, or NCT6793D */
 		regval = superio_inb(sioreg, 0x1c);
 
 		fan3pin = !(regval & (1 << 5));
@@ -3273,7 +3289,8 @@ nct6775_check_fan_inputs(struct nct6775_data *data)
 
 		fan4min = fan4pin;
 
-		if (data->kind == nct6791 || data->kind == nct6792) {
+		if (data->kind == nct6791 || data->kind == nct6792 ||
+		    data->kind == nct6793) {
 			regval = superio_inb(sioreg, 0x2d);
 			fan6pin = (regval & (1 << 1));
 			pwm6pin = (regval & (1 << 0));
@@ -3647,6 +3664,7 @@ static int nct6775_probe(struct platform_device *pdev)
 		break;
 	case nct6791:
 	case nct6792:
+	case nct6793:
 		data->in_num = 15;
 		data->pwm_num = 6;
 		data->auto_pwm_num = 4;
@@ -3922,6 +3940,7 @@ static int nct6775_probe(struct platform_device *pdev)
 	case nct6779:
 	case nct6791:
 	case nct6792:
+	case nct6793:
 		break;
 	}
 
@@ -3954,6 +3973,7 @@ static int nct6775_probe(struct platform_device *pdev)
 			break;
 		case nct6791:
 		case nct6792:
+		case nct6793:
 			tmp |= 0x7e;
 			break;
 		}
@@ -4051,7 +4071,8 @@ static int __maybe_unused nct6775_resume(struct device *dev)
 	if (reg != data->sio_reg_enable)
 		superio_outb(sioreg, SIO_REG_ENABLE, data->sio_reg_enable);
 
-	if (data->kind == nct6791 || data->kind == nct6792)
+	if (data->kind == nct6791 || data->kind == nct6792 ||
+	    data->kind == nct6793)
 		nct6791_enable_io_mapping(sioreg);
 
 	superio_exit(sioreg);
@@ -4110,15 +4131,6 @@ static struct platform_driver nct6775_driver = {
 	.probe		= nct6775_probe,
 };
 
-static const char * const nct6775_sio_names[] __initconst = {
-	"NCT6106D",
-	"NCT6775F",
-	"NCT6776D/F",
-	"NCT6779D",
-	"NCT6791D",
-	"NCT6792D",
-};
-
 /* nct6775_find() looks for a '627 in the Super-I/O config space */
 static int __init nct6775_find(int sioaddr, struct nct6775_sio_data *sio_data)
 {
@@ -4154,6 +4166,9 @@ static int __init nct6775_find(int sioaddr, struct nct6775_sio_data *sio_data)
 	case SIO_NCT6792_ID:
 		sio_data->kind = nct6792;
 		break;
+	case SIO_NCT6793_ID:
+		sio_data->kind = nct6793;
+		break;
 	default:
 		if (val != 0xffff)
 			pr_debug("unsupported chip ID: 0x%04x\n", val);
@@ -4179,7 +4194,8 @@ static int __init nct6775_find(int sioaddr, struct nct6775_sio_data *sio_data)
 		superio_outb(sioaddr, SIO_REG_ENABLE, val | 0x01);
 	}
 
-	if (sio_data->kind == nct6791 || sio_data->kind == nct6792)
+	if (sio_data->kind == nct6791 || sio_data->kind == nct6792 ||
+	    sio_data->kind == nct6793)
 		nct6791_enable_io_mapping(sioaddr);
 
 	superio_exit(sioaddr);
@@ -4289,7 +4305,7 @@ static void __exit sensors_nct6775_exit(void)
 }
 
 MODULE_AUTHOR("Guenter Roeck <linux@roeck-us.net>");
-MODULE_DESCRIPTION("NCT6106D/NCT6775F/NCT6776F/NCT6779D/NCT6791D/NCT6792D driver");
+MODULE_DESCRIPTION("Driver for NCT6775F and compatible chips");
 MODULE_LICENSE("GPL");
 
 module_init(sensors_nct6775_init);
-- 
cgit v1.2.3


From 69de52ba321dda8dd7f632d1e480983494325ba0 Mon Sep 17 00:00:00 2001
From: Dirk Behme <dirk.behme@gmail.com>
Date: Wed, 2 Sep 2015 20:07:09 +0200
Subject: Documentation: gpio: board: add flags parameter to gpiod_get*()
 functions

With commit 39b2bbe3d715 ("gpio: add flags argument to gpiod_get*()
functions") the gpiod_get*() functions got a 'flags' parameter. Reflect
this in the documentation, too.

Signed-off-by: Dirk Behme <dirk.behme@gmail.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
---
 Documentation/gpio/board.txt | 25 +++++++++++++------------
 1 file changed, 13 insertions(+), 12 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/gpio/board.txt b/Documentation/gpio/board.txt
index b80606de545a..9edd5af31b3a 100644
--- a/Documentation/gpio/board.txt
+++ b/Documentation/gpio/board.txt
@@ -39,11 +39,11 @@ This property will make GPIOs 15, 16 and 17 available to the driver under the
 
 	struct gpio_desc *red, *green, *blue, *power;
 
-	red = gpiod_get_index(dev, "led", 0);
-	green = gpiod_get_index(dev, "led", 1);
-	blue = gpiod_get_index(dev, "led", 2);
+	red = gpiod_get_index(dev, "led", 0, GPIOD_OUT_HIGH);
+	green = gpiod_get_index(dev, "led", 1, GPIOD_OUT_HIGH);
+	blue = gpiod_get_index(dev, "led", 2, GPIOD_OUT_HIGH);
 
-	power = gpiod_get(dev, "power");
+	power = gpiod_get(dev, "power", GPIOD_OUT_HIGH);
 
 The led GPIOs will be active-high, while the power GPIO will be active-low (i.e.
 gpiod_is_active_low(power) will be true).
@@ -142,13 +142,14 @@ The driver controlling "foo.0" will then be able to obtain its GPIOs as follows:
 
 	struct gpio_desc *red, *green, *blue, *power;
 
-	red = gpiod_get_index(dev, "led", 0);
-	green = gpiod_get_index(dev, "led", 1);
-	blue = gpiod_get_index(dev, "led", 2);
+	red = gpiod_get_index(dev, "led", 0, GPIOD_OUT_HIGH);
+	green = gpiod_get_index(dev, "led", 1, GPIOD_OUT_HIGH);
+	blue = gpiod_get_index(dev, "led", 2, GPIOD_OUT_HIGH);
 
-	power = gpiod_get(dev, "power");
-	gpiod_direction_output(power, 1);
+	power = gpiod_get(dev, "power", GPIOD_OUT_HIGH);
 
-Since the "power" GPIO is mapped as active-low, its actual signal will be 0
-after this code. Contrary to the legacy integer GPIO interface, the active-low
-property is handled during mapping and is thus transparent to GPIO consumers.
+Since the "led" GPIOs are mapped as active-high, this example will switch their
+signals to 1, i.e. enabling the LEDs. And for the "power" GPIO, which is mapped
+as active-low, its actual signal will be 0 after this code. Contrary to the legacy
+integer GPIO interface, the active-low property is handled during mapping and is
+thus transparent to GPIO consumers.
-- 
cgit v1.2.3


From 87e77e46c61a9333227ab41cdefb1875758a4f13 Mon Sep 17 00:00:00 2001
From: Dirk Behme <dirk.behme@gmail.com>
Date: Wed, 2 Sep 2015 20:07:10 +0200
Subject: Documentation: gpio: board: describe the con_id parameter

The con_id parameter has to match the GPIO description and is automatically
extended by the GPIO suffix if not NULL. I had to look into the code to
understand this and properly find the GPIO I've been looking for, so document
this.

Signed-off-by: Dirk Behme <dirk.behme@gmail.com>
Acked-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
---
 Documentation/gpio/board.txt    | 9 +++++++++
 Documentation/gpio/consumer.txt | 3 +++
 2 files changed, 12 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/gpio/board.txt b/Documentation/gpio/board.txt
index 9edd5af31b3a..5fa069a80171 100644
--- a/Documentation/gpio/board.txt
+++ b/Documentation/gpio/board.txt
@@ -48,6 +48,15 @@ This property will make GPIOs 15, 16 and 17 available to the driver under the
 The led GPIOs will be active-high, while the power GPIO will be active-low (i.e.
 gpiod_is_active_low(power) will be true).
 
+The second parameter of the gpiod_get() functions, the con_id string, has to be
+the <function>-prefix of the GPIO suffixes ("gpios" or "gpio", automatically
+looked up by the gpiod functions internally) used in the device tree. With above
+"led-gpios" example, use the prefix without the "-" as con_id parameter: "led".
+
+Internally, the GPIO subsystem prefixes the GPIO suffix ("gpios" or "gpio")
+with the string passed in con_id to get the resulting string
+(snprintf(... "%s-%s", con_id, gpio_suffixes[]).
+
 ACPI
 ----
 ACPI also supports function names for GPIOs in a similar fashion to DT.
diff --git a/Documentation/gpio/consumer.txt b/Documentation/gpio/consumer.txt
index a206639454ab..e000502fde20 100644
--- a/Documentation/gpio/consumer.txt
+++ b/Documentation/gpio/consumer.txt
@@ -39,6 +39,9 @@ device that displays digits), an additional index argument can be specified:
 					  const char *con_id, unsigned int idx,
 					  enum gpiod_flags flags)
 
+For a more detailed description of the con_id parameter in the DeviceTree case
+see Documentation/gpio/board.txt
+
 The flags parameter is used to optionally specify a direction and initial value
 for the GPIO. Values can be:
 
-- 
cgit v1.2.3


From ae80d64ee8c88b77c58254bcdc5c0981faab672d Mon Sep 17 00:00:00 2001
From: Javier Martinez Canillas <javier@osg.samsung.com>
Date: Tue, 1 Sep 2015 10:46:15 +0200
Subject: Documentation: gpio: Explain that <function>-gpio is also supported

The GPIO documentation mentions that GPIOs are mapped by defining a
<function>-gpios property in the consumer device's node but a -gpio
sufix is also supported after commit:

dd34c37aa3e8 ("gpio: of: Allow -gpio suffix for property names")

Update the documentation to match the implementation.

Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
---
 Documentation/gpio/board.txt | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/gpio/board.txt b/Documentation/gpio/board.txt
index 5fa069a80171..f59c43b6411b 100644
--- a/Documentation/gpio/board.txt
+++ b/Documentation/gpio/board.txt
@@ -21,8 +21,8 @@ exact way to do it depends on the GPIO controller providing the GPIOs, see the
 device tree bindings for your controller.
 
 GPIOs mappings are defined in the consumer device's node, in a property named
-<function>-gpios, where <function> is the function the driver will request
-through gpiod_get(). For example:
+either <function>-gpios or <function>-gpio, where <function> is the function
+the driver will request through gpiod_get(). For example:
 
 	foo_device {
 		compatible = "acme,foo";
@@ -31,7 +31,7 @@ through gpiod_get(). For example:
 			    <&gpio 16 GPIO_ACTIVE_HIGH>, /* green */
 			    <&gpio 17 GPIO_ACTIVE_HIGH>; /* blue */
 
-		power-gpios = <&gpio 1 GPIO_ACTIVE_LOW>;
+		power-gpio = <&gpio 1 GPIO_ACTIVE_LOW>;
 	};
 
 This property will make GPIOs 15, 16 and 17 available to the driver under the
-- 
cgit v1.2.3


From 8b7b390f805f09ff252351468a79ebabde1ab55a Mon Sep 17 00:00:00 2001
From: Javi Merino <javi.merino@arm.com>
Date: Mon, 14 Sep 2015 14:23:52 +0100
Subject: thermal: power_allocator: relax the requirement of two passive trip
 points

The power allocator governor currently requires that the thermal zone
has at least two passive trip points.  If there aren't, the governor
refuses to bind to the thermal zone.

This commit relaxes that requirement.  Now the governor will bind to all
thermal zones regardless of how many trip points they have.

Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Eduardo Valentin <edubezval@gmail.com>
Reviewed-by: Daniel Kurtz <djkurtz@chromium.org>
Signed-off-by: Javi Merino <javi.merino@arm.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
---
 Documentation/thermal/power_allocator.txt |   2 +-
 drivers/thermal/power_allocator.c         | 101 +++++++++++++++++-------------
 2 files changed, 58 insertions(+), 45 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/thermal/power_allocator.txt b/Documentation/thermal/power_allocator.txt
index c3797b529991..a1ce2235f121 100644
--- a/Documentation/thermal/power_allocator.txt
+++ b/Documentation/thermal/power_allocator.txt
@@ -4,7 +4,7 @@ Power allocator governor tunables
 Trip points
 -----------
 
-The governor requires the following two passive trip points:
+The governor works optimally with the following two passive trip points:
 
 1.  "switch on" trip point: temperature above which the governor
     control loop starts operating.  This is the first passive trip
diff --git a/drivers/thermal/power_allocator.c b/drivers/thermal/power_allocator.c
index 51473473f154..ea57759eb095 100644
--- a/drivers/thermal/power_allocator.c
+++ b/drivers/thermal/power_allocator.c
@@ -24,6 +24,8 @@
 
 #include "thermal_core.h"
 
+#define INVALID_TRIP -1
+
 #define FRAC_BITS 10
 #define int_to_frac(x) ((x) << FRAC_BITS)
 #define frac_to_int(x) ((x) >> FRAC_BITS)
@@ -61,6 +63,8 @@ static inline s64 div_frac(s64 x, s64 y)
  *		Used to calculate the derivative term.
  * @trip_switch_on:	first passive trip point of the thermal zone.  The
  *			governor switches on when this trip point is crossed.
+ *			If the thermal zone only has one passive trip point,
+ *			@trip_switch_on should be INVALID_TRIP.
  * @trip_max_desired_temperature:	last passive trip point of the thermal
  *					zone.  The temperature we are
  *					controlling for.
@@ -432,43 +436,66 @@ unlock:
 	return ret;
 }
 
-static int get_governor_trips(struct thermal_zone_device *tz,
-			      struct power_allocator_params *params)
+/**
+ * get_governor_trips() - get the number of the two trip points that are key for this governor
+ * @tz:	thermal zone to operate on
+ * @params:	pointer to private data for this governor
+ *
+ * The power allocator governor works optimally with two trips points:
+ * a "switch on" trip point and a "maximum desired temperature".  These
+ * are defined as the first and last passive trip points.
+ *
+ * If there is only one trip point, then that's considered to be the
+ * "maximum desired temperature" trip point and the governor is always
+ * on.  If there are no passive or active trip points, then the
+ * governor won't do anything.  In fact, its throttle function
+ * won't be called at all.
+ */
+static void get_governor_trips(struct thermal_zone_device *tz,
+			       struct power_allocator_params *params)
 {
-	int i, ret, last_passive;
+	int i, last_active, last_passive;
 	bool found_first_passive;
 
 	found_first_passive = false;
-	last_passive = -1;
-	ret = -EINVAL;
+	last_active = INVALID_TRIP;
+	last_passive = INVALID_TRIP;
 
 	for (i = 0; i < tz->trips; i++) {
 		enum thermal_trip_type type;
+		int ret;
 
 		ret = tz->ops->get_trip_type(tz, i, &type);
-		if (ret)
-			return ret;
+		if (ret) {
+			dev_warn(&tz->device,
+				 "Failed to get trip point %d type: %d\n", i,
+				 ret);
+			continue;
+		}
 
-		if (!found_first_passive) {
-			if (type == THERMAL_TRIP_PASSIVE) {
+		if (type == THERMAL_TRIP_PASSIVE) {
+			if (!found_first_passive) {
 				params->trip_switch_on = i;
 				found_first_passive = true;
+			} else  {
+				last_passive = i;
 			}
-		} else if (type == THERMAL_TRIP_PASSIVE) {
-			last_passive = i;
+		} else if (type == THERMAL_TRIP_ACTIVE) {
+			last_active = i;
 		} else {
 			break;
 		}
 	}
 
-	if (last_passive != -1) {
+	if (last_passive != INVALID_TRIP) {
 		params->trip_max_desired_temperature = last_passive;
-		ret = 0;
+	} else if (found_first_passive) {
+		params->trip_max_desired_temperature = params->trip_switch_on;
+		params->trip_switch_on = INVALID_TRIP;
 	} else {
-		ret = -EINVAL;
+		params->trip_switch_on = INVALID_TRIP;
+		params->trip_max_desired_temperature = last_active;
 	}
-
-	return ret;
 }
 
 static void reset_pid_controller(struct power_allocator_params *params)
@@ -497,11 +524,10 @@ static void allow_maximum_power(struct thermal_zone_device *tz)
  * power_allocator_bind() - bind the power_allocator governor to a thermal zone
  * @tz:	thermal zone to bind it to
  *
- * Check that the thermal zone is valid for this governor, that is, it
- * has two thermal trips.  If so, initialize the PID controller
- * parameters and bind it to the thermal zone.
+ * Initialize the PID controller parameters and bind it to the thermal
+ * zone.
  *
- * Return: 0 on success, -EINVAL if the trips were invalid or -ENOMEM
+ * Return: 0 on success, -EINVAL if the thermal zone doesn't have tzp or -ENOMEM
  * if we ran out of memory.
  */
 static int power_allocator_bind(struct thermal_zone_device *tz)
@@ -520,30 +546,23 @@ static int power_allocator_bind(struct thermal_zone_device *tz)
 	if (!tz->tzp->sustainable_power)
 		dev_warn(&tz->device, "power_allocator: sustainable_power will be estimated\n");
 
-	ret = get_governor_trips(tz, params);
-	if (ret) {
-		dev_err(&tz->device,
-			"thermal zone %s has wrong trip setup for power allocator\n",
-			tz->type);
-		goto free;
-	}
+	get_governor_trips(tz, params);
 
-	ret = tz->ops->get_trip_temp(tz, params->trip_max_desired_temperature,
-				     &control_temp);
-	if (ret)
-		goto free;
+	if (tz->trips > 0) {
+		ret = tz->ops->get_trip_temp(tz,
+					params->trip_max_desired_temperature,
+					&control_temp);
+		if (!ret)
+			estimate_pid_constants(tz, tz->tzp->sustainable_power,
+					       params->trip_switch_on,
+					       control_temp, false);
+	}
 
-	estimate_pid_constants(tz, tz->tzp->sustainable_power,
-			       params->trip_switch_on, control_temp, false);
 	reset_pid_controller(params);
 
 	tz->governor_data = params;
 
 	return 0;
-
-free:
-	kfree(params);
-	return ret;
 }
 
 static void power_allocator_unbind(struct thermal_zone_device *tz)
@@ -574,13 +593,7 @@ static int power_allocator_throttle(struct thermal_zone_device *tz, int trip)
 
 	ret = tz->ops->get_trip_temp(tz, params->trip_switch_on,
 				     &switch_on_temp);
-	if (ret) {
-		dev_warn(&tz->device,
-			 "Failed to get switch on temperature: %d\n", ret);
-		return ret;
-	}
-
-	if (current_temp < switch_on_temp) {
+	if (!ret && (current_temp < switch_on_temp)) {
 		tz->passive = 0;
 		reset_pid_controller(params);
 		allow_maximum_power(tz);
-- 
cgit v1.2.3


From 1975dbc276c6ab62230cf4f9df5ddc9ff0e0e473 Mon Sep 17 00:00:00 2001
From: Jonathan Corbet <corbet@lwn.net>
Date: Mon, 14 Sep 2015 17:11:05 -0600
Subject: locking/static_keys: Fix up the static keys documentation

Fix a few small mistakes in the static key documentation and
delete an unneeded sentence.

Suggested-by: Jason Baron <jbaron@akamai.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150914171105.511e1e21@lwn.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 Documentation/static-keys.txt |  4 ++--
 include/linux/jump_label.h    | 10 ++++------
 2 files changed, 6 insertions(+), 8 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/static-keys.txt b/Documentation/static-keys.txt
index ec911583f6c5..477927becacb 100644
--- a/Documentation/static-keys.txt
+++ b/Documentation/static-keys.txt
@@ -15,8 +15,8 @@ The updated API replacements are:
 
 DEFINE_STATIC_KEY_TRUE(key);
 DEFINE_STATIC_KEY_FALSE(key);
-static_key_likely()
-static_key_unlikely()
+static_branch_likely()
+static_branch_unlikely()
 
 0) Abstract
 
diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h
index 0684bd3a48fc..f1094238ab2a 100644
--- a/include/linux/jump_label.h
+++ b/include/linux/jump_label.h
@@ -21,8 +21,8 @@
  *
  * DEFINE_STATIC_KEY_TRUE(key);
  * DEFINE_STATIC_KEY_FALSE(key);
- * static_key_likely()
- * static_key_unlikely()
+ * static_branch_likely()
+ * static_branch_unlikely()
  *
  * Jump labels provide an interface to generate dynamic branches using
  * self-modifying code. Assuming toolchain and architecture support, if we
@@ -45,12 +45,10 @@
  * statement, setting the key to true requires us to patch in a jump
  * to the out-of-line of true branch.
  *
- * In addtion to static_branch_{enable,disable}, we can also reference count
+ * In addition to static_branch_{enable,disable}, we can also reference count
  * the key or branch direction via static_branch_{inc,dec}. Thus,
  * static_branch_inc() can be thought of as a 'make more true' and
- * static_branch_dec() as a 'make more false'. The inc()/dec()
- * interface is meant to be used exclusively from the inc()/dec() for a given
- * key.
+ * static_branch_dec() as a 'make more false'.
  *
  * Since this relies on modifying code, the branch modifying functions
  * must be considered absolute slow paths (machine wide synchronization etc.).
-- 
cgit v1.2.3


From c1ceb5fff01c0357de0386f87a620a4636ca68d1 Mon Sep 17 00:00:00 2001
From: Nathan Sullivan <nathan.sullivan@ni.com>
Date: Mon, 31 Aug 2015 09:49:52 -0500
Subject: Documentation: bindings: add doc for zynq USB

Document the binding for the zynq specific chipidea UDC binding.

Signed-off-by: Nathan Sullivan <nathan.sullivan@ni.com>
Signed-off-by: Peter Chen <peter.chen@freescale.com>
---
 Documentation/devicetree/bindings/usb/ci-hdrc-usb2.txt | 1 +
 1 file changed, 1 insertion(+)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/usb/ci-hdrc-usb2.txt b/Documentation/devicetree/bindings/usb/ci-hdrc-usb2.txt
index d71ef07bca5d..a057b75ba4b5 100644
--- a/Documentation/devicetree/bindings/usb/ci-hdrc-usb2.txt
+++ b/Documentation/devicetree/bindings/usb/ci-hdrc-usb2.txt
@@ -6,6 +6,7 @@ Required properties:
 	"lsi,zevio-usb"
 	"qcom,ci-hdrc"
 	"chipidea,usb2"
+	"xlnx,zynq-usb-2.20a"
 - reg: base address and length of the registers
 - interrupts: interrupt for the USB controller
 
-- 
cgit v1.2.3


From d7ba2a024cf19261b34825f89d30ceef9a5ff864 Mon Sep 17 00:00:00 2001
From: Masahiro Yamada <yamada.masahiro@socionext.com>
Date: Wed, 29 Jul 2015 18:45:33 +0900
Subject: of: add vendor prefix for Socionext Inc.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Rob Herring <robh@kernel.org>
---
 Documentation/devicetree/bindings/vendor-prefixes.txt | 1 +
 1 file changed, 1 insertion(+)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/vendor-prefixes.txt b/Documentation/devicetree/bindings/vendor-prefixes.txt
index ac5f0c34ae00..82d2ac97af74 100644
--- a/Documentation/devicetree/bindings/vendor-prefixes.txt
+++ b/Documentation/devicetree/bindings/vendor-prefixes.txt
@@ -203,6 +203,7 @@ sitronix	Sitronix Technology Corporation
 skyworks	Skyworks Solutions, Inc.
 smsc	Standard Microsystems Corporation
 snps	Synopsys, Inc.
+socionext	Socionext Inc.
 solidrun	SolidRun
 solomon        Solomon Systech Limited
 sony	Sony Corporation
-- 
cgit v1.2.3


From dc4dae00d82fedcd7a632f786f9f76c1f7f929a5 Mon Sep 17 00:00:00 2001
From: Mark Rutland <mark.rutland@arm.com>
Date: Mon, 7 Sep 2015 10:49:03 +0100
Subject: Docs: dt: add #msi-cells to GICv3 ITS binding

The GICv3 ITS uses sideband master identification data (known as a
DeviceID) to identify which master wrote to a doorbell, and this data is
used to determine how to react in response to the write.

Commit 1e6db000482fa65a ("irqchip/gicv3-its: Add platform MSI support")
added support per this binding, but failed to update the documentation.
This patch fixes the documentation.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: Rob Herring <robh+dt@kernel.org>
Signed-off-by: Rob Herring <robh@kernel.org>
---
 Documentation/devicetree/bindings/arm/gic-v3.txt | 5 +++++
 1 file changed, 5 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/arm/gic-v3.txt b/Documentation/devicetree/bindings/arm/gic-v3.txt
index ddfade40ac59..7803e77d85cb 100644
--- a/Documentation/devicetree/bindings/arm/gic-v3.txt
+++ b/Documentation/devicetree/bindings/arm/gic-v3.txt
@@ -57,6 +57,8 @@ used to route Message Signalled Interrupts (MSI) to the CPUs.
 These nodes must have the following properties:
 - compatible : Should at least contain  "arm,gic-v3-its".
 - msi-controller : Boolean property. Identifies the node as an MSI controller
+- #msi-cells: Must be <1>. The single msi-cell is the DeviceID of the device
+  which will generate the MSI.
 - reg: Specifies the base physical address and size of the ITS
   registers.
 
@@ -83,6 +85,7 @@ Examples:
 		gic-its@2c200000 {
 			compatible = "arm,gic-v3-its";
 			msi-controller;
+			#msi-cells = <1>;
 			reg = <0x0 0x2c200000 0 0x200000>;
 		};
 	};
@@ -107,12 +110,14 @@ Examples:
 		gic-its@2c200000 {
 			compatible = "arm,gic-v3-its";
 			msi-controller;
+			#msi-cells = <1>;
 			reg = <0x0 0x2c200000 0 0x200000>;
 		};
 
 		gic-its@2c400000 {
 			compatible = "arm,gic-v3-its";
 			msi-controller;
+			#msi-cells = <1>;
 			reg = <0x0 0x2c400000 0 0x200000>;
 		};
 	};
-- 
cgit v1.2.3


From eb168b70dea54578a45119fcdb3b48ad5d75fed9 Mon Sep 17 00:00:00 2001
From: Punit Agrawal <punit.agrawal@arm.com>
Date: Tue, 8 Sep 2015 12:20:48 +0100
Subject: of: thermal: Fix inconsitency between cooling-*-state and
 cooling-*-level

The device trees in the kernel as well as the binding description in
Documentation/devicetree/bindings/cpufreq/cpufreq-dt.txt use the
cooling-{min,max}-level property.

Fix the inconsistency with the binding description in
Documentation/devicetree/bindings/thermal/thermal.txt by changing
cooling-*-state properties to cooling-*-level.

Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Cc: Eduardo Valentin <edubezval@gmail.com>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Ian Campbell <ijc+devicetree@hellion.org.uk>
Cc: Kumar Gala <galak@codeaurora.org>
Signed-off-by: Rob Herring <robh@kernel.org>
---
 Documentation/devicetree/bindings/thermal/thermal.txt | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/thermal/thermal.txt b/Documentation/devicetree/bindings/thermal/thermal.txt
index 8a49362dea6e..8320186738ff 100644
--- a/Documentation/devicetree/bindings/thermal/thermal.txt
+++ b/Documentation/devicetree/bindings/thermal/thermal.txt
@@ -55,16 +55,16 @@ of heat dissipation). For example a fan's cooling states correspond to
 the different fan speeds possible. Cooling states are referred to by
 single unsigned integers, where larger numbers mean greater heat
 dissipation. The precise set of cooling states associated with a device
-(as referred to be the cooling-min-state and cooling-max-state
+(as referred to by the cooling-min-level and cooling-max-level
 properties) should be defined in a particular device's binding.
 For more examples of cooling devices, refer to the example sections below.
 
 Required properties:
-- cooling-min-state:	An integer indicating the smallest
+- cooling-min-level:	An integer indicating the smallest
   Type: unsigned	cooling state accepted. Typically 0.
   Size: one cell
 
-- cooling-max-state:	An integer indicating the largest
+- cooling-max-level:	An integer indicating the largest
   Type: unsigned	cooling state accepted.
   Size: one cell
 
@@ -225,8 +225,8 @@ cpus {
 			396000  950000
 			198000  850000
 		>;
-		cooling-min-state = <0>;
-		cooling-max-state = <3>;
+		cooling-min-level = <0>;
+		cooling-max-level = <3>;
 		#cooling-cells = <2>; /* min followed by max */
 	};
 	...
@@ -240,8 +240,8 @@ cpus {
 	 */
 	fan0: fan@0x48 {
 		...
-		cooling-min-state = <0>;
-		cooling-max-state = <9>;
+		cooling-min-level = <0>;
+		cooling-max-level = <9>;
 		#cooling-cells = <2>; /* min followed by max */
 	};
 };
-- 
cgit v1.2.3


From 9fa04fbeb78eab6f3817e444644348c46c7e2370 Mon Sep 17 00:00:00 2001
From: Punit Agrawal <punit.agrawal@arm.com>
Date: Tue, 8 Sep 2015 12:20:49 +0100
Subject: of: thermal: Mark cooling-*-level properties optional

The cooling-{min,max}-level properties are marked as optional in
Documentation/devicetree/bindings/cpufreq/cpufreq-dt.txt and the usage
in various device tree matches this, i.e., some cooling device in the
device trees provide these properties while others do not.

Make the bindings in
Documentation/devicetree/bindings/thermal/thermal.txt consistent with
the cpufreq-dt bindings by marking the cooling-*-level properties as
optional.

Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Cc: Eduardo Valentin <edubezval@gmail.com>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Ian Campbell <ijc+devicetree@hellion.org.uk>
Cc: Kumar Gala <galak@codeaurora.org>
Signed-off-by: Rob Herring <robh@kernel.org>
---
 Documentation/devicetree/bindings/thermal/thermal.txt | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/thermal/thermal.txt b/Documentation/devicetree/bindings/thermal/thermal.txt
index 8320186738ff..41b817f7b670 100644
--- a/Documentation/devicetree/bindings/thermal/thermal.txt
+++ b/Documentation/devicetree/bindings/thermal/thermal.txt
@@ -60,14 +60,6 @@ properties) should be defined in a particular device's binding.
 For more examples of cooling devices, refer to the example sections below.
 
 Required properties:
-- cooling-min-level:	An integer indicating the smallest
-  Type: unsigned	cooling state accepted. Typically 0.
-  Size: one cell
-
-- cooling-max-level:	An integer indicating the largest
-  Type: unsigned	cooling state accepted.
-  Size: one cell
-
 - #cooling-cells:	Used to provide cooling device specific information
   Type: unsigned	while referring to it. Must be at least 2, in order
   Size: one cell      	to specify minimum and maximum cooling state used
@@ -77,6 +69,15 @@ Required properties:
 			See Cooling device maps section below for more details
 			on how consumers refer to cooling devices.
 
+Optional properties:
+- cooling-min-level:	An integer indicating the smallest
+  Type: unsigned	cooling state accepted. Typically 0.
+  Size: one cell
+
+- cooling-max-level:	An integer indicating the largest
+  Type: unsigned	cooling state accepted.
+  Size: one cell
+
 * Trip points
 
 The trip node is a node to describe a point in the temperature domain
-- 
cgit v1.2.3


From 31b47ae3f18b31d86f553198e624b3b38f6397a2 Mon Sep 17 00:00:00 2001
From: Hans de Goede <hdegoede@redhat.com>
Date: Sat, 12 Sep 2015 12:52:49 +0200
Subject: devicetree: bindings: Extend the bma180 bindings with bma250 info

The bma180 / bma250 accelerometers share a driver (at least under Linux),
so it makes sense to also have their bindings info in a single .txt.

This commit extends the bma180 bindings with bma250 bindings, specifically
it specifies how the 2 seperate interrupts the bma250 has must be listed
in devicetree. The existing bma180 driver is already fully compatible
with the specified bindings.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Rob Herring <robh@kernel.org>
---
 Documentation/devicetree/bindings/iio/accel/bma180.txt | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/iio/accel/bma180.txt b/Documentation/devicetree/bindings/iio/accel/bma180.txt
index c5933573e0f6..4a3679d54457 100644
--- a/Documentation/devicetree/bindings/iio/accel/bma180.txt
+++ b/Documentation/devicetree/bindings/iio/accel/bma180.txt
@@ -1,10 +1,11 @@
-* Bosch BMA180 triaxial acceleration sensor
+* Bosch BMA180 / BMA250 triaxial acceleration sensor
 
 http://omapworld.com/BMA180_111_1002839.pdf
+http://ae-bst.resource.bosch.com/media/products/dokumente/bma250/bst-bma250-ds002-05.pdf
 
 Required properties:
 
-  - compatible : should be "bosch,bma180"
+  - compatible : should be "bosch,bma180" or "bosch,bma250"
   - reg : the I2C address of the sensor
 
 Optional properties:
@@ -13,6 +14,9 @@ Optional properties:
 
   - interrupts : interrupt mapping for GPIO IRQ, it should by configured with
 		flags IRQ_TYPE_LEVEL_HIGH | IRQ_TYPE_EDGE_RISING
+		For the bma250 the first interrupt listed must be the one
+		connected to the INT1 pin, the second (optional) interrupt
+		listed must be the one connected to the INT2 pin.
 
 Example:
 
-- 
cgit v1.2.3


From 562d897d15a6e2bab3cc9b4c172286b612834fe8 Mon Sep 17 00:00:00 2001
From: David Ahern <dsa@cumulusnetworks.com>
Date: Tue, 15 Sep 2015 10:50:14 -0600
Subject: net: Add documentation for VRF device

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 Documentation/networking/vrf.txt | 96 ++++++++++++++++++++++++++++++++++++++++
 MAINTAINERS                      |  1 +
 2 files changed, 97 insertions(+)
 create mode 100644 Documentation/networking/vrf.txt

(limited to 'Documentation')

diff --git a/Documentation/networking/vrf.txt b/Documentation/networking/vrf.txt
new file mode 100644
index 000000000000..031ef4a63485
--- /dev/null
+++ b/Documentation/networking/vrf.txt
@@ -0,0 +1,96 @@
+Virtual Routing and Forwarding (VRF)
+====================================
+The VRF device combined with ip rules provides the ability to create virtual
+routing and forwarding domains (aka VRFs, VRF-lite to be specific) in the
+Linux network stack. One use case is the multi-tenancy problem where each
+tenant has their own unique routing tables and in the very least need
+different default gateways.
+
+Processes can be "VRF aware" by binding a socket to the VRF device. Packets
+through the socket then use the routing table associated with the VRF
+device. An important feature of the VRF device implementation is that it
+impacts only Layer 3 and above so L2 tools (e.g., LLDP) are not affected
+(ie., they do not need to be run in each VRF). The design also allows
+the use of higher priority ip rules (Policy Based Routing, PBR) to take
+precedence over the VRF device rules directing specific traffic as desired.
+
+In addition, VRF devices allow VRFs to be nested within namespaces. For
+example network namespaces provide separation of network interfaces at L1
+(Layer 1 separation), VLANs on the interfaces within a namespace provide
+L2 separation and then VRF devices provide L3 separation.
+
+Design
+------
+A VRF device is created with an associated route table. Network interfaces
+are then enslaved to a VRF device:
+
+         +-----------------------------+
+         |           vrf-blue          |  ===> route table 10
+         +-----------------------------+
+            |        |            |
+         +------+ +------+     +-------------+
+         | eth1 | | eth2 | ... |    bond1    |
+         +------+ +------+     +-------------+
+                                  |       |
+                              +------+ +------+
+                              | eth8 | | eth9 |
+                              +------+ +------+
+
+Packets received on an enslaved device and are switched to the VRF device
+using an rx_handler which gives the impression that packets flow through
+the VRF device. Similarly on egress routing rules are used to send packets
+to the VRF device driver before getting sent out the actual interface. This
+allows tcpdump on a VRF device to capture all packets into and out of the
+VRF as a whole.[1] Similiarly, netfilter [2] and tc rules can be applied
+using the VRF device to specify rules that apply to the VRF domain as a whole.
+
+[1] Packets in the forwarded state do not flow through the device, so those
+    packets are not seen by tcpdump. Will revisit this limitation in a
+    future release.
+
+[2] Iptables on ingress is limited to NF_INET_PRE_ROUTING only with skb->dev
+    set to real ingress device and egress is limited to NF_INET_POST_ROUTING.
+    Will revisit this limitation in a future release.
+
+
+Setup
+-----
+1. VRF device is created with an association to a FIB table.
+   e.g, ip link add vrf-blue type vrf table 10
+        ip link set dev vrf-blue up
+
+2. Rules are added that send lookups to the associated FIB table when the
+   iif or oif is the VRF device. e.g.,
+       ip ru add oif vrf-blue table 10
+       ip ru add iif vrf-blue table 10
+
+   Set the default route for the table (and hence default route for the VRF).
+   e.g, ip route add table 10 prohibit default
+
+3. Enslave L3 interfaces to a VRF device.
+   e.g,  ip link set dev eth1 master vrf-blue
+
+   Local and connected routes for enslaved devices are automatically moved to
+   the table associated with VRF device. Any additional routes depending on
+   the enslaved device will need to be reinserted following the enslavement.
+
+4. Additional VRF routes are added to associated table.
+   e.g., ip route add table 10 ...
+
+
+Applications
+------------
+Applications that are to work within a VRF need to bind their socket to the
+VRF device:
+
+    setsockopt(sd, SOL_SOCKET, SO_BINDTODEVICE, dev, strlen(dev)+1);
+
+or to specify the output device using cmsg and IP_PKTINFO.
+
+
+Limitations
+-----------
+VRF device currently only works for IPv4. Support for IPv6 is under development.
+
+Index of original ingress interface is not available via cmsg. Will address
+soon.
diff --git a/MAINTAINERS b/MAINTAINERS
index 310da4295c70..d4d9d4f6d271 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -11243,6 +11243,7 @@ L:	netdev@vger.kernel.org
 S:	Maintained
 F:	drivers/net/vrf.c
 F:	include/net/vrf.h
+F:	Documentation/networking/vrf.txt
 
 VT1211 HARDWARE MONITOR DRIVER
 M:	Juerg Haefliger <juergh@gmail.com>
-- 
cgit v1.2.3


From 2e64126bb0fb581b55e52e51ec451ebb0d6769c8 Mon Sep 17 00:00:00 2001
From: Phil Sutter <phil@nwl.cc>
Date: Tue, 15 Sep 2015 10:33:07 +0200
Subject: net: qdisc: enhance default_qdisc documentation

Aside from some lingual cleanup, point out which interfaces are not or
partly covered by this setting.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Acked-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 Documentation/sysctl/net.txt | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/sysctl/net.txt b/Documentation/sysctl/net.txt
index 6294b5186ae5..809ab6efcc74 100644
--- a/Documentation/sysctl/net.txt
+++ b/Documentation/sysctl/net.txt
@@ -54,13 +54,15 @@ default_qdisc
 --------------
 
 The default queuing discipline to use for network devices. This allows
-overriding the default queue discipline of pfifo_fast with an
-alternative. Since the default queuing discipline is created with the
-no additional parameters so is best suited to queuing disciplines that
-work well without configuration like stochastic fair queue (sfq),
-CoDel (codel) or fair queue CoDel (fq_codel). Don't use queuing disciplines
-like Hierarchical Token Bucket or Deficit Round Robin which require setting
-up classes and bandwidths.
+overriding the default of pfifo_fast with an alternative. Since the default
+queuing discipline is created without additional parameters so is best suited
+to queuing disciplines that work well without configuration like stochastic
+fair queue (sfq), CoDel (codel) or fair queue CoDel (fq_codel). Don't use
+queuing disciplines like Hierarchical Token Bucket or Deficit Round Robin
+which require setting up classes and bandwidths. Note that physical multiqueue
+interfaces still use mq as root qdisc, which in turn uses this default for its
+leaves. Virtual devices (like e.g. lo or veth) ignore this setting and instead
+default to noqueue.
 Default: pfifo_fast
 
 busy_read
-- 
cgit v1.2.3


From f15a66e68422ca6bb783142780ad440067f6cc89 Mon Sep 17 00:00:00 2001
From: Lukas Wunner <lukas@wunner.de>
Date: Sat, 5 Sep 2015 11:22:39 +0200
Subject: drm: Spell vga_switcheroo consistently

Currently everyone and their dog has their own favourite spelling
for vga_switcheroo. This makes it hard to grep dmesg for log entries
relating to vga_switcheroo. It also makes it hard to find related
source files in the tree.

vga_switcheroo.c uses pr_fmt "vga_switcheroo". Use that everywhere.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 Documentation/DocBook/drm.tmpl     | 2 +-
 drivers/gpu/drm/omapdrm/omap_drv.c | 2 +-
 include/linux/fb.h                 | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/DocBook/drm.tmpl b/Documentation/DocBook/drm.tmpl
index 9ddf8c6cb887..30401f927156 100644
--- a/Documentation/DocBook/drm.tmpl
+++ b/Documentation/DocBook/drm.tmpl
@@ -3646,7 +3646,7 @@ void (*postclose) (struct drm_device *, struct drm_file *);</synopsis>
 	plane properties to default value, so that a subsequent open of the
 	device will not inherit state from the previous user. It can also be
 	used to execute delayed power switching state changes, e.g. in
-	conjunction with the vga-switcheroo infrastructure. Beyond that KMS
+	conjunction with the vga_switcheroo infrastructure. Beyond that KMS
 	drivers should not do any further cleanup. Only legacy UMS drivers might
 	need to clean up device state so that the vga console or an independent
 	fbdev driver could take over.
diff --git a/drivers/gpu/drm/omapdrm/omap_drv.c b/drivers/gpu/drm/omapdrm/omap_drv.c
index a5f9d8bf75ed..d685e23449ce 100644
--- a/drivers/gpu/drm/omapdrm/omap_drv.c
+++ b/drivers/gpu/drm/omapdrm/omap_drv.c
@@ -753,7 +753,7 @@ static void dev_lastclose(struct drm_device *dev)
 {
 	int i;
 
-	/* we don't support vga-switcheroo.. so just make sure the fbdev
+	/* we don't support vga_switcheroo.. so just make sure the fbdev
 	 * mode is active
 	 */
 	struct omap_drm_private *priv = dev->dev_private;
diff --git a/include/linux/fb.h b/include/linux/fb.h
index bc9afa74ee11..be40dbaed11e 100644
--- a/include/linux/fb.h
+++ b/include/linux/fb.h
@@ -156,7 +156,7 @@ struct fb_cursor_user {
 #define FB_EVENT_GET_REQ                0x0D
 /*      Unbind from the console if possible */
 #define FB_EVENT_FB_UNBIND              0x0E
-/*      CONSOLE-SPECIFIC: remap all consoles to new fb - for vga switcheroo */
+/*      CONSOLE-SPECIFIC: remap all consoles to new fb - for vga_switcheroo */
 #define FB_EVENT_REMAP_ALL_CONSOLE      0x0F
 /*      A hardware display blank early change occured */
 #define FB_EARLY_EVENT_BLANK		0x10
-- 
cgit v1.2.3


From de24c18c0faca5ebd618e1cb87f5489745e40475 Mon Sep 17 00:00:00 2001
From: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Date: Sat, 12 Sep 2015 02:06:09 +0300
Subject: PCI: rcar: Add R8A7794 support

Add Renesas R8A7794 SoC support to the Renesas R-Car gen2 PCI driver.

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: Bjorn Helgaas <helgaas@kernel.org>
Acked-by: Simon Horman <horms+renesas@verge.net.au>
---
 Documentation/devicetree/bindings/pci/pci-rcar-gen2.txt | 3 ++-
 drivers/pci/host/pci-rcar-gen2.c                        | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/pci/pci-rcar-gen2.txt b/Documentation/devicetree/bindings/pci/pci-rcar-gen2.txt
index d8ef5bf50f11..7fab84b33531 100644
--- a/Documentation/devicetree/bindings/pci/pci-rcar-gen2.txt
+++ b/Documentation/devicetree/bindings/pci/pci-rcar-gen2.txt
@@ -7,7 +7,8 @@ OHCI and EHCI controllers.
 
 Required properties:
 - compatible: "renesas,pci-r8a7790" for the R8A7790 SoC;
-	      "renesas,pci-r8a7791" for the R8A7791 SoC.
+	      "renesas,pci-r8a7791" for the R8A7791 SoC;
+	      "renesas,pci-r8a7794" for the R8A7794 SoC.
 - reg:	A list of physical regions to access the device: the first is
 	the operational registers for the OHCI/EHCI controllers and the
 	second is for the bridge configuration and control registers.
diff --git a/drivers/pci/host/pci-rcar-gen2.c b/drivers/pci/host/pci-rcar-gen2.c
index 367e28fa7564..c4f64bfee551 100644
--- a/drivers/pci/host/pci-rcar-gen2.c
+++ b/drivers/pci/host/pci-rcar-gen2.c
@@ -362,6 +362,7 @@ static int rcar_pci_probe(struct platform_device *pdev)
 static struct of_device_id rcar_pci_of_match[] = {
 	{ .compatible = "renesas,pci-r8a7790", },
 	{ .compatible = "renesas,pci-r8a7791", },
+	{ .compatible = "renesas,pci-r8a7794", },
 	{ },
 };
 
-- 
cgit v1.2.3


From e7ae65ced7dd71aa3dc29bda7a94ac82d9ea7751 Mon Sep 17 00:00:00 2001
From: Javier Martinez Canillas <javier@osg.samsung.com>
Date: Mon, 21 Sep 2015 14:57:25 +0200
Subject: gpio: mention in DT binding doc that <name>-gpio is deprecated

The gpiolib supports parsing DT properties of the form <name>-gpio but it
was only added for compatibility with older DT bindings that got it wrong
and should not be used in newer bindings.

The commit that added support for this was:

dd34c37aa3e8 ("gpio: of: Allow -gpio suffix for property names")

but didn't update the documentation to explain this so it's been a source
of confusion. So let's make this clear in the GPIO DT binding doc.

Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com>
Signed-off-by: Rob Herring <robh@kernel.org>
---
 Documentation/devicetree/bindings/gpio/gpio.txt | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/gpio/gpio.txt b/Documentation/devicetree/bindings/gpio/gpio.txt
index 5788d5cf1252..82d40e2505f6 100644
--- a/Documentation/devicetree/bindings/gpio/gpio.txt
+++ b/Documentation/devicetree/bindings/gpio/gpio.txt
@@ -16,7 +16,9 @@ properties, each containing a 'gpio-list':
 GPIO properties should be named "[<name>-]gpios", with <name> being the purpose
 of this GPIO for the device. While a non-existent <name> is considered valid
 for compatibility reasons (resolving to the "gpios" property), it is not allowed
-for new bindings.
+for new bindings. Also, GPIO properties named "[<name>-]gpio" are valid and old
+bindings use it, but are only supported for compatibility reasons and should not
+be used for newer bindings since it has been deprecated.
 
 GPIO properties can contain one or more GPIO phandles, but only in exceptional
 cases should they contain more than one. If your device uses several GPIOs with
-- 
cgit v1.2.3


From a13f18f59d2646754cda3662a9e215ff43e7a7d5 Mon Sep 17 00:00:00 2001
From: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Date: Thu, 24 Sep 2015 15:53:56 +0100
Subject: Documentation: arm: Fix typo in the idle-states bindings examples

The idle-states bindings mandate that the entry-method string
in the idle-states node must be "psci" for ARM v8 64-bit systems,
but the examples in the bindings report a wrong entry-method string.
Owing to this typo, some dts in the kernel wrongly defined the
entry-method property, since they likely cut and pasted the example
definition without paying attention to the bindings definitions.

This patch fixes the typo in the DT idle states bindings examples and
respective dts in the kernel so that the bindings and related dts
files are made compliant.

Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Howard Chen <howard.chen@linaro.org>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Rob Herring <robh@kernel.org>
---
 Documentation/devicetree/bindings/arm/idle-states.txt | 2 +-
 arch/arm64/boot/dts/mediatek/mt8173.dtsi              | 2 +-
 arch/arm64/boot/dts/rockchip/rk3368.dtsi              | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/devicetree/bindings/arm/idle-states.txt b/Documentation/devicetree/bindings/arm/idle-states.txt
index a8274eabae2e..b8e41c148a3c 100644
--- a/Documentation/devicetree/bindings/arm/idle-states.txt
+++ b/Documentation/devicetree/bindings/arm/idle-states.txt
@@ -497,7 +497,7 @@ cpus {
 	};
 
 	idle-states {
-		entry-method = "arm,psci";
+		entry-method = "psci";
 
 		CPU_RETENTION_0_0: cpu-retention-0-0 {
 			compatible = "arm,idle-state";
diff --git a/arch/arm64/boot/dts/mediatek/mt8173.dtsi b/arch/arm64/boot/dts/mediatek/mt8173.dtsi
index d18ee4259ee5..06a15644be38 100644
--- a/arch/arm64/boot/dts/mediatek/mt8173.dtsi
+++ b/arch/arm64/boot/dts/mediatek/mt8173.dtsi
@@ -81,7 +81,7 @@
 		};
 
 		idle-states {
-			entry-method = "arm,psci";
+			entry-method = "psci";
 
 			CPU_SLEEP_0: cpu-sleep-0 {
 				compatible = "arm,idle-state";
diff --git a/arch/arm64/boot/dts/rockchip/rk3368.dtsi b/arch/arm64/boot/dts/rockchip/rk3368.dtsi
index a712bea3bf2c..cc093a482aa4 100644
--- a/arch/arm64/boot/dts/rockchip/rk3368.dtsi
+++ b/arch/arm64/boot/dts/rockchip/rk3368.dtsi
@@ -106,7 +106,7 @@
 		};
 
 		idle-states {
-			entry-method = "arm,psci";
+			entry-method = "psci";
 
 			cpu_sleep: cpu-sleep-0 {
 				compatible = "arm,idle-state";
-- 
cgit v1.2.3