Commit Graph

10 Commits

Author SHA1 Message Date
Shuai Xue a9bc651453 ACPI: APEI: explicit init HEST and GHES in apci_init
ANBZ: #461

cherry-picked from https://lore.kernel.org/linux-arm-kernel/20220122052618.1074-1-xueshuai@linux.alibaba.com/

From commit e147133a42 ("ACPI / APEI: Make hest.c manage the estatus
memory pool") was merged, ghes_init() relies on acpi_hest_init() to manage
the estatus memory pool. On the other hand, ghes_init() relies on
sdei_init() to detect the SDEI version and (un)register events. The
dependencies are as follows:

    ghes_init() => acpi_hest_init() => acpi_bus_init() => acpi_init()
    ghes_init() => sdei_init()

HEST is not PCI-specific and initcall ordering is implicit and not
well-defined within a level.

Based on above, remove acpi_hest_init() from acpi_pci_root_init() and
convert ghes_init() and sdei_init() from initcalls to explicit calls in the
following order:

    acpi_hest_init()
    ghes_init()
        sdei_init()

Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: luanshi <zhangliguang@linux.alibaba.com>
2022-08-01 12:22:20 +00:00
Liguang Zhang d412797466 anolis: ACPI / APEI: restore interrupt before panic in sdei flow
ANBZ: #402

When hest acpi table configure Hardware Error Notification type as
Software Delegated Exception(0x0B) for RAS event, OS RAS interacts with
ATF by SDEI mechanism. On the firmware first system, OS was notified by
ATF sdei call.

The calling flow like as below when fatal RAS error happens:

ATF notify OS flow:
  sdei_dispatch_event()
    ehf_activate_priority()
      call sdei callback  // callback registered by OS
    ehf_deactivate_priority()

OS sdei callback:
  sdei_asm_handler()
    __sdei_handler()
      _sdei_handler()
        sdei_event_handler()
          ghes_sdei_critical_callback()
            ghes_in_nmi_queue_one_entry()
              /* if RAS error is fatal */
              __ghes_panic()
                panic()

If fatal RAS error occurred, panic was called in sdei_asm_handle()
without ehf_deactivate_priority executed, which lead interrupt masked.
If interrupt masked, system would be halted in kdump flow like this:

arm-smmu-v3 arm-smmu-v3.3.auto: allocated 65536 entries for cmdq
arm-smmu-v3 arm-smmu-v3.3.auto: allocated 32768 entries for evtq
arm-smmu-v3 arm-smmu-v3.3.auto: allocated 65536 entries for priq
arm-smmu-v3 arm-smmu-v3.3.auto: SMMU currently enabled! Resetting...

After debug, we found accurate halted position is:
arm_smmu_device_probe()
  arm_smmu_device_reset()
    arm_smmu_device_disable()
      arm_smmu_write_reg_sync()
        readl_relaxed_poll_timeout()
          readx_poll_timeout()
            read_poll_timeout()
              usleep_range() // hrtimer is never waked.

So interrupt should be restored before panic otherwise kdump will trigger
error. In the process of sdei, a SDEI_EVENT_COMPLETE_AND_RESUME call
should be called before panic for a completed run of ehf_deactivate_priority().

Signed-off-by: Liguang Zhang <zhangliguang@linux.alibaba.com>
Signed-off-by: Ruidong Tian <tianruidong@@linux.alibaba.com>
Reviewed-by: Liguang Zhang <zhangliguang@linux.alibaba.com>
2022-08-01 12:16:39 +00:00
Xiongfeng Wang b5059f88ec openEuler: sdei_watchdog: set secure timer period base on 'watchdog_thresh'
to #34631759

commit 340993c25c47b764db26536f405d94ca9116a720 openEuler

hulk inclusion
category: feature
bugzilla: 48046
CVE: NA

-------------------------------------------------------------------------

The period of the secure timer is set to 3s by BIOS. That means the
secure timer interrupt will trigger every 3 seconds. To further decrease
the NMI watchdog's effect on performance, this patch set the period of
the secure timer base on 'watchdog_thresh'. This variable is initiallized
to 10s. We can also set the period at runtime by modifying
'/proc/sys/kernel/watchdog_thresh'

Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Signed-off-by: Xin Hao <xhao@linux.alibaba.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
2022-08-01 11:30:06 +00:00
Xiongfeng Wang 1df5771bed openEuler: sdei_watchdog: clear EOI of the secure timer before kdump
to #34631759

commit c6a7c646cb9a202ae609c7585a46217280e58114 openEuler

hulk inclusion
category: feature
bugzilla: 48046
CVE: NA

-------------------------------------------------------------------------

When we panic in hardlockup, the secure timer interrupt remains activate
because firmware clear eoi after dispatch is completed. This will cause
arm_arch_timer interrupt failed to trigger in the second kernel.

This patch add a new SMC helper to clear eoi of a certain interrupt and
clear eoi of the secure timer before booting the second kernel.

Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Signed-off-by: Xin Hao <xhao@linux.alibaba.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
2022-08-01 11:30:05 +00:00
Xiongfeng Wang e2b1730705 openEuler: firmware: arm_sdei: make 'sdei_api_event_disable/enable' public
to #34631759

commit 6f240e629b74d6d3a9f624df88b365b3c38a904a openEuler

hulk inclusion
category: feature
bugzilla: 48046
CVE: NA

-------------------------------------------------------------------------

NMI Watchdog need to enable the event for each core individually. But the
existing public api 'sdei_event_enable' enable events for all cores when
the event type is private.

Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Signed-off-by: Xin Hao <xhao@linux.alibaba.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
2022-08-01 11:30:05 +00:00
Xiongfeng Wang c044f339ee openEuler: firmware: arm_sdei: add interrupt binding api
to #34631759

commit 11bed651d02d25319d31a3bf14668a0c3d7aed6f openEuler

hulk inclusion
category: feature
bugzilla: 48046
CVE: NA

-------------------------------------------------------------------------

This patch add a interrupt binding api function which returns the binded
event number.

Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Signed-off-by: Xin Hao <xhao@linux.alibaba.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
2022-08-01 11:30:04 +00:00
Mark Rutland e6ea46511b firmware: arm_sdei: use common SMCCC_CONDUIT_*
Now that we have common definitions for SMCCC conduits, move the SDEI
code over to them, and remove the SDEI-specific definitions.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: James Morse <james.morse@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2019-10-14 10:55:14 +01:00
James Morse f9f05395f3 ACPI / APEI: Add support for the SDEI GHES Notification type
If the GHES notification type is SDEI, register the provided event
using the SDEI-GHES helper.

SDEI may be one of two types of event, normal and critical. Critical
events can interrupt normal events, so these must have separate
fixmap slots and locks in case both event types are in use.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-02-11 11:07:49 +01:00
James Morse f96935d3bc firmware: arm_sdei: Add ACPI GHES registration helper
APEI's Generic Hardware Error Source structures do not describe
whether the SDEI event is shared or private, as this information is
discoverable via the API.

GHES needs to know whether an event is normal or critical to avoid
sharing locks or fixmap entries, but GHES shouldn't have to know about
the SDEI API.

Add a helper to register the GHES using the appropriate normal or
critical callback.

Signed-off-by: James Morse <james.morse@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-02-11 11:07:49 +01:00
James Morse ad6eb31ef9 firmware: arm_sdei: Add driver for Software Delegated Exceptions
The Software Delegated Exception Interface (SDEI) is an ARM standard
for registering callbacks from the platform firmware into the OS.
This is typically used to implement firmware notifications (such as
firmware-first RAS) or promote an IRQ that has been promoted to a
firmware-assisted NMI.

Add the code for detecting the SDEI version and the framework for
registering and unregistering events. Subsequent patches will add the
arch-specific backend code and the necessary power management hooks.

Only shared events are supported, power management, private events and
discovery for ACPI systems will be added by later patches.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-01-13 10:44:56 +00:00