ANBZ: #461
cherry-picked from https://lore.kernel.org/linux-arm-kernel/20220122052618.1074-1-xueshuai@linux.alibaba.com/
From commit e147133a42 ("ACPI / APEI: Make hest.c manage the estatus
memory pool") was merged, ghes_init() relies on acpi_hest_init() to manage
the estatus memory pool. On the other hand, ghes_init() relies on
sdei_init() to detect the SDEI version and (un)register events. The
dependencies are as follows:
ghes_init() => acpi_hest_init() => acpi_bus_init() => acpi_init()
ghes_init() => sdei_init()
HEST is not PCI-specific and initcall ordering is implicit and not
well-defined within a level.
Based on above, remove acpi_hest_init() from acpi_pci_root_init() and
convert ghes_init() and sdei_init() from initcalls to explicit calls in the
following order:
acpi_hest_init()
ghes_init()
sdei_init()
Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: luanshi <zhangliguang@linux.alibaba.com>
ANBZ: #402
When hest acpi table configure Hardware Error Notification type as
Software Delegated Exception(0x0B) for RAS event, OS RAS interacts with
ATF by SDEI mechanism. On the firmware first system, OS was notified by
ATF sdei call.
The calling flow like as below when fatal RAS error happens:
ATF notify OS flow:
sdei_dispatch_event()
ehf_activate_priority()
call sdei callback // callback registered by OS
ehf_deactivate_priority()
OS sdei callback:
sdei_asm_handler()
__sdei_handler()
_sdei_handler()
sdei_event_handler()
ghes_sdei_critical_callback()
ghes_in_nmi_queue_one_entry()
/* if RAS error is fatal */
__ghes_panic()
panic()
If fatal RAS error occurred, panic was called in sdei_asm_handle()
without ehf_deactivate_priority executed, which lead interrupt masked.
If interrupt masked, system would be halted in kdump flow like this:
arm-smmu-v3 arm-smmu-v3.3.auto: allocated 65536 entries for cmdq
arm-smmu-v3 arm-smmu-v3.3.auto: allocated 32768 entries for evtq
arm-smmu-v3 arm-smmu-v3.3.auto: allocated 65536 entries for priq
arm-smmu-v3 arm-smmu-v3.3.auto: SMMU currently enabled! Resetting...
After debug, we found accurate halted position is:
arm_smmu_device_probe()
arm_smmu_device_reset()
arm_smmu_device_disable()
arm_smmu_write_reg_sync()
readl_relaxed_poll_timeout()
readx_poll_timeout()
read_poll_timeout()
usleep_range() // hrtimer is never waked.
So interrupt should be restored before panic otherwise kdump will trigger
error. In the process of sdei, a SDEI_EVENT_COMPLETE_AND_RESUME call
should be called before panic for a completed run of ehf_deactivate_priority().
Signed-off-by: Liguang Zhang <zhangliguang@linux.alibaba.com>
Signed-off-by: Ruidong Tian <tianruidong@@linux.alibaba.com>
Reviewed-by: Liguang Zhang <zhangliguang@linux.alibaba.com>
to #34631759
commit 340993c25c47b764db26536f405d94ca9116a720 openEuler
hulk inclusion
category: feature
bugzilla: 48046
CVE: NA
-------------------------------------------------------------------------
The period of the secure timer is set to 3s by BIOS. That means the
secure timer interrupt will trigger every 3 seconds. To further decrease
the NMI watchdog's effect on performance, this patch set the period of
the secure timer base on 'watchdog_thresh'. This variable is initiallized
to 10s. We can also set the period at runtime by modifying
'/proc/sys/kernel/watchdog_thresh'
Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Signed-off-by: Xin Hao <xhao@linux.alibaba.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
to #34631759
commit c6a7c646cb9a202ae609c7585a46217280e58114 openEuler
hulk inclusion
category: feature
bugzilla: 48046
CVE: NA
-------------------------------------------------------------------------
When we panic in hardlockup, the secure timer interrupt remains activate
because firmware clear eoi after dispatch is completed. This will cause
arm_arch_timer interrupt failed to trigger in the second kernel.
This patch add a new SMC helper to clear eoi of a certain interrupt and
clear eoi of the secure timer before booting the second kernel.
Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Signed-off-by: Xin Hao <xhao@linux.alibaba.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
to #34631759
commit 6f240e629b74d6d3a9f624df88b365b3c38a904a openEuler
hulk inclusion
category: feature
bugzilla: 48046
CVE: NA
-------------------------------------------------------------------------
NMI Watchdog need to enable the event for each core individually. But the
existing public api 'sdei_event_enable' enable events for all cores when
the event type is private.
Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Signed-off-by: Xin Hao <xhao@linux.alibaba.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
to #34631759
commit 11bed651d02d25319d31a3bf14668a0c3d7aed6f openEuler
hulk inclusion
category: feature
bugzilla: 48046
CVE: NA
-------------------------------------------------------------------------
This patch add a interrupt binding api function which returns the binded
event number.
Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Signed-off-by: Xin Hao <xhao@linux.alibaba.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Now that we have common definitions for SMCCC conduits, move the SDEI
code over to them, and remove the SDEI-specific definitions.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: James Morse <james.morse@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
If the GHES notification type is SDEI, register the provided event
using the SDEI-GHES helper.
SDEI may be one of two types of event, normal and critical. Critical
events can interrupt normal events, so these must have separate
fixmap slots and locks in case both event types are in use.
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
APEI's Generic Hardware Error Source structures do not describe
whether the SDEI event is shared or private, as this information is
discoverable via the API.
GHES needs to know whether an event is normal or critical to avoid
sharing locks or fixmap entries, but GHES shouldn't have to know about
the SDEI API.
Add a helper to register the GHES using the appropriate normal or
critical callback.
Signed-off-by: James Morse <james.morse@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The Software Delegated Exception Interface (SDEI) is an ARM standard
for registering callbacks from the platform firmware into the OS.
This is typically used to implement firmware notifications (such as
firmware-first RAS) or promote an IRQ that has been promoted to a
firmware-assisted NMI.
Add the code for detecting the SDEI version and the framework for
registering and unregistering events. Subsequent patches will add the
arch-specific backend code and the necessary power management hooks.
Only shared events are supported, power management, private events and
discovery for ACPI systems will be added by later patches.
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>