This commit ensures strict compliance with OpenCL 2.0 floating-point
conversion specifications:
* **RISCVISelLowering.cpp**:
- Map FRINT to RNE (round-to-nearest-even) instead of DYN (dynamic)
- Add proper FRM save/restore for vector floating-point operations
* **VentusInstrInfoV.td**:
- Enable VFCVT_RTZ_* instructions for truncation-based conversions
- Use RTZ (round-to-zero) mode for fp-to-int conversions to match
OpenCL spec
- Replace dynamic rounding with explicit RTZ for integer conversions
* **gen_convert.py**:
- Improve saturation handling in type conversions
- Add proper edge case handling for integer source saturation
- Distinguish between integer and float source conversion logic
* **Test Updates**:
- Update float.ll to expect RTZ instructions for fp-to-int conversions
- Add fround.ll test cases for ceil/floor/rint operations
These changes ensure that the Ventus GPGPU backend produces OpenCL 2.0 compliant
floating-point conversion behavior, particularly for rounding modes and
saturation handling.
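
As a rough illustration of the behavior targeted here, the sketch below models the three cases in plain Python: rint with round-to-nearest-even, float-to-integer conversion with round-to-zero plus saturation, and saturation from an integer source. The function names are hypothetical and are not taken from gen_convert.py or the backend. Mapping NaN to 0 in the saturated float case follows my reading of the OpenCL C conversion rules and should be checked against the spec.

```python
import math

def int_range(bits: int, signed: bool) -> tuple[int, int]:
    """Return (min, max) of an integer type with the given width."""
    if signed:
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return 0, (1 << bits) - 1

def rint_rne(x: float) -> float:
    """rint modelled as round-to-nearest, ties-to-even (RNE)."""
    if math.isnan(x) or math.isinf(x):
        return x
    # Python's round() breaks ties to even; copysign keeps -0.0.
    return math.copysign(float(round(x)), x)

def convert_float_sat_rtz(x: float, bits: int, signed: bool) -> int:
    """Float source: truncate toward zero (RTZ), then clamp to the range."""
    lo, hi = int_range(bits, signed)
    if math.isnan(x):
        return 0  # assumption: saturated conversion maps NaN to 0
    if math.isinf(x):
        return hi if x > 0 else lo
    return max(lo, min(hi, math.trunc(x)))

def convert_int_sat(v: int, bits: int, signed: bool) -> int:
    """Integer source: no rounding is involved, only clamping."""
    lo, hi = int_range(bits, signed)
    return max(lo, min(hi, v))
```

The float and integer paths are kept separate because only the float path needs a rounding step and special handling for NaN and infinities; an integer source only needs a range clamp.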
* Add the ventus-fix-mixed-phi pass and set isConvergent=1 in RVInst
* Add a test file for the ventus-fix-mixed-phi pass
---------
Co-authored-by: ivan lei <ivanlei0@163.com>
The FP_CONTRACT pragma can be used to allow (if the state is on) or disallow (if the state is off) the implementation to contract expressions.
The removed patterns previously matched ternary floating-point expressions
such as fadd(fmul(...)), enabling contraction via FMA even when not
explicitly allowed.
These patterns were incorrect because they ignored the current floating-point
contraction state (e.g., controlled by the FP_CONTRACT pragma).
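
To see why unconditional contraction matters, the sketch below compares an unfused multiply-add (two roundings) with a fused one (a single rounding) in double precision. The fused result is emulated with exact rational arithmetic rather than a hardware FMA, and the helper names and input values are chosen purely for illustration.

```python
from fractions import Fraction

def mul_add_unfused(a: float, b: float, c: float) -> float:
    """a*b is rounded to double first, then the sum is rounded again."""
    return a * b + c

def mul_add_fused(a: float, b: float, c: float) -> float:
    """Emulated FMA: compute a*b+c exactly, then round once to double."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))

# Illustrative inputs: the exact product is 1 - 2**-54, which is not
# representable in double and rounds to 1.0 in the unfused path.
a = 1.0 + 2.0**-27
b = 1.0 - 2.0**-27
c = -1.0

unfused = mul_add_unfused(a, b, c)  # 0.0
fused = mul_add_fused(a, b, c)      # -2**-54
```

Because the two forms can produce different results, selecting FMA for every fadd(fmul(...)) changes observable program behavior, which is why the choice has to follow the contraction state.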
Previously, the default memory access flag was 0b00, which caused every
instruction unrelated to local or private memory to return true when it
reached the `RISCVInstrInfo::isUniformMemoryAccess` logic.
1. If the move instruction needs to be moved forward, it will only be inserted after the last corresponding move instruction in the predecessor basic block.
2. The first instruction of the predecessor is also counted as a possible insertion point.
1. Add VMV_V_X in emitEpilogue.
2. Change all the positive numbers added by TP to negative numbers (in LowerCall).
3. Fix the LowerCall function to generate the correct store instructions for passing function parameters.
4. Fix hasReservedCallFrame function to return false.
5. Align the convention between caller and callee in the case of passing parameters by stack.
6. Change the stack offset calculation method of TP.
7. Unify the calculation of TP stack and SP stack offset.
8. Note that the SP offset calculation in workitem.S must be modified manually. Because the stack grows in a different direction from traditional RISC-V, it is now stipulated that, for both the SP stack and the TP stack, data is stored at the address the stack pointer holds, with no offset applied.
9. The eliminateFrameIndex function contains an SPAdj check, but this value is not needed at all, so a getSPAdjust function is added that returns zero.
10. V33 holds a wrong value when parameters are pushed to the TP stack, so an MV instruction is required to refresh V33 after ADJCALLSTACKDOWN.
Summary: Legalized vector parameters; FileCheck tests have not been added yet.
Test Plan: Legalized vector parameters
Differential Revision: http://www.tpt.com/D740