Overview

Source code: /xiangshan/frontend/SC.scala

SC is implemented as a trait. TAGE mixes in this trait (Scala's trait-based form of multiple inheritance) to enable the SC functionality.

Parameters

// SCNTables = 4
// SCNRows = 512
// SCCtrBits = 6
// SCHistLens = Seq(0, 4, 10, 16)
val SCTableInfos = Seq.fill(SCNTables)((SCNRows, SCCtrBits)) zip SCHistLens map {
  case ((n, cb), h) => (n, cb, h)
}
/*
  SCTableInfos = Seq(   // (Rows, Ctr, Hist)
    (512, 6,  0),
    (512, 6,  4),
    (512, 6, 10),
    (512, 6, 16))
*/

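SCNTables: number of SC tables
SCNRows: number of SRAM rows in an SC table, counted for both branches (br) combined
SCCtrBits: width of each SC counter, in bits
SCHistLens: global-history length used by each SC table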
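As a quick check of the parameter expression, the sketch below evaluates SCTableInfos in plain Scala, without Chisel. The object name ScParamsModel and numBr = 2 are assumptions for illustration; rowsPerTable assumes each SCTable receives SCNRows / numBr sets, which is what the 256-entry figure in the Folded History section implies.

```scala
// Hypothetical plain-Scala evaluation of the SC parameters (not XiangShan code).
object ScParamsModel {
  val SCNTables = 4
  val SCNRows = 512
  val SCCtrBits = 6
  val SCHistLens = Seq(0, 4, 10, 16)
  val numBr = 2 // assumed: two branches per prediction block

  // Same expression as the source: pair (rows, ctrBits) with each history length.
  val SCTableInfos: Seq[(Int, Int, Int)] =
    Seq.fill(SCNTables)((SCNRows, SCCtrBits)) zip SCHistLens map {
      case ((n, cb), h) => (n, cb, h)
    }

  // Assumed: SCNRows counts both branches, so each table has SCNRows / numBr sets.
  def rowsPerTable: Int = SCNRows / numBr
}
```

Under these assumptions each table has 256 sets, so an index is 8 bits wide.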

SC

Overall Architecture

SCTable

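SCTable is the module that implements each of SC's 4 tables. Note that XiangShan's SC tables use a variant design: each table holds the information for the 2 branches (br) predicted per cycle, and each br has two ctr values, one used when TAGE predicts taken and one used when TAGE predicts not taken. In other words, resp exposes both counters, but only the ctr matching the branch's TAGE prediction is ultimately used, and update likewise writes only the ctr matching TAGE's prediction.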

IO

class SCTableIO(val ctrBits: Int = 6)(implicit p: Parameters) extends SCBundle {
  val req = Input(Valid(new SCReq))
  val resp = Output(new SCResp(ctrBits))
  val update = Input(new SCUpdate(ctrBits))
}

class SCReq(implicit p: Parameters) extends TageReq
class TageReq(implicit p: Parameters) extends TageBundle {
  val pc = UInt(VAddrBits.W)
  val ghist = UInt(HistoryLength.W)
  val folded_hist = new AllFoldedHistories(foldedGHistInfos)
}

class SCResp(val ctrBits: Int = 6)(implicit p: Parameters) extends SCBundle {
  val ctrs = Vec(numBr, Vec(2, SInt(ctrBits.W)))
}

class SCUpdate(val ctrBits: Int = 6)(implicit p: Parameters) extends SCBundle {
  val pc = UInt(VAddrBits.W)
  val folded_hist = new AllFoldedHistories(foldedGHistInfos)
  val mask = Vec(numBr, Bool())
  val oldCtrs = Vec(numBr, SInt(ctrBits.W))
  val tagePreds = Vec(numBr, Bool())
  val takens = Vec(numBr, Bool())
}

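req: the request input of the SC module, i.e. a TageReq, containing the pc, the global branch history ghist, and the folded histories folded_hist

resp: the output of the SC module, containing two SC counters per br (one for each possible TAGE prediction)

update: the update input of the SC module; besides the pc and folded_hist used to recompute the index, it carries per br whether to update (mask), the old counter value (oldCtrs), TAGE's prediction at the time (tagePreds), and the actual outcome (takens)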

SRAM

val table = Module(new SRAMTemplate(SInt(ctrBits.W), set=nRows, way=2*TageBanks, shouldReset=true, holdRead=true, singlePort=false))

Folded History

SC consists of 4 tables, each with 256 entries; each entry has 4 ways, holding the 2 ctrs of each of the 2 branches.

val idxFhInfo = (histLen, min(log2Ceil(nRows), histLen))

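Folded history is managed the same way as in TAGE, so it is not covered again here. idxFhInfo is a (history length, folded width) pair: the newest histLen history bits are folded down to min(log2Ceil(nRows), histLen) bits, i.e. the index width, or fewer when the history itself is shorter.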

def getIdx(pc: UInt, allFh: AllFoldedHistories) = {
  if (histLen > 0) {
    val idx_fh = allFh.getHistWithInfo(idxFhInfo).folded_hist
    // require(idx_fh.getWidth == log2Ceil(nRows))
    ((pc >> instOffsetBits) ^ idx_fh)(log2Ceil(nRows)-1,0)
  }
  else {
    (pc >> instOffsetBits)(log2Ceil(nRows)-1,0)
  }
}

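The folded history idx_fh is XORed with the pc (shifted right by instOffsetBits) to form each table's index; a table that uses no history (histLen = 0) is indexed directly by the low pc bits.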
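The index computation can be modeled in software as below. This is a hypothetical sketch, not XiangShan code: it folds history by XOR-ing consecutive width-bit chunks of the newest histLen bits, which is assumed to match what the hardware's incrementally maintained folded history produces, and it assumes instOffsetBits = 1 (2-byte instruction alignment).

```scala
// Hypothetical software model of SCTable.getIdx (not XiangShan code).
object ScIndexModel {
  // XOR-fold the newest `histLen` bits of `ghist` into `width` bits.
  def fold(ghist: BigInt, histLen: Int, width: Int): BigInt = {
    val mask = (BigInt(1) << width) - 1
    var h = ghist & ((BigInt(1) << histLen) - 1) // keep only the newest histLen bits
    var acc = BigInt(0)
    while (h.signum != 0) {
      acc ^= h & mask // XOR in the next width-bit chunk
      h >>= width
    }
    acc
  }

  // Smallest k with 2^k >= x, like Chisel's log2Ceil for x >= 1.
  def log2Ceil(x: Int): Int = 32 - Integer.numberOfLeadingZeros(x - 1)

  def getIdx(pc: BigInt, ghist: BigInt, histLen: Int, nRows: Int,
             instOffsetBits: Int = 1): Int = {
    val idxBits = log2Ceil(nRows)
    val rowMask = (BigInt(1) << idxBits) - 1
    val pcPart = pc >> instOffsetBits
    val idx =
      if (histLen > 0) pcPart ^ fold(ghist, histLen, math.min(idxBits, histLen))
      else pcPart // histLen = 0: low pc bits only
    (idx & rowMask).toInt
  }
}
```

For example, with nRows = 256 the table with histLen = 16 folds its 16 history bits into 8 bits before XOR-ing them with the pc.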

Resp

The pc from req is hashed with the folded history to give s0_idx, the index used to read the SC table:

table.io.r.req.valid := io.req.valid
table.io.r.req.bits.setIdx := s0_idx

val per_br_ctrs_unshuffled = table.io.r.resp.data.sliding(2,2).toSeq.map(VecInit(_))
val per_br_ctrs = VecInit((0 until numBr).map(i => Mux1H(
  UIntToOH(get_phy_br_idx(s1_unhashed_idx, i), numBr),
  per_br_ctrs_unshuffled
)))

io.resp.ctrs := per_br_ctrs

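The 4 values read from the SRAM are split into two pairs with sliding(2, 2), one pair per physical branch slot; then, based on the pc (via get_phy_br_idx on s1_unhashed_idx), each pair is assigned to br0 or br1. This shuffling works the same way as in TAGE and is not detailed here.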
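A software model of this resp path is sketched below. Mux1H(UIntToOH(x), seq) simply selects seq(x), so it is modeled as plain indexing. The mapping in phyBrIdx (low index bits XOR logical branch index) is an assumption for illustration; the real get_phy_br_idx may differ.

```scala
// Hypothetical model of the SCTable resp shuffle (not XiangShan code).
object ScRespModel {
  // Assumed mapping: physical slot = (low log2(numBr) bits of the unhashed
  // index) XOR logical branch index. numBr must be a power of two.
  def phyBrIdx(unhashedIdx: Int, brLidx: Int, numBr: Int = 2): Int =
    (unhashedIdx & (numBr - 1)) ^ brLidx

  // ways: one SRAM set, ordered [slot0 ctr0, slot0 ctr1, slot1 ctr0, slot1 ctr1].
  def perBrCtrs(ways: Seq[Int], unhashedIdx: Int, numBr: Int = 2): Seq[Seq[Int]] = {
    val unshuffled = ways.sliding(2, 2).toSeq // one pair of ctrs per physical slot
    // Mux1H(UIntToOH(idx), unshuffled) is equivalent to unshuffled(idx).
    (0 until numBr).map(i => unshuffled(phyBrIdx(unhashedIdx, i, numBr)))
  }
}
```

Under this assumed mapping, an odd unhashed index swaps the two pairs, so the same logical branch can live in either physical slot.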

Update

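At update time, the old ctr values come from the IO (oldCtrs), and TAGE's prediction at the time (tagePreds) selects which of the two ctrs is updated. Updates to the SC tables also use the same bypass mechanism as TAGE, which is not repeated here.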
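A minimal sketch of a single-branch table update follows, under two stated assumptions: the new counter is a signed saturating step toward the actual outcome (takens), and the written way is 2 * slot + tagePred, mirroring the read layout. The bypass mechanism is not modeled, and all names here are hypothetical.

```scala
// Hypothetical model of one SC table update for one branch (not XiangShan code).
object ScUpdateModel {
  // Assumed signed saturating update of a ctrBits-wide SInt counter.
  def signedSatUpdate(old: Int, ctrBits: Int, taken: Boolean): Int = {
    val max = (1 << (ctrBits - 1)) - 1 // 31 for 6 bits
    val min = -(1 << (ctrBits - 1))    // -32 for 6 bits
    if (taken) math.min(old + 1, max) else math.max(old - 1, min)
  }

  // Only the column TAGE predicted is written: way = 2 * slot + (TAGE taken ? 1 : 0).
  def wayToWrite(phySlot: Int, tagePred: Boolean): Int =
    2 * phySlot + (if (tagePred) 1 else 0)
}
```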

SC Top

Threshold

SC's threshold follows the adaptive threshold scheme used in the GEHL predictor:

val scThresholds = List.fill(TageBanks)(RegInit(SCThreshold(5)))
val useThresholds = VecInit(scThresholds map (_.thres))

class SCThreshold(val ctrBits: Int = 6)(implicit p: Parameters) extends SCBundle {
  val ctr = UInt(ctrBits.W)
  def satPos(ctr: UInt = this.ctr) = ctr === ((1.U << ctrBits) - 1.U)
  def satNeg(ctr: UInt = this.ctr) = ctr === 0.U
  def neutralVal = (1.U << (ctrBits - 1))
  val thres = UInt(8.W)
  def initVal = 6.U
  def minThres = 6.U
  def maxThres = 31.U
  def update(cause: Bool): SCThreshold = {
    val res = Wire(new SCThreshold(this.ctrBits))
    val newCtr = satUpdate(this.ctr, this.ctrBits, cause)
    val newThres = Mux(res.satPos(newCtr) && this.thres <= maxThres, this.thres + 2.U,
                      Mux(res.satNeg(newCtr) && this.thres >= minThres, this.thres - 2.U,
                      this.thres))
    res.thres := newThres
    res.ctr := Mux(res.satPos(newCtr) || res.satNeg(newCtr), res.neutralVal, newCtr)
    // XSDebug(true.B, p"scThres Update: cause${cause} newCtr ${newCtr} newThres ${newThres}\n")
    res
  }
}

object SCThreshold {
  def apply(bits: Int)(implicit p: Parameters) = {
    val t = Wire(new SCThreshold(ctrBits=bits))
    t.ctr := t.neutralVal
    t.thres := t.initVal
    t
  }
}

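A 5-bit counter ctr (initialized to the neutral value 16) controls updates to the 8-bit unsigned threshold thres (initialized to 6); one such threshold is kept per bank (TageBanks). The update procedure is described in the SC update section below.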
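The sketch below mirrors SCThreshold.update in plain Scala. It assumes satUpdate on a UInt saturates at 0 and 2^ctrBits - 1; the object and case-class names are hypothetical.

```scala
// Hypothetical software model of SCThreshold (not XiangShan code).
object ScThresholdModel {
  final case class Thres(ctr: Int, thres: Int)

  val ctrBits = 5                       // SCThreshold(5)
  val neutral: Int = 1 << (ctrBits - 1) // neutralVal = 16
  val minThres = 6
  val maxThres = 31
  val init: Thres = Thres(neutral, 6)   // ctr = neutralVal, thres = initVal

  // Assumed behaviour of satUpdate for a UInt counter.
  def satUpdate(c: Int, up: Boolean): Int =
    if (up) math.min(c + 1, (1 << ctrBits) - 1) else math.max(c - 1, 0)

  def update(t: Thres, cause: Boolean): Thres = {
    val newCtr = satUpdate(t.ctr, cause)
    val satPos = newCtr == (1 << ctrBits) - 1
    val satNeg = newCtr == 0
    // Bounds are checked on the old thres, before adding or subtracting 2.
    val newThres =
      if (satPos && t.thres <= maxThres) t.thres + 2
      else if (satNeg && t.thres >= minThres) t.thres - 2
      else t.thres
    // On saturation ctr snaps back to neutral; otherwise it keeps the new value.
    Thres(if (satPos || satNeg) neutral else newCtr, newThres)
  }
}
```

Starting from the initial state, 15 consecutive updates with cause = true raise thres from 6 to 8, and 16 with cause = false lower it to 4. Because the bounds are tested before the ±2 step and thres stays even, thres in this code ranges over [4, 32] rather than strictly [6, 31].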

`aboveThreshold`

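The aboveThreshold function determines, for one TK/NT column of a branch, whether the sum of the 4 SC tables' centered counters (scSum) plus the centered TAGE provider counter (tagePvdr) exceeds the threshold in magnitude, checking the positive and negative sides separately: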

def aboveThreshold(scSum: SInt, tagePvdr: SInt, threshold: UInt): Bool = {
  val signedThres = threshold.zext
  val totalSum = scSum +& tagePvdr
  (scSum >  signedThres - tagePvdr) && pos(totalSum) ||
  (scSum < -signedThres - tagePvdr) && neg(totalSum)
}
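The two conditions are equivalent to a single magnitude test: scSum > thres - tagePvdr is the same as totalSum > thres, and scSum < -thres - tagePvdr is the same as totalSum < -thres. The integer sketch below checks this, assuming pos and neg mean > 0 and < 0 (for a non-negative threshold the sign checks do not change the result either way).

```scala
// Hypothetical integer model of aboveThreshold (not XiangShan code).
object AboveThresholdModel {
  def aboveThreshold(scSum: Int, tagePvdr: Int, threshold: Int): Boolean = {
    val total = scSum + tagePvdr
    (scSum > threshold - tagePvdr && total > 0) ||
    (scSum < -threshold - tagePvdr && total < 0)
  }

  // Equivalent form for threshold >= 0: |scSum + tagePvdr| > threshold.
  def viaAbs(scSum: Int, tagePvdr: Int, threshold: Int): Boolean =
    math.abs(scSum + tagePvdr) > threshold
}
```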

SC resp

For each br, the two counters (for TAGE TK/NT) read from each of the 4 SC tables are centered (2x+1) and then summed:

// for sc ctrs
def getCentered(ctr: SInt): SInt = Cat(ctr, 1.U(1.W)).asSInt
// w: branch index from the enclosing per-branch loop (not shown)
val s1_scTableSums = VecInit(
  (0 to 1) map { i =>
    ParallelSingedExpandingAdd(s1_scResps map (r => getCentered(r.ctrs(w)(i)))) // TODO: rewrite with wallace tree
  }
)

$$\mathrm{SCctrsum}[w][i] = \sum_{k=1}^{4} \left(2\,\mathrm{SCctr}[w][i][k] + 1\right)$$

Here w selects the branch (br0/br1), i selects one of that branch's two counters (TAGE TK/NT), and k indexes the SC tables, of which there are 4.
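The centering and summation can be checked numerically with the sketch below (hypothetical names, plain Scala). Appending a 1 below the LSB turns a 6-bit signed counter in [-32, 31] into the odd value 2x+1 in [-63, 63], which is symmetric around zero, so the sum over 4 tables lies in [-252, 252].

```scala
// Hypothetical model of the centered SC table sum (not XiangShan code).
object ScCenteredSumModel {
  // Cat(ctr, 1.U(1.W)).asSInt appends a 1 below the LSB: 2 * ctr + 1.
  def getCentered(ctr: Int): Int = 2 * ctr + 1

  // ctrsPerTable(k) = SCctr[w][i][k] for a fixed branch w and column i.
  def tableSum(ctrsPerTable: Seq[Int]): Int = ctrsPerTable.map(getCentered).sum
}
```

Centering means a counter of 0 contributes +1 and a counter of -1 contributes -1, so a weak counter never votes with zero weight.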

s1_scTableSums is registered for one cycle to obtain s2_scTableSums; SC's correction is performed in stage s2:

val s2_scTableSums = RegEnable(s1_scTableSums, io.s1_fire)

Unlike a typical SC implementation, XiangShan also centers the TAGE counter value:

// for tage ctrs, (2*(ctr-4)+1)*8
def getPvdrCentered(ctr: UInt): SInt = Cat(ctr ^ (1 << (TageCtrBits-1)).U, 1.U(1.W), 0.U(3.W)).asSInt
val s2_tagePrvdCtrCentered = getPvdrCentered(RegEnable(s1_providerResps(w).ctr, io.s1_fire))

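Presumably this is meant to bring the TAGE ctr onto the same scale as the SC ctrs (?): the centered TAGE value is an odd multiple of 8 in [-56, 56], comparable in magnitude to one centered 6-bit SC counter in [-63, 63].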
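The bit manipulation in getPvdrCentered can be verified against the closed form (2*(ctr-4)+1)*8 with the sketch below. TageCtrBits = 3 is assumed (consistent with the "ctr-4" in the source comment); the object name is hypothetical. Flipping the MSB of an unsigned 3-bit counter and reading it as signed gives ctr - 4; appending "1" and "000" then yields (2*(ctr-4)+1)*8.

```scala
// Hypothetical model of getPvdrCentered (not XiangShan code).
object PvdrCenteredModel {
  val TageCtrBits = 3 // assumed TAGE counter width

  // Closed form from the source comment: (2*(ctr-4)+1)*8.
  def getPvdrCentered(ctr: Int): Int = {
    val half = 1 << (TageCtrBits - 1)
    (2 * (ctr - half) + 1) * 8
  }

  // Bit-level version: Cat(ctr ^ half, 1.U(1.W), 0.U(3.W)).asSInt.
  def getPvdrCenteredBits(ctr: Int): Int = {
    val width = TageCtrBits + 4 // 3 ctr bits + 1 one bit + 3 zero bits
    val raw = ((ctr ^ (1 << (TageCtrBits - 1))) << 4) | (1 << 3)
    if ((raw & (1 << (width - 1))) != 0) raw - (1 << width) else raw // sign-extend
  }
}
```

For ctr = 0..7 both forms give -56, -40, -24, -8, 8, 24, 40, 56.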

Then s2_scTableSums is added to the centered TAGE counter; if the total is non-negative (>= 0) the branch is predicted taken, otherwise not taken.

val s2_totalSums = s2_scTableSums.map(_ +& s2_tagePrvdCtrCentered)
val s2_scPreds = VecInit(s2_totalSums.map(_ >= 0.S))

Whether the total exceeds the threshold is decided by the aboveThreshold function:

val s2_sumAboveThresholds = VecInit((0 to 1).map(i => aboveThreshold(s2_scTableSums(i), s2_tagePrvdCtrCentered, useThresholds(w))))

Finally, the SC-corrected prediction in s2 is:

val s2_pred =
  Mux(s2_provideds(w) && s2_sumAboveThresholds(s2_chooseBit),
    s2_scPreds(s2_chooseBit),
    s2_tageTakens(w)
  )

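If one of TAGE's Tx tables provided the prediction (s2_provideds(w)) and the total sum of the counter column selected by s2_chooseBit, i.e. the column matching TAGE's prediction, exceeds the threshold, the SC-corrected result is used; otherwise TAGE's prediction is kept.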
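Putting the s2 steps together, a hypothetical end-to-end model for one branch is sketched below. It assumes s2_chooseBit equals TAGE's taken bit (column 1 = TAGE taken, column 0 = TAGE not taken) and TageCtrBits = 3; it uses |total| > thres in place of aboveThreshold, which is equivalent for a non-negative threshold.

```scala
// Hypothetical model of the s2 SC decision for one branch (not XiangShan code).
object ScS2Model {
  def getCentered(ctr: Int): Int = 2 * ctr + 1
  def getPvdrCentered(ctr: Int): Int = (2 * (ctr - 4) + 1) * 8 // TageCtrBits = 3 assumed

  /** scCtrs(k)(i): counter i of SC table k; i = 1 assumed to mean "TAGE taken". */
  def predict(scCtrs: Seq[Seq[Int]], tageCtr: Int, tageTaken: Boolean,
              provided: Boolean, thres: Int): Boolean = {
    val chooseBit = if (tageTaken) 1 else 0
    val scSum = scCtrs.map(c => getCentered(c(chooseBit))).sum
    val total = scSum + getPvdrCentered(tageCtr)
    val scPred = total >= 0
    val above = math.abs(total) > thres // equivalent to aboveThreshold for thres >= 0
    if (provided && above) scPred else tageTaken
  }
}
```

For example, four SC counters of -10 in the taken column outweigh a weakly taken TAGE provider (ctr = 4, centered +8) and flip the prediction to not taken, while a total that does not clear the threshold leaves TAGE's prediction unchanged.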

SC update

SC update begins by updating the threshold:

val thres = useThresholds(w)
when (scPred =/= tagePred && sumAbs >= thres - 4.U && sumAbs <= thres - 2.U) {
  val newThres = scThresholds(w).update(scPred =/= taken)
  scThresholds(w) := newThres
  XSDebug(p"scThres $w update: old ${useThresholds(w)} --> new ${newThres.thres}\n")
}

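When SC's prediction differs from TAGE's and the absolute total sum sumAbs satisfies thres-4 ≤ sumAbs ≤ thres-2, the threshold is trained with cause = (scPred =/= taken), i.e. whether SC mispredicted. If SC mispredicted, ctr is incremented by 1, and if the new ctr saturates at its maximum, thres is increased by 2. If SC predicted correctly, ctr is decremented by 1, and if the new ctr saturates at 0, thres is decreased by 2. Whenever ctr saturates, it is reset to the neutral value.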
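The trigger condition from the when block can be modeled as below (hypothetical names). It returns the cause passed to SCThreshold.update, or None when the threshold is left alone; sumAbs is taken as an input here rather than recomputed.

```scala
// Hypothetical model of the threshold-training trigger (not XiangShan code).
object ScThresTrainModel {
  /** Some(cause) if the threshold should be trained, where cause = SC mispredicted. */
  def trainCause(scPred: Boolean, tagePred: Boolean, taken: Boolean,
                 sumAbs: Int, thres: Int): Option[Boolean] =
    if (scPred != tagePred && sumAbs >= thres - 4 && sumAbs <= thres - 2)
      Some(scPred != taken)
    else None
}
```

Training only near the threshold and only when SC disagrees with TAGE means the threshold adapts on exactly the cases where it decided whether SC's override took effect.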

The rest of the update logic is simple: the SC ctr values to be updated are sent to the respective tables, which write the corresponding entries.

XiangShan FrontEnd Source Code Analysis: SC


SunnyChen
