micrograd

自製反傳遞演算法

ccc:請為下列 karpathy 寫的梯度引擎 (反傳遞算法) 寫一篇說明

https://github.com/karpathy/micrograd

class Value:
    """ stores a single scalar value and its gradient """

    def __init__(self, data, _children=(), _op=''):
        self.data = data
        self.grad = 0
        # internal variables used for autograd graph construction
        self._backward = lambda: None
        self._prev = set(_children)
        self._op = _op # the op that produced this node, for graphviz / debugging / etc

    def __add__(self, other): # f=x+y
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other), '+')

        def _backward():
            self.grad += out.grad  # gx = gf
            other.grad += out.grad # gy = gf
        out._backward = _backward

        return out

    def __mul__(self, other): # f=x*y
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other), '*')

        def _backward():
            self.grad += other.data * out.grad # gx = y*gf  gx/gf = y
            other.grad += self.data * out.grad # gy = x*gf
        out._backward = _backward

        return out

    def __pow__(self, other): # f = x**n
        assert isinstance(other, (int, float)), "only supporting int/float powers for now"
        out = Value(self.data**other, (self,), f'**{other}')

        def _backward():
            self.grad += (other * self.data**(other-1)) * out.grad # gx = n (x**n-1)
        out._backward = _backward

        return out

    def relu(self):
        out = Value(0 if self.data < 0 else self.data, (self,), 'ReLU')

        def _backward():
            self.grad += (out.data > 0) * out.grad # gx = 1 if f>0 else 0
        out._backward = _backward

        return out

    def backward(self):

        # topological order all of the children in the graph
        topo = []
        visited = set()
        def build_topo(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build_topo(child)
                topo.append(v)
        build_topo(self)

        # go one variable at a time and apply the chain rule to get its gradient
        self.grad = 1
        for v in reversed(topo):
            v._backward()

    def __neg__(self): # -self
        return self * -1

    def __radd__(self, other): # other + self
        return self + other

    def __sub__(self, other): # self - other
        return self + (-other)

    def __rsub__(self, other): # other - self
        return other + (-self)

    def __rmul__(self, other): # other * self
        return self * other

    def __truediv__(self, other): # self / other
        return self * other**-1

    def __rtruediv__(self, other): # other / self
        return other * self**-1

    def __repr__(self): # 轉字串 -- https://www.educative.io/edpresso/what-is-the-repr-method-in-python
        return f"Value(data={self.data}, grad={self.grad})"

ChatGPT:

這段程式碼實現了一個簡單的梯度引擎，也就是反傳遞算法。梯度引擎是深度學習中常用的一種技術，用於自動計算損失函數對模型參數的梯度，並利用梯度下降等優化算法更新參數。這裡的梯度引擎實現了基本的數學運算，包括加法、乘法、幂、ReLU 等，並支持反向傳播計算梯度。

在這段程式碼中，Value 類別代表了一個標量值以及其梯度。每個 Value 物件都包含了其計算過程中的相關資訊，如子節點、運算符號等，用於構建反向傳播的計算圖。

__add__ 、 __mul__ 和 __pow__ 分別實現了加法、乘法和幂運算，並在計算過程中利用 out 變數保存結果，同時定義了 _backward 函數來計算梯度。以加法為例， _backward 函數計算出當前節點對損失函數的梯度，並分別加到兩個加數的梯度上。這樣做的原因是，加法的反向傳播規則是傳播其梯度給所有加數，因為每個加數對於加法的貢獻都是相等的。

relu 方法實現了 ReLU 運算，並在計算過程中利用 out 變數保存結果，同時定義了 _backward 函數來計算梯度。由於 ReLU 的反向傳播規則是當輸入大於 0 時，傳播其梯度給輸入，否則為 0，因此 _backward 函數在計算梯度時需要判斷 ReLU 的輸入是否大於 0。

backward 方法實現了反向傳播計算梯度的過程，首先使用拓撲排序構建計算圖的拓撲順序，然後從最後一個節點開始計算梯度。

此外，Value 類別也提供了許多常見的運算子重載，如加、減、乘、除等等，讓我們可以像一般的數值型別一樣使用這些運算子來進行計算。例如，我們可以使用以下方式進行計算：

a = Value(1)
b = Value(2)
c = a + b
d = a * b
e = a ** 2
f = c.relu()

上面的程式碼中，我們先建立了數值型別為 Value 的物件 a 和 b，然後進行了加、乘、平方和 ReLU 運算。其中，在進行加、乘和平方運算時，我們使用了運算子重載，讓 Value 物件能夠像一般的數值型別一樣進行運算；而在進行 ReLU 運算時，我們呼叫了 relu() 方法來執行。

最後，我們可以使用 backward() 方法計算出各個變數的梯度。在這個方法中，我們首先透過建立拓撲排序來找出計算圖中的所有變數，然後對於每個變數進行反向傳播算法，計算出其梯度。這些梯度存放在 grad 屬性中，我們可以透過 grad 屬性來取得這些梯度值。

總之，Value 類別提供了一個簡單而有效的梯度引擎實現，讓我們可以輕鬆地進行反向傳播算法來計算各個變數的梯度。這個實現也為我們提供了一個基礎，可以用來理解更複雜的梯度引擎實現。

陳鍾誠於金門大學資訊工程系 -- 本書衍生自維基百科與 Karpathy 的 micrograd 與 minGPT ，採用 CC: BY-SA 授權

micrograd

自製反傳遞演算法

ccc:請為下列 karpathy 寫的梯度引擎 (反傳遞算法) 寫一篇說明

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!