
Extension Examples


Examples to Extend the Library

On this page, we use the "Ada-FOBOS" and "Ada-FOBOS-L1" algorithms to show how to extend the library with new algorithms.

Ada-FOBOS and Ada-FOBOS-L1 Algorithms

The following steps show how the Ada-FOBOS and Ada-FOBOS-L1 algorithms work.

  1. Input:
  • eta: learning rate
  • delta: parameter to ensure the positive-definite property of the adaptive weighting matrix
  2. Procedures (for each round t):

  1. receive a new example x_t

  2. predict the result:

```
y_t = w_t * x_t
```
  3. suffer loss l_t

  4. calculate the gradient g_t with respect to w_t

  5. update the diagonal matrix H_t of the proximal function (x_t' * H_t * x_t) / 2

    H_t = delta + sqrt(diag(\Sigma_{i=1}^{t}{g_i * g_i'}))

  6. update w_t (the truncate operator is sketched after this list)

    Ada-FOBOS: w_t = w_t - eta * g_t / H_t
    Ada-FOBOS-L1: w_t = truncate(w_t - eta * g_t / H_t, lambda * eta / H_t)

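The truncate operator above is the standard soft-thresholding step of FOBOS-style L1 regularization. Below is a minimal per-coordinate sketch of one Ada-FOBOS(-L1) round in plain C++ (illustrative names and plain std::vector instead of the library's types; lambda = 0 gives plain Ada-FOBOS):

```c++
#include <cmath>
#include <vector>

// soft-thresholding: shrink v towards zero by theta, clipping at zero
float truncate(float v, float theta) {
  if (v > theta) return v - theta;
  if (v < -theta) return v + theta;
  return 0.f;
}

// one Ada-FOBOS(-L1) round for a dense example, per coordinate;
// g is the gradient of the loss w.r.t. the score, so the full gradient is g * x
void ada_fobos_step(std::vector<float>& w, std::vector<float>& H,
                    const std::vector<float>& x, float g,
                    float eta, float delta, float lambda /* 0 for plain Ada-FOBOS */) {
  for (size_t j = 0; j < w.size(); ++j) {
    float gj = g * x[j];
    float sum_sq = (H[j] - delta) * (H[j] - delta) + gj * gj;  // accumulated squared gradients
    H[j] = std::sqrt(sum_sq) + delta;   // H_t = delta + sqrt(sum of squared gradients)
    w[j] -= eta * gj / H[j];            // adaptive gradient step
    if (lambda > 0.f) w[j] = truncate(w[j], lambda * eta / H[j]);  // L1 truncation
  }
}
```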

Step by Step Implementation of Ada-FOBOS

  1. Loss Function

Ada-FOBOS uses "Hinge Loss" or "Logistic Loss" as the loss function. So we do not need to write new loss functions.

  2. Create File

Since Ada-FOBOS is an online linear algorithm, we first create a new file named ada_fobos.h in include/sol/model/olm. In the header, we include the base class OnlineLinearModel and declare the constructor and destructor of the algorithm.

```c++
#include <sol/model/online_linear_model.h>

class AdaFOBOS : public OnlineLinearModel {
 public:
  AdaFOBOS(int class_num);
  virtual ~AdaFOBOS();
};
```
  3. Update Function

Then we declare the Update function, which updates the model and is the key part of the algorithm. Note that OnlineLinearModel predicts on the incoming example and calculates the loss automatically (a sketch of this flow follows the declaration below).

```c++
class AdaFOBOS : public OnlineLinearModel {
 public:
  AdaFOBOS(int class_num);
  virtual ~AdaFOBOS();

 protected:
  virtual void Update(const pario::DataPoint& dp, const float* predicts, float loss);
};
```
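For context, here is a minimal sketch of the flow the base class is assumed to perform before calling Update (hypothetical, not the library's actual code): it predicts the score, computes the loss, and derives the scalar gradient factor that Update later reads via g(c). With hinge loss, that factor looks like this:

```c++
#include <cmath>

// hinge loss on the score s = w * x for a label y in {-1, +1}
float hinge_loss(float score, float y) { return std::fmax(0.f, 1.f - y * score); }

// derivative of the hinge loss w.r.t. the score; the full gradient w.r.t. w is this value times x,
// which is why the Update function only needs the scalar g(c) and the example x
float hinge_grad_scale(float score, float y) { return (y * score < 1.f) ? -y : 0.f; }
```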
  4. Parameters

The algorithm needs the following parameters:

+ **eta**: learning rate, which is a shared parameter of online methods and is already defined in **OnlineModel**. Since Ada-FOBOS uses a constant learning rate, we need to set its value in the **SetParameter** function.
+ **delta**: we need to set its value in the **SetParameter** function. Besides, we need to serialize the parameter in the **GetModelInfo** function. The deserialization is handled by **SetParameter**.
+ **H**: we need to keep the diagonal matrix **H_t** (stored as a vector since it is diagonal) for each class. We need to overload **update_dim** to adaptively update the dimension of **H_t**. Besides, we need to overload **GetModelParam** and **SetModelParam** to serialize and deserialize **H_t**.
```c++
class AdaFOBOS : public OnlineLinearModel {
 public:
  AdaFOBOS(int class_num);
  virtual ~AdaFOBOS();

  virtual void SetParameter(const string& name, const string& value);

 protected:
  virtual void Update(const pario::DataPoint& dp, const float* predicts, float loss);

  virtual void update_dim(index_t dim);
  virtual void GetModelInfo(Json::Value& root) const;
  virtual void GetModelParam(ostream& os) const;
  virtual int SetModelParam(istream& is);

 protected:
  float delta_;
  math::Vector<real_t>* H_;
};
```
  5. Implementation

  1. Constructor and Destructor

In the constructor, we allocate the space for **H_**. In the destructor, we free it.

```c++
AdaFOBOS::AdaFOBOS(int class_num): OnlineLinearModel(class_num), delta_(10.f) {
  // clf_num_ is the actual number of classifiers
  // when class number is 2, clf_num_ = 1. Otherwise, clf_num_ = class_num
  this->H_ = new math::Vector<real_t>[this->clf_num_]; 
  for(int i = 0; i < this->clf_num_; ++i){
    this->H_[i].resize(this->dim_);
    this->H_[i] = this->delta_;   //initialize
  }
}

AdaFOBOS::~AdaFOBOS() { DeleteArray(this->H_); }
```    
  2. Update Function

We can implement the Update function in almost the same way as the algorithm describes, in a MATLAB-like vectorized style (a per-coordinate expansion follows the code):

```c++
void AdaFOBOS::Update(const pario::DataPoint& dp, const float* predicts, float loss) {
  const auto& x = dp.data();
  for (int c = 0; c < this->clf_num_; ++c) {
    if (g(c) == 0) continue;  // no need to update

    // the gradient has already been calculated by the parent class OnlineLinearModel;
    // the full gradient is g(c) * x, please refer to the documentation for the reason.
    // The following line updates H_ incrementally with the new example
    H_[c] = Sqrt(L2(H_[c] - delta_) + L2(g(c) * x)) + delta_;
    // update bias
    H_[c][0] = sqrtf((H_[c][0] - delta_) * (H_[c][0] - delta_) + g(c) * g(c)) + delta_;

    // update weight vector
    w(c) -= eta_ * g(c) * x / H_[c];
    // update bias
    w(c)[0] -= bias_eta() * g(c) / H_[c][0];
  }
}
```
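If the vectorized line for H_ looks opaque, here is its assumed per-coordinate meaning (Sqrt and L2 are taken to be element-wise square root and element-wise square; plain C++ for illustration):

```c++
#include <cmath>
#include <vector>

// per-coordinate expansion of H_[c] = Sqrt(L2(H_[c] - delta_) + L2(g(c) * x)) + delta_
void update_H(std::vector<float>& H, const std::vector<float>& x, float g, float delta) {
  for (size_t j = 0; j < H.size(); ++j) {
    float prev = H[j] - delta;  // square root of the previously accumulated squared gradients
    float gj = g * x[j];        // gradient of this coordinate
    H[j] = std::sqrt(prev * prev + gj * gj) + delta;
  }
}
```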
  3. Parameters

+ SetParameter & Update Dimension

  ```c++
  void AdaFOBOS::SetParameter(const string& name, const string& value) {
    if (name == "delta") {
      this->delta_ = stof(value);
      Check(delta_ > 0);
    } else if (name == "eta") {
      this->eta_ = stof(value);
      Check(eta_ > 0);
    } else {
      OnlineLinearModel::SetParameter(name, value);
    }
  }

  void AdaFOBOS::update_dim(index_t dim) {
    if (dim > this->dim_) {
      real_t delta = this->delta_;
      for (int c = 0; c < this->clf_num_; ++c) {
        this->H_[c].resize(dim);  // resize to the new dim
        // set the values of the added dimensions (from dim_ to dim) to delta
        this->H_[c].slice_op([delta](real_t& val) { val = delta; }, this->dim_);
      }
      OnlineLinearModel::update_dim(dim);
    }
  }
  ```

+ Serialization & Deserialization

  ```c++
  void AdaFOBOS::GetModelInfo(Json::Value& root) const {
    OnlineLinearModel::GetModelInfo(root);
    root["online"]["delta"] = this->delta_;
    root["online"]["eta"] = this->eta_;
  }

  void AdaFOBOS::GetModelParam(ostream& os) const {
    OnlineLinearModel::GetModelParam(os);
    for (int c = 0; c < this->clf_num_; ++c) {
      os << "H[" << c << "]: " << this->H_[c] << "\n";
    }
  }

  int AdaFOBOS::SetModelParam(istream& is) {
    OnlineLinearModel::SetModelParam(is);

    string line;  // for the header "H[*]"
    for (int c = 0; c < this->clf_num_; ++c) {
      is >> line >> this->H_[c];
    }
    return Status_OK;
  }
  ```

+ Registration

  To make sure that the tool knows about the new algorithm, we register it (a sketch of how the registered name can be used follows the snippet):

  ```c++
  RegisterModel(AdaFOBOS, "ada-fobos", "Adaptive Subgradient FOBOS");
  ```
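  For intuition, here is a minimal, self-contained sketch of the kind of name-to-factory mapping such a registration macro typically provides (hypothetical code, not the toolbox's actual registry), so a training tool can create the model from its registered name:

  ```c++
  #include <functional>
  #include <map>
  #include <memory>
  #include <string>

  struct ModelBase { virtual ~ModelBase() {} };
  struct MyAdaFOBOS : ModelBase { explicit MyAdaFOBOS(int /*class_num*/) {} };

  // global name -> factory map
  using Factory = std::function<ModelBase*(int)>;
  std::map<std::string, Factory>& registry() {
    static std::map<std::string, Factory> r;
    return r;
  }

  // rough analogue of RegisterModel(AdaFOBOS, "ada-fobos", ...)
  const bool ada_fobos_registered = [] {
    registry()["ada-fobos"] = [](int class_num) { return new MyAdaFOBOS(class_num); };
    return true;
  }();

  // the tool can then look the model up by the name given on the command line
  std::unique_ptr<ModelBase> CreateModel(const std::string& name, int class_num) {
    auto it = registry().find(name);
    return std::unique_ptr<ModelBase>(it == registry().end() ? nullptr
                                                             : it->second(class_num));
  }
  ```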

Step by Step Implementation of Ada-FOBOS-L1

The only difference between Ada-FOBOS-L1 and Ada-FOBOS is the update function. We can simply add a new parameter lambda and then modify the Update function as required.

However, truncating all dimensions is not efficient when the dimension is high and the data is sparse, so researchers proposed the lazy update scheme: the learner truncates the weights only when needed, that is:

  • when the input data requires that dimension;
  • when the learning process finishes, the learner truncates all remaining weights.

The lazy update scheme is common to many sparse online learning algorithms and is implemented as LazyOnlineL1Regularizer in this toolbox.
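The idea is to store, for each coordinate, the round at which it was last truncated and, whenever an example touches that coordinate again, to apply all the truncation it has missed in one step. A minimal per-coordinate sketch (illustrative plain C++; the toolbox's LazyOnlineL1Regularizer may organize the bookkeeping differently):

```c++
#include <vector>

// lazily apply the L1 truncation each active coordinate has missed since it was last touched;
// the shrink amount per round is eta * lambda / H[j], as in the Ada-FOBOS-L1 update rule
void lazy_truncate(std::vector<float>& w, std::vector<float>& last_update,
                   const std::vector<int>& active_indexes, const std::vector<float>& H,
                   float cur_round, float eta, float lambda) {
  for (int j : active_indexes) {
    float theta = (cur_round - last_update[j]) * eta * lambda / H[j];  // accumulated truncation
    if (w[j] > theta) w[j] -= theta;
    else if (w[j] < -theta) w[j] += theta;
    else w[j] = 0.f;
    last_update[j] = cur_round;  // this coordinate is now up to date
  }
}
```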

As a result, the difference between Ada-FOBOS-L1 and Ada-FOBOS now lies in truncating the weights before predicting on the current example, rather than in the Update function.

```c++
class AdaFOBOS_L1 : public AdaFOBOS {
 public:
  AdaFOBOS_L1(int class_num);

  // the learner needs to truncate all weights at the end of training
  virtual void EndTrain();

 protected:
  // the learner needs to truncate weights before prediction
  virtual label_t TrainPredict(const pario::DataPoint& dp, float* predicts);

 protected:
  LazyOnlineL1Regularizer l1_;
};
```

And the implementation:

```c++
AdaFOBOS_L1::AdaFOBOS_L1(int class_num) : AdaFOBOS(class_num) {
  this->regularizer_ = &l1_;  // tell the parent classes that we have a regularizer
}

label_t AdaFOBOS_L1::TrainPredict(const pario::DataPoint& dp, float* predicts) {
  const auto& x = dp.data();

  real_t t = real_t(cur_iter_num_ - 1);  // current iteration; not updated yet, so minus 1
  // truncate the weights
  for (int c = 0; c < this->clf_num_; ++c) {
    // w(c).slice(x) means that we only truncate the weights needed by x,
    // a similar operation to numpy indexing of part of the elements in an array
    w(c) = truncate(w(c).slice(x), (eta_ * l1_.lambda()) * (t - l1_.last_update_time()) / H_[c]);
    // truncate bias
    w(c)[0] = truncate(w(c)[0], bias_eta() * l1_.lambda() / H_[c][0]);
  }
  // normal prediction
  return OnlineLinearModel::TrainPredict(dp, predicts);
}

void AdaFOBOS_L1::EndTrain() {
  real_t t = real_t(cur_iter_num_);
  // truncate all weights
  for (int c = 0; c < this->clf_num_; ++c) {
    w(c) = truncate(w(c), (eta_ * l1_.lambda()) * (t - l1_.last_update_time()) / H_[c]);
    // truncate bias
    w(c)[0] = truncate(w(c)[0], bias_eta() * l1_.lambda() / H_[c][0]);
  }

  OnlineLinearModel::EndTrain();
}
```

```c++
RegisterModel(AdaFOBOS_L1, "ada-fobos-l1", "Adaptive Subgradient FOBOS with L1 regularization");
```