Margin-infused relaxed algorithm

Machine learning algorithm

The margin-infused relaxed algorithm (MIRA)[1] is an online machine learning algorithm for multiclass classification problems. It learns a set of parameters (a vector or matrix) by processing the given training examples one at a time and updating the parameters after each one, so that the current training example is classified correctly with a margin over each incorrect classification that is at least as large as the loss of that classification.[2] The change to the parameters is kept as small as possible.

A two-class version called binary MIRA[1] simplifies the algorithm by not requiring the solution of a quadratic programming problem (see below). Used in a one-vs-all configuration, binary MIRA can be extended to a multiclass learner that approximates full MIRA and may be faster to train.

The flow of the algorithm[3][4] looks as follows:

Algorithm MIRA
  Input: Training examples $T = \{x_i, y_i\}$
  Output: Set of parameters $w$

  $i$ ← 0, $w^{(0)}$ ← 0
  for $n$ ← 1 to $N$
    for $t$ ← 1 to $|T|$
      $w^{(i+1)}$ ← update $w^{(i)}$ according to $\{x_t, y_t\}$
      $i$ ← $i + 1$
    end for
  end for
  return $\frac{\sum_{j=1}^{N \times |T|} w^{(j)}}{N \times |T|}$

  • "←" denotes assignment. For instance, "largest ← item" means that the value of largest changes to the value of item.
  • "return" terminates the algorithm and outputs the following value.
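
The outer loop and the final averaging above might be sketched in Python as follows. This is only an illustration under stated assumptions: the update step is passed in as a function, the parameters are a flat list of floats, and the `perceptron_update` used in the example is a hypothetical stand-in for labels in {-1, +1}. It is not the MIRA update, and all names here are illustrative.

```python
def train_averaged(examples, num_epochs, update, dim):
    """Run N passes over T and return the average of w^(1) .. w^(N*|T|)."""
    w = [0.0] * dim                 # w^(0) <- 0
    total = [0.0] * dim             # running sum of w^(1), w^(2), ...
    steps = 0                       # plays the role of the counter i
    for _ in range(num_epochs):     # for n <- 1 to N
        for x, y in examples:       # for t <- 1 to |T|
            w = update(w, x, y)     # w^(i+1) <- update w^(i) according to (x_t, y_t)
            total = [s + wi for s, wi in zip(total, w)]
            steps += 1              # i <- i + 1
    if steps == 0:
        return w                    # no examples seen: nothing to average
    return [s / steps for s in total]


def perceptron_update(w, x, y):
    """Hypothetical stand-in update for y in {-1, +1}; NOT the MIRA update."""
    score = sum(wi * xi for wi, xi in zip(w, x))
    if y * score <= 0:              # misclassified (or on the boundary)
        return [wi + y * xi for wi, xi in zip(w, x)]
    return w


examples = [([1.0, 0.0], 1), ([0.0, 1.0], -1)]
averaged = train_averaged(examples, num_epochs=2, update=perceptron_update, dim=2)
print(averaged)
```

Note that the average starts at $w^{(1)}$, matching the sum in the return statement, so the initial zero vector $w^{(0)}$ does not contribute.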

The update step is then formalized as a quadratic programming[2] problem: find $\min \|w^{(i+1)} - w^{(i)}\|$ subject to $\operatorname{score}(x_t, y_t) - \operatorname{score}(x_t, y') \geq L(y_t, y')\ \forall y'$, i.e. the score of the correct label $y_t$ of the current training example must exceed the score of any other possible $y'$ by at least the loss (number of errors) of that $y'$ in comparison to $y_t$.
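
As a rough illustration of the margin constraint, the Python sketch below applies a simplified update that enforces only the single most violated constraint. For one constraint the minimal-change solution has a closed form, so no quadratic programming solver is needed; this is not the full problem stated above, and other constraints may remain violated after the step. The linear per-class scoring `score(x, y) = W[y] · x`, the function name `mira_update_single_constraint`, and the `zero_one_loss` helper are assumptions made for illustration, not taken from the cited papers.

```python
def zero_one_loss(y_true, y_pred):
    """Assumed loss: 1 for any wrong label, 0 for the correct one."""
    return 0.0 if y_true == y_pred else 1.0


def mira_update_single_constraint(W, x, y_true, loss=zero_one_loss):
    """Return a new weight matrix satisfying the most violated margin constraint.

    W is a list of per-class weight lists; score(x, y) is assumed to be W[y] . x.
    """
    scores = [sum(wi * xi for wi, xi in zip(row, x)) for row in W]

    # Find the label y' whose constraint
    #   score(x, y_true) - score(x, y') >= loss(y_true, y')
    # is violated by the largest amount.
    worst, worst_violation = None, 0.0
    for y in range(len(W)):
        if y == y_true:
            continue
        violation = loss(y_true, y) - (scores[y_true] - scores[y])
        if violation > worst_violation:
            worst, worst_violation = y, violation

    if worst is None:
        return W  # every constraint already holds: no change

    # Moving W[y_true] by +tau*x and W[worst] by -tau*x raises the margin
    # by 2 * tau * ||x||^2, so this tau closes the gap exactly.
    sq_norm = 2.0 * sum(xi * xi for xi in x)
    if sq_norm == 0.0:
        return W  # a zero feature vector cannot change any score
    tau = worst_violation / sq_norm

    new_W = [row[:] for row in W]
    for k, xi in enumerate(x):
        new_W[y_true][k] += tau * xi
        new_W[worst][k] -= tau * xi
    return new_W


W0 = [[0.0, 0.0], [0.0, 0.0]]
W1 = mira_update_single_constraint(W0, [1.0, 0.0], y_true=0)
print(W1)
```

Starting from all-zero weights with the 0/1 loss, one step moves the two class rows apart just far enough that the correct label wins by a margin equal to the loss; applying the same example again leaves the weights unchanged.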

References

  1. Crammer, Koby; Singer, Yoram (2003). "Ultraconservative Online Algorithms for Multiclass Problems". Journal of Machine Learning Research. 3: 951–991.
  2. McDonald, Ryan; Crammer, Koby; Pereira, Fernando (2005). "Online Large-Margin Training of Dependency Parsers". Proceedings of the 43rd Annual Meeting of the ACL. Association for Computational Linguistics. pp. 91–98.
  3. Watanabe, T.; et al. (2007). "Online Large Margin Training for Statistical Machine Translation". Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning. pp. 764–773.
  4. Bohnet, B. (2009). "Efficient Parsing of Syntactic and Semantic Dependency Structures". Proceedings of Conference on Natural Language Learning (CoNLL). Boulder. pp. 67–72.

External links

  • adMIRAble - MIRA implementation in C++
  • Miralium - MIRA implementation in Java
  • MIRA implementation for Mahout in Hadoop