arXiv:2303.15725

Solving Regularized Exp, Cosh and Sinh Regression Problems

28 March 2023
Zhihang Li
Zhao Song
Tianyi Zhou
Abstract

In modern machine learning, attention computation is a fundamental task for training large language models such as Transformer, GPT-4, and ChatGPT. In this work, we study the exponential regression problem, which is inspired by the softmax/exp unit in the attention mechanism of large language models. The standard exponential regression problem is non-convex. We study a regularized version of the exponential regression problem, which is convex, and we use an approximate Newton method to solve it in input-sparsity time. Formally, in this problem one is given a matrix $A \in \mathbb{R}^{n \times d}$, vectors $b \in \mathbb{R}^n$ and $w \in \mathbb{R}^n$, and any one of the functions $\exp$, $\cosh$, and $\sinh$, denoted $f$. The goal is to find the optimal $x$ that minimizes $0.5 \| f(Ax) - b \|_2^2 + 0.5 \| \mathrm{diag}(w) A x \|_2^2$. The straightforward approach is the naive Newton's method. Let $\mathrm{nnz}(A)$ denote the number of non-zero entries in the matrix $A$, let $\omega$ denote the exponent of matrix multiplication (currently $\omega \approx 2.373$), and let $\epsilon$ denote the accuracy error. In this paper, we exploit the input sparsity and propose an algorithm that uses $\log(\|x_0 - x^*\|_2 / \epsilon)$ iterations and $\widetilde{O}(\mathrm{nnz}(A) + d^{\omega})$ time per iteration to solve the problem.
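To make the objective concrete, below is a minimal sketch of the naive Newton baseline mentioned in the abstract, for the $f = \exp$ case, using plain dense linear algebra. The function name `solve_reg_exp_regression` and all parameter names are hypothetical; this is not the paper's algorithm, which replaces the exact Hessian solve with a sketching-based approximate Newton step to reach the stated per-iteration cost.

```python
import numpy as np

def solve_reg_exp_regression(A, b, w, x0, eps=1e-8, max_iters=50):
    """Naive Newton's method for
        min_x 0.5*||exp(A x) - b||_2^2 + 0.5*||diag(w) A x||_2^2.

    Hypothetical baseline sketch; the paper's fast algorithm instead
    solves each Newton system approximately in input-sparsity time.
    """
    x = x0.astype(float).copy()
    w2 = w * w
    for _ in range(max_iters):
        u = np.exp(A @ x)                          # f(Ax) with f = exp
        # Gradient: A^T (u o (u - b)) + A^T diag(w)^2 A x
        grad = A.T @ (u * (u - b)) + A.T @ (w2 * (A @ x))
        # Hessian: A^T diag(u o (2u - b) + w o w) A   (a d x d matrix)
        H = A.T @ (A * (u * (2.0 * u - b) + w2)[:, None])
        step = np.linalg.solve(H, grad)            # exact Newton direction
        x -= step
        if np.linalg.norm(step) < eps:             # stop once the step is tiny
            break
    return x
```

Under the convexity assumptions on the regularized objective, such a Newton iteration converges in roughly $\log(\|x_0 - x^*\|_2 / \epsilon)$ steps; each exact step above costs about $O(\mathrm{nnz}(A) \cdot d + d^{\omega})$ to form and solve the Hessian system, which the paper's sketched variant reduces to $\widetilde{O}(\mathrm{nnz}(A) + d^{\omega})$.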
