1. Department of Automation, Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing 100084, China
2. Beijing Sankuai Online Technology Co., Ltd., Beijing 100102, China
SHA Xingyu (shaxy18@mails.tsinghua.edu.cn);
ZHANG Jiaqi (jqzhang61@gmail.com);
YOU Keyou (youky@tsinghua.edu.cn)
Print publication date: 2023-09-25
Online publication date: 2023-08-30
Received: 2023-03-31
Accepted: 2023-04-11
SHA Xingyu, ZHANG Jiaqi, YOU Keyou. Fully asynchronous distributed optimization with linear convergence over directed networks[J]. Acta Scientiarum Naturalium Universitatis Sunyatseni, 2023, 62(5): 1-23. DOI: 10.13471/j.cnki.acta.snus.2023A023.
We study distributed optimization problems over a directed network, where nodes aim to minimize the sum of local objective functions via directed communications with neighbors. Many algorithms are designed to solve such problems under synchronized or randomly activated implementations, which may create deadlocks in practice. In sharp contrast, we propose a fully asynchronous push-pull gradient (APPG) algorithm, where each node updates without waiting for any other node, using possibly delayed information from neighbors. We then construct two novel augmented networks to analyze asynchrony and delays, and quantify the convergence rate from a worst-case point of view. In particular, all nodes of APPG converge to the same optimal solution at a linear rate of $\mathcal{O}(\lambda^k)$ if the local functions have Lipschitz-continuous gradients and their sum satisfies the Polyak-Łojasiewicz condition (convexity is not required), where $\lambda \in (0,1)$ is explicitly given and the virtual counter $k$ increases by one whenever any node updates. Finally, the advantage of APPG over its synchronous counterpart and its linear speedup efficiency are numerically validated on a logistic regression problem.
Keywords: fully asynchronous; distributed optimization; linear convergence; Polyak-Łojasiewicz condition
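Since the abstract only sketches APPG at a high level, the following minimal NumPy sketch illustrates the underlying (synchronous) push-pull gradient-tracking update that APPG extends to the fully asynchronous, delayed setting: decision variables are "pulled" through a row-stochastic matrix and gradient trackers are "pushed" through a column-stochastic matrix. The ring network, step size, and toy quadratic objectives are assumptions for illustration only, not the paper's implementation or experiments.

```python
# Minimal sketch (not the paper's APPG): synchronous push-pull gradient tracking
# over a directed ring. All problem data and parameters here are hypothetical.
import numpy as np

n, d, alpha, iters = 5, 3, 0.1, 200
rng = np.random.default_rng(0)

# Local objectives f_i(x) = 0.5*||A_i x - b_i||^2; their sum satisfies the PL condition.
A = [rng.standard_normal((4, d)) for _ in range(n)]
b = [rng.standard_normal(4) for _ in range(n)]
grad = lambda i, x: A[i].T @ (A[i] @ x - b[i])

# Directed ring: node i receives from node (i-1) mod n.
R = np.zeros((n, n))  # row-stochastic mixing for decision variables ("pull")
C = np.zeros((n, n))  # column-stochastic mixing for gradient trackers ("push")
for i in range(n):
    R[i, i] = R[i, (i - 1) % n] = 0.5
    C[i, i] = C[(i + 1) % n, i] = 0.5

x = np.zeros((n, d))
y = np.array([grad(i, x[i]) for i in range(n)])  # trackers initialized at local gradients

for _ in range(iters):
    x_new = R @ x - alpha * y                    # pull step plus gradient move
    y = C @ y + np.array([grad(i, x_new[i]) - grad(i, x[i]) for i in range(n)])
    x = x_new                                    # push step keeps tracking the average gradient

# All nodes should (approximately) agree on the minimizer of sum_i f_i.
print("consensus error:", np.max(np.abs(x - x.mean(axis=0))))
```

In APPG, each node runs such an update whenever it is activated, using whatever (possibly delayed) neighbor information is available, rather than waiting for a global synchronization round.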