# «Magneto-hydrodynamics simulation in astrophysics by Bijia Pang A thesis submitted in conformity with the requirements for the degree of Doctor of ...»

Magneto-hydrodynamics simulation in astrophysics

by

Bijia Pang

A thesis submitted in conformity with the requirements

for the degree of Doctor of Philosophy

Graduate Department of Physics

University of Toronto

Copyright c 2011 by Bijia Pang

Abstract

Magneto-hydrodynamics simulation in astrophysics

Bijia Pang

Doctor of Philosophy

Graduate Department of Physics

University of Toronto

Magnetohydrodynamics (MHD) studies the dynamics of an electrically conducting ﬂuid

under the inﬂuence of a magnetic ﬁeld. Many astrophysical phenomena are related to MHD, and computer simulations are used to model these dynamics. In this thesis, we conduct MHD simulations of non-radiative black hole accretion as well as fast magnetic reconnection. By performing large scale three dimensional parallel MHD simulations on supercomputers and using a deformed-mesh algorithm, we were able to conduct very high dynamical range simulations of black hole accretion of Sgr A* at the Galactic Center.

We ﬁnd a generic set of solutions, and make speciﬁc predictions for currently feasible observations of rotation measure (RM). The magnetized accretion ﬂow is subsonic and lacks outward convection ﬂux, making the accretion rate very small and having a density slope of around −1. There is no tendency for the ﬂows to become rotationally supported, and the slow time variability of the RM is a key quantitative signature of this accretion ﬂow.

We also provide a constructive numerical example of fast magnetic reconnection in a three-dimensional periodic box. Reconnection is initiated by a strong, localized perturbation to the ﬁeld lines and the solution is intrinsically three-dimensional. Approximately 30% of the magnetic energy is released in an event which lasts about one Alfv´n time, e but only after a delay during which the ﬁeld lines evolve into a critical conﬁguration. In the co-moving frame of the reconnection regions, reconnection occurs through an X-like ii point, analogous to the Petschek reconnection. The dynamics appear to be driven by global ﬂows rather than local processes.

In addition to issues pertaining to physics, we present results on the acceleration of MHD simulations using heterogeneous computing systems [83]. We have implemented the MHD code on a variety of heterogeneous and multi-core architectures (multi-core x86, Cell, Nvidia and ATI GPU) using diﬀerent languages (FORTRAN, C, Cell, CUDA and OpenCL). Initial performance results for these systems are presented, and we conclude that substantial gains in performance over traditional systems are possible. In particular, it is possible to extract a greater percentage of peak theoretical performance from some heterogeneous systems when compared to x86 architectures.

It is a pleasure to thank the many people who made this thesis possible.

First I would like to thank my supervisor, Prof. Ue-Li Pen, for his enthusiastic teaching and inspiring guiding throughout my Ph.D. study.

I want to thank my committee members, Prof. Christopher D. Matzner, Prof.

Stephen W. Morris, Prof. Sabine Stanley, and Prof. Ralph E. Pudritz for their interesting questions and helpful suggestions for the thesis. Specially, I am grateful to Prof. Matzner, who has devoted a lot of time on my project. Discussion with him always enlightens me on the research.

I want to thank Kiyoshi Masui and Joachim Harnois-Deraps for editing my draft, and Gregory Paciga for correcting my presentation slides.

I also want to thank my friends, Xingxing Xing, Bin Guo, Sing-Leung Cheung, Chao Zhuang, Nan Chen, Jing Wang, Lu Wang, MinXue Liu, Xiaomin Du, Xingyu Liu, and Jun Hong Liang, who made my life not that boring during my Ph.D. study.

Finally, and most importantly, I want to thank my parents, Li Qin and Li Pang. I thank them for bringing me to this colourful world, raising me, supporting me, and loving me. I dedicate this thesis to them.

2.1 Simulations described in this paper. Columns: Run number; Maximum resolution relative to the Bondi radius; Radial dynamic range within RB ;

grid expansion factor within RB ; eﬀective resolution at RB ; magnetization parameter; rotation parameter; range of simulation times over which ﬂow properties were measured; mean mass accretion rate over this period; and

4.1 Performance on the multi-core x86 for diﬀerent box sizes; timings in milliseconds. x86(1) refers to single-core performance; x86(8) to 8...... 76

4.2 Cell performance while using PPE or varying numbers of SPEs for diﬀerent

4.5 Performance comparison for diﬀerent architectures; timings in milliseconds. N-GPU represents Fermi; A-GPU represents ATI HD5870; peak Gﬂops represents theoretical peak ﬂoating-point performance; peak GB/s

1.1 X ray image for Sgr A*. The luminosity of the supermassive black hole is 108 order dimmer than simple theoretical predictions. NASA/CXC/MIT/F.K.

1.2 Image of Submillimeter Array (SMA). Successful measurements of RM have been done by [56] using Submillimeter Array in 2006. Image courtesy

Image of solar ﬂare. The time scale of solar ﬂare is 105 faster than the 1.3 theoretical model (Sweet-Parker). Courtesy of NASA/SDO and the AIA,

1.4 Geometry for Sweet-Parker reconnection. The ﬂows come into the thin reconnection region from up and down half, and go out to two other directions horizontally. The speed of magnetic reconnection is limited by the

1.5 Comparison between conventional supercomputer and heterogeneous platform. To the left is a picture of Scinet supercomputer, which has the price-to-performance ratio of $100,000 for 1 Tera ﬂops. To the right is my desktop computer (ATI GPUs inside), which has the price-to-performance ratio as $400 for 1 Tera ﬂops. Programming on a heterogeneous platform

panel is the equatorial plane (yz), while the left panel a perpendicular slice (xy). White circles represent the Bondi radius (rB = 1000). The ﬂuid is slowly moving, in a state of magnetically frustrated convection. A movie of this ﬂow is available in the supporting information section of the

2.2 Density versus radius. The dotted line represents the density proﬁle for the Bondi solution, which is the steepest plausible slope at k = 1.5. The dashed line represents the density scaling for CDAF solution, which is the shallowest proposed slope with k = 0.5. The solid line is the density proﬁle

2.3 log(β), entropy and radial velocity versus radius. The dashed line vr /cs represents the radial velocity in units of mach number. The dots vr /cms represent the radial velocity in units of magnetosonic mach number. The solid line is the entropy, and we see the entropy inversion which leads to the slow, magnetically frustrated convection. Inside the inner boundary, the sound speed is lowered, leading to the lower entropy. The + symbols

2.4 Rotation measure vs time (in units of tB ). We chose Rrel = 17, corresponding to Rrel /RB =0.068. Six lines represent three axes: upper set is X (centered at +3), center is Y (centered at 0) and lower is Z (centered at

-3), with positive and negative directions drawn as solid and dashed lines,

2.5 PDF of RM in Figure 2.4. The dashed line represents a Gaussian distribution. The horizontal axis has been normalized by the standard deviation

horizontal axis is time, in units of tB ; Greyscale represents sign(Br ) 4 ρ|Br |, which was scaled to be more visually accessible. The coherence time is longer at large radii and at late times. Several Bondi times are needed to

2.7 Autocorrelation for Figure 2.4. X axis represents time lags; Y axis represents autocorrelation for diﬀerent Rin. The dotted, dashed, dashed-dot

2.8 RM coherence time τ as a function of the inner truncation radius Rrel ;

points refer to Rrel = 17, 26, 34 and 43. The bootstrap error of 0.17 dex is based on the six data, two for each coordinate direction, at each Rrel.

3.1 Numerical setup: the sphere in the center of the box represent the area of the rotational perturbation. up-left is the rotational perturbation looked

3.2 Reconnection for diﬀerent initial conditions. The total magnetic energy is an indication of reconnection. The dash-dot line has non-zero mean magnetic ﬁeld perturbation, and the reconnected ﬁeld asymptotes to a

3.4 Reconnection for diﬀerent resolutions near reconnection point. This plots recenters ﬁgure 3.3 to the time of maximum magnetic energy release, and scales the horizontal and vertical axis to the fractional energy release and

3.9 snapshot of magnetic ﬁeld line on the background of current, and snapshot of both magnetic and velocity ﬁeld line, and B 2 at 41 CT......... 61

3.10 Snapshot of magnetic ﬁeld line on the background of current, and snapshot

3.11 Snapshot of magnetic ﬁeld line on the background of current, and snapshot of both magnetic and velocity ﬁeld line, and B 2 at 10 CT for 400 cells.. 62

3.12 Snapshot of magnetic ﬁeld line on the background of current, and snapshot

3.13 Snapshot of magnetic ﬁeld line on the background of current, and snapshot of both magnetic and velocity ﬁeld line, and B 2 at 40 CT for 400 cells.. 63

3.14 Snapshot of magnetic ﬁeld line on the background of current, and snapshot

4.1 Time vs box size for GPU comparison. X axis represents the length of the box; Y axis represents the time for one time step, Timings in milli second;

Dot diamond is OpenCL on ATI; Dash circle is OpenCL on Nvidia; Dash

5.1 Atacama Large Millimeter/Submillimeter Array (ALMA). ALMA has much higher sensitivity and higher resolution compared with current sub-millimeter

5.3 Roadmap for Nvidia GPU. DP represents double precision. FLOPS represents FLoating point Operations per Second, which is a measure for computing performance. X axis represents the time; Y axis represents the

A.1 The logarithm of the relativistic RM factor, log10 F (k, kT ). The true RM integral is modiﬁed by a factor F (k, kT ) relative to an estimate in which the nonrelativistic formula is used, but the inner bound of integration is set to the radius Rrel at which electrons become relativistic; see equation

B.1 Vacuum solution of the magnetic ﬁeld is calculated in the central region.

The ﬁeld lines outside of the central region show the boundary condition. 108

Introduction

1.1 Motivation Computer technology has been developing rapidly during the last decades and it has revolutionized practically all aspects of human life. Scientists control satellites using computers to observe the universe; ﬁnancial departments use computers to estimate economic growth; transportation companies use computers to guide trains and monitor air traﬃc; computers allow people to pay their bills and order tickets online; children play computer games and watch videos; people use electronic mail to contact each other and video-conference makes long distances meetings possible.

While we now use computers for a variety of tasks, they were originally designed for computing, that is, to perform simple mathematical operations. Today we continue to use them for such purposes. Thanks to the technology revolution, a typical personal computer can currently calculate a simple operation (e.g. 1+1=2) 10 billions times in just one second. Furthermore, the fastest computer cluster to date has the theoretical performance of 1peta ﬂops (FLoating point Operations Per Second), which means that it performs this simple equation a quadrillion times in one single second!

Obviously, we do not just use computers to calculate ‘1+1=2’. To utilize their full Chapter 1. Introduction power, people now take advantage of computers to do far more complicated numerical simulations in diﬀerent areas, including physics, chemistry, biology, economics, and engineering.

A computer simulation tries to solve a mathematical model, which is usually expressed using equations. Diﬀerent output behaviours depend on the model and on the input data, and the results of the simulations allow users to setup the experiment and predict the results at very low cost, since the only apparatus needed is the computer. The mathematical equations are ﬁrst translated into programming languages (i.e. code). The results are then calculated by computers and presented using graphs, videos or other readable methods for the analysis.