
REVSTAT – Statistical Journal

Volume 10, Number 1, March 2012, 109–133

## MODELLING TIME SERIES EXTREMES

Authors: V. Chavez-Demoulin

– Faculty of Business and Economics, University of Lausanne,

1015 Lausanne, Switzerland

valerie.chavez@unil.ch

A.C. Davison

– École Polytechnique Fédérale de Lausanne, EPFL-FSB-MATHAA-STAT,

Station 8, 1015 Lausanne, Switzerland

anthony.davison@epfl.ch

**Abstract:**

• The need to model rare events of univariate time series has led to many recent advances in theory and methods. In this paper, we review telegraphically the literature on extremes of dependent time series and list some remaining challenges.

**Key-Words:**

• Bayesian statistics; Box–Cox transformation; clustering; dependence; extremal index; extremogram; generalized extreme-value distribution; generalized Pareto distribution; Hill estimator; nonparametric smoothing; non-stationarity; regression; tail index.

**AMS Subject Classification:**

• 62E20, 62F15, 62G05, 62G08, 62G32, 62M05, 62M10.


1. INTRODUCTION

Statistical analysis of the extremes of time series is a traditional staple of hydrology and insurance, but the last two decades have seen applications broaden to a huge variety of domains, from finance to atmospheric chemistry to climatology.

The most common approaches for describing the extreme events of stationary data are the block maximum approach, which models the maxima of a set of contiguous blocks of observations using the generalized extreme-value (GEV) distribution, and the peaks-over-threshold approach, in which a Poisson process model is used for exceedances of a fixed high or low threshold level; often this entails fitting the generalized Pareto distribution (GPD) to the exceedances. The two approaches lead to different but closely related descriptions of the extremes, determined by the marginal distribution of the series and by its extremal dependence structure.

Whereas the marginal features are well understood from the study of independent and identically distributed (iid) variates, the less well-explored dependence features are the main focus of this paper. We review relevant theory and methods and attempt to list some aspects that seem to need further study.

Throughout the paper, we discuss maximum or upper extremes, but minima or lower extremes can be handled by negating the data.

Temporal dependence is common in univariate extremes, which may display intrinsic dependence due to autocorrelation, dependence due to the effects of other variables, or both; this demands an appropriate theoretical treatment.

Short-range dependence leading to clusters of extremes often arises: for example, financial time series usually display volatility clustering, and river flow maxima often occur together following a major storm. The joint behavior of the observations within a cluster is determined by the short-range dependence structure and can be accommodated, though not fully described, within a general theory.

Long-range dependence of extremes seems implausible in most contexts, genetic or genomic data being a possible exception. Large-scale variation due to trend, seasonality or regime changes is typically dealt with by appropriate modelling.

Below we first give an account of the effect of dependence on time series extremes and discuss associated statistical methods. For completeness we then outline some relevant Bayesian methods, before turning to regression and non-stationarity. The paper closes with a brief list of some open problems.


2. SHORT-RANGE DEPENDENCE

2.1. Effect of short-range dependence

The discussion below is based partly on Leadbetter et al. (1983), a standard reference to the literature on extremes of time series and random processes, and on Beirlant et al. (2004, Ch. 10), which provides a more recent summary; see also Coles (2001, Ch. 5). It is usual to study the effect of autocorrelation under a type of mixing condition that restricts the impact of dependence on extremes.

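Definition 2.1. A strictly stationary sequence $\{X_i\}$, whose marginal distribution $F$ has upper support point $x_F = \sup\{x : F(x) < 1\}$, is said to satisfy $D(u_n)$ if, for any integers $i_1 < \cdots < i_p < j_1 < \cdots < j_q$ with $j_1 - i_p \ge l$,

$$\bigl|\Pr(X_{i_1} \le u_n, \ldots, X_{i_p} \le u_n, X_{j_1} \le u_n, \ldots, X_{j_q} \le u_n) - \Pr(X_{i_1} \le u_n, \ldots, X_{i_p} \le u_n)\,\Pr(X_{j_1} \le u_n, \ldots, X_{j_q} \le u_n)\bigr| \le \alpha(n, l),$$

where $\alpha(n, l_n) \to 0$ for some sequences $l_n = o(n)$ and $u_n \to x_F$ as $n \to \infty$.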

The $D(u_n)$ condition implies that rare events that are sufficiently separated are almost independent. ‘Sufficient’ separation here is relatively short-distance, since $l_n/n \to 0$ as $n \to \infty$. This allows one to establish the following result, which shows that if the $D(u_n)$ condition is satisfied, then any non-degenerate limit for suitably normalized maxima of dependent data must be a GEV distribution, thereby justifying the use of the block maximum approach for most stationary time series.

**Theorem 2.1.**

Let $\{X_i\}$ be a stationary sequence for which there exist sequences of normalizing constants $\{a_n > 0\}$ and $\{b_n\}$ and a non-degenerate distribution $H$ such that $M_n = \max\{X_1, \ldots, X_n\}$ satisfies

$$\Pr\{(M_n - b_n)/a_n \le z\} \to H(z), \qquad n \to \infty.$$

If $D(u_n)$ holds with $u_n = a_n z + b_n$ for each $z$ for which $H(z) > 0$, then $H$ is a GEV distribution.

Thus the effect of dependence must be felt in the local behavior of extremes, the commonest measure of which is the extremal index, $\theta$. This lies in the interval $[0, 1]$, though $\theta > 0$ except in pathological cases. If the sequence $\{X_n\}$ is independent, then $\theta = 1$, but this is also the case for certain dependent series.

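The relation between maxima of a dependent sequence and of a corresponding independent sequence is summarised in the following theorem:

**Theorem 2.2.**

Let $\{X_i\}$ be a stationary sequence and let $\{\tilde X_i\}$ be a sequence of independent variables with the same marginal distribution $F$. Set $M_n = \max\{X_1, \ldots, X_n\}$ and $\tilde M_n = \max\{\tilde X_1, \ldots, \tilde X_n\}$. Under suitable regularity conditions, $\Pr\{(\tilde M_n - b_n)/a_n \le z\} \to \tilde H(z)$ as $n \to \infty$, for normalizing sequences $\{a_n > 0\}$ and $\{b_n\}$ and a non-degenerate distribution $\tilde H$, if and only if $\Pr\{(M_n - b_n)/a_n \le z\} \to H(z)$, where $H(z) = \tilde H^{\theta}(z)$ for some constant $\theta \in (0, 1]$.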


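The extremal index can be characterized in several ways. One is

$$\theta^{-1} = \lim_{n \to \infty} \mathrm{E}\Bigl\{\sum_{i=1}^{p_n} I(X_i > u_n) \Bigm| M_{p_n} > u_n\Bigr\}, \qquad (2.1)$$

where $p_n \to \infty$ with $p_n = o(n)$, and the threshold sequence $\{u_n\}$ is chosen to ensure that $n\{1 - F(u_n)\} \to \lambda \in (0, \infty)$. Thus $\theta^{-1}$ is the limiting mean cluster size based on a block of $p_n$ consecutive observations, as $p_n$ increases.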

Another is

$$\theta = \lim_{n \to \infty} \Pr\{\max(X_2, \ldots, X_{p_n}) \le u_n \mid X_1 > u_n\}, \qquad (2.2)$$

so $\theta$ is the limiting probability that an exceedance over $u_n$ is the last of a cluster of such exceedances. Asymptotically, therefore, extremes of the stationary sequence occur in clusters of mean size $1/\theta$. Since the suitably rescaled times of exceedances over $u_n$ in an independent sequence would in the limit arise as a Poisson process of rate $\lambda$, and since $u_n$ is the same as for the corresponding independent series, the mean time between clusters in the dependent series must increase by a factor $1/\theta$, corresponding to clusters of exceedances arising as a Poisson process of rate $\lambda\theta$.

Hsing (1987) shows that the structure of these clusters is essentially arbitrary; see also Hsing et al. (1988).

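A consequence of Theorem 2.2 is that if the extremal types theorem is applicable, then for a suitable choice of parameters we may write

$$\Pr(M_n \le x) \approx H(x) \approx F(x)^{n\theta},$$

where $H$ is a GEV distribution, so that $M_n$ is effectively the maximum of $n\theta$ equivalent independent observations. Thus for dependent data and a large probability $p$, the marginal quantiles for $X_j$ will be estimated by

$$x_p \approx H^{-1}(p^{n\theta}),$$

and since $p^{n\theta} \ge p^{n}$ for $\theta \le 1$, ignoring the clustering (that is, setting $\theta = 1$) would lead to an underestimation of quantiles of $F$.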
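The quantile adjustment can be sketched in a few lines of Python. This is a minimal illustration, assuming that a GEV distribution $H$ with location `mu`, scale `sigma` and shape `xi` has already been fitted to maxima of blocks of length `n` and that `theta` is known; the function names and parameter values are hypothetical.

```python
import math


def gev_quantile_from_y(y, mu, sigma, xi):
    """Return z with H(z) = exp(-y), for H(z) = exp{-[1 + xi (z - mu)/sigma]^(-1/xi)}."""
    if y <= 0:
        raise ValueError("y = -log H(z) must be positive")
    if abs(xi) < 1e-12:
        # Gumbel limit as xi -> 0
        return mu - sigma * math.log(y)
    return mu + sigma * (y ** (-xi) - 1.0) / xi


def marginal_quantile(p, n, theta, mu, sigma, xi):
    """Approximate x_p = H^{-1}(p^(n*theta)), the p-quantile of F."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie in (0, 1)")
    # -log(p^(n*theta)) = -n*theta*log(p), computed on the log scale for accuracy
    y = -n * theta * math.log(p)
    return gev_quantile_from_y(y, mu, sigma, xi)


# Compare the estimate that allows for clustering with the one that ignores it.
p, n = 0.999, 365
clustered = marginal_quantile(p, n, theta=0.5, mu=0.0, sigma=1.0, xi=0.1)
ignored = marginal_quantile(p, n, theta=1.0, mu=0.0, sigma=1.0, xi=0.1)
print(clustered > ignored)  # True
```

Working with $y = -\log H$ rather than with $p^{n\theta}$ itself avoids forming numbers extremely close to 1 when $p$ is large.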

When clustering occurs, the notion of return level is more complex. If θ = 1, for instance, then the ‘100-year-event’ will occur on average ten times in the next millennium, but has probability 0.368 of not appearing in the next 100 years, whereas if θ = 1/10, then on average the event also occurs ten times in a millennium, but all ten events will tend to appear together, leading to a probability around 0.9 of not seeing any in the next 100 years. Such information may be highly relevant to structural design.
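The figures in this example follow from the Poisson cluster description above: clusters of exceedances of the 100-year level arrive at rate $\theta/100$ per year, so the probability of seeing none in 100 years is $\exp(-\theta)$, while the expected number of exceedances per millennium is 10 whatever the value of $\theta$. A short Python check, with hypothetical helper names:

```python
import math


def prob_no_event(years, return_period, theta):
    """Probability of no exceedance of the return level in `years` years.

    Exceedances occur at average rate 1/return_period per year, but they
    arrive in clusters of mean size 1/theta, and the clusters form a Poisson
    process of rate theta/return_period.
    """
    cluster_rate = theta / return_period
    return math.exp(-cluster_rate * years)


def expected_events(years, return_period):
    """Mean number of exceedances in `years` years; it does not depend on theta."""
    return years / return_period


print(round(prob_no_event(100, 100, theta=1.0), 3))  # 0.368
print(round(prob_no_event(100, 100, theta=0.1), 3))  # 0.905
print(expected_events(1000, 100))  # 10.0
```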

Robinson & Tawn (2000) discuss how sampling a time series at different frequencies affects the value of $\theta$, and derive bounds on the relationship between the extremal indices obtained at different sampling frequencies.

The left panel of Figure 1 shows a realization of $X_j = \sum_{i=1}^{6} i\,|Z_{j-i}|$, where the $Z_j$ are iid with a Cauchy distribution. Clusters manifest themselves as vertical strings formed by points corresponding to successive large values of $X_j$, driven by occasional huge values of $Z_j$. The corresponding plot for an iid sequence would show no clustering. The middle panel shows a realization of the sequence $X_j = Z_j + 2Z_{j+1}$, with the $Z_j$ iid Cauchy variates. In this case Davis & Resnick (1985) show that the average cluster size is 3/2. The right panel shows the Cauchy sequence $X_j = \rho X_{j-1} + (1 - |\rho|)Z_j$, where $\rho \in (0, 1)$ and the $Z_j$ are iid standard Cauchy variates, for $\rho = 0.8$; Chernick et al. (1991) show that the extremal index is $1 - \rho$, so in this case the mean cluster size is 5.
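Both stated values can be checked against the formula for the extremal index of a linear process $X_j = \sum_i c_i Z_{j-i}$ with nonnegative coefficients and iid symmetric innovations whose tails are regularly varying with index $\alpha$ ($\alpha = 1$ in the Cauchy case), namely $\theta = \max_i c_i^{\alpha} / \sum_i c_i^{\alpha}$. This formula is an assumption brought in for the sketch rather than something stated in the text, and the AR(1) sequence is represented by its truncated moving-average expansion with $c_i = (1 - \rho)\rho^i$; the function name is hypothetical.

```python
def linear_process_theta(coefs, alpha=1.0):
    """Extremal index max c_i^alpha / sum c_i^alpha (assumed formula, nonnegative c_i)."""
    if not coefs or any(c < 0 for c in coefs):
        raise ValueError("need a non-empty list of nonnegative coefficients")
    powered = [c ** alpha for c in coefs]
    return max(powered) / sum(powered)


# X_j = Z_j + 2 Z_{j+1}: coefficients 1 and 2 (the time shift does not matter).
theta_ma = linear_process_theta([1.0, 2.0])
print(theta_ma, 1.0 / theta_ma)  # extremal index and mean cluster size

# X_j = rho X_{j-1} + (1 - rho) Z_j, expansion truncated after 200 terms.
rho = 0.8
theta_ar = linear_process_theta([(1 - rho) * rho ** i for i in range(200)])
print(round(theta_ar, 6), round(1.0 / theta_ar, 6))
```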

Examples such as these are instructive, but such models are not widely used in applications. It follows from Sibuya (1960) that linear Gaussian autoregressive moving average models have $\theta = 1$, corresponding to asymptotically independent extremes, despite the clumping that may appear at lower levels, and this raises the question of how to model the extremes of such series. Davis & Mikosch (2008, 2009a) show that while both GARCH and stochastic volatility models display volatility clustering, only the former shows clustering of extremes, thus providing a means to distinguish these classes of financial time series.

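A stationary sequence $\{X_i\}$ satisfies the condition $D'(u_n)$ if

$$\lim_{k \to \infty} \limsup_{n \to \infty}\; n \sum_{j=2}^{\lfloor n/k \rfloor} \Pr(X_1 > u_n, X_j > u_n) = 0$$

for some threshold sequence $\{u_n\}$ such that $n\{1 - F(u_n)\} \to \lambda \in (0, \infty)$.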

This condition may be harder to satisfy than one might expect; Chernick (1981) gives an example of an autoregressive process with uniform margins that satisfies $D(u_n)$ but does not satisfy $D'(u_n)$.

It can be shown that a stationary process satisfying both $D(u_n)$ and $D'(u_n)$ has extremal index $\theta = 1$. Similar conditions have been introduced to ensure convergence of the point process of exceedances (Beirlant et al., 2004, Ch. 10).

2.2. Statistics of cluster properties

Suppose that a sequence $\{X_i\}$ satisfies a suitable mixing condition, such as that in Definition 2.1, and let $\pi$ denote the probability mass function of the size of a cluster of extreme values, whose mean is $\theta^{-1}$. Suppose that we wish to estimate $\theta$ from apparently stationary time series data of length $n$. The blocks estimator of $\theta$ is the empirical counterpart of (2.1): one selects a block length $r$, divides the sample into $[n/r]$ disjoint contiguous blocks of length $r$, and counts the exceedances of a high threshold $u$ in those blocks that contain at least one exceedance. Among these blocks, the proportion with $k$ exceedances estimates $\pi(k)$, and the average number of exceedances per block estimates $\theta^{-1}$. Likewise the runs estimator is the empirical counterpart of (2.2). Computations in Smith & Weissman (1994) suggest that the runs estimator has lower bias and is therefore the preferable of the two. Ancona-Navarrete & Tawn (2000) compare the then-known estimators of the extremal index, using both nonparametric and parametric approaches.
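A minimal Python sketch of the two estimators follows. It assumes the data are held in a list `x`, and it uses a common form of the runs estimator in which an exceedance is treated as the last of its cluster if it is followed by at least `r` consecutive non-exceedances; the function names and the toy series are illustrative only.

```python
from collections import Counter


def blocks_estimator(x, u, r):
    """Blocks estimator of theta and empirical cluster-size pmf.

    The sample is cut into len(x) // r complete blocks of length r; each
    block with at least one exceedance of u is treated as one cluster.
    """
    sizes = []
    for b in range(len(x) // r):
        k = sum(1 for v in x[b * r:(b + 1) * r] if v > u)
        if k > 0:
            sizes.append(k)
    if not sizes:
        raise ValueError("no exceedances of u in the complete blocks")
    theta = len(sizes) / sum(sizes)  # 1 / (mean exceedances per cluster)
    counts = Counter(sizes)
    pmf = {k: c / len(sizes) for k, c in sorted(counts.items())}
    return theta, pmf


def runs_estimator(x, u, r):
    """Proportion of exceedances followed by at least r non-exceedances."""
    exceed = [v > u for v in x]
    n_exc = sum(exceed)
    if n_exc == 0:
        raise ValueError("no exceedances of u")
    last_in_cluster = sum(
        1
        for i in range(len(x) - r)
        if exceed[i] and not any(exceed[i + 1:i + r + 1])
    )
    return last_in_cluster / n_exc


x = [0, 5, 6, 0, 0, 0, 0, 7, 0, 0, 0, 0]
print(blocks_estimator(x, u=1, r=4))
print(runs_estimator(x, u=1, r=3))
```

Both estimators depend on the choices of $u$ and $r$, so in practice they are usually computed over a range of values.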

In subsequent work Ferro & Segers (2003) proposed the intervals estimator, based on a limiting characterization of the rescaled inter-exceedance intervals: with probability $\theta$ an arbitrary exceedance is the last of a cluster, and then the time to the next exceedance has an exponential distribution with mean $1/\theta$; otherwise the next exceedance belongs to the same cluster and occurs after a (rescaled) time 0. Thus the inter-exceedance distribution is $(1 - \theta)\delta_0 + \theta\,\mathrm{exp}(\theta)$, where $\delta_0$ denotes a point mass at 0 and $\mathrm{exp}(\theta)$ the exponential distribution with mean $1/\theta$. The parameter $\theta$ can be estimated from the marginal inter-exceedance distribution in a variety of ways, of which the best seem to be due to Süveges (2007). The intervals estimator can be made automatic once the threshold has been chosen, and it also provides an automatic approach to declustering and thus to the estimation of cluster characteristics, including the cluster size distribution $\pi$. It can also be used to diagnose inappropriate thresholds (Süveges & Davison, 2010).
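The intervals estimator of Ferro & Segers (2003) has a closed form in terms of the inter-exceedance times $T_1, \ldots, T_{N-1}$ between the $N$ exceedance times. The sketch below implements its usual version, $\hat\theta = 2(\sum T_i)^2/\{(N-1)\sum T_i^2\}$ when $\max T_i \le 2$ and $\hat\theta = 2\{\sum (T_i - 1)\}^2/\{(N-1)\sum (T_i - 1)(T_i - 2)\}$ otherwise, truncated at 1. It is not the likelihood estimator of Süveges (2007), and the function names are hypothetical.

```python
def exceedance_times(x, u):
    """Indices of the observations that exceed the threshold u."""
    return [i for i, v in enumerate(x) if v > u]


def intervals_estimator(times):
    """Ferro-Segers intervals estimator of the extremal index."""
    t = sorted(times)
    if len(t) < 2:
        raise ValueError("need at least two exceedances")
    gaps = [b - a for a, b in zip(t, t[1:])]  # inter-exceedance times T_i
    m = len(gaps)  # N - 1
    if max(gaps) <= 2:
        est = 2.0 * sum(gaps) ** 2 / (m * sum(g * g for g in gaps))
    else:
        num = 2.0 * sum(g - 1 for g in gaps) ** 2
        den = m * sum((g - 1) * (g - 2) for g in gaps)
        est = num / den
    return min(1.0, est)


x = [0, 5, 6, 4, 9, 0, 0, 0, 0, 0, 0, 0, 0, 8]
times = exceedance_times(x, u=1)
print(times, intervals_estimator(times))
```

Unlike the blocks and runs estimators, this requires no choice of block or run length once the threshold is fixed.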

Laurini & Tawn (2003) suggest a two-threshold approach, according to which a cluster starts with an exceedance of a higher threshold and ends either when the process drops below a lower threshold before another such exceedance, or after a sufficiently long period below the higher threshold. Although theoretical investigation of its properties is difficult, they establish numerically that their estimator is more stable than most of those above.
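The two-threshold declustering rule can be sketched as follows, with hypothetical names `u_high`, `u_low` and `m` for the higher threshold, the lower threshold and the run length below `u_high` that is taken to be sufficiently long. This only mirrors the verbal description above; Laurini & Tawn's own procedure involves further details that are not reproduced here.

```python
def two_threshold_clusters(x, u_high, u_low, m):
    """Group exceedances of u_high into clusters using two thresholds.

    A cluster opens at an exceedance of u_high. It closes when the series
    falls below u_low before the next exceedance of u_high, or after m
    consecutive observations at or below u_high. Each cluster is returned
    as the list of indices at which u_high is exceeded.
    """
    if u_low >= u_high:
        raise ValueError("u_low must be below u_high")
    clusters = []
    current = None
    below = 0  # consecutive observations at or below u_high
    for i, v in enumerate(x):
        if v > u_high:
            if current is None:
                current = []
            current.append(i)
            below = 0
        elif current is not None:
            below += 1
            if v < u_low or below >= m:
                clusters.append(current)
                current = None
    if current is not None:
        clusters.append(current)  # close a cluster still open at the end
    return clusters


x = [0, 5, 3, 5, 0, 5, 3, 3, 3, 5]
clusters = two_threshold_clusters(x, u_high=4, u_low=1, m=3)
print(clusters)
print([max(x[i] for i in c) for c in clusters])  # cluster maxima
```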

One reason to attempt declustering is that, as mentioned above, under the limiting model for threshold exceedances, the marginal distribution of an exceedance is the same as that of a cluster maximum; this is a consequence of length-biased sampling. Thus reliable estimates and uncertainty measures for the generalized Pareto distribution of exceedances may be obtained from the (essentially independent) cluster maxima; this is the basis of the peaks-over-threshold approach to modelling extremes. Its application requires reliable identification of cluster maxima, however, and Fawcett & Walshaw (2007, 2012) establish that the difficulty of this can lead to severe bias. This bias can be reduced by using all exceedances to estimate the GPD, though the standard errors must then be modified to allow for the dependence. Eastoe & Tawn (2012) suggest an alternative sub-asymptotic model for cluster maxima, with diagnostics of its appropriateness.
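To illustrate fitting the generalized Pareto distribution to cluster maxima, the sketch below declusters with a simple runs rule (a cluster ends after `r` consecutive non-exceedances) and then fits the GPD to the cluster-maximum excesses by the method of moments. The moment fit is a simple stand-in for maximum likelihood chosen only to keep the example self-contained; it assumes a shape parameter below 1/2 so that the variance is finite, and the names and toy data are hypothetical.

```python
from statistics import mean, variance


def runs_decluster_maxima(x, u, r):
    """Cluster maxima after runs declustering.

    A cluster starts at an exceedance of u and ends once r consecutive
    observations fail to exceed u.
    """
    maxima = []
    current = None  # running maximum of the open cluster, if any
    gap = 0  # consecutive non-exceedances since the last exceedance
    for v in x:
        if v > u:
            current = v if current is None else max(current, v)
            gap = 0
        elif current is not None:
            gap += 1
            if gap >= r:
                maxima.append(current)
                current = None
    if current is not None:
        maxima.append(current)  # close a cluster still open at the end
    return maxima


def gpd_moments(excesses):
    """Method-of-moments GPD fit (shape xi, scale sigma); assumes xi < 1/2.

    Uses mean = sigma/(1 - xi) and variance = sigma^2 / {(1 - xi)^2 (1 - 2 xi)}.
    """
    if len(excesses) < 2:
        raise ValueError("need at least two excesses")
    m = mean(excesses)
    ratio = m * m / variance(excesses)  # equals 1 - 2 xi
    xi = 0.5 * (1.0 - ratio)
    sigma = 0.5 * m * (1.0 + ratio)
    return xi, sigma


x = [0, 3, 5, 0, 0, 0, 4, 0, 0, 0, 6, 2, 0, 0, 0, 9]
u = 1
maxima = runs_decluster_maxima(x, u, r=3)
print(maxima)
print(gpd_moments([c - u for c in maxima]))
```

With so few clusters the fitted values are meaningless; the toy series only shows the sequence of steps.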

The threshold approach allows the modelling of cluster properties, for example using first-order Markov chains (Smith et al., 1997; Bortot & Coles, 2003), which are estimated using a likelihood in which the extremal model is presumed to fit only those observations exceeding the threshold, with the others treated as censored. Standard bivariate extremal models can be used to generate suitable Markov chains, and so can near-independence models (Ledford & Tawn, 1997; Bortot & Tawn, 1998; Ramos & Ledford, 2009; de Carvalho & Ramos, 2012).