# Interns 2015

Below you can find details of the summer 2015 interns including a description of their research project.

Ana Daglis

University of Cambridge, BA Mathematics

Supervisor: Matthew Ludkin

*Statistical inference for evolving network structure*

Networks are prominent in today’s world. The volume of telecommunications and social network data has exploded in the last two decades. A statistical understanding of the processes that generate and maintain network structure can be used to make confidence statements about properties of a network, detect anomalous behaviour or target adverts. In recent years more data has been collected alongside the network. Can such covariate information improve inference for network structure compared to network data alone? Many have attempted to model how networks grow; however, most models have poor statistical properties. This project will investigate approaches for combining statistical methodology from static modelling techniques with methods for analysing data indexed through time.

**View Ana's presentation and poster.**

Lawrence Latter

Lancaster University, MSci Mathematics

Supervisor: Paul Sharkey

*Modelling extra-tropical cyclones using extreme value methods*

The prevalence of extra-tropical cyclones in the mid-latitudes is a dominant feature of the weather landscape affecting the United Kingdom. The UK has come to expect a consistent annual pattern of temperate summers and mild winters. However, in recent years it has experienced a series of extreme weather events, such as major floods and damaging windstorms. Accurate modelling and forecasting of extreme weather events is essential to protect human life, minimise potential damage and economic losses, and aid the design of appropriate defence mechanisms. In this context, an extreme event is one that is very rare, with the consequence that datasets of extreme observations are small. The statistical field of extreme value theory is focused on modelling such rare events, with the aim of extrapolating physical processes from the observed data to unobserved levels. This project will focus on applying extreme value methods to remote sites in the North Atlantic and European domain.

**View Lawrence's presentation and poster.**

Euan McGonigle

University of Glasgow, MSci Mathematics

Supervisor: Sean Malory

*A Linguistically-Motivated Changepoint Problem from a Bayesian Perspective*

Sequences arise naturally in linguistics, with the number of occurrences of a linguistically salient feature changing over time as language attitudes evolve. One such feature is the use of flat adverbs. For instance, in the phrase “fresh ground coffee” the word “fresh” is a flat adverb, since it functions as an adverb but lacks the typical suffix “ly”. While not as widespread nowadays, flat adverbs were commonly used between 1700 and 1900. Authors of this period used flat adverb forms and were publicly criticised for doing so. This project will introduce a Bayesian statistical framework to investigate whether the rate of flat adverb use changed significantly after an author's writing had been subjected to such criticism. The focus will be on detecting changes in a sequence of data points using a Bayesian approach. Specifically, we will be interested in quantifying, in a precise way, whether or not a change in the sequence has occurred at some point.
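As a rough illustration of what quantifying a change can look like, the sketch below compares a "no change" model against a "single change" model for a sequence of counts. The modelling choices are assumptions made for this example and are not taken from the project: counts are treated as Poisson, each rate has a conjugate Gamma prior (`shape`, `rate`), at most one change is allowed, and the change location has a uniform prior. The function names and the example counts are hypothetical.

```python
import math


def log_marginal(counts, shape=1.0, rate=1.0):
    """Log marginal likelihood of Poisson counts with the rate integrated
    out under a Gamma(shape, rate) prior (assumed model)."""
    total = sum(counts)
    return (shape * math.log(rate) - math.lgamma(shape)
            + math.lgamma(shape + total)
            - (shape + total) * math.log(rate + len(counts))
            - sum(math.lgamma(c + 1) for c in counts))


def log_mean_exp(values):
    """Numerically stable log of the average of exp(values)."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values) / len(values))


def changepoint_posterior(counts, prior_change=0.5, shape=1.0, rate=1.0):
    """Return (P(change | data), posterior over change locations).

    Entry i of the returned list is the posterior probability that the
    rate changes between counts[i] and counts[i + 1], given a change.
    """
    log_no_change = log_marginal(counts, shape, rate)
    log_split = [log_marginal(counts[:t], shape, rate)
                 + log_marginal(counts[t:], shape, rate)
                 for t in range(1, len(counts))]
    # Uniform prior over locations: average the split likelihoods.
    log_change = log_mean_exp(log_split)
    log_odds = (math.log(prior_change) - math.log(1.0 - prior_change)
                + log_change - log_no_change)
    # Stable logistic transform of the posterior log odds.
    if log_odds >= 0:
        prob_change = 1.0 / (1.0 + math.exp(-log_odds))
    else:
        prob_change = math.exp(log_odds) / (1.0 + math.exp(log_odds))
    peak = max(log_split)
    weights = [math.exp(v - peak) for v in log_split]
    norm = sum(weights)
    return prob_change, [w / norm for w in weights]


# Hypothetical counts per text: low rate at first, higher rate later.
counts = [2, 1, 3, 2, 1, 2, 9, 11, 10, 8, 12, 10]
prob_change, location_posterior = changepoint_posterior(counts)
most_likely_split = 1 + location_posterior.index(max(location_posterior))
print(prob_change, most_likely_split)
```

The first returned value is the posterior probability that a change occurred at all, which is one precise answer to the "has a change occurred?" question; the second shows where the change is most plausible.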

**View Euan's presentation and poster.**

Daniel Miles

University of Reading, MMath Mathematics

Supervisor: Katie Yates

*Density-based cluster analysis*

Cluster analysis is the process of partitioning a set of data vectors into disjoint groups (clusters) such that elements within the same cluster are more similar to each other than elements in different clusters. Clustering has a wide range of application areas including Biology, Physics, Computer Science, Social Science and Market Research. There are three main categories of algorithms which can be applied in order to find solutions to data clustering problems: hierarchical, partitioning and density-based. The main focus of this project is to explore density-based clustering methods, and to compare the performance of these algorithms via simulation studies.
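To make the density-based idea concrete, here is a minimal sketch of DBSCAN, one well-known density-based algorithm. The project description does not say which methods were studied, so the choice of DBSCAN, the parameter names `eps` and `min_pts`, and the toy data are all assumptions made for illustration.

```python
import math
from collections import deque

NOISE = -1


def region_query(points, index, eps):
    """Indices of all points within distance eps of points[index]."""
    return [j for j, q in enumerate(points)
            if math.dist(points[index], q) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1)."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned to a cluster or marked as noise
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = deque(neighbours)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # core point: keep expanding
    return labels


# Two tight groups and one isolated point (toy data).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
print(dbscan(points, eps=1.5, min_pts=3))
```

Unlike partitioning methods, this approach does not need the number of clusters in advance, and points in sparse regions are left unassigned as noise.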

**View Daniel's presentation and poster.**

Sarah Oscroft

Newcastle University, MMath Mathematics

Supervisor: Andrew Wright

*Classification in streaming environments*

The aim of a classification model is to predict the class label of a new observation using only historical observations. Traditional classification approaches assume this historical dataset has a fixed size and is drawn from some fixed probability distribution(s). However, in recent years a new paradigm of data stream classification has emerged. In this setting the observations arrive in rapid succession, classifiers are trained sequentially, and the underlying probability distribution is allowed to change over time. These classifiers have applications in areas as diverse as spam email filtering, analysing the sentiment of tweets and high-frequency finance. This project will investigate how models can be used to produce streaming versions of classifiers.
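The following sketch shows one simple way a classifier can be trained sequentially: logistic regression updated by a single stochastic gradient step per observation. It is an illustrative assumption, not the approach taken in the project; the class name, learning rate and toy stream are hypothetical.

```python
import math


class OnlineLogisticClassifier:
    """Binary logistic regression trained one observation at a time."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, x):
        """Estimated probability that x belongs to class 1."""
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, y):
        """One gradient step on the log-loss for a single (x, y) pair."""
        error = y - self.predict_proba(x)
        step = self.learning_rate * error
        self.weights = [w + step * xi for w, xi in zip(self.weights, x)]
        self.bias += step


model = OnlineLogisticClassifier(n_features=1)

# Phase 1: positive inputs belong to class 1.
for _ in range(100):
    model.update([-1.0], 0)
    model.update([1.0], 1)
before_drift = model.predict_proba([1.0])

# Phase 2: the relationship flips, mimicking a changing distribution.
for _ in range(250):
    model.update([-1.0], 1)
    model.update([1.0], 0)
after_drift = model.predict_proba([1.0])
print(before_drift, after_drift)
```

Because each update uses only the newest observation, the model never needs the full history in memory, and a constant learning rate lets recent data gradually override older patterns.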

**View Sarah's presentation and poster.**

Srshti Putcha

London School of Economics, BSc Mathematics and Economics

Supervisor: Jamie-Leigh Chapman

*Auto-Correlation Estimates of Locally Stationary Time Series*

A time series is a sequence of data points measured at equally spaced time intervals. Examples of time series include FTSE 100 Daily Returns and the total annual rainfall in London, UK. Often we assume that such series are second-order stationary, in other words, that statistical properties of the time series such as the autocorrelation remain constant over time. However, the reality is that many time series are not second-order stationary and therefore it is not appropriate to model them using such methods. Instead we must consider time-varying equivalents of the autocorrelation or autocovariance. One method that analysts use to turn the regular autocorrelation function into a time-varying quantity is to apply rolling windows to the data. Unfortunately, this can give quite different answers depending on the chosen window length and on where the window sits within the time series. This project will explore alternative methods of estimating a time-varying autocorrelation function in order to overcome these problems.
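The rolling-window approach can be sketched in a few lines. The example below computes the sample lag-1 autocorrelation inside each window and repeats the calculation for two window lengths, so the sensitivity to that choice can be inspected directly. The function names and the toy series are assumptions made for this illustration.

```python
def lag_autocorrelation(segment, lag=1):
    """Sample autocorrelation of one segment at the given lag (lag >= 1)."""
    mean = sum(segment) / len(segment)
    centred = [value - mean for value in segment]
    denominator = sum(c * c for c in centred)
    if denominator == 0.0:
        return float("nan")  # constant segment: autocorrelation undefined
    numerator = sum(a * b for a, b in zip(centred, centred[lag:]))
    return numerator / denominator


def rolling_autocorrelation(series, window, lag=1):
    """Autocorrelation estimate for every window of the given length."""
    n_windows = len(series) - window + 1
    return [lag_autocorrelation(series[start:start + window], lag)
            for start in range(n_windows)]


# Toy series: a smooth trend followed by a rapidly alternating stretch.
series = list(range(8)) + [1, -1] * 4

short_windows = rolling_autocorrelation(series, window=4)
long_windows = rolling_autocorrelation(series, window=8)
print(short_windows)
print(long_windows)
```

Comparing the two printed lists shows how the estimate at a given position depends on the window length: short windows react quickly but are noisy, while long windows blend the two regimes together.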

**View Srshti's presentation and poster.**

Sam Tickle

University of Cambridge, BSc Mathematics Tripos

Supervisor: Elena Zanini

*Regression, curve fitting and optimisation algorithms*

The underlying strategy for most statistical modelling is to find parameter values that best describe the fit of the model to the data. This requires optimising an objective function, typically by minimising the difference between the model and the observations. When analytical solutions to these optimisation problems are unavailable, statisticians often rely on numerical optimisation routines to perform this fit, trusting that this will produce stable estimates of the parameters. Firstly, some issues may arise in the choice of the best algorithm given the characteristics of the problem at hand. Secondly, the algorithm considered may not actually perform well, and needs to be understood and adapted to work better on the model considered. This project will investigate different numerical optimisation algorithms used in statistical inference and curve fitting, and how to overcome some of the problems associated with these types of algorithms.

**View Sam's presentation and poster.**

Zak Varty

Lancaster University, MSci Mathematics

Supervisor: Helen Barnett

*Pharmacokinetic Modelling*

In medical research, in both pre-clinical and clinical trials, the objective is to learn about the behaviour and effect of potential new drugs in the body. This breaks down into two categories: how the drug affects the body (pharmacodynamics) and how the body affects the drug (pharmacokinetics). This application-driven project focuses on pharmacokinetic modelling, which involves modelling the concentration of a compound in the blood over time. The aim of the project is to apply statistical modelling techniques to real data in order to obtain an understanding of the role of pharmacokinetics in the drug development process.

**View Zak's presentation and poster.**