# Interns 2016

Below you can find details of the summer 2016 interns, including a description of each intern's research project.

Stefanos Bennett

University of Cambridge, BA Mathematics

Supervisor: Stephen Page

*Regression with Dependencies and Non-Gaussian Noise*

The linear model is a widely used tool in regression analysis. Linear regression models are most commonly fitted using the least squares approach, which is both conceptually and computationally simple. A frequently made assumption in linear least squares regression is that the error terms between the observed responses and the corresponding expected values are independent and identically distributed normal random variables. This assumption greatly simplifies the matter of obtaining confidence intervals for the unknown parameters of our model. However, whether this is a sound assumption depends on the size and nature of the particular dataset under consideration. This project will investigate the case when the assumption is not satisfied. Various techniques for obtaining confidence sets will be examined and compared to the sets obtained via normal approximation. The effects of different possible violations of the Gaussian assumption on the constructed confidence sets will be investigated.
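As an illustration only, one technique that does not rely on the Gaussian assumption is the pairs bootstrap. The sketch below is not taken from the project; the function names, the number of resamples and the choice of a percentile interval are all assumptions made for the example.

```python
import random


def fit_slope(xs, ys):
    """Least squares slope for a simple linear regression of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def bootstrap_slope_interval(xs, ys, n_resamples=2000, level=0.95, seed=0):
    """Percentile interval for the slope from a pairs bootstrap.

    Hypothetical helper: (x, y) pairs are resampled with replacement,
    the slope is refitted each time, and the empirical quantiles of the
    refitted slopes are returned as (lower, upper).
    """
    rng = random.Random(seed)
    n = len(xs)
    slopes = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        slopes.append(fit_slope([xs[i] for i in idx], [ys[i] for i in idx]))
    slopes.sort()
    lower = slopes[int((1 - level) / 2 * n_resamples)]
    upper = slopes[int((1 + level) / 2 * n_resamples) - 1]
    return lower, upper
```

An interval of this kind could then be compared with the one obtained from the usual normal approximation on data with non-Gaussian noise.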

**View Stefanos' presentation and poster.**

Matthew Bold

University of Birmingham, MSci Mathematics

Supervisor: Lucy Morgan

*Input Uncertainty in Simulation Models*

Simulation uses mathematical modelling to mimic real world systems which cannot be tested in reality, perhaps due to time, cost or safety constraints. Information gained by running the simulation can then be used to make decisions about the real world system. For example, retailers want to ensure they have enough servers to prevent customers from having to queue for long periods of time. A simulation model can be used to understand how the queue behaves and to make decisions about how many servers are needed for each shift in order to keep the queue length below a certain level. The inputs in simulation models are usually approximated by observing real world data; for example, observing the number of customers that are served in a shop over a period of time. Input uncertainty arises from the fact that we only have a finite amount of real world data, and therefore cannot be certain that the values of the input parameters being used to drive the simulation are the true values. This project aims to quantify the input uncertainty in a queueing simulation model.

**View Matt's presentation and poster.**

Bronwen Edge

Heriot-Watt University, BSc Mathematics

Supervisor: Emma Stubington

*How good is Lancaster University's Mathematics Department? – An investigation using Data Envelopment Analysis*

Each year university league tables are released, but many are based on different criteria and give slightly different results. We are interested in testing the efficiency and productivity of mathematics departments across the country. Because we are considering multiple inputs and outputs (student satisfaction, entry requirements, academic and career attainment, the cost of university, etc.), it is difficult to make direct comparisons between institutions. We therefore need to use a management science method, Data Envelopment Analysis (DEA), which can cope with lots of constraints. What I am finding particularly interesting is the additional questions that arise from examining the data and implementing this approach, for example: Should universities that produce high numbers of good degrees be considered the best? Are some students not reaching their potential and being let down by their institution, given that they entered university having met extremely high entry requirements? Are some universities awarding an unrepresentative number of good degrees considering their place in current league tables, or is the data just extremely biased, with a small sample size? Should all universities be charging the same fees, given that the career opportunities after some of them are significantly fewer? Is university location skewing the career prospects of students, whilst not taking into consideration the living costs and average salary of non-graduates in some locations? As my project advances, I have realised that what seemed like a simple linear programming problem evolves into a complex social and economic issue, which questions the real cost to students when choosing which university is best for them.

**View Bronwen's presentation and poster.**

Thomas Grundy

Lancaster University, BSc Mathematics with Statistics

Supervisor: Oliver Hatfield

*Detecting Match-Fixing in Tennis*

In January 2016, tennis was hit by allegations of widespread match fixing, prompted by the release of secret documents from reviews into tennis' integrity. The documents detailed widespread accusations of corruption within the sport. The aim of the project is to create simulations of tennis matches and explore sudden changes in performance, which could be linked to match fixing, using simple change point methods. Features such as dependence and the importance of critical points will also be taken into account to create accurate simulations. In addition, the current rating system within tennis only takes into consideration the previous year's results and takes no account of the strength of opponents. A further aim of the project is to create a rating system based around the Elo system with improvements.
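For readers unfamiliar with Elo ratings, the sketch below shows the standard Elo update after a single match. It is only the textbook formula, not the improved system developed in the project; the function names and the K-factor of 32 are assumptions chosen for illustration.

```python
def expected_score(rating_a, rating_b):
    """Probability that player A beats player B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_ratings(rating_a, rating_b, a_won, k=32.0):
    """Return the new (rating_a, rating_b) after one match.

    The K-factor k controls how quickly ratings react to a result;
    32 is an assumed value, not one taken from the project.
    """
    score_a = 1.0 if a_won else 0.0
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # Whatever A gains, B loses, so the total rating is conserved.
    return rating_a + delta, rating_b - delta
```

Because the update depends on the expected score, beating a much stronger opponent earns more points than beating a weaker one, which is how the system accounts for the strength of opponents.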

**View Thomas' presentation and poster.**

Ben Miller

Lancaster University, BSc Mathematics

Supervisor: Aaron Lowther

*Detecting Unwanted Variation in Time Series*

A statistical outlier in a set of data is defined to be “an observation (or subset of observations) which appears to be inconsistent with the remainder of that set of data”. In the context of time series, examples of outliers may include the number of complaints received by BT after a power outage, or the increase in supermarket sales during the days leading up to Christmas. It is important that we are able to detect these outliers as they may have a significant impact on the model selected to fit the data, the parameter estimates for the model, and consequently, on any forecasts made from the model. This project will look into methods and algorithms that are able to automatically detect outliers in time series.
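As a very simple illustration of automatic detection, the sketch below flags observations whose distance from the median is large relative to the median absolute deviation (MAD). It is not one of the methods studied in the project and it ignores the time ordering of the data; the function name and the threshold of 3.5 are assumptions.

```python
from statistics import median


def flag_outliers(series, threshold=3.5):
    """Return the indices of points with a large modified z-score.

    The modified z-score is 0.6745 * |x - median| / MAD. The default
    threshold of 3.5 is a commonly quoted rule of thumb, assumed here.
    """
    med = median(series)
    mad = median([abs(x - med) for x in series])
    if mad == 0:
        # No spread to measure against, so nothing can be flagged.
        return []
    return [
        i for i, x in enumerate(series)
        if 0.6745 * abs(x - med) / mad > threshold
    ]
```

For a series with trend or seasonality, a check like this would more sensibly be applied to the residuals of a fitted model than to the raw observations.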

**View Ben's presentation and poster.**

Henry Moss

University of Cambridge, Mathematics

Supervisor: Emma Simpson

*Assessing Dependence in Extreme Values*

Extreme value theory models the maxima (minima) of random variables. By their very nature, extreme values occur infrequently and so are hard to model. A robust framework already exists, with block-maxima and threshold-based approaches providing parametric distributions for the maxima. Known as the Generalised Extreme Value (GEV) and Generalised Pareto (GP) distributions, these allow us to estimate the maximum value that we would expect to see over n years. My project looks into the bivariate case, where our variables have extremes that either occur simultaneously (Asymptotic Dependence) or independently (Asymptotic Independence). Several statistical measures of this behaviour already exist; however, it is hard to obtain reliable estimates of their values. I am looking at developing an alternative method to simultaneously estimate two of these measures, with the hope of finding some synergies.

**View Henry's presentation and poster.**

Emma Oldfield

University of Sheffield, BSc Mathematics

Supervisor: Ciara Pike-Burke

*Improving question selection in education software*

With the advance of technology in education, it is becoming more possible to personalise education software, providing students with questions tailored to their individual learning styles and abilities. The data gathered from the students' previous interactions with the education software can be used to simulate students' responses to future data. This enables us to model student performance. The main aim will be to investigate whether Bayesian methods can provide a more accurate prediction of student performance than frequentist methods. The Bayesian approach will use Markov chain Monte Carlo methods, in particular Random Walk Metropolis. The models will be used to predict whether students would pass an exam of particular questions.
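To give a flavour of Random Walk Metropolis, the sketch below samples from the posterior distribution of a single student's probability of answering a question correctly. The model (a uniform prior with a binomial likelihood), the function names and the tuning values are assumptions made for illustration and are much simpler than anything used in the project.

```python
import math
import random


def log_posterior(theta, correct, attempts):
    """Unnormalised log posterior for a success probability theta.

    Assumes a uniform prior on (0, 1) and a binomial likelihood.
    """
    if not 0.0 < theta < 1.0:
        return -math.inf
    return correct * math.log(theta) + (attempts - correct) * math.log(1.0 - theta)


def random_walk_metropolis(correct, attempts, n_samples=20000, step=0.2, seed=0):
    """Draw samples of theta using Gaussian random walk proposals."""
    rng = random.Random(seed)
    theta = 0.5
    current = log_posterior(theta, correct, attempts)
    samples = []
    for _ in range(n_samples):
        proposal = theta + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, correct, attempts)
        # Accept with probability min(1, posterior ratio); proposals
        # outside (0, 1) have log posterior -inf and are always rejected.
        if candidate >= current or rng.random() < math.exp(candidate - current):
            theta, current = proposal, candidate
        samples.append(theta)
    return samples
```

For this toy model the posterior is known exactly (a Beta distribution), so the average of the samples after an initial burn-in can be checked against the exact posterior mean, (correct + 1) / (attempts + 2).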

**View Emma's presentation and poster.**

Anja Stein

University of Edinburgh, MMath

Supervisor: James Grant

*Assigning Drones in Military Search*

Drone technology is fast becoming a vital component of military operations. Unmanned Aerial Vehicles (UAVs), as they are known within the military, can perform a variety of tasks remotely, making things both more efficient and safer for military personnel. This project revolves around optimising the UAV Search Problem by maximising the number of events detected within a given border by a fleet of UAVs equipped with cameras. The UAVs aim to detect the locations of events of some sort occurring on the border (one example may be crossings of the border). Each UAV is to be assigned a specific subsection of the border to patrol, with the assumption being that the larger its subsection is, the less likely it will be to actually detect an event. Some UAVs may be naturally better at detecting events than others (because of better cameras etc.) and some UAVs may be better equipped to detect events in certain parts of the boundary (e.g. different types of terrain).

**View Anja's presentation and poster.**

Georgios Topaloglou

University of Cambridge, BA Mathematical Tripos

Supervisor: Daniel Waller

*Univariate methods for time series forecasting*

Time series are often grouped in a hierarchical structure. For example, the time series for the total number of tourists visiting a country may be split into more time series according to the purpose of travel, and each of these time series may in turn be split into more time series according to the length of stay, thus creating a tree-like hierarchical structure. The issue of forecasting hierarchical series in a way that allows for a similar hierarchical disaggregation of the forecasts is very important. This project will combine two methods that have recently been proposed, optimal combination and temporal aggregation. It will then test the accuracy of this new method against that of optimal combination and other standard techniques such as bottom-up and top-down forecasting.
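The bottom-up technique mentioned above can be sketched in a few lines: forecasts are produced only for the most disaggregated series and then summed up the tree. The hierarchy, the series names and the forecast values below are invented for illustration and do not come from the project.

```python
def bottom_up(hierarchy, leaf_forecasts, node):
    """Forecast for `node`, obtained by summing the forecasts of its leaves.

    `hierarchy` maps each parent series to a list of its child series;
    a series with no entry in `hierarchy` is treated as a leaf.
    """
    children = hierarchy.get(node, [])
    if not children:
        return leaf_forecasts[node]
    return sum(bottom_up(hierarchy, leaf_forecasts, child) for child in children)


# Hypothetical tourism hierarchy: total -> purpose of travel -> length of stay.
hierarchy = {
    "total": ["holiday", "business"],
    "holiday": ["holiday_short", "holiday_long"],
    "business": ["business_short", "business_long"],
}

# Made-up forecasts for the bottom-level series only.
leaf_forecasts = {
    "holiday_short": 120,
    "holiday_long": 80,
    "business_short": 50,
    "business_long": 30,
}
```

Forecasts built this way add up across the hierarchy by construction, which is the kind of consistency the paragraph above describes; top-down and optimal combination methods reach the same consistency by different routes.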

**View Georgios' presentation and poster.**

Alan Wise

University of Edinburgh, Mathematics BSc (Hons)

Supervisor: Rebecca Wilson

*Detecting Changes in Multivariate Time Series*

Change point detection for univariate time series has been widely covered, but the increasing availability of multivariate data has motivated the study of multivariate detection methods. Time series data of a multivariate flavour can be found in finance, health monitoring, signal processing, bioinformatics, and credit card fraud detection. In my project I explore a few methods to detect change points in multivariate time series data. I also discuss the drawbacks of these methods and suggest ways in which these drawbacks could be overcome.

**View Alan's presentation and poster.**