class: center, middle, inverse, title-slide # Actuarial Applications of Deep Learning ## Loss Reserving and Beyond ### Joe Fang, Nicole Foster, and Kevin Kuo ### May 2018 --- # Introduction Your hosts and their drinks of choice. ![](figs/headshots.png) .content-box-gray[Can you spot the actuary?] --- # Agenda - Introduction to Deep Learning - Hands-on Keras Demo - Loss Reserving Case Study - Q+A and Open Discussion --- # What is Machine Learning? .full-width[.content-box-blue[A field of computer science that gives computers the ability to "learn" from data, without being explicitly programmed]] ![](figs/machine-learning-features.png) --- # Why Deep Learning? - Subset of machine learning - Often uses a neural network to simulate how the human brain learns - Often performs better than traditional machine learning techniques on large datasets <img src="figs/deep-learning.png" width="60%" style="display: block; margin: auto;" /> --- # What is a Neural Network? ![](figs/neural-network.png) --- # How does a neural network learn? - Loss Functions - Gradient Descent ![](figs/gradient-descent.png) --- # What software is available for deep learning? Approximately... - Higher-level APIs - Keras, PyTorch (+Caffe2), TensorFlow Estimator, ... - Lower-level libraries to power computations - TensorFlow, PyTorch, Theano, CNTK, MXNet, ... <br /> .content-box-blue[R interfaces available!] --- # Intro to Keras .full-width[.content-box-red[Keras<sup>*</sup> is a high-level neural networks API developed with a focus on enabling fast experimentation]] .footnote[[*] [https://keras.rstudio.com/](https://keras.rstudio.com/)] --- # MNIST Example ![](figs/MNIST.png) Input your number at: [http://colorado.rstudio.com:3939/classroom-assignment/](http://colorado.rstudio.com:3939/classroom-assignment/) --- class: inverse, center, middle # Loss reserving case study --- # "Claims Liabilities Estimation" .content-box-blue[Basically, figure out what we gotta pay in the future due to claims.]
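--- # From payments to a triangle

The triangle on the next slide is just a rollup of individual payments. As a minimal sketch (hypothetical payment records, plain Python rather than the deck's R), incremental paid losses aggregate by accident year and development lag like this:

```python
from collections import defaultdict

# Hypothetical payment records: (accident_year, calendar_year, paid_amount),
# matching the first few cells of the example triangle.
payments = [
    (1988, 1988, 133), (1988, 1989, 200), (1988, 1990, 98),
    (1989, 1989, 934), (1989, 1990, 812),
    (1990, 1990, 2030),
]

# Accumulate into an incremental paid triangle keyed by
# (accident_year, development_lag); lag 1 is the accident year itself.
triangle = defaultdict(float)
for ay, cy, paid in payments:
    lag = cy - ay + 1
    triangle[(ay, lag)] += paid

print(triangle[(1988, 2)])  # 200.0
```

Each `(accident_year, lag)` key corresponds to one cell of the triangle on the next slide.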
--- # Example triangle ``` ## # A tibble: 6 x 7 ## accident_year `1` `2` `3` `4` `5` `6` ## <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> ## 1 1988 133 200 98 139 45 0 ## 2 1989 934 812 619 214 184 NA ## 3 1990 2030 2834 2016 1207 NA NA ## 4 1991 4537 6990 3596 NA NA NA ## 5 1992 7564 8497 NA NA NA NA ## 6 1993 8343 NA NA NA NA NA ``` --- # Treat this as a predictive modeling problem Each cell of the triangle is a row in the modeling dataset. We just need to come up with some predictors ``` ## # A tibble: 8 x 4 ## accident_year development_lag incremental_paid_loss predictors ## <int> <int> <dbl> <chr> ## 1 1988 1 133 ?!?!?!?!?!? ## 2 1989 1 934 ?!?!?!?!?!? ## 3 1990 1 2030 ?!?!?!?!?!? ## 4 1991 1 4537 ?!?!?!?!?!? ## 5 1992 1 7564 ?!?!?!?!?!? ## 6 1993 1 8343 ?!?!?!?!?!? ## 7 1988 2 200 ?!?!?!?!?!? ## 8 1989 2 812 ?!?!?!?!?!? ``` Then we can do something like ```r crazy_AI_algorithm(incremental_paid_loss ~ predictors, data = data) ``` --- # Introducing DeepTriangle Let's try to apply neural networks on some real reserving data. <br /> .content-box-yellow[We're gonna call it **DeepTriangle**<sup>*</sup> because it sounds cool.] .footnote[[*] [https://arxiv.org/abs/1804.09253](https://arxiv.org/abs/1804.09253)] --- # Data Schedule P data from [http://www.casact.org/research/index.cfm?fa=loss_reserves_data](http://www.casact.org/research/index.cfm?fa=loss_reserves_data). .full-width[.content-box-green[10 accident years (1988-1997) of paid and incurred losses, with 10 development lags, from a bunch of companies and lines of business.]] --- # Response and predictors .full-width[.content-box-purple[Let's talk about our response variable and predictors!]] --- # Response - **Response: incremental paid losses and total claims outstanding** We're gonna predict both paid loss and claims o/s in the same model, ain't that cool?! --- # Predictors - Response: incremental paid losses and total claims outstanding 👍 - **Predictors:** --- # Predictors Note that... 
there's really not much we can use in aggregated data. We also have to follow this rule: > The information used to derive the predictors for a cell must be available before the calendar period associated with the cell. I.e. we're not cheating and looking into the future. --- # Predictors - Response: incremental paid losses and total claims outstanding 👍 - **Predictors:** - **Time series of paid losses and case reserves** --- # Predictors .content-box-green[Let's see what we mean by "time series of paid losses".] --- # Predictors Basically, for each cell in the triangle, we take the experience for the AY up to the previous calendar year. For example, for AY 1988 we have: ``` ## # A tibble: 6 x 3 ## development_lag incremental_paid_loss paid_history ## <int> <dbl> <chr> ## 1 1 133 "" ## 2 2 200 133 ## 3 3 98 133, 200 ## 4 4 139 133, 200, 98 ## 5 5 45 133, 200, 98, 139 ## 6 6 0 133, 200, 98, 139, 45 ``` --- # Predictors - Response: incremental paid losses and total claims outstanding 👍 - **Predictors:** - Time series of paid losses and case reserves 👍 - **Company (because we're using data from all companies simultaneously)** --- # Predictors > **NOTE (5/23/18)** -- we don't actually do this; rather, we input a scalar corresponding to the index of the company into the neural network, and Keras/TF uses a lookup table instead of one-hot encoding the input on the fly. Company code is one-hot encoded, e.g. the third company in a collection of `\(20\)` companies would be represented as ``` ## [1] 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ``` Super easy! --- # Predictors - Response: incremental paid losses and total claims outstanding 👍 - Predictors: - Time series of paid losses and case reserves along accident year 👍 - Company (because we're using data from all companies simultaneously) 👍 Now that we've gone through the response and predictors, let's talk about the neural network itself! --- # Architecture Looks fancy, but it's just a neural network! 
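In code, "just a neural network" means composing a few simple functions. A minimal sketch in plain Python (made-up weights for illustration, not the actual DeepTriangle layers):

```python
# A dense layer: matrix-vector product plus bias, then an activation.
def dense(weights, bias, activation):
    def layer(x):
        z = [sum(w * xi for w, xi in zip(row, x)) + b
             for row, b in zip(weights, bias)]
        return [activation(v) for v in z]
    return layer

relu = lambda v: max(0.0, v)

# Two made-up layers; the "network" is just layer2(layer1(x)).
layer1 = dense([[0.5, -1.0], [1.0, 1.0]], [0.0, 0.1], relu)
layer2 = dense([[1.0, 2.0]], [0.0], lambda v: v)

print(layer2(layer1([1.0, 2.0])))  # [6.2]
```

The fancy diagram below adds embeddings and recurrent layers, but every piece is still a function applied to the previous layer's output.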
<img src="figs/nn1.png" width="30%" style="display: block; margin: auto;" /> --- # Embedding layer > **NOTE (5/23/18)** -- the diagram here is incorrect as we're not one-hot encoding. Instead, we're using a lookup table. The intuition (and results) are the same, though. Dimensionality reduction! <img src="figs/embedding_paper.png" width="60%" style="display: block; margin: auto;" /> For example, company #5 might get mapped to `c(0.4, 1.2, -3.7)`. --- # Neural networks for sequences Just like a vanilla feedforward neural network, except we feed the sequential input... in sequence. <img src="figs/rnn.png" width="70%" style="display: block; margin: auto;" /> --- # Helping RNNs remember **(Don't worry about the details!)** The gated recurrent unit (GRU) is an architecture that helps the network remember stuff from a long time ago. <img src="figs/gru.png" width="70%" style="display: block; margin: auto;" /> --- # Putting it all together Again, we're really just applying a bunch of functions, one after another, to our input data. <img src="figs/nn1.png" width="30%" style="display: block; margin: auto;" /> --- # Some results Sample results from the company with the most data in the dataset... <img src="figs/ppauto-results.png" width="70%" style="display: block; margin: auto;" /> --- # Some results Workers' comp <img src="figs/wkcomp-results.png" width="70%" style="display: block; margin: auto;" /> --- # Benchmarking Results for other methods taken from [http://www.casact.org/pubs/monographs/index.cfm?fa=meyers-monograph01](http://www.casact.org/pubs/monographs/index.cfm?fa=meyers-monograph01). <img src="figs/comparison_table.png" width="70%" style="display: block; margin: auto;" /> --- # Discussion Neural networks aren't too shabby at doing some basic reserving work. <br /> .content-box-purple[But this is just the beginning!] --- # Discussion Future work? - Prediction intervals for reserve variability.
- Claims-level analytics, where we can take into account things like adjusters' notes and images. - Policy-level analytics, towards a holistic approach to pricing + reserving. - Interpretability. --- # Discussion Neural networks are cool (again) and you should give them a shot. <img src="figs/doge1.png" width="60%" style="display: block; margin: auto;" /> --- # Discussion Don't be scared. <img src="figs/doge2.png" width="60%" style="display: block; margin: auto;" /> --- # Discussion Really, we spend years taking all those exams, and exams are hard. <img src="figs/lee-sedol-facepalm.jpg" width="70%" style="display: block; margin: auto;" /> --- # Discussion Unlike many other fields, actuarial work requires tremendous domain expertise. <img src="figs/fda.png" width="70%" style="display: block; margin: auto;" /> --- # Discussion /s Takeaway? There's a lot of hype and noise around AI, but stay informed, lest we fall behind! --- # Q&A
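--- # Appendix: embedding lookup sketch

The embedding layer from earlier amounts to indexing into a trainable lookup table, which gives the same result as multiplying a one-hot vector by a weight matrix. A minimal sketch (randomly initialized, untrained weights; plain Python rather than Keras):

```python
import random

random.seed(0)
n_companies, embedding_dim = 20, 3

# The lookup table Keras/TF would learn: one row per company.
embedding_table = [[random.uniform(-1, 1) for _ in range(embedding_dim)]
                   for _ in range(n_companies)]

def embed(company_index):
    # Same result as one-hot x embedding_table, but just an index lookup.
    return embedding_table[company_index]

vec = embed(4)  # company #5 (zero-based index 4) -> a length-3 vector
```

In training, gradients flow only into the looked-up row, so the rows become learned company representations.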