# Multiple polynomial regression for EnergyPlus Curve Class

Generating EnergyPlus Curves from discrete points.

Dynamic simulations in EnergyPlus often require system performance data that is not readily available, such as the COP of a heat pump as a function of water and air temperature.

Equipment manufacturers publish performance data sheets and work-field diagrams as tables with discrete entries, but EnergyPlus requires multidimensional continuous functions in the form of the Curve family of classes. These classes build continuous functions from the coefficients of their equations.

What we need is a conversion from discrete to continuous data. Mathematically speaking, we need to find the multiple polynomial regression of a set of points in space, in terms of a curve (2d case) or a surface (3d case).

## Manufacturers' data

The following is an example of what a manufacturer may publish:

[Figure: example of manufacturer performance data]

## EnergyPlus requirements

Here are the specifications of the Curve classes we need, from the EnergyPlus Input Output Reference.

### Curve:Quadratic

The equation is the following:

$$y = C_1 + C_2 x + C_3 x^2$$

The EnergyPlus string is the following:

```
Curve:Quadratic,
    WindACCBFFFF,  ! name
    -2.277,        ! Coefficient1 Constant
    5.2114,        ! Coefficient2 x
    -1.9344,       ! Coefficient3 x**2
    0.0,           ! Minimum Value of x
    1.0;           ! Maximum Value of x
```
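To sanity-check a set of coefficients, it helps to evaluate the curve directly. Below is a minimal sketch of a hypothetical helper, `eval_curve_quadratic` (not an EnergyPlus function), that computes $$y = C_1 + C_2 x + C_3 x^2$$. It assumes, based on our reading of the Input Output Reference, that an x outside the Minimum/Maximum Value of x fields is replaced by the nearest limit before evaluation.

```python
def eval_curve_quadratic(coeffs, x, x_min, x_max):
    """Evaluate y = C1 + C2*x + C3*x**2 (hypothetical helper).

    x is first clamped to [x_min, x_max]; this mirrors our reading of how
    EnergyPlus treats the Minimum/Maximum Value of x fields.
    """
    c1, c2, c3 = coeffs
    x = min(max(x, x_min), x_max)  # keep x inside the declared range
    return c1 + c2 * x + c3 * x ** 2


# Usage: the WindACCBFFFF coefficients from the IDF object above
wind_ac_cbf_fff = (-2.277, 5.2114, -1.9344)
print(eval_curve_quadratic(wind_ac_cbf_fff, 1.0, 0.0, 1.0))
```

With the WindACCBFFFF coefficients the three terms add up to 1.0 at x = 1.0, so the call prints 1.0 up to floating-point rounding.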

### Curve:Biquadratic

The equation is the following:

$$z = C_1 + C_2 x + C_3 x^2 + C_4 y + C_5 y^2 + C_6 x y$$

The EnergyPlus string is the following:

```
Curve:Biquadratic,
    WindACCoolCapFT,  ! name
    0.942587793,      ! Coefficient1 Constant
    0.009543347,      ! Coefficient2 x
    0.000683770,      ! Coefficient3 x**2
    -0.011042676,     ! Coefficient4 y
    0.000005249,      ! Coefficient5 y**2
    -0.000009720,     ! Coefficient6 x*y
    15., 22.,         ! min and max of first independent variable
    29., 47.;         ! min and max of second independent variable
```
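Since the end goal is an IDF object like the one above, a small formatter can turn fitted coefficients into text ready to paste into an input file. The function `idf_curve_biquadratic` below is a hypothetical helper. It writes only the ten numeric fields shown above and leaves out the optional fields (such as output limits and unit types) that Curve:Biquadratic also accepts. Numbers are written with up to 10 significant digits, so very small values come out in exponent notation (e.g. `5.249e-06`), which we assume the EnergyPlus input parser accepts.

```python
def idf_curve_biquadratic(name, coeffs, x_limits, y_limits):
    """Format a Curve:Biquadratic object as IDF text (hypothetical helper).

    coeffs:   (C1, ..., C6) for constant, x, x**2, y, y**2, x*y
    x_limits: (minimum, maximum) of the first independent variable
    y_limits: (minimum, maximum) of the second independent variable
    """
    if len(coeffs) != 6:
        raise ValueError("Curve:Biquadratic needs exactly 6 coefficients")
    labels = ["Coefficient1 Constant", "Coefficient2 x", "Coefficient3 x**2",
              "Coefficient4 y", "Coefficient5 y**2", "Coefficient6 x*y",
              "Minimum Value of x", "Maximum Value of x",
              "Minimum Value of y", "Maximum Value of y"]
    values = list(coeffs) + list(x_limits) + list(y_limits)
    lines = ["Curve:Biquadratic,", "    {},  ! name".format(name)]
    for i, (value, label) in enumerate(zip(values, labels)):
        # the last field closes the object with a semicolon
        sep = ";" if i == len(values) - 1 else ","
        lines.append("    {:.10g}{}  ! {}".format(value, sep, label))
    return "\n".join(lines)


# Usage: rebuild the WindACCoolCapFT object shown above
print(idf_curve_biquadratic(
    "WindACCoolCapFT",
    (0.942587793, 0.009543347, 0.000683770,
     -0.011042676, 0.000005249, -0.000009720),
    (15.0, 22.0), (29.0, 47.0)))
```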

## Multiple linear regression (Math alert!)

The problem we want to solve is to find a 2d curve that approximates a set of points in 2d space. In particular, we want a second-order equation (aka quadratic).

Moreover, we want to find a 3d surface that approximates a set of points in 3d space. In particular, we want a second-order equation (aka biquadratic).

To solve the problem we’ll exploit some linear algebra, using the polynomial regression model. This method works for both of our cases, and more:

$$y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \cdots + \beta_m x_i^m + \varepsilon_i \quad (i = 1,2,\dots,n)$$

where n is the number of input points and m is the order of the equation we want to use (2 in our case).

The previous equations can be expressed in matrix form in terms of a design matrix $$\mathbf{X}$$, a response vector $$\vec y$$, a parameter vector $$\vec\beta$$ and a vector $$\vec\varepsilon$$ of random errors. The $$\beta$$ vector holds the same coefficients as the EnergyPlus curves, just in a different notation: $$\beta_0, \beta_1, \beta_2$$ correspond to $$C_1, C_2, C_3$$. For the i-th data sample, the i-th row of $$\mathbf{X}$$ contains the powers of $$x_i$$ and the i-th entry of $$\vec y$$ contains $$y_i$$.

We can write:

$$\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^m \\ 1 & x_2 & x_2^2 & \cdots & x_2^m \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_n & x_n^2 & \cdots & x_n^m \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_m \end{bmatrix} + \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}$$
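Building $$\mathbf{X}$$ from the data is mechanical: row i holds the powers $$x_i^0, x_i^1, \dots, x_i^m$$. Here is a minimal sketch in plain Python with no NumPy, since NumPy is generally not available under IronPython. The name `design_matrix` is hypothetical.

```python
def design_matrix(xs, m):
    """Return the n x (m + 1) polynomial design matrix for the samples xs.

    Row i is [1, x_i, x_i**2, ..., x_i**m].
    """
    if len(xs) <= m:
        # the least-squares problem needs more points than the order (m < n)
        raise ValueError("need more than m data points")
    return [[x ** k for k in range(m + 1)] for x in xs]


# Usage: quadratic (m = 2) design matrix for three samples
print(design_matrix([1.0, 2.0, 3.0], 2))
```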

which when using pure matrix notation is written as

$$\vec y = \mathbf{X} \vec \beta + \vec \varepsilon$$

The vector of estimated polynomial regression coefficients (using ordinary least squares estimation) is

$$\hat{\vec \beta} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \vec y$$

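assuming $$m < n$$, which is required for $$\mathbf{X}^T \mathbf{X}$$ to be invertible. In the single-variable case $$\mathbf{X}$$ is a Vandermonde matrix, so the invertibility condition is guaranteed to hold if all the $$x_i$$ values are distinct. This is the unique least-squares solution. The biquadratic design matrix is not a Vandermonde matrix, but a full table with at least three distinct x values and three distinct y values is enough to make $$\mathbf{X}^T \mathbf{X}$$ invertible.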
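The least-squares estimate can be computed in plain Python. Rather than forming $$(\mathbf{X}^T \mathbf{X})^{-1}$$ explicitly, the sketch below solves the equivalent linear system $$(\mathbf{X}^T \mathbf{X})\hat{\vec\beta} = \mathbf{X}^T \vec y$$ by Gauss-Jordan elimination with partial pivoting, which gives the same $$\hat{\vec\beta}$$ without computing the inverse. All function names are hypothetical, and the singularity tolerance `1e-12` is an arbitrary choice.

```python
def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    """Multiply a (p x q) by b (q x r), both stored as lists of rows."""
    b_cols = transpose(b)
    return [[sum(ai * bj for ai, bj in zip(row, col)) for col in b_cols]
            for row in a]


def solve(a, b, tol=1e-12):
    """Solve the square system a x = b by Gauss-Jordan elimination
    with partial pivoting."""
    n = len(a)
    # augmented matrix [a | b], converted to floats
    m = [[float(v) for v in row] + [float(bi)] for row, bi in zip(a, b)]
    for col in range(n):
        # pick the row with the largest pivot to limit rounding errors
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            raise ValueError("X^T X is singular: check the input points")
        m[col], m[pivot] = m[pivot], m[col]
        # eliminate this column from every other row
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(X, y):
    """Ordinary least squares: solve (X^T X) beta = X^T y for beta."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(xi * yi for xi, yi in zip(col, y)) for col in Xt]
    return solve(XtX, Xty)


# Usage: points lying exactly on y = 1 + 2x + 3x^2
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
X = [[1.0, x, x * x] for x in xs]
y = [1.0 + 2.0 * x + 3.0 * x * x for x in xs]
print(ols(X, y))  # approximately [1.0, 2.0, 3.0]
```

Forming $$\mathbf{X}^T \mathbf{X}$$ squares the condition number of the problem, so with large input values (such as temperatures squared) some precision can be lost; a QR- or SVD-based solver, such as NumPy's `lstsq`, is more robust when it is available.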

### Examples

So, for example, the quadratic problem for n points would be:

$$\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} 1 & x_1 & x_1^2 \\ 1 & x_2 & x_2^2 \\ \vdots & \vdots & \vdots \\ 1 & x_n & x_n^2 \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{bmatrix}$$

The biquadratic problem, with $$z$$ as the dependent variable, would be:

$$\begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_n \end{bmatrix} = \begin{bmatrix} 1 & x_1 & x_1^2 & y_1 & y_1^2 & x_1 y_1 \\ 1 & x_2 & x_2^2 & y_2 & y_2^2 & x_2 y_2 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ 1 & x_n & x_n^2 & y_n & y_n^2 & x_n y_n \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \\ \beta_4 \\ \beta_5 \end{bmatrix}$$
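For the biquadratic case only the rows of the design matrix change. The hypothetical helpers below build those rows in the column order of Curve:Biquadratic, so $$\beta_0, \dots, \beta_5$$ map directly to Coefficient1 to Coefficient6. `grid` assumes the manufacturer table lists every combination of x and y values, which may not hold for every data sheet. The resulting matrix can be passed to any least-squares solver, such as one based on the normal equations above.

```python
def biquadratic_row(x, y):
    """One design-matrix row in Curve:Biquadratic order:
    constant, x, x**2, y, y**2, x*y."""
    return [1.0, x, x * x, y, y * y, x * y]


def grid(xs, ys):
    """Every (x, y) combination of a table with values xs and ys
    (hypothetical table layout)."""
    return [(x, y) for x in xs for y in ys]


def biquadratic_design_matrix(points):
    """Stack one row per (x, y) sample point."""
    if len(points) < 6:
        # six unknown coefficients need at least six points
        raise ValueError("need at least 6 points for a biquadratic fit")
    return [biquadratic_row(x, y) for x, y in points]


# Usage: a 3 x 3 table (9 points, 6 unknowns) spanning the WindACCoolCapFT ranges
X = biquadratic_design_matrix(grid([15.0, 18.5, 22.0], [29.0, 38.0, 47.0]))
print(len(X), len(X[0]))  # 9 6
```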

### Least square error

The quality of the least-squares fit can be measured with the coefficient of determination $$R^2$$, which compares the residual errors of the fit with the spread of the data. It is defined as

$$R^2 \equiv 1 - {SS_{res} \over SS_{tot}}$$

where

$$SS_{tot} = \sum_i (y_i - \bar y)^2$$

$$SS_{res} = \sum_i (y_i - f_i)^2 = \sum_i \varepsilon_i^2$$

$$SS_{tot}$$ (total sum of squares) indicates how far each $$y_i$$ (the dependent variable of the input data) is from the mean $$\bar y$$ of the dependent variables;

$$SS_{res}$$ (residual sum of squares) indicates how far each $$y_i$$ is from the value $$f_i$$ calculated with the regression function.

The closer $$R^2$$ is to 1, the better the approximation.
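These definitions translate directly into code. A minimal sketch (the name `r_squared` is hypothetical), where `y` holds the observed values and `f` the values predicted by the fitted curve at the same points:

```python
def r_squared(y, f):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot.

    y: observed values, f: values predicted by the regression function.
    """
    if len(y) != len(f):
        raise ValueError("y and f must have the same length")
    y_mean = sum(y) / float(len(y))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)          # total sum of squares
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, f))  # residual sum of squares
    return 1.0 - ss_res / ss_tot


# Usage: one of three predictions is off by 1
print(r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]))  # 0.5
```

If all observed values are equal, $$SS_{tot}$$ is zero and $$R^2$$ is undefined; this sketch then raises `ZeroDivisionError`.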

## GeoGebra

To get a graphical feel for how these coefficients shape the curve or surface, here are two GeoGebra canvases you can play with. Note: the range we actually fit is a small part of the full domain, usually fairly close to the origin, and our coefficients will be close to zero, so the curve or surface will look quite flat.

## Grasshopper (with IronPython)

For the practical implementation of the regression we use IronPython, the Python implementation embedded in Rhino's Grasshopper. An equally valid alternative in standard Python is to use libraries such as NumPy and SciPy, or any other library that supports matrix-vector operations.

The regression is wrapped in two custom Clusters, one for the quadratic case and one for the biquadratic case. Here is the node tree that makes use of them:
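Outside Grasshopper, the NumPy route takes only a few lines. This sketch assumes CPython with NumPy installed; `fit_biquadratic` is a hypothetical name. It builds the design matrix in Curve:Biquadratic column order, solves the least-squares problem with `numpy.linalg.lstsq` (an SVD-based solver, which avoids inverting $$\mathbf{X}^T \mathbf{X}$$) and reports $$R^2$$. The usage data is synthetic: it samples the WindACCoolCapFT curve shown earlier, so a correct fit should recover those coefficients.

```python
import numpy as np


def fit_biquadratic(x, y, z):
    """Least-squares fit of z = C1 + C2*x + C3*x**2 + C4*y + C5*y**2 + C6*x*y.

    Returns (coeffs, r2), with coeffs ordered as Coefficient1..Coefficient6.
    """
    x, y, z = (np.asarray(a, dtype=float) for a in (x, y, z))
    # design matrix, one row per data point, in Curve:Biquadratic order
    X = np.column_stack([np.ones_like(x), x, x**2, y, y**2, x * y])
    coeffs, _, _, _ = np.linalg.lstsq(X, z, rcond=None)
    z_fit = X @ coeffs
    ss_res = np.sum((z - z_fit) ** 2)
    ss_tot = np.sum((z - z.mean()) ** 2)
    return coeffs, 1.0 - ss_res / ss_tot


# Usage with synthetic data (not manufacturer data): sample WindACCoolCapFT
# on a 3 x 3 grid and check that the fit recovers its coefficients.
c_true = np.array([0.942587793, 0.009543347, 0.000683770,
                   -0.011042676, 0.000005249, -0.000009720])
gx, gy = np.meshgrid([15.0, 18.5, 22.0], [29.0, 38.0, 47.0])
x, y = gx.ravel(), gy.ravel()
z = np.column_stack([np.ones_like(x), x, x**2, y, y**2, x * y]) @ c_true
coeffs, r2 = fit_biquadratic(x, y, z)
print(coeffs, r2)
```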

## Useful References

1. EnergyPlus Input Output Reference | https://energyplus.net/sites/all/modules/custom/nrel_custom/pdfs/pdfs_v9.3.0/InputOutputReference.pdf
2. Polynomial regression on Wikipedia | https://en.wikipedia.org/wiki/Polynomial_regression
3. Coefficient of determination (least square error) | https://en.wikipedia.org/wiki/Coefficient_of_determination

18 Aug 2020
Mattia Bressanelli