Introduction

Degree days are useful as a measure of building heating and cooling demands. A degree day is calculated as the difference between the average temperature (the average of the high and low temperature for the day) and a reference temperature (in the US 65°F is used). For example, if the average temperature today is 40°F, that would be 25 heating degree days (HDD). A summer day with an average temperature of 85°F would have 20 cooling degree days (CDD). Degree days are usually well correlated with the amount of energy used to heat or cool a home.

I was interested in obtaining and analyzing degree day data; in particular I wanted to see if there were any noticeable trends over time. Given an overall increase in earth’s average temperature due to climate change, I would hypothesize that there might be an overall increase in CDD and a decrease in HDD.

Changes in heating or cooling degree days would have implications for the amount of energy needed in the future to heat and cool residential or commercial buildings, resulting changes in demand on the electric grid, and implications for related carbon emissions (either for the power grid or from burning fossil fuels to heat buildings).

Data

I obtained heating and cooling degree day data from the U.S. Energy Information Administration for the US. Note these data are weighted by population, to reflect the effect of both temperature and population on energy demands for cooling and heating. You can see details of how the EIA data are calculated here.

Code

suppressPackageStartupMessages(library(tidyverse))
ggplot2::theme_set(theme_grey(base_size = 15))
suppressPackageStartupMessages(library(janitor))
library(broom)
suppressPackageStartupMessages(library(DT))
suppressPackageStartupMessages(library(plotly))

I’ll only use years we have complete data for (1997-2022). Table 1 shows the data after being loaded and cleaned up.

Code

region <- "u_s"

dd <- read_csv(paste0('data/EIA_DegreeDays_Monthly_',region,'.csv'), 
               skip = 4,
               show_col_types = FALSE) |>
  janitor::clean_names() |>
  rename(CDD = paste0('cooling_degree_days_',region,
                      '_cooling_degree_days_cooling_degree_days')) |>
  rename(HDD = paste0('heating_degree_days_',region,
                      '_heating_degree_days_heating_degree_days')) |>
  mutate(date = lubridate::my(month)) |>
  select(-month) |>
  mutate(month = lubridate::month(date)) |>
  mutate(year = lubridate::year(date)) |>
  filter(year > 1996, year < 2023) # keep only complete years

Code

dd |>
  DT::datatable(options = list(pageLength = 5), rownames = FALSE)

Table 1: Table of monthly degree day data for US

I’ll also make a dataframe of the yearly totals (Table 2)

Code

dd_yearly <- dd |>
  filter(year > 1996) |>
  group_by(year) |>
  summarise(HDD = sum(HDD, na.rm = TRUE),
            CDD = sum(CDD, na.rm = TRUE)
            )

dd_yearly |>
  DT::datatable(options = list(pageLength = 5), rownames = FALSE)

Table 2: Table of yearly degree days for the US

Analysis

Heating Degree Days

Figure 1 shows the distribution (using a boxplot) of US heating degree days for each month. Not surprisingly HDD tends to be higher in winter months, although there is a decent amount of variability between years.

HDD Per Month

Code

dd |>
  mutate(month_name = lubridate::month(date, label = TRUE)) |>
  ggplot(aes(month_name, HDD, group = month_name)) +
  geom_boxplot() +
  labs(title = 'Monthly Heating Degree Days for US (1997-2022)',
       x = 'Month',
       y = "Heating Degree Days")

Figure 1: Boxplot of US heating degree days for each month

Trends in HDD

Is there a trend in HDD over time? I would expect that HDD might decrease over time due to climate change and the increase in earth’s average temperature.

Annual

Figure 2 shows a timeseries of the annual total heating degree days in the US, along with a linear regression line that shows a negative trend.

Code

g <- dd_yearly |>
  ggplot(aes(year, HDD)) +
  geom_point(size = 4, alpha = 0.5) +
  geom_smooth(method = 'lm', formula = 'y~x') +
  labs(title = "US Annual Heating Degree Days")


plotly::ggplotly(g)

Figure 2: Timeseries of annual US HDD with linear regression line.

Code

hdd_yearly_fit <- lm(data = dd_yearly, formula = 'HDD ~ year')

broom::tidy(hdd_yearly_fit) |> 
  mutate_if(is.numeric,~round(.x,digits = 3)) |> # round numeric values for table
  DT::datatable()

Table 3: Results of linear regression for annual HDD

There is a fair bit of variability, but looking at the fit metrics (Table 3) shows that the negative trend in HDD is statistically significant (p-value < 0.05). Annual heating degree days are decreasing at a rate of 13 HDD per year.

Monthly

We have seen that there is a negative trend in annual HDD; what are the trends for individual months? Figure 3 shows timeseries of monthly HDD vs year for winter months, with linear regression lines plotted over them. Visually there appears to be a negative trend for some of the months.

Code

dd |>
  filter(month %in% c(11,12,1,2,3,4)) |>
  mutate(month_name = lubridate::month(date, label = TRUE)) |>
  ggplot(aes(year, HDD, group = month_name)) +
  geom_point(size = 3, alpha = 0.5) +
  geom_smooth(method = 'lm', formula = 'y ~ x') +
  facet_wrap('month_name', scales = 'free') +
  guides(x =  guide_axis(angle = 45))

To better quantify these trends I want to fit a linear regression to the data for each month and examine the results. This could be done with a for loop, but I will take advantage of a nice nested workflow using the tidyr (Wickham, Vaughan, and Girlich 2023), broom (Robinson, Hayes, and Couch 2023), and purrr (Wickham and Henry 2023) packages.

Code

dd_fit_hdd <- dd |>
  group_by(month) |>
  nest() |>
  mutate(fit = map(data, ~ lm(HDD ~ year, data = .x) ),
         tidied = map(fit, broom::tidy),
         glanced = map(fit, broom::glance)
  ) |>
  unnest(tidied) |>
  ungroup()

dd_fit_hdd |>
  mutate_if(is.numeric, ~round(.x,digits = 3)) |>
  DT::datatable(rownames = FALSE, options = list(pageLength = 5))

Table 4: Results of linear regression fits of heating degree days for each month

Now that the data and model fit results are in a tidy dataframe (Table 4), they can be easily filtered to identify significant fits using p-values (Table 5). The months with significant trends are June, August, September, and October. These months and the linear regression lines are shown in Figure 4

Code

dd_fit_hdd  |>
  filter(term == 'year') |>
  filter(p.value < 0.05) |>
  mutate_if(is.numeric, ~round(.x, digits = 3)) |>
  select(-c(data, fit, term, glanced)) |>
  DT::datatable(rownames = FALSE)

Table 5: Table of significant HDD fits (based on p-value < 0.05)

Code

dd |>
  filter(month %in% c(6,8,9,10)) |>
  mutate(month_name = lubridate::month(date, label = TRUE)) |>
  ggplot(aes(year, HDD, group = month_name)) +
  geom_point(size = 4, alpha = 0.5) +
  geom_smooth(method = 'lm', formula = 'y ~ x') +
  facet_wrap('month_name', scales = 'free')

Figure 4: HDD vs year for months with signficant trends

Cooling Degree Days

CDD Per Month

Figure 5 shows the distribution of US heating degree days for each month. Not surprisingly CDD tends to be higher in summer months, although there is a decent amount of variability between years.

Code

dd |>
  mutate(month_name = lubridate::month(date, label = TRUE)) |>
  ggplot(aes(month_name, CDD, group = month_name)) +
  geom_boxplot() +
  labs(title = 'Monthly Cooling Degree Days for US',
       x = 'Month',
       y = "Cooling Degree Days")

Figure 5: Boxplot of US cooling degree days for each month

Trends in CDD

Annual

Is there a trend in CDD over time? I would expect that CDD might increase over time due to climate change and the increase in earth’s average temperature.

Figure 6 shows a timeseries of the annual total heating degree days in the US, along with a linear regression line showing a positive trend.

Code

g <- dd_yearly |>
  ggplot(aes(year, CDD)) +
  geom_point(size = 4, alpha = 0.5) +
  geom_smooth(method = 'lm', formula = 'y~x') +
  labs(title = "US Annual Cooling Degree Days")

plotly::ggplotly(g)

Figure 6: Timeseries of annual US CDD

Code

cdd_yearly_fit <- lm(data = dd_yearly, formula = 'CDD ~ year')

broom::tidy(cdd_yearly_fit) |> 
  mutate_if(is.numeric, ~round(.x,digits = 3)) |>
  DT::datatable()

Table 6: Results of linear regression for annual CDD

Looking at the fit metrics shows that the positive trend in CDD is indeed statistically significant (Table 6), with CDD increasing at a rate of 12.28

Monthly

Figure 7 shows timeseries of monthly CDD vs year for the 4 summer months with the highest CDD, with linear regression lines plotted over them. Visually there appears to be a positive trend for each month.

Code

dd |>
  filter(month %in% c(6,7,8,9)) |>
  mutate(month_name = lubridate::month(date, label = TRUE)) |>
  ggplot(aes(year, CDD, group = month_name)) +
  geom_point(size = 4, alpha = 0.5) +
  geom_smooth(method = 'lm', formula = 'y ~ x') +
  facet_wrap('month_name')

Next I’ll apply a linear regression to each month using the same workflow used for HDD.

Code

dd_fit_cdd <- dd |>
  group_by(month) |>
  nest() |>
  mutate(fit = map(data, ~ lm(CDD ~ year, data = .x) ),
         tidied = map(fit, broom::tidy)
  ) |>
  unnest(tidied) |>
  ungroup()

There are significant positive trends in CDD for July, August, September, October, and December (Table 7). Figure 8 shows the data and fits for these months in detail.

Code

dd_fit_cdd  |>
  filter(term == 'year') |>
  filter(p.value < 0.05) |>
  mutate_if(is.numeric,~round(.x,digits = 3)) |>
  select(-c(data, fit, term)) |>
  DT::datatable(rownames = FALSE,options = list(pageLength = 5))

Table 7: Table of significant CDD fits

Code

dd |>
  filter(month %in% c(7,8,9,10,12)) |>
  mutate(month_name = lubridate::month(date, label = TRUE)) |>
  ggplot(aes(year, CDD, group = month_name)) +
  geom_point(size = 4, alpha = 0.5) +
  geom_smooth(method = 'lm', formula = 'y ~ x') +
  facet_wrap('month_name', scales = 'free') +
  guides(x =  guide_axis(angle = 45))

Figure 8: CDD vs year for Months with significant trends

Summary

Annual and monthly heating and cooling degree days for the US 1997-2022 were analyzed. A linear regression was applied to the annual data, and to each month, to determine if there was a trend. Model fits with a p-value less than 0.05 were considered significant.

There is a negative trend in annual HDD (Figure 2, Table 3), with HDD decreasing at a rate of 13.2 HDD per year.
There is a positive trend in annual CDD (Figure 6, Table 6), with CDD increasing at a rate of 12.28 CDD per year.
There are significant negative trends in HDD for the months of June, August, September, and October (Table 5).
There are significant positive trends in CDD for the months of July, August, September, October, and December (Table 7).

Future Research Questions

Implications for energy use

How much actual energy use (kWh, therms, etc.) corresponds to a degree day? This will depend on many factors, but we could make a rough estimate by comparing degree days to energy demand or consumption.

SessionInfo

Expand for Session Info

R version 4.4.1 (2024-06-14)
Platform: x86_64-apple-darwin20
Running under: macOS Sonoma 14.7.2

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/Denver
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
 [1] plotly_4.10.4   DT_0.33         broom_1.0.7     janitor_2.2.1  
 [5] lubridate_1.9.4 forcats_1.0.0   stringr_1.5.1   dplyr_1.1.4    
 [9] purrr_1.0.2     readr_2.1.5     tidyr_1.3.1     tibble_3.2.1   
[13] ggplot2_3.5.1   tidyverse_2.0.0

loaded via a namespace (and not attached):
 [1] gtable_0.3.6      xfun_0.50         bslib_0.8.0       htmlwidgets_1.6.4
 [5] lattice_0.22-6    tzdb_0.4.0        vctrs_0.6.5       tools_4.4.1      
 [9] crosstalk_1.2.1   generics_0.1.3    parallel_4.4.1    pkgconfig_2.0.3  
[13] Matrix_1.7-1      data.table_1.16.4 lifecycle_1.0.4   compiler_4.4.1   
[17] farver_2.1.2      munsell_0.5.1     snakecase_0.11.1  htmltools_0.5.8.1
[21] sass_0.4.9        yaml_2.3.10       lazyeval_0.2.2    pillar_1.10.1    
[25] crayon_1.5.3      jquerylib_0.1.4   cachem_1.1.0      nlme_3.1-166     
[29] tidyselect_1.2.1  digest_0.6.37     stringi_1.8.4     splines_4.4.1    
[33] labeling_0.4.3    fastmap_1.2.0     grid_4.4.1        colorspace_2.1-1 
[37] cli_3.6.3         magrittr_2.0.3    withr_3.0.2       scales_1.3.0     
[41] backports_1.5.0   bit64_4.5.2       timechange_0.3.0  rmarkdown_2.29   
[45] httr_1.4.7        bit_4.5.0.1       hms_1.1.3         evaluate_1.0.3   
[49] knitr_1.49        viridisLite_0.4.2 mgcv_1.9-1        rlang_1.1.4      
[53] glue_1.8.0        renv_1.0.9        rstudioapi_0.17.1 vroom_1.6.5      
[57] jsonlite_1.8.9    R6_2.5.1

References

Robinson, David, Alex Hayes, and Simon Couch. 2023. “Broom: Convert Statistical Objects into Tidy Tibbles.” https://CRAN.R-project.org/package=broom.

Wickham, Hadley, and Lionel Henry. 2023. “Purrr: Functional Programming Tools.” https://CRAN.R-project.org/package=purrr.

Wickham, Hadley, Davis Vaughan, and Maximilian Girlich. 2023. “Tidyr: Tidy Messy Data.” https://CRAN.R-project.org/package=tidyr.

--- title: "Analyzing Trends in Heating and Cooling Degree days using R" image: CopyOffig-cdd-yearly-1.png date: 2023-12-12 date-modified: today format: html: code-link: true code-fold: show code-tools: true toc: true fig-width: 10 fig-height: 8 tbl-cap-location: bottom editor: visual categories: [energy, R, weather, climate] freeze: auto bibliography: references.bib --- # Introduction [Degree days](https://www.eia.gov/energyexplained/units-and-calculators/degree-days.php) are useful as a measure of building heating and cooling demands. A degree day is calculated as the difference between the average temperature (the average of the high and low temperature for the day) and a reference temperature (in the US 65°F is used). For example, if the average temperature today is 40°F, that would be 25 heating degree days (HDD). A summer day with an average temperature of 85°F would have 20 cooling degree days (CDD). Degree days are usually well correlated with the amount of energy used to heat or cool a home. I was interested in obtaining and analyzing degree day data; in particular I wanted to see if there were any noticeable trends over time. Given an overall increase in earth's average temperature due to climate change, I would hypothesize that there might be an overall increase in CDD and a decrease in HDD. Changes in heating or cooling degree days would have implications for the amount of energy needed in the future to heat and cool residential or commercial buildings, resulting changes in demand on the electric grid, and implications for related carbon emissions (either for the power grid or from burning fossil fuels to heat buildings). # Data I obtained heating and cooling degree day [data](https://www.eia.gov/outlooks/steo/data/browser/#/?v=28&f=M&s=&start=199701&end=202412&id=&linechart=ZWCDPUS~ZWHDPUS&maptype=0&ctype=linechart&map=) from the [U.S. Energy Information Administration](https://www.eia.gov/) for the US. Note these data are [*weighted by population*](https://www.eia.gov/tools/glossary/index.php?id=Population-weighted%20Degree%20Days), to reflect the effect of both temperature and population on energy demands for cooling and heating. You can see details of how the EIA data are calculated [here](https://www.eia.gov/outlooks/steo/special/pdf/2012_sp_04.pdf). ```{r} #| label: load-libraries #| code-fold: true suppressPackageStartupMessages(library(tidyverse)) ggplot2::theme_set(theme_grey(base_size = 15)) suppressPackageStartupMessages(library(janitor)) library(broom) suppressPackageStartupMessages(library(DT)) suppressPackageStartupMessages(library(plotly)) ``` I'll only use years we have complete data for (1997-2022). @tbl-dd-monthly shows the data after being loaded and cleaned up. ```{r} #| label: load-data region <- "u_s" dd <- read_csv(paste0('data/EIA_DegreeDays_Monthly_',region,'.csv'), skip = 4, show_col_types = FALSE) |> janitor::clean_names() |> rename(CDD = paste0('cooling_degree_days_',region, '_cooling_degree_days_cooling_degree_days')) |> rename(HDD = paste0('heating_degree_days_',region, '_heating_degree_days_heating_degree_days')) |> mutate(date = lubridate::my(month)) |> select(-month) |> mutate(month = lubridate::month(date)) |> mutate(year = lubridate::year(date)) |> filter(year > 1996, year < 2023) # keep only complete years ``` ```{r} #| label: tbl-dd-monthly #| tbl-cap: "Table of monthly degree day data for US" #| code-fold: true dd |> DT::datatable(options = list(pageLength = 5), rownames = FALSE) ``` I'll also make a dataframe of the yearly totals (@tbl-dd-yearly) ```{r} #| label: tbl-dd-yearly #| tbl-cap: "Table of yearly degree days for the US" dd_yearly <- dd |> filter(year > 1996) |> group_by(year) |> summarise(HDD = sum(HDD, na.rm = TRUE), CDD = sum(CDD, na.rm = TRUE) ) dd_yearly |> DT::datatable(options = list(pageLength = 5), rownames = FALSE) ``` # Analysis ## Heating Degree Days @fig-hdd-monthly-boxplot shows the distribution (using a [boxplot](https://en.wikipedia.org/wiki/Box_plot)) of US heating degree days for each month. Not surprisingly HDD tends to be higher in winter months, although there is a decent amount of variability between years. ### HDD Per Month ```{r} #| label: fig-hdd-monthly-boxplot #| fig-cap: "Boxplot of US heating degree days for each month" dd |> mutate(month_name = lubridate::month(date, label = TRUE)) |> ggplot(aes(month_name, HDD, group = month_name)) + geom_boxplot() + labs(title = 'Monthly Heating Degree Days for US (1997-2022)', x = 'Month', y = "Heating Degree Days") ``` ### Trends in HDD Is there a trend in HDD over time? I would expect that HDD might decrease over time due to climate change and the increase in earth's average temperature. #### Annual @fig-hdd-yearly shows a timeseries of the annual total heating degree days in the US, along with a linear regression line that shows a negative trend. ```{r} #| label: fig-hdd-yearly #| fig-cap: "Timeseries of annual US HDD with linear regression line." g <- dd_yearly |> ggplot(aes(year, HDD)) + geom_point(size = 4, alpha = 0.5) + geom_smooth(method = 'lm', formula = 'y~x') + labs(title = "US Annual Heating Degree Days") plotly::ggplotly(g) ``` ```{r} #| label: tbl-hdd-annual-fit #| tbl-cap: "Results of linear regression for annual HDD" hdd_yearly_fit <- lm(data = dd_yearly, formula = 'HDD ~ year') broom::tidy(hdd_yearly_fit) |> mutate_if(is.numeric,~round(.x,digits = 3)) |> # round numeric values for table DT::datatable() ``` There is a fair bit of variability, but looking at the fit metrics (@tbl-hdd-annual-fit) shows that the negative trend in HDD is statistically significant (p-value \< 0.05). Annual heating degree days are decreasing at a rate of `r -round(hdd_yearly_fit$coefficients['year'])` HDD per year. #### Monthly We have seen that there is a negative trend in annual HDD; what are the trends for individual months? @fig-hdd-time-winter shows timeseries of monthly HDD vs year for winter months, with linear regression lines plotted over them. Visually there appears to be a negative trend for some of the months. ```{r} #| label: fig-hdd-time-winter #| fig-cap: "HDD vs year for winter months" dd |> filter(month %in% c(11,12,1,2,3,4)) |> mutate(month_name = lubridate::month(date, label = TRUE)) |> ggplot(aes(year, HDD, group = month_name)) + geom_point(size = 3, alpha = 0.5) + geom_smooth(method = 'lm', formula = 'y ~ x') + facet_wrap('month_name', scales = 'free') + guides(x = guide_axis(angle = 45)) ``` To better quantify these trends I want to fit a linear regression to the data for each month and examine the results. This could be done with a *for loop*, but I will take advantage of a nice [nested workflow](https://broom.tidymodels.org/articles/broom_and_dplyr.html) using the *tidyr* [@tidyr], *broom* [@broom], and *purrr* [@purrr] packages. ```{r} #| label: tbl-hdd-monthly-fits #| tbl-cap: "Results of linear regression fits of heating degree days for each month" dd_fit_hdd <- dd |> group_by(month) |> nest() |> mutate(fit = map(data, ~ lm(HDD ~ year, data = .x) ), tidied = map(fit, broom::tidy), glanced = map(fit, broom::glance) ) |> unnest(tidied) |> ungroup() dd_fit_hdd |> mutate_if(is.numeric, ~round(.x,digits = 3)) |> DT::datatable(rownames = FALSE, options = list(pageLength = 5)) ``` Now that the data and model fit results are in a tidy dataframe (@tbl-hdd-monthly-fits), they can be easily filtered to identify significant fits using p-values (@tbl-hdd-sigfits). The months with significant trends are June, August, September, and October. These months and the linear regression lines are shown in @fig-hdd-months-sigfit ```{r} #| label: tbl-hdd-sigfits #| tbl-cap: "Table of significant HDD fits (based on p-value < 0.05)" dd_fit_hdd |> filter(term == 'year') |> filter(p.value < 0.05) |> mutate_if(is.numeric, ~round(.x, digits = 3)) |> select(-c(data, fit, term, glanced)) |> DT::datatable(rownames = FALSE) ``` ```{r} #| label: fig-hdd-months-sigfit #| fig-cap: "HDD vs year for months with signficant trends" dd |> filter(month %in% c(6,8,9,10)) |> mutate(month_name = lubridate::month(date, label = TRUE)) |> ggplot(aes(year, HDD, group = month_name)) + geom_point(size = 4, alpha = 0.5) + geom_smooth(method = 'lm', formula = 'y ~ x') + facet_wrap('month_name', scales = 'free') ``` ## Cooling Degree Days ### CDD Per Month @fig-cdd-monthly-boxplot shows the distribution of US heating degree days for each month. Not surprisingly CDD tends to be higher in summer months, although there is a decent amount of variability between years. ```{r} #| label: fig-cdd-monthly-boxplot #| fig-cap: "Boxplot of US cooling degree days for each month" dd |> mutate(month_name = lubridate::month(date, label = TRUE)) |> ggplot(aes(month_name, CDD, group = month_name)) + geom_boxplot() + labs(title = 'Monthly Cooling Degree Days for US', x = 'Month', y = "Cooling Degree Days") ``` ### Trends in CDD #### Annual Is there a trend in CDD over time? I would expect that CDD might increase over time due to climate change and the increase in earth's average temperature. @fig-cdd-yearly shows a timeseries of the annual total heating degree days in the US, along with a linear regression line showing a positive trend. ```{r} #| label: fig-cdd-yearly #| fig-cap: "Timeseries of annual US CDD" g <- dd_yearly |> ggplot(aes(year, CDD)) + geom_point(size = 4, alpha = 0.5) + geom_smooth(method = 'lm', formula = 'y~x') + labs(title = "US Annual Cooling Degree Days") plotly::ggplotly(g) ``` ```{r} #| label: tbl-cdd-annual-fit #| tbl-cap: "Results of linear regression for annual CDD" cdd_yearly_fit <- lm(data = dd_yearly, formula = 'CDD ~ year') broom::tidy(cdd_yearly_fit) |> mutate_if(is.numeric, ~round(.x,digits = 3)) |> DT::datatable() ``` Looking at the fit metrics shows that the positive trend in CDD is indeed statistically significant (@tbl-cdd-annual-fit), with CDD increasing at a rate of `r round(cdd_yearly_fit$coefficients['year'],2)` #### Monthly @fig-cdd-time-summer shows timeseries of monthly CDD vs year for the 4 summer months with the highest CDD, with linear regression lines plotted over them. Visually there appears to be a positive trend for each month. ```{r} #| label: fig-cdd-time-summer #| fig-cap: "CDD vs year for summer month" dd |> filter(month %in% c(6,7,8,9)) |> mutate(month_name = lubridate::month(date, label = TRUE)) |> ggplot(aes(year, CDD, group = month_name)) + geom_point(size = 4, alpha = 0.5) + geom_smooth(method = 'lm', formula = 'y ~ x') + facet_wrap('month_name') ``` Next I'll apply a linear regression to each month using the same workflow used for HDD. ```{r} dd_fit_cdd <- dd |> group_by(month) |> nest() |> mutate(fit = map(data, ~ lm(CDD ~ year, data = .x) ), tidied = map(fit, broom::tidy) ) |> unnest(tidied) |> ungroup() ``` There are significant positive trends in CDD for July, August, September, October, and December (@tbl-cdd-sigfits). @fig-cdd-monthly-sigfit shows the data and fits for these months in detail. ```{r} #| label: tbl-cdd-sigfits #| tbl-cap: "Table of significant CDD fits" #| code-fold: true dd_fit_cdd |> filter(term == 'year') |> filter(p.value < 0.05) |> mutate_if(is.numeric,~round(.x,digits = 3)) |> select(-c(data, fit, term)) |> DT::datatable(rownames = FALSE,options = list(pageLength = 5)) ``` ```{r} #| label: fig-cdd-monthly-sigfit #| fig-cap: "CDD vs year for Months with significant trends" #| code-fold: true dd |> filter(month %in% c(7,8,9,10,12)) |> mutate(month_name = lubridate::month(date, label = TRUE)) |> ggplot(aes(year, CDD, group = month_name)) + geom_point(size = 4, alpha = 0.5) + geom_smooth(method = 'lm', formula = 'y ~ x') + facet_wrap('month_name', scales = 'free') + guides(x = guide_axis(angle = 45)) ``` # Summary Annual and monthly heating and cooling degree days for the US 1997-2022 were analyzed. A linear regression was applied to the annual data, and to each month, to determine if there was a trend. Model fits with a p-value less than 0.05 were considered significant. - There is a negative trend in annual HDD (@fig-hdd-yearly, @tbl-hdd-annual-fit), with HDD decreasing at a rate of `r -round(hdd_yearly_fit$coefficients['year'],2)` HDD per year. - There is a positive trend in annual CDD (@fig-cdd-yearly, @tbl-cdd-annual-fit), with CDD increasing at a rate of `r round(cdd_yearly_fit$coefficients['year'],2)` CDD per year. - There are significant negative trends in HDD for the months of June, August, September, and October (@tbl-hdd-sigfits). - There are significant positive trends in CDD for the months of July, August, September, October, and December (@tbl-cdd-sigfits). # Future Research Questions ## Implications for energy use - How much actual energy use (kWh, therms, etc.) corresponds to a degree day? This will depend on many factors, but we could make a rough estimate by comparing degree days to energy demand or consumption. # SessionInfo ::: {.callout-tip collapse="true"} ## Expand for Session Info ```{r, echo = FALSE} sessionInfo() ``` ::: # References