Format line list data into time series — ts

`ts_format` takes a line list of case data and aggregates it into daily, weekly, or monthly time series, which can then be used to fit a seasonal baseline.

ts_format(line.list, datevar, statevar, sub.statevar = "none",
  agevar = "none", covs = character(), syndromes, resolution = "day",
  remove.final = F)

Arguments

line.list	A dataframe containing one row for each case (e.g., ED visit, hospitalization). At a minimum, each row should have the date of the visit (`YYYY-MM-DD`), the state code (e.g., `"NY"`), and a 0/1 variable for each syndrome of interest (e.g., influenza-like illness, fever, cough). Include every visit, even those without any of the syndromes of interest, because the output also tallies all visits regardless of cause. For emergency department data, for instance, every ED visit should have a row in the dataframe.
datevar	A string. What variable contains the date?
statevar	A string. What variable contains the two-letter state code (e.g., `"NY"`)?
sub.statevar	A string. What variable contains the local geography identifier (e.g., county, borough)? Use 'none' (the default) if there is no sub-state geography in the data.
agevar	A string. What variable contains the age group? Use 'none' if there is no age grouping in the data.
covs	A character vector. Which, if any, variables in `line.list` should be treated as covariates when fitting the baseline model? By default, no variables are treated as covariates.
syndromes	A character vector. Which variables in `line.list` hold the 0/1 syndrome indicators to be counted (e.g., `c('ili', 'respiratory')`)?
resolution	One of `c("day", "week", "month")`. What time unit should the data be binned by? Defaults to `"day"`.
remove.final	A logical scalar. Remove the final date in the dataset? This is sometimes helpful if the data from the last date are incomplete or otherwise untrustworthy.
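The binning behind `resolution` and the effect of `remove.final` can be illustrated with a short base R sketch. This is not the package's implementation: the helpers `bin_dates` and `drop_final_date` are hypothetical, and the week convention used here (weeks starting on Monday, the `cut.Date` default) is an assumption that may differ from how `ts_format` defines weeks.

```r
# Hypothetical helper: map each visit date to the start date of its time bin.
# Assumption: weeks start on Monday (the cut.Date default), which is not
# necessarily the convention ts_format uses.
bin_dates <- function(dates, resolution = c("day", "week", "month")) {
  resolution <- match.arg(resolution)
  if (resolution == "day") {
    return(dates)
  }
  # cut() on a Date vector returns a factor labelled by bin start dates
  as.Date(as.character(cut(dates, breaks = resolution)))
}

# Hypothetical helper mimicking remove.final = TRUE: drop rows from the last date
drop_final_date <- function(df, datevar) {
  df[df[[datevar]] < max(df[[datevar]]), , drop = FALSE]
}

visit_dates <- as.Date(c("2019-01-01", "2019-01-03", "2019-01-08", "2019-02-14"))
weekly  <- bin_dates(visit_dates, "week")
monthly <- bin_dates(visit_dates, "month")

visits  <- data.frame(date = visit_dates, ili = c(1, 0, 1, 0))
trimmed <- drop_final_date(visits, "date")
```

Because 2019-01-01 is a Tuesday, the first two visits fall in the week starting Monday 2018-12-31, so weekly bins can begin before the first observed date; `trimmed` keeps the three visits before 2019-02-14.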

Value

A dataframe in "long" format, with one row for each combination of time period (day, week, or month), location (e.g., state, county), and age category. Columns give the date, age category, location, and number of cases of each selected syndrome, plus a column that tallies all visits regardless of cause.
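The shape of this long-format output can be sketched in base R with `aggregate`. The helper `to_long_counts` and the `all.visits` column name are hypothetical; they illustrate the idea of summing 0/1 indicators per date, location, and age group, and are not the column names or code that `ts_format` actually uses.

```r
# Hypothetical sketch: collapse a line list into long-format counts, one row
# per date x state x age group. Syndrome indicators are summed, and every row
# counts once toward all.visits (an assumed column name) regardless of cause.
to_long_counts <- function(line.list, datevar, statevar, agevar, syndromes) {
  line.list$all.visits <- 1
  value.cols <- c(syndromes, "all.visits")
  by.cols    <- c(datevar, statevar, agevar)
  counts <- aggregate(line.list[value.cols],
                      by  = line.list[by.cols],
                      FUN = sum)
  # Sort so each state/age series reads in date order
  counts[order(counts[[statevar]], counts[[agevar]], counts[[datevar]]), ]
}

line_list <- data.frame(
  date   = as.Date(c("2019-01-01", "2019-01-01", "2019-01-01", "2019-01-02")),
  state  = c("NY", "NY", "CT", "NY"),
  agegrp = c(1, 1, 1, 2),
  ili    = c(1, 0, 0, 1),
  resp   = c(1, 1, 0, 0)
)

long <- to_long_counts(line_list, "date", "state", "agegrp", c("ili", "resp"))
```

Note that `aggregate` only returns combinations that actually occur in the data; a time series for baseline fitting would typically also need rows with zero counts for periods with no visits, which this sketch does not add.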

Examples

 n.obs <- 10000
 set.seed(42)

 # Empty data frame with one row per simulated visit
 simulated_data <-
   as.data.frame(matrix(NA, nrow=n.obs, ncol=5))

 names(simulated_data) <- c('state','date','agegrp','ili','resp')

 # 30% of visits in Connecticut, 70% in New York
 simulated_data$state <- c( rep('CT', times=n.obs*0.3),
                            rep("NY", times=n.obs*0.7) )

 # Random age group (1-5) and a visit date within a 500-day window
 simulated_data$agegrp <- sample(1:5, n.obs, replace=TRUE)
 simulated_data$date   <-
   sample(seq.Date(from=as.Date('2019-01-01'), by='day', length.out=500),
          n.obs,
          replace=TRUE)

 # 0/1 syndrome indicators: about 5% ILI, about 10% respiratory
 simulated_data$ili  <- rbinom(n=n.obs, size=1, prob=0.05)
 simulated_data$resp <- rbinom(n=n.obs, size=1, prob=0.1)

 # Aggregate into daily (the default resolution) counts by state and age group
 ts1 <- ts_format(line.list=simulated_data,
                  datevar='date',
                  agevar='agegrp',
                  statevar='state',
                  syndromes=c('ili','resp'))