ts_format takes a line list of case data and formats it into a daily, weekly, or monthly time series, which can be used to fit a seasonal baseline.

ts_format(line.list, datevar, statevar, sub.statevar = "none",
  agevar = "none", covs = character(), syndromes, resolution = "day",
  remove.final = F)

Arguments

line.list

A dataframe containing one line for each case (e.g., ED visit, hospitalization). At a minimum, each row should have the date of the visit (YYYY-MM-DD), the state code (e.g., "NY"), and a 0/1 variable for each syndrome of interest (e.g., influenza-like illness, fever, cough). All visits should be included in the dataframe, even if the case did not have any of the syndromes of interest. For instance, for emergency department data, every ED visit should be represented by a line in the dataframe.
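As an illustration of the layout described above, the following is a minimal sketch of a line list. The column names (visit_date, state, ili, fever) and the values are hypothetical and chosen only for this example; any names can be used as long as they are passed to the corresponding arguments.

```r
# Hypothetical toy line list: one row per ED visit.
# Visits with none of the syndromes of interest are still included
# (their syndrome indicators are all 0).
toy_line_list <- data.frame(
  visit_date = as.Date(c("2019-01-01", "2019-01-01", "2019-01-02", "2019-01-02")),
  state      = c("NY", "NY", "CT", "NY"),
  ili        = c(1, 0, 0, 0),   # 0/1 indicator: influenza-like illness
  fever      = c(1, 1, 0, 0),   # 0/1 indicator: fever
  stringsAsFactors = FALSE
)

# Basic checks that the indicators are coded 0/1 and the dates are Date objects
stopifnot(all(toy_line_list$ili %in% c(0, 1)),
          all(toy_line_list$fever %in% c(0, 1)),
          inherits(toy_line_list$visit_date, "Date"))
```

With such a dataframe, datevar would be 'visit_date', statevar would be 'state', and syndromes would be c('ili', 'fever').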

datevar

A string. What variable contains the date?

statevar

A string. What variable contains the two-letter state code (e.g., "NY")?

sub.statevar

A string. What variable contains the local geography identifier (e.g., county, borough)? The default is "none".

agevar

A string. What variable contains the age group? Use 'none' if there is no age grouping in the data.

covs

A character vector. Which, if any, variables in line.list should be treated as covariates in fitting the baseline model? The default is to treat no variables in line.list as covariates.

syndromes

A character vector. Which variables contain the 0/1 syndrome indicators to be counted? (e.g., c('ili', 'respiratory'))

resolution

One of c("day", "week", "month"). What is the data binned by?
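The effect of each resolution can be sketched with base R date handling. This is only an illustration of binning, not the function's actual implementation; in particular, starting weeks on Monday is an assumption made for this example.

```r
dates <- as.Date(c("2019-01-01", "2019-01-06", "2019-01-07", "2019-02-15"))

# "day": each date is its own bin
day_bin <- dates

# "week": map each date to the first day of its week
# (cut.Date starts weeks on Monday by default; assumed convention here)
week_bin <- as.Date(as.character(cut(dates, breaks = "week")))

# "month": map each date to the first day of its month
month_bin <- as.Date(format(dates, "%Y-%m-01"))

data.frame(date = dates, day_bin, week_bin, month_bin)
```

Dates that fall in the same bin would then be counted together in the resulting time series.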

remove.final

A logical scalar. Remove the final date in the dataset? This is sometimes helpful if the data from the last date are not yet finalized or are otherwise untrustworthy.
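The idea behind dropping the final date can be sketched as follows. This is a hedged illustration on a hypothetical dataframe named visits, not the function's internal code; how the function handles the final period at weekly or monthly resolution is not specified here.

```r
# Hypothetical visits; the last date (2019-01-03) may be incomplete
visits <- data.frame(
  date = as.Date(c("2019-01-01", "2019-01-02", "2019-01-03", "2019-01-03")),
  ili  = c(0, 1, 0, 1)
)

# Identify the most recent date and keep only the rows before it
final_date <- max(visits$date)
trimmed <- visits[visits$date < final_date, , drop = FALSE]
```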

Value

A dataframe in the "long" format, with one row for each combination of time period (e.g., day or week), location (e.g., state, county), and age category. There are columns for date, age category, and location, plus a column of counts for each of the selected syndromes. There is also a column that tallies all visits, regardless of cause.
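The shape of such a long-format result can be approximated with base R's aggregate. This is a minimal sketch under stated assumptions: the input columns and the all_visits column name are hypothetical, and the column names actually returned by ts_format may differ.

```r
# Hypothetical line list with two syndromes
visits <- data.frame(
  date   = as.Date(c("2019-01-01", "2019-01-01", "2019-01-01", "2019-01-02")),
  state  = c("NY", "NY", "CT", "NY"),
  agegrp = c(1, 1, 2, 1),
  ili    = c(1, 0, 0, 1),
  resp   = c(1, 1, 0, 0)
)

# Every row is one visit, so summing a column of 1s tallies all visits
visits$all_visits <- 1  # hypothetical column name

# Sum the 0/1 indicators within each date x state x age group
long_counts <- aggregate(
  cbind(ili, resp, all_visits) ~ date + state + agegrp,
  data = visits,
  FUN  = sum
)
long_counts
```

Note that aggregate only returns combinations that occur in the data, so periods with no visits are absent from this sketch rather than shown as zero counts.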

Examples

# Simulate a line list with one row per visit
n.obs <- 10000
set.seed(42)
simulated_data <- as.data.frame(matrix(NA, nrow=n.obs, ncol=5))
names(simulated_data) <- c('state','date','agegrp','ili','resp')
simulated_data$state <- c( rep('CT', times=n.obs*0.3), rep("NY", times=n.obs*0.7) )
simulated_data$agegrp <- sample(1:5, n.obs, replace=T)
# Draw one visit date per row from a 500-day window
simulated_data$date <- sample(seq.Date(from=as.Date('2019-01-01'), by='day', length.out=500), n.obs, replace=T)
# 0/1 indicators for each syndrome
simulated_data$ili <- rbinom(n=n.obs, size=1, prob=0.05)
simulated_data$resp <- rbinom(n=n.obs, size=1, prob=0.1)

# Format the line list into a daily time series
ts1 <- ts_format(line.list=simulated_data, datevar='date', agevar='agegrp', statevar='state', syndromes=c('ili','resp'))