Model relative to the benchmark. For comparing our proposed BMF and BMFSV forecasts with AR model forecasts, the overlap between each alternative model and the benchmark could in principle complicate inference. Our models of interest do not strictly nest the AR models, because the AR models include two lags of GDP growth whereas the nowcasting models include just one lag of GDP growth. But it is possible that the models overlap, in the sense that the true model could be an AR(1) specification. However, since forecast performance suggests that it is unlikely that the AR model and the nowcasting models overlap, we proceed to treat them as non-nested. The results in West (1996) imply that we can test equal accuracy of point forecasts from non-nested models by computing a simple t-test for equal MSE, as we do. To capture some low-order serial correlation, we compute the t-statistics with a heteroscedasticity- and autocorrelation-consistent (HAC) variance, using a rectangular kernel with a bandwidth of 1 and the small-sample adjustment of Harvey et al. (1997).

To assess the accuracy of density forecasts, we use log predictive density scores, motivated and described in sources such as Geweke and Amisano (2010). At each forecast origin, we compute the log predictive score from the real-time outcome and the predictive density of the forecast. For all models, we compute the density from an empirical estimate of the forecast distribution based on 5000 draws of forecasts, using a non-parametric density estimator with a Gaussian kernel. To facilitate model comparisons, we report average log scores for our BMF and BMFSV models relative to a benchmark AR model with stochastic volatility (ARSV). To provide a rough gauge of the statistical significance of differences in average log scores, we use the Amisano and Giacomini (2007) t-test of equal means, applied to the log score for each model relative to the ARSV model. We view these tests as a rough gauge because, for forecasts from estimated models, the asymptotic validity of the Amisano and Giacomini (2007) test requires that, as forecasting moves forward in time, the models be estimated with a rolling, rather than expanding, sample of data. To allow for some serial correlation in the score differences, we compute the t-statistics with a HAC variance estimate obtained with a rectangular kernel and a bandwidth of 1.

As further checks on density forecast calibration, we also provide results on the accuracy of interval forecasts and selected results for probability integral transforms (PITs). Motivated in part by central bank interest in forecast confidence intervals and fan charts, recent studies such as Giordani and Villani (2010) have used interval forecasts as a measure of accuracy for macroeconomic density forecasts. We compute results for 70% interval forecasts, defined as the frequency with which real-time outcomes for GDP growth fall inside the 70% highest posterior density intervals estimated in real time for each model. To provide a rough gauge of statistical significance, we include p-values for the null hypothesis of correct coverage (empirical = nominal rate of 70%), based on t-statistics computed with a HAC variance estimate obtained with a rectangular kernel and a bandwidth of 1. The p-values provide only a rough gauge of significance in the sense that the theory underlying Christoffersen's (1998) test of correct coverage, like that of the tests described above, does not account for the effects of model estimation.
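To make the point forecast comparison concrete, the following is a minimal sketch (in Python with NumPy; the paper specifies no software) of the equal-MSE test described above: a t-statistic on the squared-error loss differential with a rectangular-kernel HAC variance using a bandwidth of 1 and the Harvey et al. (1997) small-sample adjustment. Function and argument names are illustrative, not taken from the authors' code.

```python
# Equal-MSE t-test sketch: squared-error loss differential, rectangular-kernel
# HAC variance (bandwidth 1), Harvey et al. (1997) small-sample adjustment.
import numpy as np


def hac_mean_variance(d, bandwidth=1):
    """HAC estimate of the variance of the sample mean of d (rectangular kernel)."""
    d = np.asarray(d, dtype=float)
    T = d.size
    d_c = d - d.mean()
    lrv = np.dot(d_c, d_c) / T                      # lag-0 autocovariance
    for lag in range(1, bandwidth + 1):             # unit weights up to the bandwidth
        lrv += 2.0 * np.dot(d_c[lag:], d_c[:-lag]) / T
    return lrv / T


def equal_mse_tstat(errors_model, errors_benchmark, horizon=1, bandwidth=1):
    """t-statistic for equal MSE of two forecast error series."""
    d = np.asarray(errors_model) ** 2 - np.asarray(errors_benchmark) ** 2
    T = d.size
    t_raw = d.mean() / np.sqrt(hac_mean_variance(d, bandwidth))
    # Harvey, Leybourne and Newbold (1997) correction factor
    adj = np.sqrt((T + 1 - 2 * horizon + horizon * (horizon - 1) / T) / T)
    return adj * t_raw
```

A negative statistic favours the first model over the benchmark; following Harvey et al. (1997), the adjusted statistic can be compared with a Student-t distribution with T - 1 degrees of freedom.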
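The density forecast evaluation can be sketched in the same way, assuming the 5000 predictive draws per forecast origin are available. The snippet below uses scipy.stats.gaussian_kde as the non-parametric Gaussian-kernel estimator (a choice made here for illustration; the paper does not name an implementation) to evaluate the predictive density at the real-time outcome, and applies an Amisano-Giacomini-type t-test to the score differences with the same rectangular-kernel HAC variance.

```python
# Log predictive score from simulated forecast draws and a t-test of equal
# average log scores relative to a benchmark; illustrative names only.
import numpy as np
from scipy.stats import gaussian_kde


def log_score(forecast_draws, outcome):
    """Log of the kernel-estimated predictive density at the realised outcome."""
    kde = gaussian_kde(forecast_draws)      # Gaussian kernel, default bandwidth rule
    return float(np.log(kde(outcome)[0]))


def hac_tstat(d, bandwidth=1):
    """t-statistic on the mean of d, rectangular-kernel HAC variance (as above)."""
    d = np.asarray(d, dtype=float)
    T = d.size
    d_c = d - d.mean()
    lrv = np.dot(d_c, d_c) / T
    for lag in range(1, bandwidth + 1):
        lrv += 2.0 * np.dot(d_c[lag:], d_c[:-lag]) / T
    return d.mean() / np.sqrt(lrv / T)


# Relative average log score and its t-test against the ARSV benchmark, where
# scores_model and scores_arsv would be arrays of per-origin log scores:
# score_diff = np.asarray(scores_model) - np.asarray(scores_arsv)
# relative_avg_score, t_stat = score_diff.mean(), hac_tstat(score_diff)
```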
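Finally, a sketch of the interval forecast check: the 70% highest posterior density interval is approximated as the shortest interval containing 70% of the forecast draws, and correct coverage (empirical = nominal) is assessed with a t-statistic on the hit indicators, again with a rectangular-kernel HAC variance and a bandwidth of 1. The helper names are hypothetical.

```python
# 70% highest posterior density interval from forecast draws and a t-test of
# correct coverage on the resulting hit indicators.
import numpy as np


def hpd_interval(forecast_draws, coverage=0.70):
    """Shortest interval containing the requested share of the draws."""
    x = np.sort(np.asarray(forecast_draws, dtype=float))
    n_in = int(np.floor(coverage * x.size))
    widths = x[n_in:] - x[:-n_in]          # widths of all candidate intervals
    j = int(np.argmin(widths))
    return x[j], x[j + n_in]


def coverage_tstat(hits, nominal=0.70, bandwidth=1):
    """Empirical coverage rate and HAC t-statistic for empirical = nominal."""
    d = np.asarray(hits, dtype=float) - nominal
    T = d.size
    d_c = d - d.mean()
    lrv = np.dot(d_c, d_c) / T             # rectangular-kernel HAC, as in the sketches above
    for lag in range(1, bandwidth + 1):
        lrv += 2.0 * np.dot(d_c[lag:], d_c[:-lag]) / T
    t_stat = d.mean() / np.sqrt(lrv / T)
    return nominal + d.mean(), t_stat


# hits would be a boolean series built per forecast origin t:
# lo, hi = hpd_interval(draws_t); hits[t] = lo <= outcome_t <= hi
```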