After each treatment, the respondent was again asked the relevant questions from the baseline survey.

In the same period, respondents allocated to the control group would also again answer questions from the baseline survey.

The effective result of this design was that every respondent was exposed to each of the five topics in random order, and for every topic but one was exposed to either the video (only) or the narrative (only) format. For one randomly chosen topic, the respondent saw both the video and the written narrative. Importantly, respondents could not opt to receive a certain intervention.

Stanford University with SRBI. This sample was recruited in person, and at the end of their one-year participation, participants were asked whether they were interested in joining the RAND American Life Panel. Most of these respondents were given a laptop and broadband Internet access. For more information about the ALP sample recruiting methodology as well as access to the data collected in the ALP to date, the reader is referred to http://mmic.rand.org.

The benefits of the randomization design lie in the power of causal inference. When estimating average effects of the program by topic and format, we are able to pool the data, regardless of wave and treatment sequence. To estimate the program effect, we can use a simple comparison of means that captures a differences-in-differences (DID) approach, in which changes in correct answers of the respondents exposed to videos or narratives (the treatment group) are compared to changes in answers in the control group.
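As an illustration of the comparison of means described above, the following is a minimal sketch of a difference-in-differences calculation. The function name `did_estimate` and all scores are hypothetical and are not taken from the study; the sketch assumes each list holds the percentage of correct answers per respondent.

```python
from statistics import mean


def did_estimate(treat_pre, treat_post, control_pre, control_post):
    """Difference-in-differences of mean scores.

    Each argument is a list of per-respondent scores (percent correct).
    The estimate is the mean change in the treatment group minus the
    mean change in the control group.
    """
    treat_change = mean(treat_post) - mean(treat_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treat_change - control_change


# Hypothetical scores, for illustration only (not data from the experiment).
treat_pre = [50, 60, 40, 50]
treat_post = [70, 80, 60, 70]
control_pre = [50, 55, 45, 50]
control_post = [55, 60, 50, 55]

effect = did_estimate(treat_pre, treat_post, control_pre, control_post)
print(effect)
```

Subtracting the control group's change removes any improvement that comes merely from answering the same questions a second time, which is why the control group is re-surveyed in the same period.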

In April 2011, the same quiz was administered again to all experiment participants, both treatment and control groups. Below, we discuss the short-run results. In Section V, we discuss the results of the 2011 follow-up test.

IV. Results

IV.A Qualitative Focus Group Responses

Both the savers and non-savers described the program’s level of content difficulty as appropriate for themselves. The non-saver group found more of the information new, while savers found that the intervention reinforced and supplemented knowledge of concepts with which they were already somewhat familiar. Overall, group participants described themselves as not intimidated by the program and also did not feel as though it talked down to them. The saver and non-saver groups expressed fairly similar thoughts on format differences. Some expressed keener interest in the videos as they did not require the work of reading, while others noted that they preferred having access to both formats. No one argued for written narratives alone.2 Unprompted, focus group participants also described specific actions they planned to take as a result of viewing the videos, but no such plans were voiced as a result of interacting with the written narratives.3

IV.B Field Experiment Results

Table 1 shows the percentage of correct answers to each of the questions at baseline (May 2010).

Average baseline knowledge of these concepts varied significantly, with correct responses to some of the questions falling below 50%. However, 92% of respondents were able to answer the first question on compound interest correctly.

Table 1 further breaks down responses by gender, education, age, and income. More men answered questions correctly than women on every question at baseline, confirming the results of many other surveys on financial literacy (Lusardi and Mitchell, 2014). Similarly, respondents age 18–40 performed worse than those age 41–64 and worse than those age 65 and older on all but two questions, again consistent with related research (Lusardi, Mitchell, and Curto, 2010; Lusardi and Mitchell, 2014). On every question, those with incomes below $35,000 performed more poorly than those with incomes between $35,000 and $75,000, who in turn performed more poorly than respondents earning $75,000 and above. The same pattern was found for education; respondents with high school diplomas or less performed more poorly on each question than respondents who attended some college, who in turn performed more poorly than those with college diplomas, as has been found to be the case in other work (Lusardi and Mitchell, 2014).

Some in the saver group suggested that the videos were more motivating and inspired them to take action. In response to the videos, one focus group member said, “That made me want to run out and invest some money.” Another said that the videos were “ready for television! Gonna be like, ‘Man, I need to start investing!’ People will definitely react to that.” One described starting a new job several months before and said, “I haven’t gotten around to filling out the 401(k) forms…I will be filling out those forms tomorrow.” Another said, “Last year I got a new nephew and a godson so I think that I’ll open an account for each of them to begin the compound interest.”

Table 2 shows the numbers of each treatment that were administered during each wave, by topic and medium (narrative or video). Wave 1 went to field in August 2010 and the surveys were closed on November 3, 2010.4 Each written narrative / video was seen alone by between 1,427 and 1,497 respondents, while each topic was administered in the double format consisting of both the narrative and the video to between 1,017 and 1,082 respondents.

IV. A Quantitative Findings

Tables 3A and 3B present a summary of performance in each of the five topic areas for the entire sample of respondents, aggregated across individual survey questions.5 Table 3A shows the results for objective knowledge questions (in terms of average percentage correct answers) and Table 3B shows the results for the self-efficacy questions (in terms of average self-efficacy score on a scale of 1–5, with 1 being the highest). The tables summarize the difference-in-differences treatment effect estimates (a comparison of the mean changes in the treatment group with the mean changes in the control group).

The column headed “Any Treatment” shows means for all respondents presented with an intervention, including those who saw the video, read the narrative, or did both. The column headed “Video only” refers to respondents who have only seen a video on a particular topic;7 similarly the heading “Narrative only” signifies that a respondent has only read a narrative about the topic. “Both” indicates that a respondent has been exposed to both a video and a narrative about the topic.

Table 3A shows a number of significant positive treatment effects on objective knowledge questions, across all topics. In general, for questions on which baseline knowledge was high (interest compounding/numeracy and both inflation questions) the program had the least effect, while for a topic on which baseline knowledge was modest (tax treatment of DC plans) we observe consistently large treatment effects. Insignificant results are found for inflation and one employer match question, but in the first case results are somewhat inconsistent across format types and questions; in the second case, no effects are significant.

In principle, respondents can answer questions whenever it is convenient for them. Typically most respondents reply within the first two weeks of a field period. After two weeks a reminder is sent by email to those who have not responded yet. This procedure is repeated four weeks after a survey goes to field. Generally, there is no reason to “close” a survey, so that, for instance, even after six weeks responses still trickle in. In our experiment, two weeks after a respondent has answered the first wave, he or she becomes eligible for the second wave; two weeks after answering the second wave he or she becomes eligible for the third wave. Thus, depending on when respondents respond to a wave, they are asked to do the next wave. We kept waves in the field until November 3, 2010.

More extensive results are presented in Appendix tables 1A and 1B.

Overall, Table 3A also shows that video-only treatments result in somewhat more positive effects than narrative-only treatments, but interestingly, one does not seem to strictly dominate the other for all questions. Also interestingly and perhaps contrary to the focus group input, being exposed to both treatments does not seem to strictly dominate showing only videos or only written narratives.

Table 3B shows that the overall effects on self-efficacy appear to be positive and significant across all topics, with the largest gains related to tax treatment of DC plans and employer matches. However, format effects in this case are particularly interesting. In the area of self-efficacy, we clearly see that video appears to be more effective and consistently positive. For the written narratives, the effects are significantly weaker. Showing both videos and narratives has a significantly stronger effect on self-efficacy than only showing the narrative. The comparison between showing video only or both video and narrative does not exhibit a clear pattern.

In general, the findings of our analysis indicate that Five Steps can effectively deliver knowledge and increase self-efficacy. The general results also support the hypothesis that video format can have larger effects on self-efficacy.

To save space, from now on we concentrate on the proportion of correct answers by domain. Furthermore, since in the next section we will present results for the second test, in April 2011, it is convenient to indicate outcome variables as follows: Y0 if outcomes refer to baseline measures; Y1 for outcomes obtained immediately after the intervention; Y2 for outcomes obtained in the April 2011 follow-up.

Table 4 presents treatment effects after controlling for background characteristics. Not surprisingly, a respondent’s score on the quiz after the intervention is strongly related to his or her baseline knowledge. Nevertheless the table confirms the findings shown in Table 3. With the possible exception of inflation, the interventions yielded a highly significant improvement in knowledge of basic financial concepts.

Table 5 reports the findings of regressing the change in the percentage of correct answers on the various treatments and background characteristics. Apart from again showing highly significant effects of the interventions (with the exception of inflation), it also allows us to examine the hypothesis that behavioral modeling works best when subjects are similar to the models presented. It should be recalled that the content and modeling were targeted specifically at the 18–40 age group, a fact that would be made more salient in the videos, where only actors in this age group were shown. There is indeed some weak evidence consistent with this hypothesis.

Compared to the 18–40 group, the 65+ group shows a negative coefficient for four out of five dimensions, with one of these strongly significant (employer match). The remaining characteristics do not appear to have had much of an impact on the effectiveness of the intervention, with the exception of gender. Women show significantly more improvement in knowledge than men for all dimensions, except inflation. We have also tested for interactions between age and the different treatments. Findings are not reported here due to space constraints but can be summarized as follows: Significant interactions are only found for the risk diversification dimension, where generally the effects are stronger for younger respondents and weaker for the 65+ category.

Tables 6 and 7 repeat the analysis of Tables 4 and 5 for self-efficacy. Self-efficacy improves in all dimensions and for all treatments, although the videos appear to be most effective, as observed before. The oldest group shows the smallest gain in self-efficacy, while women appear to gain more than men, and respondents with higher incomes more than those with lower incomes. As with the knowledge questions, we have also tested for interactions between age and the different treatments. The results (not reported) show no significant interactions.

V. The Follow Up Eight Months Later

As of April 8, 2011, the participants in the experiment (both in the treatment and the control groups) were asked to once again take the same quiz. This allows us to investigate to what extent the positive effects found right after the intervention remain after some passage of time. Tables 8 and 9 are similar to Tables 5 and 7. Now, however, the dependent variable in each regression is the difference in percentage correct between baseline and April 2011 (Y2-Y0). It is immediately clear that far fewer treatment effects are significantly different from zero than when measured right after the intervention. Table 8 suggests that the video treatment has the greatest lasting effect: four out of five dimensions show significant effects, while for the narrative treatment only two out of five dimensions show statistically significant effects. Interestingly, the video with narrative treatment is never significant. The various background characteristics are generally insignificant.

Table 9 exhibits a pattern that is qualitatively similar to Table 8. The video treatment is more often significant (three out of five) than the narrative treatment (two out of five). Age interactions are all insignificant (not reported).

Thus, the positive effects immediately after the intervention have worn off over time. Table 10 provides a direct comparison between the short-run effects (Y1-Y0) and the longer-run effects (Y2-Y0). As a simple way to gauge how much of the initial effect remains, we also present (Y2-Y0)/(Y1-Y0). For knowledge questions, the percentage of the initial effect that remains after about eight months is on the order of one-third to one-quarter. For inflation the percentages are larger, but in view of the fact that the initial effect was small for this dimension, we should probably discount this. For the self-efficacy question, only about 10–20% remains.
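The ratio (Y2-Y0)/(Y1-Y0) can be sketched as follows. This is a minimal illustration only: the function name `share_remaining` and the numbers are hypothetical and are not values from Table 10.

```python
def share_remaining(short_run_effect, long_run_effect):
    """Share of the initial effect that remains at follow-up.

    short_run_effect: estimated effect right after the intervention (Y1-Y0)
    long_run_effect:  estimated effect at the follow-up (Y2-Y0)
    """
    if short_run_effect == 0:
        # The ratio is undefined when there was no initial effect.
        raise ValueError("short-run effect is zero; ratio is undefined")
    return long_run_effect / short_run_effect


# Hypothetical effects in percentage points (illustration only).
remaining = share_remaining(short_run_effect=12.0, long_run_effect=3.0)
print(remaining)
```

Because the short-run effect sits in the denominator, the ratio becomes unstable when that effect is close to zero, which is consistent with the caution expressed above about the inflation dimension.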

These findings suggest the need for regular updating of subjects’ knowledge with new material, to avoid quick depreciation of newly gained knowledge.

