Posts Tagged ‘evaluation’

Pick a Number

May 3, 2010

FactCheck.org, sponsored by the Annenberg Public Policy Center of the University of Pennsylvania, is designed to reduce confusion and deception in U.S. politics. We’ve been following its commentary for some time, and have been generally impressed. As you’d expect, it has included a fair amount of coverage of the Recovery Act, often calling into question exaggerated statements from all sides.

A couple of weeks ago, we featured a post about the latest job impact report by the President’s Council of Economic Advisers. We just came across FactCheck.org’s look at that report, which questions the Council’s statement that “a wide range of private and government analysts concur with our estimates” of 2.2 million to 2.8 million in job impacts.

In its assessment, FactCheck.org points out that only one of the five other estimates cited by the report falls into the range of jobs cited by the Council’s economists. It notes that of the private economists cited, all were “well below the range estimated by the White House.” For example, “Mark Zandi of Moody’s put the number at 1,896,000, IHS/Global Insight put it at 1,707,000 and Macroeconomic Advisers estimated 1,462,000.” While the nonpartisan Congressional Budget Office came nearly up to the CEA estimate of job impact on the high side (2.7 million), its low-side estimate was 1.2 million – about a million short of the Council’s lower estimate.


A Stark Difference

April 16, 2010

This week, the Council of Economic Advisers released its third quarterly report (for Q1 2010) on the impact of the federal stimulus with respect to jobs created or saved. The headline numbers: ARRA stimulated between 2.2 and 2.8 million jobs.

NPR’s blog “The Two-Way” pointed out that adding those jobs to the March unemployment figures suggests that there could have been 5 million more people out of work in March 2010 than in March 2009, if not for the stimulus.

The Council is careful to note the challenge of accurately estimating the effects of a large policy like ARRA. In the Daily Brief on, Sarah Krouse writes, “The report said the stark difference between the White House calculations and recipient-reported data — which estimated a total number of 1.2 million jobs created or saved through the fourth quarter of 2009 — is the result of several factors, including the fact that reporting requirements only apply to about a third of ARRA funding and the direct spending parts of the stimulus requiring reports are moving slower than other portions of the stimulus.”

Cash for Clunkers: Digging Deeper

April 15, 2010

The debate over the economic benefits of last summer’s “Cash for Clunkers” program continues. Recently, the White House blog and exchanged salvos in the argument.  But the economic benefits are only one part of the question. Examining performance metrics other than economic impact — such as the environmental benefits — sheds additional light on what exactly Cash for Clunkers accomplished.

Happily, fedgazette editor Ronald A. Wirtz  has a piece on the website of the Federal Reserve Bank of Minneapolis that digs  into data regarding some of the fuel efficiency savings that the country could gain from Cash for Clunkers.

Wirtz looked at the program in Minnesota, Montana, North Dakota, South Dakota, Wisconsin and the Upper Peninsula of Michigan. He found that gas efficiency gains were genuine, but the devil was in the details. Explained Wirtz:

“Average fuel efficiency between trade-ins and newly purchased vehicles rose about 50 percent, from roughly 15.5 miles per gallon to 24.

“But that covers up a lot of variation, part of which suggests a sop to owners of older, typically larger vehicles who used the opportunity to upgrade to something with only marginally better gas mileage.

“For example, about one quarter of the new vehicles purchased through the clunker program in the Upper Peninsula and the Dakotas had fuel efficiency gains of four miles per gallon or less. In many cases, older trucks and SUVs were simply traded in for newer but only marginally more fuel-efficient versions. Only 10 percent of all vehicles bought in the district under the program got 30 mpg or better.”  (see Chart 1).

The gains shouldn’t  be brushed aside, of course. As Wirtz is careful to note, even seemingly small fuel efficiency upgrades can save enormous amounts of fuel, which becomes more evident when using “Gallons per Thousand Miles” rather than “Miles per Gallon.”
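To see why the choice of unit matters, here is a minimal sketch of the arithmetic. The 15.5 and 24 mpg figures are the district averages Wirtz cites; the other two vehicle pairs (14 to 18 mpg and 30 to 34 mpg) are hypothetical examples we made up for illustration, and the function names are our own.

```python
def gallons_per_thousand_miles(mpg: float) -> float:
    """Convert miles per gallon into gallons burned per 1,000 miles driven."""
    if mpg <= 0:
        raise ValueError("mpg must be positive")
    return 1000.0 / mpg


def gallons_saved_per_thousand_miles(old_mpg: float, new_mpg: float) -> float:
    """Fuel saved over 1,000 miles when trading an old vehicle for a new one."""
    return gallons_per_thousand_miles(old_mpg) - gallons_per_thousand_miles(new_mpg)


# District averages cited by Wirtz: roughly 15.5 mpg traded in, 24 mpg purchased.
average_saving = gallons_saved_per_thousand_miles(15.5, 24)

# Hypothetical pairs: the same 4 mpg gain at the low and high ends of the scale.
truck_saving = gallons_saved_per_thousand_miles(14, 18)
sedan_saving = gallons_saved_per_thousand_miles(30, 34)

print(f"Average trade-in: {average_saving:.1f} gallons saved per 1,000 miles")
print(f"14 -> 18 mpg:     {truck_saving:.1f} gallons saved per 1,000 miles")
print(f"30 -> 34 mpg:     {sedan_saving:.1f} gallons saved per 1,000 miles")
```

Because fuel use is the inverse of miles per gallon, the same 4 mpg improvement saves several times more fuel on a gas guzzler than on an already efficient car, which is the effect that “Miles per Gallon” tends to hide.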

A civilized followup

April 8, 2010

Veronique de Rugy

Last week, we noted the excellent back-and-forth between Veronique de Rugy and Nate Silver regarding some of de Rugy’s research into stimulus spending. The best part of the debate, at least for us, was that two smart people put aside differences to prod each other to better answers and sharper thinking.

Less than a week later, de Rugy has already posted a revised version of her study, using methodological improvements that came out of her discussion with Silver. In the spirit of transparency, she has posted her revision, the original, and all her data here. Her own brief summary of the revised results is posted on The Corner. And she also sums up the questions that many researchers, journalists and citizens (including us) are asking:

There is still much more to learn on the question “How are stimulus funds being spent and why?”

The more I dig into this, the more important the question seems. After all, we’re talking about hundreds of billions of dollars. This is a question I will continue to try to shed some light on. And I’ll continue to refine my model to do so. For instance, I would like to try to understand better what we could learn from the design of the stimulus bill itself — what can we say about the fact that so much money is in fact spent or going to state capitals? (It’s not just that it’s being allocated there; the data I use tells where the money is actually spent.) What explains the fact that so much more of the money is going to the Department of Education than to the Department of Transportation? And what are we getting for our money?

We hope that de Rugy (and Silver and many others) continue asking these questions, especially that last one, and keep prodding each other toward better answers–and do so as transparently and cooperatively as de Rugy and Silver.

Conflicting goals, weatherization and a little about soccer

April 6, 2010

One of our very favorite Governing columns that we’ve written over the years was about performance measurement and girls’ soccer. As we watched our daughter play, we noticed we were seeing some of the same performance issues come up as we’d seen in government. One of the chief problems was that of conflicting goals. Coaches said they wanted to develop players and win games, but doing both those things simultaneously was tricky. If you played your best kids all the time, you might win, but you wouldn’t do as much as you could to develop the skills of the bottom half of the team. If you played your weaker players – thus developing them – it was likelier that you’d lose.

Sandy Greene, daughter and performance measure

We can’t help but notice that the Recovery Act is also struggling with a passel of conflicting goals. The area that has been most significantly paralyzed by this problem has been weatherization. The Recovery Act sought to spur economic growth, create jobs and lower energy bills by providing insulation, caulking, weather stripping, etc., to low-income families. But the goal was also to make sure that the jobs paid the prevailing wage. Since weatherization work had not been covered by this requirement before, the arduous and detailed task of calculating and setting proper wage rates fell to the Department of Labor, and then the Department of Energy had to help states figure out how to certify that these payroll requirements were met.

Hence the delay. On March 5, California auditor Elaine Howle testified before the House Committee on Oversight and Government Reform that when the auditor’s office finished its fieldwork in December, no houses had yet been weatherized in California even though $93 million had been available since the end of July. By February, the Department of Community Services told the audit office that 210 homes had been weatherized. Putting aside the fact that Howle seemed a bit dubious of this number in her testimony, those 210 houses still fell far short of the state goal of weatherizing 1,433 houses per month.

There are lessons to be learned here, and we think one of the primary ones is for government decision-makers to assess early on where worthwhile but conflicting goals, like getting work done speedily and setting up a complex new payroll structure, may end up causing problems.

We’re going to cover more about the technical side of this issue in a couple of upcoming posts.

A Civilized Debate

April 2, 2010

Anyone who’s interested in how, and how well, stimulus dollars are being spent should read the exchange between Veronique de Rugy of George Mason University’s Mercatus Center and Nate Silver of FiveThirtyEight.com about some pivotal data and quality issues. Of course, we could summarize their points, but we think you’ll find it much more illuminating to read through the exchange itself and let the authors speak for themselves.

We suggest that you read the four pieces in reverse order (starting with number four and moving back to number one). Trust us, it’ll make more sense that way. But here they are in the original order:

  1. de Rugy’s paper, which got the ball rolling
  2. Silver’s first response
  3. de Rugy’s reply
  4. Silver’s second response

The exchange is worthy of attention because it highlights two great points:

1) We really like seeing smart people getting past their initial suspicions of each other in order to begin hammering out some fundamental questions about the quality of data and potential issues in the stimulus package itself.

We’ll never get very far in understanding what works and how well it works if we can’t get to points of basic agreement across partisan or ideological aisles about the ground rules of analysis. This civilized debate underscores how partisan and toxic much of the discussion over the stimulus (and other recent government plans) has become.

It has also been suggested that this is a model for the quick, effective peer-review that the internet facilitates.

2) In the course of their discussion, de Rugy and Silver raise some very good and sobering questions about the quality of the data for analytical purposes provided by Recovery.gov.

Note these two comments, for example:

“I worked within the confines of $18 million website, a website that we were promised would allow us to track the money to the last cent. Obviously, that is not the case. The money trail ends at the level reported, and from the website one cannot tell where the money went next.”–de Rugy’s reply


“I share de Rugy’s disappointment with the quality of the data available at Recovery.gov. Frankly, I am not sure that testing her hypothesis to a peer-reviewable level of robustness is possible given the middling quality of data and the inherent ambiguity with how particular projects must be assigned to particular congressional districts.”–Silver’s 2nd response

We hope that the officials working on Recovery.gov are paying attention. We were excited by the recent appointment of data presentation guru Edward Tufte to the ARRA Recovery Independent Advisory Panel, and hope that that might be a good step in making Recovery.gov more useful to citizens and researchers alike.

Youth shall be served. . .

March 22, 2010

The Recovery Act included about $1.2 billion to fund jobs for disadvantaged young people, with a healthy portion going to the 2009 Summer Employment Program. The numbers are impressive. About 250,000 youths received job assistance through the pre-existing Workforce Investment Act in 2008. In 2009, with the stimulus dollars, that number grew to 355,000, of which about 88 percent participated during the summer.

A new evaluation of the summer job program by Mathematica Policy Research paints a generally positive picture of the program and includes a number of lessons learned. Under a contract with the Department of Labor, Mathematica interviewed young people, program staff and employers  and sought information about a variety of topics including the two performance measures required by the Recovery Act and the Department of Labor: an assessment of whether young people achieved a “work readiness skill goal”  at the end of their experience and data on whether they completed their summer employment.  Results for both measures are provided for the fifty states in two appendices at the end of the 148-page report.

The stimulus-dollar-funded Summer Youth Initiative operated by the Wise Workforce Center in Virginia

Some of the most interesting information can be found in the variation of completion rates among the states. While 82 percent of young people completed their summer employment nationally, 11 states had completion rates higher than 90 percent:  Alaska, Connecticut, Florida, Georgia, Kentucky, Maryland, Massachusetts, Oregon, Vermont, Washington and West Virginia. At the low end, seven states had completion rates of less than 70 percent: Delaware, Maine, Montana, New Jersey, New Mexico, North Dakota and Utah. As far as we can see, this data, alone, is fodder for some really interesting explorations as to why young people in some states seemed to stick with their jobs better than others.

The information on work readiness, unfortunately, must be regarded with some caution because a lot of freedom was given to entities in determining how to make this assessment.  In fact, one of the recommendations from Mathematica was that the Employment and Training Administration should provide more guidance on how to measure this in a way that ensures “the use of a valid measure across all local areas.” We’re grateful to Mathematica for pointing this out. It’s long been a frustration of ours to see how much national data is skewed by the fact that it is self-reported, without consistency or sufficient guidance.

Still, even though the comparative state information needs to be viewed with caution, the results bear consideration:

Nationally, 75 percent of the young people who participated in the program over the summer attained a work readiness skill goal, but the variation among states was also great here. Six states reported that this goal was accomplished by more than 90 percent of participants: Arkansas, Georgia, Maryland, New Hampshire, Rhode Island and Wisconsin. Twelve states reported results under 70 percent: Delaware, Kansas, Michigan, Montana, New Jersey (with a startling 21 percent), North Carolina, North Dakota, Oklahoma, Oregon, Utah, Vermont and Wyoming.

For what it’s worth, we note that only two states made it to the very top tier of both lists: Georgia and Maryland.
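For readers who want to cross-check the two lists themselves, here is a minimal sketch using nothing but the state names quoted above. The variable names are our own, and given the caution about self-reported readiness data, the overlaps are a starting point for questions, not findings.

```python
# States with summer-employment completion rates above 90 percent.
high_completion = {
    "Alaska", "Connecticut", "Florida", "Georgia", "Kentucky", "Maryland",
    "Massachusetts", "Oregon", "Vermont", "Washington", "West Virginia",
}

# States with completion rates below 70 percent.
low_completion = {
    "Delaware", "Maine", "Montana", "New Jersey", "New Mexico",
    "North Dakota", "Utah",
}

# States reporting that more than 90 percent attained a work readiness goal.
high_readiness = {
    "Arkansas", "Georgia", "Maryland", "New Hampshire", "Rhode Island",
    "Wisconsin",
}

# States reporting work readiness results under 70 percent.
low_readiness = {
    "Delaware", "Kansas", "Michigan", "Montana", "New Jersey",
    "North Carolina", "North Dakota", "Oklahoma", "Oregon", "Utah",
    "Vermont", "Wyoming",
}

# Set intersection picks out the states that appear on both lists.
top_both = sorted(high_completion & high_readiness)
bottom_both = sorted(low_completion & low_readiness)
high_completion_low_readiness = sorted(high_completion & low_readiness)

print("Top tier on both measures:", top_both)
print("Bottom tier on both measures:", bottom_both)
print("High completion, low readiness:", high_completion_low_readiness)
```

The same comparison also surfaces states that sit at the bottom of both lists and states with high completion but low reported readiness, either of which might be a natural place to ask how consistently the readiness measure was applied.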

Starting off on the right foot. . .

March 15, 2010

In recent months, as we’ve talked with evaluation professionals in the states and local governments, we have worried a bit that they weren’t more involved in the early days after the Recovery Act went into action. An old friend, John Turcotte, who is director of the program evaluation division for the North Carolina General Assembly, wasn’t complaining – but he did point out that he hasn’t been asked, thus far, to evaluate any of the aspects of the Recovery Act. A little north, in Virginia, David Von Moll, who is the state comptroller, told us that “We haven’t specifically related ARRA activity to our performance management process. That’s a goal of what we might want to do at some point.”

It certainly seems like a worthwhile goal. Of course, there’s no shortage of data associated with the Recovery Act, and the reporting requirements are keeping local officials just about as busy as a florist on Mother’s Day. The funds are scrutinized, of course, by leagues of individuals who are focused on accountability. But attention to program results is significantly less pronounced. Perhaps the emphasis – and public dialogue – would already have shifted in that direction had there been more early involvement from the performance auditors, legislative evaluators, and performance budgeters who spend their lives measuring the impact of programs.

“I felt that the goals of the stimulus office were the same as the performance improvement office, but they were clearly two distinct operations,” says Sharon Daboin, the former Deputy Secretary for Performance Improvement with the Budget Office in Pennsylvania. “I tried to share the tools that were being used to track objectives and accomplishments within each program office, but a key person in the stimulus office said ‘Your focus and mine are completely different.’”

There are some states, like Maryland, in which stimulus reporting is directly connected with performance reporting generally. But that appears to be more the exception than the rule.

A comment from Gary VanLandingham, director of Florida’s Office of Program Policy Analysis and Government Accountability, sums up our feelings nicely (although he wasn’t talking about the Recovery Act when he said this): “I think, ideally, if you have something that is going to have to be evaluated or managed, then it would be good if folks who were designing the programs would bring in some of the evaluation folks at the beginning to know what kind of data we’d need to collect. It’s difficult to retrofit management systems to go back and collect data which ideally you should have been collecting to start with.”