Wednesday, December 17, 2014

Notes on the culture of economics

I'm finally getting around to reading Piketty's Capital in the 21st Century. That and a project at work has put economics back to the front of my brain. I found the below posts interesting.

Paul Krugman says in "Notes on the Floating Crap Game (Economics Inside Baseball)"

So, academic economics is indeed very hierarchical; but I think it’s important to understand that it’s not a bureaucratic hierarchy, nor can status be conferred by crude patronage. The profession runs on reputation — basically the shared perception that you’re a smart guy. But how do you get reputation? [...] [R]eputation comes out of clever papers and snappy seminar presentations. 
[...]  Because everything runs on reputation, a lot of what you might imagine academic politics is like — what it may be like in other fields — doesn’t happen in econ. When young I would have relatives asking whether I was “in” with the department head or the senior faculty in my department, whether I was cultivating relationships, whatever; I thought it was funny, because all that mattered was your reputation, which was national if not global.

Not all Krugman says is rosy for economists. Nevertheless, this is consistent with my experience when I was in economics. Econ has a hierarchical structure, but it's not based on patronage or solely "length of service."  For example, when I was at the Fed, the internal structure was quite hierarchical in terms of both titles and managerial responsibility. (It kind of reminded me of the military.) However, it also had a  paradoxically "flat" culture. Ideas were swapped and debated constantly. Though I was a lowly research assistant, my forecasts were respected and my input listened to. I was no exception; this was just how we operated.

Simply statistics brought another post to my attention. From Kevin Drum at Mother Jones: Economists are Almost Inhumanly Impartial.

Over at 538, a team of researchers takes on the question of whether economists are biased. Given that economists are human beings, it would be pretty shocking if the answer turned out to be no, and sure enough, it's not. In fact, say the researchers, liberal economists tend to produce liberal results and conservative economists tend to produce conservative results. This is unsurprising, but oddly enough, I'm also not sure it's the real takeaway here. [...]
 What I see is a nearly flat regression line with a ton of variance. [...] If these results are actually true, then congratulations economists! You guys are pretty damn evenhanded. The most committed Austrians and the most extreme socialists are apparently producing numerical results that are only slightly different. If there's another field this side of nuclear physics that does better, I'd be surprised.

(I'll leave it to you to check out the regression line in question.)

Simply statistics's Jeff Leek has a different take.
I'm not sure the regression line says what they think it does, particularly if you pay attention to the variance around the line.

I don't know what Leek is getting at exactly; maybe we agree. What I see is a nearly flat line through a cloud of points. My take isn't that economists are unbiased. Rather, their bias is generally uncorrelated with their ideology. That's still a good thing, right? (Either way, I am not one for the philosophy of p < 0.05 means it's true and p > 0.05 means it's false.)

Here's what I've told other people: microeconomics is about as close to a science as you're going to get. It's a lot like studying predator prey systems in the wild. There's definitely stochastic variation, but the trends are pretty clear; not much to argue about. Macroeconomics, on the other hand is a lot trickier. It's not that macroeconomists are any less objective than microeconomists. Rather, measurement and causality are much trickier. In the resulting vacuum, there's room for different assumptions and philosophies. This is what macroeconomists debate about.

Nevertheless, my experience backs up a comment to Drum's article:
Economists generally avoid and form consensus in regard to fringe theories. 

Translation: the differences in philosophies between macroeconomists isn't as big as you'd think. And they're tiny compared to our political differences.

Tuesday, December 16, 2014

From an article in the Wall Street Journal:

When system designers begin a project, they first consider the capabilities of computers, with an eye toward delegating as much of the work as possible to the software. The human operator is assigned whatever is left over, which usually consists of relatively passive chores such as entering data, following templates and monitoring displays.
This philosophy traps people in a vicious cycle of de-skilling. By isolating them from hard work, it dulls their skills and increases the odds that they will make mistakes. When those mistakes happen, designers respond by seeking to further restrict people’s responsibilities—spurring a new round of de-skilling.
Because the prevailing technique “emphasizes the needs of technology over those of humans,” it forces people “into a supporting role, one for which we are most unsuited,” writes the cognitive scientist and design researcher Donald Norman of the University of California, San Diego.
There is an alternative.
In “human-centered automation,” the talents of people take precedence. Systems are designed to keep the human operator in what engineers call “the decision loop”—the continuing process of action, feedback and judgment-making. That keeps workers attentive and engaged and promotes the kind of challenging practice that strengthens skills.
In this model, software plays an essential but secondary role. It takes over routine functions that a human operator has already mastered, issues alerts when unexpected situations arise, provides fresh information that expands the operator’s perspective and counters the biases that often distort human thinking. The technology becomes the expert’s partner, not the expert’s replacement.
Pushing automation in a more humane direction doesn't require any technical breakthroughs. It requires a shift in priorities and a renewed focus on human strengths and weaknesses

Two thoughts come to mind:

First, there's Tyler Cowen's analogy of freestyle chess. He uses this analogy liberally in Average is Over. And the division of labor between human and computer in freestyle chess mirrors the above quote.

Second, I was taught the dichotomy of these two philosophies in the Marine Corps. I enlisted just before 9/11; ten + years of war may have changed the budgetary environment. But at the time, Marine infantry units did not have much of a budget. As a result, we trained ourselves (and our minds) first, and supplemented with what technology we could afford. On occasional training with some other (unnamed) branches of the military, we observed that these other units were awash in technology, helpless without it, and not any better than us with it. (Think fancy GPS versus old GPS + map & compass.)

I believe that the latter thought is an example of another quote from the WSJ article:

If we let our own skills fade by relying too much on automation, we are going to render ourselves less capable, less resilient and more subservient to our machines. 

Something to keep in mind as you're implementing your decision support systems.

Monday, December 15, 2014

Ukrain?

So, I've noticed a trend over the last few month's in the blog's traffic. The vast majority of hits seem to be coming from domains ending in ".ru".


Of course, these are bots. (I am heartened to see that when you aggregate URLs to sites, twitter, meetup, and datasciencecentral are still close to the top.)

When looking at the geography of the traffic sources, I'm seeing a whole lot of... Ukrain?


Who knew stats were so popular in Ukrain? (Kidding.)

But seriously, this only started a few months ago. I'm wondering if the conflict in Ukraine has anything to do with this. It's conceivable that computers and servers are getting hijacked over there as part of the war. Anyone have any thoughts?

Thursday, December 11, 2014

Saved by plagiarism!

I am writing a paper on goodness-of-fit for topic models. (Specifically, I've derived an R-squared metric for use with topic models.) I came across this definition for goodness-of-fit in our friend, Wikipedia.

The goodness of fit of a statistical model describes how well it fits a set of observations. Measures of goodness of fit typically summarize the discrepancy between observed values and the values expected under the model in question.

I love it! It's concise and to the point. But do I really want to cite Wikipedia in an article for peer review?

A Google search for the verbatim quote above reveals that this definition appears in countless books, papers, and websites without attribution. Did these authors plagiarize Wikipedia? Did Wikipedia plagiarize these authors? Who knows.

My solution, put the definition in quotes and attach a footnote. "This quote appears verbatim on Wikipedia and countless books, papers, and websites."

Done.

Tuesday, December 9, 2014

Simulating Realistic Data for Topic Modeling

Brian and I have finally submitted our paper to IEEE Transactions on Pattern Analysis and Machine Intelligence. This is the culmination of a year of hard work. (There's more work yet to be done; I doubt we'll make it through peer-review without having to revise.)

I presented our preliminary results at JSM in August, as described in this earlier post.

Here is the abstract.

Latent Dirichlet Allocation (LDA) is a popular Bayesian methodology for topic modeling. However, the priors in LDA analysis are not reflective of natural language. In this paper we introduce a Monte Carlo method for generating documents that accurately reflect word frequencies in language by taking advantage of Zipf’s Law. In developing this method we see a result for incorporating the structure of natural language into the prior of the topic model. Technical issues with correctly assigning power law priors drove us to use ensemble estimation methods. The ensemble estimation technique has the additional benefit of improving the quality of topics and providing an approximation of the true number of topics.

The rest of the paper can be read here.

Monday, December 8, 2014

Look up

I've added a couple pages to the blog here. The about me page has a quick bio. The publications and presentations page is where I'll be putting up my bragging rights research portfolio.

Wednesday, November 26, 2014

Economics and Data Mining


He's mining for data.


I stumbled across this video.

Cosma Shalizi, a stats professor at Carnegie Mellon, argues that economists should stop "fitting large complex models to a small set of highly correlated time series data. Once you add enough variables, parameters, bells and whistles, your model can fit past data very well, and yet fail miserably in the future."

I think there's a bit of a conflation of problems here. Not all economic data sets are small. An economist friend of mine pointed out that he's been working with datasets that have millions of observations. I am told this is common in microeconomics.

Nevertheless, my experience is that "acceptable" econometric methods are overly-conservative. As stated in the video, an economist saying someone is "data mining" is tantamount to an accusation of academic dishonesty. I was indoctrinated early in the ways of David Hendry's general to specific modeling, which is basically data mining (but doing it intelligently).  This, I think, made machine learning an intuitive move for me, and I've always thought that economics research would benefit greatly from machine learning methods.

There are some important caveats to all this. First, I don't see anyone beating out economics the same way computer science is sticking it to statistics. For "big data analytics" to live up to its hype, data scientists have to think a lot like economists, not the other way around. A big part of an economics education is economic thinking; this goes above and beyond statistical methods. Second, (and more importantly) you should take anything I say here with a grain of salt. Though I have a background in (and profound love for) economics, I never held a graduate degree in econ and I've been out of the field (and professional network) for several years. My knowledge may be dated.

Even so, I'm happy to hear voices like Dr. Shalizi's. It adds to Hal Varian's paper on "big data" tricks for econometrics. Maybe instead of worrying about the AI singularity, we should be worrying about economists using machine learning and then taking all of our jobs. ;-)