What is sampling statistics?
Much ado about response rates
In Practice: Political Polling in 2012 and Beyond
- The number of households that have only wireless telephone service is reaching parity with the number having land line phone service. When considering only households with children (excluding older people with grown children and young adults without children) the number sits at 45 percent.
- Offering small savings on wireless bills may incentivize the taking of flash polls through smart phones.
- Political campaigns have been using social media to gather information and contact non respondents, supplementing or replacing traditional voter records for that purpose.
- By contacting 10 times as many people every day the Obama 2012 campaign schooled Gallup.
- Reducing the marginal cost of surveys allows political pollsters to design randomized controlled trials, to evaluate the efficacy of different campaign messages on voting outcomes. (As with all things statistics, there are tradeoffs and confounding variables with such approaches.)
- Pollsters would love to get access to all of your Facebook data.
Sampling Statistics and "Big Data"
I would challenge the above argument, but won't outright disagree with it. Your current customer base may or may not be your full population of interest. You may, for example, be interested in people who don't purchase your product. You may wish to analyze a sample of your market, to figure out how who isn't purchasing from you and why. You may have access to some data on the whole population, but you may not have all the variables you want.
More importantly, sampling statistics has tools that may allow organizations to design tracking schemes to gather the most relevant data to their questions of interest. To quote R.A. Fisher "To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: He may be able to say what the experiment died of." The world (especially the social-science world) is not static; priorities and people's behavior are sure to change.
Data fusion, the process of pulling together data from heterogeneous sources into one analysis, is not a survey. But these sources may represent observations and variables in proportions or frequencies differing from the target population. Combining data from these sources with a simple merge may result in biased analyses. Sampling statistics has methods of using sample weights to combine strata of a stratified sample where some strata may be over or under sampled (and there are reasons to do this intentionally).
I am not proposing that sampling statistics will become the new hottest thing. But I would not be surprised if sampling courses move from the esoteric fringes, to being a core course in many or most statistics graduate programs in the coming decades. (And we know it may take over a hundred years for something to become the new hotness anyway.)