# Taps Maiti: Big Data, Big Science

My name is Frederi Viens, and I'm the Chair of the Statistics Department here at MSU. It's my honor to say a few introductory words about Tapabrata Maiti, who is Professor and Graduate Director in the Department of Statistics and Probability here at MSU, on the occasion of his investiture as MSU Foundation Professor. Among friends and colleagues he goes by Taps; however, he is certainly mighty. (audience chuckling) Although I've only had the pleasure of knowing him well since my arrival here a little more than a year ago, it's clear that his impact on our department and our university has been decisive since his arrival in 2008, and his positive influence can be expected to continue. Doctor Maiti hails from the State of West Bengal, India, which will probably soon be known simply as Bengal. He received his Ph.D. from the University of Kalyani while he was working at the Indian Statistical Institute in nearby Kolkata. As a grad student he worked on the theory of statistical inference in finite population sampling, a technique that applies to many large-scale scientific data collection processes. Drawing on his experience as a post-doc at Harvard Medical School early in his career, his work expanded to hierarchical models and the associated Bayesian inference, which he applied to public health and policy research on pediatric cancers, and later to machine learning for cancer detection, disease cluster detection and climate change. He is currently invested in high-dimensional statistics, among many other things, and he's building computational predictive models that can assimilate image and observational data. Taps has been doing many other things, and it's very, very difficult to go through the whole list. His interdisciplinary work reaches far and wide, including collaborations at MSU with engineers, radiologists, neuroscientists, physicists, marketing scholars and medical professionals. 
And I'm proud to say that, despite our very distinct backgrounds, I count myself as an active research collaborator of Taps's. We're working on a Bayesian uncertainty quantification question with nuclear theorists from MSU's Facility for Rare Isotope Beams, as Professor Kirkpatrick mentioned a little while ago. So, a few words about the later part of his career, before he came to MSU. His career took him to the University of Nebraska-Lincoln, then to Iowa State University, and then he arrived here. His internationally recognized work has led to professional stints and memberships in various learned societies throughout the world, and again the list is too long to go through. I should mention, though, that he has served as a Research Fellow at the Indian Statistical Institute in Kolkata, a Senior Research Fellow at the US Census Bureau, a Special Government Employee at the US Environmental Protection Agency, and in various academic positions in a number of countries, including Australia, China, Portugal and the UK. He is also a Fellow of the American Statistical Association and of the Institute of Mathematical Statistics. And I should add that being a Fellow of both societies is not common, even among Fellows of one of them. His research is characterized by a unique combination of skills and expertise, resulting in peer-reviewed publications in a wide spectrum of scientific outlets. He's well funded by awards from the National Science Foundation, the National Institutes of Health and the USDA Natural Resources Conservation Service. Back in our department, STT as we call it, here at MSU we're very proud of him; indeed, we're indebted to Taps for his work in expanding the department's graduate programs. He has advised about 20 Ph.D. students and continues strongly. He has also been instrumental in developing MSU's program in Business Analytics. 
And as our world enters the age of data, I can easily envision Doctor Taps Maiti leading MSU's charge in expanding graduate education and research in data science. The recognition of MSU Foundation Professor is prestigious, and rightly so. Congratulations, mighty professor, you're up. (audience applauding) - Right, thank you Frederi, and thanks everybody. I don't really have many fancy pictures or movies, although I'm involved with some of this stuff. Instead of going into boring mathematical modeling, I think I should touch base on the broader community work I do here and what I'm going to do in the next few years. That's what I'll try to do. As you can see, the title of my talk is not really technical: a statistician at the junction of big data and data science. So I'll definitely talk about these three things: the statistician, big data and data science. Coming here was a significant move for me, from Hawkeye to Spartan. It was certainly a good move, and I have to say so because my family is pretty happy here. My wife got another degree from the College of Education, and my daughter is growing up here happily. And the Department of Statistics and Probability and Michigan State accepted me and welcomed me very well. I clearly remember the meeting with Dean Kirkpatrick on the very first day. It was scheduled for half an hour but went about an hour, and the Department wondered what we were doing. But it was a lovely conversation. After I came here in 2008, things happened as I expected at this stage of my career. I became a Fellow of the American Statistical Association, the world's largest organization in statistics, and a Fellow of the Institute of Mathematical Statistics. I have played many roles in expanding research and education with different schools on campus, including Business Analytics. 
And of course I already have collaborations with almost all the schools, except possibly Music; I don't know how to play music. Here are some current things that we are very excited about. As already mentioned, the Department and I are collaborating with FRIB, and hopefully we can become key members pretty soon. This project started just a year ago with some of the scientists over there. Among the many other things, I'd like to mention a couple. One is neuroscience. This is another area where big data and data science are going to play a big role. I have been collaborating with scientists on this campus who are generating a huge amount of image data every day. One of my objectives here is to take these images, put them into a statistical machine, and try to create, I suppose, an Alzheimer's disease prognosis: how soon a patient will develop full Alzheimer's, or how long they will remain stable in that state. So we have been doing some of these complicated things. It's a very complicated, difficult problem, not only from the scientific perspective but also from the statistical perspective, because we haven't seen data this large before. The other exciting thing I have been doing is with the College of Engineering, with mechanical engineers who are developing models for different organs of your body, like the heart and lungs, under thousands of different situations. They have computer programs that can run for a day or even 12 days, and they can predict the condition of your heart without even looking at the person. But here is where things come in: when the patient comes in, usually some images are taken immediately in the clinic. 
Then can we combine the computer model and the images to quickly come up with a recommendation to the medical doctor about what's going to happen in the next hour or so? So one objective is to first convert that highly complicated model, with its thousands of different situations, into a simple statistical model that the doctor can understand. That is one of the things going on, and it was recently funded by an NIH R01 grant, as part of a collaboration with the University of Michigan. All that is good. Well, this was me a few years ago. And now see how I am. (audience laughing) But it's all good. You said that I definitely need to talk a little bit about the future, particularly the future of statistics on this campus and the role of statisticians. Personally, I think this is a very good time to be a statistician, as many of you have heard about data science. Data scientist is one of the sexiest jobs of the 21st century, and it is obvious that there will be a big shortage of data scientists in the next few years. Here is some of the information, such as from the McKinsey Global Institute, which predicted that in 2018 there would be millions of jobs. And definitely for this type of job we need trained people; not just anybody can join. The pay is pretty good: the average salary is over 100,000, and it was the number one job in 2016. But what is data science? If you just Google it, there will be a number of different versions. Here is one version from Wikipedia, and clearly from this picture you can see that this is an interdisciplinary science, but statistics is everywhere. You can see that data science basically combines statistics, maths and machine learning. All sorts of things are there. It's an emerging field, and there are already 500 programs. Sometimes the program names are different, like data science, business analytics, predictive analytics or computational statistics, depending on the strengths on each campus. 
But I'm going to share some of my observations and my outlook on what's happening in this community. We can see the balance among three major pillars: statistics, computer science and information management. And of course there will be all the specializations; I already mentioned neuroscientists and health scientists, and everybody will be together. A quantitative background, such as basic training in statistics, mathematics and computer science, is definitely needed. Very quickly, the three pillars: there will be statistics foundations and computer science foundations. And then we'll just be like this: we started out like this as human beings, but now, in front of the computer, we'll be like that. That's how I see the future. Information science foundations is the third pillar, one for which statisticians are certainly not very well equipped, and we need to work on that part. Here is the relationship between data science and statistics. I like this definition: a data scientist is someone who is better at statistics than any software engineer and better at software engineering than any statistician. So clearly I'm not saying that statistics is data science or that data science is statistics, but we are in between. As I said, the American Statistical Association is the world's largest professional organization in statistics. They have a definition and a view of how to work toward data science, and we acknowledge that data science is not just statistics, but statistics is certainly going to play a major role. In my personal opinion, it is not a threat; it is an opportunity, an opportunity to work with all the other scientists, not just among ourselves. And it might help grow all three disciplines, computer science, mathematics and information management, along with many other sub-disciplines. It will help if we can really dig into the details and work together. 
Of course, we statisticians have the most to do. As you can see, the average statistician is just plain mean. Again, I didn't put that thing-- - [Audience Member] Is that a camembert or what is that? - I don't know. So it is very, very important that we engage with computer scientists and database scientists. We do distributed computing and all sorts of things. We develop new tools so that existing statistical methods can be imported or transported in a convenient way, and all these other things. And of course, the associated thing is big data, which we have to dig into. I just mentioned data science and statistics; what is their relationship with big data? In general there is, again, no concrete definition, but big data is so huge that I cannot deal with it on my simple laptop; I need some kind of supercomputer. So what is the relationship with big data? That is what I'll mention now. Here is the big data potential, just briefly, to give you an idea. If you burned all the data created in just one day onto DVDs, you could stack them on top of each other and reach the moon twice. At the moment, less than 0.5% of all data is ever analyzed and used. So we are generating data much faster than we can learn from it. And here is essentially the situation. We have this gigantic data, but we know only a little bit of it, and the most difficult thing is knowing which part we need to know. How do I know which part will be floating above the water? That's very difficult to know. And that's where data science and big data come in. To begin, here is a cartoon I like, a kind of Mickey Mouse cartoon. This is big data, and he thinks, oh data science, you don't know, I'm so big I can just grab you. But after a while, when we learn this big data, we can say this part of your body is good, and this is the part where your brain is. 
If I can identify all these things using my tools, particularly statistical tools, then we can become big friends: data science and big data become friends. Otherwise there is no hope. So essentially, we have to learn what is inside the big data. And again, we don't need to know everything; what matters much more is which part we need to know, even if it's a small part. That's the very important thing. These are just some of the things generating big data every day: e-commerce, imaging, social media, medical records, satellite data, government records. Some of the applications of big data are growing very fast, like search engines, image recognition and fraud detection. Mostly I am involved in image recognition. The next thing coming is self-driving cars, which will generate millions and millions of data points every second. So we have to analyze all these things. And the world is getting ready; every school, everybody is preparing to deal with big data using these three pillars: statistics, computer science and information science. Certainly there is a future, and where are we? We are the Spartans. So I would like to thank you, Michigan State. (audience applauding) - So now we have the part of the program where we hand over the big iron. I'd like to invite Brian to come forward. I should add that these go really well with half-button Hawaiian shirts. (audience laughing) Congratulations. (audience applauding)