Interview with Stefan Heeke, Executive Director of the SumAll Foundation. How can we think about using big data and analysis to predict homelessness and power other amazing projects? “Big data” is a term whose use continues to increase, and for us it can be misleading because it may not focus on the right data. It is easy to think about using data just to analyze our donors (tech tends to follow the money), so we love hearing stories about big data driving impact.
Trend of Big Data
It doesn’t seem to be stopping…
- Explore great open data sets
- TIBCO Spotfire Data visualization tool
- Predicting Homelessness with Sumall Foundation
- Tableau Visualization software
Speaker 1: This is Using the Whole Whale, a podcast that brings you stories about data and technology in the non-profit world. My name is George Weiner, your host and the Chief Whaler of wholewhale.com. Thanks for joining us.
Good morning and welcome to episode 7 of Using the Whole Whale. Today we’re talking about big data, and we’ll be talking with Stefan at a really amazing non-profit called the SumAll Foundation. But before we get started, I want to share a little big data on big data. The term, if you throw it into a Google search, will throw back roughly 2.1 billion results. So it’s certainly a term used a lot in the news. And if you’re looking for a trend, you’ll remember that handy tool google.com/trends, which shows us the incidence of search for any given term that we put into it. I’ll throw “big data” in there. Since 2011, you’ll see a nice, clean climb with some ups and downs, and it works out to about 10x the amount of usage from 2011 to the present day in 2014. So this is a term that is on the rise, and rightly so, because the amount of data stored on everything that we do, everything that we put online, is only increasing. But our ability to turn that data into actions, you know, isn’t necessarily increasing at the same rate. And when people start to talk about and think about big data, you know, I’ll play this for you to give you an idea of what I mean.
“You keep using that word. I do not think it means what you think it means.”
Speaker 1: Exactly. So sometimes I don’t think it means what people think it means and we don’t necessarily use it for what I think we should use it for. And in this case, we’ll go to do this interview with Stefan and see how they approach using big data for big impact.
Hello, this is George Weiner and I’m here at the SumAll Foundation with Stefan. Can you introduce yourself?
Speaker 2: Yes, George. My name is Stefan Heeke. I’m the Executive Director of the SumAll Foundation, SumAll.org. SumAll.org has been founded by the owners of SumAll.com, a technology start-up now having more than thirty employees. The idea of SumAll.org is to create a non-profit arm within a tech company. The owners said that business is not all about money, that there’s really an opportunity to use the technology talent and the energy that’s available to help and use data for civic impact.
Speaker 1: So the term “big data” has been thrown around a lot, and of course it’s a very popular buzzword and point of view to say, hey, let’s use big data for big good. But what the heck does that actually mean?
Speaker 2: Yeah. Big data is almost a term that is confusing, because we work with civic data sets. We have projects with the city of New York where we look at families at risk of becoming homeless. We work with Power Poetry where we look at poems. We work with schools where we look at the progression of individual students. The “big” isn’t really happening, and actually sometimes having data that is close to the operation is a lot more valuable than having a lot of data. So often we start out with a very operational data-set, for instance, the number of people checking into a shelter, or the number of users having a certain activity, or students and their grades. So that’s usually not big data. We start out with a very, I would say, operational data-set, and then we look at how we can make this data more useful, and then we come to bigger data-sets, for instance the Twitter API or open data from the city of New York. So for us, we don’t look for having a lot of data. We just look for data that’s close to the problem.
Speaker 1: Got you. So it’s not so much big data but, we could say, the right data. And once you have that, your process seems to be to find the data closest to the problematic outcomes of the organization, or problematic outputs in the organization, and then dig into this. So can you tell us a little about your process, how you, like, squeeze that lemon? How do you take a bunch of just freaking numbers and turn it into something useful?
Speaker 2: Squeezing the lemon, well, the lemon of the program is the problem. The idea is really to solve a problem. It’s not about data so much, it’s about some organization that has a problem, or they want to know whether they have an impact, or they want to know how they can make their impact bigger. And that’s really important: data is for us only really useful if we have a scope, so that we actually have a direction for how we can use the data to solve that problem. And that’s why we have a very careful tool to select our partners, who can articulate the problem and their goal and their outcomes, and at the same time have data sources that are close to these outcomes. And then we have something where we can create solutions. And this is really our process, where we have a partner such as Power Poetry or the City, or we work with Centers for Families, where we have strong subject-matter expertise, where we have a strong set of data, and we can help solve the problem for that organization and in the process help the issue by open sourcing the solution. So think about SumAll.org as an open source consultancy that solves a particular problem but then open sources the solution. So we open source the code, we open source the data, we open source the algorithms and we open source papers that help the issue, because we’re really about the issues, but we have to start with an organization and solve a particular problem to understand the issue and scale the solution.
Speaker 1: So I think this is a great frame that we’re describing. So it’s not big data, it’s the right data. And it’s not, let’s get the smartest people and have them wander around this dark room; we’re thinking about the questions. So it’s gather, question, analyze, create insights, and then, as you began to touch on, how do we move from analyzing and insights into action? You’ve talked about open source, putting it back out there. What are the steps that you use to communicate and visualize the data?
Speaker 2: Yeah. The way we think about it is we create assets from the scope. So once we have the data and we have a stakeholder who wants to use the data for outcomes, there are different assets we can create. We could create an algorithm that predicts the outcome. So that’s an asset we can post and share on GitHub. We can create an application. Maybe that algorithm could power an application that helps students become better at poetry, which we did with you guys and then Power Poetry, to create an application that predicts the quality of language in a tweet. We can generate thought leadership. The data itself may surface new problems that academic researchers would like to look at or write a paper on. So we publish papers. For instance, we’re thinking of publishing papers about gentrification from the perspective of the homeless population. And we create narratives. Often, the data reveals certain archetypes. Let’s say a certain kind of family composition is more likely to be at risk of being homeless. Let’s say a single mother with two kids in a certain economic situation is an archetype for a family at risk of becoming homeless, and that’s where storytelling comes in. If we have an archetype, or we have some frequent occurrence, we can really look at telling that story: how people got there and how people can get out of it.
Speaker 1: So you have the option of creating these predictive algorithms of creating, it seems like not only white papers but also info-graphics. I’m hanging on the fence sometimes with info-graphics. I think they’re used more often to mislead than lead us toward actual information. So do you have any sort of advice on creating the right type of info-graphic?
Speaker 2: We did a few infographics, and I like infographics when they come at the end of the process. So yes, we work with data. We go into the core of the problem, and then we can write a paper, but not everybody would read the paper. So the infographic is a way to communicate insights to a broader audience. That’s how we look at it. The infographic is sort of the smallest denominator to get a certain data insight to a maximum number of people. And the creativity is necessary just to have people engaged, and we feel it’s a way to communicate data. It is not the result. It’s a way to communicate the results.
Speaker 1: On a, on a bigger scale where do you see like macro-trends going as, you know, data becomes more and more prevalent, our ability to look at more and more dashboards obviously gets easier. What are some of the trends you see, I guess in the social impact sector, with regard to use of data?
Speaker 2: I think there is a need to simplify things. Data right now has the touch of being somewhat complicated. You have to know how to program or you have to know a certain tool. And we feel the real power comes out when a lot of different people can look at the same data and add their insight. For instance, we have a lot of valuable discussions when we’re able to share data and data visualizations with the subject-matter experts. And they will ask the right question if they understand the data. And I think there’s a bit of a danger in taking the data and keeping it and letting the data drive the insights. In reality the people who know the issue, who know the organization, will be much better suited to ask the questions, and giving them a way to understand their own data is almost more important than having a high-powered kind of algorithm or something that is very complex. So I think there’s a need to demystify and make data visible. If we get data in an Excel sheet, it’s invisible, and there’s a certain talent to extracting it and making exploratory analysis available to everyone. And I think that’s one big value add we see. It’s very low-tech in a way, but it allows more stakeholders to understand the data and add context that may not come from the data. And on the other side, we not only open source but we’re also opening the process. So whoever works with us goes into the conference calls and is a part of the process. Even if you’re not the data scientist, you will see how the sausage is made. And I think it’s very important also to take away that barrier and have people experience the very real-world problems that may come with a data-set that is biased or a data-set that may have missing values, so that people understand that. And on the other hand, we always have a subject-matter expert on a call to see if we’re doing the right thing.
So it’s very easy to go down a rabbit hole with a data-set and do something interesting, forgetting that it may not make sense or that it may be too much of a tangent. So I think that multi-disciplinary approach is really key to solving the problem.
Speaker 1: So definitely a trend in this space is that industry experts in other fields are going to get to see how the sausage is made because they heard it here, heard it here first. So moving to some more practical tools and tips, what are some of your favorite data analytics tools or visualization tools that you all like to use?
Speaker 2: I think for the specialist there seems to be an extended tool set, which is our [XXXX] for visualization. This is very custom; these are custom tools that require some coding abilities. We see a lot of value in the business analysis side of it, using tools that are point and click such as Tableau, or we also work with TIBCO Spotfire, to just look at a data-set and create a dashboard on the fly. Think about a PowerPoint that is interactive, and the presentation of that data always generates a conversation. It will engage the stakeholders and it will generate questions that bring the project ahead. So I think there’s a huge need to simplify and really make data available in real time. Imagine you look at data, and you look at it right away, and you talk about it, you apply filters, generate hypotheses. All that could shorten the whole project time by a lot.
Speaker 1: If you know what you’re looking for, certainly.
Speaker 2: And if everybody understands what they’re looking for.
Speaker 1: Great! So at the SumAll Foundation, SumAll.org, what is a home run? You know that term, like how do you really bring in a client and just knock it out of the park?
Speaker 2: I know the term. And what is a home run for us? It’s a partner who has a long history or a good history, and has data available about the program or whatever the partner’s been doing. It’s usually a non-profit, or it could be a government department, or it could be a social enterprise. And we’re looking for someone who really wants to go ahead with the issue. So we’re not so much into how companies communicate. We’re more about how the program is affecting outcomes. And this is what we’re really good at. So we look at the data. We generate a problem or we generate the scope, and the perfect partner is a partner with a lot of data, with a clear objective on where impact comes from and what impact is. Someone who can also help us quantify the impact, and then helps us put context to the data and is willing to share the outcome, so all the boats can rise while we solve the problem.
Speaker 1: That’s great! So how do people find you online and, both you individually and the organization?
Speaker 2: SumAll.org is the website address and, and a search will bring results that, that will make you find the website.
Speaker 1: And you can apply to get in touch if you are such a non-profit with a data-set. They can apply through the website?
Speaker 2: Yes, of course. You can come to the website, contact us, ask a question. We see ourselves as staying in for a longer period. So if we engage, we tend to have smaller teams that stay on the project until we have an implementation. It’s about how things can get implemented that would get you better outcomes, and everyone better outcomes. For instance, one example: with the city of New York we’re actually implementing the data with social workers. So all the data science suddenly becomes a list of names that social workers can reach out to, a list of names of families at risk. So a lot of the hi-tech process may then re-translate into a very simple list that helps social workers pick the families that are about to become homeless. I think that’s the beauty of implementation: the whole process suddenly becomes very simple again, and sometimes quite low-tech, but the insight and the value of the data is so good that it really empowers an organization to get better.
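The step Stefan describes here, where the data science ends up as a simple list of names for social workers, can be sketched roughly as follows. This is a minimal illustration, not the SumAll Foundation’s actual method: the field names, the weights, and the sample records are all hypothetical, and a real project would fit a model to historical outcomes rather than use hand-picked weights.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Family:
    name: str                            # placeholder identifier, not real data
    missed_rent_months: Optional[int]    # None means the value is missing
    prior_shelter_stays: Optional[int]
    dependents: Optional[int]


# Hypothetical hand-picked weights, for illustration only.
WEIGHTS = {
    "missed_rent_months": 0.5,
    "prior_shelter_stays": 1.0,
    "dependents": 0.25,
}


def risk_score(family: Family) -> float:
    """Weighted sum of the available fields; missing values add nothing."""
    score = 0.0
    for field_name, weight in WEIGHTS.items():
        value = getattr(family, field_name)
        if value is not None:
            score += weight * value
    return score


def outreach_list(families: list[Family], top_n: int) -> list[str]:
    """Return the names of the top_n highest-scoring families, highest first."""
    ranked = sorted(families, key=risk_score, reverse=True)
    return [family.name for family in ranked[:top_n]]


if __name__ == "__main__":
    sample = [
        Family("Family A", 1, 0, 2),
        Family("Family B", 4, 1, 2),
        Family("Family C", None, 2, 1),
    ]
    print(outreach_list(sample, top_n=2))
```

Note that skipping a missing value treats it as zero, which pushes that family’s score down. That is one of the real-world data problems mentioned earlier in the interview, and a subject-matter expert would need to decide how such gaps should be handled.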
Speaker 1: That’s great! Well, we’ll continue to send folks that’ll be great, great matches for you. What is your Twitter handle so people can chase you down?
Speaker 2: Twitter handle is SumAll.org. My personal Twitter handle is Stefan_Heeke. And we’d love to see you on Twitter, so tweet at us and, yeah, just come by. We’re at 247 Centre Street in SoHo. Great location and, yeah, hope to see you around. We’re also looking for volunteers in different areas. We’re looking for volunteers in business analytics. We’re looking for designers. We’re looking for people who can manage projects. We’re looking for data scientists. So our volunteer requirements are very holistic and we’re looking for all kinds of people: we’re looking for writers, creative writers. We’re looking for scientific writers. We’re looking for people who can review outcomes. So it’s.
Speaker 1: You guys are looking for everybody.
Speaker 2: So we’re looking for everybody.
Speaker 1: Everybody who is remotely interested, chase these guys down there.
Speaker 2: Exactly.
Speaker 1: Really fun work. They have a great office. We’re sitting here. Truly beautiful. Thanks so much for your time today Stefan.
Speaker 2: Thank you, George. [SONG PLAYS]
Speaker 1: I think that was very informative and helped me think about, at least, how we can look at the right kinds of data close to our problematic side to increase our impact. What I love is that the SumAll Foundation really doesn’t take on projects where, you know, you walk in and say, can you look at all of our donors? Can you go in here and just tell us who is going to give us more money over time? And that’s how we increase our impact. They’re trying to use data really for its best possible purpose, which is making sure we spend our dollars correctly. While most of the money in data and technology, I think, seems to follow the donor, I think this is a fantastic approach of saying let’s follow our ideals to reduce our problematic side and increase our impact with it. That’s all I have for you today. For more resources again, you can always check out wholewhale.com/podcast. Take care.
This has been Using the Whole Whale, the podcast. For more on the topics covered in today’s show please check out wholewhale.com/podcast. You can find us on Twitter at Whole Whale, and thanks for listening.