5 Things We Learned from the Strata Data Conference

Just before our time at SXSW, Whole Whale attended and presented at the O’Reilly Media Strata Data Conference in San Jose, CA, from March 5–8, 2018. As the leading conference for data scientists and big data companies, Strata definitely pushed us out of our comfort zone, even though our very own design whaler, Ann, (brilliantly) presented our work with machine learning for Power Poetry.

Attending the conference was intense, as every session presented something new (and at times way over our heads), which we love: being exposed to innovative ideas and foreign-but-adjacent fields of study is one of the healthiest ways to expand how you approach problems. So here are some of the thoughts swirling around our heads after the #StrataData conference (and a taco hangover).


1. The data chasm is real

There is a huge gap between the volume of data stored by for-profits and by nonprofits. Industries such as health, transportation, hospitality, insurance, and energy are working with petabytes of information. At this volume, using data engineers to build models and machine learning tools is critical to the success of these industries. Innovations in ML and data software are not only revolutionary for these fields; they are also necessary for them to function.

2. Big data is a crowded market

Walking around the vendor booths at #StrataData, it became clear that there is a lot of competition for cloud data management, aggregation, visualization, and solutions. There is no shortage of options for storing and managing big data. It was also interesting to get familiar with the software these companies have in common. Terms and tools like Kafka, TensorFlow, Jupyter, Python, NoSQL vs. SQL, AWS, Google Cloud, Power BI, Tableau, and GitHub were tossed around constantly.

Hear more on Strata Data in Episode 87.5 of our podcast, Using the Whole Whale. 

3. Define roles in data work

We learned that real data scientists use Jupyter to document their work and explore models, and we found it interesting to see how larger companies differentiate the roles and responsibilities of data engineers, scientists, and analysts. In the social impact sector, it is usually just the data analyst figuring out which way is up and how to build transparent dashboards, so it was useful to see how different types of data work define careers and areas of expertise.


4. Tune in to TensorFlow

TensorFlow is winning as a tool for building ML. An open-source tool, TensorFlow was originally developed by researchers on the Google Brain Team within Google’s Machine Intelligence research organization. It was designed to conduct machine learning and deep neural network research, but the system is general enough to apply to a wide variety of other domains as well. For instance, we can apply TensorFlow at Power Poetry to help poets with writer’s block.

5. We need unbiased big data

Development teams at large companies and organizations need to start hiring from or contracting with nonprofits as they begin to work with more vulnerable populations. What people know to be true in the social impact world is not common knowledge in many big tech circles. Negative cognitive biases have always been a pervasive issue in the for-profit world, but these issues compound at an alarming rate when they are programmed into ML and into the systems that run our social services and governments. In this vein, one of the cooler uses of machine learning we saw at the conference was Textio, whose online tool helps HR teams write less biased job descriptions.

Ultimately, our takeaway from Strata Data is that we are just scratching the surface of what is possible with machine learning and big data, akin to the days of early internet usage.

Even the coolest solutions and presenters acknowledged the people, process, and product paradigm. There is a lot more to come in the field of ML, and a lot more we can do for social impact with these tools. If you have thoughts to share on the #StrataData conference or the future of machine learning and data tools, tweet us @WholeWhale.