Sept. 1, 2022 — Amara Sanchez was already a data scientist. She just didn’t know it.
“One day I got back [to the Pala Youth Center] from school and April Cantu asked me if I wanted to work with Kim on a data science project,” says the middle-school student.
Sanchez had followed her usual routine of coming straight to the Youth Center from school. A member of the Pala Band of Mission Indians in southern California, Sanchez was always up for new, interesting projects, which she pursued through the nearby Pala Learning Center. But she didn’t know what “data science” was. Relatively few adults do.
Cantu sent her to “Kim” — Kimberly Mann Bruch, of the San Diego Supercomputer Center (SDSC), and Doretta Musick, director of the Learning Center. They shared with her a book called Everyday Data Science.
Sanchez looked through the book.
“Oh, I already do that,” she told them. Then she showed them her hand.
The connection between Sanchez and her classmate Maniya Zwicker and the Pittsburgh DataJam is a story of persistence, coincidence, networking and above all the magic that happens when a great idea meets minds that are prepared to make the most of it. It parallels the growth of DataJam from a purely Pittsburgh phenomenon to one that now spans the North American continent, from the Atlantic to the Pacific.
Mostly, it’s about the discovery that young people can not only understand data science; they can excel at it.
DataJam began in 2013, when the Pittsburgh DataWorks — a collaboration of companies and universities in Pittsburgh with an interest in data science — decided that a great way to support data science in the region would be to nurture the next generation of data scientists. Saman Haqqi of IBM, Raja Sooriamurthi at Carnegie Mellon University and Oracle data scientist Brian Macdonald joined with the Pittsburgh Supercomputing Center’s (PSC’s) Cheryl Begandy to design an extracurricular program to train high school teachers to coach student teams through an informal data-science competition.
Starting in 2013-14, DataJam became an intensive, in-person yearly effort to help the teachers acquire the skills to be comfortable doing — and teaching — data science. The teachers in turn helped the kids pick and investigate a topic of their own choice, ensuring their interest. Unlike most competitions of this sort, the DataJam learning activity would last the whole school year, teaching data analytical skills over time. The next school year, Pitt neuroscientist Judy Cameron, who ran science outreach for the university, gave students there who wanted to do outreach the opportunity to become mentors for the DataJam contestants. This innovation, also new to data competitions, proved to be both enduring and popular for the kids and their mentors.
“Through 2019, we focused on schools in the Pittsburgh area,” says Cameron, now director of DataWorks. “We were expanding into the suburbs and areas around Allegheny County. But we saw ourselves as a local activity.”
Then COVID-19 hit. Everything shut down. The kids were shut in their homes in quarantine. One-on-one mentoring was out of the question, let alone an end-of-term, in-person award celebration.
No one knew what was going to become of DataJam.
On Sanchez’s hand, written in ink, was a series of symbols.
“In class I was doing this thing of making hearts, stars and smiley faces on my hand every time I said ‘hi’ to someone and gave them a hug,” she explains. “My hand was full of these.”
Having introduced herself to data collection, Sanchez was up for taking on a more ambitious project in data analysis. She and Zwicker worked with Cantu, Musick and Mann Bruch on a DataJam project focusing on water quality in a small section of a local river on the Pala Reservation.
“It was just like any other project we’ve had in the past,” says Musick. “It takes a while to get it off the ground and get it started. It depends on if we strike up the interest, whereas Amara and Maniya were totally [invested] in this one.”
The San Luis Rey River runs through the Pala Reservation, serving as a major source of water for the community. The students decided they wanted to learn more about the quality of their river’s water and better understand how it compared to other areas throughout San Diego County.
Of course, DataJam’s problem was writ large on the entirety of the US educational system. What could be done to educate students when schools were closed down?
As did many school systems, DataJam turned to remote learning.
The idea posed challenges. Could they ensure that students in less-affluent communities had access? Without in-person tutoring, could they train either teachers or students? One by one, they started overcoming the issues. And, when DataJam became an online-only competition, a surprising thing happened.
“From the end of 2020 into the spring of 2021, we started to get inquiries from all over the place,” Cameron says. “We said ‘yes’ to everybody; we figured if we were going to have to mentor by videoconferencing … what difference did it make where the school was?”
A bunch of human connections – many of them serendipitous – helped bring teams in from across the US.
Cheryl Begandy introduced Cameron to investigators in the NSF-funded Northeast Big Data Innovation Hub, with which she’d been working as part of her outreach efforts for the PSC, thinking that this Hub might be interested in supporting the DataJam.
One of these investigators, Catherine Cramer of the Woods Hole Institute, worked with Cameron and two other scientists to write a pilot project proposal in the summer of 2021 to expand the DataJam to other states in the northeast.
This got the interest of Rich Chomko, a teacher at the Passaic Academy in New Jersey. The school stood up several teams for the 2022 DataJam.
“I was working with one of the [Passaic] teams,” says Jackson Filosa, a DataJam mentor about to be a senior at Pitt. “They were completely new to the world of Big Data … they were a team of freshmen without a ton of statistics experience.”
The Passaic team nonetheless dove into the work. They studied the relationship between median income and COVID-19 cases in NJ counties. They found that, for the most part, higher-income areas had fewer COVID-19 cases, and formulated hypotheses as to why that was so. Differences in working conditions, quality of health care and social-class-associated adherence to precautions all seemed possible explanations. “The highest COVID cases were in Passaic County, where they lived,” Filosa says. The students would like to continue their work and dig deeper into the question. “It showed them that Big Data is not just these random numbers. It’s relevant; it’s all around; it’s everyone.”
You can find the 2022 DataJam project poster-reports here.
Proposals from student teams for the 2022-23 DataJam are due by Dec. 2, 2022, with a Finale on April 27, 2023. For more information and to apply, see the DataJam website.
Source: Ken Chiacchia, PSC