Mapping Emergency Calls, Transmissions, and Social Media Posts
General Assembly Data Science Immersive
Project 5: Client Project
DEN-DSI-9: Jaelynn Chung, Jeff Thomas, Will Glynn
Using live 911 call data, police radio reports, and social media posts to identify people needing assistance in real time
Currently, FEMA identifies areas that require immediate attention (for search and rescue efforts) either by responding to reports and requests submitted directly by the public or, more recently, by using social media posts. This tool utilizes 911 call data, live police radio reports, and social media posts to identify hot spots representing the locations of people who need immediate attention. It flags neighborhoods or specific streets where police and first responders were called to provide assistance related to the event.
When disaster strikes, emergency responders (including FEMA personnel, the National Guard, police, EMTs, and firefighters) are frequently overwhelmed by the number of people who need help. During Hurricane Harvey in 2017, Houston's emergency response services were inundated with 911 calls for rescue, and people began to rely on alternative channels for help, such as posting on Twitter. According to the Wall Street Journal, as many as 10,000 lives could be saved every year by reducing 911 response times by just one minute. While we lack direct data to support it, it is safe to assume that response times during an event like Hurricane Harvey lengthened drastically, for a variety of reasons.
What if there were a way to consolidate all geolocation data where emergency response is needed into one easily accessible location? We conceived and partially built a pipeline and tool capable of capturing, analyzing, and mapping geolocation data from 911 calls, police radio audio, and Twitter, and publishing it onto a map that updates in real time. This could be particularly useful for all types of emergency responders, as well as civilians who are willing and able to assist their community during large-scale disasters like Hurricane Harvey.
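As a sketch of what that consolidation step might look like, the snippet below normalizes records from the different sources into one shared schema. The field names and the `normalize_911` helper are illustrative stand-ins, not taken from our actual pipeline code.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Incident:
    source: str        # "911", "radio", or "twitter"
    lat: float
    lon: float
    timestamp: datetime
    severity: int      # ordinal priority, lower = more urgent
    description: str

def normalize_911(row: dict) -> Incident:
    """Map one (hypothetical) 911-call record onto the shared schema."""
    return Incident(
        source="911",
        lat=float(row["latitude"]),
        lon=float(row["longitude"]),
        timestamp=datetime.fromisoformat(row["time_received"]),
        severity=int(row["priority"]),
        description=row["description"],
    )

# Example record with invented values
call = {"latitude": "41.7658", "longitude": "-72.6734",
        "time_received": "2017-08-27T14:32:00",
        "priority": "1", "description": "water rescue"}
incident = normalize_911(call)
```

With every source mapped onto one `Incident` shape, the downstream mapping layer only has to understand a single schema.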
On top of visually identifying areas of greatest need by the density of emergencies in specific locations, users would also be able to differentiate between the severity of the emergency, the type of help needed, and the time elapsed since the initial call, post, or radio request was made. This would enable emergency responders and good Samaritans to allocate the resources they were capable of providing more efficiently and effectively.
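The filtering described above could be sketched roughly as follows. The incident schema and the filter parameters are hypothetical stand-ins for whatever controls the map would actually expose.

```python
from datetime import datetime, timedelta

def filter_incidents(incidents, max_severity=None, types=None,
                     max_age=None, now=None):
    """Keep incidents matching the map's filter controls.

    incidents: list of dicts with 'severity' (int, lower = more urgent),
    'type' (str), and 'timestamp' (datetime) keys -- an assumed schema.
    """
    now = now or datetime.utcnow()
    keep = []
    for inc in incidents:
        if max_severity is not None and inc["severity"] > max_severity:
            continue
        if types is not None and inc["type"] not in types:
            continue
        if max_age is not None and now - inc["timestamp"] > max_age:
            continue
        keep.append(inc)
    return keep

# Invented example data
now = datetime(2017, 8, 27, 18, 0)
incidents = [
    {"severity": 1, "type": "rescue",
     "timestamp": datetime(2017, 8, 27, 17, 30)},
    {"severity": 3, "type": "medical",
     "timestamp": datetime(2017, 8, 27, 12, 0)},
]
# Only urgent incidents from the last two hours
recent_urgent = filter_incidents(incidents, max_severity=2,
                                 max_age=timedelta(hours=2), now=now)
```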
We created a framework that represents the raw data → map journey.
Initially, the plan for this project was to iterate on the police-radio-to-mapping tool created by DSI-ATL-8. We started by following their workflow and scraped our own police radio audio data, only to find zero usable locations in five days of audio. Making this approach practical would require improving the clarity of the audio (removing white noise, engineering specific decibel levels) through an API connected to robust audio-editing software such as Audacity. More importantly, the audio itself would need to reference far more locations than the data we scraped did. We concluded that a practical tool would require a multitude of data sources feeding into the pipeline, and that the best sources for this type of data are 911 calls, police scanner audio, and Twitter posts.
911 Call Data
We located a dataset of 911 calls made in Hartford, CT, which conveniently included a timestamp, an ordinal priority, a brief description, and location data for each call.
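A minimal sketch of loading such a dataset into map-ready records might look like the following. The column names are invented and would need to match the real Hartford export.

```python
import csv
import io

# Hypothetical header and rows -- the real export may use different names.
raw = io.StringIO(
    "time_received,priority,description,latitude,longitude\n"
    "2017-08-27 14:32,1,WATER RESCUE,41.7658,-72.6734\n"
    "2017-08-27 14:35,2,WIRES DOWN,41.7712,-72.6851\n"
)

records = []
for row in csv.DictReader(raw):
    records.append({
        "time": row["time_received"],
        "priority": int(row["priority"]),
        "description": row["description"].title(),
        "lat": float(row["latitude"]),
        "lon": float(row["longitude"]),
    })
```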
Twitter Data
Additionally, we located a corpus of Twitter posts made during Hurricane Harvey. This corpus was produced by a sophisticated filtering process that identified trustworthy events within a massive array of posts made during the disaster. In addition to geo-tagged tweets, "location information was extracted from tweets by constructing a local gazetteer and merged events occurring in a specific spatiotemporal range. Firstly, the credibility score for each tweet was calculated based on the information contained in its text and URL. Secondly, the accumulated credibility score for each event was calculated based on the number of tweets and retweets associated with the same event."
From the abstract of this study: "Social media data have been used to improve geographic situation awareness in the past decade. Although they have free and openly availability advantages, only a small proportion is related to situation awareness, and reliability or trustworthiness is a challenge. A credibility framework is proposed for Twitter data in the context of disaster situation awareness. The framework is derived from crowdsourcing, which states that errors propagated in volunteered information decrease as the number of contributors increases. In the proposed framework, credibility is hierarchically assessed on two tweet levels. The framework was tested using Hurricane Harvey Twitter data, in which situation awareness related tweets were extracted using a set of predefined keywords including power, shelter, damage, casualty, and flood. For each tweet, text messages and associated URLs were integrated to enhance the information completeness. Events were identified by aggregating tweets based on their topics and spatiotemporal characteristics. Credibility for events was calculated and analyzed against the spatial, temporal, and social impacting scales. This framework has the potential to calculate the evolving credibility in real time, providing users insight on the most important and trustworthy events."
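To make the two-level scoring concrete, here is a heavily simplified sketch of the idea: score each tweet from its text and URL, then accumulate scores per event, weighting by retweets. The keyword set matches the paper's examples, but the weights and scoring rules are illustrative stand-ins for the authors' actual method.

```python
from collections import defaultdict

def tweet_credibility(tweet):
    """Toy tweet-level score: count situation-awareness keywords in the
    text and add a bonus if the tweet carries a URL. (Illustrative only.)"""
    keywords = {"power", "shelter", "damage", "casualty", "flood"}
    words = set(tweet["text"].lower().split())
    score = len(words & keywords)
    if tweet.get("url"):
        score += 1
    return score

def event_credibility(tweets):
    """Accumulate tweet scores per event, weighting each tweet by how
    often it was retweeted."""
    totals = defaultdict(float)
    for t in tweets:
        totals[t["event_id"]] += tweet_credibility(t) * (1 + t.get("retweets", 0))
    return dict(totals)

# Invented example tweets
tweets = [
    {"event_id": "e1", "text": "flood damage on main st",
     "url": "http://example.com", "retweets": 2},
    {"event_id": "e1", "text": "need shelter now", "retweets": 0},
    {"event_id": "e2", "text": "nice sunset", "retweets": 5},
]
scores = event_credibility(tweets)
```

Note how the off-topic event ("e2") scores zero no matter how widely it is retweeted, while the event with keyword-bearing, retweeted tweets accumulates credibility.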
Police Scanner Audio Data
Finally, we followed the workflow created by DSI-ATL-8. We used the Broadcastify Archive Toolkit for Python (broadcastify-archtk, formerly BArT) to retrieve archived mp3 police radio files. Since we were unable to collect any usable data in our iteration, we will illustrate the proof of concept and discuss potential future iterations of how this could be used to update a map in real time.
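Once audio has been transcribed, the location-extraction step amounts to matching the transcript against a local gazetteer of place names. A toy sketch of that step (the street names, coordinates, and transcript are invented; a real gazetteer would come from city GIS data):

```python
import re

# Toy gazetteer mapping street names to coordinates (invented values)
GAZETTEER = {
    "main street": (41.7670, -72.6730),
    "park terrace": (41.7565, -72.6912),
}

def extract_locations(transcript: str):
    """Find gazetteer entries mentioned in a transcribed radio call."""
    text = transcript.lower()
    hits = []
    for name, coords in GAZETTEER.items():
        if re.search(r"\b" + re.escape(name) + r"\b", text):
            hits.append((name, coords))
    return hits

transcript = "units respond to a structure fire on Main Street near the bridge"
locations = extract_locations(transcript)
```

This is exactly where our iteration failed: with noisy audio, the transcription step upstream of this matching produced no usable place-name mentions at all.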
What We Built
We created a Tableau map that is hosted online here. Users can filter the results by severity, the type of response needed, and the time elapsed since the initial call, post, or radio request was made. Unfortunately, we were only able to plot the data from the 911 call dataset, because the Twitter dataset is too large for one machine to handle.
For this tool to be practical and informative as designed, it relies on a constant influx of data through sophisticated collection and transformation processes. The 911 call dataset we located was already cleaned and organized, and contained the information needed to plot on the map. In a real-life disaster scenario, 911 call data would need to be processed and stored in a similar manner on a reliable server that could update the map. Considering the scarcity of 911 call data available on the internet, we are unsure what protocol would need to be established to create a seamless flow of useful data. This could be a fatally limiting factor in the practicality of this tool in a real-life disaster.
The process through which useful Twitter data was collected illustrates how difficult it is to filter out noise. Of the ~7 million tweets analyzed during Hurricane Harvey, approximately 7,000 contained language indicating the poster needed help along with either a geo-tag or location data in the text. Additionally, "when detecting events using Twitter data, spatial and social biases from unrepresentative data are non-negligible. Users that contribute data are usually younger, wealthier, and better educated, which do not represent the general population" (Yang, J.; Yu, M.; Qin, H.; Lu, M.; Yang, C. A Twitter Data Credibility Framework—Hurricane Harvey as a Use Case. ISPRS Int. J. Geo-Inf. 2019, 8, 111).
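A drastically simplified version of that kind of filter, keeping only tweets that both ask for help and carry location information, might look like this. The keyword patterns and tweet schema are our own stand-ins, not the study's actual rules.

```python
import re

# Toy help-request vocabulary (illustrative, not the paper's keyword set)
HELP_PATTERNS = re.compile(r"\b(help|rescue|stranded|trapped|sos)\b", re.I)

def needs_help(tweet):
    """Flag a tweet as actionable if it asks for help AND carries
    location information -- either a geo-tag or a street-address-like
    mention in the text."""
    if not HELP_PATTERNS.search(tweet["text"]):
        return False
    has_geotag = tweet.get("coordinates") is not None
    has_address = bool(re.search(r"\b\d+\s+\w+\s+(st|street|ave|rd|dr)\b",
                                 tweet["text"], re.I))
    return has_geotag or has_address

# Invented example tweets
tweets = [
    {"text": "Trapped on roof, 4500 Cedar St, please send rescue",
     "coordinates": None},
    {"text": "So much rain today", "coordinates": (29.76, -95.37)},
]
flagged = [t for t in tweets if needs_help(t)]
```

Even this crude filter shows why the yield is so low: a tweet needs both an explicit request and usable location data before it can be plotted.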
Finally, as evidenced by the low-quality output of the original police scanner audio → map data pipeline, collecting usable police scanner audio in real time presents a significant challenge. The website we scraped audio data from, Broadcastify, does not provide API access to its real-time police scanner audio streams, and the audio feeds are delayed by up to two hours.
By combining a multitude of communication sources, the map we created has the potential to be highly robust and practical, provided the necessary data is collected, stored, and processed accurately and efficiently. The unique aspect of this tool is that it could easily be made available to any type of emergency responder, as well as civilian good Samaritans who are willing and capable of helping their neighbors during a disaster event like Hurricane Harvey. This is especially important because it would reduce response times by providing a visual picture of where, when, and what type of help is needed most.
The complexity of this tool necessitates extensive user research and testing to determine the best use case(s). We made many assumptions about how it would be most useful and practical, but without significant background knowledge of emergency and disaster relief operations, it is difficult to know which data are most important to collect and how they would best be used in a disaster scenario.
We want to know how 911 call data is collected and stored, and how to establish a protocol during an emergency so that this data flows seamlessly through our pipeline. Is it feasible to create a new platform, or modify an existing one, to make this practical?
We would like to create, or utilize existing, software capable of scraping relevant Twitter data in real time and directing it through the pipeline we created.
Also of note: the Incident Page Network. "Every minute of every day, IPN dispatchers listen to public safety transmissions and send incidents as they are happening from around world with extensive coverage in the US and Canada." A service like this has strong potential to integrate with our tool and provide significant data input throughout the course of a disaster. If implemented correctly, it could remove many steps from the current data pipeline, because actual listeners transcribe the audio themselves and enter the data manually. It would be interesting to explore a partnership between FEMA and the users of this service to establish a protocol for listening for, and entering data on, location, severity, and type of emergency.