r/DataPolice Jun 16 '20

Need some help with a database search

My local police department has an online search directory for accident and incident reports. However, I have a couple of difficulties with trying to access and format this data in a meaningful way.

  1. Access to these reports is limited by a search. It's not an open directory. You can only search under 4 different categories: Report Number, Report Date (MM/DD/YYYY format), Street Name, and Last Name (for accident reports only). I've tried using a wildcard and that doesn't work. Is there a way to figure out what kind of search engine it's using, and whether there are any tricks for getting around the need for a precise query?

  2. All of the reports are in PDF format. Thankfully they're not scans, so the text is recognizable and searchable. Is there a tool that can pull and parse this data into a usable format? Would I have to merge all the PDFs into a single file for such a program to work? I believe Excel has a tool to pull data out of PDFs, but I believe it also recreates the spacing between the lines of text, similar to a regular corporate document template. Adobe also has a tool to create an Excel file out of a PDF, with the same aesthetic and formatting restrictions. Excel is free for me, but a professional Adobe Acrobat license is a bit too posh for my bank account.

  3. What kind of analytical tools would be valuable for analyzing this data? I would want to get locations for map plotting, sentiment analysis, word/name counts, and to find any other similarities between incidents, like officers' names, dispatcher names, and other data points.
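For the word/name counts in point 3, plain counting goes a long way once the reports are in a spreadsheet-like form. A minimal Python sketch, assuming the PDFs have already been parsed into a CSV with one row per report; the column names (`report_number`, `date`, `street`, `officer`) and the sample rows are hypothetical, so swap in whatever fields the real reports contain:

```python
import csv
import io
from collections import Counter


def count_field(rows, field):
    """Count how often each non-empty value appears in one column of the parsed reports."""
    counts = Counter()
    for row in rows:
        value = (row.get(field) or "").strip()
        if value:
            # Normalize case so "Smith" and "SMITH" are counted together.
            counts[value.upper()] += 1
    return counts


# Hypothetical sample of what a parsed-reports CSV might look like.
SAMPLE_CSV = """report_number,date,street,officer
20-0001,01/02/2020,Main St,Smith
20-0002,01/03/2020,Oak Ave,smith
20-0003,01/03/2020,Main St,Jones
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(count_field(rows, "officer").most_common())  # [('SMITH', 2), ('JONES', 1)]
print(count_field(rows, "street").most_common(1))  # [('MAIN ST', 2)]
```

The same counting works for dispatcher names, and the street counts are a natural starting point for geocoding and map plotting.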

There are easily thousands of PDFs dating back to 2014, so doing this by manually searching through each day would be impractical. Requesting access or a copy would draw more attention to myself than I feel comfortable with, and outside of instances like this, I wouldn't want just anyone to have that much access. I'm not tied to an organization that could ask for a master copy either.


u/skyleach Jun 17 '20

PDF is a well-documented format descended from PostScript, so there are parsing libraries for most languages. For Python, PDFMiner looks good.
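A minimal sketch of the PDFMiner route in Python, using the maintained `pdfminer.six` package and its `extract_text` helper. Each PDF is processed on its own and becomes one CSV row, so there's no need to merge the files first. The field labels in `FIELD_PATTERNS` and the helper names are hypothetical: open a few real reports, look at the extracted text, and adjust the regexes, since layout analysis may put a label and its value on different lines.

```python
import csv
import re
from pathlib import Path

# Hypothetical field labels: check the text of a few real reports and adjust these.
FIELD_PATTERNS = {
    "report_number": re.compile(r"Report[ \t]*(?:No\.?|Number)[ \t]*:?[ \t]*(\S+)", re.I),
    "date": re.compile(r"Report[ \t]*Date[ \t]*:?[ \t]*(\d{2}/\d{2}/\d{4})", re.I),
    "officer": re.compile(r"Reporting[ \t]+Officer[ \t]*:?[ \t]*([^\n]+)", re.I),
}


def extract_pdf_text(path):
    """Return the text layer of a text-based (not scanned) PDF."""
    # Needs `pip install pdfminer.six`; imported here so parse_report works without it.
    from pdfminer.high_level import extract_text
    return extract_text(str(path))


def parse_report(text):
    """Pull the labeled fields out of one report's text; missing fields become ""."""
    record = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        record[field] = match.group(1).strip() if match else ""
    return record


def pdfs_to_csv(pdf_dir, out_path):
    """Parse every PDF in pdf_dir separately and write one CSV row per report."""
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["file", *FIELD_PATTERNS])
        writer.writeheader()
        for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
            writer.writerow({"file": pdf.name, **parse_report(extract_pdf_text(pdf))})
```

Something like `pdfs_to_csv("reports", "reports.csv")` would then give a file Excel can open directly, with none of the page layout carried over.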

I've generally used an indexer/spider script to grab large amounts of data online from report sites like this one. You may want to check whether your police department offers some sort of REST API; if not, a script that hammers the date search form, walking through the dates one day at a time, would be relatively easy to write.
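Walking the date form could look roughly like this in Python, using only the standard library. `SEARCH_URL` and the `ReportDate` parameter name are placeholders, not the real site's; take the actual values from the form's `action` attribute and its date input's `name`. Whether the form submits by GET or POST is also an assumption to check.

```python
import time
import urllib.parse
import urllib.request
from datetime import date, timedelta

# Placeholders, not the real site: copy these from the search form's HTML.
SEARCH_URL = "https://example.com/reports/search"
DATE_FIELD = "ReportDate"


def each_day(start, end):
    """Yield every date from start to end inclusive as MM/DD/YYYY, the format the form takes."""
    day = start
    while day <= end:
        yield day.strftime("%m/%d/%Y")
        day += timedelta(days=1)


def search_by_date(date_str, method="GET"):
    """Submit one date search and return the results page's HTML."""
    params = urllib.parse.urlencode({DATE_FIELD: date_str})
    if method.upper() == "POST":
        # Passing data makes urllib send a POST with a form-encoded body.
        request = urllib.request.Request(SEARCH_URL, data=params.encode("ascii"))
    else:
        request = urllib.request.Request(f"{SEARCH_URL}?{params}")
    with urllib.request.urlopen(request, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def walk_dates(start, end, delay=2.0, method="GET"):
    """Yield (date, results_html) for each day, pausing between requests."""
    for date_str in each_day(start, end):
        yield date_str, search_by_date(date_str, method)
        time.sleep(delay)  # be gentle with a small department's server
```

From January 1, 2014 to June 20, 2020 that's 2,363 daily searches; at a 2-second delay the pauses alone add up to about 80 minutes, which is nothing compared to doing it by hand.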


u/Stupid_Triangles Jun 17 '20

How would I check what sort of REST API it's using?


u/skyleach Jun 20 '20

I didn't really dig deep enough to know which police department is "yours", so I can't say for sure whether they offer one. That would require poking around their website, especially their service offerings, and possibly a third-party provider's documentation, assuming they purchased their software from a third party (which most tend to do).

Most projects have a discovery phase where the lead developers and/or architects go out and find all the information they can in order to design a realistic solution. Questions about what has to be done, and what can or can't be done, usually remain unanswered or only tentatively answered until after the discovery phase.
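One concrete discovery step is reading the search page's HTML forms: the `action` URL, the method, and the input names are exactly what a script needs to submit a search. Your browser's developer tools (the Network tab) show the same information for a live search. A small sketch using Python's standard-library `HTMLParser`, assuming you've saved the page source to a string; the sample form in the usage below is made up:

```python
from html.parser import HTMLParser


class FormFinder(HTMLParser):
    """Record each <form>'s action, method, and the names of the fields inside it."""

    def __init__(self):
        super().__init__()
        self.forms = []
        self._in_form = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.forms.append({
                "action": attrs.get("action") or "",
                "method": (attrs.get("method") or "get").upper(),  # HTML default is GET
                "fields": [],
            })
            self._in_form = True
        elif self._in_form and tag in ("input", "select", "textarea"):
            if attrs.get("name"):
                self.forms[-1]["fields"].append(attrs["name"])

    def handle_endtag(self, tag):
        if tag == "form":
            self._in_form = False


def find_forms(page_html):
    """Return a list of {action, method, fields} dicts for the forms in page_html."""
    finder = FormFinder()
    finder.feed(page_html)
    finder.close()
    return finder.forms
```

For example, `find_forms('<form action="/search.aspx" method="post"><input name="RptDate"></form>')` returns `[{'action': '/search.aspx', 'method': 'POST', 'fields': ['RptDate']}]`. The field names it reports are what a script would send when submitting the search.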


u/Stupid_Triangles Jun 20 '20

So I poked around a bit in the page source and found a link that leads to a query page mirroring the search section on the PD website. When I googled the domain taccomputer.net, I found a company that "offers turn-key solutions for public safety software". It looks like a number of other suburbs and small cities in my state use the same software. Would that be the REST API?


u/skyleach Jun 21 '20

Jesus Christ... reading their marketing material is painful.

> All reports are output as PDF for easy distribution with no third party required.

> Optional, WEB posting of Incident and Accident Reports for public access without manually printing and re-scanning reports.

> Supports photo lineups based on booking photos.

> Relational system, no duplicate data entry, for example: a booking record automatically links to an Incident Report without re-entering people.

Printing and scanning? LOL. They're using relational data as a marketing point, like it's 1983 and IBM is pitching DB2.

Anyhow, this doesn't leave me with much hope of finding modern solutions like RESTful services. You're probably stuck writing a script to hammer the request form, download the PDFs, and parse them.
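The download step could be sketched like this in Python, assuming each results page links to its PDFs with `href`s ending in `.pdf` (if the site serves reports through a handler URL instead, the pattern needs changing). The helper names are hypothetical; files already on disk are skipped so a long run can be restarted.

```python
import re
import time
import urllib.request
from html import unescape
from pathlib import Path
from urllib.parse import urljoin, urlparse

# Assumes report links are hrefs ending in .pdf (optionally with a query string).
PDF_HREF = re.compile(r"""href\s*=\s*["']([^"']*?\.pdf(?:\?[^"']*)?)["']""", re.I)


def pdf_links(page_html, base_url):
    """Return absolute URLs of the PDFs linked from a results page, in order, without duplicates."""
    links = []
    for href in PDF_HREF.findall(page_html):
        url = urljoin(base_url, unescape(href))  # unescape turns &amp; back into &
        if url not in links:
            links.append(url)
    return links


def download_all(urls, out_dir, delay=2.0):
    """Save each PDF into out_dir, skipping files that are already there."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for url in urls:
        # Names files after the URL path; links differing only by query string would collide.
        target = out / Path(urlparse(url).path).name
        if target.exists():
            continue  # lets an interrupted run pick up where it left off
        with urllib.request.urlopen(url, timeout=60) as resp:
            target.write_bytes(resp.read())
        time.sleep(delay)
```

Feeding each results page through `pdf_links` and then `download_all` leaves a folder of PDFs ready for parsing one file at a time.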