Inaccurate residential_units counts for at least three census tracts #596

Open
pfjel7 opened this issue Oct 3, 2017 · 2 comments

@pfjel7
Collaborator

pfjel7 commented Oct 3, 2017

The calculations of several rates in the zone_facts table (for building and construction permits already, and eventually also for the percentage of units that are subsidized per zone: see #564 and #574) all depend on the accurate determination of the total number of residential units in each zone.

Unfortunately, our current method for calculating those totals is unreliable. The most glaring example is census tract 68.04. According to our census data, the tract has a total population of nearly 3,000, yet our current count gives it only 13 residential units: an implausible average of about 227 people per unit. Tracts 2.01 and 62.02 have similarly implausible ratios of people to residential units: 3,685 people to 168 units (about 22 per unit) and 117 people to 6 units (about 20 per unit).
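To make the arithmetic above easy to repeat for other tracts, here is a minimal sketch that computes people per unit and flags ratios above a cutoff. The cutoff of 10 people per unit is an illustrative assumption, not a project standard, and the population for tract 68.04 is back-computed from the figures quoted above (13 units at about 227 people per unit), since only "nearly 3,000" is stated.

```python
# (population, residential_units) per tract, from the figures quoted above.
TRACTS = {
    "68.04": (2951, 13),   # population back-computed: 13 units x 227 people per unit
    "2.01": (3685, 168),
    "62.02": (117, 6),
}

# Hypothetical cutoff; any ratio above it is treated as implausible.
MAX_PEOPLE_PER_UNIT = 10.0


def people_per_unit(population, units):
    """Return the average number of people per residential unit."""
    if units <= 0:
        return float("inf")  # no units recorded at all
    return population / units


def flag_implausible(tracts, threshold=MAX_PEOPLE_PER_UNIT):
    """Map each tract whose ratio exceeds the threshold to its rounded ratio."""
    flagged = {}
    for tract, (population, units) in tracts.items():
        ratio = people_per_unit(population, units)
        if ratio > threshold:
            flagged[tract] = round(ratio, 1)
    return flagged


print(flag_implausible(TRACTS))
```

All three tracts exceed the assumed cutoff; a different cutoff would of course change which tracts are flagged.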

Our current method for calculating the number of residential units per zone is to sum the values of active_res_occupancy_count from the mar table across the properties in each zone (see commit 7435937 and pull request #556). This method replaced the one previously used in cama.py. (For initial guidance on developing that method, see issue #493. Please note: although Neal suggested in opening the issue that we sum the values of active_res_unit_count, that column turned out not to have values for properties with single- or owner-occupancy; hence our use of the occupancy column instead.) While this new method provides credible totals for most of the city, the three under-counted tracts suggest either that the mar table is incomplete or that some property addresses are assigned to the wrong zones.
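For readers who have not looked at the commit, the summing step described above can be sketched roughly as follows. This is not the project's actual code: the rows are made-up sample data, and the zone column name `census_tract` is an assumption; only `active_res_occupancy_count` comes from the discussion above.

```python
from collections import defaultdict


def residential_units_by_zone(mar_rows, zone_column="census_tract"):
    """Sum active_res_occupancy_count over the MAR rows of each zone.

    zone_column is a hypothetical column name used for illustration.
    Rows with no zone are skipped; missing counts are treated as 0,
    which is exactly how gaps in the table lead to under-counted zones.
    """
    totals = defaultdict(int)
    for row in mar_rows:
        zone = row.get(zone_column)
        if zone is None:
            continue
        count = row.get("active_res_occupancy_count") or 0
        totals[zone] += int(count)
    return dict(totals)


# Made-up sample rows, not real MAR records.
sample_rows = [
    {"census_tract": "A", "active_res_occupancy_count": 4},
    {"census_tract": "A", "active_res_occupancy_count": None},
    {"census_tract": "B", "active_res_occupancy_count": 12},
    {"census_tract": None, "active_res_occupancy_count": 7},
]

print(residential_units_by_zone(sample_rows))
```

Note that both failure modes named above show up here: a missing count silently contributes 0, and a row with a missing or wrong zone is credited to no zone or to the wrong one.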

To fix this, we need to do one of three things: find the missing data, correct the mistaken zone assignments, or estimate the implausible values by modeling them from other data (such as population counts, aggregate income, number of transit stops, etc.).

To help those of you who know more about the city and the available data than I do, I provide here an image that illustrates the degree and location of the greatest disparities.

[Image: resdensity enlarged font vals]

@pfjel7
Collaborator Author

pfjel7 commented Oct 3, 2017

The graph below provides preliminary diagnostics for identifying additional census tracts with possibly inaccurate residential unit counts. Specifically, it shows the mostly linear relationship between residential units and population counts for each census tract. A number of tracts, however, break that pattern, including the three discussed above. Individual tract IDs are revealed by hovering over the data points in this linked copy of the graph.
[Image: residential units counts by tract populations]
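The same diagnostic can be done numerically instead of by eye. The sketch below is one possible approach, not the method used to produce the graph: it fits an ordinary least-squares line of units against population and flags tracts whose residual is large relative to the overall spread. The tract IDs and values are made-up sample data, and the threshold of 2.0 is an assumption.

```python
from math import sqrt


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def flag_outlier_tracts(tracts, z_threshold=2.0):
    """Return tract IDs whose residual exceeds z_threshold residual spreads.

    tracts maps tract ID -> (population, residential_units).
    """
    ids = list(tracts)
    pops = [tracts[t][0] for t in ids]
    units = [tracts[t][1] for t in ids]
    slope, intercept = fit_line(pops, units)
    residuals = [u - (slope * p + intercept) for p, u in zip(pops, units)]
    spread = sqrt(sum(r * r for r in residuals) / len(residuals))
    if spread == 0:
        return []  # every tract lies exactly on the line
    return [t for t, r in zip(ids, residuals) if abs(r) / spread > z_threshold]


# Made-up sample: five tracts on a clean line plus one under-counted tract.
SAMPLE = {
    "A": (1000, 500),
    "B": (2000, 1000),
    "C": (3000, 1500),
    "D": (4000, 2000),
    "E": (5000, 2500),
    "X": (3000, 13),
}

print(flag_outlier_tracts(SAMPLE))
```

One caveat with this choice: a gross outlier pulls the fitted line toward itself, so with real data a robust fit (or refitting after dropping flagged tracts) may separate the suspicious tracts more cleanly.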

@NealHumphrey
Collaborator

@pfjel7 this is a great analysis of this issue. I think we'll want to look at the records of the MAR table itself for each of those three zones to see what's going on. My guess is that the MAR has missing data in the res unit count field. Once we look at that, and if this indeed seems to be the case, we can reach out to the DC gov contact that maintains the MAR and see what they say.
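A first pass at that check could look something like the sketch below, which counts how many MAR rows in each tract of interest have an empty count field. The rows are made-up sample data and the `census_tract` column name is an assumption; `active_res_occupancy_count` is the column currently being summed, and the same function could be pointed at `active_res_unit_count`.

```python
def missing_count_summary(mar_rows, tracts,
                          zone_column="census_tract",
                          count_column="active_res_occupancy_count"):
    """Count total rows and rows with an empty count for each listed tract.

    zone_column is a hypothetical column name used for illustration.
    """
    summary = {t: {"rows": 0, "missing": 0} for t in tracts}
    for row in mar_rows:
        tract = row.get(zone_column)
        if tract not in summary:
            continue
        summary[tract]["rows"] += 1
        if row.get(count_column) in (None, ""):
            summary[tract]["missing"] += 1
    return summary


# Made-up sample rows, not real MAR records.
sample_mar = [
    {"census_tract": "68.04", "active_res_occupancy_count": 1},
    {"census_tract": "68.04", "active_res_occupancy_count": None},
    {"census_tract": "68.04", "active_res_occupancy_count": ""},
    {"census_tract": "2.01", "active_res_occupancy_count": 3},
    {"census_tract": "1.00", "active_res_occupancy_count": None},
]

print(missing_count_summary(sample_mar, ["68.04", "2.01", "62.02"]))
```

A tract with many rows but mostly empty counts would point to missing data in the MAR; a tract with very few rows at all would point instead to addresses being assigned to the wrong zone.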
