I am not sure how Poweroutage.us collects its outage data; I suspect it harvests the online outage reports that utilities publish to advise customers. Our local utility, National Grid, has outage maps with rough values for the number of customers affected, etc.
Data collected by utilities is very granular, collected at the individual-customer level with smart meters and/or at the distribution-feeder level. It certainly includes restoration times, even for individual customers. For investor-owned utilities this is reported to state regulatory commissions and is frequently tied to financial incentives or penalties. Co-ops (like Cobb County) and public systems (e.g., individual TVA-served cities) all collect the same data. Even for very large multi-state systems (Duke, Exelon, etc.), data is reported to state commissions at the operating-company level (i.e., pre-merger), most often within states or even regions within a state.
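Those customer-level records are what the standard system reliability indices are computed from. As a rough sketch (the function name and record format here are illustrative, not from any particular utility's system), the IEEE 1366 indices reduce to simple sums over outage records:

```python
def reliability_indices(outages, customers_served):
    """Compute IEEE 1366 system indices from per-outage records.

    outages: iterable of (customers_interrupted, duration_minutes) tuples,
             one per sustained interruption.
    customers_served: total customers on the system.
    """
    total_ci = sum(c for c, _ in outages)        # total customer interruptions
    total_cmi = sum(c * m for c, m in outages)   # total customer-minutes interrupted
    saifi = total_ci / customers_served          # avg interruptions per customer
    saidi = total_cmi / customers_served         # avg minutes out per customer
    caidi = saidi / saifi if saifi else 0.0     # avg restoration time per interruption
    return saifi, saidi, caidi

# e.g., two outages (100 customers for 60 min, 50 customers for 120 min)
# on a 1,000-customer system:
# reliability_indices([(100, 60), (50, 120)], 1000)
```

SAIDI and SAIFI are exactly the numbers that show up in the commission filings mentioned above; CAIDI is just their ratio.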
It is very doubtful that utilities would be sharing this detailed information with Poweroutage.us. County-level data, I suspect, is used for GIS aggregation; it is not uncommon for a county to be served by two or more utilities (e.g., LA County, where some cities are served by a public entity and the suburbs by an IOU).
Major events, left unscreened, radically skew the data, as Brian reported. IEEE has a standardized "beta method" to screen outliers based on the system's daily average of customer-minutes. It is an excellent method and has been used for a couple of decades by utilities and state commissions. The weakness is that the beta method does not screen for the number of interruptions (regardless of the number of customers) or for the number of customers affected. For example, thunderstorms can cause many momentary/short-duration outages but an insignificant amount of customer duration. Momentaries matter, especially for electronics and appliances.
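For the curious, the screen described above is the "2.5 beta" Major Event Day method from IEEE 1366: fit a log-normal distribution to the system's daily SAIDI history, then flag any day exceeding e^(alpha + 2.5*beta). A minimal sketch (function names are mine; the standard prescribes five years of history and excludes zero-SAIDI days):

```python
import math

def major_event_threshold(daily_saidi):
    """IEEE 1366 '2.5 beta' threshold: alpha and beta are the mean and
    sample standard deviation of ln(daily SAIDI), zero days excluded."""
    logs = [math.log(x) for x in daily_saidi if x > 0]
    n = len(logs)
    alpha = sum(logs) / n
    beta = math.sqrt(sum((v - alpha) ** 2 for v in logs) / (n - 1))
    return math.exp(alpha + 2.5 * beta)  # T_MED, in daily-SAIDI units

def major_event_days(daily_saidi):
    """Days whose SAIDI exceeds T_MED are Major Event Days (MEDs)."""
    t_med = major_event_threshold(daily_saidi)
    return [d for d in daily_saidi if d > t_med]
```

Note that the screen operates on customer-minutes only: a storm causing many brief interruptions with little total duration sails right under T_MED, which is exactly the weakness described above.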
IEEE conducts an annual benchmarking survey covering perhaps 50% of US distribution systems. It has more-or-less standardized data definitions and is a decent overview. Note the data caveats in the slides.
https://cmte.ieee.org/pes-drwg/wp-content/uploads/sites/61/2024-IEEE-Benchmarking-Survey.pdf
Another factor with tremendous impact on reliability is system design. High-density, older urban systems typically have an underground NETWORKED distribution grid. Places like NYC, Boston, and DC all operate this way, and they seldom have outages. Much of Europe's distribution is similar. Sunbelt cities, like those in Texas or Florida, evolved differently: most distribution circuits are on overhead poles and operate radially. Service to residences built after 1970 is usually underground, but the feeders are often overhead. The AGE of the system and the amount of tree cover further confound reliability assessment. Teasing out these factors is extremely difficult on a within-system basis and impossible for between-system (or state, county) benchmarking comparisons. These differences are enormous.
Lastly, transmission outages are a significant factor in distribution reliability, typically accounting for 5-20% of distribution outages. Most often these are caused by 30-99 kV subtransmission or substation failures, and this data is often not collected systematically. Grid outages can also cause system-wide blackouts (think August 2003). If even half the hype about AI data-center demand pans out, I'd expect dire consequences for consumer/residential reliability and costs.
Hard to see how our AGI dreams, let alone widespread vehicle electrification, will come to fruition without reliable power.
As a Bay Area resident I appreciate the implicit confirmation that PG&E is unusually bad at its job.
Thanks for the great post!
I have been involved with similar work in developing countries where it can be very tricky and politically touchy to collect data like this, it is amazing that self-reported data from utilities can be this good!
Footnote 2 is interesting and makes me wonder what other reporting artifacts might be in the data. I'm especially curious whether there is evidence of state-level regulations affecting the data, like requiring utilities to report outage start times but not end times, or setting different minimum durations for an outage to be reportable. Basically, is Illinois really good at keeping the lights on, or just not very diligent about tracking when they go out?
good open-source, accurate, precise, and all-accounting electrical grid data (adopted by all major consumers) would do a lot of great things 🕊️
The Bay Area and Los Angeles were two places where reliability is worst in winter, and in these cases I think it's pretty clear that it's because those are the only months when rain is common, rather than there being particularly high demand for heating. (I was a bit surprised to see an August bump in the Bay Area, but perhaps that is related to shutoffs for fire prevention - September is usually the hottest month.)
Good piece as usual. Memories of my utility days (hurricanes are the bane of our existence in the islands). Fortunately the other islands and the southern US states usually send us their crews to help restore power in particularly bad storms, as we do for them.
Great data science paper! Five out of five stars. Nine out of ten. Thanks so much for sharing. I wonder if a data set exists to pull in utility costs to correlate with variation. I hope poor Maine and West Virginia at least pay less for their crappier grids.
Maine and West Virginia have a few mid-size cities; the rest is rural, sparsely populated, and hilly or mountainous, with LOTS of trees and serious weather. Distribution circuits are VERY long, and generally more exposure = more outages. The investment needed to reach top-quartile reliability would be astronomical.
Cost benchmarking is routinely done by consultants, state commissions and utility trade associations. Cost information is very tightly held.