Data centers contain 90% crap data

We need to talk about the data. Crap data. We’re destroying our environment to create and store trillions of blurred images, half-baked videos, rip-off AI ‘songs’ and animations, emails with mega attachments, never-to-be-watched-again presentations, never-to-be-read-again reports, files and drawings from cancelled projects, drafts of drafts of drafts, out-of-date, inaccurate and plain wrong information, and gigabytes upon gigabytes of poorly written, meandering content.

We’re destroying our environment to store copies of copies of copies of stuff we have no intention of ever looking at again. We’re destroying our environment to take 1.9 trillion photos every year. That’s more photos taken in a single year in the 2020s than were taken in the entire 20th century. That’s more than 200 photos taken for every child, woman and man alive. Every year. 12 trillion photos and growing, stored in the Cloud, the vast majority of which will never be viewed again. Mind-boggling, and exactly how Big Tech wants it.
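A quick sanity check on that per-person figure, assuming a world population of roughly eight billion:

$$\frac{1.9 \times 10^{12}\ \text{photos}}{8 \times 10^{9}\ \text{people}} \approx 237\ \text{photos per person, per year}$$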

I have spent almost 30 years working with hundreds of the largest organizations in the world, in some 40 countries, trying to help them better manage their content and data. Here’s what I’ve learned: more than 90% of commercial and government data is crap. Total, absolute crap. Period. It should never have been created. It certainly should never have been stored. The rise of digital saw an explosion of crap data production. Content management systems were like giving staff diesel-fueled diggers, whereas before they only had data shovels.

I remember, around 2010, being in conversation with a Microsoft manager who estimated that there were then about 14 million pages on Microsoft.com, and that four million of them had never been visited. Four million, I thought. That’s roughly the population of the Republic of Ireland, in pages that nobody had ever visited. Why were they created? All the time and effort and energy and waste that went into pages nobody would ever read. We are destroying our environment to create and store crap. And nobody cares.

Everywhere I went, it was the same old story. Crap data everywhere. Distributed publishing allowed basically anyone to publish anything they wanted on the intranet, and nobody maintained anything. When Kyndryl, the world’s largest provider of IT infrastructure services, was spun off by its parent, IBM, it found its data scattered across 100 disparate data warehouses. Multiple teams had multiple copies of the same data. After cleanup, 90% of the data had been deleted. There are 10 million stories like this.

Scottish Enterprise had 753 pages on its website, with 47 of those pages getting 80% of visits. A large organization I worked for had 100 million visits a year to its website, with 5% of pages getting 80% of visits; 100,000 of its pages had not been reviewed in 10 years. “A huge percentage of the data that gets processed is less than 24 hours old,” computer engineer Jordan Tigani explained. “By the time data gets to be a week old, it is probably 20 times less likely to be queried than from the most recent day. After a month, data mostly just sits there.” Southampton University found that 0.2% of the pages on its public website got 90% of visits, and that only 4% of its pages were ever visited. So, 96% of its roughly four million pages were never visited. One organization I knew of had 1,500 terabytes of data, less than 2% of which had ever been accessed after it was first stored. There are 20 million more stories like these.

Most organizations have no clue what content they have. It’s worse than that: most organizations don’t even know where all their data is stored. Worse again: most organizations don’t even know how many computers they have. At least 50% of the data in a typical organization sits on some server somewhere, and nobody in management knows whether it even exists; nor do they care. The average organization has hundreds of unsanctioned third-party app subscriptions, paid for on some manager’s credit card, storing everything from project chats to draft reports to product prototypes.

The Cloud made the crap data problem infinitely worse. The Cloud is what happens when the cost of storing data is less than the cost of figuring out what to do with the crap. One study found that the data stored by UK engineering and construction firms had risen from an average of three terabytes in 2018 to 26 terabytes in 2023. That’s a compound annual growth rate of roughly 50%! That sort of crap data explosion happened, and is happening, everywhere. And nobody in management cares, because it’s so ‘cheap’ to store data.

And this is what AI is being trained on. And we wonder why AI gets stuff wrong so often? Crap data in. Crap data out. And nobody cares. Particularly at senior management level, nobody cares. Senior management is full to overflowing with Big Tech groupies chanting about the latest tech miracle that’s going to magically transform and supercharge their careers. Dealing with senior managers has always been the most unsavory part of my job, because when it comes to technology, these managers exist on a whole other level of stupid vanity and narcissistic pursuit of their own selfish agendas.
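A quick check of that growth rate, treating 2018 to 2023 as five years of compounding:

$$\text{CAGR} = \left(\frac{26\ \text{TB}}{3\ \text{TB}}\right)^{1/5} - 1 \approx 0.54$$

Call it roughly 50% a year: the average firm’s stored data was growing by half again every single year.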
