File Space vs. Disk Space
(Your Web Site May Not Be As Big As You Think It Is...)
Our host was sending us love letters that told us we were exceeding our allocated space. We spent two weeks retiring pages and shrinking pictures, and reduced our total file size to 19 MB, yet according to our host we were still consuming 36 MB of disk space. This, we found out, is because Red Hat Linux, the server operating system our host uses, allocates disk space in 4.3 KB increments with a 4.3 KB minimum size.
Our average web page was 1 KB. This means that the server was using up 4 times more disk space than the actual file size to store these pages!
If your file size is over by just a hair, say 4,400 bytes, Red Hat has to allocate 8.6 KB of disk space to hold that file.
We did some fast pencil pushing and discovered that you get about 230 web pages of 4.3 KB or less per MB of disk space. Some of our files were only 400 bytes in size, but they still weigh in at 4,300+ bytes under the Red Hat disk space allocation formula. That's 10 times more disk space, or 1,000% inefficiency!
Our site had an overall average inefficiency of 180%. If you think in terms of man-hours, that means you, as an employee, would get to talk to your girlfriend on the telephone for 48 minutes after you do one hour of work for the company. Then you have to do another hour of work before you can call her again. Then you get to go to lunch.
If you were as efficient at your job as this operating system was at storing small files, you would work 4 hours and 25 minutes out of each 8 hour day and goof off for the remaining time.
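The pencil pushing above can be sketched in a few lines of Python. This is only an illustration: the 4,300-byte increment is the figure quoted for our host (other servers may use a different block size), and the function names are made up for this example.

```python
# Assumption: space is handed out in whole 4,300-byte blocks, the figure
# quoted in this article for our host. Other servers may differ.
BLOCK_SIZE = 4300


def allocated_bytes(file_size, block_size=BLOCK_SIZE):
    """Disk space one file consumes when space is allocated in whole blocks."""
    # Round up to a whole number of blocks using integer arithmetic.
    blocks = (file_size + block_size - 1) // block_size
    # Per the article, even the smallest file is charged one full block.
    return max(1, blocks) * block_size


def pages_per_megabyte(block_size=BLOCK_SIZE, megabyte=1_000_000):
    """How many one-block pages fit in a megabyte of disk space."""
    return megabyte // block_size


if __name__ == "__main__":
    for size in (400, 1000, 4400):
        print(size, "bytes on file ->", allocated_bytes(size), "bytes on disk")
    print("one-block pages per MB:", pages_per_megabyte())
```

With these assumptions, a 400 byte file and a 1,000 byte file are both charged 4,300 bytes, and a 4,400 byte file is charged 8,600 bytes.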
Go check your web page files right now and see how many of them are above 4 KB in size. Any HTML, SHTML, DHTML or PHTML pages you have that are smaller still weigh in at 4.3 KB on Red Hat Linux servers. If you have files above 4.3 KB and below 8.6 KB, they weigh in at 8.6 KB.
The bigger the file (such as databases, Shockwave Flash animations, MP3 music, MOV videos, JPEG images and ZIP files), the less space is wasted in storing it, but internet web sites are based on the HTML page, which is generally well under 4.3 KB in size. This means that if your site has more pictures than pages you are using disk space very efficiently, but if your site is largely text-based web pages you are only allowed about 4,600 total pages in a 20 MB site that is billed on disk usage.
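If you keep a copy of your site on a Linux or other Unix-like machine, a short Python script can do this check for you. Treat it as a rough sketch: the function name, the list of extensions and the 4,300-byte threshold are our own assumptions, and the disk figure relies on `st_blocks`, which Python reports in 512-byte units on Linux and does not provide on Windows. The number you get describes the machine you run it on, not necessarily your host's server.

```python
import os

# Page types named in the article; adjust to suit your own site.
PAGE_EXTENSIONS = (".html", ".shtml", ".dhtml", ".phtml")


def survey_pages(root, threshold=4300):
    """Count web pages under `root` and compare file size with disk usage."""
    pages = small = file_bytes = disk_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(PAGE_EXTENSIONS):
                continue
            info = os.stat(os.path.join(dirpath, name))
            pages += 1
            if info.st_size <= threshold:
                small += 1  # fits inside a single allocation unit
            file_bytes += info.st_size
            # st_blocks counts 512-byte units actually allocated (Unix only).
            disk_bytes += info.st_blocks * 512
    return {
        "pages": pages,
        "small": small,
        "file_bytes": file_bytes,
        "disk_bytes": disk_bytes,
    }


if __name__ == "__main__":
    print(survey_pages("."))
```

Run it from the top folder of your site copy and compare `file_bytes` with `disk_bytes` to see how much extra space your small pages take up.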
All the work we did was basically for nothing. Replacing large pages with smaller pages hurt us more than helped us! We would have to totally delete content from the last two years (which amounts to 400 KB in file size but takes up MBs of disk space). To accommodate our site through the next year on a Red Hat Linux server we would need over 100 MB of space just to hold 30 MB worth of files and then we would have to start deleting pages again!
Now, we can't find true fault with our host. In their ad it clearly says "disk space." There is no "truth in labeling" law currently on the books for web hosts (we can all change that by SPAMMING our regulators and elected officials asking for some type of disclosure) like there is for, say, air conditioners, which have to provide a BTU (British Thermal Unit) rating so you can see how efficiently the unit operates compared to other air conditioners. The host did nothing wrong. It's up to you and me, the consumers, to be smart, which is why we have magazines and newspapers!
Red Hat Linux is a major, major web site operating system. Why they chose to allocate small (web page sized) files into huge chunks of disk space is largely unknown. We also don't know how the competition (full-blown UNIX and NT) compares, but we're looking into that and will report back to you! What we do already know, however, is that NT can be partitioned with clusters as small as 256 bytes, but it seems that 2 KB clusters allow larger partitions (over 2 GB). We also learned that Unix reads disk space usage in increments of 1,024 bytes (1 KB), but we're still not totally clear on how all these facts and figures work out in the real world of commercial servers used by web hosts.
On our host the minimum is 4.3 KB, and this means that on files sized between 20 and 60 KB there is about a 10% loss of disk space. For anything above 100 KB (0.1 MB) the loss is negligible. For anything below 20 KB (the average web page or banner size) the loss varies from 35% to 200%. To us here at issues this means that 11 or 12 MB in files totally uses up 20 MB of disk space. That's like buying 20 gallons of gas while only being allowed to put 11 gallons into the car.
This method of allocating disk space in larger than necessary chunks potentially hurts every consumer of web hosting services, since they are getting less usable space than they thought they were buying! With the new 64 bit computers now a reality, it is possible that the file clusters can be made smaller, since these new machines can count to higher numbers directly. It is, however, up to companies like Microsoft, Red Hat and others who make the web server operating systems to remember that a web page is often under 2 or 3 KB in size.
Someone is going to lose with this inefficient method of dealing with the nominally small files found in HTML pages around the world. Just don't let it be you!
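To see why the loss shrinks as files get bigger, here is a small Python sketch that works out the wasted space for a single file. It assumes the same 4,300-byte increment quoted for our host, and `loss_percent` is a name we made up for the example. It gives the figure for one file at a time, so it will not reproduce the averages above, which came from the mix of files on our own site.

```python
# Assumption: 4,300-byte allocation increment, as quoted for our host.
BLOCK_SIZE = 4300


def loss_percent(file_size, block_size=BLOCK_SIZE):
    """Wasted disk space for one file, as a percentage of its real size."""
    if file_size <= 0:
        raise ValueError("file_size must be a positive number of bytes")
    # Round up to whole blocks; every file is charged at least one block.
    blocks = max(1, (file_size + block_size - 1) // block_size)
    allocated = blocks * block_size
    return 100.0 * (allocated - file_size) / file_size


if __name__ == "__main__":
    for size in (400, 2150, 20_000, 100_000):
        print(f"{size:>7} bytes: {loss_percent(size):.1f}% lost")
```

The waste on any one file is never more than one block, so the larger the file, the smaller that leftover is as a share of the whole.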