davidktw
Allow me to say this. What you have shown does not impress me at all. I'm not trying to be difficult or offensive.
Let's do a proper sizing:
100+ mails per minute; let's oversize this 10x to 1,000 mails per minute.
A typical email is around 75KB; let's oversize this 10x to 750KB per email.
Your email data rate is:
1,000 x 750KB = 750MB per minute, or 750/60 = 12.5MB per second
Ultrastar C10K900 - Sustained transfer: 117MB/sec to 198MB/sec
At the worst performance of this 10K SAS hard disk:
You are using only 12.5/117 = 10.6%
At the best performance of this 10K SAS hard disk:
You are using only 12.5/198 = 6.3%
I wonder why you would ever need H/W RAID when you are only using between 6.3% and 10.6% of the available data transfer rate? (And I have already oversized both your mails per minute and your mail size by a factor of 10.)
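The quoted sizing can be reproduced in a few lines of Python. This is a minimal sketch using the post's own figures and decimal units (1 MB = 1000 KB); the function names are hypothetical. Note that it measures against sustained, i.e. sequential, transfer rates, which is precisely the assumption the reply disputes.

```python
def data_rate_mb_per_s(mails_per_min, mail_kb):
    """Average mail data rate in MB/s, using decimal units (1 MB = 1000 KB) as the post does."""
    return mails_per_min * mail_kb / 1000 / 60


def utilisation(rate_mb_s, sustained_mb_s):
    """Fraction of a drive's sustained (sequential) transfer rate consumed by the mail stream."""
    return rate_mb_s / sustained_mb_s


# The post's oversized figures: 1,000 mails/min at 750 KB each,
# against the Ultrastar C10K900's quoted 117-198 MB/s sustained range.
rate = data_rate_mb_per_s(1000, 750)
for label, sustained in (("worst", 117), ("best", 198)):
    print(f"{label} case: {utilisation(rate, sustained):.2%} of sustained transfer rate")
```

Run as-is, it prints 10.68% for the worst case and 6.31% for the best case.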
How is 6.3-10.6% "eating your I/O like daily meal"?
I wonder why your corporate IP is worth protecting with performance numbers like this?
And why would you need a distributed design for such a low volume of emails per minute?
Or have you identified the real bottleneck?
But of course, you are better at this than me.
DISCLAIMER: Not for the faint of heart. It's a harsh email to "educate" a certain smart alec. Turn away or close your browser if you are not prepared for it. Thanks.
I can see what you are lacking... Sustained data transfer rate? I don't know what you have learnt about random access versus sequential access. Please go and read up more on the subject before you start calculating my I/O. Before you start acting like a smart alec, please show some credibility. If you want to show me that you know what I am talking about, at least try harder to understand what my architecture is in the first place. Don't make baseless guesses. You have no idea how I utilize the disk, how my design works, what kind of I/O pressure it inflicts on the disk, and whether such I/O is necessary in the first place; and if it is, what it protects against. Don't start measuring I/O requirements without knowing whether the I/O is sequential or random. It makes you look silly when you punch all those numbers into your calculator as if you know what you are calculating, when you are actually calculating the wrong thing in the first place.
As they often say, "It's not the right answer that makes sense, it's the right question that makes sense."
You are such a funny fella. It is so obvious what your mentality is. Indeed, we are not in the same league. Go use your software RAID if you are so confident. You wanna feel good about it? Just go ahead. Don't let me stop you. lolx.....
I actually don't want to say this, but as a reminder to the smart alecs out there: know who you are corresponding with before you start using words like "I assume you need a lesson...". There is a time for such words: when you know who your opponent is and what the difference in calibre or status between you is. When you know little about my job scope or my resume, you don't use such silly words, because you could be talking to a professor who is an expert in that very field, and you will just end up looking silly.
Just to shed some light on myself: I am a project manager turned senior consultant in an IT firm. I have led teams as a project manager on high-value overseas projects with a large telco, working with another cost centre in a different country. I don't expect you to find my resume significant, because a lot of project managers do the same. But when you use words suggesting you want to teach me something, at least make sure that what you are advocating is industry standard, well proven and well understood.
I don't know much about you, except that you mentioned you are a software engineer at an SME, and you could well be a talent too. I respect your knowledge that much, but I don't appreciate silly behaviour that tries to be smart while clearly showing how badly your assumptions and analytical techniques failed.
Don't just Google, pick the first article that says a typical email is 75KB, and start plugging that value in here to show me you have done your homework. At the very least, I demand that you parse all the emails currently going through your mail system with a scripting language (even your own collection of emails will do), add up their sizes, divide by the number of emails and use that as a real value. Mine is 69KB, if you are curious. It's smaller than your 75KB, but close. Different industries have different email sizes. 75KB is an average, but it doesn't represent the actual load I'm getting. Suppose I am dealing with a busy trading company that emails PDFs for all its invoices; the load can easily exceed 75KB.
Since you want to educate me, let me educate you in return about how the kernel does its I/O. Assume your lovely 75KB per email.
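A minimal sketch of the measurement described above, assuming a Maildir-style store where each message is one file under `cur/` or `new/` (mbox files, which hold many messages per file, would need parsing instead). The function name and the example path are hypothetical.

```python
import os


def average_mail_kb(maildir_root):
    """Return (message count, average message size in KB) for a Maildir tree.

    Assumes one message per file, stored in Maildir's cur/ and new/ folders;
    files in tmp/ (deliveries still in progress) are skipped.
    """
    total_bytes = 0
    count = 0
    for dirpath, _dirnames, filenames in os.walk(maildir_root):
        # Only count delivered messages, not in-flight ones in tmp/
        if os.path.basename(dirpath) not in ("cur", "new"):
            continue
        for name in filenames:
            total_bytes += os.path.getsize(os.path.join(dirpath, name))
            count += 1
    if count == 0:
        return 0, 0.0
    # Average = total size / number of messages, reported in KB (1 KB = 1024 bytes)
    return count, total_bytes / count / 1024
```

For example, `count, avg_kb = average_mail_kb(os.path.expanduser("~/Maildir"))` gives the real average for one mailbox; pointing it at the spool root covers the whole mail system.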
1) When you read 75KB of an email, the kernel doesn't issue disk I/O to fetch just 75KB. It usually fetches more, anticipating that the extra data will be needed later. This is a form of read-ahead optimization.
2) When the same file is written and then read again, it is often found in the file buffer cache. A cache hit is a good thing because it relieves read pressure on the disk. There is no good tool for measuring the hit rate. The best I have come across is SystemTap, where you need to probe the kernel to find your true cache hits and misses. It's hard, and I don't do it unless I am forced to for accountability.
3) When you have 100+ mails being read and written, with changes between the reads and writes, you get partial cache hits and misses, and this happens randomly, causing extremely heavy random access. We, the well-learnt IT engineers, don't use sustained transfer speed as the measurement. We use IOPS, calculated from an assumed average block size. Assume it is 64KB: even one of the best and fastest SAS drives, the Seagate Cheetah 15K.7, gives roughly 340+ IOPS at 64KB, which is roughly 21MB/s (340 x 64KB), not your 100-plus MB/s. On top of that, I already said I have multiple filters, so there is substantially more read and write pressure. For your information, the same email is read and written at least 5 times each in the whole pipeline before it leaves the system. This means that at 100+ mails per minute MINIMUM, we are talking about the equivalent of 500+ (MINIMUM). I must stress MINIMUM because you don't understand what MINIMUM is. My MAXIMUM can be more than 3 times that during peak hours, and during a thundering herd caused by pipeline issues. My approach to reliability is STORE and FORWARD, hence the required I/O pressure. Once you put in sensible values for disk caching and take the randomness of the I/O into account, then you can praise yourself for asking the right question and giving the ALMOST right answer.
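The three points above can be illustrated with a toy model. This is a sketch under stated assumptions, not how the Linux kernel actually works: it assumes a plain LRU cache of abstract pages and a fixed read-ahead window, whereas the real page cache uses active/inactive lists and Linux read-ahead adapts to the access pattern. The names `PageCacheModel`, `iops_to_mb_per_s` and `effective_mails_per_min` are hypothetical. The last two helpers just redo the post's arithmetic: 340 IOPS at 64KB, and 100 mails per minute scaled by 5 pipeline passes and a 3x peak factor.

```python
from collections import OrderedDict

# Page numbers are abstract units; assuming 4 KB pages, a 75KB email spans 19 of them.


class PageCacheModel:
    """Toy page cache: plain LRU eviction plus a fixed read-ahead window.

    Illustrative only; not the Linux page cache or its adaptive read-ahead.
    """

    def __init__(self, capacity_pages, readahead_pages):
        self.capacity = capacity_pages
        self.readahead = readahead_pages
        self.pages = OrderedDict()  # cached page numbers, least recently used first
        self.hits = 0
        self.misses = 0
        self.device_requests = 0

    def _insert(self, page):
        self.pages[page] = None
        self.pages.move_to_end(page)
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)  # evict the least recently used page

    def read(self, first_page, n_pages):
        """Read n_pages starting at first_page, counting hits, misses and disk requests."""
        page = first_page
        end = first_page + n_pages
        while page < end:
            if page in self.pages:
                self.hits += 1
                self.pages.move_to_end(page)
                page += 1
                continue
            # Miss: one disk request fetches the contiguous run of missing
            # requested pages plus `readahead` pages beyond it.
            run_end = page
            while run_end < end and run_end not in self.pages:
                run_end += 1
            self.misses += run_end - page
            self.device_requests += 1
            for p in range(page, run_end + self.readahead):
                self._insert(p)
            page = run_end


def iops_to_mb_per_s(iops, block_kb):
    """Throughput implied by an IOPS figure at a given block size (1 MB = 1024 KB)."""
    return iops * block_kb / 1024


def effective_mails_per_min(base_per_min, passes, peak_factor=1):
    """Mail rate scaled by pipeline passes (each pass reads and writes the mail) and a peak multiplier."""
    return base_per_min * passes * peak_factor
```

In this model, reading 64 pages sequentially in 16-page chunks costs 4 disk requests with no read-ahead but only 2 with a 32-page window, while single-page reads that jump around a 2-page cache mostly miss. `iops_to_mb_per_s(340, 64)` gives 21.25 MB/s, and `effective_mails_per_min(100, 5, 3)` gives the 1,500-per-minute peak equivalent.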
Don't come barking at me with big stories about scalable system design when you have no sense of cost, resources and project timelines. Don't come telling me how to design a scalable system until you have worked on clustering and distributed design at large scale. Don't come barking at me with child's-play fixes to SQL statements that I consider SQL 101, and then post X percent improvements. They are too simple-minded to educate me at my level.
I am very keen to share technologies and findings with strangers, peers, colleagues and friends. The directors of my company, who are scholars themselves, recommend me as the GO-TO person for technical evaluations and advice in a company of 100+ people. So please establish some real calibre before you start barking at me with numbers and examples. That doesn't mean you shouldn't show me examples; that's a good way to communicate, but do it sensibly, with the intention to share, not to SHOW OFF. I don't appreciate SHOW OFF unless you have what it takes to do that.
Don't come criticising what my company values until you have joined my company and done REAL WORK with me as a peer. At this moment, I am pretty sure I do things better than you do; if not, this harsh email would have been written by YOU, not ME.
When you have accumulated enough brownie points to come and play with the big boys, I welcome you. Before that, don't recommend this thread to your colleagues, peers or friends. It makes you look stupid. Don't ever use your DBMail SQL 101 as a way to promote what you have done as excellent, because it isn't.
I still welcome you to correspond in this thread, starting with a proper apology for your unnecessary show-off, and we can start on a better footing. If not, I highly recommend you keep quiet and keep your "invaluable" knowledge to yourself, so that you don't mislead the noobs into thinking your attempt is an all-round, well-thought-out plan when it looks purely at the technological side, without attempting to see the big picture that real IT practitioners look at.
To answer your narrow view of the durability and reliability of H/W solutions: when we do business, we don't go to Amazon or eBay, and we don't go to Newegg. We go to reputable regional distributors that sell big-MNC H/W such as Dell, HP and IBM. We don't buy a single component, like a few H/W RAID cards, and call it a done deal. When the H/W doesn't deliver the performance or durability stated in the spec, we have the option to take legal action against the source. It might be only a small chance of winning, but the party taking action can be the client or customer using the H/W. Just as we, the vendor, believe in the H/W solutions from these MNCs, the customer has the same set of values and knowledge about them too. So if they find the H/W not up to par, they (who can also be MNCs) can afford to take legal action against the distributor or manufacturer. Your software RAID can potentially contain bugs, which means there is just as much risk that data can be lost, but who can you F? You can only F yourself, or your client Fs you when things screw up. The chances of IBM and HP going down are much, much lower than your SME's. They do business with turnovers maybe 10 to 100 times larger than a normal SME's. If the customer has concerns about this hardware, they will probably be 100 times more concerned about giving your company the solution.
Even if IBM or HP goes bust, there will be investors willing to take over and move on from there. These big MNCs form the backbone of today's technological world. They don't go bust just like that. Think wider and harder about the broader landscape. Even if they do, the distributors would almost certainly have enough stock to last at least ONE migration exercise. So please don't go around telling people this is the reason you chose S/W RAID. Real IT practitioners will laugh at your ignorance. So if you think about it, the premium you pay for these highly priced components also pays for risk management.
When we, the proper IT practitioners, talk about data storage, we don't treat RAID as the last stop for data reliability. We have proper, well-specced backup storage solutions. We back up daily and try our very best to protect against data loss. We do offsite backups if we want geographical data protection. So your H/W or S/W RAID is not a consideration in my overall architecture. Your worries are unfounded and only worth considering when budget is a concern.
When my high-level architecture of 2 servers per resource, whether Active/Passive or Active/Active, comes into play, your storage concern totally dissolves. I have kept stressing this in earlier posts, but you have gladly ignored it and kept indulging in your own silly reasoning.
As much as I see the value in software RAID, because I also use it myself in various forms like LVM mirroring and Linux MD, I see as much value in hardware RAID. I architect systems that can handle millions of users in clustered designs. I think about ease of management for the customer. You can kiss your S/W RAID arse goodbye if you try to pitch this to a real customer. They will tell you to go fly a kite and not bother submitting your proposal, because the stakeholders just aren't confident in or interested in your solution. That's why the consumer market and the enterprise market are on totally different scales with different budgets. Don't use your narrow-minded mindset to evaluate how enterprise works when you have not even grasped the difference between the random and sequential performance of block devices.
I certainly don't know all the IT engineers in the world, but I have done projects with multiple MNCs, and because I am a realistic person who needs to design architecture for real operations, I have to consider that there just aren't enough IT engineers willing to take on the responsibility of S/W RAID operations, and that H/W RAID is a damn lot easier to deal with, with much higher reliability, when you consider the whole ecosystem.
Hence, to answer your silly request to fill in the following:
Software RAID:
+Performance
+Reliability
++Recovery
+Portability
+Cost
-Min Skillset
-Industrial Compliance

Hardware RAID:
++Performance
+Reliability
+Recovery
-Portability
-Cost
+Min Skillset
+Industrial Compliance
Your simple chart is not to my taste, so I have added real considerations for your evaluation. When you come up to my level, you can start challenging me again on this topic.
Last but not least, wanna be a parrot? Don't act like one here. I'm sure you can do better at Jurong.