HansdeJongh's Posts

This is what Wireshark sees when I look at scsi.request_frame.
Hello,

I have multiple Dell PowerEdge R610/R620 servers connected through a stack of 4 PowerConnect 6224/6248 switches to 4 EqualLogic arrays (1x PS5000E, 2x PS4000E, 1x PS6100XV). The PS5000E is in pool 1, the 2 PS4000Es are in pool 2 (holds 2 datastores, on all 7 ESX hosts), and the PS6100XV is in pool 3 (holds 7 datastores, on all 7 ESX hosts). I have tested against both pool 2 and pool 3. Most problems are on pool 3, but it also happens on pool 2.

Each ESX host is configured the way EqualLogic wants it: 2 NICs dedicated to iSCSI traffic, purely for VMFS, vmnics bound correctly, and those NICs are not used for anything else. Windows guest VMs also have access to the SAN through the Microsoft iSCSI initiator and 2 separate vmnics on all ESX hosts. Windows guest iSCSI traffic does NOT experience the same problem.

The problem is easily reproduced: when I do random or sequential read IOs from IOMeter with a 64 KB IO size I get a latency of 140+ ms and a throughput of 28 MB/s. I have been working on this case for more than a week now and, as far as I know, I have tried everything. I have created a document with all my ideas and results in it.

Problems:
- 2 paths slow: throughput per path is 14 MB/s with 141 ms latency per path (230 IOPS). If I disable one path, throughput stays the same but latency goes up to 300 ms. Question: why does the latency double when disabling one path?
- 1 path slow and one path fast: first path 1600 IOPS / 100 MB/s / 19 ms latency; second path 30 IOPS / 14 MB/s / 142 ms latency. Question: why is the latency on the bad path only 142 ms and not 300?
- 2 paths fast: both paths 1600+ IOPS / 100 MB/s / 17 ms per path.

IO size comparison: see the pictures and the xlsx sheet.

Observations:
- Paths will never automatically become fast again once they have gone "bad".
- Paths go bad after a while.
- If I reset the switchport of the affected path, the problem is gone for that path. Sometimes resetting the switchport of an affected path also solves the problem on the other path.
- The problem only occurs with big IOs.
- On Broadcom NICs the problem also shows up, but it is a lot better (50 MB/s and a latency of around 70-90 ms).
- If the paths are good I can keep the test running for 24 hours without any issues, but if I stop the test and start it again soon afterwards (1 hour?) the problem suddenly occurs. It is always either bad or good right when I start the test.
- If I move my test volume from pool 3 to pool 2 and back, the problem is gone for that test volume.
- It seems that when I only run the 64 KB test it takes a lot longer before the paths go bad than when I first run 0.5/4/16/32/64/128 KB. Almost always the paths become bad after the first test run.
- When it is bad for volume 1 on host 1, it doesn't necessarily mean it is bad for volume 1 on host 2.
- It seems to take longer for my test volume to go bad (maybe because it is used less?).
- Disabling all ACLs doesn't make any difference. DoS prevention on the switch is disabled. QoS is not active. Flow control active or inactive doesn't make any difference.
- There is a difference in the latency the SAN sees: the SAN sees 10% less than esxtop. On very small IOs this can go up to 50% (2 ms compared to 4 ms in esxtop).
- I created 2 port groups (a and b) on the same vSwitch I use for iSCSI, bound to the same NICs as the vmknics, one for each port. From within my test VM I then ran the tests against an ESX datastore and the problem occurred right away, but I have no problem when I access a Windows volume.
- Disabling the NIC in ESX (esxcli network nic down -n vmnic5) and then enabling it again solves the problem for that path.
- Wireshark traces show the same latency when I use Statistics > Service Response Time > SCSI. I tested this both on the virtual switch (port group in promiscuous mode) and by mirroring both the uplink of the ESX host and the uplink of the SAN.

Test: I reset all counters on the PowerConnect switch (9:35 CET; on ESX09 the time is 07:39 UTC, 2012). Then I started a test run on datastore "ESX-SAS-03". One hour later I don't see any dropped frames on the used interfaces, nor any pause frames being sent or received, nor any warnings or errors on the EQL or in VMware. There were no other VMs running on the host at that time.

Please help :( because Dell/EqualLogic support isn't taking me seriously. They blame my switches because I route iSCSI traffic between VLANs and have ACLs (but that iSCSI traffic is not affected), and they tell me to break down the stack. They didn't even talk to us; everything happened through email, there was no WebEx whatsoever.

Regards, Hans
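A back-of-the-envelope check on the "why does latency double" question (a sketch only; I am assuming IOMeter keeps a fixed number of outstanding IOs per path, e.g. 32, which is a hypothetical queue depth, not a measured value). With a fixed queue depth, Little's Law (outstanding = IOPS x latency) predicts exactly this behavior:

```python
# Little's Law sketch: outstanding_ios = iops * latency_seconds.
# QUEUE_DEPTH_PER_PATH = 32 is an assumption, not taken from the measurements.
IO_SIZE_KB = 64
QUEUE_DEPTH_PER_PATH = 32

def throughput_mb_s(iops, io_size_kb=IO_SIZE_KB):
    """Throughput in MB/s for a given IOPS rate at a fixed IO size."""
    return iops * io_size_kb / 1024.0

def latency_ms(outstanding, iops):
    """Average latency (ms) when `outstanding` IOs queue on a path doing `iops`."""
    return outstanding / iops * 1000.0

# Two slow paths, 230 IOPS each, 32 outstanding per path:
print(throughput_mb_s(230))   # ~14.4 MB/s per path, matches the observed 14 MB/s
print(latency_ms(32, 230))    # ~139 ms, close to the observed 141 ms

# One path disabled: all 64 outstanding IOs funnel into one 230 IOPS path:
print(latency_ms(64, 230))    # ~278 ms, close to the observed 300 ms
```

If the slow path is pinned at ~230 IOPS no matter what, removing the other path just doubles the queue each IO waits behind, which would explain latency going from ~141 ms to ~300 ms while throughput stays the same.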
I found out that disabling jumbo frames on an Intel card solves the problem. I also tried it with a Broadcom adapter and had no problem whatsoever.
That was already done by Dell (the supplier) and VMware.
- Jumbo frames work fine from a VM guest (Microsoft) iSCSI initiator through the same NICs that the vmkernel ports use (and write performance is very good).
- Jumbo frames work fine from the vmkernel/vSwitch when I put a Broadcom card in the vSwitch (and write performance is very good).
- Jumbo frames work fine from the vmkernel/vSwitch when I put an Intel card in the vSwitch (but write performance is bad).

With "work fine" I mean I can ping with large sizes, so jumbo frames are enabled end to end. And yes, I swapped the cables on the Intel/Broadcom cards, so forwarding rates/packet processing are not the problem! Otherwise I would have had the same problems with Broadcom and also from a VM guest, right?
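For reference, the size used in the "ping with large sizes" check: the largest ICMP payload that fits in one frame is the MTU minus the IPv4 header (20 bytes) and the ICMP header (8 bytes). On ESX the end-to-end test would be `vmkping -d -s 8972 <san-ip>`, with `-d` forbidding fragmentation, so the ping only succeeds if jumbo frames pass the whole path:

```python
# Max single-frame ICMP echo payload for a given MTU:
# payload = MTU - 20 (IPv4 header) - 8 (ICMP header)
IPV4_HEADER = 20
ICMP_HEADER = 8

def max_ping_payload(mtu):
    """Largest -s value that fits unfragmented in one frame at this MTU."""
    return mtu - IPV4_HEADER - ICMP_HEADER

print(max_ping_payload(9000))  # 8972 -> jumbo frames pass end to end
print(max_ping_payload(1500))  # 1472 -> standard-MTU equivalent
```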
So I found the problem: jumbo frames! The past few days I have been doing a lot of testing: the problem doesn't occur when I use Broadcom NICs (even with jumbo frames enabled), but it does occur on different Intel NICs. The solution is simple: disable jumbo frames. Now I suddenly get 180 MB/s and 18 ms latency on 2 Intel NICs. So I think it's a bug or something; VMware is currently looking into it. I wonder how many people have this problem without being aware of it...
When I look with esxtop I can see a latency of around 50 ms, but when I look on the SAN I can't see that latency... When I shut one of the uplinks of the host down (on the switch), latency goes up to 100 ms; so the total latency goes back to 50 ms when I have 2 NICs enabled... Besides that, if I run the same test on a different machine and on a different host, but against the same SAN, I get twice the overall speed. But if I put them all on the same host, the total speed drops until both test machines run at the same speed as one machine alone on the host would...
Hi, I already tried on a different node and also played with both LSI SAS and paravirtual. Thick or thin doesn't make a difference either...
Hello, I have a single ESX host in a cluster. On this host I have only 1 VM. The ESX host is connected to an EQL group. In this group I have multiple pools and members. I have the same problem on all pools, but for testing purposes I'll stick to 1 member: a PS6100XV (24x 600 GB 15k) in RAID 50. This member is connected to a stack of 4 PowerConnect 62xx series switches. The host is a Dell PowerEdge R610 (but an R620 gives the same problem). The host is configured per the EqualLogic best practices: 2 vmknics + 2 physical NICs + heartbeat IP. Besides that, it has 2 separate dedicated NICs for iSCSI traffic from within the VM. My test VM has 4 vCPUs, 8 GB memory and 4 NICs (2 for management purposes and 2 for iSCSI traffic). In the VM I have installed the latest drivers from EqualLogic for iSCSI MPIO. It has multiple drives:
- c: (OS) is a volume on a VMFS which resides on the PS6100XV
- e: (testing) is a separate volume on a separate VMDK on a separate VMFS which resides on the same PS6100XV
- f: (testing) is a volume connected directly to the SAN through the MS iSCSI MPIO driver

As a test I have run sqlio (http://tools.davidklee.net/sqlio.aspx). I have run the same test over and over again on both e: and f: and it keeps giving me the same results. Read performance is overall the same, but write performance is a lot slower on a VMDK file than on a direct iSCSI target. I would expect it to be the same, so am I wrong?
Just heard from VMware that 128 KB is the default size; they will "split" any block larger than that back to 128 KB...
Thanks, but what I mean is: it looks like the IO is already split by VMware. I generate a 256 KB IO, but it gets split back to 128 KB...
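Just to illustrate what that splitting would look like (a generic sketch, not VMware's actual code; the 128 KB cap is the number support quoted): any layer that caps the transfer size at 128 KB turns one 256 KB guest IO into two 128 KB IOs, so the array never sees anything bigger than 128 KB:

```python
# Generic IO-splitting illustration: chop a request into chunks no larger
# than max_kb, the way a layer with a maximum transfer size would.
def split_io(size_kb, max_kb=128):
    chunks = []
    while size_kb > 0:
        chunk = min(size_kb, max_kb)  # never exceed the cap
        chunks.append(chunk)
        size_kb -= chunk
    return chunks

print(split_io(256))  # [128, 128] -> the SAN reports only 128 KB blocks
print(split_io(512))  # [128, 128, 128, 128]
print(split_io(64))   # [64] -> small IOs pass through unchanged
```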
Hello, I have an issue with my storage (EQL PS6100, 24x 600 GB 15k SAS in RAID 50). When I do a simple file copy within a VM it is very slow. After investigating, I discovered that VMware and my SAN report very high write/read latency when I generate bigger (64 KB+) IO blocks. Besides that, when I go above a 128 KB block size my SAN keeps reporting that it is doing 128 KB blocks, even when I generate 256 or 512 KB blocks. When I installed Windows directly on the same hardware and ran the same tests, I had no problem whatsoever (the high latency is also almost gone).

I'm testing with IOMeter:
- 512 B / 4 KB / 16 KB / 32 KB / 64 KB / 128 KB / 256 KB, 100% write, 0% random
- 512 B / 4 KB / 16 KB / 32 KB / 64 KB / 128 KB / 256 KB, 100% read, 0% random

Physical: (screenshot) Virtual: (screenshot)

Everything is on the latest version: MEM driver, EQL firmware, NIC drivers, ESX, etc. I'm using Dell switches (PC62xx) and a mix of R610 + R620 + PE1950/2950.

Regards
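For anyone who wants to reproduce the shape of those runs without IOMeter, here is a rough stand-in (a sketch only: there is no queue-depth control and the OS page cache is involved, so absolute numbers will not match IOMeter): sequential 100% writes at the same block sizes, timing the average latency per IO:

```python
import os
import tempfile
import time

# Block sizes matching the IOMeter list above (bytes).
BLOCK_SIZES = [512, 4096, 16384, 32768, 65536, 131072, 262144]
IOS_PER_SIZE = 64

def bench(path):
    """Sequential unbuffered writes; returns {block_size: avg ms per IO}."""
    results = {}
    with open(path, "wb", buffering=0) as f:
        for size in BLOCK_SIZES:
            buf = os.urandom(size)
            start = time.perf_counter()
            for _ in range(IOS_PER_SIZE):
                f.write(buf)
            os.fsync(f.fileno())  # force the writes to storage before stopping the clock
            elapsed = time.perf_counter() - start
            results[size] = elapsed / IOS_PER_SIZE * 1000.0
    return results

if __name__ == "__main__":
    path = os.path.join(tempfile.gettempdir(), "ioprobe.bin")
    try:
        for size, ms in bench(path).items():
            print(f"{size:>7} B: {ms:.3f} ms/IO")
    finally:
        os.remove(path)
```

On an affected datastore the 64 KB+ sizes should show the latency jump; the same script run against the direct iSCSI volume gives the baseline.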
Hi KjB, did you test this? I have been testing it, but as far as I can see it only works when the 2 VMs are on the same host. This was also confirmed by a VMware support engineer. Regards
Hello, maybe a "stupid" question: I need port mirroring for an IDS system. I have a VDS with 2 uplinks and around 43 port groups (all different VLANs). This VDS spans 2 different hosts. Now I need to mirror all traffic on all ports to the IDS virtual machine. Is that possible? I mean, what happens with traffic that is on host 2 while the IDS (the port mirror target) is on host 1? And another "stupid" question: what happens if my 2x 1 Gb uplinks are completely saturated? Regards, Hans
Hello, we are currently looking into moving the VM guest swap memory file to a different storage array. If we do that, what will happen to a (running) VM guest when that storage array suddenly becomes unavailable due to a crash or something? Will the VM guest keep running? We hardly have any swapping to disk, so nothing to worry about on that front... Regards, Hans
I got confirmation from EQL that they have found the cause of the problem and that it should be solved in the next firmware release (5.08?). Regards, Hans
2 vCPUs (no other VMs on the host), 30%. But it seems only a couple of hosts have this problem, not all. AND: it also happens when I do a Storage vMotion, so it doesn't seem to be VM related. If I vMotion the VM to another host it goes faster (80 MB/s). The R610s don't have the problem, but some 2950/1950s do.
"Have you tried with a vmxnet3 NIC?" Yes.
"Are you using jumbo frames?" Yes.
"As written, guest iSCSI may require more vCPU, so also try with a simple VMDK on an iSCSI datastore to see if there is some difference." I already did that, as 2 of the virtual disks are on a VMFS store.
It's something I have been experiencing for a very long time now in our network: I'm not getting more than +/- 400 Mbit of throughput in a VM.

First of all, my network: a couple of Dell servers (1950/2950/R610), an EQL PS5000 + EQL PS4000 in one pool, both RAID 10, and an EQL PS4000 in another pool in RAID 50 (for testing purposes), all connected through a set of Dell PowerConnect 62xx switches.

My testing was as follows. On esxhost03 I built a Windows 2008 R2 machine with the following disks:
- VM files and system disk (c:) on the local storage of esxhost03 (2x 72 GB 15k in RAID 1)
- e: a 32 GB disk on a VMFS on the RAID 10
- f: a 32 GB disk on a VMFS on the RAID 50
- g: a 32 GB disk connected through the MS iSCSI initiator with MPIO + the EQL HIT kit (so 2 Gb throughput)

My test file is an 18 GB SQL backup file. I copied this file from and to every disk within the VM and to other VMs as well. I always get around 52 MB/s (measured on the SAN). I tried different path selection policies for the VMFS (the EQL path selector and the built-in VMware one), I tried different NICs, and I changed from 1000 Mbit full duplex to auto negotiate.

Here comes the catch... Yesterday I brought down esxhost03 and installed Windows 2008 R2 directly on the hardware (so no hypervisor), using exactly the same NICs (I didn't change anything on the switches). I have a system drive (c:) and a 32 GB disk (d:) connected through the MS iSCSI initiator with MPIO + the EQL HIT kit (so 2 Gb throughput). Now if I copy from c: to d: I get a whopping 218 MB/s. If I copy from d: to c: I get 100 MB/s. By the way, I also tried 4x 1 Gb on that physical server, and the limit seems to be around 300 MB/s... I also tried copying from the RAID 50 to the RAID 10, again with the MS iSCSI initiator with MPIO + the EQL HIT kit, and then I also get around 300 MB/s...

So!!! Why is vSphere up to 4 times as slow? In my opinion I have ruled out everything except the vSphere software...

Thanks, regards, Hans
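To put those numbers in perspective (simple line-rate arithmetic, ignoring protocol overhead): a 1 GbE link moves at most about 119 MiB/s of payload, so the bare-metal copies run close to wire speed while the VM sits far below it:

```python
# Rough link-utilization arithmetic: 1 Gb/s = 1e9 bits/s = ~119.2 MiB/s,
# ignoring Ethernet/IP/iSCSI overhead, so real utilization is slightly higher.
GBE_MIB_S = 1e9 / 8 / 2**20  # ~119.2 MiB/s per 1 Gb link

def utilization(observed_mib_s, links):
    """Fraction of the aggregate raw line rate actually used."""
    return observed_mib_s / (links * GBE_MIB_S)

print(f"{utilization(218, 2):.0%}")  # bare metal, 2 links with MPIO: ~91%
print(f"{utilization(300, 4):.0%}")  # bare metal, 4 links: ~63% (bottleneck elsewhere)
print(f"{utilization(52, 2):.0%}")   # same copy inside the VM: ~22%
```

So the physical box is nearly saturating both links, while the VM uses barely a fifth of the same wires, which is what makes a hypervisor-level cause plausible.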
Hi, it's not a config error. Regards, Hans