Packet Loss and Connection Drops During Local VLAN File Transfers (High CPU)
-
I'm currently having an issue with dropped packets on my pfSense 2100.
I have followed all of the below troubleshooting steps, moved all my pfBlocker rules off floating rules, and disabled my traffic shaping rules. The firewall is basically in a default state besides my VLAN rules and WireGuard tunnels. I'm currently using it for DHCP as well as DNS. I have not been able to stop these dropped packet issues. I've had the firewall for about 2–3 years at this point.
Troubleshooting Lost Traffic or Disappearing Packets:
https://docs.netgate.com/pfsense/en/latest/troubleshooting/packet-loss.html
I've recently been performing file transfers across VLANs on my local network, but the connection consistently drops at exactly 35%, causing the transfer to fail. I monitored my gateways and CPU during the latest attempt and noticed that right as the drop occurs, CPU usage spikes to 100% and gateway latency skyrockets. Immediately after this spike, the SMB connection is lost.
Gateway ping times...

Regarding the packet drops, I notice that sometimes when I log in, the packet loss is initially very high. It briefly settles down to 0.0%, but the loss always seems to return. I haven't been able to resolve this.
Dropped packets...

Does the firewall need to be replaced? Is my unit just faulty now?
-
@CatSpecial202 You'll need to include more information about the configuration of your LAN interface(s). Information about the downstream infrastructure will only help us help you.
What kind of ISP connection? What does the 'gateway monitoring' configuration look like for gateway "WAN_DHCP"? If it's an unstable link and a monitoring action is configured (e.g., state flushing), that could be impacting LAN connectivity.
-
@CatSpecial202 said in Packet Loss and Connection Drops During Local VLAN File Transfers (High CPU):
right as the drop occurs, CPU usage spikes to 100%
It is not high throughout the transfer (first 34%), just at one point?
I ask because the 2100 is a bit CPU limited, at least in terms of bandwidth...peak is about 600-700 Mbps. What is your Internet speed? You might monitor "top" or the Diag > System Activity page and see if you can spot something running at that particular moment that uses a lot of CPU.
-
@CatSpecial202 I have been working with 2100s for years, and I’m 100% sure it’s an unacknowledged multi-generation bug (present since at least 24.03) in the built-in 5-port switch, either in software or hardware. I have reproduced it on every 2100 I have tried, and on all builds of pfSense since 24.03.
The built-in 5-port switch links the 2.5Gbit mvneta1 NIC to four 1GbE switch ports. The problem arises when you push the throughput with packets that need to be routed (and switched) from one VLAN to another on the mvneta1 interface.
Suddenly, packets that are received inbound on one VLAN and transmitted on another are no longer physically transmitted on the forwarding 1GbE switch interface. A packet capture reveals that mvneta1 seems to both receive and transmit the packets, but then, suddenly and in bursts, some of the transmitted packets are lost in the switch (or somewhere in the mvneta1 hardware) and simply disappear, never arriving on the 1GbE switch interface.
You can easily prove this is an mvneta1/switch issue by reassigning your VLANs to mvneta0. Then you can copy at hardware capacity for as long as you want without disappearing packets. Since you are using the EXACT same pfSense config, just with reassigned interfaces, it shows the problem is definitively hardware/driver related to mvneta1 and the switch.
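To see the burst pattern without relying on SMB, one option is to send numbered test packets across the VLANs and check which sequence numbers never arrive. The sketch below covers only the analysis half and assumes you have already extracted the received sequence numbers (for example, from a capture taken on the far switch port); the function name `missing_runs` is hypothetical. Consecutive missing numbers are grouped so that bursty loss shows up as long runs rather than isolated gaps.

```python
def missing_runs(received, first, last):
    """Group sequence numbers in [first, last] that never arrived into (start, end) runs.

    `received` is any iterable of sequence numbers seen at the receiver
    (duplicates and out-of-order arrivals are fine).
    """
    seen = set(received)
    runs = []
    start = None  # start of the current gap, or None when not inside a gap
    for seq in range(first, last + 1):
        if seq not in seen:
            if start is None:
                start = seq
        elif start is not None:
            runs.append((start, seq - 1))
            start = None
    if start is not None:  # gap extends to the final sequence number
        runs.append((start, last))
    return runs


if __name__ == "__main__":
    # Hypothetical capture: packets 3-4 and 7-8 were lost, as was the last one.
    received = [0, 1, 2, 5, 6, 9]
    for start, end in missing_runs(received, 0, 10):
        print(f"lost {end - start + 1} packet(s): {start}-{end}")
```

Long runs of missing numbers would be consistent with the bursty loss described in this post; isolated single gaps would point somewhere else.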
It makes no difference whether you have all VLANs on one or all of the switch interfaces, or whether you .1Q-split the two VLANs across two 1GbE switch interfaces. The packet drops happen in the mvneta1 interface or its switch uplink. A likely explanation is switch buffer drops under heavy load, caused by the bursty behaviour of the 2.5Gbit to 1Gbit speed shift.
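The speed-shift explanation can be put into rough numbers with a simple fluid model: while mvneta1 sends a burst toward the switch at 2.5 Gbit/s and the 1GbE egress port drains it at 1 Gbit/s, the backlog grows at the difference between the two rates, and anything beyond the buffer is dropped. The buffer size used below is a hypothetical placeholder; the thread does not state the actual buffer size of the 2100's switch, so treat the outputs as illustrative only.

```python
def time_to_overflow(buffer_bytes, in_bps, out_bps):
    """Seconds of sustained line-rate input before a FIFO of buffer_bytes fills."""
    if in_bps <= out_bps:
        return float("inf")  # egress keeps up, so the queue never grows
    fill_rate = (in_bps - out_bps) / 8  # bytes per second added to the queue
    return buffer_bytes / fill_rate


def bytes_dropped(burst_bytes, buffer_bytes, in_bps, out_bps):
    """Bytes dropped when a single burst arrives at in_bps and drains at out_bps.

    Fluid approximation: while the burst arrives, the queue grows by
    burst_bytes * (1 - out_bps / in_bps); whatever exceeds the buffer is lost.
    """
    if in_bps <= out_bps:
        return 0.0
    backlog = burst_bytes * (1 - out_bps / in_bps)
    return max(0.0, backlog - buffer_bytes)


if __name__ == "__main__":
    IN_BPS, OUT_BPS = 2.5e9, 1e9  # mvneta1 uplink vs. a 1GbE switch port
    BUFFER = 128 * 1024           # hypothetical buffer size, not a 2100 spec
    print(f"buffer full after {time_to_overflow(BUFFER, IN_BPS, OUT_BPS) * 1e6:.0f} us")
    for burst in (200_000, 1_000_000):
        print(f"{burst} B burst -> {bytes_dropped(burst, BUFFER, IN_BPS, OUT_BPS):.0f} B dropped")
```

In this model a short burst fits in the buffer and nothing is lost, while a longer line-rate burst overflows it; that pattern could fit a single large copy triggering the problem more readily than several smaller, fluctuating transfers.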
Personally, I think it’s a speed negotiation issue between the linked 1GbE interface and the external switch, but I have only been able to test with Aruba switches, and it happens with them.
See my post about the issue here: https://forum.netgate.com/topic/198333/sg-2100-packetloss-in-internal-5-port-switch
I don’t have a support subscription, so I have never been able to get Netgate support to look at and officially acknowledge the problem, but it’s there.
-
My post is being flagged as spam, so I'm going to post this in parts. Frustrating...
said in Packet Loss and Connection Drops During Local VLAN File Transfers (High CPU):
The problem arises when you push the throughput with packets that needs to be routed (and switched) from one VLAN to another on the mvneta1 interface.
Yup, you pretty much nailed it. This is exactly what I was doing: transferring a large .iso file from VLAN 15 (10.15.x.x) to VLAN 30 (10.30.x.x). I provided more detail on the symptoms below.
-
So, I took about 45 minutes to write up more details and quote the relevant information, and I can't post it because of a spam filter.
-
@CatSpecial202 You need more upvotes.
-
@SteveITS said in Packet Loss and Connection Drops During Local VLAN File Transfers (High CPU):
It is not high throughout the transfer (first 34%), just at one point?...peak is about 600-700 Mbps...
I was on a Windows computer on VLAN 15 with an SMB share mounted from VLAN 30 on my NAS. I was transferring a large .iso file and it was steady at ~700 Mbps (exactly what you mentioned), but right at 35%, the transfer would drop to 0 MB/s and hang there. Eventually, my SMB connection to the file share would disconnect, and I'd be able to reconnect it quickly after the transfer died.
I moved the transfer over to the same VLAN so it would happen purely at the switch layer, and I was getting a steady 110 MB/s with no drops.
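Putting the two observed rates in the same units makes the comparison easier. This sketch assumes the ~700 Mbps figure is megabits per second, and it shows the 110 MB/s figure both as decimal megabytes and as mebibytes, since the Windows copy dialog may label binary units as "MB".

```python
def mbps_to_bytes_per_s(mbps):
    """Megabits per second -> bytes per second (decimal megabits)."""
    return mbps * 1_000_000 / 8


def bytes_per_s_to_mbps(bytes_per_s):
    """Bytes per second -> megabits per second."""
    return bytes_per_s * 8 / 1_000_000


if __name__ == "__main__":
    routed = mbps_to_bytes_per_s(700)  # inter-VLAN copy, routed by the 2100
    print(f"routed: 700 Mbps = {routed / 1e6:.1f} MB/s")
    # Same-VLAN copy, switched only; "MB" could be decimal or binary.
    print(f"switched: 110 MB/s  = {bytes_per_s_to_mbps(110 * 1_000_000):.0f} Mbps")
    print(f"switched: 110 MiB/s = {bytes_per_s_to_mbps(110 * 1024 * 1024):.0f} Mbps")
```

Either way, the switched copy runs close to gigabit line rate (880 to 923 Mbps), while the routed one sits near the 600-700 Mbps ceiling mentioned for the 2100.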
-
@tinfoilmatt said in Packet Loss and Connection Drops During Local VLAN File Transfers (High CPU):
What kind of ISP connection?
My connection is via a cable modem I own (Netgear, permanently in bridge mode). It's only about 3 years old, and I get 300-400 Mbps download. The Gateway Monitoring configuration is set to "Use global behavior (Default)." I've never messed with gateway failure actions or multiple WANs.
To add to this: I have this unit connected to another site (running a pfSense 4200) via WireGuard. Occasionally, when logged into that remote location, I see packet drops registered on the gateway pointing back to this Netgate 2100. All other interfaces at the remote site are stable.
A couple of weeks back, I used MTR to test whether I could recreate this packet loss from devices on the network behind the pfSense 2100. I left MTR running for a while, and as far as I can recall, it showed no packet loss.
I just want to note that the packet loss is basically ALWAYS happening. Whenever I login to the firewall I always see packet loss in the gateways.
-
@CatSpecial202 Yeah, I have also used SMB copy of a large file to verify the issue on my different 2100’s.
I have not found a way to reproduce it using multiple smaller transfer sessions, as they seem to introduce enough fluctuation in throughput to keep the issue from being triggered.
Since I’m only using one downlink to my internal switch, I have taken to assigning mvneta0 as LAN (with multiple VLANs) and mvneta1 as WAN at the sites where I do not need the SFP port for WAN.
-
@keyser Wondering out loud, would a per-IP limiter at 90% or 80% or whatever help?
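The idea behind a per-IP limiter would be to hold each host below a chosen rate so the bursts reaching the switch stay smaller. pfSense limiters are built on dummynet pipes and queue excess traffic rather than simply dropping it, so the token-bucket policer below is only a simplified stand-in to show the rate-and-burst idea. The class name, the 90% of 1 Gbit/s rate, and the 3,000-byte burst allowance are all hypothetical values, not recommendations.

```python
class TokenBucket:
    """Simplified token-bucket policer (illustration only, not dummynet)."""

    def __init__(self, rate_bps, burst_bytes):
        self.rate = rate_bps / 8         # refill rate in bytes per second
        self.capacity = burst_bytes      # largest burst allowed through at once
        self.tokens = float(burst_bytes)
        self.last = 0.0                  # timestamp of the previous check, seconds

    def allow(self, size, now):
        """Return True if a packet of `size` bytes may pass at time `now`."""
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if size <= self.tokens:
            self.tokens -= size
            return True
        return False  # over the limit: a policer drops here, a shaper would queue


if __name__ == "__main__":
    bucket = TokenBucket(rate_bps=0.9 * 1e9, burst_bytes=3000)  # hypothetical limits
    # Three back-to-back 1500-byte packets: the third exceeds the burst allowance.
    print([bucket.allow(1500, now=0.0) for _ in range(3)])
    # 20 microseconds later, enough tokens have refilled for another packet.
    print(bucket.allow(1500, now=20e-6))
```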
-
@SteveITS said in Packet Loss and Connection Drops During Local VLAN File Transfers (High CPU):
@keyser Wondering out loud, would a per-IP limiter at 90% or 80% or whatever help?
Maybe, but I'm not inclined to implement that for various reasons.
-
@CatSpecial202 Have you seen the release notes for 25.11.1 that just arrived?
This section sounds VERY MUCH like a fix for the issue we are seeing.
I hope I will find time to test this pretty soon:
“Netgate 2100
The LAN port link parameters on the Netgate 2100 have been updated to address a potential signal transmission issue.
This issue prevented packets containing a specific byte pattern from being transmitted through the LAN port on the Netgate 2100. No other models are affected.”