SPF looks correct, but one org bounces our mail due to SPF failure

ac513 · 06-16-2022

Our organization (a university in the USA) uses Exchange Online for email, with Cisco Secure Email sitting in front of it. All of our mail to external organizations is routed outbound through Cisco Secure Email via an Exchange Online connector. Secure Email has load-balanced inbound and outbound IPs for our appliance. For easy reading, and to avoid revealing our real IPs, we'll call them 1.1.1.1 and 2.2.2.2 (inbound) and 3.3.3.3 and 4.4.4.4 (outbound).

In terms of SPF, all of the following are true:

Our SPF records for all of our domains contain Cisco's documented entry for hosted appliances: exists:%{i}.spf.hcXXXX-XX.iphmx.com

Our SPF records are valid and well-formed according to every one of the dozens of online DNS tools I've checked (MXToolbox, DNSChecker, etc.).

SPF, DMARC, and DKIM all pass, and one of our outbound Cisco IPs (3.3.3.3 or 4.4.4.4) shows as the sending host when we run tests at https://www.learndmarc.com.

Mail sent to all manner of external orgs (Gmail, Yahoo, other universities) is delivered fine, and the headers show SPF passing.

However, when our users send mail to one specific domain in Sweden (susenet.se), we get bounces due to SPF failures:

[recipient_redacted]@susenet.se
Remote Server returned '554 5.0.0 <[194.236.32.3] #5.0.0 smtp; 5.1.0 - Unknown address error 554-'5.7.1 2.2.2.2 does not pass SPF checks for domain [ourdomain_redacted].edu' (delivery attempts: 0)>'

Notice that when this organization bounces our mail for a supposed SPF failure, it references one of our inbound Cisco Secure Email IPs (2.2.2.2). Our outbound IPs, 3.3.3.3 or 4.4.4.4, are the ones that should be checked against *exists:%{i}.spf.hcXXXX-XX.iphmx.com*, and they are in every other mail delivery test scenario I can cook up.
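For reference, here is a minimal Python sketch of how a receiver builds the name it queries for an exists:%{i}... entry, assuming the %{i} expansion rules in RFC 7208 section 7.3. The receiver substitutes the IP of the host that connected to it and then looks up an A record for the resulting name; if any A record comes back, the mechanism matches. So a bounce naming 2.2.2.2 suggests the receiver ran this lookup with 2.2.2.2 as the client IP. The function names are illustrative only, not from any library.

```python
import ipaddress


def expand_i(client_ip: str) -> str:
    """Expand the SPF %{i} macro for a client IP (RFC 7208, section 7.3)."""
    addr = ipaddress.ip_address(client_ip)
    if addr.version == 4:
        # IPv4: the plain dotted quad, e.g. "3.3.3.3"
        return str(addr)
    # IPv6: "dot-format", all 32 hex nibbles in order, separated by dots
    return ".".join(addr.exploded.replace(":", ""))


def exists_query_name(client_ip: str, domain_spec: str) -> str:
    """Hypothetical helper: the name queried for exists:%{i}.<domain_spec>."""
    return f"{expand_i(client_ip)}.{domain_spec}"
```

With the zone from our record, a connection from 3.3.3.3 produces a lookup for 3.3.3.3.spf.hcXXXX-XX.iphmx.com, while a connection seen as coming from 2.2.2.2 produces 2.2.2.2.spf.hcXXXX-XX.iphmx.com.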

My take is that our SPF is correct, and that when this organization parses headers or other connection data, it is picking up the wrong sending IP, looking back one hop, or something similar.

Am I wrong? Is there anything else to check here? What could the other organization's mail hosts be doing to cause this?

Ken Stieers · 06-16-2022

Do you use an SPF shortening/management service?

We had an issue with a company using onDmarc(?) where one of the chained SPF records was larger than 512 bytes. It was "valid", but our ESAs didn't like it.
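A rough way to sanity-check each record in a chain is sketched below in Python. It assumes guidelines from RFC 7208: a single TXT string holds at most 255 octets (longer records must be split into several strings), section 3.4 suggests keeping the owner name plus record text under about 450 octets so the DNS answer fits in a 512-octet UDP response, and section 4.6.4 caps DNS-querying terms at 10 per evaluation. It checks one record at a time and does not follow include: chains; `check_spf_record` is a hypothetical helper, not a real tool.

```python
from dataclasses import dataclass, field
from typing import List

# Terms that count toward RFC 7208's limit of 10 DNS-querying terms (section 4.6.4).
LOOKUP_TERMS = {"include", "a", "mx", "ptr", "exists", "redirect"}


@dataclass
class SpfReport:
    length: int              # octets in the record text
    txt_strings_needed: int  # one TXT string holds at most 255 octets
    lookup_terms: int        # DNS-querying terms in THIS record only
    warnings: List[str] = field(default_factory=list)


def check_spf_record(record: str, owner_name: str = "") -> SpfReport:
    """Rough size and lookup-count check for one SPF record (includes not followed)."""
    terms = record.split()
    length = len(record.encode("utf-8"))
    strings_needed = max(1, -(-length // 255))  # ceiling division
    lookups = 0
    for term in terms[1:]:
        name = term.lstrip("+-~?")
        # Strip arguments such as "include:x", "a/24", "redirect=x".
        for sep in (":", "/", "="):
            name = name.split(sep, 1)[0]
        if name.lower() in LOOKUP_TERMS:
            lookups += 1

    report = SpfReport(length, strings_needed, lookups)
    if not terms or terms[0].lower() != "v=spf1":
        report.warnings.append("record does not start with v=spf1")
    if strings_needed > 1:
        report.warnings.append(
            f"{length} octets: must be published as {strings_needed} TXT strings")
    if length + len(owner_name) > 450:
        report.warnings.append(
            "over the ~450-octet guideline; the answer may exceed 512 octets")
    if lookups > 10:
        report.warnings.append(f"{lookups} DNS lookup terms; the limit is 10")
    return report
```

Running it over each record the chain pulls in (ours, plus anything reached through include:) would flag the kind of oversized record described above.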

ac513 · 06-16-2022

Our SPF records are formatted as below, with one entry for our cluster (per Cisco, during our setup years ago) and one additional IP range for some other stuff:

v=spf1 ip4:x.x.x.x/X exists:%{i}.spf.XXXX-XX.iphmx.com -all

(with the X's replaced by the relevant values, of course)
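To illustrate how a record of that shape evaluates, here is a deliberately small Python sketch that handles only ip4, exists with %{i}, and all, for IPv4 clients. It assumes a hypothetical 192.0.2.0/24 range in place of x.x.x.x/X, and it takes a `name_exists` callback instead of doing real DNS, so the result depends entirely on which names you tell it exist. It is not a full RFC 7208 implementation (no include, a, mx, redirect, or other macros).

```python
import ipaddress
from typing import Callable

QUALIFIERS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def evaluate_spf(record: str, client_ip: str,
                 name_exists: Callable[[str], bool]) -> str:
    """Evaluate ip4, exists:%{i}..., and all terms left to right (sketch only)."""
    ip = ipaddress.ip_address(client_ip)
    if ip.version != 4:
        raise ValueError("this sketch only handles IPv4 clients")
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return "none"  # not an SPF record
    for term in terms[1:]:
        result = QUALIFIERS.get(term[0], "pass")  # default qualifier is "+"
        if term[0] in QUALIFIERS:
            term = term[1:]
        mech, _, arg = term.partition(":")
        mech = mech.lower()
        if mech == "all":
            return result
        if mech == "ip4":
            if ip in ipaddress.ip_network(arg, strict=False):
                return result
        elif mech == "exists":
            # For an IPv4 client, %{i} is its dotted quad (RFC 7208, section 7.3).
            if name_exists(arg.replace("%{i}", str(ip))):
                return result
        else:
            raise ValueError(f"term not supported by this sketch: {term}")
    return "neutral"  # nothing matched and there is no redirect
```

For example, with a stub in which only 3.3.3.3.spf.hcXXXX-XX.iphmx.com and 4.4.4.4.spf.hcXXXX-XX.iphmx.com exist (an assumption mirroring what we expect the cluster zone to contain), 3.3.3.3 evaluates to pass, while 2.2.2.2 falls through to -all and evaluates to fail, which is the result in the bounce if the receiver treated 2.2.2.2 as the connecting IP.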

ac513 · 06-17-2022

Well, we don't have a definite fix, or even a line of communication with the other organization yet, but I think I have a theory about what's happening: DNS cache issues on susenet.se's end.

Back in February 2021, Cisco moved our hosted cluster to a load-balanced network model. Previously, our cluster's 12 nodes had IPs of (again, for simplicity's sake) 1.1.1.1, 2.2.2.2, 3.3.3.3, 4.4.4.4, 5.5.5.5, and so on.

Cisco picked pairs of these nodes' IPs at random and made each pair the VIPs for inbound, outbound, and M365. The appliances were then given private/internal IPs.

So today, our cluster's SPF entry should only match 3.3.3.3 or 4.4.4.4. Prior to February 2021, however, it could have matched any of those nodes' old IPs, including 2.2.2.2.

If this organization has serious DNS caching problems on either its mail hosts or its DNS resolvers, I could see it latching onto this now inbound-only IP and failing SPF.
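One way to gather evidence for this theory from our side is to check which IPs the cluster's exists zone currently authorizes. The Python sketch below does that through the system resolver using `socket.gethostbyname`, so any local DNS caching applies to it as well. The zone name stays elided as in the post and must be replaced with the real cluster name; `probe_exists_zone` and `dns_name_exists` are hypothetical helpers.

```python
import socket
from typing import Callable, Dict, Iterable

# Elided in the post; substitute your own cluster's SPF zone.
ZONE = "spf.hcXXXX-XX.iphmx.com"


def dns_name_exists(name: str) -> bool:
    """True if the system resolver returns an IPv4 address (A record) for name."""
    try:
        socket.gethostbyname(name)
        return True
    except socket.gaierror:  # NXDOMAIN, no A record, or resolver failure
        return False


def probe_exists_zone(ips: Iterable[str], zone: str = ZONE,
                      lookup: Callable[[str], bool] = dns_name_exists) -> Dict[str, bool]:
    """Report which IPv4 addresses currently match exists:%{i}.<zone>."""
    return {ip: lookup(f"{ip}.{zone}") for ip in ips}
```

Calling `probe_exists_zone(["1.1.1.1", "2.2.2.2", "3.3.3.3", "4.4.4.4"])` from a couple of different networks would show whether the zone currently treats 2.2.2.2 as authorized, which seems worth knowing before raising this with susenet.se. Note that a resolver failure is reported the same as "not authorized" in this sketch.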