I hope you are doing well. I am using eBPF with tc on the egress path to add a PPPoE header to forwarded and locally generated packets. Because of GSO/TSO, the program sees packets larger than the MTU, for example skb->gso_size = 1452, skb->gso_segs = 3, skb->len = 4410. My concern is whether the MAC and PPPoE headers that I insert in the tc program will be correctly included in each segment that GSO/TSO generates.

Since the splitting happens at the IP layer and above, while PPPoE operates at the link layer, I understand that each resulting segment should be a complete link-layer frame. However, I’m not sure whether GSO will replicate the MAC and PPPoE headers in each segment. If it does, how should I handle the length field of the PPPoE header for each segment? Or is there a specific approach I should use to ensure that each segment is processed correctly?
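To make the length question concrete, here is a small standalone sketch in plain C (not eBPF) of the arithmetic behind the numbers above. The helper names are hypothetical, and it assumes every segment carries exactly gso_size bytes of payload, which happens to hold for the values quoted (3 * 1452 = 4356). It only shows what the PPPoE length field would have to be per frame; it says nothing about whether the kernel actually rewrites it.

```c
#include <assert.h>
#include <stdint.h>

enum {
    ETH_HDR_BYTES   = 14, /* Ethernet header, the same 14 the tc program subtracts */
    PPP_PROTO_BYTES = 2   /* PPP protocol field, counted in the PPPoE length field */
};

/* L3 + L4 header bytes in the GSO super-packet.
 * Assumption: all gso_segs segments are full (gso_size payload bytes each). */
uint32_t l3l4_header_len(uint32_t skb_len, uint32_t gso_size, uint32_t gso_segs)
{
    uint32_t payload = gso_size * gso_segs;

    assert(skb_len >= ETH_HDR_BYTES + payload);
    return skb_len - ETH_HDR_BYTES - payload;
}

/* Length of one segment's IP packet once it is split out. */
uint32_t segment_ip_len(uint32_t hdr_len, uint32_t seg_payload)
{
    return hdr_len + seg_payload;
}

/* Value the PPPoE length field needs for an IP packet of ip_len bytes
 * (host byte order; convert to network order before storing). */
uint16_t pppoe_length_field(uint32_t ip_len)
{
    return (uint16_t)(ip_len + PPP_PROTO_BYTES);
}
```

With skb->len = 4410, gso_size = 1452 and gso_segs = 3, the header length comes out as 40 bytes, each segment is a 1492-byte IP packet, and the per-frame PPPoE length would be 1494. The program as posted would instead write 4410 - 14 + 2 = 4398 for the whole super-packet, which is the mismatch I am worried about.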

Here’s the eBPF code I’m working with:

SEC("tc")
int pppoe_egress(struct __sk_buff *skb) {
#define BPF_LOG_TOPIC "pppoe_egress"
    void *data_end = (void *)(long)skb->data_end;
    void *data = (void *)(long)skb->data;

    u32 pkt_sz = skb->len - 14;
    if (pkt_sz > pppoe_mtu) {
        bpf_log_info("egress packet too large: %u", pkt_sz);
        return TC_ACT_SHOT;
    }

    struct ethhdr *eth = (struct ethhdr *)(data);
    if ((void *)(eth + 1) > data_end) {
        bpf_log_info("packet smaller than ethhdr");
        return TC_ACT_SHOT;
    }

    if (eth->h_proto != ETH_IPV4 && eth->h_proto != ETH_IPV6) {
        bpf_log_info("unexpected egress eth proto: %x", eth->h_proto);
        return TC_ACT_PIPE;
    }

    u32 offset = 14;
    u8 protocol = 0;
    u16 mss_value = 0;
    u16 ppp_proto = ETH_PPP_IPV4;
    // DECAP support since Linux kernel 6.3
    u64 adj_room_flag = BPF_F_ADJ_ROOM_ENCAP_L3_IPV4;
    if (eth->h_proto == ETH_IPV6) {
        ppp_proto = ETH_PPP_IPV6;
        adj_room_flag = BPF_F_ADJ_ROOM_ENCAP_L3_IPV6;

        struct ipv6hdr *iph6;
        if (VALIDATE_READ_DATA(skb, &iph6, offset, sizeof(*iph6))) {
            return TC_ACT_SHOT;
        }
        protocol = iph6->nexthdr;
        offset = offset + 40;
        mss_value = pppoe_mtu - 40 - 20;
    } else {
        struct iphdr *iph;
        if (VALIDATE_READ_DATA(skb, &iph, offset, sizeof(*iph))) {
            return TC_ACT_SHOT;
        }
        protocol = iph->protocol;
        offset = offset + (iph->ihl * 4);
        mss_value = pppoe_mtu - (iph->ihl * 4) - 20;
    }

    if (protocol == IPPROTO_TCP) {
        mss_clamp(skb, offset, mss_value);
    }

    u16 l2_proto = bpf_htons(0x8864);
    bpf_skb_store_bytes(skb, 12, &l2_proto, sizeof(u16), 0);

    int result = bpf_skb_adjust_room(skb, 8, BPF_ADJ_ROOM_MAC, adj_room_flag);
    if (result) {
        bpf_log_info("egress adjust room error %d", result);
        return TC_ACT_SHOT;
    }

    struct pppoe_header pppoe = {
        .version_and_type = 0x11,
        .code = 0x00,
        .session_id = bpf_htons(session_id),
        .length = bpf_htons(pkt_sz + 2),
        .protocol = ppp_proto,
    };

    bpf_skb_store_bytes(skb, sizeof(struct ethhdr), &pppoe, sizeof(struct pppoe_header), 0);
    return TC_ACT_PIPE;
#undef BPF_LOG_TOPIC
}
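For reference, struct pppoe_header is not shown in the code above. The sketch below is a guess at its layout, reconstructed from the field names used in the initializer and from the 8 bytes passed to bpf_skb_adjust_room (6 bytes of PPPoE header plus the 2-byte PPP protocol field); the real definition may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout, not the actual definition from the project. */
struct pppoe_header {
    uint8_t  version_and_type; /* 0x11: version 1, type 1 */
    uint8_t  code;             /* 0x00 for session data */
    uint16_t session_id;       /* network byte order */
    uint16_t length;           /* network byte order; PPP protocol field + IP packet */
    uint16_t protocol;         /* PPP protocol, network byte order */
} __attribute__((packed));

/* The room opened with bpf_skb_adjust_room(skb, 8, ...) must match this size. */
_Static_assert(sizeof(struct pppoe_header) == 8,
               "PPPoE header plus PPP protocol field must be 8 bytes");
```

Under this layout the length field sits at byte offset 4 of the inserted header, that is, at offset 14 + 4 = 18 from the start of the frame.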

Any suggestions for handling this properly with tc in such a case? Thank you so much for your help!

Best regards