
recv() with large buffer size

Posted: Wed Jun 30, 2021 5:50 am
by staphen
I'm trying to implement multiplayer for Switch in DevilutionX, but I don't actually have a Switch console I can use to run homebrew applications. I've been doing my testing with Ryujinx and have been asking members of the DevilutionX community to do testing on the Switch console.

When I was testing with Ryujinx, I encountered an error when attempting to receive data on a socket. I initially assumed this was a Ryujinx issue and used a workaround to make the game work for my testing. However, it seems that the same error occurs when testing on the Switch console as well. You can see more details in an issue I opened on the Ryujinx GitHub issue tracker.
https://github.com/Ryujinx/Ryujinx/issues/2393

I've been digging through IPC marshalling documentation and libnx code to try to understand what a 0x22 buffer is and why this failure is occurring. Honestly, based on my limited understanding of the IPC marshalling, I wasn't able to identify any obvious issues in libnx. However, I did learn that the code in DevilutionX uses a buffer large enough that the HIPC auto-selection attempts to use buffer descriptor B instead of buffer descriptor C for these requests.

I started experimenting with different buffer sizes in a simple test program by merging the nxlink example with code from https://www.thegeekstuff.com/2011/12/c- ... ogramming/. It seems that the recv() function fails in Ryujinx for buffer sizes greater than 32768 bytes. Given that I'm not able to test this against the Switch console, I was a bit hesitant to post it as a libnx issue. I was wondering if anyone here had any more information about why the recv() function fails for large buffer sizes.
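
For anyone who wants to check whether the failure depends only on the length passed to recv(), here is a minimal sketch of a wrapper that caps each request. The names recv_capped and MAX_RECV_CHUNK are made up for illustration, and the 32768 value is just the threshold observed in Ryujinx, not a documented limit.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical cap, taken from the threshold observed in Ryujinx. */
#define MAX_RECV_CHUNK 32768

/* Same contract as recv(), but never requests more than MAX_RECV_CHUNK
 * bytes in a single call, however large the caller's buffer is. */
ssize_t recv_capped(int sockfd, void *buf, size_t len, int flags)
{
    if (len > MAX_RECV_CHUNK)
        len = MAX_RECV_CHUNK;
    return recv(sockfd, buf, len, flags);
}
```

Since recv() on a stream socket may return fewer bytes than requested anyway, a caller that already loops on the return value should not need any other change.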

I've attached the source code for the test program. The server code runs on Linux, and the client runs on the Switch. I ran the server and Ryujinx on the same system so the IP address is hardcoded in client.c to 127.0.0.1.

Re: recv() with large buffer size

Posted: Wed Jun 30, 2021 10:07 pm
by staphen
I heard back from one of our testers earlier today, and it seems that lowering the buffer size did not fix the issue on the Switch console. This suggests that perhaps the issue on the Switch console is not related to the issues I had in Ryujinx. You can see more details in the following pull request.
https://github.com/diasurgical/devilutionX/pull/2268

I feel this discovery makes good sense, but it also leaves me with a conundrum. I'm no closer to solving the issue our tester is having on the Switch console, and I'm afraid I may have gone as far as I can using Ryujinx to troubleshoot. At this point, I'd say any advice you can give would be helpful.

Re: recv() with large buffer size

Posted: Thu Jul 01, 2021 12:20 am
by WinterMute
The code you've attached here essentially works on hw for me. You're leaking sockets though so it fails after a few connections. You're also memsetting to '0' rather than 0 when clearing structs.

Code:

#include <string.h>
#include <stdio.h>
#include <sys/socket.h>
#include <arpa/inet.h>
#include <sys/errno.h>
#include <unistd.h>

#include <switch.h>

int runClient(int argc, char *argv[]);

int main(int argc, char **argv)
{
    //consoleDebugInit(debugDevice_SVC);
    //stdout = stderr;

    consoleInit(NULL);

    // Configure our supported input layout: a single player with standard controller styles
    padConfigureInput(1, HidNpadStyleSet_NpadStandard);

    // Initialize the default gamepad (which reads handheld mode inputs as well as the first connected controller)
    PadState pad;
    padInitializeDefault(&pad);

    // Initialise sockets
    socketInitializeDefault();

    printf("Hello World!\n");

    // Display arguments sent from nxlink
    printf("%d arguments\n", argc);

    for (int i=0; i<argc; i++) {
        printf("argv[%d] = %s\n", i, argv[i]);
    }


    // the host ip where nxlink was launched
    printf("nxlink host is %s\n", inet_ntoa(__nxlink_host));

    // redirect stdout & stderr over network to nxlink
    nxlinkStdio();

    // this text should display on nxlink host
    printf("printf output now goes to nxlink server\n");

    // Main loop
    while(appletMainLoop())
    {
        // Scan the gamepad. This should be done once for each frame
        padUpdate(&pad);

        // Your code goes here
        char *clientArgv[] = { "run-client", inet_ntoa(__nxlink_host) };
        runClient(2, clientArgv);

        // padGetButtonsDown returns the set of buttons that have been newly pressed in this frame compared to the previous one
        u32 kDown = padGetButtonsDown(&pad);

        if (kDown & HidNpadButton_Plus) break; // break in order to return to hbmenu

        if (kDown & HidNpadButton_A) {
            printf("A Pressed\n");
        }
        if (kDown & HidNpadButton_B) {
            printf("B Pressed\n");
        }

        consoleUpdate(NULL);
    }

    socketExit();
    consoleExit(NULL);
    return 0;
}

int runClient(int argc, char *argv[])
{
    int sockfd = 0, n = 0;
    char recvBuff[0xFFFF];
    struct sockaddr_in serv_addr;

    if(argc != 2)
    {
        printf("\n Usage: %s <ip of server> \n",argv[0]);
        return 1;
    }

    memset(recvBuff, 0 ,sizeof(recvBuff));
    if((sockfd = socket(AF_INET, SOCK_STREAM, 0)) < 0)
    {
        printf("\n Error : Could not create socket \n");
        return 1;
    }

    memset(&serv_addr, 0 , sizeof(serv_addr));

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_port = htons(5000);

    if(inet_pton(AF_INET, argv[1], &serv_addr.sin_addr)<=0)
    {
        printf("\n inet_pton error occurred\n");
        close(sockfd);
        return 1;
    }

    if( connect(sockfd, (struct sockaddr *)&serv_addr, sizeof(serv_addr)) < 0)
    {
       printf("\n Error : Connect Failed \n");
       close(sockfd);
       return 1;
    }

    while ( (n = recv(sockfd, recvBuff, sizeof(recvBuff)-1, 0)) > 0)
    {
        recvBuff[n] = 0;
        if(fputs(recvBuff, stdout) == EOF)
        {
            printf("\n Error : Fputs error\n");
        }
    }

    if(n < 0)
    {
        printf("\n Read error \n");
    }

    close(sockfd);

    return 0;
}
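
To illustrate the memset point, here is a small sketch of the difference between filling with the character '0' and the integer 0. The helper names are hypothetical and only exist for this demonstration.

```c
#include <stddef.h>
#include <string.h>
#include <netinet/in.h>

/* Returns 1 if every byte of the address struct is zero, 0 otherwise. */
int sockaddr_is_zeroed(const struct sockaddr_in *addr)
{
    const unsigned char *p = (const unsigned char *)addr;
    for (size_t i = 0; i < sizeof(*addr); i++) {
        if (p[i] != 0)
            return 0;
    }
    return 1;
}

/* Fills the struct with the character '0' (0x30 in ASCII), which leaves
 * every byte non-zero. */
void clear_with_char_zero(struct sockaddr_in *addr)
{
    memset(addr, '0', sizeof(*addr));
}

/* Fills the struct with the integer 0, so every byte really is zero. */
void clear_with_int_zero(struct sockaddr_in *addr)
{
    memset(addr, 0, sizeof(*addr));
}
```

After clear_with_char_zero() any field that is not explicitly assigned afterwards still holds 0x30 bytes, which is why the struct should be cleared with 0.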

Re: recv() with large buffer size

Posted: Thu Jul 01, 2021 5:16 am
by staphen
Thanks. We are using the ASIO library in DevilutionX so we shouldn't be leaking sockets or using memset incorrectly there, but I very much appreciate the thorough review. Also, this does confirm the large buffer size issue must be isolated to Ryujinx. So perhaps the DevilutionX issue has something to do with how the socket gets set up.

I went ahead and added some trace logs in Ryujinx to see how the simple example differs from what DevilutionX is doing when it attempts to join a game. Here is the trace of the BSD socket calls leading up to the error. FYI, it calls recv() 250 times before failing with the message "Unable to connect".

Code:

00:00:44.315 |W| HLE.OsThread.6 ServiceBsd SocketInternal: socket(InterNetwork, Stream, Tcp)
00:00:44.315 |W| HLE.OsThread.6 ServiceBsd Connect: connect(7)
00:00:44.339 |W| HLE.OsThread.6 ServiceBsd SetSockOpt: setsockopt(7, Tcp, NoDelay)
00:00:44.355 |W| HLE.OsThread.6 ServiceBsd Fcntl: fcntl(7, 3, 0)
00:00:44.355 |W| HLE.OsThread.6 ServiceBsd Fcntl: fcntl(7, 4, 2048)
00:00:44.694 |W| HLE.OsThread.6 ServiceBsd Poll: poll(2, 0)
00:00:44.715 |W| HLE.OsThread.6 ServiceBsd Send: send(7, [None])
00:00:44.751 |W| HLE.OsThread.6 ServiceBsd Poll: poll(2, 0)
00:00:44.767 |W| HLE.OsThread.6 ServiceBsd Recv: recv(7, [None])
00:00:44.779 |W| HLE.OsThread.6 ServiceBsd Poll: poll(2, 0)
00:00:44.780 |W| HLE.OsThread.6 ServiceBsd Recv: recv(7, [None])
00:00:44.790 |W| HLE.OsThread.6 ServiceBsd Poll: poll(2, 0)
00:00:44.790 |W| HLE.OsThread.6 ServiceBsd Recv: recv(7, [None])
...

Based on this, it looks like these are the things we're doing differently from the example code.

1. Setting the TCP NoDelay option
2. Calling fcntl(s, F_GETFL, 0)
3. Calling fcntl(s, F_SETFL, O_EXCL)
4. Calling poll before each send/recv
5. Sending data on the socket before receiving

Digging into the ASIO source code, the fcntl() calls are probably the result of calling ioctl(s, FIONBIO, &arg) to clear the non-blocking flag. It looks like the ioctl() function in libnx calls fcntl() under the covers for the FIONBIO request.
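
For reference, the two fcntl() calls in the trace match the usual get-then-set pattern for changing a descriptor's status flags. This is a minimal sketch in plain POSIX terms. The function name set_nonblocking is hypothetical, and this is not the ASIO or libnx code. The numeric flag values are platform-specific, so the numbers in the trace may not match the constants in desktop headers.

```c
#include <fcntl.h>

/* Enables or disables O_NONBLOCK on fd using the F_GETFL / F_SETFL pair.
 * Returns 0 on success and -1 on failure (errno is set by fcntl). */
int set_nonblocking(int fd, int enable)
{
    /* Read the current status flags so the other bits are preserved. */
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;

    if (enable)
        flags |= O_NONBLOCK;
    else
        flags &= ~O_NONBLOCK;

    return fcntl(fd, F_SETFL, flags) < 0 ? -1 : 0;
}
```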

The poll() calls look a little weird. It says it's polling two sockets. I suspect the second socket comes from ASIO's "socket_select_interrupter" that creates a connection to a local listening socket for the purpose of interrupting blocked epoll/select system calls. So this is probably normal.
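
To reproduce the poll-before-recv pattern from the trace in the simple test program, something like the following could work. This is only a sketch: poll_then_recv and interrupt_fd are hypothetical names, the second descriptor just stands in for whatever ASIO's interrupter uses, and reporting "not ready" as EAGAIN is my own choice rather than anything ASIO does.

```c
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Polls the data socket together with a second descriptor, then calls
 * recv() only if the data socket reports activity.
 * Returns the recv() result, or -1 with errno set to EAGAIN when the
 * data socket was not ready within timeout_ms. */
ssize_t poll_then_recv(int sockfd, int interrupt_fd,
                       void *buf, size_t len, int timeout_ms)
{
    struct pollfd fds[2];

    fds[0].fd = sockfd;
    fds[0].events = POLLIN;
    fds[0].revents = 0;

    fds[1].fd = interrupt_fd;
    fds[1].events = POLLIN;
    fds[1].revents = 0;

    int ready = poll(fds, 2, timeout_ms);
    if (ready < 0)
        return -1; /* errno was set by poll() */

    /* Only the data socket decides whether recv() is attempted. */
    if (ready == 0 || !(fds[0].revents & (POLLIN | POLLHUP | POLLERR))) {
        errno = EAGAIN;
        return -1;
    }

    return recv(sockfd, buf, len, 0);
}
```

Calling it with timeout_ms set to 0 makes poll() return immediately, so the caller has to retry when it gets EAGAIN.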

I can start tweaking things for our tester to see if I can narrow it down from here. Do you see anything that looks like a red flag? If so, maybe I can start there.