C undefined input string length program

rastignac@programming.dev · 1 year ago


adriator@lemmy.world · 1 year ago

It’s due to the way getchar() and console input work. When you enter “abcdCTRL+D” on the keyboard, here’s what happens:

abcd characters are added to the stdin buffer and consumed by getchar()
CTRL+D (the EOF key on Unix; CTRL+Z on Windows) is left in the input buffer (it’s not yet consumed by getchar()!)
The console is waiting for more input, and the next time you press ENTER or CTRL+D the previous one will be consumed by getchar()

Think about this scenario: What happens if you only enter “abcd” and don’t press anything else? The program will still be waiting for more input. It needs to receive a signal telling it to stop reading input and proceed with the code execution. But if you press enter, it won’t automatically add a new line to the string, because the new line character is still in the input buffer.

RangerHere@programming.dev · 1 year ago

This guy is right. I saw OP’s post but did not have enough time to reply then. I came back to answer it, but you had already given the right answer.

rastignac@programming.dev · 1 year ago

Thanks a lot for your answer. This might be because of my shallow understanding of how a buffer works, but I don’t understand why EOF isn’t consumed by getchar() when the other bytes are consumed. Isn’t a char just a number, and EOF too (-1 I think)? I should probably try to understand buffers better.

offbyone@reddthat.com · 1 year ago

If you’re on Linux then I’m pretty sure the confusing behavior you’re seeing is due to the line buffering the kernel does by default. Ctrl+D does not actually mean “send EOF”, and it’s not the “EOF character”, rather it means “complete the current or next stdin read() request immediately”. That’s a very different thing, and sometimes it means EOF and other times it does not.

In practice what this means is that, if there is no data waiting to be sent on stdin then read() returns zero, and read() returning zero is how getchar() knows an EOF happened. The flow looks like this:

Your program calls getchar().
getchar() calls read() on stdin and your program blocks waiting for input.
The user presses Ctrl+D on the tty, having not typed anything else.
The kernel immediately ends the blocked read() call and returns zero bytes read.
getchar() sees that it got no bytes from read() and returns EOF.
Your program sees that and exits the loop.
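
As a rough sketch of the loop those steps describe (count_until_eof is a made-up name for illustration, not OP's code): the result of getc() is kept in an int, because EOF is a special return value meaning "the read came back empty", not a byte stored in the stream.

```c
#include <stdio.h>

/*
 * Hypothetical helper: counts the bytes that can be read from `in`
 * before getc() reports end of file.
 */
size_t count_until_eof(FILE *in)
{
    size_t n = 0;
    int c;  /* int, not char: it must hold every byte value plus EOF */

    while ((c = getc(in)) != EOF) {
        n++;
    }
    return n;
}
```

Calling count_until_eof(stdin) from main gives the same loop as one built on getchar(), since getchar() reads from stdin in the same way.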

However, in practice it doesn’t work that cleanly because the tty is normally operating in “cooked” mode, where the kernel sends input to your program line by line, allowing the user to edit a single line before sending it. The way this works is by buffering the stdin contents and sending it when the user hits enter. Going back to Ctrl+D, you can see how this screws things up, leading to the behavior you see:

Your program calls getchar().
getchar() calls read() on stdin and your program blocks waiting for input.
The user types some input, but does not hit enter. This data sits in the kernel’s stdin buffer and is not sent to your program yet.
The user presses Ctrl+D on the tty.
The kernel immediately ends the blocked read() call and starts returning the currently buffered stdin input, without waiting for an enter press.
getchar() sees that it got a byte from read() and thus returns it.
Your program starts getting all the previously buffered bytes and keeps running until getchar() has seen all of them.
getchar() calls read() on stdin. There are now no bytes in the buffer, so you block waiting for input, the same as before. The previous Ctrl+D was already “used up” to end the previous read() call, so it doesn’t matter any more.
The user types Ctrl+D.
Because there is currently no input in the line buffer, read() returns zero. getchar() sees this and returns EOF.

In the above case Ctrl+D doesn’t work as expected because of the line buffering. The read() call ended early without waiting for enter, but your program just starts receiving all the buffered input, so it has no idea you pressed Ctrl+D and never gets the read() == 0 EOF condition. Additionally, Ctrl+D is a one-time deal: it ends one read() call early and sends the buffered input. When you call read() again with nothing to send, it just blocks, and you have to press Ctrl+D again to actually get read() to return zero.
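
If you want to watch this happen, one option is to skip getchar() and log what read() itself returns. This is only a sketch for a POSIX system; trace_reads is a hypothetical helper name and the 4096-byte buffer size is an arbitrary choice.

```c
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: calls read() on `fd` until it returns 0
 * (end of file) or -1 (error), logging every successful return
 * value to `log`.
 * Returns the total number of bytes read, or -1 on error.
 */
long trace_reads(int fd, FILE *log)
{
    char buf[4096];
    long total = 0;

    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n < 0) {
            perror("read");
            return -1;
        }
        fprintf(log, "read() returned %ld\n", (long)n);
        if (n == 0) {
            return total;  /* zero bytes: this is the EOF condition */
        }
        total += n;
    }
}
```

Called as trace_reads(STDIN_FILENO, stderr) from main on a Linux terminal in its default cooked mode, typing abcd followed by Ctrl+D should log a return value of 4, and only a second Ctrl+D should log 0.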

You can see the line buffering behavior if you add a putchar() inside your loop. The putchar() doesn’t actually print while you type the characters, it only prints after you hit either enter or Ctrl+D, showing that your program did not receive any of the characters until one of those two actions happened.
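
A minimal sketch of that experiment, assuming a helper I am calling echo_chars (a hypothetical name). It flushes after every character so that stdio's own output buffering cannot be mistaken for the terminal's line buffering.

```c
#include <stdio.h>

/*
 * Hypothetical helper: copies `in` to `out` one character at a time
 * and returns how many characters were copied.
 */
size_t echo_chars(FILE *in, FILE *out)
{
    size_t n = 0;
    int c;

    while ((c = getc(in)) != EOF) {
        putc(c, out);
        fflush(out);  /* write the character out right away */
        n++;
    }
    return n;
}
```

With echo_chars(stdin, stdout) in main, the echoed characters should still only show up after enter or Ctrl+D, even though the program flushes each one as soon as it receives it.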

rastignac@programming.dev · 1 year ago

Thanks a lot for the in depth explanation, this makes things a lot clearer. I’ll try ‘putchar()’ and test a few more things and then come back to read this post again

RangerHere@programming.dev · edit-2 1 year ago

Here are a couple of suggestions about how to improve your algorithm:

First of all, you should reduce the number of calls to the realloc function. Each call may have to copy the whole buffer, and from time to time it has to go to kernel space to get more memory. I think it is nice to allocate a single page, or a multiple of the page size, of virtual memory. I would allocate 4 KB or 2 MB at the beginning of the function, then grow by multiples of the page size whenever you need to reallocate.
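
Here is a rough sketch of that idea. The name read_all_growing is hypothetical, 4096 is an assumed page size (it is not queried from the OS), and doubling the capacity is my own choice; growing by a fixed number of pages would fit the advice just as well.

```c
#include <stdio.h>
#include <stdlib.h>

#define INITIAL_CAPACITY 4096  /* assumed page size, not queried from the OS */

/*
 * Hypothetical helper: reads all of `in` into a heap buffer, doubling
 * the capacity whenever it fills up so realloc() is called rarely.
 * Returns a NUL-terminated buffer the caller must free(), or NULL if
 * an allocation fails. The length is stored in *len_out.
 */
char *read_all_growing(FILE *in, size_t *len_out)
{
    size_t cap = INITIAL_CAPACITY;
    size_t len = 0;
    char *buf = malloc(cap);
    int c;

    if (buf == NULL) {
        return NULL;
    }
    while ((c = getc(in)) != EOF) {
        if (len + 1 == cap) {  /* keep one byte free for the '\0' */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
        buf[len++] = (char)c;
    }
    buf[len] = '\0';
    *len_out = len;
    return buf;
}
```

With doubling, a 1 MB input needs about eight realloc() calls instead of one per character.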

Second of all, reading the input one character at a time is also time-consuming. Repeatedly calling getchar() means you will end up going to kernel space, grabbing a single character from there, then coming back to user space (I used this as an example; there are many buffers between your application and kernel space). Instead I would suggest reading 4 KB at a time using the read or fread functions.
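
Something along these lines, as a sketch only: read_all_chunked and CHUNK_SIZE are names I made up, and the doubling growth policy is an assumption rather than anything required by fread.

```c
#include <stdio.h>
#include <stdlib.h>

#define CHUNK_SIZE 4096  /* bytes requested per fread() call */

/*
 * Hypothetical helper: reads all of `in` in CHUNK_SIZE pieces.
 * Returns a NUL-terminated buffer the caller must free(), or NULL on
 * allocation failure or read error. The length is stored in *len_out.
 */
char *read_all_chunked(FILE *in, size_t *len_out)
{
    size_t cap = 2 * CHUNK_SIZE;
    size_t len = 0;
    char *buf = malloc(cap);

    if (buf == NULL) {
        return NULL;
    }
    for (;;) {
        /* Ensure room for one more chunk plus the terminating '\0'. */
        if (cap - len < CHUNK_SIZE + 1) {
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
        size_t got = fread(buf + len, 1, CHUNK_SIZE, in);
        len += got;
        if (got < CHUNK_SIZE) {
            break;  /* short count: end of file or read error */
        }
    }
    if (ferror(in)) {
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    *len_out = len;
    return buf;
}
```

One thing to keep in mind on a terminal: fread() keeps asking for more input until the chunk is full or end of file is reached, so it does not return as soon as the first line arrives.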

If you do not know about files, caching, virtual memory, page sizes, kernel space, user space, and optimization, then please disregard everything I said. It will only confuse you for now. I know it is a lot of fun to start thinking about optimization when you are learning a new programming language. On the other hand, as mathematician and computer scientist Donald Knuth said, premature optimization is the root of all evil.

I hope this answer helps you.

rastignac@programming.dev · 1 year ago

Thank you, I realize that there’s a whole other aspect I didn’t even consider. I’m new to C and Linux so I’ll follow your advice but it’s making me want to learn more. Thanks again to both you and @adriator for your answers