one byte == one character... right?

Thread: one byte == one character... right?

  1. #1
    AO übergeek phishphreek's Avatar
    Join Date
    Jan 2002
    Posts
    4,325

    one byte == one character... right?

    A stupid question here. I've been in the analyzing mode all day... don't know why.

    Anyway...

    I've always been taught/told that one character is equal to one byte.

    Bits in strings of eight are called bytes, and one byte usually represents a single character of data in the computer. It's a little used term, but you might be interested in knowing that a nibble is half a byte (usually 4 bits).
    source

    Ok, I understand that.

    What I don't get.

    Say I create a new text document using notepad. I name the file test.txt and save it with no contents. Since the filename has 8 characters, shouldn't it already have a file size of 8 bytes?

    This is not the case... if the document has no contents then it is 0 bytes. I'm assuming because the document contains no data within the file. But, wouldn't it still require 8 bytes *somewhere* to store the filename? Not to mention *where* to find the file... on the hard drive. (pointer to the sector(s) )

    Why is the space used for filenames excluded?

    Sorry, I know... it's stupid. I've just been over analyzing almost everything I see today.

    I first noticed it when I tried to tftp a document named test.txt to my router, only to be told that it was 0 bytes and couldn't be copied. Then I opened the doc and typed the word "test", and it transferred 4 bytes. Not 12 bytes (the 4 within the doc and the 8 I used for the filename).
    Quitmzilla is a Firefox extension that gives you stats on how long you have quit smoking, how much money you've saved, how much you haven't smoked and recent milestones. Very helpful for people who quit smoking and used to smoke at their computers... Helps out with the urges.

  2. #2
    ********** |ceWriterguy
    Join Date
    Aug 2004
    Posts
    1,608
    It comes from the days of DOS and its naming conventions. Back in the day when size really mattered a lot (remember 1k RAM machines with a 50k hard drive?) there was a maximum of 8 characters allowed for filenames. DOS took this into account and added that (max) 8 byte size to the filesize.

    Nowadays, when it doesn't matter if you have a monster filename with spaces in it or not, Windows did away with adding the byte usage to your filesize. Yes, your empty file has size, usually bigger than the number of bytes in the filename (the extension adds size too!), but it's no longer added. Comes in handy when you're bragging to your buds about how big a prog you just wrote, and is totally moot in this day of ubergig hard drives.

    More on bytes - the truth of the matter is that a 'character' is actually 2 bytes, set up in tandem. Remember that although your machine understands only binary, your base assembler (your operating system) interprets hexadecimal done in strings. Output looks something like this:

    00000000-00000000 (which is a 'space' by the way - at least in ascii convention)

    The term 'nibble' was a coined joke amongst coders way back in the day. It's never used because there's no practical usage for half a byte (yet). It helps in remembering to keep your bits and bytes in place though.

    Funstuff: If you wanna see the 8 character naming convention in action, download DOSBox sometime (you can find it easily on Google). Instead of mounting to your old dosgames file, mount to root and watch the filename changes... 'master of magic' becomes master~1, 'word perfect' becomes word~1, etc.
    Even a broken watch is correct twice a day.

    Which coder said that nobody could outcode Microsoft in their own OS? Write a bit and make a fortune!

  3. #3
    Senior Member
    Join Date
    Dec 2004
    Posts
    320
    Isn't the filename actually stored in the FAT? Also, what about terminator strings at the end of the file?

    From how I understand it, the allocation tables are stored in a specific place on the HDD. Don't they contain the actual filename? The space on the disk is just where the data is stored; it doesn't contain the actual filename, just the file. The allocation tables are where the name and location of the data are stored. (That is how I understand it; please correct me if I am mistaken.)

    Also, is the terminator included in the filesize? (Usually 00, if memory serves me.)
    The fool doth think he is wise, but the wiseman knows himself to be a fool - Good Ole Bill Shakespeare

  4. #4
    Senior Member
    Join Date
    Jan 2003
    Posts
    3,914
    Hey Hey,

    |3lack|ce: I have a few problems with your explanation, out of curiosity do you have a source? Partially because Linux will also create a file (with a filename) of 0 bytes.

    My understanding has always been that it has more to do with filesystems than the file itself. The name is really just a logical pointer to that file's location on the disk. When your computer is counting the size of the file, it starts with the first character of the file and moves to the end; that's why the name is not included in the file size. If you were to look in the Master File Table, you'd see the entry with the filename and the disk location where the information is stored. That's why deleting something doesn't remove the file: the data is left in place on the drive and the filename is "corrupted".

    You can find more information on NTFS and the MFT @ http://www.pcguide.com/ref/hdd/file/ntfs/arch.htm

    The other problem I have is the "truth" that a character is 2 bytes. This is only true of languages that use UTF-16 or something similar to it internally, e.g. C#, Java, etc. ASCII and EBCDIC use 8 bits or less per character (ASCII is only 7 bits), and UTF-8 uses a single 8-bit byte for every ASCII character, growing to as many as 4 bytes only for characters outside that range. You could just as well have said that a character is 4 bytes because UTF-32 has also been defined; this also isn't correct.
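
    A quick way to check the byte counts for yourself, as a minimal Python sketch. The helper name encoded_size is made up for illustration, and the "-le" codec variants are used only so that Python does not prepend a byte order mark to the result:

```python
# Count how many bytes a single character occupies under different encodings.

def encoded_size(text: str, encoding: str) -> int:
    """Return the number of bytes `text` needs when stored in `encoding`."""
    return len(text.encode(encoding))


# "A" is plain ASCII; U+00E9 (e with acute) and U+20AC (euro sign) are not.
for ch in ("A", "\u00e9", "\u20ac"):
    sizes = {
        enc: encoded_size(ch, enc)
        for enc in ("utf-8", "utf-16-le", "utf-32-le")
    }
    print(repr(ch), sizes)

# The code point for "A" is 65, which fits in 7 bits.
print(ord("A").bit_length())
```

    So "one byte per character" holds for ASCII text, but the answer depends entirely on which encoding the bytes were written in.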

    You can find more on UTF @ http://www.tbray.org/ongoing/When/200x/2003/04/26/UTF


    I've also always found it interesting that they used the words bit, crumb/tayste, nibble/nybble, byte, playte, dynner... someone was hungry when they were defining bit definitions.

    You can find more on bit definitions @ http://www.allaboutcircuits.com/vol_4/chpt_2/6.html

    phish: next time you do a tftp with the 4 byte file, run a sniffer. You'll most likely see the file name as one of the options, or something previously defined and sent in a separate packet
    Enter IP Address: <you supply the IP>
    Enter Source File Name: <you supply the filename>
    Enter the Destination File Name: <you supply the filename>

    So there's no need to ever transfer the filename.. it's not part of the file...
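
    For what it's worth, here is roughly what a sniffer would show, sketched in Python from the packet layouts in RFC 1350. The function names build_wrq and build_data are hypothetical, nothing is sent over the network, and a real router may negotiate extra options that are not shown here:

```python
import struct

OP_WRQ = 2   # write request opcode (RFC 1350)
OP_DATA = 3  # data opcode (RFC 1350)


def build_wrq(filename: str, mode: str = "octet") -> bytes:
    """Write request: opcode, filename, NUL, transfer mode, NUL."""
    return (
        struct.pack("!H", OP_WRQ)
        + filename.encode("ascii") + b"\x00"
        + mode.encode("ascii") + b"\x00"
    )


def build_data(block: int, payload: bytes) -> bytes:
    """Data packet: opcode, block number, then up to 512 bytes of file data."""
    return struct.pack("!HH", OP_DATA, block) + payload


wrq = build_wrq("test.txt")
data = build_data(1, b"test")

print(wrq)            # the filename travels in the request packet
print(data)           # the data packet carries only the file contents
print(len(data) - 4)  # payload size once the 4-byte header is removed
```

    The filename shows up once, in the request, and the file's 4 bytes go in a data packet of their own.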

    Peace,
    HT
    IT Blog: .:Computer Defense:.
    PnCHd (Pronounced Pinched): Acronym - Point 'n Click Hacked. As in: "That website was pinched" or "The skiddie pinched my computer because I forgot to patch".

  5. #5
    Senior Member
    Join Date
    May 2004
    Posts
    274
    Originally posted here by HTRegz
    If you were to look in the Master File Table, you'd see the entry with the filename and the disk location where the information is stored. That's why deleting something doesn't remove the file, the data is left in place on the drive and the filename is "corrupted".
    Is it possible to see the MFT and other stuff using any software?

    Thanks
    Excuse me, is there an airport nearby large enough for a private jet to land?

  6. #6
    Senior Member
    Join Date
    Jan 2003
    Posts
    3,914
    Hey Hey,

    The only thing that I've ever played with is Hackman Hex Editor which allows you to open any physical disk attached to the computer (http://www.technologismiki.com/en/index-h.html)

    However, a little bit of searching has found Disk Editor for FAT and NTFS which seems to allow you to view the MFT (http://www.runtime.org/diskexpl.htm)

    Directory Snoop is another one that will let you browse and walk through the MFT; it will show you which files are deleted and allow you to undelete them or completely wipe them (http://www.briggsoft.com/dsnoop.htm).

    Hackman is freeware, while Disk Editor and Directory Snoop both have trial versions available on their respective websites.

    Peace,
    HT

  7. #7
    AO Curmudgeon rcgreen's Avatar
    Join Date
    Nov 2001
    Posts
    2,716
    The filename isn't part of the file. It is stored one place on the drive,
    and the file's data is stored in another place, so the length of the
    name doesn't count toward the file length.
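
    The experiment from the first post can be repeated in a few lines of Python, as a sketch under the assumption that a temporary directory is an acceptable stand-in for wherever Notepad saved the file:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.txt")

    # Create the file with no contents, like saving an empty document.
    open(path, "wb").close()
    empty_size = os.path.getsize(path)

    # Now store the four characters "test" in it.
    with open(path, "wb") as f:
        f.write(b"test")
    written_size = os.path.getsize(path)

    # Read the raw bytes back: no filename and no terminator byte in there.
    with open(path, "rb") as f:
        contents = f.read()

    # The name is kept by the directory, not by the file's data.
    listing = os.listdir(tmp)

print(empty_size, written_size, contents, listing)
```

    The reported size only ever counts the bytes written into the file; the 8-character name is found by listing the directory instead.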
    I came in to the world with nothing. I still have most of it.

  8. #8
    Jaded Network Admin nebulus200's Avatar
    Join Date
    Jun 2002
    Posts
    1,356
    Originally posted here by mmkhan
    Is it possible to see the MFT and other stuff using any software?

    Thanks
    Other stuff would also be the MAC info (modified, accessed, created) and, if the file is small enough, the file itself. That is why the MFT is very useful for forensics... For SNG, get dd and image a small drive. You can then use a tool like Autopsy to load the image and actually browse around through the filesystem and, if I remember right, graphically see how the disk logically maps out from the MFT...
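
    You don't need a disk editor just to peek at the timestamps, since the operating system will hand them over through the normal stat call. A minimal Python sketch follows; the helper as_utc is made up for illustration, and note that st_ctime means different things on different platforms (creation time on Windows, last metadata change on Unix-like systems):

```python
import os
import tempfile
from datetime import datetime, timezone


def as_utc(timestamp: float) -> str:
    """Format a POSIX timestamp as an ISO 8601 string in UTC."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).isoformat()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.txt")
    with open(path, "wb") as f:
        f.write(b"test")

    # os.stat reads filesystem metadata; it never opens the file's contents.
    info = os.stat(path)

print("size:    ", info.st_size)
print("modified:", as_utc(info.st_mtime))
print("accessed:", as_utc(info.st_atime))
print("st_ctime:", as_utc(info.st_ctime))  # platform-dependent meaning
```

    All of these values live in the filesystem's own records next to the filename, which is why none of them count toward the 4 bytes of file data.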
    There is only one constant, one universal, it is the only real truth: causality. Action. Reaction. Cause and effect...There is no escape from it, we are forever slaves to it. Our only hope, our only peace is to understand it, to understand the 'why'. 'Why' is what separates us from them, you from me. 'Why' is the only real social power, without it you are powerless.

    (Merovingian - Matrix Reloaded)

  9. #9
    Member
    Join Date
    Dec 2003
    Posts
    99
    This is not the case... if the document has no contents then it is 0 bytes. I'm assuming because the document contains no data within the file. But, wouldn't it still require 8 bytes *somewhere* to store the filename? Not to mention *where* to find the file... on the hard drive. (pointer to the sector(s) )
    There is a good paper,
    http://www.phrack.org/phrack/59/p59-0x06.txt <--there.
    It focusses on ext2 primarily but gives a good grasp of how metadata works (plus some more).

    In the hope it helps,

    don_leo
