uni
The world Wide Web (WWW) is only a part of the internet.
Before the internet, in a call, the caller and the receiver where physically connected by an operator, doing a so called circuit switching. By utilising circuit switching it is difficult to have multiple ongoing connections and you waste bandwidth since even silence si communicated.
The first network was ARPANET, in the 1960s, which used instead packet switching. The original message gets broken into numbered packets which are then reconstructed in order at the destination address.
This way is more robust and more efficient compared to circuit switching, because it doesn’t rely on only one link and a single link can have multiple connections.
In 1981 we began adopting the TCP/IP (transmission control protocol / internet protocol) communication model.
The invention of WWW is attributed to Sir Tim Berners-Lee, who, along with Robert Cailliau, published a proposal in 1990 for a hypertext system.
These are the fundamentals features of the web:
- an URL to identify a resource
- the HTTP protocol to define how requests and responses operate
- a Web Server Software that can respond to HTTP requests
- HTML to publish documents
- a Browser to make HTTPS requests from URLS and to display the HTML it receives.
Web Apps
Web apps are easier to maintain since a web application is accessible by any machine connected to the internet with a browser so you only need to update the server side application. This also means centralized storage.
The disadvantages are:
- you need an internet connection
- security concerns
- storage and licensing
- compatibility issues
- limited access to operating system
Intranet
This is a network limited to a company or else. Because of intranets being private search engines have limited access to the data within. Agents from outside could still be able to reach this data through a Firewall.
Static Web Page
These only consist of HTML, they only visualize information.
Dynamic Web Sites
A user requests to see a certain application, the server runs it and sends the HTML output to the client, being then displayed by the browser.
Web 2.0
This refers to an interactive experience where users can contribute and consume web content. This requires JavaScript, a programming language used to code client-side applications, run on the client browser.
The four layer Network Model
- application layer: http, ftp, pop, etc
- transport layer: tcp, udp
- internet layer: IPv4, IPv6
- link layer: MAC
Every layer relies on the ones under it.
IP Addresses
We are finishing the possible 4.2 billion addresses granted by the IPv4 Protocol, so we are slowly moving to IPv6.
Your IP address will be generally assigned to you by your Internet Service Provider (ISP).
In a local network, computers share a single IP address utilizing NAT.
Transport Layer
This ensures that the communication is effective and makes so that the connection is between processes, not Hosts. In the TCP protocol an ACK packet is sent back to sender with every received package, confirming the arrival of the original message.
In case of a missing package, the TCP makes sure to send it again, until confirmation of arrival.
Client-Server Model
There are 2 actors:
- Server: computer agent normally active 24/7, listening for queries by clients
- Client: this initiates a request to a server and gets a response
Peer to Peer Model
Other model where every agent is at the same level, functionally identical, each node is able to communicate with one another. Commonly used to illegally share data and with a low cost of maintenance by the developer.
Server Types
Often a Server is a network of computer, seen from the exterior as a single one. This can be because of separation of roles (web server, email server, data server, application server) or because of scale and amount of requests, in this case a load balancer (a special router) distributes the load between different computers. You can use a server farm even only for failover redundancy and not strictly for scale.
Data Center
These are the structures where server farms are hosted.
For preventing downtimes most large web sites will exist in mirrored data centers in different parts of the country or even the world.
NAT
Network Address Translation. It is operated by a Router, it translates the IPaddress used privately inside the home net, in the router’s IPaddress followed by the port used by the process.
The WAN side address is the public IP, the LAN side address is the private IP used inside the home net.
The router translates the LAN S A and the port, in the WAN S A and the port, while sending and receiving.
Autonomous System
Every major entity has it’s autonomous system and when needed they communicate with eachother. When trying to reach a certain system you may traverse a variety of others, needed as infrastructure for arriving to your destination.
These communications between systems are operated by Internet Exchange Points (IXP). There are circa 20 so called Tier 1 IXPs which are enormous and free and are able to reach every autonomous system.
Large websites now not only connect to IXPs but now also are connected to multiple other networks as a way of improving the performance of their sites, since physical distance is a problem (latency).
Sometimes companies mirror data centers in data centers located near IXPs.
Domain Name System (DNS)
The user requests to the DNS the IP address of a said website, the DNS provides it and the machine accesses the server of the website.
Domain Levels
server1.www.website.com
- server1 = fourth level domain
- www = third level domain
- website = second level domain (SLD)
- com = top level domain (TLD)
with the last being the more specific.
Generic top-level domain (gTLD):
- unrestriced: com, net, org, info
- sponsored: gov, mil, edu, and other
- new TLDs
Country code top-level domain (ccTLD) - us, ca, uk, au.
- internationalized domain names.
Name Registration
Domain names are assigned by special organizations called domain name registrars. These companies have been given the permission to do so by the appropriate gTLD and/or cc(TLD)
Domain Name address resolution process
If the resolution is already in the computer’s cache (which is not browser specific but can instead be used by every process) it simply resolves with it, otherwise the request is sent to the primary DNS server, which can give the answer or forward it to higher level DNSs.
URL
There needs to be a consistent structure to websites so that users can easily access the contents, the web this is the Uniform Resource Locator, URL.
http://www.website.com/index.php?page=17#article
- http = protocol, based on the protocol the request is sent on a specific port (80 for http)
- www.website.com = domain
- index.php = path
- page=17 = query string
- article = fragment
Query String
HTTP
The Hypertext Transfer Protocol estabilishes a TCP connection on port 80 (by default). There are ports for multiplexing and demultiplexing. To a request, the server responds with a response code, headers and an optional message, which can include file content.
The most common http requests are GET and POST.
Request specifies:
- client browser, OPsystem
- accepted formats
- accepted language
- accepted encoding
- to keep alice TCP connection
- cache control
Answer specifies: - timestamp
- server type
- content enconding, lenghth and type
- Content (HTML, CSS, JS)
Response Codes
- 2## codes are for successful responses
- 3## redirection-related responses
- 4## client errors
- 5## server errors
Web Servers
A web server is fundamentally a computer that responds to http requests. Regardless of the physical aspects of a particular server, one must run a Software Stack.