The median is the value that is larger than 50% of the data in a dataset and smaller than the other 50%. In other words, it is the middle value. It is insensitive to aberrant values (outliers) but relatively difficult to handle mathematically.
For the dataset: 2, 6, 13, 17, 22, the median is 13.
For the dataset: 2, 6, 13, 17, 22, 900, the median is 15, the average of the two middle values 13 and 17. Note that the influence of the outlier 900 on the median is small.
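The two examples above can be checked with a few lines of Python. This is only an illustrative sketch using the standard library's statistics module; the variable names are made up for the example, and the mean is added purely for contrast with the median.

```python
from statistics import mean, median

small = [2, 6, 13, 17, 22]
with_outlier = small + [900]

# Odd number of values: the median is the single middle value.
median_small = median(small)
# Even number of values: the median is the average of the two middle values (13 and 17).
median_outlier = median(with_outlier)

# The mean, by contrast, is pulled strongly towards the outlier.
mean_small = mean(small)
mean_outlier = mean(with_outlier)

print("medians:", median_small, median_outlier)  # 13 and 15.0
print("means:  ", mean_small, mean_outlier)      # 12 and 160
```

Adding the value 900 moves the median only from 13 to 15, while the mean jumps from 12 to 160.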
The median is often used to summarize data with a skewed distribution, such as age and income. From the Hong Kong 2001 Census, the median age was 36 and the median monthly income was HK$10,000. That means half of the people in Hong Kong were younger than 36 and the other half were older. By the same token, half of the people in Hong Kong earned less than HK$10,000 per month.
Rodekamp et al. (2006) (J Perinat Med 34(6):490-6) examined the influence of breast milk from diabetic mothers (DBM) on speech development in the offspring of diabetic mothers. It was reported that the median times to onset of speaking in the DBM and non-DBM groups were 48 (range = 24-100) and 44 (range = 31-72) weeks, respectively. The median was likely used because the distribution of the onset time of speaking is skewed.
Consider the dataset: 2, 6, 13, 15, 15, 15, 17, 22. What is its median?
Let us look at the following frequency distribution of the data:
Value:      2   6   13   15   17   22
Frequency:  1   1    1    3    1    1
The median can be read off the frequency distribution above. Treat the value "15" as covering the interval from 14.5 to 15.5 and split its count (i.e., 3) into three equal thirds, each representing one observation. The first third then ends at 14.5 + 1/3 = 14.83. The values considered below 14.83 are "2", "6", "13" and the first third of "15", which make up a total of 4 values. Therefore, 4/8 = 50% of the values lie below 14.83, the median.
So, it's all about splitting the count of a particular value, and this is easier to do on a graph.
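The count-splitting idea can also be sketched in code. The function below is a hypothetical helper written for this example, not a method given in the text. It assumes each distinct value v stands for the interval from v - 0.5 to v + 0.5 and that tied observations are spread evenly across that interval. Python's standard library offers statistics.median_grouped, which applies the same kind of interpolation and is used here only as a cross-check.

```python
from collections import Counter
from statistics import median_grouped


def interpolated_median(data, width=1.0):
    """Median obtained by splitting the count of a tied value.

    Assumption: each distinct value v covers [v - width/2, v + width/2]
    and its observations are spread evenly over that interval.
    """
    if not data:
        raise ValueError("data must not be empty")
    counts = Counter(data)
    half = len(data) / 2   # observations that must lie below the median
    below = 0              # observations counted so far
    for value in sorted(counts):
        freq = counts[value]
        if below + freq >= half:
            lower = value - width / 2
            # Take just enough of this value's count to reach one half.
            return lower + width * (half - below) / freq
        below += freq


data = [2, 6, 13, 15, 15, 15, 17, 22]
result = interpolated_median(data)
print(round(result, 2))                            # 14.83
print(round(median_grouped(data, interval=1), 2))  # 14.83
```

Here three values (2, 6, 13) lie below 15, so one of the three 15s is needed to reach 4 out of 8, giving 14.5 + 1/3.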
The 1st and 3rd quartiles are calculated using the same principle.
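As a rough sketch of how the same principle might extend to quartiles, the hypothetical function below looks for the point with a share p of the observations below it (p = 0.25 for the 1st quartile, p = 0.75 for the 3rd). It rests on two assumptions that are not stated in the text: each value covers an interval of width 1 centred on it, and where several points qualify (in a gap between values) the smallest one is taken. Other conventions would give slightly different quartiles.

```python
from collections import Counter


def interpolated_quantile(data, p, width=1.0):
    """Point below which a share p of the observations lies.

    Assumptions (for illustration only): each distinct value v covers
    [v - width/2, v + width/2], tied observations are spread evenly over
    it, and the smallest qualifying point is returned.
    """
    if not data:
        raise ValueError("data must not be empty")
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    counts = Counter(data)
    target = p * len(data)   # observations that must lie below the result
    below = 0                # observations counted so far
    for value in sorted(counts):
        freq = counts[value]
        if below + freq >= target:
            lower = value - width / 2
            return lower + width * (target - below) / freq
        below += freq


data = [2, 6, 13, 15, 15, 15, 17, 22]
q1 = interpolated_quantile(data, 0.25)
q2 = interpolated_quantile(data, 0.50)
q3 = interpolated_quantile(data, 0.75)
print(q1, round(q2, 2), q3)  # 6.5 14.83 15.5
```

Under these assumptions, 2 of the 8 values (2 and 6) lie below 6.5, and 6 of the 8 (2, 6, 13 and all three 15s) lie below 15.5.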