Explanation of Solution
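In outline, the bits the code assembles follow the standard single-precision layout. This is a sketch in which s is the sign bit, b is the bit length of |i| (so fb = b - 1 in the code), and r is the part of |i| below its leading 1 (remainingBit in the code):

```latex
\begin{aligned}
|i| &= 2^{b-1} + r, \qquad 0 \le r < 2^{b-1} \\
E &= 127 + (b-1) \\
F &= \begin{cases}
  r \cdot 2^{\,23-(b-1)}, & b-1 \le 23 \\
  \operatorname{rne}\!\left(r / 2^{\,(b-1)-23}\right), & b-1 > 23
\end{cases} \\
\text{bits} &= s \cdot 2^{31} + E \cdot 2^{23} + F
\end{aligned}
```

Here rne means round to nearest, ties to even. If rounding makes F equal to 2^23, the sum carries into E, which is the correctly renormalized result. For example, for i = INT_MIN: s = 1, b = 32, r = 0, E = 158, so bits = 2^31 + 158 · 2^23 = 0xCF000000, which matches exponentBit = biasValue + 31 in the code.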
Function definition for the “float_i2f()” function:
The implementation of the “float_i2f()” function, which returns the bit-level representation of (float) i for an int argument i, is given below:
// Header files
#include <stdio.h>
#include <assert.h>
#include <limits.h>

// float_bits holds the raw bit pattern of a single-precision float
typedef unsigned float_bits;

// Function declaration for float_i2f
float_bits float_i2f(int i);
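// Function definition to compute the bit length of i
// (the position of its highest set bit, plus one)
int findBitsLength(int i)
{
    // A negative value has the sign bit set, so its length is 32
    if ((i & INT_MIN) != 0)
    {
        return 32;
    }
    // Work on the unsigned bit pattern
    unsigned unum = (unsigned)i;
    // Start the length at 0
    int len = 0;
    // Increase len until 2^len exceeds unum; using 1u avoids the undefined
    // behavior of shifting a signed 1 into the sign bit
    while (unum >= (1u << len))
    {
        len++;
    }
    // Return the length
    return len;
}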
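// Function definition to generate a mask with the low bl bits set
unsigned findBitsMask(int bl)
{
    // Shifting a 32-bit value by 32 is undefined, so bl == 0
    // (which occurs for i == 1 or i == -1) is handled explicitly
    if (bl <= 0)
    {
        return 0;
    }
    // Return the bit mask
    return (unsigned) -1 >> (32 - bl);
}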
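The two helpers can also be written so that no shift ever reaches the 32-bit width. The sketch below uses hypothetical names (bitLength, lowMask) and assumes 32-bit int and unsigned, as the listing does: instead of comparing against a probe bit 1 << len, it shifts the value itself right until it becomes zero, and the mask handles bl == 0 and bl == 32 explicitly.

```c
#include <limits.h>

/* Hypothetical variant of findBitsLength: count right shifts until the value
   is zero. A negative int keeps its sign bit, so its length is 32. */
int bitLength(int i)
{
    unsigned u = (unsigned)i;
    int len = 0;
    while (u != 0)
    {
        u >>= 1;
        len++;
    }
    return len;
}

/* Hypothetical variant of findBitsMask: low bl bits set, for 0 <= bl <= 32 */
unsigned lowMask(int bl)
{
    if (bl <= 0)
        return 0u;
    if (bl >= 32)
        return UINT_MAX;          /* all 32 bits set */
    return (1u << bl) - 1u;       /* bl is 1..31, so the shift is defined */
}
```

For instance, bitLength(5) is 3 and lowMask(3) is 0x7, so 5 & lowMask(3 - 1) leaves 1, the bits below the leading 1.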
// Function definition to compute the bit-level representation of (float)i
float_bits float_i2f(int i)
{
    // Declare variables
    unsigned signBit, exponentBit, fractionBit, remainingBit, exp_signBit, rp;
    // Declare the bit length and the number of bits after the leading 1
    unsigned b, fb;
    // Single-precision exponent bias
    unsigned biasValue = 0x7F;
    // If i is 0, then
    if (i == 0)
    {
        // All fields are 0 (+0.0)
        signBit = 0;
        exponentBit = 0;
        fractionBit = 0;
        // Return the assembled bits
        return signBit << 31 | exponentBit << 23 | fractionBit;
    }
    // If i is INT_MIN, then
    if (i == INT_MIN)
    {
        // -2^31 is exactly representable: sign 1, exponent 127 + 31, fraction 0
        signBit = 1;
        exponentBit = biasValue + 31;
        fractionBit = 0;
        // Return the assembled bits
        return signBit << 31 | exponentBit << 23 | fractionBit;
    }
    // Start with sign bit 0
    signBit = 0;
    /* Two's complement: for a negative i, record the sign and work with
       the magnitude. -i cannot overflow because INT_MIN was handled above. */
    if (i < 0)
    {
        // Set the sign bit to 1
        signBit = 1;
        // Replace i with its absolute value
        i = -i;
    }
    /* Compute the bit length by calling findBitsLength */
    b = findBitsLength(i);
    // Number of bits after the leading 1
    fb = b - 1;
    // Biased exponent
    exponentBit = biasValue + fb;
    // Bits after the leading 1 (the leading 1 is implicit in the float)
    remainingBit = i & findBitsMask(fb);
    // If fb is at most 23, the remaining bits fit in the fraction exactly
    if (fb <= 23)
    {
        // Left-align the remaining bits in the 23-bit fraction field
        // and combine them with the exponent bits
        fractionBit = remainingBit << (23 - fb);
        exp_signBit = exponentBit << 23 | fractionBit;
    }
    // Otherwise, the low fb - 23 bits must be rounded away
    else
    {
        // Number of low bits to drop
        int offsetValue = fb - 23;
        // Halfway value of the dropped bits
        int rm = 1 << (offsetValue - 1);
        // The dropped (round) part
        rp = remainingBit & findBitsMask(offsetValue);
        // Keep the top 23 bits as the fraction and combine with the exponent bits
        fractionBit = remainingBit >> offsetValue;
        exp_signBit = exponentBit << 23 | fractionBit;
        /* If the dropped part is below halfway, round down (keep the truncated value) */
        if (rp < rm)
        {
        }
        // If the dropped part is above halfway, round up
        else if (rp > rm)
        {
...
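Because the listing stops inside the rounding branch, here is a minimal, self-contained sketch of one way to finish the same approach with round-to-nearest-even. It assumes 32-bit int and unsigned. bit_length is a hypothetical helper, and unlike the listing, INT_MIN is not special-cased: the magnitude is computed by negating in unsigned arithmetic instead. This is an illustration, not the truncated solution's own continuation.

```c
#include <limits.h>

typedef unsigned float_bits;

/* Hypothetical helper: position of the highest set bit, plus one (0 for u == 0) */
static int bit_length(unsigned u)
{
    int len = 0;
    while (u != 0) {
        u >>= 1;
        len++;
    }
    return len;
}

/* Sketch of float_i2f with round-to-nearest-even */
float_bits float_i2f(int i)
{
    unsigned sign = 0;
    unsigned mag;

    if (i == 0)
        return 0;                          /* +0.0 */

    if (i < 0) {
        sign = 1;
        mag = 0u - (unsigned)i;            /* unsigned negation: INT_MIN gives 2^31 */
    } else {
        mag = (unsigned)i;
    }

    int fb = bit_length(mag) - 1;          /* bits after the leading 1 */
    unsigned exponent = 127u + (unsigned)fb;
    unsigned rest = mag & ~(1u << fb);     /* drop the implicit leading 1 */
    unsigned frac;

    if (fb <= 23) {
        frac = rest << (23 - fb);          /* exact: left-align in 23 bits */
    } else {
        int drop = fb - 23;                /* low bits that do not fit */
        unsigned half = 1u << (drop - 1);  /* halfway value of the dropped bits */
        unsigned dropped = rest & ((1u << drop) - 1u);
        frac = rest >> drop;
        /* Round up above halfway, or at exactly halfway when frac is odd */
        if (dropped > half || (dropped == half && (frac & 1u))) {
            frac++;
            if (frac == (1u << 23)) {      /* fraction overflowed: renormalize */
                frac = 0;
                exponent++;
            }
        }
    }
    return (sign << 31) | (exponent << 23) | frac;
}
```

As a check of the tie rule, 16777217 = 2^24 + 1 lies exactly halfway between 2^24 and 2^24 + 2. The kept fraction is even (0), so float_i2f(16777217) returns 0x4B800000 (16777216.0). For INT_MAX, the dropped bits are above halfway and the carry renormalizes the result to 0x4F000000 (2^31).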
Chapter 2 Solutions
Computer Systems: A Programmer's Perspective (3rd Edition)